Comments on Geeking with Greg: Yahoo Pig and Google Sawzall
Greg Linden
Feed updated: 2024-01-15T13:17:33.771-08:00
12 comments

Edward Vielmetti (2008-01-15T18:19:00.000-08:00):
Christopher Olston did a talk on Pig at the U of Michigan today - I took notes and will figure out some way to post them to Slideshare.

I think it's called "Pig" because the Yahoo Hadoop infrastructure is so slow; they sound like they are working in a batch job processing environment with the minimum job turnaround time measured in hours, rather than the Google turn time measured in minutes. Perhaps the Yahoo guys need to rent some Amazon EC2 time.

Ed Burnette (2007-07-17T06:30:00.000-07:00):
Do you have a new link for abacus? The one above doesn't work.

Anonymous (2007-04-29T11:30:00.000-07:00):
@Sriram,
Does it run on Linux or Solaris or BSD? If not, it is DOA for most of us in the internet business. Today, it is exceptionally rare to see a web start-up using Windows. Your development tools may be cool, but tying them to Windows is a deal-breaker for us.

E14 (2007-04-26T22:41:00.000-07:00):
Abacus is a program contributed by a Yahoo developer that does simple aggregations. Pig is a more ambitious project.

Both are open source, so why speculate? Check them out.

Anonymous (2007-04-26T20:59:00.000-07:00):
Greg: I haven't looked at Y!'s little porky yet (though I point pigs, horses, and other animals to my kid all the time these days), but Hadoop has Abacus:
http://lucene.apache.org/hadoop/api/org/apache/hadoop/abacus/package-summary.html

Perhaps that's what Pig really is, under its Y! skin.

Sriram (2007-04-26T15:45:00.000-07:00):
Yup - that's a good summary. I'm a big fan of Linq's 'bring your own query processor' model - it is amazing to see so many extensions being built on top of it.

- Sriram Krishnan
www.sriramkrishnan.com

Greg Linden (2007-04-26T14:24:00.000-07:00):
Hi, Sriram and Jeff. Thanks for the references to Microsoft's DryadLINQ and PLINQ.

From what I can tell, it appears PLINQ is designed for running on a single multi-core system, not a massive cluster. Or am I misreading slide 3 of the presentation you linked to, Jeff? Unfortunately, it appears there are no published papers on PLINQ and no project page yet, so I am having a bit of a hard time figuring out the details.

On DryadLINQ, from the 2007 paper (PDF: http://research.microsoft.com/research/sv/dryad/eurosys07.pdf) on Dryad, it appears that Dryad is more complicated but has similar goals to MapReduce.

The Dryad project page (http://research.microsoft.com/research/sv/dryad/) has a nice, high-level summary of the goals:

"Dryad is an infrastructure which allows a programmer to use the resources of a computer cluster or a data center for running data-parallel programs. A Dryad programmer can use thousands of machines, each of them with multiple processors or cores, without knowing anything about concurrent programming."

From the paper, it appears that Dryad also has a high level scripting language called Nebula that may be similar in its goals to Sawzall.

As for DryadLINQ, it appears to be another language on top of Dryad, this time based on LINQ (http://msdn2.microsoft.com/en-us/netframework/aa904594.aspx).

Is that all correct?

Greg Linden (2007-04-26T13:54:00.000-07:00):
Hi, Steve. Sorry, you're right, I probably should have put a header on the examples to explain what they do.

The Sawzall example looks "at a set of search query logs and construct a map showing how the queries are distributed around the globe."

The Pig example finds "queries for which the highest-pagerank page in the result set did not appear among the top 5 results."

jsm (2007-04-26T13:14:00.000-07:00):
Interesting... hadn't heard about DryadLinq. I thought PLINQ (http://www.bluebytesoftware.com/blog/PermaLink,guid,200c3151-fbd5-4bfe-bb1e-0d6b90c6442b.aspx) was Microsoft's entry into that field.

Charles Gordon (2007-04-26T13:03:00.000-07:00):
It's worth noting that the syntax for Sawzall looks a lot like that of Scala. The Scala language (http://www.scala-lang.org) is currently implemented on the JVM and interoperates nicely with Java. In addition it has libraries for Erlang-like concurrency constructs and support. More importantly, one of the authors on their OOPSLA 2005 paper was Matthias Zenger, who works for Google. That might not mean anything, but it's fun to speculate!

Unknown (2007-04-26T12:12:00.000-07:00):
I'm always curious about new languages, so I'm wondering what these two examples do. I get that the first one calculates # of queries per time slice by lat/long. But I can't read the second one at all... Any ideas?

Sriram (2007-04-26T11:00:00.000-07:00):
DryadLinq (http://research.microsoft.com/research/sv/dryad/) is something similar from us. It uses the extensible nature of Linq to make it run jobs on Dryad.