Comments on Geeking with Greg: Browsing behavior for web crawling

Tillirix, 2011-12-21:
I wonder whether toolbar penetration is rising or falling. Chrome is gaining share and emphasizes simplicity, which arguably means no toolbars. Naturally, valuable insights can still be drawn even if penetration is low or falling, provided you account for the possibility that the sample is biased toward visitors who like toolbars. My hunch is that less sophisticated users may adopt toolbars more heavily, but I could be wrong. Do you have any insights on penetration and biases?

Greg Linden, 2011-12-02:
Thanks, Srikanta. I realize this is not the first attempt to use browsing data from proxies or other sources, but the scale of the effort is very different here. The most important part of the Yahoo paper is the scale of the browsing data they have. The Yahoo toolbar is widely installed. Yahoo (and a few others) have an enormous amount of data on what people do on the Web, and their paper quantifies how useful that kind of big data on browsing behavior is for web crawling.

Srikanta Bedathur, 2011-12-02:
A variant of this approach was also discussed in our paper "EverLast: A Distributed Architecture for Preserving the Web" [JCDL 2009].
I think Yahoo's paper quantifies what everyone has long believed: seed-based crawlers are of limited use when it comes to reaching the interesting parts of the highly dynamic Web!

Greg Linden, 2011-12-01:
Yes, Google doesn't talk about it much, but they appear to have been using toolbar data for a long time. See, for example, my 2008 post, "Google Toolbar data and the actual surfer model" (http://glinden.blogspot.com/2008/07/google-toolbar-data-and-actual-surfer.html). The enormous value of this toolbar data explains why Google pushed so hard for toolbar installations, making expensive deals with, for example, Adobe to get it installed when Flash is installed, and to get it on every Dell computer.

I think there are two reasons why this Yahoo paper is important. First, most people who use toolbars probably are not aware of how their data is being used, so there is a press story here (perhaps a positive story on how the Google brain learns from everyone who uses Google, or perhaps a negative story on privacy). Second, many in the search industry, including researchers and executives I've talked to in the past, have doubted the value of browsing behavior data for crawling and relevance ranking, and this article might help convince more of those who need convincing.

Anonymous, 2011-11-30:
Google has been doing this for 7 years.