Comments on Geeking with Greg: A chance to play with big data
Blog author: Greg Linden (http://www.blogger.com/profile/09216403000599463072)
Feed last updated: 2024-01-15 13:17:33 -08:00

2006-08-15 07:07 -07:00
A site where you can search this data is here:

http://www.datablunder.com/logitems/query
Posted by Anonymous

2006-08-13 20:49 -07:00
it's not too difficult to identify the more verbose users. i've identified several just today, including names, addresses, phone, email, myspace, and in some cases credit and ssn. there are definitely some identified queries that would cause a great deal of embarrassment if known, and the potential for identity theft or blackmail is high. i could be more understanding of aol if this data was stolen, but to be so incredibly dense as to release it to the public without considering the ramifications frankly borders on mental retardation. surely aol has data mined this stuff before. i just can't imagine anyone would be that stupid.
Posted by Anonymous

2006-08-11 13:35 -07:00
Anonymous, yes, my example was silly. But it still illustrates a point. Think about all the actions you perform out in the world. What is, what should be, and what can possibly be private?

As for your example about your food choices and your insurance company (something Reto also mentioned), certainly you have seen the "Ordering Pizza in 2010" video from the ACLU. It's a good one.
I myself have those same fears.

But let us compare what AOL has done with what is already happening. Take, for example, the Google privacy policy: "<I>We may combine personal information collected from you with information from other Google services or <B>third parties</B> [emphasis mine] to provide a better user experience, including customizing content for you</I>".

In other words, Google admits that they could very well "mashup" your data with data that they have received from (third-party) insurance companies.

So your nightmare scenario is already out there. The release of this AOL data didn't add anything new. You've already got companies that you use preparing themselves to "mashup" your data. The AOL data is the least of our worries.
Posted by Anonymous

2006-08-11 05:39 -07:00
Jeremy: Yes. Anything I tell a company that I don't explicitly give them permission to tell someone else should be considered private.

Do I realistically expect this? No. Would I be justifiably angry if I found this not to be the case? Yes.

Deciding what should be private is easy for some things -- health and finance spring to mind -- but more difficult to pin down for others. My food order? *I* don't care, but maybe if my insurance company was adjusting my premium based on my diet I would. Point is, while it's good practice to assume nothing you do or say will remain private (particularly online), we should still demand privacy from those we trust with our information.
Posted by Reto Meier (https://www.blogger.com/profile/04583545000534514486)

2006-08-10 00:48 -07:00
Uh, Jeremy, that was a rather silly example, wasn't it?
A more realistic analogy might be that someone calls Domino's and orders an extra-large meat lovers' pizza with extra cheese, and the phone company releases the call details. Furthermore, maybe that person had just previously talked to their insurance company, which gets the data set, correlates the call times, finds out that our friend is at higher risk for coronary disease due to his diet, and decides to up his premium ...
Posted by Anonymous

2006-08-09 14:31 -07:00
arethusa writes: "<I>I swear to God if you mention 'but it wasn't a bank account #!' one more time I'm going to wave a magic wand and send you back to junior high civics class.</I>"

Um, dude, with all respect: Are you high? I mentioned "bank account" once and only once. And it was only after you brought it up.

But let's go to what you are really saying: Privacy is not just about bank accounts.

I guess my question to you is: Of all the information that you give to other people and/or companies, what constitutes private data? Certainly when you give a company your Social Security number, that is private. But how about when you type a query? Is that really "private information", as defined by law or even ethics?

Here is another example: let us suppose you are ordering a meal in a restaurant. You issue your query: "Can I have the salad?" Is that considered private data? Is the restaurant required not to disclose that data to anyone? Could they be taken to court if they publish the fact that you ordered a salad?
Is your ordering of the salad "private data"?

And if not, why is the query you type into AOL private?
Posted by Anonymous

2006-08-09 12:00 -07:00
Parker, you make a good point, but there are still problems with what you are saying. Regarding your name/car/pizza example: Guess what? I actually do search for the names of my co-workers, so I can find other papers and projects they've done in the past. I also happen to admire some of the cars they drive, such as the Toyota Prius, and have searched for that car after riding in the passenger seat with them. Finally, since we work and live in the same general area (after all, we are co-workers), I know that I've searched for pizza joints near where they live.

According to you, you would see this search history and automatically think the searcher is my co-worker. It has his name, his car, and his local pizza joint.

But you would be wrong.

I admit, though, that sometimes you could be right. And sometimes you could make a pretty informed guess.

So at this point, I go back to the scope and scale argument. So far you have a 1 in 657,000 chance of randomly being identified. If you turn it around and look at the probability of someone actually going out and looking for you, specifically, and being able to piece together everything they need to really nail it down to you, and not to an employer, co-worker, friend, family member, random fan, random web freak, despised arch-enemy, or whoever happened to be <I>looking</I> for you, then I think the chances of someone actually succeeding are very low.

If someone is really looking for dirt on you, on specifically <I>you</I>, then there are methods that are both much more effective and much more efficient than trawling the AOL data. They could steal your garbage.
They could try logging in to your Gmail account using the name of your pet dog. Whatever.

There are serious violations of our privacy occurring all the time in the U.S. Think of the recent AT&T/government scandal. That, IMO, is much worse than this AOL data.

But if this move by AOL starts a good dialogue, then I am all for that. I just think there is a lot of overreaction at the moment.
Posted by Anonymous

2006-08-09 10:01 -07:00
You didn't take that walk outside in the sun and then nap, did you? I swear to God if you mention "but it wasn't a bank account #!" <I>one more time</I> I'm going to wave a magic wand and send you back to junior high civics class.

Privacy laws are not in place solely to protect your bank account #; otherwise, almost no site that asks you to input personal info would have a "Privacy Agreement" (ever read one? For example, AOL's?). And there's no magic "million" number that companies have to reach before they have ethically and legally compromised a customer's privacy.
Posted by Arethusa (https://www.blogger.com/profile/16715437585568507176)

2006-08-09 09:52 -07:00
Cheers Greg.

I agree, there was a lot of hyperventilation about this. I don't think ID theft is that big of a problem here, either.
Posted by Anonymous

2006-08-09 08:44 -07:00
"If someone comes up with a clear example of a privacy violation from this AOL data, I would be convinced."

the clear examples are growing, people only need to look at the data to serve them up to you.
now you should eat your humble pie and announce your "convincedness." instead you are proposing "mitigating factors" in a could-be-satire denial of solid...err...data. Ironic how personal agenda clouds the ability to datamine.

we won't hate you if you say "yeah i'm a researcher. i love playing with data. therefore i support this type of release of information despite any (perceived or real) consequences." but we (i) <I>will</I> respect you less if you make a challenge and then refuse to accept the evidence you've demanded.

personally, i love the idea of digging into such a big, real dataset, gonna start tonight. but to echo Ho John Lee, happy to not be one of those AOL customers...

to be honest with myself i say that ultimately it's wrong, but i will participate anyway.
Posted by Anonymous

2006-08-09 08:06 -07:00
Good point, Kevin Murphy. I apologize for characterizing your prediction as "outlandish".

I continue to believe some of the other claims are outlandish -- that this release of data will lead to identity theft and exposure of very private data at anywhere near the scale of past scandals -- but I apologize for describing your words in that manner.

There were privacy issues with this AOL data release that need to be addressed. We have to find a way to facilitate information retrieval research -- to build the next generation of search and help people all over the world quickly find the information they need -- without risking individual privacy.
Posted by Greg Linden (https://www.blogger.com/profile/09216403000599463072)

2006-08-09 00:42 -07:00
Hi Jeremy, I'm not sure why this problem is so hard to understand.
The fact that it's not possible to conclusively prove that an IP address or a set of searches belongs to a person is irrelevant. For a moment, stop focusing on the technical details of this issue, like synchronizing timestamps, and look at it from the perspective of a real user.

Jeremy, let's say your full name appeared as part of a set of searches done by user 12345. The next search done by that same user was about repairing a car of the model and year that you own. Other searches are for a pizza place in your neighborhood, a hotel in your last vacation spot that you told all your friends about, your favorite programming language B# (although I would wonder about a programmer using AOL), and a few acquaintances. Interspersed with these are searches for "transexual teen escorts who take credit cards" and "free syphilis clinic".

So, let's say your geeky co-worker / boss / prospective employer searches this data for your name "Jeremy FullName" just out of curiosity and finds those bits of information all nicely grouped together.

Now, do you think that it matters one bit that they cannot prove with 100% certainty that you were the person behind the machine? Would you really just shrug it off and say "You can't prove it was me; I just won't admit to it"?

If you really think you would, then feel free to sign all your future web postings with your full name and home address, since no one could prove 100% that it was you sitting at that machine. That's the same lack of common sense that these researchers demonstrated in the first place.

Actually, I think the damage here is much worse than getting your credit card # posted online, because in that case, who cares, just close those accounts.
But if real people get associated with some of these searches, rightly or wrongly, then as they say, "where can they go to get their reputations back"?
Posted by Anonymous

2006-08-08 23:50 -07:00
My point, arethusa, uke, and anonymous, is that if the reporter had gone to this lady's house, shown her the list of searches, and she had said "No, those are not my searches", could the reporter still have positively and conclusively identified her?

We'd need more details on the reporter's methods to know for sure. But from this article, it doesn't sound like it.

This whole "exposé" hinges on this lady's cooperativeness.

And, even with her cooperation, my other point still stands: we don't know which queries were actually hers and which were for her friends.
Posted by Anonymous

2006-08-08 23:22 -07:00
Hi Greg,

I'm the guy who wrote Inflammatory Comment #5, cited in your post, above.

I wrote the inflammatory words, which you quote above, "I expect it's a matter of time before a major national newspaper prints an interview with somebody identified and embarrassed in this manner."

With the NYT article you cite published less than 48 hours later, perhaps you would reconsider your characterization of my words as "outlandish".

(But don't remove the link to my blog! You're PR6! I've subscribed to your blog; perhaps you could subscribe to mine! We could geek out together!)

Kevin
Posted by Anonymous

2006-08-08 23:22 -07:00
Ok, great, person #4417749 was "exposed". But let's read a little more closely. The searches led reporters to make a pretty reasonable guess. But, in the end, how was she actually discovered?

<I>“Those are my searches,” she said, after a reporter read part of the list to her.</I>

She self-identified. No one actually proved it was her.

Furthermore, the article goes on to say that many of the things you think you learned here were, in fact, false:

<I>Her search history includes “hand tremors,” “nicotine effects on the body,” “dry mouth” and “bipolar.” But in an interview, Ms. Arnold said she routinely researched medical conditions for her friends to assuage their anxieties. Explaining her queries about nicotine, for example, she said: “I have a friend who needs to quit smoking and I want to help her do it.”</I>

So, wow, we have been able to identify 1 person out of over 657,000. And she's a little old lady with no bank account numbers revealed. And half of what we think we learned about her was actually about her friends. And, without actually asking her, we have no way of knowing which of those ailments are hers and which are her friends'. That puts everything into suspect territory, and thus makes it useless knowledge. Just as I suspected above.

Scale and scope. 1 out of 657,000. You have a greater chance of dying by falling out of a building (<A HREF="http://www.nsc.org/lrs/statinfo/odds.htm" REL="nofollow">see here</A>) than you do of being identified as an AOL querier. Sorry, not even identified... educatedly guessed.

I still stand by my feeling that you have more of a privacy risk from getting your trash stolen than from the AOL queries in this dataset.
Posted by Anonymous

2006-08-08 23:03 -07:00
I have one more comment to make.
Anonymous wrote: <I>Each entry is time stamped to the second, and includes the web site the user clicked on. It is incredibly easy for anyone who owns a website that one of these AOL users clicked on to access their logs, and correlate the time and HTTP referrer headers to that specific user ID. Therefore you can attach an IP address and time to a UserID in the AOL data - this makes identifying the person a lot easier.</I>

Incredibly easy, you say? Maybe. But I foresee all sorts of obstacles. First, you are assuming that the AOL timestamp and the website owner's timestamp are perfectly synchronized. What happens when the clocks are two seconds off? 30 seconds? 2 minutes? With all the traffic to your website, especially from AOL, can you still just as easily tell who was who? What if you are getting dozens of hits per second on your site? Or more?

More importantly, does it even matter, for all the people who search for "cookie recipes" and "wyoming rodeo location"?

So in this whole process, for it to even matter, you have to have a website owner who is determined to find someone who is doing something "bad". Then you have to hope that this person actually has a query that is personally identifiable. Then, you have to hope that they clicked on _your_ website, in response to their query, instead of some other website.

Then, even if all those coincidences line up, and you can get the clock times to sync and separate that visit from all your other traffic, you still have the problem of knowing whether the person who issued that query was the same one who issued all the other queries. You still have the problem of knowing whether or not it was someone's friend or child or spouse borrowing the computer.

I do agree, there are some privacy concerns. But the blogosphere also just needs to <I>calm down</I> and get some perspective.
Scale and scope.
Posted by Anonymous

2006-08-08 22:47 -07:00
First person identified from AOL Data: Thelma Arnold
byline: Michael Arrington

On Sunday the news broke that AOL purposefully released 20 million partially anonymized search queries. On Monday AOL apologized, and later that evening the first web interface to the data went up.

Today the first person was positively identified from the data - Thelma Arnold, a 62-year-old widow who lives in Lilburn, Georgia.

Based on searches ranging from “numb fingers” to “60 single men” to “dog that urinates on everything,” the New York Times was able to quickly determine and confirm her identity. Ms Arnold is AOL searcher no. 4417749.

Ms Arnold commented: “My goodness, it’s my whole personal life…I had no idea somebody was looking over my shoulder.”

AOL replied: “We apologize specifically to her…There is not a whole lot we can do.”
Posted by Uke Xensen (https://www.blogger.com/profile/14925310033193507314)

2006-08-08 22:30 -07:00
<A HREF="http://www.nytimes.com/2006/08/09/technology/09aol.html" REL="nofollow">A Face is Exposed for AOL Searcher No. 4417749</A>

But since no one's bank account has been compromised, no biggie, right?
Take a step back, try to perceive the "big picture" here; maybe review the concept behind privacy laws as understood in the US; take a walk in the sun, get some fresh air; sleep on it, see how you feel.
Posted by Arethusa (https://www.blogger.com/profile/16715437585568507176)

2006-08-08 22:05 -07:00
I assume you've seen the first person identified through the searches?
Posted by Anonymous

2006-08-08 21:52 -07:00
I'm reading through all the comments here, and I have to say that while I agree in theory that some of this data may contain privacy violations, in practice a lot of the examples given by the rest of you commenters are simply false. Just because someone types in a name does not mean it was an "ego surf" query, so it does not identify the querier as the named person. I type all sorts of names into the search box all the time: business contacts, old college friends I'd like to get in touch with again, people whose papers I've read, etc. It is a bit delusional to think that you could infer someone's identity from all the names they typed in.

Now, as for the people who typed in names and Social Security numbers: while that is not good, it still does not tie that person's name to the querier. So it identifies one person with one Social Security number, but it still does not identify the querier.

I mean, for all we know, the querier could be some unscrupulous identity thief who has already managed to get hold of a Social Security number and is trying to find out more info on the web. Ok, that may be a bit of a stretch, but it could also be one of the many businesses that ask for your Social Security number, typing the name in to try to find you because you are delinquent on your payments.
The point is, the querier is still not necessarily the person named in the query.

And even when you really, really think that someone has issued a query that lets you identify them, e.g. "Hi, my name is Gary D. Sloquist. Where is my homepage?", how can you be certain that all the other queries that follow with the same ID are all -that same- person? People have families. People share computers. People have friends who come to visit and temporarily "borrow" their logons. How do you really know it is the same person?

In general, data like this makes me nervous, too. But I have to agree with Greg that the benefits outweigh the costs. At least that's how I feel today :-)

And Greg has an excellent point when he talks about the scope and scale of the violation. I mean, if you really wanted to find out as much information on some random person as you can from these queries, you could just drive somewhere and steal someone's trash. That is about the scale and scope that we're talking about here.

Actually, I think the trash stealer has the potential for even worse privacy violations than this AOL data does. Think about it for a minute, please. Using this AOL data, let's suppose an identity thief wanted to do something nasty. Well, chances are (because this is a big world we live in) there are dozens of identity thieves all around the world thinking the exact same thing. And don't you think the credit card companies are going to be highly suspicious when, two days from now, credit card charges start appearing simultaneously in Kiev, in Sofia, in Lagos, and in Ft. Lauderdale, plus two dozen more cities around the world? The identity thieves won't be coordinating with each other, and they'll all be attempting it at the same time. Easy to detect, easy to shut down.

Contrast that with the trash thief. One person. Working small. You probably won't detect it until your next credit card cycle.
That is, if you even get the bill, since that one person may have redirected your mail. It could take weeks before you know, and by then, lots of damage will be done. MUCH more than through the release of the AOL queries.

There is risk everywhere. And while you would be correct if you said "well, YOU wouldn't want your name and Social Security number to appear in this data, would you?", I think the overall scale and scope of this is much smaller than everyone is making it out to be.
Posted by Anonymous

2006-08-08 17:50 -07:00
You want to know where people can be identified?

here you go:
"3406329 rene bastarache 327-70-xxxx 2006-03-11 03:42:03"

"5167434 barbara jean leighton soc. security 383-34-xxxx 2006-03-22 16:33:50"

"18150462 kristy nicole vega hammond la. social secruity number 437-xx-xxxx birth date xx xx xx drivers license number la. 00765xxxx address xxxxx xxxx dr. hammond la. 2006-03-31 08:32:50"

"4186504 locate keith ivan thompson born x may xx social security 236-xx-xxxx last address was xxx street apt xxx aurora colorado 2006-03-27 23:00:41"

"11932670 shirley dublin 240-xx-xxxx ss no 2006-05-14 23:39:12"

There's a lot more than that too. That's only what I was able to find with a quick text search of just the first 3 of the 10 text documents. Also I'm sure that cross-referencing the user IDs will reveal further information. I did find someone's name, location, and MySpace page, along with what was presumably their Visa #.

It's rather obtuse to say that this is insignificant just because there is so much other ID theft that has happened recently. Plus, this data wasn't even stolen--AOL voluntarily gave it out!
ID theft was the first thing that popped into my head when I heard about this, and the fact that nobody at AOL realized this before the data were released shows that they seriously need to get their heads out of their butts.
Posted by Big Poppa Sausage (https://www.blogger.com/profile/11916877331676450174)

2006-08-08 09:33 -07:00
Thanks, Reto. Those are privacy violations. I appreciate you pointing out specific examples.

I think there are a couple of mitigating factors with this privacy violation that should be considered.

First, the scope and scale of the violation. The number of people potentially impacted is small. The impact on those people is likely small. This is nothing like the release of millions of credit card numbers or Social Security numbers. The depth of our outrage should be proportional to the damage done.

Second, AOL Research's motivations in releasing this data appear to be pure. They sought to help the IR research community advance the state of search. You may consider that naive, but I believe that the fact that they were trying to help should lessen our anger.

Thanks again, Reto.
I appreciate you finding those examples.
Posted by Greg Linden (https://www.blogger.com/profile/09216403000599463072)

2006-08-08 07:45 -07:00
Completely off subject, but I hit your blog about every couple of weeks and noticed the new pic without the beard!!
Posted by Anonymous

2006-08-08 06:39 -07:00
Ok, there are two points you seem to have missed that make it rather easier than you suggest to identify a lot of these users. The risk is not just perceived.

Number 1: Each entry is time stamped to the second, and includes the web site the user clicked on. It is incredibly easy for anyone who owns a website that one of these AOL users clicked on to access their logs, and correlate the time and HTTP referrer headers to that specific user ID. Therefore you can attach an IP address and time to a UserID in the AOL data - this makes identifying the person a lot easier. Now, and this isn't hypothetical at all, assume you are the nytimes.com website, where people have to log on to read certain stories, or any other website that requires registration. You have the person's email address, IP address, possibly their name, and so on.

This is not impossible; hell, it is not even hard.
A significant percentage of these AOL users could be identified like this by websites that a number of the users clicked on.

Now point 2: Each UserID is linked to all the queries over 3 months, so you can confirm your IP and possible name data by triangulating possibly personally identifying queries, along with all kinds of other information you wouldn't want someone to know.
Posted by Anonymous

2006-08-08 05:44 -07:00
I've now had a chance to spend 5 mins browsing the data myself. There are a couple hundred examples of people pasting a phishing email into the search box, each email beginning: 'Dear [user's *full* name]'.

I picked a name at random and was quickly able to see this guy lives in Ohio but is moving to Georgia, he drives a Chevy van (which he's looking to 'pimp'), he may have stomach cancer, he desperately wants to win the lottery -- and is considering enlarging his...

Ahh, in any case I know his full name and where he lives, combined with *very* personal details.
Posted by Reto Meier (https://www.blogger.com/profile/04583545000534514486)
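Two commenters above disagree about how easily a site owner could tie an AOL record to an access-log entry. One makes the "Number 1" argument about matching per-second timestamps and referrer headers. The other objects that this breaks down under clock skew and heavy traffic; his comment begins "I have one more comment to make". The Python below is a minimal, hypothetical sketch of that objection, not anything AOL or any commenter published. Several pieces are illustrative assumptions:

- the `LogHit` and `AolRecord` types and their field names;
- the rule that AOL-referred hits carry a referrer host of `aol.com` or `*.aol.com`;
- the sample URLs, IPs and timestamps.

The sketch counts how many AOL-referred hits fall within a ± tolerance of one record's timestamp. More than one candidate means the timestamp alone cannot single out a visitor.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlparse


@dataclass(frozen=True)
class LogHit:
    """One line of a site's access log (hypothetical fields)."""
    time: datetime
    ip: str
    referrer: str


@dataclass(frozen=True)
class AolRecord:
    """One clicked row of the AOL release (hypothetical field names)."""
    anon_id: int
    query: str
    time: datetime
    click_url: str


def looks_like_aol_referral(hit: LogHit) -> bool:
    """Assumption: AOL search referrals have a referrer host of aol.com or *.aol.com."""
    host = urlparse(hit.referrer).hostname or ""
    return host == "aol.com" or host.endswith(".aol.com")


def candidate_hits(record: AolRecord, hits: list[LogHit],
                   max_skew: timedelta) -> list[LogHit]:
    """Return the AOL-referred hits whose time is within +/- max_skew of the record.

    If more than one hit comes back, the timestamp alone cannot say which
    visitor (and therefore which IP address) belongs to record.anon_id.
    """
    earliest = record.time - max_skew
    latest = record.time + max_skew
    return [h for h in hits
            if earliest <= h.time <= latest and looks_like_aol_referral(h)]


if __name__ == "__main__":
    # Hypothetical sample data: two AOL-referred visitors and one from elsewhere.
    t0 = datetime(2006, 3, 1, 12, 0, 0)
    record = AolRecord(12345, "example query", t0, "http://www.example.com")
    hits = [
        LogHit(t0 + timedelta(seconds=1), "10.0.0.1", "http://search.aol.com/aol/search?q=a"),
        LogHit(t0 + timedelta(seconds=20), "10.0.0.2", "http://search.aol.com/aol/search?q=b"),
        LogHit(t0 + timedelta(seconds=5), "10.0.0.3", "http://www.google.com/search?q=c"),
    ]
    for seconds in (2, 30):
        found = candidate_hits(record, hits, timedelta(seconds=seconds))
        print(f"skew {seconds}s -> {len(found)} candidate(s)")
    # prints "skew 2s -> 1 candidate(s)" then "skew 30s -> 2 candidate(s)"
```

With a 2-second tolerance only one hit qualifies. Widening it to 30 seconds admits a second AOL visitor. On a busy site the candidate list grows with both the tolerance and the traffic. This sketch covers only the timestamp-plus-referrer-host case; a referrer that also carried the query text would narrow the list further.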