Sunday, April 30, 2023

Why did wisdom of the crowds fail?

Wisdom of the crowds summarizes the opinions of many people to produce useful results. Wisdom of the crowds algorithms -- like rankers, recommenders, and trending algorithms -- usefully do this at massive scale.

But several years ago, wisdom of the crowds on the internet started failing. Algorithms started recommending misinformation, scams, and disinformation. What happened?

Let's think about it in more detail. What changed that caused problems for wisdom of the crowds? Why did it change? What can we do about it?

Importantly, did anyone find ways to mitigate the problems? If some did stop their algorithms from amplifying misinformation on their platforms, how did they do that? And why didn't everyone fix their wisdom of the crowd algorithms to prevent them from amplifying misinformation?

I have my own answers to these questions, but I'm curious to hear others. If you have thoughts, I'm most curious to hear about whether you think anyone (at least partially) addressed the problems aggravating misinformation on the internet and, if so, why you think others have not.

Netflix and their new streaming with ads

I was wondering how well Netflix's new ad-supported plans are doing. There hasn't been a lot of critical reporting on it, and I'm sure others are wondering too, so let's go take a look at what we can find.

Their Q1 2023 report only has a few details, but it sounds like the $7/month ad plan generates more total revenue than the $10/month basic plan. It does not appear to be more profitable, though, and does not appear to be getting a lot of subscribers.

It's not surprising that customers aren't in love with the new Netflix ad plan. It's got ads and the catalog is smaller, and it's only $3 more a month to upgrade to no ads in basic.

It's also not surprising that Netflix is able to get at least $3/month in ad revenue from these viewers, though it might be surprising if it were also substantially more profitable given the cost of acquiring and serving those ads.

It'll take more time before we'll know how this goes for Netflix. But so far it doesn't seem like it's much of a success?

Only as good as the data

The Washington Post reports on the data used for ChatGPT and other large language models (LLMs):
We found several media outlets that rank low on NewsGuard’s independent scale for trustworthiness: RT.com No. 65, the Russian state-backed propaganda site; breitbart.com No. 159, a well-known source for far-right news and opinion; and vdare.com No. 993, an anti-immigration site that has been associated with white supremacy.

Chatbots have been shown to confidently share incorrect information ... Untrustworthy training data could lead it to spread bias, propaganda and misinformation.

AI is only as good as its data. Obviously using known propaganda like Russia Today will be a problem for ChatGPT. Generally, including disinformation or misinformation will make the output worse.

AI/ML benefits from thinking hard about high quality data and the metrics you use for evaluation. It's all an optimization process. Optimize for the wrong thing and your product will do the wrong thing.

Monday, April 17, 2023

The biggest threat to Google

Nico Grant at the New York Times writes that Google is furiously adding features to its web search, including personalized search and personalized information recommendations, in a "panic" that "A.I. competitors like the new Bing are quickly becoming the most serious threat to Google’s search business in 25 years."

Now, I've long been a huge fan of personalized search (e.g. [1] [2]). I love the idea of recommending information based on what interested you in the past. And I'm glad to see so many interested in AI nowadays. But I don't think this is the most serious threat to Google's search business.

The biggest threat to Google is if their search quality drops to the point that switching to alternatives becomes attractive. That could happen for a few reasons, but misinformation is what I'd focus on right now.

Google seems to have forgotten how they achieved their #1 position in the first place. It wasn't that Google search was smarter. It was that Altavista became useless, flooded with stale pages and spam because of layoffs and management dysfunction, so bad that they couldn't update their index anymore. And then everyone switched to Google as the best alternative.

The biggest threat to Google is their ongoing decline in the usefulness of their search. Too many ads, too much of a focus on recency over quality, and far too much spam, scams, and misinformation. When Google becomes useless to people, they will switch, just like they did with Altavista.

Sunday, April 16, 2023

Ubiquitous fake crowds

The Washington Post writes: "The Russian government has become far more successful at manipulating social media and search engine rankings than previously known, boosting ... [propaganda] with hundreds of thousands of fake online accounts ... detected ... only about 1% of the time."

Fake crowds can fake popularity. It's easy to manipulate trending, ranking, and recommender algorithms. All you have to do is create a thousand sockpuppet accounts and have them like and share all your stuff. Wisdom of the crowds is broken.

This can be fixed, but first you have to see the problem clearly. Then you'll see that you can't just use the behavior from every account anymore for wisdom of the crowd algorithms. You have to use only reliable accounts and toss everything spammy or unknown.
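
To make that concrete, here is a rough sketch in Python of the difference between counting every account and counting only reliable ones. It is a toy, not anyone's production system. The Action record, the popularity function, and the set of trusted accounts are all made up for illustration, and how you decide which accounts are reliable is the hard part that this sketch just takes as given.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Set


class Action(NamedTuple):
    """A single like or share (hypothetical record format)."""
    account_id: str
    item_id: str


def popularity(actions: Iterable[Action],
               trusted_accounts: Optional[Set[str]] = None) -> Counter:
    """Count at most one vote per (account, item) pair.

    If trusted_accounts is given, actions from any account not in that
    set (spammy or simply unknown) are dropped before counting.
    """
    seen = set()
    counts = Counter()
    for action in actions:
        if trusted_accounts is not None and action.account_id not in trusted_accounts:
            continue  # toss everything spammy or unknown
        if action in seen:
            continue  # repeated likes from one account count once
        seen.add(action)
        counts[action.item_id] += 1
    return counts


def demo():
    # Five real people like one story; a thousand sockpuppets like another.
    organic = [Action(f"user{i}", "real_news") for i in range(5)]
    fake = [Action(f"sock{i}", "propaganda") for i in range(1000)]
    actions = organic + fake

    # Assumption: some other process already decided who is reliable.
    trusted = {f"user{i}" for i in range(5)}

    naive = popularity(actions)
    filtered = popularity(actions, trusted)
    return naive, filtered


if __name__ == "__main__":
    naive, filtered = demo()
    print("all accounts:    ", naive.most_common(1))
    print("trusted accounts:", filtered.most_common(1))
```

With every account counted, the sockpuppets decide what is trending. With only the trusted set counted, their thousand likes contribute nothing, no matter how many accounts they create.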