Wednesday, November 15, 2023

Book excerpt: How some companies get it right

(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")

How do some companies fix their algorithms? In the last decade, wisdom of the crowds broke, corrupted by bad actors. But some companies found fixes that let them keep using the wisdom of the crowds.

Why was Wikipedia resilient to spammers and shills when Facebook and Twitter were not? Diving into how Wikipedia works, this book shows that Wikipedia is not a freewheeling anarchy of wild edits by anyone, but a place where the most reliable and trusted editors have most of the power. A small percentage of dedicated Wikipedia editors have much more control over Wikipedia than the others; their vigilance is the key to keeping out scammers and propagandists.

It's well known that when Larry Page and Sergey Brin first created Google, they invented the PageRank algorithm. Widely considered a breakthrough at the time, PageRank treated links between web pages as votes for what was interesting and popular. PageRank says a web page is useful if it has a lot of other useful web pages pointing to it.
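
To make the idea concrete, here is a toy sketch in Python. The link graph, damping factor, and iteration count are made up for illustration, not Google's actual values:

    # Toy PageRank: pages "vote" for the pages they link to.
    # The graph, damping factor, and iteration count are illustrative
    # assumptions, not Google's production values.
    links = {
        "a": ["b", "c"],   # page "a" links to "b" and "c"
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }

    damping = 0.85
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(50):  # power iteration until scores settle
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank

    # "c" ranks highest: the most useful pages point to it
    print(sorted(rank.items(), key=lambda kv: -kv[1]))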

Less widely known is that PageRank quickly succumbed to spam. Spammers created millions of web pages with millions of links all pointing to each other, deceiving the PageRank algorithm. Because of spam and manipulation, Google quickly replaced PageRank with the much more resilient TrustRank.

TrustRank only counts links from reliable and trustworthy web pages and mostly ignores links from unknown or untrusted sources. It works by starting from a hand-picked set of known trusted pages and propagating trust outward along links to the pages they point to. TrustRank made manipulating Google's search ranking much less effective and much more expensive for scammers.
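
One common way to formalize this is as a biased version of PageRank, where the random-jump step only lands on the hand-picked trusted seeds, so trust flows outward from them. A rough sketch, with an invented graph and seed set:

    # Rough TrustRank-style sketch: same propagation as PageRank, but the
    # "teleport" mass goes only to hand-picked trusted seeds.
    # The graph and seed set are invented for illustration.
    links = {
        "news": ["blog"],
        "blog": ["news"],
        "spam1": ["spam2", "news"],
        "spam2": ["spam1", "news"],   # spam pages link to each other
    }
    trusted_seeds = {"news"}
    damping = 0.85

    def seed_weight(page):
        # teleport mass only goes to hand-picked trusted seeds
        return 1.0 / len(trusted_seeds) if page in trusted_seeds else 0.0

    trust = {p: seed_weight(p) for p in links}

    for _ in range(50):
        new_trust = {p: (1 - damping) * seed_weight(p) for p in links}
        for page, outlinks in links.items():
            for target in outlinks:
                new_trust[target] += damping * trust[page] / len(outlinks)
        trust = new_trust

    # The spam pages never gain trust, no matter how many links they
    # create, because no trusted page links to them.
    print(sorted(trust.items(), key=lambda kv: -kv[1]))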

TrustRank also works for social media. Start by identifying thousands of accounts known to be reliable, meaning real people posting useful information, and thousands known to be unreliable, meaning spammers and scammers. Then look at the accounts those accounts follow, like, reshare, or otherwise engage with. Those nearby accounts get a bit of the goodness or badness, which spreads through the engagement network. Repeat this over and over, letting reliability and unreliability spread across all accounts, and you can estimate how reliable most accounts are even if they are anonymous.
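
Here is a toy sketch of that repeated spreading, with a made-up engagement graph and hand-labeled seed accounts. Real systems work over far larger graphs with weighted edges, but the mechanism is the same:

    # Toy reliability propagation over an engagement graph.
    # Seed labels (+1 reliable, -1 unreliable) and the graph itself are
    # invented; real systems use millions of accounts and weighted edges.
    engages_with = {
        "reporter": ["reader1", "reader2"],
        "reader1": ["reporter", "reader2"],
        "reader2": ["reporter", "reader1", "mystery"],
        "mystery": ["reader2"],              # anonymous, no seed label
        "spammer": ["bot1", "bot2"],
        "bot1": ["spammer", "bot2"],
        "bot2": ["spammer", "bot1"],
    }
    seeds = {"reporter": 1.0, "spammer": -1.0}

    score = {acct: seeds.get(acct, 0.0) for acct in engages_with}

    for _ in range(20):
        new_score = {}
        for acct, neighbors in engages_with.items():
            if acct in seeds:
                new_score[acct] = seeds[acct]   # known accounts keep their label
            else:
                # everyone else drifts toward the average of who they engage with
                new_score[acct] = sum(score[n] for n in neighbors) / len(neighbors)
        score = new_score

    # "mystery" ends up with a positive score and the bots with negative
    # ones, even though none of them were labeled by hand.
    print({acct: round(s, 2) for acct, s in score.items()})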

If you boost reliable accounts and mostly ignore unknown and unreliable accounts, fake accounts become less influential, and it becomes much less cost-effective for bad actors to create influential fake accounts.

Companies that fixed their wisdom of the crowd algorithms also do not optimize them for engagement. Optimizing for engagement causes wisdom of the crowd algorithms to promote scams, spam, and misinformation. Lies get clicks.

It’s a lot of work to not optimize for engagement. Companies like Netflix, Google, YouTube, and Spotify put in considerable effort to run long experiments, often measuring people over months or even years. They then develop short-term proxy metrics that predict long-term satisfaction and retention. One example is satisfied clicks: clicks where people are not immediately repelled but spend time with the content they see, which discounts clicks on scams and other low-quality content. These companies put in all this effort to develop good metrics because they know that optimizing algorithms for engagement eventually will hurt the company.
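
As a crude illustration, here is what a satisfied-click proxy might look like over a click log with dwell times. The 30-second cutoff and the log format are arbitrary assumptions; the metrics these companies actually use are far more sophisticated and not public:

    # Crude "satisfied click" proxy versus raw click count.
    # The log, the fields, and the 30-second dwell threshold are all
    # assumptions for illustration only.
    clicks = [
        {"item": "news_story", "dwell_seconds": 210},
        {"item": "clickbait", "dwell_seconds": 4},    # bounced right back
        {"item": "tutorial", "dwell_seconds": 95},
        {"item": "scam_offer", "dwell_seconds": 2},
    ]

    SATISFIED_DWELL = 30  # seconds; arbitrary cutoff for this sketch

    satisfied = [c for c in clicks if c["dwell_seconds"] >= SATISFIED_DWELL]

    print("raw clicks:", len(clicks))         # 4 -- engagement looks great
    print("satisfied clicks:", len(satisfied))  # 2 -- half were regretted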

Algorithms can be fixed if the executives leading the companies decide to fix them. Some companies have successfully prevented bad actors from manipulating wisdom of the crowds. The surprise: companies make much more money over the long run if they don't optimize algorithms for clicks.
