Thursday, October 19, 2023

Book excerpt: The problem is fake crowds

(This is an excerpt from my book. Please let me know if you like it and want more.)

It is usually unintentional. Companies don’t intend for their websites to fill with spam. Companies don’t intend for their algorithms to amplify propagandists, shills, and scammers.

It can happen just from overlooking the problem and letting it build up over time. Bad actors come in, the problem grows, and eventually it becomes difficult and costly to stop.

For the bad guys, the incentives are huge. Get your post trending, and a lot of people will see it. If your product is the first thing people see when they search, you will get a lot of sales. When algorithms recommend your content, many more people see it. It’s like free advertising.

Adversaries will attack algorithms. They will pay people to offer positive reviews. They will create fake crowds consisting of hundreds of fake accounts, all together liking and sharing their brilliant posts, all together saying how great they are. If wisdom of the crowds algorithms treat these fake crowds as real, the recommendations will be shilled, spammy, and scammy.

Allow the bad guys to create fake crowds and the algorithms will make terrible recommendations. Algorithms try to help people find what they need. They try to show just the right thing to customers at just the right time. But fake crowds make that impossible. Facebook suffers from this problem. An internal study looked at why the company couldn’t retain young adults. Young people consistently described Facebook as “boring, misleading, and negative” and complained that “they often have to get past irrelevant content to get to what matters.”

Customers won’t stick around if what they see is mostly useless scams. Facebook’s business has stalled because of problems with growth and retention, especially among young adults. Twitter's audience and revenue have cratered.

Bad, manipulated, shilled data means bad recommendations. People won’t like what they are seeing, and they won’t stay around.

Kate Conger wrote in the New York Times about why tech companies sometimes underestimate how bad problems with spam, misinformation, propaganda, and scams will get if neglected. In the early years of Twitter, “they believed that any reprehensible content would be countered or drowned out by other users.” Jason Goldman, who was very early at Twitter, described “a certain amount of idealistic zeal” that they all had, a belief that the crowds would filter out bad content and regulate discussion in the town square.

It wasn’t long before adversaries took advantage of their naiveté: “In September 2016, a Russian troll farm quietly created 2,700 fake Twitter profiles” which they used to shill and promote whatever content they liked, including attempting to manipulate the upcoming US presidential election.

On Facebook, “One Russian-run Facebook page, Heart of Texas, attracted hundreds of thousands of followers by cultivating a narrow, aggrieved identity,” Max Fisher wrote in The Chaos Machine. “‘Like if you agree,’ captioned a viral map with all other states marked ‘awful’ or ‘boring,’ alongside text urging secession from the morally impure union. Some posts presented Texas identity as under siege (‘Like & share if you agree that Texas is a Christian state’).”

Twitter was founded on lofty ideas about the power of wisdom of the crowds to fix problems. But the founders were naive about how bad things could get when bad actors create fake accounts and control many accounts at once. By pretending to be many people, adversaries could effectively vote many times, manufacturing the appearance of a groundswell of support and popularity for anything they liked. Twitter’s algorithms would then dutifully pick up the shilled content as trending or popular and amplify it further.

Twitter later “rolled out new policies that were intended to prevent the spread of misinformation,” started taking action against at least some of the bot networks and controlled accounts, and even “banned all forms of political advertising.” That early idealism that “the tweets must flow” and that wisdom of the crowds would take care of all problems was crushed under a flood of manipulated fake accounts.

Bad actors manipulate wisdom of the crowds because it is lucrative to do so. For state actors, propaganda on social media is cheaper than ever. Creating fake crowds feigns popularity for their propaganda, confuses the truth in a flood of claims and counterclaims, and silences opposition. For scammers, wisdom of the crowds algorithms are like free advertising. Just by creating a few hundred fake accounts or by paying others to help shill, they can wrap scams or outright fraud in a veneer of faked reliability and usefulness.

“Successfully gaming the algorithm can make the difference between reaching an audience of millions – or shouting into the wind,” wrote Julia Carrie Wong in the Guardian. Successfully manipulating wisdom of the crowds data tricks trending and recommender algorithms into amplifying the manipulated content. Getting into trending, into the top search results, or into recommendations by using fake and controlled accounts can be a lot cheaper and more effective than buying advertising.

“In addition to distorting the public’s perception of how popular a piece of content is,” Wong wrote, “fake engagement can influence how that content performs in the all-important news feed algorithm.” With fake accounts, bad actors can fake likes and shares, creating fake engagement and fake popularity, and fooling the algorithms into amplifying it. “It is a kind of counterfeit currency in Facebook’s attention marketplace.”

“Fake engagement refers to things such as likes, shares, and comments that have been bought or otherwise inauthentically generated on the platform,” Karen Hao wrote in MIT Technology Review. It’s easy to do. “Fake likes and shares [are] produced by automated bots and used to drive up someone’s popularity.”

“Automation, scalability, and anonymity are hallmarks of computational propaganda,” wrote University of Oxford Professor Philip Howard in his recent book Lie Machines. “Programmers who set up vast networks” of shills and bots “have a disproportionate share of the public conversation because of the fake user accounts they control.” For example, “dozens of fake accounts all posing as engaged citizens, down-voting unsympathetic points of view and steering a conversation in the service of some ideological agenda—a key activity in what has come to be known as political astroturfing. Ordinary people who log onto these forums may believe that they are receiving a legitimate signal of public opinion on a topic when they are in effect being fed a narrative by a secret marketing campaign.” Fake crowds create a fake “impression that there is public consensus.” And by manipulating wisdom of the crowds algorithms, adversaries “control the most valuable resource possible … our attention.”

The most important part is at the beginning. Let’s say there is a new post full of misinformation. No one has seen it yet. What it needs is to look popular. What it needs is a lot of clicks, likes, and shares. If you control a few hundred accounts, all you need to do is have them all engage with your new post around the same time. And wow! Suddenly you look popular!

Real people join in later. It is true that real people share misinformation and spread it further. But the critical part is at the start. Fake crowds make something new look popular. It isn’t real. It’s not real people liking and sharing the misinformation. But it works. The algorithms see all the likes and shares. The algorithms think the post is popular. The algorithms amplify the misinformation. Once the algorithms amplify it, a lot of real people see the shilled post, and some of that later engagement is authentic. But what matters most is how everything got started: shilling using fake crowds.
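To make the mechanism concrete, here is a minimal sketch of a naive trending score that simply counts recent likes and shares. It is a toy illustration with made-up account names and numbers, not any platform's actual algorithm; the point is that a burst of engagement from a few hundred controlled accounts looks exactly like genuine popularity to a score that only counts totals.

```python
def naive_trending_score(engagements, window_start, window_end):
    """A toy stand-in for a trending signal: count likes and shares in a
    recent time window. It sees totals, not who produced them or why."""
    return sum(1 for account_id, minute in engagements
               if window_start <= minute <= window_end)

# A brand-new post with almost no real audience yet...
real_engagements = [("real_user_1", 10), ("real_user_2", 45)]

# ...plus 300 controlled accounts all "liking" it within the same hour.
fake_engagements = [(f"sock_puppet_{i}", 30) for i in range(300)]

score = naive_trending_score(real_engagements + fake_engagements, 0, 60)
print(score)  # 302 -- to this score, indistinguishable from a genuinely popular post
```

A real ranking system has many more signals to draw on, such as account age, posting history, and how correlated the engagement is across accounts. But the underlying problem is the same: if the score treats every like and share as an independent vote, fake engagement counts just as much as the real thing.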

When adversaries shill wisdom of the crowds algorithms, they replace the genuinely popular with whatever they like. This makes the experience worse and eventually hurts growth, retention, and corporate profits. These long-term costs are subtle enough that tech companies often miss them until they become large.

Ranking algorithms use wisdom of the crowds to determine what is popular and interesting. Wisdom of the crowds requires independent opinions. You don't have independent opinions when there is coordinated shilling by adversaries, scammers, and propagandists. Faked crowds make trending, search, and recommendation algorithms useless. To be useful, the algorithms have to use what real people actually like.
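A tiny simulation shows why independence matters. The numbers here are hypothetical: a thousand independent ratings of a mediocre item average out to roughly what the crowd really thinks, but a few hundred coordinated accounts all voting the same way drag the average toward whatever the bloc wants.

```python
import random

random.seed(0)

def average_rating(ratings):
    return sum(ratings) / len(ratings)

# 1,000 independent opinions of a mediocre item (true quality around 2.5 stars).
honest = [random.gauss(2.5, 1.0) for _ in range(1000)]

# 300 coordinated accounts all rating it 5 stars.
coordinated = [5.0] * 300

print(round(average_rating(honest), 2))                # roughly 2.5: the crowd's real opinion
print(round(average_rating(honest + coordinated), 2))  # roughly 3.1: the bloc drags it up
```

The independent errors cancel out; the coordinated bloc's do not. That is why wisdom of the crowds only works when the opinions being aggregated are genuinely independent.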
