Geeking with Greg, by Greg Linden
<p />
<b>My book, Algorithms and Misinformation</b> (January 12, 2024)
<p />
Misinformation and disinformation are the biggest problems on the internet.
<p />
To solve a problem, you need to understand the problem. In <i>Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It</i>, I claim that the problem is not that misinformation exists, but that so many people see it. I explain why algorithms amplify scams and propaganda, show how it can easily happen unintentionally, and offer solutions.
<p />
You can read much of the book for free. If you want a single article summary, this overview describes the entire book:
<ul>
<li><a href="https://glinden.blogspot.com/2023/10/book-excerpt-overview-from-book-proposal.html">Overview from the book proposal</a></li>
</ul>
If you want the equivalent of skimming the book, these give a bit more:
<ul>
<li><a href="https://glinden.blogspot.com/2023/10/a-summary-of-my-book.html">Summary</a></li>
<li><a href="https://glinden.blogspot.com/2023/10/book-excerpt-overview-from-book-proposal.html">Overview from the book proposal</a></li>
<li><a href="https://glinden.blogspot.com/2023/11/book-excerpt-table-of-contents.html">Table of Contents</a></li>
<li><a href="https://glinden.blogspot.com/2024/01/book-excerpt-conclusion.html">Conclusion</a></li>
</ul>
If you want much of what you would get from reading the entire book, read all the excerpts:
<ul>
<li><a href="https://glinden.blogspot.com/2023/10/a-summary-of-my-book.html">Summary</a></li>
<li><a href="https://glinden.blogspot.com/2023/10/book-excerpt-overview-from-book-proposal.html">Overview from the book proposal</a></li>
<li><a href="https://glinden.blogspot.com/2023/11/book-excerpt-table-of-contents.html">Table of Contents</a></li>
<li><a href="https://glinden.blogspot.com/2023/12/book-excerpt-first-pages-of-book.html">First pages of the book</a></li>
<li><a href="https://glinden.blogspot.com/2023/10/book-excerpt-problem-is-not-algorithm.html">The problem is not the algorithm</a></li>
<li><a href="https://glinden.blogspot.com/2023/12/book-excerpt-rise-and-fall-of-wisdom-of.html">The rise and fall of wisdom of the crowds</a></li>
<li><a href="https://glinden.blogspot.com/2023/11/book-excerpt-how-companies-build.html">How companies build algorithms using experimentation</a></li>
<li><a href="https://glinden.blogspot.com/2023/10/book-excerpt-metrics-chasing-engagement.html">Metrics chasing engagement</a></li>
<li><a href="https://glinden.blogspot.com/2023/12/book-excerpt-bonuses-and-promotions.html">Bonuses and promotions causing bad incentives</a></li>
<li><a href="https://glinden.blogspot.com/2023/10/book-excerpt-irresistible-lure-of.html">The irresistible lure of an unlocked house</a></li>
<li><a href="https://glinden.blogspot.com/2023/12/book-excerpt-manipulating-likes.html">Manipulating likes, comments, shares, and follows</a></li>
<li><a href="https://glinden.blogspot.com/2023/12/book-excerpt-manipulating-customer.html">Manipulating customer reviews</a></li>
<li><a href="https://glinden.blogspot.com/2023/12/extended-book-excerpt-computational.html">Computational propaganda</a></li>
<li><a href="https://glinden.blogspot.com/2023/11/book-excerpt-how-some-companies-get-it.html">How some companies get it right</a></li>
<li><a href="https://glinden.blogspot.com/2024/01/book-excerpt-data-and-metrics-determine.html">Data and metrics determine what algorithms do</a></li>
<li><a href="https://glinden.blogspot.com/2023/11/book-excerpt-problem-is-bad-incentives.html">The problem is bad incentives</a></li>
<li><a href="https://glinden.blogspot.com/2023/11/book-excerpt-people-determine-what.html">People determine what the algorithms do</a></li>
<li><a href="https://glinden.blogspot.com/2023/10/book-excerpt-problem-is-fake-crowds.html">The problem is fake crowds</a></li>
<li><a href="https://glinden.blogspot.com/2023/12/book-excerpt-wisdom-of-trustworthy.html">Wisdom of the trustworthy</a></li>
<li><a href="https://glinden.blogspot.com/2023/10/book-excerpt-mark-as-spam-long-fight-to.html">Mark as spam, the long fight to keep emails and texts useful</a></li>
<li><a href="https://glinden.blogspot.com/2024/01/book-excerpt-use-only-trustworthy.html">Use only trustworthy behavior data</a></li>
<li><a href="https://glinden.blogspot.com/2024/01/book-excerpt-win-win-win-for-customers.html">A win-win-win for customers, companies, and society</a></li>
<li><a href="https://glinden.blogspot.com/2024/01/book-excerpt-from-hope-to-despair-and.html">From hope to despair and back to hope</a></li>
<li><a href="https://glinden.blogspot.com/2024/01/book-excerpt-conclusion.html">Conclusion</a></li>
</ul>
I wanted this book to be part of the debate over how to solve misinformation and disinformation on the internet, offering practical solutions to what has become one of the biggest problems of our time.
<p />
I wrote, developed, and edited this book over four years. It was under contract with two agents for a year but will not be published. The full manuscript had many more examples, interviews, and stories, but the excerpts above give some of what you would have gotten from the book.
<p />
Some might want to jump straight to ideas for solutions. I think solutions depend on who you are.
<p />
For those inside tech companies, this book shows how other companies have fixed the same problem and made more revenue. Because executives can easily and unintentionally cause search and recommendations to amplify scams, everyone should question what the algorithms are optimized for and make sure they point toward the long-term growth of the company.
<p />
For the average person, the book shows that companies actually make more money when they don't allow their algorithms to promote scams. That gives hope: complaining about scammy products, and abandoning them, can change the internet we use every day.
<p />
For policy makers, because it's hard to regulate AI but easy to regulate what they already know how to regulate, this book claims they should target the scammy advertising that funds misinformation, increase fines for promoting fraud, and ramp up antitrust efforts (to make it easier for consumers to switch to alternatives and to further raise long-term costs on companies that enshittify their products).
<p />
Why these are the solutions requires exploring the problem. Most of the book is about how companies build their algorithms -- optimizing them over time -- and how that can accidentally amplify misinformation. To solve the problem, focus not on the fact that misinformation exists, but on how many people see it. If the goal is to reduce misinformation to nuisance levels, we can fix it.
<p />
Through stories, examples, and research, this book shows why so many people see misinformation and disinformation, how that is often unintentional, and why it doesn't maximize revenue for companies. Understanding why we see so much misinformation is the key to practical solutions.
<p />
I hope others find this useful. If you do, please let me know.
<b>Book excerpt: Conclusion</b> (January 10, 2024)
<i>(This is one version of the conclusion from my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Wisdom of crowds is the idea that summarizing the opinions of many people is often very useful. Computers can do this too. Wisdom of the crowd algorithms operating at massive scale pick everything we see when we use the internet.
<p />
Computer algorithms look at people's actions as if they were votes for what is interesting and important. Search and recommendations on your favorite websites combine what millions of people do to help you find what you need.
<p />
In recent years, something has gone terribly wrong. Wisdom of the crowds has failed us.
<p />
Misinformation and scammers are everywhere on the internet. You cannot buy something from Amazon, see what friends are doing on Facebook, or try to read news online without encountering fraudsters and propagandists.
<p />
This is the story of what happened and how to fix it, told by the insiders who built the internet we have today.
<p />
Throughout the last thirty years of the internet, we fought fraudsters and scammers trying to manipulate what people see. We fought scammers when we built web search. We fought spammers trying to get into our email inboxes. We fought shills when we built algorithms recommending what to buy.
<p />
Seeing these hard battles through the lens of insiders reveals an otherwise hidden insight: how companies optimize their algorithms is what amplifies misinformation and causes the problems we have today.
<p />
The problem is not the algorithm. The problem is how algorithms are tuned and optimized.
<p />
Algorithms will eventually show whatever the team is rewarded for making the algorithms show. When algorithms are optimized badly, they can do a lot of harm. Through the metrics and incentives they set up, teams and executives control these algorithms and how they are optimized over time.
<p />
We have control. People control the algorithms. We should make sure these algorithms built by people work well for people.
<p />
Wisdom of the crowd algorithms such as recommender systems, trending, and search rankers are everywhere on the internet. Because they control what billions see every day, these algorithms are enormously valuable.
<p />
The algorithms are supposed to work by sharing what people find interesting with other people who have not seen it yet. They can enhance discovery and help people discover things they would not have found on their own, but only if they are optimized and tuned properly.
<p />
Short-term measures like clicks are bad metrics for algorithms. These metrics encourage scams, sensationalistic content, and misinformation. Amplifying fraudsters and propagandists creates a terrible experience for customers and eventually hurts the company.
<p />
Wisdom of the crowds doesn’t work when the crowds are fake. Wisdom of the crowds will amplify scams and propaganda if a few people can shout down everyone else with their hordes of bots and shills. Wisdom of the crowds requires information from real, independent people.
<p />
If executives tell teams to optimize for clicks, it can be hard to remove fake accounts, shills, and sockpuppets. Click metrics will be higher when bad actors shill because fake crowds faking popularity looks like a lot of new fake accounts creating a lot of new fake engagement. But none of it is real, and none of it helps the company or its customers in the long run.
<p />
Part of the solution is only using reliable accounts for wisdom of crowds. Wisdom of the trustworthy makes it much harder for bad actors to create fake crowds and feign popularity. Wisdom of the trustworthy means algorithms only use provably human and reliable accounts as input to the algorithms. To deter fraudsters from creating lots of fake accounts, trust must be hard to gain and easy to lose.
<p />
Part of the solution is to recognize that most metrics are flawed proxies for what you really want. What companies really want is satisfied customers that stay with you for a long time. Always question whether your metrics are pointing you at the right target. Always question if your wisdom of the crowd algorithms are usefully helping customers.
<p />
It's important to view optimizing algorithms as investing in the long-term. Inside tech companies, to measure the success of those investments and the long-term success of the company, teams should run long experiments to learn more about long-term harm and costs. Develop metrics that approximate long-term retention and growth. Everyone on teams should constantly question metrics and frequently change goal metrics to improve them.
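As a concrete illustration of a metric that approximates long-term retention, a team might compute the fraction of users in each experiment arm still active at some horizon. This is a minimal sketch under assumed inputs; the data layout, function name, and 90-day horizon are all illustrative, not taken from any company's practice:

```python
# Illustrative long-term proxy metric for an A/B experiment arm.
# active_days_by_user maps a user id to the days (since experiment
# start) on which that user was active.

def retention_rate(active_days_by_user, horizon_day=90):
    """Fraction of users still active at or after the horizon day:
    a rough proxy for long-term retention, as opposed to short-term
    metrics like clicks."""
    users = list(active_days_by_user)
    if not users:
        return 0.0
    retained = sum(
        1 for u in users
        if max(active_days_by_user[u], default=0) >= horizon_day
    )
    return retained / len(users)
```

Comparing this number between a treatment and control arm, over months rather than days, is the kind of long experiment the paragraph above describes.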
<p />
As Google, Netflix, YouTube, and Spotify have discovered, companies make more money if they invest in good algorithms that don't chase clicks.
<p />
Even so, some companies may need encouragement to focus on the long-term, especially if their market power means customers have nowhere else to go.
<p />
Consumer groups and policy makers can help by pushing for more regulation of the advertising that funds scams, antitrust enforcement to maintain competition and offer alternatives to consumers who are fed up with enshittified products, and the real threat of substantial and painful fines for failing to minimize scams and fraud.
<p />
We can have the internet we want. We can protect ourselves from financial scams, consumer fraud, and political propaganda. We can fix misinformation on the internet.
<p />
With a deeper understanding of why wisdom of the crowd algorithms can amplify misinformation and cause harm, we can fix seemingly ungovernable algorithms.
<b>Book excerpt: A win-win-win for customers, companies, and society</b> (January 8, 2024)
<p />
<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")</i>
<p />
Everyone wins -- companies, consumers, and society -- if companies fix their algorithms to stop amplifying scams and misinformation.
<p />
Executives are often tempted to reward their teams for simpler success metrics like engagement. But companies make more money if they focus on long-term customer satisfaction and retention.
<p />
YouTube had a problem. They asked customers, “What’s the biggest problem with your homepage today?” The answer came back: “The #1 issue was that viewers were getting too many already watched videos on their homepage.” In our interview, YouTube Director Todd Beaupré discussed how YouTube made more money by optimizing their algorithms for diversity, customer retention, and long-term customer satisfaction.
<p />
YouTube ran experiments. They found that reducing already watched recommendations reduced how many videos people watched from their home page. Beaupré said, “What was surprising, however, was that viewers were watching more videos on YouTube overall. Not only were they finding another video to enjoy to replace the lost engagement from the already watched recommendations on the homepage, they found additional videos to watch as well. There were learning effects too. As the experiment ran for several months, the gains increased.”
<p />
Optimizing not for accuracy but for discovery turned out to be one of YouTube’s biggest wins. Beaupré said, “Not only did we launch this change, but we launched several more variants that reduced already watched recommendations that combined to be the most impactful launch series related to growing engagement and satisfaction that year.”
<p />
Spotify researchers found the same thing, that optimizing for engagement right now misses a chance to show something that will increase customer engagement in the future. They <a href="https://research.atspotify.com/publications/algorithmic-balancing-of-familiarity-similarity-discovery-in-music-recommendations/">said</a>, “Good discoveries often lead to downstream listens from the user. Driving discovery can help reduce staleness of recommendations, leading to greater user satisfaction and engagement, thereby resulting in increased user retention. Blindly optimizing for familiarity results in potential long term harms.” In the short-term, showing obvious and familiar things might get a click. In the long-term, helping customers discover new things leads to greater satisfaction and better retention.
<p />
Companies that don't optimize for engagement make more money. In a paper “<a href="https://research.google/pubs/focus-on-the-long-term-its-better-for-users-and-business/">Focus on the Long-Term: It’s Better for Users and Business</a>,” Googlers wrote that “optimizing based on short-term revenue is the obvious and easy thing to do, but may be detrimental in the long-term if user experience is negatively impacted.” What can look like a loss in short-term revenue can actually be a gain in long-term revenue.
<p />
Google researchers found it was very important to measure long-term revenue because optimizing for engagement ignores that too many ads will make people ignore your ads or stop coming entirely. Google said that investing in cutting ads in half improved customer satisfaction and resulted in a net positive change in ad revenue, but they could only see the gain when they measured over long periods of time.
<p />
Netflix uses very long experiments to keep their algorithms targeting long-term revenue. From the paper "<a href="https://dl.acm.org/doi/10.1145/2843948">Netflix Recommender System</a>": “We ... let the members in each [experimental group] interact with the product over a period of months, typically 2 to 6 months ...The time scale of our A/B tests might seem long, especially compared to those used by many other companies to optimize metrics, such as click-through rates ... We build algorithms toward the goal of maximizing medium-term engagement with Netflix and member retention rates ... If we create a more compelling service by offering personalized recommendations, we induce members who were on the fence to stay longer, and improve retention.”
<p />
Netflix's goal is keeping customers using the product. If customers stay, they keep generating revenue, which maximizes long-term business value. “Over years of development of personalization and recommendations, we have reduced churn by several percentage points. Reduction of monthly churn both increases the lifetime of an existing subscriber and reduces the number of new subscribers we need to acquire.”
<p />
Google revealed how they made more money when they did not optimize for engagement. Netflix revealed they focus on keeping people watching Netflix for many years, including their unusually lengthy experiments that sometimes last over a year, because that makes them more money. Spotify researchers revealed how they keep people subscribing longer when they suggest less obvious, more diverse, and more useful recommendations, making them more money. YouTube, after initially optimizing for engagement, switched to optimizing for keeping people coming back to YouTube over years, finding that is what made them the most money in the long run.
<p />
Scam-filled, engagement-hungry, or manipulated algorithms make less money than helpful algorithms. Companies such as Google, YouTube, Netflix, Wikipedia, and Spotify offer lessons for companies such as Facebook, Twitter, and Amazon.
<p />
Some companies know that adversaries attack and shill their algorithms because the profit motive is so high from getting to the top of trending algorithms or recommendations. Some companies know that if they invest in eliminating spam, shilling, and manipulation, that investment will pay off in customer satisfaction and higher growth and revenue in the future. Some companies align the interests of their customers and the company by optimizing algorithms for long-term customer satisfaction, retention, and growth.
<p />
Wisdom of the crowds failed the internet. Then the algorithms that depend on wisdom of the crowds amplified misinformation across the internet. Some companies have already shown the way to fix the problem. If all of us borrow lessons from those that already have solutions, we can solve the problem of algorithms amplifying misinformation. All companies can fix their algorithms, and they will make more money if they do.
<p />
Many executives are unaware of the harms of optimizing for engagement. Many do not realize when they are hurting the long-term success of the company.
<p />
This book has recommendations for regulators and policy makers, focusing their work on incentives including executive compensation and the advertising that funds misinformation and scams. This book provides examples to teams inside companies of why they should not optimize for engagement and what companies do instead. And this book provides evidence consumers can use to advocate for companies better helping their customers while also increasing profits for the company.
<b>Book excerpt: Use only trustworthy behavior data</b> (January 7, 2024)
<p />
<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")</i>
<p />
Adversaries manipulate wisdom of crowds algorithms by controlling a crowd of accounts.
<p />
Their controlled accounts can then coordinate to shill whatever they like, shout down opposing views, and create an overwhelming flood of propaganda that makes it hard for real people to find real information in the sea of noise.
<p />
The Aspen Institute Commission, in a report titled <a href="https://www.aspeninstitute.org/publications/commission-on-information-disorder-final-report/">Commission on Information Disorder</a>, suggests the problem is often confined to a surprisingly small number of accounts, amplified by coordinated activity from other controlled accounts.
<p />
They describe how it works: “Research reveals that a small number of people and/or organizations are responsible for a vast proportion of misinformation (aka ‘superspreaders’) ... deploying bots to promote their content ... Some of the most virulent propagators of falsehood are those with the highest profile [who are often] held to a lower standard of accountability than others ... Many of these merchants of doubt care less about whether they lie, than whether they successfully persuade, either with twisted facts or outright lies.”
<p />
The authors of this report offer a solution. They suggest that these manipulative accounts should not be amplified by algorithms, making the spreading of misinformation much more costly and much more difficult to do efficiently.
<p />
Specifically, they argue social media companies and government regulators should “hold superspreaders of mis- and disinformation to account with clear, transparent, and consistently applied policies that enable quicker, more decisive actions and penalties, commensurate with their impacts — regardless of location, or political views, or role in society.”
<p />
Because just a few accounts, supported by substantial networks of controlled shill accounts, are the problem, they add that social media should focus “on highly visible accounts that repeatedly spread harmful misinformation that can lead to significant harms.”
<p />
Problems with adversaries manipulating, shilling, and spamming have a long history. One way to figure out how to solve the problem is to look at how others mitigated these issues in the past.
<p />
Particularly helpful are the solutions for web spam. As described in the research paper "<a href="http://i.stanford.edu/~kvijay/krishnan-raj-airweb06.pdf">Web Spam Detection with Anti-Trust Rank</a>", web spam is “artificially making a webpage appear in the top results to various queries on a search engine.” The web spam problem is essentially the same problem faced by social media rankers and recommenders. Spammers manipulate the data that ranking and recommender algorithms use to determine what content to surface and amplify.
<p />
The researchers described how bad actors create web spam: “A very common example ... [is] creating link farms, where webpages mutually reinforce each other ... [This] link spamming also includes ... putting links from accessible pages to the spam page, such as posting web links on publicly accessible blogs.”
<p />
These are essentially the same techniques adversaries use on social media: controlled accounts and bots post, reshare, and like content, reinforcing how popular it appears.
<p />
To fix misinformation on social media, learn from what has worked elsewhere. <a href="https://en.wikipedia.org/wiki/TrustRank">TrustRank</a> is a popular and widely used technique in web search engines to reduce the efficiency, effectiveness, and prevalence of web spam. It “effectively removes most of the spam” without negatively impacting non-spam content.
<p />
How does it work? “By exploiting the intuition that good pages -- i.e. those of high quality -- are very unlikely to point to spam pages or pages of low quality.”
<p />
The idea behind TrustRank is to start from trustworthy accounts and treat the actions of those accounts as likely trustworthy as well. Trusted accounts link to, like, share, and post information that is trustworthy. Everything they endorse becomes mostly trusted too, and the process repeats. In this way, trust gradually propagates out from a seed of known reliable accounts to others.
<p />
As the "<a href="https://dl.acm.org/doi/10.5555/1316689.1316740">Combating Web Spam with TrustRank</a>" researchers put it, “We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good ... The algorithm identifies other pages that are likely to be good based on their connectivity with the good seed pages.”
<p />
TrustRank works for web spam in web search engines. “We can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.” Later work suggested that adding <a href="http://i.stanford.edu/~kvijay/krishnan-raj-airweb06.pdf">Anti-Trust Rank</a> has benefits as well; it works by taking a set of known untrustworthy accounts with a history of spamming, shilling, and manipulating ranker algorithms, then assuming that everything they have touched is also likely to be untrustworthy.
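The propagation described above can be sketched in a few lines. This is an illustrative toy version under stated assumptions, not the algorithm from the papers: the link graph, node names, damping value, and iteration count are all made up, and a real implementation works over billions of pages:

```python
# Toy TrustRank-style propagation: trust flows out along links from a
# small hand-vetted seed set, so isolated spam farms accumulate none.

def trust_rank(graph, seeds, damping=0.85, iterations=50):
    """graph: dict mapping each node to the list of nodes it links to
    (i.e., endorses). seeds: set of manually vetted trustworthy nodes."""
    nodes = list(graph)
    # All restart mass is concentrated on the vetted seed set.
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(seed_mass)
    for _ in range(iterations):
        # Each node keeps a share of its seed mass...
        new = {n: (1 - damping) * seed_mass[n] for n in nodes}
        # ...and passes the rest of its current trust along its links.
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * trust[n] / len(out)
                for m in out:
                    if m in new:
                        new[m] += share
        trust = new
    return trust
```

On a toy graph where a seed page links to good pages and two spam pages only link to each other, the spam pages end with zero trust no matter how densely they interlink, which is the point of starting from seeds rather than from raw popularity.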
<p />
In social media, much of the problem is not that bad content exists, but that bad content is amplified by algorithms. Specifically, rankers and recommenders on social media look at likes, shares, and posts, conclude that shilled content is popular, and then share the shilled content with others.
<p />
The way this works, both for web search and for social media, is that wisdom of the crowd algorithms including rankers and recommenders count votes. A link, like, click, purchase, rating, or share is a vote that a piece of content is useful, interesting, or good. What is popular or trending is what gets the most votes.
<p />
Counting votes in this way easily can be manipulated by people who create or use many controlled accounts. Bad actors vote many times, effectively stuffing the ballot box, to get what they want on top.
<p />
If wisdom of crowds only uses trustworthy data from trustworthy accounts, shilling, spamming, and manipulation becomes much more difficult.
<p />
Only accounts known to be trustworthy should matter for what is considered popular. Known untrustworthy accounts with a history of being involved in propaganda and shilling should have their content hidden or ignored. And unknown accounts, such as brand new accounts or accounts that have no connection to trustworthy accounts, also should be ignored as potentially harmful and not worth the risk of including.
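The filtering described above amounts to counting only trusted votes. A minimal sketch, assuming each account carries a trust label; the statuses, account ids, and data shapes are illustrative, not any platform's actual API:

```python
# Count engagement "votes" using only trustworthy accounts.
# Untrusted accounts and unknown accounts (e.g., brand new ones with
# no connection to trusted accounts) are ignored entirely.
from collections import Counter

def tally_votes(votes, account_status):
    """votes: iterable of (account_id, item_id) engagement events.
    account_status: dict mapping account_id to 'trusted', 'untrusted',
    or 'unknown'. Accounts absent from the dict default to 'unknown'."""
    tally = Counter()
    for account, item in votes:
        if account_status.get(account, "unknown") == "trusted":
            tally[item] += 1
    return tally
```

With this tally, a horde of new or known-bad accounts liking the same item contributes nothing to what looks popular, while a handful of trusted accounts still can.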
<p />
Wisdom of the trustworthy dramatically raises the costs for adversaries. No longer can a few dozen accounts, acting together, successfully shill content.
<p />
Now, only trustworthy accounts amplify. And because trust is hard to gain and easily lost, disinformation campaigns, propaganda, shilling, and spamming often become cost prohibitive for adversaries.
<p />
As Harvard fellow and security expert Bruce Schneier wrote in a piece for Foreign Policy titled “<a href="https://foreignpolicy.com/2019/08/12/8-ways-to-stay-ahead-of-influence-operations/">8 Ways to Stay Ahead of Influence Operations</a>,” the solution is to recognize these fake accounts acting together in a coordinated way to manipulate the algorithms, and then to keep their data from informing ranker and recommender algorithms.
<p />
Schneier wrote, “Social media companies need to detect and delete accounts belonging to propagandists as well as bots and groups run by those propagandists. Troll farms exhibit particular behaviors that the platforms need to be able to recognize.”
<p />
Shills and trolls are shilling and trolling. That is not normal human behavior.
<p />
Real humans don’t all act together, at the same time, to like and share some new content. Real humans cannot act many times per second or vote on content they have never seen. Real humans cannot all like and share content from a pundit as soon as it appears and then all do it again exactly in the same way for the next piece of content from that pundit.
<p />
When bad actors use controlled fake accounts to stuff the ballot box, the behavior is blatantly not normal.
<p />
There are a lot of accounts in social media today that are being used to manipulate the wisdom of the crowd algorithms. Their clicks, likes, and shares are bogus and should not be used by the algorithms.
<p />
Researchers in Finland studying the phenomenon back in 2021 <a href="https://arxiv.org/abs/2105.10671">wrote</a> that “5-10% of Twitter accounts are bots and responsible for the generation of 20-25% of all tweets.” The researchers describe these compromised accounts as “cyborgs” and write that they “have characteristics of both human-generated and bot-generated accounts."
<p />
These controlled accounts are unusually active, producing a far larger share of all tweets than their share of accounts. This was also a low estimate of the total number of manipulated accounts on social media, as it did not include compromised accounts, accounts paid to shill, or accounts paid to share their passwords so someone else can sometimes use them to shill.
<p />
Because bad actors using accounts to spam and shill must quickly act in concert to spam and shill, and often do so repeatedly with the same accounts, their behavior is not normal. Their unusually active and unusually timed actions can be detected.
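One simple heuristic for that kind of detection is to flag items that collect engagement from many distinct accounts within a suspiciously short window. This is an illustrative sketch, not the classifier from the paper cited below; the window size, threshold, and event format are assumptions:

```python
# Flag items whose engagement bursts look coordinated: many distinct
# accounts acting within a window too short for organic behavior.
from collections import defaultdict

def flag_coordinated(events, window=1.0, min_accounts=20):
    """events: iterable of (timestamp_seconds, account_id, item_id).
    Returns the set of item_ids that received engagement from at least
    min_accounts distinct accounts inside any window-second span."""
    by_item = defaultdict(list)
    for ts, account, item in events:
        by_item[item].append((ts, account))
    flagged = set()
    for item, acts in by_item.items():
        acts.sort()  # order each item's events by timestamp
        start = 0
        for end in range(len(acts)):
            # Slide the window forward past events older than `window`.
            while acts[end][0] - acts[start][0] > window:
                start += 1
            distinct = {a for _, a in acts[start:end + 1]}
            if len(distinct) >= min_accounts:
                flagged.add(item)
                break
    return flagged
```

A real system would combine many such signals, but even this crude version separates a bot swarm liking a post within a fraction of a second from organic engagement spread over hours.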
<p />
One detection tool <a href="https://ojs.aaai.org/index.php/AAAI/article/view/11268">published</a> by researchers at the American Association for Artificial Intelligence (AAAI) conference was a “classifier ... capturing the local and global variations of observed characteristics along the propagation path ... The proposed model detected fake news within 5 min of its spread with 92 percent accuracy for Weibo and 85 percent accuracy for Twitter.”
<p />
Professor Kate Starbird, who runs a research group studying disinformation at University of Washington, wrote how social media companies have taken exactly the wrong approach, exempting prominent accounts associated with misinformation, disinformation, and propaganda rather than subjecting them and their shills to skepticism and scrutiny. Starbird <a href="https://twitter.com/katestarbird/status/1460667299447246852">wrote</a>, “Research shows that a small number of accounts have outsized impact on the spread of harmful misinfo (e.g. around vaccines and false/misleading claims of voter fraud). Instead of whitelisting these prominent accounts, they should be held to higher levels of scrutiny and accountability.”
<p />
Researchers have explained the problem: platforms are willing to amplify anything that isn’t provably bad rather than amplifying only what is known to be trustworthy. In a piece titled <a href="https://yalereview.org/article/computational-propaganda">Computational Propaganda</a>, Stanford Internet Observatory researcher Renee DiResta wrote, “Our commitment to free speech has rendered us hesitant to take down disinformation and propaganda until it is conclusively and concretely identified as such beyond a reasonable doubt. That hesitation gives ... propagandists an opportunity.”
<p />
The hesitation is problematic, as it makes it easy to manipulate wisdom of crowds algorithms. “Incentive structures, design decisions, and technology have delivered a manipulatable system that is being gamed by propagandists,” DiResta said. “Social algorithms are designed to amplify what people are talking about, and popularity is ... easy to feign.”
<p />
Rather than starting from the assumption that every account is real, the algorithms should start with the assumption that every account is fake.
<p />
Only provably trustworthy accounts should be used by wisdom of the crowd algorithms such as trending, rankers, and recommenders. When considering what is popular, not only should fake accounts coordinating to shill be ignored, but also there should be considerable skepticism toward new accounts that have not been proven to be independent of the others.
<p />
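As an illustration, a trending calculation restricted to trusted accounts can be sketched in a few lines. The names here are hypothetical and real systems are far more elaborate, but the principle is just filtering the input data:

```python
from collections import Counter

def trending(shares, trusted_accounts, top_n=10):
    """Compute trending items counting only shares from provably
    trustworthy accounts. Shares from unknown, borderline, or
    known-bad accounts are ignored entirely."""
    counts = Counter(item for account, item in shares
                     if account in trusted_accounts)
    return [item for item, _ in counts.most_common(top_n)]
```

A hundred fake accounts shilling a scam contribute nothing here; only what real, independent, trusted people share can trend.
<p />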
With wisdom of crowds algorithms, rather than asking which accounts should be banned and excluded, consider the minimum number of trustworthy accounts needed to maintain the perceived quality of the recommendations. There is <a href="https://www.nuriaoliver.com/recsys/wisdomFew_sigir09.pdf">no reason</a> to use all the data when the biggest problem is shilled and untrustworthy data.
<p />
Companies are playing whack-a-mole with bad actors who just create new accounts or find new shills every time they’re whacked because it’s so profitable -- like free advertising -- to create fake crowds that manipulate the algorithms.
<p />
Propagandists and scammers are loving it and winning. It’s easy and lucrative for them.
<p />
Rather than classify accounts as spam, classify accounts as trustworthy. Only use trustworthy data as input to the algorithms, ignoring anything unknown or borderline as well as known spammers and shills.
<p />
Happily toss big data, anything suspicious at all. Do not be concerned about accidentally marking new or borderline accounts as shills when deciding what to input to the recommender algorithms; false positives do not matter if they do not reduce the perceived quality of the recommendations.
<p />
As with web spam and e-mail spam, the goal isn’t eliminating manipulation, coordination, disinformation, scams, and propaganda.
<p />
The goal is raising the costs on adversaries, ideally to the point where most of it is no longer cost-effective. If bad actors no longer find it easy and effective to try to manipulate recommender systems on social media, most will stop.
Greg Linden, http://www.blogger.com/profile/09216403000599463072
<p />
Book excerpt: Data and metrics determine what algorithms do (January 4, 2024)
<p />
<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Wisdom of the crowd algorithms, including rankers and recommenders, work from data about what people like and do. Teams inside tech companies gather user behavior data then tune and optimize algorithms to maximize measurable targets.
<p />
The data quality and team incentives control what the algorithms produce and how useful it is. When the behavior data or goal metrics are bad, the outcome will be bad. When the wisdom of the crowds data is trustworthy and when the algorithms are optimized for the long-term, algorithms like recommendations will be useful and helpful.
<p />
Queensland University of Technology Professor Rachel Thomas <a href="https://arxiv.org/abs/2002.08512">warned</a> that “unthinking pursuit of metric optimization can lead to real-world harms, including recommendation systems promoting radicalization .... The harms caused when metrics are overemphasized include manipulation, gaming, a focus on short-term outcomes to the detriment of longer-term values ... particularly when done in an environment designed to exploit people’s impulses and weaknesses."
<p />
The problem is that “metrics tend to overemphasize short-term concerns.” Thomas gave as an example the problems YouTube had before 2017 because, years earlier, it picked “watch time” (how long people spend watching videos) as a proxy metric for user satisfaction. An algorithm that tries to pick videos people will watch right now will tend to show anything to get a click, including risqué videos or lies that make people angry. So YouTube struggled with its algorithms amplifying sensationalistic videos and scams. These clickbait videos looked great on short-term metrics like watch time but repelled users in the long term.
<p />
“AI is very effective at optimizing metrics,” Thomas said. Unfortunately, if you pick the wrong metrics, AI will happily optimize for the wrong thing. “The unreasonable effectiveness of metric optimization in current AI approaches is a fundamental challenge to the field and yields an inherent contradiction: solely optimizing metrics leads to far from optimal outcomes.”
<p />
Unfortunately, it’s impossible to get a perfect success metric for algorithms. Not only are metrics “just a proxy for what you really care about,” but also all “metrics can, and will be gamed.” The goal has to be to make the success metrics as good as possible and keep fixing the metrics as they drift away from the real goal of the long-term success of the company. Only by constantly fixing the metrics will teams optimize the algorithms to help the company grow and profit over the years.
<p />
A classic article by Steven Kerr, “<a href="https://www.jstor.org/stable/4165235">On the folly of rewarding A while hoping for B,”</a> was originally published back in 1975. The author wrote: “Many managers seek to establish simple, quantifiable standards against which to measure and reward performance. Such efforts may be successful in highly predictable areas within an organization, but are likely to cause goal displacement when applied anywhere else.”
<p />
Machine learning algorithms need a target. Teams need to have success metrics for algorithms so they know how to make them better. But it is important to recognize that metrics are likely to be wrong and to keep trying to make them better.
<p />
You get what you measure. When managers pick a metric, there are almost always rewards and incentives tied to that metric. Over time, as people optimize for the metric, you will get that metric maximized, often at the expense of everything else, and often harming the true goals of the organization.
<p />
Kerr went on to say, “Explore what types of behavior are currently being rewarded. Chances are excellent that ... managers will be surprised by what they find -- that firms are not rewarding what they assume they are.” When Kerr's article was republished in 1995, an editor summarized this as, “It’s the reward system, stupid!”
<p />
Metrics are hard to get right, especially because they often end up being a moving target over time. The moment you put a metric in place, people both inside and outside the company will start to find ways to succeed against that metric, often finding cheats and tricks that move the metric without helping customers or the company. It's as Goodhart’s Law <a href="https://en.wikipedia.org/wiki/Goodhart%27s_law">says</a>: “When a measure becomes a target, it ceases to be a good measure.”
<p />
One example familiar to all of us is the rapid growth of clickbait headlines -- “You won’t believe what happens next” -- that provide no value but try to get people to click. This happened because headline writers were rewarded for getting a click, whether or not they got it through deception. When getting a click is what the organization rewards, teams will drive clicks.
<p />
Often companies pick poor success metrics such as clicks <a href="https://en.wikipedia.org/wiki/McNamara_fallacy">just because it is too hard</a> to measure the things that matter most. Long-term metrics that try to be good proxies for what we really care about such as retention, long-term growth, long-term revenue, and customer satisfaction can be costly to measure. And, because of Goodhart’s Law, the metrics will not work forever and will need to be changed over time. Considerable effort is necessary.
<p />
Many leaders don’t realize the consequences of not putting in that effort. You will get what you measure. Unless you reward teams for the long-term growth and profitability of the company, teams will not optimize for the success of the company or shareholders.
<p />
What can companies do? Professor Thomas went on to say that companies should “use a slate of metrics to get a fuller picture and reduce gaming” which can “keep metrics in their place.” The intent is that gaming of one metric may be visible in another, so a slate with many metrics may show problems that otherwise might be missed. Another idea is changing metrics frequently, which also can reduce gaming and provides an opportunity to adjust metrics so they are closer to the true target.
<p />
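A rough sketch of the slate idea follows; the event names and metrics are hypothetical. Reporting a short-term metric and a long-term metric side by side means a climbing click metric with flat or falling retention stands out as possible gaming:

```python
def metric_slate(logs):
    """Report short-term and long-term metrics side by side.
    Clicks climbing while retention stays flat or falls is a hint
    that the click metric is being gamed."""
    clicks = sum(1 for e in logs if e["event"] == "click")
    users = {e["user"] for e in logs}
    returning = {e["user"] for e in logs if e["event"] == "return_visit"}
    return {
        "clicks_per_user": clicks / max(len(users), 1),
        "retention": len(returning) / max(len(users), 1),
    }
```

No single number on the slate is the target; the point is that gaming one metric tends to show up as divergence in another.
<p />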
Getting this wrong causes a lot of harm to the company and sometimes to others as well. “A modern AI case study can be drawn from recommendation systems,” Thomas writes. “Platforms are rife with attempts to game their algorithms, to show up higher in search results or recommended content, through fake clicks, fake reviews, fake followers, and more.”
<p />
“It is much easier to measure short-term quantities [such as] click-through rates,” Thomas said. But “many long-term trends have a complex mix of factors and are tougher to quantify.” There is a substantial risk if teams, executives, and companies get their metrics wrong. “Facebook has been the subject of years’ worth of ... scandals ... which is now having a longer-term negative impact on Facebook’s ability to recruit new engineers” and grow among younger users.
<p />
As Googler and AI expert François Chollet once <a href="https://twitter.com/fchollet/status/1532759198517170177">said</a>, “Over a short time scale, the problem of surfacing great content is an algorithmic problem (or a curation problem). But over a long time scale, it's an incentive engineering problem.”
<p />
It is the optimization of the algorithms, not the algorithms themselves, that determines what they show. Incentives, rewards, and metrics determine what wisdom of the crowd algorithms do. That is why metrics and incentives are so important.
<p />
Get the metrics wrong, and the long-term costs for the company -- stalled growth, poor retention, poor reputation, regulatory risk -- become worse and worse. Because the algorithms are optimized over time, it is important to be constantly fixing the data and metrics to make sure they are trustworthy and doing the right thing. Trustworthy data and long-term metrics lead to algorithms that minimize scams and maximize long-term growth and profits.
Greg Linden
<p />
Book excerpt: From hope to despair and back to hope (January 3, 2024)
<p />
<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Twenty-five years ago, when recommendation algorithms first launched at large scale on the internet, they helped people discover new books to read and new movies to watch.
<p />
In recent years, wisdom of the crowds failed the internet, and the internet filled with misinformation.
<p />
The story of why this happened — and how to fix it — runs through the algorithms that pick what we see on the internet. Algorithms use wisdom of the crowds at a massive scale to find what is popular and interesting. That is how they determine what to show to millions of people. When these algorithms fail, misinformation flourishes.
<p />
The reason the algorithms fail is not what you think. It is not the algorithms.
<p />
Only with an insider view can readers see how the algorithms work and how tech companies build these algorithms. The surprise is that the algorithms are actually made of people.
<p />
People build and maintain these algorithms. Wisdom of the crowds works using data about what people do. The key to why algorithms go wrong, and how they can be fixed, runs through people and the incentives people have.
<p />
When bad actors manipulate algorithms, they are trying to get their scams and misinformation seen by as many people as possible as cheaply as possible.
<p />
When teams inside companies optimize algorithms, they are trying to meet the goals executives set for them, whatever those goals are and regardless of whether they are the right goals for the company.
<p />
People’s incentives control what the algorithms do. And incentives are the key to fixing misinformation on the internet.
<p />
To make wisdom of the crowds useful again, and to make misinformation ineffective, all companies must use only reliable data and must not optimize their algorithms for engagement. As this book shows, these solutions reduce the reach of misinformation, making it far less effective and far more expensive for scammers and fraudsters.
<p />
We know these solutions work because some companies did it. Exposing a gold mine of knowledge buried deep inside the major tech companies, this book shows that some successfully stopped their algorithms from amplifying misinformation by not optimizing for engagement. And, importantly, these companies made more money by doing so.
<p />
Companies that have not fixed their algorithms have taken a dark path, blinded by short-term optimization for engagement, and the teams deceived by bad incentives and bad metrics inside of their companies. This book shows the way out for those led astray.
<p />
People inside and outside the powerful tech companies, including consumers and policy makers, can help align incentives away from short-term engagement and toward long-term customer satisfaction and growth.
<p />
It turns out it's a win-win to listen to consumers and optimize algorithms to be helpful for your customers in the long-term. Nudging people's incentives in practical ways is easier once you see inside the companies, understand how they build these algorithms, and see that companies make more money when they do not myopically optimize their algorithms in ways that later will cause a flood of misinformation and scams.
<p />
Wisdom of the crowd algorithms are everywhere on the internet. Readers of this book start out feeling powerless to fix the algorithms that control everything we see and the misinformation these algorithms promote. They end the book hopeful and ready to push for change.
Greg Linden
<p />
Book excerpt: First pages of the book (December 27, 2023)
<p />
<i>(This is an excerpt from my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It". The first sentence and first page of a book hook readers in. This book starts with an entertaining tale about algorithms and their importance at the beginning of Amazon.com)
</i>
<p />
The old brick building for Amazon headquarters in 1997 was in a grimy part of downtown Seattle across from a needle exchange and Wigland.
<p />
There were only a couple dozen of us, the software engineers. We sat at desks made of unfinished four-by-fours bolted to what should have been a door. Exhausted from work, sometimes we slept on the floor of our tiny offices.
<p />
In my office, from the look of the carpet, somebody had spilled coffee many times. A soft blue glow from a screen showing computer code lit my face. I turned to find Jeff Bezos in my doorway on his hands and knees.
<p />
He touched his forehead down to the filthy floor. Down and up again, his face disappeared and reappeared as he bowed.
<p />
He was chanting: “I am not worthy. I am not worthy.”
<p />
What could cause the founder of Amazon, soon to be one of the world’s richest men, to bow down in gratitude to a 24-year-old computer programmer? An algorithm.
<p />
Algorithms are computer code, often created in the wee hours by some geek in a dingy room reeking of stale coffee. Algorithms can be enormously helpful. And they can be equally harmful. Either way they choose what billions of people see online every day. Algorithms are power.
<p />
What do algorithms do? Imagine you are looking for something good to read. You remember your friend recently read the book Good Omens and liked it. You go to Amazon and search for [good omens]. What happens next?
<p />
Your casually dashed-off query immediately puts thousands of computers to work just to serve you. Search and ranker algorithms run their computer code in milliseconds, then the computers talk to each other about what they found for you. Working together, the computers comb through what are billions of potential options, filtering and sorting among them, to surface what you might be seeking.
<p />
And look at that! The book Good Omens is the second thing Amazon shows you, just below the recent TV series. That TV series looks fun too. Perhaps you’ll watch that later. For now, you click on the book.
<p />
As you look at the Good Omens book, more algorithms spring into action looking for more ways to help you. Perhaps there are similar books you might enjoy? Recommender algorithms follow their instructions, sorting through what millions of other customers found to show you what “customers who liked Good Omens also liked.” Maybe there is something that might be enticing, that gets you to click “buy now”.
<p />
And that’s why Jeff Bezos was on my office floor, laughing and bowing.
<p />
The percentage of Amazon sales that come through recommender algorithms is much higher than what you’d expect. In fact, it’s astounding. For years, about a third of Amazon’s sales came directly through content suggested by Amazon’s recommender algorithms.
<p />
Most of the rest of Amazon’s revenue comes through Amazon’s search and ranking algorithms. In total, nearly all of Amazon’s revenue comes from content suggested, surfaced, found, and recommended by algorithms. At a brick-and-mortar bookstore, a clerk might help you find the book you are looking for. At “Earth’s Biggest Bookstore”, algorithms find or suggest nearly everything people buy.
<p />
How algorithms are optimized, and what they show to people, is worth billions every year. Even small changes can make enormous differences.
<p />
Jeff Bezos celebrated that day because what algorithms show to people matters. How algorithms are tuned, improved, and optimized matters. It can change a company’s fortunes.
<p />
One of Amazon’s software engineers had just found an improvement that made the recommender algorithms much more effective. So there Jeff was, bobbing up and down. Laughing. Celebrating. All because of how important recommender algorithms were to Amazon and its customers.
Greg Linden
<p />
Book excerpt: Wisdom of the trustworthy (December 19, 2023)
<p />
<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Wisdom of the crowds is the idea that combining the opinions of a lot of people will often get a very useful result, usually one that is better and more accurate than almost all of the opinions on their own.
<p />
Wisdom of the trustworthy is the same idea, with the addition that only the opinions of provably real, independent people are used.
<p />
Discard all opinions from known shills, spammers, and propagandists. Then also discard the opinions from accounts that merely might be shills, spammers, and propagandists, even when you cannot be sure. Only use proven, trustworthy behavior data.
<p />
Wisdom of the trustworthy is a necessary reaction to online anonymity. Accounts are not people. In fact, many people have multiple accounts, often for reasons that have nothing to do with trying to manipulate algorithms. But the ease of creating accounts, and difficulty of verifying that accounts are actual people, means that we should be skeptical of all accounts.
<p />
On an internet where anyone can create and control hundreds or even thousands of accounts, trust should be hard to gain and easy to lose. New accounts should not be considered reliable. Over time, if an account behaves independently, interacts with other accounts in a normal manner, does not shill or otherwise coordinate with others, and doesn’t show robot behaviors such as liking or sharing posts at rates not possible for humans, it may start to be trusted.
<p />
The moment an account engages in anything that resembles coordinated shilling, all trust should be lost, and the account should go back to untrusted. Trust should be hard to gain and easy to lose.
<p />
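The "hard to gain, easy to lose" policy can be sketched as a simple account-trust tracker. The threshold and method names here are hypothetical, not drawn from any real platform:

```python
class AccountTrust:
    """Trust that is hard to gain and easy to lose: it accrues slowly
    through clean, independent behavior and resets on any shilling."""
    TRUST_THRESHOLD = 30  # e.g. 30 days of clean, independent behavior

    def __init__(self):
        self.score = 0  # new accounts start untrusted

    def record_clean_day(self):
        self.score += 1  # trust accumulates slowly

    def record_shilling(self):
        self.score = 0  # a single incident forfeits all trust

    @property
    def trusted(self):
        return self.score >= self.TRUST_THRESHOLD
```

The asymmetry is the point: a bad actor must invest weeks of clean behavior per account, and loses that entire investment the first time the account shills.
<p />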
Wisdom of the trustworthy makes manipulation much more costly, time-consuming, and inefficient for spammers, scammers, propagandists, and other adversaries. No longer would creating a bunch of accounts that are used by wisdom of the crowd algorithms be easy or cheap. Now, it would be a slow, cumbersome process, trying to get the accounts trusted, then having the accounts ignored again the moment they shilled anything.
<p />
Over a decade ago, a paper called “<a href="https://dl.acm.org/doi/10.1145/1571941.1572033">Wisdom of the Few</a>” showed that recommender systems can do as well using only a much smaller number of carefully selected experts as they would using all available data on every user. The insight was that high quality data often outperforms badly noisy data, especially if the badly noisy data is not merely noisy but actually manipulated by adversaries. Less is more, the researchers argued, if less means only using provably good data for recommendations.
<p />
Big data has become a mantra in computer science. The more data, the better, it is thought, spurred on by <a href="https://www.microsoft.com/en-us/research/publication/scaling-to-very-very-large-corpora-for-natural-language-disambiguation/">an early result</a> by Michele Banko and Eric Brill at Microsoft Research that showed that accuracy on a natural language task depended much more on how much training data was used than on which algorithm they used. Results just kept getting better and better the more data they used. As others found similar things in search, recommender systems, and other machine learning tasks, big data became popular.
<p />
But big data cannot mean corrupted data. Data that has random noise is usually not a problem; averaging over large amounts of the data usually eliminates the issue. But data that has been purposely skewed by an adversary with a different agenda is very much a problem.
<p />
In search, web spam has been a problem since the beginning, including widespread manipulation of the PageRank algorithm first invented by the Google founders. PageRank worked by counting the links between web pages as votes. People created links between pages on the Web, and each of these links could be viewed as a vote that some person thought that page was interesting. By recursively looking at who linked to whom, and who linked to those, the idea was that wisdom could be found in the crowd of people who created those links.
<p />
It wasn’t long until people started creating lots of links and lots of pages that linked to a page that they wanted amplified by the search engines. This was the beginning of link farms, massive collections of pages that effectively voted that they were super interesting and should be shown high up in the search results.
<p />
The solution to web spam was <a href="https://www.vldb.org/conf/2004/RS15P3.PDF">TrustRank</a>. TrustRank starts with a small number of trusted websites, then extends trust only to the sites they link to. Untrusted or unknown websites are largely ignored when ranking. Only the votes from trusted websites count to determine what search results to show to people. A related idea, <a href="http://i.stanford.edu/~kvijay/krishnan-raj-airweb06.pdf">Anti-TrustRank</a>, starts with all the known spammers, shills, and other bad actors, and marks them and everyone they associate with as untrusted.
<p />
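A simplified sketch of the TrustRank idea follows; it omits details of the published algorithm, such as seed selection and the handling of dangling pages. Trust starts on a hand-picked seed set and flows only along links out of trusted pages, so link farms with no trusted inlinks stay near zero no matter how many links they create among themselves:

```python
def trustrank(links, seeds, iterations=20, damping=0.85):
    """Simplified TrustRank: trust starts on a hand-picked seed set
    and flows only along outgoing links, so pages reachable from
    trusted sites accrue trust while link farms stay near zero."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    trust = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    for _ in range(iterations):
        # Seeds are continually re-seeded with a share of trust;
        # every other page only gets what trusted pages pass along.
        new = {p: ((1 - damping) / len(seeds) if p in seeds else 0.0)
               for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * trust[page] / len(targets)
                for target in targets:
                    new[target] += share
        trust = new
    return trust
```

Contrast this with PageRank, where every page starts with equal score: there, a link farm votes for itself; here, its votes are worth nothing.
<p />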
E-mail spam had a similar solution. Nowadays, trusted senders -- major companies and people you have interacted with in the past -- get their e-mail delivered. Unknown senders are viewed skeptically: sometimes allowed into your inbox, sometimes not, depending on what they have done in the past and what others seem to think of the e-mails they send, but often sent straight to spam. And untrusted senders you never see at all; their e-mails are blocked or sent straight to spam without any risk of being featured in your inbox.
<p />
The problem on social media is severe. Professor Fil Menczer <a href="https://www.niemanlab.org/2022/05/elon-musk-says-relaxing-content-rules-on-twitter-will-boost-free-speech-but-research-shows-otherwise/">described</a> how “Social media users have in past years become victims of manipulation by astroturf causes, trolling and misinformation. Abuse is facilitated by social bots and coordinated networks that create the appearance of human crowds.” The core problem is bad actors creating a fake crowd. They are pretending to be many people and effectively stuffing the ballot box of algorithmic popularity.
<p />
For wisdom of the crowd algorithms such as rankers and recommenders, only use proven trustworthy behavior data. Big data is useless if the data is manipulated. It should not be possible to use fake and controlled accounts to get propaganda trending and picked up by rankers and recommenders.
<p />
To avoid manipulation, any behavior data that may involve coordination should be discarded and not used by the algorithms. Discard all unknown or known bad data. Keep only known good data. Shilling will kill the credibility and usefulness of a recommender.
<p />
It should be wisdom of independent reliable people. Do not try to find wisdom in big corrupted crowds full of shills.
<p />
There is no free speech issue with only considering authentic data when computing algorithmic amplification. CNN analyst and Yale lecturer Asha Rangappa <a href="https://cafe.com/notes-from-contributors/note-from-asha-what-elon-musk-gets-wrong-about-the-marketplace-of-ideas/">described</a> it well: “Social media platforms like Twitter are nothing like a real public square ... in the real public square ... all of the participants can only represent one individual – themselves. They can’t duplicate themselves, or create a fake audience for the speaker to make the speech seem more popular than it really is.” However, on Twitter, “the prevalence of bots, combined with the amplification features on the platform, can artificially inflate the ‘value’ of an idea ... Unfortunately, this means that mis- and disinformation occupy the largest ‘market share’ on the platform.”
<p />
Professor David Kirsch echoed this concern that those who create fake crowds are able to manipulate people and algorithms on social media. Referring to the power of fake crowds to amplify, Kirsch <a href="https://www.latimes.com/business/story/2022-04-12/musk-is-off-the-twitter-board-of-directors-the-tesla-twitter-bot-army-marches-on">said</a>, “It matters who stands in the public square and has a big megaphone they’re holding, [it’s] the juice they’re able to amplify their statements with.”
<p />
Unfortunately, on the internet, amplified disinformation is particularly effective. For example, bad actors can use a few dozen very active controlled accounts to create the appearance of unified opinion in a discussion forum, shouting down anyone who disagrees and controlling the conversation. Combined with well-timed likes and shares from multiple controlled accounts, they can overwhelm organic activity.
<p />
Spammers can make their own irrelevant content trend. These bad actors create manufactured popularity and consensus with all their fake accounts.
<p />
Rangappa <a href="https://cafe.com/notes-from-contributors/note-from-asha-what-elon-musk-gets-wrong-about-the-marketplace-of-ideas">suggested</a> a solution based on a similar idea to prohibiting collusion in the free market: “In the securities market, for example, we prohibit insider trading and some forms of coordinated activity because we believe that the true value of a company can only be reflected if its investors are competing on a relatively level playing field. Similarly, to approximate a real marketplace of ideas, Twitter has to ensure that ideas can compete fairly, and that their popularity represents their true value.”
<p />
In the Facebook Papers published in The Atlantic, Adrienne LaFrance talked about the problem at Facebook, <a href="https://www.theatlantic.com/ideas/archive/2021/10/facebook-papers-democracy-election-zuckerberg/620478/">saying</a> the company “knows that repeat offenders are disproportionately responsible for spreading misinformation ... It could tweak its algorithm to prevent widespread distribution of harmful content ... It could also automatically throttle groups when they’re growing too fast, and cap the rate of virality for content that’s spreading too quickly ... Facebook could shift the burden of proof toward people and communities to demonstrate that they’re good actors — and treat reach as a privilege, not a right ... It could do all of these things.”
<p />
Former Facebook data scientist Jeff Allen similarly <a href="https://integrityinstitute.org/widely-viewed-content-analysis-tracking-dashboard">proposed</a>, “Facebook could define anonymous and unoriginal content as ‘low quality’, build a system to evaluate content quality, and incorporate those quality scores into their final ranking ‘value model’.” Allen went on to add in ideas similar to TrustRank, saying that accounts that produce high quality content should be trusted, accounts that spam and shill should be untrusted, and then trust could be part of ranking.
<p />
Allen was concerned about the current state of Facebook, and warned of the long-term retention and growth problems Facebook and Twitter later experienced: “The top performing content is dominated by spammy, anonymous, and unoriginal content ... The platform is easily exploited. And while the platform is vulnerable, we should expect exploitative actors to be heavily [exploiting] it.”
<p />
Bad actors running rampant is not inevitable. As EFF and Stack Overflow board member Anil Dash <a href="https://twitter.com/anildash/status/1525162747704524801">said</a>, fake accounts and shilling are “endemic to networks that are thoughtless about amplification and incentives. Intentionally designed platforms have these issues, but at a manageable scale.”
<p />
Just as web spam and email spam were reduced to almost nothing by carefully considering how to make them less effective, and just as many community sites like Stack Overflow and Medium are able to counter spam and hate, Facebook and other social media websites can too.
<p />
When algorithms are manipulated, everyone but the spammers loses. Users lose because the quality of the content is worse, with shilled scams and misinformation appearing above content that is actually popular and interesting. The business loses because its users are less satisfied, eventually causing retention problems and hurting long-term revenue.
<p />
The idea of using only trustworthy accounts in wisdom of the crowd algorithms has already been proven to work. Similar ideas are widely used for reducing web and email spam to nuisance levels. Wisdom of the trustworthy should be used wherever and whenever there are problems with manipulation of wisdom of the crowd algorithms.
<p />
Trust should not be easy to get. New accounts are easy for bad actors to create, so should be viewed with skepticism. Unknown or untrusted accounts should have their content downranked, and their actions should be mostly ignored by ranker and recommender algorithms. If social media companies did this, then shilling, spamming, and propaganda by bad actors would pay off far less often, making them too costly for many efforts to continue.
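<p />
The paragraph above amounts to filtering which votes the algorithms count. A minimal sketch, assuming each account carries a trust score and new or unknown accounts default to zero (the account names and the threshold are hypothetical):

```python
def trusted_vote_count(votes, trust, threshold=0.5):
    """Count likes/shares only from accounts that have earned trust.

    votes -- list of account ids that engaged with an item
    trust -- dict mapping account id to a trust score in [0, 1];
             unknown and new accounts default to 0.0 and are ignored
    """
    return sum(1 for account in votes if trust.get(account, 0.0) >= threshold)

trust = {"alice": 0.9, "bob": 0.7, "bot1": 0.0, "bot2": 0.0}
votes = ["alice", "bob", "bot1", "bot2", "bot3", "bot4"]
print(trusted_vote_count(votes, trust))  # -> 2: the four bot votes are ignored
```

With this filter in place, creating thousands of new accounts buys an adversary nothing until each account has slowly earned trust, which raises the cost of shilling.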
<p />
In the short-term, with the wrong metrics, it looks great to allow bots, fake accounts, fake crowds, and shilling. Engagement numbers go up, and you see many new accounts. But it’s not real. These aren’t real people who use your product, helpfully interact with other people, and buy things from your advertisers. Allowing untrustworthy accounts and fake crowds hurts customers, advertisers, and the business in the long-term.
<p />
Only trustworthy accounts should be amplified by algorithms. And trust should be hard to get and easy to lose.
Extended book excerpt: Computational propaganda (2023-12-13)
<i>(This is a long excerpt about manipulation of algorithms by adversaries from my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Inauthentic activity is designed to manipulate social media. It exists because there is a strong incentive to manipulate wisdom of the crowd algorithms. If someone can get recommended by algorithms, they can get a lot of free attention because their content will now be the first thing many people see.
<p />
For adversaries, a successful manipulation is like a free advertisement, seen by thousands or even millions. On Facebook, Twitter, YouTube, Amazon, Google, and most other sites on the internet, adversaries have a very strong incentive to manipulate these companies’ algorithms.
<p />
For some governments, political parties, and organizations, the incentive to manipulate goes beyond merely shilling some content for the equivalent of free advertising. These adversaries engage thousands of controlled accounts over long periods of time in disinformation campaigns.
<p />
The goal is to promote a point of view, shut down those promoting other points of view, obfuscate unfavorable news and facts, and sometimes even create whole other realities that millions of people believe are true.
<p />
These efforts by major adversaries are known as “computational propaganda.” Computational propaganda unites many terms — “information operations,” “information warfare,” “influence operations,” “online astroturfing,” “cyberturfing,” “disinformation campaigns,” and many others — and is <a href="https://www.pbs.org/wgbh/frontline/interview/sam-woolley/">defined</a> as “the use of automation and algorithms in the manipulation of public opinion.”
<p />
More simply, computational propaganda is an attempt to give “the illusion of popularity” by using many fake accounts and fake followers to make something look far more popular than it actually is. It creates “manufactured consensus,” the appearance that many people think something is interesting, true, and important when, in fact, it is not.
<p />
It is propaganda by stuffing the ballot box. The trending algorithm on Twitter and the recommendation engine on Facebook look at what people are sharing, liking, and commenting on as votes, votes for what is interesting and important. But “fringe groups that were five or 10 people could make it look like they were 10 or 20,000 people,” <a href="https://www.pbs.org/wgbh/frontline/documentary/facebook-dilemma/">reported</a> PBS’ The Facebook Dilemma. “A lot of people sort of laughed about how easy it was for them to manipulate social media.” They run many “accounts on Facebook at any given time and use them to manipulate people.”
<p />
This is bad enough when it is done for profit, to amplify a scam or just to try to sell more of some product. But when governments get involved, especially autocratic governments, reality itself can start to warp under sustained efforts to confuse what is real. “It’s anti-information,” <a href="https://cafe.com/now-and-then/disinformation-and-democracy/">said</a> historian Heather Cox Richardson. Democracies rely on a common understanding of facts, of what is true, to function. If you can get even a few people to believe something that is not true, it changes how people vote, and can even “alter democracy.”
<p />
The scale of computational propaganda is what makes it so dangerous. Large organizations and state-sponsored actors are able to sustain thousands of controlled accounts pounding out the same message over long periods of time. They can watch how many real people react to what they do, learn what is working and what is failing to gain traction, and then adapt, increasing the most successful propaganda.
<p />
The scale is what creates computational propaganda from misinformation and disinformation. Stanford Internet Observatory’s Renée DiResta <a href="https://yalereview.org/article/computational-propaganda">provided</a> an excellent explanation in The Yale Review: “Misinformation and disinformation are both, at their core, misleading or inaccurate information; what separates them is intent. Misinformation is the inadvertent sharing of false information; the sharer didn’t intend to mislead people and genuinely believed the story. Disinformation, by contrast, is the deliberate creation and sharing of information known to be false. It’s a malign narrative that is spread deliberately, with the explicit aim of causing confusion or leading the recipient to believe a lie. Computational propaganda is a suite of tools or tactics used in modern disinformation campaigns that take place online. These include automated social media accounts that spread the message and the algorithmic gaming of social media platforms to disseminate it. These tools facilitate the disinformation campaign’s ultimate goal — media manipulation that pushes the false information into mass awareness.”
<p />
The goal of computational propaganda is to bend reality, to make millions believe something that is not true is true. DiResta warned: “As Lenin purportedly put it, ‘A lie told often enough becomes the truth.’ In the era of computational propaganda, we can update that aphorism: ‘If you make it trend, you make it true.’”
<p />
In recent years, Russia was particularly effective at computational propaganda. Adversaries created fake media organizations that looked real, created fake accounts with profiles and personas that looked real, and developed groups and communities to the point they had hundreds of thousands of followers. Russia was “building influence over a period of years and using it to manipulate and exploit existing political and societal divisions,” DiResta <a href="https://www.nytimes.com/2018/12/17/opinion/russia-report-disinformation.html">wrote</a> in the New York Times.
<p />
The scale of this effort was remarkable. “About 400,000 bots [were] engaged in the political discussion about the [US] Presidential election, responsible for roughly 3.8 million tweets, about one-fifth of the entire conversation,” <a href="https://firstmonday.org/ojs/index.php/fm/article/view/7090">said</a> USC researchers.
<p />
Only later was the damage at all understood. In the book <i>Zucked</i>, Roger McNamee summarized the findings: “Facebook disclosed that 126 million users had been exposed to Russian interference, as well as 20 million users on Instagram ... The user number represents more than one-third of the US population, but that grossly understates its impact. The Russians did not reach a random set of 126 million people on Facebook. Their efforts were highly targeted. On the one hand, they had targeted people likely to vote for Trump with motivating messages. On the other, they identified subpopulations of likely Democratic voters who might be discouraged from voting ... In an election where only 137 million people voted, a campaign that targeted 126 million eligible voters almost certainly had an impact.”
<p />
These efforts were highly targeted, trying to pick out parts of the US electorate that might be susceptible to their propaganda. The adversaries worked over a long period of time, adapting as they discovered what was getting traction.
<p />
By late 2019, as <a href="https://www.technologyreview.com/2021/09/16/1035851/facebook-troll-farms-report-us-2020-election/">reported</a> by MIT Technology Review, “all 15 of the top pages targeting Christian Americans, 10 of the top 15 Facebook pages targeting Black Americans, and four of the top 12 Facebook pages targeting Native Americans were being run by ... Eastern European troll farms.”
<p />
These pages “reached 140 million US users monthly.” They achieved this extraordinary reach not by people seeking them out on their own, but by manipulating Facebook’s “engagement-hungry algorithm.” These groups were so large and so popular because “Facebook’s content recommendation system had pushed [them] into their news feeds.” Facebook’s optimization process for their algorithms was giving these inauthentic actors massive reach for their propaganda.
<p />
As Facebook data scientists warned inside of the company, “Instead of users choosing to receive content from these actors, [Facebook] is choosing to give them an enormous reach.” Real news, trustworthy information from reliable sources, took a back seat to this content. Facebook was amplifying these troll farms. The computational propaganda worked.
<p />
The computational propaganda was not limited to Facebook. The efforts spanned many platforms, trying the same tricks everywhere, looking for flaws to exploit and ways to extend their reach. The New York Times <a href="https://www.nytimes.com/2018/12/17/us/politics/russia-2016-influence-campaign.html">reported</a> that the Russian “Internet Research Agency spread its messages not only via Facebook, Instagram and Twitter ... but also on YouTube, Reddit, Tumblr, Pinterest, Vine and Google+” and others. Wherever they were most successful, they would do more. They went wherever it was easiest and most efficient to spread their false message to a mass audience.
<p />
It is tempting to question how so many people could fall for this manipulation. How could over a hundred million Americans, and hundreds of millions of people around the world, see propaganda and believe it?
<p />
But this propaganda did not obviously look like Russian propaganda. The adversaries would impersonate Americans using fake accounts with descriptions that appeared to be authentic on casual inspection. Most people would have no idea they were reading a post or joining a Facebook Group that was created by a troll farm.
<p />
Instead “they would be attracted to an idea — whether it was guns or immigration or whatever — and once in the Group, they would be exposed to a steady flow of posts designed to provoke outrage or fear,” said Roger McNamee in <i>Zucked</i>. “For those who engaged frequently with the Group, the effect would be to make beliefs more rigid and more extreme. The Group would create a filter bubble, where the troll, the bots, and the other members would coalesce around an idea floated by the troll.”
<p />
The propaganda was carefully constructed, using amusing memes and emotion-laden posts to lure people in, then using manufactured consensus through multiple controlled accounts to direct and control what people saw afterwards.
<p />
Directing and controlling discussions requires only a small number of accounts if they are well-timed and coordinated. Most people reading a group are passive: they are not actively posting, and far more people read than like, comment, or reshare.
<p />
Especially if adversaries do the timing well to get the first few comments and likes, then “as few as 1 to 2 percent of a group can steer the conversation if they are well-coordinated. That means a human troll with a small army of digital bots — software robots — can control a large, emotionally engaged Group.” If any real people start to argue or point out that something is not true, they can be drowned out by the controlled accounts simultaneously slamming them in the comments, creating an illusion of consensus and keeping the filter bubble intact.
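<p />
The arithmetic behind a small coordinated minority steering a group can be made concrete. The numbers below are illustrative assumptions, not measurements from any real platform:

```python
# Toy model: a group of 1,000 members where 2% are controlled accounts.
# Most real members are passive; assume only 5% ever engage, and only
# 10% of those engage in the first minutes. A ranker that snapshots
# early engagement therefore sees the coordinated likes dominate.

group_size = 1000
controlled = group_size * 2 // 100              # 2% -> 20 coordinated accounts
ever_engage = group_size * 5 // 100             # 5% of real members ever engage -> 50
early_organic_likes = ever_engage * 10 // 100   # 10% of those engage early -> 5
early_shill_likes = controlled                  # all 20 like in the first minute

# The shilled comment opens with 4x the early likes of any organic one.
assert early_shill_likes > early_organic_likes
```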
<p />
This spanned the internet, on every platform and across seemingly-legitimate websites. Adversaries tried many things to see what worked. When something gained traction, they would “post the story simultaneously on an army of Twitter accounts” along with their controlled accounts saying, “read the story that the mainstream media doesn’t want you to know about.” If any real journalist eventually wrote about the story, “The army of Twitter accounts — which includes a huge number of bots — tweets and retweets the legitimate story, amplifying the signal dramatically. Once a story is trending, other news outlets are almost certain to pick it up.”
<p />
In the most successful cases, what starts as propaganda becomes misinformation, with actual American citizens unwittingly echoing Russian propaganda, now mistakenly believing a constructed reality was actually real.
<p />
By no means was this limited to only within the United States or only by Russians. Many large scale adversaries, including governments, political campaigns, multinational corporations, and organizations, are engaging in computational propaganda. What they have in common is using thousands of fake, hacked, controlled, or paid accounts to rapidly create messages on social media and the internet. They create manufactured consensus around their message and flood confusion around what is real and what is not. They have been seen “distorting political discourse, including in Albania, Mexico, Argentina, Italy, the Philippines, Afghanistan, South Korea, Bolivia, Ecuador, Iraq, Tunisia, Turkey, Taiwan, Paraguay, El Salvador, India, the Dominican Republic, Indonesia, Ukraine, Poland and Mongolia,” <a href="https://www.theguardian.com/technology/2021/apr/12/facebook-loophole-state-backed-manipulation">wrote</a> the Guardian.
<p />
Computational propaganda is everywhere in the world. It “has become a regular tool of statecraft,” <a href="https://www.lawfaremedia.org/article/brief-history-online-influence-operations">said</a> Princeton Professor Jacob Shapiro, “with at least 51 different countries targeted by government-led online influence efforts” in the last decade.
<p />
An example in India is instructive. In the 2019 general election in India, adversaries used “hundreds of WhatsApp groups,” fake accounts, hacked and hijacked accounts, and “Tek Fog, a highly sophisticated app” to centrally control activity on social media. In a published paper, researchers <a href="https://arxiv.org/abs/2104.13259">wrote</a> that adversaries “were highly effective at producing lasting Twitter trends with a relatively small number of participants.” This computational propaganda amplified “right-wing propaganda … making extremist narratives and political campaigns appear more popular than they actually are.” They were remarkably effective: “A group of public and private actors working together to subvert public discourse in the world’s largest democracy by driving inauthentic trends and hijacking conversations across almost all major social media platforms.”
<p />
Another recent example was in Canada, the so-called “Siege of Ottawa.” In the Guardian, Arwa Mahdawi <a href="https://www.theguardian.com/commentisfree/2022/feb/08/ottawa-truckers-protest-anti-vaxx-canada">wrote</a> about how it came about: “It’s an astroturfed movement – one that creates an impression of widespread grassroots support where little exists – funded by a global network of highly organised far-right groups and amplified by Facebook ... Thanks to the wonders of modern technology, fringe groups can have an outsize influence ... [using] troll farms: organised groups that weaponise social media to spread misinformation.”
<p />
Computational propaganda “threatens democracies worldwide.” It has been “weaponized around the world,” said MIT Professor Sinan Aral in the book <i>The Hype Machine</i>. In the 2018 general elections in Sweden, a third of politics-related hashtagged tweets “were from fake news sources.” In the 2018 national elections in Brazil, “56 percent of the fifty most widely shared images on [popular WhatsApp] chat groups were misleading, and only 8 percent were fully truthful.” In the 2019 elections in India, “64 percent of Indians encountered fake news online.” In the Philippines, there was a massive propaganda effort against Maria Ressa, a journalist “working to expose corruption and a Time Person of the Year in 2018.” Every democracy around the world is seeing adversaries using computational propaganda.
<p />
The scale is what makes computational propaganda so concerning. The actors behind computational propaganda are often well-funded with considerable resources to bring to bear to achieve their aims.
<p />
Remarkably, there is now enough money involved that there are private companies “offering disinformation-for-hire services.” Computational propaganda “has become more professionalised and is now produced on an industrial scale.” It is everywhere in the world. “In 61 countries, we found evidence of political parties or politicians running for office who have used the tools and techniques of computational propaganda,” <a href="https://demtech.oii.ox.ac.uk/research/posts/industrialized-disinformation/">said</a> researchers at University of Oxford. The way they work is always the same. “Automated accounts are often used to amplify certain narratives while drowning out others ... in order to game the automated systems social media companies use.” It is spreading propaganda using manufactured consensus at industrial scale.
<p />
Also concerning is that computational propaganda can target just the most vulnerable and the most susceptible and still achieve its aims. In a democracy, the difference between winning an election and losing is often just a few percentage points.
<p />
To change the results of an election, you don’t have to influence everyone. The target of computational propaganda is usually “only 10-20% of the population.” Swaying even a fraction of this audience by convincing them to vote in a particular way or discouraging them from voting at all “can have a resounding impact,” shifting all the close elections favorably, and leading to control of a closely-contested government.
<p />
To address the worldwide problem of computational propaganda, it is important to understand why it works. Part of why computational propaganda works is the story of why propaganda has worked throughout history. Computational propaganda floods what people see with a particular message, creating an illusion of consensus while repeating the same false message over and over again.
<p />
This feeds the common belief fallacy, even if the number of controlled accounts is relatively small, by creating the appearance that everyone believes this false message to be true. It creates a firehose of falsehood, flooding people with the false message, creating confusion about what is true or not, and drowning out all other messages. And the constant repetition, seeing the message over and over, fools our minds using the illusory truth effect, which tends to make us believe things we have seen many times before, “even if the idea isn’t plausible and even if [we] know better.”
<p />
As Wharton Professor Ethan Mollick <a href="https://twitter.com/emollick/status/1502397996360740868">wrote</a>, “The Illusionary Truth Effect supercharges propaganda on social media. If you see something repeated enough times, it seems more true.” Professor Mollick went on to say that studies found it works on the vast majority of people even when the information isn’t plausible and merely five repetitions were enough to start to make false statements seem true.
<p />
The other part of why computational propaganda works is algorithmic amplification by social media algorithms. Wisdom of the crowd algorithms, which are used in search, trending, and recommendations, work by counting votes. They look for what is popular, or what seems to be interesting to people like you, by looking at what people seemed to have enjoyed in the recent past.
<p />
When the algorithms look for what people are enjoying, they assume each account belongs to a real person and that each person is acting independently. When adversaries create many fake accounts or coordinate many controlled accounts, they are effectively voting many times, fooling the algorithms with an illusion of consensus.
<p />
What the algorithm thought was popular and interesting turns out to be shilled. The social media post is not really popular or interesting, but the computational propaganda effort made it look to the algorithm that it is. And so the algorithm amplifies the propaganda, inappropriately showing it to many more people, and making the problem far worse.
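<p />
The failure mode is easy to demonstrate. In the sketch below (the accounts, posts, and the operator-attribution rule are all hypothetical), a trending algorithm that counts raw votes treats thirty controlled accounts as thirty independent people; counting distinct operators instead restores the real result:

```python
from collections import Counter

# 30 votes from one troll farm's controlled accounts, 12 from real users.
votes = [("propaganda_post", f"bot{i}") for i in range(30)]
votes += [("cat_video", f"user{i}") for i in range(12)]

# Naive trending: every account's vote counts as an independent person.
raw_popularity = Counter(post for post, account in votes)
print(raw_popularity.most_common(1))   # [('propaganda_post', 30)]

# If controlled accounts can be attributed to a single operator (via
# shared devices, IPs, or coordinated behavior), count operators instead:
def operator(account):
    return "troll_farm" if account.startswith("bot") else account

unique_pairs = {(post, operator(account)) for post, account in votes}
per_operator = Counter(post for post, op in unique_pairs)
print(per_operator.most_common(1))     # [('cat_video', 12)]
```

Attributing accounts to operators is the hard part in practice; the point of the sketch is only that the raw vote count and the per-operator count disagree completely.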
<p />
Both people using social media and the algorithms picking what people see on social media are falling victim to the same technique, manufactured consensus, the propagandist creating “illusory notions of ... popularity because of this same automated inflation of the numbers.” It is adversaries using bots and coordinated accounts to mimic real users.
<p />
“They can drive up the number of likes, re-messages, or comments associated with a person or idea,” <a href="https://www.cambridge.org/core/books/social-media-and-democracy/bots-and-computational-propaganda-automation-for-communication-and-control/A15EE25C278B442EF00199AA660BFADD">wrote</a> the authors of <i>Social Media and Democracy</i>. “Researchers have catalogued political bot use in massively bolstering the social media metrics.”
<p />
The fact that they are only mimicking real users is important to addressing the problem. They are not real users, and they don’t behave like real users.
<p />
For example, when the QAnon conspiracy theory was growing rapidly on Facebook, it <a href="https://www.vice.com/en/article/wx5v4y/facebooks-algorithm-spread-qanon-content-to-new-users">grew</a> using “minimally connected bulk group invites. One member sent over 377,000 group invites in less than 5 months.” There were very few people responsible. According to reporter David Gilbert, there are “a relatively few number of actors creating a large percentage of the content.” He said a “small group of users has been able to hijack the platform.”
<p />
To shill and coordinate between many accounts pushing propaganda, adversaries have to behave in ways that are not human. Bots and other accounts that are controlled by just a few people all “pounce on fake news in the first few seconds after it’s published, and they retweet it broadly.” The initial spreaders of the propaganda “are much more likely to be bots than humans” and often will be the same accounts, superspreaders of propaganda, acting over and over again.
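<p />
Because superspreader accounts behave in ways humans rarely do, that behavior is detectable. A minimal detection sketch (the log format, thresholds, and account names are assumptions for illustration): flag accounts that keep appearing among the very first sharers of different stories.

```python
from collections import defaultdict

def flag_superspreaders(share_log, first_n=5, min_stories=3):
    """share_log: dict mapping story id -> list of (timestamp, account).

    Returns accounts that were among the first `first_n` sharers of at
    least `min_stories` different stories. Real people rarely are; bots
    that pounce on fresh propaganda in seconds routinely are.
    """
    early_counts = defaultdict(int)
    for story, shares in share_log.items():
        # Sort by timestamp and look at only the earliest sharers.
        for _, account in sorted(shares)[:first_n]:
            early_counts[account] += 1
    return {account for account, n in early_counts.items() if n >= min_stories}

log = {
    "story1": [(0, "bot_a"), (1, "bot_b"), (2, "alice"), (9, "bob")],
    "story2": [(0, "bot_a"), (1, "bot_b"), (3, "carol")],
    "story3": [(0, "bot_a"), (2, "bot_b"), (5, "alice")],
}
print(sorted(flag_superspreaders(log)))   # ['bot_a', 'bot_b']
```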
<p />
Former Facebook data scientist Sophie Zhang talked about this in a Facebook internal memo, <a href="https://www.buzzfeednews.com/article/craigsilverman/facebook-ignore-political-manipulation-whistleblower-memo">reported</a> by BuzzFeed: “thousands of inauthentic assets ... coordinated manipulation ... network[s] of more than a thousand actors working to influence ... The truth was, we simply didn’t care enough to stop them.” Despairing about the impact of computational propaganda on people around the world, Zhang went on to lament, “I have blood on my hands.”
<p />
Why do countries, and especially authoritarian regimes, create and promote propaganda? Why do they bother?
<p />
The authors of the book <i>Spin Dictators</i> write that, in recent years, because of globalization, post-industrial development, and technology changes, authoritarian regimes have “become less bellicose and more focused on subtle manipulation. They seek to influence global opinion, while co-opting and corrupting Western elites.”
<p />
Much of this is simply that, in recent decades, it has become cheaper and more effective to maintain power through manipulation and propaganda: partly because communication costs have fallen, making disinformation campaigns on social media inexpensive, and partly because the economic benefits of openness raise the costs of using violence.
<p />
“Rather than intimidating citizens into submission, they use deception to win the people over.” Nowadays, propaganda is easier and cheaper. “Their first line of defense, when the truth is against them, is to distort it. They manipulate information ... When the facts are good, they take credit for them; when bad, they have the media obscure them when possible and provide excuses when not. Poor performance is the fault of external conditions or enemies ... When this works, spin dictators are loved rather than feared.”
<p />
Nowadays, it is cheaper to become loved than feared. “Spin dictators manipulate information to boost their popularity with the general public and use that popularity to consolidate political control, all while pretending to be democratic.”
<p />
While not all manipulation of wisdom of the crowd algorithms is state actors, adversarial states are a big problem: “The Internet allows for low-cost, selective censorship that filters information flows to different groups.” Propaganda online is cheap. “Social networks can be hijacked to disseminate sophisticated propaganda, with pitches tailored to specific audiences and the source concealed to increase credibility. Spin dictators can mobilize trolls and hackers ... a sophisticated and constantly evolving tool kit of online tactics.”
<p />
Unfortunately, internet “companies are vulnerable to losing lucrative markets,” so they are not always quick to act when they discover countries manipulating their rankers and recommender algorithms; authoritarian governments often play to this fear by threatening retaliation or loss of future business in the country.
<p />
Because “the algorithms that decide what goes viral” are vulnerable to shilling, it is also easy for spin dictators to “use propaganda to spread cynicism and division.” And “if Western publics doubt democracy and distrust their leaders, those leaders will be less apt to launch democratic crusades around the globe.” Moreover, they can spread the message that “U.S.-style democracy leads to polarization and conflict” and corruption. This reduces the threats to an authoritarian leader and reinforces their own popularity.
<p />
Because the manipulation consists entirely of adversaries trying to increase their own visibility, downranking or removing accounts involved in computational propaganda carries little business risk. New accounts and any account involved in shilling, coordination, or propaganda could largely be ignored for the purpose of algorithmic amplification, and repeat offenders could be banned entirely.
<p />
Computational propaganda exists because it is cost effective to do at large scale. Increasing the cost of propaganda reaching millions of people may be enough to vastly reduce its impact. As Sinan Aral writes in the book <i>The Hype Machine</i>, “We need to cut off the financial returns to spreading misinformation and reduce the economic incentive to create it in the first place.”
<p />
While human susceptibility to propaganda is difficult to solve, on the internet today, a big part of the problem of computational propaganda comes down to how easy it is for adversaries to manipulate wisdom of the crowd algorithms and see their propaganda cheaply and efficiently amplified by algorithms.
<p />
In the Washington Post, Will Oremus <a href="https://www.washingtonpost.com/technology/2021/07/21/facebook-youtube-vaccine-misinfo/">blamed</a> recommendation and other algorithms for making it far too easy for the bad guys. “The problem of misinformation on social media has less to do with what gets said by users than what gets amplified — that is, shown widely to others — by platforms’ recommendation software,” he said. Raising the cost of manipulating the recommendation engine is key to reducing the effectiveness of computational propaganda.
<p />
Wisdom of the crowds depends on the crowd consisting of independent voices voting independently. When that assumption is violated, adversaries can force the algorithms to recommend whatever they want. Computational propaganda uses a combination of bots and many controlled accounts, along with so-called “useful idiot” shills, to efficiently and effectively manipulate trending, ranker, and recommender algorithms.
<p />
Allowing their platforms to be manipulated by computational propaganda makes the experience on the internet worse. University of Oxford researchers <a href="https://demtech.oii.ox.ac.uk/research/posts/global-fears-of-disinformation-perceived-internet-and-social-media-harms-in-142-countries/">found</a> that “globally, disinformation is the single most important fear of internet and social media use and more than half (53%) of regular internet users are concerned about disinformation [and] almost three quarters (71%) of internet users are worried about a mixture of threats, including online disinformation, fraud and harassment.” At least in the long-term, it is in everyone’s interest to reduce computational propaganda.
<p />
When adversaries have their bots and coordinated accounts like, share, and post, none of that is authentic activity. None of that shows that people actually like the content. None of that content is actually popular nor interesting. It is all manipulation of the algorithms and only serves to make relevance and the experience worse.
Book excerpt: The rise and fall of wisdom of the crowds (2023-12-10)
<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Wisdom of the crowds is the epiphany that combining many people's opinions is useful, often more useful than expert opinions.
<p />
Customer reviews summarize what thousands of people think of movies, books, and everything else you might want to buy. Customer reviews can be really useful for knowing if you want to buy something you've never tried before.
<p />
When you search the internet, what thousands of people clicked on before you helps determine what you see. Most of the websites on the internet are useless or scams; wisdom of the crowd filters all that out and helps you find what you need.
<p />
When you read the news online, you see news first that other people think is interesting. What people click determines what information you see about what's going on in the world.
<p />
Algorithms on the internet take the wisdom of the crowds to a gargantuan scale. Algorithms process all the data, summarizing it all down, until you get millions of people helping millions of people find what they need.
<p/>
It sounds great, right? And it is. But once you use wisdom of the crowds, scammers come in. They see dollar signs in fooling those algorithms. Scammers profit from faking crowds.
<p />
When manipulated, wisdom of the crowds can promote scams, misinformation, and propaganda. Spammers clog up search engines until we can't see anything but scams. Online retailers are filled with bogus positive customer reviews of counterfeit and fraudulent items. The bad guys astroturf everything using fake crowds. Foreign operatives are able to flood the zone on social media with propaganda using thousands of fake accounts.
<p />
What we need is an internet that works for us. We need an internet that is useful and helpful, where we can find what we need without distractions and scams. Wisdom of the crowds and the algorithms that use wisdom of the crowds are the key to getting us there. But wisdom of the crowds can fail.
<p />
It's tricky to get right. Good intentions can lead to destructive outcomes. When executives tell their teams to optimize for clicks, they discover far too late that going down that path optimizes for scams and hate. When teams use big data, they're trying to make their algorithms work better, but they often end up sweeping up manipulated data that skews their results toward crap. Understanding why wisdom of the crowds fails and how to fix it is the key to getting us the internet we want.
<p />
The internet has come a long way. In the mid-1990s, it was just a few computer geeks. Nowadays, most of the world is online. There have been hard lessons learned along the way. These are the stories of unintended consequences.
<p />
Well-intentioned efforts to tell teams to increase engagement caused misinformation and spam. Experimentation and A/B testing helped some teams help customers, but also accidentally sent others down dark paths of harming customers. Attempts to improve algorithms easily can go terribly wrong.
<p />
The internet has grown massively. During all of that growth, many internet companies struggled to figure out how to build a real business. At first, Google had no revenue and no idea how to make money off web search. At first, Amazon had no profits and it was unclear if it ever would have any.
<p />
Almost always, people at tech companies had good intentions. We were scrambling to build the right thing. What we ended up building was not always the right thing. The surprising reason for this failure is that what gets built depends not so much on the technology as on the incentives people have.
Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-19236921742780747062023-12-08T10:29:00.000-08:002023-12-08T10:29:18.193-08:00Book excerpt: Manipulating likes, comments, shares, and follows<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
“The systems are phenomenally easy to game,” <a href="https://www.ribbonfarm.com/2017/05/23/there-are-bots-look-around/">explained</a> Stanford Internet Observatory’s Renee DiResta.
<p />
The fundamental idea behind the algorithms used by social media is that “popular content, as defined by the crowd” should rise to the top. But “the crowd doesn’t have to be real people.”
<p />
In fact, adversaries can get these algorithms to feature whatever content they want. The process is easy and cheap, just pretend to be many people: “Bots and sockpuppets can be used to manipulate conversations, or to create the illusion of a mass groundswell of grassroots activity, with minimal effort.”
<p />
Whatever they want — whether it is propaganda, scams, or just flooding-the-zone with disparate and conflicting misinformation — can appear to be popular, which trending, ranker, and recommender algorithms will then dutifully amplify.
<p />
“The content need not be true or accurate,” DiResta notes. All this requires is a well-motivated small group of individuals pretending to be many people. “Disinformation-campaign material is spread via mass coordinated action, supplemented by bot networks and sockpuppets (fake people).”
<p />
Bad actors can amplify propaganda on a massive scale, reaching millions, cheaply and easily, from anywhere in the world. “Anyone who can gather enough momentum from sharing, likes, retweets, and other message-amplification features can spread a message across the platforms’ large standing audiences for free,” DiResta <a href="https://yalereview.org/article/computational-propaganda">continued</a> in an article for Yale Review titled "Computational Propaganda": “Leveraging automated accounts or fake personas to spread a message and start it trending creates the illusion that large numbers of people feel a certain way about a topic. This is sometimes called ‘manufactured consensus’.”
<p />
Another name for it is astroturf. Astroturfing feigns popularity by using a fake crowd of shills. It's not authentic, but it creates a convincing illusion of grassroots support.
<p />
There are even businesses set up to provide the necessary shilling, hordes of fake people on social media available on demand to like, share, and promote whatever you may want. As described by Sarah Frier in the book <i>No Filter</i>: “If you searched [get Instagram followers] on Google, dozens of small faceless firms offered to make fame and riches more accessible, for a fee. For a few hundred dollars, you could buy thousands of followers, and even dictate exactly what these accounts were supposed to say in your comments.”
<p />
Frier described the process in more detail. “The spammers ... got shrewder, working to make their robots look more human, and in some cases paying networks of actual humans to like and comment for clients.” She found “dozens of firms” offering these services of “following and commenting” to make content falsely appear to be popular and thereby get free amplification by the platforms’ wisdom of the crowd algorithms. “It was quite easy to make more seemingly real people.”
<p />
In addition to creating fake people by the thousands, it is easy to find real people who are willing to be paid to shill, some of whom would even “hand over the password credentials” for their account, allowing the propagandists to use their account to shill whenever they wished. For example, there were sites where bad actors could “purchase followers and increase engagement, like Kicksta, Instazood, and AiGrow. Many are still running today.” And in discussion groups, it was easy to recruit people who, for some compensation, “would quickly like and comment on the content.”
<p />
Bad actors manipulate likes, comments, shares, and follows because it works. When wisdom of the crowd algorithms look for what is popular, they pick up all these manipulated likes and shares, thinking they are real people acting independently. When the algorithms feature manipulated content, bad actors get what is effectively free advertising, the coveted top spots on the page, seen by millions of real people. This visibility, this amplification, can be used for many purposes, including foreign state-sponsored propaganda or scams trying to swindle.
<p />
Professor Fil Menczer studies misinformation and disinformation on social media. In our interview, he pointed out that it is not just wisdom of the crowd algorithms that fixate on popularity, but a “cognitive/social” vulnerability that “we tend to pay attention to items that appear popular … because we use the attention of other people as a signal of importance.”
<p />
Menczer explained: “It’s an instinct that has evolved for good reason: if we see everyone running we should run as well, even if we do not know why.” Generally, it does often work to look at what other people are doing. “We believe the crowd is wise, because we intrinsically assume the individuals in the crowd act independently, so that the probability of everyone being wrong is very low.”
<p />
But this is subject to manipulation, especially online on social media “because one entity can create the appearance of many people paying attention to some item by having inauthentic/coordinated accounts share that item.” That is, if a few people can pretend to be many people, they can create the appearance of a popular trend, and fool our instinct to follow the crowd.
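<p />
This instinct can be made precise. Condorcet's jury theorem says that a majority of independent voters, each only somewhat reliable, is very likely to be right, and that the guarantee evaporates the moment the votes stop being independent. A small sketch of the arithmetic (the numbers here are illustrative, not from Menczer's research):

```python
from math import comb

def majority_correct(n: int, p: float) -> float:
    """Probability that a majority of n independent voters,
    each right with probability p, reaches the right answer."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# 101 independent voters, each only 60% reliable: the crowd is
# right roughly 98% of the time.
crowd = majority_correct(101, 0.6)

# But if those 101 "voters" are sockpuppets run by one person,
# the crowd is exactly as reliable as that one person: 60%.
```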
<p />
To make matters worse, there often can be a vicious cycle where some people are manipulated by bad actors, and then their attention, their likes and shares, is “further amplified by algorithms.” Often, it is enough to merely start some shilled content trending, because “news feed ranking algorithms use popularity/engagement signals to determine what is interesting/engaging and then promote this content by ranking it higher on people’s feeds.”
<p />
Adversaries manipulating the algorithms can be clever and patient, sometimes building up their controlled accounts over a long period of time. One low cost method of making a fake account look real and useful is to steal viral content and share it as your own.
<p />
In an article titled “<a href="https://www.nytimes.com/2021/12/01/technology/misinformation-cute-cats-online.html">Those Cute Cats Online? They Help Spread Misinformation</a>,” New York Times reporters described one method of how new accounts manage to quickly gain large numbers of followers. The technique involves reposting popular content, such as memes that previously went viral, or cute pictures of animals: “Sometimes, following a feed of cute animals on Facebook unknowingly signs [people] up” for misinformation. “Engagement bait helped misinformation actors generate clicks on their pages, which can make them more prominent in users’ feeds in the future.”
<p />
Controlling many seemingly real accounts, especially accounts that have real people following them to see memes and cute pictures of animals, allows bad actors to “act in a coordinated fashion to increase influence.” The goal, according to researchers at Indiana University, is to create a network of controlled shills, many of which might be unwitting human participants, that are “highly coordinated, persistent, homogeneous, and fully focused on amplifying” scams and propaganda.
<p />
This is not costless for social media companies. Not only are people directly misled, and even sometimes pulled into conspiracy theories and scams, but amplifying manipulated content including propaganda rather than genuinely popular content will “negatively affect the online experience of ordinary social media users” and “lower the overall quality of information” on the website. Degradation of the quality of the experience can be hard for companies to see, only eventually showing up in poor retention and user growth when customers get fed up and leave in disgust.
<p />
Allowing fake accounts, manipulation of likes and shares, and shilling of scams and propaganda may hurt the business in the long-term, but, in the short-term, it can mean advertising revenue. As Karen Hao <a href="https://www.technologyreview.com/2021/11/20/1039076/facebook-google-disinformation-clickbait/">reported</a> in MIT Technology Review, “Facebook isn’t just amplifying misinformation. The company is also funding it.” While some adversaries manipulate wisdom of the crowd algorithms in order to push propaganda, some bad actors are in it for the money.
<p />
Social media companies allowing this type of manipulation does generate revenue, but it also reduces the quality of the experience, filling the site with unoriginal content, republished memes, and scams. Hao detailed how it works: “Financially motivated spammers are agnostic about the content they publish. They go wherever the clicks and money are, letting Facebook’s news feed algorithm dictate which topics they’ll cover next ... On an average day, a financially motivated clickbait site might be populated with ... predominantly plagiarized ... celebrity news, cute animals, or highly emotional stories—all reliable drivers of traffic. Then, when political turmoil strikes, they drift toward hyperpartisan news, misinformation, and outrage bait because it gets more engagement ... For clickbait farms, getting into the monetization programs is the first step, but how much they cash in depends on how far Facebook’s content-recommendation systems boost their articles.”
<p />
The problem is that this works. Adversaries have a strong incentive to manipulate social media’s algorithms if it is easy and profitable.
<p />
But “they would not thrive, nor would they plagiarize such damaging content, if their shady tactics didn’t do so well on the platform,” Hao wrote. “One possible way Facebook could do this: by using what’s known as a graph-based authority measure to rank content. This would amplify higher-quality pages like news and media and diminish lower-quality pages like clickbait, reversing the current trend.” The idea is simple, that authoritative, trustworthy sources should be amplified more than untrustworthy or spammy sources.
<p />
Broadly this type of manipulation is spam, much like spam that technology companies have dealt with for years in email and on the Web. If social media spam was not cost-effective, it would not exist. Like with web spam and email spam, the key with social media spam is to make it less effective and less efficient. As Hao suggested, manipulating wisdom of the crowd algorithms could be made to be less profitable by viewing likes and shares from less trustworthy accounts with considerable skepticism. If the algorithms did not amplify this content as much, it would be much less lucrative to spammers.
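<p />
Hao's suggestion is straightforward to sketch. The toy code below (hypothetical trust scores and account names, nothing like Facebook's actual system) shows how weighting each like by the trustworthiness of the account that produced it changes which post gets amplified:

```python
def engagement_score(liker_accounts, trust):
    """Total engagement, discounting each like by the trust score
    (0.0 to 1.0) of the account that produced it."""
    return sum(trust.get(account, 0.0) for account in liker_accounts)

# Hypothetical trust scores: long-lived accounts with real activity
# score near 1.0, freshly bulk-registered accounts near 0.0.
trust = {"alice": 0.9, "bob": 0.8, "carol": 0.85}
trust.update({f"sockpuppet{i}": 0.001 for i in range(1000)})

real_post = ["alice", "bob", "carol"]                   # 3 organic likes
shilled_post = [f"sockpuppet{i}" for i in range(1000)]  # 1,000 bought likes

# By raw like counts, the shilled post wins 1,000 to 3. By
# trust-weighted engagement, it loses 1.0 to 2.55.
ranked = sorted([shilled_post, real_post],
                key=lambda likes: engagement_score(likes, trust),
                reverse=True)
```

A production system would not hand-assign trust; a graph-based authority measure like the one Hao describes would derive it from how accounts are connected, the way PageRank derives authority for web pages.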
<p />
Inside of Facebook, data scientists proposed something similar. Billy Perrigo at Time magazine <a href="https://time.com/6116354/facebook-employees-deprioritized-misinformation/">reported</a> that Facebook “employees had discovered that pages that spread unoriginal content, like stolen memes that they’d seen go viral elsewhere, contributed to just 19% of page-related views on the platform but 64% of misinformation views.” Facebook data scientists “proposed downranking these pages in News Feed ... The plan to downrank these pages had few visible downsides ... [and] could prevent all kinds of high-profile missteps.”
<p />
What the algorithms show is important. The algorithms can amplify a wide range of interesting and useful content that enhances discovery and keeps people on the platform.
<p />
Or the algorithms can amplify manipulated content, including hate speech, spam, scams, and misinformation. That might make people click now in outrage, or perhaps fool them for a while, but will cause people to leave in disgust eventually.Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-8863362316519346982023-12-05T11:04:00.000-08:002023-12-05T11:04:26.054-08:00Book excerpt: Bonuses and promotions causing bad incentives<i>(This is an excerpt from my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Bonuses are a powerful incentive. Technology companies are using them more than ever. Most technology companies cap salaries and instead use bonuses and stock grants as most of their compensation for employees.
<p />
These bonuses are often tied to key metrics. For example, imagine that if you deploy a change to the recommendation algorithms that boosts revenue by a fraction of a percent, you would get the maximum bonus, a windfall of a million dollars, into your pocket.
<p />
What are you going to do? You’re going to try to get that bonus. In fact, you’ll do anything you can to get that bonus.
<p />
The problem comes when the criteria for what gets the bonus isn’t exactly correct. It doesn’t matter if it is mostly correct — increasing revenue is mostly correct as a goal — what matters is if there is any way, any way at all, to get that bonus in a way that doesn’t help the company and customers.
<p />
Imagine you find a way to increase revenue by biasing the recommendations toward outright scams, snake oil salesmen selling fake cures to the desperate. Just a twiddle to the algorithms and those scams show up just a bit more often, and that nudges the revenue just that much higher, at least when you tested it for a couple days.
<p />
Do you roll out this new scammy algorithm to everyone? Should everyone see more of these scams? And what happens to customers, and the company, if people see all these scams?
<p />
But that bonus. That tasty, tasty bonus. A million dollars. Surely, if you weren’t supposed to do this, they wouldn’t give you that bonus. Would they? This has to be the right thing. Isn’t it?
<p />
People working within technology companies have to make decisions like this every day. Examples abound of ways to generate more revenue that ultimately are harmful to the company, including increasing the size and number of paid promotions, salacious or otherwise inappropriate content, deceptive sales pitches, promoting lower quality items where you receive compensation, spamming people with takeover or pop-up advertising, and stoking strong emotions such as hatred.
<p />
As an article in Wired titled “15 Months of Fresh Hell Inside Facebook” described, this is a real problem. There easily can be “perverse incentives created by [the] annual bonus program, which pays people in large part based on the company hitting growth targets.”
<p />
“You can do anything, no matter how crazy the idea, as long as you move the goal metrics,” added Facebook whistleblower Frances Haugen. If you tell people their bonus depends on moving goal metrics, they will do whatever it takes to move those goal metrics.
<p />
This problem is why some tech companies reject using bonuses as a large part of their compensation. As Netflix founder and CEO Reed Hastings explained, “The risk is that employees will focus on a target instead of spotting what’s best for the company in the present moment.”
<p />
When talking about bonuses in our interview, a former executive who worked at technology startups gave the example of teams meeting their end-of-quarter quotas by discounting, which undermines pricing strategy and can hurt the company in the long term. He also told of an executive who forced a deal through that was bad for the company because signing the deal ensured he hit his quarterly licensing goal and got his bonus. This other executive, when challenged by the CEO, defended his choice by saying he was not given the luxury of long-term thinking.
<p />
“We learned that bonuses are bad for business,” Netflix CEO Reed Hastings said. “The entire bonus system is based on the premise that you can reliably predict the future, and that you can set an objective in any given moment that will continue to be important down the road.”
<p />
The problem is that people will work hard to get a bonus, but it is hard to set criteria for bonuses that cannot be abused in some way. People will try many, many things seeking to find something that wins the windfall the company is dangling in front of them. Some of the innovations might be real. But others may actually cause harm, especially over long periods of time.
<p />
As Reed Hastings went on to say, what companies need to be able to do is “adapt direction quickly” and have creative freedom to do the right thing for the company, not to focus on what “will get you that big check.” It’s not just how much you pay people, it’s also how you pay them.
<p />
Similarly, the people working on changing and tuning algorithms want to advance in their careers. How people are promoted, who is promoted, and for what reason creates incentives. Those incentives ultimately change what wisdom of the crowd algorithms do.
<p />
If people are promoted for helping customers find and discover what they need and keeping customers satisfied, people inside the company have more incentive to target those goals. If people are promoted for getting people to click more regardless of what they are clicking, then those algorithms are going to get more clicks, so more people get those promotions.
<p />
In the book <i>An Ugly Truth</i>, the authors found Facebook “engineers were given engagement targets, and their bonuses and annual performance reviews were anchored to measurable results on how their products attracted more users or kept them on the site longer.” Performance reviews and promotions were tied to making changes that kept people engaged and clicking. “Growth came first,” they found. “It’s how people are incentivized on a day-to-day basis.”
<p />
Who gets good performance reviews and promotions determines which projects get done. If a project that reduces how often people see disinformation from adversaries is both hard and gets poor performance reviews for its team, many people will abandon it. If another project that promotes content that makes people angry gets its team promoted because they increased engagement, then others will look over and say, that looks easy, I can do that too.
<p />
In the MIT Technology Review article “How Facebook Got Addicted to Spreading Misinformation,” Karen Hao described the incentives: “With their performance reviews and salaries tied to the successful completion of projects, employees quickly learned to drop those that received pushback and continue working on those dictated from the top down.”
<p />
The optimization of these algorithms is a series of steps, each one a small choice, about what people should and shouldn’t do. Often, the consequences can be unintended, which makes it that much more important for executives to check frequently if they are targeting the right goals. As former Facebook Chief Security Officer Alex Stamos said, “Culture can become a straightjacket” and force teams down paths that eventually turn out to be harmful to customers and the company.
<p />
Executives need to be careful of the bonus and promotion incentives they create for how their algorithms are tuned and optimized. What the product does depends on what incentives teams have.
Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-48292733592538307812023-12-04T09:53:00.000-08:002023-12-04T09:57:48.258-08:00The failure of big dataFor decades, the focus in machine learning has been big data.
<p />
More data beats better algorithms, <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/acl2001.pdf">concluded a 2001 result</a> from Banko and Brill at Microsoft Research. It was hugely influential on the ML community. For years, most people found it roughly true that if you get more data, ML works better.
<p />
Those days have come to an end. Nowadays, big data often is worse. This is because low quality or manipulated data wrecks everything.
<p />
There was a quiet assumption behind big data: any bad data mixed into the big data is noise that averages out. That is wrong for most real-world data, where the bad data is skewed.
<p />
This problem is acute with user behavior data, like clicks, likes, links, or ratings. ML that uses behavior data is doing wisdom of the crowds, summarizing the opinions of many independent sources to produce useful information.
<p />
Adversaries can purposely skew user behavior data. When they do, using that data will yield terrible results in ML algorithms because the adversaries are able to make the algorithms show whatever they like. That includes the important ranking algorithms for search, trending, and recommendations that we use every day to find information on the internet.
<p />
ML using behavior data is doing wisdom of crowds, and wisdom of crowds assumes the crowd is full of real, unbiased, non-coordinating voices. It doesn't work when the crowd is not real. When you are not sure, it's better to discard much of the data, anything not reliable.
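<p />
The difference between honest noise and adversarial skew is easy to see numerically. In this sketch (entirely made-up data), unbiased noise averages out, but a block of coordinated shill votes pulls the naive average wherever the adversary wants, while dropping unreliable sources recovers the true signal:

```python
import random

random.seed(0)
TRUE_QUALITY = 3.0  # the answer an honest crowd should converge on

# 1,000 real users: the true quality plus unbiased noise.
ratings = [("user", TRUE_QUALITY + random.gauss(0, 1.0)) for _ in range(1000)]

# 300 coordinated shill accounts, all voting 5 stars.
ratings += [("shill", 5.0)] * 300

naive_mean = sum(r for _, r in ratings) / len(ratings)

# Keep only sources judged reliable, even though that throws away
# almost a quarter of the data.
reliable = [r for source, r in ratings if source == "user"]
filtered_mean = sum(reliable) / len(reliable)

# naive_mean is dragged toward 5.0; filtered_mean stays near 3.0.
```

The hard part in practice is deciding which sources are reliable, since shills do not label themselves; the point is only that once you have even a rough reliability estimate, discarding data beats averaging it in.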
<p />
Better data often beats big data if you measure by what is useful to people. ML needs data from reliable, representative, independent, and trustworthy sources to produce useful results. If you aren't sure about the reliability, throw that data out, even if you are throwing most of the data away in the end. Seek useful data, not big data.Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-90097657597998204282023-12-01T12:56:00.000-08:002023-12-01T12:56:22.480-08:00Book excerpt: Manipulating customer reviews<i>(This is an excerpt from my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Amazon is the place people shop online. Over 40% of all US e-commerce spending was on Amazon.com in recent years.
<p />
Amazon also is the place for retailers to list their products for sale. Roughly 25% of all US e-commerce spending recently was third-party marketplace sellers using the Amazon.com website to sell their goods. Amazon is the place for merchants wanting to be seen by customers.
<p />
Because the stakes are so high, sellers have a strong incentive to have positive reviews of their products. Customers not only look at the reviews before buying, but also filter what they search for based on the reviews.
<p />
“Reviews are meant to be an indicator of quality to consumers,” Zoe Schiffer <a href="https://www.theverge.com/2020/10/2/21497416/amazon-crack-down-fraudulent-reviews-facebook-wechat-groups">wrote</a> for The Verge, “[And] they also signal to algorithms whose products should rise to the top.”
<p />
For example, when a customer searches on Amazon for [headphones], there are tens of thousands of results. Most customers will only look at the first few of those results. The difference between being one of the top results for that search for headphones and being many clicks down the list can make or break a small manufacturer.
<p />
As Wired put it in an article titled “<a href="https://www.wired.com/story/amazon-and-the-spread-of-health-misinformation/">How Amazon’s Algorithms Curated a Dystopian Bookstore</a>”: “Amazon shapes many of our consumption habits. It influences what millions of people buy, watch, read, and listen to each day. It’s the internet’s de facto product search engine — and because of the hundreds of millions of dollars that flow through the site daily, the incentive to game that search engine is high. Making it to the first page of results for a given product can be incredibly lucrative.”
<p />
But there is a problem. “Many curation algorithms can be gamed in predictable ways, particularly when popularity is a key input. On Amazon, this often takes the form of dubious accounts coordinating.”
<p />
The coordination of accounts often takes the form of paying people to write positive reviews whether they have used the item or not. It is not hard to recruit people to write a bogus positive review. A small payment and being allowed to keep the product for free is usually enough. There are even special discussion forums where people wait to be offered the chance to post a false positive review, ready and available recruits for the scam.
<p />
BuzzFeed described the process in detail in an investigative piece, “<a href="https://www.buzzfeednews.com/article/nicolenguyen/amazon-fake-review-problem">Inside Amazon’s Fake Review Economy</a>.” They discuss “a complicated web of subreddits, invite-only Slack channels, private Discord servers, and closed Facebook groups.” They went on to detail how “sellers typically pay between $4 to $5 per review, plus a refund of the product ... [and] reviewers get to keep the item for free.”
<p />
Why do merchants selling on Amazon do this? As Nicole Nguyen explained in that BuzzFeed article, “Being a five-star product is crucial to selling inventory at scale in Amazon’s intensely competitive marketplace — so crucial that merchants are willing to pay thousands of people to review their products positively.”
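<p />
The arithmetic behind that willingness is stark. Using the $4 to $5 per review figure from BuzzFeed's reporting (the product in this sketch is hypothetical):

```python
def fake_reviews_needed(n_honest, honest_avg, target_avg, fake_rating=5.0):
    """Smallest number of fake top ratings needed to lift a product's
    average from honest_avg to at least target_avg."""
    bought = 0
    total, count = n_honest * honest_avg, n_honest
    while total / count < target_avg:
        bought += 1
        total += fake_rating
        count += 1
    return bought

# A product with 40 honest reviews averaging 3.8 stars:
bought = fake_reviews_needed(40, 3.8, 4.5)   # 56 fake five-star reviews
cost = bought * 5                            # about $280 at $5 each
```

A few hundred dollars to move from 3.8 to 4.5 stars is trivial next to what a top spot in search results is worth, which is why the incentive is so hard to stamp out.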
<p />
Only one product can appear at the top of an Amazon search for [headphones]. And the top result will be the one most customers see and buy. It is winner take all.
<p />
“Reviews are a buyer’s best chance to navigate this dizzyingly crowded market and a seller’s chance to stand out from the crowd ... Online customer reviews are the second most trusted source of product information, behind recommendations from family and friends ... The best way to make it on Amazon is with positive reviews, and the best way to get positive reviews is to buy them.”
<p />
Because so few customers leave reviews, and even fewer leave positive reviews, letting the natural process take its course means losing to another less scrupulous merchant who is willing to buy as many positive reviews as they need. The stakes are high, and those who refuse to manipulate the reviews usually lose.
<p />
“Sellers trying to play by the rules are struggling to stay afloat amid a sea of fraudulent reviews,” Nguyen wrote. It is “really hard to launch a product without them.”
<p />
More recently, Facebook Groups have grown in popularity, generally and as a way to recruit people to write fake reviews. UCLA researchers <a href="https://cacm.acm.org/magazines/2023/10/276636-leveraging-social-media-to-buy-fake-reviews/fulltext">described in detail</a> how it works, finding “23 [new] fake review related groups every day. These groups are large and quite active, with each having about 16,000 members on average, and 568 fake review requests posted per day per group. Within these Facebook groups, sellers can obtain a five-star review that looks organic.” They found the cost of buying a fake review to be quite cheap, “the cost of the product itself,” because “the vast majority of sellers buying fake reviews compensate the reviewer by refunding the cost of the product via a PayPal transaction after the five-star review has been posted” with only a small number of the bad sellers also offering money in addition to a refund of the cost of the product.
<p />
Washington Post reporters also <a href="https://www.washingtonpost.com/business/economy/how-merchants-secretly-use-facebook-to-flood-amazon-with-fake-reviews/2018/04/23/5dad1e30-4392-11e8-8569-26fda6b404c7_story.html">found</a> “fraudulent reviews [often] originate on Facebook, where sellers seek shoppers on dozens of networks, including Amazon Review Club and Amazon Reviewers Group, to give glowing feedback in exchange for money or other compensation.”
<p />
You might think that manipulating reviews, and through fake reviews also getting featured in search and in recommendations, would carry some cost for sellers if they were to get caught. However, Brad Stone in <i>Amazon Unbound</i> found that “sellers [that] adopted deceitful tactics, like paying for reviews on the Amazon website” faced almost no penalties. “If they got caught and their accounts were shut down, they simply opened new ones.”
<p />
Manipulating reviews, search rankings, and recommendations hurts Amazon customers and, eventually, will undermine trust in Amazon. While Amazon reviews have been viewed as a useful and trusted way to figure out what to buy on Amazon, fake reviews threaten to undermine that trust.
<p />
“It’s easy to manipulate ratings or recommendation engines, to create networks of sockpuppets with the goal of subtly shaping opinions, preying on proximity bias and confirmation bias,” <a href="https://www.ribbonfarm.com/2017/05/23/there-are-bots-look-around/">wrote</a> Stanford Internet Observatory’s Renee DiResta. Sockpuppets are fake accounts pretending to be real people. When bad actors create many sockpuppets, they can use those fake accounts to feign popularity and dominate conversations. “Intentional, deliberate, and brazen market manipulation, carried out by bad actors gaming the system for profit ... can have a profound negative impact.”
<p />
The bad guys manipulate ranking algorithms through a combination of fake reviews and coordinated activity between accounts. A group of people, all working together to manipulate the reviews, can change what algorithms like the search ranker or the recommendation engine think are popular. Wisdom of the crowd algorithms, including reviews, require all the votes to be independent, and coordinated shilling breaks that assumption.
<p />
Nowadays, Amazon seems to be saturated with fake reviews. The Washington Post <a href="https://www.washingtonpost.com/business/economy/how-merchants-secretly-use-facebook-to-flood-amazon-with-fake-reviews/2018/04/23/5dad1e30-4392-11e8-8569-26fda6b404c7_story.html">found</a> that “for some popular product categories, such as Bluetooth headphones and speakers, the vast majority of reviews appear to violate Amazon’s prohibition on paid reviews.”
<p />
This hurts both Amazon customers and other merchants trying to sell on Amazon. “Sellers say the flood of inauthentic reviews makes it harder for them to compete legitimately and can crush profits.” Added one retailer interviewed by the Washington Post, “These days it is very hard to sell anything on Amazon if you play fairly.”
<p />
Of course, this also means the reviews no longer indicate good products. Items with almost entirely 5-star reviews may be “inferior or downright faulty products.” Customers are “left in the dark” using “seemingly genuine reviews” but end up buying “products of shoddy quality.” As Buzzfeed <a href="https://www.buzzfeednews.com/article/nicolenguyen/amazon-fake-review-problem">warned</a>, “These reviews can significantly undermine the trust that consumers and the vast majority of sellers and manufacturers place in Amazon, which in turn tarnishes Amazon’s brand.”
<p />
Long-term harm to customer trust could eventually lead people to shop on Amazon less. Consumer Reports, in an article titled “<a href="https://www.consumerreports.org/customer-reviews-ratings/hijacked-reviews-on-amazon-can-trick-shoppers/">Hijacked Reviews on Amazon Can Trick Shoppers</a>,” went as far as to warn against using the average review score at all: “Fraudulent reviews are a well-known pitfall for shoppers on Amazon ... never rely on just looking at the number of reviews and the average score ... look at not only good reviews, but also the bad reviews.”
<p />
Unfortunately, Amazon executives may have to see growth and sales problems, due to lack of customer trust in the reviews, before they are willing to put policies in place to change the incentives for sellers. For now, as Consumer Reports said, Amazon's customer reviews can no longer be trusted.
Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-55384226155748147662023-11-28T14:48:00.000-08:002023-11-28T14:48:40.230-08:00Book excerpt: The problem is bad incentives<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Incentives matter. “As long as your goal is creating more engagement,” said former Facebook data scientist Frances Haugen in a 60 Minutes interview, “you’re going to continue prioritizing polarizing, hateful content.”
<p />
Teams inside the tech companies determine how the algorithms are optimized and what the algorithms amplify. People on those teams optimize the algorithms for whatever goals they are given. The metrics and incentives those teams have determine how wisdom of the crowd algorithms are optimized over time.
<p />
What the company decides is important and rewards determines how the algorithms are tuned. Metrics determine what wins A/B tests. Metrics decide what changes get launched to customers. Metrics determine who gets promoted inside these companies. When a company creates bad incentives by picking bad metrics, the algorithms will produce bad results.
<p />
What Facebook’s leadership prioritizes and rewards determines what people see on Facebook. “Facebook’s algorithm isn’t a runaway train,” Haugen said. “The company may not directly control what any given user posts, but by choosing which types of posts will be seen, it sculpts the information landscape according to its business priorities.” What the executives prioritize in what they measure and reward determines what types of posts people see on Facebook. You get what you measure.
<p />
“Mark has never set out to make a hateful platform. But he has allowed choices to be made where the side effects of those choices are that hateful, polarizing content gets more distribution and more reach,” Haugen said. Disinformation, misinformation, and scams on social media are “the consequences of how Facebook is picking out that content today.” The algorithms are “optimizing for content that gets engagement, or reaction.”
<p />
Who gets that quarterly bonus? It’s hard to have a long-term focus when the company offers large quarterly bonuses for hitting short-term engagement targets. In <i>No Rules Rules</i>, Netflix co-founder and CEO Reed Hastings wrote, “We learned that bonuses are bad for business.” He went on to say that executives are terrible at setting the right metrics for the bonuses and, even if they do, “the risk is that employees will focus on a target instead of spot what’s best for the company.”
<p />
Hastings said that “big salaries, not merit bonuses, are good for innovation” and that Netflix does not use “pay-per-performance bonuses.” Though “many imagine you lose your competitive edge if you don’t offer a bonus,” he said, “We have found the contrary: we gain a competitive edge in attracting the best because we just pay all that money in salary.”
<p />
At considerable effort, Google, Netflix, and Spotify have shown that, properly measured in long experiments, short-term metrics such as engagement or revenue hurt the company in the long-run. For example, in a paper titled “<a href="https://research.google/pubs/pub43887/">Focus on the Long-term: It’s Better for Users and Business</a>”, Google showed that optimizing for weekly ad revenue would result in far too many ads in the product to maximize Google’s long-term ad revenue. Short-term metrics miss the most important goals for a company: growth, retention, and long-term profitability.
<p />
Short-term metrics and incentives overoptimize for immediate gains and ignore long-term costs. While companies and executives should have enough reasons to avoid bad incentives and metrics that hurt the company in the long-term, it is also true that regulators and governments could step in to encourage the right behaviors. As Foreign Policy wrote when talking about democracies protecting themselves from adversarial state actors, regulators could encourage social media companies to think beyond the next quarterly earnings report.
<p />
Regulators have struggled to understand how to help. Could they directly regulate algorithms? Attempts to do so have immediately hit the difficulty of crafting useful regulations for machine learning algorithms. But the problem is not the algorithm. The problem is people.
<p />
Companies want to make money. Many scammers and other bad actors also want to make money. The money is in the advertising.
<p />
Fortunately, the online ad marketplace already has a history of being regulated in many countries. Regulators in many countries already maintain bans on certain types of ads, restrictions on some ads, and financial reporting requirements for advertising. Go after the money and you change the incentives.
<p />
Among those suggesting increasing regulation on social media advertising is the Aspen Institute Commission on Information Disorder. In their <a href="https://www.aspeninstitute.org/publications/commission-on-information-disorder-final-report/">report</a>, they suggest countries “require social media companies to regularly disclose ... information about every digital ad and paid post that runs on their platforms [and then] create a legal requirement for all social media platforms to regularly publish the content, source accounts, reach and impression data for posts that they organically deliver to large audiences.”
<p />
This would provide transparency to investors, the press, government regulators, and the public, allowing problems to be seen far earlier, and providing a much stronger incentive for companies themselves to prevent problems before having them disclosed.
<p />
The Commission on Information Disorder goes further, suggesting that, in the United States, the extension of Section 230 protections to advertising and algorithms that promote content is overly broad. They argue any content that is featured, either by paid placement advertising or by recommendation algorithms, should be more heavily scrutinized: “First, withdraw platform immunity for content that is promoted through paid advertising and post promotion. Second, remove immunity as it relates to the implementation of product features, recommendation engines, and design.”
<p />
Their report was authored by some of the world experts on misinformation and disinformation. They say that “tech platforms should have the same liability for ad content as television networks or newspapers, which would require them to take appropriate steps to ensure that they meet the established standards for paid advertising in other industries.” They also say that “the output of recommendation algorithms” should not be considered user speech, which would enforce a “higher standard of care” when the company’s algorithms get shilled and amplify content “beyond organic reach.”
<p />
These changes would provide strong incentives for companies to prevent misinformation and propaganda in their products. The limitations on advertising would reduce the effectiveness of using advertising in disinformation campaigns. It also would reduce the effectiveness of spammers who opportunistically pile on disinformation campaigns, cutting into their efficiency and profitability. Raising the costs and reducing the efficiency of shilling will reduce the amount of misinformation on the platform.
<p />
Subject internet companies to the same regulations on advertising that television networks and newspapers have. Regulators are already familiar with following the money, and even faster enforcement and larger penalties for existing laws would help. By changing where revenue comes from, it may encourage better incentives and metrics within tech companies.
<p />
“Metrics can exert a kind of tyranny,” former Amazon VP Neil Roseman said in our interview. Often teams “don’t know how to measure a good customer experience.” And different teams may have “metrics that work against each other at times” because simpler and short-term metrics often “narrow executive focus to measurable input/outputs of single systems.” A big problem is that “retention (and long-term value) are long-term goals which, while acknowledged, are just harder for people to respond to than short-term.”
<p />
Good incentives and metrics focus on the long-term. Short-term incentives and metrics can create a harmful feedback loop as algorithms are optimized over time. Good incentives and metrics focus on what is important to the business: long-term retention and growth.
Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-87892900990489908282023-11-27T17:50:00.000-08:002023-11-27T17:50:33.538-08:00Tim O'Reilly on algorithmic tuning for exploitationTim O'Reilly, Mariana Mazzucato, and Ilan Strauss have three working papers focusing on Amazon's ability to extract unusual profits from its customers nowadays. The papers are:
<ul>
<li><a href="https://www.ucl.ac.uk/bartlett/public-purpose/publications/2023/nov/algorithmic-attention-rents-theory-digital-platform-market-power">Algorithmic Attention Rents: A theory of digital platform market power</a></li>
<li><a href="https://www.ucl.ac.uk/bartlett/public-purpose/publications/2023/nov/amazons-algorithmic-rents-economics-information-amazon">Amazon’s Algorithmic Rents: The economics of information on Amazon</a></li>
<li><a href="https://www.ucl.ac.uk/bartlett/public-purpose/publications/2023/nov/behind-clicks-can-amazon-allocate-user-attention-it-pleases">Behind the Clicks: Can Amazon allocate user attention as it pleases?</a></li>
</ul>
The core idea in all three is that Amazon has become the default place to shop online for many. So, when Amazon changes their site in ways that make Amazon higher profits but hurt consumers, it takes work for people to figure that out and shop elsewhere.
<p />
The papers criticize the common assumption that people will quickly switch to shopping elsewhere if the Amazon customer experience deteriorates. Realistically, people are busy. People have imperfect information, limited time, and it is effortful to find another place to shop. At least up to some limit, people may tolerate a familiar but substantially deteriorated experience for some time.
<p />
For search, it takes effort for people to notice that they are being shown lots of ads, that less reliable third party sellers are promoted over less profitable but more relevant options, and that the most useful options aren't always first. And then it takes yet more effort to switch to using other online stores. So Amazon is able to extract extraordinary profits in ways less dominant online retailers can't get away with.
<p />
But I do have questions about how far Amazon can push this. How long can Amazon get away with excessive advertising and lower quality? Do consumers tire of it over time and move on? Or do they put up with it forever as long as the pain is below some threshold?
<p />
Take an absurd extreme. Imagine that Amazon thought it could maximize its revenue and profits by showing only ads and only the most profitable ads for any search regardless of the relevance of those ads to the search. Clearly, that extreme would not work. The search would be completely useless and consumers would go elsewhere very rapidly.
<p />
Now back off from that extreme, adding back more relevant ads and more organic results. At what point do consumers stay at Amazon? And do they just stay at Amazon or do they slowly trickle away?
<p />
I agree time and cognitive effort, as well as Amazon Prime renewing annually, raise switching costs. But when will consumers have had enough? Do consumers only continue using Amazon with all the ads until they realize the quality has changed? When does brand and reputation damage accumulate to the point that consumers start trusting Amazon less, shopping at Amazon less, and expending the effort of trying alternatives?
<p />
I think one model of customer attrition is that every time customers notice a bad experience, they have some probability of using Amazon less in the future. The more bad experiences they have, the faster the damage to long-term revenue. Under this model, even the level of ads Amazon has now is causing slow damage to Amazon. Amazon execs may not notice because the damage is over long periods of time and hard to attribute directly back to the poor quality search results, but the damage is there. This is the model I've seen used by some others, such as Google Research in their "<a href="https://research.google/pubs/pub43887/">Focus on the Long-term</a>" paper.
<p />
Another model might be that consumers are captured by dominant companies such as Amazon and will not pay the costs to switch until they hit some threshold. That is, most customers will refuse to try alternatives until it is completely obvious that it is worth the effort. This assumes that Amazon can exploit customers for a very long time, and that customers will not stop using Amazon no matter what they do. There is some extreme where that breaks, but only at the threshold, not before.
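A toy simulation makes the difference between these two models concrete. Everything here is hypothetical: the 2% per-bad-experience churn probability, the threshold of 25, and the population size are illustrative only, not estimates of Amazon's actual attrition:

```python
# Toy comparison of two customer-attrition models (all parameters hypothetical).
import random

random.seed(0)

def retained_after(bad_experiences, model, p=0.02, threshold=25):
    """Return True if a customer is still shopping after n bad experiences."""
    if model == "gradual":
        # Model 1: every noticed bad experience independently risks losing the customer.
        return all(random.random() > p for _ in range(bad_experiences))
    else:
        # Model 2: customers tolerate everything until a threshold, then leave.
        return bad_experiences < threshold

def retention_rate(bad_experiences, model, customers=10_000):
    """Fraction of a simulated population still retained."""
    kept = sum(retained_after(bad_experiences, model) for _ in range(customers))
    return kept / customers

for n in (5, 20, 40):
    print(n, round(retention_rate(n, "gradual"), 2), retention_rate(n, "threshold"))
```

Under the gradual model, retention erodes with every bad experience, so today's ad load is already costing something. Under the threshold model, retention stays at 100% and then collapses, so the company sees no cost at all until it crosses the line.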
<p />
The difference between these two models matters a lot. If Amazon is experiencing substantial but slow costs from what they are doing right now, there's much more hope for them changing their behavior on their own than if Amazon is experiencing no costs from their bad behavior unless regulators impose costs externally. The solutions you get in the two scenarios are likely to be different.
<p />
I enjoyed the papers and found them thought-provoking. Give the papers a read, especially if you are interested in the recent discussions of <a href="https://en.wikipedia.org/wiki/Enshittification">enshittification</a> started by Cory Doctorow. As Cory points out, this is a much broader problem than just Amazon. And we need practical solutions that companies, consumers, and policy makers can actually implement.Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com3tag:blogger.com,1999:blog-6569681.post-79378982498847126372023-11-26T08:34:00.000-08:002023-11-26T08:34:08.711-08:00Book excerpt: People determine what the algorithms do<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
The problem is people. These algorithms are built, tuned, and optimized by people. The incentives people have determine what these algorithms do.
<p />
If what wins A/B tests is what gets the most clicks, people will optimize the algorithms to get more clicks. If a company hands out bonuses and promotions when the algorithms get more clicks, people will tune the algorithms to get more clicks.
<p />
It doesn’t matter that what gets clicks and engagement may not be good for customers or the company in the long-term. Lies, scams, and disinformation can be very engaging. Fake crowds generate a lot of clicks. None of it is real or true, and none of it helps customers or the business, but look at all those click, click, clicks.
<p />
Identifying the right problem is the first step toward finding the right solutions. The problem is not algorithms. The problem is how people optimize the algorithms. Lies, scams, and disinformation thrive if people optimize for the short-term. Problems like misinformation are a symptom of a system that invites these problems.
<p />
Instead, invest in the long-term. Invest in removing fake crowds and in a good customer experience that keeps people around. Like any investment, this means lower profits in the short-term for higher profits in the long-term. Companies maximize long-term profitability by making sure teams are optimizing for customer satisfaction and retention.
<p />
It’s not the algorithm, it’s people. People are in control. People tune the algorithm in ways that cause harm, usually unintentionally and sometimes because they have incentives to ignore the harm. The algorithm does what people tell it to.
<p />
To fix why the algorithms cause harm, look to the people who build the algorithms. Fixing the harm from wisdom of the crowd algorithms requires fixing why people allow those algorithms to cause harm.
Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-13544743693034419522023-11-17T13:23:00.000-08:002023-11-17T13:23:20.575-08:00Book excerpt: How companies build algorithms using experimentation<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Wisdom of the crowd algorithms shape what people see on the internet. Constant online experimentation shapes what wisdom of the crowd algorithms do.
<p />
Wisdom of crowds is the idea that summarizing the opinions of lots of independent people is often useful. Many machine learning algorithms use wisdom of the crowds, including rankers, trending, and recommenders on social media.
<p />
It's important to realize that recommendation algorithms are not magic. They don't come up with good recommendations out of thin air. Instead, they just summarize what people found.
<p />
If summarizing what people found is all the algorithms do, why do they create harm? Why would algorithms amplify social media posts about scammy vitamin supplements? Why would algorithms show videos from white supremacists?
<p />
It is not how the algorithms are built, but how they are optimized. Companies change, twiddle, and optimize algorithms over long periods of time using online experiments called A/B tests. In A/B tests, some customers see version A of the website and some customers see version B.
<p />
Teams compare the two versions. Whichever version performs better, by whatever metrics the company chooses, is the version that later launches for all customers. This process repeats and repeats, slowly increasing the metrics.
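The decision step in that loop can be sketched in a few lines. This is a simplified decision rule, not any company's actual experiment platform; the click-through metric, traffic numbers, and the 1.96 z-cutoff are illustrative assumptions:

```python
# Minimal sketch of an A/B test decision: launch B only if its click rate
# beats A's by a statistically significant margin (two-proportion z-test).
from math import sqrt

def ab_winner(clicks_a, views_a, clicks_b, views_b, z_cutoff=1.96):
    """Compare click-through rates; ties and losses keep the incumbent A."""
    rate_a, rate_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / se
    return "B" if z > z_cutoff else "A"

print(ab_winner(clicks_a=1000, views_a=50_000,
                clicks_b=1150, views_b=50_000))  # -> B
```

Note what the code does not ask: whether those extra clicks were good for anyone. Whatever metric goes into the comparison is what the system will relentlessly optimize.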
<p />
Internet companies run tens of thousands of these online experiments every year. The algorithms are constantly tested, changing, and improving, getting closer and closer to the target. But what if you have the wrong target? If the goal is wrong, what the algorithms do will be wrong.
<p />
Let’s say you are at Facebook working on the news feed algorithm. The news feed algorithm is what picks what posts people see when they come to Facebook. And let’s say you are told to optimize the news feed for what gets the most clicks, likes, and reshares. What do you do? You will start trying changes to the algorithm and A/B testing them. Does this change get more clicks? What about this one? Through trial-and-error, you will find whatever makes the news feed get more engagement.
<p />
It is this trial-and-error process of A/B testing that drives what the algorithms do. Whatever the goal is, whatever the target, teams of software engineers will work hard to twiddle the algorithms to hit those goals. If your goal is the wrong goal, your algorithms will slowly creep toward doing the wrong thing.
<p />
So what gets the most clicks? It turns out scams, hate, and lies get a lot of clicks. Misinformation tends to provoke a strong emotional reaction. When people get angry, they click. Click, click, click.
<p />
And if your optimization process is craving clicks, it will show more of whatever gets clicks. Optimizing algorithms for clicks is what causes algorithms to amplify misinformation on the internet.
<p />
To find practical solutions, it's important to understand how powerful tech companies build their algorithms. It's not what you would expect.
<p />
Algorithms aren't invented so much as evolved. These algorithms are optimized over long periods of time, changing slowly to maximize metrics. That means the algorithms can unintentionally start causing harm.
Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-91982694802906206582023-11-17T07:04:00.000-08:002023-11-17T07:04:58.007-08:00It's easy for social media to fill with astroturfMost underestimate how easy it is for social media to become dominated by astroturf. It's easy. All you need is a few people creating and controlling multiple accounts. Here's an example.
<p />
Let's say you have 100M real people using your social media site. Most post or comment infrequently, on average once every 10 days. That looks like real social media activity from real people. Most people lurk, a few people post a lot.
<p />
Now let's say 1% of people shill their own posts using about 10 accounts they control on average. These accounts also post and comment more frequently, once a day. Most of these use a few burner accounts to like, share, and comment on their own posts. Some use paid services and unleash hundreds of bots to shill for them.
<p />
In this simple example, about 50% of comments and posts you see on the social media site will be artificially amplified by fake crowds. Astroturfed posts and comments will be everywhere. This is because most people don't post often, and the shills are much more active.
<p />
Play with the numbers. You'll find that if most people don't post or comment -- and most real people don't -- it's easy for people who post a lot from multiple accounts they control to dominate conversations and feign popularity. It's like a megaphone for social media.
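Here is that arithmetic as a small function, so the numbers are easy to play with. All the defaults are the hypothetical figures from the example above:

```python
# Back-of-the-envelope share of daily activity that comes from fake crowds.

def astroturf_share(real_users=100_000_000,
                    real_posts_per_day=1 / 10,   # most real people rarely post
                    shill_fraction=0.01,         # 1% of people shill
                    accounts_per_shill=10,       # ~10 accounts each
                    shill_posts_per_day=1.0):    # each fake account posts daily
    real = real_users * real_posts_per_day
    fake = (real_users * shill_fraction * accounts_per_shill
            * shill_posts_per_day)
    return fake / (real + fake)

print(astroturf_share())  # -> 0.5: half of all activity is astroturf
```

The lever that matters most is the activity gap: because lurkers barely post, even a tiny fraction of hyperactive multi-account shills can supply half the content everyone sees.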
<p />
Also important is how hard it is for the business to fix astroturf once they (often unintentionally) go down this path. This example social media site has 100M people using it, but claims about 110M users. Real engagement is much smaller with fewer highly engaged accounts, not what this business pitches to advertisers. Once you have allowed this problem to grow, it's tempting for companies finding themselves in this situation to not fix it.Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-87241342641940352582023-11-15T13:50:00.000-08:002023-11-15T13:50:25.585-08:00Book excerpt: How some companies get it right<i>(This is an excerpt from drafts of my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")</i>
<p />
How do some companies fix their algorithms? In the last decade, wisdom of the crowds broke, corrupted by bad actors. But some found fixes that let them still use wisdom of the crowds.
<p />
Why was Wikipedia resilient to spammers and shills when Facebook and Twitter were not? Diving into how Wikipedia works, this book shows that Wikipedia is not a freewheeling anarchy of wild edits by anyone, but a place where the most reliable and trusted editors have most of the power. A small percentage of dedicated Wikipedia editors have much more control over Wikipedia than the others; their vigilance is the key to keeping out scammers and propagandists.
<p />
It's well known that when Larry Page and Sergey Brin first created Google, they invented the PageRank algorithm. Widely considered a breakthrough at the time, PageRank used links between web pages as if they were votes for what was interesting and popular. PageRank says a web page is useful if it has a lot of other useful web pages pointing to it.
<p />
Less widely known is that PageRank quickly succumbed to spam. Spammers created millions of web pages with millions of links all pointing to each other, deceiving the PageRank algorithm. Because of spam and manipulation, Google quickly replaced PageRank with the much more resilient TrustRank.
<p />
TrustRank only considers links from reliable and trustworthy web pages and mostly ignores links from unknown or untrusted sources. It works by propagating trust along links between web pages from known trusted pages to other pages. TrustRank made manipulating Google's search ranking algorithm much less effective and much more expensive for scammers.
<p />
TrustRank also works for social media. Start by identifying thousands of accounts that are known to be reliable, meaning that they are real people posting useful information, and thousands of accounts that are unreliable, meaning that they are known to be spammers and scammers. Then look at the accounts that those accounts follow, like, reshare, or engage with in any way. Those nearby accounts then get a bit of the goodness or badness, spreading through the engagement network. Repeat this over and over, allowing reliability and unreliability to spread across all the accounts, and you know how reliable most accounts are even if they are anonymous.
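The propagation step described above can be sketched as a simple iterative computation. The tiny graph, seed labels, and damping factor here are hypothetical; real systems operate on billions of accounts with far more signals:

```python
# Sketch of trust propagation over an engagement graph, in the spirit of the
# TrustRank description above. Graph, seeds, and damping are illustrative.

def propagate_trust(edges, seeds, rounds=20, damping=0.85):
    """edges: {account: [accounts it follows/likes/reshares]}.
    seeds: {account: +1.0 (known reliable) or -1.0 (known spammer)}.
    Returns an estimated reliability score for every account."""
    scores = {a: seeds.get(a, 0.0) for a in edges}
    for _ in range(rounds):
        nxt = {}
        for account in edges:
            if account in seeds:  # seed labels stay fixed
                nxt[account] = seeds[account]
                continue
            # Trust flows to an account from the accounts that engage with it.
            incoming = [scores[n] for n in edges if account in edges[n]]
            nxt[account] = (damping * sum(incoming) / len(incoming)
                            if incoming else 0.0)
        scores = nxt
    return scores

graph = {
    "news_site": ["reporter"], "reporter": ["news_site"],
    "spammer": ["bot1"], "bot1": ["spammer"],
}
scores = propagate_trust(graph, seeds={"news_site": 1.0, "spammer": -1.0})
print(scores)  # "reporter" ends up positive, "bot1" negative
```

Even though "reporter" and "bot1" were never labeled by hand, they inherit scores from their neighborhoods, which is what makes this approach work at scale on mostly anonymous accounts.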
<p />
If you boost reliable accounts and mostly ignore unknown and unreliable accounts, fake accounts become less influential, and it becomes much less cost-effective for bad actors to create influential fake accounts.
<p />
Companies that fixed their wisdom of the crowd algorithms also do not use engagement to optimize their algorithms. Optimizing for engagement will cause wisdom of the crowd algorithms to promote scams, spam, and misinformation. Lies get clicks.
<p />
It’s a lot of work to not optimize for engagement. Companies like Netflix, Google, YouTube, and Spotify put in considerable effort to run long experiments, often measuring people over months or even years. They then develop proxy short-term metrics that they can use to measure long-term satisfaction and retention over shorter periods of time. One example is satisfied clicks, which are clicks where people are not immediately repelled and spend time using the content they see, ignoring clicks on scams or other low quality content. These companies put in all this effort to develop good metrics because they know that optimizing algorithms for engagement eventually will hurt the company.
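As a sketch of how a satisfied-clicks proxy metric might be computed: the 30-second dwell-time threshold below is an illustrative assumption, not any company's actual rule, and real systems combine many more signals:

```python
# Toy "satisfied clicks" metric: count only clicks where the person stayed
# with the content, filtering out clickbait bounces. Threshold is hypothetical.

def satisfied_click_rate(clicks, min_dwell_seconds=30):
    """clicks: list of (item, dwell_seconds) pairs from one or more sessions."""
    satisfied = [c for c in clicks if c[1] >= min_dwell_seconds]
    return len(satisfied) / len(clicks)

session = [("how-to video", 240), ("miracle-cure ad", 3),
           ("news story", 95), ("outrage bait", 5)]
print(satisfied_click_rate(session))  # -> 0.5
```

A raw click metric would score this session 100% engaged; the proxy metric reveals that half the clicks were immediately regretted, which is exactly the signal that raw engagement optimization misses.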
<p />
Algorithms can be fixed if executives leading the companies decide to fix them. Some companies have successfully prevented bad actors from manipulating wisdom of the crowds. The surprise: Companies make much more money over the long run if they don't optimize algorithms for clicks.
Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-59843254417002096622023-11-09T08:35:00.002-08:002023-11-09T09:49:56.370-08:00Book excerpt: Table of Contents<i>(This is the Table of Contents from my book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Introduction: How good algorithms became a fountain of scams, shills, and disinformation — and what to do about it
<p />
Part I: The golden age of wisdom of crowds algorithms<br />
Chapter 1: The rise of helpful algorithms<br />
Chapter 2: How companies build algorithms using experimentation
<p />
Part II: The problem is not the algorithms<br />
Chapter 3: Bad metrics: What gets measured gets done<br />
Chapter 4: Bad incentives: What gets rewarded gets replicated<br />
Chapter 5: Bad actors: The irresistible lure of an unlocked house
<p />
Part III: How to stop algorithms from amplifying misinformation<br />
Chapter 6: How some companies get it right<br />
Chapter 7: How to solve the problems with the algorithms<br />
Chapter 8: Getting platforms to embrace long-term incentives and metrics<br />
Chapter 9: Building a win-win-win for companies, users, and society
<p />
Conclusion: From hope to despair and back to hope
<p />
<i>(That was the Table of Contents from a draft of my book. If you might be interested in this book, I'd love to know.)
</i>Greg Lindenhttp://www.blogger.com/profile/09216403000599463072noreply@blogger.com0tag:blogger.com,1999:blog-6569681.post-1751059255220453992023-10-30T13:08:00.000-07:002023-10-30T13:08:01.985-07:00Book excerpt: Overview from the book proposal<i>(This is an excerpt from the book proposal for my unpublished book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Without most of us even realizing it, algorithms determine what we see everyday on the internet.
<p />
Computer programs pick which videos you’ll watch next on TikTok and YouTube. When you go to Facebook and Twitter, algorithms pick which news stories you’ll read. When it’s movie night, algorithms dictate what you’ll watch on Netflix based on what you watched in the past. Everywhere you look, algorithms decide what you see.
<p />
When done well, these computer programs have enormous value, helping people find what they need quickly and easily. It’s hard to find what you are looking for with so much out there. Algorithms filter through everything, tossing bad options away with wild abandon, to bring rare gems right to you.
<p />
Imagine you’re looking for a book. When you go to Amazon and start searching, algorithms are what filter through all the world’s books for you. But not only that. Algorithms also look at what books people seem most interested in and then bring you the very best choices based on what other customers bought. By quickly filtering through millions of options, computers help people discover things they never would have been able to find on their own.
<p />
These algorithms make recommendations in much the same way that you would. Suppose you have a friend who asks you to recommend a good book for her to read. You might ask yourself, what do you know about her? Does she like fiction or nonfiction? Which authors does she like? What books did she read in the past few months? With a little information about your friend’s tastes, you might narrow things down. Perhaps she would like this well-reviewed mystery book? It has some similar themes to a book she enjoyed last year.
<p />
Algorithms combine opinions, likes, and dislikes from millions of people. The seminal book <i>The Wisdom of Crowds</i> popularized the idea that combining the opinions of many random people often gives useful results. What algorithms do is bring together the wisdom of crowds at massive scale. One way they do this is by distilling thousands of customer reviews so you can easily gauge the average review of a movie or video game before you sink time and money into it. Another way is by showing you that customers who bought this also bought that. When algorithms pick what you see on the internet, they use wisdom of the crowds.
<p />
Something changed a few years ago. Wisdom of the crowds failed. Algorithms that use wisdom of the crowds started causing harm. Across the internet, algorithms that choose what people see started showing more spam, misinformation, and propaganda.
<p />
What happened? In the same way a swindler on a street corner will stack the crowd with collaborators who loudly shill the supposed wonders of their offerings, wisdom of the crowd algorithms got fooled into promoting misinformation, scams, and frauds. With the simple ease of creating many online accounts, a fraudster can pretend to be an entire crowd of people online. A fake crowd gives scammers a megaphone that they can use to amplify their own voice as they drown out the voices of others.
<p />
Search and recommendation algorithms across the internet were fooled by these fake crowds. Before the 2020 election in the United States, foreign adversaries posted propaganda to social media, then pretended to be large numbers of Americans liking and resharing, fooling the algorithms into amplifying their posts. Some 140 million people in the United States saw this propaganda, many of them voters. In 2019, the largest pages on social media for Christian Americans, such as “Be Happy Enjoy Life” and “Jesus is my Lord”, were controlled by foreign operatives pretending to be Americans. These troll farms shilled recommendation, search, and trending algorithms, getting top placement for their posts and high visibility for their groups, reaching 75 million people. Scammers manipulated wisdom of the crowd algorithms with shills to promote their bogus cures during the COVID-19 global pandemic. In 2021, the US Surgeon General was so alarmed by health misinformation on the internet that he warned of increased illness and death if it continued.
<p />
Misinformation and disinformation are now the biggest problems on the internet. It is cheap and easy for scammers and propagandists to get seen by millions. Just create a few hundred accounts, have them like and share your stuff to create the illusion of popularity, and wisdom of the crowd algorithms will amplify whatever you like. Even once companies realized their algorithms had gone wrong, many failed to fix them.
<p />
This book is about fixing misinformation on the internet by fixing the algorithms that promote misinformation. Misinformation, scams, and propaganda are ubiquitous on the internet. Algorithms including trending, recommendations, and search rankers amplify misinformation, giving it much further reach and making it far more effective.
<p />
But the reason why algorithms amplify misinformation is not what you think. As this book shows, the process of how big tech companies optimize algorithms is what causes those algorithms to promote misinformation. Diving deep inside the tech companies to understand how they build their algorithms is the key to finding practical solutions.
<p />
This book could only be written by an insider with an eye toward how the biggest tech companies operate. That’s because it’s necessary to not only understand the artificial intelligence technology behind the algorithms that pick what people see on the internet, but also understand the business incentives inside these companies when teams build and optimize these algorithms.
<p />
When I invented Amazon’s recommendation algorithm, our team was idealistic about what would happen next. We saw algorithms as a tool to help people. Find a great book. Enjoy some new music. Discover new things. No matter what you are looking for, someone out there probably already found it. Wisdom of the crowd algorithms share what people found with other people who might enjoy it. We hoped for an internet that would be a joyful playground of knowledge and discovery.
<p />
In the years since, and in my journeys through other tech companies, I have seen how algorithms can go terribly wrong. It can happen easily. It can happen unintentionally. Like taking the wrong path in a dark forest, small steps lead to bigger problems. When algorithms go wrong, we need experts like me who can see realistic ways to correct the root causes behind the problems.
<p />
Solutions to what is now the world’s algorithm problem require interdisciplinary expertise in business, technology, management, and policy. I am an artificial intelligence expert, invented Amazon’s recommendation algorithm, and have thirty-two patents on search and recommendation algorithms. I also have a Stanford MBA, worked with executives at Amazon, Microsoft, and Netflix, and am an expert on how tech companies manage, measure, and reward teams working on wisdom of the crowd algorithms. Past books have failed to offer solutions because authors have lacked the insider knowledge, and often the technical and business expertise, to solve the problems causing misinformation and disinformation. Only with a deep understanding of the technology and business will it be possible to find solutions that not only will work, but also will be embraced by business, government, and technology leaders.
<p />
This book walks readers through how these algorithms are built, what they are trying to do, and how they go wrong. I reveal what it is like day-to-day to work on these algorithms inside the biggest tech companies. For example, I describe how the algorithms are gradually optimized over time. That leads to a surprising conclusion: what the algorithms show people is determined not by the algorithms themselves but by the metrics companies pick to judge whether the algorithms are doing their job well. I show how easy it is for attempts to improve algorithms to instead go terribly wrong. Seemingly unrelated decisions, such as how people are promoted, can not only cause algorithms to amplify misinformation, but also hurt customers and the long-term profitability of the company.
<p />
Readers need to know both why the algorithms caused harm and why some companies failed to fix the problems. By looking at what major tech companies have done and failed to do, readers see the root causes of the massive spread of misinformation and disinformation on the internet. Some companies have invested in fixing their algorithms and prospered. Some companies failed to fix their algorithms and suffered higher costs as misinformation and scams grew. By comparing companies that have had more success with those that have not, readers discover how some companies keep fraudsters from manipulating their algorithms and why others fail.
<p />
Other books have described misinformation and disinformation on the internet, but no other book offers practical solutions. This book explains why algorithms promote misinformation with key insights into what makes misinformation cost effective for fraudsters. This book describes what tempts giant tech companies to allow misinformation on their platforms and how that eventually hurts the companies and their customers. Importantly, this book provides strong evidence that companies would benefit from fixing their algorithms, establishing that companies make more money when they fix their algorithms to stop scams, propaganda, and misinformation. From this book, consumers, managers, and policy makers not only will know why algorithms go wrong, but also will be equipped with solutions and ready to push for change.
<p />
This is the story of what went wrong and how it can be fixed as told by people who were there. I bring together rare expertise to shine a light on how to solve the greatest problem on the internet today. This book is a guide inside how the world’s biggest technology companies build their algorithms, why those algorithms can go wrong, and how to fix it.
Book excerpt: The irresistible lure of an unlocked house
<i>(This is an excerpt from drafts of my unpublished book, "Algorithms and Misinformation: Why Wisdom of the Crowds Failed the Internet and How to Fix It")
</i>
<p />
Bad incentives and bad metrics create an opportunity. They are what allow bad guys to come in and take root. Scammers and propagandists can take advantage of poorly optimized algorithms to make them promote whatever misinformation they like.
<p />
Adversaries outside of these companies see wisdom of the crowd algorithms as an opportunity for free advertising. By manipulating algorithms with fake crowds, such as an astroturf campaign of controlled accounts and bots pretending to be real people, bad actors can feign popularity. Wisdom of the crowds summarizes opinions of the crowd. If the crowd is full of shills, the opinions will be skewed in whatever direction the shills like.
<p />
There is a massive underground economy around purchasing five-star reviews on Amazon — as well as offering one-star reviews for competing products — that allows counterfeiters and fraudsters to purchase whatever reputation they like for questionable and even dangerous products. Third-party merchants selling counterfeit, fraudulent, or other illicit goods with very high profit margins buy reviews from these services, feigning high quality to unwitting Amazon customers. If they are caught, they simply create a new account, list all their items again, and buy more fake reviews.
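A hypothetical illustration (the numbers are invented) shows how few purchased reviews it takes to bury an honest signal. A mediocre product averaging two stars from real customers looks nearly excellent after twenty bought five-star reviews:

```python
# Invented numbers: honest reviews for a mediocre product, plus
# twenty five-star reviews bought from a review farm.
honest_reviews = [2, 1, 3, 2, 2]
purchased_reviews = [5] * 20

def average(ratings):
    return sum(ratings) / len(ratings)

print(average(honest_reviews))                      # 2.0
print(average(honest_reviews + purchased_reviews))  # 4.4
```

The shilled average is what unwitting customers see, and the real signal is drowned out by a fake crowd a fraudster can buy for a trivial cost.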
<p />
Get-rich-quick scammers and questionable vitamin supplement dealers can buy fake crowds of bogus accounts on social media that like and share their false promises. Buying fake crowds of followers on social media that like and share your content is a mature service now with dealers offering access to thousands of accounts for a few hundred dollars. Scammers rely on these fake crowds shilling their wares to fool algorithms into promoting their scams.
<p />
Foreign operatives have buildings full of people, each employee sitting at a desk pretending to be hundreds of Americans at once. They spend long days at work on social media with their multitude of fake accounts, commenting, liking, following, and sharing, all with the goal of pushing their disinformation and propaganda. The propaganda effort was so successful that, by 2019, some of the largest pages on social media were controlled by foreign governments with interests not aligned with the United States. Using their multitude of fake accounts, they were able to fool social media algorithms into recommending their pages and posts. Hundreds of millions of Americans saw their propaganda.
<p />
It is cheap to buy fake crowds and swamp wisdom of the crowd algorithms with bogus data about what is popular. When the crowd isn’t real, the algorithms don’t work. Wisdom of the crowd relies on crowds of independent, real people. Fake crowds full of shills means there is no wisdom in that crowd.
<p />
When algorithms amplify scams and disinformation, it may increase a platform’s engagement metrics for the moment. But, in the long-run, the bad actors win and the company loses. It is easy for people inside of tech companies to unwittingly optimize their algorithms in ways that help scammers and propagandists and hurt customers.
A summary of my book
My book is the untold story of the algorithms that shape our lives, how they went terribly wrong, and how to fix them.
<p />
Most people now have at least a vague idea that algorithms choose what we see on our favorite online platforms. On Amazon they recommend millions of products. On Facebook they predict whether we’re more likely to click on a cute animal video or a rant about Donald Trump. At Netflix, Spotify, Twitter, YouTube, Instagram, TikTok, and every other site on the internet, they serve billions of users with billions of recommendations. But most people don’t know how all those algorithms really work — or why in recent years they began filling our screens with misinformation.
<p />
Other books have described the abundant misinformation, scams, and propaganda on many platforms, but this is the first to offer practical fixes to misinformation and disinformation across the entire internet by focusing on how and why algorithms amplify harmful content. This book offers solutions to what has become the biggest problem on the internet, using insider knowledge from my 30 years of experience in artificial intelligence, recommender systems, search, advertising, online experimentation, and metrics, including many years at Amazon, Microsoft, and startups.
<p />
Many assume “the problem with algorithms” is a tech problem, but it’s actually an incentives problem. Solutions must begin with the incentives driving the executives who run platforms, the investors who fund them, the engineers who build and optimize algorithms, and the content creators who do whatever it takes to maximize their own visibility. Ultimately, this is a book about people and how people optimize algorithms.
<p />
Equipped with insider knowledge of why these algorithms do what they do, readers will finish this book with renewed hope, practical solutions, and ready to push for change.
<p />
<i>(this was a summary of my book, and I will be posting more excerpts from the book here)</i>
Book excerpt: The problem is fake crowds
<i>(This is an excerpt from my book. Please let me know if you like it and want more.)
</i><p />
It is usually unintentional. Companies don’t intend for their websites to fill with spam. Companies don’t intend for their algorithms to amplify propagandists, shills, and scammers.
<p />
It can happen just from overlooking the problem, which then builds up over time. Bad actors come in, the problem grows and grows, and eventually becomes difficult and costly to stop.
<p />
For the bad guys, the incentives are huge. Get your post trending, and a lot of people will see it. If your product is the first thing people see when they search, you will get a lot of sales. When algorithms recommend your content, a lot more people will see it. It’s like free advertising.
<p />
Adversaries will attack algorithms. They will pay people to offer positive reviews. They will create fake crowds consisting of hundreds of fake accounts, all together liking and sharing their brilliant posts, all together saying how great they are. If wisdom of the crowd algorithms treat these fake crowds as real, the recommendations will be shilled, spammy, and scammy.
<p />
Allow the bad guys to create fake crowds and the algorithms will make terrible recommendations. Algorithms try to help people find what they need. They try to show just the right thing to customers at just the right time. But fake crowds make that impossible.
<p />
Facebook suffers from this problem. An internal study at Facebook looked at why Facebook couldn’t retain young adults. Young people consistently described Facebook as “boring, misleading, and negative” and complained that “they often have to get past irrelevant content to get to what matters.”
<p />
Customers won’t stick around if what they see is mostly useless scams. Nowadays, Facebook’s business has stalled because of problems with growth and retention, especially among young adults. Twitter’s audience and revenue have cratered.
<p />
Bad, manipulated, shilled data means bad recommendations. People won’t like what they are seeing, and they won’t stay around.
<p />
Kate Conger wrote in the New York Times about why tech companies sometimes underestimate how bad problems with spam, misinformation, propaganda, and scams will get if neglected. In the early years of Twitter, “they believed that any reprehensible content would be countered or drowned out by other users.” Jason Goldman, who was very early at Twitter, described “a certain amount of idealistic zeal” that they all had, a belief that the crowds would filter out bad content and regulate discussion in the town square.
<p />
It wasn’t long until adversaries took advantage of their naiveté: “In September 2016, a Russian troll farm quietly created 2,700 fake Twitter profiles” which they used to shill and promote whatever content they liked, including attempting to manipulate the upcoming US presidential election.
<p />
On Facebook, “One Russian-run Facebook page, Heart of Texas, attracted hundreds of thousands of followers by cultivating a narrow, aggrieved identity,” Max Fisher wrote in <i>The Chaos Machine</i>. “‘Like if you agree,’ captioned a viral map with all other states marked ‘awful’ or ‘boring,’ alongside text urging secession from the morally impure union. Some posts presented Texas identity as under siege (‘Like & share if you agree that Texas is a Christian state’).”
<p />
Twitter was born around lofty goals of the power of wisdom of the crowds to fix problems. But the founders were naive about how bad the problems could get with bad actors creating fake accounts and controlling multiple accounts. By pretending to be many people, adversaries could effectively vote many times, and give the appearance of a groundswell of faked support and popularity to anything they liked. Twitter’s algorithms would then dutifully pick up the shilled content as trending or popular and amplify it further.
<p />
Twitter later “rolled out new policies that were intended to prevent the spread of misinformation,” started taking action against at least some of the bot networks and controlled accounts, and even “banned all forms of political advertising.” That early idealism that “the tweets must flow” and that wisdom of the crowds would take care of all problems was crushed under a flood of manipulated fake accounts.
<p />
Bad actors manipulate wisdom of the crowds because it is lucrative to do so. For state actors, propaganda on social media is cheaper than ever. Creating fake crowds feigns popularity for their propaganda, confuses the truth in a flood of claims and counterclaims, and silences opposition. For scammers, wisdom of the crowds algorithms are like free advertising. Just by creating a few hundred fake accounts or by paying others to help shill, they can wrap scams or outright fraud in a veneer of faked reliability and usefulness.
<p />
“Successfully gaming the algorithm can make the difference between reaching an audience of millions – or shouting into the wind,” wrote Julia Carrie Wong in the Guardian. Successfully manipulating wisdom of the crowds data tricks trending and recommender algorithms into amplifying it. Getting into trending or the top search results, or getting recommended, by using fake and controlled accounts can be a lot cheaper and more effective than buying advertising.
<p />
“In addition to distorting the public’s perception of how popular a piece of content is,” Wong wrote, “fake engagement can influence how that content performs in the all-important news feed algorithm.” With fake accounts, bad actors can fake likes and shares, creating fake engagement and fake popularity, and fooling the algorithms into amplifying. “It is a kind of counterfeit currency in Facebook’s attention marketplace.”
<p />
“Fake engagement refers to things such as likes, shares, and comments that have been bought or otherwise inauthentically generated on the platform,” Karen Hao wrote in MIT Technology Review. It’s easy to do. “Fake likes and shares [are] produced by automated bots and used to drive up someone’s popularity.”
<p />
“Automation, scalability, and anonymity are hallmarks of computational propaganda,” wrote University of Oxford Professor Philip Howard in his recent book <i>Lie Machines</i>. “Programmers who set up vast networks” of shills and bots “have a disproportionate share of the public conversation because of the fake user accounts they control.” For example, “dozens of fake accounts all posing as engaged citizens, down-voting unsympathetic points of view and steering a conversation in the service of some ideological agenda—a key activity in what has come to be known as political astroturfing. Ordinary people who log onto these forums may believe that they are receiving a legitimate signal of public opinion on a topic when they are in effect being fed a narrative by a secret marketing campaign.” Fake crowds create a fake “impression that there is public consensus.” And by manipulating wisdom of the crowds algorithms, adversaries “control the most valuable resource possible … our attention.”
<p />
The most important part is at the beginning. Let’s say there is a new post full of misinformation. No one has seen it yet. What it needs is to look popular. What it needs is a lot of clicks, likes, and shares. If you control a few hundred accounts, all you need to do is have them all engage with your new post around the same time. And wow! Suddenly you look popular!
<p />
Real people join in later. It is true that real people share misinformation and spread it further. But the critical part is at the start. Fake crowds make something new look popular. It isn’t real. It’s not real people liking and sharing the misinformation. But it works. The algorithms see all the likes and shares. The algorithms think the post is popular. The algorithms amplify the misinformation. Once the algorithms amplify, a lot of real people see the shilled post. It is true that there is authentic engagement from real people. But most important is how everything got started, shilling using fake crowds.
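The bootstrapping attack described above can be illustrated with a toy model (my own construction; real platforms use far more sophisticated rankers). Score posts by engagement velocity, and a coordinated burst of early bot engagement makes a brand-new post outrank a genuinely popular one:

```python
# Toy trending score: engagement per hour since posting, a stand-in
# for the velocity signals trending algorithms tend to favor.
def trending_score(engagements, hours_since_post):
    return engagements / (hours_since_post + 1)

# An organically popular day-old post versus a new post that a fake
# crowd of a few hundred bot accounts engaged with in its first hour.
organic = trending_score(engagements=900, hours_since_post=24)  # 36.0
shilled = trending_score(engagements=300, hours_since_post=1)   # 150.0

# The bot-boosted post "trends" and gets amplified to real people.
print(shilled > organic)  # True
```

Once the shilled post trends, real engagement follows, so the manipulation only has to work at the start, exactly as described above.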
<p />
When adversaries shill wisdom of the crowd algorithms, they replace the genuinely popular with whatever they like. This makes the experience worse and eventually hurts growth, retention, and corporate profits. These long-term costs are subtle enough that tech companies often miss them until they become large.
<p />
Ranking algorithms use wisdom of the crowds to determine what is popular and interesting. Wisdom of the crowds requires independent opinions. You don't have independent opinions when there is coordinated shilling by adversaries, scammers, and propagandists. Faked crowds make trending, search, and recommendation algorithms useless. To be useful, the algorithms have to use what real people actually like.