Saturday, March 25, 2023
Are ad-driven business models bad?
Saturday, March 18, 2023
NATO on bots, sockpuppets, and shills manipulating social media
Buying manipulation remains cheap ... The vast majority of the inauthentic engagement remained active across all social media platforms four weeks after purchasing.

The fake engagement gets picked up and amplified by algorithms like trending, search ranking, and recommenders. That's why it is so effective. A thousand sockpuppets engage with something new in the first hour, then the algorithms think it is popular and show crap to more people.

I think there are a few questions to ask about this: Is it possible for social media platforms to stop amplifying propaganda and scams? If it is possible but some of them don't, why not? Finally, is it in the best interest of the companies in the long run to allow this manipulation of their platforms?

[Scammers and foreign operatives are] exploiting flaws in platforms, and pose a structural threat to the integrity of platforms.
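As a toy illustration of why that first hour of fake engagement is so effective, here is a minimal sketch (my own, not any platform's actual code) of a naive trending score that counts every engagement equally versus one that weights engagement by account trust:

```python
# Toy sketch: a naive trending score treats a thousand sockpuppets the same as
# genuine popularity. Weighting each engagement by account trust blunts the attack.
# Account names and trust values below are invented for illustration.

def trending_score(engagements, trust=None):
    """engagements: list of account ids; trust: optional dict of account -> [0, 1]."""
    if trust is None:
        return len(engagements)                                      # naive: every action counts as 1
    return sum(trust.get(account, 0.0) for account in engagements)   # unknown accounts count as 0

sockpuppets = [f"bot{i}" for i in range(1000)]
real_users = ["alice", "bob", "carol"]
trust = {"alice": 0.9, "bob": 0.8, "carol": 0.95}                    # bots never earn trust

print(trending_score(sockpuppets + real_users))           # 1003: the item looks hugely popular
print(trending_score(sockpuppets + real_users, trust))    # about 2.65: it barely registers
```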
Saturday, February 25, 2023
Too many metrics and the Otis Redding problem
Superhuman AI in the game Go
Thursday, February 16, 2023
Huge numbers of fake accounts on Twitter
Details on personalized learning at Duolingo
When students are given material that’s too difficult, they often get frustrated and quit ... [Too] easy ... doesn’t challenge. Duolingo uses AI to keep its learners squarely in the zone where they remain engaged but are still learning at the edge of their abilities. Bloom’s 2-sigma problem ... [found that] average students who were individually tutored performed two standard deviations better than they would have in a classroom. That’s enough to raise a person’s test scores from the 50th percentile to the 98th. When Duolingo was launched in 2012 ... the goal was to make an easy-to-use online language tutor that could approximate that supercharging effect. We'd like to create adaptive systems that respond to learners based not only on what they know but also on the teaching approaches that work best for them. What types of exercises does a learner really pay attention to? What exercises seem to make concepts click for them?

Great details on how Duolingo maximizes fun and learning while minimizing frustration and abandons, even when those goals are in conflict. Lots more in there, well worth reading.
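To make the "edge of their abilities" idea concrete, here is a tiny hypothetical heuristic (my own sketch, not Duolingo's actual model): nudge exercise difficulty so a learner's recent success rate stays near a target.

```python
# Hypothetical sketch of keeping a learner "at the edge of their abilities":
# adjust difficulty so the recent success rate hovers near a target (80% here).
# The target and step size are invented for illustration.

TARGET_SUCCESS_RATE = 0.80

def next_difficulty(current_difficulty, recent_results, step=0.1):
    """recent_results: list of booleans for the learner's last few exercises."""
    if not recent_results:
        return current_difficulty
    success_rate = sum(recent_results) / len(recent_results)
    if success_rate > TARGET_SUCCESS_RATE:        # too easy -> doesn't challenge
        return current_difficulty + step
    if success_rate < TARGET_SUCCESS_RATE:        # too hard -> frustration and quitting
        return max(0.0, current_difficulty - step)
    return current_difficulty

print(next_difficulty(0.5, [True, True, True, True, True]))     # 0.6, make it harder
print(next_difficulty(0.5, [True, False, False, True, False]))  # 0.4, ease off
```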
Massive fake crowds for disinformation campaigns
Misinformation and disinformation are the biggest problems on the internet right now. And producing them has never been cheaper or easier.
Note how it works. The fake accounts coordinate together to shout down others and create the appearance of agreement. It's like giving one person a megaphone. One person now has thousands of voices shouting in unison, dominating the conversation.
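As a toy sketch of how this kind of coordination can show up in the data (illustrative only; real systems use much richer signals), look for groups of accounts that repeatedly act on the same items within minutes of each other:

```python
# Toy sketch: score pairs of accounts, not individual accounts. Accounts that keep
# acting on the same items within a short window look far more suspicious together
# than any one of them does alone. Window and threshold values are invented.

from collections import defaultdict
from itertools import combinations

def coordination_pairs(actions, window_seconds=300, min_shared_items=3):
    """actions: list of (account, item, timestamp). Returns suspicious account pairs."""
    by_item = defaultdict(list)
    for account, item, ts in actions:
        by_item[item].append((account, ts))
    shared = defaultdict(int)
    for item, hits in by_item.items():
        for (a1, t1), (a2, t2) in combinations(hits, 2):
            if a1 != a2 and abs(t1 - t2) <= window_seconds:
                shared[tuple(sorted((a1, a2)))] += 1
    return {pair for pair, count in shared.items() if count >= min_shared_items}

actions = [("bot1", "postX", 0), ("bot2", "postX", 10), ("bot1", "postY", 60),
           ("bot2", "postY", 65), ("bot1", "postZ", 120), ("bot2", "postZ", 130),
           ("alice", "postX", 4000)]
print(coordination_pairs(actions))   # {('bot1', 'bot2')}, alice is not flagged
```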
Propaganda is not free speech. One person should have one voice. It shouldn't be possible to buy more voices to add to yours. And algorithms like rankers and recommenders definitely shouldn't treat these as organic popularity and amplify them further.
The article is part of a much larger investigative report combining reporters from The Guardian, Le Monde, Der Spiegel, El Pais, and others. You can read much more starting from this article, "Revealed: the hacking and disinformation team meddling in elections".
Tuesday, January 31, 2023
How can enshittification happen?
Layoffs as a social contagion
[CEOs] know layoffs are harmful to company well-being, let alone the well-being of employees, and don’t accomplish much, but everybody is doing layoffs and their board is asking why they aren’t doing layoffs also. The tech industry layoffs are basically an instance of social contagion, in which companies imitate what others are doing. If you look for reasons for why companies do layoffs, the reason is that everybody else is doing it ... Not particularly evidence-based. Layoffs often do not increase stock prices, in part because layoffs can signal that a company is having difficulty. Layoffs do not increase productivity. Layoffs do not solve what is often the underlying problem, which is often an ineffective strategy ... A bad decision.

For more on the harm, please see my old 2009 post from the last time this happened, "Layoffs and tech layoffs".
Monday, December 19, 2022
Are ad-supported business models anti-consumer?
Monday, December 12, 2022
Focus on the Long-term
tl;dr: When you increase ads, short-term revenue goes up, but you're digging deeper into the ad inventory, so average ad quality drops. Over time, this causes people to look at ads less and click on ads less, and it reduces retention. If you measure using long experiments that capture those effects, you find that showing fewer ads makes less money in the short-term but more money in the long-term.
Because most A/B tests don't capture long-term effects, and because those effects are hard for most organizations to measure correctly, the broader implication is that most websites show more ads than they would if they were maximizing long-term profits.
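Here is a minimal, hypothetical simulation of the tl;dr (all numbers are invented and not from the paper): a short experiment favors heavier ad load, while a long experiment favors lighter ad load.

```python
# Hypothetical illustration of why short and long A/B tests can disagree about ad load:
# more ads earn more per session now, but the marginal ads are lower quality and the
# heavier ad load slowly erodes retention. All constants below are made up.

def cumulative_revenue_per_user(ads_per_session, weeks):
    sessions_per_week = 10.0                              # starting engagement
    revenue_per_ad = 0.012 - 0.001 * ads_per_session      # deeper inventory -> worse ads
    weekly_retention_loss = 0.004 * ads_per_session       # more ads -> more churn
    total = 0.0
    for _ in range(weeks):
        total += sessions_per_week * ads_per_session * revenue_per_ad
        sessions_per_week *= (1.0 - weekly_retention_loss)
    return total

for weeks in (2, 26, 104):
    light = cumulative_revenue_per_user(3, weeks)
    heavy = cumulative_revenue_per_user(6, weeks)
    winner = "more ads" if heavy > light else "fewer ads"
    print(f"{weeks:>3} weeks: 3 ads ${light:.2f} vs 6 ads ${heavy:.2f} -> {winner} looks better")
```

In this toy model, the heavy ad load wins the two-week and six-month experiments but loses over two years, which is the shape of the effect the paper describes.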
Saturday, December 10, 2022
ML and flooding the zone with crap
If the crowd is shilled and fake, most of the data isn't useful for machine learning. To get anything useful, you have to pull the scarce wisdom out of a sea of noise.
Gary Marcus looked at this in his latest post, "AI's Jurassic Park moment". Gary talks about how ChatGPT makes it much cheaper to produce huge amounts of reasonable-sounding bullshit and post it on community sites, then he said:
For Stack Overflow, the issue is literally existential. If the website is flooded with worthless code examples, programmers will no longer go there, its database of over 30 million questions and answers will become untrustworthy, and the 14 year old website will die.

Stack Overflow added:
Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking or looking for correct answers ... The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce. There are also many people trying out ChatGPT to create answers, without the expertise or willingness to verify that the answer is correct prior to posting. Because such answers are so easy to produce, a large number of people are posting a lot of answers.

There was a 2009 SIGIR paper, "The Wisdom of the Few", that cleverly pointed out that a lot of this is unnecessary. For recommender systems, trending algorithms, reviews, and rankers, only the best data is needed to produce high quality results. Once you use the independent, reliable, high quality opinions, adding more big data can easily make things worse. Less is more, especially in the presence of adversarial attacks on your recommender system.
When using behavior data, ask what would happen if you could sort by usefulness to the ML algorithm and users. You'd go down the sorted list, then stop at some point when the output no longer improved. That stopping point would be very early if a lot of the data is crap.
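As a sketch of that sort-and-stop idea (build_model and evaluate here are placeholders for whatever your own pipeline uses; the stopping rule is the point):

```python
# Sketch of "less is more": rank data sources by how much you trust them, add them to
# the training set in that order, and stop when held-out quality stops improving.
# build_model() and evaluate() are hypothetical stand-ins for your own pipeline.

def select_trusted_data(sources, build_model, evaluate, min_gain=0.001):
    """sources: list of (trust_score, data) pairs; returns the data worth keeping."""
    ranked = sorted(sources, key=lambda s: s[0], reverse=True)
    kept, best_quality = [], float("-inf")
    for trust, data in ranked:
        candidate = kept + [data]
        quality = evaluate(build_model(candidate))   # e.g., an offline ranking metric
        if quality < best_quality + min_gain:
            break                                    # lower-trust data stopped helping
        kept, best_quality = candidate, quality
    return kept
```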
In today's world, with fake crowds and shills everywhere, wisdom of the crowds fails. Data of unknown quality, or data that is provably spam, should be freely ignored. Only use reliable, independent behavior data as input to ML.
Thursday, November 24, 2022
Alternatives to Twitter
In part because of the shutdown of Google Reader, and in part because I finally gave in to the trend away from long-form blogging, this blog has been pretty quiet for years.
The recent decline of Twitter has me looking for and trying alternatives.
One surprise I found is that Google News and TechMeme feel worth using more often. Both are surprisingly effective alternatives to social media. I am finding they have most of the value without much of the unpleasantness, though they are missing the contact with close friends.
I was also surprised to find I liked LinkedIn as a substitute for Twitter. I find most of the interactions to be fairly good there, though again missing some close friends.
So far, I have more mixed feelings about using Mastodon, Facebook, Instagram, Post, or going back to blogging as alternatives. Anyone have anything else they like? Or differing experiences?
Wednesday, November 23, 2022
Quoted in the Washington Post
I'm talking about how Amazon used to help (but no longer does) with finding and discovering what you want to buy, saying, "The store radically changes based on customer interests, showing programming titles to a software engineer and baby toys to a new mother."
The reporter, Geoffrey Fowler, goes on to say, "This is probably how most of us imagine Amazon still works. But today advertisers are driving the experience ... The Amazon we experience today is pretty much the opposite of how Amazon used to work."
The article is critical of all the ads on Amazon now, which make the shopping experience terrible. I think it is very hard to find things on Amazon nowadays. This happened for a well-known reason. Increasing ad load -- which is the number of ads on a web page -- will usually increase short-term revenue, but it hurts retention, ad performance, and long-term revenue. As the Washington Post reporter describes, all the ads cause people to go elsewhere when they need to shop, and that has long-term costs for Amazon.
Saturday, November 19, 2022
Experimentation and metrics
A/B testing observes human behavior, which is messy and complicated. As in behavioral economics, the metrics are partial information from noisy observations. From very limited data, we need to explain why humans do the crazy things they do and predict what will happen next.

When used well, A/B testing helps innovation. But A/B testing should not be used to subjugate, binding teams to do nothing unless a key metric passes. Rather, it should be used to gain partial information about expected short- and long-term costs and benefits.

For misinformation, disinformation, scams, and the impact of advertising, A/B tests get some data on short-term effects, but little on long-term costs and benefits. Ultimately there will be an investment decision about whether to pay the expected short-term costs for the hoped-for long-term benefits.

A/B testing is a powerful tool for bottom-up innovation. But it is only a tool and can be used badly. A/B data should be used to inform debate, not halt debate. And I think it should always help find a way to say yes to new ideas.
Sunday, July 03, 2022
Making it more difficult to shill recommender systems
The reason this matters is that recommender systems these days are struggling with shilling. Companies are playing whack-a-mole with bad actors who just create new accounts or find new shills every time they're whacked because it's so profitable -- like free advertising -- to create fake crowds that manipulate the algorithms. Propagandists and scammers are loving it and winning. It's easy and lucrative for them.
So what's wrong with taking the opposite strategy, only using the most reliable accounts? As a thought experiment, let's say you rank order accounts by your confidence they are human, independent, not shilling, and trustworthy. Then go down the list of accounts, using their behavior data until the recommendations stop improving at a noticeable level (being careful about cold start and the long tail). Then stop. Don't use the rest. Why not do that? It'd vastly increase costs for adversaries. And it wouldn't change the perceived quality of recommendations because you've made sure it wouldn't.
The traditional approach to this is to classify accounts as spam or shills separate from how the data will be used. The classifiers minimize the error rates (false positive and false negative), then treat all borderline cases as not spam. The idea here is to do almost the opposite of that traditional approach, classify accounts as trustworthy, then use only those, ignoring anything unknown or borderline as well as known spammers and shills.
This works because the way the data will be used, as well as the bad incentives for spammers and propagandists, is very sensitive to false negatives (letting in any manipulation at all of the recommender algorithms) but not very sensitive to false positives (accidentally not using some data that would actually have been fine to use). Letting in even one shill can badly impact recommenders, especially when shills target getting new content trending, but using less of the lower quality data doesn't usually change the recommendations in ways that matter for people.
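A minimal sketch of that flipped threshold (the trust scores here are hypothetical; they would come from whatever account-reputation model you have):

```python
# Sketch of inverting the usual spam-classifier threshold: instead of excluding only
# accounts we're confident are shills, include only accounts we're confident are
# trustworthy. Borderline and unknown accounts are simply not used by the recommender.
# Accounts and scores below are invented for illustration.

def accounts_to_use(trust_scores, min_trust=0.9):
    """trust_scores: dict of account -> P(account is an independent, real person)."""
    return {account for account, p in trust_scores.items() if p >= min_trust}

trust_scores = {
    "longtime_customer": 0.97,
    "verified_reviewer": 0.93,
    "week_old_account": 0.55,    # borderline: fine to ignore, little is lost
    "bulk_created_bot": 0.02,
}
print(accounts_to_use(trust_scores))   # {'longtime_customer', 'verified_reviewer'}
```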
This isn't my idea or a new idea, by the way. It's actually a quite old idea, talked about in papers like TrustRank, Anti-TrustRank, and Wisdom of the Few, and similar techniques are applied already by companies like Google for dealing with web spam.
The world has changed in the last decade. Especially on social media, there is rampant manipulation of wisdom of the crowd data such as likes and shares. A big part of the problem is the algorithms like trending, search, and recommender systems that pick up the manipulated data and use it to amplify shilled content. That makes it quite profitable and easy for disinformation and scams.
Places like Amazon, Facebook, and Twitter are swimming in data, but their problem is that a lot of it is untrustworthy and shilled. But you don't need to use all the data. Toss data happily: anything suspicious at all, with false positives galore, accidentally marking new accounts or borderline accounts as shills when deciding what to input to the recommender algorithms. Who cares if you do?
Saturday, March 20, 2021
Wisdom of the trusted
Two decades ago, a lot of us underestimated the negative effects of lower costs for communication and information sharing. While good, it also made propaganda, shilling, and manipulation far easier, and our defenses against disinformation campaigns proved weak.
We in tech were overly idealistic about what would happen as the cost of information and communication dropped. Many thought propaganda would be harder as people could now easily access the truth.
But you can't source reviews from your customers anymore if the vast majority of reviews are paid shills. You can't rank using usage data if ratings and clicks are mostly fake.
Crowdsourced information, including web crawls, reviews, and commentary, only works when almost everyone is independent and unbiased. Coordinated disinformation breaks crowdsourcing.
Flood-the-zone shouldn't have been a surprise, but it was. Propaganda and manipulation are winning because we still treat inauthentic behavior as real.
While there is plenty of mostly-deserved love for big data, often less is more when you live in an adversarial, flood-the-zone world. Wisdom of the crowds has an assumption of independence between agents, which now has been broken by coordinated disinformation campaigns.
If you are looking at garbage, there is no information. Adding disinformation to good data only makes things worse. It's like making a milkshake, then eyeing a huge putrid sack of night soil nearby. Sure, you could add some of that to what you made, but even a little is going to make it worse. If there is crap everywhere, you might want to stick with what you can prove to be good.
Polling, one of the oldest forms of crowdsourced information, has been impacted too. The trend in recent years is that low response rates and shilling make it so expensive to poll that Pew Research gets better data more cheaply by forming and managing a large paid panel of trusted respondents.
For those working in machine learning, for those trying to work with big data, reputation and reputable sources have to be the response in a flood-the-zone world. When most of the data is bad, how you filter your data becomes the most important thing.
We have a big challenge ahead, countering disinformation using reputation and lack of reputation. In a flood-the-zone world, most data out there is now bad to useless. Isolating the useful requires skepticism toward data, like TrustRank, starting untrusted, bad until proven good.
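For reference, here is a small sketch of the TrustRank idea (simplified; the graph and seed set are made up for illustration): trust starts at zero everywhere except a hand-verified seed set, and only flows along links from trusted nodes, so everything is bad until proven good.

```python
# Simplified sketch of TrustRank: a biased PageRank where restart mass goes only to a
# hand-verified seed set, so trust propagates out from known-good nodes and spam that
# no trusted node links to stays at zero. Graph and seeds below are invented.

def trust_rank(graph, seeds, damping=0.85, iterations=50):
    """graph: dict node -> list of nodes it links to; seeds: hand-verified trusted nodes."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    seed_mass = 1.0 / len(seeds)
    trust = {n: (seed_mass if n in seeds else 0.0) for n in nodes}
    for _ in range(iterations):
        new_trust = {n: ((1 - damping) * seed_mass if n in seeds else 0.0) for n in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * trust[node] / len(targets)
                for target in targets:
                    new_trust[target] += share
        trust = new_trust
    return trust

graph = {"seed_site": ["honest_blog"], "honest_blog": ["new_site"], "spam_farm": ["spam_page"]}
print(trust_rank(graph, seeds={"seed_site"}))   # spam_farm and spam_page stay at 0
```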
Reviews should discard anything even resembling a shill, giving visibility only to reviews from independent and trustworthy customers. Recommender systems and rankers should focus on the data from and related to proven sources, and weight anything unknown as questionable at best and likely worthless. Most crowdsourced data for machine learning, from clicks to content, is going to have to be viewed with skepticism.
Inauthentic behavior and coordinated disinformation campaigns have shilled wisdom of the crowd to death. For reliable big data in a flood-the-zone world, it will have to be wisdom of the trusted.
Tuesday, December 15, 2020
When will virtual reality take off? The $100 bet.
In early 2020, we decided to wait to settle the bet because it looked like there was some chance VR would reach 10M units/year in 2020. Because of COVID and people looking for entertainment at home, Valve's release of Half Life Alyx, Supernatural (the VR exercise program), and big pushes on consumer VR by several companies, we wanted to wait and see if the prediction was off by just one year and 2020 was the year.
At this point, the results are in, and it is clear VR has not reached far beyond early adopters and enthusiasts. Estimates of total hardware sales vary depending on what is considered VR hardware, but most estimates I've seen have worldwide unit sales at around 5-6M in 2020.
Barron's has a nice summary: "We’ve been talking about virtual reality for decades, but it’s gone pretty much nowhere. Despite all of our advances in tech, VR hasn’t been able to bridge the physical and digital realms in any substantial way." TechCrunch adds, "There are signs of growth though it’s clear [VR] is still a niche product."
So what went wrong? Looking back at VR hype in 2016, there were a lot of reasons to be optimistic: HoloLens from Microsoft, Sony entering VR with Playstation VR, Valve pushing hard on VR in the Steam store and with their own products, Xbox looking like it might do VR, Google showing interest in VR, and, though it always seemed like vaporware to me, there was a lot of excitement around the promises made by heavily-funded MagicLeap. It looked likely that someone would make a must-have game or other compelling use of VR that might attract tens of millions of people.
Speculating a bit, I think the issue here goes beyond just needing more time for gradual acceptance and growth of VR. I think the problem is that the non-virtual-reality experience is close enough for most purposes, making VR not compelling enough to be worth setting up and using.
For example, take the virtual tourism experience of visiting the International Space Station in Google Earth. It's fun and compelling enough without virtual reality, so VR adds only a little bit of wow to the experience. Half Life Alyx seems to me to suffer from the same problem, a fun game with some compelling content, so great to try, but not a must-have. Exercise programs like Supernatural or Beat Saber fall in the same category, fun and cool to try, but with okay substitutes and alternatives.
At the time we made the bet back in 2016, I said something similar about why I might lose the bet: "There are several wild cards here. For example, it is possible that much cheaper units can be made to work. It's possible that someone discovers very carefully chosen environments and software tricks fool the brain into fully accepting the virtual reality, especially for gaming, increasing the appeal and making it a must-have experience for a lot of people. As unsavory as it is, pornography is often a wild card with new technology, potentially driving adoption in ways that can determine winners and losers. A breakthrough in display (such as retinal displays) might allow virtual reality hardware that is much cheaper and lighter. Business use is another unknown where virtual reality could provide a large cost savings over physical presence. I do think there are many ways in which I could lose this bet."
Unfortunately, I don't think such must-have, compelling VR experiences exist yet. Perhaps at some point they will. Chris Pruett, who runs part of Oculus, speculated about that, saying: "My guess would be something that is highly immersive, that involves active motion of your body, and ... it's probably going to be something that you either play with other people or is shareable with other people." That sounds plausible to me, though, more broadly, I think it has to be a must-have experience without okay substitutes in non-VR, which is a high bar. My prediction now in 2020 would be that VR will continue to struggle for years to break out beyond enthusiasts and early adopters, at least until it has a truly must-have experience.
I think Daniel Lemire took the harder side of this bet, so I'll match his $100 donation to Wikipedia to settle the bet. Back in 2016, I did add a couple of ways of making my side of the bet even harder, saying I doubted even over the three years 2016-2019 that VR would sell a total of more than 10M units, which appears to be close, and that Google Cardboard-like devices wouldn't go beyond being just a toy, so not regularly used by tens of millions, which looks like it was correct.
And I want to thank Daniel for making this bet. Whether you are betting with the hype or against it, along with conventional wisdom or against the flow, it's hard to publicly take a stand one way or another and be willing to be wrong, especially when big company money is betting against you. This was an interesting bet.
If you enjoyed this, you might also be interested in our 2012 bet about whether tablets will replace PCs.
Update: Daniel Lemire has a post up on his thoughts on the bet, "Virtual reality… millions but not tens of millions… yet".
Friday, December 04, 2020
Facebook and investing in the long-term
Facebook engineers and data scientists posted the results of a series of experiments called "P(Bad for the World)." ... The team trained a machine-learning algorithm to predict posts that users would consider "bad for the world" and demote them in news feeds. In early tests, the new algorithm successfully reduced the visibility of objectionable content.

The article is an insightful look at the struggle inside Facebook over recommender systems for news, metrics, and short vs. long-term metrics and growth. Key is the fear of harming short-term metrics like sessions per user and engagement.
But it also lowered the number of times users opened Facebook, an internal metric known as "sessions" that executives monitor closely.
Another product, an algorithm to classify and demote "hate bait" — posts that don’t strictly violate Facebook’s hate speech rules, but that provoke a flood of hateful comments ... [Another] called "correct the record," would have retroactively notified users that they had interacted with false news and directed them to an independent fact-check ... [Both were] vetoed by policy executives who feared it would disproportionately show notifications to people who shared false news from right-wing websites.
Many rank-and-file workers and some executives ... want to do more to limit misinformation and polarizing content. [Others] fear those measures could hurt Facebook’s growth, or provoke a political backlash ... Some disillusioned employees have quit, saying they could no longer stomach working for a company whose products they considered harmful.
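The mechanic in those experiments is simple demotion. Here is a minimal hypothetical sketch (not Facebook's actual code) of ranking posts by relevance discounted by a predicted probability that the post is bad for the world:

```python
# Hypothetical sketch of demotion: discount each candidate's ranking score by a model's
# predicted P(bad for the world). p_bad stands in for the output of a trained classifier
# like the one described above; titles, scores, and the demotion strength are invented.

def feed_score(relevance_score, p_bad, demotion_strength=0.5):
    """Higher demotion_strength trades more engagement for less objectionable content."""
    return relevance_score * (1.0 - demotion_strength * p_bad)

candidates = [("vacation photos", 0.8, 0.01), ("hate bait rant", 0.9, 0.85)]
ranked = sorted(candidates, key=lambda c: feed_score(c[1], c[2]), reverse=True)
print([title for title, _, _ in ranked])   # vacation photos now outrank the hate bait
```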
Any attempt to increase quality of news or ads is going to result in a short-term reduction in metrics engagement, usage, and revenue. That's obvious and not the question to ask. The question to ask is, does it pay off in the long-term?
It's unsurprising that once you've kicked off all users who hate what Facebook has become and addicted the rest to clickbait, the remainder will use Facebook less in the short-term if you improve the quality of content.
This is just like any other investment. When you make any large investment, you expect your short-term profits to drop, but you're betting that your long-term profits will rise. In this case, increased news quality is an investment in bringing back lapsed users.
Even measured over weeks, sessions per user is going to take a hit with a change to news quality because users who like higher quality news already disengaged and abandoned and current heavy users won't like the change. It will take a long time to pay off.
For Facebook, reducing disinformation probably would also be an investment in other areas. Facebook is polluting society with disinformation, externalizing costs; cutting disinformation is an investment in reducing regulatory risk from governments. And Facebook wants good people, and many good people are leaving ([1]) or won't even consider working there because of their practices, a considerable long-term cost to the company; cutting disinformation is an investment in recruiting and retention. So Facebook probably would see benefits beyond lapsed users.
Facebook and others need to think of reducing disinformation as an investment in the future. Eliminating scams, low quality ads, clickbait, and disinformation often will reduce short-term metrics, but it is a long-term investment in quality to reduce abandons, bring back lapsed users, and meet other long-term business goals. These investments take a long time to pay off, but that's why you make investments, for the long-term payoff.
Tuesday, November 26, 2019
Papers and posting
While I'm talking about summarizing papers, I want to highlight two lines of work that had an impact on me in the last few years and that I think deserve much more attention. Both argue we, as in all of us in tech, are doing something important wrong.
The first argues that our metrics usually are off, specifically way too focused on short-term goals like immediate revenue. This is the work started by the fantastic Focus on the Long-term out of Google and continuing from there (including [1] [2]). Because much of what we all do is an optimization process -- ML, deep learning, recommendations, A/B testing, search, and advertising -- having the targets wrong means we are optimizing for the wrong thing.
Optimizing for the wrong thing is ubiquitous in our industry. It may, for example, cause almost everyone to show too many ads and too many low quality ads. If everyone has their metrics subtly wrong, everything we make, and especially everything in the ML community, may be aiming for the wrong target.
The second is Kate Starbird's work on disinformation campaigns. Across many recent papers, Kate argues that the traditional classifier approach to spam, trolls, and shills has been failing. Adversaries can create many accounts and enlist real humans in their disinformation effort. Knocking a few accounts away does nothing; it is like shooting bullets into a wave. Instead, it is important to look at the goals of disinformation campaigns and make them more expensive to achieve. Because shills impact so many things we do -- training data for ML and deep learning, social media, reviews, recommendations, A/B testing, search, advertising -- our failure to deal with shills means the assumption all of these systems make, that the data is all equally good, is wrong, and the quality of all these systems is reduced.
Solutions are hard. I'm afraid Kate's advice on solutions is limited. But I would say solutions include whitelisting (using only experts, verified real people, or accounts that are expensive to create), recognizing likely disinformation as it starts to propagate and slowing it, and countering likely disinformation with accurate information where it appears. Those replace outdated whack-a-mole account classifiers and work across multiple accounts to counter modern disinformation campaigns. Manipulation and shilling from sophisticated adversaries is ubiquitous in our industry. Until we fix this, many of our systems produce lower quality results.
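As a rough sketch of the "recognize likely disinformation as it starts to propagate and slow it" idea (the signals and thresholds here are invented placeholders), the key move is rate-limiting algorithmic amplification rather than banning accounts one at a time:

```python
# Rough sketch: instead of whack-a-mole account bans, throttle algorithmic amplification
# of content whose early engagement looks coordinated or comes mostly from untrusted
# accounts. The scores and thresholds are hypothetical stand-ins for real signals.

def amplification_multiplier(coordination_score, trusted_share):
    """coordination_score: 0-1 estimate that early engagement is coordinated;
    trusted_share: fraction of early engagement coming from trusted accounts."""
    if coordination_score > 0.7 or trusted_share < 0.2:
        return 0.0          # don't amplify at all; people can still find it organically
    if coordination_score > 0.4:
        return 0.3          # slow it down while it gets reviewed or more data arrives
    return 1.0              # normal algorithmic amplification

print(amplification_multiplier(coordination_score=0.9, trusted_share=0.05))  # 0.0
print(amplification_multiplier(coordination_score=0.2, trusted_share=0.6))   # 1.0
```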
Finally, I am posting a lot less here now, so let me point to other resources for anyone who liked this blog. I still post frequently on Twitter; you can follow me there. On AI/ML, there's a lot of great writing by a lot of people, far too many to list, but I can at least list my favorites, which are François Chollet and Ben Hamner on Twitter. On economics and econometrics, which I enjoy for adding breadth to AI/ML, my favorites are economists Noah Smith and Dina Pomeranz on Twitter.