Saturday, March 18, 2023

NATO on bots, sockpuppets, and shills manipulating social media

NATO has a new report, "Social Media Manipulation 2022/2023: Assessing the Ability of Social Media Companies to Combat Platform Manipulation".
Buying manipulation remains cheap ... The vast majority of the inauthentic engagement remained active across all social media platforms four weeks after purchasing.

[Scammers and foreign operatives are] exploiting flaws in platforms, and pose a structural threat to the integrity of platforms.

The fake engagement gets picked up and amplified by algorithms like trending, search ranking, and recommenders. That's why it is so effective. A thousand sockpuppets engage with something new in the first hour, then the algorithms think it is popular and show crap to more people.

I think there are a few questions to ask about this: Is it possible for social media platforms to stop their amplification of propaganda and scams? If it is possible but some of them don't, why not? Finally, is it in the best interest of the companies in the long-run to allow this manipulation of their platforms?

Saturday, February 25, 2023

Too many metrics and the Otis Redding problem

The "Otis Redding problem" is "holding people, groups, or businesses to too many metrics: They can’t satisfy or even think about all of them at once."

The problem is not just that people don't really know what to do anymore. It's that many people, when faced with this, start doing things that reward themselves: "They end up doing what they want or the one or two things they believe are important or that will bring them rewards (regardless of senior management’s strategic intent)."

That quote is from Stanford Professor Bob Sutton's book Good Boss, Bad Boss, which somehow I hadn't read until recently. I've read all of Bob Sutton's other books too; they're all great reads.

This is just one tidbit from that book. There's lots more in there. On the Otis Redding problem, my read is that Bob's advice is to pick only 2-3 simple, actionable metrics, then frequently discuss whether they are achieving what you want and change them if they aren't.

By the way, the name the "Otis Redding problem" comes from the line in his song "(Sittin' On) The Dock of the Bay" where he sings, "Can’t do what ten people tell me to do, so I guess I’ll remain the same."

Superhuman AI in the game Go

For a few years now, AI has had superhuman game-playing ability in the game of Go.

It was quite a milestone for AI. When I was in graduate school, people used to joke that AI for Go was where careers go to die. The game has a massive search space, so it thwarted efforts for decades.

So AlphaGo and similar efforts that beat top-ranked Go players were a very big deal indeed when they happened back in 2016. But now, an amateur-level human player just beat a top-ranked AI at playing Go. He won 14 of 15 games.

Most of the reporting on this has been that the player used an exploit, one hole in the AI strategy, that will easily be closed. But I think this will be harder to fix than most people expect.

AlphaGo and similar techniques work by using deep learning to guide the game tree search, focusing it on moves used by experts. This result says you can't do that, that you need to consider more possible moves.

The human won here by making moves the AI didn't expect, then exploiting the result. It's not that there is just one hole. It's that making moves outside of what the AI expects, anything outside of what it has seen in the training data, can result in bad play by the AI, which can then be exploited by the human.

Solving that means considering more moves by the opponent, which explodes the game tree search, making the problem intractable again. I suspect it's going to be hard to fix.
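A quick back-of-the-envelope calculation shows why the game tree can't be searched without aggressive pruning. The figures below are the commonly cited rough approximations for Go and chess, not exact values:

```python
import math

# Commonly cited rough figures for Go: ~250 legal moves per position
# (branching factor) and ~150 moves in a typical game.
branching_factor = 250
game_length = 150

# The naive game tree has about b^d positions; compute its order of magnitude.
log10_go = game_length * math.log10(branching_factor)
print(f"Go game tree: roughly 10^{log10_go:.0f} positions")

# Chess, for comparison: ~35 moves per position, ~80 moves per game.
log10_chess = 80 * math.log10(35)
print(f"Chess game tree: roughly 10^{log10_chess:.0f} positions")
```

Something on the order of 10^360 positions is why deep learning was used to narrow the search to expert-like moves in the first place, and why widening the search back out to cover unexpected moves is so costly.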

Thursday, February 16, 2023

Huge numbers of fake accounts on Twitter

It seems like this should get more attention, "hundreds of thousands of counterfeit Twitter accounts set up by Russian propaganda and disinformation" that are "still active on social media today."

There has been widespread manipulation of social media, customer reviews, and trending, search ranker, and recommender algorithms using fake crowds.

All of these depend on wisdom of the crowds. They try to use what people do and like to help other people find things. But wisdom of the crowds doesn't work when the crowd isn't real.

Caroline Orr Bueno has some more details, writing that "this is the first we've heard of an ongoing campaign involving such a large number of accounts" and that it is clear this is at "a scale with the potential to mass-manipulate."

Orr Bueno also quotes former Twitter executive Yoel Roth as saying "it's all too cheap and all too easy." This is the core problem with misinformation and disinformation in the last decade.

If it is cheap, easy, and profitable to scam and manipulate using huge crowds of fake accounts, you will get huge numbers of fake accounts. The solution will have to be to make it more expensive, difficult, and unprofitable to scam and manipulate using fake accounts.

Details on personalized learning at Duolingo

There's a new, great, long article on how Duolingo's personalized learning algorithms work, "How Duolingo's AI learns what you need to learn".

An excerpt as a teaser:

When students are given material that’s too difficult, they often get frustrated and quit ... [Too] easy ... doesn’t challenge.

Duolingo uses AI to keep its learners squarely in the zone where they remain engaged but are still learning at the edge of their abilities.

Bloom’s 2-sigma problem ... [found that] average students who were individually tutored performed two standard deviations better than they would have in a classroom. That’s enough to raise a person’s test scores from the 50th percentile to the 98th

When Duolingo was launched in 2012 ... the goal was to make an easy-to-use online language tutor that could approximate that supercharging effect.

We'd like to create adaptive systems that respond to learners based not only on what they know but also on the teaching approaches that work best for them. What types of exercises does a learner really pay attention to? What exercises seem to make concepts click for them?

Great details on how Duolingo maximizes fun and learning while minimizing frustration and abandons, even when those goals are in conflict. Lots more in there, well worth reading.
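As an aside, the two-sigma arithmetic quoted above is easy to check: assuming normally distributed test scores, a student two standard deviations above the mean lands at roughly the 98th percentile.

```python
from statistics import NormalDist

# Percentile rank of a score two standard deviations above the mean,
# assuming test scores follow a standard normal distribution.
percentile = NormalDist().cdf(2.0) * 100
print(f"Two sigma above the mean: {percentile:.1f}th percentile")
```

That works out to about the 97.7th percentile, which the article rounds to the 98th.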

Massive fake crowds for disinformation campaigns

The Guardian has a good article, "'Aims': the software for hire that can control 30,000 fake online profiles", on fake crowds faking popularity and consensus to manipulate opinion.

Misinformation and disinformation are the biggest problems on the internet right now. And it's never been cheaper and easier to do.

Note how it works. The fake accounts coordinate together to shout down others and create the appearance of agreement. It's like giving one person a megaphone. One person now has thousands of voices shouting in unison, dominating the conversation.

Propaganda is not free speech. One person should have one voice. It shouldn't be possible to buy more voices to add to yours. And algorithms like rankers and recommenders definitely shouldn't treat these as organic popularity and amplify them further.

The article is part of a much larger investigative report combining reporters from The Guardian, Le Monde, Der Spiegel, El Pais, and others. You can read much more starting from this article, "Revealed: the hacking and disinformation team meddling in elections".

Tuesday, January 31, 2023

How can enshittification happen?

Cory Doctorow has a great piece in Wired, "The ‘Enshittification’ of TikTok. Or how, exactly, platforms die." It's about how we regularly see companies make their products worse and worse until they hit a tipping point, then the company loses its customers and starts dying.

Enshittification eventually causes the company to die, so it isn't in the company's best interest. It's definitely not maximizing shareholder value or long-term profits. So why does it happen?

Cory Doctorow does have a bit on the why, though it could use a lot more: "An enshittification strategy only succeeds if it is pursued in measured amounts ... For enshittification-addled companies, that balance is hard to strike ... Individual product managers, executives, and activist shareholders all give preference to quick returns at the cost of sustainability, and are in a race to see who can eat their seed-corn first."

That's not very satisfying though. I mean, the company dies. Execs are screwing up. Why does that happen? What can be done about it? That's the question I think needs answering.

Understanding exactly why enshittification happens is important to finding real, viable solutions. Is it purposeful or unintentional on the part of teams and company leaders? Is it inevitable or preventable? If you get the root cause wrong, you'll get the wrong solution.

My view is that enshittification is mostly unintentional. I think it's a result of A/B testing, mistakes in setting up incentives, and teams busily optimizing for what's right in front of them instead of keeping their eye on the prize.

I don't think executives intentionally drive companies into the ground. I think most execs and teams have no idea that this path they are going down will cause such long-term harm to the company. If most really don't want to destroy the company, that leads to different solutions.

Layoffs as a social contagion

Stanford Professor Jeffrey Pfeffer wrote about the recent layoffs at tech companies, saying that they hurt the companies in the long-term, but CEOs can't avoid the pressure to join in.

[CEOs] know layoffs are harmful to company well-being, let alone the well-being of employees, and don’t accomplish much, but everybody is doing layoffs and their board is asking why they aren’t doing layoffs also.

The tech industry layoffs are basically an instance of social contagion, in which companies imitate what others are doing. If you look for reasons for why companies do layoffs, the reason is that everybody else is doing it ... Not particularly evidence-based.

Layoffs often do not increase stock prices, in part because layoffs can signal that a company is having difficulty. Layoffs do not increase productivity. Layoffs do not solve what is often the underlying problem, which is often an ineffective strategy ... A bad decision.

For more on the harm, please see my old 2009 post from the last time this happened, "Layoffs and tech layoffs".

Monday, December 19, 2022

Are ad-supported business models anti-consumer?

Advertising-supported businesses are harder to align with long-term customer satisfaction than subscription businesses, but they make more money when they achieve that alignment.

A common view is that ad-supported websites, in their drive for more ad clicks, cannot resist exploiting their customers with scammy content and more and more ads.

The problem is that eventually those websites become unusable and the business fails. Take the simple case of websites that put more and more ads on the page. Sure, ad revenue goes up for a while, but people rapidly become annoyed with all the ads and leave. The business then declines.

That's not maximizing revenue or profitability. That's a business failure by execs that should have known better.

It's very tempting to use short-term metrics like ad clicks and engagement for advertising-supported businesses, which encourages doing things like increasing ad load or clickbait content. But in the long run, that hurts retention, growth, and ad revenue.

In a subscription-supported business, it's easier to get the metrics right because the goal is keeping customers subscribing. In an ad-supported business, it isn't as obvious that keeping customers around and clicking ads for years is the goal. But it's still the goal.

Ad-supported businesses will make more money if they aren't filled with scams or laden with ads. But it's easy for ad-supported businesses to get the incentives and metrics wrong, much more so than for subscription-supported businesses where the metrics are more obvious. While it may be harder for executives to see, ad-supported businesses do better if they focus on long-term customer satisfaction, retention, and growth.

Monday, December 12, 2022

Focus on the Long-term

One of my favorite papers of all time is "Focus on the Long-Term: It's Good for Users and Business" from Google Research. This paper found that Google makes more money in the long-term -- when carefully and properly measured -- by reducing advertising. Because of this work, they reduced advertising on mobile devices by 50%.

tl;dr: When you increase ads, short-term revenue goes up, but you're diving deeper into ad inventory and the average ad quality drops. Over time, this causes people to look at ads less, click on ads less, and reduces retention. If you measure using long experiments that capture those effects, you find that showing fewer ads makes less money in the short-term but more money in the long-term.

Because most A/B tests don't measure long-term effects properly and this is hard for most organizations to measure correctly, the broader implication is that most websites show too many ads to maximize long-term profits.
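The shape of that result can be illustrated with a toy simulation. All numbers here are invented for illustration; the only structural assumption is that churn grows faster than linearly with ad load:

```python
def cumulative_revenue(ad_load, weeks=52, users=1000.0):
    """Toy model of an ad-supported product. Revenue per week scales
    with ad load, but weekly retention falls nonlinearly as ad load
    rises. All parameters are invented for illustration."""
    revenue = 0.0
    for _ in range(weeks):
        revenue += users * ad_load                 # revenue this week
        users *= 1.0 - 0.02 * ad_load ** 2         # users who come back
    return revenue

# Doubling the ads wins over 4 weeks but loses over 52 weeks.
for weeks in (4, 52):
    low = cumulative_revenue(ad_load=1.0, weeks=weeks)
    high = cumulative_revenue(ad_load=2.0, weeks=weeks)
    print(f"{weeks:2d} weeks: fewer ads {low:,.0f} vs more ads {high:,.0f}")
```

In a short experiment, the heavier ad load looks like a clear win; only the longer measurement window reveals the reversal, which is exactly the paper's point.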

Saturday, December 10, 2022

ML and flooding the zone with crap

Wisdom of the few is often better than wisdom of the crowds.

If the crowd is shilled and fake, most of the data isn't useful for machine learning. To be useful, you have to pull out the scarce wisdom in the sea of noise.

Gary Marcus looked at this in his latest post, "AI's Jurassic Park moment". Gary talks about how ChatGPT makes it much cheaper to produce huge amounts of reasonable-sounding bullshit and post it on community sites, then he said:

For Stack Overflow, the issue is literally existential. If the website is flooded with worthless code examples, programmers will no longer go there, its database of over 30 million questions and answers will become untrustworthy, and the 14 year old website will die.

Stack Overflow added:

Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking or looking for correct answers.

The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce. There are also many people trying out ChatGPT to create answers, without the expertise or willingness to verify that the answer is correct prior to posting. Because such answers are so easy to produce, a large number of people are posting a lot of answers.

There was a 2009 SIGIR paper, "The Wisdom of the Few", that cleverly pointed out that a lot of this is unnecessary. For recommender systems, trending algorithms, reviews, and rankers, only the best data is needed to produce high quality results. Once you use the independent, reliable, high quality opinions, adding more big data can easily make things worse. Less is more, especially in the presence of adversarial attacks on your recommender system.

When using behavior data, ask what would happen if you could sort by usefulness to the ML algorithm and users. You'd go down the sorted list, then stop at some point when the output no longer improved. That stopping point would be very early if a lot of the data is crap.
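That thought experiment can be sketched directly. Here `quality_score` and `evaluate` are hypothetical placeholders for whatever data-quality scoring and offline evaluation a team actually has:

```python
def select_training_data(records, quality_score, evaluate,
                         batch_size=1000, min_gain=0.001):
    """Greedily grow the training set from the highest-quality data down,
    stopping as soon as another batch no longer improves offline metrics.
    quality_score and evaluate are stand-ins for your own data-quality
    scorer and offline evaluation; this is a sketch, not a tuned procedure."""
    ranked = sorted(records, key=quality_score, reverse=True)
    selected, best = [], float("-inf")
    for start in range(0, len(ranked), batch_size):
        candidate = selected + ranked[start:start + batch_size]
        score = evaluate(candidate)
        if score - best < min_gain:
            break              # more data stopped helping; ignore the rest
        selected, best = candidate, score
    return selected
```

If much of the pool is shilled or junk, the loop stops early and the rest is simply never used.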

In today's world, with fake crowds and shills everywhere, wisdom of the crowds fails. Data of unknown quality or provable spam should be freely ignored. Only use reliable, independent behavior data as input to ML.

Thursday, November 24, 2022

Alternatives to Twitter

About five years ago, I moved most of my blogging from here to microblogging on Twitter.

In part that was from the shutdown of Google Reader. In part I was finally giving in to the trend away from long-form blogging. So this blog has been pretty quiet for years.

The recent decline of Twitter has me looking for and trying alternatives.

One surprise I found is that Google News and TechMeme feel worth using more often. Both are surprisingly effective alternatives to social media. I am finding they have most of the value without much of the unpleasantness, although missing the contact with close friends.

I was also surprised to find I liked LinkedIn as a substitute for Twitter. I find most of the interactions to be fairly good there, though again missing some close friends.

So far, I have more mixed feelings about using Mastodon, Facebook, Instagram, Post, or going back to blogging as alternatives. Anyone have anything else they like? Or differing experiences?

Wednesday, November 23, 2022

Quoted in the Washington Post

I'm quoted in the Washington Post today in an article titled "It’s not your imagination: Shopping on Amazon has gotten worse."

I'm talking about how Amazon used to help (but no longer) for finding and discovering what you want to buy, saying, "The store radically changes based on customer interests, showing programming titles to a software engineer and baby toys to a new mother."

The reporter, Geoffrey Fowler, goes on to say, "This is probably how most of us imagine Amazon still works. But today advertisers are driving the experience ... The Amazon we experience today is pretty much the opposite of how Amazon used to work."

The article is critical of all the ads on Amazon now, which makes the shopping experience terrible. I think it is very hard to find things on Amazon nowadays. This happened for a well-known reason. Increasing ad load -- which is the number of ads on a web page -- will usually increase short-term revenue, but it hurts retention, ad performance, and long-term revenue. As the Washington Post reporter describes, all the ads cause people to go elsewhere when they need to shop, and that has long-term costs for Amazon.

Saturday, November 19, 2022

Experimentation and metrics

Since the early days of the Web, I've been a fan of A/B testing for promoting innovation and ideas. But A/B testing is a tool. Like any tool, it can be used well or used poorly.

A/B testing observes human behavior, which is messy and complicated. Closer to behavioral economics, the metrics represent partial information and observations. From very limited data, we need to say why humans do the crazy things they do and predict what will happen next.

When used well, A/B testing helps innovation. But A/B testing should not subjugate, binding teams to do nothing unless a key metric is passed. Rather it should be used to gain partial information about expected short and long-term costs and benefits.

For misinformation, disinformation, scams, and the impact of advertising, A/B tests get some data on short-term effects, but little on long-term benefits. Ultimately there will be an investment decision about whether to pay the expected short-term costs for the hoped for long-term benefits.

A/B testing is a powerful tool for bottom-up innovation. But it is only a tool and can be used badly. A/B data should be used to inform debate, not halt debate. And I think it should always be helping to find a way to say yes to new ideas.

Sunday, July 03, 2022

Making it more difficult to shill recommender systems

Lately I've been thinking about recommender algorithms and how they go wrong. I keep hitting examples of people arguing that we should ban the fewest accounts possible when thinking about what accounts are used by recommender systems. Why? Or why not the opposite? What's wrong with using the fewest accounts you can without degrading the perceived quality of the recommendations?

The reason this matters is that recommender systems these days are struggling with shilling. Companies are playing whack-a-mole with bad actors who just create new accounts or find new shills every time they're whacked because it's so profitable -- like free advertising -- to create fake crowds that manipulate the algorithms. Propagandists and scammers are loving it and winning. It's easy and lucrative for them.

So what's wrong with taking the opposite strategy, only using the most reliable accounts? As a thought experiment, let's say you rank order accounts by your confidence they are human, independent, not shilling, and trustworthy. Then go down the list of accounts, using their behavior data until the recommendations stop improving at a noticeable level (being careful about cold start and the long tail). Then stop. Don't use the rest. Why not do that? It'd vastly increase costs for adversaries. And it wouldn't change the perceived quality of recommendations because you've made sure it wouldn't.

The traditional approach to this is to classify accounts as spam or shills separate from how the data will be used. The classifiers minimize the error rates (false positive and false negative), then treat all borderline cases as not spam. The idea here is to do almost the opposite of that traditional approach, classify accounts as trustworthy, then use only those, ignoring anything unknown or borderline as well as known spammers and shills.
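The contrast between the two approaches fits in a few lines. The `spam_prob` and `trust_prob` classifiers here are hypothetical; the point is only which side of the borderline cases each approach keeps:

```python
def traditional_filter(accounts, spam_prob, threshold=0.9):
    """Traditional approach: remove only accounts confidently classified
    as spam. Borderline and unknown accounts are kept by default."""
    return [a for a in accounts if spam_prob(a) < threshold]

def trust_filter(accounts, trust_prob, threshold=0.9):
    """Flipped approach: keep only accounts confidently classified as
    trustworthy. Borderline and unknown accounts are ignored by default."""
    return [a for a in accounts if trust_prob(a) >= threshold]
```

With the traditional filter, a shill only needs to stay below the spam threshold to get in; with the trust filter, it has to affirmatively earn trust, which is far more expensive for an adversary.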

This works because both the downstream use of the data and the bad incentives for spammers and propagandists are very sensitive to false negatives (letting in any manipulation at all of the recommender algorithms) but not very sensitive to false positives (accidentally discarding some data that would have been fine to use). Letting in even one shill can badly impact recommenders, especially when shills target getting new content trending, but dropping some of the lower-quality data doesn't usually change the recommendations in ways that matter to people.

This isn't my idea or a new idea, by the way. It's actually a quite old idea, talked about in papers like TrustRank, Anti-TrustRank, and Wisdom of the Few, and similar techniques are applied already by companies like Google for dealing with web spam.

The world has changed in the last decade. Especially on social media, there is rampant manipulation of wisdom of the crowd data such as likes and shares. A big part of the problem is the algorithms like trending, search, and recommender systems that pick up the manipulated data and use it to amplify shilled content. That makes it quite profitable and easy for disinformation and scams.

Places like Amazon, Facebook, and Twitter are swimming in data, but their problem is that a lot of it is untrustworthy and shilled. But you don't need to use all the data. Happily toss anything suspicious at all, false positives galore, accidentally marking new or borderline accounts as shills when deciding what to input to the recommender algorithms. Who cares if you do?

Saturday, March 20, 2021

Wisdom of the trusted

Flood-the-zone disinformation is a problem for crowdsourced data. Wisdom of the crowds, mass amateurization, and rejection of gatekeepers no longer works with coordinated disinformation campaigns overwhelming rankers, recommenders, and content with shills and spam.

Two decades ago, a lot of us underestimated the negative effects of lower costs for communication and information sharing. While good, it also made propaganda, shilling, and manipulation far easier, and our defenses against disinformation campaigns proved weak.

We in tech were overly idealistic about what would happen as the cost of information and communication dropped. Many thought propaganda would be harder as people could now easily access the truth.

But you can't source reviews from your customers anymore if the vast majority of reviews are paid shills. You can't rank using usage data if ratings and clicks are mostly fake.

Crowdsourced information, including web crawls, reviews, and commentary, only works when almost everyone is independent and unbiased. Coordinated disinformation breaks crowdsourcing.

Flood-the-zone shouldn't have been a surprise, but it was. Propaganda and manipulation are winning because we still treat inauthentic behavior as real.

While there is plenty of mostly-deserved love for big data, often less is more when you live in an adversarial, flood-the-zone world. Wisdom of the crowds has an assumption of independence between agents, which now has been broken by coordinated disinformation campaigns.

If you are looking at garbage, there is no information. Adding disinformation to good data only makes things worse. It's like making a milkshake, then eyeing a huge putrid sack of night soil nearby. Sure, you could add some of that to what you made, but even a little is going to make it worse. If there is crap everywhere, you might want to stick with what you can prove to be good.

Polling, one of the oldest forms of crowdsourced information, has been impacted too. The trend in recent years is that low response rates and shilling make it so expensive to poll that Pew Research gets better data cheaper by forming and managing a large paid panel of trusted experts.

For those working in machine learning, for those trying to work with big data, reputation and reputable sources have to be the response in a flood-the-zone world. When most of the data is bad, how you filter your data becomes the most important thing.

We have a big challenge ahead, countering disinformation using reputation and lack of reputation. In a flood-the-zone world, most data out there is now bad to useless. Isolating the useful requires skepticism toward data, like TrustRank, starting untrusted, bad until proven good.

Reviews should discard anything even resembling a shill, giving visibility only to reviews from independent and trustworthy customers. Recommender systems and rankers should focus on the data from and related to proven sources, and weight anything unknown as questionable at best and likely worthless. Most crowdsourced data for machine learning, from clicks to content, is going to have to be viewed with skepticism.
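As a sketch of what that might look like for reviews (the reviewer reputation fields here are hypothetical, standing in for whatever reputation signals a real system maintains):

```python
def review_weight(reviewer):
    """Weight a review by reviewer reputation: discard anything even
    resembling a shill, count proven sources fully, and treat unknowns
    as questionable. The reviewer fields are hypothetical examples."""
    if reviewer.get("suspected_shill") or not reviewer.get("independent", False):
        return 0.0            # resembles a shill or unproven: discard entirely
    if reviewer.get("verified_purchase") and reviewer.get("account_age_days", 0) >= 365:
        return 1.0            # proven, independent, trustworthy source
    return 0.1                # unknown: questionable at best
```

Note the defaults: an account starts at zero weight and has to prove itself up, which is the "bad until proven good" stance rather than the usual "innocent until proven spam."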

Inauthentic behavior and coordinated disinformation campaigns have shilled wisdom of the crowd to death. For reliable big data in a flood-the-zone world, it will have to be wisdom of the trusted.

Tuesday, December 15, 2020

When will virtual reality take off? The $100 bet.

About four years ago, Professor Daniel Lemire and I made a $100 bet on how quickly virtual reality would reach a broad, mainstream market. Specifically, my side of the bet was, "Virtual reality hardware (not counting cardboard) will not sell more than 10M units/year worldwide before March 2019." He bet that it would.

In early 2020, we decided to wait to settle the bet because it looked like there was some chance VR would reach 10M units/year in 2020. With COVID keeping people looking for entertainment at home, Valve's release of Half-Life: Alyx, Supernatural (the VR exercise program), and big pushes on consumer VR by several companies, we wanted to see if the bet was off by just one year, if 2020 was the year.

At this point, the results are in, and it is clear VR has not reached far beyond early adopters and enthusiasts. Estimates of total hardware sales vary depending on what is considered VR hardware, but most estimates I've seen have worldwide unit sales at around 5-6M in 2020.

Barron's has a nice summary: "We’ve been talking about virtual reality for decades, but it’s gone pretty much nowhere. Despite all of our advances in tech, VR hasn’t been able to bridge the physical and digital realms in any substantial way." TechCrunch adds, "There are signs of growth though it’s clear [VR] is still a niche product."

So what went wrong? Looking back at VR hype in 2016, there were a lot of reasons to be optimistic: HoloLens from Microsoft, Sony entering VR with Playstation VR, Valve pushing hard on VR in the Steam store and with their own products, Xbox looking like it might do VR, Google showing interest in VR, and, though it always seemed like vaporware to me, there was a lot of excitement around the promises made by heavily-funded MagicLeap. It looked likely that someone would make a must-have game or other compelling use of VR that might attract tens of millions of people.

Speculating a bit, I think the issue here goes beyond just needing more time, so beyond waiting for gradual acceptance of VR and growth. I think the problem is that the non-virtual-reality experience is close enough for most purposes, making VR uncompelling to set up and use.

For example, take the virtual tourism experience of visiting the International Space Station in Google Earth. It's fun and compelling enough without virtual reality, so VR adds only a little bit of wow to the experience. Half-Life: Alyx seems to me to suffer from the same problem: a fun game with some compelling content, great to try, but not a must-have. Exercise programs like Supernatural or Beat Saber fall in the same category: fun and cool to try, but with okay substitutes and alternatives.

At the time we made the bet back in 2016, I said something similar about why I might lose the bet: "There are several wild cards here. For example, it is possible that much cheaper units can be made to work. It's possible that someone discovers very carefully chosen environments and software tricks fool the brain into fully accepting the virtual reality, especially for gaming, increasing the appeal and making it a must-have experience for a lot of people. As unsavory as it is, pornography is often a wild card with new technology, potentially driving adoption in ways that can determine winners and losers. A breakthrough in display (such as retinal displays) might allow virtual reality hardware that is much cheaper and lighter. Business use is another unknown where virtual reality could provide a large cost savings over physical presence. I do think there are many ways in which I could lose this bet."

Unfortunately, I don't think such must-have, compelling VR experiences exist. Perhaps at some point it will. Chris Pruett, who runs part of Oculus, speculated about that, saying: "My guess would be something that is highly immersive, that involves active motion of your body, and ... it's probably going to be something that you either play with other people or is shareable with other people." That sounds plausible to me, though, more broadly, I think it has to be a must-have experience without okay substitutes in non-VR, which is a high bar. My prediction now in 2020 would be that VR will continue to struggle for years to break out beyond enthusiasts and early adopters, at least until it has a truly must-have experience.

I think Daniel Lemire took the harder side of this bet, so I'll match his $100 donation to Wikipedia to settle the bet. Back in 2016, I did add a couple of ways of making my side of the bet even harder, saying I doubted VR would sell a total of more than 10M units even over the three years 2016-2019, which appears to be close, and that Google Cardboard-like devices wouldn't go beyond being just a toy, so not regularly used by tens of millions, which looks like it was correct.

And I want to thank Daniel for making this bet. Whether you are betting with the hype or against it, along with conventional wisdom or against the flow, it's hard to publicly take a stand one way or another and be willing to be wrong, especially when big company money is betting against you. This was an interesting bet.

If you enjoyed this, you might also be interested in our 2012 bet about whether tablets will replace PCs.

Update: Daniel Lemire has a post up on his thoughts on the bet, "Virtual reality… millions but not tens of millions… yet".

Friday, December 04, 2020

Facebook and investing in the long-term

Kevin Roose, Mike Isaac and Sheera Frenkel at the New York Times had a great piece ([1] [2]) on the internal debate inside Facebook on removing disinformation:
Facebook engineers and data scientists posted the results of a series of experiments called "P(Bad for the World)." ... The team trained a machine-learning algorithm to predict posts that users would consider "bad for the world" and demote them in news feeds. In early tests, the new algorithm successfully reduced the visibility of objectionable content.

But it also lowered the number of times users opened Facebook, an internal metric known as "sessions" that executives monitor closely.

Another product, an algorithm to classify and demote "hate bait" — posts that don’t strictly violate Facebook’s hate speech rules, but that provoke a flood of hateful comments ... [Another] called "correct the record," would have retroactively notified users that they had interacted with false news and directed them to an independent fact-check ... [Both were] vetoed by policy executives who feared it would disproportionately show notifications to people who shared false news from right-wing websites.

Many rank-and-file workers and some executives ... want to do more to limit misinformation and polarizing content. [Others] fear those measures could hurt Facebook’s growth, or provoke a political backlash ... Some disillusioned employees have quit, saying they could no longer stomach working for a company whose products they considered harmful.
The article is an insightful look at the struggle inside Facebook over recommender systems for news and the tension between short-term and long-term metrics and growth. Key is the fear of harming short-term metrics like sessions per user and engagement.
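The kind of demotion the "P(Bad for the World)" experiment describes can be sketched roughly like this. This is a toy illustration, not Facebook's actual system; the function name, weights, and scores are all hypothetical:

```python
def demote_by_p_bad(posts, p_bad, weight=0.5):
    """Re-rank feed posts, demoting those a model predicts are 'bad for the world'.

    posts:  list of (post_id, base_score) pairs from the usual feed ranker.
    p_bad:  dict mapping post_id -> predicted probability the post is bad.
    weight: how strongly the prediction demotes (0 = off, 1 = full strength).
    """
    reranked = [
        (post_id, score * (1.0 - weight * p_bad.get(post_id, 0.0)))
        for post_id, score in posts
    ]
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)

# A clickbaity post with a high base score drops below a benign one.
posts = [("clickbait", 10.0), ("benign", 8.0)]
p_bad = {"clickbait": 0.9, "benign": 0.05}
print(demote_by_p_bad(posts, p_bad))  # "benign" now outranks "clickbait"
```

The tension the article describes is exactly the `weight` knob: turn it up and objectionable content gets less visibility, but short-term sessions dip.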

Any attempt to increase the quality of news or ads is going to cause a short-term reduction in engagement, usage, and revenue metrics. That's obvious and not the question to ask. The question to ask is, does it pay off in the long-term?

It's unsurprising that once you've driven off all the users who hate what Facebook has become and addicted the rest to clickbait, those who remain will use Facebook less in the short-term if you improve the quality of the content.

This is just like any other investment. When you make any large investment, you expect your short-term profits to drop, but you're betting that your long-term profits will rise. In this case, increased news quality is an investment in bringing back lapsed users.

Even measured over weeks, sessions per user is going to take a hit from a change to news quality, because users who like higher-quality news have already disengaged and abandoned, and current heavy users won't like the change. It will take a long time to pay off.
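One way to make the investment framing concrete is to score a change by a discounted sum over a long-horizon forecast rather than by the first few weeks alone. A toy sketch with entirely made-up numbers: a quality change that dips sessions at first but slowly recovers lapsed users beats the status quo over a two-year horizon, even though it loses over any short window.

```python
def discounted_total(weekly_sessions, discount=0.99):
    """Sum a weekly sessions-per-user forecast, lightly discounting the future."""
    return sum(s * discount ** week for week, s in enumerate(weekly_sessions))

# Status quo: a steady 10 sessions/user/week for two years.
status_quo = [10.0] * 104

# Quality change: dips to 9, then climbs past 10 as lapsed users return,
# leveling off at 11 sessions/user/week.
quality = [9.0 + min(week * 0.05, 2.0) for week in range(104)]

# Short window (4 weeks): the quality change looks like a loss.
print(discounted_total(quality[:4]) < discounted_total(status_quo[:4]))  # True

# Long horizon (2 years): the quality change wins.
print(discounted_total(quality) > discounted_total(status_quo))  # True
```

The hard part in practice is, of course, the forecast itself; the point is only that the decision flips depending on the horizon you evaluate over.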

For Facebook, reducing disinformation probably would also be an investment in other areas. Facebook is polluting society with disinformation, externalizing costs; cutting disinformation is an investment in reducing regulatory risk from governments. And Facebook wants good people, and many good people are leaving ([1]) or won't even consider working there because of their practices, a considerable long-term cost to the company; cutting disinformation is an investment in recruiting and retention. So Facebook probably would see benefits beyond lapsed users.

Facebook and others need to think of reducing disinformation as an investment in the future. Eliminating scams, low-quality ads, clickbait, and disinformation often will reduce short-term metrics, but it is a long-term investment in quality that reduces abandonment, brings back lapsed users, and serves other long-term business goals. These investments take a long time to pay off, but that's why you make investments, for the long-term payoff.

Tuesday, November 26, 2019

Papers and posting

If you haven't seen it, Adrian Colyer's excellent blog has great reviews and summaries of recent papers. Back when this blog started in 2004, there weren't many people summarizing research papers. Many more are now, which is part of why I post less now. Adrian's blog is excellent and similar to what I used to do, but I think better in many ways. You can also follow Adrian Colyer on Twitter.

While I'm talking about summarizing papers, I want to highlight two lines of work that had an impact on me in the last few years and that I think deserve much more attention. Both argue we, as in all of us in tech, are doing something important wrong.

The first argues that our metrics usually are off, specifically way too focused on short-term goals like immediate revenue. This is the work started by the fantastic Focus on the Long-term out of Google and continuing from there (including [1] [2]). Because much of what we all do is an optimization process -- ML, deep learning, recommendations, A/B testing, search, and advertising -- having the targets wrong means we are optimizing for the wrong thing.

Optimizing for the wrong thing is ubiquitous in our industry. It may, for example, cause almost everyone to show too many ads and too many low quality ads. If everyone has their metrics subtly wrong, everything we make, and especially everything in the ML community, may be aiming for the wrong target.
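As a toy illustration of how the same data gives a different answer depending on the optimization target (all numbers invented): choosing an ad load by immediate revenue alone picks a heavier load than choosing by revenue weighted by expected retention.

```python
def best_ad_load(loads, immediate_revenue, retention):
    """Pick an ad load two ways: by immediate revenue, and by long-term value.

    immediate_revenue[load]: revenue per session at that ad load.
    retention[load]: expected future sessions per user at that ad load.
    Heavier loads earn more per session but drive users away over time.
    """
    by_short_term = max(loads, key=lambda l: immediate_revenue[l])
    by_long_term = max(loads, key=lambda l: immediate_revenue[l] * retention[l])
    return by_short_term, by_long_term

loads = ["light", "medium", "heavy"]
revenue = {"light": 1.0, "medium": 1.6, "heavy": 2.0}
retention = {"light": 30, "medium": 22, "heavy": 12}
print(best_ad_load(loads, revenue, retention))  # → ('heavy', 'medium')
```

With the short-term target, "heavy" wins; with the long-term target, "medium" does. If everyone's optimizers use the first target, everyone shows too many ads.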

The second is Kate Starbird's work on disinformation campaigns. Across many recent papers, Kate argues that the traditional classifier approach to spam, trolls, and shills has been failing. Adversaries can create many accounts and enlist real humans in their disinformation effort. Knocking a few accounts away does nothing; it is like shooting bullets into a wave. Instead, it is important to look at the goals of disinformation campaigns and make them more expensive to achieve. Because shills impact so many things we do -- training data for ML and deep learning, social media, reviews, recommendations, A/B testing, search, advertising -- our failure to deal with shills means the assumptions all of these systems have about the data all being equally good are wrong, and the quality of all these systems is reduced.

Solutions are hard. I'm afraid Kate's advice on solutions is limited. But I would say solutions include whitelisting (using only experts, verified real people, or accounts that are expensive to create), recognizing likely disinformation as it starts to propagate and slowing it, and countering likely disinformation with accurate information where it appears. Those replace outdated whack-a-mole account classifiers and work across multiple accounts to counter modern disinformation campaigns. Manipulation and shilling from sophisticated adversaries is ubiquitous in our industry. Until we fix this, many of our systems produce lower quality results.
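For example, "recognizing likely disinformation as it starts to propagate and slowing it" could look roughly like a reshare rate cap keyed off an upstream campaign-level signal, rather than banning individual accounts. A minimal sketch; the class, its parameters, and the upstream flag are all hypothetical:

```python
from collections import defaultdict, deque


class ReshareThrottle:
    """Slow propagation of likely disinformation instead of chasing accounts.

    Content flagged by an upstream campaign detector gets a much lower
    reshare rate cap, making the campaign's goal -- fast, wide spread --
    more expensive without needing to classify every sockpuppet.
    """

    def __init__(self, normal_per_hour=1000, flagged_per_hour=50):
        self.limits = {False: normal_per_hour, True: flagged_per_hour}
        self.shares = defaultdict(deque)  # content_id -> recent share timestamps

    def allow(self, content_id, now, flagged):
        window = self.shares[content_id]
        # Drop timestamps older than one hour.
        while window and now - window[0] >= 3600:
            window.popleft()
        if len(window) >= self.limits[flagged]:
            return False  # over the cap: delay this reshare
        window.append(now)
        return True
```

The point of the design is that it targets the campaign's objective (velocity and reach) rather than any one account, so spinning up a thousand fresh sockpuppets doesn't help.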

Finally, I am posting a lot less here now, so let me point to other resources for anyone who liked this blog. I still post frequently on Twitter; you can follow me there. On AI/ML, there's a lot of great writing by a lot of people, far too many to list, but I can at least list my favorites, which are François Chollet and Ben Hamner on Twitter. On economics and econometrics, which I enjoy for adding breadth to AI/ML, my favorites are economists Noah Smith and Dina Pomeranz on Twitter.

Wednesday, May 08, 2019

Tech and tech idealism

It's been almost 2 years since my last post! I don't know if anyone is still reading this. If you are, thank you!

Why haven't I posted more? Partly it is the broad transition to microblogging, which everyone is using more than long form. But part also is that I have negative feelings about where tech has been going.

I'm a tech idealist. I think tech can and should be a force for good in the world. I have spent most of my life trying to build systems where computers are helping humans. Sometimes this is by computers sifting information that is hard for people to find on their own. Sometimes this is by computers surfacing other people that can help.

Lately, some tech companies have been favoring exploitation and deception. Data is being used to manipulate. Tech is becoming customer hostile.

I've been lucky. I have gotten to work on some amazing things. There is a joy to helping someone discover a new book they will love, a bit of knowledge added to a life. Many people feel overwhelmed by the news and information in their lives, and sorting through to find what is truly important is too hard. Ads shouldn't be so annoying and irrelevant, and, you know what, they don't have to be. I've enjoyed helping people find and discover whatever they need online.

But looking at where we are in tech now, it feels like a dot com bubble again. Get rich quick. It's not about building something that people love; it's about grabbing the buck. Greed feeds short-term thinking. Grab that next bonus and get out before the wreckage hits.

Tech idealism is still out there. There still are many people building things that help people. There is research, the creation of knowledge and new ways to help even more people. There are many people using computers and data for good.

And there are many new people getting into computer science, which is fantastic. Computers are a force multiplier. Computers make people more productive and more powerful. Computer science and data science are just starting to have an impact in other fields.

The interdisciplinary opportunities are everywhere and exciting. We know almost nothing about our own oceans; there are huge opportunities for discoveries in biology from undersea probes and drones. We are just starting to image the entire night sky frequently, and sifting through that data with massive computing power will forever change astronomy. The field of economics is shifting to data and behavior over theory. Archeology can be fueled by processing massive amounts of satellite imagery. In field after field, computers and data are making the once impossible possible.

Tech idealism is coming back. Something may have to come along to flush away some of those just seeking quick profits. Some of the worst abuses may have to be obvious failures before they are reined in. But it will change.

Computers and data are a force multiplier, allowing people to do more than they could before. Working at massive scale, computers help us understand and discover. In the long-term, tech is a force for good.