Monday, December 19, 2022

Are ad-supported business models anti-consumer?

Advertising-supported businesses are harder to align with long-term customer satisfaction than subscription businesses, but they make more money when they achieve that alignment.

A common view is that ad-supported websites, in their drive for more ad clicks, cannot resist exploiting their customers with scammy content and more and more ads.

The problem is that eventually those websites become unusable and the business fails. Take the simple case of websites that put more and more ads on the page. Sure, ad revenue goes up for a while, but people rapidly become annoyed with all the ads and leave. The business then declines.

That's not maximizing revenue or profitability. That's a business failure by execs who should have known better.

For advertising-supported businesses, it's very tempting to use short-term metrics like ad clicks and engagement, which encourages things like increasing ad load or pushing clickbait content. But in the long run, that hurts retention, growth, and ad revenue.

In a subscription-supported business, it's easier to get the metrics right because the goal is keeping customers subscribing. In an ad-supported business, it isn't as obvious that keeping customers around and clicking ads for years is the goal. But it's still the goal.

Ad-supported businesses will make more money if they aren't filled with scams or laden with ads. But it's easy for ad-supported businesses to get the incentives and metrics wrong, much easier than for subscription-supported businesses where the metrics are more obvious. While it may be harder for executives to see, ad-supported businesses do better if they focus on long-term customer satisfaction, retention, and growth.

Monday, December 12, 2022

Focus on the Long-term

One of my favorite papers of all time is "Focus on the Long-term: It's Good for Users and Business" from Google Research. This paper found that Google makes more money in the long-term -- when carefully and properly measured -- by reducing advertising. Because of this work, Google reduced advertising on mobile devices by 50%.

tl;dr: When you increase ads, short-term revenue goes up, but you're diving deeper into ad inventory, and the average ad quality drops. Over time, this causes people to look at ads less and click on ads less, and it reduces retention. If you measure using long experiments that capture those effects, you find that showing fewer ads makes less money in the short-term but more money in the long-term.
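
To make the dynamic concrete, here is a toy simulation of the tradeoff. It is only a sketch, with every number invented for illustration (the paper's models and long-running experiments are far more careful): revenue per visit scales with ad load, but heavier ad load lowers the chance a user comes back.

    def long_term_revenue(ad_load, sessions=52):
        """Cumulative expected revenue from one user over up to `sessions` visits."""
        revenue_per_visit = 1.0 * ad_load          # more ads, more revenue right now
        retention = 0.95 - 0.08 * (ad_load - 1.0)  # made-up: heavier load, fewer returns
        total, p_active = 0.0, 1.0
        for _ in range(sessions):
            total += p_active * revenue_per_visit
            p_active *= retention                  # some users churn after each visit
        return total

    for load in (0.5, 1.0, 1.5, 2.0):
        print(f"ad load {load:.1f}x: first visit {load:.2f}, "
              f"long-term {long_term_revenue(load):.2f}")

Under these invented numbers, doubling ad load doubles first-visit revenue but loses money over a year of weekly visits, while halving ad load does the opposite.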

Because long-term effects are hard to measure correctly and most A/B tests don't capture them, the broader implication is that most websites show more ads than would maximize their long-term profits.

Saturday, December 10, 2022

ML and flooding the zone with crap

Wisdom of the few is often better than wisdom of the crowds.

If the crowd is shilled and fake, most of the data isn't useful for machine learning. To get anything useful, you have to pull the scarce wisdom out of the sea of noise.

Gary Marcus looked at this in his latest post, "AI's Jurassic Park moment". Gary talks about how ChatGPT makes it much cheaper to produce huge amounts of reasonable-sounding bullshit and post it on community sites, then says:

For Stack Overflow, the issue is literally existential. If the website is flooded with worthless code examples, programmers will no longer go there, its database of over 30 million questions and answers will become untrustworthy, and the 14 year old website will die.

Stack Overflow added:

Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking or looking for correct answers.

The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce. There are also many people trying out ChatGPT to create answers, without the expertise or willingness to verify that the answer is correct prior to posting. Because such answers are so easy to produce, a large number of people are posting a lot of answers.

There was a 2009 SIGIR paper, "The Wisdom of the Few", that cleverly pointed out that a lot of this data is unnecessary. For recommender systems, trending algorithms, reviews, and rankers, only the best data is needed to produce high-quality results. Once you have the independent, reliable, high-quality opinions, adding more big data can easily make things worse. Less is more, especially in the presence of adversarial attacks on your recommender system.

When using behavior data, ask what would happen if you could sort it by usefulness to the ML algorithm and users. You'd go down the sorted list, then stop at the point where the output no longer improves. That stopping point would come very early if a lot of the data is crap.
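
As a rough sketch of that stopping rule, where trust_score and train_and_evaluate are hypothetical stand-ins for ranking data by usefulness and measuring output quality:

    def select_training_data(examples, trust_score, train_and_evaluate,
                             batch_size=1000, min_gain=0.001):
        """Add data in order of estimated usefulness; stop when it stops helping."""
        ranked = sorted(examples, key=trust_score, reverse=True)
        selected, best = [], float("-inf")
        for i in range(0, len(ranked), batch_size):
            candidate = selected + ranked[i:i + batch_size]
            score = train_and_evaluate(candidate)  # e.g. accuracy on a held-out set
            if score < best + min_gain:
                break                              # output stopped improving; stop early
            selected, best = candidate, score
        return selected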

In today's world, with fake crowds and shills everywhere, wisdom of the crowds fails. Data of unknown quality or provable spam should be freely ignored. Only use reliable, independent behavior data as input to ML.

Thursday, November 24, 2022

Alternatives to Twitter

About five years ago, I moved most of my blogging from here to microblogging on Twitter.

In part, that was because of the shutdown of Google Reader. In part, I was finally giving in to the trend away from long-form blogging. So this blog has been pretty quiet for years.

The recent decline of Twitter has me looking for and trying alternatives.

One surprise is that Google News and TechMeme feel worth using more often. Both are surprisingly effective alternatives to social media. I am finding they have most of the value without much of the unpleasantness, though they lack the contact with close friends.

I was also surprised to find I liked LinkedIn as a substitute for Twitter. I find most of the interactions there to be fairly good, though again I miss some close friends.

So far, I have more mixed feelings about using Mastodon, Facebook, Instagram, Post, or going back to blogging as alternatives. Anyone have anything else they like? Or differing experiences?

Wednesday, November 23, 2022

Quoted in the Washington Post

I'm quoted in the Washington Post today in an article titled "It’s not your imagination: Shopping on Amazon has gotten worse."

I'm talking about how Amazon used to help (but no longer does) with finding and discovering what you want to buy, saying, "The store radically changes based on customer interests, showing programming titles to a software engineer and baby toys to a new mother."

The reporter, Geoffrey Fowler, goes on to say, "This is probably how most of us imagine Amazon still works. But today advertisers are driving the experience ... The Amazon we experience today is pretty much the opposite of how Amazon used to work."

The article is critical of all the ads on Amazon now, which make the shopping experience terrible. I think it is very hard to find things on Amazon nowadays. This happened for a well-known reason. Increasing ad load -- the number of ads on a web page -- usually increases short-term revenue, but it hurts retention, ad performance, and long-term revenue. As the Washington Post reporter describes, all the ads cause people to go elsewhere when they need to shop, and that has long-term costs for Amazon.

Saturday, November 19, 2022

Experimentation and metrics

Since the early days of the Web, I've been a fan of A/B testing for promoting innovation and ideas. But A/B testing is a tool. Like any tool, it can be used well or used poorly.

A/B testing observes human behavior, which is messy and complicated. As in behavioral economics, the metrics represent partial information and noisy observations. From very limited data, we have to explain why humans do the crazy things they do and predict what they will do next.

When used well, A/B testing helps innovation. But A/B testing should not subjugate teams, binding them to do nothing unless a key metric passes. Rather, it should be used to gain partial information about expected short- and long-term costs and benefits.

For misinformation, disinformation, scams, and the impact of advertising, A/B tests get some data on short-term effects, but little on long-term benefits. Ultimately there will be an investment decision about whether to pay the expected short-term costs for the hoped-for long-term benefits.
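
As a back-of-the-envelope sketch of that investment decision (all numbers invented): the short-term lift is what the A/B test actually measures, while the monthly retention effect is only an estimate with wide error bars, so it pays to look at the outcome under a range of assumptions.

    def net_gain(short_term_lift, monthly_retention_effect, months=24):
        """Cumulative gain vs. control: a measured short-term lift, scaled by
        an estimated retention effect that compounds month over month."""
        return sum((1.0 + short_term_lift) * (1.0 + monthly_retention_effect) ** t - 1.0
                   for t in range(months))

    # A change that costs 3% of revenue now (measured) but may improve
    # retention (estimated) pays off only if the estimate holds up.
    for retention in (0.0, 0.0025, 0.005, 0.01):
        print(f"retention {retention:+.4f}/month: "
              f"net gain over 2 years {net_gain(-0.03, retention):+.3f}")

Under these made-up numbers, the change loses money for months and only pays off if the retention improvement is real and large enough, which is exactly the kind of judgment call the experiment alone cannot make.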

A/B testing is a powerful tool for bottom-up innovation. But it is only a tool, and it can be used badly. A/B data should be used to inform debate, not halt it. And I think it should always be used to help find a way to say yes to new ideas.

Sunday, July 03, 2022

Making it more difficult to shill recommender systems

Lately I've been thinking about recommender algorithms and how they go wrong. I keep hitting examples of people arguing that we should ban the fewest accounts possible when thinking about what accounts are used by recommender systems. Why? Or why not the opposite? What's wrong with using the fewest accounts you can without degrading the perceived quality of the recommendations?

The reason this matters is that recommender systems these days are struggling with shilling. Companies are playing whack-a-mole with bad actors who just create new accounts or find new shills every time they're whacked because it's so profitable -- like free advertising -- to create fake crowds that manipulate the algorithms. Propagandists and scammers are loving it and winning. It's easy and lucrative for them.

So what's wrong with taking the opposite strategy, only using the most reliable accounts? As a thought experiment, let's say you rank order accounts by your confidence they are human, independent, not shilling, and trustworthy. Then go down the list of accounts, using their behavior data until the recommendations stop improving at a noticeable level (being careful about cold start and the long tail). Then stop. Don't use the rest. Why not do that? It'd vastly increase costs for adversaries. And it wouldn't change the perceived quality of recommendations because you've made sure it wouldn't.

The traditional approach to this is to classify accounts as spam or shills separately from how the data will be used. The classifiers minimize the error rates (false positives and false negatives), then treat all borderline cases as not spam. The idea here is to do almost the opposite of that traditional approach: classify accounts as trustworthy, then use only those, ignoring anything unknown or borderline as well as known spammers and shills.
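
A minimal sketch of the contrast, assuming a hypothetical model score p_trustworthy(account) and purely illustrative thresholds:

    def traditional_filter(accounts, p_trustworthy):
        # Tuned to avoid false positives: borderline and unknown accounts
        # are treated as not spam and kept.
        return [a for a in accounts if p_trustworthy(a) > 0.1]

    def trusted_only_filter(accounts, p_trustworthy):
        # Inverted: only high-confidence trustworthy accounts feed the
        # recommender. Borderline, unknown, and new accounts are simply
        # not used, accepting false positives to keep manipulation out.
        return [a for a in accounts if p_trustworthy(a) > 0.9]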

This works because both the use of the data and the bad incentives for spammers and propagandists are very sensitive to false negatives (letting in any manipulation of the recommender algorithms at all) but not very sensitive to false positives (accidentally not using some data that would actually have been fine to use). Letting in even one shill can badly impact recommenders, especially when shills target getting new content trending, but using less of the lower-quality data doesn't usually change the recommendations in ways that matter for people.

This isn't my idea or a new idea, by the way. It's actually quite an old idea, discussed in papers like TrustRank, Anti-TrustRank, and Wisdom of the Few, and similar techniques are already applied by companies like Google for dealing with web spam.

The world has changed in the last decade. Especially on social media, there is rampant manipulation of wisdom of the crowd data such as likes and shares. A big part of the problem is algorithms like trending, search, and recommender systems that pick up the manipulated data and use it to amplify shilled content. That makes disinformation and scams easy and quite profitable.

Places like Amazon, Facebook, and Twitter are swimming in data, but their problem is that a lot of it is untrustworthy and shilled. But you don't need to use all the data. Happily toss big data, anything suspicious at all, accepting false positives galore that accidentally mark new or borderline accounts as shills, when deciding what to use as input to the recommender algorithms. Who cares if you do?