Monday, December 19, 2022

Are ad-supported business models anti-consumer?

Advertising-supported businesses are harder to align with long-term customer satisfaction than subscription businesses, but they make more money when they do align.

A common view is that ad-supported websites, in their drive for more ad clicks, cannot resist exploiting their customers with scammy content and more and more ads.

The problem is that eventually those websites become unusable and the business fails. Take the simple case of websites that put more and more ads on the page. Sure, ad revenue goes up for a while, but people rapidly become annoyed with all the ads and leave. The business then declines.

That's not maximizing revenue or profitability. That's a business failure by execs who should have known better.

It's very tempting to use short-term metrics like ad clicks and engagement for advertising-supported businesses, which encourages things like increasing ad load or pushing clickbait content. But in the long run, that hurts retention, growth, and ad revenue.

In a subscription-supported business, it's easier to get the metrics right because the goal is keeping customers subscribing. In an ad-supported business, it isn't as obvious that keeping customers around and clicking ads for years is the goal. But it's still the goal.

Ad-supported businesses will make more money if they aren't filled with scams or laden with ads. But it's easy for ad-supported businesses to get the incentives and metrics wrong, much easier than for subscription-supported businesses, where the metrics are more obvious. While it may be harder for executives to see, ad-supported businesses do better if they focus on long-term customer satisfaction, retention, and growth.
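
To make the tradeoff concrete, here is a toy model with made-up numbers (not data from any real business): a heavier ad load earns more per active user each month but churns users faster, and over a few years the lighter ad load wins.

    # Toy model with made-up numbers illustrating why optimizing short-term ad
    # clicks can lose to optimizing long-term retention.

    def cumulative_revenue(revenue_per_month, monthly_retention, months=36):
        """Expected revenue from one user over `months`, assuming a constant
        monthly retention rate (a simplification for illustration only)."""
        total, survival = 0.0, 1.0
        for _ in range(months):
            total += survival * revenue_per_month
            survival *= monthly_retention
        return total

    # Heavy ad load: more revenue per month now, but users churn faster.
    heavy = cumulative_revenue(revenue_per_month=1.20, monthly_retention=0.80)
    # Light ad load: less revenue per month, but users stay around longer.
    light = cumulative_revenue(revenue_per_month=1.00, monthly_retention=0.90)

    print(f"heavy ad load: ${heavy:.2f} per user")  # roughly $6.00
    print(f"light ad load: ${light:.2f} per user")  # roughly $9.77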

Monday, December 12, 2022

Focus on the Long-term

One of my favorite papers of all time is "Focus on the Long-term: It's Good for Users and Business" from Google Research. This paper found that Google makes more money in the long-term -- when carefully and properly measured -- by reducing advertising. Because of this work, they reduced advertising on mobile devices by 50%.

tl;dr: When you increase ads, short-term revenue goes up, but you're diving deeper into ad inventory, and the average ad quality drops. Over time, this causes people to look at ads less and click on ads less, and it hurts retention. If you measure using long experiments that capture those effects, you find that showing fewer ads makes less money in the short-term but more money in the long-term.

Because most A/B tests don't capture long-term effects, and because those effects are hard for most organizations to measure correctly, the broader implication is that most websites show more ads than would maximize their long-term profits.
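
As a rough sketch of how those long-term effects can be measured (hypothetical data and column names, a simplified illustration of the kind of long-running experiment the paper describes, not their actual analysis): keep treatment and control running for months, then compare clickthrough during a post-period where both groups see identical ad loads. Any remaining gap is a learned effect of the earlier exposure, not of the ads currently on the page.

    # Hypothetical sketch of estimating a learned ("ads blindness") effect from
    # a long-running experiment. Data and column names are made up.
    import pandas as pd

    def learned_ctr_effect(logs: pd.DataFrame) -> float:
        """logs has one row per user-day with columns:
           arm         -- 'treatment' (saw heavier ad load earlier) or 'control'
           period      -- 'exposure' or 'post'; in the post-period both arms
                          see identical ad loads
           clicks, impressions
        Returns the relative CTR difference in the post-period, which reflects
        what users learned during exposure rather than the ads on the page."""
        post = logs[logs["period"] == "post"]
        ctr = {}
        for arm, group in post.groupby("arm"):
            ctr[arm] = group["clicks"].sum() / group["impressions"].sum()
        return (ctr["treatment"] - ctr["control"]) / ctr["control"]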

Saturday, December 10, 2022

ML and flooding the zone with crap

Wisdom of the few is often better than wisdom of the crowds.

If the crowd is shilled and fake, most of the data isn't useful for machine learning. To make it useful, you have to pull the scarce wisdom out of the sea of noise.

Gary Marcus looked at this in his latest post, "AI's Jurassic Park moment". Gary talks about how ChatGPT makes it much cheaper to produce huge amounts of reasonable-sounding bullshit and post it on community sites, then says:

For Stack Overflow, the issue is literally existential. If the website is flooded with worthless code examples, programmers will no longer go there, its database of over 30 million questions and answers will become untrustworthy, and the 14 year old website will die.
Stack Overflow added:
Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking or looking for correct answers.

The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce. There are also many people trying out ChatGPT to create answers, without the expertise or willingness to verify that the answer is correct prior to posting. Because such answers are so easy to produce, a large number of people are posting a lot of answers.

There was a 2009 SIGIR paper, "The Wisdom of the Few", that cleverly pointed out that a lot of this data is unnecessary. For recommender systems, trending algorithms, reviews, and rankers, only the best data is needed to produce high-quality results. Once you have the independent, reliable, high-quality opinions, adding more big data can easily make things worse. Less is more, especially in the presence of adversarial attacks on your recommender system.

When using behavior data, ask what would happen if you could sort by usefulness to the ML algorithm and users. You'd go down the sorted list, then stop at some point when the output no longer improved. That stopping point would be very early if a lot of the data is crap.
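
A minimal sketch of that idea (the reliability score, training, and evaluation functions here are hypothetical stand-ins, not from the paper): rank the behavior data by estimated reliability, grow the training set from the top, and stop as soon as the held-out metric stops improving.

    # Hypothetical sketch: keep only the most reliable behavior data, stopping
    # when more data no longer helps. `reliability`, `train_model`, and
    # `evaluate` are stand-ins for whatever scoring, training, and offline
    # evaluation you actually use.

    def select_training_data(examples, reliability, train_model, evaluate,
                             batch_size=10_000, patience=2):
        """Sort examples by estimated reliability (e.g., known-good contributors,
        independent accounts, no spam signals), then grow the training set batch
        by batch until the held-out metric stops improving."""
        ranked = sorted(examples, key=reliability, reverse=True)
        best_score, best_cutoff, stalls = float("-inf"), 0, 0
        for cutoff in range(batch_size, len(ranked) + batch_size, batch_size):
            score = evaluate(train_model(ranked[:cutoff]))
            if score > best_score:
                best_score, best_cutoff, stalls = score, cutoff, 0
            else:
                stalls += 1
                if stalls >= patience:
                    break  # more data stopped helping; the rest is mostly noise
        return ranked[:best_cutoff]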

In today's world, with fake crowds and shills everywhere, wisdom of the crowds fails. Data of unknown quality or provable spam should be freely ignored. Only use reliable, independent behavior data as input to ML.