Monday, December 19, 2022
Are ad-supported business models anti-consumer?
Monday, December 12, 2022
Focus on the Long-term
tl;dr: When you increase ads, short-term revenue goes up, but you're diving deeper into ad inventory and the average ad quality drops. Over time, this causes people to look at ads less, click on ads less, and reduces retention. If you measure using long experiments that capture those effects, you find that showing fewer ads makes less money in the short-term but more money in the long-term.
Because most A/B tests don't capture long-term effects, and because those effects are hard for most organizations to measure correctly, the broader implication is that most websites show more ads than would maximize long-term profits.
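As a toy illustration of the argument, here is a minimal sketch of that trade-off. All the numbers and functional forms are invented for illustration: clickthrough falls as you dive deeper into ad inventory, and retention falls as ad load rises, so a short experiment and a long experiment reach opposite conclusions.

```python
# Toy model (all numbers invented) of how ad load trades short-term
# revenue against long-term revenue through ad quality and retention.

def ctr(ad_load):
    # Average ad quality, and so clickthrough, drops as you dive
    # deeper into ad inventory.
    return 0.05 / ad_load ** 0.5

def retention(ad_load):
    # A heavier ad load drives more people away each period.
    return 0.99 - 0.02 * ad_load

def cumulative_revenue(ad_load, periods):
    users, total = 1_000_000, 0.0
    for _ in range(periods):
        total += users * ad_load * ctr(ad_load)  # revenue this period
        users *= retention(ad_load)              # churn from ad load
    return total

# A short A/B test sees more ads winning; a long experiment reverses.
print(cumulative_revenue(4, 2) > cumulative_revenue(2, 2))      # True
print(cumulative_revenue(2, 104) > cumulative_revenue(4, 104))  # True
```

The crossover is the whole point: the short test only sees the immediate revenue lift, while the long test accumulates the retention losses.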
Saturday, December 10, 2022
ML and flooding the zone with crap
If the crowd is shilled and fake, most of the data isn't useful for machine learning. To be useful, you have to pull out the scarce wisdom in the sea of noise.
Gary Marcus looked at this in his latest post, "AI's Jurassic Park moment". Gary talks about how ChatGPT makes it much cheaper to produce huge amounts of reasonable-sounding bullshit and post it on community sites, then says:
For Stack Overflow, the issue is literally existential. If the website is flooded with worthless code examples, programmers will no longer go there, its database of over 30 million questions and answers will become untrustworthy, and the 14-year-old website will die.
Stack Overflow added:
The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce. There are also many people trying out ChatGPT to create answers, without the expertise or willingness to verify that the answer is correct prior to posting. Because such answers are so easy to produce, a large number of people are posting a lot of answers. Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking or looking for correct answers.
There was a 2009 SIGIR paper, "The Wisdom of the Few", that cleverly pointed out that a lot of this is unnecessary. For recommender systems, trending algorithms, reviews, and rankers, only the best data is needed to produce high quality results. Once you use the independent, reliable, high quality opinions, adding more big data can easily make things worse. Less is more, especially in the presence of adversarial attacks on your recommender system.
When using behavior data, ask what would happen if you could sort it by usefulness to the ML algorithm and to users. You'd go down the sorted list, stopping at the point where the output no longer improves. That stopping point would come very early if a lot of the data is crap.
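The thought experiment above can be sketched in a few lines. This is only an illustration: `quality_of` is a hypothetical stand-in for training a model on a candidate dataset and evaluating it on held-out data, and the toy data and thresholds are invented.

```python
# Sketch: data sorted by estimated usefulness, added in batches,
# stopping when held-out quality no longer improves enough.

def find_stopping_point(sorted_data, quality_of, batch=2, min_gain=0.5):
    used, best = [], float("-inf")
    for i in range(0, len(sorted_data), batch):
        candidate = used + sorted_data[i:i + batch]
        quality = quality_of(candidate)
        if quality - best < min_gain:
            break  # adding more data stopped helping; discard the rest
        used, best = candidate, quality
    return used

# Toy data: usefulness scores already in sorted order, crap at the tail.
data = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4]
print(find_stopping_point(data, quality_of=sum))  # [5, 4, 3, 2, 1, 0]
```

With mostly-crap data the loop exits after the first batch or two, which is the point: most of the "big data" never gets used.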
In today's world, with fake crowds and shills everywhere, wisdom of the crowds fails. Data of unknown quality or provable spam should be freely ignored. Only use reliable, independent behavior data as input to ML.
Thursday, November 24, 2022
Alternatives to Twitter
In part that was from the shutdown of Google Reader. In part I was finally giving in to the trend away from long-form blogging. So this blog has been pretty quiet for years.
The recent decline of Twitter has me looking for and trying alternatives.
One surprise I found is that Google News and TechMeme feel worth using more often. Both are surprisingly effective alternatives to social media. I am finding they have most of the value without much of the unpleasantness, although they're missing the contact with close friends.
I was also surprised to find I liked LinkedIn as a substitute for Twitter. I find most of the interactions to be fairly good there, though again missing some close friends.
So far, I have more mixed feelings about using Mastodon, Facebook, Instagram, Post, or going back to blogging as alternatives. Anyone have anything else they like? Or differing experiences?
Wednesday, November 23, 2022
Quoted in the Washington Post
I'm talking about how Amazon used to help (but no longer does) with finding and discovering what you want to buy, saying, "The store radically changes based on customer interests, showing programming titles to a software engineer and baby toys to a new mother."
The reporter, Geoffrey Fowler, goes on to say, "This is probably how most of us imagine Amazon still works. But today advertisers are driving the experience ... The Amazon we experience today is pretty much the opposite of how Amazon used to work."
The article is critical of all the ads on Amazon now, which make the shopping experience terrible. I think it is very hard to find things on Amazon nowadays. This happened for a well-known reason: increasing ad load -- the number of ads on a web page -- will usually increase short-term revenue, but it hurts retention, ad performance, and long-term revenue. As the Washington Post reporter describes, all the ads cause people to go elsewhere when they need to shop, and that has long-term costs for Amazon.
Saturday, November 19, 2022
Experimentation and metrics
A/B testing observes human behavior, which is messy and complicated. Like behavioral economics, it works from metrics that represent partial information and observations. From very limited data, we need to explain why humans do the crazy things they do and predict what will happen next.
When used well, A/B testing helps innovation. But A/B testing should not subjugate teams, binding them to do nothing unless a key metric is passed. Rather, it should be used to gain partial information about expected short and long-term costs and benefits.
For misinformation, disinformation, scams, and the impact of advertising, A/B tests get some data on short-term effects, but little on long-term effects. Ultimately there will be an investment decision about whether to pay the expected short-term costs for the hoped-for long-term benefits.
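That investment decision can be framed as a simple expected-value calculation. The framing and all the numbers below are invented for illustration: the short-term cost is measured precisely by the A/B test, while the long-term benefit is only a belief, expressed as scenarios with probabilities.

```python
# Toy framing (all numbers invented) of the investment decision: a
# measured short-term cost weighed against uncertain long-term benefits.

def expected_net_benefit(short_term_cost, long_term_scenarios):
    # long_term_scenarios: (probability, benefit) pairs summing to 1
    expected_benefit = sum(p * b for p, b in long_term_scenarios)
    return expected_benefit - short_term_cost

# e.g. a change costs 2.0 in short-term revenue, but is believed to
# protect 10.0 of long-term value with probability 0.4
print(expected_net_benefit(2.0, [(0.4, 10.0), (0.6, 0.0)]))  # 2.0
```

The A/B data pins down one term of the calculation; the rest is judgment, which is why the data should inform the debate rather than end it.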
A/B testing is a powerful tool for bottom-up innovation. But it is only a tool and can be used badly. A/B data should be used to inform debate, not halt debate. And I think it should always be helping to find a way to say yes to new ideas.
Sunday, July 03, 2022
Making it more difficult to shill recommender systems
The reason this matters is that recommender systems these days are struggling with shilling. Companies are playing whack-a-mole with bad actors, who just create new accounts or find new shills every time they're whacked, because it's so profitable -- like free advertising -- to create fake crowds that manipulate the algorithms. Propagandists and scammers are loving it and winning. It's easy and lucrative for them.
So what's wrong with taking the opposite strategy, only using the most reliable accounts? As a thought experiment, let's say you rank order accounts by your confidence they are human, independent, not shilling, and trustworthy. Then go down the list of accounts, using their behavior data until the recommendations stop improving at a noticeable level (being careful about cold start and the long tail). Then stop. Don't use the rest. Why not do that? It'd vastly increase costs for adversaries. And it wouldn't change the perceived quality of recommendations because you've made sure it wouldn't.
The traditional approach to this is to classify accounts as spam or shills separate from how the data will be used. The classifiers minimize the error rates (false positive and false negative), then treat all borderline cases as not spam. The idea here is to do almost the opposite of that traditional approach: classify accounts as trustworthy, then use only those, ignoring anything unknown or borderline as well as known spammers and shills.
This works because the downstream use of the data, as well as the incentives for spammers and propagandists, is really sensitive to false negatives (letting in any manipulation at all of the recommender algorithms) but not very sensitive to false positives (accidentally not using some data that might have actually been fine to use). Letting in even one shill can badly impact recommenders, especially when shills target getting new content trending, but dropping some of the lower quality data doesn't usually change the recommendations in ways that matter for people.
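The trust-first filter described above can be sketched in a few lines. Everything here is illustrative: the account names and scores are invented, and `trust_score` stands in for whatever classifier estimates confidence that an account is human, independent, and not shilling.

```python
# Sketch of "classify accounts as trustworthy, then use only those."
# The threshold is deliberately strict: false positives (dropping a
# fine account) are cheap, false negatives (letting in one shill) are
# expensive, so unknown and borderline accounts are all dropped too.

def trusted_accounts(accounts, trust_score, threshold=0.9):
    return [a for a in accounts if trust_score(a) >= threshold]

scores = {"longtime_reviewer": 0.97, "new_account": 0.5,
          "known_shill": 0.05, "borderline": 0.85}
print(trusted_accounts(scores, scores.get))  # ['longtime_reviewer']
```

Note how this inverts the traditional spam classifier: borderline cases are excluded rather than included, which is exactly what raises the cost for adversaries creating fresh accounts.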
This isn't my idea or a new idea, by the way. It's actually a quite old idea, talked about in papers like TrustRank, Anti-TrustRank, and Wisdom of the Few, and similar techniques are applied already by companies like Google for dealing with web spam.
The world has changed in the last decade. Especially on social media, there is rampant manipulation of wisdom of the crowd data such as likes and shares. A big part of the problem is the algorithms like trending, search, and recommender systems that pick up the manipulated data and use it to amplify shilled content. That makes it quite profitable and easy for disinformation and scams.
Places like Amazon, Facebook, and Twitter are swimming in data, but their problem is that a lot of it is untrustworthy and shilled. But you don't need to use all the data. Toss big data happily: anything suspicious at all, false positives galore, accidentally marking new accounts or borderline accounts as shills when deciding what to input to the recommender algorithms. Who cares if you do?