Tuesday, November 26, 2019

Papers and posting

If you haven't seen it, Adrian Colyer's excellent blog has great reviews and summaries of recent papers. Back when this blog started in 2004, there weren't many people summarizing research papers. Many more do now, which is part of why I post less. Adrian's blog is similar to what I used to do, but I think better in many ways. You can also follow Adrian Colyer on Twitter.

While I'm talking about summarizing papers, I want to highlight two lines of work that had an impact on me in the last few years and that I think deserve much more attention. Both argue we, as in all of us in tech, are doing something important wrong.

The first argues that our metrics usually are off, specifically way too focused on short-term goals like immediate revenue. This is the work started by the fantastic Focus on the Long-term out of Google and continuing from there (including [1] [2]). Because much of what we all do is an optimization process -- ML, deep learning, recommendations, A/B testing, search, and advertising -- having the targets wrong means we are optimizing for the wrong thing.

Optimizing for the wrong thing is ubiquitous in our industry. It may, for example, cause almost everyone to show too many ads and too many low quality ads. If everyone has their metrics subtly wrong, everything we make, and especially everything in the ML community, may be aiming for the wrong target.

The second is Kate Starbird's work on disinformation campaigns. Across many recent papers, Kate argues that the traditional classifier approach to spam, trolls, and shills has been failing. Adversaries can create many accounts and enlist real humans in their disinformation effort. Knocking a few accounts away does nothing; it is like shooting bullets into a wave. Instead, it is important to look at the goals of disinformation campaigns and make them more expensive to achieve. Because shills impact so many things we do -- training data for ML and deep learning, social media, reviews, recommendations, A/B testing, search, advertising -- our failure to deal with shills means the assumptions all of these systems have about the data all being equally good are wrong, and the quality of all these systems is reduced.

Solutions are hard. I'm afraid Kate's advice on solutions is limited. But I would say solutions include whitelisting (using only experts, verified real people, or accounts that are expensive to create), recognizing likely disinformation as it starts to propagate and slowing it, and countering likely disinformation with accurate information where it appears. Those replace outdated whack-a-mole account classifiers and work across multiple accounts to counter modern disinformation campaigns. Manipulation and shilling from sophisticated adversaries are ubiquitous in our industry. Until we fix this, many of our systems will produce lower quality results.

Finally, I am posting a lot less here now, so let me point to other resources for anyone who liked this blog. I still post frequently on Twitter; you can follow me there. On AI/ML, there's a lot of great writing by a lot of people, far too many to list, but I can at least list my favorites, which are François Chollet and Ben Hamner on Twitter. On economics and econometrics, which I enjoy for adding breadth to AI/ML, my favorites are economists Noah Smith and Dina Pomeranz on Twitter.

Wednesday, May 08, 2019

Tech and tech idealism

It's been almost 2 years since my last post! I don't know if anyone is still reading this. If you are, thank you!

Why haven't I posted more? Partly it is the broad transition to microblogging, which everyone is using more than long form. But part of it is also that I have negative feelings about where tech has been going.

I'm a tech idealist. I think tech can and should be a force for good in the world. I have spent most of my life trying to build systems where computers are helping humans. Sometimes this is by computers sifting information that is hard for people to find on their own. Sometimes this is by computers surfacing other people who can help.

Lately, some tech companies have been favoring exploitation and deception. Data is being used to manipulate. Tech is becoming customer hostile.

I've been lucky. I have gotten to work on some amazing things. There is a joy to helping someone discover a new book they will love, a bit of knowledge added to a life. Many people feel overwhelmed by the news and information in their lives, and sorting through to find what is truly important is too hard. Ads shouldn't be so annoying and irrelevant, and, you know what, they don't have to be. I've enjoyed helping people find and discover whatever they need online.

But looking at where we are in tech now, it feels like a dot com bubble again. Get rich quick. It's not about building something that people love, but about getting the buck. Greed feeds short-term thinking. Grab that next bonus and get out before the wreckage hits.

Tech idealism is still out there. There still are many people building things that help people. There is research, the creation of knowledge and new ways to help even more people. There are many people using computers and data for good.

And there are many new people getting into computer science, which is fantastic. Computers are a force multiplier. Computers make people more productive and more powerful. Computer science and data science are just starting to have an impact in other fields.

The interdisciplinary opportunities are everywhere and exciting. We know almost nothing about our own oceans; there are huge opportunities for discoveries in biology from undersea probes and drones. We are just starting to image the entire night sky frequently, and sifting through that data with massive computing power will forever change astronomy. The field of economics is shifting to data and behavior over theory. Archeology can be fueled by processing massive amounts of satellite imagery. In field after field, computers and data are making the once impossible possible.

Tech idealism is coming back. Something may have to come to flush away some of those just seeking quick profits. Some of the worst abuses may have to be obvious failures before they are reined in. But it will change.

Computers and data are a force multiplier, allowing people to do more than they could before. Working at massive scale, computers help us understand and discover. In the long term, tech is a force for good.