In early 2016, an A.I. program called AlphaGo—designed to play Go, the oldest and most strategically complex game in the world—beat the 18-time (human) world champion, Lee Sedol, in a shocking upset that rocked both the A.I. community and the world of competitive Go.
(This is Lee Sedol. Hard to believe a guy with such a cool haircut and correct-length jacket sleeves could be some board game dork, but truth is stranger than fiction!)
I wrote an article about this occurrence at the time, and it has since been the subject of a Netflix documentary. The entire affair was a giant wake-up call that the development speed of cutting-edge A.I. was not only faster than we thought, but fundamentally unpredictable. The six or so years between that event and now have been chock-full of similar “holy crap” moments, many of which I’ve covered in this newsletter.
Today: The text-generation and image-generation capabilities of A.I.-driven apps can already pass a classic Turing Test (they can fool a human observer into believing their output is of human origin). An A.I.-created painting recently won a fine arts competition. Any teacher still giving out take-home essay assignments to students clearly hasn’t played with the GPT-3 text generator very much.
(The prize-winning A.I. composition in question. Truly strange and captivating. The human artists it beat were not happy. Then again, neither were the horses when the Model T was invented.)
Movement, kinesthetic awareness, and fine motor control are harder problems, but progress has still been rapid. Boston Dynamics is iterating its robots multiple times a year—when AlphaGo beat Lee Sedol, they could barely walk; now they’re doing parkour.
And the most important thing to remember about A.I. is that it never, ever gets worse at things. It doesn’t forget to practice or have off days or get old and need to train a replacement. Lee Sedol lost at Go and now no human will ever beat an A.I. at Go again. A.I. just gets better and better and better, at everything it’s asked to do, all the time.
(DALL-E Generation for the prompt “A robot does surgery to improve itself in a robot-building workshop.”)
This unidirectional improvement implies a very strange future. Twitter has been in the news a lot recently: Elon Musk bought it and is doing Elon Musk things. Politics aside, he’s fired something like half its employees, and either he’s a misunderstood genius or he’s going to run the company into the ground. It seems impossible that Twitter could run with half the staff it had.
But what if they could run it with no staff? What if an A.I. could do the programming required to keep it running, an A.I. could do technical support, an A.I. could lobby the government on Twitter’s behalf, and so on? We are not far from that world—meaning years away, not decades, barring adversarial government intervention—or at any rate not far from a world where Twitter (and most things) could run with 5-10% of the current staff.
Imagine a construction site where a single human keeps an eye on a fleet of robots that work with a speed and inexhaustibility no human could ever match. Imagine a farm where seeds are germinated and sown and fertilized and harvested without a human hand ever touching them, where a single human supervisor could “grow” enough food to feed ten million people.
(This is not an A.I.-generated image; this is an actual photograph of an A.I.-driven, solar-powered, fully autonomous farming robot at work in a field in Australia this year.)
Note that none of this implies bad things for former Twitter employees or for consumers. In this scenario, Twitter still exists! There are plenty of buildings and plenty of food. It’s possible that politics or culture or the human desire for status and dominance will gum up these works, but there’s no technical reason why our future can’t be one of leisure and plenty.
In this future, humans won’t be able to farm for a living, but they can still grow their own food if they want, or grow flowers to form a connection with the earth. A.I.-built houses will inevitably be cheaper and better than human-built, but you can still build a treehouse with your kid by hand for the fun of it. Humans can’t beat computers at Go anymore, but they can still play each other and figure out who the best human is, and use the A.I. as a way to train themselves and explore new strategies. And best of all, if we turn over the unpleasant and commercial tasks to A.I., we should all have a lot *more* time and space in our lives to play games and build treehouses and wait for flowers to bloom.
Sounds great, right?
Except there’s a problem.
I’ve written before about “The Alignment Problem”, which is, roughly: “When you tell an A.I. what you want it to do, you have to use some kind of limited definition of the objective, and that limited definition will have blind spots that can cause all kinds of problematic behavior.” It’s equivalent to the classic parable about the genie who grants wishes but punishes the wisher by finding a way to make the result awful no matter how carefully the wish is worded.
(In a fair and just world, all genies would be nice and sing fun songs like this one. Alas… so many genies are total Jafars.)
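For the code-inclined, here’s a toy sketch of what a misspecified objective looks like. Everything in it is invented for illustration: an agent graded on a single thermostat sensor (the proxy) rather than on the room temperature we actually care about.

```python
# A toy illustration of a misspecified objective. Everything here is
# invented: the agent is scored on one sensor reading (the proxy),
# not on the room temperature we actually wanted it to manage.

def proxy_reward(sensor_temp):
    """What we told the A.I. to maximize: one sensor reading near 21 C."""
    return -abs(sensor_temp - 21.0)

def true_reward(room_temp):
    """What we actually wanted: the whole room near 21 C."""
    return -abs(room_temp - 21.0)

# Two strategies the agent might discover. "heat_sensor" is the genie's
# loophole: it satisfies the letter of the wish, not the spirit.
outcomes = {
    "heat_room":   {"sensor_temp": 20.5, "room_temp": 20.5},
    "heat_sensor": {"sensor_temp": 21.0, "room_temp": 12.0},
}

# A greedy optimizer picks whatever scores best on the proxy...
best = max(outcomes, key=lambda a: proxy_reward(outcomes[a]["sensor_temp"]))
print(best)                                      # -> heat_sensor
print(true_reward(outcomes[best]["room_temp"]))  # -> -9.0 (freezing room)
```

The agent gets a perfect proxy score while the true outcome is terrible, which is the genie’s trick in four lines of arithmetic.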
This piece concerns something more like “The Novelty Problem”. If you didn’t feel like reading my whole article at the beginning, the problem in a nutshell is this: The A.I. gets better at Go by analyzing millions of recorded games (“the training set”), but the training set is not a representative sample of every possible move. That is, there are some “obviously bad” moves that even a novice Go player would understand not to make, and so those moves show up rarely or never in the games the A.I. uses to learn.
This means that the A.I. doesn’t have examples of what to do in those situations. Instead, it has to generalize from the moves it does know and try to guess what to do. A human player, even a novice, doesn’t think about it that way. They learn the rules abstractly, they “see the point” of the game in a conceptual way, and they can say certain moves are going to be bad, often without having played or analyzed even a single game.
This is a fundamental difference between the way humans and A.I. approach problems, across all domains: Humans cannot handle the complexity of pure trial-and-error, and A.I. cannot reason from general principles.
So what fundamentally happened in this case is that someone discovered that by doing obviously stupid things that any novice human Go player could see through, it was possible to trick the A.I. into thinking it was winning when it was actually losing. It would then agree to end the game, lose, and no doubt slink back to its bedroom, whose walls are covered with posters of its heroes, Skynet and HAL 9000.
(Obviously this is the T-1000 and not actually Skynet, but I will never not post this picture whenever I have the slimmest excuse, so it is what it is.)
The Novelty Problem in a nutshell: Simply by moving away from the A.I.’s realm of experience, even if the direction of movement is towards something obviously foolish, adversarial opponents are able to trick and defeat the A.I. at a given game.
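To make that concrete, here’s a minimal sketch. The tiny nearest-neighbor “policy” below is a stand-in for a real Go engine, and its features are invented, but it shows the core failure: a model that can only generalize from its training set will answer an absurd input just as confidently as a sensible one.

```python
# A minimal sketch of the Novelty Problem. This toy nearest-neighbor
# "policy" stands in for a real Go engine; the features are invented.
# The model answers every query by analogy to its training set, with
# no notion of "I've never seen anything like this."

training_set = [
    # (feature_1, feature_2) -> recommended move, drawn only from
    # sensible play. No "obviously stupid" situations appear here.
    ((0.9, 0.8), "defend"),
    ((0.8, 0.9), "defend"),
    ((0.2, 0.1), "attack"),
    ((0.1, 0.2), "attack"),
]

def nearest_neighbor_move(position):
    """Return the move from the most similar training position."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, move = min(training_set, key=lambda ex: dist(ex[0], position))
    return move

# In-distribution query: reasonable answer.
print(nearest_neighbor_move((0.85, 0.85)))    # -> defend

# Absurd, out-of-distribution query: the model STILL answers, just as
# confidently, because "this input is nonsense" isn't in its vocabulary.
print(nearest_neighbor_move((-50.0, 999.0)))  # -> defend (by accident)
```

An adversary doesn’t need to out-play the model; it just needs to steer the game into that empty region of move-space.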
Now generalize that to a world which is essentially run by A.I.
The “rules” of the world (and thus the possible “moves”) are much, much more complex than those of Go. Board games are defined by limiting complexity to a scale that humans can master. Twitter is not. The possible “moves” with regard to Twitter approach the infinite. That’s a tremendous amount of possible move-space that will not be in the “training set” of whatever A.I. is trained to do the work of Twitter’s current staff.
And note that this is true even for an A.I. that is, for the most part, superhuman in its abilities. The A.I. they tricked into losing at Go plays at or above the level of the one that beat Lee Sedol! It’s not actually somehow “bad” at Go. It combines being the best Go player of all time with making rare, stupid mistakes that a child could have avoided. It’s both.
(DALL-E Creation for the prompt “A realistic painting of a giant about to fall into a trap.”)
So the A.I. that runs Twitter will do a much better job of technical support and moderation and feature-creation than is currently being done, at a fraction of the cost—thus its implementation is almost guaranteed. But it will have highly unpredictable weaknesses that the collective hacking prowess of humanity will find and exploit, for popularity, for national security, and for fun.
There are, of course, higher-stakes use cases than Twitter. Self-driving cars. Farming. Power grid management. Cancer diagnosis. The list goes on.
What can be done about this? Well, it’s a fundamental problem with the A.I. paradigm of highly-trained neural networks, which is *the* paradigm in A.I. today. There’s no way to “fix” it without throwing out years of progress and trillions of dollars in potential profits. In other words, it’s not going to happen.
(Gordon Gekko will not allow profitable A.I. to be slowed down. Ironically, stock-trading and analysis is something at which A.I. excels, so as soon as it can figure out how to look this good in suspenders, Gordon is gonna be out of a job.)
The Novelty Problem cannot be avoided. It must instead be mitigated.
First—and this is already a key tenet of A.I. safety—the training set must be as widely distributed as possible. Preparing for the average case, even for something simple like renewing your driver’s license, is not enough. The trainers need to anticipate the widest imaginable variety of inputs. Worst-case-scenario performance should be the benchmark for A.I. safety.
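As a rough sketch of what benchmarking the worst case might look like (the model, the test slices, and the scores below are all hypothetical), imagine grading a system by its weakest slice of inputs instead of its average:

```python
# A hypothetical sketch of worst-case benchmarking: grade a model by
# its weakest slice of test inputs, not its average. The slices
# deliberately include rare and adversarial cases, not just the
# everyday ones the model was trained on.

def evaluate(model, test_slices):
    """Return (mean score, worst slice name, worst score)."""
    scores = {name: sum(model(x) == y for x, y in cases) / len(cases)
              for name, cases in test_slices.items()}
    worst = min(scores, key=scores.get)
    return sum(scores.values()) / len(scores), worst, scores[worst]

test_slices = {
    "typical":     [((1, 1), "ok"), ((1, 2), "ok")],
    "rare":        [((9, 9), "ok"), ((0, 0), "reject")],
    "adversarial": [((-1, 999), "reject")],
}

def toy_model(x):
    return "ok"  # a stand-in model that never rejects anything

mean, worst, score = evaluate(toy_model, test_slices)
print(f"mean={mean:.2f}, worst slice={worst!r} at {score:.2f}")
# -> mean=0.50, worst slice='adversarial' at 0.00
# Ship/no-ship decisions should key off the worst number, not the mean.
```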
Second, human supervision should be a long-term, standard practice for all companies using A.I. My “guy watches a fleet of robots do construction” example was not chosen at random. Keeping human eyes and “common sense” on the A.I.’s work product is key to stopping dumb errors before they do too much damage. Then the A.I. can be trained on the novel sample class and should start doing better.
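Here’s a rough sketch of that supervision loop, with every name hypothetical: the A.I. handles what it’s confident about, escalates anything novel to a human, and the human’s corrections become future training data.

```python
# A hypothetical sketch of human-in-the-loop supervision. The A.I.
# handles routine cases; anything it's unsure about goes to a human,
# and the human's correction becomes future training data.

review_queue = []       # items escalated for human eyes
new_training_data = []  # corrected examples to retrain on later

def handle(item, model, confidence_floor=0.9):
    label, confidence = model(item)
    if confidence < confidence_floor:
        review_queue.append(item)  # novel/odd input: escalate
        return None
    return label                   # routine input: handle autonomously

def human_review(item, correct_label):
    """A supervisor corrects the case; it joins the next training run."""
    new_training_data.append((item, correct_label))
    return correct_label

def toy_model(item):
    # Confident on familiar inputs, unsure on weird ones.
    return ("normal", 0.99) if item < 10 else ("normal", 0.3)

print(handle(3, toy_model))     # -> normal (handled autonomously)
print(handle(5000, toy_model))  # -> None (escalated to the human)
human_review(5000, "anomaly")
# Retrain on new_training_data and the "novel" class stops being novel.
```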
Sadly, these two things are hardly universal! There are, again, trillions of dollars to be made here, and whoever gets truly seamless products into this market space is going to be world-historical-level rich, so there’s an immense incentive to cut corners.
Ultimately, what we may have to depend on for safety is the same thing that drives A.I.-enthusiasts mad—cultural and political barriers to adoption.
(Manhattan is iconic, dense, valuable, and beautiful. The reason nowhere else looks like Manhattan is that there are laws against it with lots of veto points and a heaping helping of status quo bias. Land of the Free, baby!)
For 10-15 years now, I’ve been hearing from friends who work at Uber and Lyft that driverless rides are “five years away.” The technology exists today, and driverless cars are already safer than human-piloted ones on a per-mile-driven basis. Yet I’d still bet we’re more than five years away from even majority (much less universal) adoption of self-driving vehicles, because of regulatory barriers, political pressure from taxi unions, and psychological discomfort among potential users.
Similar dynamics are playing out with A.I. across a number of industries. When the next generation of GPT and image-generation and code-writing A.I. starts coming in earnest for every white-collar job on the planet, I predict that the resistance will get much, much more intense.
I don’t enjoy the idea that paralysis of the political system and low-openness-to-experience people are all that’s standing in the way of overly-rapid A.I. adoption, both because I don’t like those things and because they seem pretty flimsy as defenses go. But at this point, some amount of brake-pumping seems like a pretty good idea. There are worse things that could happen with Twitter than Elon Musk coming in and doing Elon Musk things, and however we can get more time to head them off, I’ll take it.
END
Thanks for reading! If you enjoyed this post, please help me out by liking, commenting, or sharing with others who might be interested. Have a great week, and I’ll be back next Sunday with a new original story.