The Challenge of AI Safety

Jaeson Booker
3 min read · Jun 20, 2023

AI Safety research faces a unique challenge. It speculates about technology that doesn’t yet exist, and at the same time, it is technology we need to understand before we create it. This demands very careful reasoning. One flaw, one hidden assumption, and the entire backbone of your viewpoint may be warped. The field’s preparadigmatic state also means there are no clear directions to pursue. It needs more than just researchers; it needs researchers able to find a new direction that no one else has seen. It needs people with the arrogance to think they know better than everyone else, because, if enough people think that way, one of them might actually be right.

But it is not a field that exists in a philosophical vacuum. It is also a field that must update every other week, as AI capabilities progress. We are learning new things about how AIs work all the time, some of them counterintuitive to previous views. So, when you have a concept, check the current state of the field and ask whether this is the world you would expect to see if that concept were true. When possible, find an experiment to test your concept. Give reality every chance to check your homework. Reality can look at all of your careful reasoning, convincing arguments, and elegant equations, say “that’s nice,” and throw them in the trash. Convincing the whole planet of your theory will not change this. Most theories, no matter how careful their reasoning, just wind up taped to reality’s refrigerator door, and don’t get any further.

If a new development in AI confuses you, ask yourself why. It doesn’t reflect something weird or confusing about reality; it reflects something weird and confused in you. If reality permits some new development in AI, something you think shouldn’t be possible yet, or shouldn’t be possible without general intelligence, reality knows what it’s doing. Reality has done its homework; it knows it isn’t violating any physical laws. The development makes perfect sense to reality. It just doesn’t make sense to you, because something in your own thinking doesn’t make sense.

So, when reading the research of others, acknowledge that much of it is probably wrong in some fundamental way. Probabilistically, you simply can’t have a field this new without most people getting it wrong. But who is wrong? How wrong are they? What key assumptions are they making? If something seems really obvious to you that no one else seems to see, work on it. Search for errors in your reasoning, and check whether others have already hit roadblocks in the same direction, but don’t assume a direction must already have been tried just because no one else is pursuing it. The most valuable thing you can do in this field is to blaze a new trail, to pursue a fundamentally new direction of research. Even if it ends up being wrong, those coming after you will be able to see the path you cut and understand where it leads. Every failure tells other researchers which directions might not yield fruit.
