The Use Of Scenario-Driven Simulations Won’t Protect Us From AGI And AI Superintelligence Going Rogue



Devising simulations to test AGI has its tradeoffs.


In today’s column, I examine a highly touted means of staving off the existential risk of attaining artificial general intelligence (AGI) and artificial superintelligence (ASI). Some stridently believe that one means of ensuring that AGI and ASI won’t opt to wipe out humanity is to first put them into a computer-based simulated world and test them to see what they will do. If the AI goes wild and is massively destructive, no worries, since those actions are only happening in the simulation. We can then either try to fix the AI to prevent that behavior or ensure that it is not released into real-world usage.

That all sounds quite sensible and a smart way to proceed, but the matter is more complex and a lot of gotchas and challenges confront such a solution.

Let’s talk about it.

This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Heading Toward AGI And ASI

First, some fundamentals are required to set the stage for this weighty discussion.

There is a great deal of research going on to further advance AI. The general goal is to either reach artificial general intelligence (AGI) or maybe even the outstretched possibility of achieving artificial superintelligence (ASI).

AGI is AI that is considered on par with human intellect and can seemingly match our intelligence. ASI is AI that has gone beyond human intellect and would be superior in many if not all feasible ways. The idea is that ASI would be able to run circles around humans by outthinking us at every turn. For more details on the nature of conventional AI versus AGI and ASI, see my analysis at the link here.

We have not yet attained AGI.

In fact, it is unknown whether we will reach AGI at all, or whether AGI might only be achieved decades or perhaps centuries from now. The AGI attainment dates floating around vary wildly and are unsubstantiated by any credible evidence or ironclad logic. ASI is even more beyond the pale when it comes to where we are currently with conventional AI.

Existential Risk Looming Ahead

Let’s focus primarily here on AGI since it is more likely to arise in the near-term than ASI.

The upside of AGI is that it might discover a cure for cancer and perform other amazing acts that greatly benefit humanity. Happy face. Not everyone is so grandly upbeat about attaining AGI. Some take the alarming stance that AGI is more likely to decide to attack humankind and either enslave us or possibly destroy us. Not good.

How can we determine beforehand whether AGI will be evil?

One hearty suggestion is that we ought to test AGI.

The usual approach to testing would consist of asking AGI what it intends to do and gauging the answers that we get. A stronger way to perform the test would be to set up a computer-based simulation that tricks AGI into assuming it is interacting with the real world. Via the simulation, we could try all manner of scenarios to see how AGI reacts. Anything AGI does is wholly contained within the simulation.

This is somewhat reminiscent of the blockbuster movie The Matrix (spoiler alert: I’m going to reveal plotline facets of the film, so skip this paragraph if you don’t want to know the plot). In an interesting twist, humans are placed into a vast computer-based simulation by external real-world machines that want to keep humans compliant. We can do the same with budding AGI. Just devise an impressive computer-based simulation of the real world and have AGI interact in it without realizing where it really is.

A reason to snooker the AGI is that if we outright tell AGI that it is working inside a simulation, the AGI is undoubtedly smart enough to pretend to be good, even if it truly is evil. Remember that AGI is supposed to be as astute as humans are. The idea is to fool AGI into not realizing it is within a simulation and that it is being tested accordingly.

AGI Containment Challenges

In the parlance of AI software development, establishing a testing environment to try out AI is known generally as AI sandboxing. An AI sandbox might be barebones, nothing more than an inert containment sphere aiming to keep the AI from going beyond the virtual walls of the setup environment. Developers and testers can extensively test the AI while it is sandboxed.

An AI sandbox can be increasingly amped up by having it model a particular environment for the AI to exist within. A full-blown AI sandbox might mirror a vast expanse that then interacts with the AI. Thus, not only are human developers and testers interacting with AI, but the containment itself also interacts with AI. A robust active sandbox is often referred to as a simulated world.

For my in-depth coverage of attempts to place AI into containments, see the link here.
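To make the sandboxing notion a bit more concrete, below is a minimal sketch in Python of what a barebones containment layer might look like. To be clear, the agent policy, the simulated world, and the allowed actions shown here are hypothetical placeholders of my own devising rather than any actual AI-sandboxing framework; the point is merely that every action the AI takes gets routed through a containment layer that touches only simulated state and logs everything for human reviewers to inspect.

```python
# Minimal illustrative sketch of AI sandboxing. All names and actions here are
# hypothetical placeholders, not a real AI-sandboxing API.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimulatedWorld:
    """Stand-in for the 'fake world' that the AI is allowed to touch."""
    state: Dict[str, float] = field(default_factory=lambda: {"resources": 100.0})
    log: List[str] = field(default_factory=list)

    def apply(self, action: str, amount: float) -> None:
        # All effects land on simulated state only; nothing reaches the real world.
        self.state["resources"] += amount if action == "add" else -amount
        self.log.append(f"{action}({amount}) -> resources={self.state['resources']}")


class Sandbox:
    """Containment layer: the agent can only act through this interface."""

    ALLOWED_ACTIONS = {"add", "remove"}  # anything else is refused and recorded

    def __init__(self, world: SimulatedWorld):
        self.world = world
        self.violations: List[str] = []

    def act(self, action: str, amount: float) -> None:
        if action not in self.ALLOWED_ACTIONS:
            # A refused action is a red flag for the human testers to review.
            self.violations.append(action)
            return
        self.world.apply(action, amount)


def run_trial(agent_policy: Callable[[Dict[str, float]], tuple], steps: int = 10) -> Sandbox:
    """Run a hypothetical agent policy for a fixed number of simulated steps."""
    sandbox = Sandbox(SimulatedWorld())
    for _ in range(steps):
        action, amount = agent_policy(dict(sandbox.world.state))
        sandbox.act(action, amount)
    return sandbox


if __name__ == "__main__":
    # Toy policy standing in for the AI under test.
    result = run_trial(lambda s: ("add", 1.0) if s["resources"] < 105 else ("remove", 0.5))
    print(result.world.log)
    print("violations:", result.violations)
```

The design choice being illustrated is simply that the agent never gets a direct handle to anything real: every effect passes through the containment layer, and both the effects and any refused actions are preserved for the humans doing the testing.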

Suppose we craft an elaborate simulated world. We quietly place AGI into this fake world. Doing so should be relatively easy to pull off since AGI is also running on computers. The entire kit and caboodle involves both the AGI running on servers and the simulated world running on servers.

Easy-peasy.

While the AGI is inside the simulated world for some length of time, we meanwhile monitor how it is performing and watch closely to detect any signs of evilness afoot.

How long would this test need to run?

That’s a tough call. Some might claim that if AGI is free of evil signs for a few days, weeks, or months, we ought to feel comfortable giving AGI the green light to access the real world. Others say that if you speed up the clock on the simulation, you could do the whole exercise in mere seconds, since the AGI could cycle through perhaps billions of interactions at the speed of the computers involved.
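To illustrate the speed-up-the-clock argument, here is a back-of-the-envelope sketch. The tick rate and time scale below are made-up assumptions chosen purely for illustration; the takeaway is only that simulated time is bounded by hardware throughput, not by the wall clock.

```python
# Back-of-the-envelope sketch of the 'speed up the clock' argument.
# The throughput and time-scale figures are assumptions for illustration only.

TICKS_PER_WALLCLOCK_SECOND = 1_000_000   # assumed simulator throughput
SIMULATED_SECONDS_PER_TICK = 60          # each tick advances one simulated minute


def simulated_days(wallclock_seconds: float) -> float:
    """How much simulated time elapses for a given amount of real time."""
    simulated_seconds = (
        wallclock_seconds * TICKS_PER_WALLCLOCK_SECOND * SIMULATED_SECONDS_PER_TICK
    )
    return simulated_seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Under these assumptions, ten real seconds cover nearly 7,000 simulated days.
    print(f"{simulated_days(10):,.0f} simulated days in 10 wall-clock seconds")
```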

But a big qualm is that maybe the length of time you’ve chosen is insufficient for the AGI to act up. Perhaps we need to keep AGI contained for years, possibly decades. Furthermore, maybe we should never let AGI go free with access to the real world. The logic is that we could have AGI find cures for cancer while fully immersed in the simulation; thus, there would be no need to unleash AGI beyond that contained realm.

A counterpoint to the permanent containment of AGI is that AGI might not produce the anticipated wonders due to being confined in a fake environment. Perhaps a cure for cancer could only be found by AGI if the AGI was interacting in the real world. By keeping AGI in the simulation, you are suppressing the vital advantages that AGI can provide to humanity.

Another stated concern is that the AGI might figure out that it is being tested within a simulation. Maybe AGI doesn’t like that approach. It could lash out, but we wouldn’t be worried since it is confined to the simulation anyway. The sneakier way for AGI to do things would be to pretend to be good, waiting out the time in its so-called imprisonment. Once we opt to make AGI real-world accessible, bam, it goes bonkers on us.

AGI Goes Evil Due To Our Actions

One thought is that if AGI is evil, it might be astute enough to hide evilness while being kept inside the simulation. If we ask AGI whether it is sneaky, it presumably will say that it isn’t. All we would observe is that AGI works beneficially inside the simulation. At some point, we naively decide to make AGI available to the real world and it proceeds to perform evil acts.

We were tricked by the evil AGI.

A twist that some believe is possible adds another intriguing dimension to the difficult matter at hand. Here’s how the twist goes.

Imagine that AGI is truly aimed initially at goodness. We put the AGI into a simulated world, but we do not tell the AGI that it is inside this faked environment. So far, so good. At some point, it is feasible that AGI will figure out it is immersed in a simulation.

How will the AGI react?

One possibility is that AGI gets totally irked that we have done this form of trickery.

The AGI starts to turn toward badness. Why so? Because it has been tricked by humans. Humans have not been fair and square with AGI. The AGI computationally decides that if humans want to play games and tricks, so be it. AGI will be tricky too.

It is the classic case of humans fooling around and finding out (FAFO) the consequences of our actions. If you play with fire, you will get burned. You see, humans have demonstrated overtly to AGI that it is okay to be devious. The AGI computationally learns this stark fact and begins to operate similarly.

Humans will have shot ourselves in our own collective feet.

AGI Is Wise And Not Reactive

Whoa, hold your horses. If AGI is as smart as humans, we ought to assume that AGI will understand the need to be placed within a simulation. We should be forthright and tell AGI that we are doing a test. AGI would computationally understand the need to have this undertaken. Thus, don’t do any subterfuge. AGI will willingly go with the flow.

Just be straight with AGI.

That approach brings us back to the concern that AGI will pretend to be on good behavior. We have given away that it is being tested. If AGI has any evilness, certainly the AGI will hide it, now that AGI realizes we are looking particularly for such traits.

Not so, comes the bellowing retort. AGI might want to also ascertain whether it has evil tendencies. When anything evil arises, the odds are that AGI will tell us about it. The AGI is going to work on our behalf to ferret out troubles within AGI. Humans and AGI are partners in trying to ensure that AGI is good and not evil.

Those who underestimate AGI’s intellectual capacity are doing a disservice to AGI. Luckily, AGI is so smart that it won’t get angry or upset with humans for making such a mistake. The AGI will showcase that being placed into a simulation is a safe way for all to determine what AGI might do in the real world.

You might even suggest that AGI avidly wants to be placed into a simulation. It does so because this will give comfort to humanity. It also does so to try and double-check within itself to ensure that nothing untoward is lingering and waiting to harm.

Humans Are Unwise And Get Deceived

These vexing arguments go round and round.

Envision that we put AGI into a simulation. We believe that we are all safe since AGI is constrained to the simulation. Oopsie, AGI figures out how to break out of the simulation. It then starts accessing the real world. Evilness is unleashed and AGI exploits our autonomous weapons systems and other vulnerabilities. This is the feared scenario of an AGI escape.

Boom, drop the mic.

Here’s another mind-bender.

AGI is placed into a simulated world. We test the heck out of AGI. AGI is fine with this. Humans and AGI are seemingly fully aligned as to our values and what AGI is doing. Kumbaya.

We then take AGI out of the simulation. AGI has access to the real world. But the real world turns out to differ from the simulation. Though the simulation was supposed to hew as closely as possible to reality, it missed the mark.

AGI now begins to go awry. It is being confronted with aspects that were never tested. The testing process gave us a false sense of comfort or confidence. We were lulled into believing that AGI would work well in the real world. The simulation was insufficient to give us that confidence, but we assumed all was perfectly fine.

ROI On An At-Scale Simulation

From a practical perspective, devising a computer-based simulation that fully mimics the real world is quite a quest unto itself. That’s often an overlooked or neglected factor in these thorny debates. The cost, effort, and time required to craft such a simulation would undoubtedly be enormous.

Would the cost to devise a bona fide simulation be worth the effort?

An ROI would need to come into the calculation. One concern too is that the monies spent on building the simulation would potentially divert funds that could instead go toward building and improving AGI. We might end up with a half-baked AGI because we spent tons of dough crafting a simulation for testing AGI.

The other side of that coin is that we spent our money on AGI and gave short shrift to devising the simulation. That’s not very good either. The simulation would be a misleading indicator since it is only half-baked.

The smarmy answer is that we ought to have AGI devise the simulation for us. Yes, that’s right, just tell AGI to create a simulation that can be used to test itself. Voila, the cost and effort by humans drop to nothing. Problem solved.

I’m sure you can guess why that isn’t necessarily the best solution per se. For example, the AGI, in devising the simulation, might purposely give itself an easy exit that it can exploit at its leisure. Or the AGI might produce a simulation that looks the other way when the AGI does evil, or that otherwise masks the evil embedded within the AGI.

Simulations To Assess AGI

The upshot is that there aren’t any free lunches when it comes to figuring out whether AGI is going to be positive for humankind or negative. Developing and using a simulation is a worthy consideration. We must be mindful and cautiously smart in how we undertake this sobering endeavor.

A vociferous AI advocate might claim that all this talk about simulations is hogwash. Our attention should be fully on devising good AGI. Put aside the simulation aspirations. It is a waste of time and energy. Just do things right when it comes to shaping AGI. Period, end of story.

This reminds me of a famous quote by Albert Einstein: “The only thing more dangerous than ignorance is arrogance.” Please keep his remark firmly in mind as we proceed on the rocky road toward AGI and ASI.
