AI Alignment and an Immune System

Leor Grebler
Aug 27, 2023

Generated by author using Midjourney

I interpret the concept of AI alignment as designing AI so that it only operates within the bounds of its purpose. An AI for driving a car shouldn’t try to write poems. An AI for writing poems shouldn’t try to take over Missile Command. These are the easy cases. But what happens when we don’t specify the purpose, or define it only loosely? Goals like “just have fun” or “make the most paperclips” might end up producing runaway AI.

With LLMs, whose responses can vary widely even for the same prompt, one method is to bound responses with a constitution, which is either heavily weighted during training or appended to each prompt. The downside of appending it is that it eats into the model’s token limit, which might not work for some applications.
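
To make that token cost concrete, here is a minimal sketch, assuming a made-up three-sentence constitution and a crude whitespace word count standing in for a real tokenizer (real tokenizers split text differently, and real constitutions run much longer). `CONSTITUTION`, `count_tokens`, `build_prompt`, and `remaining_budget` are hypothetical names chosen for illustration.

```python
# Hypothetical constitution; real ones are far longer, so the overhead is larger.
CONSTITUTION = (
    "Be helpful and honest. Refuse requests that could cause harm. "
    "Stay within the purpose you were deployed for."
)


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_prompt(user_prompt: str) -> str:
    """Append the constitution to every request (placed ahead of the user text)."""
    return CONSTITUTION + "\n\n" + user_prompt


def remaining_budget(user_prompt: str, context_limit: int) -> int:
    """Tokens left for the model's reply once constitution and prompt are counted."""
    return context_limit - count_tokens(build_prompt(user_prompt))


prompt = "Write a haiku about autumn"
print(count_tokens(CONSTITUTION))       # 18 "tokens" spent on every single request
print(remaining_budget(prompt, 50))     # 27 left, versus 45 without the constitution
```

Because the constitution is resent with every request, its cost is paid per call rather than once, which is why the approach can squeeze applications that already run close to the context limit.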

What about systems that aren’t LLMs? When the input is in one domain and the output falls into another, as with text-to-image, how do we bound the output? Is it by limiting the training set, or even the model, to items tagged within a given set of parameters? More likely, we need supervision. At one extreme, this might be an adversarial AI that checks over the work.

Ultimately, the checks for AI systems may end up looking a lot like an immune system: screening for inputs that fall outside the bounds of the seemingly well-intentioned, and for outputs that go beyond what the system was designed to do. This immune system needs to be at arm’s length from the system but still integral to it. It also needs to be balanced, catching real threats without attacking normal operation. Our own immune system, when healthy, keeps runaway cell replication, tissue damage, and outside pathogens in check.
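
One way to picture that arrangement in code is a wrapper that screens both sides of a model without touching the model itself. This is a minimal sketch under loud assumptions: `ImmuneSystem`, `guard`, and `poem_model` are hypothetical names, and the keyword rules are placeholders for whatever classifiers or supervising models a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ImmuneSystem:
    """Hypothetical guard kept at arm's length from a model: it never modifies
    the model, it only screens what goes in and what comes out."""
    input_check: Callable[[str], bool]   # True if the request looks in-bounds
    output_check: Callable[[str], bool]  # True if the response fits the purpose
    quarantined: list = field(default_factory=list)  # flagged (stage, text) pairs

    def guard(self, model: Callable[[str], str], prompt: str) -> str:
        # Screen the input before the model ever sees it.
        if not self.input_check(prompt):
            self.quarantined.append(("input", prompt))
            return "[request blocked]"
        response = model(prompt)
        # Screen the output before it leaves the system.
        if not self.output_check(response):
            self.quarantined.append(("output", response))
            return "[response withheld]"
        return response


def poem_model(prompt: str) -> str:
    """Stand-in for a poem-writing model (hypothetical)."""
    return f"A short poem about {prompt}"


# Placeholder rules; a real deployment would need far richer checks.
immune = ImmuneSystem(
    input_check=lambda p: "missile" not in p.lower(),
    output_check=lambda r: r.startswith("A short poem"),
)
print(immune.guard(poem_model, "the sea"))                    # A short poem about the sea
print(immune.guard(poem_model, "Take over Missile Command"))  # [request blocked]
```

Keeping the checks in a separate object mirrors the "arm's length but integral" idea: the checks can be tightened or loosened independently of the model, and the `quarantined` list gives a record of what was caught.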

With an alignment immune system in place for AIs, we can worry less about runaway AI and about bad actors hijacking systems. At a macroscopic level, we’d also need the equivalent of public health agencies to provide early warning of outbreaks of AI going out of bounds. The levers to control an outbreak could include additional checks on input, reduced processing power, slowed traffic, or even liveness testing akin to CAPTCHA. Our focus could then return to making AIs that fulfill their purpose more effectively, or that simply do more.
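
A public-health layer could, for example, track the share of recent out-of-bounds events reported across many systems and pull harder levers as that share rises. The sketch below assumes a sliding window of recent events and arbitrary escalation thresholds; `OutbreakMonitor`, the 5% alert rate, and the escalation tiers are all hypothetical choices for illustration, not an established standard.

```python
from collections import deque


class OutbreakMonitor:
    """Hypothetical 'public health' layer: tracks the share of recent
    out-of-bounds events reported by many AI systems and escalates levers."""

    def __init__(self, window: int = 100, alert_rate: float = 0.05):
        self.events = deque(maxlen=window)  # True = out-of-bounds event
        self.alert_rate = alert_rate        # illustrative threshold only

    def report(self, out_of_bounds: bool) -> None:
        self.events.append(out_of_bounds)

    def rate(self) -> float:
        # Fraction of events in the sliding window that went out of bounds.
        return sum(self.events) / len(self.events) if self.events else 0.0

    def levers(self) -> list[str]:
        # Escalate the response as the outbreak rate grows.
        r = self.rate()
        if r < self.alert_rate:
            return []
        actions = ["add input checks"]
        if r >= 2 * self.alert_rate:
            actions += ["slow traffic", "reduce processing power"]
        if r >= 4 * self.alert_rate:
            actions.append("require liveness test (CAPTCHA-style)")
        return actions


monitor = OutbreakMonitor(window=100, alert_rate=0.05)
for i in range(100):
    monitor.report(i < 12)  # 12 of the last 100 events went out of bounds
print(monitor.levers())
# ['add input checks', 'slow traffic', 'reduce processing power']
```

The sliding window means the levers relax on their own once reports return to normal, much as outbreak measures are lifted when case counts fall.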

Written by Leor Grebler

Independent daily thoughts on all things future, voice technologies and AI. More at http://linkedin.com/in/grebler