Prompt Injection Attack

1 min readFeb 23, 2023

Some people are finding ways to get ChatGPT to respond to prompts that OpenAI put safeguards to protect against. These include profanities and things that are harmful.

Never underestimate humanity’s propensity to try to break things.

So, by asking ChatGPT to stop these safeguards, some people were able to insert prompts that were previous prevented by the system. Did these attacks lead to outputs that were super interesting or just obscene? Or obscenely boring?

The issue with generative AI is that it doesn’t know when it’s output is actually good. It doesn’t know our reaction to the content it produces. Then again, beyond the likes or reposts, most of the time, neither do we. However, we know from our previous learning of interactions what be beyond the realm of decency.

This is something that generative AIs might need to learn the hard way. Rote rules are not going to safeguard against bad actors circumventing them. However, if we teach what’s appropriate and what’s not, the result will be a more decent AI.

Prompt Injection Attack

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Leor Grebler

No responses yet