Prompt Injection Attack

Leor Grebler
1 min readFeb 23, 2023

--

Generated by author using Midjourney

Some people are finding ways to get ChatGPT to respond to prompts that OpenAI put safeguards to protect against. These include profanities and things that are harmful.

Never underestimate humanity’s propensity to try to break things.

So, by asking ChatGPT to stop these safeguards, some people were able to insert prompts that were previous prevented by the system. Did these attacks lead to outputs that were super interesting or just obscene? Or obscenely boring?

The issue with generative AI is that it doesn’t know when it’s output is actually good. It doesn’t know our reaction to the content it produces. Then again, beyond the likes or reposts, most of the time, neither do we. However, we know from our previous learning of interactions what be beyond the realm of decency.

This is something that generative AIs might need to learn the hard way. Rote rules are not going to safeguard against bad actors circumventing them. However, if we teach what’s appropriate and what’s not, the result will be a more decent AI.

--

--

Leor Grebler
Leor Grebler

Written by Leor Grebler

Independent daily thoughts on all things future, voice technologies and AI. More at http://linkedin.com/in/grebler

No responses yet