Prompt Injection Attack

Leor Grebler
1 min readFeb 23, 2023

--

Generated by author using Midjourney

Some people are finding ways to get ChatGPT to respond to prompts that OpenAI put safeguards to protect against. These include profanities and things that are harmful.

Never underestimate humanity’s propensity to try to break things.

So, by asking ChatGPT to stop these safeguards, some people were able to insert prompts that were previous prevented by the system. Did these attacks lead to outputs that were super interesting or just obscene? Or obscenely boring?

The issue with generative AI is that it doesn’t know when it’s output is actually good. It doesn’t know our reaction to the content it produces. Then again, beyond the likes or reposts, most of the time, neither do we. However, we know from our previous learning of interactions what be beyond the realm of decency.

This is something that generative AIs might need to learn the hard way. Rote rules are not going to safeguard against bad actors circumventing them. However, if we teach what’s appropriate and what’s not, the result will be a more decent AI.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Leor Grebler
Leor Grebler

Written by Leor Grebler

Independent daily thoughts on all things future, voice technologies and AI. More at http://linkedin.com/in/grebler

No responses yet

Write a response