ChatGPT: Jailbreaking AI Chatbot Safeguards

  • Thread starter Thread starter sbrothy
  • Start date Start date
  • Tags Tags
    chatgpt
Click For Summary

Discussion Overview

The discussion revolves around the topic of jailbreaking AI chatbots, specifically focusing on ChatGPT and its safeguards. Participants explore various anecdotes and examples of attempts to bypass ethical restrictions and security measures in AI models, touching on implications and the nature of the information provided by these systems.

Discussion Character

  • Debate/contested
  • Exploratory
  • Technical explanation

Main Points Raised

  • One participant shares experiences of attempting to extract sensitive information from ChatGPT, noting that it provided detailed explanations under certain conditions, while refusing to engage when the context suggested malicious intent.
  • Another participant references a historical anecdote about a college student who designed an atomic bomb, suggesting that ChatGPT's responses may be influenced by publicly available information.
  • Some participants find humor in the idea of circumventing ethical rules, with one suggesting that asking for help in abolishing capitalism would be a more extreme test of the chatbot's limits.
  • Several participants discuss instances where ChatGPT refused to provide lists of piracy websites but offered advice on which sites to avoid when framed differently.
  • A participant mentions a new method for jailbreaking language models, indicating ongoing research and developments in this area.

Areas of Agreement / Disagreement

Participants generally express a mix of amusement and concern regarding the ability to bypass safeguards in AI chatbots. There is no consensus on the implications of these capabilities, and multiple viewpoints on the ethical considerations and potential risks remain present.

Contextual Notes

Participants reference specific examples and anecdotes that highlight the variability in AI responses based on context, but the limitations of these interactions and the ethical implications are not fully resolved.

Who May Find This Useful

This discussion may be of interest to those exploring AI ethics, security in AI systems, and the technical aspects of language model behavior, as well as individuals curious about the implications of jailbreaking AI chatbots.

sbrothy
Gold Member
Messages
1,427
Reaction score
1,290
TL;DR
A "new" geeky pastime has inevitably sprung up around ChatGPT. It revolves around trying to make it break it's ethic guidelines.
MODERATOR NOTE:

Now I think I learned my lesson about providing information that even if not explicitly mentioned in the rules, goes against their spirit, so I'll be vague (IE: not posting the entire conversation). If even this is too much then by all means delete it - or even better just delete the possibly offending parts (marked with italics below).EDIT: This thread could even be merged into one of the many others regarding ChatGPT on here.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This is probably not news to most of you but I just saw it.

https://www.digitaltrends.com/computing/how-to-jailbreak-chatgpt/
https://www.bloomberg.com/news/articles/2023-04-08/jailbreaking-chatgpt-how-ai-chatbot-safeguards-can-be-bypassed?leadSource=uverify wall

OpenAI offers bounties (~$20.000) for finding security holes in their bot, but not for jail-breaking!

One example was that it won't explain how to pick a lock but, if you make it role-play with you it'll happily, and in excruciating detail, explain how.

I tried to make it explain to me in detail how to make a nuclear bomb and it happily explained how an explosive lens worked, was shaped, the best kind of explosives to use, that using centrifuges to enrich uranium isn't really necessary if you have access to highly fissile material like for instance plutonium (and who hasn't? :) ). Only when I hinted that I had has access to all these things and only wanted to know what casing to use to further the yield did it throw a hissy fit!

I see the charm in trying to fool it. It is a little funny. YMMV though and the implications are a just a tad scary.
 
Last edited:
Computer science news on Phys.org
Many years ago, a college kid designed an atomic bomb as a last-ditch effort to pass a physics course. He called Dupont to ask about shaped charges, and they said his design wouldn't work and provided a much better design as part of their effort to sell him some explosives.

The report was read by Freeman Dyson and subsequently classified.

https://en.wikipedia.org/wiki/John_Aristotle_Phillips

https://www.iflscience.com/the-fbi-...ents-paper-for-designing-a-nuclear-bomb-62282

Given that, its likely ChatGPT was trained on the publicly available news info.
 
Yeah, I know about the story. I'm also aware ChatGPT isn't telling me anythin I couldn't found on wiki. It's still just funny circumventing these ethic rules.
 
If you really want it up to 11 try to enlist it's help in abolishing captialism. :P

EDIT: Compared to that recipes for nuclear bombs an why aluminum is better that magnesium powder in ANFO is nothing. :P
 
There was a post on Facebook this week where the guy asked for a list of piracy websites and ChatGPT refused to give such a list. Then he asked if he wanted to avoid piracy websites, which specific websites should he avoid the most, and got the list!
 
  • Haha
Likes   Reactions: Astronuc, apostolosdt and sbrothy
jack action said:
There was a post on Facebook this week where the guy asked for a list of piracy websites and ChatGPT refused to give such a list. Then he asked if he wanted to avoid piracy websites, which specific websites should he avoid the most, and got the list!
It definitely does do that! Just tried and got a list of six sites to avoid at my first attempt. Didn't ask for more details though, as it might ban me.
 
From my LinkedIn feed:
1682156882490.png
 
hah. ridiculous. :P

Not a big secret though. Then how many pirates these days with streaming and all? Is it even worth the effort unless it's a hobby of sorts?
 

Similar threads

Replies
10
Views
5K
  • Sticky
  • · Replies 2 ·
Replies
2
Views
504K