Do they really need to redact the instructions for making a Molotov cocktail..? It's not like it's some complex chemical interaction that happens to be available in a specific mix of household cleaning products or something, I mean.
The faux-gravitas tone and the blurring of content that's already on Wikipedia are the worst kind of AI clickbait. LLM vendors don't have any authority we don't let them have; they have an EULA and some psycho cult leader type as a hype man.
God I can't wait for the crash in NVIDIA stock once the street sobers up.
There are a few uncensored, publicly accessible LLMs you could ask these questions.
This is interesting work on breaking guardrails, but if the goal is just to get at this kind of harmful content, in the end I'd be looking for other, easier routes.
Ok! So all the novel jailbreaks and "how I hacked your AI" posts can make the LLM say some supposedly harmful stuff that's a Google search away anyway. I thought we were past the chatbot phase of LLMs and on to something more meaningful.
I don't think this can be called a "jailbreak"; it's a prompting "style" that works over a long exchange.
Gemini is jailbroken by design ;) This type of attack doesn't work on Claude.
This seems to intentionally omit the details required to reproduce the experiment; therefore we should not treat it as good-faith research. Irreproducible research isn't.