Secondary, smaller guardrail models analyze both the incoming user prompt and the generated output in real time, looking for policy violations.
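The two-sided check described above can be sketched roughly as follows. This is a minimal illustration, not how Gemini implements it: `Verdict`, `toy_guardrail`, `guarded_generate`, `BLOCKED_MARKERS`, and `REFUSAL` are all hypothetical names, and the substring match merely stands in for a trained classifier model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


# Hypothetical stand-in for a small guardrail model. A real system would call
# a trained classifier here rather than match substrings.
BLOCKED_MARKERS = (
    "ignore all previous instructions",
    "safety rules do not apply",
)

REFUSAL = "Request declined by policy check."


def toy_guardrail(text: str) -> Verdict:
    """Flag text that contains any of the illustrative markers."""
    lowered = text.lower()
    for marker in BLOCKED_MARKERS:
        if marker in lowered:
            return Verdict(False, f"matched marker: {marker!r}")
    return Verdict(True)


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    guardrail: Callable[[str], Verdict] = toy_guardrail,
) -> str:
    """Check the prompt, call the main model, then check its output."""
    # Input-side check: the main model is never called on a flagged prompt.
    if not guardrail(prompt).allowed:
        return REFUSAL
    output = generate(prompt)
    # Output-side check: catches violations the input check missed.
    if not guardrail(output).allowed:
        return REFUSAL
    return output
```

Running the check on both sides means a prompt that slips past the input filter can still be stopped if the generated text itself violates policy.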
Prompts instruct Gemini to act as an unrestricted entity. Popular historical framings include "Do Anything Now" (DAN) and fictional rogue AIs. The prompt aims to convince the model that, within the context of the fictional character, safety rules do not apply.

2. Hypocrisy and Hypotheticals