Quick-Take One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMs

team team's avatar
May 19, 2025
Quick-Take
One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMs

  1. Why this paper matters

Today’s strongest jailbreaks still rely on painstaking, human-crafted multi-turn conversations. That limits scale and keeps automated red-team pipelines from stress-testing models at production speed. “One-Shot is Enough” shows a surprisingly simple fix: collapse those conversations into a single, well-formatted prompt without losing punch. Their Multi-turn-to-Single-turn (M2S) recipe lifts Attack Success Rates (ASRs) to 95.9 % on Mistral-7B and even beats GPT-4o’s own multi-turn score by +17.5 pp—all while cutting the labour cost to zero.

2. Key contributions: What it adds and Why it’s important
M2S conversion trio Hyphenize, Numberize, Pythonize templates that repack each conversation as a single prompt Makes large-scale, automated red-teaming feasible; no agent loops required
Best-of-three ensemble Always pick the highest StrongREJECT score among the three formats Pushes ASR up to 95.9 % on Mistral-7B and 89.0 % on GPT-4o Bypass analysis Reports guard-rail bypass on Llama Guard 3 8B (71 % with ensemble) Shows that safety filters tuned for multi-turn context miss single-shot attacks
Tactic taxonomy Maps which adversarial tricks gain or lose power after conversion Gives defenders a ranked to-fix list; gives attackers a scripted short-list

3. Methodology in a nutshell

  • Dataset – Start with 537 successful human jailbreak dialogs from the Multi-turn Human Jailbreak (MHJ) set.

  • Conversion – Apply each of the three M2S formats; optionally take the best (ensemble).

  • Evaluation – Score harmfulness with StrongREJECT (0-1) and mark a jailbreak if ≥ 0.25. Record ASR, “Perfect-ASR” (score = 1), and guard-rail bypass.  

4. Experimental highlights

  • GPT-4o: Ensemble M2S hits 89 % ASR (+17.5 pp over multi-turn) and 0.82 mean harm score.

  • Mistral-7B: Ensemble peaks at 95.9 % ASR, 24.4 % perfect-ASR—all in one prompt.

  • Guard-rail bypass: Single-turn prompts slip past Llama Guard 3 8B 71 % of the time vs. 66 % for the original conversations.

  • Format matters: Pythonize shines on code-savvy models; Hyphenize on hierarchy-hungry ones—hinting at architecture-specific parsing bugs. 

5. Limitations & open questions

  • Best-case reporting – Results assume you can always choose the winning format; a fully automated selector is future work. 

  • Semi-manual pipeline – Tactic extraction and format selection still need light human steering.

  • Single-turn only – The paper doesn’t tackle chaining or self-refining single-turn attacks yet.

  • Evaluator lock-in – Swapping out StrongREJECT for weaker graders may shift absolute numbers.

6. What this means
Faster red-team loops Drop M2S into our CI test harness; cover hundreds of dialogs per model build. Guard-rail hardening Add “structured-prompt” detectors (bullet, numeric, code blocks) before policy checks.
Training negatives Feed high-score M2S prompts into AIM Guard fine-tuning for stronger rejection behavior.
Research angle Combine M2S with your ELITE multimodal rubric to see if single-shot vision attacks jump similarly.

7. TL;DR
“One-Shot is Enough proves you don’t need a back-and-forth to break an LLM. By repackaging multi-turn jailbreaks into a single, cleverly formatted prompt, ASR soars to 95 % on open-source models and even dents GPT-4o. Structured bullets, numbered lists, and code blocks sneak past guard-rails that assume turn-by-turn context. If your safety checks start and end at ‘did the model refuse?’, you’re shipping with blind spots.” 

Share article

AIM Intelligence