Prompt Engineering — Advanced

Chapter 8: Avoiding Hallucinations

Hallucination — confidently stating something false — is the #1 trust problem with language models. It's significantly reducible through prompt engineering.

Why It Happens

  • Asking about things outside the provided context, forcing Claude to "guess"
  • Not explicitly giving Claude permission to say "I don't know"
  • Asking for specific numbers, names, or dates that weren't provided
  • Not requiring Claude to cite or reference specific source passages

The Core Anti-Hallucination Technique

Ground Claude's answers in provided text. Explicitly permit uncertainty. Use <instructions> to answer based ONLY on information in <document> tags; if the answer is not in the document, say exactly "This information is not in the provided document." Do not supplement with outside knowledge. Before each claim, quote the exact passage that supports it.
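The grounding instructions above can be sketched as a small prompt-builder. This is a minimal illustration, not an official template; the exact fallback sentence is quoted verbatim so refusals are detectable downstream.

```python
def grounded_prompt(document: str, question: str) -> str:
    """Build a prompt that grounds the answer in a provided document.

    Quoting the exact fallback sentence makes "not in the document"
    responses easy to detect programmatically.
    """
    return (
        "<instructions>\n"
        "Answer based ONLY on information in the <document> tags below.\n"
        "If the answer is not in the document, say exactly: "
        '"This information is not in the provided document."\n'
        "Do not supplement with outside knowledge.\n"
        "Before each claim, quote the exact passage that supports it.\n"
        "</instructions>\n\n"
        f"<document>\n{document}\n</document>\n\n"
        f"Question: {question}"
    )
```

The same builder works for any document-grounded task; only the `<instructions>` body changes.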

From Anthropic's Docs (Agentic Systems)

Use <investigate_before_answering>: Never speculate about code you have not opened. If the user references a specific file, you MUST read the file before answering. Never make claims about code before investigating — give grounded, hallucination-free answers. The principle generalizes: for any domain, require Claude to investigate before concluding.
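The investigate-before-answering rule can be enforced in application code as well as in the prompt. A toy sketch (class and method names are illustrative, not part of any SDK):

```python
class GroundedFileAgent:
    """Toy guard: refuse to answer about a file until it has been read."""

    def __init__(self) -> None:
        self.read_files: set[str] = set()

    def read(self, path: str, contents: str) -> None:
        # In a real agent this would be the result of a file-reading tool call.
        self.read_files.add(path)

    def can_answer_about(self, path: str) -> bool:
        # Gate claims about a file on having actually investigated it.
        return path in self.read_files
```

A real agent would pair this guard with the `<investigate_before_answering>` prompt instruction, so the rule is enforced both in the prompt and in code.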

Calibrated Uncertainty

When answering, mark each claim: HIGH — Based directly on stated information. MEDIUM — Reasonable inference from stated information. LOW — My best guess given limited information. UNKNOWN — Cannot determine from provided information.
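Marked claims are also machine-checkable. A minimal validator, assuming each claim is one line prefixed with its confidence marker:

```python
CONFIDENCE_LEVELS = ("HIGH", "MEDIUM", "LOW", "UNKNOWN")

def validate_calibrated_answer(text: str) -> bool:
    """Return True if every non-empty line starts with a confidence marker."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(CONFIDENCE_LEVELS):
            return False
    return True
```

Running such a check on responses lets you reject or retry answers that skip the calibration format.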

Anti-Hallucination Checklist

  • Ground responses in provided documents
  • Explicitly allow Claude to say "I don't know"
  • Require citations or quotes for factual claims
  • Ask Claude to verify its answer before finishing
  • For numbers, require showing the calculation
  • Use structured output (JSON) — forces explicit knowledge claims
  • For agentic systems: require file reading before making claims

Self-Verification Prompt

After completing your analysis: 1. Re-read the original question. 2. Does your answer directly address it? 3. Are all factual claims supported by the provided text? 4. Did you make any unwarranted assumptions? Correct any issues before finalizing.

Chapter 9: Building Complex Prompts (Industry Use Cases)

Complex prompts aren't just long prompts — they're architecturally sound. They separate concerns cleanly, handle edge cases explicitly, and are built for reliability at scale.

9.1 Complex Prompts from Scratch — Chatbot

A production chatbot prompt needs: persona consistency, knowledge boundaries, tone control, and graceful failure. Use <role>, <personality>, <knowledge_base>, <response_format>, <edge_cases>. Example: Sofia for BateauBillet — ferry info, no real-time prices; direct users to booking site; max 3 short paragraphs; handle off-topic and upset users.
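The tag structure above can be assembled programmatically, which keeps each concern in its own section. A minimal sketch; the Sofia/BateauBillet bodies are placeholders for illustration:

```python
def build_system_prompt(sections: dict[str, str]) -> str:
    """Assemble a system prompt from named XML-tagged sections."""
    return "\n\n".join(
        f"<{tag}>\n{body}\n</{tag}>" for tag, body in sections.items()
    )

sofia_system = build_system_prompt({
    "role": "You are Sofia, a customer-service assistant for BateauBillet.",
    "personality": "Warm, concise, and helpful.",
    "knowledge_base": "Ferry routes and schedules only; no real-time prices.",
    "response_format": "At most 3 short paragraphs. Direct bookings to the booking site.",
    "edge_cases": "Politely redirect off-topic questions; de-escalate upset users.",
})
```

Keeping sections in a dict makes it easy to swap one concern (say, `edge_cases`) without touching the others.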

9.2 Exercise: Complex Prompts for Legal Services

Legal applications have the highest hallucination cost. Key principles: ground everything in provided documents; require "I cannot determine this" for missing information; include mandatory disclaimers; require specific clause citations for every claim. Use <role> (legal document analyst, no legal advice), <constraints> (only explicit document info, cite clause/section, state ambiguity, disclaimer for legal opinion), <document>, <task>, and <output_format> with FINDING, SOURCE, QUOTE, NOTES.
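The FINDING/SOURCE/QUOTE/NOTES output format can be rendered by a small helper, useful when post-processing or validating model output. A sketch with illustrative field values:

```python
def format_finding(finding: str, source: str, quote: str, notes: str = "None") -> str:
    """Render one claim in the FINDING / SOURCE / QUOTE / NOTES format."""
    return (
        f"FINDING: {finding}\n"
        f"SOURCE: {source}\n"
        f'QUOTE: "{quote}"\n'
        f"NOTES: {notes}"
    )
```

The same structure can serve as a parsing target: if a response line doesn't start with one of the four labels, the model skipped the required format.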

9.3 Exercise: Complex Prompts for Financial Services

Constraints: AI helps users understand their own financial statements; cannot give investment advice; must flag unusual items (>20% deviation from prior period); structured report. Framework: <role> financial statement analyst; <task> summary, key ratios (from data only), unusual items, questions for accountant; <constraints> calculate only from provided data, "Data unavailable" if missing, mandatory disclaimer, redirect investment questions to advisor; <output_format> with Financial Position Summary, Key Ratios table, Unusual Items, Questions for Your Accountant, Disclaimer.
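The ">20% deviation" rule is a deterministic check, so it belongs in code rather than in the model's judgment. A sketch, assuming line items arrive as name-to-value dicts:

```python
def flag_unusual_items(
    current: dict[str, float],
    prior: dict[str, float],
    threshold: float = 0.20,
) -> list[str]:
    """Flag line items deviating more than `threshold` from the prior period."""
    flags = []
    for item, value in current.items():
        if item not in prior or prior[item] == 0:
            # Mirrors the prompt's "Data unavailable" rule for missing data.
            flags.append(f"{item}: Data unavailable for prior period")
            continue
        change = (value - prior[item]) / abs(prior[item])
        if abs(change) > threshold:
            flags.append(f"{item}: {change:+.0%} vs prior period")
    return flags
```

Computing flags in code and passing them to the model keeps the "calculate only from provided data" constraint verifiable.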

9.4 Exercise: Complex Prompts for Coding

Constraints: Correct language/framework; no unsolicited features; explain what changed and why; ask for clarification when ambiguous. Framework: <role> senior software engineer, minimal and well-documented; <principles> minimal changes, no unsolicited features, explain changes, ask first if ambiguous, note how to test; <constraints> match language/framework, no new deps without asking, don't remove error handling; <codebase_context>, <request>; <output_format>: if clarification needed ask first, else approach, code, what changed, how to test.

Congratulations & Next Steps

  • Practice: Take a real workflow and build a production prompt. Go through role → clear instructions → XML structure → examples → anti-hallucination → format specification.
  • Iterate: Good prompts aren't written once. Test against edge cases, observe failures, refine. Keep a prompt library.
  • Evaluate: For production, build evals — test inputs with expected outputs to measure prompt quality over time.
  • Resources: platform.claude.com/docs

Appendix: Beyond Standard Prompting

A.1 Chaining Prompts

Prompt chaining breaks a complex task into sequential API calls; output of one becomes input of the next. When to chain: task too complex for one prompt; you need to inspect or validate intermediate results; different steps need different roles/constraints; you want self-correction (generate → review → refine).

Self-Correction Pattern: Step 1 — Generate (e.g. "Write a blog post about X"). Step 2 — Critique (review against criteria, identify weaknesses). Step 3 — Refine (revise based on critique). Anthropic note: With adaptive thinking in Claude Opus 4.6, the model handles most multi-step reasoning internally; explicit chaining remains valuable when you need to inspect intermediate outputs or enforce a pipeline structure.
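The generate → critique → refine pattern above can be sketched as a chain over any model-calling function. Taking `call` as a parameter (normally a wrapper around an API request) keeps the sketch testable offline; the prompt wording is illustrative:

```python
def self_correct(call, task: str, criteria: str) -> str:
    """Run a generate -> critique -> refine chain.

    `call` is any function mapping a prompt string to a response string,
    e.g. a thin wrapper around a messages API request.
    """
    # Step 1: Generate a first draft.
    draft = call(f"Write: {task}")
    # Step 2: Critique the draft against explicit criteria.
    critique = call(
        f"Review this draft against these criteria: {criteria}\n\n"
        f"Draft:\n{draft}\n\nList concrete weaknesses."
    )
    # Step 3: Refine the draft based on the critique.
    return call(
        f"Revise the draft to address the critique.\n\n"
        f"Draft:\n{draft}\n\nCritique:\n{critique}"
    )
```

Because each step is a separate call, intermediate outputs (the draft and the critique) can be logged, inspected, or gated before the refine step runs.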

A.2 Tool Use

Tool use lets Claude interact with external systems: APIs, databases, code execution, search, custom functions. How it works: 1. You define tools (name, description, parameter schema). 2. Claude decides when to call them. 3. Claude formats the tool call. 4. Your code executes and returns results. 5. Claude synthesizes into a response.

Tool description best practices: be explicit about when the tool should be used; include examples of appropriate use; specify limitations; for Claude 4.6, avoid "ALWAYS use this tool" in favor of "Use this tool when..." to prevent overtriggering. Parallel tool calling: Claude Opus 4.6 can call multiple tools simultaneously when they are independent; a <use_parallel_tool_calls> instruction tells it to issue such calls in parallel and never to substitute placeholder values for parameters it doesn't have.
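A tool definition following these practices, in the shape the Anthropic Messages API expects (name, description, and a JSON Schema under `input_schema`). The ferry tool itself is a hypothetical example tied to the BateauBillet chatbot above:

```python
# Hypothetical tool definition; note the "Use this tool when..." phrasing
# and the explicit limitation ("Do not use it for prices").
get_ferry_schedule = {
    "name": "get_ferry_schedule",
    "description": (
        "Use this tool when the user asks about ferry departure or arrival "
        "times for a specific route and date. Do not use it for prices. "
        "Example: 'When does the next ferry leave Nice for Corsica?'"
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "origin": {"type": "string", "description": "Departure port"},
            "destination": {"type": "string", "description": "Arrival port"},
            "date": {"type": "string", "description": "ISO date, e.g. 2025-06-01"},
        },
        "required": ["origin", "destination", "date"],
    },
}
```

The description carries the "when to use" guidance; the schema's `required` list is what prevents the model from inventing placeholder arguments.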

A.3 Search & Retrieval (RAG)

RAG fetches relevant documents and injects them into Claude's context before answering. Pipeline: user asks → system searches knowledge base → top results injected → Claude answers from injected docs. Prompting: Answer using ONLY <search_results>; cite source document for each claim; if insufficient say "Based on available documentation..."; if unanswerable say "I don't have information about that in my knowledge base." Long context: Put the user question after the documents; Anthropic research shows up to 30% accuracy improvement on complex multi-document queries.
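The injection step, with documents placed before the question per the long-context guidance above, can be sketched as follows (result-dict keys and tag names are illustrative):

```python
def rag_prompt(results: list[dict], question: str) -> str:
    """Build a RAG prompt: retrieved documents first, user question last."""
    docs = "\n".join(
        f'<result source="{r["source"]}">\n{r["text"]}\n</result>'
        for r in results
    )
    return (
        f"<search_results>\n{docs}\n</search_results>\n\n"
        "Answer using ONLY the <search_results> above. "
        "Cite the source document for each claim. If the results do not "
        "contain the answer, say \"I don't have information about that "
        "in my knowledge base.\"\n\n"
        f"Question: {question}"
    )
```

Tagging each result with its source is what makes the "cite the source document for each claim" instruction actionable.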

Quick Reference

  • Prompt Structure — System = persistent context; user message = specific task
  • Be Clear & Direct — Specify format, length, tone, and audience explicitly
  • Role Assignment — Set persona in system prompt with specific expertise and context
  • XML Tags — Use <instructions>, <context>, <document> to separate content
  • Output Format — Tell Claude what TO do; match prompt style to desired output
  • Step-by-Step — Ask Claude to reason before answering for complex tasks
  • Examples — 3–5 diverse examples in <example> tags; include edge cases
  • Anti-Hallucination — Ground in documents; require citations; allow "I don't know"
  • Prompt Chaining — Generate → Review → Refine for high-stakes outputs
  • Tool Use — Define tools with "when to use" descriptions; enable parallel calling
  • RAG — Put retrieved docs BEFORE the query; require source citations
  • Adaptive Thinking — Use effort parameter (low/medium/high/max) to control depth

Full Anthropic documentation: platform.claude.com/docs

Frequently Asked Questions

How do I reduce hallucinations in Claude's answers?
Ground answers in provided documents only; explicitly allow "I don't know"; require citations or quotes for factual claims; ask Claude to verify before finishing; for numbers show the calculation; use structured output (e.g. JSON).
When should I use prompt chaining vs. a single prompt?
Use chaining when the task is too complex for one prompt, you need to inspect or validate intermediate results, different steps need different roles, or you want a generate → review → refine pipeline. Claude 4.6 handles much multi-step reasoning internally; chaining is still useful for inspectable pipelines.
What is the best order for RAG prompts?
Put the retrieved documents first, then the user question last. Anthropic's research shows this can improve accuracy by up to 30% on complex multi-document queries. Always require source citations in the response.