Python | MIT | structured-output | tools

Instructor

The simplest path to structured, typed outputs from any LLM using Pydantic

Instructor is a Python library that patches LLM clients to return validated Pydantic models instead of raw text. You define the shape of the data you want, Instructor handles the prompting strategy, and if validation fails, it retries automatically with the error message included. It supports OpenAI, Anthropic, Google Gemini, Cohere, Mistral, and OpenAI-compatible local endpoints, and has ports in TypeScript, Go, and Elixir.

Getting structured data out of an LLM should be simple. You describe the shape you want, the model fills it in, your application uses it. In practice, without the right tooling, you end up writing prompt engineering hacks to get JSON, parsing that JSON manually, validating it yourself, and handling the cases where the model returned something subtly wrong. At that point you have hand-written the same validation and retry loop that every LLM application needing structured output ends up with.

Instructor is that loop, done right, so you don't have to write it yourself.

What Instructor is

Instructor is a Python library that patches LLM client objects to return validated Pydantic models. The core interface is three steps: import instructor, patch your existing API client, and add response_model to your API call. Everything else is handled by the library.

The GitHub repository at 567-labs/instructor has over 10,500 stars. The project is MIT-licensed and maintained with active development. Jason Liu, the original author, has shipped TypeScript, Go, and Elixir ports so teams outside the Python ecosystem have the same option.

It is deliberately narrow. Instructor does not build agents, manage conversation history, orchestrate tool chains, or provide vector store integrations. It solves one problem: making structured extraction from LLMs reliable and ergonomic.

The basics: from raw text to Pydantic models

Without Instructor, a typical structured extraction flow looks like this:

import openai
import json

client = openai.OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract the person's name and age from the text. Return JSON."},
        {"role": "user", "content": "John Smith is 34 years old and works as an engineer."},
    ]
)

data = json.loads(response.choices[0].message.content)
# Now validate manually: is "name" there? Is "age" an integer? Did the model return extra fields?

With Instructor:

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class Person(BaseModel):
    name: str
    age: int
    occupation: str

person = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Person,
    messages=[
        {"role": "user", "content": "John Smith is 34 years old and works as an engineer."},
    ]
)

print(person.name)       # "John Smith"
print(person.age)        # 34
print(person.occupation) # "engineer"

person is a fully typed Person instance. No JSON parsing. No manual validation. If the LLM returns age as a numeric string such as "34" instead of an integer, Pydantic's default coercion converts it. If validation fails outright, Instructor re-asks the model, within the configured max_retries.
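The coercion and failure behaviour can be checked without calling a model at all. This is a minimal sketch, assuming Pydantic v2 in its default (lax) validation mode; the dictionaries are stand-ins for whatever the LLM returned.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int
    occupation: str


# A numeric string is coerced to int in Pydantic's default (lax) mode.
coerced = Person.model_validate(
    {"name": "John Smith", "age": "34", "occupation": "engineer"}
)
print(type(coerced.age).__name__, coerced.age)

# A non-numeric string cannot be coerced, so validation fails.
# This is the kind of error that would lead to a retry.
try:
    Person.model_validate(
        {"name": "John Smith", "age": "thirty-four", "occupation": "engineer"}
    )
    failed = False
    error_fields = []
except ValidationError as exc:
    failed = True
    error_fields = [err["loc"] for err in exc.errors()]
    print(error_fields)
```

Coercion only covers lossless conversions like the first case. Anything else surfaces as a ValidationError naming the offending field.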

Validation and automatic retry

This is where Instructor earns its keep. Pydantic validation is not just type checking. You can add arbitrary constraints using Pydantic validators:

from pydantic import BaseModel, field_validator
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class ProductReview(BaseModel):
    product_name: str
    rating: int
    summary: str
    sentiment: str

    @field_validator("rating")
    @classmethod
    def rating_must_be_in_range(cls, v):
        if not 1 <= v <= 5:
            raise ValueError("Rating must be between 1 and 5")
        return v

    @field_validator("sentiment")
    @classmethod
    def sentiment_must_be_valid(cls, v):
        if v not in ("positive", "negative", "neutral"):
            raise ValueError(f"Sentiment must be positive, negative, or neutral, got {v!r}")
        return v

review = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=ProductReview,
    max_retries=3,
    messages=[
        {"role": "user", "content": "Great product! Works exactly as described. 5 stars."}
    ]
)

If the model returns rating: 6 or sentiment: "happy", the validator raises and Pydantic reports a validation error. Instructor catches it, sends the error text back to the model in the next request (here, the validator's own message, "Rating must be between 1 and 5"), and the model gets a chance to correct itself. This happens automatically inside the same create() call, bounded by max_retries.
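To make the mechanism concrete, here is a conceptual sketch of a validate-and-re-ask loop. It is not Instructor's actual implementation. It uses only the standard library, `validate_review` is a hand-rolled stand-in for the Pydantic validators above, and `make_scripted_model` is a hypothetical stub that replays canned replies so the example runs without an API key.

```python
import json


def validate_review(data: dict) -> dict:
    """Hand-rolled stand-in for the Pydantic validators above."""
    rating = data.get("rating")
    if not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError("Rating must be between 1 and 5")
    if data.get("sentiment") not in ("positive", "negative", "neutral"):
        raise ValueError("Sentiment must be positive, negative, or neutral")
    return data


def extract_with_retries(call_model, messages: list, max_attempts: int = 3) -> dict:
    """Call the model, validate the reply, and re-ask with the error on failure."""
    messages = list(messages)  # do not mutate the caller's list
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(messages)
        try:
            # json.JSONDecodeError is a subclass of ValueError.
            return validate_review(json.loads(raw))
        except ValueError as exc:
            last_error = exc
            # Feed the failed reply and the error back as extra context.
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {
                    "role": "user",
                    "content": f"Validation failed: {exc}. Please correct your response.",
                }
            )
    raise RuntimeError(f"No valid response after {max_attempts} attempts") from last_error


def make_scripted_model(responses):
    """Hypothetical stub: returns canned replies and records what it was sent."""
    calls = []
    replies = iter(responses)

    def call_model(messages):
        calls.append(list(messages))
        return next(replies)

    return call_model, calls


call_model, calls = make_scripted_model(
    [
        '{"rating": 6, "sentiment": "happy"}',     # fails validation
        '{"rating": 5, "sentiment": "positive"}',  # corrected reply
    ]
)
review = extract_with_retries(
    call_model, [{"role": "user", "content": "Great product! 5 stars."}]
)
print(review)
print(len(calls))
```

The second request carries the first reply and the validation message, which is what gives the model something concrete to correct.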

In practice, well-specified schemas with GPT-4o or Claude rarely need retries. The retry mechanism matters most for:

Complex nested schemas where a weaker model loses track of required fields
Strict enum constraints the model occasionally violates
Numeric ranges or string patterns that require semantic understanding to satisfy
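For the enum and range cases in particular, one option is to declare the constraint in the type annotation rather than in a validator body. Declared constraints are emitted in the model's JSON schema, while the logic inside a field_validator is not. The sketch below assumes Pydantic v2. Whether a given provider honours these schema keywords is not guaranteed, so validation still runs on the response either way.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class ProductReview(BaseModel):
    product_name: str
    # Range declared on the field, so it appears in the schema as minimum/maximum.
    rating: int = Field(ge=1, le=5, description="Star rating from 1 to 5")
    summary: str
    # Allowed values declared as a Literal, so they appear in the schema as an enum.
    sentiment: Literal["positive", "negative", "neutral"]


schema = ProductReview.model_json_schema()
rating_schema = schema["properties"]["rating"]
sentiment_schema = schema["properties"]["sentiment"]
print(rating_schema)
print(sentiment_schema)

# The same declarations are still enforced at validation time.
try:
    ProductReview(product_name="Widget", rating=6, summary="ok", sentiment="happy")
    rejected = False
    bad_fields = []
except ValidationError as exc:
    rejected = True
    bad_fields = sorted(err["loc"][0] for err in exc.errors())
    print(bad_fields)
```

Custom validators remain the right tool for rules a schema cannot express, such as cross-field checks.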

Multiple provider support

Instructor patches the official SDK for each provider with the same from_<provider> pattern:

import instructor
from anthropic import Anthropic
from google.generativeai import GenerativeModel
from openai import OpenAI

# OpenAI
openai_client = instructor.from_openai(OpenAI())

# Anthropic
anthropic_client = instructor.from_anthropic(Anthropic())

# Google Gemini
gemini_client = instructor.from_gemini(
    client=GenerativeModel(model_name="gemini-2.0-flash"),
)

# Local model via Ollama (OpenAI-compatible)
local_client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
)

The response_model interface is the same across providers. Switching providers mostly means changing the client initialization and the model name; your data models stay untouched. Some providers need their own required arguments (Anthropic's API, for example, requires max_tokens on each call). This is particularly useful when you're comparing model performance on an extraction task: swap the client, run the same extraction, compare results.
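One way to structure such a comparison is to put the extraction call in a function that takes the patched client as an argument. In this sketch `extract_person`, `StubClient`, the provider labels, and the model names are all hypothetical. The stub only mimics the chat.completions.create(response_model=...) call shape so the example runs without API keys, and a dataclass stands in for the Pydantic model.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Person:
    # Stand-in for the Pydantic Person model used earlier.
    name: str
    age: int


def extract_person(client, model: str, text: str, **extra):
    """Same call shape regardless of which provider backs `client`."""
    return client.chat.completions.create(
        model=model,
        response_model=Person,
        messages=[{"role": "user", "content": text}],
        **extra,  # provider-specific arguments, e.g. max_tokens
    )


class StubClient:
    """Hypothetical fake that mimics a patched client's call shape."""

    def __init__(self, canned: Person):
        self.canned = canned
        self.last_kwargs = None
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, **kwargs):
        self.last_kwargs = kwargs
        return self.canned


clients = {
    "provider-a": (StubClient(Person("John Smith", 34)), "model-a", {}),
    "provider-b": (StubClient(Person("John Smith", 34)), "model-b", {"max_tokens": 1024}),
}
text = "John Smith is 34 years old and works as an engineer."
results = {
    name: extract_person(client, model, text, **extra)
    for name, (client, model, extra) in clients.items()
}
# Do all providers agree on the extracted values?
agree = len({(p.name, p.age) for p in results.values()}) == 1
print(agree)
```

In real use the dictionary would hold the patched clients created above, and the agreement check would be replaced by whatever accuracy metric suits the task.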

The mode parameter lets you control which underlying prompting strategy Instructor uses for each provider: instructor.Mode.TOOLS, instructor.Mode.JSON, instructor.Mode.ANTHROPIC_TOOLS, and others. Most users don't need to touch this, but it matters for providers where different strategies produce measurably different results.

Complex extraction patterns

Nested models

Instructor handles arbitrarily nested Pydantic models:

from pydantic import BaseModel
from typing import Optional
import instructor
from openai import OpenAI

class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: Optional[str] = None

class Contact(BaseModel):
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    address: Optional[Address] = None

class Company(BaseModel):
    name: str
    industry: str
    founded_year: Optional[int] = None
    headquarters: Address
    key_contacts: list[Contact]

client = instructor.from_openai(OpenAI())

company = client.chat.completions.create(
    model="gpt-4o",
    response_model=Company,
    messages=[
        {"role": "user", "content": """
        Anthropic was founded in 2021 in San Francisco, CA, USA. It operates in the AI industry.
        Key contact: Dario Amodei (CEO) can be reached at [email protected].
        """}
    ]
)

print(company.headquarters.city)        # "San Francisco"
print(company.key_contacts[0].name)     # "Dario Amodei"

The model receives the full nested JSON schema in its tool definition and fills it in. Validation applies recursively: if Address.country has a validator that checks against an ISO country code list, that validation runs on the nested object the same way it would on a top-level field.
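The recursive behaviour can be seen with plain Pydantic, no LLM involved. This is a sketch assuming Pydantic v2, with trimmed-down versions of the models above. `ISO_COUNTRY_CODES` is a hypothetical three-entry subset for illustration, and a real check would use a full ISO 3166 list.

```python
from pydantic import BaseModel, ValidationError, field_validator

# Hypothetical, deliberately tiny subset of ISO 3166 alpha-2 codes.
ISO_COUNTRY_CODES = {"US", "DE", "JP"}


class Address(BaseModel):
    city: str
    country: str

    @field_validator("country")
    @classmethod
    def country_must_be_iso_code(cls, v: str) -> str:
        code = v.strip().upper()
        if code not in ISO_COUNTRY_CODES:
            raise ValueError(f"Unknown country code {v!r}")
        return code  # normalised to upper case


class Company(BaseModel):
    name: str
    headquarters: Address


# The nested validator runs and normalises the value.
ok = Company.model_validate(
    {"name": "Acme", "headquarters": {"city": "Berlin", "country": "de"}}
)
print(ok.headquarters.country)

# A bad nested value is reported with its full path.
try:
    Company.model_validate(
        {"name": "Acme", "headquarters": {"city": "Berlin", "country": "Germany"}}
    )
    error_locations = []
except ValidationError as exc:
    error_locations = [err["loc"] for err in exc.errors()]
    print(error_locations)
```

The error location includes the path to the nested field, so feedback sent to the model on a retry can point at the exact field that failed.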

List extraction: multiple objects at once

from pydantic import BaseModel
from typing import Iterable, Optional
import instructor
from openai import OpenAI

class Action(BaseModel):
    owner: str
    task: str
    due_date: Optional[str] = None

client = instructor.from_openai(OpenAI())

actions = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Iterable[Action],
    messages=[
        {"role": "user", "content": """
        Meeting notes from Monday:
        - Sarah will update the mockups by Friday
        - Jake needs to review the copy before next Wednesday
        - The whole team should read the Q1 report before Thursday's call
        """}
    ]
)

for action in actions:
    print(f"{action.owner}: {action.task}")

Iterable[Action] extracts a list of typed objects from a single prompt. This is the most common non-trivial extraction pattern: one document contains multiple instances of the same entity type, and you want all of them as a typed list.
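If you also need document-level fields alongside the list, an alternative is a wrapper model with a list field, used as response_model like any other nested model. A sketch assuming Pydantic v2: `MeetingActions` and its `meeting_day` field are hypothetical, and the payload dictionary stands in for a model response.

```python
from typing import Optional

from pydantic import BaseModel


class Action(BaseModel):
    owner: str
    task: str
    due_date: Optional[str] = None


class MeetingActions(BaseModel):
    # Hypothetical wrapper: one document-level field plus the extracted list.
    meeting_day: Optional[str] = None
    actions: list[Action]


# Stand-in for what the model might return for the meeting notes above.
payload = {
    "meeting_day": "Monday",
    "actions": [
        {"owner": "Sarah", "task": "Update the mockups", "due_date": "Friday"},
        {"owner": "Jake", "task": "Review the copy"},
    ],
}

meeting = MeetingActions.model_validate(payload)
owners = [action.owner for action in meeting.actions]
print(meeting.meeting_day, owners)
```

Each list item is validated as an Action, so a malformed entry fails validation for the whole wrapper.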

Streaming partial models

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class AnalysisResult(BaseModel):
    topic: str
    key_points: list[str]
    conclusion: str

result_stream = client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=AnalysisResult,
    messages=[
        {"role": "user", "content": "Analyze the impact of remote work on urban real estate markets."}
    ]
)

for partial_result in result_stream:
    if partial_result.topic:
        print(f"Topic: {partial_result.topic}")
    if partial_result.key_points:
        print(f"Points so far: {partial_result.key_points}")

create_partial yields progressively more complete instances of your model as tokens arrive. Each yielded object is a valid partial state of the model: fields that haven't been generated yet are None. This is useful for UI applications where you want to render available fields as they complete rather than waiting for the entire response.
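Because each yielded object carries every field generated so far, a loop like the one above prints the same topic on every iteration. One option is to render only what changed between consecutive snapshots. In this sketch `changed_fields` is a hypothetical helper, and the hard-coded dictionaries stand in for what partial_result.model_dump() might return at successive points in a stream.

```python
def changed_fields(previous: dict, current: dict) -> dict:
    """Return fields that are set and differ from the previous snapshot."""
    return {
        key: value
        for key, value in current.items()
        if value is not None and previous.get(key) != value
    }


# Simulated snapshots, one per iteration of the streaming loop.
snapshots = [
    {"topic": None, "key_points": None, "conclusion": None},
    {"topic": "Remote work", "key_points": None, "conclusion": None},
    {"topic": "Remote work", "key_points": ["Lower office demand"], "conclusion": None},
    {
        "topic": "Remote work",
        "key_points": ["Lower office demand", "Suburban growth"],
        "conclusion": None,
    },
]

updates = []
previous: dict = {}
for snapshot in snapshots:
    delta = changed_fields(previous, snapshot)
    if delta:  # skip iterations where nothing new arrived
        updates.append(delta)
    previous = snapshot

for update in updates:
    print(update)
```

A UI layer can then re-render only the fields present in each update.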

Async support

import asyncio
import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

client = instructor.from_openai(AsyncOpenAI())

class Summary(BaseModel):
    title: str
    key_points: list[str]

async def summarize(text: str) -> Summary:
    return await client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=Summary,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )

async def main():
    texts = ["text one...", "text two...", "text three..."]
    results = await asyncio.gather(*[summarize(t) for t in texts])
    for result in results:
        print(result.title)

asyncio.run(main())

The async client is a drop-in for the synchronous version. Concurrent extraction across many items with asyncio.gather is a common pattern for batch processing where you're calling an LLM for each item in a dataset.
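A bare asyncio.gather starts every request at once, which can run into provider rate limits on large datasets. A common way to cap concurrency is an asyncio.Semaphore. In this sketch `gather_bounded` is a hypothetical helper, and `fake_summarize` is a stand-in for the LLM-backed summarize() coroutine so the example runs offline.

```python
import asyncio


async def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for the LLM-backed summarize() coroutine."""
    await asyncio.sleep(0.01)
    return text.upper()


async def gather_bounded(items, worker, limit: int):
    """Run worker(item) for every item with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest concurrency observed, tracked for demonstration

    async def run_one(item):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await worker(item)
            finally:
                in_flight -= 1

    # gather preserves input order in its result list.
    results = await asyncio.gather(*(run_one(item) for item in items))
    return results, peak


texts = [f"text {i}" for i in range(10)]
results, peak = asyncio.run(gather_bounded(texts, fake_summarize, limit=3))
print(results[:2], peak)
```

Swapping fake_summarize for the real summarize() keeps the same structure. The right limit depends on your provider's rate limits.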

Hooks and observability

Instructor provides hooks for instrumenting LLM calls:

import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

def log_completion_kwargs(*args, **kwargs):
    # The request arguments arrive as keyword arguments, not as a single dict.
    print(f"Request: {kwargs}")

def log_completion_response(response):
    print(f"Tokens used: {response.usage.total_tokens}")

client.on("completion:kwargs", log_completion_kwargs)
client.on("completion:response", log_completion_response)

For teams using Langfuse or LangSmith, both platforms have documented integrations with Instructor that route traces through their dashboards automatically. This is worth setting up in production to track costs and debug extraction failures.
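For lightweight cost tracking without an external platform, a hook handler can be a callable object that accumulates usage. In this sketch `UsageTracker` is a hypothetical class, and the simulated responses assume the OpenAI-style response.usage.total_tokens shape used in the handler above.

```python
from types import SimpleNamespace


class UsageTracker:
    """Callable handler that accumulates token counts across responses."""

    def __init__(self) -> None:
        self.calls = 0
        self.total_tokens = 0

    def __call__(self, response) -> None:
        self.calls += 1
        usage = getattr(response, "usage", None)
        if usage is not None:
            self.total_tokens += usage.total_tokens or 0


tracker = UsageTracker()
# With a real patched client this would be registered as:
#   client.on("completion:response", tracker)

# Simulated responses shaped like OpenAI completions (assumption for the demo).
for tokens in (120, 85):
    tracker(SimpleNamespace(usage=SimpleNamespace(total_tokens=tokens)))
tracker(SimpleNamespace(usage=None))  # a response without usage data is still counted

print(tracker.calls, tracker.total_tokens)
```

Since retries fire the hook on every attempt, a tracker like this would also count the cost of failed attempts.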

Where Instructor fits in a larger system

Instructor is a utility, not a framework. You reach for it in specific places within a larger system:

In a LangGraph workflow, you might use Instructor at the nodes that need to parse LLM output into structured state. LangGraph handles the control flow; Instructor handles the typing.

In a DSPy program, you probably don't need Instructor because DSPy has its own output handling via Signatures. But for teams who don't want to adopt DSPy's full compilation model, Instructor gives them reliable typed outputs with a much smaller API surface.

In an AutoGen multi-agent system, you might use Instructor inside individual agent tool implementations where a tool needs to extract structured data from a sub-query.

The pattern that shows up most often: a developer starts with raw LLM calls, needs structured output, reaches for Instructor as the path of least resistance, and keeps using it because the alternative is writing the same retry and validation logic themselves.

Instructor vs Pydantic AI

This comparison comes up often. Pydantic AI is a full agent framework from the Pydantic team. It handles multi-step agent loops, tool calling, agent state, and structured outputs, all using Pydantic throughout.

Instructor is strictly about structured extraction. It does not build agents. If you need a structured response from a single LLM call or a small set of calls, Instructor is the simpler choice. If you need an agent that plans, uses tools over multiple iterations, and maintains state, Pydantic AI gives you the right primitives. They serve different use cases and the decision is usually clear based on whether you need an agent loop or just a typed output.

TypeScript, Go, and Elixir ports

The Python library is the primary implementation, but the pattern has been ported to other languages:

TypeScript/JavaScript: npm install @instructor-ai/instructor at instructor-ai/instructor-js
Go: go get github.com/instructor-ai/instructor-go
Elixir: {:instructor, "~> 0.1"} in hex.pm

The TypeScript port is the most actively maintained non-Python version and supports OpenAI and Anthropic. For teams building TypeScript services who want Instructor's structured extraction pattern, the JS port covers the most common use cases. The parity with the Python version is not complete, but the core response_model pattern works.

Getting started

pip install instructor

Full example with error handling:

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator
from typing import Optional

client = instructor.from_openai(OpenAI())

class ExtractedEvent(BaseModel):
    event_name: str
    date: Optional[str] = None
    location: Optional[str] = None
    attendee_count: Optional[int] = None

    @field_validator("attendee_count")
    @classmethod
    def non_negative_count(cls, v):
        if v is not None and v < 0:
            raise ValueError("Attendee count cannot be negative")
        return v

try:
    event = client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=ExtractedEvent,
        max_retries=2,
        messages=[
            {"role": "user", "content": "The annual Python conference had 1200 attendees in Berlin on June 15th."}
        ]
    )
    print(event.model_dump())
except instructor.exceptions.InstructorRetryException as e:
    print(f"Extraction failed after retries: {e}")

The official documentation at python.useinstructor.com is among the best in the LLM tooling ecosystem. The cookbook section covers dozens of real extraction patterns with working code.

The verdict

Instructor is one of the most useful single-problem tools in the LLM development ecosystem. It does not try to be a framework. It patches an existing client, adds validation and retry, and gets out of the way. The Pydantic integration means any team that already uses Pydantic gets typed LLM outputs for nearly free.

The 10,500-star community, active maintenance, and multi-language support suggest this is a project with real staying power. The MIT license means there are no commercial restrictions.

If your application needs structured output from an LLM, start with Instructor. The alternative is writing the same retry and validation loop yourself, which most teams do poorly on the first attempt. Instructor gets it right, and the API is simple enough that adding it to an existing project takes under an hour.

Key features

Pydantic model validation for structured LLM outputs with automatic retry on failure
Support for OpenAI, Anthropic, Google Gemini, Cohere, Mistral, and local models
Streaming support with partial Pydantic model hydration as tokens arrive
Hooks for logging, retrying, and observing LLM calls
Nested model support for complex structured extraction
Parallel tool call support for extracting multiple objects at once
Async client support for concurrent extractions

Frequently Asked Questions

What is Instructor?

Instructor is a Python library that wraps LLM API clients to return validated Pydantic models instead of raw text responses. You define a Pydantic model representing the data you want, add `response_model` to your API call, and Instructor handles the prompting strategy and validation. If the LLM returns something that doesn't validate against your model, Instructor retries the call automatically with the validation error included as feedback, which usually produces a valid response on the next attempt.

How does Instructor handle validation failures?

When an LLM response fails Pydantic validation, Instructor catches the validation error and makes another API call with the original request plus the error message appended as context. The LLM sees what went wrong and typically corrects its output. You control the retry budget via the `max_retries` parameter. Most well-specified schemas validate on the first try; the retry mechanism is a safety net for complex schemas or weaker LLMs.

Which LLMs does Instructor support?

Instructor supports OpenAI (including GPT-4o and o1/o3 series), Anthropic Claude, Google Gemini, Cohere, Mistral, and any OpenAI-compatible endpoint including Ollama and LM Studio for local models. The library patches the official Python SDK for each provider, so you get the same Instructor interface regardless of which model you're using.

Is Instructor the same as using function calling or JSON mode?

Instructor uses function calling, tool use, or JSON mode under the hood depending on the provider and the mode you configure. The difference is that Instructor adds Pydantic validation and automatic retry on top of those primitives. Using raw function calling or JSON mode, you get a string or a dict that you still have to validate yourself. With Instructor, you get a fully validated Pydantic model or an exception after the retry budget is exhausted, with no manual validation code.

How does Instructor compare to Pydantic AI?

Pydantic AI is a full agent framework built by the Pydantic team. It handles agent orchestration, tool calling, multi-step workflows, and uses Pydantic for structured outputs throughout. Instructor is a much narrower tool: it solves one problem (structured extraction from LLMs) and does it very well. If you need an agent that can plan, use tools, and loop over multiple steps, Pydantic AI is the better fit. If you need reliable typed outputs from an LLM call in any context, Instructor is simpler and requires less framework buy-in.

Does Instructor work with streaming?

Yes. Instructor supports streaming with partial model hydration. As tokens arrive, it builds up the Pydantic model progressively. You can start processing fields as they complete rather than waiting for the full response. This is useful for UI applications that want to show partial results in real time, or for pipelines where early fields in the model can trigger downstream actions before the full extraction finishes.