Python Apache-2.0 memory rag orchestration

Embedchain (Mem0)

Memory layer for AI agents and applications, evolved from the Embedchain RAG framework

Embedchain started as a straightforward RAG framework in 2023 and evolved into Mem0, a memory layer for AI agents that extracts, stores, and retrieves contextual memories across sessions. The underlying library at github.com/mem0ai/mem0 handles both use cases: RAG pipelines through the legacy Embedchain API and persistent cross-session memory through the newer Mem0 interface.

One of the most common complaints about AI chatbots and agents is that they forget everything. You explain your preferences in one conversation and have to repeat them the next. You build context over a dozen turns and it all disappears when the session ends. The solution most people reach for first is stuffing the entire conversation history into the context window, which gets expensive fast and still doesn't persist between sessions.

Mem0 (which started as Embedchain) attacks this problem differently: extract what matters, store it, and retrieve it when it's relevant. It's a memory layer, not a new agent framework, which is both its strength and the source of some confusion about what it actually is.

The Embedchain-to-Mem0 story

Understanding what this tool is requires understanding where it came from.

Embedchain launched in 2023 as a RAG framework. The pitch was simple: give it documents and URLs, it handles chunking, embedding, and retrieval, and you query it conversationally. It grew quickly, reaching tens of thousands of GitHub stars, because it solved the "I want to make my data queryable by an LLM" problem without requiring you to understand the entire LangChain stack.

In 2024, the team recognized that the hardest problem in production AI systems wasn't RAG (LangChain and LlamaIndex had that reasonably covered) but memory: making agents remember things across sessions, across users, and across deployments. They pivoted the product toward persistent agent memory and rebranded as Mem0.

The GitHub repo at github.com/mem0ai/mem0 now contains both the legacy Embedchain interface and the new Mem0 interface. The project has roughly 27,000 stars. Embedchain is in maintenance mode: the code still works, but new features are in the Mem0 layer. If you're starting a new project, use Mem0. If you have existing Embedchain code, it runs, but don't expect new capabilities from it.

What Mem0 actually does

The core abstraction is a Memory client. You add memories to it, and you search it to retrieve relevant memories for a given context.

from mem0 import Memory

m = Memory()

# Add a memory after a user interaction
result = m.add(
    "I prefer Python for data science but TypeScript for web services.",
    user_id="alice"
)

# Retrieve relevant memories before generating a response
memories = m.search("What language should I use for this web project?", user_id="alice")
# Older releases return a list: [{"memory": "Prefers TypeScript for web services", "score": 0.87}]
# Newer releases wrap the same list: {"results": [...]}

Under the hood, when you call m.add(), Mem0 sends the content to an LLM with a prompt that extracts discrete facts. It stores those facts as vector embeddings in your configured vector store. When you call m.search(), it does a semantic search over stored memories and returns the most relevant ones.

The extraction step is what makes Mem0 different from just storing raw conversation text. Instead of storing "Alice said she prefers Python for data science but TypeScript for web services," it extracts "User prefers Python for data science" and "User prefers TypeScript for web services" as separate searchable facts. At retrieval time, a query about web projects surfaces the TypeScript fact without dragging in the Python fact.
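To see why separate facts retrieve more cleanly than raw text, here is a toy sketch. It does not call Mem0 or an LLM: the extracted facts are hard-coded, and a bag-of-words cosine similarity stands in for real embeddings. Every function name here is illustrative, not part of the Mem0 API.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Facts as an extraction LLM might produce them (hard-coded for this sketch).
facts = [
    "User prefers Python for data science",
    "User prefers TypeScript for web services",
]

def search(query: str, memories: list[str]) -> list[tuple[str, float]]:
    """Rank stored facts by similarity to the query, best match first."""
    q = embed(query)
    scored = [(m, cosine(q, embed(m))) for m in memories]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = search("Which language for a new web project?", facts)
print(ranked[0][0])
```

Because each fact is stored on its own, taking only the top result returns the TypeScript preference and leaves the Python preference out of the prompt.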

Memory scopes

Mem0 organizes memories into three scopes, and the distinction matters in practice.

User-level memory (user_id) persists across all sessions for a given user. This is where you store preferences, facts about the user, and things they've told the system across many interactions. A coding assistant that knows your preferred language, style guide, and project conventions is using user-level memory.

Session-level memory (run_id in current releases; older examples call it session_id) persists within a conversation session. This is closer to extended working memory: things that matter for the current conversation but don't need to carry over to the next one. Session memory fills the gap between short conversation history and long-term user memory.

Agent-level memory (agent_id, often combined with user_id) scopes memories to a specific agent. Useful in multi-agent systems where different agents should have access to different memory pools.

You can combine scopes. Adding a memory with both user_id="alice" and agent_id="coding-assistant" means that memory is only retrieved when Alice is using the coding assistant agent, not when she's using a different agent.
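The combined-scope behavior can be pictured as a filter over stored records. The sketch below is a toy model with a hypothetical Record type and in_scope helper; it assumes exact matching on every scope key supplied, and Mem0's actual filter logic may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Record:
    """Hypothetical stored memory with optional scope identifiers."""
    text: str
    user_id: Optional[str] = None
    agent_id: Optional[str] = None

def in_scope(record: Record, **scope: str) -> bool:
    """A record matches when every requested scope key equals the record's value."""
    return all(getattr(record, key) == value for key, value in scope.items())

store = [
    Record("Wants type hints in generated code", user_id="alice", agent_id="coding-assistant"),
    Record("Prefers window seats", user_id="alice", agent_id="travel-agent"),
]

# Alice talking to the coding assistant sees only the coding memory.
coding = [r.text for r in store if in_scope(r, user_id="alice", agent_id="coding-assistant")]
# Alice talking to the travel agent sees only the travel memory.
travel = [r.text for r in store if in_scope(r, user_id="alice", agent_id="travel-agent")]
```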

Integrating with existing frameworks

The cleanest use of Mem0 is as a thin layer added to an existing agent architecture. The pattern is consistent across frameworks: search before generating, add after responding.

With a LangChain chain:

from mem0 import Memory
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

m = Memory()
llm = ChatOpenAI(model="gpt-4o-mini")

def chat_with_memory(user_message: str, user_id: str) -> str:
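    # Retrieve relevant memories (newer Mem0 releases wrap hits in {"results": [...]})
    found = m.search(user_message, user_id=user_id)
    relevant_memories = found["results"] if isinstance(found, dict) else found
    memory_context = "\n".join(f"- {mem['memory']}" for mem in relevant_memories)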

    # Build the prompt with memory context
    messages = [
        SystemMessage(content=f"You are a helpful assistant.\n\nUser context:\n{memory_context}"),
        HumanMessage(content=user_message),
    ]

    response = llm.invoke(messages)

    # Store the exchange in memory
    m.add(f"User: {user_message}\nAssistant: {response.content}", user_id=user_id)

    return response.content

This pattern is framework-agnostic. You can apply the same search-then-add wrapper to CrewAI agent runs, AutoGen conversations, or raw OpenAI API calls. Mem0 doesn't care what orchestrates the LLM call; it just stores and retrieves facts.
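One way to make that framework independence concrete is a wrapper that accepts any callable. The sketch below is an illustration, not Mem0 code: MemoryStore is a hypothetical minimal interface (Mem0's real client takes more parameters and returns richer results), and ListStore and echo_model are stand-ins so the example runs without an LLM or vector store.

```python
from typing import Callable, Protocol

class MemoryStore(Protocol):
    """Hypothetical minimal interface for a memory backend."""
    def search(self, query: str, user_id: str) -> list[str]: ...
    def add(self, text: str, user_id: str) -> None: ...

def with_memory(
    generate: Callable[[str, str], str], store: MemoryStore
) -> Callable[[str, str], str]:
    """Wrap any (system_prompt, user_message) -> reply callable with search-then-add."""
    def wrapped(user_message: str, user_id: str) -> str:
        memories = store.search(user_message, user_id=user_id)
        context = "\n".join(f"- {m}" for m in memories)
        system_prompt = f"You are a helpful assistant.\n\nUser context:\n{context}"
        reply = generate(system_prompt, user_message)
        store.add(f"User: {user_message}\nAssistant: {reply}", user_id=user_id)
        return reply
    return wrapped

class ListStore:
    """In-memory stand-in that returns everything stored for the user."""
    def __init__(self) -> None:
        self.items: dict[str, list[str]] = {}

    def search(self, query: str, user_id: str) -> list[str]:
        return list(self.items.get(user_id, []))

    def add(self, text: str, user_id: str) -> None:
        self.items.setdefault(user_id, []).append(text)

def echo_model(system_prompt: str, user_message: str) -> str:
    # Placeholder for a CrewAI run, an AutoGen conversation, or a raw API call.
    return f"(reply to: {user_message})"

store = ListStore()
chat = with_memory(echo_model, store)
first = chat("I deploy on Fly.io", "alice")
```

Swapping echo_model for a real model call, or ListStore for an adapter around the Mem0 client, leaves the wrapper itself unchanged.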

Mem0 Cloud vs self-hosted

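For self-hosting, Mem0 needs two things: an LLM (for memory extraction) and a vector store (for storage). The default LLM provider is OpenAI. Recent releases default to a local on-disk Qdrant instance for the vector store, and Chroma, used in the example below, is an equally lightweight option, which means you can run the whole thing locally with minimal setup.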

from mem0 import Memory

config = {
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o-mini", "api_key": "sk-..."}
    },
    "vector_store": {
        "provider": "chroma",
        "config": {"collection_name": "my_memories"}
    }
}

m = Memory.from_config(config)

For production self-hosting, replace Chroma with Qdrant or Pinecone, which handle concurrent writes and scale better.

Mem0 Cloud is the hosted option at mem0.ai. You get a managed API endpoint, no infrastructure to run, and a dashboard for monitoring memory operations. The free tier covers 10,000 memory operations per month, which sounds like a lot but gets consumed quickly by a chatbot with frequent users. Paid plans start at $49/month.
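A quick back-of-the-envelope calculation shows how fast the free tier goes. The sketch assumes each conversation turn costs two operations (one search, one add) and that each call counts as one operation; how Mem0 Cloud actually meters usage is an assumption here, so check the pricing page before relying on these numbers.

```python
FREE_TIER_OPS = 10_000  # monthly allowance quoted above

def monthly_ops(
    daily_users: int,
    turns_per_user_per_day: int,
    ops_per_turn: int = 2,  # assumed: one search + one add per turn
    days: int = 30,
) -> int:
    """Estimate memory operations consumed in a month."""
    return daily_users * turns_per_user_per_day * ops_per_turn * days

usage = monthly_ops(daily_users=50, turns_per_user_per_day=5)
print(usage, usage > FREE_TIER_OPS)
```

Under these assumptions, 50 daily users at 5 turns each consume 15,000 operations a month, which is already over the free allowance.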

The Mem0 Cloud API mirrors the open-source library. Switching from self-hosted to cloud means swapping the client class and supplying an API key:

from mem0 import MemoryClient  # Cloud client

m = MemoryClient(api_key="your-mem0-api-key")
# Same add/search interface as the self-hosted client

The original Embedchain RAG use case

The legacy Embedchain API still works for RAG pipelines. If you inherited an Embedchain codebase or want a simple document Q&A setup, it's still functional.

from embedchain import App

app = App()
app.add("https://example.com/docs/", data_type="web_page")
app.add("path/to/document.pdf", data_type="pdf_file")

response = app.query("What does the documentation say about authentication?")

For new projects, you'd be better served by Dify or LlamaIndex for full-featured RAG pipelines. The Embedchain API is maintained for backward compatibility, not as the primary focus of the project.

Where Mem0 works well

Mem0 shines when you're building something that genuinely needs to remember users over time. Personal assistants, coaching tools, long-running research agents, and customer support systems that should recall previous interactions are all good fits.

The integration-first design is a genuine practical advantage. Adding persistent memory to an existing CrewAI or LangChain setup takes a few lines of code. You don't need to replace your framework, redesign your architecture, or migrate your existing logic.

The automatic extraction is convenient, but it makes memory quality depend on your extraction LLM. With a capable model (GPT-4o, Claude Sonnet), the extracted memories are accurate and well-delineated. With a smaller or less capable model, you get noisier memories that include irrelevant details or miss important facts. Budget at least GPT-4o-mini for the extraction step; the cost savings from using a cheaper extraction model are usually not worth the quality degradation.

Where Mem0 struggles

Memory retrieval relevance is not perfect. In production, you'll encounter cases where the wrong memories are surfaced (false positives) or relevant memories aren't retrieved (false negatives). Both cause problems: false positives pollute the context with incorrect assumptions about the user; false negatives make the agent seem forgetful even though it stored the information correctly.

The fix is usually tuning the retrieval threshold (the minimum similarity score for a memory to be included) and the number of memories retrieved per query. The right values depend on your specific use case and require experimentation with real data.
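Both knobs can be applied as a post-filter on whatever the search returns. This is a minimal sketch with a hypothetical select_memories helper and made-up scores; it assumes a higher score means a closer match, which is not true for stores that return distances instead of similarities.

```python
def select_memories(hits: list[dict], min_score: float = 0.5, top_k: int = 3) -> list[dict]:
    """Keep hits at or above min_score, then return the top_k highest-scoring ones."""
    kept = [h for h in hits if h["score"] >= min_score]
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept[:top_k]

# Illustrative search hits; the scores are invented for the example.
hits = [
    {"memory": "Prefers TypeScript for web services", "score": 0.87},
    {"memory": "Works on a FastAPI project", "score": 0.55},
    {"memory": "Prefers window seats", "score": 0.21},
]

strict = select_memories(hits, min_score=0.8, top_k=3)  # fewer false positives
loose = select_memories(hits, min_score=0.2, top_k=2)   # fewer false negatives
```

Raising min_score trades false positives for false negatives, and top_k caps how much context the memories can occupy, so the two are worth tuning together against logged real queries.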

Storage costs grow linearly with the number of users and conversations. At scale, the vector store becomes a real infrastructure concern. Mem0's default configuration doesn't include memory cleanup or summarization: old memories accumulate indefinitely. For long-running production systems, you'll want to implement a memory lifecycle policy.
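A lifecycle policy can start very simply: expire memories past a maximum age and cap how many each user keeps. The sketch below works on plain dictionaries with hypothetical id, user_id, and created_at fields; in a real deployment you would list the stored memories, run a policy like this, and delete the rest through whatever delete operation your store exposes.

```python
from datetime import datetime, timedelta, timezone

def prune(records: list[dict], now: datetime, max_age: timedelta, max_per_user: int) -> list[dict]:
    """Drop records older than max_age, then keep the newest max_per_user per user."""
    fresh = [r for r in records if now - r["created_at"] <= max_age]
    fresh.sort(key=lambda r: r["created_at"], reverse=True)  # newest first
    kept: list[dict] = []
    counts: dict[str, int] = {}
    for r in fresh:
        seen = counts.get(r["user_id"], 0)
        if seen < max_per_user:
            kept.append(r)
            counts[r["user_id"]] = seen + 1
    return kept

now = datetime(2025, 1, 1, tzinfo=timezone.utc)  # fixed so the example is reproducible

def days_ago(n: int) -> datetime:
    return now - timedelta(days=n)

records = [
    {"id": "a1", "user_id": "alice", "created_at": days_ago(400)},
    {"id": "a2", "user_id": "alice", "created_at": days_ago(10)},
    {"id": "a3", "user_id": "alice", "created_at": days_ago(5)},
    {"id": "a4", "user_id": "alice", "created_at": days_ago(1)},
    {"id": "b1", "user_id": "bob", "created_at": days_ago(2)},
]

kept = prune(records, now, max_age=timedelta(days=365), max_per_user=2)
kept_ids = [r["id"] for r in kept]
```

Here a1 is dropped for age and a2 is dropped by the per-user cap. A summarization step that merges old memories before deleting them is a natural next refinement, but it needs an LLM call and is left out of this sketch.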

Mem0 vs Letta

Letta (formerly MemGPT) is the most direct conceptual comparison. Both tackle long-term agent memory, but architecturally they're different.

Letta builds memory into the agent itself: the agent is aware of its own memory, can decide what to write and what to discard, and manages context overflow through explicit memory operations. The memory system is deeply coupled to the agent architecture.

Mem0 is external to the agent. It's a service the agent calls. The agent doesn't know or care about how Mem0 works internally. This loose coupling makes Mem0 easier to add to existing systems but gives you less control over memory management behavior compared to Letta's self-directed approach.

For most integration scenarios, Mem0's external service model is more practical. For systems where sophisticated self-directed memory management is a core requirement, Letta's approach is architecturally cleaner.

Getting started

pip install mem0ai

Set your OpenAI API key (used for memory extraction by default):

export OPENAI_API_KEY=sk-...

Basic usage:

from mem0 import Memory

m = Memory()
m.add("My preferred code style uses 4-space indentation and type hints everywhere.", user_id="dev_1")
m.add("I'm working on a FastAPI project with PostgreSQL.", user_id="dev_1")

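results = m.search("What database is the user using?", user_id="dev_1")
# Newer releases return {"results": [...]}; older ones return the list directly
hits = results["results"] if isinstance(results, dict) else results
for r in hits:
    print(r["memory"], r["score"])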

The search returns memories sorted by relevance score. In your agent, you format the top memories into the system prompt and proceed with your normal LLM call. After the conversation turn, add the exchange to memory with m.add().

Full documentation including integration guides for LangChain, CrewAI, and AutoGen is at docs.mem0.ai.

The verdict

Mem0 solves a real problem that most agent frameworks leave as an exercise for the reader. If you're building AI applications where users interact repeatedly over time, the "agents forget everything" problem will hurt your product. Mem0 gives you a practical, integrable solution without forcing you to rebuild your architecture.

The name change from Embedchain caused real confusion in the community, and the ecosystem is still catching up. Many tutorials and articles still reference Embedchain patterns that have been superseded. If you're starting fresh, go directly to the Mem0 documentation and ignore the Embedchain content unless you specifically need RAG functionality.

For teams that want a persistent memory layer that drops into CrewAI, LangChain, or any custom agent setup, Mem0 is worth spending an afternoon with. The integration is genuinely low-friction, and the 27,000 GitHub stars suggest the community has found it useful for real problems.

Key features

Persistent user and session memory across conversations and deployments
Automatic memory extraction from unstructured conversations
Semantic search over stored memories for contextual retrieval
User-level, session-level, and agent-level memory scopes
Drop-in memory layer for LangChain, CrewAI, AutoGen, and custom agents
Multi-LLM support: OpenAI, Anthropic, Google, Mistral, Ollama
REST API and managed cloud option via Mem0 Cloud

Frequently Asked Questions

What is the difference between Embedchain and Mem0?

Embedchain was the original product: a simple RAG framework that let you add documents to a vector store and query them conversationally. In 2024, the team pivoted to focus on memory for AI agents, rebranding the product as Mem0 and the company as Mem0 AI. The GitHub repo at github.com/mem0ai/mem0 contains both the legacy Embedchain interface (for RAG pipelines) and the new Mem0 interface (for persistent agent memory). New projects should use the Mem0 API. Embedchain code still runs but is in maintenance mode.

What is Mem0?

Mem0 is a memory layer for AI agents and applications. It automatically extracts important facts from conversations, stores them in a vector database, and retrieves relevant memories when context is needed. Think of it as a long-term memory system that sits alongside your agent: instead of passing the entire conversation history into every prompt, Mem0 retrieves the specific memories most relevant to the current turn. It supports user-level memory (what this user prefers), session-level memory (what happened in this conversation), and agent-level memory (what a specific agent should know).

How does Mem0 compare to Letta (MemGPT)?

Both tackle the agent memory problem but from different angles. Letta (formerly MemGPT) treats memory as an OS-like layer built into a specific agent architecture, where the agent actively manages its own memory through self-directed writes. Mem0 is a standalone memory service that you add to any existing agent. Mem0 is easier to integrate into existing workflows. Letta offers more sophisticated self-directed memory management at the cost of tighter framework coupling. If you're adding memory to an existing agent, Mem0 is more practical. If you're building a new agent architecture with memory as a core design concern, Letta is worth evaluating.

Is Mem0 open source?

Yes. The library at github.com/mem0ai/mem0 is Apache 2.0 licensed and free to self-host. Mem0 Cloud is a managed hosted service with a free tier (10,000 memory operations/month) and paid tiers for higher usage. Self-hosting requires a vector store backend (a local Chroma or Qdrant instance for development, a Qdrant server or Pinecone for production) and an LLM provider for memory extraction.

Which vector databases does Mem0 support?

Mem0 supports Qdrant (the library default in recent releases), Chroma, Pinecone, Weaviate, PGVector, Milvus, Zilliz, Redis, MongoDB Atlas, OpenSearch, Azure AI Search, and a few others. The vector store is configured when you initialize the Memory client. For production deployments, Qdrant or Pinecone are the most commonly used options in the community.

Can I use Mem0 with frameworks other than LangChain?

Yes. Mem0 is designed to be framework-agnostic. It has dedicated integration guides for LangChain, LangGraph, CrewAI, AutoGen, and direct OpenAI/Anthropic API usage. The core pattern is the same in all cases: before generating a response, search memories for relevant context; after generating a response, add the conversation to memory. The two-line integration can be added to almost any agent without changing the agent's core logic.