Embedchain (Mem0)
Memory layer for AI agents and applications, evolved from the Embedchain RAG framework
Embedchain started as a straightforward RAG framework in 2023 and evolved into Mem0, a memory layer for AI agents that extracts, stores, and retrieves contextual memories across sessions. The underlying library at github.com/mem0ai/mem0 handles both use cases: RAG pipelines through the legacy Embedchain API and persistent cross-session memory through the newer Mem0 interface.
One of the most common complaints about AI chatbots and agents is that they forget everything. You explain your preferences in one conversation and have to repeat them the next. You build context over a dozen turns and it all disappears when the session ends. The solution most people reach for first is stuffing the entire conversation history into the context window, which gets expensive fast and still doesn't persist between sessions.
Mem0 (which started as Embedchain) attacks this problem differently: extract what matters, store it, and retrieve it when it's relevant. It's a memory layer, not a new agent framework, which is both its strength and the source of some confusion about what it actually is.
The Embedchain-to-Mem0 story
Understanding what this tool is requires understanding where it came from.
Embedchain launched in 2023 as a RAG framework. The pitch was simple: give it documents and URLs, it handles chunking, embedding, and retrieval, and you query it conversationally. It grew quickly, reaching tens of thousands of GitHub stars, because it solved the "I want to make my data queryable by an LLM" problem without requiring you to understand the entire LangChain stack.
In 2024, the team recognized that the hardest problem in production AI systems wasn't RAG (LangChain and LlamaIndex had that reasonably covered) but memory: making agents remember things across sessions, across users, and across deployments. They pivoted the product toward persistent agent memory and rebranded as Mem0.
The GitHub repo at github.com/mem0ai/mem0 now contains both the legacy Embedchain interface and the new Mem0 interface. The project has roughly 27,000 stars. Embedchain is in maintenance mode: the code still works, but new features are in the Mem0 layer. If you're starting a new project, use Mem0. If you have existing Embedchain code, it runs, but don't expect new capabilities from it.
What Mem0 actually does
The core abstraction is a Memory client. You add memories to it, and you search it to retrieve relevant memories for a given context.
from mem0 import Memory
m = Memory()
# Add a memory after a user interaction
result = m.add(
    "I prefer Python for data science but TypeScript for web services.",
    user_id="alice"
)
# Retrieve relevant memories before generating a response
memories = m.search("What language should I use for this web project?", user_id="alice")
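# Returns something like: [{"memory": "Prefers TypeScript for web services", "score": 0.87}]
# (newer Mem0 releases wrap the list as {"results": [...]}; check your installed version)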
Under the hood, when you call m.add(), Mem0 sends the content to an LLM with a prompt that extracts discrete facts. It stores those facts as vector embeddings in your configured vector store. When you call m.search(), it does a semantic search over stored memories and returns the most relevant ones.
The extraction step is what makes Mem0 different from just storing raw conversation text. Instead of storing "Alice said she prefers Python for data science but TypeScript for web services," it extracts "User prefers Python for data science" and "User prefers TypeScript for web services" as separate searchable facts. At retrieval time, a query about web projects surfaces the TypeScript fact without dragging in the Python fact.
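To make the benefit of discrete facts concrete, here is a toy sketch. It is not Mem0's implementation: the hard-coded facts stand in for what an extraction LLM might produce, and a bag-of-words cosine similarity stands in for a real embedding model. The names embed, cosine, and search are hypothetical helpers invented for this illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hard-coded facts standing in for the output of the LLM extraction step.
facts = [
    "User prefers Python for data science",
    "User prefers TypeScript for web services",
]

def search(query: str, memories: list[str]) -> list[dict]:
    """Score every stored fact against the query and return them best-first."""
    q = embed(query)
    scored = [{"memory": m, "score": cosine(q, embed(m))} for m in memories]
    return sorted(scored, key=lambda r: r["score"], reverse=True)

results = search("What language should I use for this web project?", facts)
print(results[0]["memory"])
```

Because each fact is stored separately, the web-related fact can outrank the data-science one for a web-related query. With real embeddings the match is semantic rather than word-based, but the ranking idea is the same.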
Memory scopes
Mem0 organizes memories into three scopes, and the distinction matters in practice.
User-level memory (user_id) persists across all sessions for a given user. This is where you store preferences, facts about the user, and things they've told the system across many interactions. A coding assistant that knows your preferred language, style guide, and project conventions is using user-level memory.
Session-level memory (passed as run_id in the current Python API) persists within a conversation session. This is closer to extended working memory: things that matter for the current conversation but don't need to carry over to the next one. Session memory fills the gap between short conversation history and long-term user memory.
Agent-level memory (agent_id, often combined with user_id) scopes memories to a specific agent. Useful in multi-agent systems where different agents should have access to different memory pools.
You can combine scopes. Adding a memory with both user_id="alice" and agent_id="coding-assistant" means that memory is only retrieved when Alice is using the coding assistant agent, not when she's using a different agent.
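The scoping behavior described above can be pictured as a simple filter. This is a minimal sketch under an assumption, not Mem0's actual storage logic: it assumes a query matches a memory when every scope ID the query supplies equals the ID stored on the memory. StoredMemory and in_scope are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StoredMemory:
    """Hypothetical record: a fact plus the scope IDs it was added with."""
    text: str
    user_id: Optional[str] = None
    agent_id: Optional[str] = None

def in_scope(mem: StoredMemory, user_id: Optional[str] = None,
             agent_id: Optional[str] = None) -> bool:
    """True when every scope ID given in the query matches the memory's ID."""
    if user_id is not None and mem.user_id != user_id:
        return False
    if agent_id is not None and mem.agent_id != agent_id:
        return False
    return True

mem = StoredMemory("Prefers type hints everywhere",
                   user_id="alice", agent_id="coding-assistant")

print(in_scope(mem, user_id="alice", agent_id="coding-assistant"))
print(in_scope(mem, user_id="alice", agent_id="travel-planner"))
```

Under this assumption, Alice's coding-assistant memory is visible to the coding assistant but filtered out when a different agent queries on her behalf.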
Integrating with existing frameworks
The cleanest use of Mem0 is as a thin layer added to an existing agent architecture. The pattern is consistent across frameworks: search before generating, add after responding.
With a LangChain chain:
from mem0 import Memory
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
m = Memory()
llm = ChatOpenAI(model="gpt-4o-mini")
def chat_with_memory(user_message: str, user_id: str) -> str:
    # Retrieve relevant memories
    results = m.search(user_message, user_id=user_id)
    # Newer Mem0 releases wrap hits in {"results": [...]}; older ones return a bare list
    relevant_memories = results["results"] if isinstance(results, dict) else results
    memory_context = "\n".join(f"- {mem['memory']}" for mem in relevant_memories)
    # Build the prompt with memory context
    messages = [
        SystemMessage(content=f"You are a helpful assistant.\n\nUser context:\n{memory_context}"),
        HumanMessage(content=user_message),
    ]
    response = llm.invoke(messages)
    # Store the exchange in memory
    m.add(f"User: {user_message}\nAssistant: {response.content}", user_id=user_id)
    return response.content
This pattern is framework-agnostic. You can apply the same search-then-add wrapper to CrewAI agent runs, AutoGen conversations, or raw OpenAI API calls. Mem0 doesn't care what orchestrates the LLM call; it just stores and retrieves facts.
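One way to express the search-then-add pattern independently of any framework is a small wrapper that takes the three operations as plain callables. This is a sketch, not part of Mem0: with_memory and the fake_* functions are hypothetical, and the fakes exist only so the example runs without an API key.

```python
from typing import Callable

def with_memory(
    generate: Callable[[str, str], str],  # (memory_context, user_message) -> reply
    search: Callable[[str, str], list],   # (query, user_id) -> list of {"memory": ...}
    add: Callable[[str, str], None],      # (text, user_id) -> None
) -> Callable[[str, str], str]:
    """Wrap any reply generator with search-before, add-after memory handling."""
    def chat(user_message: str, user_id: str) -> str:
        memories = search(user_message, user_id)
        context = "\n".join(f"- {m['memory']}" for m in memories)
        reply = generate(context, user_message)
        add(f"User: {user_message}\nAssistant: {reply}", user_id)
        return reply
    return chat

# In-memory fakes standing in for a memory store and an LLM call.
store: dict[str, list[dict]] = {}

def fake_search(query: str, user_id: str) -> list:
    return store.get(user_id, [])

def fake_add(text: str, user_id: str) -> None:
    store.setdefault(user_id, []).append({"memory": text})

def fake_generate(context: str, user_message: str) -> str:
    prefix = "with memory" if context else "no memory"
    return f"{prefix}: {user_message}"

chat = with_memory(fake_generate, fake_search, fake_add)
first = chat("hello", "alice")
second = chat("hello again", "alice")
print(first)
print(second)
```

In a real setup the fakes would be replaced by thin adapters around the Mem0 client's search and add methods and around whatever performs the LLM call (a CrewAI run, an AutoGen conversation, or a raw API request).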
Mem0 Cloud vs self-hosted
For self-hosting, Mem0 needs two things: an LLM (for memory extraction) and a vector store (for storage). A minimal setup pairs OpenAI for the LLM with a local vector store such as Chroma, so storage runs on your machine with little setup, though memory extraction still calls the OpenAI API.
from mem0 import Memory
config = {
    "llm": {
        "provider": "openai",
        "config": {"model": "gpt-4o-mini", "api_key": "sk-..."}
    },
    "vector_store": {
        "provider": "chroma",
        "config": {"collection_name": "my_memories"}
    }
}
m = Memory.from_config(config)
For production self-hosting, replace Chroma with Qdrant or Pinecone, which handle concurrent writes and scale better.
Mem0 Cloud is the hosted option at mem0.ai. You get a managed API endpoint, no infrastructure to run, and a dashboard for monitoring memory operations. The free tier covers 10,000 memory operations per month, which sounds like a lot but gets consumed quickly by a chatbot with frequent users. Paid plans start at $49/month.
The Mem0 Cloud API is compatible with the open-source library. Switching from self-hosted to cloud is a configuration change:
from mem0 import MemoryClient # Cloud client
m = MemoryClient(api_key="your-mem0-api-key")
# Same add/search interface as the self-hosted client
The original Embedchain RAG use case
The legacy Embedchain API still works for RAG pipelines. If you inherited an Embedchain codebase or want a simple document Q&A setup, it's still functional.
from embedchain import App
app = App()
app.add("https://example.com/docs/", data_type="web_page")
app.add("path/to/document.pdf", data_type="pdf_file")
response = app.query("What does the documentation say about authentication?")
For new projects, you'd be better served by Dify or LlamaIndex for full-featured RAG pipelines. The Embedchain API is maintained for backward compatibility, not as the primary focus of the project.
Where Mem0 works well
Mem0 shines when you're building something that genuinely needs to remember users over time. Personal assistants, coaching tools, long-running research agents, and customer support systems that should recall previous interactions are all good fits.
The integration-first design is a genuine practical advantage. Adding persistent memory to an existing CrewAI or LangChain setup takes a few lines of code. You don't need to replace your framework, redesign your architecture, or migrate your existing logic.
The automatic extraction is convenient, but it makes memory quality depend on your extraction LLM. With a capable model (GPT-4o, Claude Sonnet), the extracted memories are accurate and well-delineated. With a smaller or less capable model, you get noisier memories that include irrelevant details or miss important facts. Treat GPT-4o-mini as the floor for the extraction step; the cost savings from using a cheaper extraction model are usually not worth the quality degradation.
Where Mem0 struggles
Memory retrieval relevance is not perfect. In production, you'll encounter cases where the wrong memories are surfaced (false positives) or relevant memories aren't retrieved (false negatives). Both cause problems: false positives pollute the context with incorrect assumptions about the user; false negatives make the agent seem forgetful even though it stored the information correctly.
The fix is usually tuning the retrieval threshold (the minimum similarity score for a memory to be included) and the number of memories retrieved per query. The right values depend on your specific use case and require experimentation with real data.
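The two knobs can be applied as a post-filter on whatever the search returns. The sketch below is an illustration, not a Mem0 API: select_memories is a hypothetical helper, the default values are arbitrary starting points, and it assumes each hit carries a "score" where higher means more similar (some vector stores report distances instead, where lower is better).

```python
def select_memories(results: list[dict], min_score: float = 0.5,
                    top_k: int = 3) -> list[dict]:
    """Keep hits at or above min_score, then return the top_k best-scoring ones."""
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    kept.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return kept[:top_k]

# Illustrative hits in the shape used elsewhere in this article.
sample = [
    {"memory": "Likes hiking", "score": 0.12},
    {"memory": "Prefers TypeScript for web services", "score": 0.87},
    {"memory": "Works on a FastAPI project", "score": 0.55},
]

selected = select_memories(sample, min_score=0.5, top_k=2)
for r in selected:
    print(r["memory"], r["score"])
```

Raising min_score trades false positives for false negatives, and lowering top_k keeps the prompt short at the risk of dropping a useful memory, which is why both values need tuning against real traffic.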
Storage costs grow linearly with the number of users and conversations. At scale, the vector store becomes a real infrastructure concern. Mem0's default configuration doesn't include memory cleanup or summarization: old memories accumulate indefinitely. For long-running production systems, you'll want to implement a memory lifecycle policy.
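A simple lifecycle policy is age-based expiry. The sketch below only decides which memories are due for deletion; it assumes each record has an "id" and an ISO 8601 "created_at" timestamp with a timezone offset, which you should verify against the records your store actually returns. expired_ids is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

def expired_ids(records: list[dict], max_age_days: int, now: datetime) -> list[str]:
    """Return the IDs of records created more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        r["id"]
        for r in records
        if datetime.fromisoformat(r["created_at"]) < cutoff
    ]

# Illustrative records; field names are assumptions for this sketch.
records = [
    {"id": "a", "memory": "Was evaluating Flask", "created_at": "2024-01-01T00:00:00+00:00"},
    {"id": "b", "memory": "Works on a FastAPI project", "created_at": "2024-05-15T00:00:00+00:00"},
]

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
to_delete = expired_ids(records, max_age_days=90, now=now)
print(to_delete)
```

In a scheduled job you would fetch a user's memories, pass them through a function like this, and delete each returned ID through the client. Age alone is a blunt criterion; summarizing old memories before deleting them, or exempting ones that are still being retrieved, are common refinements.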
Mem0 vs Letta
Letta (formerly MemGPT) is the most direct conceptual comparison. Both tackle long-term agent memory, but architecturally they're different.
Letta builds memory into the agent itself: the agent is aware of its own memory, can decide what to write and what to discard, and manages context overflow through explicit memory operations. The memory system is deeply coupled to the agent architecture.
Mem0 is external to the agent. It's a service the agent calls. The agent doesn't know or care about how Mem0 works internally. This loose coupling makes Mem0 easier to add to existing systems but gives you less control over memory management behavior compared to Letta's self-directed approach.
For most integration scenarios, Mem0's external service model is more practical. For systems where sophisticated self-directed memory management is a core requirement, Letta's approach is architecturally cleaner.
Getting started
pip install mem0ai
Set your OpenAI API key (used for memory extraction by default):
export OPENAI_API_KEY=sk-...
Basic usage:
from mem0 import Memory
m = Memory()
m.add("My preferred code style uses 4-space indentation and type hints everywhere.", user_id="dev_1")
m.add("I'm working on a FastAPI project with PostgreSQL.", user_id="dev_1")
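results = m.search("What database is the user using?", user_id="dev_1")
# Newer Mem0 releases wrap hits in {"results": [...]}; older ones return a bare list
hits = results["results"] if isinstance(results, dict) else results
for r in hits:
    print(r["memory"], r["score"])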
The search returns memories sorted by relevance score. In your agent, you format the top memories into the system prompt and proceed with your normal LLM call. After the conversation turn, add the exchange to memory with m.add().
Full documentation including integration guides for LangChain, CrewAI, and AutoGen is at docs.mem0.ai.
The verdict
Mem0 solves a real problem that most agent frameworks leave as an exercise for the reader. If you're building AI applications where users interact repeatedly over time, the "agents forget everything" problem will hurt your product. Mem0 gives you a practical, integrable solution without forcing you to rebuild your architecture.
The name change from Embedchain caused real confusion in the community, and the ecosystem is still catching up. Many tutorials and articles still reference Embedchain patterns that have been superseded. If you're starting fresh, go directly to the Mem0 documentation and ignore the Embedchain content unless you specifically need RAG functionality.
For teams that want a persistent memory layer that drops into CrewAI, LangChain, or any custom agent setup, Mem0 is worth spending an afternoon with. The integration is genuinely low-friction, and the 27,000 GitHub stars suggest the community has found it useful for real problems.
Key features
- Persistent user and session memory across conversations and deployments
- Automatic memory extraction from unstructured conversations
- Semantic search over stored memories for contextual retrieval
- User-level, session-level, and agent-level memory scopes
- Drop-in memory layer for LangChain, CrewAI, AutoGen, and custom agents
- Multi-LLM support: OpenAI, Anthropic, Google, Mistral, Ollama
- REST API and managed cloud option via Mem0 Cloud