Production-ready memory service for AI Agents using MCP Protocol
A high-performance, scalable memory server implementing the Model Context Protocol (MCP) for AI agent systems. Features layered caching, vector search, and session isolation.
- MCP Protocol Compliance: Full implementation of Resources, Tools, and Prompts
- Layered Caching: L1 (memory) + L2 (extended memory), cutting P50 read latency by about 70% in production
- Vector Search: Qdrant integration for semantic similarity search
- Session Isolation: Secure multi-tenant architecture
- Batch Operations: Optimized bulk writes for reduced API calls
- Production Tested: Deployed in the OpenClaw environment, with metrics measured over two weeks
| Metric | Before | After | Improvement |
|---|---|---|---|
| Read Latency (P50) | 145ms | 42ms | ⬇️ 71% |
| Read Latency (P95) | 380ms | 89ms | ⬇️ 77% |
| Search Accuracy | 68% | 85% | ⬆️ 25% |
| Token Cost/Day | $45 | $30 | ⬇️ 33% |
| Cache Hit Rate | N/A | 73% | - |
Data from the OpenClaw production environment, collected over 2 weeks.
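The Improvement column appears to be the relative change against the Before value, which means the Search Accuracy figure is a relative gain (17 percentage points in absolute terms). A quick check under that assumption, with `relativeChangePercent` as an illustrative helper that is not part of the server:

```typescript
// Relative change between a baseline and a new measurement, rounded to a whole percent.
function relativeChangePercent(before: number, after: number): number {
  return Math.round((Math.abs(after - before) / before) * 100);
}

// Values copied from the metrics table.
const rows = [
  { metric: 'Read Latency (P50)', before: 145, after: 42 },
  { metric: 'Read Latency (P95)', before: 380, after: 89 },
  { metric: 'Search Accuracy', before: 68, after: 85 },
  { metric: 'Token Cost/Day', before: 45, after: 30 },
];

const improvements = rows.map((r) => relativeChangePercent(r.before, r.after));
console.log(improvements); // [ 71, 77, 25, 33 ]
```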
┌──────────────────────────────────────────────────────────┐
│                  Client (Agent Session)                  │
└──────────────────────────────────────────────────────────┘
                             │
                             │ MCP Protocol
                             ▼
┌──────────────────────────────────────────────────────────┐
│                    MCP Memory Server                     │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐     │
│  │  Resources  │   │    Tools    │   │   Prompts   │     │
│  │             │   │             │   │             │     │
│  │ - memory:/  │   │ - read      │   │ - summarize │     │
│  │   <id>      │   │ - write     │   │ - expand    │     │
│  │ - memory:/  │   │ - search    │   │             │     │
│  │   sessions  │   │ - delete    │   │             │     │
│  │ - memory:/  │   │ - compact   │   │             │     │
│  │   stats     │   │             │   │             │     │
│  └─────────────┘   └─────────────┘   └─────────────┘     │
│                                                          │
│  ┌──────────────────────────────────────────────────┐    │
│  │                  Storage Layer                   │    │
│  │   ┌─────────────┐        ┌─────────────┐         │    │
│  │   │   SQLite    │        │   Qdrant    │         │    │
│  │   │ (metadata)  │        │  (vectors)  │         │    │
│  │   └─────────────┘        └─────────────┘         │    │
│  └──────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────┘
- Runtime: Node.js 20+
- Language: TypeScript 5.0+
- MCP SDK: `@modelcontextprotocol/sdk`
- Database: SQLite (`better-sqlite3`)
- Vector Store: Qdrant (`qdrant-js`)
- Embedding: Alibaba Cloud Bailian (`text-embedding-v4`)
- Caching: Custom LRU layered cache
# Clone the repository
git clone https://github.com/kejun/mcp-memory-server.git
cd mcp-memory-server
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
# Edit .env with your credentials
# - QDRANT_URL=http://localhost:6333
# - ALIBABA_API_KEY=your_key_here
# - ALIBABA_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
# Start Qdrant
docker run -d -p 6333:6333 qdrant/qdrant
# Development mode
npm run dev
# Production mode
npm run build
npm start
Example client connection:
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
const transport = new StdioClientTransport({
command: 'node',
args: ['dist/index.js'],
});
const client = new Client({
name: 'example-client',
version: '1.0.0',
}, {
capabilities: {},
});
await client.connect(transport);
// Write a memory
await client.callTool({
name: 'write',
arguments: {
sessionId: 'session-1',
content: 'User prefers TypeScript over JavaScript',
tags: ['preferences', 'languages'],
},
});
// Search memories
const result = await client.callTool({
name: 'search',
arguments: {
sessionId: 'session-1',
query: 'What programming languages does the user like?',
limit: 5,
},
});
console.log(result.content);
`read`: Read a specific memory by ID.
Input:
{
"memoryId": "mem-123"
}
Output:
{
"id": "mem-123",
"sessionId": "session-1",
"content": "User prefers TypeScript",
"tags": ["preferences"],
"createdAt": 1708761234567
}
`write`: Write a new memory entry.
Input:
{
"sessionId": "session-1",
"content": "User likes React framework",
"tags": ["preferences", "frontend"]
}
Output:
{
"success": true,
"memoryId": "mem-456"
}
`search`: Search memories by semantic similarity.
Input:
{
"sessionId": "session-1",
"query": "frontend frameworks",
"limit": 10
}
Output:
{
"results": [
{
"id": "mem-456",
"content": "User likes React framework",
"score": 0.92
},
...
]
}
`delete`: Delete a memory by ID.
Input:
{
"memoryId": "mem-123"
}
`compact`: Compact multiple memories into a summary.
Input:
{
"sessionId": "session-1",
"maxMemories": 50
}
- `memory:/sessions` - List all sessions
- `memory:/stats` - Server statistics
- `memory:/{sessionId}` - Access session memories
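One way to route the three `memory:/` resource URIs is sketched below. The `parseMemoryUri` function and the `MemoryResource` type are hypothetical names for illustration, not the server's actual code. The sketch assumes `sessions` and `stats` are reserved words, so a session with either of those IDs would not be reachable through `memory:/{sessionId}`.

```typescript
// Hypothetical result type for illustration.
type MemoryResource =
  | { kind: 'sessions' }
  | { kind: 'stats' }
  | { kind: 'session'; sessionId: string };

// Maps a resource URI onto one of the three documented resources,
// or returns null when the URI does not match any of them.
function parseMemoryUri(uri: string): MemoryResource | null {
  const prefix = 'memory:/';
  if (!uri.startsWith(prefix)) return null;

  const rest = uri.slice(prefix.length);
  // Assumption: resource paths are a single segment.
  if (rest === '' || rest.includes('/')) return null;

  if (rest === 'sessions') return { kind: 'sessions' };
  if (rest === 'stats') return { kind: 'stats' };
  return { kind: 'session', sessionId: rest };
}

console.log(parseMemoryUri('memory:/session-1'));
```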
- `summarize` - Summarize session memories
- `expand` - Expand a memory with context
# Run unit tests
npm test
# Run integration tests (requires Qdrant running)
npm run test:integration
# Generate coverage
npm run coverage
mcp-memory-server/
├── src/
│   ├── index.ts             # Server entry point
│   ├── server.ts            # MCP server implementation
│   ├── memory-store.ts      # Core memory logic
│   ├── cache.ts             # Layered cache implementation
│   ├── qdrant-client.ts     # Vector database client
│   ├── embedding.ts         # Embedding API wrapper
│   └── types.ts             # TypeScript types
├── tests/
│   ├── unit/
│   │   ├── cache.test.ts
│   │   └── memory-store.test.ts
│   └── integration/
│       └── server.test.ts
├── examples/
│   └── basic-client.ts      # Example client usage
├── package.json
├── tsconfig.json
└── README.md
| Variable | Description | Default |
|---|---|---|
| `QDRANT_URL` | Qdrant database URL | `http://localhost:6333` |
| `ALIBABA_API_KEY` | Alibaba Cloud API key | Required |
| `ALIBABA_BASE_URL` | Embedding API base URL | `https://dashscope.aliyuncs.com/compatible-mode/v1` |
| `DB_PATH` | SQLite database path | `./data/memory.db` |
| `CACHE_MAX_ENTRIES` | L1 cache max size | `100` |
| `CACHE_TTL_MS` | Cache TTL in milliseconds | `300000` (5 min) |
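A minimal sketch of how these variables could be read with their documented defaults. `ServerConfig`, `loadConfig`, and `parsePositiveInt` are hypothetical names, not the server's actual loader. The environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical config shape; field names are illustrative.
interface ServerConfig {
  qdrantUrl: string;
  alibabaApiKey: string;
  alibabaBaseUrl: string;
  dbPath: string;
  cacheMaxEntries: number;
  cacheTtlMs: number;
}

type Env = Record<string, string | undefined>;

// Parses an optional positive integer, falling back to the documented default.
function parsePositiveInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw === '') return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): ServerConfig {
  const apiKey = env['ALIBABA_API_KEY'];
  // The only variable without a default in the table.
  if (!apiKey) throw new Error('ALIBABA_API_KEY is required');

  return {
    qdrantUrl: env['QDRANT_URL'] ?? 'http://localhost:6333',
    alibabaApiKey: apiKey,
    alibabaBaseUrl: env['ALIBABA_BASE_URL'] ?? 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    dbPath: env['DB_PATH'] ?? './data/memory.db',
    cacheMaxEntries: parsePositiveInt(env['CACHE_MAX_ENTRIES'], 100, 'CACHE_MAX_ENTRIES'),
    cacheTtlMs: parsePositiveInt(env['CACHE_TTL_MS'], 300000, 'CACHE_TTL_MS'),
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`.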
- Provide long-term memory for personal AI assistants across sessions.
- Secure session isolation for platforms serving multiple users.
- Maintain conversation history and user preferences for chatbots.
- Remember user coding preferences, project structure, and past decisions.
- L1 Cache: In-memory LRU for hot data (< 1ms access)
- L2 Cache: Extended memory with TTL for warm data (< 10ms access)
- Vector Cache: Cached query embeddings to avoid redundant API calls
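The L1/L2 split described above can be sketched as a small LRU map backed by a larger TTL map. This is an illustration under assumptions, not the code in `cache.ts`. The `LayeredCache` class and its parameters are hypothetical, L1 entries carry no TTL of their own here, and the injectable `now` clock exists only to make the behaviour testable.

```typescript
class LayeredCache<V> {
  // L1: small, hot set. Map insertion order doubles as LRU order.
  private readonly l1 = new Map<string, V>();
  // L2: larger, warm set with a per-entry expiry time.
  private readonly l2 = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxL1Entries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxL1Entries = 100, ttlMs = 300_000, now: () => number = Date.now) {
    this.maxL1Entries = maxL1Entries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    if (this.l1.has(key)) {
      const hot = this.l1.get(key) as V;
      this.promote(key, hot); // refresh recency
      return hot;
    }
    const warm = this.l2.get(key);
    if (warm === undefined) return undefined;
    if (warm.expiresAt <= this.now()) {
      this.l2.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    this.promote(key, warm.value); // L2 hit: move back into L1
    return warm.value;
  }

  set(key: string, value: V): void {
    this.l2.set(key, { value, expiresAt: this.now() + this.ttlMs });
    this.promote(key, value);
  }

  private promote(key: string, value: V): void {
    this.l1.delete(key);
    this.l1.set(key, value);
    if (this.l1.size > this.maxL1Entries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.l1.keys().next().value;
      if (oldest !== undefined) this.l1.delete(oldest);
    }
  }
}
```

A caller would check the cache first and fall through to SQLite or Qdrant only on a miss, writing the fetched value back with `set`.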
// Efficient bulk write
await memoryStore.writeBatch([
{ sessionId: 's1', content: '...', tags: [] },
{ sessionId: 's1', content: '...', tags: [] },
{ sessionId: 's1', content: '...', tags: [] },
]);
// Single Qdrant HTTP request instead of 3
Active session memories are preloaded on first access to reduce latency.
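For the bulk write shown earlier in this section, the saving comes from embedding all entries and upserting all points in one call each. The sketch below shows that shape with injected dependencies. `EmbedBatch`, `UpsertPoints`, `VectorPoint`, and the `newId` generator are hypothetical stand-ins for the embedding wrapper and Qdrant client, and the real `memoryStore.writeBatch` may differ.

```typescript
interface MemoryInput {
  sessionId: string;
  content: string;
  tags: string[];
}

interface VectorPoint {
  id: string;
  vector: number[];
  payload: MemoryInput;
}

// Hypothetical dependency signatures, injected so the sketch has no external imports.
type EmbedBatch = (texts: string[]) => Promise<number[][]>;
type UpsertPoints = (points: VectorPoint[]) => Promise<void>;

async function writeBatch(
  entries: MemoryInput[],
  embedBatch: EmbedBatch,
  upsertPoints: UpsertPoints,
  newId: () => string,
): Promise<string[]> {
  if (entries.length === 0) return [];

  // One embedding request for every entry in the batch.
  const vectors = await embedBatch(entries.map((e) => e.content));
  if (vectors.length !== entries.length) {
    throw new Error(`expected ${entries.length} vectors, got ${vectors.length}`);
  }

  const points: VectorPoint[] = [];
  entries.forEach((entry, i) => {
    const vector = vectors[i];
    if (vector === undefined) throw new Error(`missing vector for entry ${i}`);
    points.push({ id: newId(), vector, payload: { ...entry } });
  });

  // One upsert request for the whole batch instead of one per entry.
  await upsertPoints(points);
  return points.map((p) => p.id);
}
```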
- Session Isolation: Strict filtering prevents cross-session data leakage
- Input Validation: All inputs validated before processing
- Rate Limiting: Built-in rate limiting for API protection
- Audit Logging: All operations logged for compliance
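Session isolation and input validation can be combined at the tool boundary: validate the arguments, then attach a session filter to every vector query. The sketch below assumes the session ID is stored in a payload field named `sessionId`, and the ID pattern and the limit bounds are illustrative choices. The filter object follows Qdrant's `must` / `match` filter shape. `validateSearchArgs` and `sessionFilter` are hypothetical helpers.

```typescript
interface SearchArgs {
  sessionId: string;
  query: string;
  limit: number;
}

// Rejects malformed input before it reaches storage. Bounds are illustrative.
function validateSearchArgs(input: unknown): SearchArgs {
  if (typeof input !== 'object' || input === null) {
    throw new Error('arguments must be an object');
  }
  const { sessionId, query, limit = 10 } = input as Record<string, unknown>;

  if (typeof sessionId !== 'string' || !/^[A-Za-z0-9_-]{1,64}$/.test(sessionId)) {
    throw new Error('sessionId must be 1-64 characters of letters, digits, "_" or "-"');
  }
  if (typeof query !== 'string' || query.trim() === '') {
    throw new Error('query must be a non-empty string');
  }
  if (typeof limit !== 'number' || !Number.isInteger(limit) || limit < 1 || limit > 100) {
    throw new Error('limit must be an integer between 1 and 100');
  }
  return { sessionId, query, limit };
}

// Filter that restricts a vector search to a single session's points.
function sessionFilter(sessionId: string) {
  return { must: [{ key: 'sessionId', match: { value: sessionId } }] };
}

const args = validateSearchArgs({ sessionId: 'session-1', query: 'frontend frameworks' });
console.log(JSON.stringify(sessionFilter(args.sessionId)));
```

Building the filter from the validated `sessionId` in one place, rather than at each call site, makes it harder for a query to run without it.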
Contributions welcome! Please read our Contributing Guide first.
git clone https://github.com/kejun/mcp-memory-server.git
cd mcp-memory-server
npm install
npm run dev
npm test
npm run test:integration
MIT License - see LICENSE file for details.
- Model Context Protocol - MCP Specification
- Qdrant - Vector database for semantic search
- Alibaba Cloud Bailian - Embedding API
- OpenClaw Team - Production testing and feedback
- GitHub: https://github.com/kejun/mcp-memory-server
- NPM: (coming soon)
- Documentation: https://github.com/kejun/mcp-memory-server/wiki
- Issues: https://github.com/kejun/mcp-memory-server/issues
Built with ❤️ by OpenClaw Team