I use MongoDB as the primary database for AI-powered products like DocRouter.AI and SigAgent.AI. This post explains how it’s implemented—migrations, vector search, and knowledge bases—and why I prefer it over alternatives like Postgres for document-centric, JSON-heavy AI workloads. I want to store a very large number of documents (DocRouter) or logs (SigAgent) without spending much time tuning the database for horizontal scaling; MongoDB fits that need well.
Brief trade-offs vs Postgres
Postgres with jsonb can model the same document-style records and even integrate vector search via extensions, but it shines most when you need strong relational guarantees and complex joins around a relatively stable schema. MongoDB is a better fit when almost everything is a JSON document, the schema evolves quickly, and you care more about horizontal scaling and quick turnaround than about classic SQL features.
In my case, the workloads are heavily document- and log-centric, so the ergonomics and scaling model of MongoDB outweigh the benefits of staying inside the relational/Postgres ecosystem.
| Aspect | MongoDB Approach | Postgres Alternative |
|---|---|---|
| Schema flexibility | Evolving via migrations; no enforcement from the DB, but disciplined application code keeps documents consistent. | Rigid but enforceable with DDL; great for stable relations, heavier for fast-changing AI schemas. |
| Scaling | Horizontal scaling and sharding are built-in; easy to grow with large document/log volumes. | Requires extensions or external tooling (e.g. Citus) and more tuning when pushing JSONB-heavy workloads. |
| Consistency model | Tunable; we use majority writes and typically read from secondaries, so most reads are eventually consistent. | Strong ACID semantics with primary reads by default; great when you need strict guarantees across transactions. |
| Vector search | Native $vectorSearch in MongoDB 8.2+/Atlas. |
Via pgvector, which has mature support. |
| Joins & relations | Limited | Strong |
| Dev speed | Quick iterations on JSON schemas and migrations; fits rapid AI product experiments. | Slower to evolve schemas cleanly; better when you already know the long-term relational shape. |
It’s also totally reasonable to split responsibilities: e.g. Postgres for relational metadata, a dedicated vector store like Pinecone/Weaviate/Qdrant for embeddings, or Redis as an embedding cache in front of another database. For DocRouter.AI and SigAgent.AI, I’m happy to trade some theoretical optimality for a single operational datastore (MongoDB handling documents, logs, and vectors) until scale or workload complexity justifies introducing extra systems.
DocRouter and SigAgent: One Backend, Two Products
DocRouter.AI is a smart document router: you upload documents, define schemas and prompts, and it extracts structured data (e.g. from invoices, medical records, forms) using LLMs. SigAgent is a Claude agent monitor with a different UX and product focus, but it’s built on the same stack.
Roughly 90% of the backend is shared. Both use the same Python package, analytiq_data: MongoDB client, migrations, queue layer, auth, and app startup. The same MongoDB database layout, indices, and migration history apply to both. Product-specific code lives in routes and frontends; the data layer is common.
Strict Schema via Discipline: Migrations as “Schema”
MongoDB doesn’t enforce a schema. I still want predictable structure and safe upgrades, so we enforce it in code and process:
- Consistent document shape per collection. We use a fixed set of field names and types in application code (and in TypeScript/Pydantic where applicable). In practice, each collection behaves like a table with a known “schema.”
- Every change goes through migrations. Renaming fields, adding/removing fields, splitting or renaming collections—all of it is done in versioned migration classes with
up()anddown(). The current schema version is stored in amigrationscollection; on startup we runrun_migrations(analytiq_client)and bring the DB to the latest version. - Element types as schema. We treat document fields as if they were typed: e.g.
schema_id,prompt_version,organization_idare always present where we expect them. New code assumes the post-migration shape. So we get comparable safety to Postgres (known structure, no surprise shapes) while staying in a document model—as long as we’re disciplined and never bypass migrations.
So: schema is “enforced” by convention and migrations, not by the database. That keeps development fast without giving up control.
How Migrations and Indices Are Implemented
At a high level:
- Migrations are versioned Python classes with
up()/down()methods that evolve collections in lockstep with the code. - Indices are either created inside migrations (long-term, repeatable) or via a small helper at runtime for per-module needs.
- Profiling tools (Atlas Query Insights, profiler, Compass/Performance Advisor) surface slow queries that suggest new compound indexes.
Migrations
Migrations live in migration.py. Each migration is a class with:
description: short human-readable summaryup(db): apply the change (e.g. rename field, backfill, add index)down(db): revert the change when possible
The runner loads the list MIGRATIONS, assigns a version by index, and stores the current version in db.migrations under _id: "schema_version". On startup we run pending migrations in order. Examples from the codebase:
- Field renames: e.g.
RenameUserFields(camelCase → snake_case) - New fields: e.g.
LlmResultFieldsMigration(addsis_edited,is_verified, timestamps) - Structural changes: e.g.
RenameCollections(schemas → schema_revisions, schema_versions → schemas),UseMongoObjectIDs
Collection layout and index strategy evolve in one place, with a clear history and rollback path.
Indices
We manage indices in two ways:
- Inside migrations for long-term indices. These are created with
create_index(..., background=True)and dropped indown()so rollback is consistent. - At runtime via
ensure_index()inanalytiq_data/mongodb/index.py. Given a collection, index spec, and name, it creates the index if missing (and optionally drops other non-_idindexes). We use this for things likepayments_usage_records (org_id, timestamp)when the payments module initializes.
Example index patterns:
- Queue collections:
(status, created_at)forfind_one_and_update({ status: "pending" }, sort: { created_at: 1 }). - docs:
(organization_id, upload_date desc)for paginated listing by org. - llm_runs:
(document_id, prompt_id, prompt_version desc)for “latest run by document and prompt,” and(document_id, prompt_revid)for exact revision lookup. - document_index: unique
(kb_id, document_id)and a non-uniquedocument_idindex for cascade deletes.
To find candidates for new indices, we use Atlas Query Insights (filter by operations returning >1,000 documents, then add a compound index matching the filter and sort) or, on Community Edition, enable the database profiler and filter system.profile by nReturned:
db.setProfilingLevel(1, { slowms: 100 })
db.system.profile.find({ nReturned: { $gt: 1000 } }).sort({ ts: -1 }).limit(20)
Level 2 profiling captures everything but has I/O cost, so we use it in short bursts or staging. Level 1 (slow only) is cheaper and still surfaces most heavy queries.
If you prefer GUIs over shell tools, both MongoDB Compass and the Atlas Performance Advisor surface similar slow-query/index recommendations in a more visual way, which is often easier when you’re just getting started tuning a workload.
With schema and indices in place, here’s how we run it.
Dev/Prod Setup and Vector Search
Quickly, the environments look like this:
- Local: MongoDB running on
localhostor via the Atlas Local Docker image (bundlingmongod+mongot) for easy vector search testing. - Production: MongoDB Atlas or AWS DocumentDB, configured for majority writes and read scaling.
- Self-hosted: MongoDB 8.2+ with
mongotin a replica set, using the samecreateSearchIndexes/$vectorSearchAPIs.
For local development, MONGODB_URI defaults to mongodb://localhost:27017. For vector search, we use the MongoDB Atlas Local Docker image (mongodb/mongodb-atlas-local:latest), which bundles mongod and mongot (the search/vector process) in one container.
For production we use MongoDB Atlas; for on-prem consulting we use AWS DocumentDB. The client is configured with w='majority', readPreference='secondaryPreferred', and retryWrites=False (DocumentDB doesn’t support retryWrites).
For self-hosted Community Edition, vector search requires MongoDB 8.2+ with mongot running alongside mongod in a replica set. The createSearchIndexes command and $vectorSearch aggregation work the same way across Atlas, Community 8.2+, and the local Docker image—no separate code path.
Knowledge Bases on Top of Vector Search
We implement knowledge bases in the shared backend using MongoDB’s vector search, with three key pieces: an indexing pipeline, a search pipeline, and a reconciliation service.
Indexing: Blue-Green Atomic Swap
Each knowledge base gets its own vector collection (kb_vectors_<kb_id>). When a document is indexed, we chunk the text (via Chonkie, a small library for token/sentence/recursive text chunking), generate embeddings via LiteLLM, a lightweight multi-provider LLM/embedding client, and then atomically swap old vectors for new ones inside a MongoDB transaction:
async with await client.start_session() as session:
async with session.start_transaction():
# Delete old vectors for this document
await vectors_collection.delete_many(
{"document_id": document_id}, session=session
)
# Insert new vectors
await vectors_collection.insert_many(new_vectors, session=session)
# Update document_index entry
await db.document_index.update_one(
{"kb_id": kb_id, "document_id": document_id},
{"$set": { ... }},
upsert=True, session=session
)
# Update KB stats (document_count, chunk_count)
await db.knowledge_bases.update_one(
{"_id": ObjectId(kb_id)},
{"$set": {"document_count": total_docs, "chunk_count": total_chunks}},
session=session
)
This blue-green pattern means a document is never in a half-indexed state: either all its new vectors are visible, or the old ones remain.
Embedding Cache
Embeddings are cached in a global embedding_cache collection keyed by (SHA-256 chunk hash, embedding model). When the same text chunk appears in multiple KBs (or is re-indexed after an edit), we skip the API call and reuse the cached vector. This saves both cost and latency—especially when re-indexing a large KB after a configuration change.
Search
The query string is embedded with the same model, then we run an aggregation whose first stage is $vectorSearch (with numCandidates = max(top_k * 10, 100) for better recall). We apply filters (organization, tags, date range) in the vector index definition, add vectorSearchScore via $meta, and optionally coalesce neighboring chunks for richer context.
Reconciliation: Keeping KBs in Sync
Documents and KBs can drift: a document’s tags change, a document is deleted, or vectors are orphaned after a failed indexing run. The reconciliation service (reconciliation.py) detects and fixes this:
- Missing documents: documents with matching tags but no
document_indexentry → queued for indexing. - Stale documents: indexed documents whose tags no longer match the KB → removed.
- Orphaned vectors: vectors without a corresponding
document_indexentry → deleted.
Reconciliation uses a distributed lock (atomic find_one_and_update with a 10-minute TTL) so only one worker reconciles a given KB at a time. It processes documents in batches of 100 to keep memory bounded, and supports dry-run mode for auditing without side effects.
In short: we get strict schema via migrations, predictable indexing, vector search for RAG with atomic updates, and self-healing knowledge bases—with one backend shared between DocRouter and SigAgent, and the same patterns whether we run on local Mongo, Atlas, or DocumentDB.
Analytiq Hub