<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://analytiqhub.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://analytiqhub.com/" rel="alternate" type="text/html" /><updated>2026-05-06T12:50:28+00:00</updated><id>https://analytiqhub.com/feed.xml</id><title type="html">Analytiq Hub</title><subtitle>Analytiq Hub provides custom AI solutions and expert guidance for biotech and healthtech companies.
</subtitle><author><name>Andrei Radulescu-Banu</name><email>andrei@analytiqhub.com</email></author><entry><title type="html">The AI Retrieval Stack</title><link href="https://analytiqhub.com/ai/engineering/the-ai-retrieval-stack/" rel="alternate" type="text/html" title="The AI Retrieval Stack" /><published>2026-04-12T00:00:00+00:00</published><updated>2026-04-12T00:00:00+00:00</updated><id>https://analytiqhub.com/ai/engineering/the-ai-retrieval-stack</id><content type="html" xml:base="https://analytiqhub.com/ai/engineering/the-ai-retrieval-stack/"><![CDATA[<p>The <strong>AI retrieval stack</strong> is the pipeline that takes a query and returns relevant results. Getting it right requires answering a set of workload questions before choosing any particular tool:</p>

<ul>
  <li>What is the <strong>unit of retrieval</strong>? (sentences, paragraphs, chunks, files, documents)</li>
  <li>Does the application need <strong>semantic similarity</strong>, <strong>exact-match search</strong>, or both?</li>
  <li>What <strong>metadata filters</strong> matter? (tenant, date, repo, workflow state, permissions)</li>
  <li>Is the product fundamentally a <strong>search engine</strong>, a <strong>database with search</strong>, or a <strong>retrieval substrate</strong>?</li>
  <li>Is the workload closer to <strong>consumer AI search</strong>, <strong>enterprise document retrieval</strong>, <strong>codebase chunk retrieval</strong>, or <strong>multimodal retrieval</strong>?</li>
</ul>

<p>A typical stack looks like:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>content
  ↓
parsing / chunking / field selection
  ↓
embeddings
  ↓
retrieval
  + lexical retrieval
  + vector retrieval
  + metadata filters
  + reranking
  + application logic
  ↓
final results
</code></pre></div></div>

<p>Each layer has its own decisions. In practice, the right architecture depends less on the phrase <strong>“vector database”</strong> and more on the shape of the workload.</p>

<hr />

<h2 id="vector-databases-role-in-the-stack">Vector databases: role in the stack</h2>

<p>An embedding model maps a raw object — a sentence, paragraph, image, or code chunk — into a point in high-dimensional space. Nearby points correspond to similar meaning.</p>

<p>A <strong>vector database</strong> stores and indexes those vectors so that approximate nearest-neighbor (ANN) search is practical at production scale. It typically provides:</p>

<ul>
  <li>storage for vectors and IDs</li>
  <li>ANN indexes</li>
  <li>metadata filters</li>
  <li>updates and deletes</li>
  <li>multitenancy or namespace isolation</li>
  <li>replication, scaling, and operations</li>
</ul>

<p>That makes a vector database more than an in-process nearest-neighbor library but less than a complete search product. It handles one stage of the pipeline well. The rest — chunking, lexical search, reranking, and application logic — lives outside it.</p>

<p>An <strong>ANN index</strong> (approximate nearest-neighbor index) is the data structure that makes vector lookup fast at scale. Given a query embedding, the goal is to retrieve the <strong>top-k</strong> vectors closest under a distance metric (often cosine distance or L2). Doing that <strong>exactly</strong> would require comparing the query to every stored vector, which is too slow and too memory-heavy when there are millions or billions of points in hundreds or thousands of dimensions. ANN indexes <strong>avoid scanning the full corpus</strong> by organizing vectors (for example via clustering, graphs, hashing, or compressed representations) so the engine visits only a small candidate set. The tradeoff is explicit: <strong>recall</strong> (how often the true nearest neighbors appear in the top-<em>k</em>) versus <strong>latency</strong>, memory, and ingest cost — tuned with index parameters and revisited as data and traffic grow.</p>

<p>For many applications, the decisions around chunking, hybrid retrieval, filters, and reranking matter more than the specific ANN backend.</p>

<hr />

<h2 id="embeddings-inputs-outputs-geometry">Embeddings: inputs, outputs, geometry</h2>

<p>Think of an embedding model as a function that maps a <strong>raw object</strong> to a <strong>point in d-dimensional space</strong>.</p>

<p>Typical pipeline:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Raw input → tokenizer / preprocessing → neural encoder → embedding vector
                                                      ↓
                    search · clustering · classification · recommendation · RAG
</code></pre></div></div>

<p>You rarely inspect the coordinates directly. What matters is <strong>relative position</strong>:</p>

<ul>
  <li>similar meaning → nearby vectors</li>
  <li>unrelated meaning → distant vectors</li>
</ul>

<p>Training usually tries to make this geometry useful for the task by <strong>pulling</strong> related examples together and <strong>pushing</strong> unrelated examples apart.</p>

<h3 id="common-training-patterns">Common training patterns</h3>

<ul>
  <li><strong>Masked / causal language modeling</strong> — predict missing or next tokens; useful representations emerge in the hidden states</li>
  <li><strong>Contrastive learning</strong> — positive pairs should be close, negatives far apart</li>
  <li><strong>Supervised classification with an embedding bottleneck</strong> — encoder → embedding → classifier</li>
  <li><strong>Triplet loss</strong> — anchor, positive, negative; enforce $d(\text{anchor}, \text{positive}) \ll d(\text{anchor}, \text{negative})$</li>
</ul>

<p>For retrieval-focused models, <strong>hard negatives</strong> usually matter a lot more than easy random negatives. For example, “Python list comprehension” vs “Python for loops tutorial” teaches a retrieval model much more than “Python list comprehension” vs “banana smoothie”.</p>

<hr />

<h2 id="choosing-an-embedding-model">Choosing an embedding model</h2>

<p>Choosing an embedding model is not just a benchmark exercise. It depends on the shape of the retrieval problem.</p>

<p>The main decision axes are:</p>

<ul>
  <li><strong>Modality</strong> — is the corpus text-only, image-heavy, or truly multimodal?</li>
  <li><strong>Task</strong> — is the goal retrieval, clustering, classification, recommendation, or reranking support?</li>
  <li><strong>Query/document asymmetry</strong> — should queries and stored documents use different embedding modes?</li>
  <li><strong>Domain</strong> — is the corpus general text, code, legal, finance, biomedical, or something else with specialized language?</li>
  <li><strong>Language coverage</strong> — is the corpus multilingual, or is cross-lingual retrieval important?</li>
  <li><strong>Dimension and storage cost</strong> — higher-dimensional vectors may improve quality, but they also increase storage, bandwidth, and retrieval cost</li>
  <li><strong>Latency, privacy, and deployment constraints</strong> — can the embeddings be generated through a hosted API, or do they need to run in a private environment?</li>
</ul>

<p>A practical sequence is:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>modality → task → domain → language coverage → query/document asymmetry → cost/latency → provider choice
</code></pre></div></div>

<p>For many teams, the right first move is to start with a strong general-purpose retrieval embedding model, measure it on a realistic evaluation set, and only then decide whether a domain-specific or multimodal model is justified.</p>

<p>One subtle but important point: query embeddings are not always the same as document embeddings. <a href="https://docs.cohere.com/reference/embed">Cohere’s Embed API</a> explicitly distinguishes <code class="language-plaintext highlighter-rouge">search_query</code> and <code class="language-plaintext highlighter-rouge">search_document</code>. <a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types">Vertex AI</a> supports task-type-aware embeddings for document retrieval, question answering, fact verification, clustering, and more.</p>

<p><a href="https://blog.voyageai.com/2026/01/15/voyage-4/">Voyage AI’s Voyage 4 announcement</a> describes <strong>asymmetric retrieval</strong>: the Voyage 4 models share one <strong>compatible embedding space</strong>, so vectors produced by different models (for example, queries with <code class="language-plaintext highlighter-rouge">voyage-4-lite</code> against documents indexed with <code class="language-plaintext highlighter-rouge">voyage-4-large</code>) still match under the same similarity search. That is useful when <strong>embedding the corpus is a one-time or infrequent cost</strong> but <strong>embedding queries is continuous at serving time</strong>—you can favor a larger model for stored documents and a smaller, faster model for live queries, trading a little operational complexity for better accuracy per dollar and lower query latency. The Voyage embeddings API still exposes <code class="language-plaintext highlighter-rouge">query</code> vs. <code class="language-plaintext highlighter-rouge">document</code> <code class="language-plaintext highlighter-rouge">input_type</code> for retrieval-oriented behavior.</p>

<h3 id="domain-specific-and-task-specific-embeddings">Domain-specific and task-specific embeddings</h3>

<p>A strong general-purpose embedding model is often the right place to start. But it is not always the right place to stop.</p>

<p>Some retrieval problems benefit from <strong>domain-specific embeddings</strong> because the meaning of similarity is different in different fields. Code, legal documents, financial text, and biomedical corpora often contain specialized language, structure, and relevance criteria that a generic model may not represent as well.</p>

<p>Task-specific behavior matters too. A model optimized for <strong>document retrieval</strong> may not be ideal for clustering, and a model optimized for <strong>queries</strong> may not be ideal for stored corpus documents.</p>

<p>In practice, the progression often looks like this:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>general retrieval model
→ realistic evaluation set
→ identify failure cases
→ domain-specific or task-specific model if needed
</code></pre></div></div>

<p>That sequence is usually better than prematurely fine-tuning or choosing a niche model before understanding the retrieval workload. <a href="https://docs.voyageai.com/docs/faq">Voyage</a> explicitly recommends domain-specific models for areas like law, finance, and code, while <a href="https://docs.cohere.com/docs/embeddings">Cohere</a> and <a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types">Vertex AI</a> expose task-aware embedding modes for retrieval and related use cases.</p>

<h3 id="embedding-provider-cheat-sheet">Embedding provider cheat sheet</h3>

<p>The choice of vector database is only half the story. The embedding provider matters just as much.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Provider</th>
      <th style="text-align: left">Strengths</th>
      <th style="text-align: left">Best fit</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong><a href="https://platform.openai.com/docs/api-reference/embeddings">OpenAI</a></strong></td>
      <td style="text-align: left">Strong general-purpose text embeddings; simple API; good default for many retrieval tasks</td>
      <td style="text-align: left">teams that want a straightforward hosted baseline</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong><a href="https://docs.cohere.com/docs/embeddings">Cohere</a></strong></td>
      <td style="text-align: left">Retrieval-oriented embedding stack; explicit query/document modes; strong semantic search and RAG ergonomics</td>
      <td style="text-align: left">semantic search and RAG systems that want query/document-aware embeddings</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong><a href="https://docs.voyageai.com/docs/embeddings">Voyage AI</a></strong></td>
      <td style="text-align: left">Retrieval-focused models; query/document modes; domain-specific models for code, law, and finance; contextualized chunk embeddings; multimodal support</td>
      <td style="text-align: left">teams optimizing retrieval quality in specialized domains</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong><a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings">Google Vertex AI</a></strong></td>
      <td style="text-align: left">Task-type-aware embeddings; configurable output dimensionality for text embeddings; multimodal embeddings for text, image, and video</td>
      <td style="text-align: left">teams already in GCP, or teams needing task-specific and multimodal support</td>
    </tr>
  </tbody>
</table>

<p>A useful way to think about providers is:</p>

<ul>
  <li><strong>OpenAI</strong> — strong general-purpose baseline</li>
  <li><strong>Cohere</strong> — retrieval-first and RAG-friendly</li>
  <li><strong>Voyage</strong> — retrieval specialist, especially for domain-specific workloads</li>
  <li><strong>Vertex AI</strong> — task-aware and multimodal, especially attractive inside Google Cloud</li>
</ul>

<p>Provider choice is not only about benchmark quality. It also depends on deployment model, privacy requirements, batch throughput, dimensionality control, multimodal support, and compliance constraints.</p>

<p>Embedding models are additionally available as <strong>open weights</strong> on <a href="https://huggingface.co/models">Hugging Face</a> for self-hosted inference, fine-tuning, or experimentation alongside the hosted providers above; which checkpoint to use still depends on modality, languages, license, and your own retrieval evaluation rather than any universal pick.</p>

<hr />

<h2 id="after-training-how-vectors-are-used">After training: how vectors are used</h2>

<p>Vectors support several kinds of applications:</p>

<ul>
  <li><strong>semantic search</strong> — nearest-neighbor retrieval</li>
  <li><strong>classification</strong> — linear probes or downstream heads</li>
  <li><strong>clustering</strong> — grouping similar items</li>
  <li><strong>recommendation</strong> — users and items embedded in a shared space</li>
  <li><strong>RAG</strong> — retrieve chunks to condition an LLM</li>
  <li><strong>code retrieval</strong> — retrieve relevant files, symbols, or chunks</li>
  <li><strong>multimodal search</strong> — align text and images or other modalities</li>
</ul>

<p>In many systems, vectors are the <strong>first-stage retriever</strong>, not the final answer generator.</p>

<hr />

<h2 id="multimodal-retrieval-when-ocr-is-not-enough">Multimodal retrieval: when OCR is not enough</h2>

<p>Many retrieval systems are described as “multimodal,” but there are really three different cases.</p>

<h3 id="1-text-only-retrieval">1. Text-only retrieval</h3>

<p>This is the simplest case. The corpus is already text, or can be reduced to text without losing much meaning.</p>

<p>Examples:</p>

<ul>
  <li>plain documents</li>
  <li>knowledge bases</li>
  <li>contracts</li>
  <li>source code</li>
  <li>emails</li>
</ul>

<h3 id="2-ocr-first-retrieval">2. OCR-first retrieval</h3>

<p>This is common for scanned PDFs and forms. The retrieval pipeline extracts text with OCR, then treats the result as ordinary text retrieval.</p>

<p>This works well when most of the important information is still captured in words.</p>

<h3 id="3-true-multimodal-retrieval">3. True multimodal retrieval</h3>

<p>This is needed when the <strong>visual structure itself carries meaning</strong>, not just the text.</p>

<p>Examples:</p>

<ul>
  <li>screenshots</li>
  <li>slide decks</li>
  <li>diagrams</li>
  <li>tables where layout matters</li>
  <li>charts and figures</li>
  <li>image-heavy PDFs</li>
  <li>document page images</li>
  <li>search by screenshot or image</li>
</ul>

<p>In these settings, OCR alone can lose important information. A multimodal embedding model can place text, images, and mixed inputs into a shared retrieval space.</p>

<p>This matters in practice because many enterprise corpora are only partially textual. A text-heavy invoice workflow may be well served by OCR plus text embeddings. A slide deck, dashboard screenshot, or visually complex form may require true multimodal retrieval.</p>

<p>A useful rule of thumb is:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if the meaning survives text extraction → OCR-first may be enough
if the meaning depends on layout, figures, or images → consider multimodal embeddings
</code></pre></div></div>

<p><a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings">Vertex AI multimodal embeddings</a> generate vectors from image, text, and video in a shared semantic space. <a href="https://docs.voyageai.com/docs/multimodal-embeddings">Voyage multimodal embeddings</a> support text and content-rich images such as figures, screenshots, slide decks, and document images. <a href="https://docs.cohere.com/reference/embed">Cohere embeddings</a> also support text, image, and mixed inputs for newer embedding models.</p>

<hr />

<h2 id="chunking-and-indexing-strategy">Chunking and indexing strategy</h2>

<p>Retrieval quality is often dominated by <strong>what</strong> gets indexed and <strong>how</strong> it gets chunked.</p>

<p>Questions to answer early:</p>

<ul>
  <li>What is the <strong>retrieval unit</strong>? A sentence, paragraph, page, section, table, file, or whole document?</li>
  <li>Should chunks <strong>overlap</strong>, or should they be strictly disjoint?</li>
  <li>Should metadata such as title, section name, page number, repo path, or document type be copied into every chunk?</li>
  <li>Should some structures — tables, forms, headers, footnotes, captions, code blocks — be indexed separately?</li>
  <li>Is there a parent-child relationship between chunks and larger source documents?</li>
</ul>

<p>The right strategy depends on the workload:</p>

<ul>
  <li><strong>enterprise document retrieval</strong> often benefits from section-aware, field-aware, or table-aware chunking</li>
  <li><strong>code retrieval</strong> often benefits from symbol-aware or file-aware chunking</li>
  <li><strong>slide decks and screenshots</strong> may require page-level or multimodal chunking</li>
  <li><strong>RAG</strong> often benefits from chunks that are small enough to retrieve precisely but large enough to preserve local context</li>
</ul>

<p>A useful heuristic is:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chunk for the unit you want to retrieve, not the unit you happen to store
</code></pre></div></div>

<p>This is one reason why the retrieval stack is broader than the vector database. The database stores the vectors, but chunking determines what those vectors mean.</p>

<hr />

<h2 id="precision-recall-and-the-practical-knobs">Precision, recall, and the practical knobs</h2>

<p><strong>Precision</strong> — among returned hits, how many are relevant?</p>

\[\text{precision} = \frac{\lvert\text{relevant} \cap \text{returned}\rvert}{\lvert\text{returned}\rvert}\]

<p><strong>Recall</strong> — among all relevant items in the corpus, how many appear in the result set?</p>

\[\text{recall} = \frac{\lvert\text{relevant} \cap \text{returned}\rvert}{\lvert\text{relevant}\rvert}\]

<p>Usually there is tension between them:</p>

<ul>
  <li>stricter thresholds means: ↑ precision, ↓ recall</li>
  <li>broader retrieval means: ↑ recall, ↓ precision</li>
</ul>

<p>In practice, the biggest quality levers are often:</p>

<ol>
  <li>strong evaluation sets</li>
  <li>good chunking</li>
  <li>metadata modeling</li>
  <li>hard negatives</li>
  <li>hybrid lexical + vector retrieval</li>
  <li>reranking</li>
  <li>threshold and top-<em>K</em> tuning</li>
  <li>domain fine-tuning</li>
</ol>

<p>A good production recipe is usually:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eval set → chunking → stronger embeddings → filters → reranker → threshold / K → domain tuning
</code></pre></div></div>

<p>before exotic modeling.</p>

<hr />

<h2 id="bm25-and-lexical-search">BM25 and lexical search</h2>

<p>BM25 ranks documents for a <strong>keyword</strong> query. It rewards:</p>

<ul>
  <li>rare terms more than common terms</li>
  <li>multiple mentions, but with diminishing returns</li>
  <li>shorter, more focused documents over long noisy ones</li>
</ul>

<p>For each query $q$ and document $d$:</p>

\[\text{BM25}(q,d) =
\sum_{t \in q}
\underbrace{\ln \frac{N - n_t + 0.5}{n_t + 0.5}}_{\text{IDF}}
\cdot
\frac{f(t,d)(k_1+1)}
{f(t,d) + k_1\left(1 - b + b\frac{\lvert d \rvert}{\text{avgdl}}\right)}\]

<p>where $t \in q$ are query terms, $n_t$ is the number of documents containing $t$, $f(t,d)$ is term frequency, $\lvert d \rvert$ is document length, $N$ is corpus size, $\text{avgdl}$ is average document length across the corpus, and $k_1 \approx 1.2$–$2.0$ (term-frequency saturation) and $b \approx 0.75$ (length normalization) are the tunable parameters.</p>

<h3 id="bm25-vs-vectors">BM25 vs vectors</h3>

<ul>
  <li><strong>BM25</strong> is strong on exact terms, IDs, names, and phrases</li>
  <li><strong>vectors</strong> are strong on semantic similarity and paraphrase</li>
  <li><strong>hybrid</strong> often works best in production</li>
</ul>

<p>This is especially true in enterprise systems where both exact identifiers and fuzzy semantic matches matter.</p>

<hr />

<h2 id="the-real-retrieval-stack-lexical-vector-hybrid-reranking">The real retrieval stack: lexical, vector, hybrid, reranking</h2>

<p>Modern retrieval systems usually fit one of four patterns.</p>

<h3 id="1-pure-lexical-search">1. Pure lexical search</h3>

<p>Best when exact token matching dominates:</p>

<ul>
  <li>product codes</li>
  <li>case IDs</li>
  <li>SQL keywords</li>
  <li>API names</li>
  <li>legal citations</li>
</ul>

<h3 id="2-pure-vector-search">2. Pure vector search</h3>

<p>Best when semantic similarity dominates and exact tokens matter less:</p>

<ul>
  <li>recommendations</li>
  <li>some semantic FAQ lookup</li>
  <li>some multimodal applications</li>
</ul>

<h3 id="3-hybrid-search">3. Hybrid search</h3>

<p>Best when both matter:</p>

<ul>
  <li>enterprise documents</li>
  <li>code search</li>
  <li>support knowledge bases</li>
  <li>RAG over heterogeneous corpora</li>
</ul>

<h3 id="4-two-stage-retrieval">4. Two-stage retrieval</h3>

<p>Often best in serious systems:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>retriever → top-K candidates → reranker → final results
</code></pre></div></div>

<p>The retriever may be lexical, vector, or hybrid; the reranker adds precision.</p>

<p>Not all modern retrieval is just “dense vectors vs BM25.” Some systems also use <strong>sparse learned retrieval</strong>, <strong>late-interaction models</strong> such as ColBERT-style designs, or dedicated <strong>rerankers</strong> when ranking quality matters more than keeping the first-stage index simple.</p>

<hr />

<h2 id="ann-index-choices-and-tradeoffs">ANN index choices and tradeoffs</h2>

<p>Approximate nearest-neighbor search speeds up retrieval by giving up some exactness for much lower latency and cost.</p>

<p>The central tradeoffs are:</p>

<ul>
  <li><strong>exact vs approximate search</strong> — exact search gives maximum recall but can be too slow or expensive at scale</li>
  <li><strong>recall vs latency</strong> — more aggressive ANN settings are faster but may miss some true nearest neighbors</li>
  <li><strong>memory vs compression</strong> — some index types are memory-heavy, others compress vectors more aggressively</li>
  <li><strong>update cost vs query cost</strong> — some ANN structures are friendlier to frequent updates than others</li>
</ul>

<p>A few common patterns:</p>

<ul>
  <li><strong>HNSW</strong> — strong quality and very common in production vector search systems</li>
  <li><strong>IVF / PQ-style compression approaches</strong> — attractive when memory efficiency matters more than maximum recall</li>
  <li><strong>exact search</strong> — still useful for smaller corpora, evaluation, and some latency-insensitive workflows</li>
</ul>

<p>In practice, the right ANN choice depends on corpus size, update rate, latency budget, and the acceptable recall loss.</p>

<hr />

<h2 id="the-vector-database-landscape">The vector database landscape</h2>

<p>The term <strong>vector database</strong> is used loosely, but the landscape actually has three broad categories.</p>

<h3 id="1-dedicated-vector-databases--vector-engines">1. Dedicated vector databases / vector engines</h3>

<p>These are built primarily around vector storage and similarity search:</p>

<ul>
  <li><a href="https://docs.pinecone.io/guides/get-started/overview">Pinecone</a></li>
  <li>Milvus</li>
  <li>Qdrant</li>
  <li><a href="https://docs.weaviate.io/weaviate/concepts/search">Weaviate</a></li>
  <li>Turbopuffer</li>
  <li>Chroma</li>
  <li>LanceDB</li>
</ul>

<p>These vary in maturity, operational model, and how strongly they also support lexical or hybrid search.</p>

<h3 id="2-search-engines-with-strong-vector-support">2. Search engines with strong vector support</h3>

<p>These are broader search/ranking systems that also handle vectors well:</p>

<ul>
  <li><a href="https://docs.vespa.ai/en/querying/nearest-neighbor-search-guide.html">Vespa</a></li>
  <li><a href="https://docs.opensearch.org/latest/search-plugins/keyword-search/">OpenSearch</a></li>
  <li>Elasticsearch</li>
</ul>

<p>These often make more sense when search, ranking, and serving are core to the product.</p>

<h3 id="3-general-databases-with-vector-support">3. General databases with vector support</h3>

<p>These keep vectors close to operational data:</p>

<ul>
  <li><a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/">MongoDB Vector Search</a></li>
  <li>Postgres + pgvector</li>
</ul>

<p>These are attractive when data locality and operational simplicity matter more than having a specialized search stack.</p>

<hr />

<h2 id="a-conceptual-map-of-the-major-systems">A conceptual map of the major systems</h2>

<h3 id="pinecone-milvus-qdrant">Pinecone, Milvus, Qdrant</h3>

<p>These are easiest to think of as <strong>dedicated vector database products</strong>.</p>

<p>They are often chosen when the main need is:</p>

<ul>
  <li>vector similarity search</li>
  <li>metadata filtering</li>
  <li>scalable ANN</li>
  <li>production operational support</li>
</ul>

<h3 id="weaviate">Weaviate</h3>

<p>Weaviate is best thought of as a <strong>vector-native / AI database</strong> with strong built-in support for:</p>

<ul>
  <li>vector search</li>
  <li>keyword search</li>
  <li>hybrid search</li>
</ul>

<p>That makes it a strong middle ground between “pure vector DB” and “full search engine.”</p>

<p>See also: <a href="https://docs.weaviate.io/weaviate/concepts/search">Weaviate concepts</a></p>

<h3 id="turbopuffer">Turbopuffer</h3>

<p>Turbopuffer is best thought of as a <strong>retrieval substrate</strong> optimized for large-scale search over vectors and metadata, with support for full-text and hybrid behavior as well.</p>

<p>It is especially interesting for workloads with:</p>

<ul>
  <li>many isolated namespaces</li>
  <li>high update rates</li>
  <li>lots of small chunks</li>
  <li>low-latency nearest-neighbor retrieval</li>
</ul>

<h3 id="vespa">Vespa</h3>

<p>Vespa is not best understood as a vector database. It is an open-source <strong>search + ranking + serving engine</strong>.</p>

<p>Its strength is not just storing vectors, but combining:</p>

<ul>
  <li>lexical retrieval</li>
  <li>vector retrieval</li>
  <li>filters</li>
  <li>business logic</li>
  <li>multi-stage ranking</li>
  <li>serving logic</li>
</ul>

<p>Vespa is a strong choice when <strong>relevance engineering</strong> is central.</p>

<h3 id="opensearch-and-elasticsearch">OpenSearch and Elasticsearch</h3>

<p>OpenSearch and Elasticsearch are not “pure vector DBs” either. They are Lucene-based distributed search engines that support:</p>

<ul>
  <li>BM25 full-text search</li>
  <li>filters and aggregations</li>
  <li>vector search</li>
  <li>hybrid search patterns</li>
</ul>

<p>They are especially strong when traditional search features matter alongside vectors.</p>

<h3 id="mongodb-atlas-search-and-vector-search">MongoDB Atlas Search and Vector Search</h3>

<p>MongoDB provides both:</p>

<ul>
  <li><a href="https://www.mongodb.com/docs/atlas/atlas-search/">Atlas Search</a> for Lucene-backed lexical search</li>
  <li><a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/">Atlas Vector Search</a> for semantic nearest-neighbor retrieval</li>
</ul>

<p>These are strongest when MongoDB is already the system of record and the goal is to keep retrieval close to application data and aggregation pipelines.</p>

<h3 id="postgres--pgvector">Postgres + pgvector</h3>

<p>This is often the simplest option when:</p>

<ul>
  <li>the app already uses Postgres</li>
  <li>scale is moderate</li>
  <li>operational simplicity matters</li>
  <li>vector retrieval is important, but not the entire product</li>
</ul>

<p>It is frequently a very good default for early-stage products.</p>

<hr />

<h2 id="how-different-products-use-different-retrieval-systems">How different products use different retrieval systems</h2>

<p>The best way to understand the landscape is by application shape.</p>

<h3 id="perplexity-search-and-ranking-are-the-product">Perplexity: search and ranking are the product</h3>

<p>Perplexity publicly describes using <a href="https://blog.vespa.ai/perplexity-builds-ai-search-at-scale-on-vespa-ai/">Vespa</a> to power AI search at scale.</p>

<p>That makes sense because Perplexity’s problem is not just semantic retrieval. It is closer to:</p>

<ul>
  <li>search engine retrieval</li>
  <li>ranking</li>
  <li>freshness</li>
  <li>structured filtering</li>
  <li>serving at scale</li>
</ul>

<p>This is a natural fit for a search-and-ranking engine rather than a pure vector DB.</p>

<h3 id="cursor-code-retrieval-is-a-chunked-nearest-neighbor-problem">Cursor: code retrieval is a chunked nearest-neighbor problem</h3>

<p><a href="https://cursor.com/security">Cursor publicly documents</a> using Turbopuffer for codebase indexing: chunk files, embed them, store vectors plus obfuscated metadata, then perform nearest-neighbor search at inference time.</p>

<p>This also makes sense. Cursor’s problem looks like:</p>

<ul>
  <li>lots of small code chunks</li>
  <li>high churn</li>
  <li>many user/repo namespaces</li>
  <li>metadata filters such as path and line range</li>
  <li>extremely fast retrieval</li>
</ul>

<p>That shape favors a fast retrieval substrate over a heavy search-engine stack.</p>

<h3 id="mongodb-integrated-database--search">MongoDB: integrated database + search</h3>

<p>MongoDB’s model is different. It says: keep your operational data in MongoDB, and add lexical and vector retrieval in the same platform.</p>

<p>This is strongest when the system already needs:</p>

<ul>
  <li>document storage</li>
  <li>app data</li>
  <li>workflow state</li>
  <li>search</li>
  <li>vector retrieval</li>
  <li>filters and aggregation</li>
</ul>

<p>with minimal extra infrastructure.</p>

<h3 id="docrouter-style-document-retrieval">DocRouter-style document retrieval</h3>

<p>A DocRouter-style workload is usually <strong>not</strong> just a vector search problem. It is a <strong>document retrieval and workflow problem</strong>.</p>

<p>Typical needs include:</p>

<ul>
  <li>exact IDs and exact phrases</li>
  <li>semantic similarity</li>
  <li>metadata filters</li>
  <li>grouped or field-aware retrieval</li>
  <li>hybrid search</li>
  <li>reranking</li>
  <li>explainability</li>
  <li>workflow and permission logic</li>
</ul>

<p>That usually means the right architecture is:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lexical retrieval
+ vector retrieval
+ metadata filters
+ reranking
+ application logic
</code></pre></div></div>

<p>not just “pick a vector DB.”</p>

<hr />

<h2 id="workload-tilt-docrouter-style-vs-code-editor-retrieval">Workload tilt: DocRouter-style vs code-editor retrieval</h2>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Dimension</th>
      <th style="text-align: left">Document / workflow retrieval</th>
      <th style="text-align: left">Code-editor style retrieval</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>Unit</strong></td>
      <td style="text-align: left">sections, tables, form regions, document families</td>
      <td style="text-align: left">small code chunks, symbols, files</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Churn</strong></td>
      <td style="text-align: left">moderate; often batch ingest + reprocessing</td>
      <td style="text-align: left">very high incremental updates</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Metadata</strong></td>
      <td style="text-align: left">doc type, tenant, date, workflow state, vendor, case</td>
      <td style="text-align: left">repo, branch, path, language, symbol type</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Permissions</strong></td>
      <td style="text-align: left">often critical</td>
      <td style="text-align: left">often critical</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Exact-match need</strong></td>
      <td style="text-align: left">very high</td>
      <td style="text-align: left">high</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Hybrid need</strong></td>
      <td style="text-align: left">very high</td>
      <td style="text-align: left">high</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Typical stack lean</strong></td>
      <td style="text-align: left">MongoDB / Weaviate / Vespa</td>
      <td style="text-align: left">Turbopuffer / Weaviate / Qdrant / Pinecone</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>When ranking is strategic</strong></td>
      <td style="text-align: left">Vespa becomes especially attractive</td>
      <td style="text-align: left">Vespa can matter, but is often heavier than needed</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="feature-matrix-high-level">Feature matrix (high level)</h2>

<table>
  <thead>
    <tr>
      <th style="text-align: left"> </th>
      <th style="text-align: left">MongoDB Atlas Search / Vector Search</th>
      <th style="text-align: left">Weaviate</th>
      <th style="text-align: left">Vespa</th>
      <th style="text-align: left">Turbopuffer</th>
      <th style="text-align: left">Pinecone / Qdrant / Milvus</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>Core identity</strong></td>
      <td style="text-align: left">document DB + embedded search</td>
      <td style="text-align: left">vector / AI database + hybrid</td>
      <td style="text-align: left">search + ranking + serving</td>
      <td style="text-align: left">retrieval substrate</td>
      <td style="text-align: left">dedicated vector DB</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Lexical search</strong></td>
      <td style="text-align: left">strong</td>
      <td style="text-align: left">strong</td>
      <td style="text-align: left">strong</td>
      <td style="text-align: left">some / hybrid-friendly</td>
      <td style="text-align: left">varies</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Vector search</strong></td>
      <td style="text-align: left">yes</td>
      <td style="text-align: left">yes</td>
      <td style="text-align: left">yes</td>
      <td style="text-align: left">yes</td>
      <td style="text-align: left">yes</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Hybrid search</strong></td>
      <td style="text-align: left">composable</td>
      <td style="text-align: left">first-class</td>
      <td style="text-align: left">composable and powerful</td>
      <td style="text-align: left">supported</td>
      <td style="text-align: left">varies</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Custom ranking depth</strong></td>
      <td style="text-align: left">moderate</td>
      <td style="text-align: left">moderate</td>
      <td style="text-align: left">very strong</td>
      <td style="text-align: left">lower</td>
      <td style="text-align: left">lower to moderate</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Namespace-heavy workloads</strong></td>
      <td style="text-align: left">moderate</td>
      <td style="text-align: left">strong</td>
      <td style="text-align: left">possible</td>
      <td style="text-align: left">very strong</td>
      <td style="text-align: left">strong</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Best fit</strong></td>
      <td style="text-align: left">app data already in MongoDB</td>
      <td style="text-align: left">semantic + hybrid RAG</td>
      <td style="text-align: left">search/ranking as product core</td>
      <td style="text-align: left">code/chunk retrieval</td>
      <td style="text-align: left">dedicated ANN workloads</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="choosing-the-right-system-by-application-need">Choosing the right system by application need</h2>

<h3 id="choose-mongodb-search--vector-search-when">Choose MongoDB Search / Vector Search when</h3>

<ul>
  <li>MongoDB is already the system of record</li>
  <li>you want minimal infrastructure sprawl</li>
  <li>metadata-heavy filtering and app integration matter</li>
  <li>retrieval is important, but not a standalone serving product</li>
</ul>

<h3 id="choose-weaviate-when">Choose Weaviate when</h3>

<ul>
  <li>retrieval is central to the application</li>
  <li>you want strong keyword + vector + hybrid search</li>
  <li>you want a dedicated retrieval database without going all the way to a search-engine platform</li>
</ul>

<h3 id="choose-vespa-when">Choose Vespa when</h3>

<ul>
  <li>search and ranking are core differentiators</li>
  <li>you want multi-stage ranking and richer relevance engineering</li>
  <li>the product looks more like search/recommendation/serving than a CRUD app with vectors</li>
</ul>

<h3 id="choose-turbopuffer-when">Choose Turbopuffer when</h3>

<ul>
  <li>the workload is mostly fast chunk retrieval</li>
  <li>there are many namespaces or tenants</li>
  <li>metadata filters matter</li>
  <li>the application looks like a code assistant, retrieval substrate, or large-scale vector index</li>
</ul>

<h3 id="choose-pinecone-qdrant-or-milvus-when">Choose Pinecone, Qdrant, or Milvus when</h3>

<ul>
  <li>you want a dedicated vector database</li>
  <li>the main need is vector retrieval plus filtering</li>
  <li>you do not need the full complexity of a search-engine platform</li>
</ul>

<h3 id="choose-pgvector-when">Choose pgvector when</h3>

<ul>
  <li>you already use Postgres</li>
  <li>scale is moderate</li>
  <li>simplicity matters</li>
  <li>vector retrieval is important, but not the center of the platform</li>
</ul>

<h3 id="choose-opensearch-or-elasticsearch-when">Choose OpenSearch or Elasticsearch when</h3>

<ul>
  <li>you already need classic search-engine features</li>
  <li>lexical search remains central</li>
  <li>vectors are an addition to a search stack, not the entire product</li>
</ul>

<hr />

<h2 id="metrics-that-matter">Metrics that matter</h2>

<p>Beyond raw ANN benchmarks, what actually matters depends on the application.</p>

<h3 id="retrieval-metrics">Retrieval metrics</h3>

<ul>
  <li><strong>Precision@K</strong> — of the top K results returned, what fraction are actually relevant?</li>
  <li><strong>Recall@K</strong> — of all relevant items in the corpus, what fraction appear in the top K?</li>
  <li><strong>MRR (Mean Reciprocal Rank)</strong> — averages the reciprocal of the rank of the first relevant result across queries</li>
  <li><strong>MAP (Mean Average Precision)</strong> — summarizes ranking quality and coverage across recall levels</li>
  <li><strong>NDCG (Normalized Discounted Cumulative Gain)</strong> — rewards highly relevant results appearing early and supports graded relevance</li>
</ul>

<h3 id="system-metrics">System metrics</h3>

<ul>
  <li>latency</li>
  <li>throughput</li>
  <li>freshness</li>
  <li>update cost</li>
  <li>filter correctness</li>
  <li>multitenancy behavior</li>
  <li>operational burden</li>
</ul>

<h3 id="application-specific-metrics">Application-specific metrics</h3>

<ul>
  <li>document systems: exact identifier retrieval, section relevance, workflow correctness</li>
  <li>code assistants: chunk relevance, namespace isolation, freshness after edits</li>
  <li>consumer search: ranking quality, freshness, personalization, serving latency</li>
</ul>

<hr />

<h2 id="failure-modes">Failure modes</h2>

<p>Common retrieval failures include:</p>

<ul>
  <li>easy negatives instead of hard negatives</li>
  <li>bad chunking</li>
  <li>poor metadata modeling</li>
  <li>domain mismatch</li>
  <li>ignoring lexical exact-match needs</li>
  <li>too much faith in vector similarity alone</li>
  <li>evaluating only on easy benchmarks</li>
  <li>no reranking</li>
  <li>no permission or tenant filtering</li>
</ul>

<p>Many disappointing “vector database” results are actually failures of the surrounding retrieval design.</p>

<hr />

<h2 id="practical-takeaway">Practical takeaway</h2>

<p>The most important conceptual point is:</p>

<p>A <strong>vector database</strong> is usually not the right abstraction to optimize first.</p>

<p>For most real systems, the better question is:</p>

<p><strong>What retrieval architecture does this application need?</strong></p>

<ul>
  <li>For <strong>consumer AI search</strong>, search and ranking engines like <strong>Vespa</strong> may be the right center of gravity.</li>
  <li>For <strong>code retrieval</strong>, fast namespace-heavy ANN systems like <strong>Turbopuffer</strong> may be a better fit.</li>
  <li>For <strong>application-integrated retrieval</strong>, <strong>MongoDB Search / Vector Search</strong> or <strong>pgvector</strong> may be simplest.</li>
  <li>For <strong>semantic + hybrid retrieval as a product</strong>, <strong>Weaviate</strong>, <strong>Qdrant</strong>, <strong>Pinecone</strong>, or <strong>Milvus</strong> may be the right class.</li>
  <li>For <strong>enterprise document retrieval</strong>, the answer is often <strong>hybrid lexical + vector + filters + reranking</strong>, not just “choose a vector DB.”</li>
</ul>

<p><em>At <strong>DocRouter.AI</strong> we treat document pipelines as retrieval-shaped problems: exact fields, semantics, metadata, and workflow context—not “vector search alone.”</em></p>

<p><em>Tags, prompts, and structured extraction are how teams operationalize that stack on real documents.</em></p>]]></content><author><name>Andrei Radulescu-Banu</name></author><category term="ai" /><category term="engineering" /><summary type="html"><![CDATA[An end-to-end view of the AI retrieval stack: embeddings, vector databases, hybrid search, chunking, and how to choose systems by workload.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/ai-retrieval-stack-splash.png" /><media:content medium="image" url="https://analytiqhub.com/assets/images/ai-retrieval-stack-splash.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Self-Hosted MongoDB on Kubernetes with Atlas Search (mongot)</title><link href="https://analytiqhub.com/tech/kubernetes/devops/mongodb/self-hosted-mongodb-kubernetes-atlas-search/" rel="alternate" type="text/html" title="Self-Hosted MongoDB on Kubernetes with Atlas Search (mongot)" /><published>2026-03-08T00:00:00+00:00</published><updated>2026-03-08T00:00:00+00:00</updated><id>https://analytiqhub.com/tech/kubernetes/devops/mongodb/self-hosted-mongodb-kubernetes-atlas-search</id><content type="html" xml:base="https://analytiqhub.com/tech/kubernetes/devops/mongodb/self-hosted-mongodb-kubernetes-atlas-search/"><![CDATA[<p>For air-gapped environments, on-premises clusters, or any deployment where MongoDB Atlas is not an option, you can run a production-grade MongoDB replica set with optional <strong>Atlas Search</strong> (full-text and vector indexes) entirely inside Kubernetes. This post describes the <a href="https://github.com/analytiq-hub/analytiq-charts"><code class="language-plaintext highlighter-rouge">mongodb-atlas-local</code></a> Helm chart and the operational details we learned running it on EKS and elsewhere.</p>

<p>If you’re new to Kubernetes, the <a href="/tech/kubernetes/devops/kubernetes-for-docker-users-primer/">Kubernetes for Docker Users primer</a> covers Pods, Deployments, Services, PVCs, and Helm basics. For packaging and GitOps, see <a href="/tech/kubernetes/devops/kubernetes-packaging-helm-gitops/">Kubernetes Packaging and Deployment</a>.</p>

<h2 id="why-not-bitnami">Why not Bitnami?</h2>

<p>The obvious choice for an in-cluster MongoDB is the Bitnami chart, which is widely used and simple to install. The problem is <strong>vector search</strong>. Applications that need semantic search or Atlas-style indexes require the <code class="language-plaintext highlighter-rouge">mongot</code> process — a sidecar that runs alongside <code class="language-plaintext highlighter-rouge">mongod</code> and handles full-text and vector indexes. Bitnami deploys a plain community MongoDB without <code class="language-plaintext highlighter-rouge">mongot</code>, so Atlas Search is simply not available.</p>

<p>The only supported path to <code class="language-plaintext highlighter-rouge">mongot</code> in a self-hosted environment is the <a href="https://github.com/mongodb/mongodb-kubernetes-operator">MongoDB Kubernetes Operator</a>, which introduces the <code class="language-plaintext highlighter-rouge">MongoDBCommunity</code> and <code class="language-plaintext highlighter-rouge">MongoDBSearch</code> custom resources. The operator manages the StatefulSet, replica set initialization, user creation, and TLS — and, when <code class="language-plaintext highlighter-rouge">MongoDBSearch</code> is enabled, injects the <code class="language-plaintext highlighter-rouge">mongot</code> sidecar with the right configuration.</p>

<p>Our chart wraps the operator’s CRDs with sensible defaults and a single <code class="language-plaintext highlighter-rouge">helm upgrade --install</code> interface, so operators don’t need to understand the operator’s internals to get a working cluster. You can run MongoDB with or without search; if you don’t need vector or full-text search, you can disable the <code class="language-plaintext highlighter-rouge">mongot</code> sidecar and save resources.</p>

<h2 id="two-phase-install">Two-phase install</h2>

<p><code class="language-plaintext highlighter-rouge">mongot</code> requires a running, authenticated replica set to connect to — it cannot start on a fresh cluster. The install therefore happens in two phases:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Phase 1: bring up the replica set without search</span>
helm upgrade <span class="nt">--install</span> mongodb oci://ghcr.io/analytiq-hub/mongodb-atlas-local <span class="se">\</span>
  <span class="nt">--version</span> 2.0.1 <span class="nt">--namespace</span> mongodb <span class="se">\</span>
  <span class="nt">--set</span> mongodb.adminPassword<span class="o">=</span><span class="s2">"..."</span> <span class="se">\</span>
  <span class="nt">--set</span> mongodb.appUser.password<span class="o">=</span><span class="s2">"..."</span> <span class="se">\</span>
  <span class="nt">--set</span> search.enabled<span class="o">=</span><span class="nb">false</span>

<span class="c"># Wait for replica set Ready</span>
kubectl <span class="nb">wait</span> <span class="nt">--for</span><span class="o">=</span><span class="nv">condition</span><span class="o">=</span>ready pod <span class="nt">-l</span> <span class="nv">app</span><span class="o">=</span>mongodb-mongodb-atlas-local <span class="se">\</span>
  <span class="nt">-n</span> mongodb <span class="nt">--timeout</span><span class="o">=</span>300s

<span class="c"># Phase 2: enable search</span>
helm upgrade mongodb oci://ghcr.io/analytiq-hub/mongodb-atlas-local <span class="se">\</span>
  <span class="nt">--version</span> 2.0.1 <span class="nt">--namespace</span> mongodb <span class="nt">--reuse-values</span> <span class="se">\</span>
  <span class="nt">--set</span> search.enabled<span class="o">=</span><span class="nb">true</span>
</code></pre></div></div>

<p>Attempting a single-phase install with <code class="language-plaintext highlighter-rouge">search.enabled=true</code> results in <code class="language-plaintext highlighter-rouge">mongot</code> crash-looping because the replica set isn’t ready to accept its connection.</p>

<h2 id="node-sizing-for-stateful-workloads">Node sizing for stateful workloads</h2>

<p>Adding MongoDB changes the cluster sizing arithmetic considerably. Each replica pod runs two containers: <code class="language-plaintext highlighter-rouge">mongod</code> (500m CPU, 400Mi) and <code class="language-plaintext highlighter-rouge">mongodb-agent</code> (500m CPU, 400Mi), plus a <code class="language-plaintext highlighter-rouge">mongot</code> sidecar (250m CPU, 250Mi) when search is enabled. A 3-replica set therefore requests ~2.25 vCPU and ~3.15 Gi of memory, on top of whatever other workloads you run.</p>

<p>The scheduler must fit the entire pod on one node. On a cluster with two <code class="language-plaintext highlighter-rouge">t3.medium</code> nodes (2 vCPU / 4 Gi each), if existing workloads already consume ~1.7 vCPU in requests, there may be ~2.2 vCPU free across both nodes — but never more than ~740m on a single node. A MongoDB pod that needs ~750m CPU cannot be scheduled. Adding a third node (or sizing nodes with enough headroom) resolves it.</p>

<p>The practical lesson: <strong>account for stateful pods when sizing the initial node group</strong>, or ensure the autoscaler can provision new nodes quickly enough not to block workloads.</p>

<h2 id="ebs-csi-driver-and-the-gp2-trap-eks">EBS CSI Driver and the gp2 trap (EKS)</h2>

<p>When we added MongoDB to an EKS cluster, PVCs sat in <code class="language-plaintext highlighter-rouge">Pending</code> indefinitely with the error:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>no persistent volumes available for this claim and no storage class is set
</code></pre></div></div>

<p>EKS creates a <code class="language-plaintext highlighter-rouge">gp2</code> StorageClass by default, but it has two problems. First, it is not marked as the default class — PVCs with an empty <code class="language-plaintext highlighter-rouge">storageClassName</code> get no provisioner assigned. Second, and more importantly, <code class="language-plaintext highlighter-rouge">gp2</code> uses the legacy in-tree <code class="language-plaintext highlighter-rouge">kubernetes.io/aws-ebs</code> provisioner, which was removed in Kubernetes 1.27. On EKS 1.35, it is simply gone.</p>

<p>The fix is to create a <code class="language-plaintext highlighter-rouge">gp3</code> StorageClass backed by the EBS CSI driver (<code class="language-plaintext highlighter-rouge">ebs.csi.aws.com</code>) and mark it as the cluster default:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">storage.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">StorageClass</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">gp3</span>
  <span class="na">annotations</span><span class="pi">:</span>
    <span class="na">storageclass.kubernetes.io/is-default-class</span><span class="pi">:</span> <span class="s2">"</span><span class="s">true"</span>
<span class="na">provisioner</span><span class="pi">:</span> <span class="s">ebs.csi.aws.com</span>
<span class="na">volumeBindingMode</span><span class="pi">:</span> <span class="s">WaitForFirstConsumer</span>
<span class="na">allowVolumeExpansion</span><span class="pi">:</span> <span class="no">true</span>
<span class="na">parameters</span><span class="pi">:</span>
  <span class="na">type</span><span class="pi">:</span> <span class="s">gp3</span>
  <span class="na">encrypted</span><span class="pi">:</span> <span class="s2">"</span><span class="s">true"</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">WaitForFirstConsumer</code> is important — it delays EBS volume creation until the pod is actually scheduled to a node, which ensures the volume is created in the correct availability zone. <code class="language-plaintext highlighter-rouge">allowVolumeExpansion: true</code> enables online resizing without pod restarts.</p>

<p>Provision this StorageClass (and the EBS CSI driver) via Terraform or your preferred IaC so new clusters get it automatically.</p>

<h2 id="summary">Summary</h2>

<table>
  <thead>
    <tr>
      <th>Topic</th>
      <th>Takeaway</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Chart</strong></td>
      <td><code class="language-plaintext highlighter-rouge">mongodb-atlas-local</code> on <a href="https://github.com/analytiq-hub/analytiq-charts">analytiq-charts</a> — replica set + optional <code class="language-plaintext highlighter-rouge">mongot</code> for Atlas Search</td>
    </tr>
    <tr>
      <td><strong>Install</strong></td>
      <td>Two-phase: bring up replica set with <code class="language-plaintext highlighter-rouge">search.enabled=false</code>, then enable search</td>
    </tr>
    <tr>
      <td><strong>Sizing</strong></td>
      <td>Reserve enough CPU/memory per node for the full MongoDB pod; scheduler places whole pod on one node</td>
    </tr>
    <tr>
      <td><strong>EKS storage</strong></td>
      <td>Use a <code class="language-plaintext highlighter-rouge">gp3</code> StorageClass with <code class="language-plaintext highlighter-rouge">ebs.csi.aws.com</code>; don’t rely on the default <code class="language-plaintext highlighter-rouge">gp2</code></td>
    </tr>
  </tbody>
</table>

<p>We use this chart for <a href="https://docrouter.ai">Doc Router</a> and other applications that need MongoDB with vector search. For the full Doc Router deployment story (Helm chart, workers, CI/CD, multi-cloud), see <a href="/tech/kubernetes/devops/docrouter/deploying-doc-router-on-kubernetes/">Deploying Doc Router on Kubernetes</a>.</p>

<hr />

<p><em>Andrei Radulescu-Banu is the founder of <a href="https://docrouter.ai">DocRouter.AI</a> (document processing with LLMs) and <a href="https://sigagent.ai">SigAgent.AI</a> (Claude Agent monitoring). His company <a href="https://analytiqhub.com">AnalytiqHub.com</a> provides consulting services for cloud and AI engineering.</em></p>]]></content><author><name>Andrei Radulescu-Banu</name></author><category term="tech" /><category term="kubernetes" /><category term="devops" /><category term="mongodb" /><summary type="html"><![CDATA[Run a production-grade MongoDB replica set with optional Atlas Search (vector and full-text) inside Kubernetes — for air-gapped, on-prem, or any environment where Atlas isn't an option.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/self-hosted-mongodb-kubernetes-atlas-search-splash.png" /><media:content medium="image" url="https://analytiqhub.com/assets/images/self-hosted-mongodb-kubernetes-atlas-search-splash.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Deploying Doc Router on Kubernetes: From Docker Compose to EKS and Digital Ocean</title><link href="https://analytiqhub.com/tech/kubernetes/devops/docrouter/deploying-doc-router-on-kubernetes/" rel="alternate" type="text/html" title="Deploying Doc Router on Kubernetes: From Docker Compose to EKS and Digital Ocean" /><published>2026-03-07T00:00:00+00:00</published><updated>2026-03-07T00:00:00+00:00</updated><id>https://analytiqhub.com/tech/kubernetes/devops/docrouter/deploying-doc-router-on-kubernetes</id><content type="html" xml:base="https://analytiqhub.com/tech/kubernetes/devops/docrouter/deploying-doc-router-on-kubernetes/"><![CDATA[<p>We recently added production-grade Kubernetes support to Doc Router. This post walks through the key decisions and challenges we encountered along the way.</p>

<p>If you’re new to Kubernetes, start with <a href="/tech/kubernetes/devops/kubernetes-for-docker-users-primer/">Kubernetes for Docker Users: A Practical Primer</a>, which covers the core concepts — Pods, Deployments, Services, Namespaces, Secrets, PVCs, Helm, and Kind — before diving into this post. For packaging and GitOps (Kustomize, Helm, Flux), see <a href="/tech/kubernetes/devops/kubernetes-packaging-helm-gitops/">Kubernetes Packaging and Deployment</a>.</p>

<h2 id="why-kubernetes">Why Kubernetes?</h2>

<p>Doc Router was originally deployed using Docker Compose, which worked well for single-node setups. As we started onboarding enterprise customers with availability and scalability requirements, we needed:</p>

<ul>
  <li><strong>Horizontal scaling</strong> — multiple replicas behind a load balancer</li>
  <li><strong>Automated failover</strong> — pods restarted on failure without manual intervention</li>
  <li><strong>Rolling deployments</strong> — zero-downtime upgrades</li>
  <li><strong>Resource isolation</strong> — CPU and memory limits per component</li>
</ul>

<h2 id="architecture">Architecture</h2>

<p>The production deployment consists of two main workloads:</p>

<ul>
  <li><strong>Frontend</strong> — Next.js server (SSR + API routes via NextAuth)</li>
  <li><strong>Backend</strong> — FastAPI application with embedded background workers</li>
</ul>

<p>Both run as Kubernetes Deployments behind a shared nginx ingress with TLS terminated by cert-manager (Let’s Encrypt).</p>

<p>MongoDB can run outside the cluster (MongoDB Atlas) or in-cluster via our <a href="https://github.com/analytiq-hub/analytiq-charts"><code class="language-plaintext highlighter-rouge">mongodb-atlas-local</code></a> Helm chart — see <a href="/tech/kubernetes/devops/mongodb/self-hosted-mongodb-kubernetes-atlas-search/">Self-Hosted MongoDB on Kubernetes with Atlas Search</a> for the install guide. AWS S3 remains an external dependency.</p>

<h2 id="helm-chart">Helm Chart</h2>

<p>We packaged the deployment as a Helm chart (<code class="language-plaintext highlighter-rouge">deploy/charts/doc-router</code>) published to GitHub Container Registry (ghcr.io) as an OCI artifact. The chart is versioned independently of the Docker images, so we can update deployment configuration without rebuilding the application.</p>

<p>Key design decisions:</p>

<ul>
  <li><strong>Single <code class="language-plaintext highlighter-rouge">values.yaml</code></strong> with sensible defaults — operators override only what differs per cluster</li>
  <li><strong>ConfigMap for non-secret config</strong> — <code class="language-plaintext highlighter-rouge">NEXTAUTH_URL</code>, <code class="language-plaintext highlighter-rouge">FASTAPI_ROOT_PATH</code>, worker count, S3 bucket</li>
  <li><strong>Kubernetes Secret for credentials</strong> — MongoDB URI, API keys, NextAuth secret — created by the deploy script, never stored in the chart</li>
  <li><strong>Ingress host derived from <code class="language-plaintext highlighter-rouge">APP_HOST</code></strong> — a single variable drives the entire URL configuration</li>
</ul>

<h2 id="choosing-a-container-registry">Choosing a Container Registry</h2>

<p>We evaluated two natural options: <strong>Amazon ECR</strong> (since we’re already on AWS/EKS) and <strong>GitHub Container Registry (ghcr.io)</strong> (since our source is on GitHub).</p>

<p><strong>ECR</strong> has one significant operational advantage for EKS: nodes authenticate via IAM role, so there is no image pull secret to manage. Costs are low — $0.10/GB stored, with no data transfer charge for pulls within the same AWS region. However, ECR is tightly coupled to AWS. A second deployment on Digital Ocean or a customer’s on-premises cluster would need separate registry credentials and mirroring, making it a poor fit for a multi-cloud or self-hosted product.</p>

<p><strong>ghcr.io</strong> is cloud-neutral — any cluster anywhere can pull images with a single token. It integrates naturally with GitHub Actions (the <code class="language-plaintext highlighter-rouge">GITHUB_TOKEN</code> secret already has <code class="language-plaintext highlighter-rouge">packages: write</code> permission), so publishing images is zero-configuration. The chart package also appears directly on the repository’s GitHub page alongside the source code and releases, which is the right home for an open-source project.</p>

<p>The catch: ghcr.io packages are <strong>private by default</strong> for organizations, and GitHub’s free tier includes only 500 MB storage and 1 GB transfer per month. For clusters that pull large images repeatedly, those limits are reached quickly. Making packages public eliminates the cost entirely, but requires an organization admin to enable public package creation in the org settings — it is disabled by default.</p>

<p>We chose ghcr.io and made our packages public. The images contain no secrets — only application code — so public visibility is appropriate and keeps infrastructure simple. Clusters pull anonymously with no credentials required.</p>

<p>For customers who need private images (for example, an enterprise build with proprietary integrations), the <code class="language-plaintext highlighter-rouge">REGISTRY_PROVIDER</code> variable in the overlay <code class="language-plaintext highlighter-rouge">.env</code> file can be switched to <code class="language-plaintext highlighter-rouge">aws</code> or <code class="language-plaintext highlighter-rouge">do</code> to use ECR or Digital Ocean Container Registry instead, with registry login handled automatically by the deploy scripts.</p>

<h2 id="merging-workers-into-fastapi">Merging Workers into FastAPI</h2>

<p>The original architecture ran the background workers (OCR, LLM, KB indexing, webhooks) as a separate process alongside uvicorn. In Kubernetes, this meant each backend pod ran two Python processes, consuming ~375 MB of memory.</p>

<p>We merged the workers into the FastAPI lifespan using <code class="language-plaintext highlighter-rouge">asyncio.create_task</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">asynccontextmanager</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">lifespan</span><span class="p">(</span><span class="n">app</span><span class="p">):</span>
    <span class="c1"># startup
</span>    <span class="n">worker_tasks</span> <span class="o">=</span> <span class="n">start_workers</span><span class="p">(</span><span class="n">n_workers</span><span class="p">)</span>
    <span class="k">yield</span>
    <span class="c1"># shutdown
</span>    <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">worker_tasks</span><span class="p">:</span>
        <span class="n">task</span><span class="p">.</span><span class="n">cancel</span><span class="p">()</span>
    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">gather</span><span class="p">(</span><span class="o">*</span><span class="n">worker_tasks</span><span class="p">,</span> <span class="n">return_exceptions</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>

<p>This halved per-pod memory usage (~190 MB) and eliminated the process management overhead. The workers share the same event loop as the API, which is safe because all worker I/O is already async.</p>

<h2 id="worker-polling-optimization">Worker Polling Optimization</h2>

<p>With multiple replicas, each pod runs a full set of worker coroutines polling MongoDB queues. At idle with 4 workers per pod, that was ~80 MongoDB queries per second cluster-wide.</p>

<p>We implemented exponential backoff with shared state across parallel workers:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_queue_idle_sleep</span><span class="p">:</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">float</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>  <span class="c1"># shared across all workers on a queue
</span>
<span class="c1"># on idle: back off
</span><span class="n">sleep</span> <span class="o">=</span> <span class="n">_queue_idle_sleep</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"ocr"</span><span class="p">,</span> <span class="n">POLL_MIN_SLEEP</span><span class="p">)</span>
<span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">sleep</span><span class="p">)</span>
<span class="n">_queue_idle_sleep</span><span class="p">[</span><span class="s">"ocr"</span><span class="p">]</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">sleep</span> <span class="o">*</span> <span class="mi">2</span><span class="p">,</span> <span class="n">POLL_MAX_SLEEP</span><span class="p">)</span>

<span class="c1"># on message found: reset for all workers on this queue
</span><span class="n">_queue_idle_sleep</span><span class="p">[</span><span class="s">"ocr"</span><span class="p">]</span> <span class="o">=</span> <span class="n">POLL_MIN_SLEEP</span>
</code></pre></div></div>

<p>This reduces idle polling to near-zero while keeping response latency low when work arrives.</p>

<h2 id="graceful-shutdown">Graceful Shutdown</h2>

<p>When Kubernetes scales down a pod (HPA scale-in or rolling update), it sends SIGTERM. We needed in-flight jobs to be marked as failed rather than silently abandoned.</p>

<p>Since workers are asyncio tasks, cancellation arrives as <code class="language-plaintext highlighter-rouge">asyncio.CancelledError</code> — a <code class="language-plaintext highlighter-rouge">BaseException</code>, not caught by <code class="language-plaintext highlighter-rouge">except Exception</code>. We added explicit handling in each worker:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span><span class="p">:</span>
    <span class="k">await</span> <span class="n">ad</span><span class="p">.</span><span class="n">msg_handlers</span><span class="p">.</span><span class="n">process_ocr_msg</span><span class="p">(</span><span class="n">analytiq_client</span><span class="p">,</span> <span class="n">msg</span><span class="p">)</span>
<span class="k">except</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">CancelledError</span><span class="p">:</span>
    <span class="n">logger</span><span class="p">.</span><span class="n">warning</span><span class="p">(</span><span class="sa">f</span><span class="s">"Worker cancelled mid-flight on msg </span><span class="si">{</span><span class="n">msg</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'_id'</span><span class="p">)</span><span class="si">}</span><span class="s">, marking failed"</span><span class="p">)</span>
    <span class="k">await</span> <span class="n">ad</span><span class="p">.</span><span class="n">queue</span><span class="p">.</span><span class="n">delete_msg</span><span class="p">(</span><span class="n">analytiq_client</span><span class="p">,</span> <span class="s">"ocr"</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">msg</span><span class="p">[</span><span class="s">"_id"</span><span class="p">]),</span> <span class="n">status</span><span class="o">=</span><span class="s">"failed"</span><span class="p">)</span>
    <span class="k">raise</span>  <span class="c1"># allow the task to actually cancel
</span></code></pre></div></div>

<p>The failed job can then be retried on another pod.</p>

<h2 id="database-migrations-as-a-helm-pre-upgrade-hook">Database Migrations as a Helm Pre-Upgrade Hook</h2>

<p>Running database migrations safely in a multi-replica environment requires that migrations complete before any new application code starts serving traffic. In Docker Compose this is handled by startup ordering, but in Kubernetes rolling updates, new pods can start before old ones are gone — with no guarantee about migration timing.</p>

<p>We solved this with a Helm hook Job that runs <code class="language-plaintext highlighter-rouge">migrate.py</code> using the same backend image, annotated to execute before the upgrade rolls out:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">annotations</span><span class="pi">:</span>
  <span class="s2">"</span><span class="s">helm.sh/hook"</span><span class="err">:</span> <span class="s">pre-upgrade,pre-rollback</span>
  <span class="s">"helm.sh/hook-weight"</span><span class="err">:</span> <span class="s2">"</span><span class="s">-5"</span>
  <span class="s2">"</span><span class="s">helm.sh/hook-delete-policy"</span><span class="err">:</span> <span class="s">hook-succeeded,before-hook-creation</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">pre-upgrade</code> hook ensures migrations run and complete successfully before Helm touches any Deployment. If the migration Job fails, Helm aborts the upgrade entirely — the old version keeps running. <code class="language-plaintext highlighter-rouge">hook-delete-policy: hook-succeeded</code> cleans up the completed Job automatically, keeping the namespace tidy. The <code class="language-plaintext highlighter-rouge">before-hook-creation</code> policy ensures the old Job is removed if a previous run left one behind.</p>

<p>One subtlety: at pre-upgrade time, the ConfigMap has not yet been updated by Helm (hooks run before regular resources). The migration Job therefore mounts only the Secret — which contains <code class="language-plaintext highlighter-rouge">MONGODB_URI</code> — and not the ConfigMap:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">envFrom</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">secretRef</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">doc-router-secrets</span>
<span class="c1"># ConfigMap intentionally omitted — not yet updated at hook time</span>
</code></pre></div></div>

<p>This means <code class="language-plaintext highlighter-rouge">migrate.py</code> must be written to need only the database connection string, with no dependency on application config values.</p>

<p>The result is a safe, atomic upgrade sequence: <strong>migrate → roll out new pods → terminate old pods</strong> — with automatic rollback if the migration fails.</p>

<h2 id="hpa-tuning">HPA Tuning</h2>

<p>We configured Horizontal Pod Autoscaler on the backend with both CPU and memory targets:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">metrics</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">type</span><span class="pi">:</span> <span class="s">Resource</span>
  <span class="na">resource</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">cpu</span>
    <span class="na">target</span><span class="pi">:</span>
      <span class="na">type</span><span class="pi">:</span> <span class="s">Utilization</span>
      <span class="na">averageUtilization</span><span class="pi">:</span> <span class="m">80</span>
<span class="pi">-</span> <span class="na">type</span><span class="pi">:</span> <span class="s">Resource</span>
  <span class="na">resource</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">memory</span>
    <span class="na">target</span><span class="pi">:</span>
      <span class="na">type</span><span class="pi">:</span> <span class="s">Utilization</span>
      <span class="na">averageUtilization</span><span class="pi">:</span> <span class="m">80</span>
</code></pre></div></div>

<p>A subtle issue: HPA scale-down uses <code class="language-plaintext highlighter-rouge">ceil(currentReplicas × currentUtil / targetUtil)</code>. With 5 pods at 72% memory utilization against an 80% target, <code class="language-plaintext highlighter-rouge">ceil(5 × 72/80) = ceil(4.5) = 5</code> — the ceiling arithmetic created a deadlock where the cluster could never scale below 5 pods.</p>

<p>The fix was increasing the memory request from 512 Mi to 768 Mi. After the worker merge reduced actual usage to ~190 MB, utilization dropped to ~25% — well below the threshold — and the cluster scaled back down to the minimum of 2 replicas.</p>

<h2 id="environment-configuration">Environment Configuration</h2>

<p>Next.js <code class="language-plaintext highlighter-rouge">NEXT_PUBLIC_*</code> variables are baked into the browser bundle at build time, not injected at runtime. This caused a subtle bug: our local <code class="language-plaintext highlighter-rouge">.env.local</code> file set <code class="language-plaintext highlighter-rouge">NEXT_PUBLIC_FASTAPI_FRONTEND_URL=http://127.0.0.1:8000</code>. Because <code class="language-plaintext highlighter-rouge">.env.local</code> wasn’t listed in <code class="language-plaintext highlighter-rouge">.dockerignore</code>, it was copied into the Docker build context and read by Next.js during <code class="language-plaintext highlighter-rouge">npm run build</code> — silently overriding the intended production value and baking the localhost URL into every image.</p>

<p>We fixed this in two steps:</p>

<ol>
  <li>
    <p><strong>Exclude all <code class="language-plaintext highlighter-rouge">.env.*</code> files from the Docker build context</strong> by adding <code class="language-plaintext highlighter-rouge">**/.env.*</code> to <code class="language-plaintext highlighter-rouge">.dockerignore</code>, so local development env files can never leak into images.</p>
  </li>
  <li>
    <p><strong>Remove <code class="language-plaintext highlighter-rouge">NEXT_PUBLIC_FASTAPI_FRONTEND_URL</code> entirely.</strong> Rather than baking an absolute URL into the bundle, the frontend now always calls <code class="language-plaintext highlighter-rouge">/fastapi</code> — a relative path that works from any hostname. Next.js rewrites proxy <code class="language-plaintext highlighter-rouge">/fastapi/:path*</code> to the backend service URL at the server layer:</p>
  </li>
</ol>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// next.config.mjs</span>
<span class="k">async</span> <span class="nx">rewrites</span><span class="p">()</span> <span class="p">{</span>
  <span class="k">return</span> <span class="p">[{</span>
    <span class="na">source</span><span class="p">:</span> <span class="dl">'</span><span class="s1">/fastapi/:path*</span><span class="dl">'</span><span class="p">,</span>
    <span class="na">destination</span><span class="p">:</span> <span class="s2">`</span><span class="p">${</span><span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">FASTAPI_BACKEND_URL</span><span class="p">}</span><span class="s2">/fastapi/:path*`</span><span class="p">,</span>
  <span class="p">}];</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">FASTAPI_BACKEND_URL</code> is a server-side runtime variable (not <code class="language-plaintext highlighter-rouge">NEXT_PUBLIC_</code>) pointing to the in-cluster backend service (<code class="language-plaintext highlighter-rouge">http://backend.&lt;namespace&gt;.svc.cluster.local:8000</code>). It is never exposed to the browser. The result is a truly environment-agnostic frontend image that requires no rebuild when moving between clusters.</p>

<h2 id="cicd-pipeline">CI/CD Pipeline</h2>

<h3 id="structure">Structure</h3>

<p>We use three GitHub Actions workflows:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">backend-tests.yml</code></strong> — runs Python tests against a local MongoDB Atlas instance (with vector search via <code class="language-plaintext highlighter-rouge">mongodb-atlas-local</code>) plus TypeScript tests. Triggered by <code class="language-plaintext highlighter-rouge">workflow_call</code> or <code class="language-plaintext highlighter-rouge">workflow_dispatch</code>.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">frontend-build.yml</code></strong> — runs <code class="language-plaintext highlighter-rouge">npm run build</code> for the Next.js frontend. Also triggered by <code class="language-plaintext highlighter-rouge">workflow_call</code> or <code class="language-plaintext highlighter-rouge">workflow_dispatch</code>.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">ci.yml</code></strong> — runs both test workflows on every pull request to <code class="language-plaintext highlighter-rouge">main</code>.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">release.yml</code></strong> — triggered on semver tags (<code class="language-plaintext highlighter-rouge">v[0-9]*.[0-9]*.[0-9]*</code>). Runs both test workflows first, then builds and pushes Docker images if they pass.</li>
</ul>

<h3 id="why-semver-tags-not-branch-pushes">Why semver tags, not branch pushes</h3>

<p>An early version of the pipeline ran tests on every push to <code class="language-plaintext highlighter-rouge">main</code> and triggered builds from there. This caused two problems:</p>

<ol>
  <li><strong>Tests ran twice per release</strong> — once on the branch push, once triggered by the tag.</li>
  <li><strong>The tag trigger didn’t wait for tests</strong> — if a tag was pushed immediately after a commit, the build could race ahead of a still-running test run.</li>
</ol>

<p>The current design avoids both: <code class="language-plaintext highlighter-rouge">release.yml</code> is only triggered by a semver tag, and the <code class="language-plaintext highlighter-rouge">build-push</code> job declares <code class="language-plaintext highlighter-rouge">needs: [test-backend, test-frontend]</code>, so Docker images are never built unless all tests pass on that exact commit. Tests run exactly once per release.</p>

<p>The <code class="language-plaintext highlighter-rouge">ci.yml</code> workflow handles the PR gate separately — developers get test feedback on their branch without triggering a build.</p>

<h3 id="reusable-test-workflows">Reusable test workflows</h3>

<p>Making the test workflows <code class="language-plaintext highlighter-rouge">workflow_call</code>-able (rather than duplicating the job definitions in both <code class="language-plaintext highlighter-rouge">ci.yml</code> and <code class="language-plaintext highlighter-rouge">release.yml</code>) keeps the test logic in one place. Both workflows call the same definitions; any change to the test steps is automatically reflected in both gates.</p>

<p><code class="language-plaintext highlighter-rouge">workflow_dispatch</code> is kept on each test workflow so that individual test suites can be re-run manually from the GitHub Actions UI without needing to push a commit or tag.</p>

<h3 id="image-tagging">Image tagging</h3>

<p>The build step computes image tags from the git tag:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">TAG="$"</span>          <span class="c1"># e.g. v27.0.1-rc2 or v27.0.1</span>
<span class="s">FRONTEND_TAGS="${FRONTEND}:${TAG}"</span>
<span class="c1"># :latest only for stable releases (no pre-release suffix)</span>
<span class="s">if [[ "$TAG" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then</span>
  <span class="s">FRONTEND_TAGS="${FRONTEND_TAGS},${FRONTEND}:latest"</span>
<span class="s">fi</span>
</code></pre></div></div>

<p>Release candidates (<code class="language-plaintext highlighter-rouge">v27.0.1-rc2</code>) get a versioned tag only. Stable releases (<code class="language-plaintext highlighter-rouge">v27.0.1</code>) also update <code class="language-plaintext highlighter-rouge">:latest</code>. This means a cluster running <code class="language-plaintext highlighter-rouge">:latest</code> auto-updates on the next <code class="language-plaintext highlighter-rouge">helm upgrade</code>, while a cluster pinned to a specific tag is unaffected.</p>

<h3 id="helm-chart-publishing-is-manual">Helm chart publishing is manual</h3>

<p>The Helm chart is published separately with <code class="language-plaintext highlighter-rouge">./deploy/scripts/publish-chart.sh &lt;overlay&gt;</code>. We kept this manual for two reasons: the chart version is independent of the app version (you might push 10 image releases without any chart changes), and publishing the chart is a deliberate operator action — it should not happen automatically on every tag.</p>

<h2 id="egress-ips-and-external-service-whitelisting">Egress IPs and External Service Whitelisting</h2>

<p>A practical difference between EKS and DOKS emerged when connecting to MongoDB Atlas, which requires IP whitelisting for all incoming connections.</p>

<p><strong>On EKS</strong>, the cluster’s private node group sits behind a single NAT gateway. All outbound traffic from every pod — regardless of which node it runs on — exits through one stable public IP. Adding that single IP to MongoDB Atlas’s allowlist is all that’s needed, and the IP never changes when nodes are replaced or the cluster scales.</p>

<p><strong>On DOKS</strong>, there is no NAT gateway by default. Each node is assigned its own public IP, and pods reach the internet directly through the node they’re scheduled on. This means:</p>

<ul>
  <li>There is no single egress IP — the source address MongoDB sees depends on which node the backend pod happens to be running on.</li>
  <li>With two nodes, you need two IPs in the allowlist. With autoscaling, new nodes get new IPs, and the allowlist breaks until you add them.</li>
</ul>

<p>For a fixed-size dev cluster, the workaround is to whitelist all current node IPs. For a production DOKS cluster with autoscaling, the correct solution is to provision a <strong>Digital Ocean Load Balancer as a NAT gateway</strong>, routing all cluster egress through a single stable IP. This adds ~$12/month but is the only reliable option when the external service requires a static source address.</p>

<p>For our dev cluster (<code class="language-plaintext highlighter-rouge">doc-router-dev</code>), we whitelist the two node IPs directly. For production DOKS deployments, a managed NAT gateway is required.</p>

<h2 id="overlay-based-deploy-scripts">Overlay-based Deploy Scripts</h2>

<p>Rather than a one-size-fits-all deploy script, we use an overlay pattern:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.env              # shared defaults (local dev values)
.env.eks-test     # overrides for the test EKS cluster
.env.eks-prod     # overrides for production
</code></pre></div></div>

<p>The deploy scripts (<code class="language-plaintext highlighter-rouge">k8s-deploy.sh</code>, <code class="language-plaintext highlighter-rouge">build-push.sh</code>) accept an overlay name and source both files, with the overlay taking precedence. A single variable — <code class="language-plaintext highlighter-rouge">APP_HOST</code> — drives all URL configuration, making it straightforward to add a new environment. <code class="language-plaintext highlighter-rouge">k8s-deploy.sh</code> is idempotent — it uses <code class="language-plaintext highlighter-rouge">helm upgrade --install</code> and handles both fresh installs and rolling updates without any distinction.</p>

<h2 id="whats-next">What’s Next</h2>

<ul>
  <li><strong>On-premises distribution</strong> — Helm chart and images are public on ghcr.io; self-hosted MongoDB is available via the <a href="https://github.com/analytiq-hub/analytiq-charts"><code class="language-plaintext highlighter-rouge">mongodb-atlas-local</code></a> chart (see <a href="/tech/kubernetes/devops/mongodb/self-hosted-mongodb-kubernetes-atlas-search/">Self-Hosted MongoDB on Kubernetes with Atlas Search</a>); documentation for a one-command on-prem install is the next step</li>
  <li><strong>Offline license keys</strong> — JWT-based licenses signed with a private key, verified against a public key baked into the image, for air-gapped installations</li>
  <li><strong>Multi-cloud support</strong> — Digital Ocean Kubernetes is now supported alongside EKS; Azure Kubernetes Service support is planned</li>
</ul>

<hr />

<p><em>Andrei Radulescu-Banu is the founder of <a href="https://docrouter.ai">DocRouter.AI</a> (document processing with LLMs) and <a href="https://sigagent.ai">SigAgent.AI</a> (Claude Agent monitoring). His company <a href="https://analytiqhub.com">AnalytiqHub.com</a> provides consulting services for cloud and AI engineering.</em></p>]]></content><author><name>Andrei Radulescu-Banu</name></author><category term="tech" /><category term="kubernetes" /><category term="devops" /><category term="docrouter" /><summary type="html"><![CDATA[Production-grade Kubernetes support for Doc Router: key decisions, Helm chart, worker merging, graceful shutdown, and multi-cloud deployment.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/deploying-doc-router-kubernetes-splash.png" /><media:content medium="image" url="https://analytiqhub.com/assets/images/deploying-doc-router-kubernetes-splash.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Kubernetes Packaging and Deployment: Kustomize, Helm, and GitOps</title><link href="https://analytiqhub.com/tech/kubernetes/devops/kubernetes-packaging-helm-gitops/" rel="alternate" type="text/html" title="Kubernetes Packaging and Deployment: Kustomize, Helm, and GitOps" /><published>2026-03-06T00:00:00+00:00</published><updated>2026-03-06T00:00:00+00:00</updated><id>https://analytiqhub.com/tech/kubernetes/devops/kubernetes-packaging-helm-gitops</id><content type="html" xml:base="https://analytiqhub.com/tech/kubernetes/devops/kubernetes-packaging-helm-gitops/"><![CDATA[<p>This is the second part of the Kubernetes primer series. The <a href="/tech/kubernetes/devops/kubernetes-for-docker-users-primer/">first part</a> covered the core building blocks — Pods, Deployments, Services, Secrets, PVCs, and Helm basics. This part goes deeper into the two dominant approaches to packaging Kubernetes manifests, and then introduces GitOps as an alternative to running deploy scripts manually.</p>

<hr />

<h2 id="the-manifest-problem">The manifest problem</h2>

<p>A real Kubernetes application needs dozens of YAML files: Deployments, Services, ConfigMaps, Secrets, Ingress rules, HorizontalPodAutoscalers, PodDisruptionBudgets. Writing them by hand is feasible once, but the moment you need the same app running in three environments — local, staging, production — you face a choice:</p>

<ul>
  <li><strong>Copy the files for each environment</strong> and keep them in sync manually (fragile)</li>
  <li><strong>Use a tool that handles the variation</strong> for you</li>
</ul>

<p>Two tools dominate: <strong>Kustomize</strong> and <strong>Helm</strong>. They solve the same problem differently, and many projects use both — Helm for third-party software, Kustomize for their own app.</p>

<hr />

<h2 id="kustomize--layered-yaml-patches">Kustomize — layered YAML patches</h2>

<p>Kustomize ships with <code class="language-plaintext highlighter-rouge">kubectl</code> (no install needed) and works with plain YAML. The idea is a <strong>base + overlays</strong> structure:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>manifests/
  base/
    deployment.yaml      # canonical deployment
    service.yaml
    kustomization.yaml   # lists the resources
  overlays/
    dev/
      kustomization.yaml # patches for dev
      patch-replicas.yaml
    prod/
      kustomization.yaml # patches for prod
      patch-replicas.yaml
      patch-resources.yaml
</code></pre></div></div>

<p>The base defines the resource once. Each overlay patches only what differs. A typical patch looks like:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># overlays/prod/patch-replicas.yaml</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Deployment</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">backend</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">4</span>       <span class="c1"># override base value of 2</span>
</code></pre></div></div>

<p>To deploy the prod overlay:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl apply <span class="nt">-k</span> overlays/prod/
</code></pre></div></div>

<p>Kustomize merges the base YAML with all patches before sending anything to the API server. You always see plain, readable YAML — there is no templating language to learn, and the output is predictable.</p>

<h3 id="variable-substitution">Variable substitution</h3>

<p>For values that vary by environment (hostnames, image tags, resource sizes), Kustomize offers <code class="language-plaintext highlighter-rouge">substituteFrom</code>: it reads variables from a ConfigMap or Secret and injects them into the manifests at apply time:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># kustomization.yaml</span>
<span class="na">configurations</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s">var-references.yaml</span>
<span class="na">vars</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">APP_DOMAIN</span>
    <span class="na">objref</span><span class="pi">:</span>
      <span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMap</span>
      <span class="na">name</span><span class="pi">:</span> <span class="s">project-values</span>
      <span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
    <span class="na">fieldref</span><span class="pi">:</span>
      <span class="na">fieldpath</span><span class="pi">:</span> <span class="s">data.domain</span>
</code></pre></div></div>

<p>This is less flexible than Helm’s full templating but keeps the YAML closer to what Kubernetes actually receives.</p>

<h3 id="what-kustomize-does-not-do">What Kustomize does not do</h3>

<p>Kustomize has no concept of a release, no revision history, and no built-in rollback. If you apply a broken overlay, you must fix it and reapply, or manually apply a previous version. For the same reason, there is no <code class="language-plaintext highlighter-rouge">--atomic</code> safety net — if a deployment fails mid-rollout, you notice from <code class="language-plaintext highlighter-rouge">kubectl</code> output, not from the packaging tool.</p>

<hr />

<h2 id="helm--templated-packages">Helm — templated packages</h2>

<p>Helm wraps Kubernetes YAML in a full templating engine (Go templates) and adds lifecycle management on top. A chart is a directory:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>doc-router/
  Chart.yaml          # name, version, appVersion
  values.yaml         # default values
  templates/
    deployment.yaml   # Go template
    service.yaml
    ingress.yaml
    _helpers.tpl      # reusable template fragments
</code></pre></div></div>

<p>A template looks like:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># templates/deployment.yaml</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> 
  <span class="na">template</span><span class="pi">:</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">containers</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">backend</span>
          <span class="na">image</span><span class="pi">:</span> <span class="s2">"</span><span class="s">:"</span>
          <span class="na">resources</span><span class="pi">:</span>
            <span class="na">requests</span><span class="pi">:</span>
              <span class="na">cpu</span><span class="pi">:</span> 
</code></pre></div></div>

<p>To install with custom values:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm upgrade <span class="nt">--install</span> doc-router ./doc-router <span class="se">\</span>
  <span class="nt">--set</span> <span class="nv">replicaCount</span><span class="o">=</span>4 <span class="se">\</span>
  <span class="nt">--set</span> image.tag<span class="o">=</span>v1.2.3
</code></pre></div></div>

<p>Or via an override file:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm upgrade <span class="nt">--install</span> doc-router ./doc-router <span class="nt">-f</span> values-prod.yaml
</code></pre></div></div>

<h3 id="release-history-and-rollback">Release history and rollback</h3>

<p>Helm records every install and upgrade as a numbered revision in the cluster. You can inspect history and roll back:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm <span class="nb">history </span>doc-router <span class="nt">-n</span> doc-router
helm rollback doc-router 2 <span class="nt">-n</span> doc-router   <span class="c"># back to revision 2</span>
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">--atomic</code>, a failed upgrade automatically triggers a rollback — the old version keeps running uninterrupted.</p>

<h3 id="publishing-charts-as-oci-artifacts">Publishing charts as OCI artifacts</h3>

<p>A packaged chart can be pushed to any OCI-compatible registry (ghcr.io, ECR, Docker Hub) and pulled from anywhere:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm push doc-router-0.3.7.tgz oci://ghcr.io/analytiq-hub
helm upgrade <span class="nt">--install</span> doc-router oci://ghcr.io/analytiq-hub/doc-router <span class="nt">--version</span> 0.3.7
</code></pre></div></div>

<p>This means a customer cluster can install your app with a single command, pulling both the chart and images from the same registry, with no Git access required.</p>

<hr />

<h2 id="kustomize-vs-helm--when-to-use-each">Kustomize vs Helm — when to use each</h2>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Kustomize</th>
      <th>Helm</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Learning curve</td>
      <td>Low — just YAML</td>
      <td>Higher — Go templates + chart structure</td>
    </tr>
    <tr>
      <td>Flexibility</td>
      <td>Patches and substitutions</td>
      <td>Full templating, conditionals, loops</td>
    </tr>
    <tr>
      <td>Release history</td>
      <td>None</td>
      <td>Built-in, per-revision</td>
    </tr>
    <tr>
      <td>Rollback</td>
      <td>Manual</td>
      <td><code class="language-plaintext highlighter-rouge">helm rollback</code></td>
    </tr>
    <tr>
      <td>Failure safety</td>
      <td>None</td>
      <td><code class="language-plaintext highlighter-rouge">--atomic</code> auto-rollback</td>
    </tr>
    <tr>
      <td>Publishing</td>
      <td>OCI artifact via Flux</td>
      <td><code class="language-plaintext highlighter-rouge">helm push</code> to any OCI registry</td>
    </tr>
    <tr>
      <td>Best for</td>
      <td>Your own first-party manifests</td>
      <td>Distributable packages, third-party software</td>
    </tr>
  </tbody>
</table>

<p>In practice many projects use both: Helm for installing third-party dependencies (ingress-nginx, cert-manager, MongoDB operator), and Kustomize for their own application manifests. The two are compatible — a Kustomize overlay can reference a Helm chart as a generator.</p>

<hr />

<h2 id="gitops--the-cluster-manages-itself">GitOps — the cluster manages itself</h2>

<p>Both Kustomize and Helm, as described so far, are <strong>imperative</strong>: a human (or a CI job) runs a command that pushes changes into the cluster. GitOps flips this model.</p>

<p>In GitOps, the desired cluster state is declared in a Git repository (or an OCI artifact registry). A controller running <em>inside</em> the cluster continuously watches that source and reconciles actual state to match it. No one runs <code class="language-plaintext highlighter-rouge">helm upgrade</code> — the cluster pulls its own updates.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Developer pushes to Git / CI pushes OCI artifact
         ↓
  Source of truth updated
         ↓
  In-cluster controller detects drift
         ↓
  Controller applies the diff
         ↓
  Cluster matches desired state
</code></pre></div></div>

<p>The key property: <strong>the cluster self-heals</strong>. If someone manually deletes a Deployment or edits a ConfigMap, the controller notices the drift and reverts it within seconds. The Git repo (or OCI artifact) is always the authoritative source.</p>

<hr />

<h2 id="flux--a-gitops-controller">Flux — a GitOps controller</h2>

<p><strong>Flux</strong> is one of the two dominant GitOps controllers (the other is Argo CD). It runs as a set of controllers in the cluster and watches sources:</p>

<h3 id="sources">Sources</h3>

<p>Flux can watch:</p>
<ul>
  <li><strong>Git repositories</strong> — on every push, Flux reconciles the cluster</li>
  <li><strong>OCI artifact registries</strong> — on every <code class="language-plaintext highlighter-rouge">flux push artifact</code>, Flux pulls and applies</li>
  <li><strong>Helm repositories</strong> — for managing Helm releases declaratively</li>
</ul>

<h3 id="core-resources">Core resources</h3>

<p><strong>GitRepository / OCIRepository</strong> — defines where Flux watches:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">source.toolkit.fluxcd.io/v1beta2</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">OCIRepository</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">my-app</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">flux-system</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">interval</span><span class="pi">:</span> <span class="s">1m</span>
  <span class="na">url</span><span class="pi">:</span> <span class="s">oci://123456789.dkr.ecr.us-east-1.amazonaws.com/my-app-manifests</span>
  <span class="na">ref</span><span class="pi">:</span>
    <span class="na">tag</span><span class="pi">:</span> <span class="s">latest</span>
</code></pre></div></div>

<p><strong>Kustomization</strong> — tells Flux what to apply from the source:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">kustomize.toolkit.fluxcd.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Kustomization</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">my-app</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">flux-system</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">interval</span><span class="pi">:</span> <span class="s">5m</span>
  <span class="na">sourceRef</span><span class="pi">:</span>
    <span class="na">kind</span><span class="pi">:</span> <span class="s">OCIRepository</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">my-app</span>
  <span class="na">path</span><span class="pi">:</span> <span class="s">./manifests/kubernetes/overlays/prod</span>
  <span class="na">prune</span><span class="pi">:</span> <span class="no">true</span>      <span class="c1"># delete resources removed from source</span>
  <span class="na">healthChecks</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
      <span class="na">kind</span><span class="pi">:</span> <span class="s">Deployment</span>
      <span class="na">name</span><span class="pi">:</span> <span class="s">backend</span>
      <span class="na">namespace</span><span class="pi">:</span> <span class="s">my-app</span>
</code></pre></div></div>

<p><strong>HelmRelease</strong> — manages a Helm release declaratively:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">helm.toolkit.fluxcd.io/v2beta1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">HelmRelease</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">ingress-nginx</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">flux-system</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">interval</span><span class="pi">:</span> <span class="s">1h</span>
  <span class="na">chart</span><span class="pi">:</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">chart</span><span class="pi">:</span> <span class="s">ingress-nginx</span>
      <span class="na">version</span><span class="pi">:</span> <span class="s2">"</span><span class="s">4.11.3"</span>
      <span class="na">sourceRef</span><span class="pi">:</span>
        <span class="na">kind</span><span class="pi">:</span> <span class="s">HelmRepository</span>
        <span class="na">name</span><span class="pi">:</span> <span class="s">ingress-nginx</span>
  <span class="na">values</span><span class="pi">:</span>
    <span class="na">controller</span><span class="pi">:</span>
      <span class="na">replicaCount</span><span class="pi">:</span> <span class="m">2</span>
</code></pre></div></div>

<h3 id="cicd-with-flux">CI/CD with Flux</h3>

<p>A typical Flux-based pipeline looks like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. Developer opens a PR
2. CI runs tests
3. PR merged to main
4. CI builds Docker image → pushes to ECR
5. CI packages Kustomize manifests as OCI artifact → flux push artifact → ECR
6. Flux detects new artifact version
7. Flux applies manifests to cluster
8. Cluster rolls out new Deployment
</code></pre></div></div>

<p>Steps 6–8 happen automatically, inside the cluster, with no deploy script and no human intervention.</p>

<h3 id="flux-vs-running-deploy-scripts">Flux vs running deploy scripts</h3>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Shell script (<code class="language-plaintext highlighter-rouge">helm upgrade</code>)</th>
      <th>Flux GitOps</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Who initiates deploy</td>
      <td>Human or CI job</td>
      <td>Cluster controller</td>
    </tr>
    <tr>
      <td>Drift detection</td>
      <td>None — manual kubectl needed</td>
      <td>Continuous — auto-reverts</td>
    </tr>
    <tr>
      <td>Audit trail</td>
      <td>CI logs</td>
      <td>Git history + Flux events</td>
    </tr>
    <tr>
      <td>Rollback</td>
      <td><code class="language-plaintext highlighter-rouge">helm rollback</code></td>
      <td>Revert commit, Flux reconciles</td>
    </tr>
    <tr>
      <td>Complexity</td>
      <td>Low — just a shell script</td>
      <td>Higher — Flux controllers + CRDs</td>
    </tr>
    <tr>
      <td>Air-gapped / on-prem</td>
      <td>Simple</td>
      <td>Requires Flux + registry access</td>
    </tr>
  </tbody>
</table>

<p>GitOps is the right choice for teams with multiple people deploying to shared clusters, or for production environments where drift must be detected and prevented. For a small team or a self-hosted product where simplicity matters, shell scripts with <code class="language-plaintext highlighter-rouge">helm upgrade --install</code> are easier to understand, debug, and hand off to a customer.</p>

<hr />

<h2 id="summary">Summary</h2>

<table>
  <thead>
    <tr>
      <th>Tool</th>
      <th>Role</th>
      <th>Key strength</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Kustomize</strong></td>
      <td>Overlay-based YAML patching</td>
      <td>Plain YAML, no templates, built into kubectl</td>
    </tr>
    <tr>
      <td><strong>Helm</strong></td>
      <td>Templated package manager</td>
      <td>Release history, rollback, publishable charts</td>
    </tr>
    <tr>
      <td><strong>Flux</strong></td>
      <td>GitOps controller</td>
      <td>Self-healing cluster, drift detection, no manual deploys</td>
    </tr>
    <tr>
      <td><strong>Argo CD</strong></td>
      <td>GitOps controller (alternative to Flux)</td>
      <td>Web UI, application health visualisation</td>
    </tr>
  </tbody>
</table>

<p>A mature production setup typically uses all three: Kustomize or Helm for defining manifests, Flux or Argo CD for reconciling them, and a CI pipeline that produces the artifacts both consume.</p>

<p><strong>Next:</strong> <a href="/tech/kubernetes/devops/docrouter/deploying-doc-router-on-kubernetes/">Deploying Doc Router on Kubernetes</a> walks through a real application deployment (Helm chart, workers, CI/CD, EKS and Digital Ocean). If you need in-cluster MongoDB with vector search, see <a href="/tech/kubernetes/devops/mongodb/self-hosted-mongodb-kubernetes-atlas-search/">Self-Hosted MongoDB on Kubernetes with Atlas Search</a>.</p>

<hr />

<p><em>Andrei Radulescu-Banu is the founder of <a href="https://docrouter.ai">DocRouter.AI</a> (document processing with LLMs) and <a href="https://sigagent.ai">SigAgent.AI</a> (Claude Agent monitoring). His company <a href="https://analytiqhub.com">AnalytiqHub.com</a> provides consulting services for cloud and AI engineering.</em></p>]]></content><author><name>Andrei Radulescu-Banu</name></author><category term="tech" /><category term="kubernetes" /><category term="devops" /><summary type="html"><![CDATA[The second part of the Kubernetes primer series: Kustomize, Helm, and GitOps with Flux — packaging manifests and letting the cluster manage itself.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/kubernetes-packaging-helm-gitops-splash.png" /><media:content medium="image" url="https://analytiqhub.com/assets/images/kubernetes-packaging-helm-gitops-splash.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Kubernetes for Docker Users: A Practical Primer</title><link href="https://analytiqhub.com/tech/kubernetes/devops/kubernetes-for-docker-users-primer/" rel="alternate" type="text/html" title="Kubernetes for Docker Users: A Practical Primer" /><published>2026-03-05T00:00:00+00:00</published><updated>2026-03-05T00:00:00+00:00</updated><id>https://analytiqhub.com/tech/kubernetes/devops/kubernetes-for-docker-users-primer</id><content type="html" xml:base="https://analytiqhub.com/tech/kubernetes/devops/kubernetes-for-docker-users-primer/"><![CDATA[<p>If you’ve used Docker Compose, you already understand the core idea: define your services, wire them together with a network, and let the runtime manage the processes. Kubernetes takes that same idea and extends it to run across a cluster of machines, with built-in handling for failures, scaling, and upgrades.</p>

<p>Here’s how the key concepts map across.</p>

<h2 id="from-containers-to-pods">From containers to Pods</h2>

<p>In Docker Compose, the unit of work is a container. In Kubernetes, it is a <strong>Pod</strong> — a group of one or more containers that always run together on the same machine and share a network namespace. Most Pods contain a single container, but some use sidecars: a main process plus a helper (a log shipper, a proxy, or in our case the <code class="language-plaintext highlighter-rouge">mongot</code> search process alongside <code class="language-plaintext highlighter-rouge">mongod</code>).</p>

<p>Pods are ephemeral. When a Pod dies, Kubernetes replaces it with a new one — possibly on a different machine, with a new IP address. You never SSH into a Pod or rely on its IP being stable.</p>

<h2 id="deployments--the-equivalent-of-a-compose-service">Deployments — the equivalent of a Compose service</h2>

<p>A <strong>Deployment</strong> tells Kubernetes: “keep N replicas of this Pod running at all times.” If a Pod crashes, the Deployment controller starts a replacement. If you push a new image, it performs a rolling update — starting new Pods before terminating old ones so traffic is never interrupted.</p>

<p>In Docker Compose terms, a Deployment is your <code class="language-plaintext highlighter-rouge">service:</code> block plus restart policies and rolling update logic built in.</p>

<h2 id="services--stable-internal-addresses">Services — stable internal addresses</h2>

<p>Because Pod IPs change on every restart, Kubernetes introduces <strong>Services</strong>: stable DNS names and virtual IPs that front a group of Pods. A Service named <code class="language-plaintext highlighter-rouge">backend</code> in the <code class="language-plaintext highlighter-rouge">doc-router</code> namespace is reachable at <code class="language-plaintext highlighter-rouge">backend.doc-router.svc.cluster.local</code> from anywhere in the cluster, regardless of how many backend Pods exist or where they are running.</p>

<p>This replaces the automatic DNS that Docker Compose sets up between containers on the same network.</p>

<h2 id="namespaces--isolation-within-a-cluster">Namespaces — isolation within a cluster</h2>

<p>A <strong>Namespace</strong> is a logical partition of the cluster. Resources in different namespaces don’t collide even if they share a name. A typical setup uses separate namespaces for each concern: <code class="language-plaintext highlighter-rouge">doc-router</code> for the application, <code class="language-plaintext highlighter-rouge">mongodb</code> for the database, <code class="language-plaintext highlighter-rouge">ingress-nginx</code> for the load balancer, <code class="language-plaintext highlighter-rouge">cert-manager</code> for TLS certificates.</p>

<p>In Docker Compose terms, a namespace is roughly equivalent to a separate Compose project — distinct networks and name scopes.</p>

<h2 id="configmaps-and-secrets--environment-variables-at-scale">ConfigMaps and Secrets — environment variables at scale</h2>

<p>Docker Compose lets you set <code class="language-plaintext highlighter-rouge">environment:</code> variables inline or via an <code class="language-plaintext highlighter-rouge">.env</code> file. Kubernetes separates non-sensitive config from sensitive config:</p>

<ul>
  <li><strong>ConfigMap</strong> — key-value pairs mounted as environment variables or files. Used for things like <code class="language-plaintext highlighter-rouge">FASTAPI_ROOT_PATH</code>, worker count, S3 bucket name.</li>
  <li><strong>Secret</strong> — base64-encoded values stored (optionally encrypted at rest) separately from your app manifests. Used for database URIs, API keys, and auth secrets. Pods reference Secrets by name; the values are injected at runtime, never baked into the image.</li>
</ul>

<h2 id="persistentvolumeclaims--durable-storage">PersistentVolumeClaims — durable storage</h2>

<p>Docker Compose uses named volumes (backed by the local filesystem) to persist data across container restarts. Kubernetes uses <strong>PersistentVolumeClaims (PVCs)</strong>: a request for a piece of storage of a given size and access mode. The cluster fulfils the claim by provisioning a real volume — an EBS disk on AWS, a DO Block Storage volume on Digital Ocean — and mounting it into the Pod.</p>

<p>PVCs survive Pod restarts and rescheduling. If a database Pod moves to a different node, the volume is detached and reattached automatically. Storage is provisioned dynamically by a <strong>StorageClass</strong>, which specifies the provisioner (e.g. <code class="language-plaintext highlighter-rouge">ebs.csi.aws.com</code> on EKS) and volume type.</p>

<h2 id="ingress-and-the-load-balancer">Ingress and the load balancer</h2>

<p>In Docker Compose you typically expose one port from one container. In Kubernetes, multiple Services need to be reachable from the outside under different paths or hostnames, all through a single external IP.</p>

<p><strong>ingress-nginx</strong> is a Kubernetes controller that runs an nginx reverse proxy inside the cluster. When deployed on EKS, it automatically provisions an AWS Network Load Balancer with a stable public IP. You define <strong>Ingress</strong> rules — “route <code class="language-plaintext highlighter-rouge">/fastapi</code> to the backend Service, everything else to the frontend Service” — and ingress-nginx handles the routing. On a new cluster, the load balancer is the only resource with a public IP; everything else is internal.</p>

<h2 id="cert-manager--automatic-tls">cert-manager — automatic TLS</h2>

<p>cert-manager is a Kubernetes controller that watches Ingress resources and automatically requests TLS certificates from Let’s Encrypt. When you annotate an Ingress with <code class="language-plaintext highlighter-rouge">cert-manager.io/cluster-issuer: letsencrypt-prod</code>, cert-manager handles the ACME challenge, obtains the certificate, stores it in a Secret, and renews it before it expires. You never touch a certificate manually.</p>

<h2 id="helm--packaging-it-all-together">Helm — packaging it all together</h2>

<p>Kubernetes resources are defined as YAML files. A real application needs dozens of them: Deployments, Services, ConfigMaps, Secrets, Ingress rules, PodDisruptionBudgets. <strong>Helm</strong> is the package manager for Kubernetes — it bundles all those YAML files into a <strong>chart</strong>, parameterises them with a <code class="language-plaintext highlighter-rouge">values.yaml</code> file, and installs or upgrades the whole bundle with a single command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm upgrade <span class="nt">--install</span> doc-router oci://ghcr.io/analytiq-hub/doc-router <span class="se">\</span>
  <span class="nt">--namespace</span> doc-router <span class="nt">--set</span> <span class="nv">appHost</span><span class="o">=</span>example.com ...
</code></pre></div></div>

<p>A chart can be published as an OCI artifact to any container registry alongside the Docker images.</p>

<p>If Docker Compose is a <code class="language-plaintext highlighter-rouge">docker run</code> wrapper, Helm is closer to an apt package: versioned, reproducible, and upgradeable.</p>

<h3 id="how-helm-applies-changes">How Helm applies changes</h3>

<p>Every time you run <code class="language-plaintext highlighter-rouge">helm upgrade</code>, Helm compares the new rendered YAML against what it last applied and sends only the diff to the Kubernetes API — resources that haven’t changed are left untouched. Helm records each upgrade as a numbered <strong>revision</strong>, stored as a Secret in the cluster:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>helm <span class="nb">history </span>doc-router <span class="nt">-n</span> doc-router
REVISION  STATUS     CHART           APP VERSION  DESCRIPTION
1         superseded doc-router-0.3.5  v27.0.0    Install <span class="nb">complete
</span>2         superseded doc-router-0.3.6  v27.0.1    Upgrade <span class="nb">complete
</span>3         deployed   doc-router-0.3.7  v27.0.2    Upgrade <span class="nb">complete</span>
</code></pre></div></div>

<h3 id="rolling-back-to-a-known-good-state">Rolling back to a known-good state</h3>

<p>If an upgrade goes wrong, rolling back to the previous revision is a single command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm rollback doc-router <span class="nt">-n</span> doc-router        <span class="c"># rolls back to revision 2</span>
helm rollback doc-router 1 <span class="nt">-n</span> doc-router      <span class="c"># rolls back to a specific revision</span>
</code></pre></div></div>

<p>Helm re-applies the exact YAML from that revision — the same image tags, the same config values — so the cluster returns to the state that last worked. Using <code class="language-plaintext highlighter-rouge">--atomic</code> during an upgrade makes this automatic: if the new Pods don’t become healthy within the timeout, Helm rolls back on its own without any manual intervention.</p>

<h3 id="zero-downtime-rolling-updates">Zero-downtime rolling updates</h3>

<p>When Helm upgrades a Deployment with a new image, Kubernetes does not restart all Pods at once. It uses a <strong>rolling update</strong> strategy controlled by two parameters:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">strategy</span><span class="pi">:</span>
  <span class="na">type</span><span class="pi">:</span> <span class="s">RollingUpdate</span>
  <span class="na">rollingUpdate</span><span class="pi">:</span>
    <span class="na">maxUnavailable</span><span class="pi">:</span> <span class="m">0</span>   <span class="c1"># never take a pod down before a new one is ready</span>
    <span class="na">maxSurge</span><span class="pi">:</span> <span class="m">1</span>         <span class="c1"># allow one extra pod above the desired count during the rollout</span>
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">maxUnavailable: 0</code>, Kubernetes starts a new Pod with the new image first. Only after that Pod passes its readiness probe — meaning it is actually serving traffic — does Kubernetes terminate one of the old Pods. This continues one Pod at a time until all replicas are on the new version. At no point does the number of healthy Pods drop below the desired count.</p>

<p>The result: an upgrade from <code class="language-plaintext highlighter-rouge">v27.0.1</code> to <code class="language-plaintext highlighter-rouge">v27.0.2</code> with two replicas proceeds as:</p>

<ol>
  <li>Start new Pod (v27.0.2) — 2 old + 1 new running</li>
  <li>New Pod passes readiness check</li>
  <li>Terminate one old Pod — 1 old + 1 new running</li>
  <li>Start second new Pod — 1 old + 2 new running</li>
  <li>Second new Pod passes readiness — terminate last old Pod</li>
  <li>Rollout complete — 2 new Pods running, zero downtime</li>
</ol>

<p>If the new Pod fails its readiness check at step 2, the rollout pauses. No old Pods have been terminated, so the old version continues serving 100% of traffic. With <code class="language-plaintext highlighter-rouge">--atomic</code>, Helm then rolls the release back automatically.</p>

<h2 id="running-kubernetes-locally-with-kind">Running Kubernetes locally with Kind</h2>

<p>Before deploying to a real cluster, it’s useful to test locally using <strong>Kind</strong> (Kubernetes in Docker). Kind runs an entire Kubernetes cluster — control plane and worker nodes — as Docker containers on your laptop. There is no cloud provider, no load balancer, and no cloud volumes; Kind uses your local filesystem for storage and <code class="language-plaintext highlighter-rouge">NodePort</code> services for external access.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./deploy/scripts/setup-kind.sh   <span class="c"># creates the Kind cluster</span>
./deploy/scripts/deploy-kind.sh  <span class="c"># installs the Helm chart locally</span>
</code></pre></div></div>

<p>The same chart that runs on EKS runs on Kind, with a different <code class="language-plaintext highlighter-rouge">values-kind.yaml</code> override file. This lets you iterate on chart changes without incurring cloud costs or waiting for node provisioning.</p>

<h2 id="summary">Summary</h2>

<table>
  <thead>
    <tr>
      <th>Docker Compose concept</th>
      <th>Kubernetes equivalent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Container</td>
      <td>Pod (usually 1 container, sometimes with sidecars)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">service:</code> block</td>
      <td>Deployment + Service</td>
    </tr>
    <tr>
      <td>Container DNS (service name)</td>
      <td>Service DNS (<code class="language-plaintext highlighter-rouge">name.namespace.svc.cluster.local</code>)</td>
    </tr>
    <tr>
      <td>Compose project</td>
      <td>Namespace</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">environment:</code> / <code class="language-plaintext highlighter-rouge">.env</code></td>
      <td>ConfigMap (non-secret) + Secret (sensitive)</td>
    </tr>
    <tr>
      <td>Named volume</td>
      <td>PersistentVolumeClaim + StorageClass</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ports:</code> expose</td>
      <td>Ingress + LoadBalancer Service</td>
    </tr>
    <tr>
      <td>Manual TLS</td>
      <td>cert-manager (automatic Let’s Encrypt)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">docker-compose.yml</code></td>
      <td>Helm chart (<code class="language-plaintext highlighter-rouge">values.yaml</code> + templates)</td>
    </tr>
    <tr>
      <td>Local Docker</td>
      <td>Kind (Kubernetes in Docker)</td>
    </tr>
  </tbody>
</table>

<p><strong>Next:</strong> <a href="/tech/kubernetes/devops/kubernetes-packaging-helm-gitops/">Kubernetes Packaging and Deployment: Kustomize, Helm, and GitOps</a> goes deeper into packaging manifests and GitOps with Flux.</p>

<hr />

<p><em>Andrei Radulescu-Banu is the founder of <a href="https://docrouter.ai">DocRouter.AI</a> (document processing with LLMs) and <a href="https://sigagent.ai">SigAgent.AI</a> (Claude Agent monitoring). His company <a href="https://analytiqhub.com">AnalytiqHub.com</a> provides consulting services for cloud and AI engineering.</em></p>]]></content><author><name>Andrei Radulescu-Banu</name></author><category term="tech" /><category term="kubernetes" /><category term="devops" /><summary type="html"><![CDATA[If you've used Docker Compose, you already understand the core idea. Kubernetes takes that same idea and extends it to run across a cluster of machines, with built-in handling for failures, scaling, and upgrades.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/kubernetes-docker-users-primer-splash.png" /><media:content medium="image" url="https://analytiqhub.com/assets/images/kubernetes-docker-users-primer-splash.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Why and How We Created the Document Agent</title><link href="https://analytiqhub.com/ai/programming/engineering/product/how-we-built-the-document-agent/" rel="alternate" type="text/html" title="Why and How We Created the Document Agent" /><published>2026-02-22T00:00:00+00:00</published><updated>2026-02-22T00:00:00+00:00</updated><id>https://analytiqhub.com/ai/programming/engineering/product/how-we-built-the-document-agent</id><content type="html" xml:base="https://analytiqhub.com/ai/programming/engineering/product/how-we-built-the-document-agent/"><![CDATA[<p>The <strong>Document Agent</strong> is the chat on the document page in DocRouter: you talk to an AI in the context of a single document to create or edit <a href="/docs/schemas/">schemas</a>, <a href="/docs/prompts/">prompts</a>, and <a href="/docs/tags/">tags</a>, run extraction, and tweak results. This post explains why we built it and how we created it—the architecture and the decisions that shaped it.</p>

<p><img src="/assets/images/document_agent.png" alt="Document Agent" /></p>

<hr />

<h2 id="why-we-built-it">Why we built it</h2>

<p>We built the Document Agent to <strong>cut configuration time for parsing a yet-unseen document by about 90%</strong>.</p>

<div class="rounded-xl border-2 border-blue-200 bg-gradient-to-br from-blue-50 to-indigo-50/90 p-5 md:p-6 my-6 shadow-md ring-1 ring-blue-100/50">
  <div class="grid grid-cols-1 md:grid-cols-2 gap-4 md:gap-6">
    <p class="text-gray-800"><strong class="text-blue-900">Before:</strong> Schemas, prompts, and tags by configured by hand.</p>
    <p class="text-gray-800"><strong class="text-blue-900">Now:</strong> <strong>Minutes:</strong> Plain language: AI proposes → you approve → extraction runs.</p>
    </div>
</div>

<hr />

<h2 id="what-the-agent-does">What the agent does</h2>

<p>The agent is a <strong>tool-calling LLM</strong> scoped to one document. It sees the document’s metadata, an OCR text excerpt, optional @-mentions (schemas, prompts, tags you’ve referenced), and the current extraction. It has <strong>25 tools</strong>: schema CRUD and validation, prompt CRUD, tag CRUD, document list/update/delete, get OCR text, run extraction, patch extraction fields, and two help tools (<code class="language-plaintext highlighter-rouge">help_schemas</code>, <code class="language-plaintext highlighter-rouge">help_prompts</code>).</p>
<ul>
  <li><strong>Read-only</strong> tools run automatically; <strong>read-write</strong> tools (create schema, run extraction, update document, etc.) can pause and ask the user to approve or reject each call.</li>
  <li>Conversations are stored in <strong>threads</strong> per document so you can resume or start a new one.</li>
</ul>

<hr />

<h2 id="architecture-overview">Architecture overview</h2>

<p>Three layers matter:</p>

<ul>
  <li>The <strong>agent loop</strong> — how we call the LLM and handle tool calls.</li>
  <li>The <strong>context</strong> we give the LLM (system message).</li>
  <li>The <strong>state</strong> we keep between requests (memory vs MongoDB).</li>
</ul>

<h3 id="agent-loop">Agent loop</h3>

<p>The core is a loop:</p>

<ol>
  <li><strong>Call LLM</strong> with system message + conversation + tool definitions → get text and/or tool calls.</li>
  <li><strong>If any tool call is read-write and not auto-approved:</strong> stop, persist turn state, return <code class="language-plaintext highlighter-rouge">turn_id</code> and pending calls to the client.</li>
  <li><strong>Client shows approve/reject UI</strong> and calls <strong>POST /chat/approve</strong> with approvals.</li>
  <li><strong>Backend executes approved tools</strong>, appends results, calls LLM <strong>once</strong>.</li>
  <li><strong>If LLM returns more tool calls</strong>, return them to the client (repeat from step 2).</li>
</ol>

<p>We cap tool rounds at 10 so a turn can’t run forever. The pause happens in the client, not in a long-running server loop.</p>

<p>Here’s the algorithm in plain form:</p>

<div data-excalidraw="/assets/excalidraw/document-agent-loop.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/document-agent-loop.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>
<p style="text-align: center; margin-top: 0.5rem; font-size: 0.875rem; color: #6b7280;"><strong>Figure 1:</strong> Agent loop.</p>

<p><strong>Important:</strong> The LLM is <strong>not</strong> called once per request. Inside the loop, it can be called up to 10 times in a row when the model keeps returning auto-approved tool calls (e.g. read schema → read prompt → run extraction). Only when a <strong>write</strong> tool needs approval do we pause and return a <code class="language-plaintext highlighter-rouge">turn_id</code>; after the client approves, we call the LLM again (and may loop or pause again).</p>

<h3 id="context-system-message">Context (system message)</h3>

<p>Every turn gets a <strong>system message</strong> built from:</p>

<ul>
  <li><strong>Document ID and file name</strong></li>
  <li><strong>OCR excerpt</strong> of the document (truncated to ~8k characters so we don’t blow the context window)</li>
  <li><strong>Resolved @-mentions</strong> — if the user referenced a schema, prompt, or tag, we resolve it server-side and inject the full content (e.g. full JSON schema) so the LLM doesn’t have to call <code class="language-plaintext highlighter-rouge">get_schema</code> just to see what they meant</li>
  <li><strong>Working state</strong> — the last <code class="language-plaintext highlighter-rouge">schema_revid</code>, <code class="language-plaintext highlighter-rouge">prompt_revid</code>, and <strong>extraction</strong> result from this conversation, so the agent can say “run extraction with that prompt” without the user re-specifying IDs</li>
  <li><strong>Instructions</strong> — use <code class="language-plaintext highlighter-rouge">help_schemas</code> / <code class="language-plaintext highlighter-rouge">help_prompts</code> when creating or modifying those artifacts, and <strong>always</strong> call <code class="language-plaintext highlighter-rouge">validate_schema</code> before <code class="language-plaintext highlighter-rouge">create_schema</code> or <code class="language-plaintext highlighter-rouge">update_schema</code> so we never persist invalid schemas</li>
</ul>

<div data-excalidraw="/assets/excalidraw/document-agent-context.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/document-agent-context.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>
<p style="text-align: center; margin-top: 0.5rem; font-size: 0.875rem; color: #6b7280;"><strong>Figure 2:</strong> How the system message is built.</p>

<h3 id="what-we-store-in-memory">What we store in memory</h3>

<p><strong>Browser memory:</strong> Tool approval lives in the client. When the backend returns a <code class="language-plaintext highlighter-rouge">turn_id</code> and pending tool calls, the UI shows approve/reject cards and holds the user’s choices in browser memory until they submit <strong>POST /chat/approve</strong>. Nothing about which tools the user approved is persisted on the server until that request is sent.</p>

<p><strong>Session memory (server):</strong> When we pause for approval, we keep <strong>turn state</strong> in server memory—message list, pending tool calls, working state, model—keyed by <code class="language-plaintext highlighter-rouge">turn_id</code>, with a TTL of 5 minutes. The approve endpoint loads by <code class="language-plaintext highlighter-rouge">turn_id</code>, runs tools, then discards that state. We don’t persist it: if the user closes the tab or the server restarts, the turn expires and they can resend. So “in-flight approval” is deliberately ephemeral.</p>

<h3 id="what-we-store-in-mongodb">What we store in MongoDB</h3>

<p>We persist two things that matter for the agent:</p>

<ol>
  <li>
    <p><strong>Threads</strong> (<code class="language-plaintext highlighter-rouge">agent_threads</code>). Each document is a conversation thread: <code class="language-plaintext highlighter-rouge">organization_id</code>, <code class="language-plaintext highlighter-rouge">document_id</code>, <code class="language-plaintext highlighter-rouge">created_by</code> (user), plus <code class="language-plaintext highlighter-rouge">title</code>, <code class="language-plaintext highlighter-rouge">messages</code>, <code class="language-plaintext highlighter-rouge">extraction</code>, optional <code class="language-plaintext highlighter-rouge">model</code>, and timestamps. When a turn finishes (no more pending tool calls), we append the user and assistant messages and the latest extraction to the thread. Threads are what you list, load, and resume in the UI.</p>
  </li>
  <li>
    <p><strong>LLM provider config</strong> (<code class="language-plaintext highlighter-rouge">llm_providers</code>). We store <strong>which models are enabled</strong> and, for the document agent, <strong>which models appear in the chat dropdown</strong>. Per provider (Anthropic, OpenAI, Gemini, etc.):</p>
    <ul>
      <li><code class="language-plaintext highlighter-rouge">litellm_models_available</code> — discovered from the provider</li>
      <li><code class="language-plaintext highlighter-rouge">litellm_models_enabled</code> — which of those the org has turned on for general use</li>
      <li><code class="language-plaintext highlighter-rouge">litellm_models_chat_agent</code> — the subset allowed in the document agent UI<br />
Admins can enable many models for extraction but expose only a few in the agent. API keys (tokens) and enabled/disabled per provider live here too.</li>
    </ul>
  </li>
</ol>

<p><strong>Tool definitions</strong> (which tools exist and whether they are read-only vs read-write) are <strong>not</strong> stored in MongoDB. They’re defined in code (the tool registry) and exposed via <strong>GET /chat/tools</strong> so the UI can show “these actions need approval.” That keeps the security model simple and consistent across environments.</p>

<p><strong>Why MongoDB here?</strong> The database is document-oriented and schema-flexible by default, but we don’t treat it as a free-for-all.</p>

<ul>
  <li>We run <strong>versioned migrations</strong> (same idea as SQL): a <code class="language-plaintext highlighter-rouge">migrations</code> collection tracks schema version; each migration can add indexes, rename or reshape collections, and backfill data.</li>
  <li>Result: a <strong>strict, explicit schema</strong> we evolve in a controlled way—portability and the same regularity you’d expect from Postgres—while keeping MongoDB’s strengths: horizontal scaling (sharding, replica sets), flexible documents where we need them, one deployment story for structured and semi-structured data. In practice, agent threads and LLM provider config are as regular as relational tables; we just don’t pay the cost of rigid columns until we need to scale out.</li>
</ul>

<style>
.excalidraw-container {
  width: 100%;
  border: 2px solid #e0e0e0;
  border-radius: 8px;
  box-shadow: 0 2px 8px rgba(0,0,0,0.1);
  background: white;
  display: block;
  margin: 2rem 0;
  min-height: 400px;
}

.excalidraw-container svg {
  width: 100%;
  height: auto;
  display: block;
  margin: 0;
}

.loading-placeholder {
  padding: 2rem;
  text-align: center;
  color: #666;
}
</style>

<script type="module" src="/assets/js/excalidraw/render-excalidraw.js"></script>

<hr />

<h2 id="implementation-stages">Implementation stages</h2>

<p>We built the Document Agent in three stages, each one shippable on its own and visible in the product.</p>

<div data-excalidraw="/assets/excalidraw/document-agent-implementation-stages.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/document-agent-implementation-stages.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>
<p style="text-align: center; margin-top: 0.5rem; font-size: 0.875rem; color: #6b7280;"><strong>Figure 3:</strong> Left-to-right implementation stages.</p>

<p>From <strong>left to right</strong>:</p>

<ul>
  <li><strong>Stage 1 — UI + FastAPI</strong>: we started with the core product surface—document list, schemas, tags, prompts—and added FastAPI endpoints for every UI action. Anything you can point-and-click in the app (create/edit schemas, prompts, tags; run extraction; manage documents) can also be exercised through REST APIs.</li>
  <li><strong>Stage 2 — MCP server</strong>: we wrapped all document, schema, tag, and prompt APIs into a TypeScript MCP server. At this point, we could use external agents (e.g. Claude Code) to operate DocRouter via MCP, turning our REST surface into a tool catalog without changing the backend.</li>
  <li><strong>Stage 3 — Document Agent UI + loop + caching</strong>: we then built the Document Agent UI, added dedicated Copilot FastAPI endpoints and the agent loop described above, and finally layered in LLM caching—provider-level prompt caching for system messages and MongoDB-based embedding caching—to keep the experience both fast and cost-efficient.</li>
</ul>

<hr />

<h2 id="key-decisions">Key decisions</h2>

<p><strong>Read-only vs read-write tools</strong><br />
We split tools into two sets:</p>

<ul>
  <li><strong>Read-only</strong> (e.g. <code class="language-plaintext highlighter-rouge">get_ocr_text</code>, <code class="language-plaintext highlighter-rouge">list_schemas</code>, <code class="language-plaintext highlighter-rouge">validate_schema</code>, <code class="language-plaintext highlighter-rouge">help_schemas</code>) never require approval—they’re safe to run as soon as the LLM asks.</li>
  <li><strong>Read-write</strong> (e.g. <code class="language-plaintext highlighter-rouge">create_schema</code>, <code class="language-plaintext highlighter-rouge">run_extraction</code>, <code class="language-plaintext highlighter-rouge">update_document</code>) require approval by default. The client can send <code class="language-plaintext highlighter-rouge">auto_approve: true</code> (run everything) or <code class="language-plaintext highlighter-rouge">auto_approved_tools: ["run_extraction"]</code> (only those run without pausing).</li>
</ul>

<p>That way power users can say “just run extraction when I ask” while still being prompted for “create a new schema.” The backend exposes <strong>GET /chat/tools</strong> returning the two lists so the UI can explain which actions will pause.</p>

<p><strong>One LLM round per approve</strong><br />
When the user approves tool calls, we execute them and call the LLM <strong>once</strong> with the new tool results. If the LLM returns more tool calls, we return those to the client again—we don’t keep looping on the server. The reason is <strong>control</strong>: the user sees each batch of proposed actions and can approve or reject. If we looped server-side until “no more tool calls,” a single request could do many creates/updates before the user saw anything. So the “loop” is really a handshake: chat → (optional approve) → chat → (optional approve) → …</p>

<p><strong>Sanitizing messages when loading from a thread</strong><br />
When the user resumes a thread, we send the saved messages back to the LLM. But the API requires that every assistant message with <code class="language-plaintext highlighter-rouge">tool_calls</code> is <strong>immediately</strong> followed by <code class="language-plaintext highlighter-rouge">tool</code> messages (one per call). If the user had left mid-approval or we stored a partial state, we might have an assistant message with tool_calls and no following tool results. So before building the LLM request we <strong>sanitize</strong>: we walk the message list and, for any assistant message with tool_calls, check that the next messages are tool results for those call IDs. If not, we strip the tool_calls from that assistant message and send it as content-only. That keeps the API happy and avoids confusing the model with an invalid history.</p>

<p><strong>Working state in the loop</strong><br />
<code class="language-plaintext highlighter-rouge">working_state</code> (schema_revid, prompt_revid, extraction) is updated by the tool implementations as they run (e.g. <code class="language-plaintext highlighter-rouge">run_extraction</code> sets <code class="language-plaintext highlighter-rouge">working_state["extraction"]</code> and <code class="language-plaintext highlighter-rouge">working_state["prompt_revid"]</code>). The system message is built once per turn with the <strong>current</strong> working state. So when the agent says “I’ll use the prompt we just created,” the next LLM call already has that prompt_revid in the system message and the agent can call <code class="language-plaintext highlighter-rouge">run_extraction</code> without a prompt_revid argument (we use working state as default). Same for “update the total to 1,250”—the agent sees the current extraction JSON and can call <code class="language-plaintext highlighter-rouge">update_extraction_field</code> with the right path.</p>

<p><strong>Streaming and approval</strong><br />
We support <strong>streaming</strong> (SSE) for the main chat endpoint: the client gets events for thinking chunks, text chunks, tool calls, tool results, and a final <code class="language-plaintext highlighter-rouge">done</code> payload. The <strong>approve</strong> endpoint is non-streaming: one request, one response. That keeps the approve flow simple (no need to stream a single round) and keeps the “pause for approval” contract clear: you get a full set of tool calls, you approve, you get one full response. Streaming is for the interactive chat experience; approve is for the control boundary.</p>

<p><strong>Thinking blocks and API compatibility</strong><br />
Some models (e.g. Claude with extended thinking) return <strong>thinking_blocks</strong> in the response. When we continue the conversation (e.g. after tool execution), we must send those blocks back to the API in the right format—Anthropic requires a non-empty <code class="language-plaintext highlighter-rouge">signature</code> on each block. Our streaming path sometimes produces blocks without signatures, so we have a pass that only includes blocks that have a signature when we rebuild the message for the next call. We also avoid sending the <code class="language-plaintext highlighter-rouge">thinking</code> parameter when the last assistant message had tool_calls but no thinking_blocks, so we don’t trigger API warnings or rejections.</p>

<p><strong>LLM caching</strong><br />
We use <strong>prompt caching at the provider level</strong> to make repeated calls cheaper and faster. For chat models that support prompt caching (via LiteLLM’s <code class="language-plaintext highlighter-rouge">supports_prompt_caching</code>), we convert the system message into content blocks with <code class="language-plaintext highlighter-rouge">cache_control: {"type": "ephemeral"}</code> so providers like Anthropic and OpenAI can reuse the long, stable system prompt across turns and tool rounds. We intentionally <strong>skip prompt caching for Gemini/Vertex</strong>—their cached-content APIs reject prompts under certain token thresholds, and our system prompts are often smaller than those limits—so for those providers we fall back to regular calls with no cache directive.</p>

<p><strong>SPU and cost</strong><br />
Each LLM call in the agent (and each call inside <code class="language-plaintext highlighter-rouge">run_extraction</code>) checks <strong>SPU</strong> (our credit system) independently. We don’t reserve or estimate “total cost for this turn” upfront—we charge as we go. If the org runs out of credits mid-turn, that LLM call fails and we surface the error; the turn ends. That matches how the rest of the app works and avoids over-engineering reservation logic.</p>

<hr />

<h2 id="frontend-and-api">Frontend and API</h2>

<p>The frontend includes:</p>

<ul>
  <li><strong>Chat panel</strong> — message list, input, model/tools settings</li>
  <li><strong>Tool-call cards</strong> — approve/reject per call, with expandable arguments</li>
  <li><strong>Thinking block</strong> — collapsible, with optional live timer</li>
  <li><strong>Thread dropdown</strong> — list threads, create, load, delete</li>
</ul>

<p>When the backend returns a <code class="language-plaintext highlighter-rouge">turn_id</code> and pending tool_calls, the UI shows the cards and disables send until the user approves or rejects. The same agent is exposed over REST: custom UIs or automation can call <strong>POST /chat</strong> (and <strong>POST /chat/approve</strong> when needed), with optional streaming and thread_id for persistence.</p>

<hr />

<h2 id="summary">Summary</h2>

<p>The Document Agent is a tool-calling LLM with:</p>

<ul>
  <li>A <strong>bounded loop</strong> — LLM → optional user approval → tools → LLM again</li>
  <li><strong>Rich context</strong> — document, OCR, @-mentions, working state</li>
  <li><strong>What we store in memory</strong> (browser: tool approval; server: turn state) vs <strong>what we store in MongoDB</strong> (threads, LLM config)</li>
</ul>

<p>Splitting read-only and read-write tools keeps approval predictable; one round per approve keeps the user in control; message sanitization keeps thread reloads valid; and working state keeps “what we just created” visible to the agent without extra round-trips. If you’re building something similar—an in-context agent that can read and write—this architecture is a solid starting point.</p>

<p>To use the Document Agent, open any document in <a href="https://app.docrouter.ai">DocRouter</a> and open the Chat / Agent tab. For API details, see <a href="/docs/document-agent/">Document Agent</a> in the docs.</p>]]></content><author><name>Andrei Radulescu-Banu</name></author><category term="ai" /><category term="programming" /><category term="engineering" /><category term="product" /><summary type="html"><![CDATA[How DocRouter's Document Agent was built: a tool-calling AI with 25 tools, human-in-the-loop approval, and an agent loop that cuts document config time by 90%.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/document-agent-blog-splash.png" /><media:content medium="image" url="https://analytiqhub.com/assets/images/document-agent-blog-splash.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Why I Prefer MongoDB For AI Applications</title><link href="https://analytiqhub.com/tech/programming/ai/databases/why-i-prefer-mongodb-for-ai-applications/" rel="alternate" type="text/html" title="Why I Prefer MongoDB For AI Applications" /><published>2026-02-17T00:00:00+00:00</published><updated>2026-02-17T00:00:00+00:00</updated><id>https://analytiqhub.com/tech/programming/ai/databases/why-i-prefer-mongodb-for-ai-applications</id><content type="html" xml:base="https://analytiqhub.com/tech/programming/ai/databases/why-i-prefer-mongodb-for-ai-applications/"><![CDATA[<p><em>This post was co-published by MongoDB at <a href="https://www.mongodb.com/company/blog/technical/why-i-prefer-mongodb-for-ai-applications">mongodb.com</a>.</em></p>

<p>I use MongoDB as the primary database for AI-powered products like <a href="https://docrouter.ai">DocRouter.AI</a> and <a href="https://sigagent.ai">SigAgent.AI</a>. This post explains how it’s implemented—migrations, vector search, and knowledge bases—and why I prefer it over alternatives like Postgres for document-centric, JSON-heavy AI workloads. I want to store a very large number of documents (DocRouter) or logs (SigAgent) without spending much time tuning the database for horizontal scaling; MongoDB fits that need well.</p>

<h2 id="brief-trade-offs-vs-postgres">Brief trade-offs vs Postgres</h2>

<p>Postgres with <strong>jsonb</strong> can model the same document-style records and even integrate vector search via extensions, but it shines most when you need <strong>strong relational guarantees</strong> and <strong>complex joins</strong> around a relatively <strong>stable schema</strong>. MongoDB is a better fit when <strong>almost everything is a JSON document</strong>, the schema <strong>evolves quickly</strong>, and you care more about <strong>horizontal scaling</strong> and <strong>quick turnaround</strong> than about classic SQL features.</p>

<p>In my case, the workloads are heavily document- and log-centric, so the ergonomics and scaling model of MongoDB outweigh the benefits of staying inside the relational/Postgres ecosystem.</p>

<div class="mt-6 overflow-x-auto">
  <table class="min-w-full border border-gray-200 text-sm text-left">
    <thead class="bg-gray-50">
      <tr>
        <th class="px-4 py-2 font-semibold text-gray-700 border-b border-gray-200">Aspect</th>
        <th class="px-4 py-2 font-semibold text-gray-700 border-b border-gray-200">MongoDB Approach</th>
        <th class="px-4 py-2 font-semibold text-gray-700 border-b border-gray-200">Postgres Alternative</th>
      </tr>
    </thead>
    <tbody class="divide-y divide-gray-100">
      <tr class="odd:bg-white even:bg-gray-50">
        <td class="px-4 py-2 align-top">Schema flexibility</td>
        <td class="px-4 py-2 align-top">Evolving via migrations; no enforcement from the DB, but disciplined application code keeps documents consistent.</td>
        <td class="px-4 py-2 align-top">Rigid but enforceable with DDL; great for stable relations, heavier for fast-changing AI schemas.</td>
      </tr>
      <tr class="odd:bg-white even:bg-gray-50">
        <td class="px-4 py-2 align-top">Scaling</td>
        <td class="px-4 py-2 align-top">Horizontal scaling and sharding are built-in; easy to grow with large document/log volumes.</td>
        <td class="px-4 py-2 align-top">Requires extensions or external tooling (e.g. Citus) and more tuning when pushing JSONB-heavy workloads.</td>
      </tr>
      <tr class="odd:bg-white even:bg-gray-50">
        <td class="px-4 py-2 align-top">Consistency model</td>
        <td class="px-4 py-2 align-top">Tunable; we use majority writes and typically read from secondaries, so most reads are eventually consistent.</td>
        <td class="px-4 py-2 align-top">Strong ACID semantics with primary reads by default; great when you need strict guarantees across transactions.</td>
      </tr>
      <tr class="odd:bg-white even:bg-gray-50">
        <td class="px-4 py-2 align-top">Vector search</td>
        <td class="px-4 py-2 align-top">Native <code>$vectorSearch</code> in MongoDB 8.2+/Atlas.</td>
        <td class="px-4 py-2 align-top">Via <code>pgvector</code>, which has mature support.</td>
      </tr>
      <tr class="odd:bg-white even:bg-gray-50">
        <td class="px-4 py-2 align-top">Joins &amp; relations</td>
        <td class="px-4 py-2 align-top">Limited</td>
        <td class="px-4 py-2 align-top">Strong</td>
      </tr>
      <tr class="odd:bg-white even:bg-gray-50">
        <td class="px-4 py-2 align-top">Dev speed</td>
        <td class="px-4 py-2 align-top">Quick iterations on JSON schemas and migrations; fits rapid AI product experiments.</td>
        <td class="px-4 py-2 align-top">Slower to evolve schemas cleanly; better when you already know the long-term relational shape.</td>
      </tr>
    </tbody>
  </table>
</div>

<p>It’s also totally reasonable to split responsibilities: e.g. <strong>Postgres for relational metadata</strong>, a dedicated vector store like <strong>Pinecone/Weaviate/Qdrant</strong> for embeddings, or <strong>Redis</strong> as an embedding cache in front of another database. For <a href="https://docrouter.ai">DocRouter.AI</a> and <a href="https://sigagent.ai">SigAgent.AI</a>, I’m happy to trade some theoretical optimality for a <strong>single operational datastore</strong> (MongoDB handling documents, logs, and vectors) until scale or workload complexity justifies introducing extra systems.</p>

<h2 id="docrouter-and-sigagent-one-backend-two-products">DocRouter and SigAgent: One Backend, Two Products</h2>

<p><strong>DocRouter.AI</strong> is a smart document router: you upload documents, define schemas and prompts, and it extracts structured data (e.g. from invoices, medical records, forms) using LLMs. <strong>SigAgent</strong> is a Claude agent monitor with a different UX and product focus, but it’s built on the same stack.</p>

<p>Roughly <strong>90% of the backend is shared</strong>. Both use the same Python package, <a href="https://github.com/analytiq-hub/doc-router/tree/main/packages/python/analytiq_data">analytiq_data</a>: MongoDB client, migrations, queue layer, auth, and app startup. The same MongoDB database layout, indices, and migration history apply to both. Product-specific code lives in routes and frontends; the data layer is common.</p>

<div data-excalidraw="/assets/excalidraw/mongodb_one_backend_two_products.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/mongodb_one_backend_two_products.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>

<h2 id="strict-schema-via-discipline-migrations-as-schema">Strict Schema via Discipline: Migrations as “Schema”</h2>

<p>MongoDB doesn’t enforce a schema. I still want <strong>predictable structure and safe upgrades</strong>, so we enforce it in code and process:</p>

<ul>
  <li><strong>Consistent document shape per collection.</strong> We use a fixed set of field names and types in application code (and in TypeScript/Pydantic where applicable). In practice, each collection behaves like a table with a known “schema.”</li>
  <li><strong>Every change goes through migrations.</strong> Renaming fields, adding/removing fields, splitting or renaming collections—all of it is done in versioned migration classes with <code class="language-plaintext highlighter-rouge">up()</code> and <code class="language-plaintext highlighter-rouge">down()</code>. The current schema version is stored in a <code class="language-plaintext highlighter-rouge">migrations</code> collection; on startup we run <code class="language-plaintext highlighter-rouge">run_migrations(analytiq_client)</code> and bring the DB to the latest version.</li>
  <li><strong>Element types as schema.</strong> We treat document fields as if they were typed: e.g. <code class="language-plaintext highlighter-rouge">schema_id</code>, <code class="language-plaintext highlighter-rouge">prompt_version</code>, <code class="language-plaintext highlighter-rouge">organization_id</code> are always present where we expect them. New code assumes the post-migration shape. So we get <strong>comparable safety to Postgres</strong> (known structure, no surprise shapes) while staying in a document model—as long as we’re disciplined and never bypass migrations.</li>
</ul>

<p>So: schema is “enforced” by convention and migrations, not by the database. That keeps development fast without giving up control.</p>

<h2 id="how-migrations-and-indices-are-implemented">How Migrations and Indices Are Implemented</h2>

<p>At a high level:</p>

<ul>
  <li><strong>Migrations</strong> are versioned Python classes with <code class="language-plaintext highlighter-rouge">up()</code>/<code class="language-plaintext highlighter-rouge">down()</code> methods that evolve collections in lockstep with the code.</li>
  <li><strong>Indices</strong> are either created inside migrations (long-term, repeatable) or via a small helper at runtime for per-module needs.</li>
  <li><strong>Profiling tools</strong> (Atlas Query Insights, profiler, Compass/Performance Advisor) surface slow queries that suggest new compound indexes.</li>
</ul>

<h3 id="migrations">Migrations</h3>

<p>Migrations live in <a href="https://github.com/analytiq-hub/doc-router/blob/main/packages/python/analytiq_data/migrations/migration.py">migration.py</a>. Each migration is a class with:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">description</code>: short human-readable summary</li>
  <li><code class="language-plaintext highlighter-rouge">up(db)</code>: apply the change (e.g. rename field, backfill, add index)</li>
  <li><code class="language-plaintext highlighter-rouge">down(db)</code>: revert the change when possible</li>
</ul>

<p>The runner loads the list <code class="language-plaintext highlighter-rouge">MIGRATIONS</code>, assigns a version by index, and stores the current version in <code class="language-plaintext highlighter-rouge">db.migrations</code> under <code class="language-plaintext highlighter-rouge">_id: "schema_version"</code>. On startup we run pending migrations in order. Examples from the codebase:</p>

<ul>
  <li><strong>Field renames:</strong> e.g. <code class="language-plaintext highlighter-rouge">RenameUserFields</code> (camelCase → snake_case)</li>
  <li><strong>New fields:</strong> e.g. <code class="language-plaintext highlighter-rouge">LlmResultFieldsMigration</code> (adds <code class="language-plaintext highlighter-rouge">is_edited</code>, <code class="language-plaintext highlighter-rouge">is_verified</code>, timestamps)</li>
  <li><strong>Structural changes:</strong> e.g. <code class="language-plaintext highlighter-rouge">RenameCollections</code> (schemas → schema_revisions, schema_versions → schemas), <code class="language-plaintext highlighter-rouge">UseMongoObjectIDs</code></li>
</ul>

<p>Collection layout and index strategy evolve in one place, with a clear history and rollback path.</p>

<h3 id="indices">Indices</h3>

<p>We manage indices in two ways:</p>

<ol>
  <li><strong>Inside migrations</strong> for long-term indices. These are created with <code class="language-plaintext highlighter-rouge">create_index(..., background=True)</code> and dropped in <code class="language-plaintext highlighter-rouge">down()</code> so rollback is consistent.</li>
  <li><strong>At runtime</strong> via <code class="language-plaintext highlighter-rouge">ensure_index()</code> in <code class="language-plaintext highlighter-rouge">analytiq_data/mongodb/index.py</code>. Given a collection, index spec, and name, it creates the index if missing (and optionally drops other non-<code class="language-plaintext highlighter-rouge">_id</code> indexes). We use this for things like <code class="language-plaintext highlighter-rouge">payments_usage_records (org_id, timestamp)</code> when the payments module initializes.</li>
</ol>

<p>Example index patterns:</p>

<ul>
  <li><strong>Queue collections:</strong> <code class="language-plaintext highlighter-rouge">(status, created_at)</code> for <code class="language-plaintext highlighter-rouge">find_one_and_update({ status: "pending" }, sort: { created_at: 1 })</code>.</li>
  <li><strong>docs:</strong> <code class="language-plaintext highlighter-rouge">(organization_id, upload_date desc)</code> for paginated listing by org.</li>
  <li><strong>llm_runs:</strong> <code class="language-plaintext highlighter-rouge">(document_id, prompt_id, prompt_version desc)</code> for “latest run by document and prompt,” and <code class="language-plaintext highlighter-rouge">(document_id, prompt_revid)</code> for exact revision lookup.</li>
  <li><strong>document_index:</strong> unique <code class="language-plaintext highlighter-rouge">(kb_id, document_id)</code> and a non-unique <code class="language-plaintext highlighter-rouge">document_id</code> index for cascade deletes.</li>
</ul>

<p>To find candidates for new indices, we use <strong>Atlas Query Insights</strong> (filter by operations returning &gt;1,000 documents, then add a compound index matching the filter and sort) or, on <strong>Community Edition</strong>, enable the database profiler and filter <code class="language-plaintext highlighter-rouge">system.profile</code> by <code class="language-plaintext highlighter-rouge">nReturned</code>:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">db</span><span class="p">.</span><span class="nx">setProfilingLevel</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">{</span> <span class="na">slowms</span><span class="p">:</span> <span class="mi">100</span> <span class="p">})</span>
<span class="nx">db</span><span class="p">.</span><span class="nx">system</span><span class="p">.</span><span class="nx">profile</span><span class="p">.</span><span class="nx">find</span><span class="p">({</span> <span class="na">nReturned</span><span class="p">:</span> <span class="p">{</span> <span class="na">$gt</span><span class="p">:</span> <span class="mi">1000</span> <span class="p">}</span> <span class="p">}).</span><span class="nx">sort</span><span class="p">({</span> <span class="na">ts</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span> <span class="p">}).</span><span class="nx">limit</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span>
</code></pre></div></div>

<p>Level 2 profiling captures everything but has I/O cost, so we use it in short bursts or staging. Level 1 (slow only) is cheaper and still surfaces most heavy queries.</p>

<p>If you prefer GUIs over shell tools, both <strong>MongoDB Compass</strong> and the <strong>Atlas Performance Advisor</strong> surface similar slow-query/index recommendations in a more visual way, which is often easier when you’re just getting started tuning a workload.</p>

<p>With schema and indices in place, here’s how we run it.</p>

<h2 id="devprod-setup-and-vector-search">Dev/Prod Setup and Vector Search</h2>

<p>Quickly, the environments look like this:</p>

<ul>
  <li><strong>Local:</strong> MongoDB running on <code class="language-plaintext highlighter-rouge">localhost</code> or via the Atlas Local Docker image (bundling <code class="language-plaintext highlighter-rouge">mongod</code> + <code class="language-plaintext highlighter-rouge">mongot</code>) for easy vector search testing.</li>
  <li><strong>Production:</strong> MongoDB Atlas or AWS DocumentDB, configured for majority writes and read scaling.</li>
  <li><strong>Self-hosted:</strong> MongoDB 8.2+ with <code class="language-plaintext highlighter-rouge">mongot</code> in a replica set, using the same <code class="language-plaintext highlighter-rouge">createSearchIndexes</code> / <code class="language-plaintext highlighter-rouge">$vectorSearch</code> APIs.</li>
</ul>

<p>For <strong>local development</strong>, <code class="language-plaintext highlighter-rouge">MONGODB_URI</code> defaults to <code class="language-plaintext highlighter-rouge">mongodb://localhost:27017</code>. For vector search, we use the <strong>MongoDB Atlas Local</strong> Docker image (<code class="language-plaintext highlighter-rouge">mongodb/mongodb-atlas-local:latest</code>), which bundles <code class="language-plaintext highlighter-rouge">mongod</code> and <strong>mongot</strong> (the search/vector process) in one container.</p>

<p>For <strong>production</strong> we use <strong>MongoDB Atlas</strong>; for on-prem consulting we use <strong>AWS DocumentDB</strong>. The client is configured with <code class="language-plaintext highlighter-rouge">w='majority'</code>, <code class="language-plaintext highlighter-rouge">readPreference='secondaryPreferred'</code>, and <code class="language-plaintext highlighter-rouge">retryWrites=False</code> (DocumentDB doesn’t support retryWrites).</p>

<p>For <strong>self-hosted Community Edition</strong>, vector search requires <strong>MongoDB 8.2+</strong> with <strong>mongot</strong> running alongside <code class="language-plaintext highlighter-rouge">mongod</code> in a replica set. The <code class="language-plaintext highlighter-rouge">createSearchIndexes</code> command and <code class="language-plaintext highlighter-rouge">$vectorSearch</code> aggregation work the same way across Atlas, Community 8.2+, and the local Docker image—no separate code path.</p>

<h2 id="knowledge-bases-on-top-of-vector-search">Knowledge Bases on Top of Vector Search</h2>

<p>We implement <strong>knowledge bases</strong> in the shared backend using MongoDB’s vector search, with three key pieces: an indexing pipeline, a search pipeline, and a reconciliation service.</p>

<div data-excalidraw="/assets/excalidraw/mongodb_knowledge_base_flow.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/mongodb_knowledge_base_flow.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>

<h3 id="indexing-blue-green-atomic-swap">Indexing: Blue-Green Atomic Swap</h3>

<p>Each knowledge base gets its own vector collection (<code class="language-plaintext highlighter-rouge">kb_vectors_&lt;kb_id&gt;</code>). When a document is indexed, we chunk the text (via <a href="https://github.com/chonkie-ai/chonkie">Chonkie</a>, a small library for token/sentence/recursive text chunking), generate embeddings via <a href="https://github.com/BerriAI/litellm">LiteLLM</a>, a lightweight multi-provider LLM/embedding client, and then atomically swap old vectors for new ones inside a MongoDB transaction:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">with</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="n">start_session</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">session</span><span class="p">.</span><span class="n">start_transaction</span><span class="p">():</span>
        <span class="c1"># Delete old vectors for this document
</span>        <span class="k">await</span> <span class="n">vectors_collection</span><span class="p">.</span><span class="n">delete_many</span><span class="p">(</span>
            <span class="p">{</span><span class="s">"document_id"</span><span class="p">:</span> <span class="n">document_id</span><span class="p">},</span> <span class="n">session</span><span class="o">=</span><span class="n">session</span>
        <span class="p">)</span>
        <span class="c1"># Insert new vectors
</span>        <span class="k">await</span> <span class="n">vectors_collection</span><span class="p">.</span><span class="n">insert_many</span><span class="p">(</span><span class="n">new_vectors</span><span class="p">,</span> <span class="n">session</span><span class="o">=</span><span class="n">session</span><span class="p">)</span>
        <span class="c1"># Update document_index entry
</span>        <span class="k">await</span> <span class="n">db</span><span class="p">.</span><span class="n">document_index</span><span class="p">.</span><span class="n">update_one</span><span class="p">(</span>
            <span class="p">{</span><span class="s">"kb_id"</span><span class="p">:</span> <span class="n">kb_id</span><span class="p">,</span> <span class="s">"document_id"</span><span class="p">:</span> <span class="n">document_id</span><span class="p">},</span>
            <span class="p">{</span><span class="s">"$set"</span><span class="p">:</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}},</span>
            <span class="n">upsert</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">session</span><span class="o">=</span><span class="n">session</span>
        <span class="p">)</span>
        <span class="c1"># Update KB stats (document_count, chunk_count)
</span>        <span class="k">await</span> <span class="n">db</span><span class="p">.</span><span class="n">knowledge_bases</span><span class="p">.</span><span class="n">update_one</span><span class="p">(</span>
            <span class="p">{</span><span class="s">"_id"</span><span class="p">:</span> <span class="n">ObjectId</span><span class="p">(</span><span class="n">kb_id</span><span class="p">)},</span>
            <span class="p">{</span><span class="s">"$set"</span><span class="p">:</span> <span class="p">{</span><span class="s">"document_count"</span><span class="p">:</span> <span class="n">total_docs</span><span class="p">,</span> <span class="s">"chunk_count"</span><span class="p">:</span> <span class="n">total_chunks</span><span class="p">}},</span>
            <span class="n">session</span><span class="o">=</span><span class="n">session</span>
        <span class="p">)</span>
</code></pre></div></div>

<p>This blue-green pattern means a document is never in a half-indexed state: either all its new vectors are visible, or the old ones remain.</p>

<div data-excalidraw="/assets/excalidraw/mongodb_blue_green_indexing.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/mongodb_blue_green_indexing.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>

<h3 id="embedding-cache">Embedding Cache</h3>

<p>Embeddings are cached in a global <code class="language-plaintext highlighter-rouge">embedding_cache</code> collection keyed by <code class="language-plaintext highlighter-rouge">(SHA-256 chunk hash, embedding model)</code>. When the same text chunk appears in multiple KBs (or is re-indexed after an edit), we skip the API call and reuse the cached vector. This saves both cost and latency—especially when re-indexing a large KB after a configuration change.</p>

<h3 id="search">Search</h3>

<p>The query string is embedded with the same model, then we run an aggregation whose first stage is <code class="language-plaintext highlighter-rouge">$vectorSearch</code> (with <code class="language-plaintext highlighter-rouge">numCandidates = max(top_k * 10, 100)</code> for better recall). We apply filters (organization, tags, date range) in the vector index definition, add <code class="language-plaintext highlighter-rouge">vectorSearchScore</code> via <code class="language-plaintext highlighter-rouge">$meta</code>, and optionally coalesce neighboring chunks for richer context.</p>

<h3 id="reconciliation-keeping-kbs-in-sync">Reconciliation: Keeping KBs in Sync</h3>

<p>Documents and KBs can drift: a document’s tags change, a document is deleted, or vectors are orphaned after a failed indexing run. The reconciliation service (<a href="https://github.com/analytiq-hub/doc-router/blob/main/packages/python/analytiq_data/kb/reconciliation.py">reconciliation.py</a>) detects and fixes this:</p>

<ul>
  <li><strong>Missing documents:</strong> documents with matching tags but no <code class="language-plaintext highlighter-rouge">document_index</code> entry → queued for indexing.</li>
  <li><strong>Stale documents:</strong> indexed documents whose tags no longer match the KB → removed.</li>
  <li><strong>Orphaned vectors:</strong> vectors without a corresponding <code class="language-plaintext highlighter-rouge">document_index</code> entry → deleted.</li>
</ul>

<p>Reconciliation uses a <strong>distributed lock</strong> (atomic <code class="language-plaintext highlighter-rouge">find_one_and_update</code> with a 10-minute TTL) so only one worker reconciles a given KB at a time. It processes documents in batches of 100 to keep memory bounded, and supports <strong>dry-run mode</strong> for auditing without side effects.</p>

<style>
.excalidraw-container {
  width: 100%;
  border: 2px solid #e0e0e0;
  border-radius: 8px;
  box-shadow: 0 2px 8px rgba(0,0,0,0.1);
  background: white;
  display: block;
  margin: 2rem 0;
  min-height: 400px;
}
.excalidraw-container svg {
  width: 100%;
  height: auto;
  display: block;
  margin: 0;
}
.loading-placeholder {
  padding: 2rem;
  text-align: center;
  color: #666;
}
</style>

<script type="module" src="/assets/js/excalidraw/render-excalidraw.js"></script>

<hr />

<p>In short: we get <strong>strict schema via migrations</strong>, <strong>predictable indexing</strong>, <strong>vector search for RAG with atomic updates</strong>, and <strong>self-healing knowledge bases</strong>—with one backend shared between DocRouter and SigAgent, and the same patterns whether we run on local Mongo, Atlas, or DocumentDB.</p>]]></content><author><name>Andrei Radulescu-Banu</name></author><category term="tech" /><category term="programming" /><category term="ai" /><category term="databases" /><summary type="html"><![CDATA[Why MongoDB is ideal for AI applications: document-centric storage, vector search, knowledge bases, and horizontal scaling for DocRouter.AI and SigAgent.AI.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/mongodb-ai-applications-splash.png" /><media:content medium="image" url="https://analytiqhub.com/assets/images/mongodb-ai-applications-splash.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Navigating AI Entrepreneurship: Insights From The Application Layer</title><link href="https://analytiqhub.com/ai/entrepreneurship/startups/navigating-ai-entrepreneurship-insights-from-the-application-layer/" rel="alternate" type="text/html" title="Navigating AI Entrepreneurship: Insights From The Application Layer" /><published>2026-01-21T00:00:00+00:00</published><updated>2026-01-21T00:00:00+00:00</updated><id>https://analytiqhub.com/ai/entrepreneurship/startups/navigating-ai-entrepreneurship-insights-from-the-application-layer</id><content type="html" xml:base="https://analytiqhub.com/ai/entrepreneurship/startups/navigating-ai-entrepreneurship-insights-from-the-application-layer/"><![CDATA[<p><em>Through the lens of a serial entrepreneur, this article explores how the AI revolution is shifting from infrastructure to the application layer, where the greatest opportunities lie in solving specialized, data-heavy industry problems rather than perfecting raw technology.</em></p>

<p><em>This article originally appeared on KDnuggets. You can read the original version <a href="https://www.kdnuggets.com/navigating-ai-entrepreneurship-insights-from-the-application-layer">here</a>.</em></p>

<h1 id="introduction">Introduction</h1>

<p>The AI industry is experiencing a wave of transformation comparable to the dot-com era, and entrepreneurs are rushing to stake their claims in this emerging landscape. Yet unlike previous technology waves, this one presents a unique characteristic: the infrastructure is maturing faster than the market can absorb it. This gap between technological capability and practical implementation defines the current opportunity landscape.</p>

<p>Andrei Radulescu-Banu, founder of DocRouter AI and SigAgent AI, brings a unique perspective to this conversation. With a PhD in mathematics from the Massachusetts Institute of Technology (MIT) and decades of engineering experience, Radulescu-Banu has built document processing platforms powered by large language models (LLMs) and developed monitoring systems for AI agents, all while serving as a fractional chief technology officer (CTO) helping startups implement AI solutions.</p>

<p>His journey from academic mathematician to hands-on engineer to AI entrepreneur was not straightforward. “I’ve done many things in my career, but one thing I’ve not done is actually entrepreneurship,” he explains. “I just wish I had started this when I was, I don’t know, out of college, actually.” Now, he is making up for lost time with an ambitious goal of launching six startups in 12 months.</p>

<p>This accelerated timeline reflects a broader urgency in the AI entrepreneurship space. When technological shifts create new markets, early movers often capture disproportionate advantages. The challenge lies in moving quickly without falling into the trap of building technology in search of a problem.</p>

<h1 id="the-layering-of-the-ai-stack">The Layering Of The AI Stack</h1>

<p>Radulescu-Banu draws parallels between today’s AI boom and the internet revolution. “Just like in the past for computer networks, [you] had developers of infrastructure, let’s say, computer switches and routers. And then you had application layer software sitting on top, and then you had web applications. So what’s interesting is that these layers are forming now for the AI stack.”</p>

<p><em>To continue reading the full piece, visit KDnuggets <a href="https://www.kdnuggets.com/navigating-ai-entrepreneurship-insights-from-the-application-layer">here</a>.</em></p>]]></content><author><name>Rachel Kuznetsov</name></author><category term="ai" /><category term="entrepreneurship" /><category term="startups" /><category term="ai" /><category term="application-layer" /><category term="founders" /><category term="product" /><category term="strategy" /><summary type="html"><![CDATA[A founder's-eye view of building durable AI products at the application layer, where workflows, UX, and distribution matter more than any single model.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/analytiq_hub_front_page.png" /><media:content medium="image" url="https://analytiqhub.com/assets/images/analytiq_hub_front_page.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Document Workflows with Temporal and DocRouter.AI</title><link href="https://analytiqhub.com/tech/programming/ai/tutorials/how-to-create-document-workflows-with-temporal-and-docrouter-ai/" rel="alternate" type="text/html" title="Document Workflows with Temporal and DocRouter.AI" /><published>2025-12-25T00:00:00+00:00</published><updated>2025-12-25T00:00:00+00:00</updated><id>https://analytiqhub.com/tech/programming/ai/tutorials/how-to-create-document-workflows-with-temporal-and-docrouter-ai</id><content type="html" xml:base="https://analytiqhub.com/tech/programming/ai/tutorials/how-to-create-document-workflows-with-temporal-and-docrouter-ai/"><![CDATA[<p>🚀 Just spent the last few days building a powerful multi-step document processing pipeline — and it handles 200+ page medical records like a champ!</p>

<p>Single-prompt tools like DocRouter.AI shine for ~20-25 page docs… but what about massive collated files with labs, facesheets, insurance cards, and multiple patients mixed together? → One prompt = impossible.</p>

<p><strong>Enter the solution: Temporal + DocRouter.AI in a smart, scalable workflow.</strong></p>

<p>This post describes a real-world implementation that uses <a href="https://temporal.io/">Temporal</a> to orchestrate document processing workflows with <a href="http://docrouter.ai">DocRouter.AI</a>, solving the challenge of processing massive medical records through intelligent multi-step orchestration.</p>

<p>The implementation is available at <a href="https://github.com/analytiq-hub/doc-router-temporal/blob/blog_post_dec_2025">doc-router-temporal</a> and processes medical documents containing hundreds of pages, extracting patient names, dates of birth, and medical insurance information.</p>

<h2 id="the-problem-massive-medical-records-need-smart-orchestration">The Problem: Massive Medical Records Need Smart Orchestration</h2>

<p>Medical records often come as massive collated files containing 200+ pages with:</p>
<ul>
  <li>Lab results and test reports</li>
  <li>Patient facesheets with demographics</li>
  <li>Insurance cards and coverage details</li>
  <li>Clinical notes and progress reports</li>
  <li>Multiple patients’ information mixed together</li>
</ul>

<p><strong>The challenge</strong>: These documents are too large to process in a single LLM prompt due to token limits (typically 128K-200K tokens). One prompt = impossible for comprehensive extraction.</p>

<p><strong>The solution</strong>: A multi-step workflow that intelligently orchestrates the process:</p>

<ol>
  <li><strong>Split</strong>: Break the massive PDF into individual pages</li>
  <li><strong>Classify</strong>: Identify each page’s type and which patient it belongs to</li>
  <li><strong>Group</strong>: Intelligently group pages by patient</li>
  <li><strong>Extract</strong>: Process each patient’s page bundle for precise, targeted extraction</li>
</ol>

<h2 id="why-this-pattern-rocks">Why This Pattern Rocks</h2>

<p>This Temporal + DocRouter.AI combination delivers powerful advantages:</p>

<p>✅ <strong>Constant memory usage</strong> — scales effortlessly to 1,000+ pages without running out of resources</p>

<p>✅ <strong>Super general pattern</strong> → classify → group → process per group → works for any document type</p>

<p>✅ <strong>Fully durable &amp; retry-safe</strong> thanks to Temporal’s built-in resilience</p>

<p>✅ <strong>Built lightning-fast</strong> in just a couple of days using AI tools</p>

<p>✅ <strong>Parallel processing</strong> — handles multiple patients simultaneously while maintaining order</p>

<p>✅ <strong>Production-ready</strong> with automatic error handling, timeouts, and state management</p>

<div data-excalidraw="/assets/excalidraw/document_processing_solution.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/document_processing_solution.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>

<h2 id="the-smart-workflow-in-action">The Smart Workflow in Action</h2>

<p>Here’s how the Temporal + DocRouter.AI workflow processes massive medical records:</p>

<p>🔹 <strong>Temporal splits</strong> the 200+ page PDF into individual pages</p>

<p>🔹 <strong>Uploads them one by one</strong> to DocRouter.AI for processing</p>

<p>🔹 <strong>DocRouter classifies each page</strong> → identifies patient name + document type (lab results, insurance card, facesheet, etc.)</p>

<p>🔹 <strong>Temporal intelligently groups pages by patient</strong> using fuzzy name matching and DOB correlation</p>

<p>🔹 <strong>Sends each patient’s page bundle back to DocRouter.AI</strong> for precise, targeted extraction</p>

<p>🔹 <strong>Temporal aggregates everything</strong> → clean, complete per-patient results</p>

<h2 id="technical-implementation">Technical Implementation</h2>

<h2 id="why-temporal">Why Temporal?</h2>

<p><a href="https://temporal.io/">Temporal</a> provides durable workflow orchestration that’s perfect for this use case. Unlike traditional approaches (queues, background jobs, or simple scripts), Temporal handles:</p>

<ul>
  <li><strong>Durable execution</strong>: Resumes from crashes during 200-page processing</li>
  <li><strong>Parallel processing</strong>: Processes multiple pages simultaneously while maintaining order</li>
  <li><strong>Error handling</strong>: Automatic retries for API rate limits and network issues</li>
  <li><strong>State management</strong>: Tracks processed pages and identified patients</li>
  <li><strong>Long-running workflows</strong>: Handles processes that take minutes to hours</li>
</ul>

<p>Temporal’s architecture is built around two key concepts: <strong>Workflows</strong> (orchestration logic) and <strong>Activities</strong> (actual work). The diagram below illustrates how these components work together:</p>

<div data-excalidraw="/assets/excalidraw/temporal_workflows_activities.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/temporal_workflows_activities.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>

<h2 id="the-workflow-implementation">The Workflow Implementation</h2>

<p>The implementation uses a hierarchical workflow structure with two main workflows:</p>

<ol>
  <li><strong>Classify and Group PDF Pages</strong> (<code class="language-plaintext highlighter-rouge">ClassifyAndGroupPDFPagesWorkflow</code>): Chunks the PDF, classifies each page, and groups pages by patient</li>
  <li><strong>Extract Insurance Information</strong> (<code class="language-plaintext highlighter-rouge">ClassifyGroupAndExtractInsuranceWorkflow</code>): Creates patient-specific PDFs and extracts insurance card data</li>
</ol>

<p>The main workflow (<code class="language-plaintext highlighter-rouge">ClassifyGroupAndExtractInsuranceWorkflow</code>) orchestrates the entire process:</p>

<div data-excalidraw="/assets/excalidraw/temporal_docrouter_workflow.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/temporal_docrouter_workflow.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>

<style>
.excalidraw-container {
  width: 100%;
  border: 2px solid #e0e0e0;
  border-radius: 8px;
  box-shadow: 0 2px 8px rgba(0,0,0,0.1);
  background: white;
  display: block;
  margin: 2rem 0;
  min-height: 400px;
}

.excalidraw-container svg {
  width: 100%;
  height: auto;
  display: block;
  margin: 0;
}

.loading-placeholder {
  padding: 2rem;
  text-align: center;
  color: #666;
}
</style>

<script type="module" src="/assets/js/excalidraw/render-excalidraw.js"></script>

<h2 id="creating-schemas-and-prompts-with-claude-agent">Creating Schemas and Prompts with Claude Agent</h2>

<p>Before building the Temporal workflow, we created the extraction schemas and prompts using the <strong>Claude Agent for DocRouter.AI</strong> (an MCP server at <a href="https://github.com/analytiq-hub/doc-router/tree/main/packages/typescript/mcp"><code class="language-plaintext highlighter-rouge">doc-router/packages/typescript/mcp</code></a>).</p>

<p>The Claude Agent allows Claude Code to create extraction schemas and prompts. For example, you can prompt: <em>“Create a schema for extracting patient information from medical record pages”</em> and it will validate, create, and test the schema automatically.</p>

<p>The diagram below illustrates how DocRouter.AI operations work and how they integrate with Temporal workflows:</p>

<div data-excalidraw="/assets/excalidraw/docrouter_operations.excalidraw" class="excalidraw-container">
  <div class="loading-placeholder">Loading diagram...</div>
</div>
<div style="text-align: center; margin-top: 1rem;">
  <a href="/excalidraw-edit?file=/assets/excalidraw/docrouter_operations.excalidraw" target="_blank" style="color: #2563eb; text-decoration: none; font-weight: 500;">
    📝 Edit in Excalidraw
  </a>
</div>

<p><strong>Key distinction</strong>: DocRouter.AI implements <strong>discrete operations</strong> (single prompt-and-schema processing per document), while Temporal implements the <strong>workflow orchestration</strong> (chunking, grouping, uploading pages for classification, and uploading chunks for extraction).</p>

<p>For this implementation, we created:</p>
<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">medical_page_classifier</code></strong>: Classifies pages as labs, facesheets, insurance cards, clinical notes, or other document types</li>
  <li><strong><code class="language-plaintext highlighter-rouge">insurance_card</code></strong>: Extracts insurance card information from patient pages</li>
</ul>

<h2 id="workflow-implementation">Workflow Implementation</h2>

<p>The main workflow (<code class="language-plaintext highlighter-rouge">ClassifyGroupAndExtractInsuranceWorkflow</code>) orchestrates the entire process. Complete implementation: <a href="https://github.com/analytiq-hub/doc-router-temporal/blob/blog_post_dec_2025/workflows/classify_group_and_extract_insurance.py"><code class="language-plaintext highlighter-rouge">workflows/classify_group_and_extract_insurance.py</code></a>.</p>

<h3 id="step-1-classify-and-group-pages">Step 1: Classify and Group Pages</h3>

<p>The workflow calls <code class="language-plaintext highlighter-rouge">ClassifyAndGroupPDFPagesWorkflow</code> to:</p>
<ol>
  <li><strong>Chunk the PDF</strong> into individual pages</li>
  <li><strong>Classify each page</strong> using DocRouter.AI</li>
  <li><strong>Group pages by patient</strong> using name and DOB matching</li>
</ol>

<p>The grouping logic (<a href="https://github.com/analytiq-hub/doc-router-temporal/blob/blog_post_dec_2025/activities/group_classification_results.py"><code class="language-plaintext highlighter-rouge">activities/group_classification_results.py</code></a>) includes name normalization, DOB parsing, and fuzzy matching with Levenshtein distance to handle typos and variations.</p>

<h3 id="step-2-extract-insurance-information">Step 2: Extract Insurance Information</h3>

<p>For each patient group, the workflow:</p>
<ol>
  <li><strong>Creates patient-specific PDFs</strong> with only that patient’s pages (<a href="https://github.com/analytiq-hub/doc-router-temporal/blob/blog_post_dec_2025/activities/create_and_upload_patient_pdf.py"><code class="language-plaintext highlighter-rouge">activities/create_and_upload_patient_pdf.py</code></a>)</li>
  <li><strong>Uploads them to DocRouter.AI</strong> for insurance card extraction</li>
  <li><strong>Polls for completion</strong> and retrieves results</li>
</ol>

<p>To avoid passing large binary data through Temporal, PDFs are read from disk and uploaded directly to DocRouter.AI.</p>

<h2 id="creating-temporal-workflows-with-cursor">Creating Temporal Workflows with Cursor</h2>

<p>The Temporal workflow was developed in <strong>Cursor</strong> using natural language prompts. Cursor’s AI understood the codebase context and Temporal patterns, enabling rapid development without deep workflow expertise.</p>

<p><strong>Key benefits:</strong></p>
<ul>
  <li>Context awareness across multiple files and existing activities</li>
  <li>Automatic Temporal pattern suggestions (activities, workflows, child workflows)</li>
  <li>Natural language refactoring and error handling implementation</li>
</ul>

<p><strong>Example development prompts:</strong></p>

<p><em>“Create a workflow that processes each patient’s pages into separate PDFs, uploads them with insurance_card tag, waits for completion, then retrieves insurance extraction results.”</em></p>

<p><em>“Add fuzzy name matching to group pages with names differing by up to 2 letters using Levenshtein distance.”</em></p>

<p><em>“Handle edge cases where medical records contain individual patient names vs. multiple patient summaries.”</em></p>

<p>Cursor handled the complex Temporal implementation, error handling, and performance optimizations, resulting in production-ready code in just a few hours.</p>

<h2 id="key-implementation-details">Key Implementation Details</h2>

<h3 id="design-decisions">Design Decisions</h3>

<ul>
  <li><strong>Avoid large data transfer</strong>: PDFs are read from disk and uploaded directly to DocRouter.AI, not passed through Temporal</li>
  <li><strong>Parallel processing</strong>: Multiple patients processed concurrently with status polling</li>
  <li><strong>Error handling</strong>: Retry logic, graceful degradation, and timeout handling</li>
  <li><strong>State management</strong>: Only document IDs and metadata flow through Temporal to keep history efficient</li>
</ul>

<h2 id="results">Results</h2>

<p>The implementation successfully processes massive medical record documents with hundreds of pages, extracting patient names, dates of birth, and medical insurance information. It handles large documents (200+ pages), parallel patient processing, error recovery, and long-running operations.</p>

<h3 id="running-the-workflow">Running the Workflow</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start the Temporal worker</span>
python worker.py

<span class="c"># In another terminal, run the client</span>
python client_classify_group_and_extract_insurance.py &lt;path_to_pdf&gt;
</code></pre></div></div>

<p>See the <a href="https://github.com/analytiq-hub/doc-router-temporal/blob/blog_post_dec_2025/README.md">README</a> and <a href="https://github.com/analytiq-hub/doc-router-temporal/blob/blog_post_dec_2025/client_classify_group_and_extract_insurance.py">client script</a> for details.</p>

<p>The workflow returns JSON with file name, page classifications, schedule pages, and patient data with insurance information.</p>

<h2 id="were-in-a-new-era">We’re in a New Era</h2>

<p><strong>What used to take months of engineering can now be shipped in days.</strong></p>

<p>This Temporal + DocRouter.AI pipeline was built end-to-end using Claude Code-based Agent (for prompts + schemas) + Cursor (for Temporal workflows). I barely knew Temporal before starting — didn’t matter. AI tools let me iterate fast, prototype, and perfect the logic in record time.</p>

<p>The result: reliable, scalable document processing with durable workflows, parallel processing, and rapid schema iteration. The implementation took just 2 days to build and handles 200+ page medical records like a champ.</p>

<p>If you’re building AI-powered document workflows (especially in healthcare), this combo is 🔥.</p>

<p>Code available at <a href="https://github.com/analytiq-hub/doc-router-temporal/tree/blog_post_dec_2025">doc-router-temporal</a>.</p>

<ul>
  <li><a href="https://docs.temporal.io/">Temporal Documentation</a></li>
  <li><a href="https://docrouter.ai/docs/quick-start">DocRouter.AI Documentation</a></li>
  <li><a href="https://docrouter.ai/docs/mcp">DocRouter.AI MCP Server</a></li>
</ul>]]></content><author><name>Andrei Radulescu-Banu</name></author><category term="tech" /><category term="programming" /><category term="ai" /><category term="tutorials" /><summary type="html"><![CDATA[How to build multi-step document processing pipelines with Temporal and DocRouter.AI for handling 200+ page medical records and complex workflows.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/temporal_docrouter_workflows.svg" /><media:content medium="image" url="https://analytiqhub.com/assets/images/temporal_docrouter_workflows.svg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing Analytiq Pages: Jekyll Theme with Tailwind CSS</title><link href="https://analytiqhub.com/webdev/jekyll/tailwind/github-pages/theme/release/announcing-analytiq-pages-theme/" rel="alternate" type="text/html" title="Announcing Analytiq Pages: Jekyll Theme with Tailwind CSS" /><published>2025-11-29T00:00:00+00:00</published><updated>2025-11-29T00:00:00+00:00</updated><id>https://analytiqhub.com/webdev/jekyll/tailwind/github-pages/theme/release/announcing-analytiq-pages-theme</id><content type="html" xml:base="https://analytiqhub.com/webdev/jekyll/tailwind/github-pages/theme/release/announcing-analytiq-pages-theme/"><![CDATA[<p>🎉 <strong>We’re excited to announce the release of Analytiq Pages Theme v0.1.6</strong> - a modern, feature-rich Jekyll theme that transforms our Analytiq Pages approach into a reusable, professional-grade solution for building beautiful company websites.</p>

<h2 id="the-evolution-from-method-to-theme">The Evolution: From Method to Theme</h2>

<p>Analytiq Pages started as a methodology for building company websites using Jekyll, GitHub Pages, and Tailwind CSS. Today, we’re proud to release it as <strong>Analytiq Pages Theme</strong> - a fully packaged Jekyll theme that makes this powerful combination accessible to everyone.</p>

<div class="grid md:grid-cols-2 gap-8 my-8">
  <div class="bg-white rounded-lg shadow-lg p-6 border border-gray-200">
    <div class="w-12 h-12 bg-blue-600 rounded-lg flex items-center justify-center mb-4">
      <svg class="w-6 h-6 text-white" fill="currentColor" viewBox="0 0 24 24">
        <path d="M12 2L2 7l10 5 10-5-10-5zM2 17l10 5 10-5M2 12l10 5 10-5" />
      </svg>
    </div>
    <h3 class="text-xl font-semibold text-gray-900 mb-3">Before: Analytiq Pages</h3>
    <p class="text-gray-600">A methodology requiring manual setup of Jekyll, Tailwind, and custom configurations for each site.</p>
  </div>

  <div class="bg-white rounded-lg shadow-lg p-6 border border-gray-200">
    <div class="w-12 h-12 bg-green-600 rounded-lg flex items-center justify-center mb-4">
      <svg class="w-6 h-6 text-white" fill="currentColor" viewBox="0 0 24 24">
        <path d="M12 2l3.09 6.26L22 9.27l-5 4.87 1.18 6.88L12 17.77l-6.18 3.25L7 14.14 2 9.27l6.91-1.01L12 2z" />
      </svg>
    </div>
    <h3 class="text-xl font-semibold text-gray-900 mb-3">Now: Analytiq Pages Theme</h3>
    <p class="text-gray-600">A complete, ready-to-use Jekyll theme with all features pre-configured and professionally designed.</p>
  </div>
</div>

<h2 id="-whats-new-in-analytiq-pages-theme">✨ What’s New in Analytiq Pages Theme</h2>

<h3 id="-advanced-features">🚀 Advanced Features</h3>

<ul>
  <li><strong>Tailwind CSS Integration</strong>: Modern, responsive design with utility-first styling</li>
  <li><strong>Enhanced Syntax Highlighting</strong>: Beautiful code blocks with copy functionality using highlight.js</li>
  <li><strong>Interactive Diagrams</strong>: Full Excalidraw integration for creating and embedding technical diagrams</li>
  <li><strong>Professional Blog Layouts</strong>: Complete blog system with sidebar, pagination, and category support</li>
  <li><strong>Responsive Navigation</strong>: Mobile-first navigation with dropdown menus and hamburger menu</li>
  <li><strong>Dark Theme Support</strong>: Built-in dark mode (Minima skin) for modern aesthetics</li>
</ul>

<h3 id="-developer-experience">🛠 Developer Experience</h3>

<ul>
  <li><strong>Three Customization Hooks</strong>: Override <code class="language-plaintext highlighter-rouge">custom-head.html</code>, <code class="language-plaintext highlighter-rouge">custom-header.html</code>, and <code class="language-plaintext highlighter-rouge">custom-footer.html</code> for site-specific modifications</li>
  <li><strong>Reusable Components</strong>: Pre-built Tailwind components (alerts, buttons, cards)</li>
  <li><strong>SEO Optimized</strong>: Integrated jekyll-seo-tag for better search engine visibility</li>
  <li><strong>PDF Embedding</strong>: Native support for embedding PDF documents</li>
  <li><strong>RSS Feed Generation</strong>: Automatic blog feed generation with jekyll-feed</li>
</ul>

<h3 id="-content-features">🎨 Content Features</h3>

<ul>
  <li><strong>MathJax Support</strong>: Render mathematical equations in your content</li>
  <li><strong>Multiple Layouts</strong>: Specialized layouts for homepages, blog posts, documentation, and more</li>
  <li><strong>Excalidraw Editor</strong>: Built-in diagram editor accessible at <code class="language-plaintext highlighter-rouge">/excalidraw-edit</code></li>
  <li><strong>Smart Embeds</strong>: Flexible diagram embedding with static, interactive, and link modes</li>
</ul>

<h2 id="installation-get-started-in-minutes">Installation: Get Started in Minutes</h2>

<h3 id="option-1-quick-start-with-existing-site">Option 1: Quick Start with Existing Site</h3>

<p>Add to your Jekyll site’s <code class="language-plaintext highlighter-rouge">Gemfile</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">gem</span> <span class="s2">"analytiq-pages-theme"</span><span class="p">,</span> <span class="ss">git: </span><span class="s2">"https://github.com/analytiq-hub/analytiq-pages-theme"</span>
</code></pre></div></div>

<p>Or for a specific stable version:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">gem</span> <span class="s2">"analytiq-pages-theme"</span><span class="p">,</span> <span class="ss">git: </span><span class="s2">"https://github.com/analytiq-hub/analytiq-pages-theme"</span><span class="p">,</span> <span class="ss">tag: </span><span class="s2">"v0.1.6"</span>
</code></pre></div></div>

<p>Update your <code class="language-plaintext highlighter-rouge">_config.yml</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">theme</span><span class="pi">:</span> <span class="s">analytiq-pages-theme</span>
</code></pre></div></div>

<p>Install and serve:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bundle <span class="nb">install
</span>bundle <span class="nb">exec </span>jekyll serve
</code></pre></div></div>

<h3 id="option-2-new-site-from-scratch">Option 2: New Site from Scratch</h3>

<p>The simplest way to get started:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create new Jekyll site</span>
jekyll new my-company-site
<span class="nb">cd </span>my-company-site

<span class="c"># Replace minima theme with analytiq-pages-theme in Gemfile</span>
<span class="nb">sed</span> <span class="nt">-i</span> <span class="s1">'s/gem "minima".*/gem "analytiq-pages-theme", git: "https:\/\/github.com\/analytiq-hub\/analytiq-pages-theme", tag: "v0.1.6"/'</span> Gemfile

<span class="c"># Configure theme in _config.yml</span>
<span class="nb">sed</span> <span class="nt">-i</span> <span class="s1">'s/^theme: .*/theme: analytiq-pages-theme/'</span> _config.yml

<span class="c"># Install and serve</span>
bundle <span class="nb">install
</span>bundle <span class="nb">exec </span>jekyll serve
</code></pre></div></div>

<p>Or if you prefer manual editing:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create new Jekyll site</span>
jekyll new my-company-site
<span class="nb">cd </span>my-company-site

<span class="c"># Edit Gemfile: replace the minima line with:</span>
<span class="c"># gem "analytiq-pages-theme", git: "https://github.com/analytiq-hub/analytiq-pages-theme", tag: "v0.1.6"</span>

<span class="c"># Edit _config.yml: replace the theme line with:</span>
<span class="c"># theme: analytiq-pages-theme</span>

<span class="c"># Install and serve</span>
bundle <span class="nb">install
</span>bundle <span class="nb">exec </span>jekyll serve
</code></pre></div></div>

<p>Visit <code class="language-plaintext highlighter-rouge">http://localhost:4000</code> to see your new site!</p>

<h3 id="option-3-local-installation-alternative">Option 3: Local Installation (Alternative)</h3>

<p>If you encounter repository access issues, you can install the theme locally:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Download the theme release</span>
curl <span class="nt">-L</span> https://github.com/analytiq-hub/analytiq-pages-theme/archive/refs/tags/v0.1.6.zip <span class="nt">-o</span> theme.zip
unzip theme.zip
<span class="nb">mv </span>analytiq-pages-theme-0.1.6/ _themes/analytiq-pages-theme/

<span class="c"># Or clone locally if you have access</span>
git clone https://github.com/analytiq-hub/analytiq-pages-theme.git _themes/analytiq-pages-theme
<span class="nb">cd </span>_themes/analytiq-pages-theme <span class="o">&amp;&amp;</span> git checkout v0.1.6

<span class="c"># Add to _config.yml</span>
<span class="nb">echo</span> <span class="s2">"theme: _themes/analytiq-pages-theme"</span> <span class="o">&gt;&gt;</span> _config.yml
</code></pre></div></div>

<h2 id="key-improvements-over-manual-setup">Key Improvements Over Manual Setup</h2>

<h3 id="before-analytiq-pages-method">Before (Analytiq Pages Method)</h3>
<ul>
  <li>Manual Tailwind CSS configuration</li>
  <li>Custom Jekyll setup for each project</li>
  <li>Repeated configuration of syntax highlighting</li>
  <li>Manual Excalidraw integration setup</li>
  <li>No standardized component library</li>
</ul>

<h3 id="after-analytiq-pages-theme">After (Analytiq Pages Theme)</h3>
<ul>
  <li><strong>One-line installation</strong>: <code class="language-plaintext highlighter-rouge">theme: analytiq-pages-theme</code></li>
  <li><strong>Pre-configured features</strong>: Everything works out of the box</li>
  <li><strong>Professional components</strong>: Reusable Tailwind components included</li>
  <li><strong>Advanced integrations</strong>: Excalidraw, MathJax, PDF embeds ready to use</li>
  <li><strong>Consistent experience</strong>: Standardized layouts and styling across sites</li>
</ul>

<h2 id="showcase-real-world-examples">Showcase: Real-World Examples</h2>

<p>The theme powers several professional websites:</p>

<ul>
  <li><strong><a href="https://analytiqhub.com">Analytiq Hub</a></strong> - Business intelligence and analytics platform</li>
  <li><strong><a href="https://docrouter.ai">DocRouter.AI</a></strong> - AI-powered document routing solution</li>
  <li><strong><a href="https://sigagent.ai">SigAgent.AI</a></strong> - Signature analysis and automation platform</li>
  <li><strong><a href="https://bitdribble.github.io">Bitdribble</a></strong> - Technology consulting and development</li>
</ul>

<h2 id="migration-guide-upgrading-from-analytiq-pages">Migration Guide: Upgrading from Analytiq Pages</h2>

<p>If you’re currently using the Analytiq Pages methodology, migration is straightforward:</p>

<ol>
  <li><strong>Add the theme</strong> to your Gemfile and <code class="language-plaintext highlighter-rouge">_config.yml</code></li>
  <li><strong>Remove manual Tailwind configuration</strong> (now handled by the theme)</li>
  <li><strong>Update custom includes</strong> to use the new hook system</li>
  <li><strong>Migrate Excalidraw files</strong> to <code class="language-plaintext highlighter-rouge">assets/excalidraw/</code> directory</li>
</ol>

<p>Your existing content and configuration will continue to work seamlessly.</p>

<h2 id="technical-architecture">Technical Architecture</h2>

<p>The theme is built with modern web standards:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>analytiq-pages-theme/
├── _layouts/           # 5 specialized layouts
├── _includes/          # 16+ reusable components
├── assets/
│   ├── css/           # Tailwind + custom styles
│   └── js/            # Pagination, Excalidraw renderer
├── _config.yml        # Default configuration
└── analytiq-pages-theme.gemspec
</code></pre></div></div>

<p><strong>Dependencies:</strong></p>
<ul>
  <li>Jekyll &gt;= 3.9, &lt; 5.0 (supports both GitHub Pages and Jekyll 4.x)</li>
  <li>jekyll-feed ~&gt; 0.12</li>
  <li>jekyll-seo-tag ~&gt; 2.6</li>
  <li>jekyll-pdf-embed ~&gt; 1.1</li>
</ul>

<h2 id="why-choose-analytiq-pages-theme">Why Choose Analytiq Pages Theme?</h2>

<h3 id="for-startups--small-businesses">For Startups &amp; Small Businesses</h3>
<ul>
  <li><strong>Zero hosting costs</strong> with GitHub Pages</li>
  <li><strong>Professional appearance</strong> without design costs</li>
  <li><strong>Content-first approach</strong> with Markdown simplicity</li>
  <li><strong>Scalable foundation</strong> that grows with your business</li>
</ul>

<h3 id="for-agencies--consultants">For Agencies &amp; Consultants</h3>
<ul>
  <li><strong>Rapid deployment</strong> for client websites</li>
  <li><strong>Consistent branding</strong> across projects</li>
  <li><strong>Advanced features</strong> for technical content</li>
  <li><strong>Easy customization</strong> for client-specific needs</li>
</ul>

<h3 id="for-enterprise-teams">For Enterprise Teams</h3>
<ul>
  <li><strong>Git-based workflows</strong> for version control and collaboration</li>
  <li><strong>Security compliance</strong> with GitHub’s enterprise infrastructure</li>
  <li><strong>SEO optimization</strong> built-in</li>
  <li><strong>Extensible architecture</strong> for custom requirements</li>
</ul>

<h2 id="troubleshooting">Troubleshooting</h2>

<h3 id="jekyll-version-compatibility">Jekyll Version Compatibility</h3>

<p>The theme supports both Jekyll 3.9+ (GitHub Pages) and Jekyll 4.x (modern installations). If you encounter version conflicts:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># For GitHub Pages compatibility (Jekyll 3.x)</span>
gem <span class="s2">"github-pages"</span>, group: :jekyll_plugins

<span class="c"># For modern Jekyll 4.x installations</span>
gem <span class="s2">"jekyll"</span>, <span class="s2">"~&gt; 4.3"</span>
</code></pre></div></div>

<p>The theme will work with either version automatically.</p>

<h3 id="bundle-install-issues">Bundle Install Issues</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Clear bundle cache</span>
bundle cache clean <span class="nt">--force</span>

<span class="c"># Clear bundler git cache</span>
<span class="nb">rm</span> <span class="nt">-rf</span> ~/.local/share/gem/ruby/cache/bundler/git/

<span class="c"># Try installing again</span>
bundle <span class="nb">install</span>
</code></pre></div></div>

<h3 id="theme-not-loading">Theme Not Loading</h3>

<ul>
  <li>Verify <code class="language-plaintext highlighter-rouge">_config.yml</code> has <code class="language-plaintext highlighter-rouge">theme: analytiq-pages-theme</code></li>
  <li>Clear Jekyll cache: <code class="language-plaintext highlighter-rouge">rm -rf _site .jekyll-cache</code></li>
  <li>Rebuild: <code class="language-plaintext highlighter-rouge">bundle exec jekyll build</code></li>
</ul>

<h2 id="getting-help--contributing">Getting Help &amp; Contributing</h2>

<ul>
  <li><strong>Documentation</strong>: Comprehensive README at <a href="https://github.com/analytiq-hub/analytiq-pages-theme">analytiq-pages-theme</a></li>
  <li><strong>Issues &amp; Support</strong>: GitHub Issues for bug reports and feature requests</li>
  <li><strong>Contributing</strong>: Pull requests welcome for theme improvements</li>
  <li><strong>Migration Support</strong>: Contact us for help upgrading from manual Analytiq Pages setups</li>
</ul>

<h2 id="how-this-fits-into-your-stack">How This Fits Into Your Stack</h2>

<p>Analytiq Pages Theme transforms our proven methodology into a professional, reusable solution that makes building beautiful company websites accessible to everyone. Whether you’re launching a startup, building client sites, or managing enterprise web presence, this theme delivers the perfect balance of simplicity and power.</p>

<p>Ready to upgrade your web presence? Try Analytiq Pages Theme today!</p>

<hr />

<p><em>This theme powers the very website you’re reading now. Experience the Analytiq Pages Theme in action and see the <a href="https://github.com/analytiq-hub/analytiq-hub.github.io">source code</a> for implementation examples.</em></p>

<p><em>📢 <a href="https://www.linkedin.com/feed/update/urn:li:activity:7367581674697629697/">Join the discussion on LinkedIn</a> about modern Jekyll themes and web development workflows.</em></p>]]></content><author><name>Andrei Radulescu-Banu</name></author><category term="webdev" /><category term="jekyll" /><category term="tailwind" /><category term="github-pages" /><category term="theme" /><category term="release" /><summary type="html"><![CDATA[Announcing Analytiq Pages Theme: a modern open-source Jekyll theme with Tailwind CSS, dropdown navigation, case studies, and GitHub Pages hosting.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://analytiqhub.com/assets/images/announcing_analytiq_pages_theme.png" /><media:content medium="image" url="https://analytiqhub.com/assets/images/announcing_analytiq_pages_theme.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>