CocoIndex

CocoIndex is an incremental data framework that keeps the context behind AI agents continuously fresh by reprocessing only what changed. Open source, Rust core, Python on top. Apache 2.0.

The problem it solves: batch pipelines go stale. You re-index everything on a schedule, the agent reads outdated data between runs, and the bill scales with the whole corpus instead of the delta. CocoIndex flips that. Change a source byte and only the affected rows propagate, in under a second.

The mental model

Target = F(Source). You declare the desired target state; the engine keeps it in sync, React-style. You don't build the DAG by hand; CocoIndex derives the processing graph from your code.

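import cocoindex as coco

# RecursiveSplitter and embed() are assumed to be imported or defined elsewhere.
@coco.fn(memo=True)  # memo=True turns on memoization for this function
async def index_file(file, table):
    # Split the file into chunks and declare one target row per chunk.
    for chunk in RecursiveSplitter().split(await file.read_text()):
        table.declare_row(text=chunk.text, embedding=embed(chunk.text))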
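
To make the Target = F(Source) idea concrete, here is a minimal sketch in plain Python. It does not use the CocoIndex API: desired_target, reconcile, and Delta are hypothetical names chosen for illustration, and the one-row-per-line transform is an assumption. The sketch recomputes the desired state from the source, then diffs it against the current target to find the inserts, updates, and deletes.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """Changes needed to move the current target to the desired state."""
    inserts: dict[str, str] = field(default_factory=dict)
    updates: dict[str, str] = field(default_factory=dict)
    deletes: set[str] = field(default_factory=set)


def desired_target(source: dict[str, str]) -> dict[str, str]:
    """F(Source): a pure function from source files to target rows.

    Hypothetical transform: one row per non-empty line, keyed by path and line index.
    """
    rows: dict[str, str] = {}
    for path, text in source.items():
        for i, line in enumerate(text.splitlines()):
            if line.strip():
                rows[f"{path}#{i}"] = line.strip()
    return rows


def reconcile(current: dict[str, str], desired: dict[str, str]) -> Delta:
    """Diff the current target against the desired one."""
    delta = Delta()
    for key, value in desired.items():
        if key not in current:
            delta.inserts[key] = value
        elif current[key] != value:
            delta.updates[key] = value
    delta.deletes = set(current) - set(desired)
    return delta


# First run: the target is empty, so every row is an insert.
target = desired_target({"a.md": "alpha\nbeta", "b.md": "gamma"})

# Second run: one line of a.md changed and b.md was removed.
delta = reconcile(target, desired_target({"a.md": "alpha\nBETA"}))
print(delta)
```

Only the changed row and the removed file show up in the delta; the untouched row a.md#0 is left alone. A real engine would also avoid recomputing the full desired state, which is where memoization comes in.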

Why it matters

  • Delta-only processing. Only the Δ is recomputed on every change. Up to 10x cheaper at scale.
  • Sub-second freshness. Changes reach the target store almost immediately.
  • End-to-end lineage. Every output traces back to its source byte; you can audit and invalidate precisely.
  • Code-aware caching. Memoization invalidates only the transformations whose code or inputs actually changed.
  • Production reliability from the Rust core: parallel chunking, retries, exponential backoff, dead-letter queues.
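
The reliability pattern named in the last bullet (retries, exponential backoff, dead-letter queue) can be sketched in a few lines of Python. This is an illustration of the general technique, not CocoIndex's Rust implementation; process_with_retries and its parameters are hypothetical.

```python
import time
from typing import Callable


def process_with_retries(
    items: list[str],
    fn: Callable[[str], str],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[str], list[tuple[str, str]]]:
    """Run fn on each item, retrying failures with exponential backoff.

    Items that still fail after max_attempts go to a dead-letter list
    instead of aborting the whole run.
    """
    results: list[str] = []
    dead_letters: list[tuple[str, str]] = []
    for item in items:
        for attempt in range(max_attempts):
            try:
                results.append(fn(item))
                break
            except Exception as exc:
                if attempt == max_attempts - 1:
                    dead_letters.append((item, repr(exc)))
                else:
                    # Delay doubles on each retry: base, 2*base, 4*base, ...
                    sleep(base_delay * 2 ** attempt)
    return results, dead_letters


attempts: dict[str, int] = {}


def flaky(item: str) -> str:
    """Demo transform: 'flaky' fails once, 'bad' always fails."""
    attempts[item] = attempts.get(item, 0) + 1
    if item == "bad":
        raise ValueError("always fails")
    if item == "flaky" and attempts[item] < 2:
        raise RuntimeError("transient")
    return item.upper()


delays: list[float] = []  # record delays instead of actually sleeping
results, dead = process_with_retries(
    ["ok", "flaky", "bad"], flaky, sleep=delays.append
)
print(results, dead, delays)
```

The transient failure recovers on its second attempt, while the permanently failing item lands in the dead-letter list with its error, so one bad input does not block the rest.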

What it connects

  • Sources: codebases, meeting notes, APIs, filesystems, databases, message queues, images/video, transcripts.
  • Targets: relational databases, data warehouses, vector databases (Qdrant, LanceDB), graph databases, message queues, feature stores.

Common builds: code embedding, PDF indexing, knowledge-graph extraction from meeting notes, Kafka streaming, real-time codebase indexing. The code-intelligence offering ships as a separate product, CocoIndexCode.

Core concepts

  • App — the top-level executable. It reads sources, transforms data, and declares target states to sync.
  • Processing component — a logical unit that groups one item's processing with its target state. Each runs independently and commits its changes atomically; it doesn't wait for the whole app to finish.
  • Target state: TargetState = Transform(SourceState). A pure function of the source, synced to a database, vector store, or filesystem.
  • Functions / transforms — discrete operations (PDF to markdown, chunking, embedding). Memoized, so unchanged inputs and unchanged code skip recomputation.

Memoization works at two levels: skip an entire processing component when its inputs and logic are unchanged, and skip individual transforms when intermediate results still match. The engine detects what changed and applies only the needed inserts, updates, and deletes inside atomic transactions. No manual delta computation, no hand-rolled state tracking.
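
The two levels can be illustrated with a small, self-contained sketch. It is not how CocoIndex implements memoization: the memo decorator, the in-memory cache, and the bytecode fingerprint are all assumptions made for this example. The cache key combines a hash of the function's compiled code with its arguments, so a change to either one causes a miss.

```python
import hashlib
from typing import Callable

_cache: dict[str, object] = {}
ran: list[tuple[str, tuple]] = []  # log of calls that actually executed


def memo(fn: Callable) -> Callable:
    """Memoize fn on a fingerprint of its bytecode plus its arguments."""
    code = fn.__code__
    logic = hashlib.sha256(
        code.co_code + repr(code.co_consts).encode()
    ).hexdigest()

    def wrapper(*args):
        key = hashlib.sha256(
            f"{fn.__name__}:{logic}:{args!r}".encode()
        ).hexdigest()
        if key not in _cache:
            ran.append((fn.__name__, args))
            _cache[key] = fn(*args)
        return _cache[key]

    return wrapper


@memo
def embed(chunk: str) -> int:
    # Transform level: stand-in for an expensive embedding call.
    return len(chunk)


@memo
def index_file(text: str) -> list[int]:
    # Component level: skipped entirely when the text is unchanged.
    return [embed(chunk) for chunk in text.split()]


index_file("a b")
first_run = list(ran)   # component ran, both chunks embedded

ran.clear()
index_file("a b")
second_run = list(ran)  # nothing ran: component-level hit

ran.clear()
index_file("a c")
third_run = list(ran)   # component reran, but only the new chunk "c" was embedded
print(first_run, second_run, third_run)
```

One simplification to note: this sketch fingerprints each function in isolation, so editing embed would not invalidate cached index_file results. A real engine has to track such dependencies between transforms.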

Getting started

pip install -U cocoindex   # also: uv add cocoindex, or Poetry ^1.0

Requires Python 3.11–3.13 on macOS, Linux, or Windows 10+. Key CLI verbs:

  • cocoindex init — scaffold a project (main.py, pyproject.toml, README).
  • cocoindex ls — list apps and their persisted status.
  • cocoindex show — inspect an app's stable paths and components.
  • cocoindex update — run in catch-up mode; add --live to keep components processing as sources change. Also --reset, --full-reprocess, --preview.
  • cocoindex drop — revert all target states and clear internal state.

About Sébastien

I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.

I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.
