Methodology
Approach
The problem
Every organization drowns in unstructured knowledge. Research papers, technical reports, patents, market analyses, clinical studies — the volume grows exponentially while the capacity to read and synthesize it stays flat.
Traditional approaches fall short:
- Manual curation doesn’t scale. A domain expert can carefully read perhaps 3–5 papers a day.
- Keyword search finds documents but doesn’t understand them.
- RAG (Retrieval-Augmented Generation) retrieves relevant chunks and generates fluent text, but it doesn’t comprehend — it has no persistent model of what it’s learned.
Agentic curation
Agentic curation is a different paradigm. Instead of retrieving and regurgitating, AI agents read, extract, structure, and reason — building a persistent, queryable knowledge base that grows with every document processed.
RAG retrieves. Agentic curation comprehends.
The key difference: after processing 500 papers on a rare disease, a RAG system gives you a chatbot. An agentic curation system gives you a structured knowledge graph — every gene, pathway, symptom, and treatment linked with provenance to the source text.
The 5-step pipeline
1. Foraging
AI agents systematically search and collect relevant documents from diverse sources — academic databases, patent offices, preprint servers, government reports, web sources. The agent applies inclusion/exclusion criteria, deduplicates, and builds a working corpus.
2. Ingestion
Raw documents are parsed, normalized, and prepared for analysis. PDFs become structured text. Metadata is extracted and standardized. Each document gets a persistent identity in the knowledge graph.
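One way to give each document a persistent identity is a content hash, sketched below. The `parse` and `extract_metadata` callables are stand-ins for a real PDF pipeline; the `doc:` prefix and 16-character truncation are illustrative choices, not part of any fixed scheme.

```python
import hashlib

def document_id(raw_bytes: bytes) -> str:
    # A content hash gives each document a stable, reproducible identity:
    # the same bytes always map to the same node in the knowledge graph.
    return "doc:" + hashlib.sha256(raw_bytes).hexdigest()[:16]

def ingest(raw_bytes: bytes, parse, extract_metadata):
    # parse: raw bytes -> structured text; extract_metadata: text -> dict.
    text = parse(raw_bytes)
    meta = extract_metadata(text)
    return {"id": document_id(raw_bytes), "text": text, "metadata": meta}
```

Content addressing also makes re-ingestion idempotent: processing the same PDF twice resolves to the same graph node.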
3. Sensemaking
This is the core step. Agents read each document and extract structured facts according to a domain-specific schema — entities, relationships, measurements, claims, evidence. Extracted facts are mapped to typed entities in a TypeDB knowledge graph.
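The shape of an extracted fact can be sketched with plain dataclasses. In the real pipeline these would map onto a TypeDB schema; here `Entity`, `Claim`, and the example gene/disease names are purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    type: str      # e.g. "gene", "disease" — a type from the domain schema
    name: str

@dataclass(frozen=True)
class Claim:
    subject: Entity
    relation: str  # e.g. "associated-with"
    object: Entity
    source_id: str # provenance: which document asserted this fact

def add_claim(graph: list, subject, relation, obj, source_id):
    # Every extracted fact carries its source, so nothing enters the
    # graph without provenance.
    claim = Claim(subject, relation, obj, source_id)
    graph.append(claim)
    return claim
```

The key design point is that extraction is schema-driven: agents fill typed slots rather than emitting free text, which is what makes the later analysis step possible.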
4. Analysis
With structured knowledge in place, agents can now reason: identify gaps, find contradictions, surface patterns, compare across sources. This isn’t keyword matching — it’s graph queries and inference over real typed relationships.
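As one concrete example of reasoning over typed relationships, contradiction detection reduces to a simple query: find subject–object pairs asserted with mutually exclusive relations. This is a toy in-memory version of what would be a graph query in practice; the relation names are hypothetical.

```python
from collections import defaultdict

def find_contradictions(claims, opposites):
    # claims: (subject, relation, object, source) tuples.
    # opposites: pairs of mutually exclusive relations,
    # e.g. ("increases", "decreases").
    by_pair = defaultdict(set)
    for subj, rel, obj, _src in claims:
        by_pair[(subj, obj)].add(rel)
    hits = []
    for (subj, obj), rels in by_pair.items():
        for a, b in opposites:
            if a in rels and b in rels:
                hits.append((subj, obj, a, b))
    return hits
```

Gap analysis works the same way in reverse: entity pairs the schema says should be related, but for which no claim exists in any source.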
5. Reporting
Curated knowledge is synthesized into deliverables: evidence summaries, gap analyses, competitive landscapes, structured datasets, or API endpoints. The knowledge graph persists for ongoing queries.
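A basic building block for evidence summaries is tallying how many distinct sources support each fact, so a report can rank well-supported claims above single-source ones. A minimal sketch, with hypothetical claim tuples:

```python
def evidence_summary(claims):
    # Tally distinct supporting sources per (subject, relation, object) fact.
    sources = {}
    for subj, rel, obj, src in claims:
        sources.setdefault((subj, rel, obj), set()).add(src)
    return {fact: len(srcs) for fact, srcs in sources.items()}
```

Because the graph persists, the same tally can be re-run as new documents arrive, keeping deliverables current without re-reading the corpus.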
Knowledge architecture
The knowledge graph distinguishes two fundamental layers:
Domain-things — the real-world entities being studied:
- Genes, proteins, diseases, drugs, pathways (biomedical)
- Companies, products, markets, technologies (competitive intelligence)
- Papers, authors, institutions, findings (scientific literature)
Information-content entities — the documents and claims that describe domain-things:
- A paper reports a finding about a gene
- A patent claims a method involving a compound
- A review summarizes evidence about a treatment
This separation means you can always trace what is known back to who said it and where.
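The two-layer separation can be made concrete with a small sketch: domain-things and information-content entities are distinct types, and only assertions connect them. The class and field names here are illustrative, as is the BRCA1 example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainThing:
    kind: str   # e.g. "gene", "company"
    name: str

@dataclass(frozen=True)
class InformationContent:
    kind: str   # e.g. "paper", "patent"
    doc_id: str

@dataclass(frozen=True)
class Assertion:
    # An information-content entity makes a statement about a domain-thing.
    source: InformationContent
    about: DomainThing
    statement: str

def provenance(assertions, thing):
    # Trace what is known about a thing back to who said it and where.
    return [a.source.doc_id for a in assertions if a.about == thing]
```

Because facts only ever attach to domain-things via assertions, every answer the graph gives is traceable to specific documents by construction.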
Domain examples
Agentic curation applies wherever there’s complex, evolving knowledge that needs systematic treatment:
- Rare disease research — building a comprehensive evidence base from scattered case reports and small studies
- Job market intelligence — structuring skills, roles, and requirements from thousands of job postings
- Technology landscape mapping — tracking emerging technologies across patents, papers, and startup activity
- Scientific literature review — systematic evidence synthesis at a scale no manual review can match