Methodology
Approach
The problem
Every organization drowns in unstructured knowledge. Research papers, technical reports, patents, market analyses, clinical studies — the volume grows exponentially while the capacity to read and synthesize it stays flat.
Traditional approaches fall short:
- Manual curation doesn’t scale. A domain expert can carefully read perhaps 3–5 papers a day.
- Keyword search finds documents but doesn’t understand them.
- RAG (Retrieval-Augmented Generation) retrieves relevant chunks and generates fluent text, but it doesn’t comprehend — it has no persistent model of what it’s learned.
Agentic curation
Agentic curation is a different paradigm. Instead of retrieving and regurgitating, AI agents read, extract, structure, and reason — building a persistent, queryable knowledge base that grows with every document processed.
RAG retrieves. Agentic curation comprehends.
The key difference: after processing 500 papers on a rare disease, a RAG system gives you a chatbot. An agentic curation system gives you a structured knowledge graph — every gene, pathway, symptom, and treatment linked with provenance to the source text.
The 5-step pipeline
1. Foraging
AI agents systematically search and collect relevant documents from diverse sources — academic databases, patent offices, preprint servers, government reports, web sources. The agent applies inclusion/exclusion criteria, deduplicates, and builds a working corpus.
2. Ingestion
Raw documents are parsed, normalized, and prepared for analysis. PDFs become structured text. Metadata is extracted and standardized. Each document gets a persistent identity in the knowledge graph.
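One way to give each document a persistent identity is a content hash, sketched below. The `parse` and `extract_metadata` callables are stand-ins for a real PDF pipeline; the `doc:` prefix and 16-character truncation are illustrative choices, not part of any fixed scheme.

```python
import hashlib

def document_id(raw_bytes: bytes) -> str:
    # A content hash gives each document a stable, reproducible identity:
    # the same bytes always map to the same node in the knowledge graph.
    return "doc:" + hashlib.sha256(raw_bytes).hexdigest()[:16]

def ingest(raw_bytes: bytes, parse, extract_metadata):
    # parse: raw bytes -> structured text; extract_metadata: text -> dict.
    text = parse(raw_bytes)
    meta = extract_metadata(text)
    return {"id": document_id(raw_bytes), "text": text, "metadata": meta}
```

Content addressing also makes re-ingestion idempotent: processing the same PDF twice resolves to the same graph node.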
3. Sensemaking
This is the core step. Agents read each document and extract structured facts according to a domain-specific schema — entities, relationships, measurements, claims, evidence. Extracted facts are mapped to typed entities in a TypeDB knowledge graph.
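The shape of an extracted fact can be sketched with plain dataclasses. In the real pipeline these would map onto a TypeDB schema; here `Entity`, `Claim`, and the example gene/disease names are purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    type: str      # e.g. "gene", "disease" — a type from the domain schema
    name: str

@dataclass(frozen=True)
class Claim:
    subject: Entity
    relation: str  # e.g. "associated-with"
    object: Entity
    source_id: str # provenance: which document asserted this fact

def add_claim(graph: list, subject, relation, obj, source_id):
    # Every extracted fact carries its source, so nothing enters the
    # graph without provenance.
    claim = Claim(subject, relation, obj, source_id)
    graph.append(claim)
    return claim
```

The key design point is that extraction is schema-driven: agents fill typed slots rather than emitting free text, which is what makes the later analysis step possible.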
4. Analysis
With structured knowledge in place, agents can now reason: identify gaps, find contradictions, surface patterns, compare across sources. This isn’t keyword matching — it’s graph queries and inference over real typed relationships.
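As one concrete example of reasoning over typed relationships, contradiction detection reduces to a simple query: find subject–object pairs asserted with mutually exclusive relations. This is a toy in-memory version of what would be a graph query in practice; the relation names are hypothetical.

```python
from collections import defaultdict

def find_contradictions(claims, opposites):
    # claims: (subject, relation, object, source) tuples.
    # opposites: pairs of mutually exclusive relations,
    # e.g. ("increases", "decreases").
    by_pair = defaultdict(set)
    for subj, rel, obj, _src in claims:
        by_pair[(subj, obj)].add(rel)
    hits = []
    for (subj, obj), rels in by_pair.items():
        for a, b in opposites:
            if a in rels and b in rels:
                hits.append((subj, obj, a, b))
    return hits
```

Gap analysis works the same way in reverse: entity pairs the schema says should be related, but for which no claim exists in any source.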
5. Reporting
Curated knowledge is synthesized into deliverables: evidence summaries, gap analyses, competitive landscapes, structured datasets, or API endpoints. The knowledge graph persists for ongoing queries.
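A basic building block for evidence summaries is tallying how many distinct sources support each fact, so a report can rank well-supported claims above single-source ones. A minimal sketch, with hypothetical claim tuples:

```python
def evidence_summary(claims):
    # Tally distinct supporting sources per (subject, relation, object) fact.
    sources = {}
    for subj, rel, obj, src in claims:
        sources.setdefault((subj, rel, obj), set()).add(src)
    return {fact: len(srcs) for fact, srcs in sources.items()}
```

Because the graph persists, the same tally can be re-run as new documents arrive, keeping deliverables current without re-reading the corpus.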
Knowledge architecture
The knowledge graph distinguishes two fundamental layers:
Domain-things — the real-world entities being studied:
- Genes, proteins, diseases, drugs, pathways (biomedical)
- Companies, products, markets, technologies (competitive intelligence)
- Papers, authors, institutions, findings (scientific literature)
Information-content entities — the documents and claims that describe domain-things:
- A paper reports a finding about a gene
- A patent claims a method involving a compound
- A review summarizes evidence about a treatment
This separation means you can always trace what is known back to who said it and where.
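The two-layer separation can be made concrete with a small sketch: domain-things and information-content entities are distinct types, and only assertions connect them. The class and field names here are illustrative, as is the BRCA1 example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainThing:
    kind: str   # e.g. "gene", "company"
    name: str

@dataclass(frozen=True)
class InformationContent:
    kind: str   # e.g. "paper", "patent"
    doc_id: str

@dataclass(frozen=True)
class Assertion:
    # An information-content entity makes a statement about a domain-thing.
    source: InformationContent
    about: DomainThing
    statement: str

def provenance(assertions, thing):
    # Trace what is known about a thing back to who said it and where.
    return [a.source.doc_id for a in assertions if a.about == thing]
```

Because facts only ever attach to domain-things via assertions, every answer the graph gives is traceable to specific documents by construction.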
Domain examples
Agentic curation applies wherever there’s complex, evolving knowledge that needs systematic treatment:
- Rare disease research — building a comprehensive evidence base from scattered case reports and small studies
- Job market intelligence — structuring skills, roles, and requirements from thousands of job postings
- Technology landscape mapping — tracking emerging technologies across patents, papers, and startup activity
- Scientific literature review — systematic evidence synthesis at a scale no manual review can match