Scientific Literature Skill
Multi-source scientific literature search, ingestion, and semantic analysis.
Purpose
The scientific-literature skill lets the agent search across four major biomedical literature sources, ingest papers into the knowledge graph, and perform embedding-based semantic search and thematic clustering. It’s the foundation for building research corpora and tracing hypothesis evolution (see Skills: Literature Trends).
Sources
| Source | Type | Coverage |
|---|---|---|
| Europe PMC | Full-text search | Life sciences; open-access full text available |
| PubMed (NCBI) | Full-text search | Biomedical; authoritative MEDLINE index |
| OpenAlex | Full-text search | Cross-disciplinary; open access; includes abstracts reconstructed from inverted index |
| bioRxiv / medRxiv | Preprints | Biology and medicine preprints; real-time feed |
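As a rough illustration of what "abstracts reconstructed from inverted index" means for the OpenAlex row: OpenAlex exposes abstracts as a mapping from each word to the positions where it occurs (the `abstract_inverted_index` field), so plain text can be recovered by sorting words by position. The function name `reconstruct_abstract` and the sample data are hypothetical; this is a minimal sketch, not necessarily how the skill implements it.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from an OpenAlex-style inverted index.

    The index maps each word to the list of positions at which it occurs.
    Returns an empty string when no index is available.
    """
    if not inverted_index:
        return ""
    # Flatten to (position, word) pairs, then order by position.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    positioned.sort()
    return " ".join(word for _, word in positioned)


# Made-up sample; a word that occurs twice carries two positions.
example = {
    "base": [1],
    "CRISPR": [0],
    "editing": [2, 5],
    "enables": [3],
    "precise": [4],
}
print(reconstruct_abstract(example))
# CRISPR base editing enables precise editing
```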
Prerequisites
- TypeDB running (`make db-start`)
- `uv` installed
Optional (for semantic commands):
- Qdrant running (`make qdrant-start`)
- `VOYAGE_API_KEY` environment variable set (from dash.voyageai.com)
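A quick preflight check for the local prerequisites could look like the sketch below. It only verifies that `uv` is on the `PATH` and, for the semantic commands, that `VOYAGE_API_KEY` is set; it does not probe TypeDB or Qdrant, since this page does not state their hosts or ports. The helper name `missing_prerequisites` and its parameters are hypothetical and not part of the skill.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which, semantic=False):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `which` are injectable so the check can be exercised
    without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    if which("uv") is None:
        problems.append("uv not found on PATH")
    # The API key is only needed for the optional semantic commands.
    if semantic and not env.get("VOYAGE_API_KEY"):
        problems.append("VOYAGE_API_KEY is not set")
    return problems
```

Calling `missing_prerequisites(semantic=True)` before running `embed`, `search-semantic`, or `cluster` gives a list that can be printed or turned into an early exit.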
Commands
uv run python .claude/skills/scientific-literature/scientific_literature.py <command> [args]
| Command | What it does |
|---|---|
| `count` | Count papers matching a query (without ingesting) |
| `search` | Search and display results from one or more sources |
| `ingest` | Search and ingest matching papers into TypeDB |
| `list` | List papers already ingested into TypeDB |
| `embed` | Generate Voyage AI embeddings for ingested papers |
| `search-semantic` | Find papers by semantic similarity to a query |
| `cluster` | Cluster ingested papers by embedding similarity (HDBSCAN + UMAP) |
| `export-corpus` | Export a collection of papers to JSON or CSV |
Typical Workflow
Build a research corpus:
You: Search PubMed for papers about CRISPR base editing published since 2022
You: How many results are there?
You: Ingest the top 50 papers into my knowledge graph
You: Now embed those papers
You: Cluster them thematically and summarize what you find
Find semantically related papers:
You: I have a paper about prime editing using PE3max.
Find other papers in my corpus that are semantically similar.
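To make "semantically similar" concrete, here is a toy sketch of ranking papers by cosine similarity between embedding vectors. In the skill itself the embeddings come from Voyage AI and the lookup is handled by Qdrant; the similarity metric used there is not stated on this page, so cosine similarity is an assumption, and the three-dimensional vectors and paper IDs are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, paper_vecs, top_k=3):
    """Return the top_k (paper_id, score) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vec, vec), paper_id)
        for paper_id, vec in paper_vecs.items()
    ]
    scored.sort(reverse=True)
    return [(paper_id, score) for score, paper_id in scored[:top_k]]


# Toy embeddings; real ones have hundreds or thousands of dimensions.
toy = {
    "paper-a": [1.0, 0.0, 0.0],
    "paper-b": [0.9, 0.1, 0.0],
    "paper-c": [0.0, 1.0, 0.0],
}
ranked = rank_by_similarity([1.0, 0.0, 0.0], toy, top_k=2)
print([paper_id for paper_id, _ in ranked])
# ['paper-a', 'paper-b']
```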
Schema
Papers are stored as scilit-paper entities (a domain-thing subtype) with attributes:
- `id` (key), `title`, `abstract`, `doi`, `pmid`, `year`, `journal`
- `content` (full text, when available)
- `cache-path` (local file reference for large content)
Papers can be tagged, added to collections (scilit-corpus), and annotated with notes.
Notes
- Deduplication: Papers are deduplicated by DOI and PMID across sources.
- Semantic search requires Qdrant: Start with `make qdrant-start` before embedding.
- OpenAlex abstracts: OpenAlex returns abstracts as an inverted index; the skill reconstructs them as plain text automatically.
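The deduplication note can be illustrated with a small sketch: when the same paper arrives from several sources, a record is dropped if its DOI or its PMID has already been seen. The normalization rules, the dictionary layout, the function names, and the sample DOIs below are all assumptions made for illustration; the skill's actual logic may differ.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common URL or prefix forms; None if empty."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def deduplicate(papers):
    """Keep the first record seen for each DOI or PMID."""
    seen_dois, seen_pmids = set(), set()
    unique = []
    for paper in papers:
        doi = normalize_doi(paper.get("doi"))
        pmid = str(paper["pmid"]).strip() if paper.get("pmid") else None
        # A match on either identifier marks the record as a duplicate.
        if (doi and doi in seen_dois) or (pmid and pmid in seen_pmids):
            continue
        if doi:
            seen_dois.add(doi)
        if pmid:
            seen_pmids.add(pmid)
        unique.append(paper)
    return unique


# Made-up records: the same two papers reported by different sources.
records = [
    {"title": "A", "doi": "10.1000/xyz123", "pmid": "111", "source": "pubmed"},
    {"title": "A", "doi": "https://doi.org/10.1000/XYZ123", "pmid": None, "source": "openalex"},
    {"title": "B", "doi": None, "pmid": "222", "source": "pubmed"},
    {"title": "B", "doi": "10.1000/abc", "pmid": 222, "source": "europepmc"},
]
print([p["source"] for p in deduplicate(records)])
# ['pubmed', 'pubmed']
```

One limitation of this first-seen approach is that identifiers carried only by a dropped duplicate are discarded; merging fields from duplicates into the kept record would be a natural refinement.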