Design Concepts
The architecture and design principles behind Skillful Alhazen.
Core Philosophy: Curation Over Collection
The system exists to help you make sense of material, not just store it:
- Collection = passively accumulating information
- Curation = actively interrogating, extracting meaning, building structured understanding
Every component serves the curation mission. We embody Alhazen’s philosophy: be an enemy of all you read.
The 6-Phase System Design
Skills are designed around a 6-phase framework that traces the full lifecycle from system purpose to reporting. The curation-skill-builder skill tracks design decisions through all six phases using TypeDB.
Phase 1: GOAL → Define what the system is for and how success is measured
Phase 2: ENTITY-SCHEMA → TypeDB entity/relation/attribute types for the domain
Phase 3: SOURCE-SCHEMA → External data sources and how artifacts are captured
Phase 4: DERIVATION → Ingestion functions (artifact → structured entities)
Phase 5: ANALYSIS → Query and sensemaking functions (entities → insight)
Phase 6: REPORTING → Dashboard views and synthesized outputs
In practice, skill development moves through these phases iteratively. Schema gaps discovered during Phase 4 or 5 drive revisions to Phase 2. See Gap Architecture.
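The iterative movement between phases can be sketched in a few lines of Python. This is only an illustration: the `Phase` enum and the `next_phase` helper are hypothetical names, not part of the curation-skill-builder skill, and the real system tracks design decisions in TypeDB rather than in memory.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six design phases, numbered as in the list above."""
    GOAL = 1
    ENTITY_SCHEMA = 2
    SOURCE_SCHEMA = 3
    DERIVATION = 4
    ANALYSIS = 5
    REPORTING = 6


def next_phase(current: Phase, schema_gap_found: bool = False) -> Phase:
    """Return the phase to work on next (hypothetical helper).

    A schema gap found during DERIVATION or ANALYSIS sends work back to
    ENTITY_SCHEMA; otherwise work advances, stopping at REPORTING.
    """
    if schema_gap_found and current in (Phase.DERIVATION, Phase.ANALYSIS):
        return Phase.ENTITY_SCHEMA
    return Phase(min(current + 1, Phase.REPORTING))
```

For instance, `next_phase(Phase.DERIVATION, schema_gap_found=True)` returns `Phase.ENTITY_SCHEMA`, which mirrors the feedback loop described above.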
The curation workflow (Phases 3–6 in action)
FORAGING → INGESTION → SENSEMAKING → ANALYSIS → REPORTING
| Stage | Activity | Outputs |
|---|---|---|
| FORAGING | Discover sources | URLs, APIs, feeds |
| INGESTION | Capture raw content | Artifacts, provenance, timestamps |
| SENSEMAKING | Extract meaning | Fragments, notes, relations |
| ANALYSIS | Reason over knowledge | Synthesis notes, trends |
| REPORTING | Present actionable insight | Dashboards, answers, recommendations |
The TypeDB Schema Hierarchy
The core schema defines three branches from an abstract root:
identifiable-entity (abstract) — id, name, description, provenance
├── domain-thing — real-world objects
│ ├── scilit-paper (scientific-literature namespace)
│ ├── apt-disease (alg-precision-therapeutics namespace)
│ ├── jobhunt-position (jobhunt namespace)
│ └── ... (one or more types per skill)
├── collection — typed sets of domain objects
│ ├── scilit-corpus
│ ├── jobhunt-search
│ └── ...
└── information-content-entity (abstract) — content-bearing entities
    ├── artifact — raw captured content (PDF, HTML, API response)
    ├── fragment — extracted piece of an artifact
    └── note — agent analysis or annotation
A gene or job posting is not information content. Only artifacts, fragments, and notes carry content (through the content, cache-path, and format attributes). Domain objects are what you reason about; information content entities (ICEs) are what you reason with.
The aboutness relation links notes to any identifiable-entity (the subject role). This is how the agent attaches its analysis to a specific paper, disease, or job posting.
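To make the split between domain objects and information content concrete, here is a minimal Python sketch that mirrors the hierarchy with dataclasses. It is not how the system stores data (that lives in TypeDB); the class names simply echo the schema types, and `notes_about` is a hypothetical helper added for illustration.

```python
from dataclasses import dataclass


@dataclass
class IdentifiableEntity:
    """Abstract root: every entity has an id and a name."""
    id: str
    name: str


@dataclass
class DomainThing(IdentifiableEntity):
    """A real-world object the agent reasons about."""


@dataclass
class ScilitPaper(DomainThing):
    """Example namespace type from scientific-literature."""


@dataclass
class InformationContentEntity(IdentifiableEntity):
    """Content-bearing entity the agent reasons with."""
    content: str = ""


@dataclass
class Note(InformationContentEntity):
    """Agent analysis or annotation."""


@dataclass
class Aboutness:
    """Links a note to its subject, which may be any identifiable entity."""
    note: Note
    subject: IdentifiableEntity


def notes_about(subject: IdentifiableEntity, links: list[Aboutness]) -> list[Note]:
    """Return every note whose aboutness relation targets this subject."""
    return [link.note for link in links if link.subject.id == subject.id]


# Illustrative data only; the ids and text are made up.
paper = ScilitPaper(id="paper-1", name="Example paper")
note = Note(id="note-1", name="Critique", content="The sample size looks small.")
links = [Aboutness(note=note, subject=paper)]
```

Because `Aboutness.subject` is typed as the root class, the same lookup works for a paper, a disease, or a job posting without any per-type code.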
Separation of Concerns
| Layer | What it knows | What it doesn’t know |
|---|---|---|
| Schema | Types, attributes, relations | What data is stored |
| CLI script | How to read/write TypeDB | What the agent is trying to accomplish |
| SKILL.md/USAGE.md | What the agent should do and why | Implementation details |
| Dashboard | How to display data | How it was produced |
This separation means skills can be developed and tested at each layer independently. Schema changes don’t require rewriting the agent instructions; agent instruction updates don’t require schema migrations.
Artifact Cache
Large content artifacts (PDFs, HTML pages, images) are stored on disk rather than inline in TypeDB:
- Content < 50 KB → stored inline in the TypeDB content attribute
- Content ≥ 50 KB → stored in ~/.alhazen/cache/<type>/, referenced via the cache-path attribute
Cache directories: html/, pdf/, image/, json/, text/, github/
The cache is shared across all skills. A PDF ingested by jobhunt (a resume) uses the same pdf/ directory as papers ingested by scientific-literature.
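The size rule above can be expressed as a small decision function. The sketch below is an assumption-laden illustration rather than the project's actual code: `plan_storage` and its `filename` parameter are hypothetical, and it assumes "50 KB" means 50 × 1024 bytes. It only decides where content should go and writes nothing to disk.

```python
from pathlib import Path

INLINE_LIMIT_BYTES = 50 * 1024  # assumption: 50 KB = 50 * 1024 bytes
CACHE_ROOT = Path.home() / ".alhazen" / "cache"
CACHE_TYPES = {"html", "pdf", "image", "json", "text", "github"}


def plan_storage(content: bytes, cache_type: str, filename: str) -> dict:
    """Decide where content should live, without writing anything.

    Returns {"content": ...} for inline storage, or {"cache-path": ...}
    for content at or above the threshold.
    """
    if cache_type not in CACHE_TYPES:
        raise ValueError(f"unknown cache type: {cache_type}")
    if len(content) < INLINE_LIMIT_BYTES:
        return {"content": content}
    return {"cache-path": str(CACHE_ROOT / cache_type / filename)}
```

A short HTML snippet would come back under the `content` key, while a multi-megabyte PDF would come back with a `cache-path` under the shared pdf/ directory.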
Gap-Driven Schema Evolution
When the agent tries to represent a concept that has no place in the current schema, it encounters a schema gap. These are not failures — they are signals:
A schema gap means the knowledge work has outgrown the model.
The skilllog system detects gaps automatically via a PostToolUse hook that scans for TypeDB error codes ([SYR1], [TYR01], [FEX1], etc.). Gaps are filed as structured GitHub issues, fixed locally against the running TypeDB instance, and merged via PR with human review.
This gives the knowledge graph a mechanism for organic growth driven by actual use, not by top-down schema planning. See Gap Architecture for the full workflow.
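The detection step can be pictured as a scan over tool output for bracketed error codes. The following sketch is not the skilllog hook itself. It assumes codes consist of three capital letters followed by digits, inferred from the examples [SYR1], [TYR01], and [FEX1], and `find_gap_codes` is a hypothetical function name.

```python
import re

# Assumption: codes are three capital letters plus digits in square
# brackets, matching the examples [SYR1], [TYR01], [FEX1].
ERROR_CODE = re.compile(r"\[([A-Z]{3}\d+)\]")


def find_gap_codes(tool_output: str) -> list[str]:
    """Return the distinct error codes in tool output, in first-seen order."""
    seen: dict[str, None] = {}
    for code in ERROR_CODE.findall(tool_output):
        seen.setdefault(code, None)
    return list(seen)
```

A hook built along these lines would file an issue only when the returned list is non-empty, so ordinary successful tool calls pass through untouched.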
TypeDB vs. Other Databases
TypeDB is the ontological foundation — not just a store. Key properties:
- Schema-first: types, relations, and constraints are defined before data is inserted. TypeDB will reject writes that violate the schema, making gaps detectable at insertion time.
- Pattern matching: queries express structural patterns across the graph, not just key lookups. The agent can ask “find all diseases whose mechanisms involve gene X and are associated with HPO phenotype Y” in a single TypeQL query.
- Cross-skill queries: because all skills share the same database and schema hierarchy, the agent can reason across skill boundaries. A job posting that mentions a disease can be connected to the agent’s disease mechanism research.
- Ontological hierarchy: the identifiable-entity → domain-thing → namespace type hierarchy means generic operations (tagging, noting, collecting) work uniformly across all domain types.
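The schema-first property is easiest to see in a toy model. The sketch below is plain Python, not TypeDB, and every name in it is an assumption for illustration: the `doi` attribute is hypothetical, and `validate_insert` only imitates the idea that a write using an undefined attribute is rejected at insertion time, with attributes inherited down the type hierarchy.

```python
# Toy model, not TypeDB: parent links and attribute ownership per type.
PARENT = {
    "identifiable-entity": None,
    "domain-thing": "identifiable-entity",
    "scilit-paper": "domain-thing",
}
OWNS = {
    "identifiable-entity": {"id", "name", "description", "provenance"},
    "scilit-paper": {"doi"},  # hypothetical attribute for illustration
}


class SchemaViolation(Exception):
    """Raised when a write uses a type or attribute the schema lacks."""


def allowed_attributes(type_name: str) -> set[str]:
    """Collect attributes owned by a type and all of its ancestors."""
    if type_name not in PARENT:
        raise SchemaViolation(f"unknown type: {type_name}")
    attrs: set[str] = set()
    current = type_name
    while current is not None:
        attrs |= OWNS.get(current, set())
        current = PARENT[current]
    return attrs


def validate_insert(type_name: str, attributes: dict) -> None:
    """Reject the write if any attribute is not defined for the type."""
    unknown = set(attributes) - allowed_attributes(type_name)
    if unknown:
        raise SchemaViolation(f"{type_name} does not own: {sorted(unknown)}")
```

In this model, inserting a scilit-paper with an `id` and a `doi` succeeds, while adding an undeclared attribute raises `SchemaViolation`, which is the kind of signal a gap detector can pick up.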