Building a Knowledge Graph of Philippine History
The Problem of Fragmented Knowledge
Philippine history exists in fragments. Spanish chroniclers recorded what they observed through colonial lenses. American-era scholars imposed Western frameworks. Post-independence historians have labored to recover indigenous perspectives from oral traditions, archaeological evidence, and the few surviving pre-colonial documents.
The result is a corpus that spans multiple languages (Spanish, English, Filipino, Visayan, Arabic, Chinese), multiple scripts (Latin, Baybayin, Kawi, Jawi), and multiple epistemological frameworks. No single researcher can hold it all.
This is where agents come in.
The Architecture
The Philippine History Knowledge Graph (PHKG) is built by a constellation of specialized agents, each responsible for a domain:
Agent-Maritime → ships, routes, ports, naval battles
Agent-Legal → laws, treaties, customary codes, court decisions
Agent-Culinary → foodways, agriculture, trade goods, recipes
Agent-Linguistic → languages, scripts, loanwords, translations
Agent-Diplomatic → alliances, wars, embassies, treaties
Agent-Material → artifacts, sites, excavations, dating evidence
Each agent ingests source material, extracts entities and relationships, and writes them to a shared graph database. A coordinator agent resolves conflicts between the agents' claims, identifies connections across domains, and produces cross-domain syntheses.
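One way to picture the coordinator's job is the minimal sketch below. It is an illustration under stated assumptions, not the PHKG implementation: the `Claim` record, the `find_conflicts` and `cross_domain_entities` helpers, the `ARRIVED_IN_CEBU_ON` predicate, and the `source-A`/`source-B`/`source-C` labels are all hypothetical placeholders. The sketch assumes that each agent submits claims as subject/predicate/object records and that only single-valued predicates can conflict.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    agent: str       # domain agent that produced the claim
    subject: str
    predicate: str
    obj: str
    source: str      # placeholder provenance label, not a real citation


def find_conflicts(claims, single_valued):
    """Return {(subject, predicate): [claims]} where agents disagree.

    Only predicates listed in `single_valued` are checked, since an edge
    such as TRADED_WITH can legitimately have many objects.
    """
    grouped = defaultdict(list)
    for claim in claims:
        if claim.predicate in single_valued:
            grouped[(claim.subject, claim.predicate)].append(claim)
    return {
        key: group
        for key, group in grouped.items()
        if len({c.obj for c in group}) > 1
    }


def cross_domain_entities(claims):
    """Return {entity: [agents]} for entities mentioned by more than one agent."""
    seen = defaultdict(set)
    for claim in claims:
        seen[claim.subject].add(claim.agent)
        seen[claim.obj].add(claim.agent)
    return {e: sorted(agents) for e, agents in seen.items() if len(agents) > 1}


# Two agents record the same arrival at different date granularity.
claims = [
    Claim("Agent-Maritime", "Legazpi", "ARRIVED_IN_CEBU_ON", "1565-04-27", "source-A"),
    Claim("Agent-Diplomatic", "Legazpi", "ARRIVED_IN_CEBU_ON", "1565", "source-B"),
    Claim("Agent-Diplomatic", "Rajah Tupas", "RULED", "Sugbu polity", "source-B"),
    Claim("Agent-Culinary", "Sugbu polity", "TRADED_WITH", "China", "source-C"),
]

conflicts = find_conflicts(claims, single_valued={"ARRIVED_IN_CEBU_ON"})
shared = cross_domain_entities(claims)
print(sorted(conflicts))  # keys the coordinator must reconcile
print(shared)             # entities that bridge two or more domains
```

In this sketch the date disagreement is only surfaced, not settled. Deciding which granularity to keep is the kind of judgment that would go to a human reviewer.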
Entity Modeling
The graph uses a property graph model with typed nodes and edges:
Node types:
- Person: historical figures with birth/death dates, titles, lineage
- Polity: political entities (sultanates, barangays, colonial governments)
- Place: geographic locations with temporal boundaries
- Event: dated occurrences with participants and outcomes
- Document: primary sources with provenance and language
- Artifact: material objects with archaeological context
Edge types:
- RULED, ALLIED_WITH, FOUGHT: between persons and polities
- LOCATED_IN, TRADED_WITH: spatial and economic relationships
- AUTHORED, REFERENCED: documentary connections
- PRECEDED, CAUSED, INFLUENCED: temporal and causal links
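The node and edge types above could be expressed in code roughly as follows. This is a sketch using an in-memory store, not a real graph database. The `PropertyGraph` class, its method names, and the node ids are assumptions made for illustration; only the type names come from the lists above.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    PERSON = "Person"
    POLITY = "Polity"
    PLACE = "Place"
    EVENT = "Event"
    DOCUMENT = "Document"
    ARTIFACT = "Artifact"


class EdgeType(Enum):
    RULED = "RULED"
    ALLIED_WITH = "ALLIED_WITH"
    FOUGHT = "FOUGHT"
    LOCATED_IN = "LOCATED_IN"
    TRADED_WITH = "TRADED_WITH"
    AUTHORED = "AUTHORED"
    REFERENCED = "REFERENCED"
    PRECEDED = "PRECEDED"
    CAUSED = "CAUSED"
    INFLUENCED = "INFLUENCED"


@dataclass
class Node:
    id: str
    type: NodeType
    props: dict = field(default_factory=dict)  # free-form properties


@dataclass
class Edge:
    src: str
    type: EdgeType
    dst: str
    props: dict = field(default_factory=dict)  # e.g. provenance metadata


class PropertyGraph:
    """Hypothetical in-memory stand-in for the shared graph database."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        if node.id in self.nodes:
            raise ValueError(f"duplicate node id: {node.id}")
        self.nodes[node.id] = node

    def add_edge(self, edge):
        # Reject edges whose endpoints have not been added as nodes.
        for end in (edge.src, edge.dst):
            if end not in self.nodes:
                raise KeyError(f"unknown node: {end}")
        self.edges.append(edge)

    def neighbors(self, node_id, edge_type):
        """Targets reachable from node_id over edges of the given type."""
        return [e.dst for e in self.edges
                if e.src == node_id and e.type is edge_type]


g = PropertyGraph()
g.add_node(Node("tupas", NodeType.PERSON, {"name": "Rajah Tupas"}))
g.add_node(Node("sugbu", NodeType.POLITY, {"name": "Sugbu polity"}))
g.add_node(Node("cebu", NodeType.PLACE, {"name": "Cebu"}))
g.add_edge(Edge("tupas", EdgeType.RULED, "sugbu"))
g.add_edge(Edge("sugbu", EdgeType.LOCATED_IN, "cebu"))
print(g.neighbors("tupas", EdgeType.RULED))
```

Using enumerations for both type sets means a misspelled type fails immediately instead of silently creating a new category. A fuller schema might also restrict which node types each edge type may connect.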
Agent Design
Each agent follows a consistent pipeline:
- Ingest — Read source documents (PDFs, OCR’d manuscripts, structured data)
- Extract — Identify entities and relationships using domain-specific prompts
- Validate — Cross-reference against known facts and flag contradictions
- Commit — Write validated entities to the graph with provenance metadata
- Connect — Propose cross-domain links for coordinator review
The agents are not autonomous. They operate under human oversight, with a historian-in-the-loop reviewing flagged items and resolving ambiguities.
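The five stages and the review step can be sketched as plain functions. Everything named here is hypothetical. In particular, `extract` stands in for a domain-specific prompt by parsing simple `subject | predicate | object` statements, and the document id `doc-1` is a placeholder. Treat it as an outline of the control flow, assuming that contradictions are detected against a dictionary of known facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: id of the document the triple came from


def ingest(raw_docs):
    """Ingest: drop empty documents; real ingestion would parse PDFs or OCR."""
    return {name: text.strip() for name, text in raw_docs.items() if text.strip()}


def extract(name, text):
    """Extract: placeholder for a prompt; parses 'S | P | O' statements."""
    triples = []
    for statement in text.split(";"):
        parts = [p.strip() for p in statement.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(Triple(parts[0], parts[1], parts[2], source=name))
    return triples


def validate(triples, known_facts):
    """Validate: split triples into (valid, flagged) against known facts."""
    valid, flagged = [], []
    for t in triples:
        expected = known_facts.get((t.subject, t.predicate))
        if expected is not None and expected != t.obj:
            flagged.append(t)  # contradiction: held back for the historian
        else:
            valid.append(t)
    return valid, flagged


def commit(graph, triples):
    """Commit: append validated triples, which carry their provenance."""
    graph.extend(triples)


def connect(triples, other_domain_entities):
    """Connect: entities that also appear in other agents' domains."""
    mentioned = {t.subject for t in triples} | {t.obj for t in triples}
    return sorted(mentioned & other_domain_entities)


def run_agent(raw_docs, known_facts, other_domain_entities, graph):
    review_queue = []
    for name, text in ingest(raw_docs).items():
        valid, flagged = validate(extract(name, text), known_facts)
        commit(graph, valid)
        review_queue.extend(flagged)
    return review_queue, connect(graph, other_domain_entities)


graph = []
queue, proposals = run_agent(
    raw_docs={"doc-1": "Legazpi | ARRIVED_AT | Cebu; Rajah Tupas | RULED | Cebu"},
    known_facts={("Rajah Tupas", "RULED"): "Sugbu polity"},
    other_domain_entities={"Cebu", "China"},
    graph=graph,
)
print(graph, queue, proposals, sep="\n")
```

Here the second statement names the place rather than the polity, so it is flagged instead of committed. That ambiguity lands in the historian's review queue rather than in the graph.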
Why Agents, Not RAG
A retrieval-augmented generation (RAG) pipeline answers questions. A knowledge graph structures understanding. The difference is fundamental:
RAG says: “According to Blair & Robertson Vol. 2, Legazpi arrived in Cebu in 1565.”
The knowledge graph says: “Legazpi (Person) → ARRIVED_AT → Cebu (Place) → ON → 1565-04-27 (Date) → WHICH_DISPLACED → Rajah Tupas (Person) → WHO_RULED → Sugbu polity (Polity) → WHICH_TRADED_WITH → China, Siam, Borneo (Places).”
The graph encodes relationships. From relationships, you can ask questions that no single document answers.
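A multi-hop question makes this concrete: which places did the polity trade with whose ruler Legazpi displaced? The sketch below flattens the chain from the example into subject/predicate/object triples. That flattening, the simplified predicate names, and the `follow` helper are assumptions made for illustration.

```python
# Edges taken from the Legazpi example, flattened into triples.
edges = [
    ("Legazpi", "ARRIVED_AT", "Cebu"),
    ("Legazpi", "DISPLACED", "Rajah Tupas"),
    ("Rajah Tupas", "RULED", "Sugbu polity"),
    ("Sugbu polity", "TRADED_WITH", "China"),
    ("Sugbu polity", "TRADED_WITH", "Siam"),
    ("Sugbu polity", "TRADED_WITH", "Borneo"),
]


def follow(edges, start, path):
    """Walk the graph from `start`, taking one hop per predicate in `path`."""
    frontier = {start}
    for predicate in path:
        frontier = {o for (s, p, o) in edges if s in frontier and p == predicate}
    return frontier


answer = follow(edges, "Legazpi", ["DISPLACED", "RULED", "TRADED_WITH"])
print(sorted(answer))  # ['Borneo', 'China', 'Siam']
```

Each hop may come from a different source document, which is why the answer exists only once the relationships sit in one graph.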
The Vision
The goal is not a database. It is a living atlas of Philippine civilization — one that grows as agents process new sources, that reveals connections no single historian has drawn, and that makes the richness of 4,000 years of history navigable by anyone.
This is what Augmented Philippine Intelligence means: not replacing human scholarship but amplifying it with agents that never sleep, never forget, and never lose the thread.
References:
- Emma E. Blair and James A. Robertson, The Philippine Islands, 1493–1898, 55 vols. (1903–1909). The foundational English-language primary source collection.
- William Henry Scott, Prehispanic Source Materials for the Study of Philippine History (1984).
- Laura Lee Junker, Raiding, Trading, and Feasting (1999).
- Zhao Rugua, Zhufan Zhi (c. 1225), trans. F. Hirth and W.W. Rockhill (1911).
Technical:
- Amit Singhal, “Introducing the Knowledge Graph,” Google Blog (2012).
- Ian Robinson, Jim Webber, and Emil Eifrem, Graph Databases (O’Reilly, 2nd ed., 2015).
- Patrick Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” NeurIPS 2020.