Building a Knowledge Graph of Philippine History

The Problem of Fragmented Knowledge

Philippine history exists in fragments. Spanish chroniclers recorded what they observed through colonial lenses. American-era scholars imposed Western frameworks. Post-independence historians have labored to recover indigenous perspectives from oral traditions, archaeological evidence, and the few surviving pre-colonial documents.

The result is a corpus that spans multiple languages (Spanish, English, Filipino, Visayan, Arabic, Chinese), multiple scripts (Latin, Baybayin, Kawi, Jawi), and multiple epistemological frameworks. No single researcher can hold it all.

This is where agents come in.

The Architecture

The Philippine History Knowledge Graph (PHKG) is built by a constellation of specialized agents, each responsible for a domain:

Agent-Maritime    → ships, routes, ports, naval battles
Agent-Legal       → laws, treaties, customary codes, court decisions
Agent-Culinary    → foodways, agriculture, trade goods, recipes
Agent-Linguistic  → languages, scripts, loanwords, translations
Agent-Diplomatic  → alliances, wars, embassies, treaties
Agent-Material    → artifacts, sites, excavations, dating evidence

Each agent ingests source material, extracts entities and relationships, and writes them to a shared graph database. A coordinator agent resolves conflicts between agents, identifies connections across domains, and produces cross-domain syntheses.
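One way to picture the coordinator's job is as two checks over what the agents produce: which topics more than one agent claims, and which extracted statements disagree. The sketch below is a minimal illustration, not the PHKG implementation. The Claim type, the function names, and the placeholder claims are all assumptions made for the example; only the agent names and domains come from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Agent names and domains copied from the list above.
AGENT_DOMAINS = {
    "Agent-Maritime": ["ships", "routes", "ports", "naval battles"],
    "Agent-Legal": ["laws", "treaties", "customary codes", "court decisions"],
    "Agent-Culinary": ["foodways", "agriculture", "trade goods", "recipes"],
    "Agent-Linguistic": ["languages", "scripts", "loanwords", "translations"],
    "Agent-Diplomatic": ["alliances", "wars", "embassies", "treaties"],
    "Agent-Material": ["artifacts", "sites", "excavations", "dating evidence"],
}


@dataclass(frozen=True)
class Claim:
    """Hypothetical record of one statement an agent wants to write."""
    agent: str
    subject: str
    predicate: str
    obj: str
    source: str


def shared_topics(domains):
    """Return topics that appear in more than one agent's domain."""
    owners = defaultdict(list)
    for agent, topics in domains.items():
        for topic in topics:
            owners[topic].append(agent)
    return {t: agents for t, agents in owners.items() if len(agents) > 1}


def find_conflicts(claims):
    """Return (subject, predicate) pairs for which agents assert different objects."""
    values = defaultdict(set)
    for c in claims:
        values[(c.subject, c.predicate)].add(c.obj)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


# Placeholder claims, invented purely to exercise the conflict check.
claims = [
    Claim("Agent-Legal", "treaty-X", "SIGNED_IN", "year-A", "source-1"),
    Claim("Agent-Diplomatic", "treaty-X", "SIGNED_IN", "year-B", "source-2"),
    Claim("Agent-Maritime", "port-Y", "LOCATED_IN", "place-Z", "source-3"),
]

print(shared_topics(AGENT_DOMAINS))
print(find_conflicts(claims))
```

Run against the domain list, the first check surfaces "treaties", which both Agent-Legal and Agent-Diplomatic cover. That overlap is exactly the kind of case a coordinator would need to arbitrate.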

Entity Modeling

The graph uses a property graph model with typed nodes and edges:

Node types:

Person — historical figures with birth/death dates, titles, lineage
Polity — political entities (sultanates, barangays, colonial governments)
Place — geographic locations with temporal boundaries
Event — dated occurrences with participants and outcomes
Document — primary sources with provenance and language
Artifact — material objects with archaeological context

Edge types:

RULED, ALLIED_WITH, FOUGHT — between persons and polities
LOCATED_IN, TRADED_WITH — spatial and economic relationships
AUTHORED, REFERENCED — documentary connections
PRECEDED, CAUSED, INFLUENCED — temporal and causal links
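The node and edge types above can be sketched as a small in-memory property graph. This is an illustration under assumptions: the class names (Node, Edge, PropertyGraph) and the validation rules are hypothetical and not tied to any particular graph database. The example nodes reuse Rajah Tupas and the Sugbu polity, which appear later in this article.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    PERSON = "Person"
    POLITY = "Polity"
    PLACE = "Place"
    EVENT = "Event"
    DOCUMENT = "Document"
    ARTIFACT = "Artifact"


# Edge types taken from the list above.
EDGE_TYPES = frozenset({
    "RULED", "ALLIED_WITH", "FOUGHT",
    "LOCATED_IN", "TRADED_WITH",
    "AUTHORED", "REFERENCED",
    "PRECEDED", "CAUSED", "INFLUENCED",
})


@dataclass
class Node:
    id: str
    type: NodeType
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    rel: str
    dst: str
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Minimal typed graph: rejects unknown edge types and dangling endpoints."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        if edge.rel not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge.rel}")
        if edge.src not in self.nodes or edge.dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(edge)


graph = PropertyGraph()
graph.add_node(Node("rajah-tupas", NodeType.PERSON, {"name": "Rajah Tupas"}))
graph.add_node(Node("sugbu", NodeType.POLITY, {"name": "Sugbu"}))
graph.add_edge(Edge("rajah-tupas", "RULED", "sugbu"))
```

Keeping properties in a free-form dict leaves room for the dates, titles, and provenance fields that the node descriptions mention without fixing a schema too early.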

Agent Design

Each agent follows a consistent pipeline:

Ingest — Read source documents (PDFs, OCR’d manuscripts, structured data)
Extract — Identify entities and relationships using domain-specific prompts
Validate — Cross-reference against known facts and flag contradictions
Commit — Write validated entities to the graph with provenance metadata
Connect — Propose cross-domain links for coordinator review

The agents are not autonomous. They operate under human oversight, with a historian-in-the-loop reviewing flagged items and resolving ambiguities.
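The five stages and the review step can be sketched as plain functions. This is a minimal sketch under loud assumptions: extraction here is a stub that parses lines of the form "subject | predicate | object", standing in for the domain-specific prompts described above; the graph is a plain list; and every function name, the Triple type, and the sample input are hypothetical. The "1566" line is deliberately wrong input, invented to show a contradiction being routed to the historian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: which document the triple came from


def ingest(raw):
    """Ingest: split raw text into non-empty lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def extract(lines, source):
    """Extract: stub parser for 'subject | predicate | object' lines."""
    triples = []
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(Triple(*parts, source=source))
    return triples


def validate(triples, known):
    """Validate: flag triples that contradict a known (subject, predicate) value."""
    accepted, flagged = [], []
    for t in triples:
        expected = known.get((t.subject, t.predicate))
        if expected is not None and expected != t.obj:
            flagged.append(t)
        else:
            accepted.append(t)
    return accepted, flagged


def commit(graph, triples):
    """Commit: write accepted triples; provenance travels with each triple."""
    graph.extend(triples)


def connect(triples, other_domain_entities):
    """Connect: propose entities that another domain already tracks."""
    mentioned = {t.subject for t in triples} | {t.obj for t in triples}
    return sorted(mentioned & other_domain_entities)


def run_pipeline(raw, source, known, graph, review_queue, other_domain_entities):
    triples = extract(ingest(raw), source)
    accepted, flagged = validate(triples, known)
    commit(graph, accepted)
    review_queue.extend(flagged)  # held for the historian-in-the-loop
    return connect(accepted, other_domain_entities)


raw = """
Legazpi arrival in Cebu | DATE | 1566
Rajah Tupas | RULED | Sugbu
this line carries no triple
"""
graph, review_queue = [], []
known = {("Legazpi arrival in Cebu", "DATE"): "1565"}
proposals = run_pipeline(raw, "example-source", known, graph, review_queue, {"Sugbu"})
print(len(graph), len(review_queue), proposals)
```

Flagged triples never reach the graph in this sketch; they wait in the review queue, which mirrors the point that the agents propose and the historian decides.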

Why Agents, Not RAG

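A retrieval-augmented generation (RAG) pipeline retrieves passages and answers one question at a time. The agents instead build a knowledge graph, which structures what the sources say so that it can be queried as a whole. The difference is fundamental: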

RAG says: “According to Blair & Robertson Vol. 2, Legazpi arrived in Cebu in 1565.”

The knowledge graph says: “Legazpi (Person) → ARRIVED_AT → Cebu (Place) → ON → 1565-04-27 (Date) → WHICH_DISPLACED → Rajah Tupas (Person) → WHO_RULED → Sugbu polity (Polity) → WHICH_TRADED_WITH → China, Siam, Borneo (Places).”

The graph encodes relationships. From relationships, you can ask questions that no single document answers.
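A multi-hop query makes this concrete. The sketch below stores the Legazpi example as (source, relation, target) triples and follows a chain of relations from a starting node. It is an illustration only: the triples are a simplified remodeling of the chain quoted above (the arrival date is omitted, and DISPLACED is attached directly to Legazpi), and build_index and follow are hypothetical helper names.

```python
from collections import defaultdict

# Simplified triples based on the Legazpi example in the text.
EDGES = [
    ("Legazpi", "ARRIVED_AT", "Cebu"),
    ("Legazpi", "DISPLACED", "Rajah Tupas"),
    ("Rajah Tupas", "RULED", "Sugbu"),
    ("Sugbu", "TRADED_WITH", "China"),
    ("Sugbu", "TRADED_WITH", "Siam"),
    ("Sugbu", "TRADED_WITH", "Borneo"),
]


def build_index(edges):
    """Map (source, relation) to the set of target nodes."""
    index = defaultdict(set)
    for src, rel, dst in edges:
        index[(src, rel)].add(dst)
    return index


def follow(index, start, *relations):
    """Follow a sequence of relations from start; return the nodes reached."""
    frontier = {start}
    for rel in relations:
        frontier = {
            dst
            for node in frontier
            for dst in index.get((node, rel), ())
        }
    return frontier


index = build_index(EDGES)

# "Who were the trading partners of the polity whose ruler Legazpi displaced?"
partners = follow(index, "Legazpi", "DISPLACED", "RULED", "TRADED_WITH")
print(sorted(partners))  # ['Borneo', 'China', 'Siam']
```

No single triple answers the question; the answer comes from chaining three of them, which is the kind of query a passage-retrieval pipeline is not built to perform.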

The Vision

The goal is not a database. It is a living atlas of Philippine civilization — one that grows as agents process new sources, that reveals connections no single historian has drawn, and that makes the richness of 4,000 years of history navigable by anyone.

This is what Augmented Philippine Intelligence means: not replacing human scholarship but amplifying it with agents that never sleep, never forget, and never lose the thread.

References:
Emma E. Blair and James A. Robertson, The Philippine Islands, 1493–1898, 55 vols. (1903–1909). The foundational English-language primary source collection.
William Henry Scott, Prehispanic Source Materials for the Study of Philippine History (1984).
Laura Lee Junker, Raiding, Trading, and Feasting (1999).
Zhao Rugua, Zhufan Zhi (c. 1225), trans. F. Hirth and W.W. Rockhill (1911).

Technical:
Amit Singhal, “Introducing the Knowledge Graph,” Google Blog (2012).
Ian Robinson, Jim Webber, and Emil Eifrem, Graph Databases (O’Reilly, 2nd ed., 2015).
Patrick Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” NeurIPS 2020.