Knowledge-Based Agents, Revisited: A Spatial Ontology for Explainable Public Health Analytics

“The map is not the territory.” — Alfred Korzybski, Science and
Sanity (1933)

PART I — THE
PROBLEM, AND A PREVIEW OF THE ANSWER

1. Impossible Questions

It is mid-April in Kigali — the second week of the long rains, a cool
drizzle on the zinc roofs and the motos idling in the traffic below. A
district health officer sits at her desk facing a quarterly deadline.
She has data: spreadsheets of clinic visits, vaccination records and
supply-chain logs. She wants to ask a question that should, in
principle, be easy to answer: “Which of my 416 sectors have the
highest population but the lowest immunization coverage, and how does
that compare to the provincial average?”

She cannot begin to answer it. The question is answerable: part of the
data is already in her possession and the rest is publicly available.
But to answer it she would have to splice five systems whose
vocabularies have never met: her DHIS2 facility reports (keyed by
organization-unit ID), a WorldPop population grid (raster cells on a
lat/lon grid), administrative boundary shapefiles (keyed by p-code), WHO
indicator definitions (keyed by GHO code), and a geocoding service to
thread them all together. Given the time pressure and the work
involved, it might as well be impossible. No Rosetta Stone exists, so
the question goes unasked, and the sectors with the sharpest need remain
invisible.

We built this kind of Rosetta Stone.

[Figure 1 panels: Rwanda night-light intensity at province level (5
polygons) and at sector level (402 polygons).]

Figure 1. Rwanda night-light intensity (VIIRS 2020), rendered at
province level (5 polygons) and sector level (402 polygons) as
Tufte-style small multiples. Interactive dashboard.

At province level, both maps collapse into binaries, and the two
binaries refuse to align. Look at the population panel: three provinces
sit in the darkest band, Eastern, Southern, and Kigali, each holding
close to three million people, their totals within five percent of one
another; Northern and Western sit in the lightest band, both under two
million. The three middle bands of the color scale stand empty. Now look
at the night-lights panel beside it: one province sits in the darkest
band, Kigali, and the other four sit in the lightest, with mean
intensities clustered between 0.19 and 0.26, roughly a twentieth of
Kigali’s 4.39. The brightest province on the lights map is only third on
the population map. The dimmest, Eastern, is the most populous of all.
Light is not a proxy for population at this lens; the demographic
structure that does exist (three near-equal centers of population, two
thinner provinces) disappears inside both renderings.

At sector level, that single bright cell resolves into a tiered
pattern: four sectors at
the brightest band (Muhima, Nyarugenge, Gitega, and Kimihurura) burning
at the city’s commercial and administrative core; Remera a band below;
Kacyiru, Gikondo, and Kicukiro a band below that; and roughly eight more
flickering at the lowest still-visible level before the rest of the
province falls dark. The pattern that emerges from this gradation is the
load-bearing one. Kinyinya holds 93,000 residents, placing it in the
country’s top sector population band, yet sits in the bottom luminosity
band at the same scale, registering a VIIRS mean of 6.6 against a
national sector ceiling of 37. So do several of its neighbors.
Conditional on being inside Kigali Province, light and population are
anticorrelated: the densest sectors are among the dimmest. The province
scale conceals both the structure of the bright core and the urban
poverty pressed up against it; the comparable populations across
provinces are equally invisible at that lens. Same underlying
distribution; a different lens, a different map; a different map, a
different recommendation. This is a structural property of spatial data
called the Modifiable Areal Unit Problem (Openshaw
1984), and understanding it is what separates maps that guide policy
from maps that mislead it. We will return to this in Section 5.

Why the Rosetta Stone had to exist at all. A
vaccination campaign requires administrative sectors. The funding report
due Friday is keyed to national borders. The malaria control plan slices
the country by ecological zones (wetlands, highlands, savanna), none of
which the political map names. Each map earns its keep in proportion to
how closely it matches the question being asked, and the officer at her
desk is routinely asked half a dozen such questions before the morning
meeting. What she needs is a system that can hold every one of those
maps at once and let her switch between them without losing her grip on
her own data.

What provides this is an ontology. It extends the reach of any single
map in two directions. First, it lets you switch between spatial
abstractions (province, district, sector, leaf) on demand, within a
single system. Second, and more radically, it opens a second, stranger
dimension: a conceptual space where polygons and time periods fall into
groups by any relationship that binds them.

“Show me maternal mortality rates in countries that have ratified the
Maputo Protocol on women’s rights” is a treaty-membership query joined
to a health indicator and projected onto polygons. “Compare vaccination
coverage in PEPFAR-supported countries versus Global Fund-supported
countries” is an organizational-affiliation query joined to surveillance
data, rendered as side-by-side choropleths. Each question selects
polygons through a conceptual lens (treaties signed, programs joined),
and the same underlying geography serves them all.

Most polygons in the graph carry Wikidata identifiers: all the well-known
ones do, and the ones that don’t have parents in the hierarchy that do.
Rwanda (Q1037) has 30
districts and 416 sectors, encoded through a consensus process that
thousands of contributors maintain. The European Union (Q458) has
27 member states resolvable through a member_of
(P463) edge that updates as membership changes. The Holy Roman
Empire’s Imperial Circles sit alongside modern nation-states in the same
graph, each a different lens on the same land.

Floridi (2008) calls this
the method of levels of abstraction: different typed views of
the same system yielding different but complementary accounts. A
knowledge-based agent walking that ontology in response to a
plain-English question is, in effect, a conversational doorway into the
spatial graph — community-maintained, multilingual, queryable in any
direction. The officer in Kigali who cannot pose her question today will
pose it tomorrow in the language she thinks in, and have an answer back
before her morning meeting.

This post is about how we built that interface. It picks up where Building
baby-NICER (the agent architecture) and Agents
for Data Warehousing (the three-layer warehouse) left off; those
posts described the agents and the infrastructure. This one reveals the
substrate they reason over. It begins with the scale of the
problem that substrate was meant to solve.

2. The Last-Mile Problem: Why Scale Matters for Public Health

National averages launder the inequalities that matter most. Rwanda’s
2017 Malaria
Indicator Survey reported a national prevalence of 7.4% in children
under 5, a number that occludes sector-level variation an order of
magnitude wide, from very low rates in the high-altitude northwest to
many times higher in the eastern lowlands near Akagera (Semakula et al.
2020). Gavi’s “zero-dose” mapping initiative exists precisely
because national immunization rates belie pockets of zero coverage.
Twenty-nine percent of sub-Saharan Africa’s population lives more than
two hours from a hospital (Ouma et al.
2018), a number that is invisible in country-level statistics and
only surfaces when you geocode every facility and compute travel time on
a 1 km grid. A global version of the same exercise found that while 91%
of the world can reach a clinic within an hour by motorized transport,
only 57% can do so on foot (Weiss et al. 2020).
The gap is the inequality, and it is a spatial one.

The tangle of coding systems makes cross-system analysis nearly
impossible even when the data exists. A single district might be
identified by its ISO 3166-2 code in one dataset, its OCHA p-code in
another, its DHIS2 organization-unit ID in a third, and its GADM GID in
a fourth. When boundaries are redrawn (the Democratic Republic of the
Congo moved from 11 to 26 provinces in 2015; South Sudan was created in
2011), time series break along the new lines. A malaria incidence
series kept against the DRC’s eleven pre-2015 provinces does not extend
cleanly onto the twenty-six that replaced them: the provinces are not
the same provinces, and the numbers they hold cannot simply be
reassigned. The data that existed yesterday still exists, but it no
longer speaks to the data that will be collected tomorrow. Without a
principled way to bridge these vocabularies and track their evolution
over time, longitudinal analysis becomes a recurring archaeological
dig.

The cost of this fragmentation manifests in mortality data. Burstein et al.
(2019), mapping 123 million neonatal, infant, and child deaths
across 99 low- and middle-income countries, estimated that 58% of
under-five deaths between 2000 and 2017 could have been averted if
mortality rates within each country had matched that country’s
best-performing district. Geographic inequality is the dominant
share of the burden. Optimizing supply chains, deploying community
health workers, and targeting behavior-change campaigns all require data
below the national scale. The Sustainable Development Goal equity agenda
— “leave no one behind” — is fundamentally a spatial problem. A Rosetta
Stone that lets a district health officer ask her question in plain
English, and get an answer grounded in the right geography at the right
scale, is an instrument of equity.

PART II — THE
SOLUTION: A SPATIAL KNOWLEDGE GRAPH

3. Architecture of the Geospatial Knowledge Base

The geospatial_knowledge_base is a BigQuery dataset of 55 tables (at
the time of writing), assembled from 28 public sources. Its contents are
better compared to a library’s holdings than to a spreadsheet: about 1.3
million canonical regions, 655,000 Wikidata links, 78 million
pre-computed raster statistics, and 1.3 billion subnational statistical
observations spanning the late eighteenth century through projections to
2100. For scale, a modern Microsoft Excel worksheet tops out at
1,048,576 rows and 16,384 columns (up to column XFD). The counts,
however, tell you only the size of the collection. The shape of the
collection is what does the work, and the shape has four parts.

3a. The Canonical Region Hierarchy

Every territory in the system — country, province, district, sector,
neighborhood, census block group — is a row in a
canonical_regions table, with a stable identifier and a
geometry. Parent-child edges live in a companion table,
canonical_decomposition_edges, which makes the hierarchy
traversable in both directions: a sector rolls up to a district, a
district to a province; a country decomposes into provinces, provinces
into districts.

The key architectural insight: the same territory can be
decomposed multiple ways — administrative, historical,
linguistic, ecological — via different set_name values on
the same edges table. The structure is a lattice: the same sector can
sit under multiple parents, depending on which decomposition you ask
about. Modern administrative subdivisions from Overture Maps sit
alongside the Holy Roman Empire’s 1555 Imperial Circles, the
Glottography atlas of European language areas, and ArchaeoGLOBE’s
prehistoric land-use zones. All are first-class decompositions of the
same underlying regions.
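
The lattice idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the production schema: only the table name canonical_decomposition_edges and the set_name column come from the text above; the tuple layout, the region names, and the helper functions are made up for the example.

```python
from collections import defaultdict

# Toy stand-in for canonical_decomposition_edges: (parent, child, set_name).
# Region names and set_name labels are illustrative, not real rows.
EDGES = [
    ("country", "province_a", "administrative"),
    ("country", "province_b", "administrative"),
    ("province_a", "sector_1", "administrative"),
    ("province_b", "sector_2", "administrative"),
    ("country", "highlands", "ecological"),
    ("country", "wetlands", "ecological"),
    ("highlands", "sector_1", "ecological"),
    ("highlands", "sector_2", "ecological"),
]


def build_index(edges):
    """Index the edges in both directions, keyed by (region, set_name)."""
    down, up = defaultdict(list), defaultdict(list)
    for parent, child, set_name in edges:
        down[(parent, set_name)].append(child)
        up[(child, set_name)].append(parent)
    return down, up


def children(down, region, set_name):
    """Direct children of a region within one decomposition."""
    return list(down.get((region, set_name), []))


def ancestors(up, region, set_name):
    """Walk upward within one decomposition until no parent remains."""
    chain, frontier = [], [region]
    while frontier:
        node = frontier.pop()
        for parent in up.get((node, set_name), []):
            chain.append(parent)
            frontier.append(parent)
    return chain


DOWN, UP = build_index(EDGES)
print(ancestors(UP, "sector_1", "administrative"))  # rolls up via province_a
print(ancestors(UP, "sector_1", "ecological"))      # rolls up via highlands
```

The same sector has a different parent depending on which set_name is asked for, which is what makes the structure a lattice rather than a single tree.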

Sources are deliberately diverse — Overture Maps (administrative
depths 2–9), US Census TIGER, Wikidata P150
(contains administrative subdivisions), ArchaeoGLOBE, HistoGIS’s HRE
1555, the Glottography project, and geoBoundaries — because each source
has its blind spots, and cross-validation is the only way to catch
them.

3b. The Wikidata Bridge

Every canonical region that has a resolvable counterpart in Wikidata
— 655,000 of them so far — is joined by a QID. Rwanda is Q1037; Kigali is Q3859; the European Union is
Q458; France is
Q142. Through these identifiers the graph inherits the full
relational web that Wikidata’s
community maintains: treaty memberships, heads of government,
official languages, bordering countries, currencies, legal systems. A
query for “OPEC member states” is a two-hop traversal (P463 → Q7795); the membership
updates as soon as Wikidata’s editors record that a state has joined or left.
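
As a hedged illustration of what such a membership lookup can look like, the sketch below builds a SPARQL query string of the kind the public Wikidata Query Service accepts. The function name members_of_query is hypothetical, and the query is deliberately minimal: it uses the wdt: ("truthy") form of P463, so it does not filter on start or end dates of membership.

```python
def members_of_query(org_qid: str, language: str = "en") -> str:
    """Build a SPARQL query for entities that are 'member of' (P463) org_qid.

    Hypothetical helper for illustration; it only assembles the query text
    and does not contact any endpoint.
    """
    if not (org_qid.startswith("Q") and org_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata QID: {org_qid!r}")
    return (
        "SELECT ?member ?memberLabel WHERE {\n"
        f"  ?member wdt:P463 wd:{org_qid} .\n"
        "  SERVICE wikibase:label { "
        f'bd:serviceParam wikibase:language "{language}". }}\n'
        "}"
    )


# Q7795 is the OPEC identifier cited in the text above.
print(members_of_query("Q7795"))
```

Swapping in Q458 would ask the same question of the European Union; the shape of the query does not change.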

The Habermas
Machine post argued that meaningful deliberation requires shared
evidence. Long before Habermas, the Tswana of southern Africa gathered
in the kgotla,
a community assembly where any adult could speak and where a chief ruled
only with the people’s assent — or, in their own proverb: kgosi ke
kgosi ka batho, “a king is king by the grace of the people.” The
Wikidata bridge does the same work in infrastructure: it provides a
common conceptual vocabulary that every party to a debate can inspect
and, if they disagree, contest through a community process.

And it gives the graph a particular cast. A node is constituted by
its edges, the way Georg Simmel
argued a person is constituted by the intersecting circles she belongs
to. Rwanda (Q1037) is Rwanda by virtue of the treaties it has signed,
the languages it recognizes, the borders it shares with its neighbors;
change the edges and you change what the node means. The bridge is, in
effect, Simmel’s web of group affiliations rendered in
infrastructure, and what the kgotla already knew — that identity is
relational and accountability is public — becomes an architectural
property of the system rather than a norm laid on top of it.

This matters for public health: when a ministry, a donor, and an
implementing NGO need to agree on “which districts are eligible for this
program,” the bridge gives them a shared place to look.

3c. Raster Integration (Google Earth Engine)

Not all spatial data lives in polygons. Population sits on a grid
(WorldPop, GHSL); so do temperature (NOAA), air quality (interpolated
from the OpenAQ ground stations), and land cover (Overture, Hansen).
Rasters, in short — those regular arrays of cells that overlay a country
at any zoom level. A raster_catalog registers each of these
substrates; a region_raster_stats table stores pre-computed
zonal statistics — sums, means, areas — per region, per substrate, per
time slice (78 million rows). A finer-grained table,
region_cell_linkage (465 million rows), maps individual
raster cells to the region they fall in.
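
To make the relationship between the two tables concrete, here is a minimal sketch of how a zonal sum could be derived from a cell-to-region linkage. The row layout, the coverage_fraction field, and all values are assumptions for illustration; the real tables may be organized differently.

```python
from collections import defaultdict

# Hypothetical rows in the spirit of region_cell_linkage:
# (region_id, cell_id, coverage_fraction), where coverage_fraction is the
# share of the cell's area that falls inside the region.
LINKAGE = [
    ("sector_a", "c1", 1.0),
    ("sector_a", "c2", 0.25),
    ("sector_b", "c2", 0.75),
    ("sector_b", "c3", 1.0),
]

# Made-up raster values for one substrate and one time slice (people per cell).
CELL_VALUES = {"c1": 100.0, "c2": 40.0, "c3": 60.0}


def zonal_sums(linkage, cell_values):
    """Sum cell values per region, weighting each cell by its coverage fraction."""
    totals = defaultdict(float)
    for region_id, cell_id, fraction in linkage:
        totals[region_id] += cell_values[cell_id] * fraction
    return dict(totals)


SUMS = zonal_sums(LINKAGE, CELL_VALUES)
print(SUMS)
```

Because cell c2 is split 25/75 between the two sectors, the sector totals still add up to the raster total; nothing is counted twice and nothing is dropped at the shared boundary.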

Rollups are handled by our system code, which walks
canonical_decomposition_edges with a 95% coverage guard and
area-weighted formulas. The reducer matters: district population is the
sum of sector populations; district tree cover is the
area-weighted mean of sector tree cover. Using a mean for
population, or a sum for percentages, produces silent nonsense. Ad-hoc
analyses often get this wrong. The data product gets it right by
construction: we think the reducer through carefully once, and every
client can rely on the result thereafter.
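
A small worked example shows how far the wrong reducer can drift. The numbers and function names below are invented for illustration and are not taken from the knowledge base.

```python
def rollup_sum(children):
    """Additive quantities (population counts): parent = sum of children."""
    return sum(value for value, _area in children)


def rollup_area_weighted_mean(children):
    """Intensive quantities (percent tree cover): weight each child by area."""
    total_area = sum(area for _value, area in children)
    return sum(value * area for value, area in children) / total_area


# Two made-up sectors, each given as (value, area_km2).
population = [(50_000, 10.0), (10_000, 90.0)]
tree_cover_pct = [(10.0, 10.0), (50.0, 90.0)]

district_population = rollup_sum(population)
district_tree_cover = rollup_area_weighted_mean(tree_cover_pct)

# The tempting but wrong reducer: a plain mean that ignores sector area.
naive_tree_cover = sum(v for v, _a in tree_cover_pct) / len(tree_cover_pct)

print(district_population, district_tree_cover, naive_tree_cover)
```

Here the area-weighted mean is 46% while the unweighted mean is 30%: the small, sparsely forested sector drags the naive figure down by sixteen points even though it covers a tenth of the district.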

3d. Temporal and Statistical Fabric

A temporal_catalog registers 1.96 million
indicator-source combinations: Eurostat, WHO GHO, national censuses,
OECD, World Bank WDI. The catalog carries event streams alongside the
indicator panels. GDELT’s fifteen-minute global-news feed joins the
same fabric as the OECD’s
annual tables, arriving as timestamped, geolocated rows that meet the
canonical hierarchy through the graph’s crosswalks. The
global_subnational_statistics table stores the observations
themselves: 1.3 billion rows, spanning from 1789 historical census
records through projections out to 2100.

Time, like space, is typed in Wikidata. Every year and every month
exists as its own QID, alongside
many specific days, with the calendar relationships and
surrounding-event links the community has accreted. The relational mode
of reasoning, traversals over a graph of typed entities and the edges
that bind them, is one form of inference among several, and it is the
form we teach our agents, uniformly across both axes. SPARQL is the
query language for it, and it handles space and time with the same
primitives. The traversal that walks regional containment (P150) also walks
temporal containment, so “sovereign countries on 9 July 2011”
(the day South Sudan declared independence) and “months in
2020” answer in the same shape that “EU member states”
answers. The temporal layer of geospatial_knowledge_base
joins observation period strings to those Wikidata time entities the
same way the spatial layer joins regions; a Kimball-style
dim_time helper denormalizes calendar arithmetic for fast
aggregation, but the typing itself is inherited, not invented. The frame
problem discussed by Russell & Norvig in Chapter 10 of Artificial Intelligence: A
Modern Approach, how an agent keeps track of what is true at
what time, is sidestepped here by handing the agent one typed graph that
spans both axes and teaching it to reason relationally over that
graph.

Bridging vocabularies. Crosswalks relate FIPS to ISO, NUTS version
2013 to 2016 to 2021, OCHA p-codes to national census codes, DHIS2
organization-unit IDs to ISO 3166-2. Every major coding system is
preserved as its own column, and the Wikidata QID serves as the Rosetta
Stone among them.
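
One way to picture the crosswalk is as a row that keeps every coding system in its own field, with the QID as the shared key. The sketch below is illustrative only: the class, the field names, and every identifier value are placeholders, not real codes or the actual table layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrosswalkRow:
    """One region, with each coding system preserved as its own field."""
    qid: str                              # Wikidata QID, the shared key
    iso_3166_2: Optional[str] = None
    pcode: Optional[str] = None
    dhis2_org_unit: Optional[str] = None


# Placeholder identifiers, deliberately not real codes.
ROWS = [
    CrosswalkRow("Q_EXAMPLE_1", iso_3166_2="XX-01", pcode="XX001",
                 dhis2_org_unit="ou_abc"),
    CrosswalkRow("Q_EXAMPLE_2", iso_3166_2="XX-02", pcode="XX002"),
]


def translate(rows, from_system, code, to_system):
    """Resolve a code in one system to another system via the shared row."""
    for row in rows:
        if getattr(row, from_system) == code:
            return getattr(row, to_system)
    return None


print(translate(ROWS, "pcode", "XX001", "dhis2_org_unit"))
```

A missing mapping comes back as None instead of a guess, so a gap in one vocabulary stays visible to whoever is joining the data.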

4. From Data to Knowledge: The Ontological Leap

What the reader just saw, explained. The four
pillars above transform raw data into something qualitatively different.
The DIKW hierarchy (Ackoff
1989; Rowley
2007) gives that transformation a name. Add context to data and you
have information; embed information in a relational framework where
inferences can run, and you have knowledge. Linking a population
statistic to a region (Q1037, Rwanda), a time period (2020), a concept
(P1082, population), and a source (WorldPop) is precisely that
operation, performed once at ingest and remembered ever after. The
number that lived in a spreadsheet now lives as a node in a knowledge
graph; the agent reads it from a sourced row with a vintage. Query the
graph for Rwanda’s districts in 2020 and thirty come back; redirect the
same query at 1990 and ten prefectures turn up instead, because Rwanda
redrew the lines in 2006 and the graph remembers both versions.
Dissolved states work the same way: a 1985 query of Europe’s sovereign
countries returns a map in which the Soviet Union, Yugoslavia, and
Czechoslovakia are all still intact. The agent’s vocabulary is whatever
the graph holds at the moment in question (thirty districts in 2020, ten
prefectures in 1990, ten Imperial Circles in 1555), and every layer
carries the date and the source that produced it. Discipline lives in
supplying both.
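
The date-dependent vocabulary can be sketched as edges that carry a validity interval. This is a toy model under stated assumptions: the valid_from and valid_to fields, the unit names, and the years are invented for the example and do not describe the actual schema or any real country.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    parent: str
    child: str
    valid_from: int            # first year the edge holds
    valid_to: Optional[int]    # last year it holds; None means still current


# Toy edges: a country whose subdivisions were redrawn between 2005 and 2006.
EDGES = [
    Edge("country_x", "old_unit_1", 1962, 2005),
    Edge("country_x", "old_unit_2", 1962, 2005),
    Edge("country_x", "new_unit_1", 2006, None),
    Edge("country_x", "new_unit_2", 2006, None),
    Edge("country_x", "new_unit_3", 2006, None),
]


def children_in_year(edges, parent, year):
    """Return the subdivisions of `parent` that were valid in `year`."""
    return sorted(
        e.child
        for e in edges
        if e.parent == parent
        and e.valid_from <= year
        and (e.valid_to is None or year <= e.valid_to)
    )


print(children_in_year(EDGES, "country_x", 1990))
print(children_in_year(EDGES, "country_x", 2020))
```

The same call with a different year returns a different set of units, which is the behavior described above: the graph remembers both versions and the date selects between them.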

An ontology you inherit. Russell & Norvig (2020, Ch. 10) identify
ontological engineering (building categories, objects, relations,
events, time, and space from scratch) as a perennial bottleneck in AI.
For half a century, every serious knowledge-based system has had to
quarry its own ontology from the rock. Wikidata (Vrandečić & Krötzsch
2014), now with over 100 million items and thousands of active
contributors, turns that quarrying job into inheritance. It furnishes
the upper ontology as community-maintained public infrastructure, the
way OpenStreetMap provides the road network. The
geospatial_knowledge_base supplies the pieces below that
layer (geometry at usable resolution, raster statistics, dense temporal
observations) and inherits the upper ontology intact from the
community’s work.

The neuro-symbolic synthesis. Garcez and Lamb
(2023) argue that AI is entering a “third wave” combining neural and
symbolic reasoning. Pan et al. (2024),
in the definitive recent survey, map three paradigms: KG-enhanced LLMs,
LLM-enhanced KGs, and synergistic frameworks in which each does what it
does best. Two concrete architectures have been influential here. Think-on-Graph (Sun et al.
2024) has the LLM run beam search over the graph, treating the graph
as the reasoning substrate itself. Reasoning on Graphs (Luo et al.
2024) has the LLM generate relation-path plans grounded in
the graph, then retrieves and reasons over them.

Our system sits squarely in this third paradigm. Ask it “compare
vaccination coverage in EU member states and ECOWAS member states,” and
the two halves interlock visibly: the LLM parses intent and shapes the
query; the graph supplies both membership sets through P463 traversal,
joined to the coverage indicator. Each half is load-bearing. The LLM
carries the language; the graph carries the facts. Hogan et al. (2021), in their
comprehensive ACM Computing Surveys treatment, make exactly
this case for knowledge graphs as the natural substrate of hybrid
reasoning.

A personal note on disciplines. As someone trained
in both the quantitative social sciences and the qualitative
anthropological tradition that has long insisted on the irreducibility
of cultural context, I read this architecture as something larger than
an engineering pattern. The historical fault line between economics and
mathematical sociology on one side and cultural anthropology on the
other has run along the question of context: the quantitative side has
counted what it could measure, and the qualitative side has insisted
that what could not be measured was precisely what mattered. Our system
here is an attempt to narrow that gap. Subtle questions (“Maputo
Protocol countries”, “PEPFAR-supported sectors”, “kgotla-practicing
communities”, “post-Soviet states with continuous statistical series”)
name concepts grounded in particular legal, organizational, linguistic,
and cultural moments, and they are the questions a sensitive analyst
actually wants to ask. The LLM hears them in the language and register
the questioner already inhabits; the typed graph then resolves each name
through embedding and ontological normalization against a
community-maintained substrate, so the noun phrase the analyst wrote
becomes a path the agent can walk. What was previously unquantifiable,
the meaning of a term as anchored in a moment, becomes a relation to
traverse, with provenance attached. It is, in a small but real way, a
meeting of cultural anthropology at scale with mathematical sociology
and economics: each lending the other what it has historically lacked.
The anthropologist gets to keep the texture of her categories while
reaching populations no fieldwork team could ever cover. The
mathematical sociologist, in turn, gets to count the right thing: the
term as the community it describes would name it, not the closest
abstraction the survey instrument happened to support.

Explainability as an ethical imperative. When the
agent says “France is a member of the EU,” that sentence is traceable:
Q142 → P463 → Q458. When it says “Akarere ka Gasabo has 1.87 million
people,” the number traces to a specific raster substrate (GHSL 2020), a
specific rollup formula (sum, area-weighted, 95% coverage guard), and a
specific set of grid cells weighted by their coverage fractions. Floridi (2008)
argues that explicability, the duty to let people understand and contest
algorithmic conclusions, is a necessary condition of ethical AI. For
health resource allocation, where a five-meter shift in a boundary can
change which clinic a village is assigned to, the duty is strict. The Agents
for Data Warehousing post argued that bias should be a first-class
KPI and that digital humanism should be operational. The spatial
knowledge graph is where that principle becomes concrete: each statistic
carries its provenance, each aggregation carries the formula that
produced it, and the geometry of any boundary you can name traces back
to a specific shapefile from a specific source. The agent that reasons
over the data sees the receipts alongside the numbers.

PART III — THE DEMONSTRATIONS

5. Rwanda: The Modifiable Areal Unit Problem Made Visible

Interactive dashboard: Three
Rwandas, Three Lenses: the Modifiable Areal Unit Problem made
visible — three indicators × three scales, rendered live.

Return to the officer’s question: which of my 416 sectors have
the highest population but the lowest coverage? Before we can
overlay her coverage data (that is Part IV), we have to see what the
first half of that question — “highest population” — looks like. The
answer depends entirely on the scale at which the map is drawn.

[Figure 2 panels: Rwanda population at province level (5 polygons;
totals sum cleanly, a national bird's-eye view), at district level (27
polygons; the Kigali concentration emerges), and at sector level (402
polygons; every urban pocket visible).]

Figure 2. Rwanda population (WorldPop 2020) rendered
at three scales: province (5 polygons), district (27), sector (402).
Tufte-style small multiples. (The dashboard currently carries
complete raster fill for 402 of Rwanda’s 416 sectors and 27 of its 30
districts; the small gap is coverage, not administrative truth.)
Interactive dashboard.

At province level, three of Rwanda’s five provinces
(Eastern at 3.14M, Southern at 3.11M, and Kigali at 2.98M) sit in the
same darkest band on the count-based color scale, their totals within
five percent of one another; Northern (1.86M) and Western (1.64M) sit in
the lightest band. The map collapses to a binary: three populous
provinces, two thinner ones. This is the scale of national strategy
documents. It is too coarse to act on, and the picture it presents, that
the country’s population sits in two indistinguishable blocks, actively
conceals the polynuclear reality underneath.

At district level, the country becomes legible and
Kigali Province separates into its three components: Nyarugenge, Gasabo,
and Kicukiro. Akarere ka Gasabo emerges as the country’s most populous
district at 1.87 million people; Nyarugenge, which contains the downtown
and Nyamirambo, has roughly a third of Gasabo’s count yet averages the
highest density of any district in Rwanda (38.8 people per GHSL cell
against Gasabo’s 19.8). Gasabo’s lower mean is itself a lie at this
scale: the district stretches from the urban core of Kimihurura out past
Jali Mountain to rural hills, and the arithmetic mean of a
government-ministry sector and a hilltop agricultural sector describes
neither. The surprise of the district map is what sits beside the Kigali
three: Nyagatare District, out on the Ugandan border in Eastern
Province, ranks second nationally at 761,000, and Rubavu District,
against the Congolese frontier in the Western highlands, ranks fourth at
539,000. Population is not ‘all in Kigali, with everyone else thinned
out’; it concentrates in pockets distributed across the territory, and
Gasabo is one of several centers, not a singularity.

At sector level, dispersion sharpens into a view
neither earlier scale could approximate. Kimihurura and Remera, in
central Gasabo, pack thousands of people per square kilometre. So does
Kinyinya, just up the hill, the same Kinyinya that on the night-lights
map registers as dark. Across central Kigali a band of dense, often
poorly-lit residential sectors carries much of the city’s population.
Meanwhile Gasabo’s outer rural sectors drop to a small fraction of that
density. A sector-level map of Rwanda shows not five stories but four
hundred and two. The officer can now point to the specific polygons
where density is highest and, once her coverage data is joined, where
the intersection with low coverage lives. Below sector, Rwanda’s actual
administrative hierarchy continues into roughly 2,100 cells (akagari)
and around 15,000 villages (umudugudu); the graph does not name those
exact units, but it does already carry finer-than-sector polygons for
Rwanda from several sources: another fifty or so localities at deeper
Overture admin depths (5 through 7), plus geoBoundaries supplements
(synthetic and Wikidata-matched) sitting below the standard sector
layer. Eight source-depths of polygon coverage in all, from country
through to those deepest leaves. The architectural decision was to
standardize the universal hierarchy only down to the deepest level every
country has, and to layer country-specific finer subdivisions on top of
that whenever a source provides them. Beneath every polygon, the raster
grid sits at the very bottom of the stack as the universal floor: finer
than any polygon, present everywhere on Earth, under everything else.
For now, sector is where the current dashboards stop, and the grid is
what carries any signal below that.

Different indicator, different signature. Population
is only one of three layers on this dashboard, and each has its own MAUP
fingerprint. Tree cover at province level already names the country’s
ecological structure: Western Province sits at roughly 37% mean canopy
cover (Hansen), well above Southern and Northern at about 22%, Eastern
at 17%, and Kigali at 16%. Western is the clear standout, anchored by
Nyungwe Forest in its south. The structure sharpens at district scale:
Rusizi (Western, abutting Nyungwe) tops the country at 55%, Musanze in
the Northern highlands carries Volcanoes National Park at 36%, Nyamagabe
and Nyaruguru on the Southern flank of Nyungwe sit around 31%, and the
Kigali districts settle in the mid-teens. The pattern is highland-forest
versus the eastern savanna and Kigali’s stripped-down urban landscape.
At sector level it resolves further into a farmland-versus-forest
patchwork. Night lights, as the top of the post already showed, behave
differently again. Kigali alone fills the province-scale top band, and
at sector scale that single bright cell resolves into a tiered cluster
of administrative and commercial sectors with densely populated
residential sectors like Kinyinya remaining dark in their midst. Within
Kigali, light and population are anticorrelated. Each indicator has its
own MAUP fingerprint: some conceal structure at coarse scales
(population, lights), others sharpen it (tree cover); the question is
never whether the map is hiding something, but how.

The pattern has a name. This is the Modifiable
Areal Unit Problem (Openshaw
& Taylor 1979; Openshaw
1984) in its classical form. Openshaw and Taylor, in what remains
the most visceral empirical demonstration, varied the zoning scheme on a
fixed dataset a million times and watched the correlation coefficient
drift from near −1 to near +1 as a function of nothing but how the
boundaries were drawn. Fotheringham & Wong
(1991) extended the phenomenon into multivariate regression: any
dashboard with two or more indicators is vulnerable. This is the
geometry of reality interacting with the units we impose on it, and the
knowledge base is designed to render the interaction visible.
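
The zoning effect is easy to reproduce at toy scale. The sketch below is not Openshaw and Taylor's experiment; it uses four made-up base units and two alternative ways of pairing them into zones. With only two zones per scheme the correlation is forced to be exactly +1 or −1, so this is the most extreme possible caricature, but the mechanism is the same one.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def zone_means(values, zoning):
    """Aggregate base-unit values to zone means under a given zoning."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zoning]


# Four made-up base units with two indicators each.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 5.0, 1.0]

# Same units, two different ways of drawing the zone boundaries.
zoning_a = [(0, 1), (2, 3)]
zoning_b = [(0, 2), (1, 3)]

r_a = pearson(zone_means(x, zoning_a), zone_means(y, zoning_a))
r_b = pearson(zone_means(x, zoning_b), zone_means(y, zoning_b))
print(r_a, r_b)
```

Nothing about the underlying units changes between the two runs; only the grouping does, and the sign of the correlation flips with it.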

The ecological fallacy, concretely. Robinson (1950) gave this family of
errors its classic demonstration: patterns that hold at the aggregate do
not automatically hold for the parts the aggregate contains. “Rwanda is 7%
malaria prevalence” is true at the country level (2017 RMIS, children
under 5) and wrong at every sector level: the high-altitude northwest
sits at very low rates while a handful of eastern and southern sectors
carry the burden (Semakula et al.
2020). The province-level map shows the country’s apparent calm. The
sector-level map shows where the disease lives, and where the bed nets
need to land first.

The rollup mechanics matter. Producing a district
number from its sectors is a calibrated operation. Nyarugenge district’s
population is the sum of its sectors’ populations; its tree cover is the
area-weighted mean of its sectors’ tree covers; its immunization
coverage, once the client data is joined, is the population-weighted
mean of its sectors’ coverage rates. Each indicator has its own correct
reducer. The rollup engine walks the
canonical_decomposition_edges table with a 95% coverage
guard, emitting a district value only when at least 95% of the
district’s geometry has sector data underneath. The wrong reducer
silently produces a convincing answer that misleads.
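The reducer choice and the coverage guard can be sketched in a few lines of Python. This is a minimal illustration, not the production rollup engine: the `Sector` record, the `rollup` function, the reducer names, and the numbers are all hypothetical, and the sketch assumes the sectors tile the district, so the district's area is the sum of its sectors' areas.

```python
from dataclasses import dataclass
from typing import List, Optional

COVERAGE_GUARD = 0.95  # minimum share of district area that must have sector data


@dataclass
class Sector:
    name: str
    area_km2: float
    population: float
    value: Optional[float]  # indicator value; None when the sector has no data


def rollup(sectors: List[Sector], reducer: str) -> Optional[float]:
    """Reduce sector values to one district value, or None if coverage is too thin."""
    total_area = sum(s.area_km2 for s in sectors)
    covered = [s for s in sectors if s.value is not None]
    covered_area = sum(s.area_km2 for s in covered)
    if total_area == 0 or covered_area / total_area < COVERAGE_GUARD:
        return None  # refuse to emit a number rather than emit a misleading one
    if reducer == "sum":
        return sum(s.value for s in covered)
    if reducer == "area_weighted_mean":
        return sum(s.value * s.area_km2 for s in covered) / covered_area
    if reducer == "population_weighted_mean":
        covered_pop = sum(s.population for s in covered)
        return sum(s.value * s.population for s in covered) / covered_pop
    raise ValueError(f"unknown reducer: {reducer}")


# Two hypothetical sectors with made-up numbers.
sectors = [
    Sector("sector_a", area_km2=10.0, population=3000.0, value=0.5),
    Sector("sector_b", area_km2=30.0, population=1000.0, value=0.9),
]
area_mean = rollup(sectors, "area_weighted_mean")       # (0.5*10 + 0.9*30) / 40
pop_mean = rollup(sectors, "population_weighted_mean")  # (0.5*3000 + 0.9*1000) / 4000
```

The same two sectors give 0.8 under area weighting and 0.6 under population weighting, which is the sense in which the wrong reducer produces a convincing but misleading answer.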

The agent does the discovery, over a substrate built to make
discovery cheap. In our end-to-end test that shipped with the
Rwanda enrichment work, the agent is given a plain-English prompt:
“Create a choropleth map of Rwanda population by sector.” It
discovers the tables itself: it calls
list_all_datasets_and_tables(), inspects the schema of
v_region_spine and region_raster_stats, writes
a single short join, and returns the chart. The data product is
engineered to keep that join short. Underneath sit pre-computed rollups,
materialized crosswalks, and views that pre-join the layers an agent
would otherwise have to assemble, so the agent asks for a row rather
than building one. Fewer joins to compose means fewer chances to
hallucinate one. At the rendering boundary the same principle applies:
the agent does not call the Plotly or Superset API directly. It emits a
typed Pydantic object describing its intent (the polygons, the
indicator, the reducer, the color scale, the labels), and a
deterministic translation layer hands that object to the renderer. The
Pydantic schema is the lock: an LLM constrained to produce only valid
instances of the model cannot invent a field, mistype an enum, or call a
method that does not exist. Hallucination at the rendering layer becomes
a category of error the architecture has eliminated, not one the agent
has to remember to avoid; the thrashing-and-retry loop that ordinarily
eats agent budget when an LLM guesses at an API surface is structurally
absent. What stays for the LLM is what only an LLM can do: read the
user’s natural-language intent and pick the right shape from the typed
surface area we hand it. Every step is traceable: a specific tool call,
a specific schema read, a specific SQL join, a specific Pydantic object
and a specific render. The baby-NICER
post described an agent with memory; here the relevant memory is the
schema of the knowledge graph itself, which tells the agent which
questions are answerable at all, and under which join.
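The typed-intent idea can be illustrated with a small sketch. The post's system uses a Pydantic model; to stay dependency-free, this version uses standard-library dataclasses and enums as a stand-in, and every name in it (`ChoroplethIntent`, `Reducer`, `ColorScale`, `parse_intent`, `to_renderer_spec`, the `rwanda_sectors` set name) is hypothetical rather than taken from the actual codebase.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Reducer(Enum):
    SUM = "sum"
    AREA_WEIGHTED_MEAN = "area_weighted_mean"
    POPULATION_WEIGHTED_MEAN = "population_weighted_mean"


class ColorScale(Enum):
    VIRIDIS = "viridis"
    CIVIDIS = "cividis"


@dataclass(frozen=True)
class ChoroplethIntent:
    set_name: str
    indicator: str
    reducer: Reducer
    color_scale: ColorScale
    label_language: str = "en"


def parse_intent(raw: dict) -> ChoroplethIntent:
    """Accept only payloads that are valid instances of the intent schema."""
    allowed = {f.name for f in fields(ChoroplethIntent)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")  # invented field
    try:
        return ChoroplethIntent(
            set_name=str(raw["set_name"]),
            indicator=str(raw["indicator"]),
            reducer=Reducer(raw["reducer"]),              # ValueError on a bad enum
            color_scale=ColorScale(raw["color_scale"]),
            label_language=str(raw.get("label_language", "en")),
        )
    except KeyError as exc:
        raise ValueError(f"missing field: {exc.args[0]}") from exc


def to_renderer_spec(intent: ChoroplethIntent) -> dict:
    """Deterministic translation from a validated intent to a renderer payload."""
    return {
        "type": "choropleth",
        "set_name": intent.set_name,
        "indicator": intent.indicator,
        "reducer": intent.reducer.value,
        "color_scale": intent.color_scale.value,
        "label_language": intent.label_language,
    }


intent = parse_intent({
    "set_name": "rwanda_sectors",  # hypothetical decomposition name
    "indicator": "population",
    "reducer": "sum",
    "color_scale": "viridis",
})
spec = to_renderer_spec(intent)
```

A payload with an invented field or an unknown reducer is rejected at `parse_intent`, before anything reaches the renderer, so the translation step only ever sees well-formed intents.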

6. Europe: The Palimpsest

Interactive dashboard: Three
Europes, One Continent: decomposed by politics (Holy Roman Empire,
1555), language (Asher & Moseley 2007 Atlas), and land use
(ArchaeoGLOBE 2019) — three decompositions of the same territory,
rendered live.

Three maps of the same ground. Three different answers to “what is
Europe”: one political, one linguistic, one agricultural; one from 1555,
one from 2007, one from 1850 CE. None of them is the modern
administrative map, and that is the point. Each is a legitimate
organizing principle that leaves marks on contemporary health outcomes
the modern map obscures.

Political Europe, 1555: the ten Imperial Circles of the Holy Roman Empire and the Kreisfreies Gebiet

Linguistic Europe: seventy-eight language areas from Asher & Moseley 2007

Agricultural Europe ~6,000 BP: when only the Balkans farmed at scale

Agricultural Europe 1850 CE: intensive farming has reached almost every region

Figure 3. Europe rendered through three
decompositions, with the agricultural decomposition shown as a
before/after pair on a shared color scale: political (Holy Roman
Empire’s ten Imperial Circles, 1555; HistoGIS from IEG-Maps), linguistic
(seventy-eight contemporary language areas from Asher & Moseley’s
Atlas of the World’s Languages, 2007), and agricultural, at
~6,000 BP, when only the Balkans farmed at scale, and at 1850 CE, when
intensive farming had reached almost every region (ArchaeoGLOBE).
Dashboards: Three Europes; From Fertile Crescent to Industrial Field
(agricultural pair).

Every territory carries layered geographies. Lefebvre (1974/1991)
called this the production of space: the land stays fixed; the meanings
imposed on it shift with culture and custom, and each imposition stands
as a legitimate reading, with habit alone making one of them the
default. The graph treats that property as a first-class feature. All
three maps decompose the same parent polygon (Europe, Q46); what differs
between them is only the set_name on the edges.

Political Europe, 1555. The Holy Roman Empire’s ten
Imperial Circles and one Kreisfreies Gebiet, from Andreas Kunz’s
historical atlas and digitized by HistoGIS
at the Austrian Centre for Digital Humanities.
set_name = 'hre_1555'. Ten of the eleven features carry
Wikidata QIDs: the Kurrheinischer Kreis around Mainz, Trier, and
Koblenz; the Bayerischer Kreis around Munich, Salzburg, and Regensburg;
the Obersächsischer Kreis around Dresden and Brandenburg; seven others.
Lay any of these against a modern German Länder map and the overlap
shows a clear pattern. Several of today’s Länder are composed of pieces
that once belonged to the same Imperial Circle, and several that
belonged to different Circles still sit in different Länder now. This is
an organizational memory, sedimented in clinical networks, regulatory
jurisdictions, and referral patterns.

Linguistic Europe, contemporary. The second layer,
Glottography’s
digitization of Asher & Moseley’s Atlas of the World’s
Languages (2007), contributes seventy-eight language polygons.
set_name = 'glottography'. The current dashboard colors
each polygon by its area in square kilometers. Northern Frisian along
the Wadden Sea coast anchors the lower bound at 85 km²; English’s global
extent (across the British Isles, North America, Australia, New Zealand,
and the rest of the Anglophone world) anchors the upper bound at 25.3
million km², with Spanish, Russian, Portuguese, and the Arabic varieties
filling the top bins. The visible scale therefore reads as a footprint
ranking. The linguistic taxonomy lives in the schema’s
family field rather than in the palette: the seventy-eight
areas span ten families. Forty are Indo-European, filling roughly half
the dataset, from Iceland through Iberia, France, Italy, the Slavic
east, and into the Indo-Iranian fringe of the Caucasus. Seventeen are
Afro-Asiatic, the Arabic and Berber zones along the Mediterranean and
North African rim. Six are Uralic, threading from Finland through
Estonia into Hungary. Four are Turkic across the Black Sea steppe and
Anatolia. Four are Kartvelian, three are Abkhaz-Adyge, and one is
Nakh-Daghestanian, the three families sharing the Caucasus. One area is
Mongolic-Khitan (Kalmyk on the lower Volga); one is Eskimo-Aleut (the
Arctic fringe); and one is the isolate Basque, straddling the
French-Spanish border in the western Pyrenees and splitting cleanly
across a boundary every modern administrative map treats as hard.

The architectural point is that the same
set_name = 'glottography' decomposition can be re-rendered
tomorrow colored by family, or by speaker count, or by date of first
attestation, by changing one field in the chart definition. The
substrate carries the encodings; the chart picks one.

This layer maps
the language geography as it lives today, cutting through present-day
political borders with its own logic. Pathogens travel through people,
trade, and language. Healthcare-seeking behavior tracks language as
closely as any of them. A Basque-speaking patient in Bayonne and a
Basque-speaking patient in Bilbao face the same linguistic choice at the
clinic door, regardless of which country is paying the clinician. The
Kurdish-speaking regions span Turkey, Iraq, Syria, and Iran; a
tuberculosis surveillance study that stopped at any of those borders
would stop before the epidemic did.

Agricultural Europe across time. The third layer
draws on the ArchaeoGLOBE Project,
a consortium of more than 250 archaeologists that reconstructed global
human land use across ten time intervals from 10,000 years before the
present to today. set_name = 'archaeoglobe'. The dashboard
renders two slices side by side on a shared color scale: ~6,000 BP and
1850 CE. The contrast is the point. At 6,000 BP only Romania, Bulgaria,
and Moldova reach the highest INAG (intensive-agriculture) bracket in
the dataset, about 40% of land area, while every other European region
sits at 0.1 or below. By 1850 CE almost every European and
Mediterranean polygon has reached that same bracket. Between those two
frames lies a six-millennium wave of intensification: Central Europe and
France hit the top bracket around 4,000 BP, the Bronze Age; the
Mediterranean ring of Italy, Greece, and Iberia joins by 2,000 BP;
Britain and Ireland reach it by 1,000 BP; Scandinavia does not cross
until 1750 CE; Eastern Europe’s interior, the Caucasus, the Russian
Volga, and the Baltic-Finnish fringe never reach it in the historical
record, sitting in the next bracket down (around 10%) as late as
1850.

By 1850 CE, then, the dashboard’s right-hand frame shows a continent
in which intensive agriculture spans most of the western and central
landmass, the Balkans, Iberia, the Pontic steppe, and the eastern
Mediterranean rim; the thinner footprints are Iceland, Finland and the
Baltic, the Caucasus, and Russia’s interior. This is the economic
geography that steam, synthetic fertilizer, and the railway were about
to upend, and it still matters. Settlement density, rural-urban
migration patterns, food systems, and even contemporary
zoonotic-exposure profiles across Europe trace back to where intensive
agriculture had already taken hold by 1850, and to when it took hold,
since regions that intensified four thousand years apart entered the
modern era with very different soil regimes, animal-husbandry
traditions, and pathogen reservoirs. A public-health team studying
obesity, cardiovascular disease, or antibiotic resistance in European
rural populations can now ask whether the patterns align with the 1850
agricultural core, or with the earlier Bronze Age core, or with the
latecomers of the seventeenth-to-nineteenth-century intensification
wave. A SQL join answers each version of the question.

The architectural principle. What makes this work is
that all three decompositions live in the same table. The same code that
walks one walks any of them. A query that rolled COVID mortality up
against modern German Länder can be reissued against the 1555 Imperial
Circles by changing one string (set_name = 'hre_1555'), and
the shapefile comes back from five centuries ago instead of this one.
The rest of the system notices nothing different.
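A toy version of the one-string switch, using an in-memory SQLite database in place of BigQuery. The table name `canonical_decomposition_edges` and the `set_name` values come from the text above; the column names (`parent_qid`, `child_label`) and the handful of rows are assumptions made for illustration only.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny stand-in for the edges table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE canonical_decomposition_edges ("
        "parent_qid TEXT, child_label TEXT, set_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO canonical_decomposition_edges VALUES (?, ?, ?)",
        [
            ("Q46", "Kurrheinischer Kreis", "hre_1555"),
            ("Q46", "Bayerischer Kreis", "hre_1555"),
            ("Q46", "Basque", "glottography"),
            ("Q46", "Northern Frisian", "glottography"),
        ],
    )
    return conn


def children(conn: sqlite3.Connection, parent_qid: str, set_name: str) -> list:
    """Walk one decomposition of a parent; only set_name selects which one."""
    cursor = conn.execute(
        "SELECT child_label FROM canonical_decomposition_edges "
        "WHERE parent_qid = ? AND set_name = ? ORDER BY child_label",
        (parent_qid, set_name),
    )
    return [row[0] for row in cursor]


conn = build_demo_db()
political = children(conn, "Q46", "hre_1555")
linguistic = children(conn, "Q46", "glottography")
```

The query text is identical in both calls; the decomposition is a bound parameter, which is why downstream code does not need to know which map it is walking.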

Why this matters beyond the demo. The Breaking
the Echo Chamber post argued that seeing only one perspective
distorts understanding. The palimpsest is the spatial version of that
warning. Three organizing principles (the political inheritance of 1555,
the living language geography of 2007, the agricultural footprint of
1850) each explain something about present-day health outcomes that the
modern administrative map elides. A post-conflict boundary change can
break a time series, as it did when the DRC went from eleven provinces
to twenty-six in 2015 and when South Sudan was created in 2011. A
linguistic community can straddle a border, as Kurdish does across four
states and Basque across the Pyrenees. A diaspora’s health trajectory
often tracks its cultural map more closely than the political one. A
rural economy’s shape may be older than any sitting minister. In each
case the right abstraction depends on the question being asked. The
graph lets the team pick the abstraction that fits, and the answer
carries the vintage and the source of whatever abstraction it was
computed against.

PART
IV — THE INTEGRATION: CLIENT DATA MEETS THE KNOWLEDGE GRAPH

7. How a Client’s Data
Joins the Substrate

Return once more to the officer in Kigali. Her data is the sort any
ministry official amasses over a career: a DHIS2 instance with monthly
facility reports, a spreadsheet of GPS coordinates for the roughly 1,500
health facilities her ministry tracks, a donor register that tags each
activity with Gavi, the Global Fund, or PEPFAR, and a
Kinyarwanda-labeled indicator dictionary she and her colleagues have
refined over the years. None of it speaks the graph’s language by
default. Onboarding is the translation work, four kinds of it, done in a
single pass.

The spatial join. Her facility GPS coordinates are
projected onto region_geometry and linked to every level of
the canonical hierarchy in a single pass. A clinic in central Kimihurura
resolves to Kimihurura sector (depth 5), Gasabo district (depth 4),
Kigali Province (depth 3), and Rwanda (Q1037, depth 2) simultaneously.
One row in her facility table now carries four usable geographic
keys.
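Once the point-in-polygon step has placed a facility in its sector, collecting the other keys is a walk up the hierarchy. The sketch below assumes that first step has already happened and uses a hypothetical in-memory parent map; the place names and depths are the ones from the example above.

```python
from typing import Dict, Optional

# Hypothetical slice of the canonical hierarchy: child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "Kimihurura": "Gasabo",
    "Gasabo": "Kigali Province",
    "Kigali Province": "Rwanda",
    "Rwanda": None,
}
DEPTH: Dict[str, int] = {
    "Kimihurura": 5,
    "Gasabo": 4,
    "Kigali Province": 3,
    "Rwanda": 2,
}


def geographic_keys(sector: str) -> Dict[int, str]:
    """Return every ancestor of a sector, keyed by hierarchy depth."""
    keys: Dict[int, str] = {}
    node: Optional[str] = sector
    while node is not None:
        keys[DEPTH[node]] = node
        node = PARENT[node]
    return keys


clinic_keys = geographic_keys("Kimihurura")
```

Storing all four keys on the facility row up front means a later query at district or province level is a plain equality join rather than a fresh spatial operation.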

The temporal join. Her monthly reports match against
dim_time. A row dated “2026-03” lands on a pre-materialized
month entry that already knows its year, its quarter, where the month
sits in her ministry’s fiscal calendar, and how it relates to every
other slice of time the graph holds. The three-year trend query she
would otherwise have to assemble through date arithmetic is now a single
join.
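As a rough picture of what a pre-materialized month entry might carry, the sketch below expands a `YYYY-MM` string into calendar and fiscal attributes. The function name, the field names, and the default July fiscal-year start are assumptions for illustration, not the actual `dim_time` schema.

```python
def dim_time_row(period: str, fiscal_start_month: int = 7) -> dict:
    """Expand a 'YYYY-MM' string into calendar and fiscal attributes."""
    year_text, month_text = period.split("-")
    year, month = int(year_text), int(month_text)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {period!r}")
    # Position of this month inside the fiscal year, 1-based.
    fiscal_month = (month - fiscal_start_month) % 12 + 1
    # Label the fiscal year by the calendar year in which it ends.
    rolls_over = fiscal_start_month > 1 and month >= fiscal_start_month
    return {
        "period": period,
        "year": year,
        "month": month,
        "quarter": (month - 1) // 3 + 1,
        "fiscal_month": fiscal_month,
        "fiscal_quarter": (fiscal_month - 1) // 3 + 1,
        "fiscal_year_end": year + 1 if rolls_over else year,
    }


row = dim_time_row("2026-03")
```

With a July start, March 2026 is the ninth month and third quarter of the fiscal year ending in 2026. Computing these once per month, rather than in every query, is what turns the trend query into a join.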

The conceptual join. Her indicator definitions get
linked to Wikidata concepts through a concept_qid_mapping
table. The ministry’s “DPT3 coverage” resolves to the concept node for
diphtheria-tetanus-pertussis third dose (and inherits, through
that node, the WHO GHO definition, ICD cross-references, and synonyms in
40+ languages). Her local indicator dictionary becomes searchable by the
name a donor’s analyst in Geneva would use.

The organizational join. Her funding tags (“Gavi,”
“Global Fund,” “PEPFAR”) resolve to the Wikidata entities for those
organizations, and through them to their own relational webs: board
composition, country commitments, focus areas, treaty obligations. A
query for “which of my sectors are covered by Gavi and PEPFAR?” is
translated from English to a two-hop graph query in SPARQL.

Population weighting, free of charge. Because
WorldPop is already linked to every region at every level, any coverage
metric she computes can be population-weighted at any scale.
“District A: 60% coverage” — flat, the way a ministry quarterly
report would show it — opens up into “District A covers 500,000
people at 60%, while District B covers 50,000 people at 95%”, and
the contrast changes which district she pages first. The national
average the ministry reports stops hiding the million people served
worst.
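The arithmetic behind that contrast is short enough to write out. The two districts and their figures are the illustrative ones from the paragraph above; the function names are hypothetical.

```python
# (population, coverage rate) for the two illustrative districts.
districts = {
    "District A": (500_000, 0.60),
    "District B": (50_000, 0.95),
}


def flat_mean(data: dict) -> float:
    """Average of district rates, ignoring how many people each covers."""
    return sum(rate for _, rate in data.values()) / len(data)


def population_weighted_mean(data: dict) -> float:
    """Average rate weighted by the population behind each district."""
    total_pop = sum(pop for pop, _ in data.values())
    return sum(pop * rate for pop, rate in data.values()) / total_pop


def unreached(data: dict) -> dict:
    """People not covered in each district, rounded to whole persons."""
    return {name: round(pop * (1 - rate)) for name, (pop, rate) in data.items()}


flat = flat_mean(districts)                      # (0.60 + 0.95) / 2
weighted = population_weighted_mean(districts)   # 347,500 / 550,000
gap = unreached(districts)
```

The flat mean is 77.5%, while the population-weighted mean is about 63.2%, and District A alone accounts for 200,000 unreached people against 2,500 in District B.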

The result is a custom data warehouse: the officer’s tables
co-resident in BigQuery with the geospatial_knowledge_base,
joins pre-computed, provenance preserved. She owns her data. The graph
supplies the context that makes it interpretable. The Agents
for Data Warehousing post described the Reflective SQL Agent, the
DBT Data-Vault Agent, and the Analytics Agent. Those agents operate on a
joined warehouse of this shape. The knowledge graph is what makes their
reasoning grounded in sourced rows.

8. Plain
English to Choropleth: The Agent in Action

The officer opens Slack. She types: “Show me sectors in
Nyarugenge where DPT3 coverage is under 80%, ordered by
population.”

Next, the ReflectiveSqlAgent reads her sentence. It
calls list_all_datasets_and_tables() and
inspect_table_schema() to see what it has to work with: her
coverage table, v_region_spine for the sector polygons,
region_raster_stats for the population counts. It writes
the SQL join itself. A deterministic pipeline wraps the result with
Overture Maps geometry and then applies the design conventions Edward
Tufte argued for over his career: a perceptually uniform color scale, no
chartjunk, and labels on the outliers. Superset renders the map. Slack
delivers it back to her, sector labels in Kinyarwanda.

The data-first principle, written into our
deterministic pipeline as a load-bearing architectural rule, deserves a
line of its own: no static lookup files, no Python-side format
detection, no language-specific logic. All matching happens in BigQuery
via OR conditions against the actual names
column, across all 150+ languages Overture Maps carries. BigQuery is the
single source of truth, and no decision about what a name or a code
means ever lives in something as brittle as a Python dictionary that
could drift out of sync with the data.

Multilingual rendering does real work. The same map
of India renders in Devanagari Hindi, Gujarati, Tamil, Russian Cyrillic,
or Japanese because every region in the graph carries a Wikidata QID and
Wikidata maintains labels for each entity in hundreds of languages,
community-curated, continually updated, keyed off the same identifier
the rest of the system already uses. The dashboard pulls the label in
the viewer’s language straight from the QID. The same QID-keyed bridge
that joins a region to its treaty memberships and its currency relations
also joins it to its name in Kinyarwanda, in Kannada, or in Kirundi. A
district health team in Karnataka reads its map in Kannada the way it
already speaks. Srinivasan (2019)
argues that the language of presentation shapes who can participate in
interpretation; multilingual mapping is therefore an enfranchisement
feature. The officer in Kigali reads her dashboard in Kinyarwanda
because that is the language she thinks in, and because it is the
language read by the hundred community health workers she will hand it
off to.

A week later, a quiet alarm. The officer wants
something her routine reporting cycle cannot give her: an early signal,
ahead of the monthly DHIS2 figures that will eventually show what has
already happened. The DHIS2 instance she already runs is the standard
ministry-of-health stack across much of Africa. Clinics file monthly
returns, the numbers move up the chain, and a district aggregate appears
weeks after the spike began. That is one kind of surveillance, and it is
the only kind she has. The capability we are building with her now is a
second kind that sits alongside it. She opens Slack and asks: “Let
me know if local news anywhere in my districts starts reporting fever
clusters, pneumonia spikes, or hospital overcrowding.” The agent
sets up a watch, and most days the watch stays quiet, because outbreaks
are rare and so are the news patterns that precede them. Under the hood
the watch leans on GDELT,
the open project that aggregates print, broadcast, and web news from
nearly every country in over a hundred languages on a fifteen-minute
cadence and extracts structured events from the prose. The agent filters
that stream by the CAMEO event codes that flag health emergencies,
geocodes each surfaced story to a place name, threads the name through
the canonical hierarchy via the graph’s FIPS-to-ISO crosswalk, and
matches the result against her district list. On the morning three of
her districts light up, the agent lays the news flags against her DHIS2
fever incidence from the last reporting month and pings her: where
is the news bright and the surveillance quiet? Two of the three are
reporting-lag artifacts she already knows. The third surprises her. By
afternoon a team is on the road. This is what the World Health
Organization calls event-based
surveillance: the open record of what local journalism is saying,
joined to the quiet record of what the state has measured, at the scale
where a vehicle can leave today. GDELT and BigQuery are the plumbing;
the question the officer asked was a public-health question.
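The final comparison, news bright and surveillance quiet, reduces to a filter over two per-district series. The sketch below assumes the upstream steps (event filtering, geocoding, hierarchy matching) have already produced a count of flagged stories per district; the district names, counts, and thresholds are invented for illustration.

```python
from typing import Dict, List


def bright_news_quiet_surveillance(
    news_flags: Dict[str, int],
    fever_incidence: Dict[str, float],
    news_threshold: int,
    incidence_threshold: float,
) -> List[str]:
    """Districts with many flagged stories but low reported fever incidence."""
    return sorted(
        district
        for district, stories in news_flags.items()
        if stories >= news_threshold
        # A district missing from the surveillance table is treated as quiet.
        and fever_incidence.get(district, 0.0) < incidence_threshold
    )


# Hypothetical inputs: flagged story counts and cases per 1,000 last month.
news_flags = {"district_x": 7, "district_y": 5, "district_z": 1}
fever_incidence = {"district_x": 42.0, "district_y": 3.5, "district_z": 2.0}

alerts = bright_news_quiet_surveillance(
    news_flags, fever_incidence, news_threshold=4, incidence_threshold=10.0
)
```

Here only `district_y` is returned: `district_x` has news and matching surveillance, and `district_z` has neither. In practice the thresholds would need tuning against known reporting lags, which this sketch does not attempt.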

PART V — THE PHILOSOPHICAL
FRAME

9. Knowledge-Based Agents,
Revisited

Russell & Norvig (2020, Ch. 7) defined a
knowledge-based agent as one that maintains a knowledge base and
operates through two primitives: TELL, which adds sentences
to the KB, and ASK, which queries it. Fifty years of the
field’s work have lived inside that skeleton. What we have built is the
skeleton filled in at present-day scale, with a natural-language
front.

TELL is the ingest side. The agent meets a new client’s
warehouse and reads its schemas, the way a new hire opens the shared
drive on her first Monday. A researcher somewhere registers a new
decomposition under a fresh set_name. A community editor on
Wikidata adds an edge in the small hours, and by morning it has landed
in the concept_qid_mapping table. Each of these is a
TELL.

ASK is the other side. A Slack prompt becomes a SQL
join. The join becomes a rollup, gated by the 95% coverage guard. The
rollup becomes a choropleth that comes back to the asker in whatever
tongue she thinks in: Kinyarwanda, Kannada, or Kinyarwanda’s Burundian
cousin Kirundi.

What the LLM contributes to this pattern is natural language,
graceful handling of ambiguity, generalization from a handful of
examples, and the ability to explain its reasoning in the user’s own
register. The officer in Kigali asks in English, French, or Kinyarwanda,
and the LLM figures out the shape of the query, handling the SQL,
SPARQL, and schema on her behalf.

What the knowledge graph contributes is the discipline that structure
brings. Every fact the agent cites traces to a sourced row, which is
soundness in the formal sense. Rwanda is Q1037 in every
conversation; the third district from the south is always spelled
Kicukiro, letter for letter, regardless of the random seed that steers
the LLM that morning. That is consistency. Decompositions carry
vintages, so a 1985 query of Europe’s sovereign countries returns a 1985
map, a temporal grounding the LLM cannot drift away from. And every
polygon the agent names has been ingested as a geometry from a sourced
shapefile, a spatial grounding the model cannot fabricate at runtime.
The LLM reasons; the graph ensures there is something to reason about.
Garcez & Lamb’s (2023) third wave
is a usable architecture only because each half checks the other. This
post has been, throughout, a catalog of the ways that check gets
cashed.

10. The
Capability Approach: Data Access as Freedom

One last reading holds everything together. Amartya Sen (1999)
argued that development should be measured by the substantive freedoms
(capabilities, in his vocabulary) that people have. Giving a health team
a dataset is providing a resource. Giving them the capability to
interrogate that dataset, to ask their own spatial questions, generate
their own maps, test their own hypotheses, is providing a freedom.

In most health systems today, spatial analysis flows along a
dependency model. District teams collect the data and pass it up the
line; somewhere at headquarters, or in a contracted office on another
continent, a GIS specialist turns it into maps; weeks or months later
those maps come back down as static PDFs, answering questions that
someone other than the data-collectors had formulated. The team that
knows the territory best is the last to hear what its own data is
saying.

The spatial data product inverts this flow. The officer in Kigali
asks her own question at her own keyboard, and the knowledge graph holds
still long enough for her to interrogate it. That is the difference Sen
means between having a resource and having a capability: the freedom to
use the data, over and above the data itself.

Nussbaum (2011) makes
this operational. Her list of central capabilities includes practical
reason, the ability to form a conception of the good and reflect
critically on the planning of one’s life. Practical reason requires an
inspectable evidence base, the kind a spatial knowledge graph is
designed to be.

Drèze and Sen (2013)
extended the argument to democratic accountability: a functioning
democracy requires public data that ordinary people can access and
interpret. When data is locked in specialist systems, claims about a
community stay sequestered from the people who live in it. A
self-service spatial knowledge graph, inspectable to the QID, is
democratic infrastructure in a direct and literal sense.

Chinua Achebe, asked in 1994 what
literature was for, quoted an older African proverb: until the
lions have their own historians, the history of the hunt will always
glorify the hunter. The spatial knowledge graph is one step toward
giving the lions their historians, toward letting the officer in Kigali,
her community health workers walking between villages, and the Basque
midwife in Bayonne tell the history of their own places.

The Behavioral
Science Path to Political Nihilism post argued that people disengage
from systems they feel powerless to understand. The spatial data product
is a small antidote to that disengagement. It lets the officer, the
community health worker, the district planner, and the civic journalist
look at the same evidence base, with the same tools, and contest each
other’s reading of it.

PART VI — CONCLUSION

11. The Map as
Democratic Infrastructure

Come back once more to Kigali. The long rains are the same, the motos
still idling in the traffic below. But the officer’s quarterly report is
done. She typed her question into Slack two hours ago (which of my
416 sectors have the highest population but the lowest DPT3
coverage?) and the answer came back: a choropleth in Kinyarwanda
showing twelve sectors in red, ranked. She has already forwarded it to
her district supervisor and to the community health workers who cover
those twelve zones. The question that, a week ago, she could not pose at
all is now the first thing she does with her morning coffee.

A well-designed spatial data product is democratic infrastructure.
Tufte’s design conventions are applied throughout, so each map carries
as much information as its pixels can bear. The labels render in the
viewer’s language, meeting her where she already reads. Historical and
alternative geographies sit alongside the modern administrative map, so
the land discloses itself as a palimpsest rather than a single chart.
The granularity matches the scale at which decisions are actually taken:
sector, not country. And the whole thing is built as a product:
pre-computed, validated, self-service.

The measure of such a product is whether it puts the question in the
hands of the person who already knows the territory. A district planner
who has worked the same health system for twenty years should not have
to wait for a GIS specialist in another country to render her own
sectors into a map for her. The day she can interrogate the data that
describes those sectors herself, at her own keyboard, in the language
she thinks in, the resource becomes a capability in Sen’s sense, and the
gap between those who generate evidence and those who are described by
it starts to close.

This is the substrate we build at CollectiWise: spatial knowledge
graphs that let public-health teams ask their planning and
resource-allocation questions in plain English, and receive answers
grounded in the right geography at the right scale. If that describes
work you are trying to do, we
would like to hear from you.

The Enlightenment
Spirits at the Pub post reflected on how ideas emerge from unhurried
conversation: people sitting with the same evidence, asking each other
what it means. The spatial knowledge graph is infrastructure for a
different register of the same conversation, between a health team and
the data that describes its own territory, with an agent in the middle
who holds memory across the dialogue, walks a graph that a much wider
community keeps current, and shows its working in whatever language the
reader already reads. The map is not the territory. But a well-made map,
inspectable down to the QID by everyone it describes, is where we begin
to reason about the territory together.

References

Achebe, C. (1994) ‘The Art of Fiction No. 139’ (interview by Jerome Brooks), The Paris Review, Issue 133, Winter 1994. Available at: theparisreview.org.

Ackoff, R. L. (1989) ‘From Data to Wisdom’, Journal of Applied Systems Analysis, 16(1), pp. 3–9. Available at: faculty.ung.edu.

ArchaeoGLOBE Project (Stephens, L., Fuller, D. et al.) (2019) ‘Archaeological assessment reveals Earth’s early transformation through land use’, Science, 365(6456), pp. 897–902. doi:10.1126/science.aax1192.

Burstein, R., Henry, N. J., Collison, M. L. et al. (2019) ‘Mapping 123 million neonatal, infant and child deaths between 2000 and 2017’, Nature, 574(7778), pp. 353–358. doi:10.1038/s41586-019-1545-0.

Drèze, J. and Sen, A. (2013) An Uncertain Glory: India and its Contradictions. Princeton, NJ: Princeton University Press. Available at: press.princeton.edu.

Floridi, L. (2008) ‘The Method of Levels of Abstraction’, Minds and Machines, 18(3), pp. 303–329. doi:10.1007/s11023-008-9113-7.

Forkel, R. et al. (n.d.) Glottography Dataset: Asher & Moseley 2007 World Atlas of Languages (CLDF format). Available at: github.com/glottography/asher2007world.

Fotheringham, A. S. and Wong, D. W. S. (1991) ‘The Modifiable Areal Unit Problem in Multivariate Statistical Analysis’, Environment and Planning A, 23(7), pp. 1025–1044. doi:10.1068/a231025.

Garcez, A. d’A. and Lamb, L. C. (2023) ‘Neurosymbolic AI: The 3rd Wave’, Artificial Intelligence Review, 56(11), pp. 12387–12406. doi:10.1007/s10462-023-10448-w.

GDELT Project (n.d.) Global Database of Events, Language, and Tone (GDELT 2.0). Available at: gdeltproject.org.

HistoGIS / Piechl, A. (compiler), based on Kunz, A. (n.d.) Holy Roman Empire Imperial Circles 1555 dataset. Austrian Centre for Digital Humanities. Available at: histogis.acdh.oeaw.ac.at.

Hogan, A., Blomqvist, E., Cochez, M. et al. (2021) ‘Knowledge Graphs’, ACM Computing Surveys, 54(4), Article 71. doi:10.1145/3447772.

Korzybski, A. (1933) Science and Sanity: An Introduction to Non-Aristotelian Systems and General Semantics. Lancaster, PA: Institute of General Semantics.

Lefebvre, H. (1974/1991) The Production of Space. Translated by D. Nicholson-Smith. Oxford: Basil Blackwell. Available at: archive.org.

Luo, L., Li, Y.-F., Haffari, G. and Pan, S. (2024) ‘Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning’, in Proceedings of ICLR 2024. arXiv:2310.01061.

Nussbaum, M. C. (2011) Creating Capabilities: The Human Development Approach. Cambridge, MA: The Belknap Press of Harvard University Press. Available at: hup.harvard.edu.

Openshaw, S. (1984) The Modifiable Areal Unit Problem. Concepts and Techniques in Modern Geography, No. 38. Norwich: Geo Books. Available at: uio.no.

Openshaw, S. and Taylor, P. J. (1979) ‘A Million or So Correlation Coefficients: Three Experiments on the Modifiable Areal Unit Problem’, in Wrigley, N. (ed.) Statistical Applications in the Spatial Sciences. London: Pion, pp. 127–144. Catalog: Semantic Scholar.

Ouma, P. O. et al. (2018) ‘Access to emergency hospital care provided by the public sector in sub-Saharan Africa in 2015: a geocoded inventory and spatial analysis’, The Lancet Global Health, 6(3), pp. e342–e350. doi:10.1016/S2214-109X(17)30488-6.

Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J. and Wu, X. (2024) ‘Unifying Large Language Models and Knowledge Graphs: A Roadmap’, IEEE Transactions on Knowledge and Data Engineering, 36(7), pp. 3580–3599. doi:10.1109/TKDE.2024.3352100.

Reiter, B. (2024) ‘In a world where political polarization and disengagement are denting democracy, does Botswana’s “kgotla” system hold the key?’, The Conversation, 25 November. Available at: theconversation.com.

Robinson, W. S. (1950) ‘Ecological Correlations and the Behavior of Individuals’, American Sociological Review, 15(3), pp. 351–357. Reprinted (2009) in International Journal of Epidemiology, 38(2), pp. 337–341. doi:10.1093/ije/dyn357.

Rowley, J. (2007) ‘The Wisdom Hierarchy: Representations of the DIKW Hierarchy’, Journal of Information Science, 33(2), pp. 163–180. doi:10.1177/0165551506070706.

Russell, S. J. and Norvig, P. (2020) Artificial Intelligence: A Modern Approach. 4th edn. Pearson. Available at: aima.cs.berkeley.edu.

Rwanda Biomedical Center, National Institute of Statistics of Rwanda and ICF (2018) Rwanda Malaria Indicator Survey 2017. Rockville, Maryland, USA: RBC, NISR, and ICF. Available at: dhsprogram.com.

Semakula, M., Niragire, F. and Faes, C. (2020) ‘Bayesian spatio-temporal modeling of malaria risk in Rwanda’, PLOS ONE, 15(9), e0238504. doi:10.1371/journal.pone.0238504.

Sen, A. (1999) Development as Freedom. Oxford: Oxford University Press. Available at: global.oup.com.

Simmel, G. (1908) ‘Die Kreuzung sozialer Kreise’, in Soziologie: Untersuchungen über die Formen der Vergesellschaftung. Leipzig: Duncker & Humblot. English translation (1955): ‘The Web of Group-Affiliations’, in Wolff, K. H. and Bendix, R. (eds.) Conflict and the Web of Group-Affiliations. New York: The Free Press. Wikipedia overview: en.wikipedia.org/wiki/Georg_Simmel.

Srinivasan, R. (2019) Beyond the Valley: How Innovators Around the World Are Overcoming Inequality and Creating the Technologies of Tomorrow. Cambridge, MA: MIT Press. Available at: mitpress.mit.edu.

Sun, J., Xu, C., Tang, L. et al. (2024) ‘Think-on-Graph: Deep and Responsible Reasoning of LLM on Knowledge Graph’, in Proceedings of ICLR 2024. arXiv:2307.07697.

Vrandečić, D. and Krötzsch, M. (2014) ‘Wikidata: A Free Collaborative Knowledgebase’, Communications of the ACM, 57(10), pp. 78–85. doi:10.1145/2629489.

Weiss, D. J. et al. (2020) ‘Global maps of travel time to healthcare facilities’, Nature Medicine, 26, pp. 1835–1838. doi:10.1038/s41591-020-1059-1.

World Health Organization (n.d.) Epidemic Intelligence from Open Sources (EIOS). Available at: who.int/initiatives/eios.
