### **Ontology for the Living Knowledge Map**


This document defines the data ontology for the Narrative Intelligence Platform, nicknamed Research Tool. It creates a "Living Knowledge Map," which is a dynamic, queryable representation of information and its evolution. This ontology moves beyond simple data storage to model context, change, and human insight as first-class citizens.

First, understand these three foundational concepts:

1.  **The Source Digital Twin (Pillar 1):** Research Tool doesn't just store data; it preserves its original context. The `:CanonicalSource`, `:SourceVersion`, and `:Element` nodes work together to create a high-fidelity, hierarchical replica of every source artifact. This ensures every piece of information can be traced back to its precise origin, solving the problem of context destruction.

2.  **Pre-Computed Analytical Insights (Pillar 2):** Research Tool doesn't wait for a query to understand meaning. The system pre-calculates the context and dynamics of the data.
    *   **Analytic Spines** (`:Temporal`, `:Geospatial`, `:SeverityLevel`, etc.) are pre-materialized continuums that provide the *axes* for analysis (when, where, how bad).
    *   **Analytic Arcs** (`:AnalyticArc` nodes) are pre-computed, delta-encoded relationships that explicitly model *change* between two states (e.g., growth, acceleration, escalation). This "cognitive offloading" makes analyzing dynamics fast and simple.

3.  **Human-in-the-Loop Verification (Pillar 3):** Research Tool ensures trust through a **Two-Tiered Knowledge System**. AI-generated insights (e.g., a detected change) are treated as *provisional* until a human expert validates them. The `:Annotation` node and `verified` properties across the graph are the mechanisms for capturing this expert judgment, making it a permanent, auditable part of the knowledge map.


#### **Pillar 1: The Source Integrity Layer (The Digital Twin)**

This layer represents the immutable, verifiable evidence from source artifacts.

_**Note for Clarity:** The nodes in this pillar (`:CanonicalSource`, `:SourceVersion`, `:Element`) are not just data containers. They are the building blocks of the **Source Digital Twin**, a core concept in Research Tool. Their relationships (`HAS_VERSION`, `PARENT_OF`, `NEXT_ELEMENT`) reconstruct the structure and sequence of the original document, preserving its context._

**1. `CanonicalSource` Node**

*   **Label:** `:CanonicalSource`
*   **Description:** The abstract, persistent identity of a conceptual document or media series (e.g., "AAPL 10-K Filings," "Patient P-0877's MRI Scans").
*   **Properties:**
    *   `source_id` (string, required, unique): A unique identifier for the conceptual source.
    *   `name` (string, optional): A human-readable name (e.g., "Patient P-0877 MRI Series").
    *   `uri` (string, required): The location of the original media, usually a blob store, an http page, etc.
*   **Relationships:**
    *   `-[HAS_VERSION]->(:SourceVersion)`

**2. `SourceVersion` Node**

*   **Label:** `:SourceVersion`
*   **Description:** A specific, concrete instance of a `CanonicalSource` at a point in time (e.g., "AAPL 10-K for fiscal year 2025"). This is the primary anchor for version-aware analysis.
*   **Properties:**
    *   `source_id` (string, required): The `source_id` of the parent `:CanonicalSource`. All versions of the same source share this value, so it is not unique across `:SourceVersion` nodes.
    *   `unique_id` (string, required, unique): A unique identifier for this specific version.
    *   `version_number` (integer, optional): A sequential version number (e.g., 1, 2, 3).
    *   `effective_date` (datetime, required): The real-world date of the artifact (e.g., filing date, scan date).
    *   `ingestion_date` (datetime, required): The timestamp of when the artifact was processed.
*   **Relationships:**
    *   `<-[HAS_VERSION]-(:CanonicalSource)`
    *   `-[SUPERSEDES]->(:SourceVersion)`: Connects to the previous version.
    *   `-[EFFECTIVE_ON]->(:Day)`: Links to the Temporal Spine.
    *   `-[CONTAINS]->(:Element)`: Connects to all structural elements that make up this version.

**3. `Element` Node**

*   **Label:** `:Element`
*   **Description:** A single, canonical structural element from a source artifact. This is the atomic unit of the digital twin.
*   **Properties:**
    *   `source_id` (string, required): The identifier of the source media. Every element of the same source shares this value.
    *   `source_location` (string, required): The location of the source media; can be a blob URI, an HTTP URL, local storage, etc. Shared by all elements of the same source.
    *   `unique_id` (string, required, unique): The primary key, linking elements between Neo4j, Elasticsearch, and the vector DB.
    *   `source_type` (string, required): The mime-type or nature of the original artifact (e.g., `pdf`, `html`, `mp4`).
    *   `element_type` (string, required): The type of element, e.g., `paragraph`, `section_header`, `table_cell`, `image`, `video_segment`.
    *   `sequence_number` (int, required): The sequence number of the node in the structure.
    *   `total_elements` (int, required): The total number of elements in the structure.
    *   `parent_id` (string, optional): The unique_id of the element's parent (e.g. a section header).
    *   `preceding_id` (string, optional): The unique_id of the preceding element.
    *   `text_content` (string, optional): The textual content of the element.
    *   `confidence_score` (float, optional): The confidence score (0.0-1.0) from the ingestion process (OCR, transcription).
    *   `uri` (string, optional): A URI to the raw artifact in blob storage (especially for `image` or `video_segment` types).
    *   `hash` (string, optional): A hash of the text_content, useful for quickly evaluating if the text has changed from one version to the other.

*   **Relationships:**
    *   `<-[CONTAINS]-(:SourceVersion)`
    *   `-[PARENT_OF]->(:Element)`: Models the document hierarchy (e.g., a paragraph's parent is a section header).
    *   `-[NEXT_ELEMENT]->(:Element)`: Models the sequential flow of the document. Following these relationships reconstructs the original order.
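
As a minimal sketch of how following the sequence links reconstructs the original order, the Python below walks `preceding_id` values over an in-memory list of element records. The dictionary layout and the sample IDs (`el-1`, `el-2`, `el-3`) are illustrative assumptions, not part of the ontology; a real deployment would traverse `NEXT_ELEMENT` in the graph database instead.

```python
from typing import Optional


def reconstruct_order(elements: list[dict]) -> list[dict]:
    """Return elements in document order by following preceding_id links."""
    # Index each element by the unique_id of the element that precedes it.
    by_preceding: dict[Optional[str], dict] = {
        e.get("preceding_id"): e for e in elements
    }
    ordered = []
    # The first element is the one with no predecessor.
    current = by_preceding.get(None)
    # The length guard stops the walk if the links accidentally form a cycle.
    while current is not None and len(ordered) < len(elements):
        ordered.append(current)
        current = by_preceding.get(current["unique_id"])
    return ordered


# Hypothetical elements, deliberately stored out of order.
elements = [
    {"unique_id": "el-3", "preceding_id": "el-2", "element_type": "paragraph"},
    {"unique_id": "el-1", "preceding_id": None, "element_type": "section_header"},
    {"unique_id": "el-2", "preceding_id": "el-1", "element_type": "paragraph"},
]
ordered_ids = [e["unique_id"] for e in reconstruct_order(elements)]
print(ordered_ids)
```

The same walk works regardless of storage order, which is why the sequence is modeled as explicit links rather than inferred from insertion order.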

**4. `Boilerplate` Node**

*   **Label:** `:Boilerplate`
*   **Description:** Similar to an `:Element`, a `:Boilerplate` is a single, canonical structural element from a source artifact. It is an atomic unit of the digital twin, built specifically to provide a baseline for comparison with other digital twins.
*   **Properties:**
    *   `source_id` (string, required): The identifier of the source media. Every element of the same source shares this value.
    *   `source_location` (string, required): The location of the source media; can be a blob URI, an HTTP URL, local storage, etc. Shared by all elements of the same source.
    *   `unique_id` (string, required, unique): The primary key, linking elements between Neo4j, Elasticsearch, and the vector DB.
    *   `source_type` (string, required): The mime-type or nature of the original artifact (e.g., `pdf`, `html`, `mp4`).
    *   `element_type` (string, required): The type of element, e.g., `paragraph`, `section_header`, `table_cell`, `image`, `video_segment`.
    *   `sequence_number` (int, required): The sequence number of the node in the structure.
    *   `total_elements` (int, required): The total number of elements in the structure.
    *   `parent_id` (string, optional): The unique_id of the element's parent (e.g. a section header).
    *   `preceding_id` (string, optional): The unique_id of the preceding element.
    *   `text_content` (string, optional): The textual content of the element.
    *   `confidence_score` (float, optional): The confidence score (0.0-1.0) from the ingestion process (OCR, transcription).
    *   `uri` (string, optional): A URI to the raw artifact in blob storage (especially for `image` or `video_segment` types).
    *   `hash` (string, optional): A hash of the text_content, useful for quickly evaluating if the text has changed from one version to the other.

*   **Relationships:**
    *   `<-[CONTAINS]-(:SourceVersion)`
    *   `-[PARENT_OF]->(:Element)`: Models the document hierarchy (e.g., a paragraph's parent is a section header).
    *   `-[NEXT_ELEMENT]->(:Element)`: Models the sequential flow of the document. Following these relationships reconstructs the original order.

> _**Design Note:** A `:Boilerplate` node has the same structure as an `:Element` node. It is modeled as a separate node type for conceptual clarity and query performance. It allows the system to perform highly efficient "template-based differential analysis" by isolating the baseline text from the evolving documents being compared against it._

### **Pillar 2: The Analytical Layer (Context & Dynamics)**

This is the heart of the system, where raw data is transformed into meaningful, queryable insight. It models not just *what* happened, but *how* and *why*.

_**Note for Clarity:** This pillar introduces Research Tool's most novel concepts. **Analytic Spines** give the data context, while **Analytic Arcs** give it dynamics. This is how we move from storing facts to modeling narratives._

#### **Part A: Detecting Change**

**5. `Change` Node**

*   **Label:** `:Change`
*   **Description:** An automatically generated node that objectively records *the fact that* a difference was detected between two versions of an `:Element` or between an `:Element` and a `:Boilerplate`. It is the "what" of the change detection process.
*   **Properties:**
    *   `unique_id` (string, required, unique): A unique ID for the change event itself.
    *   `change_type` (string, required): An enum describing the mechanical change, e.g., `MODIFICATION`, `ADDITION`, `DELETION`, `MOVE`.
    *   `verified` (bool, optional): Indicates whether the change has been verified by a user.
    *   `verified_by` (string, optional): The user_name of the person who performed the verification.
    *   `diff_algorithm` (string, optional): The method used for detection (e.g., `hash_mismatch`, `semantic_similarity > 0.95`).

*   **Relationships:**
    *   `-[APPLIES_TO]->(:SourceVersion)`: Indicates in which version this change appeared.
    *   `-[MODIFIED | ADDED | DELETED | MOVED]->(:Element)`: Points to the `:Element` node(s) affected by the change in the new version.
    *   `-[FROM_ELEMENT]->(:Element)`: In the case of `MODIFICATION` or `MOVE`, points to the original `:Element` in the prior version.
    *   `-[DEVIATES_FROM]->(:Boilerplate)`: In the case of template analysis, this links the change to the specific `:Boilerplate` element it differs from.
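
The following Python sketch illustrates how hash comparison could produce provisional `:Change` records. It makes a simplifying assumption that is not stated in the ontology: elements of two versions are paired by `sequence_number`. The output dictionary keys `from_element` and `to_element` are hypothetical stand-ins for the `FROM_ELEMENT` and `MODIFIED | ADDED | DELETED` relationships, and `MOVE` detection is left out for brevity.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash of text_content, comparable to the element's `hash` property."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(old: list[dict], new: list[dict]) -> list[dict]:
    """Compare two versions' elements, paired by sequence_number (an assumption)."""
    old_by_seq = {e["sequence_number"]: e for e in old}
    new_by_seq = {e["sequence_number"]: e for e in new}
    changes = []
    for seq in sorted(old_by_seq.keys() | new_by_seq.keys()):
        before, after = old_by_seq.get(seq), new_by_seq.get(seq)
        if before is None:
            change_type = "ADDITION"
        elif after is None:
            change_type = "DELETION"
        elif content_hash(before["text_content"]) != content_hash(after["text_content"]):
            change_type = "MODIFICATION"
        else:
            continue  # identical hashes: no change record needed
        changes.append({
            "change_type": change_type,
            "diff_algorithm": "hash_mismatch",
            "verified": False,  # provisional until an analyst reviews it
            "from_element": before["unique_id"] if before else None,
            "to_element": after["unique_id"] if after else None,
        })
    return changes


# Hypothetical elements from two versions of the same source.
v1 = [
    {"unique_id": "v1-e1", "sequence_number": 1, "text_content": "Revenue grew 5%."},
    {"unique_id": "v1-e2", "sequence_number": 2, "text_content": "No material risks."},
]
v2 = [
    {"unique_id": "v2-e1", "sequence_number": 1, "text_content": "Revenue grew 5%."},
    {"unique_id": "v2-e2", "sequence_number": 2, "text_content": "Litigation risk identified."},
    {"unique_id": "v2-e3", "sequence_number": 3, "text_content": "New supplier added."},
]
changes = detect_changes(v1, v2)
for change in changes:
    print(change["change_type"], change["from_element"], "->", change["to_element"])
```

Every record starts with `verified` set to `False`, matching the idea that machine-detected changes remain provisional until a person reviews them.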

> ### **Crucial Concept: From `Change` to `AnalyticArc`**
>
> Understanding the relationship between a `:Change` node and an `:AnalyticArc` is key to understanding Research Tool.
>
> 1.  **Automated Detection (The "What"):** The system's hashing and diffing algorithms automatically create a `:Change` node. It mechanically flags a modification, addition, or deletion. It is objective, fast, and machine-generated. **It finds the needle in the haystack.**
>
> 2.  **Human Interpretation (The "So What"):** An analyst is alerted to the unverified `:Change`. They review it and determine its *semantic meaning*. This act of human validation creates one or more **`:AnalyticArc`** nodes/relationships. **The Analytic Arc tells you how sharp the needle is, what it's made of, and which direction it's pointing.**
>
> This two-step process implements Research Tool's **Two-Tiered Knowledge System**. The machine proposes; the human disposes. The result is a high-fidelity, trustworthy, and auditable analysis of how information evolves.

#### **Part B: Enriching the Digital Twin (Entities and Topics)**

_**Note for Clarity:** The following nodes are used to add semantic layers to the Digital Twin. They are the primary output of the Analysis & Enrichment Engine and are often the subject of human-in-the-loop verification._

**6. `Enrichment` Node**

*   **Label:** `:Enrichment`

*   **Base Properties:**
    *   `unique_id` (string, required, unique): The primary key, linking elements between Neo4j, Elasticsearch, and the vector DB.
    *   `enrichment_type` (string, required): The type of enrichment, e.g., `entity`, `topic`, `summary`, `comparison`, `explanation`.
    *   `text_content` (string, optional): The textual content of the enrichment.
    *   `verified` (bool, optional): Indicates if the enrichment has been verified by a user.
    *   `verified_by` (string, optional): Indicates the user_name of the person doing the verification.
    *   `hash` (string, optional): A hash of the text_content, useful for quickly evaluating if the text has changed from one version to the other.

    ## **Entity Enrichment:** An enrichment element that points to an entity within the structure. An entity is a noun, such as a Person, Company, or Product.

    ### Unique Properties
    *   `entity_type` (string, required): The type of entity, e.g., `person`, `company`, `product`.

    ### Relationships:
    *   `-[IS_*_OF]->(:Entity)`: Connects entities to other entities. The wildcard stands for the specific relationship, e.g., Person entity `-[IS_CEO_OF]->` Company entity, Person entity `-[IS_FRIEND_OF]->` Person entity, Person entity `-[IS_SHAREHOLDER_OF]->` Company entity, Person entity `-[IS_VICTIM_OF]->` Company entity.
    *   `-[APPEARS_IN]->(:Element)`: Indicates which element(s) within or between documents are relevant to the entity

    ## **Topic Enrichment:** An enrichment element that points to a topic within the structure. A versatile and flexible element that can point to a type of content in the document (e.g., a use case) or the nature of the content (e.g., disagreement).

    ### Unique Properties
    *   `topic_type` (string, required): The type of topic, e.g., `use_case`, `quiz`, `subject`, `argument`, `disagreement`, `healthcare`, `aerospace`, `farming`, `foreign policy`, `war`.

    ### Relationships:
    *   `(:Element)<-[RELATES_TO]->(:Element)`: Can join elements, e.g., an entity and a structure element. In this way, a "disagrees with" topic can point to a person entity and one or more structure elements.
    *   `-[CONTAINED_IN]->(:Element)`: Indicates which element(s) within or between documents are relevant to the topic.

    ## **Summary Enrichment:** A special enrichment element that summarizes one or more elements within the structure.

    ### Unique Properties
    None

    ### Relationships:
    *   `-[SUMMARIZES]->(:Element)`: Indicates which element(s) within or between documents are being summarized.

    ## **Comparison Enrichment:** A special enrichment element that compares two or more elements within or between documents.

    ### Unique Properties
    None

    ### Relationships:
    *   `(:Element)<-[COMPARES]->(:Element)`: Indicates which element(s) within or between documents are being compared

    ## **Explanation Enrichment:** A special enrichment element that explains one or more elements within the structure.

    ### Unique Properties
    None

    ### Relationships:
    *   `-[EXPLAINS]->(:Element)`: Indicates which element(s) within or between documents are being explained.


#### **Part C: Analytic Spines and Arcs (Context & Dynamics)**

_**Note for Clarity:** Analytic Spines are pre-materialized axes of analysis. By materializing them as actual node chains, we make the analytical context itself discoverable and queryable, which is a massive performance and usability improvement over storing these values as simple properties._

**9. `Temporal Spine` Nodes**
*   **Description:** A temporal spine is a pre-created series of nodes that reflects a timeline. It is a special type of spine: while most spines reflect a numerical continuum, a temporal spine reflects a hierarchy of dates. Typically the nodes are `:Year` -> `:Month` -> `:Day`, with finer-grained time captured as a timestamp on the relationship from the element to the `:Day` node.
*   **Label:** `:Year`
    *   **Properties:**
        *   `year` (integer, required, unique): e.g., `2025`

*   **Label:** `:Month`
    *   **Properties:**
        *   `month_id` (string, required, unique): e.g., `"2025-05"`
        *   `month` (integer, required): e.g., `5`
        *   `year` (integer, required): e.g., `2025`

*   **Label:** `:Day`
    *   **Properties:**
        *   `date` (date, required, unique): The native graph database `date` type, e.g., `2025-05-21`. This is the natural primary key and allows for efficient date-based range queries.
        *   `day_of_week` (string, optional): e.g., `"Wednesday"`
        *   `day` (integer, required): e.g., `21`
        *   `month` (integer, required): e.g., `5`
        *   `year` (integer, required): e.g., `2025`

*   **Hierarchical Relationships (building the spine):**
    *   `(:Day)-[:IN_MONTH]->(:Month)`
    *   `(:Month)-[:IN_YEAR]->(:Year)`

*   **Data Connection Relationship (linking events to the spine):**
    *   `(:Event)-[:OCCURRED_ON]->(:Day)`

### **A Complete Example**

An event, like an MRI scan, that occurs on May 21, 2025, would be connected to the Temporal Spine like this:

```
// The pre-materialized spine hierarchy (created once, ahead of ingestion)
MERGE (year:Year {year: 2025})
MERGE (month:Month {month_id: "2025-05", month: 5, year: 2025})
MERGE (day:Day {date: date('2025-05-21'), day: 21, month: 5, year: 2025})
MERGE (month)-[:IN_YEAR]->(year)
MERGE (day)-[:IN_MONTH]->(month)

// The specific Event node
MERGE (event:Event {id: "mri_scan_123"})

// The connection to the most granular part of the spine
MERGE (event)-[:OCCURRED_ON]->(day)
```
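
Because the temporal spine is pre-created rather than built on demand, something has to generate it ahead of ingestion. The Python sketch below shows one way that generation could look for a date range, producing the node properties and hierarchy pairs defined above. The function name `build_temporal_spine` and the in-memory return shape are illustrative assumptions; a real script would write these nodes and relationships to the graph database.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_temporal_spine(start: date, end: date):
    """Generate :Year, :Month, and :Day records plus hierarchy pairs for [start, end]."""
    years, months, days = {}, {}, {}
    in_month, in_year = set(), set()
    current = start
    while current <= end:
        month_id = f"{current.year:04d}-{current.month:02d}"
        years[current.year] = {"year": current.year}
        months[month_id] = {"month_id": month_id, "month": current.month,
                            "year": current.year}
        days[current] = {
            "date": current,
            "day_of_week": DAY_NAMES[current.weekday()],
            "day": current.day,
            "month": current.month,
            "year": current.year,
        }
        in_month.add((current, month_id))      # (:Day)-[:IN_MONTH]->(:Month)
        in_year.add((month_id, current.year))  # (:Month)-[:IN_YEAR]->(:Year)
        current += timedelta(days=1)
    return years, months, days, in_month, in_year


years, months, days, in_month, in_year = build_temporal_spine(
    date(2025, 5, 1), date(2025, 6, 30)
)
print(len(years), len(months), len(days))
print(days[date(2025, 5, 21)]["day_of_week"])
```

Using dictionaries keyed by the natural primary keys (`year`, `month_id`, `date`) mirrors the uniqueness constraints in the node definitions, so re-running the generator over an overlapping range does not create duplicates.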
 
**10. `Geospatial Spine` Nodes**

*   **Label:** `:GeoCell`
*   **Description:** Represents a single, unique cell in the S2 grid system. The entire collection of these nodes forms the pre-materialized Geospatial Spine. The hierarchy from large areas to smaller areas is modeled via `PARENT_OF` relationships between `:GeoCell` nodes of different levels.
*   **Properties:**
    *   `s2_token` (string, required, unique): The compact, URL-safe string representation of the S2 cell ID. This is the natural primary key and should be indexed for fast lookups.
    *   `s2_id` (long, required, unique): The 64-bit integer ID of the S2 cell. Useful for certain library calculations.
    *   `level` (integer, required): The S2 level of the cell (typically 0-30). This allows for filtering by resolution (e.g., "show me all data aggregated at level 6").
    *   `name` (string, optional): A human-readable name, typically applied only to cells that align with well-known geographic features (e.g., "California", "Paris").
    *   `center_lat` (float, optional): The latitude of the cell's center. Storing this is a performance optimization that avoids having to calculate it from the token every time for display purposes.
    *   `center_lon` (float, optional): The longitude of the cell's center.

*   **Relationships:**
    *   `-[PARENT_OF]->(:GeoCell)`: Connects a parent cell to a child cell. For example, a level 5 cell would have a `PARENT_OF` relationship pointing to the four level 6 cells it contains. This is the core of the hierarchy.
    *   `<-[OCCURRED_AT]-(:Event)`: The crucial link from the Analytical Layer. This relationship grounds a specific `:Event` within a precise cell on the Geospatial Spine.

### **Method of Spine Creation (The Pre-Materialization Script)**

The spine is generated by a one-time script, independent of any ingested data. This script would perform the following logical steps:

1.  **Define Geographic Scopes:** Start with a list of large, relevant geographic areas (e.g., continents, countries, states/provinces).
2.  **Generate High-Level Cells:** For each scope (e.g., "USA"), use the S2 library to find the set of cells at a low `level` (e.g., level 4) that covers the area. For each of these S2 cells, create a `:GeoCell` node in the graph database.
3.  **Iteratively Build the Hierarchy:** For each `:GeoCell` node just created, use the S2 library to get its four child cells at the next level (level 5).
    *   Create a new `:GeoCell` node for each child.
    *   Create a `PARENT_OF` relationship from the parent node to the new child node.
4.  **Recurse to Desired Depth:** Repeat step 3, descending level by level, until you reach the desired analytical granularity. A good target is often level 12-14, where average cell areas range from roughly 5 km² down to roughly 0.3 km² (about the scale of a small neighborhood).

The result is a pre-calculated tree (or forest) of `:GeoCell` nodes in your graph, representing the entire world or your areas of interest at multiple resolutions.
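
The recursive descent in steps 3 and 4 can be sketched in Python as follows. To stay self-contained, this sketch derives child cells, levels, and tokens directly from the 64-bit S2 cell ID bit layout instead of calling an S2 library, and it starts from a whole cube face rather than from a covering of a named region. Both choices are assumptions made for illustration; a production script would more likely use an S2 library for coverings and cell arithmetic.

```python
def lsb(cell_id: int) -> int:
    """Lowest set bit of an S2 cell ID; its position encodes the cell's level."""
    return cell_id & -cell_id


def level(cell_id: int) -> int:
    """S2 level (0-30), derived from the number of trailing zero bits."""
    trailing_zeros = lsb(cell_id).bit_length() - 1
    return 30 - trailing_zeros // 2


def token(cell_id: int) -> str:
    """Compact token: the 16-digit hex ID with trailing zeros removed."""
    return f"{cell_id:016x}".rstrip("0")


def children(cell_id: int) -> list[int]:
    """The four child cell IDs one level below the given cell."""
    new_lsb = lsb(cell_id) >> 2
    return [cell_id + (2 * pos + 1 - 4) * new_lsb for pos in range(4)]


def build_geo_spine(root_ids: list[int], max_level: int):
    """Walk down from the root cells, emitting GeoCell records and PARENT_OF pairs."""
    nodes, parent_of = {}, []
    frontier = list(root_ids)
    while frontier:
        cell_id = frontier.pop()
        nodes[token(cell_id)] = {
            "s2_token": token(cell_id),
            "s2_id": cell_id,
            "level": level(cell_id),
        }
        if level(cell_id) < max_level:
            for child_id in children(cell_id):
                parent_of.append((token(cell_id), token(child_id)))
                frontier.append(child_id)
    return nodes, parent_of


face_0 = 1 << 60  # level-0 cell for cube face 0, used here as a stand-in root
nodes, parent_of = build_geo_spine([face_0], max_level=2)
print(len(nodes), len(parent_of))
```

Each level multiplies the cell count by four, so the target depth dominates the size of the spine; restricting the roots to the areas of interest keeps the node count manageable.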

### **Use Case (Bringing the Patent to Life)**

This structure directly enables the powerful queries described in the patent. Let's re-examine the query: *"Find all patient-reported 'gait instability' events of 'high' or 'moderate' severity that occurred in the last quarter within 50 km of a specific physical therapy facility."*

Here's how the S2-based spine makes this efficient:

1.  **Location Lookup:** The system takes the lat/lon coordinates of the physical therapy facility.
2.  **S2 Covering:** Instead of doing a radius search (which is computationally expensive), it uses the S2 library to calculate a "covering" set of S2 cells that approximates the 50km radius at a chosen `level` (e.g., level 8).
3.  **Graph Traversal:** The system now executes a simple graph query:
    *   Find all `:GeoCell` nodes where `s2_token` is in the covering set calculated in step 2.
    *   From this set of `Nearby_Locations`, traverse the incoming `OCCURRED_AT` relationships to find all `Relevant_Events`.
    *   Filter these `Relevant_Events` based on their links to the Temporal Spine (`OCCURRED_ON`) and the "Symptom Severity" Thematic Spine (`HAS_SEVERITY`).
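
The traversal in step 3 can be illustrated with a small in-memory Python sketch. The covering tokens, event IDs, dates, and the dictionary-based indexes are all hypothetical placeholders standing in for the S2 covering result and the graph relationships; the point is only the shape of the lookup (cells first, then events, then spine filters).

```python
from datetime import date

# Hypothetical output of an S2 covering computation for the 50 km radius.
covering_tokens = {"89c25", "89c27"}

# Index mirroring (:Event)-[:OCCURRED_AT]->(:GeoCell): s2_token -> event ids.
events_by_cell = {
    "89c25": ["evt-1", "evt-2"],
    "89c27": ["evt-3"],
    "80858": ["evt-4"],
}

# Each event's links to the Temporal Spine and the severity spine.
events = {
    "evt-1": {"symptom": "gait instability", "occurred_on": date(2025, 2, 10), "severity_rank": 3},
    "evt-2": {"symptom": "gait instability", "occurred_on": date(2024, 11, 2), "severity_rank": 2},
    "evt-3": {"symptom": "gait instability", "occurred_on": date(2025, 3, 5), "severity_rank": 1},
    "evt-4": {"symptom": "gait instability", "occurred_on": date(2025, 1, 20), "severity_rank": 3},
}


def find_events(covering, start, end, min_rank, symptom):
    """Traverse from covering cells to events, then filter on time and severity."""
    matches = []
    for cell_token in sorted(covering):
        for event_id in events_by_cell.get(cell_token, []):
            event = events[event_id]
            if (event["symptom"] == symptom
                    and start <= event["occurred_on"] <= end
                    and event["severity_rank"] >= min_rank):
                matches.append(event_id)
    return matches


result = find_events(covering_tokens, date(2025, 1, 1), date(2025, 3, 31),
                     2, "gait instability")
print(result)
```

Only events attached to cells in the covering set are ever inspected, so events elsewhere (such as `evt-4` here) cost nothing at query time.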

### **Advantages of this S2-based Approach**

*   **Performance:** It converts expensive, on-the-fly geospatial calculations into simple, index-based lookups and fast graph traversals. This is the essence of pre-computation described in the patent.
*   **Native Hierarchy:** The parent-child relationship is a natural and efficient way to model containment (city within a state, state within a country), which is a core requirement.
*   **Multi-Resolution Analysis:** It's trivial to perform analysis at any level. You can "roll up" data by simply querying for events linked to cells at a lower `level` (e.g., `level: 6`) to see a state-wide view, or "drill down" by querying at a higher `level` (e.g., `level: 12`) for a neighborhood view.
*   **Scalability:** The S2 system was designed by Google for planetary-scale spatial indexing, so this approach should continue to scale as the volume of data grows.

**11. `Analytic Spine` Nodes**
*   **Labels:** A generic label like `:SpineNode` and a specific label, e.g., `:SeverityLevel`.
*   **Description:** A pre-materialized, ordered series of nodes representing an analytical continuum. The order and value are defined by a `rank` property.
*   **Example Node (`:SeverityLevel`):**
    *   **Properties:**
        *   `spine_name` (string, required): "Symptom Severity"
        *   `name` (string, required): e.g., "Severe"
        *   `rank` (integer or float, required): e.g., `3`. This is the crucial property that makes the abstract concept machine-sortable and enables range queries.
        *   `description` (string, optional): "Life-threatening symptoms requiring immediate intervention."
*   **Relationships (within the spine):**
    *   `-[NEXT_LEVEL]->(:SeverityLevel)`: Connects nodes sequentially based on rank (e.g., from rank 2 to rank 3).
*   **Relationships (connecting data to the spine):**
    *   `(:Event)-[:HAS_SEVERITY]->(:SeverityLevel)`

### **Note on Analytic Spines**

A continuum within a graph typically has two node types and a relationship with a property:

#### **Typical Method: Value as a Relationship Property**

*   A central `:Event` node.
*   A single `:ThematicSpine` node (e.g., `:ThematicSpine {name: "Symptom Severity"}`).
*   A relationship connecting them: `(:Event)-[:HAS_SEVERITY {value: 3}]->(:ThematicSpine)`

This is clean and simple, but it has significant analytical limitations.

#### **Our Method: Materialized Node Continuum**

This model creates a chain of nodes for the spine itself:

*   A central `:Event` node.
*   A series of specific spine nodes:
    *   `(:SeverityLevel {name: "Mild", rank: 1})`
    *   `(:SeverityLevel {name: "Moderate", rank: 2})`
    *   `(:SeverityLevel {name: "Severe", rank: 3})`
*   Relationships forming the spine: `(:SeverityLevel {rank:1})-[:NEXT_LEVEL]->(:SeverityLevel {rank:2})`
*   A relationship connecting the data to the spine: `(:Event)-[:HAS_SEVERITY]->(:SeverityLevel {rank: 3})`

### **Why Our Method Is Superior**

The analytic spine's premise is built on pre-calculating and materializing context to enable high-performance, complex queries. Our model achieves this; the 'typical' model does not.

#### **Example: Discoverability and Contextual Anchoring**

*   **Question:** "What are the possible levels of severity I can search for?"
    *   **In Typical Model:** This is impossible to answer from the graph alone. You would have to scan every single `:HAS_SEVERITY` relationship to find all the unique `value` properties used. The context is not discoverable.
    *   **In Our Model:** This is a trivial and fast query: `MATCH (s:SeverityLevel) RETURN s.name, s.rank`. **The spine itself becomes a discoverable entry point for analysis.** An analyst can explore the *context* first, then see all the data associated with it, including levels that no event has used yet. This is a cornerstone of the patent's claims.

#### **Performance on Aggregation and Range Queries**

Our model is vastly more scalable for the types of queries the ontology is meant to support.

*   **Question:** "Find all events where severity was 'Moderate' or higher."
    *   **Typical Model:** The database must find *every* `:Event` node, inspect every outgoing `:HAS_SEVERITY` relationship, and check if its `value` property is `>= 2`. This is a full scan of all relationships of that type and is computationally expensive at scale.
    *   **Our Model:** The query is far more efficient. The database first finds the small set of `:SeverityLevel` nodes where `rank >= 2` (two nodes in the three-level example above). From those nodes, it traverses the incoming `:HAS_SEVERITY` relationships to find the relevant events. This is an index-assisted lookup followed by a traversal, so the work scales with the number of matching events rather than with the total number of relationships.
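
A minimal Python sketch of the "spine first, then events" access pattern is shown below. The dictionaries are in-memory stand-ins for the `:SeverityLevel` nodes and their incoming `HAS_SEVERITY` relationships, and the event IDs are invented for illustration.

```python
# Spine nodes, keyed by rank: (:SeverityLevel {name, rank}).
severity_levels = {1: "Mild", 2: "Moderate", 3: "Severe"}

# Incoming HAS_SEVERITY relationships, grouped by the spine node they point to.
events_by_rank = {
    1: ["evt-a", "evt-b"],
    2: ["evt-c"],
    3: ["evt-d", "evt-e"],
}


def events_at_or_above(min_rank: int) -> list[str]:
    """Select the few spine nodes in range, then collect their attached events."""
    selected_ranks = [rank for rank in sorted(severity_levels) if rank >= min_rank]
    return [event_id
            for rank in selected_ranks
            for event_id in events_by_rank.get(rank, [])]


# Discoverability: the available levels are readable from the spine itself.
available_levels = [(name, rank) for rank, name in sorted(severity_levels.items())]
print(available_levels)

moderate_or_higher = events_at_or_above(2)
print(moderate_or_higher)
```

The range condition is evaluated against three spine nodes, not against every event, which is the behavior the materialized continuum is meant to provide.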

#### **Semantic Richness and Clarity**

*   A node `:SeverityLevel {name: "Severe", rank: 3}` is far more semantically meaningful than a property `{value: 3}`. It explicitly states what `3` means.
*   The `NEXT_LEVEL` relationships between the spine nodes explicitly model the continuum itself, making the graph self-describing and more auditable.

**12. `Event` Node**

*   **Label:** `:Event`
*   **Description:** A core concept in Research Tool. This nexus entity implements the "hub-and-spoke" model, linking *what* happened (`IS_INSTANCE_OF`) to its full multi-dimensional context: *when* (`OCCURRED_ON`), *where* (`OCCURRED_AT`), *how bad* (`HAS_SEVERITY`), and *what's the proof* (`EVIDENCED_BY`).
*   **Properties:**
    *   `id` (string, required, unique): A unique identifier for the event.
    *   `duration_seconds` (integer, optional): Used for events that have a start and end time.
*   **Relationships:**
    *   `-[IS_INSTANCE_OF]->(:SemanticNode)`: e.g., `->(:Symptom {name: "Headache"})`. This defines *what* the event is.
    *   `-[OCCURRED_ON]->(:Day)`: Links to the Temporal Spine.
    *   `-[OCCURRED_AT]->(:GeoCell)`: Links to the Geospatial Spine.
    *   `-[HAS_SEVERITY | HAS_RISK | ...]->(:ThematicSpineNode)`: Links to a relevant Thematic Spine.
    *   `-[EVIDENCED_BY]->(:Element)`: The crucial link back to the source document, providing the grounding evidence for the event.

**13. `Analytic Arc`**

*   **Type:** `:AnalyticArc`
*   **Description:** The explicit, structural representation of a dynamic change and Research Tool's primary tool for analyzing evolution. This node **reifies** a relationship, turning the abstract concept of "change" (e.g., growth, acceleration, escalation) into a first-class, queryable data entity. It is the pre-computed, human-validated insight that powers high-performance narrative analysis.

*   **Connects:** `(:State)` to `(:State)`. Any node that can be part of a time series (like `:Event` or `:SourceVersion`) should also receive the `:State` label to conform to the `ndl_graph_construction.md` guide.
*   **Properties:**
    *   `unique_id` (string, required, unique): The primary key for the arc.
    *   `spine` (string, required): The name of the spine the change occurred on (e.g., "TumorVolume", "CorporateRisk").
    *   `type` (string, required): A semantic classification of the change (e.g., `growth`, `decline`, `escalation`, `acceleration`).
    *   `unit` (string, optional): e.g., `"percent"`, `"mm"`, `"USD"`, `"days"`.
    *   `value` (float, required): The quantitative magnitude of the change.
    *   `delta_t` (integer, optional): The time elapsed in seconds/days between the two states.
    *   `change_in_rate` (float, optional): A value representing the change from the *previous* arc's value. For example, if the previous growth was 5% and this growth is 8%, the `change_in_rate` would be `3`. This allows for direct numerical queries on the rate of acceleration itself.
    *   `competes_with` (list, optional): Analysts can disagree on the nature of an analytic arc. This property models that disagreement within the ontology by listing the IDs of the other arcs that assess the same dynamic event (e.g., `competes_with: ["arc_id_123", "arc_id_456"]`). This allows an analyst to query directly for points of expert disagreement.
    *   `verified` (bool, required): The verification status (`true` or `false`).
    *   `verified_by` (string, optional): The ID of the user who validated the arc.

*   **Relationships:**
    *   `(:State)<-[:HAS_STATE_BEFORE]-(:AnalyticArc)`: Connects the analysis to the "before" state (e.g., the V1 `:Element`).
    *   `(:AnalyticArc)-[:HAS_STATE_AFTER]->(:State)`: Connects the analysis to the "after" state (e.g., the V2 `:Element`).
    *   `(:AnalyticArc)-[:COMPETES_WITH]->(:AnalyticArc)`: A direct, queryable, and performant relationship between competing analyses of the same event.
    *   `(:AnalyticArc)-[:JUSTIFIED_BY]->(:Annotation)`: Connects the analysis to the human commentary explaining it.
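
To make `value` and `change_in_rate` concrete, the Python sketch below derives one provisional arc per consecutive pair of states and reproduces the 5% to 8% example from the property description. The state records, the ID format, and the use of days for `delta_t` are illustrative assumptions; zero change is not classified separately in this sketch.

```python
def build_arcs(states: list[dict], spine: str) -> list[dict]:
    """Derive one provisional arc per consecutive pair of states."""
    arcs = []
    for before, after in zip(states, states[1:]):
        percent_change = (after["value"] - before["value"]) / before["value"] * 100
        arc = {
            "unique_id": f"arc-{before['id']}-{after['id']}",
            "spine": spine,
            "type": "growth" if percent_change > 0 else "decline",
            "unit": "percent",
            "value": round(percent_change, 2),
            "delta_t": after["day"] - before["day"],  # elapsed days (assumed unit)
            "verified": False,  # provisional until a human validates it
            "change_in_rate": None,  # no previous arc to compare against yet
        }
        if arcs:
            # Difference between this arc's value and the previous arc's value.
            arc["change_in_rate"] = round(arc["value"] - arcs[-1]["value"], 2)
        arcs.append(arc)
    return arcs


# Hypothetical measurements at three consecutive states.
states = [
    {"id": "s1", "day": 0, "value": 100.0},
    {"id": "s2", "day": 30, "value": 105.0},
    {"id": "s3", "day": 60, "value": 113.4},
]
arcs = build_arcs(states, spine="TumorVolume")
for arc in arcs:
    print(arc["unique_id"], arc["type"], arc["value"], arc["change_in_rate"])
```

Storing `change_in_rate` on the arc means a question such as "where is growth accelerating?" becomes a simple property filter instead of a comparison across neighboring arcs at query time.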

> _**Design Note on Reification:** We model the `AnalyticArc` as a **node** instead of just a relationship so we can attach rich metadata to it (like `value`, `type`, `verified_by`, etc.) and connect other nodes to it (like `:Annotation`s or competing `:AnalyticArc`s). This graph modeling pattern is essential for enabling the complex, multi-faceted analysis Research Tool performs._

#### **Pillar 3: The Social Layer (Collaboration & Trust)**

_**Note for Clarity:** The `:Annotation` node is the primary mechanism for the **Social Pillar** and the **Human-in-the-Loop Feedback Cycle**. Every time an expert creates an annotation of type `validation`, `correction`, or `classification`, they are not just adding a comment; they are generating high-quality training data and promoting provisional knowledge (Tier 2) to verified truth (Tier 1)._

**14. `Annotation` Node**

*   **Label:** `:Annotation`
*   **Description:** A first-class entity representing a user's input, insight, or judgment.
*   **Properties:**
    *   `unique_id` (string, required, unique): Unique ID for the annotation.
    *   `text_content` (string, required): The user's comment/insight.
    *   `annotation_type` (string, required): e.g., `key_insight`, `question`, `correction`, `validation`, `classification`.
    *   `classification_spine` (string, optional): Makes the annotation machine-readable even if the annotation text is ambiguous, e.g., "CorporateRisk".
    *   `classification_value` (int, optional): Makes the annotation machine-readable even if the annotation text is ambiguous, e.g., `4` would correspond to a node in the "CorporateRisk" spine.
    *   `status` (string, optional): Introduces a workflow and governance layer, e.g., `proposed`, `approved`, `rejected`.
    *   `createdBy` (string, required): The ID of the user who created it.
    *   `createdAt` (datetime, required): The timestamp of creation.
    *   `hash` (string, optional): A hash of the text_content, useful for quickly evaluating if the text has changed from one version to the other.

*   **Relationships:**
    *   `-[ANNOTATES]->(Any Node or Relationship)`: An annotation can point to an `:Element` node, an `:Event`, or even an `:EVOLVED_ON_SPINE` relationship to capture commentary *about the change itself*.
    *   `(:Annotation)-[:REPLIES_TO]->(:Annotation)`
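
The promotion from provisional to verified knowledge could look like the Python sketch below. The rule it applies (only an annotation of type `validation`, `correction`, or `classification` with status `approved` sets `verified`) is an assumption for illustration; the ontology names the annotation types and statuses but does not prescribe this exact rule. The `annotated_by` key and all IDs are hypothetical.

```python
PROMOTING_TYPES = {"validation", "correction", "classification"}


def apply_annotation(target: dict, annotation: dict) -> dict:
    """Record an annotation on a node and promote the node if the rule is met."""
    target.setdefault("annotated_by", []).append(annotation["unique_id"])
    is_expert_judgment = annotation["annotation_type"] in PROMOTING_TYPES
    # Assumed rule: only an approved judgment promotes Tier 2 to Tier 1.
    if is_expert_judgment and annotation.get("status") == "approved":
        target["verified"] = True
        target["verified_by"] = annotation["createdBy"]
    return target


change = {"unique_id": "chg-1", "change_type": "MODIFICATION", "verified": False}

question = {
    "unique_id": "ann-1", "annotation_type": "question", "status": "proposed",
    "text_content": "Is this wording change material?", "createdBy": "analyst-3",
}
validation = {
    "unique_id": "ann-2", "annotation_type": "validation", "status": "approved",
    "text_content": "Confirmed: new litigation risk disclosed.", "createdBy": "analyst-7",
}

apply_annotation(change, question)
verified_after_question = change["verified"]
apply_annotation(change, validation)
print(verified_after_question, change["verified"], change["verified_by"])
```

Both annotations stay attached to the node, so the audit trail keeps the open question alongside the judgment that resolved it.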

### **Key Relationships Summary**

For quick reference, these are some of the most important relationships that drive the system's logic:

*   **Structural Relationships:**
    *   `-[HAS_VERSION]->`, `-[SUPERSEDES]->`: Builds the document version history.
    *   `-[PARENT_OF]->`, `-[NEXT_ELEMENT]->`: Constructs the Source Digital Twin's hierarchy and sequence.
*   **Analytical Context Relationships:**
    *   `-[OCCURRED_ON]->`, `-[OCCURRED_AT]->`: Anchors an `:Event` to the Temporal and Geospatial Spines.
    *   `-[HAS_SEVERITY]->`, `-[HAS_RISK]->`: Links an `:Event` to a Thematic Spine.
    *   `-[EVIDENCED_BY]->`: The crucial link grounding an analytical `:Event` back to its source `:Element`.
*   **Dynamic & Evolutionary Relationships:**
    *   `(:State)<-[:HAS_STATE_BEFORE]-(:AnalyticArc)-[:HAS_STATE_AFTER]->(:State)`: The pattern that connects an `:AnalyticArc` analysis to the two states it describes.
    *   `-[DEVIATES_FROM]->`: Connects a change directly to a `:Boilerplate` element.
*   **Social & Curation Relationships:**
    *   `-[ANNOTATES]->`: Allows a user to attach their insight to any node or relationship in the graph.
    *   `-[COMPETES_WITH]->`: Explicitly models expert disagreement between two `:AnalyticArc` analyses.
