What is a Graph Database?

A graph database stores data as nodes, relationships, and properties instead of tables and rows. This structure mirrors how data exists in the real world—as interconnected entities with meaningful relationships.

Think of it as sketching ideas on a whiteboard: you draw circles (entities) and connect them with arrows (relationships). A graph database stores data exactly that way.

What are the core components?

Component	Description	Example
Node	An entity or object in your domain	`Person`, `Document`, `Application`
Label	A tag that classifies nodes into groups	`:User`, `:Admin`, `:Resource`
Relationship	A named, directed connection between two nodes	`OWNS`, `MEMBER_OF`, `CAN_ACCESS`
Property	A key-value pair on nodes or relationships	`email: "alice@example.com"`
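
To make the four components concrete, here is a minimal in-memory sketch in Python. The `Node` and `Relationship` classes and the sample data are illustrative assumptions; they are not how Neo4j or the IKG represent data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with labels (classification) and properties (key-value pairs)."""
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    """A named, directed connection from `start` to `end` with its own properties."""
    type: str
    start: Node
    end: Node
    properties: dict[str, object] = field(default_factory=dict)


# Hypothetical sample data: one person, one document, one relationship.
alice = Node({"Person", "User"}, {"email": "alice@example.com"})
report = Node({"Document"}, {"id": "X"})
owns = Relationship("OWNS", alice, report, {"granted_on": "2024-01-15"})

print(owns.start.properties["email"], owns.type, owns.end.properties["id"])
```

Note that the relationship carries its own property (`granted_on`) and knows its direction, which is what separates it from a plain foreign key.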

Why are relationships important?

In graph databases, relationships are first-class citizens—they are stored natively alongside nodes, not computed at query time.

This is fundamentally different from relational databases, where relationships are represented as foreign keys and computed through JOIN operations.

What does "first-class citizen" mean?

Stored natively: Each node physically points to its connected nodes.
Always available: Relationships exist as persistent data, not derived from keys.
Rich with properties: Relationships can have their own attributes (e.g., GRANTED_ON: "2024-01-15").
Directional: Every relationship has a start node, end node, and type.

How do graph databases compare to relational databases?

Why do relational databases struggle with connected data?

Relational databases were designed for structured, tabular data. When data becomes highly connected, they face significant challenges:

Challenge	Relational Database	Graph Database
Representing relationships	Foreign keys in separate tables	Native relationship objects
Querying relationships	JOIN operations across tables	Direct traversal between nodes
Multi-hop queries	One JOIN per hop, cost compounds with depth	Simple path traversal
Schema changes	ALTER TABLE, migrations, possible downtime	Add nodes/relationships on the fly
Performance at scale	Tends to degrade as JOINs and table sizes grow	Depends mainly on the size of the traversed subgraph

What is the JOIN problem?

In relational databases, answering the question "Can Alice access Document X?" might require:

Query the users table to find Alice.
JOIN with user_groups to find her groups.
JOIN with group_roles to find roles for those groups.
JOIN with role_permissions to find permissions for those roles.
JOIN with resource_permissions to check if any permission grants access to Document X.

Each JOIN adds latency and computational cost. As the number of hops increases, intermediate result sets can grow multiplicatively, so query time rises steeply with depth.
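
The five steps above can be sketched with Python's built-in `sqlite3` module. The table and column names follow the list, but the exact schema and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE user_groups (user_id INTEGER, group_id INTEGER);
CREATE TABLE group_roles (group_id INTEGER, role_id INTEGER);
CREATE TABLE role_permissions (role_id INTEGER, permission_id INTEGER);
CREATE TABLE resource_permissions (permission_id INTEGER, resource_id TEXT);

INSERT INTO users VALUES (1, 'alice@example.com'), (2, 'bob@example.com');
INSERT INTO user_groups VALUES (1, 10);
INSERT INTO group_roles VALUES (10, 100);
INSERT INTO role_permissions VALUES (100, 1000);
INSERT INTO resource_permissions VALUES (1000, 'X');
""")

# One JOIN per hop: user -> group -> role -> permission -> resource.
ACCESS_QUERY = """
SELECT COUNT(*)
FROM users u
JOIN user_groups ug ON ug.user_id = u.id
JOIN group_roles gr ON gr.group_id = ug.group_id
JOIN role_permissions rp ON rp.role_id = gr.role_id
JOIN resource_permissions res ON res.permission_id = rp.permission_id
WHERE u.email = ? AND res.resource_id = ?
"""


def can_access(email: str, resource_id: str) -> bool:
    """Return True if at least one complete chain links the user to the resource."""
    (count,) = conn.execute(ACCESS_QUERY, (email, resource_id)).fetchone()
    return count > 0


print(can_access("alice@example.com", "X"))  # Alice has a full chain to X
print(can_access("bob@example.com", "X"))    # Bob has no group membership
```

Every additional level in the permission model adds another table and another JOIN to this query.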

How does a graph database solve this?

The same query in a graph database:

MATCH (alice:User {email: "alice@example.com"})-[:MEMBER_OF]->(:Group)-[:HAS_ROLE]->(:Role)-[:CAN_ACCESS]->(doc:Document {id: "X"})
RETURN doc

The graph database traverses from Alice through her groups and roles to the document in a single pattern match. Query time depends mainly on how many nodes and relationships are connected to Alice, not on the total size of the dataset, because the traversal only visits that neighborhood.
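
As a rough illustration of what such a traversal does, the sketch below follows the same three relationship types over an in-memory adjacency map. The `relate` and `follow` helpers and the node names are hypothetical; a real graph database plans and executes this very differently.

```python
from collections import defaultdict

# Outgoing edges: node id -> relationship type -> set of neighbour node ids.
edges: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))


def relate(start: str, rel_type: str, end: str) -> None:
    """Store a directed, typed relationship on its start node."""
    edges[start][rel_type].add(end)


relate("alice", "MEMBER_OF", "engineering")
relate("engineering", "HAS_ROLE", "editor")
relate("editor", "CAN_ACCESS", "doc-X")
relate("bob", "MEMBER_OF", "sales")


def follow(start: str, rel_types: list[str]) -> set[str]:
    """Follow a fixed sequence of relationship types and return the nodes reached."""
    frontier = {start}
    for rel_type in rel_types:
        frontier = {
            end
            for node in frontier
            for end in edges.get(node, {}).get(rel_type, ())
        }
    return frontier


PATH = ["MEMBER_OF", "HAS_ROLE", "CAN_ACCESS"]
print("doc-X" in follow("alice", PATH))
print("doc-X" in follow("bob", PATH))
```

Only the nodes reachable from the starting user are ever touched; unrelated users, groups, and documents play no part in the lookup.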

When should I use a graph database?

Graph databases excel when relationships between data points matter more than the individual points themselves.

What are the ideal use cases?

Use Case	Why Graphs Work Better
Identity & Access Management	Users, groups, roles, permissions form a natural hierarchy of relationships
Authorization	Access decisions depend on traversing relationship paths
Fraud Detection	Detecting fraud requires finding suspicious patterns across connected entities
Recommendation Engines	"Users who liked X also liked Y" is a relationship query
Knowledge Graphs	Representing semantic relationships between concepts
Network & IT Operations	Infrastructure components are interconnected by nature
Master Data Management	Connecting data across siloed systems

When are relational databases still appropriate?

Primarily tabular data with few relationships.
Simple CRUD operations on isolated records.
Transactional systems with rigid schemas.
Reporting and analytics on flat data.

Why is a graph database essential for authorization?

Authorization is fundamentally about relationships:

Does this user have a relationship to this resource that permits this action?
Is this user a member of a group that has a role with permission to access this resource?

These questions are naturally expressed as graph traversals.

What problems do traditional authorization systems have?

Problem	With Relational DB	With Graph DB
Role explosion	Create thousands of roles to cover all combinations	Model relationships directly, no artificial roles needed
Policy sprawl	Maintain long ACLs for every resource	Define policies as graph patterns
Complex hierarchies	Recursive queries or denormalization	Natural hierarchy traversal
Real-time decisions	Cached permissions, stale data	Live traversal of current state
Audit trails	Difficult to trace decision path	Clear, traceable graph paths

What is the Identity Knowledge Graph (IKG)?

The Identity Knowledge Graph is IndyKite's graph database that stores all identity and resource data. It is the foundation of the IndyKite platform.

What does the IKG contain?

Human identities: Users, customers, employees with their attributes.
Non-human identities: Applications, services, devices, AI agents.
Resources: Documents, APIs, data, any protected asset.
Relationships: How all these entities connect to each other.
Context: Attributes, metadata, and environmental data.

Why is the IKG important for IndyKite?

Every IndyKite capability operates on the IKG:

Capability	How it uses the IKG
Capture	Stores nodes and relationships in the graph
KBAC (AuthZEN)	Evaluates policies by traversing graph relationships
ContX IQ (CIQ)	Reads and updates graph data with authorization
Token Introspect	Maps token claims to nodes in the graph
Outbound Events	Triggers events when graph data changes

Why is a graph database important for KBAC?

Knowledge-Based Access Control (KBAC) is IndyKite's authorization model that uses the IKG to make intelligent, context-aware access decisions.

How does KBAC use the graph?

KBAC policies are defined as graph patterns. When an authorization request arrives, IndyKite:

Maps the request to nodes in the IKG (subject, resource, action).
Evaluates the policy by traversing relationships in the graph.
Returns a decision based on whether the pattern exists.

Example: Can Alice drive Car X?

Policy definition:

{
	"subject": { "type": "Person" },
	"actions": ["CAN_DRIVE"],
	"resource": { "type": "Car" },
	"condition": {
		"cypher": "MATCH (subject:Person)-[:OWNS]->(resource:Car)"
	}
}

This policy says: a Person is permitted the CAN_DRIVE action on a Car if an OWNS relationship leads from that Person to that Car.

The graph database traverses from Alice to Car X, looking for an OWNS relationship. If it exists, access is granted. If not, access is denied.
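
The decision logic can be sketched in a few lines of Python. This is not IndyKite's implementation: the `policy` dictionary below is a simplified stand-in in which the Cypher condition is reduced to the one relationship type it requires, and `is_allowed` and the sample data are hypothetical.

```python
# Node id -> label, and a set of (start, relationship type, end) triples.
labels = {"alice": "Person", "bob": "Person", "car-x": "Car"}
relationships = {("alice", "OWNS", "car-x")}

# Simplified stand-in for the policy above (assumption, not the real format).
policy = {
    "subject_type": "Person",
    "action": "CAN_DRIVE",
    "resource_type": "Car",
    "required_relationship": "OWNS",
}


def is_allowed(subject: str, action: str, resource: str) -> bool:
    """Allow only if types and action match the policy and the pattern exists."""
    if labels.get(subject) != policy["subject_type"]:
        return False
    if labels.get(resource) != policy["resource_type"]:
        return False
    if action != policy["action"]:
        return False
    # The decision is simply whether the required relationship is present.
    return (subject, policy["required_relationship"], resource) in relationships


print(is_allowed("alice", "CAN_DRIVE", "car-x"))
print(is_allowed("bob", "CAN_DRIVE", "car-x"))
```

The point of the sketch is that the decision reduces to "does this pattern exist in the graph right now", with no precomputed permission list.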

Why can't relational databases do this efficiently?

Variable-depth traversals: "Can Alice access any document in any project she's a member of?" requires recursive queries.
Multiple relationship types: Different paths may grant access (direct ownership, group membership, role assignment).
Real-time evaluation: Authorization must be checked at request time, not from cached data.
Contextual attributes: Decisions may depend on attributes along the path, not just endpoints.
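
For comparison, this is roughly what a variable-depth lookup looks like in SQL. The sketch uses SQLite's `WITH RECURSIVE` through Python's `sqlite3` module; the `member_of` and `can_access` tables and their rows are assumptions chosen to model nested membership.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE member_of (member TEXT, parent TEXT);
CREATE TABLE can_access (grantee TEXT, document TEXT);

INSERT INTO member_of VALUES
    ('alice', 'team-a'), ('team-a', 'dept-eng'), ('dept-eng', 'company');
INSERT INTO can_access VALUES
    ('dept-eng', 'design-doc'), ('company', 'handbook'), ('team-b', 'sales-plan');
""")

# The recursive CTE walks upward through memberships until no new parents appear.
RECURSIVE_QUERY = """
WITH RECURSIVE ancestors(name) AS (
    SELECT parent FROM member_of WHERE member = ?
    UNION
    SELECT m.parent FROM member_of m JOIN ancestors a ON m.member = a.name
)
SELECT document FROM can_access
WHERE grantee IN (SELECT name FROM ancestors)
ORDER BY document
"""


def documents_for(member: str) -> list[str]:
    """Documents granted to any group the member belongs to, at any depth."""
    return [row[0] for row in db.execute(RECURSIVE_QUERY, (member,))]


print(documents_for("alice"))
```

The query works, but it handles only one relationship type. Each further path that can grant access (direct ownership, role assignment) needs its own tables and its own branch in the SQL.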

Why is a graph database important for CIQ?

ContX IQ (CIQ) delivers authorized data retrieval and mutation. It uses the graph to:

Define what data can be accessed: Policies specify graph patterns.
Query related data: Knowledge Queries traverse relationships to find results.
Update connected data: Create or modify nodes and relationships in context.

Example: Get license plates for a person's vehicles

Policy pattern:

MATCH (person:Person)-[:ACCEPTED]->(contract:Contract)-[:COVERS]->(vehicle:Vehicle)-[:HAS]->(ln:LicenseNumber)

This query traverses from a Person through their Contracts to covered Vehicles and their License Numbers. The work is proportional to the contracts, vehicles, and license numbers linked to that person, not to how many exist in the system overall.

How does graph performance scale?

Why is graph traversal fast?

Native graph databases such as Neo4j use index-free adjacency: each node directly references its connected nodes. This means:

No index lookups per hop: An index may be used to find the starting node, but relationships are then followed as direct pointers.
Local traversal: Queries only visit relevant nodes, not the entire dataset.
Roughly constant time per hop: The cost of a hop depends on the number of relationships on the current node, not on the total amount of data stored.
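
A small Python sketch can illustrate the idea. Each `Node` object holds direct references to its neighbours, and the traversal touches the same number of nodes whether the graph holds ten unrelated nodes or a hundred thousand. The class, helper names, and graph shape are assumptions for illustration and say nothing about Neo4j's storage format.

```python
class Node:
    """A node that keeps direct references to its neighbours."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.neighbours: list["Node"] = []


def count_visited(start: Node, hops: int) -> int:
    """Count the distinct nodes touched when expanding `hops` steps from `start`."""
    seen = {start}
    frontier = [start]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbour in node.neighbours:  # direct reference, no global lookup
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return len(seen)


def build_graph(unrelated_nodes: int) -> tuple[Node, list[Node]]:
    """Build alice -> group -> doc, plus nodes that are not connected to Alice."""
    alice, group, doc = Node("alice"), Node("group"), Node("doc")
    alice.neighbours.append(group)
    group.neighbours.append(doc)
    hub = Node("hub")
    filler = [Node(f"other-{i}") for i in range(unrelated_nodes)]
    for node in filler:
        node.neighbours.append(hub)
    return alice, filler


small_start, small_rest = build_graph(10)
large_start, large_rest = build_graph(100_000)
print(count_visited(small_start, 2), count_visited(large_start, 2))
```

Both calls visit only Alice, her group, and the document; the unrelated nodes are never inspected.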

How does this compare to JOIN performance?

Aspect	Relational JOIN	Graph Traversal
Time complexity	Up to O(n × m) per JOIN without indexes, roughly O(n log m) with a B-tree index	O(k) where k = nodes and relationships visited
Multi-hop queries	Cost compounds with each additional JOIN	Grows with the size of the traversed neighborhood
Index dependency	Requires careful index design	Built-in via adjacency
Dataset growth	JOINs slow as tables grow	Largely independent of total size

Graph databases can traverse millions of relationships per second, and traversal cost stays tied to the connected subgraph rather than to total data volume as the dataset grows over time.

How does schema flexibility help?

What is schema-optional?

Graph databases like Neo4j are schema-optional: you can add new node types, relationship types, and properties without schema migrations.

Why does this matter for identity and authorization?

Evolving requirements: Add new entity types (AI agents, IoT devices) without restructuring.
Heterogeneous data: Different nodes can have different properties.
Integration: Connect data from multiple sources with different schemas.
Rapid iteration: Most model changes need no schema migration or downtime.

Example: Adding AI agents to your system

With a relational database, adding AI agents as a new identity type might require:

Creating new tables (ai_agents, ai_agent_permissions).
Modifying existing tables to reference the new tables.
Updating all JOIN queries to include the new tables.
Running migrations and potentially causing downtime.

With a graph database:

Create nodes with the :AIAgent label.
Create relationships to existing resources.
Existing queries continue to work.
New queries can traverse to AI agents immediately.
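
The four graph-side steps can be mimicked with a toy in-memory graph in Python. The `nodes` and `relationships` structures, the `who_can_access` helper, and names such as `invoice-bot` are hypothetical; the sketch only shows that adding a new label leaves existing lookups untouched.

```python
from typing import Optional

# Node id -> set of labels, and a set of (start, relationship type, end) triples.
nodes: dict[str, set[str]] = {
    "alice": {"User"},
    "billing-api": {"Resource"},
}
relationships: set[tuple[str, str, str]] = {("alice", "CAN_ACCESS", "billing-api")}


def who_can_access(resource: str, label: Optional[str] = None) -> set[str]:
    """Ids of nodes with CAN_ACCESS to `resource`, optionally filtered by label."""
    return {
        start
        for start, rel_type, end in relationships
        if rel_type == "CAN_ACCESS"
        and end == resource
        and (label is None or label in nodes[start])
    }


before = who_can_access("billing-api", "User")

# Add a new kind of identity: a new label and a relationship, no schema change.
nodes["invoice-bot"] = {"AIAgent"}
relationships.add(("invoice-bot", "CAN_ACCESS", "billing-api"))

after = who_can_access("billing-api", "User")      # existing query, same result
agents = who_can_access("billing-api", "AIAgent")  # new query sees the agent

print(before == after, agents)
```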

What is Cypher?

Cypher is Neo4j's declarative query language for graph databases. It is designed to express graph patterns intuitively.

How does Cypher work?

Cypher uses ASCII art-like syntax to describe graph patterns:

(node) — Represents a node
-[:RELATIONSHIP]-> — Represents a directed relationship
{property: value} — Filters by property

Example queries

Find all documents Alice can access:

MATCH (alice:User {email: "alice@example.com"})-[:CAN_ACCESS]->(doc:Document)
RETURN doc

Find access through group membership:

MATCH (alice:User {email: "alice@example.com"})-[:MEMBER_OF]->(group:Group)-[:CAN_ACCESS]->(doc:Document)
RETURN doc

Variable-length path (1 to 5 MEMBER_OF hops, for nested groups):

MATCH (alice:User {email: "alice@example.com"})-[:MEMBER_OF*1..5]->(group:Group)-[:CAN_ACCESS]->(doc:Document)
RETURN doc
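
To show what the `*1..5` bound means, the following Python sketch performs a comparable bounded expansion over nested groups. It assumes an acyclic membership hierarchy, and the `member_of` and `can_access` maps and both helper functions are made up for illustration; Cypher's own path semantics are richer than this.

```python
# member -> groups it belongs to directly; group -> documents it can access.
member_of = {
    "alice": {"team-a"},
    "team-a": {"dept-eng"},
    "dept-eng": {"company"},
}
can_access = {"company": {"handbook"}, "team-a": {"runbook"}}


def groups_within(start: str, min_hops: int, max_hops: int) -> set[str]:
    """Groups reachable from `start` in between min_hops and max_hops steps."""
    reached: set[str] = set()
    frontier = {start}
    for hop in range(1, max_hops + 1):
        frontier = {g for node in frontier for g in member_of.get(node, set())}
        if not frontier:
            break  # nothing further to expand
        if hop >= min_hops:
            reached |= frontier
    return reached


def documents(start: str, min_hops: int = 1, max_hops: int = 5) -> set[str]:
    """Documents accessible through any group within the hop bounds."""
    return {
        doc
        for group in groups_within(start, min_hops, max_hops)
        for doc in can_access.get(group, set())
    }


print(sorted(documents("alice")))        # bounds 1..5
print(sorted(documents("alice", 1, 1)))  # direct memberships only
```

Tightening the upper bound limits how deep the nested memberships are followed, which is why an explicit range is usually preferable to an unbounded `*`.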

How does IndyKite use Cypher?

IndyKite KBAC and CIQ policies use Cypher in the condition.cypher field to define graph patterns. When a policy is evaluated, IndyKite executes the Cypher pattern against the IKG to determine access.

How does the IKG support real-time decisions?

What problem does real-time authorization solve?

Traditional authorization systems often use cached permissions:

Permissions are computed periodically and stored.
Changes take time to propagate.
Stale data can grant access that should be revoked.

How does the IKG enable real-time decisions?

The IKG maintains the current state of all entities and relationships:

Live data: Authorization queries traverse current relationships.
Immediate revocation: Delete a relationship and access is revoked from the next request onward.
Contextual factors: Evaluate time, location, device at request time.
No cache invalidation: No need to manage permission caches.
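
The difference between a cached permission and a live traversal can be shown with a toy example in Python. The `relationships` set, the snapshot-style `permission_cache`, and both check functions are hypothetical and stand in for a periodic permission job and a live graph lookup respectively.

```python
# Current state of the graph as (start, relationship type, end) triples.
relationships = {("alice", "CAN_ACCESS", "doc-X")}


def live_check(user: str, resource: str) -> bool:
    """Evaluate against the current relationships at request time."""
    return (user, "CAN_ACCESS", resource) in relationships


# Snapshot computed ahead of time, as a periodic permission job might do.
permission_cache = {
    (start, end) for start, rel_type, end in relationships if rel_type == "CAN_ACCESS"
}


def cached_check(user: str, resource: str) -> bool:
    """Evaluate against the snapshot, which may lag behind the graph."""
    return (user, resource) in permission_cache


# Revoke Alice's access by deleting the relationship.
relationships.discard(("alice", "CAN_ACCESS", "doc-X"))

print(live_check("alice", "doc-X"))    # reflects the revocation
print(cached_check("alice", "doc-X"))  # still allows access until refreshed
```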

What IKG options does IndyKite provide?

When creating a Project in IndyKite, you choose how to provision your IKG:

Option	Description	Best for
Managed IKG	IndyKite hosts and manages the Neo4j database	Quick start, no database management overhead
Bring Your Own DB	Connect your own Neo4j instance	Existing Neo4j investment, custom requirements

How do I connect my own Neo4j database?

Provide the connection details when creating your Project:

URL: Neo4j connection string (e.g., neo4j+s://xxxxx.databases.neo4j.io)
Username: Database user
Password: Database password
Database name: The specific database to use

You can get a free Neo4j instance from:

Summary: Why graphs for IndyKite?

The graph database is essential to IndyKite because:

Requirement	Why Graphs Deliver
Identity relationships	Users, groups, roles, resources are naturally connected
Authorization decisions	Access depends on relationship paths, not table rows
Real-time evaluation	Traverse current state, not cached permissions
Policy flexibility	Express rules as graph patterns (Cypher)
Performance at scale	Millions of traversals per second, consistent latency
Schema evolution	Add new identity types without migrations
Context awareness	Attributes on nodes and relationships inform decisions
Auditability	Clear paths show why access was granted or denied

Next Steps

Environment setup: Environment Guide
Capture data: Developer Hub Resources
KBAC policies: Dynamic Authorization Guide
CIQ queries: ContX IQ Guide
AuthZEN: AuthZEN Guide
Terraform: Terraform Guide
Neo4j documentation: Neo4j Getting Started
Cypher reference: Cypher Manual
