
Why use a Graph Database?

Understanding why graph databases are essential for identity, authorization, and the IndyKite platform.

What is a Graph Database?

A graph database stores data as nodes, relationships, and properties instead of tables and rows. This structure mirrors how data exists in the real world—as interconnected entities with meaningful relationships.

Think of it as sketching ideas on a whiteboard: you draw circles (entities) and connect them with arrows (relationships). A graph database stores data exactly that way.

What are the core components?

| Component | Description | Example |
| --- | --- | --- |
| Node | An entity or object in your domain | Person, Document, Application |
| Label | A tag that classifies nodes into groups | :User, :Admin, :Resource |
| Relationship | A named, directed connection between two nodes | OWNS, MEMBER_OF, CAN_ACCESS |
| Property | A key-value pair on nodes or relationships | email: "alice@example.com" |

Why are relationships important?

In graph databases, relationships are first-class citizens—they are stored natively alongside nodes, not computed at query time.

This is fundamentally different from relational databases, where relationships are represented as foreign keys and computed through JOIN operations.

What does "first-class citizen" mean?

  • Stored natively: Each node physically points to its connected nodes.
  • Always available: Relationships exist as persistent data, not derived from keys.
  • Rich with properties: Relationships can have their own attributes (e.g., grantedOn: "2024-01-15").
  • Directional: Every relationship has a start node, end node, and type.

How do graph databases compare to relational databases?

Why do relational databases struggle with connected data?

Relational databases were designed for structured, tabular data. When data becomes highly connected, they face significant challenges:

| Challenge | Relational database | Graph database |
| --- | --- | --- |
| Representing relationships | Foreign keys, often in separate join tables | Native relationship objects |
| Querying relationships | JOIN operations across tables | Direct traversal between nodes |
| Multi-hop queries | One JOIN per hop; cost compounds with depth | Simple path traversal |
| Schema changes | ALTER TABLE, migrations, possible downtime | Add nodes and relationships on the fly |
| Performance at scale | Degrades as JOINs and table sizes grow | Depends on the part of the graph traversed, not total data size |

What is the JOIN problem?

In relational databases, answering the question "Can Alice access Document X?" might require:

  1. Query the users table to find Alice.
  2. JOIN with user_groups to find her groups.
  3. JOIN with group_roles to find roles for those groups.
  4. JOIN with role_permissions to find permissions for those roles.
  5. JOIN with resource_permissions to check if any permission grants access to Document X.

Each JOIN adds latency and computational cost. Every hop can multiply the number of intermediate rows the database must match, so query cost grows quickly as hops are added.
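The JOIN chain above can be made concrete with a minimal, runnable sketch that uses Python's built-in sqlite3 module. The table names follow the five steps. The column names, the sample rows, and the `can_access` helper are hypothetical and exist only for illustration.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Create an in-memory schema mirroring the five steps (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
        CREATE TABLE user_groups (user_id INTEGER, group_id INTEGER);
        CREATE TABLE group_roles (group_id INTEGER, role_id INTEGER);
        CREATE TABLE role_permissions (role_id INTEGER, permission_id INTEGER);
        CREATE TABLE resource_permissions (resource_id TEXT, permission_id INTEGER);

        INSERT INTO users VALUES (1, 'alice@example.com'), (2, 'bob@example.com');
        INSERT INTO user_groups VALUES (1, 10);               -- Alice is in group 10
        INSERT INTO group_roles VALUES (10, 100);             -- group 10 has role 100
        INSERT INTO role_permissions VALUES (100, 1000);      -- role 100 has permission 1000
        INSERT INTO resource_permissions VALUES ('X', 1000);  -- permission 1000 grants Document X
        """
    )
    return conn

def can_access(conn: sqlite3.Connection, email: str, resource_id: str) -> bool:
    """Answer 'Can <email> access <resource_id>?' with a four-JOIN query."""
    row = conn.execute(
        """
        SELECT 1
        FROM users u
        JOIN user_groups ug          ON ug.user_id = u.id
        JOIN group_roles gr          ON gr.group_id = ug.group_id
        JOIN role_permissions rp     ON rp.role_id = gr.role_id
        JOIN resource_permissions rs ON rs.permission_id = rp.permission_id
        WHERE u.email = ? AND rs.resource_id = ?
        LIMIT 1
        """,
        (email, resource_id),
    ).fetchone()
    return row is not None

conn = build_db()
print(can_access(conn, "alice@example.com", "X"))  # True
print(can_access(conn, "bob@example.com", "X"))    # False
```

Each extra hop in the access path, such as nested groups, would add another JOIN to this statement.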

How does a graph database solve this?

The same query in a graph database:

MATCH (alice:User {email: "alice@example.com"})-[:MEMBER_OF]->(:Group)-[:HAS_ROLE]->(:Role)-[:CAN_ACCESS]->(doc:Document {id: "X"})
RETURN doc

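The graph database follows the path from Alice through her groups and roles to the document in a single query. In this simplified model, the CAN_ACCESS relationship stands in for the two permission tables. The traversal only visits nodes connected to Alice, so query cost depends on the size of that neighborhood rather than on the size of the entire dataset.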
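The same traversal can be sketched in plain Python with an adjacency map. In this map, each node holds direct references to its neighbors, grouped by relationship type. This in-memory model is an assumption made for illustration and does not reflect how Neo4j or the IKG store data. The node IDs and the `relate`, `follow`, and `path_exists` helpers are hypothetical.

```python
# adjacency[node][relationship_type] -> set of neighbor node IDs
adjacency: dict = {}

def relate(src: str, rel: str, dst: str) -> None:
    """Store a directed relationship as a direct reference from src to dst."""
    adjacency.setdefault(src, {}).setdefault(rel, set()).add(dst)

def follow(nodes: set, rel: str) -> set:
    """Follow one hop of type `rel` from every node in `nodes`."""
    result = set()
    for node in nodes:
        result |= adjacency.get(node, {}).get(rel, set())
    return result

def path_exists(start: str, rels: list, target: str) -> bool:
    """Check whether `target` is reachable from `start` along `rels`, in order."""
    frontier = {start}
    for rel in rels:
        frontier = follow(frontier, rel)
        if not frontier:  # dead end: nothing left to expand
            return False
    return target in frontier

relate("user:alice", "MEMBER_OF", "group:eng")
relate("group:eng", "HAS_ROLE", "role:editor")
relate("role:editor", "CAN_ACCESS", "doc:X")

print(path_exists("user:alice", ["MEMBER_OF", "HAS_ROLE", "CAN_ACCESS"], "doc:X"))  # True
```

Nodes that are not connected to Alice along these three relationship types never enter the frontier, so they add no work to this check.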

When should I use a graph database?

Graph databases excel when relationships between data points matter more than the individual points themselves.

What are the ideal use cases?

| Use case | Why graphs work better |
| --- | --- |
| Identity & Access Management | Users, groups, roles, permissions form a natural hierarchy of relationships |
| Authorization | Access decisions depend on traversing relationship paths |
| Fraud Detection | Detecting fraud requires finding suspicious patterns across connected entities |
| Recommendation Engines | "Users who liked X also liked Y" is a relationship query |
| Knowledge Graphs | Representing semantic relationships between concepts |
| Network & IT Operations | Infrastructure components are interconnected by nature |
| Master Data Management | Connecting data across siloed systems |

When are relational databases still appropriate?

  • Primarily tabular data with few relationships.
  • Simple CRUD operations on isolated records.
  • Transactional systems with rigid schemas.
  • Reporting and analytics on flat data.

Why is a graph database essential for authorization?

Authorization is fundamentally about relationships:

  • Does this user have a relationship to this resource that permits this action?
  • Is this user a member of a group that has a role with permission to access this resource?

These questions are naturally expressed as graph traversals.

What problems do traditional authorization systems have?

| Problem | With relational DB | With graph DB |
| --- | --- | --- |
| Role explosion | Create thousands of roles to cover all combinations | Model relationships directly, no artificial roles needed |
| Policy sprawl | Maintain long ACLs for every resource | Define policies as graph patterns |
| Complex hierarchies | Recursive queries or denormalization | Natural hierarchy traversal |
| Real-time decisions | Cached permissions, stale data | Live traversal of current state |
| Audit trails | Difficult to trace decision path | Clear, traceable graph paths |

What is the Identity Knowledge Graph (IKG)?

The Identity Knowledge Graph is IndyKite's graph database that stores all identity and resource data. It is the foundation of the IndyKite platform.

What does the IKG contain?

  • Human identities: Users, customers, employees with their attributes.
  • Non-human identities: Applications, services, devices, AI agents.
  • Resources: Documents, APIs, data, any protected asset.
  • Relationships: How all these entities connect to each other.
  • Context: Attributes, metadata, and environmental data.

Why is the IKG important for IndyKite?

Every IndyKite capability operates on the IKG:

| Capability | How it uses the IKG |
| --- | --- |
| Capture | Stores nodes and relationships in the graph |
| KBAC (AuthZEN) | Evaluates policies by traversing graph relationships |
| ContX IQ (CIQ) | Reads and updates graph data with authorization |
| Token Introspect | Maps token claims to nodes in the graph |
| Outbound Events | Triggers events when graph data changes |

Why is a graph database important for KBAC?

Knowledge-Based Access Control (KBAC) is IndyKite's authorization model that uses the IKG to make intelligent, context-aware access decisions.

How does KBAC use the graph?

KBAC policies are defined as graph patterns. When an authorization request arrives, IndyKite:

  1. Maps the request to nodes in the IKG (subject, resource, action).
  2. Evaluates the policy by traversing relationships in the graph.
  3. Returns a decision based on whether the pattern exists.

Example: Can Alice drive Car X?

Policy definition:

{
	"subject": { "type": "Person" },
	"actions": ["CAN_DRIVE"],
	"resource": { "type": "Car" },
	"condition": {
		"cypher": "MATCH (subject:Person)-[:OWNS]->(resource:Car)"
	}
}

This policy says: a Person is granted the CAN_DRIVE action on a Car if an OWNS relationship runs from that Person to that Car.

The graph database traverses from Alice to Car X, looking for an OWNS relationship. If it exists, access is granted. If not, access is denied.
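The three evaluation steps listed earlier can be sketched in Python for this example. This model is hypothetical and simplified. The policy is mirrored as a Python dict, and instead of executing the Cypher condition, the sketch checks for a single relationship type (`OWNS`) that it assumes was extracted from that pattern. It does not reflect IndyKite's actual evaluation engine or API.

```python
# Hypothetical in-memory nodes: node ID -> label
nodes = {"alice": "Person", "bob": "Person", "car-x": "Car"}
# Directed relationships stored as (start, type, end) triples
relationships = {("alice", "OWNS", "car-x")}

policy = {
    "subject": {"type": "Person"},
    "actions": ["CAN_DRIVE"],
    "resource": {"type": "Car"},
    # Assumed simplification of: MATCH (subject:Person)-[:OWNS]->(resource:Car)
    "condition": {"relationship": "OWNS"},
}

def evaluate(policy: dict, subject_id: str, action: str, resource_id: str) -> str:
    # Step 1: map the request to nodes and check them against the policy's types.
    if nodes.get(subject_id) != policy["subject"]["type"]:
        return "DENY"
    if nodes.get(resource_id) != policy["resource"]["type"]:
        return "DENY"
    if action not in policy["actions"]:
        return "DENY"
    # Step 2: evaluate the condition by looking for the relationship pattern.
    rel = policy["condition"]["relationship"]
    # Step 3: return a decision based on whether the pattern exists.
    return "ALLOW" if (subject_id, rel, resource_id) in relationships else "DENY"

print(evaluate(policy, "alice", "CAN_DRIVE", "car-x"))  # ALLOW
print(evaluate(policy, "bob", "CAN_DRIVE", "car-x"))    # DENY
```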

Why can't relational databases do this efficiently?

  • Variable-depth traversals: "Can Alice access any document in any project or sub-project she's a member of?" requires recursive queries when projects can be nested to an unknown depth.
  • Multiple relationship types: Different paths may grant access (direct ownership, group membership, role assignment).
  • Real-time evaluation: Authorization must be checked at request time, not from cached data.
  • Contextual attributes: Decisions may depend on attributes along the path, not just endpoints.
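The variable-depth case is where relational queries get awkward. The following minimal sketch uses Python's sqlite3 module and assumes hypothetical `memberships`, `project_parent`, and `documents` tables. It needs a recursive common table expression (CTE) to collect every sub-project before it can check documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memberships (user_id TEXT, project TEXT);
    CREATE TABLE project_parent (child TEXT, parent TEXT);
    CREATE TABLE documents (doc TEXT, project TEXT);

    INSERT INTO memberships VALUES ('alice', 'P1');
    INSERT INTO project_parent VALUES ('P2', 'P1'), ('P3', 'P2');  -- P1 > P2 > P3
    INSERT INTO documents VALUES ('D1', 'P1'), ('D3', 'P3'), ('D9', 'P9');
    """
)

ACCESSIBLE_DOCS = """
WITH RECURSIVE reachable(project) AS (
    -- Anchor: projects the user is a direct member of
    SELECT project FROM memberships WHERE user_id = ?
    UNION
    -- Recursive step: walk down to child projects, one level per iteration
    SELECT pp.child FROM project_parent pp
    JOIN reachable r ON pp.parent = r.project
)
SELECT DISTINCT d.doc FROM documents d
JOIN reachable r ON d.project = r.project
ORDER BY d.doc
"""

docs = [row[0] for row in conn.execute(ACCESSIBLE_DOCS, ("alice",))]
print(docs)  # ['D1', 'D3']
```

The query does not fix the depth in advance. The recursion stops when no new child projects are found, and `UNION` (rather than `UNION ALL`) discards repeated projects.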

Why is a graph database important for CIQ?

ContX IQ (CIQ) delivers authorized data retrieval and mutation. It uses the graph to:

  • Define what data can be accessed: Policies specify graph patterns.
  • Query related data: Knowledge Queries traverse relationships to find results.
  • Update connected data: Create or modify nodes and relationships in context.

Example: Get license plates for a person's vehicles

Policy pattern:

MATCH (person:Person)-[:ACCEPTED]->(contract:Contract)-[:COVERS]->(vehicle:Vehicle)-[:HAS]->(ln:LicenseNumber)

This query traverses from a Person through their Contracts to the covered Vehicles and their License Numbers. The graph database follows only the relationships connected to that person. The cost therefore depends on how many contracts and vehicles that person has, not on how many exist in the whole system.
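Here is a minimal Python sketch of the same traversal. It assumes a hypothetical in-memory graph of relationship triples, plus a `number` property on each LicenseNumber node; the source does not specify that property name. The sketch shows that the query only expands outward from one person.

```python
# Hypothetical graph: relationship triples plus node properties
edges = [
    ("person:alice", "ACCEPTED", "contract:c1"),
    ("contract:c1", "COVERS", "vehicle:v1"),
    ("contract:c1", "COVERS", "vehicle:v2"),
    ("vehicle:v1", "HAS", "ln:1"),
    ("vehicle:v2", "HAS", "ln:2"),
    ("person:bob", "ACCEPTED", "contract:c2"),
    ("contract:c2", "COVERS", "vehicle:v3"),
    ("vehicle:v3", "HAS", "ln:3"),
]
properties = {
    "ln:1": {"number": "AB-123"},
    "ln:2": {"number": "CD-456"},
    "ln:3": {"number": "EF-789"},
}

# Index outgoing edges by (node, relationship type) for direct neighbor lookup
outgoing: dict = {}
for start, rel, end in edges:
    outgoing.setdefault((start, rel), []).append(end)

def traverse(start: str, rel_types: list) -> set:
    """Follow the relationship types in order and return the final set of nodes."""
    frontier = {start}
    for rel in rel_types:
        frontier = {end for node in frontier for end in outgoing.get((node, rel), [])}
    return frontier

plates = sorted(
    properties[ln]["number"]
    for ln in traverse("person:alice", ["ACCEPTED", "COVERS", "HAS"])
)
print(plates)  # ['AB-123', 'CD-456']
```

Bob's contract and vehicle are never visited while Alice's plates are collected.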

How does graph performance scale?

Why is graph traversal fast?

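Graph databases with native graph storage, such as Neo4j, use index-free adjacency: each node directly references its connected nodes. This means:

  • No index lookups per hop: Relationships are stored as direct pointers; an index is typically needed only to find the starting node.
  • Local traversal: Queries only visit relevant nodes, not the entire dataset.
  • Constant time per hop: Following a relationship costs about the same no matter how large the graph is, so adding unrelated data doesn't slow down individual traversals.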
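To illustrate the locality claim, the sketch below counts how many nodes a two-hop traversal touches. It then adds 100,000 unrelated nodes and counts again. It is a simplified Python model, not Neo4j's storage format, and the `two_hop` helper and node names are hypothetical.

```python
def two_hop(adjacency: dict, start: str) -> set:
    """Return the set of nodes visited while following two hops from `start`."""
    visited = {start}
    frontier = [start]
    for _ in range(2):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):  # direct neighbor lookup
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return visited

adjacency = {"alice": ["g1", "g2"], "g1": ["doc1"], "g2": ["doc2", "doc3"]}
before = len(two_hop(adjacency, "alice"))

# Grow the graph with 100,000 nodes that are not connected to Alice
for i in range(100_000):
    adjacency[f"other{i}"] = [f"other{(i + 1) % 100_000}"]
after = len(two_hop(adjacency, "alice"))

print(before, after)  # 6 6
```

The dictionary lookup stands in for a direct pointer. In both cases the work depends on the neighborhood, not on the total node count. A hop from a node with many relationships still costs more than a hop from a node with few.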

How does this compare to JOIN performance?

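| Aspect | Relational JOIN | Graph traversal |
| --- | --- | --- |
| Time complexity | Depends on table sizes and join strategy (up to O(n × m) for a nested-loop JOIN) | O(k), where k = nodes and relationships visited |
| Multi-hop queries | Cost compounds with each additional JOIN | Grows with path length and the number of matching paths |
| Index dependency | Requires careful index design | Built in via adjacency |
| Dataset growth | JOINs slow down as tables grow | Largely unaffected by total size |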

A native graph database can traverse millions of relationships per second. Per-query latency stays stable as the total dataset grows, because each query touches only its local neighborhood.

How does schema flexibility help?

What is schema-optional?

Graph databases like Neo4j are schema-optional: you can add new node types, relationship types, and properties without schema migrations.

Why does this matter for identity and authorization?

  • Evolving requirements: Add new entity types (AI agents, IoT devices) without restructuring.
  • Heterogeneous data: Different nodes can have different properties.
  • Integration: Connect data from multiple sources with different schemas.
  • Rapid iteration: Model changes don't require downtime or migrations.

Example: Adding AI agents to your system

With a relational database, adding AI agents as a new identity type might require:

  1. Creating new tables (ai_agents, ai_agent_permissions).
  2. Modifying existing tables to reference the new tables.
  3. Updating all JOIN queries to include the new tables.
  4. Running migrations and potentially causing downtime.

With a graph database:

  1. Create nodes with the :AIAgent label.
  2. Create relationships to existing resources.
  3. Existing queries continue to work.
  4. New queries can traverse to AI agents immediately.
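Steps 1 through 4 can be sketched in Python with a hypothetical in-memory model, in which each node carries a set of labels and relationships are stored as triples. The agent ID and the `docs_for` and `who_can_access` helpers are illustrative assumptions and are not part of any IndyKite schema.

```python
nodes = {"alice": {"User"}, "doc:X": {"Document"}}
rels = {("alice", "CAN_ACCESS", "doc:X")}

def docs_for(subject: str) -> set:
    """Existing query: documents a subject can access directly."""
    return {
        end for start, rel, end in rels
        if start == subject and rel == "CAN_ACCESS" and "Document" in nodes[end]
    }

def who_can_access(doc: str) -> dict:
    """New query: every subject with CAN_ACCESS to `doc`, mapped to its labels."""
    return {start: nodes[start] for start, rel, end in rels if rel == "CAN_ACCESS" and end == doc}

before = docs_for("alice")

# Steps 1-2: add an AI agent node and connect it to an existing resource.
# No schema change is needed: the new label is just data on the node.
nodes["agent:summarizer"] = {"AIAgent"}
rels.add(("agent:summarizer", "CAN_ACCESS", "doc:X"))

# Step 3: the existing query returns the same result as before.
print(docs_for("alice") == before)     # True
# Step 4: new queries reach the agent immediately.
print(sorted(who_can_access("doc:X")))  # ['agent:summarizer', 'alice']
```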

What is Cypher?

Cypher is Neo4j's declarative query language for graph databases. It is designed to express graph patterns intuitively.

How does Cypher work?

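Cypher uses an ASCII-art-like syntax to describe graph patterns:

  • (node): a node, optionally bound to a variable such as alice
  • :Label: a label on a node, such as :User
  • -[:RELATIONSHIP]->: a directed relationship of the given type
  • {property: value}: a property filter inside a node or relationship pattern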

Example queries

Find all documents Alice can access:

MATCH (alice:User {email: "alice@example.com"})-[:CAN_ACCESS]->(doc:Document)
RETURN doc

Find access through group membership:

MATCH (alice:User {email: "alice@example.com"})-[:MEMBER_OF]->(group:Group)-[:CAN_ACCESS]->(doc:Document)
RETURN doc

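Variable-length path (one to five MEMBER_OF hops):

MATCH (alice:User {email: "alice@example.com"})-[:MEMBER_OF*1..5]->(group:Group)-[:CAN_ACCESS]->(doc:Document)
RETURN DISTINCT doc

Nested groups can lead to the same document through several paths, so DISTINCT returns each document only once.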
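The hop bound and the need to de-duplicate can be illustrated with a small Python breadth-first search over a hypothetical group hierarchy. The search follows one to five MEMBER_OF hops, then one CAN_ACCESS hop, and collects documents in a set, which mimics the meaning of the variable-length query. It does not show how Neo4j executes it, and the group and document names are made up.

```python
from collections import deque

member_of = {  # hypothetical MEMBER_OF edges: member -> groups it belongs to
    "alice": ["team-a", "team-b"],
    "team-a": ["dept"],
    "team-b": ["dept"],
    "dept": ["division"],
    "division": ["org"],
    "org": ["holding"],
    "holding": ["board"],  # 6 hops from alice: beyond the 1..5 bound
}
can_access = {  # hypothetical CAN_ACCESS edges: group -> documents
    "team-a": ["doc1"],
    "team-b": ["doc1"],  # same document reachable through a second group
    "dept": ["doc2"],
    "org": ["doc3"],
    "board": ["doc-secret"],
}

def accessible_docs(start: str, max_hops: int = 5) -> set:
    """Documents reachable via 1..max_hops MEMBER_OF hops, then one CAN_ACCESS hop."""
    depth = {start: 0}  # shortest hop count at which each node was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the upper bound
        for group in member_of.get(node, []):
            if group not in depth:
                depth[group] = depth[node] + 1
                queue.append(group)
    groups = [g for g, d in depth.items() if d >= 1]
    # A set de-duplicates documents reachable through several groups
    return {doc for g in groups for doc in can_access.get(g, [])}

print(sorted(accessible_docs("alice")))  # ['doc1', 'doc2', 'doc3']
```

Raising `max_hops` to 6 would bring `doc-secret` into the result. The upper bound in the pattern is what keeps the traversal from wandering too far.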

How does IndyKite use Cypher?

IndyKite KBAC and CIQ policies use Cypher in the condition.cypher field to define graph patterns. When a policy is evaluated, IndyKite executes the Cypher pattern against the IKG to determine access.

How does the IKG support real-time decisions?

What problem does real-time authorization solve?

Traditional authorization systems often use cached permissions:

  • Permissions are computed periodically and stored.
  • Changes take time to propagate.
  • Stale data can grant access that should be revoked.

How does the IKG enable real-time decisions?

The IKG maintains the current state of all entities and relationships:

  • Live data: Authorization queries traverse current relationships.
  • Immediate revocation: Delete a relationship and access is revoked on the next authorization check.
  • Contextual factors: Evaluate time, location, device at request time.
  • No cache invalidation: No need to manage permission caches.
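The difference between cached and live evaluation can be sketched in a few lines of Python. The `PermissionCache` class and the relationship set are hypothetical stand-ins. One answers from a snapshot computed earlier; the other checks the current relationships on every request.

```python
relationships = {("alice", "CAN_ACCESS", "doc:X")}

def live_check(subject: str, resource: str) -> bool:
    """Evaluate against the current relationships at request time."""
    return (subject, "CAN_ACCESS", resource) in relationships

class PermissionCache:
    """Snapshot of permissions computed at some earlier point in time."""

    def __init__(self, rels: set):
        self.snapshot = set(rels)  # copy: later changes are not seen

    def check(self, subject: str, resource: str) -> bool:
        return (subject, "CAN_ACCESS", resource) in self.snapshot

cache = PermissionCache(relationships)

# Revoke access by deleting the relationship.
relationships.discard(("alice", "CAN_ACCESS", "doc:X"))

print(live_check("alice", "doc:X"))   # False: revocation applies on the next check
print(cache.check("alice", "doc:X"))  # True: stale until the cache is rebuilt
```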

What IKG options does IndyKite provide?

When creating a Project in IndyKite, you choose how to provision your IKG:

| Option | Description | Best for |
| --- | --- | --- |
| Managed IKG | IndyKite hosts and manages the Neo4j database | Quick start, no database management overhead |
| Bring Your Own DB | Connect your own Neo4j instance | Existing Neo4j investment, custom requirements |

How do I connect my own Neo4j database?

Provide the connection details when creating your Project:

  • URL: Neo4j connection string (e.g., neo4j+s://xxxxx.databases.neo4j.io)
  • Username: Database user
  • Password: Database password
  • Database name: The specific database to use
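Before you enter these values, a quick local sanity check can catch typos in the connection details. The sketch below is a hypothetical helper (`check_connection_details`) and is not part of IndyKite or the Neo4j driver. The accepted schemes are the standard Neo4j URI schemes.

```python
from urllib.parse import urlparse

NEO4J_SCHEMES = {"neo4j", "neo4j+s", "neo4j+ssc", "bolt", "bolt+s", "bolt+ssc"}

def check_connection_details(url: str, username: str, password: str, database: str) -> list:
    """Return a list of problems found in the connection details (empty if none)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in NEO4J_SCHEMES:
        problems.append(f"unsupported URL scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("URL has no host")
    for name, value in (("username", username), ("password", password), ("database name", database)):
        if not value:
            problems.append(f"{name} is empty")
    return problems

print(check_connection_details("neo4j+s://xxxxx.databases.neo4j.io", "neo4j", "secret", "neo4j"))  # []
```

This only checks the format of the details. Whether the credentials actually work is verified when the Project connects to the database.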

You can get a free Neo4j instance from:

Summary: Why graphs for IndyKite?

The graph database is essential to IndyKite because:

| Requirement | Why graphs deliver |
| --- | --- |
| Identity relationships | Users, groups, roles, resources are naturally connected |
| Authorization decisions | Access depends on relationship paths, not table rows |
| Real-time evaluation | Traverse current state, not cached permissions |
| Policy flexibility | Express rules as graph patterns (Cypher) |
| Performance at scale | Fast local traversal; latency depends on the part of the graph visited, not total data size |
| Schema evolution | Add new identity types without migrations |
| Context awareness | Attributes on nodes and relationships inform decisions |
| Auditability | Clear paths show why access was granted or denied |
