What is a Graph Database?
A graph database stores data as nodes, relationships, and properties instead of tables and rows. This structure mirrors how data exists in the real world—as interconnected entities with meaningful relationships.
Think of it as sketching ideas on a whiteboard: you draw circles (entities) and connect them with arrows (relationships). A graph database stores data exactly that way.
What are the core components?
| Component | Description | Example |
| Node | An entity or object in your domain | Person, Document, Application |
| Label | A tag that classifies nodes into groups | :User, :Admin, :Resource |
| Relationship | A named, directed connection between two nodes | OWNS, MEMBER_OF, CAN_ACCESS |
| Property | A key-value pair on nodes or relationships | email: "alice@example.com" |
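The four components in the table can be sketched as plain Python data structures. This is only an illustrative model, not how any particular graph database stores data; the class names, the node IDs, and the "Q1 report" title are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with one or more labels and key-value properties."""
    node_id: str
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    """A named, directed connection that can carry its own properties."""
    rel_type: str
    start: Node
    end: Node
    properties: dict[str, object] = field(default_factory=dict)


# Hypothetical sample data.
alice = Node("u1", {"User"}, {"email": "alice@example.com"})
report = Node("d1", {"Document"}, {"title": "Q1 report"})

# The relationship has a direction (alice -> report), a type, and a property.
owns = Relationship("OWNS", alice, report, {"GRANTED_ON": "2024-01-15"})

print(owns.start.properties["email"], owns.rel_type, owns.end.properties["title"])
```

Note that the relationship is an object in its own right, holding references to both endpoints, rather than a key column on one of the nodes.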
Why are relationships important?
In graph databases, relationships are first-class citizens—they are stored natively alongside nodes, not computed at query time.
This is fundamentally different from relational databases, where relationships are represented as foreign keys and computed through JOIN operations.
What does "first-class citizen" mean?
- Stored natively: Each node physically points to its connected nodes.
- Always available: Relationships exist as persistent data, not derived from keys.
- Rich with properties: Relationships can have their own attributes (e.g., GRANTED_ON: "2024-01-15").
- Directional: Every relationship has a start node, end node, and type.
How do graph databases compare to relational databases?
Why do relational databases struggle with connected data?
Relational databases were designed for structured, tabular data. When data becomes highly connected, they face significant challenges:
| Challenge | Relational Database | Graph Database |
| Representing relationships | Foreign keys in separate tables | Native relationship objects |
| Querying relationships | JOIN operations across tables | Direct traversal between nodes |
| Multi-hop queries | One JOIN per hop; cost compounds as hops are added | Path traversal expressed as a single pattern |
| Schema changes | ALTER TABLE, migrations, possible downtime | Add nodes/relationships on the fly |
| Performance at scale | Tends to degrade as JOINs and table sizes grow | Depends on the portion of the graph visited, not total data size |
What is the JOIN problem?
In relational databases, answering the question "Can Alice access Document X?" might require:
- Query the users table to find Alice.
- JOIN with user_groups to find her groups.
- JOIN with group_roles to find roles for those groups.
- JOIN with role_permissions to find permissions for those roles.
- JOIN with resource_permissions to check if any permission grants access to Document X.
Each JOIN adds latency and computational cost. As the number of hops increases, intermediate result sets grow and performance can degrade sharply.
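The JOIN chain described above can be sketched with Python's built-in sqlite3 module. The table names come from the list above; the column names, IDs, and sample rows are assumptions chosen only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE user_groups (user_id INTEGER, group_id INTEGER);
    CREATE TABLE group_roles (group_id INTEGER, role_id INTEGER);
    CREATE TABLE role_permissions (role_id INTEGER, permission_id INTEGER);
    CREATE TABLE resource_permissions (permission_id INTEGER, resource_id TEXT);

    INSERT INTO users VALUES (1, 'alice@example.com');
    INSERT INTO user_groups VALUES (1, 10);
    INSERT INTO group_roles VALUES (10, 100);
    INSERT INTO role_permissions VALUES (100, 1000);
    INSERT INTO resource_permissions VALUES (1000, 'X');
""")

# One JOIN per hop between the user and the resource.
ACCESS_QUERY = """
    SELECT COUNT(*)
    FROM users u
    JOIN user_groups ug ON ug.user_id = u.id
    JOIN group_roles gr ON gr.group_id = ug.group_id
    JOIN role_permissions rp ON rp.role_id = gr.role_id
    JOIN resource_permissions rsp ON rsp.permission_id = rp.permission_id
    WHERE u.email = ? AND rsp.resource_id = ?
"""


def can_access(email: str, resource_id: str) -> bool:
    """Return True if at least one join path links the user to the resource."""
    (count,) = conn.execute(ACCESS_QUERY, (email, resource_id)).fetchone()
    return count > 0


print(can_access("alice@example.com", "X"))  # True
print(can_access("alice@example.com", "Y"))  # False
```

Every additional hop in the access model means another table and another JOIN clause in this query.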
How does a graph database solve this?
The same query in a graph database:
MATCH (alice:User {email: "alice@example.com"})-[:MEMBER_OF]->(:Group)-[:HAS_ROLE]->(:Role)-[:CAN_ACCESS]->(doc:Document {id: "X"})
RETURN doc
The graph database traverses from Alice through her groups and roles to the document in a single pattern match. Query time depends mainly on how many nodes and relationships are connected to Alice, not on the size of the entire dataset, because the traversal only visits that neighborhood.
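To make the traversal idea concrete, here is a minimal Python sketch that follows one relationship type per hop over an in-memory adjacency map. It is not how a graph database engine is implemented; the node names (engineering, editor, doc-X, bob, sales) are hypothetical.

```python
from collections import defaultdict

# adjacency[node][relationship_type] -> list of directly connected nodes
adjacency = defaultdict(lambda: defaultdict(list))


def relate(start: str, rel_type: str, end: str) -> None:
    """Store a directed relationship on its start node."""
    adjacency[start][rel_type].append(end)


relate("alice", "MEMBER_OF", "engineering")
relate("engineering", "HAS_ROLE", "editor")
relate("editor", "CAN_ACCESS", "doc-X")
relate("bob", "MEMBER_OF", "sales")  # unrelated data, never visited below


def follow(start: str, rel_types: list[str]) -> set[str]:
    """Follow one relationship type per hop and return the nodes reached."""
    frontier = {start}
    for rel_type in rel_types:
        frontier = {
            end
            for node in frontier
            for end in adjacency.get(node, {}).get(rel_type, [])
        }
    return frontier


reachable = follow("alice", ["MEMBER_OF", "HAS_ROLE", "CAN_ACCESS"])
print("doc-X" in reachable)  # True
```

The traversal starting at alice never touches bob or sales, which is the point the paragraph above makes about visiting only connected nodes.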
When should I use a graph database?
Graph databases excel when relationships between data points matter more than the individual points themselves.
What are the ideal use cases?
| Use Case | Why Graphs Work Better |
| Identity & Access Management | Users, groups, roles, permissions form a natural hierarchy of relationships |
| Authorization | Access decisions depend on traversing relationship paths |
| Fraud Detection | Detecting fraud requires finding suspicious patterns across connected entities |
| Recommendation Engines | "Users who liked X also liked Y" is a relationship query |
| Knowledge Graphs | Representing semantic relationships between concepts |
| Network & IT Operations | Infrastructure components are interconnected by nature |
| Master Data Management | Connecting data across siloed systems |
When are relational databases still appropriate?
- Primarily tabular data with few relationships.
- Simple CRUD operations on isolated records.
- Transactional systems with rigid schemas.
- Reporting and analytics on flat data.
Why is a graph database essential for authorization?
Authorization is fundamentally about relationships:
- Does this user have a relationship to this resource that permits this action?
- Is this user a member of a group that has a role with permission to access this resource?
These questions are naturally expressed as graph traversals.
What problems do traditional authorization systems have?
| Problem | With Relational DB | With Graph DB |
| Role explosion | Create thousands of roles to cover all combinations | Model relationships directly, no artificial roles needed |
| Policy sprawl | Maintain long ACLs for every resource | Define policies as graph patterns |
| Complex hierarchies | Recursive queries or denormalization | Natural hierarchy traversal |
| Real-time decisions | Cached permissions, stale data | Live traversal of current state |
| Audit trails | Difficult to trace decision path | Clear, traceable graph paths |
What is the Identity Knowledge Graph (IKG)?
The Identity Knowledge Graph is IndyKite's graph database that stores all identity and resource data. It is the foundation of the IndyKite platform.
What does the IKG contain?
- Human identities: Users, customers, employees with their attributes.
- Non-human identities: Applications, services, devices, AI agents.
- Resources: Documents, APIs, data, any protected asset.
- Relationships: How all these entities connect to each other.
- Context: Attributes, metadata, and environmental data.
Why is the IKG important for IndyKite?
Every IndyKite capability operates on the IKG:
| Capability | How it uses the IKG |
| Capture | Stores nodes and relationships in the graph |
| KBAC (AuthZEN) | Evaluates policies by traversing graph relationships |
| ContX IQ (CIQ) | Reads and updates graph data with authorization |
| Token Introspect | Maps token claims to nodes in the graph |
| Outbound Events | Triggers events when graph data changes |
Why is a graph database important for KBAC?
Knowledge-Based Access Control (KBAC) is IndyKite's authorization model that uses the IKG to make intelligent, context-aware access decisions.
How does KBAC use the graph?
KBAC policies are defined as graph patterns. When an authorization request arrives, IndyKite:
- Maps the request to nodes in the IKG (subject, resource, action).
- Evaluates the policy by traversing relationships in the graph.
- Returns a decision based on whether the pattern exists.
Example: Can Alice drive Car X?
Policy definition:
{
  "subject": { "type": "Person" },
  "actions": ["CAN_DRIVE"],
  "resource": { "type": "Car" },
  "condition": {
    "cypher": "MATCH (subject:Person)-[:OWNS]->(resource:Car)"
  }
}
This policy says: a Person is allowed the CAN_DRIVE action on a Car if the Person has an OWNS relationship to that Car.
The graph database traverses from Alice to Car X, looking for an OWNS relationship. If it exists, access is granted. If not, access is denied.
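The decision logic can be illustrated with a small Python sketch. This is not IndyKite's evaluation engine and it does not parse Cypher: the condition is reduced by hand to the single relationship type it requires, and the node names, the `policy` dictionary keys, and the `is_allowed` function are all assumptions for the example.

```python
# Relationships as (start, relationship_type, end) triples; names are illustrative.
edges = {
    ("alice", "OWNS", "car-x"),
    ("bob", "OWNS", "car-y"),
}
node_types = {"alice": "Person", "bob": "Person", "car-x": "Car", "car-y": "Car"}

# Simplified stand-in for the policy above: the Cypher condition is reduced
# to the one relationship type it requires between subject and resource.
policy = {
    "subject_type": "Person",
    "action": "CAN_DRIVE",
    "resource_type": "Car",
    "required_relationship": "OWNS",
}


def is_allowed(subject: str, action: str, resource: str) -> bool:
    """Allow only if types and action match and the required relationship exists."""
    if action != policy["action"]:
        return False
    if node_types.get(subject) != policy["subject_type"]:
        return False
    if node_types.get(resource) != policy["resource_type"]:
        return False
    return (subject, policy["required_relationship"], resource) in edges


print(is_allowed("alice", "CAN_DRIVE", "car-x"))  # True
print(is_allowed("alice", "CAN_DRIVE", "car-y"))  # False
```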
Why can't relational databases do this efficiently?
- Variable-depth traversals: "Can Alice access any document in any project she's a member of?" requires recursive queries.
- Multiple relationship types: Different paths may grant access (direct ownership, group membership, role assignment).
- Real-time evaluation: Authorization must be checked at request time, not from cached data.
- Contextual attributes: Decisions may depend on attributes along the path, not just endpoints.
Why is a graph database important for CIQ?
ContX IQ (CIQ) delivers authorized data retrieval and mutation. It uses the graph to:
- Define what data can be accessed: Policies specify graph patterns.
- Query related data: Knowledge Queries traverse relationships to find results.
- Update connected data: Create or modify nodes and relationships in context.
Example: Get license plates for a person's vehicles
Policy pattern:
MATCH (person:Person)-[:ACCEPTED]->(contract:Contract)-[:COVERS]->(vehicle:Vehicle)-[:HAS]->(ln:LicenseNumber)
This query traverses from a Person through their Contracts to covered Vehicles and their License Numbers. The graph database follows these relationships directly, so the cost depends on the contracts, vehicles, and license numbers connected to that person rather than on how many exist in the system overall.
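A pattern like this produces one result row per matching path. The Python sketch below mimics that behavior with nested loops over three hypothetical relationship maps; the identifiers and plate values are made up, and this is not how CIQ executes a Knowledge Query.

```python
# Each relationship type maps a start node to the nodes it points at.
# All identifiers and plate values are invented for illustration.
ACCEPTED = {"person-1": ["contract-1", "contract-2"]}
COVERS = {"contract-1": ["vehicle-1"], "contract-2": ["vehicle-2", "vehicle-3"]}
HAS = {"vehicle-1": ["AB 12345"], "vehicle-2": ["CD 67890"], "vehicle-3": []}


def license_numbers(person: str) -> list[tuple[str, str, str]]:
    """Return one (contract, vehicle, license number) row per complete path."""
    rows = []
    for contract in ACCEPTED.get(person, []):
        for vehicle in COVERS.get(contract, []):
            for number in HAS.get(vehicle, []):
                rows.append((contract, vehicle, number))
    return rows


for row in license_numbers("person-1"):
    print(row)
```

vehicle-3 has no license number, so no path through it completes the pattern and it contributes no row.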
How does graph performance scale?
Why is graph traversal fast?
Graph databases use index-free adjacency: each node directly references its connected nodes. This means:
- No index lookups: Relationships are stored as direct pointers.
- Local traversal: Queries only visit relevant nodes, not the entire dataset.
- Constant time per hop: Adding more data doesn't slow down individual traversals.
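The difference between looking up neighbors through adjacency and scanning a relationship table can be sketched in Python. This is a simplification under stated assumptions: the data is synthetic, a Python dict stands in for direct node-to-node references, and the scan represents an unindexed lookup (a relational database with a suitable index would not inspect every row).

```python
# The same 100,000 relationships stored two ways. Data is synthetic.
N = 100_000
edge_table = [(f"n{i}", f"n{i + 1}") for i in range(N)]  # rows of (start, end)

adjacency = {}
for start, end in edge_table:
    adjacency.setdefault(start, []).append(end)


def neighbors_by_scan(node):
    """Unindexed lookup: inspect every row to find the node's relationships."""
    inspected = 0
    found = []
    for start, end in edge_table:
        inspected += 1
        if start == node:
            found.append(end)
    return found, inspected


def neighbors_by_adjacency(node):
    """Adjacency lookup: touch only the relationships attached to the node."""
    found = adjacency.get(node, [])
    return found, len(found)


print(neighbors_by_scan("n42"))       # (['n43'], 100000)
print(neighbors_by_adjacency("n42"))  # (['n43'], 1)
```

The second number in each result is the count of relationships inspected: it grows with the table for the scan and stays tied to the node's own relationships for the adjacency lookup.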
How does this compare to JOIN performance?
| Aspect | Relational JOIN | Graph Traversal |
| Time complexity | Up to O(n × m) per JOIN without a suitable index | O(k) where k = nodes and relationships visited |
| Multi-hop queries | Cost compounds with each additional JOIN | Proportional to the portion of the graph explored |
| Index dependency | Requires careful index design | Built-in via adjacency |
| Dataset growth | JOINs slow as tables grow | Largely unaffected by total size |
Graph databases can traverse millions of relationships per second, and traversal speed stays largely stable as the overall dataset grows.
How does schema flexibility help?
What is schema-optional?
Graph databases like Neo4j are schema-optional: you can add new node types, relationship types, and properties without schema migrations.
Why does this matter for identity and authorization?
- Evolving requirements: Add new entity types (AI agents, IoT devices) without restructuring.
- Heterogeneous data: Different nodes can have different properties.
- Integration: Connect data from multiple sources with different schemas.
- Rapid iteration: Model changes don't require downtime or migrations.
Example: Adding AI agents to your system
With a relational database, adding AI agents as a new identity type might require:
- Creating new tables (ai_agents, ai_agent_permissions).
- Modifying existing tables to reference the new tables.
- Updating all JOIN queries to include the new tables.
- Running migrations and potentially causing downtime.
With a graph database:
- Create nodes with the :AIAgent label.
- Create relationships to existing resources.
- Existing queries continue to work.
- New queries can traverse to AI agents immediately.
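The graph-side steps can be sketched in Python with a dictionary of nodes and a list of relationships. This is an in-memory illustration only; the node names, the `model` property, and the `who_can_access` function are assumptions for the example.

```python
nodes = {
    "alice": {"labels": {"User"}, "email": "alice@example.com"},
    "doc-1": {"labels": {"Document"}},
}
edges = [("alice", "CAN_ACCESS", "doc-1")]


def who_can_access(resource: str) -> set[str]:
    """Existing query: every node with a CAN_ACCESS relationship to the resource."""
    return {start for start, rel, end in edges if rel == "CAN_ACCESS" and end == resource}


before = who_can_access("doc-1")

# Introduce a new identity type: a new label plus a property no other node has.
# Nothing about the existing nodes or the query function changes.
nodes["summarizer"] = {"labels": {"AIAgent"}, "model": "example-model"}
edges.append(("summarizer", "CAN_ACCESS", "doc-1"))

after = who_can_access("doc-1")
print(sorted(before))  # ['alice']
print(sorted(after))   # ['alice', 'summarizer']
```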
What is Cypher?
Cypher is Neo4j's declarative query language for graph databases. It is designed to express graph patterns intuitively.
How does Cypher work?
Cypher uses ASCII art-like syntax to describe graph patterns:
- (node): Represents a node
- -[:RELATIONSHIP]->: Represents a directed relationship
- {property: value}: Filters by property
Example queries
Find documents Alice can access directly:
MATCH (alice:User {email: "alice@example.com"})-[:CAN_ACCESS]->(doc:Document)
RETURN doc
Find access through group membership:
MATCH (alice:User {email: "alice@example.com"})-[:MEMBER_OF]->(group:Group)-[:CAN_ACCESS]->(doc:Document)
RETURN doc
Variable-length path (1 to 5 MEMBER_OF hops):
MATCH (alice:User {email: "alice@example.com"})-[:MEMBER_OF*1..5]->(group:Group)-[:CAN_ACCESS]->(doc:Document)
RETURN doc
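The bounded variable-length pattern can be approximated in Python as a breadth-first expansion that stops after a maximum number of hops. This is a rough sketch, not Cypher's actual matching semantics (for instance, it does not enforce that a relationship is used only once per path); the group and document names are hypothetical.

```python
# Hypothetical nested group memberships and access grants.
MEMBER_OF = {
    "alice": ["team-a"],
    "team-a": ["dept-b"],
    "dept-b": ["org-c"],
}
CAN_ACCESS = {"org-c": ["handbook"], "team-a": ["roadmap"]}


def groups_within(start: str, min_hops: int, max_hops: int) -> set[str]:
    """Collect nodes reachable via min_hops..max_hops MEMBER_OF relationships."""
    found = set()
    frontier = {start}
    for hop in range(1, max_hops + 1):
        frontier = {g for node in frontier for g in MEMBER_OF.get(node, [])}
        if not frontier:
            break  # no further memberships to follow
        if hop >= min_hops:
            found |= frontier
    return found


def accessible_documents(user: str) -> set[str]:
    """Documents granted to any group the user belongs to within 1 to 5 hops."""
    groups = groups_within(user, 1, 5)
    return {doc for g in groups for doc in CAN_ACCESS.get(g, [])}


print(sorted(accessible_documents("alice")))  # ['handbook', 'roadmap']
```

The upper bound keeps the expansion finite even if memberships form a cycle.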
How does IndyKite use Cypher?
IndyKite KBAC and CIQ policies use Cypher in the condition.cypher field to define graph patterns. When a policy is evaluated, IndyKite executes the Cypher pattern against the IKG to determine access.
How does the IKG support real-time decisions?
What problem does real-time authorization solve?
Traditional authorization systems often use cached permissions:
- Permissions are computed periodically and stored.
- Changes take time to propagate.
- Stale data can grant access that should be revoked.
How does the IKG enable real-time decisions?
The IKG maintains the current state of all entities and relationships:
- Live data: Authorization queries traverse current relationships.
- Immediate revocation: Delete a relationship and access is revoked as of the next authorization check.
- Contextual factors: Evaluate time, location, device at request time.
- No cache invalidation: No need to manage permission caches.
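The contrast between cached permissions and live evaluation can be shown with a short Python sketch. It models the idea only, not the IKG itself; the relationship data and function names are assumptions for the example.

```python
edges = {("alice", "CAN_ACCESS", "doc-1")}

# Cached approach: permissions were computed earlier and stored separately.
cached_permissions = {(start, end) for start, rel, end in edges if rel == "CAN_ACCESS"}


def allowed_from_cache(user: str, resource: str) -> bool:
    """Answer from the precomputed snapshot."""
    return (user, resource) in cached_permissions


def allowed_live(user: str, resource: str) -> bool:
    """Answer by checking the relationship as it exists right now."""
    return (user, "CAN_ACCESS", resource) in edges


# Revoke access by deleting the relationship.
edges.discard(("alice", "CAN_ACCESS", "doc-1"))

print(allowed_from_cache("alice", "doc-1"))  # True  (stale until the cache is rebuilt)
print(allowed_live("alice", "doc-1"))        # False (reflects the deletion)
```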
What IKG options does IndyKite provide?
When creating a Project in IndyKite, you choose how to provision your IKG:
| Option | Description | Best for |
| Managed IKG | IndyKite hosts and manages the Neo4j database | Quick start, no database management overhead |
| Bring Your Own DB | Connect your own Neo4j instance | Existing Neo4j investment, custom requirements |
How do I connect my own Neo4j database?
Provide the connection details when creating your Project:
- URL: Neo4j connection string (e.g., neo4j+s://xxxxx.databases.neo4j.io)
- Username: Database user
- Password: Database password
- Database name: The specific database to use
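Before entering these values, it can help to sanity-check them locally. The sketch below is a hypothetical pre-check using only the Python standard library; it is not part of IndyKite or the Neo4j driver, and the `ConnectionSettings` class and `problems` function are names invented for the example. The scheme list reflects the URI schemes Neo4j drivers accept.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# URI schemes accepted by Neo4j drivers.
NEO4J_SCHEMES = {"neo4j", "neo4j+s", "neo4j+ssc", "bolt", "bolt+s", "bolt+ssc"}


@dataclass
class ConnectionSettings:
    """The four values listed above (hypothetical container)."""
    url: str
    username: str
    password: str
    database: str


def problems(settings: ConnectionSettings) -> list[str]:
    """Return human-readable problems; an empty list means the values look usable."""
    issues = []
    parsed = urlparse(settings.url)
    if parsed.scheme not in NEO4J_SCHEMES:
        issues.append(f"unsupported URL scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        issues.append("URL has no host")
    for name in ("username", "password", "database"):
        if not getattr(settings, name):
            issues.append(f"{name} is empty")
    return issues


settings = ConnectionSettings(
    url="neo4j+s://xxxxx.databases.neo4j.io",  # placeholder host from the text above
    username="neo4j",
    password="change-me",
    database="neo4j",
)
print(problems(settings))  # []
```

This only checks the shape of the values; it does not attempt to connect to the database.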
You can get a free Neo4j instance from:
Summary: Why graphs for IndyKite?
The graph database is essential to IndyKite because:
| Requirement | Why Graphs Deliver |
| Identity relationships | Users, groups, roles, resources are naturally connected |
| Authorization decisions | Access depends on relationship paths, not table rows |
| Real-time evaluation | Traverse current state, not cached permissions |
| Policy flexibility | Express rules as graph patterns (Cypher) |
| Performance at scale | Millions of traversals per second; latency tracks the data visited, not total dataset size |
| Schema evolution | Add new identity types without migrations |
| Context awareness | Attributes on nodes and relationships inform decisions |
| Auditability | Clear paths show why access was granted or denied |
Next Steps
- Environment setup: Environment Guide
- Capture data: Developer Hub Resources
- KBAC policies: Dynamic Authorization Guide
- CIQ queries: ContX IQ Guide
- AuthZEN: AuthZEN Guide
- Terraform: Terraform Guide
- Neo4j documentation: Neo4j Getting Started
- Cypher reference: Cypher Manual