CSR Format
How edges are stored for fast traversal
KiteDB uses Compressed Sparse Row (CSR) format to store graph edges. CSR is a standard format for sparse matrices that provides fast traversal with minimal memory overhead.
The Problem with Naive Edge Storage
Consider a graph with 100,000 nodes and 1 million edges.
Adjacency Matrix
Linked Adjacency Lists
Problem: Pointer chasing. Each lookup goes to random memory.
The CSR Solution
CSR stores all edges in two flat arrays: offsets and destinations. No pointers, no wasted space.
CSR Solution
How Traversal Works
Finding a node's neighbors is two array lookups:
Why It's Fast: Memory Layout
After the first access, B/C/D/A are already in CPU cache.
Bidirectional Edges
KiteDB stores edges in both directions for fast traversal either way:
Edge Types and Sorting
Real graphs have different edge types (follows, likes, knows). KiteDB stores edge types in a parallel array, sorted within each node:
Edge Existence Check
To check if edge A→B exists with type KNOWS:
Performance Numbers
| Operation | CSR | Linked List |
|---|---|---|
| Start traversal | O(1) – two array lookups | O(1) – follow pointer |
| Iterate k neighbors | O(k) – sequential read | O(k) – but cache misses |
| Edge existence | O(log k) – binary search | O(k) – linear scan |
| Cache behavior | Excellent – prefetcher works | Poor – random access |
Next Steps
- Snapshot + Delta – How CSR fits into the storage model
- Key Index – How node lookups work
- Performance – Optimization techniques