Key Index
Fast node lookups by key
Every node in KiteDB can have a string key for lookup. The key index provides O(1) average-case lookups from key to NodeID.
The Problem
"Find user:alice"
Scan all nodes → O(n)
1M nodes × 1μs = 1 second
"Find user:alice"
Hash lookup → O(1)
1M nodes = ~100ns
Index Structure
The key index uses hash buckets with linear probing:
Key Index Structure
← Start offsets into entry array
hash64: xxHash64 of the key string
stringId: Index into string table (for collision resolution)
nodeId: The NodeID this key maps to
Lookup Process
Lookup Process
If key in delta.keyIndexDeleted → return null
If key in delta.keyIndex → return NodeID
hash = xxHash64(key)
bucket = hash % numBuckets
start = bucketArray[bucket]
end = bucketArray[bucket + 1]
Find entry where entry.hash64 == hash
Verify: stringTable[entry.stringId] == key
Return entry.nodeId
Two-Level Lookup
The key index is split between delta (memory) and snapshot (disk):
Two-Level Lookup
keyIndexMap<string, NodeID>keyIndexDeletedSet<string>bucketArrayu32[]entries{hash64, stringId, nodeId}[]Lookup order:
delta.keyIndexDeleted→ If found, return nulldelta.keyIndex→ If found, return NodeIDsnapshot index→ Search hash bucketsThis order ensures recent changes override old data.
Why xxHash64
Why xxHash64
For typical key lengths (10-100 bytes):
xxHash64
~50-100ns per hash
~10 GB/s throughput
SHA-256
~500-1000ns per hash
Cryptographic (overkill)
10x faster, and we don't need cryptographic security.
Handling Collisions
Handling Collisions
Scenario: Two keys hash to same bucket
"user:alice" → hash: 0x1234...
"user:alfred" → hash: 0x1234... (collision!)
Cost: O(k) string comparisons where k = entries with same hash. With 64-bit hash: k ≈ 1
Load Factor
Load Factor
Load factor = entries / buckets
KiteDB uses ~50% load factor
(2x buckets as entries)
- •Low collision rate
- •Reasonable memory usage
- •Fast lookups
With 1M keys:
Lookup: 1 bucket read + 1-2 entry reads = ~100ns
Next Steps
- Snapshot + Delta – How the two-level lookup fits in
- Performance – Index optimization techniques