Single-File Format

The .kitedb file layout

KiteDB stores everything in a single .kitedb file. This makes databases portable, simplifies deployment, and enables atomic operations.

File Layout

  • Header (4 KB): database metadata, pointers, checksums
  • WAL Area (~64 MB in this example)
      • Primary Region (75%): normal writes
      • Secondary Region (25%): used during checkpoint
  • Snapshot Area (grows): CSR data, compressed with zstd

The header is 4 KB and contains all metadata needed to open the database:

Header Contents (offset 0, always 4 KB)

  • Magic bytes: "KITE" + version
  • Page size: 4096 (default)
  • Snapshot location: start page, page count
  • WAL location: start page, page count
  • WAL pointers: head and tail positions
  • Counters: max node ID, next tx ID
  • Snapshot generation: incremented on checkpoint
  • Checksums: CRC32C of header data
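As a rough illustration of how a header like this could be written and validated, the sketch below packs and parses a 4 KB header. The field order, field widths, and checksum position are assumptions made for the example, as are the names `HEADER_FORMAT`, `build_header`, and `parse_header`; the actual KiteDB on-disk layout is not specified on this page. CRC32C is computed bitwise so the block needs only the standard library.

```python
import struct

HEADER_SIZE = 4096
# Hypothetical layout (little-endian): magic, version, page size,
# then nine 64-bit fields. Not the real KiteDB layout.
HEADER_FORMAT = "<4sII" + "Q" * 9
FIELDS_SIZE = struct.calcsize(HEADER_FORMAT)
FIELD_NAMES = (
    "magic", "version", "page_size",
    "snapshot_start_page", "snapshot_page_count",
    "wal_start_page", "wal_page_count",
    "wal_head", "wal_tail",
    "max_node_id", "next_tx_id", "snapshot_generation",
)


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_header(**values) -> bytes:
    """Pack the fields, append their CRC32C, and pad to 4 KB."""
    fields = struct.pack(HEADER_FORMAT, *(values[n] for n in FIELD_NAMES))
    body = fields + struct.pack("<I", crc32c(fields))
    return body.ljust(HEADER_SIZE, b"\x00")


def parse_header(raw: bytes) -> dict:
    """Validate the checksum and magic bytes, then return the fields."""
    if len(raw) != HEADER_SIZE:
        raise ValueError("header must be exactly 4 KB")
    fields = raw[:FIELDS_SIZE]
    (stored_crc,) = struct.unpack_from("<I", raw, FIELDS_SIZE)
    if crc32c(fields) != stored_crc:
        raise ValueError("header checksum mismatch")
    header = dict(zip(FIELD_NAMES, struct.unpack(HEADER_FORMAT, fields)))
    if header["magic"] != b"KITE":
        raise ValueError("bad magic bytes")
    return header
```

Checking the checksum before trusting any other field means a torn or corrupted header is rejected rather than followed to a bogus snapshot location.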

Atomic Updates

The header enables atomic state transitions. A checkpoint works like this:

Checkpoint Process

1. Write new snapshot to free space at end of file
2. fsync() to ensure the snapshot is durable
3. Update header with new snapshot location
4. fsync() header
5. Old snapshot space becomes free

If a crash occurs:

  • Before step 4: the old snapshot is valid
  • After step 4: the new snapshot is valid

No intermediate state is possible.
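The ordering of the steps above can be sketched in a few lines of Python. This is a simplified model, not KiteDB's implementation: the header is reduced to a hypothetical 16-byte record (snapshot offset and length) at offset 0, and the function names `checkpoint` and `read_snapshot` are invented for the example. Free-space reuse (step 5) is not modeled.

```python
import os
import struct

HEADER_SIZE = 4096


def checkpoint(path: str, snapshot: bytes) -> int:
    """Append a snapshot, then flip the header to point at it.

    Returns the byte offset of the new snapshot.
    """
    with open(path, "r+b") as f:
        # Step 1: write the new snapshot into free space at the end of the file.
        f.seek(0, os.SEEK_END)
        offset = f.tell()
        f.write(snapshot)
        # Step 2: make the snapshot durable before anything points at it.
        f.flush()
        os.fsync(f.fileno())
        # Step 3: update the header with the new snapshot location.
        f.seek(0)
        f.write(struct.pack("<QQ", offset, len(snapshot)))
        # Step 4: make the header durable. This is the commit point.
        f.flush()
        os.fsync(f.fileno())
    # Step 5: the old snapshot's space is now reusable (not modeled here).
    return offset


def read_snapshot(path: str) -> bytes:
    """Follow the header pointer to whichever snapshot is current."""
    with open(path, "rb") as f:
        offset, length = struct.unpack("<QQ", f.read(16))
        f.seek(offset)
        return f.read(length)
```

Because the header is only rewritten after the snapshot bytes are durable, a reader that follows the header pointer never lands on a partially written snapshot.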

WAL Area

The WAL area is a circular buffer divided into two regions, primary and secondary.

The default WAL size is 1 MB. Auto-checkpoint is enabled by default and triggers when WAL usage exceeds 80% of the active region. For high-throughput ingest, increase the WAL size. The size is fixed at creation; to change it, use `resizeWal` (offline) or rebuild the database into a new file.

WAL Area (64 MB example)

  • Primary: 48 MB
  • Secondary: 16 MB

Why two regions?

  • Checkpoint reads primary to build new snapshot
  • Concurrent transactions write to secondary
  • No blocking between reads and writes
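The region sizes and the auto-checkpoint trigger follow from simple arithmetic. The sketch below assumes the 75%/25% split and the 80% threshold described on this page; the function names `wal_regions` and `should_checkpoint` are hypothetical and do not correspond to a KiteDB API.

```python
MB = 1024 * 1024


def wal_regions(wal_size: int) -> tuple[int, int]:
    """Split the WAL into primary (75%) and secondary (25%) regions."""
    primary = wal_size * 3 // 4
    secondary = wal_size - primary
    return primary, secondary


def should_checkpoint(used_bytes: int, active_region_bytes: int,
                      threshold_percent: int = 80) -> bool:
    """True once usage exceeds the threshold share of the active region."""
    return used_bytes * 100 > active_region_bytes * threshold_percent


primary, secondary = wal_regions(64 * MB)
print(primary // MB, secondary // MB)        # 48 16
print(should_checkpoint(39 * MB, primary))   # True: 39 MB > 38.4 MB
```

With the 1 MB default, the same split gives a 768 KB primary region, so auto-checkpoint would trigger once a little over 614 KB of it is in use.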

Snapshot Area

The snapshot area holds the CSR-formatted graph data:

Snapshot Sections

  • Node ID mappings: physical ↔ logical ID translation
  • Out-edge CSR: offsets[], destinations[], edge_types[]
  • In-edge CSR: offsets[], sources[], edge_types[]
  • Properties: node and edge property values
  • String table: deduplicated string storage
  • Key index: hash-bucketed node key lookups
  • Schema: labels, edge types, property keys

Each section is compressed independently with zstd. Compressed size is typically 40-60% of raw size.
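To make the out-edge CSR arrays concrete, here is a small sketch that builds offsets[], destinations[], and edge_types[] from an edge list and then looks up a node's out-edges. It uses plain Python lists and physical node IDs starting at 0; the helper names `build_csr` and `out_edges` are illustrative only, and the real on-disk encoding may differ.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays from (src, dst, edge_type) triples."""
    # Count out-edges per node, shifted by one so a prefix sum yields offsets.
    offsets = [0] * (num_nodes + 1)
    for src, _, _ in edges:
        offsets[src + 1] += 1
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]

    destinations = [0] * len(edges)
    edge_types = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each source node
    for src, dst, etype in edges:
        slot = cursor[src]
        destinations[slot] = dst
        edge_types[slot] = etype
        cursor[src] += 1
    return offsets, destinations, edge_types


def out_edges(node, offsets, destinations, edge_types):
    """Out-edges of `node` are the slice offsets[node]:offsets[node + 1]."""
    start, end = offsets[node], offsets[node + 1]
    return list(zip(destinations[start:end], edge_types[start:end]))


offsets, destinations, edge_types = build_csr(3, [(0, 1, 7), (0, 2, 7), (2, 0, 9)])
print(offsets)                                            # [0, 2, 2, 3]
print(out_edges(0, offsets, destinations, edge_types))    # [(1, 7), (2, 7)]
```

The in-edge CSR has the same shape with sources[] in place of destinations[], so traversals in either direction reduce to one slice per node.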

File Growth

The file grows in predictable ways:

File Size Examples

The examples below assume a 64 MB WAL. The default WAL size is 1 MB and is configurable.

  • Initial: ~64 MB
  • 100K nodes: ~72 MB
  • 1M nodes: ~150 MB

Components: header (fixed), WAL (configurable), snapshot (grows).
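Since the file is the sum of its three areas, the example figures can be decomposed with a little arithmetic. The sketch below is only a back-of-the-envelope model with hypothetical helper names; it ignores any alignment or free space left behind by checkpoints.

```python
KB = 1024
MB = 1024 * KB
HEADER_BYTES = 4 * KB


def file_size(wal_bytes: int, snapshot_bytes: int) -> int:
    """Total size = fixed header + configured WAL + current snapshot."""
    return HEADER_BYTES + wal_bytes + snapshot_bytes


def implied_snapshot_bytes(total_bytes: int, wal_bytes: int) -> int:
    """Snapshot size implied by a total file size and a WAL size."""
    return total_bytes - HEADER_BYTES - wal_bytes


# With a 64 MB WAL, a ~72 MB file implies a snapshot of roughly 8 MB.
print(implied_snapshot_bytes(72 * MB, 64 * MB) / MB)
```

By the same reasoning, the ~150 MB figure for 1M nodes implies a snapshot of roughly 86 MB, and with the 1 MB default WAL the totals would be correspondingly smaller.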

Single-File vs Multi-File

KiteDB previously supported a directory-based format. Single-file is now the default:

Aspect      | Single-File   | Directory (legacy)
Portability | Copy one file | Copy entire directory
Atomic ops  | Header flip   | Manifest + renames
Disk usage  | ~40% smaller  | More overhead
Complexity  | Simpler       | More moving parts

Opening a Database

1. Read header (4 KB at offset 0)
2. Validate magic bytes and checksums
3. mmap() snapshot area (zero-copy)
4. Parse snapshot sections
5. Replay WAL to rebuild delta

The database is then ready for queries.

If WAL replay finds an incomplete transaction, it is discarded, because it was never committed. Recovery is automatic and fast.
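The rule for incomplete transactions can be illustrated with a minimal replay loop. The record shape used here, `(tx_id, kind, payload)` with kinds "begin", "op", and "commit", is an assumption for the example and not KiteDB's actual WAL record format; `replay_wal` is likewise a hypothetical name.

```python
def replay_wal(records):
    """Return the operations of committed transactions, in commit order.

    Operations are buffered per transaction and only applied to the
    delta when that transaction's commit record is seen.
    """
    pending = {}  # tx_id -> operations not yet committed
    delta = []
    for tx_id, kind, payload in records:
        if kind == "begin":
            pending[tx_id] = []
        elif kind == "op":
            pending.setdefault(tx_id, []).append(payload)
        elif kind == "commit":
            delta.extend(pending.pop(tx_id, []))
    # Anything still in `pending` never committed and is dropped.
    return delta


wal = [
    (1, "begin", None),
    (1, "op", "add node A"),
    (2, "begin", None),
    (2, "op", "add node B"),
    (1, "commit", None),
    # Crash here: transaction 2 has no commit record.
]
print(replay_wal(wal))  # ['add node A']
```

Buffering until the commit record is what makes discarding safe: an uncommitted transaction never touches the rebuilt delta, so there is nothing to undo.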
