
Single-File Format

The .kitedb file layout

KiteDB stores everything in a single .kitedb file. This makes databases portable, simplifies deployment, and enables atomic operations.

File Layout

Header (4 KB): database metadata, pointers, checksums

WAL Area (~64 MB in this example; the default is 1 MB)
├ Primary Region (75%): normal writes
└ Secondary Region (25%): used during checkpoint

Snapshot Area (grows): CSR data, compressed with zstd

The header is 4 KB and contains all metadata needed to open the database:

Header Contents

Offset 0, always 4 KB

├ Magic bytes: "KITE" + version
├ Page size: 4096 (default)
├ Snapshot location: start page, page count
├ WAL location: start page, page count
├ WAL pointers: head and tail positions
├ Counters: max node ID, next tx ID
├ Snapshot generation: incremented on checkpoint
└ Checksums: CRC32C of header data
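As a rough illustration of how such a header might be packed and validated, here is a minimal Python sketch. The field order, field widths, and the position of the checksum are assumptions made for this example; the text above lists the fields but not their on-disk offsets. Only the 4 KB size, the "KITE" magic, and the use of CRC32C come from the description.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 4096  # from the text: the header is always 4 KB

# ASSUMPTION: hypothetical little-endian layout, not the real on-disk format.
# magic (4 bytes), version, page size, then nine 64-bit fields.
_BODY = struct.Struct("<4sII9Q")


class Header(NamedTuple):
    magic: bytes
    version: int
    page_size: int
    snapshot_start_page: int
    snapshot_page_count: int
    wal_start_page: int
    wal_page_count: int
    wal_head: int
    wal_tail: int
    max_node_id: int
    next_tx_id: int
    snapshot_generation: int


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli polynomial, reflected form)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def encode_header(header: Header) -> bytes:
    """Pack the fields, append their CRC32C, and zero-pad to 4 KB."""
    body = _BODY.pack(*header)
    return (body + struct.pack("<I", crc32c(body))).ljust(HEADER_SIZE, b"\0")


def decode_header(page: bytes) -> Header:
    """Unpack a 4 KB header page, rejecting bad magic or a bad checksum."""
    if len(page) != HEADER_SIZE:
        raise ValueError("header must be exactly 4 KB")
    body = page[:_BODY.size]
    (stored,) = struct.unpack_from("<I", page, _BODY.size)
    header = Header(*_BODY.unpack(body))
    if header.magic != b"KITE":
        raise ValueError("bad magic bytes")
    if crc32c(body) != stored:
        raise ValueError("header checksum mismatch")
    return header
```

A reader that validates the checksum before trusting any pointer can treat a damaged header as an error rather than following a corrupt snapshot or WAL location.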

Atomic Updates

The header enables atomic state transitions. A checkpoint works like this:

Checkpoint Process

1. Write the new snapshot to free space at the end of the file
2. fsync() to ensure the snapshot is durable
3. Update the header with the new snapshot location
4. fsync() the header
5. The old snapshot space becomes free

If a crash occurs:

Before step 4: the old snapshot is valid
After step 4: the new snapshot is valid

No intermediate state is possible.
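The checkpoint sequence above can be sketched in Python against an ordinary file. This is a simplified model, not KiteDB's implementation: the header here holds only a hypothetical (offset, length) pair for the snapshot, the helper names are invented for the example, and torn header writes (which a header checksum would be expected to catch) are not modeled.

```python
import os
import struct
import tempfile

HEADER_SIZE = 4096
# ASSUMPTION: a toy header containing only snapshot offset and length.
_PTR = struct.Struct("<QQ")


def create(path: str, snapshot: bytes) -> None:
    """Create a file with a 4 KB header followed by an initial snapshot."""
    with open(path, "wb") as f:
        f.write(_PTR.pack(HEADER_SIZE, len(snapshot)).ljust(HEADER_SIZE, b"\0"))
        f.write(snapshot)


def read_snapshot(path: str) -> bytes:
    """Follow the header pointer to whichever snapshot is current."""
    with open(path, "rb") as f:
        offset, length = _PTR.unpack(f.read(_PTR.size))
        f.seek(offset)
        return f.read(length)


def checkpoint(path: str, new_snapshot: bytes, crash_before_header: bool = False) -> None:
    with open(path, "r+b") as f:
        # Step 1: write the new snapshot to free space at the end of the file.
        offset = f.seek(0, os.SEEK_END)
        f.write(new_snapshot)
        # Step 2: make the snapshot durable before anything points at it.
        f.flush()
        os.fsync(f.fileno())
        if crash_before_header:
            return  # simulated crash: the header still names the old snapshot
        # Step 3: update the header with the new snapshot location.
        f.seek(0)
        f.write(_PTR.pack(offset, len(new_snapshot)))
        # Step 4: make the header durable; this is the commit point.
        f.flush()
        os.fsync(f.fileno())
    # Step 5: the old snapshot bytes are now unreferenced and reusable.


def demo():
    """Return the visible snapshot after a simulated crash and after success."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.kitedb")
        create(path, b"old")
        checkpoint(path, b"new-1", crash_before_header=True)
        after_crash = read_snapshot(path)
        checkpoint(path, b"new-2")
        after_success = read_snapshot(path)
    return after_crash, after_success
```

Because readers only ever reach a snapshot through the header pointer, bytes written in step 1 are invisible until step 4 succeeds, which is why an interrupted checkpoint leaves the old snapshot in effect.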

WAL Area

The WAL area is a circular buffer divided into two regions:

The default WAL size is 1 MB. Auto-checkpoint is enabled by default and triggers when WAL usage exceeds 80% of the active region. For high-throughput ingest, increase the WAL size. The WAL size is fixed at creation; to change it, use `resizeWal` (offline) or rebuild into a new file.

WAL Area (64 MB example)

├ Primary: 48 MB
└ Secondary: 16 MB

Why two regions?

• Checkpoint reads primary to build new snapshot
• Concurrent transactions write to secondary
→ No blocking between reads and writes
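The 75/25 split and the 80% auto-checkpoint trigger reduce to simple arithmetic. The sketch below works it through for the 64 MB example and the 1 MB default; the function names are illustrative and not part of KiteDB's API.

```python
MB = 1024 * 1024


def wal_regions(wal_bytes: int) -> tuple[int, int]:
    """Split the WAL into (primary, secondary) at 75% / 25%."""
    primary = wal_bytes * 3 // 4
    return primary, wal_bytes - primary


def should_checkpoint(used_bytes: int, region_bytes: int) -> bool:
    """True when usage exceeds 80% of the active region (integer math)."""
    return used_bytes * 5 > region_bytes * 4


primary, secondary = wal_regions(64 * MB)   # 48 MB and 16 MB
small_primary, _ = wal_regions(1 * MB)      # 768 KB with the 1 MB default
```

With a 64 MB WAL the trigger point is 80% of 48 MB, or about 38.4 MB of primary-region usage; with the 1 MB default it is roughly 614 KB, which is why heavy ingest workloads benefit from a larger WAL.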

Snapshot Area

The snapshot area holds the CSR-formatted graph data:

Snapshot Sections

├ Node ID mappings: physical ↔ logical ID translation
├ Out-edge CSR: offsets[], destinations[], edge_types[]
├ In-edge CSR: offsets[], sources[], edge_types[]
├ Properties: node and edge property values
├ String table: deduplicated string storage
├ Key index: hash-bucketed node key lookups
└ Schema: labels, edge types, property keys

Each section is compressed independently with zstd. The compressed size is typically 40-60% of the raw size.
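To make the out-edge CSR arrays concrete, here is a small Python sketch that builds offsets[], destinations[], and edge_types[] from an edge list and then looks up one node's out-edges. It uses plain lists and dense node IDs starting at 0 (standing in for physical IDs); it illustrates the general CSR technique and is not KiteDB's snapshot encoding.

```python
def build_out_csr(num_nodes, edges):
    """Build out-edge CSR arrays from (src, dst, edge_type) triples."""
    # Count out-degree per node, shifted by one slot.
    offsets = [0] * (num_nodes + 1)
    for src, _, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum: offsets[n] becomes the start of node n's edge range.
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    destinations = [0] * len(edges)
    edge_types = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each source node
    for src, dst, etype in edges:
        destinations[cursor[src]] = dst
        edge_types[cursor[src]] = etype
        cursor[src] += 1
    return offsets, destinations, edge_types


def out_edges(csr, node, edge_type=None):
    """Return (destination, edge_type) pairs for a node, optionally filtered."""
    offsets, destinations, edge_types = csr
    start, end = offsets[node], offsets[node + 1]
    return [
        (destinations[i], edge_types[i])
        for i in range(start, end)
        if edge_type is None or edge_types[i] == edge_type
    ]


csr = build_out_csr(3, [(0, 1, 0), (0, 2, 1), (1, 2, 0)])
```

A node's neighbors occupy one contiguous slice, so a traversal step is two offset reads plus a sequential scan. The in-edge CSR is the same structure with sources[] in place of destinations[].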

File Growth

The file grows in predictable ways:

File Size Examples

The examples below assume a 64 MB WAL. The default WAL size is 1 MB and is configurable.

Initial: ~64 MB
100K nodes: ~72 MB
1M nodes: ~150 MB

Components: Header (fixed), WAL (configurable), Snapshot (grows)
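Since the file is the sum of a fixed header, a fixed-size WAL, and a growing snapshot, the example figures can be decomposed with a line of arithmetic. The sketch below only restates the approximate numbers given above; the function names are illustrative, and the implied snapshot sizes are rough because the source figures are themselves approximate.

```python
KB = 1024
MB = 1024 * KB
HEADER_BYTES = 4 * KB  # fixed header size from the text


def file_size(wal_bytes: int, snapshot_bytes: int) -> int:
    """Total file size: header + WAL + snapshot."""
    return HEADER_BYTES + wal_bytes + snapshot_bytes


def implied_snapshot_mb(total_mb: int, wal_mb: int = 64) -> int:
    """Approximate snapshot size in MB; the 4 KB header is negligible here."""
    return total_mb - wal_mb


empty = file_size(64 * MB, 0)           # a new file is about the WAL size
small = implied_snapshot_mb(72)         # ~8 MB of snapshot at 100K nodes
large = implied_snapshot_mb(150)        # ~86 MB of snapshot at 1M nodes
```

With the 1 MB default WAL, the same snapshots would give files of roughly 9 MB and 87 MB, so for anything beyond a small graph the snapshot dominates the total.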

Single-File vs Multi-File

KiteDB previously supported a directory-based format. Single-file is now the default:

Aspect	Single-File	Directory (legacy)
Portability	Copy one file	Copy entire directory
Atomic ops	Header flip	Manifest + renames
Disk usage	~40% smaller	More overhead
Complexity	Simpler	More moving parts

Opening a Database

1. Read the header (4 KB at offset 0)
2. Validate magic bytes and checksums
3. mmap() the snapshot area (zero-copy)
4. Parse snapshot sections
5. Replay the WAL to rebuild the delta

✓ Ready for queries

If WAL replay finds an incomplete transaction, it is discarded because it was never committed. Recovery is automatic and fast.
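The replay rule (apply committed transactions, discard incomplete ones) can be sketched as follows. The record shape and the "begin" / "put" / "commit" kinds are hypothetical, chosen only to illustrate the idea; KiteDB's actual WAL record format is not described here.

```python
from typing import NamedTuple


class Record(NamedTuple):
    tx_id: int
    kind: str        # ASSUMPTION: one of "begin", "put", "commit"
    key: str = ""
    value: str = ""


def replay(records):
    """Rebuild the delta from WAL records, keeping only committed work."""
    pending = {}  # tx_id -> buffered writes not yet committed
    delta = {}    # state visible after recovery
    for rec in records:
        if rec.kind == "begin":
            pending[rec.tx_id] = []
        elif rec.kind == "put":
            pending.setdefault(rec.tx_id, []).append((rec.key, rec.value))
        elif rec.kind == "commit":
            # Only a commit record makes a transaction's writes visible.
            for key, value in pending.pop(rec.tx_id, []):
                delta[key] = value
    # Whatever is still pending never committed, so it is dropped.
    discarded = sorted(pending)
    return delta, discarded


wal = [
    Record(1, "begin"),
    Record(1, "put", "a", "1"),
    Record(1, "commit"),
    Record(2, "begin"),
    Record(2, "put", "b", "2"),  # crash happened before tx 2 committed
]
delta, discarded = replay(wal)
```

Buffering writes per transaction until the commit record appears means a crash mid-transaction needs no undo step: the uncommitted writes are simply never applied.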

Next Steps

WAL & Durability – How the write-ahead log provides crash safety
Snapshot + Delta – How reads merge these two sources
