3.2. SPACE MANAGEMENT

3.2.1. Versioning and Storage of Data

MVCC Mechanism

WiredTiger implements a Multiple Version Concurrency Control (MVCC) mechanism. In MVCC, when a document or record is updated, a new version is created rather than overwriting the existing data. This allows older versions to remain available for transactions that began before the update, ensuring data consistency.

Space Management

The presence of multiple versions due to MVCC requires efficient space management to prevent the accumulation of old versions from consuming excessive disk space.

3.2.2. Handling Dirty Pages

Dirty Pages

When data is modified, the changes are initially stored in memory as “dirty pages.” These pages contain the new versions created by MVCC.

Checkpointing and Space Management

During the checkpointing process, WiredTiger writes these dirty pages out to disk. This process is not just about persisting changes; it’s also a critical component of space management. As the latest version of data is merged with the original on-disk image, older, outdated versions are effectively replaced, freeing up valuable disk space.

Copy-on-Write Strategy

When an update request is received, WiredTiger fetches the corresponding page from storage into DRAM. It then employs a copy-on-write approach, which allows multiple versions of the data to exist without locking the original on-disk image. This strategy ensures that updates are efficiently managed while maintaining data consistency, as only the most recent committed versions are preserved and older versions are eventually reclaimed through the checkpointing process.

3.2.3. Extent Reuse

page-allocation

WiredTiger manages space using an extent data structure, where each extent includes a logical disk offset and size. There are three extent lists for each file that keep track of:

  1. Allocated Space: Extents currently in use, containing data.
  2. Available Space: Extents that are free and can be used for new data.
  3. Discarded Space: Extents that have been freed up after data is deleted or moved, and are awaiting reuse.

As old versions are superseded by new ones and marked for deletion, the space they occupied is moved from the allocated list to the available or discarded list, depending on whether the space is immediately reusable.

Before a data buffer is actually written out, the latest update version is applied to the original on-disk image. Then, the space management system allocates the logical disk address for the upcoming write based on one of three approaches:

  1. First-Fit: Selects the first extent in the available extent list that fits the data buffer.
  2. Best-Fit (default): Selects the smallest extent that fits the data buffer.
  3. Append: Adds the data buffer at the end of the file.

3.2.4. Space Reclamation

Through checkpoints and compaction processes, WiredTiger reclaims space from old, unused data versions, making it available for future writes. This ensures efficient space utilization and prevents the database from growing indefinitely.

The figure below illustrates how WiredTiger manages extent lists and reuses previously allocated space using checkpoints:

live-checkpoint-1
live-checkpoint-2
  1. Checkpointing: Extent list information is maintained in the checkpoint structure and written to persistent storage during checkpointing.
  2. Live Checkpoint: WiredTiger uses a special checkpoint called the live checkpoint, which only exists while the system is running and resides in DRAM. This checkpoint tracks both data changes and updates to the extent lists.
  3. Merging Checkpoints: When the checkpoint server is signaled, the previous checkpoint is fetched from persistent storage and merged with the live checkpoint. After merging, the previous disk space occupied by the same data page is marked as available and can be reused for subsequent writes.

This system ensures efficient space management, allowing WiredTiger to reuse space and maintain data consistency through checkpointing.