1.1. DATABASE INTERNALS

Databases typically consist of two main components: the frontend and the storage engine.

database-internals

1.1.1. Frontend

API

The frontend is the interface we interact with, often using an API. The most common API is SQL (Structured Query Language), which allows us to query and manage data in a structured format like tables, rows, and columns. However, different databases might use other APIs, such as Redis, which operates without structured queries and instead uses commands like GET and SET to store and retrieve documents.

Data Format

The data format defines how data is stored and retrieved. Traditionally, databases used tables with rows and columns, a structure that SQL was designed to query. This design was prevalent from the 1960s and 1970s. However, with the evolution of the web, new data formats emerged, such as JSON documents, which offer more flexibility by not requiring a fixed schema.

1.1.2. Storage Engine

The storage engine is the crucial part of a database, handling how data is stored on disk. It operates independently of the type of data stored, dealing only with bytes organized into pages. The storage engine is also responsible for efficiently managing these pages, supporting features like:

  • Indexing

  • Transactions

  • Data compression

In the 2000s, the NoSQL movement emerged, challenging the fixed schema approach of traditional SQL databases. NoSQL databases, such as those handling JSON documents, prioritize flexibility and scalability. They use different storage engines and APIs, focusing on storing documents or graphs rather than rows and columns.

Indexes and Transactions

  • Indexes are crucial for efficient data retrieval, helping to locate data quickly without scanning entire tables or documents. They can be based on various structures, such as B-trees, to enhance search efficiency.

  • Transactions ensure that multiple operations are performed as a single unit, maintaining database consistency. The storage engine manages these indexes and handles transactions to ensure data integrity. This includes maintaining logs like the Write-Ahead Log (WAL) to recover from crashes by replaying changes made before the failure.

The key difference between SQL and NoSQL databases lies in their approach to data format and API usage. SQL databases use structured tables and SQL queries, while NoSQL databases often use documents or graphs and a variety of APIs to meet modern application needs. The storage engine remains a critical component in both, ensuring data is stored, indexed, and retrieved efficiently.