
Does Google Spanner Provide High Availability and Strong Consistency, Defying the CAP Theorem?

I was going through one of the official introduction videos of Google Spanner. It mentions: "Google Spanner is a mission-critical relational database service built from the ground up and battle-tested at Google for Strong Consistency and High Availability at a global scale".

A few questions popped into my mind after this statement:

  • How does a database guarantee High Availability and Strong Consistency at a global scale?

  • Partition Tolerance is a necessity when building distributed systems. Given that, how does Spanner provide High Availability and Strong Consistency at the same time?

  • If it provides all three guarantees, does this break the CAP theorem?

The short answer is that Google Spanner does not break the CAP theorem. Before going deeper, let us revisit the CAP theorem. As per Wikipedia, any distributed data store can provide only two of the following three guarantees:

  1. Consistency - Every read receives the most recent write or an error. In other words, a read never silently returns stale data.

  2. Availability - Every request receives a (non-error) response, without the guarantee that it contains the most recent write. In other words, the system always answers, but the answer may be stale.

  3. Partition Tolerance - The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes. A distributed system consists of multiple nodes that communicate with each other over a network; a partition occurs when some nodes can no longer reach others, whether due to network disruptions, software faults, or hardware failures, splitting the system into two or more disjoint subsets that cannot communicate with each other. Partition Tolerance denotes the system's capacity to keep operating despite such partitions.

During a network partition, one is left with two options: Consistency or Availability. When a partition happens, the system must choose one of the following:

  1. When choosing consistency over availability, the system returns an error or times out whenever it cannot guarantee that the requested information is the most recent. Because the partition has split the nodes into subsets that cannot communicate with each other, up-to-date data cannot be guaranteed, so the system refuses to answer rather than risk serving stale data.

  2. When choosing availability over consistency, the system always processes the query and returns the most recent version of the information available to it, even if it cannot guarantee that this version is up to date due to the network partition. (A small sketch of the two behaviors follows this list.)
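
To make the trade-off concrete, here is a minimal, hypothetical sketch of a single replica in a toy key-value store. Everything in it (the Replica class, the partitioned flag, the CP/AP mode switch) is invented for illustration and is not Spanner's actual protocol; it only shows the two behaviors a replica can choose between once it is cut off from its peers.

```python
# Toy illustration of the CP-vs-AP choice during a network partition.
# All names (Replica, mode, partitioned) are hypothetical; this is not
# how Spanner or any real database is implemented.

class StaleReadRefused(Exception):
    """Raised by a CP-style replica that cannot prove its copy is current."""

class Replica:
    def __init__(self, mode: str):
        self.mode = mode              # "CP" or "AP"
        self.data = {}                # local copy of the key-value data
        self.partitioned = False      # True when cut off from other replicas

    def write(self, key, value):
        self.data[key] = value        # replication details omitted

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: refuse to answer rather than
            # risk returning a value the other side may have overwritten.
            raise StaleReadRefused(f"cannot confirm latest value for {key!r}")
        # Availability over consistency: answer with the local copy,
        # which may be stale if newer writes landed on the other side.
        return self.data.get(key)

replica = Replica(mode="AP")
replica.write("balance", 100)
replica.partitioned = True            # simulate a network partition
print(replica.read("balance"))        # AP: returns 100, possibly stale

replica.mode = "CP"
try:
    replica.read("balance")
except StaleReadRefused as err:
    print("CP:", err)                 # CP: returns an error instead
```

Real systems make this decision with quorums and consensus protocols rather than a simple flag, but the choice they face during a partition is exactly the one shown above.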

No distributed system is safe from network failures, so network partitioning generally has to be tolerated. That is why most systems are tagged as either AP or CP.

In CAP terms, an AP system promises a response to every read and write. Achieving literal 100% availability is practically impossible. However, if a system delivers availability so high that users can effectively stop worrying about outages, it is, for all practical purposes, highly available. Spanner meets this standard, with availability exceeding five nines (less than one failure in 10^5 requests).
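
To put five nines in perspective, here is a quick back-of-the-envelope calculation of the downtime budget an availability target implies. The numbers are generic arithmetic, not official Spanner SLA figures.

```python
# Rough downtime budget implied by an availability target.
# Generic arithmetic only; not an official Spanner SLA figure.

def downtime_per_year_minutes(availability: float) -> float:
    minutes_per_year = 365.25 * 24 * 60
    return (1 - availability) * minutes_per_year

for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    print(f"{nines} nines ({availability:.5f}): "
          f"~{downtime_per_year_minutes(availability):.1f} minutes of downtime per year")
```

Five nines works out to roughly 5 minutes of unavailability per year, which is why users of such a system rarely notice outages at all.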

How does Spanner attain such a high level of availability? Spanner runs on Google's private network. Unlike most wide-area networks, and especially the public internet, Google fully controls its private network, which allows for hardware and path redundancy as well as tight management of upgrades and overall operations. Even if, say, an undersea optical fiber is cut, redundant paths ensure that communication between nodes remains unaffected. Spanner is also deployed on high-quality, specialized hardware with redundancy and fallbacks at the hardware level. So while disruptions such as fiber cuts and equipment failures do occur, the overall system remains highly robust.

Through infrastructure refined over years of operational improvements, Google makes network partitions exceedingly rare, which in turn yields very high availability. And as long as there is no partition, the system provides Strong Consistency without sacrificing Availability. That is how Spanner delivers High Availability, Strong Consistency, and Partition Tolerance in practice.

In the rare event of a network partition, however, Spanner gives up Availability to preserve Consistency, so technically Spanner is a CP system. For all practical purposes, though, it appears to defy the CAP theorem by delivering Strong Consistency, High Availability, and Partition Tolerance.


References:

  • https://en.wikipedia.org/wiki/CAP_theorem

  • https://cloud.google.com/blog/products/databases/inside-cloud-spanner-and-the-cap-theorem

  • https://www.youtube.com/watch?v=amcf6W2Xv6M
