

Showing posts from 2023

Understanding Merkle Tree

A Merkle Tree is a cryptographic tree structure used in computer science and distributed systems to efficiently verify the integrity of large sets of data (the accuracy and consistency of data over its lifecycle). A Merkle Tree, also known as a Hash Tree, is a tree of hash values: each leaf node is a hash of a small portion of the data, and each non-leaf node is a hash of its children. It is used in applications such as NoSQL databases, Git, cryptocurrencies, file systems, etc. Some key characteristics of a Merkle Tree are: Binary Tree Structure: the Merkle Tree is a binary tree, where each leaf node represents a hash of data. Leaf Nodes: the data set is divided into fixed-size blocks or "leaves"; each leaf node contains the hash of a specific data block or piece of information. Non-Leaf Nodes: non-leaf nodes represent the hash of the concatenation of their child nodes' hashes. If the number of leaves is odd...
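To make the structure concrete, here is a minimal Python sketch of computing a Merkle root; duplicating the last hash when a level has an odd number of nodes is one common convention, not the only one:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list[bytes]) -> bytes:
    """Compute the Merkle root of a list of data blocks.

    Leaf nodes are hashes of the blocks; each parent is the hash of the
    concatenation of its two children. If a level has an odd number of
    nodes, the last hash is duplicated (one common convention).
    """
    if not blocks:
        raise ValueError("at least one data block is required")
    level = [sha256(b) for b in blocks]           # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:                   # odd count: duplicate last
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])  # parent = H(left || right)
                 for i in range(0, len(level), 2)]
    return level[0]

# The root changes if any block is tampered with.
root = merkle_root([b"block-1", b"block-2", b"block-3"])
print(root.hex())
```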

Bloom Filter

A Bloom filter is a space-efficient probabilistic data structure used to determine whether an element is a member of a set. The filter responds to queries with either a definite No or a probable Yes. The probabilistic nature of the Bloom filter shows up in its "probably Yes" responses, which imply the possibility of false positives: when the Bloom filter says Yes, the element might or might not be present in the set. However, if the Bloom filter returns No, then the element is definitively not present in the set. Software engineering is all about tradeoffs. When using a Bloom filter, we sacrifice a bit of accuracy and flexibility to gain a substantial boost in performance. The key consideration is identifying use cases where it is acceptable for the filter to report an element as present even when it is not. Let us understand the core algorithm first, and then we will see its real-life applications: Algorithm: A key element in the construction of a Bloom Filter i...
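Before going into the details, here is a minimal Python sketch of the idea: an m-bit array plus k hash functions, with the usual "probably yes / definitely no" answers. The sizes and the salted-SHA-256 hashing below are illustrative choices, not tuned parameters:

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter sketch: an m-bit array and k hash functions."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item: str):
        # Derive k bit positions from k salted SHA-256 digests
        # (illustrative, not optimized).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False -> definitely not in the set
        # True  -> probably in the set (false positives are possible)
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice@example.com")
print(bf.might_contain("alice@example.com"))  # True (probably present)
print(bf.might_contain("bob@example.com"))    # almost certainly False
```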

Client-Server Communication Model (Part-3) - WebSockets

This is the third and final part of the Client-Server Communication Model. You can have a look at the first and second parts, where we discussed different ways in which clients and servers can communicate. Here, we will focus on one of the most popular communication models: WebSockets. 5. WebSockets: WebSockets (RFC-6455) enable real-time, two-way, full-duplex communication between client and server over a single, long-lived connection. WebSocket, like SSE, provides a persistent connection. But in contrast to SSE, it enables full-duplex communication, which means that both client and server can communicate over the same channel. WebSockets are suitable for building interactive and dynamic applications that require continuous communication between clients and servers. WebSocket leverages the HTTP protocol for connection establishment. Since HTTP/1.0 does not provide persistent connections, WebSocket cannot be implemented on HTTP/1.0. It needs a protocol with persistent connections, like HTTP/...
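As a rough illustration of the full-duplex channel, here is a minimal echo sketch using the third-party websockets package; the host, port, and single-argument handler style assume a recent version of that library:

```python
# Minimal full-duplex echo sketch with the third-party `websockets`
# package (pip install websockets). Host and port are placeholders.
import asyncio
import websockets

async def echo(ws):
    # Server side: whatever the client sends is echoed back over the
    # same long-lived connection.
    async for message in ws:
        await ws.send(f"echo: {message}")

async def main():
    async with websockets.serve(echo, "localhost", 8765):
        # Client side: send and receive on the same channel.
        async with websockets.connect("ws://localhost:8765") as ws:
            await ws.send("hello")
            print(await ws.recv())  # -> "echo: hello"

asyncio.run(main())
```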

Client-Server Communication Model (Part-2) - Server Sent Events

This is the second part of the Client-Server Communication Model. You can have a look at the first part, where we discussed Polling, Long Polling, and Webhooks. Here, we will focus on Server-Sent Events. 4. Server-Sent Events (SSE): Server-Sent Events (SSE) are based on server push technology, enabling a client to receive automatic updates from a server over an HTTP connection. The client makes a persistent, long-lived connection with the server, and the server uses this connection to send data to the client. It is important to note that the client cannot send data to the server using SSE; it is a unidirectional flow of data from the server to the client. The request-response flow of SSE is: The client issues an HTTP GET request to the server with the request headers Accept: text/event-stream and Connection: keep-alive. The inclusion of Accept: text/event-stream in the request informs the server that the client aims to initiate an SSE co...
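As a rough illustration of the client side, here is a minimal sketch that consumes an SSE endpoint using the requests package; the URL is a placeholder:

```python
# Minimal SSE client sketch using the `requests` package.
# The endpoint URL is a placeholder for a real event stream.
import requests

headers = {"Accept": "text/event-stream", "Connection": "keep-alive"}
with requests.get("https://example.com/events", headers=headers, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames are plain text: data lines start with "data:" and
        # events are separated by blank lines.
        if line and line.startswith("data:"):
            print("event payload:", line[len("data:"):].strip())
```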

Client-Server Communication Model (Part-1)

The foundation of the internet lies in the communication between computers. Computers acting as servers own resources and can provide them to other computers acting as clients. The roles shift over time: at one moment, a computer may own resources, while at another it may require resources from others. Communication is essential for exchanging data between them. The predominant protocol facilitating Client-Server communication is the HyperText Transfer Protocol (HTTP). HTTP is a protocol for fetching web pages, documents, images, media files, binaries, etc. Pretty much the whole of the internet runs on HTTP. We have discussed HTTP in detail here. A typical HTTP request flow is as follows: a client opens a connection and requests a resource from a server; the server computes the response; the server sends the response to the client on the same opened connection. Some of the most popular Client-Server communication models are: Polling / Short Polling / AJAX Polling, Long Po...
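As a quick illustration of that request flow, here is a minimal sketch using only Python's standard library; the host and path are placeholders:

```python
# Minimal sketch of the typical HTTP request flow described above,
# using the standard library. Host and path are placeholders.
import http.client

conn = http.client.HTTPSConnection("example.com")  # client opens a connection
conn.request("GET", "/index.html")                  # client requests a resource
response = conn.getresponse()                       # server computes the response
print(response.status, response.reason)             # response comes back on the same connection
body = response.read()
conn.close()
```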

Does Google Spanner Provide High Availability and Strong Consistency, Defying CAP Theorem

I was going through one of the official introduction videos of Google Spanner. It mentions: "Google Spanner is a mission-critical relational database service built from the ground up and battle-tested at Google for Strong Consistency and High Availability at a global scale". A few questions popped into my mind after this statement: How does a database guarantee High Availability and Strong Consistency on a global scale? Partition Tolerance is necessary when building distributed systems; on top of that, how does Spanner provide High Availability and Strong Consistency simultaneously? If it provides all three guarantees, does this break the CAP theorem? The short answer is that Google Spanner does not break the CAP theorem. Before going deep, let us revisit the CAP theorem. As per Wikipedia, any distributed data store can provide only two of the following three guarantees: Consistency - every read receives the most recent write or an error. Each read operation returns ...

Caching Strategy Part-1

Caching is a widely used technique to enhance system performance. In essence, frequently accessed content is stored in faster temporary storage, called a cache, so the content can be retrieved directly from the cache rather than fetched from the actual source every time. Retrieving content from the cache improves performance because it is faster than retrieving it from the actual source. The actual data source can be a service, a database, or any other system. Although the following illustrations use a database as the data source, the same concept applies to other systems. Accessing a database can be an expensive operation. Frequent access to the same data can have performance implications for both the application and the database. Caching can reduce response time for the application and decrease the load on the underlying database. There are different strategies for selecting the right kind of cache. The choice of strategy depends on how the data is used, like how it...
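As a small illustration, here is a minimal cache-aside sketch (one common caching strategy); the load_from_db function is a hypothetical stand-in for the real data source:

```python
# Minimal cache-aside sketch: check the cache first, fall back to the
# data source on a miss, then populate the cache for next time.
cache: dict[str, str] = {}

def load_from_db(key: str) -> str:
    # Hypothetical placeholder for an expensive database/service call.
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in cache:              # cache hit: served from fast storage
        return cache[key]
    value = load_from_db(key)     # cache miss: go to the actual source
    cache[key] = value            # populate the cache
    return value

print(get("user:42"))  # miss -> loads from the source
print(get("user:42"))  # hit  -> served from the cache
```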