
Caching Strategy Part-1

Caching is a widely used technique for enhancing system performance. In essence, frequently accessed content is stored in a faster temporary storage layer, called a cache, so that it can be retrieved directly from there rather than fetched from the actual source every time. Because the cache is faster than the underlying source, this cuts retrieval time. The actual data source can be a service, a database, or any other system. Although the following illustrations use a database as the data source, the same concept applies to other systems.

Accessing a database can be an expensive operation. Frequent access to the same data can hurt the performance of both the application and the database. Caching can reduce the application's response time and decrease the load on the underlying database.

The right caching strategy depends on how the data is used, that is, how it is read and written. Some popular strategies used in the industry are:

  1. Read Strategy: strategies used when reading data from the system

    1. Cache Aside

    2. Read Through

  2. Write Strategy: strategies used when writing data to the system

    1. Write Around

    2. Write Back

    3. Write Through

1. a) Cache Aside Strategy:

This is the most common caching strategy used in the industry and the default way most applications work with a cache. The application interacts directly with both the cache and the database.

Redis and Memcached are widely used caches that follow this strategy. The flow is as follows (a minimal code sketch follows the list):

  1. The application first queries the cache.

  2. If the data is found in the cache (a cache hit), it is read and returned to the client. If the data is not cached (a cache miss), the application proceeds to the next step.

  3. The application queries the database directly for the required data.

  4. The database returns the requested data.

  5. The application updates the cache with the newly retrieved data and then returns it to the client.
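Below is a minimal sketch of this read path in Python. It assumes a Redis-style cache client exposing `get`/`set` and a hypothetical `db.query_user` helper; both are placeholders, not a specific library's API.

```python
import json

def get_user(user_id, cache, db):
    """Cache-aside read path: the application itself checks the cache,
    falls back to the database on a miss, and repopulates the cache."""
    key = f"user:{user_id}"

    cached = cache.get(key)            # steps 1-2: query the cache first
    if cached is not None:
        return json.loads(cached)      # cache hit: serve from the cache

    user = db.query_user(user_id)      # steps 3-4: cache miss, read the database
    cache.set(key, json.dumps(user))   # step 5: update the cache...
    return user                        # ...then return the data to the client
```

Note that both the cache and the database are visible to the application; the cache itself knows nothing about where the data comes from.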

Use Case:

  • The Cache Aside strategy works best for read-heavy systems. It is well-suited for scenarios where certain pieces of data are frequently read but infrequently updated.

  • When incorporating caching into established systems, particularly legacy ones, Cache Aside often emerges as the favored option. It allows caching to be introduced gradually, without significant modifications to the existing data storage and retrieval mechanisms.

Advantages:

  • Applications using the Cache Aside strategy are resilient to cache failure: if the cache is down, the application can still fetch data from the database. Although this defeats the purpose of using a cache, the application keeps working as long as the load is not too high.

  • The cache can have a different data model than the database. This lets us store, for example, the aggregated result of multiple database queries under a single cache key (see the sketch after this list).

  • Cache-Aside is suitable for scenarios with predictable access patterns. This predictability allows developers to strategically preload or cache specific data items, optimizing system performance by proactively loading relevant information into the cache.
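As an illustration of the different-data-model point above, a single cache entry can hold a denormalized view assembled from several queries. The `db.*` helpers below are hypothetical placeholders:

```python
import json

def get_dashboard(user_id, cache, db):
    """Stores the combined result of several database queries under one
    cache key; this aggregate shape exists only in the cache."""
    key = f"dashboard:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    view = {
        "profile": db.query_user(user_id),                   # query 1
        "recent_orders": db.query_orders(user_id, limit=5),  # query 2
        "unread_count": db.count_unread(user_id),            # query 3
    }
    cache.set(key, json.dumps(view))  # one cache entry for three queries
    return view
```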

Limitations:

  • This strategy relies on the application to load data into the cache on demand. When the cache is not preloaded or warmed up, frequent cache misses can result in increased response times as the application fetches data from the underlying storage.

  • There is a risk of serving stale data to users. By default, the application refreshes a cached entry only on a cache miss, so even after the data is updated in the database, the cache keeps serving the old value until the entry expires or is evicted. This can impact the consistency of the system. It can be mitigated to some extent by attaching a time to live (TTL) to each entry and refreshing the data once the TTL expires. There are also write strategies that refresh the data, which we will explore in the second part of this series.

1. b) Read Through Cache Strategy:

In this strategy, the application interacts only with the cache. On a cache miss, the cache itself loads the missing data from the database.

The flow is as follows (a minimal code sketch follows the list):

  1. The application first queries the cache.

  2. If the data is found in the cache (a cache hit), it is read and returned to the client. If the data is not cached (a cache miss), the cache proceeds to the next step.

  3. The cache queries the database directly for the required data.

  4. The database returns the requested data.

  5. The cache stores the newly retrieved data and then returns it to the application.
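A minimal sketch of this idea: the application calls only the cache, and the cache owns a loader function for misses. The in-memory dict and the `loader` callback are simplifications, not a particular cache library's API.

```python
class ReadThroughCache:
    """The application calls only get(); on a miss the cache itself
    loads the value from the backing store and keeps a copy."""

    def __init__(self, loader):
        self._store = {}       # in-memory dict standing in for a real cache
        self._loader = loader  # function that reads from the database

    def get(self, key):
        if key in self._store:       # steps 1-2: cache hit
            return self._store[key]
        value = self._loader(key)    # steps 3-4: cache loads from the database
        self._store[key] = value     # step 5: cache updates itself...
        return value                 # ...then returns the data


# Usage: the application never talks to the database directly.
# cache = ReadThroughCache(loader=lambda key: db.query(key))
# user = cache.get("user:42")
```

Compare this with the cache-aside sketch earlier: the database access has moved out of the application and into the cache component.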

Use Case:

  • Like the Cache Aside strategy, Read Through caching also works best for read-heavy systems where the data is updated infrequently.

  • The Read Through strategy is used by many Content Delivery Networks (CDNs) to automatically manage the cache for static content. When a user requests a resource, the CDN checks its cache; if the resource is not cached, the CDN fetches it from the origin server.

Advantages:

  • It simplifies the application code and abstracts away the complexity of cache management and data retrieval.

  • It can contribute to real-time responsiveness. When requested data is not present, the cache is updated automatically, ensuring that subsequent reads are served quickly with the most current data. Unlike the cache-aside strategy, where the application has to populate the cache, here the cache updates itself, resulting in less delay. This matters in read-heavy systems, as more of the subsequent requests are served by the cache.

  • Read Through Caching is adaptable to changing access patterns. The caching layer can dynamically adjust the cache based on the most recent read patterns, ensuring that the cache remains optimized for current usage trends.

Limitations:

  • The application has limited control over the loading of data into the cache. It may not be optimal for scenarios where a more customized approach to data loading is required.

  • Because the cache is not updated when the underlying data changes in the database, there is a risk of serving stale data to users; the cache may become inconsistent with the database. As with Cache Aside, this can be mitigated to some extent using a time to live (TTL) and refreshing the data after the TTL expires.

The stale-data limitation of both strategies can be reduced by using a time to live (TTL), as sketched below. An alternative approach is to employ specific strategies during database write operations; in the next part of this series, we'll dive into the different Write Strategies.
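As a rough sketch of the TTL idea, each entry can carry an expiry timestamp so that stale data ages out on its own (with Redis this is simply an expiry option on the write, e.g. SET with EX). The hand-rolled version below is illustrative only:

```python
import time

class TTLCache:
    """Entries expire ttl_seconds after being written, which bounds how
    long stale data can be served before a fresh read is forced."""

    def __init__(self, ttl_seconds):
        self._store = {}         # key -> (value, expiry timestamp)
        self._ttl = ttl_seconds

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self._ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None                # never cached
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]       # expired: next read misses and refreshes
            return None
        return value
```

Choosing the TTL is a trade-off: a shorter TTL narrows the staleness window but increases cache misses and database load.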
