
How to Choose the Right Database in System Design Interviews - Part 1

Choosing the right database is one of the most important decisions in a system design interview. The selection impacts the correctness and scalability of the design. Many factors can influence the decision, but the most important ones are:

  • Structure of the Data:
    • Different databases support data models such as relational, document, key-value, graph, etc.
    • The choice of database depends on which data model best fits the structure of your data. For example, a traditional relational database like MySQL or PostgreSQL might be appropriate if the data is highly structured and relational.
    • A NoSQL database like Cassandra or MongoDB might be more suitable if the data is semi-structured or unstructured.
  • Query Pattern:
    • Understanding the types of queries and operations performed on the data helps select a database that can efficiently handle those operations.
    • Are they read-heavy or write-heavy? Do they involve complex joins, aggregations, or full-text searches? 
    • Some databases are optimized for complex analytical queries (OLAP), while others excel at fast transactional queries (OLTP).
    • Some databases are optimized for specific use cases, such as time-series data, geospatial data, or graph data.
  • Data Integrity and Consistency:
    • The database's support for enforcing data integrity constraints, such as unique keys, foreign key constraints, and transactions, is critical to maintaining data consistency.
    • Relational databases typically excel in this area, whereas some NoSQL databases may sacrifice strict consistency for performance or scalability.
  • Data Volume and Scalability:
    • The volume of data that needs to be stored and managed can also influence database selection. However, contrary to popular opinion, data volume alone is not a decisive factor in choosing between a relational and a NoSQL database.
    • A common misconception is that a traditional RDBMS is only sufficient for small to moderate-sized datasets, and that large datasets requiring horizontal scalability always need distributed databases like Apache Cassandra or Amazon DynamoDB. This is not always true.
    • Scaling is not an issue with modern distributed relational databases like Google Spanner, Amazon RDS, and CockroachDB; they scale seamlessly. However, since they are relational databases with built-in ACID support, that support may impact latency, so one needs to weigh the other factors mentioned above to make a final decision.
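To make the data-integrity point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational database. The schema and values are invented for illustration; the point is how unique and foreign-key constraints reject bad data, and how a transaction commits or rolls back as a unit.

```python
import sqlite3

# In-memory SQLite database as a stand-in for any relational DB.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign-key constraints

conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE          -- unique-key constraint
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),  -- foreign-key constraint
    amount  REAL NOT NULL
);
""")

conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
conn.commit()

# A transaction: either both inserts commit or neither does.
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO orders (user_id, amount) VALUES (1, 9.99)")
        # This violates the FK constraint (no user 42) and aborts the block.
        conn.execute("INSERT INTO orders (user_id, amount) VALUES (42, 5.00)")
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 0 -- the first order was rolled back along with the second
```

This all-or-nothing behavior is exactly what many NoSQL stores relax (or scope to a single document/partition) in exchange for scalability.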
Let's discuss some of the most common use cases for databases:

  • Time-Series Database:

    • Time series databases (TSDBs) are optimized for storing, querying, and analyzing time-stamped data points. They are commonly used in various industries and applications where data is collected over time and needs to be efficiently managed and analyzed. 
    • Some common TSDBs are InfluxDB and Amazon Timestream. Bigtable and MongoDB also work well for time-series data. 
    • Some common use cases are:
      • Monitoring and Observability:
        • TSDBs are widely used for monitoring and observability applications, such as monitoring infrastructure performance (e.g., CPU usage, memory utilization, network traffic), network devices, servers, and application metrics (e.g., response times, error rates).
        • They store metrics and telemetry data collected from sensors, devices, and software systems in real-time.
        • Prominent monitoring stacks are built around TSDBs such as Prometheus and InfluxDB, often with Grafana on top for visualization.
      • Financial Services:
        • Financial services use time-series databases to track and analyze stock market data, cryptocurrency prices, trading volumes, and other financial metrics.
        • These databases help in historical analysis, real-time monitoring of market trends, algorithmic trading, risk management, and compliance reporting.
      • Log Management and Analytics:
        • TSDBs are used for storing and analyzing log data generated by applications, servers, network devices, and security systems.
        • They enable centralized log management, log aggregation, real-time log analysis, anomaly detection, and forensic investigations, helping organizations gain insights into system behavior and security incidents.
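The workload a TSDB optimizes for can be sketched in a few lines of Python. This toy `TinyTSDB` class is purely illustrative (real TSDBs like InfluxDB add compression, retention policies, and time-based indexes): append-only timestamped writes, plus windowed aggregation reads like "average CPU per minute".

```python
from collections import defaultdict

class TinyTSDB:
    """Toy in-memory time-series store: append-only writes, range reads."""

    def __init__(self):
        self.points = []  # list of (epoch_seconds, value), appended in time order

    def write(self, ts, value):
        self.points.append((ts, value))

    def range_avg(self, start, end, bucket=60):
        """Average value per `bucket`-second window within [start, end)."""
        buckets = defaultdict(list)
        for ts, v in self.points:
            if start <= ts < end:
                buckets[(ts - start) // bucket].append(v)
        return {b: sum(vs) / len(vs) for b, vs in sorted(buckets.items())}

db = TinyTSDB()
for t, cpu in [(0, 10), (30, 30), (70, 50), (90, 70)]:  # made-up CPU samples
    db.write(t, cpu)

print(db.range_avg(0, 120))  # {0: 20.0, 1: 60.0} -- per-minute averages
```

Monitoring queries ("average CPU over the last hour, per minute") are almost always of this shape, which is why TSDBs index and compress data along the time dimension.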

  • Data warehouses:

    • Data warehouses are used for company-wide analytics across data from many different sources. For example, they can answer questions like the total number of orders, orders per geography, and the most popular items in each geography.
    • Data warehouses are specialized databases designed for storing, analyzing, and reporting large volumes of structured and unstructured data from multiple sources.
    • Data from many services is loaded into the warehouse, and reports are generated on top of it. The data from these different sources is collected into the data warehouse using ETL (Extract, Transform, Load) jobs.
    • Popular data warehouses include Google BigQuery and Amazon Redshift.
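Here is a hedged sketch of such an ETL job in Python, with SQLite standing in for a warehouse like BigQuery or Redshift. The source records and field names are invented for illustration; the shape of the job (extract, transform, load, then an OLAP-style aggregation) is the point.

```python
import sqlite3

def extract():
    # In reality this would pull from service databases, logs, or APIs.
    return [
        {"order_id": 1, "country": "US", "item": "book", "amount": "12.50"},
        {"order_id": 2, "country": "US", "item": "pen",  "amount": "1.25"},
        {"order_id": 3, "country": "IN", "item": "book", "amount": "9.00"},
    ]

def transform(rows):
    # Normalize types and field order for the warehouse schema.
    return [(r["order_id"], r["country"], r["item"], float(r["amount"]))
            for r in rows]

def load(conn, rows):
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(id INTEGER, country TEXT, item TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")  # stand-in for the real warehouse
load(warehouse, transform(extract()))

# An OLAP-style aggregation: order count and revenue per geography.
for row in warehouse.execute(
        "SELECT country, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY country ORDER BY country"):
    print(row)  # ('IN', 1, 9.0) then ('US', 2, 13.75)
```

Real warehouses run these scans and aggregations over billions of rows, typically using columnar storage, which is what distinguishes them from the OLTP databases the data was extracted from.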

  • Text Searching Capabilities:

    • Systems like Amazon provide search capabilities across products based on title, product description, seller, etc.
    • Systems like Netflix let users search by movie title, genre, etc.
    • Social media platforms like Facebook, Twitter, and LinkedIn employ text search engines to enable users to search for posts, tweets, profiles, and hashtags. Users can search for specific topics, people, or conversations using keywords or hashtags.
    • CMS(Content Management System) platforms, such as WordPress, Drupal, and Joomla, use text search engines to enable users to search for articles, blog posts, pages, and other content within the CMS. Users can search by title, content, tags, categories, and metadata.
    • Text search engines like Elasticsearch can be used for such use cases.
    • They also support fuzzy search, allowing users to find approximate matches for a given query.
    • One important point to note: these are search engines, not databases. Search engines do not guarantee the durability of data, so they need to be used alongside a database. The database is always the source of truth, and data can be loaded into the search engine as needed.
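As an illustration of the underlying idea (not of how Elasticsearch works internally), here is a toy inverted index in Python with a fuzzy-match fallback via the standard library's difflib. The product catalog is made up for the example.

```python
from collections import defaultdict
import difflib

# Hypothetical product catalog: doc id -> text.
docs = {
    1: "wireless noise cancelling headphones",
    2: "mechanical keyboard with wireless dongle",
    3: "espresso machine stainless steel",
}

# Inverted index: term -> set of doc ids containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    results = set()
    for term in query.lower().split():
        if term in index:
            results |= index[term]
        else:
            # Fuzzy fallback: look up the closest indexed term instead.
            for close in difflib.get_close_matches(term, index.keys(), n=1):
                results |= index[close]
    return sorted(results)

print(search("wireless"))   # [1, 2]
print(search("hedphones"))  # [1] -- fuzzy match to "headphones"
```

Production search engines layer relevance scoring, tokenization/analysis, and sharding on top of this basic structure, which is why they are a separate system fed from the database rather than a replacement for it.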
We will explore SQL and NoSQL databases in the next part: what are the factors that help us choose between them?
