LUC #16: Top 5 Database Scaling Solutions for Optimal Performance
A detailed overview of the most commonly used database scaling techniques
Hello, and welcome back! Let’s dive into another edition of Level Up Coding’s newsletter.
In today’s issue:
The top five database scaling techniques
Monolithic vs microservices (recap)
How CORS works (recap)
SSL vs TLS (recap)
Read time: 7 minutes
A big thank you to our partner Postman, who keeps this newsletter free to the reader.
Wish you could test your API without leaving the code editor? Postman’s new VS Code extension lets you do exactly that! Check it out!
Exploring the Top 5 Database Scaling Techniques
Most applications are not initially designed to handle high volumes of traffic, which is why systems experience significant bottlenecks as their user base grows.
The most common culprit for bottlenecks is the database.
When it comes to scaling a system, solutions that focus on the database tend to be the lowest-hanging fruit.
Before we dive into the best options to scale a database, it’s important to note that a system shouldn’t be optimized prematurely. This is because scaling solutions tend to introduce the following complexities:
The system becomes more complex with more pieces and variables involved.
Adding new features takes longer as there are more components to consider.
Code can be more difficult to test with more edge cases to accommodate.
Finding and resolving bugs becomes more complicated.
Once you’re certain that these trade-offs are necessary to deliver a reasonably-performing system, the following five database scaling techniques are worth considering.
Cache Database Queries
Caching database queries is one of the simplest improvements you can make to handle database load. Usually, an application will have a handful of queries that make up the majority of the requests made.
Rather than making a round trip over the network for that data on every request, it can simply be cached in memory on the web server.
The first request fetches the data from the database and caches the result on the server; subsequent requests read straight from the cache. This improves performance because the data spends less time traveling over the network and sits closer to the client.
Database Indexing
Database indexing is a technique that improves the speed of data retrieval operations on a database table. Indexes are used to quickly locate data without having to search every row in a table every time the table is accessed.
Depending on the number of rows in a table, an index can shave significant time off queries that filter on the indexed column.
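To see the effect concretely, here is a small SQLite sketch (table and column names are hypothetical). EXPLAIN QUERY PLAN shows that, with the index in place, the query searches the index instead of scanning every row.

```python
import sqlite3

# Create a throwaway table with 1,000 rows (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

# Index the column we frequently filter on.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# The query plan confirms lookups on `email` now use the index
# rather than a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?",
    ("user500@example.com",),
).fetchone()
```

The trade-off to keep in mind: each index speeds up reads on its column but adds a small cost to every write, since the index must be kept up to date.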
Move Session Data
A lot of applications save session IDs in cookies, with the actual session data in a database table. This puts a lot of load on the database. One alternative is to shift session data to in-memory caches like Redis or Memcached. While this boosts access speed, there's a risk of data loss if the cache goes down. Alternatively, using JWTs lets you store session details directly in the cookie, reducing server-side session dependency, but it comes with its own set of issues.
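The JWT approach can be sketched with nothing but the standard library: sign the session data with an HMAC so the server can verify the cookie without a database lookup. This is a simplified illustration, not a full JWT implementation; real applications should use a maintained library such as PyJWT, and the secret below is a hypothetical placeholder.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret; in production, load this from configuration.
SECRET = b"replace-with-a-real-secret"

def sign_session(data: dict) -> str:
    """Encode session data and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"

def verify_session(token: str):
    """Return the session data if the signature checks out, else None."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or forged token
    return json.loads(base64.urlsafe_b64decode(payload))

token = sign_session({"user_id": 123})
session = verify_session(token)
```

Since the data lives in the cookie itself, no database or cache read is needed per request; the flip side, as noted above, is that revoking a session before it expires becomes harder.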
Database Read Replication
If your database is still under too much load from reads even after caching common queries, creating efficient indexes, and handling session storage, replication may be the next best solution.
With read replication, a primary database handles writes while several replicas, on separate machines, serve reads. This setup distributes the read load, frees the primary to focus on writes, and can cut read latency by placing replicas in regions closer to users. However, there can be data inconsistencies, since written data takes time to propagate to the replicas. For reads that must reflect the latest write, such as rendering a profile immediately after it is updated, you can read from the primary instead. While read replication offers strong scaling advantages, it adds complexity, so consider it after trying the simpler solutions above.
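The routing logic at the heart of this setup is simple: writes go to the primary, reads round-robin across the replicas. In the sketch below, three in-memory SQLite databases stand in for the primary and its replicas, and a backup call simulates replication; real deployments replicate over the network with a lag, which is exactly where the inconsistency window comes from.

```python
import itertools
import sqlite3

# One "primary" and two "replicas", all in-memory for illustration.
primary = sqlite3.connect(":memory:")
replicas = [sqlite3.connect(":memory:") for _ in range(2)]
_replica_cycle = itertools.cycle(replicas)

def execute_write(sql, params=()):
    """All writes go to the primary."""
    primary.execute(sql, params)
    primary.commit()
    for r in replicas:           # simulate replication to each replica
        primary.backup(r)

def execute_read(sql, params=()):
    """Reads are distributed across replicas round-robin."""
    replica = next(_replica_cycle)
    return replica.execute(sql, params).fetchall()

execute_write("CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT)")
execute_write("INSERT INTO profiles (name) VALUES (?)", ("Ada",))
rows = execute_read("SELECT name FROM profiles")
```

In practice this routing usually lives in a connection pooler or proxy rather than application code, but the principle is the same.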
Database Sharding
Most scaling methods focus on managing database reads. Database sharding tackles both reads and writes by partitioning the main database into smaller, faster, more manageable 'shards'.
There are two sharding techniques: horizontal and vertical. Horizontal sharding distributes a table's rows across machines, with each shard holding the same columns but a unique subset of rows. Vertical sharding splits a table by column, so each machine holds the same rows but a different subset of columns.
Sharding improves query speeds and boosts the system’s resilience against failures. In sharded setups, an outage typically affects only one shard, not the whole system.
While sharding offers significant benefits, it adds a lot of complexity which means high setup and maintenance costs. Sharding should be considered after other scaling solutions have been exhausted.
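The core of horizontal sharding is routing each key deterministically to one shard. In this sketch, each "shard" is a plain dict standing in for a database on a separate machine; the shard count and key choice are hypothetical.

```python
import hashlib

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]  # stand-ins for separate databases

def shard_for(user_id: str) -> dict:
    """Deterministically map a key to a shard via a stable hash."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % NUM_SHARDS]

def put(user_id: str, record: dict):
    shard_for(user_id)[user_id] = record

def get(user_id: str):
    return shard_for(user_id).get(user_id)

put("user-42", {"name": "Ada"})
record = get("user-42")
```

One reason sharding is costly to maintain shows up right here: naive modulo hashing reshuffles most keys whenever the shard count changes, which is why production systems typically use consistent hashing or a lookup service instead.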
Data management systems are continually evolving and growing, opening up other techniques to scale a database. Despite the numerous alternatives, the five described above will continue to be strong techniques that are applicable to most use cases now and in the future.
Monolithic vs Microservices (recap)
A monolithic architecture is a software design pattern in which all application components are combined into a single, tightly coupled, unified application.
In a microservices design, by contrast, the components of an application are structured as a collection of loosely coupled, independently deployable services, each corresponding to a specific business functionality.
Microservices have been very popular as of late, but that doesn’t mean every application should be moved into a microservices architecture. Whilst there are significant benefits, there are also significant drawbacks. Which is best depends on the requirements of the system, the context of the team, and the business’s goals.
How CORS works (recap)
Web browsers use Cross-Origin Resource Sharing (CORS) to manage requests made to a different domain than the one serving the web page. It's a security mechanism to mitigate the risks of cross-site attacks. Below is a simple breakdown of the workflow:
For non-simple requests (for example, a PUT, or a request with custom headers), the browser first sends a preflight OPTIONS request.
The server responds to the preflight request with the appropriate CORS headers.
The browser then sends the actual request with any necessary credentials, data, or headers.
The server processes the request and sends back the response with the appropriate CORS headers.
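The server's side of this workflow can be sketched as a small function that decides which CORS headers to attach. The allowed origins and the framework-free request shape here are hypothetical; real servers usually delegate this to middleware.

```python
# Hypothetical CORS policy for illustration.
ALLOWED_ORIGINS = {"https://app.example.com"}
ALLOWED_METHODS = "GET, POST, PUT, DELETE"
ALLOWED_HEADERS = "Content-Type, Authorization"

def cors_headers(origin: str, method: str) -> dict:
    """Return the CORS response headers, or {} if the origin is not allowed."""
    if origin not in ALLOWED_ORIGINS:
        return {}  # the browser will block the cross-origin response
    headers = {"Access-Control-Allow-Origin": origin}
    if method == "OPTIONS":  # preflight request
        headers["Access-Control-Allow-Methods"] = ALLOWED_METHODS
        headers["Access-Control-Allow-Headers"] = ALLOWED_HEADERS
        headers["Access-Control-Max-Age"] = "86400"  # cache the preflight
    return headers

preflight = cors_headers("https://app.example.com", "OPTIONS")
actual = cors_headers("https://app.example.com", "GET")
blocked = cors_headers("https://evil.example.com", "GET")
```

Note that CORS is enforced by the browser, not the server: the server merely declares its policy via these headers, and the browser decides whether to expose the response to the page.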
SSL vs TLS (recap)
🔸 SSL (Secure Sockets Layer) and TLS (Transport Layer Security) are cryptographic protocols designed to provide secure communication over a network.
🔸 TLS is the successor to SSL and has addressed the vulnerabilities present in SSL.
🔸 The process of how data is sent using these protocols involves roughly 8 steps. The most significant differences between these two protocols occur during the key exchange (step 4) and encrypted data transfer (step 7).
🔸 SSL predominantly uses RSA for key exchange, whereas TLS uses other mechanisms like DH (Diffie-Hellman) and ECDH (Elliptic Curve Diffie-Hellman).
🔸 SSL uses older encryption algorithms, some of which are vulnerable. TLS uses updated and more robust encryption algorithms that have overcome the vulnerabilities in SSL.
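In application code, the practical takeaway is to rule out SSL and early TLS entirely. A minimal sketch with Python's ssl module: create_default_context() already prefers strong settings, and pinning the minimum version to TLS 1.2 excludes the older, vulnerable protocols.

```python
import ssl

# Build a client-side context with sensible defaults
# (certificate validation and hostname checking stay on).
context = ssl.create_default_context()

# Explicitly refuse SSL 3.0, TLS 1.0, and TLS 1.1.
context.minimum_version = ssl.TLSVersion.TLSv1_2

verify = context.verify_mode == ssl.CERT_REQUIRED
```

This context can then be passed wherever a TLS connection is made, for example to http.client.HTTPSConnection or ssl-wrapped sockets.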
That wraps up this week’s issue of Level Up Coding’s newsletter!
Join us again next week, where we’ll explore HTTP status codes, the SQL COUNT function, and important Git commands.