Rishi Yadav

Mar 23, 2022

The Slow and Steady Evolution of New Age Data Stores

Databases and data management systems existed even before computers learned how to count time (epoch time starts on 1st January 1970). It's another matter that computers had to relearn how to count time at the dawn of the new millennium.

Databases became mainstream with the introduction of the relational database management system (RDBMS). In a relational database, the schema was separated from the physical storage. This made the schema a first-class citizen, while the physical storage became the secret sauce of leading database vendors like Oracle and Microsoft.

In previous blogs, we have talked a lot about how the cloud-native evolution has affected multiple aspects of software architecture. Cloud-native has also disrupted how people think about databases, an aspect that is not talked about much. In this blog, we are going to focus on the disruption of databases and data management systems.

It started long before clouds filled the sky

The disruption in databases started in drought-prone California long before cloud technologies from cloud-filled Seattle became mainstream. We discussed in one of the previous blogs how "necessity is the mother of invention" led to the creation of the Google File System and MapReduce. When Google published papers outlining both technologies, they inspired Doug Cutting to create the Hadoop framework.

Hadoop was a simple framework to store unstructured data in the Hadoop Distributed File System (HDFS) and process it using MapReduce. What Hadoop itself achieved was much less important than the innovation trend it started, commonly known as big data. Big data was famous for three traits: volume, velocity, and variety.
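The MapReduce model can be illustrated with the classic word-count job. The sketch below is a single-process Python simulation, not Hadoop's Java API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and there is no HDFS or cluster involved. It only shows the shape of the three steps: map each input split to key/value pairs, group the pairs by key, then reduce each group.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine every value emitted for one key into a single result.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # In Hadoop each split would be mapped on a different node;
    # here everything runs sequentially in one process.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data needs structure"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'ideas': 1, 'needs': 1, 'structure': 1}
```

Because the map and reduce functions only see their own inputs, a real framework can run many copies of them in parallel across machines; that independence is what made the model scale.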

As good as Hadoop was for unstructured data, data needs some structure before it can be analyzed. In addition, high-velocity data needed a way to be stored and retrieved efficiently. This gave rise to HBase and Cassandra, a Hadoop-specific and a Hadoop-adjacent database respectively. Both were wide-column (column-family) stores, as opposed to the row-based stores of traditional databases.
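HBase and Cassandra organize data as wide-column (column-family) tables: a row key maps to column families, and each family holds whatever columns a given row actually has. The toy class below is a hypothetical, in-memory sketch of that data model only; it does not mirror the HBase or Cassandra client APIs, and its sorted row keys are loosely modeled on HBase, which keeps rows ordered by key.

```python
from bisect import bisect_left, insort
from typing import Optional


class WideColumnTable:
    """Toy in-memory wide-column table (hypothetical; not a real client API)."""

    def __init__(self, families: list[str]) -> None:
        # Column families are declared up front; columns inside them are not.
        self.families = set(families)
        # row key -> column family -> column qualifier -> value
        self.rows: dict[str, dict[str, dict[str, str]]] = {}
        # Row keys kept in sorted order so range scans are cheap.
        self.sorted_keys: list[str] = []

    def put(self, row_key: str, family: str, column: str, value: str) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self.rows:
            self.rows[row_key] = {}
            insort(self.sorted_keys, row_key)
        # Only cells that are actually written get stored, so rows can be sparse.
        self.rows[row_key].setdefault(family, {})[column] = value

    def get(self, row_key: str, family: str, column: str) -> Optional[str]:
        return self.rows.get(row_key, {}).get(family, {}).get(column)

    def scan(self, start: str, stop: str) -> list[str]:
        # Row keys in the half-open range [start, stop).
        lo = bisect_left(self.sorted_keys, start)
        hi = bisect_left(self.sorted_keys, stop)
        return self.sorted_keys[lo:hi]


if __name__ == "__main__":
    table = WideColumnTable(["profile", "events"])
    table.put("user#001", "profile", "name", "Ada")
    table.put("user#001", "events", "last_login", "2022-03-23")
    table.put("user#002", "profile", "name", "Grace")
    # user#002 has no "events" cells at all; nothing is stored for them.
    print(table.get("user#002", "events", "last_login"))  # None
    print(table.scan("user#000", "user#002"))  # ['user#001']
```

Unlike a fixed relational schema, two rows in the same table can carry completely different columns, which suits loosely structured, fast-arriving data.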

Clouds in the sky

When public clouds felt the need to create cloud-native databases, they had a lot of inspiration to draw from big data-oriented databases. It started with AWS releasing DynamoDB, a key-value store, in 2012. It was one of the first database services billed primarily on provisioned throughput rather than on storage. As a cloud data store, it treated elastic scaling as a core feature.
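To make throughput-based billing concrete: in DynamoDB's provisioned-capacity mode, you reserve read and write capacity units. As AWS documents it, one read capacity unit (RCU) covers one strongly consistent read per second, or two eventually consistent reads per second, of an item up to 4 KB, and one write capacity unit (WCU) covers one write per second of an item up to 1 KB. The hypothetical helpers below sketch that arithmetic only; they ignore transactional requests, on-demand mode, and actual prices, and the rules should be checked against current AWS documentation.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Provisioned RCUs for a steady read rate (illustrative sketch only)."""
    if item_size_kb <= 0 or reads_per_second < 0:
        raise ValueError("item size must be positive and rate non-negative")
    # One RCU covers one strongly consistent read/sec of up to 4 KB,
    # so item sizes are rounded up to the next 4 KB block.
    units = math.ceil(item_size_kb / 4) * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half; round up to whole units.
        units = math.ceil(units / 2)
    return units


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """Provisioned WCUs for a steady write rate (illustrative sketch only)."""
    if item_size_kb <= 0 or writes_per_second < 0:
        raise ValueError("item size must be positive and rate non-negative")
    # One WCU covers one write/sec of up to 1 KB, rounded up per 1 KB block.
    return math.ceil(item_size_kb) * writes_per_second


if __name__ == "__main__":
    print(read_capacity_units(6, 100))                             # 200
    print(read_capacity_units(6, 100, strongly_consistent=False))  # 100
    print(write_capacity_units(2.5, 10))                           # 30
```

The point of the sketch is that the bill is driven by how fast data is read and written, not just by how much of it is stored.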

Summary

The disruption of the legacy database industry is happening in multiple waves. Big data enabled data to reside in a distributed architecture and slowly took on lower-hanging fruit like OLAP. Cloud-native databases took it to the next stage by offering these data stores as a service. The last frontier left is query engines, which have been evolving steadily in new-age data stores for at least a decade but still have a long way to go before they threaten the reign of legacy players (who are not sitting idle either).
