
The Key to Distributed Database Performance: Scalability

By: Morpheus Data

Oct 2014

TL;DR: The realities of modern corporate networks make the move to distributed database architectures inevitable. How do you leverage the stability and security of traditional relational database designs while making the transition to distributed environments? One key consideration is to ensure your cloud databases are scalable enough to deliver the technology’s cost and performance benefits.

Your conventional relational DBMS works without a hitch (mostly), yet you’re pressured to convert it to a distributed database that scales horizontally in the cloud. Why? Your customers and users not only expect new capabilities, they need them to do their jobs. Topping the list of requirements is scalability.

David Maitland points out in an October 7, 2014, article on Bobsguide.com that startups in particular have to be prepared to see the demands on their databases expand from hundreds of requests per day to millions, and back again, in a very short time. Non-relational databases are designed to scale out and back in quickly as traffic patterns fluctuate. The key is managing the transition to scalable architectures.

Availability defines a distributed database

A truly distributed database is more than an RDBMS with one master and multiple slave nodes. One with multiple masters, or write nodes, definitely qualifies as distributed because it’s all about availability: if one master fails, the system automatically fails over to the next and the write is still recorded. InformationWeek’s Joe Masters Emison explains the distinction in a November 20, 2013, article.
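The failover behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database implements it: the `WriteNode` class, the `write_with_failover` function, and the node names are all hypothetical, and the sketch leaves out the replication between write nodes that a real multi-master system would need.

```python
class WriteNode:
    """A hypothetical write node (master) that can be marked as failed."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.records = {}

    def write(self, key, value):
        # A failed node refuses the write, much as a dropped connection would.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.records[key] = value


def write_with_failover(nodes, key, value):
    """Try each write node in order; return the name of the node that accepted."""
    for node in nodes:
        try:
            node.write(key, value)
            return node.name
        except ConnectionError:
            continue  # this master failed, so move on to the next one
    raise RuntimeError("no write node available")


nodes = [WriteNode("master-a"), WriteNode("master-b")]
nodes[0].alive = False  # simulate the first master failing
accepted_by = write_with_failover(nodes, "order:1", {"total": 42})
```

With the first node down, the write lands on the second node instead of being lost. With a single master, the same failure would leave nowhere for the write to go.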

[Figure: The Evolving Web Paradigm. The evolution of database technology points to a “federated” database that is document and graph based, as well as globally queryable. Source: JeffSayre.com]

The CAP theorem states that when a network partition separates the nodes of a distributed system, you can have strict availability or strict consistency, but not both. The conflict is a common one: two nodes that cannot reach each other are instructed to write different information to the same record at the same time. The system can either refuse the writes (no availability) or accept both and end up with two different versions of the record (no consistency). In the real world, most systems fall between these two extremes: business processes tend to favor high availability first and deal with inconsistencies later.
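The trade-off can be made concrete with a toy model. The sketch below assumes two replicas that cannot reach each other and a `favor` setting that chooses between the two extremes; the `Replica` class, the `write` and `find_conflicts` functions, and the sample keys are hypothetical and exist only to illustrate the idea.

```python
class Replica:
    """A hypothetical replica holding a simple key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def write(replica, key, value, partitioned, favor):
    """Apply a write under the chosen policy; return True if it was accepted.

    favor="consistency": refuse writes while replicas cannot coordinate.
    favor="availability": accept the write locally and reconcile later.
    """
    if partitioned and favor == "consistency":
        return False
    replica.data[key] = value
    return True


def find_conflicts(a, b):
    """Return the keys whose values differ between two replicas."""
    shared = a.data.keys() & b.data.keys()
    return {k: (a.data[k], b.data[k]) for k in shared if a.data[k] != b.data[k]}


# Favor availability: both writes succeed, but the replicas now disagree.
east, west = Replica("east"), Replica("west")
ap_results = [
    write(east, "cart:7", ["book"], partitioned=True, favor="availability"),
    write(west, "cart:7", ["lamp"], partitioned=True, favor="availability"),
]
ap_conflicts = find_conflicts(east, west)

# Favor consistency: both writes are refused, so nothing diverges.
east2, west2 = Replica("east"), Replica("west")
cp_results = [
    write(east2, "cart:7", ["book"], partitioned=True, favor="consistency"),
    write(west2, "cart:7", ["lamp"], partitioned=True, favor="consistency"),
]
cp_conflicts = find_conflicts(east2, west2)
```

In the availability-first run, every write is accepted and `find_conflicts` reports the record that must be reconciled afterward. In the consistency-first run, nothing conflicts because nothing was written.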

Kyle Kingsbury’s Call Me Maybe project measured the ability of distributed databases, including several NoSQL systems, to handle network partitions in real-world conflict situations. InformationWeek’s Joe Masters Emison describes the project in a September 5, 2013, article. The upshot is that distributed databases fail, as all databases sometimes do, but they do so less cleanly than single-node databases, so tracking and correcting the resulting data loss requires asking a new set of questions.

The Morpheus database-as-a-service (DBaaS) delivers the flexibility modern databases need while ensuring the performance and security IT managers expect. Morpheus provides the reliability of 100% bare-metal SSD hosting on a high-availability network with ultra-low latency to major peering points and cloud hosts. You can optimize queries in real time and analyze key database metrics.

Morpheus supports heterogeneous ElasticSearch, MongoDB, MySQL, and Redis databases. Visit the Morpheus site for pricing information or to sign up for a free trial account.

Securing distributed databases is also more complex, and not just because the data resides in multiple physical and virtual locations. As with most new technologies, the initial emphasis is on features rather than safety. As a result, once the databases are used in production settings, unforeseen security concerns tend to be addressed only as they arise. (The upside of this equation is that because the databases are more obscure, they present a smaller profile to the bad guys.)

The advent of the self-aware app

Databases are now designed to monitor their connections, available bandwidth, and other environmental factors. When demand surges, such as during the holiday shopping season, the database automatically brings more cloud servers online to handle the increased load, and takes them offline again when demand returns to normal.
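The scaling decision itself can be as simple as dividing observed demand by the capacity of one server. The following is a rough sketch under stated assumptions: the `desired_servers` function, the figure of 500 requests per second per server, and the minimum and maximum pool sizes are hypothetical values chosen for illustration, not numbers from any particular service.

```python
import math


def desired_servers(requests_per_second, capacity_per_server=500,
                    min_servers=2, max_servers=20):
    """Return how many servers should be online for the observed load.

    All defaults are illustrative assumptions, not measured values.
    """
    needed = math.ceil(requests_per_second / capacity_per_server)
    # Never drop below the floor or grow past the ceiling.
    return max(min_servers, min(max_servers, needed))


quiet = desired_servers(100)      # normal traffic
busy = desired_servers(4200)      # holiday surge
extreme = desired_servers(50000)  # demand beyond the allowed pool size
```

A production system would typically smooth the demand signal and wait between scaling actions so that a brief spike does not cause servers to be added and removed repeatedly.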

This on-demand flexibility relies on the cloud service’s APIs, whether they use proprietary API calls or open-source technology such as OpenStack. Today’s container-based architectures, such as Docker, encapsulate all resources required to run the app, including frameworks and libraries.