NoSQL Will Protect You From The Onslaught of Data Overload

By: Morpheus Data

TL;DR: As the amount of unstructured data being collected by organizations skyrockets, their existing databases come up short: they’re too slow, too inflexible, and too expensive. What’s needed is a DBMS that isn’t constricted by the relational schema, and one that accommodates object-oriented data structures without the complexity and latency of object-relational mapping frameworks. NoSQL (a.k.a. Not Only SQL) provides the flexibility, scalability, and availability required to manage the deluge of unstructured data, albeit with some shortcomings of its own.

Data isn’t what it used to be. Gone (or going) is the well-structured model of data stored neatly in tables and rows queried via the data’s established relations. Along comes Google, Facebook, Twitter, Amazon, and untold other sources of unstructured data that simply doesn’t fit comfortably in a conventional relational database.

That isn’t to say RDBMSs are an endangered species. An August 24, 2014, article on TheServerSide.com points out that enterprises continue to prefer SQL databases, primarily for their reliability through compliance with the atomicity, consistency, isolation, and durability (ACID) model. Also, there are plenty of DBAs with relational SQL experience, but far fewer with NoSQL skills.

Still, RDBMSs don’t accommodate unstructured data easily — at least not in their current form. The future is clearly one in which the bulk of data in organizations is unstructured. As far back as June 2011 an IDC study (pdf) predicted that 90 percent of the data generated worldwide in the next decade would be unstructured. How much data are we talking? How about 8000 exabytes in 2015, which is the equivalent of 8 trillion gigabytes.

Growth in Data

The tremendous growth in the amount of data in the world — most of which is unstructured — requires a non-relational approach to management. Credit: IDC

As the IDC report points out, enterprises can no longer afford to “consume IT” as part of their internal infrastructure, but rather as an external service. This is particularly true as cloud services such as the Morpheus database as a service (DBaaS) incorporate the security and reliability required for companies to ensure the safety of their data and regulatory compliance.

By supporting both MongoDB and MySQL, Morpheus offers organizations the flexibility to transition existing databases and their new data stores to the cloud. They can use a single console to monitor their queries in real time to find and remove performance bottlenecks. Connections are secured via VPN, and all data is backed up, replicated, and archived automatically. Morpheus’s SSD-backed infrastructure ensures fast connections to data stores, and direct links to EC2 provide ultra-low latency. Visit the Morpheus site for pricing information or to sign up for a free account.

Addressing SQL’s scalability problem head-on

A primary shortcoming of SQL is that as the number of transactions being processed goes up, performance goes down. The traditional solution is to add more RDBMS servers, but doing so is expensive, not to mention a management nightmare as optimization and troubleshooting become ever more complex.

With NoSQL, your database scales horizontally rather than vertically. The resulting distributed databases host data on thousands of servers that can be added or deleted without affecting performance. Of course, reality is rarely this simple. In a November 20, 2013, article on InformationWeek, Joe Masters explains that high availability is simple to achieve on read-only distributed systems. Writing to those systems is much trickier.

As stated in the CAP theorem (or Brewer theorem, named after Eric Brewer), you can have strict availability, or strict consistency, but not both. NoSQL databases lean toward the availability side, at the expense of consistency. However, distributed databases are getting better at handling timeouts, although there’s no way to do so without affecting the database’s performance.

Another NoSQL advantage is that it doesn’t lock you into a rigid schema the way SQL does. As Jnan Dash explains in a September 18, 2013, article on ZDNet, revisions to the data model can cause performance problems, but rarely do designers know all the facts about the data model before it goes into production. The need for a dynamic data model plays into NoSQL’s strength of accommodating changes in markets, changes in the organization, and even changes in technology.

The benefits of NoSQL’s data-model flexibility

NoSQL data models are grouped into four general categories: key-value (K-V) stores, document stores, column-oriented stores, and graph databases. Ben Scofield has rated these NoSQL database categories in comparison with relational databases. (Note that there is considerable variation between NoSQL implementations.)

 

The four general NoSQL categories are rated by Ben Scofield in terms of performance, scalability, flexibility, complexity, and functionality. Credit: Wikipedia

The fundamental data model of K-V stores is the associative array, which is also referred to as a map or directory. Each possible key of a key-value pair appears in a collection no more than once. As one of the simplest non-trivial data models, K-V stores are often extended to more-powerful ordered models that maintain keys in lexicographic order, among other purposes.

The documents that comprise the document store encapsulate or encode data in standard formats, typically XML, YAML, or JSON (JavaScript Object Notation), but also binary BSON, PDF, and MS-Office formats. Documents in collections are somewhat analogous to records in tables, although the documents in a collection won’t necessarily share all fields the way records in a table do.

A NoSQL column is a key-value pair comprised of a unique name, value, and timestamp. The timestamp is used to distinguish valid content from stale content and thus addresses the consistency shortcomings of NoSQL. Columns in distributed databases don’t need the uniformity of columns in relational databases because NoSQL “rows” aren’t tied to “tables,” which exist only conceptually in NoSQL.

Graph databases use nodes, edges, and properties to represent and store data without need of an index. Instead, each database element has a pointer to adjacent elements. Nodes can represent people, businesses, accounts, or other trackable items. Properties are data elements that pertain to the nodes, such as “age” for a person. Edges connect nodes to other nodes and to properties; they represent the relationships between the elements. Most of the analysis is done via the edges.

Once you’ve separated the NoSQL hype from the reality, it becomes clear that there’s plenty of room in the database environments of the future for NoSQL and SQL alike. Oracle, Microsoft, and other leading SQL providers have already added NoSQL extensions to their products, as InfoWorld’s Eric Knorr explains in an August 25, 2014, article. And with DBaaS services such as Morpheus, you get the best of both worlds: MongoDB for your NoSQL needs, and MySQL for your RDBMs. It’s always nice to have options!