Matching Storage Model to Data Structure in Mixed Database Environments

By: Brian Wheeler

Not so long ago, data storage was limited to three physical forms: direct-attached storage (DAS), such as the traditional standbys SCSI and SATA disk drives; storage-area networks (SANs), which cluster disks into logical units (LUNs) on servers accessed via a network; and network-attached storage (NAS), which allows concurrent access to the disk clusters via file-system protocols that abstract storage from physical machines to virtual machines.

NAS abstraction was a precursor to the real game-changer in storage technology: virtualization. As Brandon Salmon explains in a January 20, 2015, article on InfoWorld’s New Tech Forum, virtualization abstracts physical storage into virtual disks. A hypervisor creates an emulated hardware environment for each virtual machine: processor, memory, and storage. Just as local disks are perceived as part of the physical computer, virtual disks are part of the virtual machine rather than independent objects. When the VM is deleted, the virtual disk is deleted along with it.

Virtual environments such as VMware vSphere, Microsoft Hyper-V, Red Hat Enterprise Virtualization, and Xen platforms use a virtual-disk model. The I/O from a virtual machine goes to software in the hypervisor rather than to hardware via a device bus. This means the protocol used by the VM to communicate with the hypervisor doesn’t have to match the protocol used by the hypervisor to communicate with the storage. The storage model that is exposed upward to the VM and administrator is separated from the storage protocol used by the hypervisor to store the data.

Being able to mix and match storage models and storage protocols, and to switch protocols dynamically without affecting VMs, provides administrators with an unprecedented level of flexibility. As Salmon points out, the storage protocol is no longer application-specific and functionally dependent on the app; it is now part of the infrastructure and is chosen based on cost and performance.
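The separation between the storage model a VM sees and the protocol the hypervisor uses can be pictured with a small sketch. Everything here is hypothetical and for illustration only: the names StorageBackend, InMemoryBackend, and VirtualDisk are not taken from any hypervisor, and the in-memory dictionary merely stands in for a real protocol such as NFS or iSCSI.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface for the protocol the hypervisor uses to store data."""

    @abstractmethod
    def read_block(self, disk_id, block):
        ...

    @abstractmethod
    def write_block(self, disk_id, block, data):
        ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real protocol (NFS, iSCSI, ...); keeps blocks in a dict."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # maps (disk_id, block number) to bytes

    def read_block(self, disk_id, block):
        return self.blocks.get((disk_id, block), b"")

    def write_block(self, disk_id, block, data):
        self.blocks[(disk_id, block)] = data


class VirtualDisk:
    """What the VM sees: a block device, whatever backend sits underneath."""

    def __init__(self, disk_id, backend):
        self.disk_id = disk_id
        self._backend = backend

    def read(self, block):
        return self._backend.read_block(self.disk_id, block)

    def write(self, block, data):
        self._backend.write_block(self.disk_id, block, data)

    def migrate(self, new_backend, blocks):
        """Copy the listed blocks to a new backend, then switch over to it."""
        for block in blocks:
            data = self._backend.read_block(self.disk_id, block)
            new_backend.write_block(self.disk_id, block, data)
        self._backend = new_backend


# The VM-facing calls (read/write) are identical before and after the switch.
disk = VirtualDisk("vm1-disk0", InMemoryBackend("nfs"))
disk.write(0, b"hello")
disk.migrate(InMemoryBackend("iscsi"), blocks=[0])
```

The point of the sketch is only that the VM-facing interface stays fixed while the backend changes; a real hypervisor handles caching, concurrency, and failure cases that are omitted here.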

Cloud services take storage abstraction to new levels

In the cloud, the entire storage stack is virtualized, so the application is completely separate from the infrastructure. One cloud storage model is instance storage, which is used in the same way as virtual disks and can be implemented as DAS (ephemeral storage) or as more-reliable NAS or volume storage.

Volume storage is a hybrid of instance storage and SAN in which the volume, rather than a complete VM, is the primary unit of storage. This means you can detach a volume from one VM and attach it to another. In terms of scale and abstraction, volume storage is more like a file than a logical unit, and it is considered reliable enough for storing user data. Because volume storage is a model rather than a protocol, it can run atop NFS as well as block protocols such as iSCSI.
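The lifecycle difference between instance storage and volume storage can be shown with a toy model. The VM and Volume classes below are hypothetical and do not correspond to any cloud provider's API; they only illustrate that an instance disk disappears with its VM while a volume can be detached and reattached elsewhere.

```python
class Volume:
    """A unit of storage that exists independently of any one VM."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.attached_to = None


class VM:
    """A virtual machine with an ephemeral instance disk and optional volumes."""

    def __init__(self, name):
        self.name = name
        self.instance_disk = {}  # ephemeral: lost when the VM is deleted
        self.volumes = []

    def attach(self, volume):
        if volume.attached_to is not None:
            raise RuntimeError(f"{volume.name} is already attached")
        volume.attached_to = self
        self.volumes.append(volume)

    def detach(self, volume):
        self.volumes.remove(volume)
        volume.attached_to = None

    def delete(self):
        # Volumes are detached and survive; the instance disk is discarded.
        for volume in list(self.volumes):
            self.detach(volume)
        self.instance_disk = None


vm_a = VM("vm-a")
vol = Volume("user-data")
vm_a.attach(vol)
vm_a.instance_disk["scratch"] = "temporary"
vol.data["report"] = "quarterly numbers"

vm_a.delete()       # the instance disk is gone, the volume is not
vm_b = VM("vm-b")
vm_b.attach(vol)    # the same data is now reachable from a different VM
```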

Object storage, such as Amazon’s S3, offers a single logical name space across an entire region, but its ‘eventual consistency’ means that not all users will get the same answers to their requests at any given time. Many cloud apps are designed to leverage this nearly unlimited name space to realize scale and cost advantages over NAS.
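A toy model helps show what eventual consistency means for readers. The class below is a hypothetical illustration, not a description of how any provider implements its object store: writes land on one replica immediately and reach the others only when sync() runs, so two readers can briefly see different answers.

```python
class EventuallyConsistentStore:
    """Toy store: writes hit replica 0 first and propagate later."""

    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # queued updates as (replica index, key, value)

    def put(self, key, value):
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def get(self, key, replica):
        return self.replicas[replica].get(key)

    def sync(self):
        """Apply all queued updates in order; afterwards the replicas agree."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("logo.png", "version 2")
fresh = store.get("logo.png", replica=0)   # sees the new value
stale = store.get("logo.png", replica=1)   # not yet propagated
store.sync()
settled = store.get("logo.png", replica=1)
```

Applications that tolerate the brief window between put() and sync() can take advantage of the scale such a design allows; those that cannot need storage with strong consistency.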

Typically, object stores are geared for use over high-latency WAN links, which benefit from a simplified collection of data operations. Objects are listed in a bucket, read in their entirety, and have their data replaced with entirely new data. Conversely, NAS lets apps read and write small blocks within a file, change file sizes, and move files between directories, among other management operations.
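The contrast between whole-object operations and in-place file updates can be sketched in a few lines. ToyObjectStore and the two patch functions are hypothetical names made up for this illustration; an in-memory io.BytesIO object stands in for a file on a NAS share.

```python
import io


class ToyObjectStore:
    """Supports only the simplified operations: list, whole read, whole write."""

    def __init__(self):
        self._buckets = {}

    def put(self, bucket, key, data):
        self._buckets.setdefault(bucket, {})[key] = bytes(data)

    def get(self, bucket, key):
        return self._buckets[bucket][key]

    def list(self, bucket):
        return sorted(self._buckets.get(bucket, {}))


def patch_object(store, bucket, key, offset, patch):
    """Changing a few bytes means reading and rewriting the entire object."""
    data = bytearray(store.get(bucket, key))
    data[offset:offset + len(patch)] = patch
    store.put(bucket, key, bytes(data))


def patch_file(handle, offset, patch):
    """A file interface can seek and overwrite a small range in place."""
    handle.seek(offset)
    handle.write(patch)


objects = ToyObjectStore()
objects.put("docs", "greeting.txt", b"hello world")
patch_object(objects, "docs", "greeting.txt", 0, b"J")

nas_file = io.BytesIO(b"hello world")
patch_file(nas_file, 0, b"J")
```

Both paths end with the same bytes, but the object-store path transfers the full object twice, which is why small, frequent updates fit file and block storage better.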

The advantage of object stores is gigantic namespaces that can extend across great distances inexpensively and reliably. Object storage is popular with cloud-native apps for storing images, static web content, backups, and customer files. It’s not a good choice for NAS workloads requiring strong consistency, nor is it a replacement for instance or volume storage, which offer strong consistency and small block updates and are suited to write-intensive, random workloads.

Figure: Types of Cloud Storage. The three most common types of cloud storage (instance, volume, and object) are each suited to a particular storage infrastructure. Source: InfoWorld

Consider storage models when selecting a DBMS

Choice of storage model is generally dictated by the database environment. Development work favors simple storage models that support lightweight prototyping, while the demands of production environments help explain the continuing popularity of relational DBMSs such as Oracle, MySQL, and SQL Server. IT Pro Portal’s John Esposito writes in an April 21, 2016, article that DBMS selection in production environments, where non-developer specialists typically manage data stores, is based on factors other than the optimal combination of data processing, storage, and retrieval mechanisms.

Conversely, non-production environments tend to be more amenable to NoSQL and other non-traditional DBMSs, where databases can be optimized for best structural fit and ease of access. A primary example is MongoDB, which features a document orientation free of static schemas, a document format similar to the popular JSON, and a wide range of connectors. This makes such systems easy to set up in terms of data modeling and well suited to applications that aren’t particularly data-intensive.

A growing trend among developers is polyglot persistence, in which an application uses more than one storage model. Esposito posits that the near parity between applications using one storage model and those using two indicates that developers are looking to match persistence with the data structures requiring persistence.

Graph structures, which store most information in nodes and edges, don’t match well with the tabular structure of RDBMSs, which rely on data residing in columns and rows. Still, it can be worthwhile to store data naturally modeled as a graph in an RDBMS: the relational model is time-tested, it is popular with developers and DBAs, and many powerful object-relational mappers are available to facilitate accessing relational data from application code.
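As a minimal sketch of storing a graph in relational tables, the example below uses Python's built-in sqlite3 module with one table for nodes and one for edges, and a recursive common table expression to walk the graph. The table names, column names, and sample data are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
CREATE TABLE edges (
    src INTEGER NOT NULL REFERENCES nodes(id),
    dst INTEGER NOT NULL REFERENCES nodes(id),
    PRIMARY KEY (src, dst)
);
""")

conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")],
)
# Directed edges, including a cycle: alice -> bob -> carol -> alice.
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 1), (3, 4)],
)


def reachable(conn, start_id):
    """Return labels of all nodes reachable from start_id, excluding itself."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT ?
            UNION            -- UNION drops duplicates, so cycles terminate
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id
        )
        SELECT n.label
        FROM reach JOIN nodes n ON n.id = reach.id
        WHERE n.id != ?
        ORDER BY n.label
        """,
        (start_id, start_id),
    ).fetchall()
    return [label for (label,) in rows]


friends_of_alice = reachable(conn, 1)
```

The traversal works, but each hop is a join, which is the mismatch between graph data and tabular storage described above; a dedicated graph database expresses the same query more directly.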

Hybrid storage: Having your cake and eating it, too?

Despite the continuing enhancements in cloud storage security, performance, and reliability, companies still hesitate to go all-in on the cloud. Topping the list of concerns about cloud storage are the three IT standbys: security, compliance, and latency. Many companies adopt hybrid cloud storage as a way to combine the cost benefits and scalability of the cloud with the reliability and safety of in-house networks.

Figure: Hybrid Cloud Storage. Hybrid cloud storage combines the efficiency and scalability of the public cloud with the security and performance of the private cloud. Source: TechTarget

In a May 22, 2016, article on TechCrunch, Don Basile describes four different storage models intended to address the astonishing increase in the amount of data forecast to flood organizations in coming years. Further complicating future data-storage needs is the variety of data types tomorrow’s information will use.

- Hybrid data storage combines the scalability and cost-effectiveness of the cloud with the safety of storing sensitive data in-house.

- Flash data storage, which is increasingly common in consumer devices, is being enhanced to meet the big-data needs of enterprises, as evident in Pure Storage’s FlashBlade box that can store as much as 16 petabytes of data (the company expects to double that amount in a single box by the end of 2017).

- Intelligent Software Designed Storage (I-SDS) replaces the traditional proprietary hardware stacks with storage infrastructure managed and automated by intelligent software, offering more cost-efficiency and faster response times.

- Cold storage archiving takes advantage of slower-moving, less-expensive commodity disks used to store data that isn’t accessed very often, while ‘hot’ data that is accessed more frequently is stored on faster, more expensive flash drives.
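The hot/cold split in the last item comes down to a placement policy based on how recently data was accessed. The sketch below is one hypothetical way to express such a policy; the StoredItem class, the retier function, and the 30-day threshold are assumptions for illustration and are not drawn from Basile's article or any product.

```python
from dataclasses import dataclass


@dataclass
class StoredItem:
    key: str
    last_access_day: int  # day number of the most recent access
    tier: str = "hot"     # "hot" = flash, "cold" = commodity disk


def retier(items, today, cold_after_days=30):
    """Assign each item to the cold tier once it has been idle long enough."""
    for item in items:
        idle_days = today - item.last_access_day
        item.tier = "cold" if idle_days >= cold_after_days else "hot"
    return items


catalog = [
    StoredItem("orders-current.db", last_access_day=99),
    StoredItem("audit-archive.tar", last_access_day=10),
]
retier(catalog, today=100)
```

Real tiering systems usually weigh access frequency, object size, and retrieval cost as well; a single idle-time threshold is the simplest version of the idea.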