ETL: Teaching an Old Data Cleanup Tool New Tricks

By: Morpheus Data

The role of the traditional data-warehouse extract, transform, load function has broadened to become the foundation for a new breed of graphical business-intelligence tools. It has also spawned a market for third-party ETL tools that support a range of data types, sources, and systems.

The data-warehousing concept of extract, transform, load (ETL) almost seems quaint in the burgeoning era of unstructured data stored in petabyte-sized containers. Some analysts have gone so far as to declare ETL all but dead. In fact, technologies as useful and pervasive as ETL rarely disappear — they just find new roles to play.

After all, ETL is intended to make data easier to access and analyze. This function becomes even more important as data stores grow and analyses become more complex. In addition, the people analyzing the data are now more likely to be business managers using point-and-click dashboards than statisticians using sophisticated modeling tools. The IT chestnut “garbage in, garbage out” has never been more relevant.

In a February 2, 2015, post on the Smart Data Collective, Mario Barajas asserts that the best place to ensure the quality of data is at the source: the data input layer. ETL is used at the post-input stage to aggregate data in report-ready form. The technology becomes the lingua franca that “preps” diverse data types for analysis, rather than a data validator.
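To make the "prep" role concrete, here is a minimal sketch of the three ETL stages in Python. The sample data, column names, and function names are all hypothetical and chosen only for illustration; real ETL tools handle far more sources and formats. The sketch parses raw records, normalizes inconsistent values, and aggregates them into a report-ready total per region.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input with inconsistent casing and spacing.
RAW_CSV = """region,amount
east, 100.50
West,200
EAST,50
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize region names and convert amounts to floats."""
    return [
        {"region": row["region"].strip().lower(), "amount": float(row["amount"])}
        for row in rows
    ]

def load(rows):
    """Load: aggregate the cleaned rows into a total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

report = load(transform(extract(RAW_CSV)))
print(report)
```

Note that the transform step only reconciles formatting. It does not check whether the amounts are correct, which is consistent with the point that data quality is best enforced at the input layer.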

Gartner analyst Rita Sallam refers to this second generation of ETL as “smart data discovery,” which she expects will deliver sophisticated business-intelligence capabilities to the 70 percent of people in organizations who currently lack access to such data-analysis tools. Sallam is quoted in a January 28, 2015, article on FirstPost.

Big-data analysis without the coding

Off-the-shelf ETL products were almost unheard of when data warehouses first arrived. The function was either built in by the warehouse vendor or hand-coded by the customer. Two trends have converged to create a market for third-party ETL tools: the need to accommodate unstructured data (think Twitter streams and video feeds) and the need to integrate multiple platforms (primarily mobile and other external apps).

ETL has morphed from a specialty function either built into a data-warehouse system or coded by customers, to a product category that extends far beyond any single data store. Source: Data-Informed

Representing this new era of ETL are products such as ClearStory, Paxata, Tamr, and Trifacta. As Gigaom’s Barb Darrow explains in a February 4, 2015, article, the tools are intended to facilitate data sharing and integration with a company’s partners. The key is to be able to do so at the speed of modern business. This is where next-gen ETL differs from its slow, deliberate data-warehouse counterpart.

Running at the speed of business is one of the primary benefits of the new Morpheus Virtual Appliance. The Morpheus database-as-a-service (DBaaS) lets you provision and manage SQL, NoSQL, and in-memory databases across hybrid clouds via a simple point-and-click interface. The service supports heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases.

A full replica set is provisioned with each database instance for fault tolerance and failover, and Morpheus ensures that the replica and master are synced in near real time. The integrated Morpheus dashboard lets you monitor reads/writes, queries per second, IOPS, and other stats across all your SQL, NoSQL, and in-memory databases. Visit the Morpheus site for pricing information and to create a free account.