Three Approaches to Creating a SQL-Join Equivalent in MongoDB

By: Morpheus Data


Integrating MongoDB document data with SQL and other table-centric data sources needn’t be so processor-intensive.

TL;DR: While there’s no such operation as a SQL-style table join in MongoDB, you can achieve the same effect without relying on table schema. Here are three techniques for combining data stored in MongoDB document collections with minimal query-processing horsepower required.

The signature relational-database operation is the table join: combine data from table 1 with data from table 2 to create table 3. The schema-less document-container structure of MongoDB and other non-relational databases makes such table joins impossible.

Instead, as the MongoDB Manual explains, MongoDB either denormalizes the data by storing related items in a single document, or it relates that data in separate documents. One way to relate documents is via manual references: the _id field of one document is saved in the other document as a reference. The application simply runs a second query to return the related data.

When you need to link multiple documents in multiple collections, DBRefs let you relate documents using the value of one document’s _id field, collection name, and, optionally, its database name. The application resolves DBRefs by running additional queries to return the referenced documents.

tutorial in the MongoDB Manual demonstrates use of denormalization in a social-media application. The manual also provides a SQL-to-aggregation mapping chart.

Simple function for ‘joining’ data within a single MongoDB collection

An alternative approach to relating data in a MongoDB collection is via a function you run in the MongoDB client console. The process is explained in a Stack Overflow post from March 2014.

For example, in a library database, you first create fields for “authors”, “categories”, “books”, and “lending”.

The fields to be “joined” in the MongoDB database are “authors”, “categories”, “books”, and “lending”. Source: Stack Overflow

Then you apply the function.

Run MongoDB’s find() method to retrieve related documents in a collection. Source: Stack Overflow

The result is the rough equivalent of a join operation on SQL tables.

After running the find() method the documents in the collection related as specified are returned. Source: Stack Overflow

Ensuring MongoDB apps integrate with your organization’s other data

Lack of a one-to-one join equivalent is only one of the many ways MongoDB differs from SQL databases. In a July 17, 2013, post, Julian Hyde, lead developer of the Mondrian open-source OLAP engine, explains how he built a MongoDB-to-SQL interface using the Optiq dynamic data management framework.

Optiq features a SQL parser and a query optimizer powered by rewrite rules. Hyde created rules to map SQL tables onto MongoDB collections, and to map relational operations onto MongoDB’s find and aggregate operators. The result is the equivalent of a JDBC driver for MongoDB based on a hybrid query-processing engine intended to shift as much query processing as possible to MongoDB. Joins and other operations are handled by the client.

The process allows you to convert each MongoDB collection to a table. The COLUMNS and TABLES system tables are supplied by Optiq, and the ZIPS view is defined in mongo-zips-model.json.

The Optiq framework allows a MongoDB collection to be converted to a SQL-style table. Source: Julian Hyde

Simple management of SQL, NoSQL, and in-memory databases is a key feature of the new Morpheus Data platform as a service (PaaS) solution. With Morpheus you can provision, deploy, and monitor heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases from a single point-and-click console. Morpheus lets you work with all your databases across public, private, and hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over.

In addition, the service allows you to migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to sign up for a demo now!