The Importance of Schema Design in ‘Schema-less’ MongoDB

By: Morpheus Data

TL;DR: A common misconception about the document-based MongoDB NoSQL database is that it requires no schema at all. In fact, the first step in designing a MongoDB database is selecting a schema that matches the database users’ needs. Choosing the right schema allows you to take full advantage of the system’s performance, efficiency, and scalability benefits.

Most of the people designing MongoDB databases come from a relational-database background. Transferring from a world of tables, joins, and normalization to the document-based approach of MongoDB and other NoSQL databases can be liberating and daunting at the same time.

MongoDB is designed for speed: You can embed all sorts of data types and structures in collections of documents that are easy to query. To realize the performance potential of MongoDB, you have to design collections to match the app’s most common access patterns.

In an August 2013 post on Percona’s MySQL Performance Blog, Stephane Combaudon uses the example of a simple passport database. In MySQL, you would typically create a “people” table with “id” and “name” columns, and a “passport” table with “id”, “people_id”, “country”, and “valid_until” columns. Then you would use joins between the tables to run queries.

A basic passport database in MySQL might use joins between two separate tables to query the database. Source: MySQL Performance Blog
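
As a rough illustration of the relational layout described above, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL. The table and column names follow the text. The rows (other than the name "Cinderella"), the country codes, the dates, and the passport_country helper are made up for this example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; all rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE passport (
        id INTEGER PRIMARY KEY,
        people_id INTEGER REFERENCES people(id),
        country TEXT,
        valid_until TEXT
    );
    INSERT INTO people VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Cinderella');
    INSERT INTO passport VALUES
        (10, 1, 'FR', '2020-01-01'),
        (11, 2, 'US', '2021-06-30');
""")


def passport_country(name):
    """Return the passport country for a person, or None if no passport row matches."""
    row = conn.execute(
        "SELECT passport.country FROM people "
        "JOIN passport ON passport.people_id = people.id "
        "WHERE people.name = ?",
        (name,),
    ).fetchone()
    return row[0] if row else None


print(passport_country("Alice"))       # FR
print(passport_country("Cinderella"))  # None
```

The inner join simply returns no row for a person without a passport, which is why the relational version handles that case without special treatment.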

In contrast, a MongoDB database for the same purpose could use a single collection to store all the passport information, but this makes it difficult to determine which attributes are associated with which objects.

The same passport database in MongoDB could place all data elements in a single collection. Source: MySQL Performance Blog
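
To see why the single-collection layout gets awkward, consider this small sketch, which models the collection as a plain Python list of dictionaries rather than a live MongoDB collection. The field names come from the text; the values and the looks_like_passport helper are assumptions made for illustration.

```python
# One "collection" holding both kinds of documents; values are illustrative.
everything = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Cinderella"},
    {"_id": 3, "people_id": 1, "country": "FR", "valid_until": "2020-01-01"},
]


def looks_like_passport(doc):
    """Guess the document kind from its keys, since nothing else marks it."""
    return "country" in doc and "people_id" in doc


# Every reader of the collection has to repeat this kind of guesswork.
passports = [doc for doc in everything if looks_like_passport(doc)]
people = [doc for doc in everything if not looks_like_passport(doc)]

print(len(people), len(passports))
```

Nothing in the data itself says which documents are people and which are passports, so the distinction ends up encoded in application logic.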

Alternatively, you could embed the passport information inside the people information, or vice-versa, although this could be a problem if some people don’t have passports, such as “Cinderella” in the example below.

A MongoDB passport database could embed the people information inside the passport information, though this design likely doesn’t optimize performance. Source: MySQL Performance Blog
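
The embedding option might look something like the following sketch, which nests the passport inside each person document. Documents are again plain Python dictionaries, and the values and the passport_country helper are hypothetical. Note that the reverse arrangement, embedding people inside passports, would leave nowhere to store a person who has no passport.

```python
# Passport embedded in the person document; values are illustrative.
people = [
    {
        "_id": 1,
        "name": "Alice",
        "passport": {"country": "FR", "valid_until": "2020-01-01"},
    },
    {"_id": 2, "name": "Cinderella"},  # no passport sub-document at all
]


def passport_country(person):
    """Return the embedded passport's country, or None when there is no passport."""
    return person.get("passport", {}).get("country")


for person in people:
    print(person["name"], passport_country(person))
```

Every read of the embedded fields has to allow for the missing sub-document, and every person lookup pulls the passport data along with it.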

In this example, you’re much more likely to access people information than passport information, so having two separate collections makes sense because it keeps less data in memory. When you need the passport data, perform the join in the application code: look up the person first, then fetch the matching passport with a second query.
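
A minimal sketch of that two-collection design with an application-side join follows. The collections are modeled as Python lists so the example runs without a database server. The find_one function here is a hypothetical stand-in for a driver query (PyMongo, for instance, offers a find_one method on collections), and the data and the person_with_passport helper are invented for illustration.

```python
# Two separate "collections"; values are illustrative.
people = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Cinderella"},
]
passports = [
    {"_id": 10, "people_id": 1, "country": "FR", "valid_until": "2020-01-01"},
]


def find_one(collection, **criteria):
    """Return the first document whose fields equal all criteria, else None."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in criteria.items()):
            return doc
    return None


def person_with_passport(name):
    """Application-side join: one lookup per collection, merged in code."""
    person = find_one(people, name=name)
    if person is None:
        return None
    passport = find_one(passports, people_id=person["_id"])
    return {**person, "passport": passport}


print(person_with_passport("Alice"))
print(person_with_passport("Cinderella"))
```

The common case, reading a person, touches only the small people collection. The second lookup is paid only when passport data is actually needed.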

The dangers of attempting 1:1 conversions of relational DBs

Many of the skills you learned in developing relational databases transfer smoothly to MongoDB’s document-based model, but the principal exception is schema design, as InfoWorld’s Andrew C. Oliver explains in a January 14, 2014, article. If you attempt a 1:1 port of an RDBMS schema to MongoDB, you’re almost certain to run into performance problems.

Oliver points out that most complaints about MongoDB come from people whose choice of schema was wrong for a document-focused database. A 1:1 table-to-document port is prone to cause missed joins, lost atomicity (although writes within a single MongoDB document are atomic), more required operations, and a failure to realize the performance benefits of parallelism.

Because MongoDB does not enforce a schema on a document or collection the way RDBMSs require predefined schemas, databases are theoretically easier to develop and modify. In practice, things don’t always work out this way. Among the MongoDB gotchas examined by Russell Smith in a Rainforest Blog post from November 2012 (updated on July 29, 2014) is failure to give schema design the attention it deserves.

Of course, MongoDB databases don’t exist in isolation. Services such as the Morpheus database-as-a-service (DBaaS) are geared to meet the real-world needs of organizations that rely on a mix of SQL, NoSQL, and in-memory databases. In fact, Morpheus is the first and only DBaaS that lets you provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases.

With Morpheus, you can bring up an instance of any database, monitor it, and optimize its performance in just seconds via a single dashboard. All database instances include a free full replica set. Visit the Morpheus site to sign up for a free account.