Schema design in MongoDB

Posted By : Akash Sharma | 18-Sep-2013

In this blog I share the details of schema design in MongoDB. I compare some features of the relational world with MongoDB and list some advantages and disadvantages of using MongoDB. After touching on some RDBMS features, I show some examples of domain mapping in MongoDB.

 

Relational database design usually aims for third normal form. In MongoDB we focus instead on how easily the application can save and fetch its data, i.e. we match the schema to the application's data access patterns.

MongoDB has the following characteristics:

(1) Rich documents / embedded documents

(2) No joins

(3) No foreign key constraints

(4) No declared schema

(5) No multi-document transactions (before MongoDB 4.0)

(6) Atomic operations on a single document

 

Why do we need normalized data?

Answer :

The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships.

Free the database of modification anomalies

update : the same fact is stored in several rows, so a single logical update has to touch all of them; missing one leaves the data inconsistent

insert : a row cannot be added without leaving some columns empty, because part of the information does not exist yet

delete : deleting one entry may also delete unrelated information stored in the same row

Minimize redesign when extending the database structure

If the database structure changes, the application that uses it should need as little rework as possible.

Avoid bias towards any particular pattern of querying

With denormalized data, some queries get easy access to the data while others become very difficult. With normalized data, no access pattern is favored over another.

 

Living without constraints

In the relational world, one of the most powerful concepts is the foreign key constraint.

A foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table. In other words, a foreign key is a column or a combination of columns that is used to establish and enforce a link between the data in two tables. The purpose of the foreign key in the referencing table is to identify a row of the referenced table.

 

In MongoDB there is no foreign key constraint. We can store the primary id of one collection as a reference in another collection, but no referential actions are enforced by default. The programmer has to create the relationship between the two collections explicitly in code and maintain it himself.

The advantage is that data can be embedded in other data. We can embed one JSON document inside another, which makes it a lot easier for the application to fetch data.

Here is a short example of a document in a blog collection:

{
 id : 1,
 author : 'ashok',
 title : 'my blog',
 tags : [ 'god' , 'religious' ],
 …….
}
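
Because nothing in the database enforces references, checks of this kind end up in application code. The following is only a minimal sketch of that idea: plain Python dictionaries stand in for the two collections, no MongoDB driver is involved, and the names `insert_blog` and `delete_author` are hypothetical.

```python
# In-memory stand-ins for two collections (illustration only, not a real driver).
authors = {"ashok": {"id": "ashok"}}
blogs = {}


def insert_blog(blog):
    """Insert a blog only if its author exists; the database would not check this."""
    if blog["author"] not in authors:
        raise ValueError(f"unknown author: {blog['author']}")
    blogs[blog["id"]] = blog


def delete_author(author_id):
    """Delete an author and, as a manual 'cascade', every blog that references it."""
    authors.pop(author_id, None)
    orphaned = [b["id"] for b in blogs.values() if b["author"] == author_id]
    for blog_id in orphaned:
        del blogs[blog_id]


insert_blog({"id": 1, "author": "ashok", "title": "my blog",
             "tags": ["god", "religious"]})
```

Inserting a blog for an author that does not exist raises an error here, whereas the database on its own would accept the document without complaint.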

 

Living without Transactions

A transaction has four basic properties, known as the ACID properties.

Atomicity  Consistency  Isolation  Durability

Transactions are required in the relational world because updating multiple tables in one go can otherwise leave the data inconsistent.

MongoDB versions before 4.0 have no multi-document transactions. What MongoDB does provide is atomic operations, i.e. an operation on a single document is atomic. Any consistency across several documents has to be handled by the programmer in code.

 

Schema Design

Now let's move on to domain mapping in MongoDB. As in the relational world, there are three types of relationships to map:

One-to-One

One-to-Many

Many-to-Many

 

One to One

To demonstrate this, I will take an example.

Employee  --  Resume

One employee has one resume, and one resume belongs to one employee.

Let's look at the possible designs and their advantages and disadvantages.

(1)
Employee
{
 id : 20,
 name : 'ashok'
 ….
}

Resume
{
 id : 30,
 job : [ ... ],
 education : [ ... ],
 employeeId : 20
 ...
}

(2)
Employee
{
 id : 20,
 name : 'ashok',
 resumeId : 30
 ….
}

Resume
{
 id : 30,
 job : [ ... ],
 education : [ ... ],
 employeeId : 20
 ...
}

(3) Embed Resume in Employee

(4) Embed Employee in Resume
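
Options (3) and (4) have no example above, so here is a rough sketch of option (3) using plain Python dictionaries in place of real documents. The `job` and `education` values are made-up placeholders (the originals are elided), and `embed_resume` is a hypothetical helper, not a MongoDB API.

```python
import copy

# Two separate documents, as in design (1). Field values are placeholders.
employee = {"id": 20, "name": "ashok"}
resume = {"id": 30, "job": ["developer"], "education": ["B.Tech"], "employeeId": 20}


def embed_resume(employee_doc, resume_doc):
    """Return a new employee document with the resume embedded (option 3)."""
    # The back-reference is redundant once the resume lives inside the employee.
    embedded = {k: v for k, v in resume_doc.items() if k != "employeeId"}
    result = copy.deepcopy(employee_doc)
    result["resume"] = copy.deepcopy(embedded)
    return result


employee_with_resume = embed_resume(employee, resume)
print(employee_with_resume)
```

Option (4) would be the mirror image: the employee fields nested inside the resume document.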

 

The parameters that affect this modeling decision are:

Frequency of access : If the application reads employee information much more often than resume information, it is better to keep them in separate collections; otherwise the resume data is loaded into memory every time you fetch an employee.

Size of items : The resume may be much larger than the employee data, or may be written much more often. In that case you can go with separate collections. Also, a single document cannot exceed the limit of 16 MB.

Atomicity of data : Because MongoDB (before version 4.0) does not support transactions across updates to multiple documents in multiple collections, you can embed the data into one document if you expect consistency problems with separate collections.

 

One to Many

In this relationship one entity is mapped to one or more other entities. Here we have two cases:

One to Many

One to Few

I will demonstrate each with an example.

 

City  -- Person

Let's say we have these two domains, in which one city can have many persons and one person belongs to one city.

(1)
City
{
 name : 'delhi',
 area : 2500,
 ….
 person : [
   {
     name : 'Ashok',
     age : 20,
     emailId : '[email protected]'
   },
   {
     name : 'Ankit',
     age : 22,
     emailId : '[email protected]'
   },
   ……...
 ]
}

 

This schema embeds all person documents in the city document, i.e. as an array of person documents inside one city document. This is not a good design: a city like Delhi may have up to 20 million persons, so loading a single city document would mean loading the data of 20 million persons at once, and such a document would far exceed the 16 MB document limit.

 

(2) If I embed all the information of the city in each person document, the city data is duplicated, which can lead to update anomalies.

 

(3) The best solution is to have separate collections for the two domains, i.e. true linking.

City
{
 id : 21,
 name : 'delhi',
 area : 2500,
 ….
}

Person
{
 name : 'Ashok',
 age : 20,
 emailId : '[email protected]',
 cityId : 21
}
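
With true linking, the "join" happens in the application: one lookup resolves the city, a second one finds the persons that reference it. A small sketch of that pattern, using Python lists as stand-ins for the two collections; the third person and the `persons_in_city` helper are assumptions added for illustration.

```python
# In-memory stand-ins for the City and Person collections (illustration only).
cities = [{"id": 21, "name": "delhi", "area": 2500}]
persons = [
    {"name": "Ashok", "age": 20, "cityId": 21},
    {"name": "Ankit", "age": 22, "cityId": 21},
    {"name": "Amit", "age": 25, "cityId": 99},  # hypothetical person elsewhere
]


def persons_in_city(city_name):
    """Application-level join: find the city, then the persons linked to it."""
    # First lookup: resolve the city by name.
    city = next((c for c in cities if c["name"] == city_name), None)
    if city is None:
        return []
    # Second lookup: select the persons whose cityId references that city.
    return [p for p in persons if p["cityId"] == city["id"]]


names = [p["name"] for p in persons_in_city("delhi")]
print(names)
```

The city document stays small no matter how many persons live there, at the cost of a second lookup.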

 

Now let's move to another example.

BlogPost  --  Comments

 

In the previous example (City -- Person) the number of persons would be very large. In this example, however, a BlogPost has perhaps 10 or 20 comments. In this case we can embed the documents.

BlogPost
{
 id : 2,
 title : 'my Blog',
 content : 'This is my first blog',
 author : 'ashok',
 comments : [
   {
     name : 'amit',
     comment : 'nice blog',
     emailId : '[email protected]'
   },
   {
     …...
   }
 ],
 …..
}

 

This is the One to Few case, where there are only a few related entities and little redundancy.

So for a One to Many relationship you have to decide which kind of mapping you want to implement.
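
In the One to Few case, adding a related entity means changing only the parent document. A minimal sketch with a Python dictionary standing in for the BlogPost document; `add_comment` and the second comment are hypothetical.

```python
# A BlogPost document with its comments embedded (illustration only).
blog_post = {
    "id": 2,
    "title": "my Blog",
    "content": "This is my first blog",
    "author": "ashok",
    "comments": [],
}


def add_comment(post, name, comment):
    """Append a comment to the embedded array and return the new comment count."""
    post["comments"].append({"name": name, "comment": comment})
    return len(post["comments"])


add_comment(blog_post, "amit", "nice blog")
count = add_comment(blog_post, "ankit", "thanks for sharing")
print(count, [c["name"] for c in blog_post["comments"]])
```

Reading the post returns its comments in the same lookup, which is what makes embedding attractive when the array stays short.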

 

Many To Many

In MongoDB, many-to-many relationships can be categorized in two ways:

Few To Few

Many To Many

 

Let's use an example for Few To Few.

Book -- Author

(1)
Book
{
 id : 20,
 authorList : [60,81,92],
 …...
}

Author
{
 id : 81,
 bookList : [20,25],
 ...
}

(2)
Author
{
 id : 81,
 books : [
   {
     id : 20,
     ….
   },
   {
     id : 25,
     ….
   }
 ],
 …..
}

Both designs can be implemented, but in case (2) the Book data is duplicated across multiple authors, which can lead to update anomalies.

In case (1) I have linked both collections with lists of ids. But since there is no foreign key constraint in MongoDB, you have to keep the authorList and bookList fields in sync manually.
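
Keeping both id lists in sync by hand could look roughly like the sketch below. Python dictionaries stand in for the Book and Author collections, the starting data is made up, and `link` and `unlink` are hypothetical helpers rather than anything MongoDB provides.

```python
# In-memory stand-ins for the Book and Author collections, keyed by id.
books = {
    20: {"id": 20, "authorList": [60, 92]},
    25: {"id": 25, "authorList": [81]},
}
authors = {81: {"id": 81, "bookList": [25]}}


def link(book_id, author_id):
    """Record the relationship on both sides, without creating duplicates."""
    book, author = books[book_id], authors[author_id]
    if author_id not in book["authorList"]:
        book["authorList"].append(author_id)
    if book_id not in author["bookList"]:
        author["bookList"].append(book_id)


def unlink(book_id, author_id):
    """Remove the relationship from both sides."""
    book, author = books[book_id], authors[author_id]
    if author_id in book["authorList"]:
        book["authorList"].remove(author_id)
    if book_id in author["bookList"]:
        author["bookList"].remove(book_id)


link(20, 81)
print(books[20]["authorList"], authors[81]["bookList"])
```

Routing every change through one pair of functions keeps the two lists from drifting apart, though against a real database the two writes would still be separate operations.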

 

Consider another example, this time for Many to Many.

Student -- Teacher

 

In this case we can follow case (1) of the Book -- Author example.

A Student may be created with no Teacher, or a new Teacher may have no Students assigned yet. So in this case it is not appropriate to embed documents.

 

Advantages of Embedding documents inside another document

An embedded document is stored together with its parent document, so a single query on a single collection fetches or updates everything at once, which is very fast (a performance booster). When the data is spread over more than one collection, the application has to issue several queries and the database has to seek to different locations on disk, which takes relatively more time.

 

So ultimately there is a trade-off between performance and consistency. You have to choose the best option for your application.

 

Akash Sharma

