Schema design in MongoDB
Posted By : Akash Sharma | 18-Sep-2013
In this blog I am going to share the details of schema design in MongoDB. I compare some features of the relational world with MongoDB and list some advantages and disadvantages of using MongoDB. After touching on some features of an RDBMS, I will show some examples of domain mapping in MongoDB.
Relational design aims to keep data normalized, typically in third normal form. In MongoDB, by contrast, we focus on how easily the application can save and fetch data, i.e. the schema is matched to the data access patterns of the application.
Some characteristics of MongoDB that drive schema design:
(1) Rich documents / embedded documents
(2) No joins
(3) No foreign key constraints
(4) No declared schema
(5) No multi-document transactions (as of this writing)
(6) Atomic operations on a single document
Why do we need normalized data?
Answer:
The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships. Normalization has three goals:
Free the database of modification anomalies
update: when the same fact is stored in several rows, a single logical change must touch all of them, and missing one leaves the data inconsistent
insert: a new fact cannot be stored without also supplying unrelated data, so some rows end up with empty columns
delete: deleting one entry may unintentionally delete the only copy of another fact
Minimize redesign when extending the database structure
If the database structure changes, the effort needed to rework the application that uses it stays small.
Avoid bias towards any particular pattern of querying
With denormalized data, some queries get easy data access while others become very difficult. With normalized data, no access pattern is favoured over another.
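To make the update anomaly concrete, here is a minimal sketch in plain Python. The rows, the `author_email` field, and the example.com addresses are all made up for illustration; no database is involved.

```python
# Denormalized rows: the author's email is repeated on every blog row.
# All data here is hypothetical.
blogs = [
    {"id": 1, "title": "my blog", "author": "ashok", "author_email": "old@example.com"},
    {"id": 2, "title": "second blog", "author": "ashok", "author_email": "old@example.com"},
]


def update_email_one_row(rows, blog_id, new_email):
    """Change the email on a single row only (a partial, buggy update)."""
    for row in rows:
        if row["id"] == blog_id:
            row["author_email"] = new_email


def emails_for(rows, author):
    """Return the set of distinct emails stored for one author."""
    return {row["author_email"] for row in rows if row["author"] == author}


update_email_one_row(blogs, 1, "new@example.com")

# The same author now has two different emails: an update anomaly.
inconsistent = len(emails_for(blogs, "ashok")) > 1
```

With normalized data the email would live in exactly one author row, so there would be nothing to get out of sync.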
Living without constraints
In the relational world, one of the most powerful concepts is the foreign key constraint.
A foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table. In other words, a foreign key is a column or a combination of columns that is used to establish and enforce a link between the data in two tables. The purpose of the foreign key in the referencing table is to identify a row of the referenced table.
In MongoDB there is no foreign key constraint. We can store the primary id of one collection as a reference in another collection, but no referential actions are enforced by default. The programmer has to create and maintain the relationship between the two collections explicitly in application code.
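What "handling the relationship in application code" might look like can be sketched as follows. This is only an illustration: the two dictionaries stand in for collections, and `insert_blog`, `delete_author`, and the `authorId` field are hypothetical names, not part of any MongoDB API.

```python
# In-memory stand-ins for two collections (hypothetical data).
authors = {81: {"id": 81, "name": "ashok"}}
blogs = {}


def insert_blog(blog):
    """Insert a blog only if its authorId points at an existing author."""
    if blog["authorId"] not in authors:
        raise ValueError(f"unknown authorId: {blog['authorId']}")
    blogs[blog["id"]] = blog


def delete_author(author_id):
    """Delete an author and every blog that references it (a manual cascade)."""
    authors.pop(author_id, None)
    orphaned = [b["id"] for b in blogs.values() if b["authorId"] == author_id]
    for blog_id in orphaned:
        del blogs[blog_id]


insert_blog({"id": 1, "title": "my blog", "authorId": 81})
```

A relational database would do both checks for us through the foreign key; here every code path that writes to either collection has to remember to do them.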
What MongoDB offers instead is embedding: we can embed one JSON document inside another JSON document, which makes it a lot easier for our application to fetch related data in one read.
For example, here is a short document from a blog collection:
{ id : 1, author : 'ashok', title : 'my blog', tags : [ 'god', 'religious' ], ... }
Living without Transactions
A transaction is characterised by four basic properties, known as the ACID properties:
Atomicity, Consistency, Isolation, Durability
Transactions are required in the relational world because updating multiple tables in one go can otherwise leave the data inconsistent if only some of the updates succeed.
When this post was written, MongoDB had no concept of a transaction (multi-document transactions arrived later, in MongoDB 4.0). It does have atomic operations, i.e. an operation on a single document is atomic. Anything that must stay consistent across several documents has to be handled manually by the programmer in application code.
Schema Design
So now let's move on to domain mapping in MongoDB. The relational world has three types of relationship mapping:
One-to-One
One-to-Many
Many-to-Many
One to One
To demonstrate this, I will take an example.
Employee -- Resume
One Employee has one Resume and one Resume belongs to one Employee.
Let's look at the possible designs and their advantages and disadvantages.
(1) Resume refers to Employee
Employee { id : 20, name : 'ashok', ... }
Resume { id : 30, job : [ ... ], education : [ ... ], employeeId : 20, ... }
(2) Employee and Resume refer to each other
Employee { id : 20, name : 'ashok', resumeId : 30, ... }
Resume { id : 30, job : [ ... ], education : [ ... ], employeeId : 20, ... }
(3) Embed Resume in Employee
(4) Embed Employee in Resume
The parameters that affect the choice of model are:
Frequency of access: If the application reads employee information much more often than resume information, then it is better to keep separate collections, because otherwise the resume data is loaded into memory every time you fetch employee data.
Size of item: If the resume is large, or is written to much more often than the employee, you can go with separate collections. Also, a single document cannot exceed the limit of 16 MB.
Atomicity of data: As MongoDB does not support transactions over updates to multiple documents in multiple collections, if you expect consistency problems with separate collections you can embed the data in one document, where an update is atomic.
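The difference between linking and embedding in the Employee -- Resume example can be sketched in plain Python. The dictionaries below stand in for collections, the empty `job` and `education` lists are placeholders for the elided data, and the function names are hypothetical.

```python
# Option (1): separate collections, Resume points at Employee.
employees = {20: {"id": 20, "name": "ashok"}}
resumes = {30: {"id": 30, "job": [], "education": [], "employeeId": 20}}


def resume_for_linked(employee_id):
    """Second lookup: find the resume that references the employee."""
    for resume in resumes.values():
        if resume["employeeId"] == employee_id:
            return resume
    return None


# Option (3): Resume embedded in Employee, one document holds everything.
employee_embedded = {
    "id": 20,
    "name": "ashok",
    "resume": {"job": [], "education": []},
}


def resume_for_embedded(employee):
    """No second lookup: the resume travels with the employee document."""
    return employee["resume"]
```

The linked form lets us read an employee without paying for the resume; the embedded form returns both in one read and lets both be updated together.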
One to Many
In this relationship one entity is mapped to one or more other entities. In MongoDB we have two cases to consider:
One to Many
One to Few
I will demonstrate each with an example.
City -- Person
Let's say we have these two domains, in which one City can have many persons and one Person belongs to one City.
(1) Embed all persons in the City document
City {
  name : 'delhi',
  area : 2500,
  ...
  person : [
    { name : 'Ashok', age : 20, emailId : '[email protected]' },
    { name : 'Ankit', age : 22, emailId : '[email protected]' },
    ...
  ]
}
In this schema design all person documents are embedded in the city document, i.e. one city document holds an array of person documents. This is not a good design, because the number of persons in a city like Delhi may be up to 20 million, which means loading the data of 20 million users whenever a single city document is loaded. Such a document would also run into the 16 MB document size limit.
(2) If I embed all the information of the city in each person document, the city data is duplicated in every person, and this redundancy can lead to update anomalies.
(3) The best solution is to have separate collections for both domains, i.e. true linking.
City { id : 21, name : 'delhi', area : 2500, ... }
Person { name : 'Ashok', age : 20, emailId : '[email protected]', cityId : 21 }
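With true linking, finding the persons of a city becomes a query on the Person collection by `cityId`. Here is a minimal in-memory sketch of that lookup; the third person, the city id 22, and the `persons_in_city` helper are hypothetical, and the email fields are left out for brevity.

```python
# In-memory stand-ins for the two collections.
cities = [{"id": 21, "name": "delhi", "area": 2500}]
persons = [
    {"name": "Ashok", "age": 20, "cityId": 21},
    {"name": "Ankit", "age": 22, "cityId": 21},
    {"name": "Ravi", "age": 30, "cityId": 22},  # hypothetical extra row
]


def persons_in_city(city_id, limit=None):
    """Return persons whose cityId matches, optionally capped like a paged query."""
    matches = [p for p in persons if p["cityId"] == city_id]
    return matches if limit is None else matches[:limit]
```

Because the persons are not inside the city document, we can load the city alone, or fetch its persons a page at a time, instead of pulling millions of embedded entries at once.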
Now let's move to another example.
BlogPost -- Comments
In the previous example (City -- Person) the number of persons would be very large. In this example, however, a BlogPost may have only 10 or 20 Comments. In this case we can embed the documents.
BlogPost {
  id : 2,
  title : 'my Blog',
  content : 'This is my first blog',
  author : 'ashok',
  comments : [
    { name : 'amit', comment : 'nice blog', emailId : '[email protected]' },
    { ... }
  ],
  ...
}
This is the case of One to Few, where there are only a few related entities and little redundancy.
So in the case of a One to Many relationship you have to decide which kind of mapping to implement, based on how many related entities you expect.
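The One to Few case can be sketched like this. The post is a plain Python dictionary, and `add_comment` is a hypothetical helper; in MongoDB itself the equivalent change would typically be a single update on the post document using the `$push` operator.

```python
# One blog post with its comments embedded (emailId omitted for brevity).
blog_post = {
    "id": 2,
    "title": "my Blog",
    "content": "This is my first blog",
    "author": "ashok",
    "comments": [{"name": "amit", "comment": "nice blog"}],
}


def add_comment(post, name, comment):
    """Append a comment to the embedded array; only one document changes."""
    post["comments"].append({"name": name, "comment": comment})
    return post


add_comment(blog_post, "ankit", "very helpful")
```

Since the post and its comments are one document, reading the post returns the comments too, and adding a comment never touches a second collection.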
Many To Many
In MongoDB, many to many can be categorised in two ways:
Few To Few
Many To Many
Let's use an example for Few To Few.
Book -- Author
(1) Link both collections with id lists
Book { id : 20, authorList : [ 60, 81, 92 ], ... }
Author { id : 81, bookList : [ 20, 25 ], ... }
(2) Embed Books in Author
Author {
  id : 81,
  books : [
    { id : 20, ... },
    { id : 25, ... }
  ],
  ...
}
You can implement it either way, but in case (2) the data of a Book is duplicated in each of its authors, and this redundancy could lead to update anomalies.
In case (1) I have linked both collections with id lists. But as there is no foreign key constraint in MongoDB, you have to keep the authorList and bookList fields in sync manually.
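Keeping the two id lists in sync by hand could look roughly like this. It is a sketch over in-memory dictionaries; `link` and `unlink` are hypothetical helpers, and the lists start empty rather than with the ids from the example above.

```python
# In-memory stand-ins for the Book and Author collections.
books = {20: {"id": 20, "authorList": []}, 25: {"id": 25, "authorList": []}}
authors = {81: {"id": 81, "bookList": []}}


def link(book_id, author_id):
    """Record the relationship on both sides, without creating duplicates."""
    book, author = books[book_id], authors[author_id]
    if author_id not in book["authorList"]:
        book["authorList"].append(author_id)
    if book_id not in author["bookList"]:
        author["bookList"].append(book_id)


def unlink(book_id, author_id):
    """Remove the relationship from both sides if it is present."""
    book, author = books[book_id], authors[author_id]
    if author_id in book["authorList"]:
        book["authorList"].remove(author_id)
    if book_id in author["bookList"]:
        author["bookList"].remove(book_id)
```

Against a real database these are two separate document updates, so the application must also decide what to do if the first succeeds and the second fails.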
Consider another example for Many to Many.
Student -- Teacher
In this case we can follow case (1) of the Book -- Author example.
A Student can be created with no Teacher, and a new Teacher may have no Student assigned yet, so each must be able to exist on its own. In this case it is not OK to embed documents.
Advantages of Embedding documents inside another document
When we run a find or update on a collection, the server has to locate the data on disk. A document and everything embedded in it are stored together, so a single query on a single collection fetches all the related data quickly (a performance booster). When the data is spread over more than one collection, the application has to issue several queries and the data may have to be loaded from different locations, which takes relatively more time.
So ultimately there is a trade-off between performance and consistency. You have to choose the best option for your application.
About Author
Akash Sharma
Akash is a bright Groovy and Grails developer and has worked on the development of various SaaS applications using Grails technologies. Akash loves playing cricket and tennis.