Parmeet Singh
Data Modeling with MongoDB
MongoDB is a popular NoSQL document database used by developers and organizations worldwide. It provides a flexible data modeling approach that enables developers to build scalable and efficient applications. In this blog, we will discuss the principles and best practices of MongoDB data modeling.
Significance of the Data Model.
Data modeling is a crucial step that helps us design simple, logical databases suited to the workload. A good data model keeps storage requirements and redundancy under control and makes queries efficient.
One important principle of MongoDB data modeling is to denormalize data. This means storing data in a way that minimizes the need for joins and lookups. In a traditional relational database, data is often normalized into separate tables to reduce redundancy. In MongoDB, however, some redundancy is acceptable and often desirable, because it lets the most common queries read everything they need from a single document.
Schema Design in MongoDB.
When designing a schema for MongoDB, it is important to consider the access patterns of the application. This means thinking about how the data will be read, written, and updated. The goal is to create a data model that optimizes for the most common access patterns.
For example, if an application needs to display a list of products with their prices, it might make sense to store the product data and price data in the same document. This way, the application can retrieve the product and its price with a single query, rather than joining two separate collections or tables.
Schema design consists of the following concepts:
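To make the difference concrete, here is a minimal TypeScript sketch that builds the same product list from a normalized layout, where prices live in a separate collection and must be joined, and from a denormalized layout, where each product document embeds its price. The field names (name, price, amount, currency) and the sample values are hypothetical, and plain arrays stand in for collections so the example runs without a database.

```typescript
// Hypothetical shapes: field names are illustrative assumptions, not a real schema.

// Normalized: prices live in a separate "collection" and must be joined.
interface ProductRow { _id: number; name: string; }
interface PriceRow { productId: number; amount: number; currency: string; }

// Denormalized: the price is embedded inside the product document itself.
interface ProductDoc {
  _id: number;
  name: string;
  price: { amount: number; currency: string };
}

type Listing = { name: string; price: string };

// Simulates a join: every product needs a second lookup (like $lookup or a second query).
function listNormalized(products: ProductRow[], prices: PriceRow[]): Listing[] {
  const byProduct = new Map<number, PriceRow>();
  for (const p of prices) byProduct.set(p.productId, p);
  return products.map((prod) => {
    const price = byProduct.get(prod._id);
    return { name: prod.name, price: price ? `${price.amount} ${price.currency}` : "n/a" };
  });
}

// With embedding, each document already carries everything the list view needs.
function listEmbedded(products: ProductDoc[]): Listing[] {
  return products.map((p) => ({ name: p.name, price: `${p.price.amount} ${p.price.currency}` }));
}

const rows: ProductRow[] = [{ _id: 1, name: "Laptop" }, { _id: 2, name: "Mouse" }];
const priceRows: PriceRow[] = [
  { productId: 1, amount: 999, currency: "USD" },
  { productId: 2, amount: 25, currency: "USD" },
];
const docs: ProductDoc[] = [
  { _id: 1, name: "Laptop", price: { amount: 999, currency: "USD" } },
  { _id: 2, name: "Mouse", price: { amount: 25, currency: "USD" } },
];

console.log(listEmbedded(docs));
```

In MongoDB terms, listEmbedded corresponds to a single find() on one collection, while listNormalized corresponds to a second query or a $lookup stage.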
- Entity: an independent object that represents a logical part of the application. An entity can be a real-world object, such as a laptop, or something without a physical form, such as an operating system.
- Entity Type: a category of objects, people, or concepts that share common characteristics. For example, an Asus razor belongs to the entity type laptop.
- Attributes: the properties of an entity that define or describe it. For example, the entity “Novel” has the attribute title (String).
- Relationship: how entities relate to one another. For example, if many users can rent the same house, the relationship between a house and its users is one-to-many.
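These concepts map naturally onto TypeScript types. The sketch below is a hypothetical illustration: Novel is an entity type with a title attribute, and the house/users example is modeled as one-to-many by referencing, with the house document storing its tenants' ids. The field names (tenantIds, address) are assumptions, and embedding the user documents directly would be an equally valid choice when they are small and always read with the house.

```typescript
// Entity type "Novel": every novel entity shares these attributes.
interface Novel {
  _id: string;
  title: string; // the title attribute (String) from the list above
}

// Entity types for the one-to-many example: one house, many users renting it.
interface User {
  _id: string;
  displayName: string;
}

interface House {
  _id: string;
  address: string;
  // Hypothetical field: one-to-many modeled by referencing the users' ids.
  tenantIds: string[];
}

const novel: Novel = { _id: "n1", title: "An Example Novel" };
const users: User[] = [
  { _id: "u1", displayName: "Asha" },
  { _id: "u2", displayName: "Ben" },
  { _id: "u3", displayName: "Chen" },
];
const house: House = { _id: "h1", address: "12 Example Street", tenantIds: ["u1", "u2"] };

// Resolving the reference in application code (what a $lookup would do server-side).
function tenantsOf(h: House, all: User[]): User[] {
  return all.filter((u) => h.tenantIds.indexOf(u._id) !== -1);
}

console.log(novel.title, tenantsOf(house, users).map((u) => u.displayName));
```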
The Right Way to Design a Data Model.
In MongoDB, start by identifying the workload: understand whether your application performs more reads or more writes, how large the data you are querying is, and how many operations are performed.
Once you understand the workload, identify the relationships, i.e. how the objects relate to each other (one-to-one, one-to-many, or many-to-many), and the attributes used to describe those objects.
MongoDB documents can contain nested documents and arrays of documents. This allows developers to model complex data structures without the need for separate tables or joins.
For example, if an application needs to store a list of comments for a blog post, the comments could be stored as an array of embedded documents within the blog post document.
Using embedded documents and arrays can help reduce the number of queries required to retrieve data. However, keep in mind that MongoDB limits a single BSON document to 16 MB. Large documents degrade performance well before that limit, and documents with unbounded arrays (such as an ever-growing list of comments) may need to be split into multiple documents or collections.
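A minimal sketch of the blog-post example, assuming hypothetical field names (title, body, comments, postedAt). It embeds comments as an array of documents and refuses to embed more once the post passes a size budget. The budget of half the limit is an arbitrary assumption, and the JSON character count is only a rough proxy for the real BSON size, which the driver's BSON library would report exactly.

```typescript
// Hypothetical blog-post shape with comments embedded as an array of documents.
interface PostComment { author: string; text: string; postedAt: string; }
interface BlogPost { _id: string; title: string; body: string; comments: PostComment[]; }

// MongoDB's maximum BSON document size: 16 MB (16 * 1024 * 1024 bytes).
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough proxy only: JSON character count, not the actual BSON byte size.
function approxSize(doc: unknown): number {
  return JSON.stringify(doc).length;
}

// Embed the comment unless the post would exceed the budget; in that case the
// caller would store it elsewhere, e.g. in a separate comments collection (assumption).
function addComment(post: BlogPost, c: PostComment, budget = MAX_BSON_BYTES / 2): boolean {
  const candidate = { ...post, comments: [...post.comments, c] };
  if (approxSize(candidate) > budget) return false;
  post.comments.push(c);
  return true;
}

const post: BlogPost = { _id: "p1", title: "Hello", body: "First post", comments: [] };
const ok = addComment(post, { author: "ana", text: "Nice!", postedAt: "2024-01-01" });
console.log(ok, post.comments.length);
```

On the server, the matching update for a single new comment is an $push on the comments field, for example { $push: { comments: newComment } }.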
Choosing the Right Data Model Pattern.
A data model pattern is a reusable design solution for organizing data within a database or information system. It provides a standardized approach for representing and structuring data, which can simplify the data modeling process and improve data quality.
In MongoDB, we have design patterns derived from previous use cases. There are around ten well-documented data modeling patterns for MongoDB. After identifying the workload and the relationships between objects, check which of these patterns are relevant to your application.
For example, the Attribute Pattern: in MongoDB, the attribute pattern organizes data by grouping related fields together as key-value pairs within a single document. This can help improve query performance and simplify data retrieval.
Consider, for example, a document that represents a person with a name, age, address, and phone number. The address and phone fields are both examples of the attribute pattern, because they group related information together as a sub-document and an array, respectively.
The phone field is an array that contains one or more sub-documents, each of which represents a phone number. This allows multiple phone numbers to be stored and retrieved for the same person, and also allows each phone number to be associated with a type (e.g. “home” or “work”).
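A document matching that description might look like the sketch below. All values are made up and the address sub-fields (street, city, zip) are assumptions. Because every phone entry has the same { type, number } shape, one lookup and one index cover all of them. MongoDB's own write-up of the Attribute Pattern takes this further, using generic { k, v } sub-documents so a single index serves many sparse fields.

```typescript
// Hypothetical person document; field values are illustrative only.
interface Address { street: string; city: string; zip: string; }
interface Phone { type: string; number: string; }
interface Person { fullName: string; age: number; address: Address; phone: Phone[]; }

const person: Person = {
  fullName: "Jane Doe",
  age: 30,
  address: { street: "1 Main Street", city: "Springfield", zip: "12345" },
  phone: [
    { type: "home", number: "555-0100" },
    { type: "work", number: "555-0199" },
  ],
};

// Every entry shares the same shape, so one helper finds any phone by its type.
function phoneOfType(p: Person, type: string): string | undefined {
  const match = p.phone.filter((ph) => ph.type === type)[0];
  return match ? match.number : undefined;
}

// Plain objects one might pass to the driver: a filter matching a single array
// element on both fields, and the keys of a compound multikey index on the array.
const findByWorkPhone = { phone: { $elemMatch: { type: "work", number: "555-0199" } } };
const phoneIndexKeys = { "phone.type": 1, "phone.number": 1 };

console.log(phoneOfType(person, "work"), findByWorkPhone, phoneIndexKeys);
```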
Conclusion
MongoDB data modeling is a powerful and flexible approach to building scalable and efficient applications. By denormalizing data, using embedded documents and arrays, and indexing effectively, developers can create data models that optimize for the most common access patterns. When designing a data model, it is important to consider the specific use case and access patterns of the application to ensure optimal performance.
Manisha Shire
MULTIDIMENSIONAL GROUPING in MongoDB Using $facet Aggregation
The official MongoDB documentation describes ‘$facet’ as:
“Processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its results are stored as an array of documents.
The $facet stage allows you to create multi-faceted aggregations which characterize data across multiple dimensions, or facets, within a single aggregation stage. Multi-faceted aggregations provide multiple filters and categorizations to guide data browsing and analysis. Retailers commonly use faceting to narrow search results by creating filters on product price, manufacturer, size, etc.”
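As an illustration of the retail case in that quote, here is a hedged TypeScript sketch that builds such a pipeline as plain objects. The product fields (category, price, manufacturer), the price boundaries, and the function name buildCatalogFacets are hypothetical; the stage shapes follow the documented $facet, $bucket, and $sortByCount syntax.

```typescript
// A pipeline stage represented as a plain object.
type Stage = Record<string, unknown>;

// Hypothetical catalog facets: one price-band facet and one manufacturer facet.
function buildCatalogFacets(category: string): Stage[] {
  return [
    // Shared preliminary work: runs once, and its output feeds every facet.
    { $match: { category } },
    {
      $facet: {
        // Facet 1: number of matching products in each price band.
        byPrice: [
          {
            $bucket: {
              groupBy: "$price",
              boundaries: [0, 100, 500, 1000],
              default: "1000+", // bucket _id for prices outside the boundaries
              output: { count: { $sum: 1 } },
            },
          },
        ],
        // Facet 2: matching products per manufacturer, most common first.
        byManufacturer: [{ $sortByCount: "$manufacturer" }],
      },
    },
  ];
}

console.log(JSON.stringify(buildCatalogFacets("laptops"), null, 2));
```

The aggregation returns a single document whose byPrice and byManufacturer fields each hold an array, which is what a search page would use to render its filters.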
So, what exactly is ‘$facet’, and why would we use it?
When creating reports, you often find that several reports need the same preliminary processing, which would otherwise mean creating and maintaining an intermediate collection throughout the processing. With “$facet”, the shared stages run once and several sub-pipelines then process their output within a single aggregation query, which can also improve the performance of the query.
For a better understanding of “$facet”, you can think of it as follows:
At a restaurant, you order two different plates of pasta. The chef cooks the same pasta once and then adds different ingredients, such as sauces and vegetables, to each plate.
That, in short, is what $facet does.
Now let's look at a problem I recently faced and how I solved it with $facet.
Suppose a MongoDB collection named metricsCaptured holds basic details such as name, region, and count.
I wanted the top 5 records for the AMER region and the top 5 for the APAC region. (I was using Node.js with MongoDB.)
I first selected the fields I needed from the main collection, metricsCaptured, and used $unwind on its array field. Then I used $facet to produce one output array per result, AMER_region and APAC_region. Inside each of these sub-pipelines, I used $match to select the records of that region and then $sort, $limit, and $project to get its top 5 records.
During processing, each sub-pipeline within $facet (AMER_region and APAC_region) receives exactly the same set of input documents: the output of the stages that precede $facet. The sub-pipelines are completely independent of each other, and each one's output array is stored in its own field of the output document, i.e. AMER_region and APAC_region.
This avoids running a separate query per region, and if the requirements change later, all you need to do is add or remove sub-pipelines in the $facet stage without disturbing the others.
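The steps above might be sketched in TypeScript as follows. The post does not name the array field, so "metrics" (holding elements with name, region, and count) is a hypothetical field, as are the helper names topForRegion and buildTopRegionsPipeline. Treat this as one possible reconstruction, not the exact pipeline used.

```typescript
// A pipeline stage represented as a plain object.
type Stage = Record<string, unknown>;

// Hypothetical helper: one sub-pipeline per region ($match, $sort, $limit, $project).
function topForRegion(region: string, n = 5): Stage[] {
  return [
    { $match: { "metrics.region": region } },
    { $sort: { "metrics.count": -1 } }, // highest count first
    { $limit: n },
    {
      $project: {
        _id: 0,
        name: "$metrics.name",
        region: "$metrics.region",
        count: "$metrics.count",
      },
    },
  ];
}

function buildTopRegionsPipeline(): Stage[] {
  return [
    // Keep only the (assumed) array field, then emit one document per element.
    { $project: { metrics: 1 } },
    { $unwind: "$metrics" },
    // Both sub-pipelines receive the same unwound documents.
    {
      $facet: {
        AMER_region: topForRegion("AMER"),
        APAC_region: topForRegion("APAC"),
      },
    },
  ];
}

console.log(JSON.stringify(buildTopRegionsPipeline(), null, 2));
```

With the Node.js driver, the pipeline would be run with something like `const [result] = await db.collection("metricsCaptured").aggregate(buildTopRegionsPipeline()).toArray();`. Because $facet emits a single document, result.AMER_region and result.APAC_region each hold at most five records.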
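To see why the sub-pipelines are independent, here is a toy, in-memory model of the $facet semantics just described. It is an illustration only, not how MongoDB executes the stage: the function names facet and topN and the sample data are hypothetical. Every sub-pipeline receives the same input array and writes to its own output field.

```typescript
interface Metric { name: string; region: string; count: number; }
type SubPipeline<T> = (input: readonly T[]) => T[];

// Toy $facet: run each named sub-pipeline on the same, unmodified input.
function facet<T>(input: readonly T[], subs: Record<string, SubPipeline<T>>): Record<string, T[]> {
  const out: Record<string, T[]> = {};
  for (const key of Object.keys(subs)) {
    out[key] = subs[key](input);
  }
  return out;
}

// $match + $sort + $limit for one region; filter() copies, so sort() never touches the shared input.
const topN = (region: string, n: number): SubPipeline<Metric> => (input) =>
  input
    .filter((m) => m.region === region)
    .sort((a, b) => b.count - a.count)
    .slice(0, n);

const sample: Metric[] = [
  { name: "a", region: "AMER", count: 10 },
  { name: "b", region: "APAC", count: 7 },
  { name: "c", region: "AMER", count: 3 },
  { name: "d", region: "APAC", count: 12 },
  { name: "e", region: "AMER", count: 8 },
];

const facetResult = facet(sample, { AMER_region: topN("AMER", 2), APAC_region: topN("APAC", 2) });
console.log(facetResult);
```

Adding another region is just one more entry in the object passed to facet, which mirrors how a new sub-pipeline can be added to the real $facet stage without touching the existing ones.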
Conclusion:
“$facet” simply runs multiple aggregation pipelines within a single stage on the same set of input documents.