
MongoDB Schema Design Explained: Embedding vs Referencing
This presentation explains why MongoDB schema design is one of the most important parts of building a fast and scalable application. The speaker explains that many developers focus first on indexing, caching, or hardware, but the real foundation is how the data is structured. If the schema is poorly designed, the application may become slow even before it grows.
The researcher introduces the topic from a developer’s point of view. He explains that many developers come to MongoDB with a traditional SQL mindset. They often design MongoDB collections the same way they design relational database tables. This can work in some cases, but it often causes performance problems because MongoDB is designed differently.
SQL vs MongoDB Design
In a traditional SQL database, developers usually design data by separating it into tables. For example, a user table may be separate from a professions table and a cars table. These tables are connected through foreign keys. This process is called normalization, and its main goal is to avoid duplicate data.
MongoDB works differently. Instead of splitting everything into separate tables, MongoDB allows related data to be stored together in one document. For example, a user document can include the user’s name, location, professions, and cars all in one place. This makes it easier and faster to read the data when the application needs everything at once.
Designing for the Application
The key message is that MongoDB schema design should be based on how the application uses the data. The developer should ask: “What data does the app need most often?” and “How will users access this data?”
There are three important things to consider: how the data is stored, how fast queries need to be, and how much hardware is needed. A good schema should help the app run faster without requiring unnecessary server power.
Embedding vs Referencing
The researcher explains that MongoDB schema design mainly depends on two choices: embedding and referencing.
Embedding means placing related data inside the same document. For example, a user profile can include addresses, professions, or social links directly inside the user document. This is useful when the app usually needs all that data together. It allows the app to get everything with one query.
Referencing means storing related data in separate documents and connecting them with IDs. This is useful when the related data is large, rarely needed, or may grow too much. For example, an e-commerce product may reference thousands of parts instead of storing all parts inside one product document.
When to Embed
Embedding is usually the better default choice in MongoDB. It is fast because the app can retrieve all needed data in one query. It also allows updates inside one document to be handled safely and efficiently.
For example, if a profile page always needs the user’s name, city, and social links, those fields should probably be embedded in the same document.
When to Reference
Referencing is better when documents become too large or when some data is not always needed. MongoDB documents have a 16 MB size limit, so very large arrays or unlimited growing data should not be embedded.
For example, server logs should not be stored inside one server document forever. Logs can grow endlessly. In that case, each log message should be stored separately and reference the server it belongs to.
Common Relationship Types
For one-to-one relationships, MongoDB can simply use normal key-value fields.
For one-to-few relationships, such as a user with a few addresses, embedding works well.
For one-to-many relationships, such as a product with many parts, referencing may be better.
For one-to-millions relationships, such as servers with endless logs, reverse referencing is usually the right choice.
For many-to-many relationships, such as users and tasks, both documents may reference each other.
Main Rules
The researcher gives several practical rules. Developers should favor embedding unless there is a strong reason not to. They should avoid joins and lookups when possible, but they should not fear references when they make the design better.
Arrays should not grow without limit. Most importantly, the schema must match the application’s real access patterns.
Final Message
The main lesson is simple: MongoDB is flexible, but that flexibility must be used carefully. A good schema is not copied from SQL design. It is built around the real needs of the application.
In the end, the best MongoDB design is the one that helps the app read, write, and scale efficiently.