
MongoDB’s aggregation framework provides many stages to query, process, and transform data. While you can combine stages as you like, there are several well-established pipelines you can use as blueprints for common tasks.
In this video, we’ll walk you through one common aggregation pipeline sequence using the match, group, and project stages.
Match is used to filter for specific documents. The group stage groups documents by a specified value. Finally, the project stage is used so the pipeline returns only the fields we need.
Before we get into the pipeline, let’s briefly introduce you to the dataset.
We’ll be working with data from an online bookstore app in a sales collection. We want to generate a report showing the total revenue generated from book sales for each genre of book during twenty twenty-five.
We can use this information to make informed decisions about inventory and marketing. We’ll use MongoDB’s aggregation framework to accomplish this task.
Let’s get started.
First, let’s look at an example document from the sales collection. As you can see, each document records all of the books sold to a specific customer on a particular date.
The customer is recorded as an object ID, which references our customer collection.
Note that the books field is an array where each book in the array contains the genre and price.
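To make the shape of the data concrete, here is a rough sketch of what one sales document might look like, written as a Python dictionary. The field names (customer, date, books, title, genre, price) and all of the values are assumptions for illustration; the real collection may name or type them differently.

```python
from datetime import datetime

# Hypothetical sales document. Every field name and value here is an
# assumption made for illustration, not taken from the real collection.
sample_sale = {
    "_id": "sale-0001",  # placeholder; a real _id is usually an ObjectId
    # Stand-in for an ObjectId that references the customer collection.
    "customer": "64f1c0ffee0000000000abcd",
    "date": datetime(2025, 3, 14),
    # One embedded document per book sold in this sale.
    "books": [
        {"title": "Example Novel", "genre": "Fiction", "price": 12.99},
        {"title": "Example Guide", "genre": "Cooking", "price": 24.50},
    ],
}

# The genre and price live inside the array, not at the top level.
genres = [book["genre"] for book in sample_sale["books"]]
```

Notice that genre is not a top-level field, which is why grouping by it takes an extra step.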
We need to group all books sold by genre. To do this, we’ll use the match and group stages.
Since the genres are stored in an array, we also need an unwind stage after the match stage in our pipeline.
The unwind stage deconstructs an array field from the input documents. It outputs a document for each element of the array, effectively flattening the array.
By unwinding the books array, each book sold in every sales document will be represented by its own document in the output of this stage. This is crucial for setting up the next stage where we group them by genre.
Since we want our report to aggregate data from the year twenty twenty-five, the first thing we’ll do is ensure that we’re only working with sales documents from that year.
To accomplish this, we’ll write a match stage using the greater than or equal to and less than operators to specify that the date should be on or after January first twenty twenty-five and before January first twenty twenty-six.
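As a minimal sketch, the match stage could be written like this as a Python dictionary, the form a driver such as PyMongo accepts. It assumes the field is named date and stores real date values; the in_range helper is hypothetical and only mirrors the filter in plain Python so we can check the boundaries.

```python
from datetime import datetime

# $match stage: keep sales dated on or after 2025-01-01 and
# strictly before 2026-01-01. The field name "date" is an assumption.
match_stage = {
    "$match": {
        "date": {
            "$gte": datetime(2025, 1, 1),
            "$lt": datetime(2026, 1, 1),
        }
    }
}


def in_range(value):
    """Hypothetical helper: apply the same bounds in plain Python."""
    bounds = match_stage["$match"]["date"]
    return bounds["$gte"] <= value < bounds["$lt"]
```

Using less than with the first day of the next year, rather than less than or equal to with the last day of this year, means a sale recorded late on December thirty-first is still included.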
As discussed earlier, it’s usually best to place a match stage at the beginning of a pipeline. Not only does this reduce the size of the dataset being passed to the next stage, but it also allows us to use existing indexes to improve performance.
If we only ran the match stage, the pipeline would return all documents with a date field in the year twenty twenty-five.
If that were our only goal, a simple find operation would be enough. But since we want to calculate and transform data, we need additional stages.
Now we have only the sales from twenty twenty-five.
Next, to track each book sold, we’ll unwind the books array so we can group the books by genre.
After using unwind, we now have one document for each book sold in twenty twenty-five.
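Here is one way the unwind stage might look, assuming the array field is named books. The unwind function below is not part of any library. It is a simplified plain-Python illustration of what the stage does to a list of documents, and it only handles a top-level array field.

```python
# $unwind stage: emit one output document per element of "books".
unwind_stage = {"$unwind": "$books"}


def unwind(docs, field):
    """Simplified illustration of $unwind for a top-level array field.

    Documents whose array is missing or empty produce no output,
    which matches the stage's default behavior.
    """
    output = []
    for doc in docs:
        for element in doc.get(field) or []:
            flattened = dict(doc)       # copy the other fields unchanged
            flattened[field] = element  # replace the array with one element
            output.append(flattened)
    return output
```

After this step, the books field in each document holds a single book rather than an array, so the genre and price can be read directly.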
The next step is grouping these documents by genre and calculating how much revenue they generated.
We do this by adding a group stage and specifying the group key as the genre of each book.
Because the genre field is inside each embedded book document, we reference it with dot notation, prefixed with a dollar sign so that MongoDB treats it as a field path:
$books.genre
We also use the sum operator to add up the price of every book in each group and store the result in a field named total revenue.
This field shows how much revenue each genre generated.
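A sketch of the group stage follows. It assumes the output field is spelled totalRevenue and that each book has genre and price fields. The group_by_genre function is a hypothetical plain-Python equivalent, included only to show what the stage computes on documents that have already been unwound.

```python
from collections import defaultdict

# $group stage: one output document per distinct genre, with the
# prices summed. The name "totalRevenue" is an assumed spelling.
group_stage = {
    "$group": {
        "_id": "$books.genre",
        "totalRevenue": {"$sum": "$books.price"},
    }
}


def group_by_genre(unwound_docs):
    """Hypothetical equivalent: sum book prices per genre."""
    totals = defaultdict(float)
    for doc in unwound_docs:
        book = doc["books"]  # a single book after unwinding
        totals[book["genre"]] += book["price"]
    return [{"_id": genre, "totalRevenue": total}
            for genre, total in totals.items()]
```

The group key always ends up in the underscore ID field of the output, which is why the genre name appears there.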
Now we’ve transformed our data from sales by customer into revenue by genre.
We now have one document for each genre showing the total revenue.
To make the output cleaner, we use the project stage.
Project allows us to exclude, include, rename, or create new fields.
Here, we rename the group ID field to genre so it is easier for stakeholders to understand.
To do this, we remove the group ID field by setting it to zero, create a new field called genre, and assign it the value of the old group ID.
We also keep the total revenue field by setting it to one.
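The project stage described here might be sketched as follows, again assuming the revenue field is spelled totalRevenue. The project function is a hypothetical plain-Python version of the same reshaping.

```python
# $project stage: drop _id, expose its value as "genre", and keep
# the revenue field. "totalRevenue" is an assumed spelling.
project_stage = {
    "$project": {
        "_id": 0,           # 0 excludes the field
        "genre": "$_id",    # new field taking the old _id value
        "totalRevenue": 1,  # 1 includes the field
    }
}


def project(grouped_docs):
    """Hypothetical equivalent: rename _id to genre."""
    return [{"genre": doc["_id"], "totalRevenue": doc["totalRevenue"]}
            for doc in grouped_docs]
```

The value of the underscore ID field can still be read in this stage even though the field itself is excluded from the output.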
After running the aggregation pipeline, we can clearly see the total revenue generated by each book genre.
To recap, this example used the pipeline sequence:
Match → Unwind → Group → Project
This pattern is one of the most common and powerful ways to filter, transform, and analyze data in MongoDB.
The unwind stage was essential here because the field we grouped by was stored inside an array.
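Putting the four stages together, the whole pipeline might look like the sketch below. The field names date, books, genre, price, and totalRevenue are assumptions, and run_report is a hypothetical helper that works with any object offering an aggregate method, such as a PyMongo collection.

```python
from datetime import datetime

# Match -> Unwind -> Group -> Project, in that order.
pipeline = [
    {"$match": {"date": {"$gte": datetime(2025, 1, 1),
                         "$lt": datetime(2026, 1, 1)}}},
    {"$unwind": "$books"},
    {"$group": {"_id": "$books.genre",
                "totalRevenue": {"$sum": "$books.price"}}},
    {"$project": {"_id": 0, "genre": "$_id", "totalRevenue": 1}},
]


def run_report(collection):
    """Hypothetical helper: run the pipeline and collect the results."""
    return list(collection.aggregate(pipeline))
```

With PyMongo, this could be called as run_report(db.sales), where db is a database handle and sales is the assumed collection name.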