Aggregation Pipeline

New in version 2.2. The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms them into aggregated results. The aggregation pipeline provides an alternative to map-reduce and may be the preferred solution for many aggregation tasks where the complexity of map-reduce is unwarranted.

Figure: Diagram of the annotated aggregation pipeline operation. The aggregation pipeline has two phases: $match and $group.

The aggregation pipeline has some limitations on value types and result size.

Pipeline – Conceptually, documents from a collection travel through an aggregation pipeline, which transforms these objects as they pass through. For those familiar with UNIX-like shells (e.g. bash), the concept is analogous to the pipe (i.e. |).

The MongoDB aggregation pipeline starts with the documents of a collection and streams the documents from one pipeline operator to the next to process the documents. Each operator in the pipeline transforms the documents as they pass through the pipeline. Pipeline operators do not need to produce one output document for every input document. Operators may generate new documents or filter out documents. Pipeline operators can be repeated in the pipeline.
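The streaming behaviour described above can be sketched in plain JavaScript. This is only an in-memory illustration under stated assumptions, not how MongoDB implements the pipeline: the helpers `match`, `unwind`, and `runPipeline` and the sample `orders` documents are hypothetical names invented for this example. It shows one stage filtering documents out and another generating more documents than it receives.

```javascript
// In-memory sketch of a pipeline; NOT MongoDB's implementation.
// `match`, `unwind`, and `runPipeline` are hypothetical helper names.

// A stage is a function from an array of documents to an array of documents.
const match = (predicate) => (docs) => docs.filter(predicate);

// Emits one output document per element of the array field,
// so it can produce more documents than it receives.
const unwind = (field) => (docs) =>
  docs.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );

// Feed the output of each stage into the next, like a shell pipe.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical sample data.
const orders = [
  { cust_id: "A123", status: "A", items: ["pen", "ink"] },
  { cust_id: "B212", status: "D", items: ["cup"] },
  { cust_id: "A123", status: "A", items: ["pad"] },
];

const result = runPipeline(orders, [
  match((doc) => doc.status === "A"), // filters out one document
  unwind("items"),                    // generates new documents
]);

console.log(result.length); // 3
```

Three input documents become two after the filter and three after the unwind, so the number of output documents need not match the number of input documents.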

Changed in version 2.6: The db.collection.aggregate() method returns a cursor and can return result sets of any size. Previous versions returned all results in a single document, and the result set was subject to a size limit of 16 megabytes.

For example usage of the aggregation pipeline, consider Aggregation with User Preference Data and Aggregation with the Zip Code Data Set, as well as the aggregate command and the db.collection.aggregate() method reference pages.

Pipeline Expressions – Each pipeline operator takes a pipeline expression as its operand. Pipeline expressions specify the transformation to apply to the input documents. Expressions have a document structure and can contain fields, values, and operators. Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data from other documents: expression operations provide in-memory transformation of documents.

Generally, expressions are stateless and are only evaluated when seen by the aggregation process, with one exception: accumulator expressions. Accumulator expressions, used with the $group pipeline operator, maintain their state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline.
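The difference between a stateless expression and an accumulator can be illustrated with a small in-memory sketch. It assumes hypothetical helpers `lineTotal` and `groupSum` and made-up sample documents; it mimics the spirit of $group with $sum and is not the server's implementation.

```javascript
// Sketch only: `lineTotal` and `groupSum` are hypothetical helpers,
// not part of any MongoDB API.

// A stateless expression: the result depends on the current document alone.
const lineTotal = (doc) => doc.price * doc.quantity;

// An accumulator in the spirit of $group with $sum: it keeps a running
// total per key while documents pass through one at a time.
function groupSum(docs, keyField, valueOf) {
  const totals = new Map(); // state that survives across documents
  for (const doc of docs) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) || 0) + valueOf(doc));
  }
  return Array.from(totals, ([_id, total]) => ({ _id, total }));
}

// Hypothetical sample data.
const orders = [
  { cust_id: "A123", price: 5, quantity: 2 },
  { cust_id: "B212", price: 3, quantity: 1 },
  { cust_id: "A123", price: 4, quantity: 3 },
];

const totals = groupSum(orders, "cust_id", lineTotal);
console.log(totals);
// [ { _id: 'A123', total: 22 }, { _id: 'B212', total: 3 } ]
```

Here `lineTotal` can be evaluated on any document in isolation, whereas the `totals` map has to persist across documents, which is what makes the accumulator stateful.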

Aggregation Pipeline Behavior – In MongoDB, the aggregate command operates on a single collection, logically passing the entire collection into the aggregation pipeline. To optimize the operation, wherever possible, use the following strategies to avoid scanning the entire collection.

Pipeline Operators and Indexes – The $match, $sort, $limit, and $skip pipeline operators can take advantage of an index when they occur at the beginning of the pipeline, before any of the following aggregation operators: $project, $unwind, and $group.

New in version 2.4: the $geoNear pipeline operator takes advantage of a geospatial index. When you use $geoNear, it must appear as the first stage in the aggregation pipeline.

For unsharded collections, when the aggregation pipeline only needs to access the indexed fields to fulfill its operations, an index can cover the pipeline. As an example, consider the following index on the orders collection:

{ status: 1, amount: 1, cust_id: 1 }

This index can cover the following aggregation pipeline operation because MongoDB does not need to inspect data outside of the index to fulfill the operation:

db.orders.aggregate([
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
])

Early Filtering – If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline, $match operations use suitable indexes to scan only the matching documents in a collection. Placing a $match pipeline stage followed by a $sort stage at the start of the pipeline is logically equivalent to a single query with a sort and can use an index. When possible, place $match operators at the beginning of the pipeline.
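To see why stage order matters, the following in-memory sketch counts how many documents enter each stage when the filter runs first versus last. The helper `countingPipeline`, the stage functions, and the sample data are hypothetical and exist only for this illustration; the sketch does not model indexes or the server's optimizer.

```javascript
// Sketch only: `countingPipeline` is a hypothetical helper that records
// how many documents enter each stage. It does not model indexes.
function countingPipeline(docs, stages) {
  const seen = [];
  let current = docs;
  for (const [name, stage] of stages) {
    seen.push({ stage: name, input: current.length });
    current = stage(current);
  }
  return { output: current, seen };
}

// Stand-ins for a $match on status and a descending $sort on amount.
const isActive = (docs) => docs.filter((d) => d.status === "A");
const byAmountDesc = (docs) => [...docs].sort((a, b) => b.amount - a.amount);

// Hypothetical sample data.
const orders = [
  { status: "A", amount: 50 },
  { status: "D", amount: 90 },
  { status: "A", amount: 75 },
  { status: "D", amount: 10 },
];

const early = countingPipeline(orders, [["match", isActive], ["sort", byAmountDesc]]);
const late = countingPipeline(orders, [["sort", byAmountDesc], ["match", isActive]]);

console.log(early.seen); // the sort stage receives 2 documents
console.log(late.seen);  // the sort stage receives all 4 documents
```

Both orderings produce the same output, but filtering first means the sort handles only the matching documents.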

Additional Features – The aggregation pipeline has an internal optimization phase that provides improved performance for certain sequences of operators.
