$bucket

The $bucket operator is a MongoDB aggregation pipeline stage that sorts incoming documents into groups, called buckets, based on a specified expression and a set of bucket boundaries. It is particularly useful for dividing a collection into ranges and computing aggregate values for each range.

Syntax

Here's the basic syntax of the $bucket operator:

{
  $bucket: {
    groupBy: <expression>,
    boundaries: [<lowerbound1>, <lowerbound2>, ...],
    default: <default_value>,
    output: {
      <output_field1>: { <accumulator1>: <expression1> },
      ...
    }
  }
}

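  • groupBy: The expression by which to group documents. Unless default is specified, the expression must resolve to a value that falls within the boundaries for every document.
  • boundaries: An array of values, in ascending order, that mark the lower bound of each bucket. Each bucket includes its lower bound and excludes its upper bound, which is the next value in the array.
  • default: Optional. The _id of an additional bucket that collects every document whose groupBy value falls outside the boundaries.
  • output: Optional. The fields to include in the output documents, along with their corresponding accumulator expressions. If omitted, each bucket contains only a count field.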
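
The way a value is matched against the boundaries can be sketched in plain JavaScript. This is only an illustration for numeric values, not how MongoDB implements the stage; assignBucket is a hypothetical helper introduced here and is not part of any MongoDB API.

```javascript
// Illustrative only: mimics how $bucket picks a bucket for a numeric value.
// assignBucket is a hypothetical helper, not a MongoDB function.
function assignBucket(value, boundaries, defaultId) {
  // Each bucket covers the half-open range [boundaries[i], boundaries[i + 1]).
  for (let i = 0; i < boundaries.length - 1; i++) {
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i]; // the bucket's _id is its lower bound
    }
  }
  // No bucket matched: fall back to the default bucket, or fail without one.
  if (defaultId === undefined) {
    throw new Error(`value ${value} is outside the boundaries and no default was given`);
  }
  return defaultId;
}

const boundaries = [0, 200, 400, 600];
console.log(assignBucket(100, boundaries, "Other")); // 0
console.log(assignBucket(200, boundaries, "Other")); // 200 (lower bound is inclusive)
console.log(assignBucket(599, boundaries, "Other")); // 400
console.log(assignBucket(600, boundaries, "Other")); // Other (upper bound is exclusive)
```

Note that a value equal to a boundary always belongs to the bucket that starts at that boundary, never to the one that ends there.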

Example

Consider a sales collection with the following documents:

[
  { "_id": 1, "amount": 100 },
  { "_id": 2, "amount": 200 },
  { "_id": 3, "amount": 300 },
  { "_id": 4, "amount": 400 },
  { "_id": 5, "amount": 500 }
]

You can use the $bucket operator to categorize these sales into different ranges:

db.sales.aggregate([
  {
    $bucket: {
      groupBy: "$amount",
      boundaries: [0, 200, 400, 600],
      default: "Other",
      output: {
        count: { $sum: 1 },
        average_amount: { $avg: "$amount" }
      }
    }
  }
])

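Each bucket's _id is its lower bound. Because a bucket includes its lower bound and excludes its upper bound, the document with amount 200 falls into the bucket whose _id is 200, and the document with amount 400 falls into the bucket whose _id is 400. This will produce:

[
  { "_id": 0, "count": 1, "average_amount": 100 },
  { "_id": 200, "count": 2, "average_amount": 250 },
  { "_id": 400, "count": 2, "average_amount": 450 }
]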

Considerations

  • The boundaries array must contain at least two values, sorted in ascending order with no duplicates. All values must be of the same type, except that numeric types may be mixed.

  • The groupBy expression can include field paths, literals, and other expressions.

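  • The default field is optional. If it is omitted, every document's groupBy value must fall within the boundaries; otherwise the operation fails with an error. The default value must be less than the lowest boundary, greater than or equal to the highest boundary, or of a different type than the boundaries.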

  • The output field allows you to apply various accumulator expressions like $sum, $avg, $min, $max, etc., to the documents in each bucket. When output is specified, the count field is returned only if you define it explicitly.
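
To illustrate the default and output considerations, here is a minimal sketch that reuses the sales collection from the example. The narrower boundaries and the max_amount field name are assumptions made for illustration; the db handle is only available in a MongoDB shell such as mongosh, so the query is guarded.

```javascript
// Assumes the sales collection from the example above.
// The narrower boundaries are chosen only to show the default bucket in action.
const pipeline = [
  {
    $bucket: {
      groupBy: "$amount",
      boundaries: [0, 200, 400], // two buckets: [0, 200) and [200, 400)
      default: "Other",          // a string, so it differs in type from the numeric boundaries
      output: {
        count: { $sum: 1 },      // listed explicitly; not added automatically when output is set
        max_amount: { $max: "$amount" } // hypothetical field name
      }
    }
  }
];

// Run the query only where a db handle exists, for example in mongosh.
if (typeof db !== "undefined") {
  printjson(db.sales.aggregate(pipeline).toArray());
}
```

With the five sample documents, the amounts 400 and 500 fall outside the boundaries, so they should land in the "Other" bucket with a count of 2, while the buckets with _id 0 and 200 should hold one and two documents respectively. Removing the default line from this pipeline would make the aggregation fail for the same data.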