Aggregation en ;MongoDB

Introducción a Agregaciones en MongoDB

Fuente:

https://docs.mongodb.com/manual/aggregation/

Las operaciones de agregación procesan los registros de datos y devuelven los resultados calculados. Las operaciones de agregación agrupan los valores de varios documentos juntos y pueden realizar una variedad de operaciones en los datos agrupados para devolver un solo resultado. MongoDB proporciona tres formas de realizar la agregación: el canal de agregación, la función de reducción de mapas y los métodos de agregación de propósito único.

Aggregation Pipeline

Map-Reduce

Single Purpose Aggregation Operations

SQL vs Aggregation

Fuente MongoDB ejemplos

https://docs.mongodb.com/manual/reference/sql-aggregation-comparison/

SQL Terms, Functions, and Concepts

MongoDB Aggregation Operators

WHERE

GROUP BY

HAVING

SELECT

ORDER BY

LIMIT

SUM()

COUNT()

join

Ejemplos:

SQL

Description

SELECT COUNT(*) AS count FROM orders

Count all records from orders

db.orders.aggregate( [
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )

SELECT SUM(price) AS total FROM orders

Sum the price field from orders

db.orders.aggregate( [
   {
     $group: {
        _id: null,
        total: { $sum: "$price" }
     }
   }
] )

SELECT cust_id, SUM(price) AS total FROM orders GROUP BY cust_id

For each unique cust_id, sum the price field.

db.orders.aggregate( [
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   }
] )

SELECT cust_id, SUM(price) AS total FROM orders GROUP BY cust_id ORDER BY total

For each unique cust_id, sum the price field, results sorted by sum.

db.orders.aggregate( [
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   },
   { $sort: { total: 1 } }
] )

SELECT cust_id, ord_date, SUM(price) AS total FROM orders GROUP BY cust_id, ord_date

For each unique cust_id, ord_date grouping, sum the price field. Excludes the time portion of the date.

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        },
        total: { $sum: "$price" }
     }
   }
] )

SELECT cust_id, count(*) FROM orders GROUP BY cust_id HAVING count(*) > 1

For cust_id with multiple records, return the cust_id and the corresponding record count.

db.orders.aggregate( [
   {
     $group: {
        _id: "$cust_id",
        count: { $sum: 1 }
     }
   },
   { $match: { count: { $gt: 1 } } }
] )

SELECT cust_id, ord_date,SUM(price) AS total FROM orders GROUP BY cust_id, ord_date HAVING total > 250

For each unique cust_id, ord_date grouping, sum the price field and return only where the sum is greater than 250. Excludes the time portion of the date.

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
              month: { $month: "$ord_date" },
              day: { $dayOfMonth: "$ord_date" },
              year: { $year: "$ord_date"}
           }
        },
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )

SELECT cust_id, SUM(price) as total FROM ordersWHERE status = 'A' GROUP BY cust_id

For each unique cust_id with status A, sum the price field.

db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   }
] )

SELECT cust_id, SUM(price) as total FROM orders WHERE status = 'A' GROUP BY cust_id HAVING total > 250

For each unique cust_id with status A, sum the price field and return only where the sum is greater than 250

db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )

SELECT cust_id, SUM(li.qty) as qty FROM orders o, order_lineitem li WHERE li.order_id = o.id GROUP BY cust_id

For each unique cust_id, sum the corresponding line item qty fields associated with the orders.

db.orders.aggregate( [
   { $unwind: "$items" },
   {
     $group: {
        _id: "$cust_id",
        qty: { $sum: "$items.qty" }
     }
   }
] )

SELECT COUNT(*) FROM (SELECT cust_id, ord_date FROM orders GROUP BY cust_id, ord_date) as DerivedTable

Count the number of distinct cust_id, ord_date groupings. Excludes the time portion of the date.

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
              month: { $month: "$ord_date" },
              day: { $dayOfMonth: "$ord_date" },
              year: { $year: "$ord_date"}
           }
        }
     }
   },
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )

SQL Example

Fuente:

MongoDB Aggregation Example

https://examples.javacodegeeks.com/software-development/mongodb/mongodb-aggregation-example/

$project

db.employee.aggregate([
 {"$project":{ "_id" : 0, "emp_fname" : 1, "emp_dept" : 1, "emp_band" : 1, "emp_specs" : 1 }}
  ])

$match

db.employee.aggregate([ {"$match": { "emp_dept": "Finance" }} ])

$limit

db.employee.aggregate([ {"$match": { "emp_dept": "Technology" }}, {"$limit": 1} ])

$group

db.employee.aggregate([ {"$group": {"_id": {"emp_dept" : "$emp_dept"}, "No_of_Times": {"$sum": 1}}} ])

$sort

db.employee.aggregate([ {"$match": { "emp_dept": "HR" }}, {"$sort" : {"emp_fname": 1}} ])

$lookup

db.emp_city.aggregate([ {$lookup: {from: "employee", localField: "emp_country_id", foreignField: "_id", as: "Country" }} ])

$out

db.employee.aggregate([ {"$match": { "emp_dept": "HR" }}, {"$sort" : {"emp_fname": 1}}, {"$out" : "sorted_emp"} ])

$unwind

db.employee.aggregate([ {"$match": { "emp_fname": "April" }}, {"$unwind": "$emp_specs"} ])

Expresiones usadas en la funciòn Aggregation

Expression

Description

$sum

Suma

$avg

Promedio

$min

Retorna el valor mínimo

$max

Retorna el valor máximo.

AGGREGATION BUILDER

Fuente:

http://mongodb.github.io/mongo-java-driver/3.2/builders/aggregation/

  • Se pueden mezclar con Document().

Match

The$matchpipeline stage passes all documents matching the specified filter to the next stage. Though the filter can be an instance of any class that implementsBson, it’s convenient to combine with use of theFiltersclass. In the example below, it’s assumed that theeqmethod of theFiltersclass has been statically imported.

This example creates a pipeline stage that matches all documents where theauthorfield is equal to"Dave":

match(eq(
"author"
, 
"Dave"
))

Project

The$projectpipeline stage passes the projected fields of all documents to the next stage. Though the projection can be an instance of any class that implementsBson, it’s convenient to combine with use of theProjectionsclass. In the example below, it’s assumed that theinclude,excludeId, andfieldsmethods of theProjectionsclass have been statically imported.

This example creates a pipeline stage that excludes the_idfield but includes thetitleandauthorfields:

project(fields(include(
"title"
, 
"author"
), excludeId()))

Projecting Computed Fields

The$projectstage can project computed fields as well.

This example simply projects theqtyfield into a new field calledquantity. In other words, it renames the field:

project(computed(
"quantity"
, 
"$qty"
))

Sample

The$samplepipeline stage randomly select N documents from its input. This example creates a pipeline stage that randomly selects 5 documents from the collection:

sample(
5
)

Sort

The$sortpipeline stage passes all documents to the next stage, sorted by the specified sort criteria. Though the sort criteria can be an instance of any class that implementsBson, it’s convenient to combine with use of theSortsclass. In the example below, it’s assumed that thedescending,ascending, andorderBymethods of theSortsclass have been statically imported.

This example creates a pipeline stage that sorts in descending order according to the value of theagefield and then in ascending order according to the value of thepostsfield:

sort(orderBy(descending(
"age"
), ascending(
"posts"
)))

Skip

The$skippipeline stage skips over the specified number of documents that pass into the stage and passes the remaining documents to the next stage.

This example skips the first5documents:

skip(
5
)

Limit

The$limitpipeline stage limits the number of documents passed to the next stage.

This example limits the number of documents to10:

limit(
10
)

Lookup

Starting in 3.2, MongoDB provides a new$lookuppipeline stage that performs a left outer join with another collection to filter in documents from the joined collection for processing.

This example performs a left outer join on thefromCollectioncollection, joining thelocalfield to thefromfield and outputted in thejoinedOutputfield:

lookup(
"fromCollection"
, 
"local"
, 
"from"
, 
"joinedOutput"
)

Group

The$grouppipeline stage groups documents by some specified expression and outputs to the next stage a document for each distinct grouping. A group consists of an_idwhich specifies the expression on which to group, and zero or moreaccumulatorswhich are evaluated for each grouping. To simplify the expression of accumulators, the driver includes anAccumulatorsclass with static factory methods for each of the supported accumulators. In the example below, it’s assumed that thesumandavgmethods of theAccumulatorsclass have been statically imported.

This example groups documents by the value of thecustomerIdfield, and for each group accumulates the sum and average of the values of thequantityfield into thetotalQuantityandaverageQuantityfields, respectively.

group(
"$customerId"
, sum(
"totalQuantity"
, 
"$quantity"
), avg(
"averageQuantity"
, 
"$quantity"
))

Unwind

The$unwindpipeline stage deconstructs an array field from the input documents to output a document for each element.

This example outputs, for each document, a document for each element in thesizesarray:

unwind(
"$sizes"
)

Available with MongoDB 3.2, this example also includes any documents that have missing ornullvalues for the$sizesfield or where the$sizeslist is empty:

unwind(
"$sizes"
, 
new
 UnwindOptions().preserveNullAndEmptyArrays(
true
))

Available with MongoDB 3.2, this example unwinds thesizesarray and also outputs the array index into the$positionfield:

unwind(
"$sizes"
, 
new
 UnwindOptions().includeArrayIndex(
"$position"
))

Out

The$outpipeline stage outputs all documents to the specified collection. It must be the last stage in any aggregate pipeline:

This example writes the pipeline to theauthorscollection:

out(
"authors"
)

Creating a Pipeline

The above pipeline operators are typically combined into a list and passed to theaggregatemethod of aMongoCollection. For instance:

collection.aggregate(Arrays.asList(match(eq(
"author"
, 
"Dave"
)),
                                   group(
"$customerId"
, sum(
"totalQuantity"
, 
"$quantity"
),
                                                        avg(
"averageQuantity"
, 
"$quantity"
))
                                   out(
"authors"
)));

Last updated