Writing a library for building aggregations

It often seems that the simplest libraries are among the most useful. Today I played around with Mongo aggregations and wrote a small library to help write reusable, maintainable aggregations.

Understanding Aggregations

What are aggregations?

Aggregations let you run a sequence of operations over the documents in a Mongo collection. The output of each operation is piped into the next, and you keep chaining operations until you reach the result you want. You can read the comprehensive documentation here.

An example

Let's assume a collection of Artist documents like the following:

[{ 
    name: 'Vincent Van Gogh',
    type: 'painter',
    works: [
        { title: 'The Starry Night', forSale: true },
        { title: 'The Potato Eaters', forSale: true }
    ]
}, {
    name: 'Pablo Picasso',
    type: 'painter',
    works: [
        { title: 'The Old Guitarist', forSale: false },
        { title: 'Guernica', forSale: true }
    ]
}]

For this example, let's say that we want a list of artists who are painters and have paintings available for sale, along with a count of how many paintings each of them has for sale.

Using mongo-native, the aggregation would look like this:

artistCollection.aggregate([  
   {$match: {type: 'painter', 'works.forSale': true }},
   {$unwind: '$works'},
   {$match: {'works.forSale':  true }},
   {$group: {_id: '$name', paintingsForSaleCount: { $sum: 1 }}}
], function(err, results) {
    if (err) throw err;
    results.forEach(function(a) {
        console.log("Artist %s has %d paintings for sale", a._id, a.paintingsForSaleCount);
    });
});

Let's go over what this aggregation is doing. Remember, each operation takes the result and pipes it into the next operation.

  1. $match: Find all of the artists who are type painter, and have a work forSale: true in their works.

  2. $unwind: For each work in the works array, create a document that is a copy of the artist, but with the works field holding that single work object instead of the whole array.

  3. $match: Filter those documents down to the ones whose work is for sale. (Note: the first $match found any artist with at least one work for sale; this one narrows each artist's works to just those that are for sale.)

  4. $group: Group the collection of documents on the name field, and for each document in the group, increment the paintingsForSaleCount by 1.
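To make the four steps concrete, here is a rough plain-JavaScript emulation of each stage, run against the sample data from above. It is only an illustration of what the stages do to the documents; it does not talk to Mongo, and the variable names (painters, unwound, forSale, counts) are made up for this sketch.

```javascript
// Sample data copied from the Artist documents above.
var artists = [{
    name: 'Vincent Van Gogh',
    type: 'painter',
    works: [
        { title: 'The Starry Night', forSale: true },
        { title: 'The Potato Eaters', forSale: true }
    ]
}, {
    name: 'Pablo Picasso',
    type: 'painter',
    works: [
        { title: 'The Old Guitarist', forSale: false },
        { title: 'Guernica', forSale: true }
    ]
}];

// 1. $match: keep painters that have at least one work for sale.
var painters = artists.filter(function(a) {
    return a.type === 'painter' && a.works.some(function(w) { return w.forSale; });
});

// 2. $unwind: emit one copy of the artist per work, with works set to that single work.
var unwound = [];
painters.forEach(function(a) {
    a.works.forEach(function(w) {
        unwound.push({ name: a.name, type: a.type, works: w });
    });
});

// 3. $match: keep only the documents whose single work is for sale.
var forSale = unwound.filter(function(d) { return d.works.forSale; });

// 4. $group: group by name and add 1 to the count for every document in the group.
var counts = {};
forSale.forEach(function(d) {
    counts[d.name] = (counts[d.name] || 0) + 1;
});
var results = Object.keys(counts).map(function(name) {
    return { _id: name, paintingsForSaleCount: counts[name] };
});

console.log(results);
```

The order of the groups here simply follows the order of the input; with a real $group stage the order of the output documents is not guaranteed unless you add a $sort.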

Ultimately, the result looks like the following:

[
        {
            "_id" : "Pablo Picasso",
            "paintingsForSaleCount" : 1
        },
        {
            "_id" : "Vincent Van Gogh",
            "paintingsForSaleCount" : 2
        }
]

Writing a library to do it better

There's nothing terribly wrong with how mongo-native handles aggregations. However, I generally prefer working with a single object over a large array filled with objects. In my opinion, a single chainable object makes aggregations easier to read and maintain, and easier to break up into reusable components. This was the motivation for my new library, maggregate.

Ultimately, the API tries to be pretty similar to mongo-native. Each of the operations simply maps to a function with the same name, which optionally takes a callback when you are ready to execute.

The above example, with maggregate

   var aggregation = maggregate(artistCollection);
   aggregation
     .match({type: 'painter', 'works.forSale': true})
     .unwind('$works')
     .match({'works.forSale': true})
     .group({_id: '$name', paintingsForSaleCount: { $sum: 1}})
     .exec(function(err, results) {
         results.forEach(function(a) {
             console.log("Artist %s has %d paintings for sale", a._id, a.paintingsForSaleCount);
         });
     });

Behind the scenes, maggregate simply maintains the array of operations and, when exec is called, passes that array to mongo-native. Very simple, but very useful.
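The idea of keeping an array of operations and handing it over on exec can be sketched in a few lines. This is not maggregate's actual source, just a minimal illustration of the pattern; the Pipeline constructor, the list of stage names, and the fakeCollection stand-in are all invented for the example, and it assumes the collection exposes aggregate(pipeline, callback) as in the mongo-native example earlier.

```javascript
// Hypothetical sketch of a chainable pipeline builder (not maggregate's real code).
function Pipeline(collection) {
    this.collection = collection;
    this.stages = [];
}

// One method per operation: each wraps its argument as { $name: arg } and stores it.
['match', 'unwind', 'group', 'sort', 'limit'].forEach(function(name) {
    Pipeline.prototype[name] = function(arg) {
        var stage = {};
        stage['$' + name] = arg;
        this.stages.push(stage);
        return this; // returning the builder is what makes chaining possible
    };
});

// exec hands the accumulated array to the underlying collection.
Pipeline.prototype.exec = function(callback) {
    this.collection.aggregate(this.stages, callback);
};

// Stand-in collection for illustration: it just echoes the pipeline it receives.
var fakeCollection = {
    aggregate: function(pipeline, callback) { callback(null, pipeline); }
};

var captured;
new Pipeline(fakeCollection)
    .match({ type: 'painter' })
    .unwind('$works')
    .exec(function(err, pipeline) { captured = pipeline; });

console.log(JSON.stringify(captured));
```

Because every method returns the builder itself, a half-built pipeline is just a value you can return from a function and keep extending later.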

Reusability

Structuring aggregations in a chainable way makes it easy to break them into reusable parts. For example, let's say that we wanted to write another report that gives us the actual works that are for sale, instead of just the count.

// Shared starting point: one document per work that is for sale, for every painter.
function artistWorksForSale() {
   var aggregation = maggregate(artistCollection);
   aggregation
     .match({type: 'painter', 'works.forSale': true})
     .unwind('$works')
     .match({'works.forSale': true});
   return aggregation;
}

// First Report
artistWorksForSale()
    .group({_id: '$name', paintingsForSaleCount: { $sum: 1}})
    .exec(function(err, results) {
        results.forEach(function(a) {
            console.log("Artist %s has %d paintings for sale", a._id, a.paintingsForSaleCount);
        });
    });

// Second Report
artistWorksForSale()
    .group({_id: '$name', paintingsForSale: { $addToSet: '$works'}})
    .exec(function(err, results) {
        results.forEach(function(a) {
            console.log("Artist %s has the following paintings for sale:", a._id);
            a.paintingsForSale.forEach(function(painting) {
                console.log("\t%s", painting.title);
            });
        });
    });

Adding Extra Features

Aggregations often happen in the context of models or objects, so you usually want to pipe the results into JavaScript constructor functions that provide additional functionality. For this, maggregate provides .wrap. My preferred library for models is modella, but maggregate will work with any constructor function. Going off the same example as above:

   var ArtistSalesReport = modella('ArtistSalesReport').attr('paintingsForSale');

   ArtistSalesReport.prototype.numberOfPaintingsForSale = function() {
       return this.paintingsForSale().length;
   };

   ArtistSalesReport.prototype.artistName = function() {
       return this.primary();
   };

   artistWorksForSale()
    .group({_id: '$name', paintingsForSale: { $addToSet: '$works'}})
    .wrap(ArtistSalesReport)
    .exec(function(err, reports) {
        reports.forEach(function(rep) {
            console.log("%s has %d paintings for sale, they are:", rep.artistName(), rep.numberOfPaintingsForSale());
            rep.paintingsForSale().forEach(function(p) {
                console.log(p.title);
            })
        });
    });
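Since .wrap is said to work with any constructor function, here is a rough sketch of what such a step could look like with a plain constructor instead of modella. It assumes the constructor is called once per result document with that document as its only argument; that is a guess at the behaviour, not something taken from maggregate's source, and the wrapResults helper and PlainArtistReport constructor are hypothetical names for this example.

```javascript
// Hypothetical helper: turn each raw result document into an instance of Ctor.
// Assumes Ctor accepts the raw document as its single argument.
function wrapResults(Ctor, results) {
    return results.map(function(doc) {
        return new Ctor(doc);
    });
}

// A plain constructor standing in for a modella model.
function PlainArtistReport(doc) {
    this.name = doc._id;
    this.paintings = doc.paintingsForSale;
}

PlainArtistReport.prototype.numberOfPaintingsForSale = function() {
    return this.paintings.length;
};

// Documents shaped like the output of the second report's $group stage.
var reports = wrapResults(PlainArtistReport, [
    { _id: 'Pablo Picasso', paintingsForSale: [{ title: 'Guernica', forSale: true }] }
]);

reports.forEach(function(rep) {
    console.log("%s has %d paintings for sale", rep.name, rep.numberOfPaintingsForSale());
});
```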

Wrap Up

In conclusion, simple utility libraries can expose a lot of power with very little abstraction, in this case by making aggregations more readable, maintainable, and reusable. The ecosystem of these tiny utility libraries is one of my favorite features of the Node.js community.