The Missing $some Operator in MongoDB

There is a void between $in and $all where a new array query operator would come in handy

In MongoDB, querying documents where the value of a field is an array can be straight forward. Using the $in comparison operator on an array field will give you documents where the field’s array holds at least one of your provided values.

You also have the array query operator $all that works the same way, but where the array needs to hold all of your provided values.

These two operators cover a lot of ground, but when you want a query for documents where the array holds at least four values, or exactly eight, then MongoDB is missing a useful operator.

Let’s take a simple example. You have a collection of customers, and each customer document has a purchasedProducts field, which is an array of ObjectIds (references to all products the customer has purchased).

Now, for a marketing campaign, you want to find all your customers who have purchased at least three out of a set of ten selected products, to send a message to these customers.

In this scenario, the $in operator won’t do because it would give you all customers who have purchased at least one of the products. The $all operator won’t work either, because it would only give you the customers who have purchased all ten products in the set.

In this scenario, a $some operator would come in handy.

The $some operator could take an array with two items; the first item being an array of values to match against and the second item being an object declaring the condition for a match.

/*
 * Example of a $some operator where the document would match if purchasedDocuments
 * contain at least two of the three provided ObjectIds
 */
db.collection.find({
  purchasedProducts: {
    $some: [[ObjectId('ID1'), ObjectId('ID2'), ObjectId('ID3')], { $gte: 2 }]
  }
})

While the $some operator doesn’t exist, it is still possible to achieve the result. Using an aggregation pipeline, we get access to the tools we need to create the equivalent of a $some operator.

In theory, what the $some operator would do is that it would create an intersection between an array field on the document and the provided array of values to match against.

The size (array length) of this intersection would then be compared against the provided condition, and if the size is at least two (in the example above), then the document is a match.

To transform this into an aggregation pipeline, we will need four stages:

Narrow down the number of documents to those that match at least one of the provided values.
Create an intersection between the array field on the document and the provided array of values. Get the size of the intersection and compare it with your match condition, and save the result as a temporary field on the document.
Based on the temporary field, filter out any document that didn’t match.
Clean up the documents by removing the temporary field from each document

/*
 * Workaround using aggregation pipeline stages to solve what a
 * $some operator could have done.
 *
 * Inspired by this SO answer: https://stackoverflow.com/a/28161130/678801
 */

const purchasedProducts = [[ObjectId('ID1'), ObjectId('ID2'), ObjectId('ID3')];

db.collection.aggregate([
  // Only include those that match at least one of the values
  // to narrow down the result 
  {
    $match: {
      purchasedProducts: { $in: purchasedProducts },
    },
  },

  // Add a temporary field that holds a boolean value of whether the
  // document matches the conditions
  {
    $addFields: {
      matched: {
        $gte: [
          {
            $size: {
              $setIntersection: ['$purchasedProducts', purchasedProducts]
            }
          },
          2
        ]
      }
    }
  },

  // Filter out anything that did not match
  { $match: { matched: true } },

  // Remove the temporary field from all documents
  { $project: { matched: 0 } }
])

With the_ $expr operator, it is actually possible to make this both shorter and available in the regular collection methods. 👍

/*
 * Example of a $some operator workaround using $expr. The document would match if
 * purchasedProducts contain at least two of the three provided ObjectIds
 */

const purchasedProducts = [[ObjectId('ID1'), ObjectId('ID2'), ObjectId('ID3')]

db.collection.find({
  purchasedProducts: { $in: purchasedProducts },
  $expr: {
    $gt: [
      { $size: { $setIntersection: [‘$purchasedProducts’, purchasedProducts] } },
      2
    ]
  }
})

Even though it is possible to work around the lack of a $some operator, it gets a bit verbose, and a $some operator would make a nice addition to the $in and $all operators.

Let me know what your thoughts are on a $some operator and if you happen to have contacts on the MongoDB dev team, feel free to share the article with them as a feature request.

Thank you and see you!

#Mongodb #JavaScript #Database #Nodejs