Exploring MongoDB’s Powerful Aggregation Pipeline: Dynamic Ranking with $lookup
Adam C. |

While working on a recent project that involved ranking swimmers based on their performance times, I had the opportunity to leverage one of the most powerful features of MongoDB: the aggregation pipeline. Specifically, I needed to calculate how a swimmer ranked against others, both nationwide (USA rank) and within smaller scopes like zones and Local Swimming Committees (LSCs).

Photo by Manel & Sean on Unsplash

MongoDB’s aggregation framework made it possible to run dynamic queries and efficiently calculate these rankings, all while handling a variety of filters such as gender, age group, and event type. In this post, I'll walk through how I used MongoDB's $lookup feature within an aggregation pipeline to create these rankings.

The Problem: Ranking Swimmers by Performance Time

Imagine a scenario where we have a large database of swimmers’ times from various competitions. We need to calculate where each swimmer ranks based on their performance relative to others for the same event. For example, if a swimmer has a time of 50 seconds in the 50 Freestyle event, we want to know how many swimmers have a faster time in that same event and calculate the swimmer's rank dynamically.

To achieve this, MongoDB’s aggregation pipeline comes to the rescue.

The Aggregation Breakdown

Here’s how MongoDB’s aggregation framework works in this context. We will be looking at the key steps in the pipeline and focusing particularly on the $lookup stage, which allows us to perform dynamic ranking lookups.

Step 1: Matching the Swimmer’s Data

The first step in the aggregation is to match the swimmer’s data. We’re filtering the database to find only the records relevant to the swimmer we are interested in.

{
  $match: {
    slug: slug,            // Match the unique identifier (slug) for the swimmer
    type: type,            // Match the stroke type (e.g., LCM)
    ageGroup: ageGroup,    // Match the age group (e.g., 15-16)
    sex: sex,              // Match the swimmer's gender
  },
}

Here’s what’s happening:

  • We use $match to filter documents and find only those that correspond to the swimmer’s slug (unique identifier), the stroke type, the age group, and gender.
  • This ensures that we are only looking at the relevant records for the swimmer's performance in the current context (e.g., a specific season).

Step 2: Projecting the Necessary Fields

Next, we use $project to limit the fields we retrieve. We want only the data necessary for ranking calculations, such as the event name, the swimmer's time (intTime), and details like their team, zone, and LSC.

{
  $project: {
    event: 1,          // Include the event (e.g., 50 Free, 100 Back)
    intTime: 1,        // Include the internal time value (used for ranking)
    time: 1,           // Include the swimmer's official time
    zone: 1,           // Include the zone the swimmer belongs to
    lsc: 1,            // Include the Local Swimming Committee (LSC)
    teamCode: 1,       // Include the swimmer’s team code
  },
}

This stage narrows down the dataset to only include the fields we care about for ranking purposes. We focus on:

  • intTime: The internal representation of the swimmer’s performance time, which we’ll use to compare against other swimmers.
  • event: The swimming event (e.g., 50 Free), since rankings must be done event by event.
  • zone, lsc, teamCode: These fields will later be used to calculate more specific rankings, such as zone or team rankings.

The Magic: $lookup for USA Rank

Here’s where the real power of MongoDB’s aggregation framework shines: the $lookup stage.

In this case, the $lookup stage allows us to perform a JOIN-like operation in MongoDB. It enables us to dynamically compare a swimmer's performance time (intTime) with others in the same event and count how many swimmers have a faster time.

{
  $lookup: {
    from: "best_strokes" + season, // Use the dynamic season-based table
    let: { swimmerTime: "$intTime", event: "$event" }, // Pass the current swimmer's time and event
    pipeline: [
      {
        $match: {
          type: type,         // Match stroke type (e.g., LCM)
          ageGroup: ageGroup, // Match age group (e.g., 15-16)
          sex: sex,           // Match swimmer's gender
          $expr: {
            $and: [
              { $eq: ["$event", "$$event"] },           // Ensure it's the same event
              { $lt: ["$intTime", "$$swimmerTime"] },   // Find swimmers with faster times
            ],
          },
        },
      },
      { $count: "fasterSwimmers" }, // Count the number of swimmers with faster times
    ],
    as: "usaRankings", // Store the results in the "usaRankings" field
  },
}

Here’s how this works:

  1. from: We are performing the lookup on the same collection (best_strokes), but for a specific season. The from field points to a dynamically generated table based on the season, which allows us to query season-specific data.
  2. let: This defines the variables (swimmerTime and event) from the current document (the swimmer’s record). These variables will be used inside the $match stage of the $lookup to filter swimmers in the same event with faster times.
  3. pipeline: This is where the lookup does its heavy lifting. It runs a subquery ($match) where:
    • We filter swimmers based on their event ($eq: ["$event", "$$event"]) to ensure we’re comparing times within the same event.
    • We look for swimmers with a faster time ($lt: ["$intTime", "$$swimmerTime"]).
  4. $count: After filtering the swimmers with faster times, we use $count to get the number of swimmers who are faster. The result is stored in a field called fasterSwimmers.
  5. as: The results of this lookup are stored in the usaRankings field, which will later be used to calculate the swimmer's USA rank.

Calculating the USA Rank

After running the lookup, we use $addFields to calculate the actual USA rank. This is done by adding 1 to the number of swimmers with faster times.

{
  $addFields: {
    usaRank: {
      $add: [{ $ifNull: ["$usaRankings.fasterSwimmers", 0] }, 1],
    },
  },
}

This ensures that:

  • If there are no faster swimmers (i.e., the swimmer is the fastest), fasterSwimmers will be 0, and the rank will be 1.
  • Otherwise, the rank is 1 + the number of faster swimmers.

Why This is Powerful

MongoDB's aggregation pipeline, especially the $lookup stage, is incredibly powerful for scenarios like dynamic ranking because:

  • Flexibility: It allows you to dynamically JOIN data across collections (or within the same collection, as in this case) and process complex ranking logic without the need for multiple queries.
  • Efficiency: By using $lookup with pipelines, we can filter and aggregate data in one step, reducing the need for separate queries and simplifying the ranking process.
  • Dynamic Queries: The ability to use variables like $$swimmerTime and $$event within the lookup makes the entire process highly adaptable, as we can perform customized lookups based on the current swimmer’s data.

Conclusion

Using MongoDB’s aggregation framework with the $lookup stage allowed me to dynamically calculate rankings for swimmers in a single query. This approach is both flexible and efficient, and it showcases the power of MongoDB when dealing with complex data relationships, such as dynamic rankings across different scopes (e.g., USA, zone, LSC).

If you're working on a similar problem involving dynamic lookups or rankings, I highly recommend exploring MongoDB’s aggregation pipeline. It not only simplifies the query logic but also allows you to handle large-scale data processing efficiently within a single request.

Let me know if you have any questions or want to dive deeper into any of the aggregation stages!