While working on a recent project that involved ranking swimmers based on their performance times, I had the opportunity to leverage one of the most powerful features of MongoDB: the aggregation pipeline. Specifically, I needed to calculate how a swimmer ranked against others, both nationwide (USA rank) and within smaller scopes like zones and Local Swimming Committees (LSCs).
MongoDB’s aggregation framework made it possible to run dynamic queries and efficiently calculate these rankings, all while handling a variety of filters such as gender, age group, and event type. In this post, I'll walk through how I used MongoDB's $lookup
feature within an aggregation pipeline to create these rankings.
Imagine a scenario where we have a large database of swimmers’ times from various competitions. We need to calculate where each swimmer ranks based on their performance relative to others for the same event. For example, if a swimmer has a time of 50 seconds in the 50 Freestyle event, we want to know how many swimmers have a faster time in that same event and calculate the swimmer's rank dynamically.
To achieve this, MongoDB’s aggregation pipeline comes to the rescue.
Here’s how MongoDB’s aggregation framework works in this context. We will be looking at the key steps in the pipeline and focusing particularly on the $lookup stage, which allows us to perform dynamic ranking lookups.
The first step in the aggregation is to match the swimmer’s data. We’re filtering the database to find only the records relevant to the swimmer we are interested in.
{
$match: {
slug: slug, // Match the unique identifier (slug) for the swimmer
type: type, // Match the stroke type (e.g., LCM)
ageGroup: ageGroup, // Match the age group (e.g., 15-16)
sex: sex, // Match the swimmer's gender
},
}
Here’s what’s happening:
$match
to filter documents and find only those that correspond to the swimmer’s slug (unique identifier), the stroke type, the age group, and gender.Next, we use $project
to limit the fields we retrieve. We want only the data necessary for ranking calculations, such as the event name, the swimmer's time (intTime
), and details like their team, zone, and LSC.
{
$project: {
event: 1, // Include the event (e.g., 50 Free, 100 Back)
intTime: 1, // Include the internal time value (used for ranking)
time: 1, // Include the swimmer's official time
zone: 1, // Include the zone the swimmer belongs to
lsc: 1, // Include the Local Swimming Committee (LSC)
teamCode: 1, // Include the swimmer’s team code
},
}
This stage narrows down the dataset to only include the fields we care about for ranking purposes. We focus on:
$lookup
for USA RankHere’s where the real power of MongoDB’s aggregation framework shines: the $lookup
stage.
In this case, the $lookup
stage allows us to perform a JOIN-like operation in MongoDB. It enables us to dynamically compare a swimmer's performance time (intTime
) with others in the same event and count how many swimmers have a faster time.
{
$lookup: {
from: "best_strokes" + season, // Use the dynamic season-based table
let: { swimmerTime: "$intTime", event: "$event" }, // Pass the current swimmer's time and event
pipeline: [
{
$match: {
type: type, // Match stroke type (e.g., LCM)
ageGroup: ageGroup, // Match age group (e.g., 15-16)
sex: sex, // Match swimmer's gender
$expr: {
$and: [
{ $eq: ["$event", "$$event"] }, // Ensure it's the same event
{ $lt: ["$intTime", "$$swimmerTime"] }, // Find swimmers with faster times
],
},
},
},
{ $count: "fasterSwimmers" }, // Count the number of swimmers with faster times
],
as: "usaRankings", // Store the results in the "usaRankings" field
},
}
Here’s how this works:
best_strokes
), but for a specific season. The from
field points to a dynamically generated table based on the season, which allows us to query season-specific data.swimmerTime
and event
) from the current document (the swimmer’s record). These variables will be used inside the $match
stage of the $lookup
to filter swimmers in the same event with faster times.$match
) where:$eq: ["$event", "$$event"]
) to ensure we’re comparing times within the same event.$lt: ["$intTime", "$$swimmerTime"]
).$count
to get the number of swimmers who are faster. The result is stored in a field called fasterSwimmers
.usaRankings
field, which will later be used to calculate the swimmer's USA rank.After running the lookup, we use $addFields
to calculate the actual USA rank. This is done by adding 1
to the number of swimmers with faster times.
{
$addFields: {
usaRank: {
$add: [{ $ifNull: ["$usaRankings.fasterSwimmers", 0] }, 1],
},
},
}
This ensures that:
fasterSwimmers
will be 0
, and the rank will be 1
.1 + the number of faster swimmers
.MongoDB's aggregation pipeline, especially the $lookup
stage, is incredibly powerful for scenarios like dynamic ranking because:
$lookup
with pipelines, we can filter and aggregate data in one step, reducing the need for separate queries and simplifying the ranking process.$$swimmerTime
and $$event
within the lookup makes the entire process highly adaptable, as we can perform customized lookups based on the current swimmer’s data.Using MongoDB’s aggregation framework with the $lookup
stage allowed me to dynamically calculate rankings for swimmers in a single query. This approach is both flexible and efficient, and it showcases the power of MongoDB when dealing with complex data relationships, such as dynamic rankings across different scopes (e.g., USA, zone, LSC).
If you're working on a similar problem involving dynamic lookups or rankings, I highly recommend exploring MongoDB’s aggregation pipeline. It not only simplifies the query logic but also allows you to handle large-scale data processing efficiently within a single request.
Let me know if you have any questions or want to dive deeper into any of the aggregation stages!