When to Stop Digging
Unlimited Rabbit Hole Entrances
One of the best/worst parts of software engineering is that every day, you encounter endless Rabbit Hole Entrances.
You’re constantly coming across things that seem interesting, things that don’t behave the way you’d expect, things that seem worth investigating.
A very important skill is the ability to determine which Rabbit Holes to go down, and once you do start the journey, at what point to turn back.
Big books of data
Here’s an example of a Rabbit Hole Entrance I recently came across. I’m using an Imperfect Analogy so that you don’t need a technical background to follow along.
Let’s say we have a big book of Movies.
Imagine this book is simply a giant list, where each item has some basic info, like the movie name, release year, runtime, etc.
For example:
title=Minari; release_year=2020; runtime_minutes=150; movie_id=123
title=Dune; release_year=2021; runtime_minutes=155; movie_id=124
title=CODA; release_year=2021; runtime_minutes=111; movie_id=125
...
(many, many more list items)
Now let’s say we have a big book of Awards.
Imagine this book is simply a giant list, where each item has some basic info, like the granting authority (“Academy Awards”), the award name (“Best Picture”), the Movie connected to that award, whether the Movie won the award, etc.
For example:
movie_id=123; authority=Academy Awards; award=Best Picture; won=no; award_id=4567
movie_id=124; authority=Academy Awards; award=Best Picture; won=no; award_id=4568
movie_id=125; authority=Academy Awards; award=Best Picture; won=yes; award_id=4569
...
(many, many more list items)
(Note: at the beginning of the Awards book is a Special Page that tells the reader how the list items in the Awards book are connected to the list items in the Movies book. The Fancy Word for this is “foreign key relationship”, but the Fancy Word isn’t important for this story.)
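For readers who do want a peek behind the analogy, here is a minimal sketch of the two books as database tables, using Python's built-in sqlite3 module. The table and column names mirror the list items above but are my own assumption, and this is not necessarily how the real project was set up. The "Special Page" is the REFERENCES clause on the award table.

```python
import sqlite3

# Illustrative schema only: the table names and column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE movie (
        movie_id INTEGER PRIMARY KEY,
        title TEXT,
        release_year INTEGER,
        runtime_minutes INTEGER
    );
    CREATE TABLE award (
        award_id INTEGER PRIMARY KEY,
        movie_id INTEGER REFERENCES movie(movie_id),  -- the "Special Page"
        authority TEXT,
        award TEXT,
        won TEXT
    );
    INSERT INTO movie VALUES
        (123, 'Minari', 2020, 150),
        (124, 'Dune', 2021, 155),
        (125, 'CODA', 2021, 111);
    INSERT INTO award VALUES
        (4567, 123, 'Academy Awards', 'Best Picture', 'no'),
        (4568, 124, 'Academy Awards', 'Best Picture', 'no'),
        (4569, 125, 'Academy Awards', 'Best Picture', 'yes');
""")

# Follow the connection from an Awards list item back to its Movie.
title = conn.execute("""
    SELECT movie.title
    FROM award
    INNER JOIN movie ON movie.movie_id = award.movie_id
    WHERE award.award_id = 4569
""").fetchone()[0]

# An award that points at a movie_id missing from the Movies book is rejected.
try:
    conn.execute(
        "INSERT INTO award VALUES (9999, 999, 'Academy Awards', 'Best Picture', 'no')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(title, rejected)
```

The point of the connection is that an Awards list item never repeats the movie's details. It only stores the movie_id, and the reader looks up the rest in the Movies book.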
Our helpful robot, Django
Let’s say we have a robot that has memorized the contents of the big book of Movies and the big book of Awards. (Plus a lot of other fancy stuff that isn’t important for this story.)
Assuming we know how to talk in this robot’s language, we can give the robot instructions on what we want to know, and it will give us an output.
Let’s call this robot Django.
Our First Question (Success!)
Let’s say we want a list of movies that were released in 2021 and have a runtime of 111 minutes. (Note that we only need the big book of Movies for this question.)
Here’s one way to write instructions to give to Django:
results = (Movie.objects
    .filter(release_year=2021)
    .filter(runtime_minutes=111)
)
The above instructions basically say:
- Take all the movies in the big book
- Filter that big list to only include movies released in 2021, which gives a new, smaller list
- And then filter that new smaller list to only include movies with a runtime of 111 minutes
Django will give us a nice final output list of movies based on our instructions.
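The three steps above can be imitated in plain Python, with a list of dictionaries standing in for the big book of Movies. This is only a sketch of the mental model. Django does not filter a Python list like this (it builds a database query instead), and the variable names here are made up for illustration.

```python
# A stand-in for the big book of Movies, using the list items from earlier.
movies = [
    {"title": "Minari", "release_year": 2020, "runtime_minutes": 150, "movie_id": 123},
    {"title": "Dune", "release_year": 2021, "runtime_minutes": 155, "movie_id": 124},
    {"title": "CODA", "release_year": 2021, "runtime_minutes": 111, "movie_id": 125},
]

# Step 1: keep only movies released in 2021 (a new, smaller list).
released_2021 = [m for m in movies if m["release_year"] == 2021]

# Step 2: from that smaller list, keep only movies that run 111 minutes.
final = [m for m in released_2021 if m["runtime_minutes"] == 111]

titles = [m["title"] for m in final]
print(titles)
```

Each step can only shrink the list or leave it the same size, which is exactly the intuition that gets broken in the next question.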
Our Second Question (Confusion…)
Now let’s say we want a list of movies that were nominated for the Academy Award for Best Picture. (Note that we need both big books to answer this question.)
Here are our instructions to Django:
results = (Movie.objects
    .filter(award__authority="Academy Awards")
    .filter(award__award="Best Picture")
)
Django has given us an output list based on our instructions again… but this list has duplicate items!
In other words, our list has “Minari” twice, “Dune” twice, and “CODA” twice.
Um. What the heck, Django!
Django made me facepalm
The big book of Movies only has “Minari” once, so why does an instruction that is “filtering” that list into a SMALLER, MORE LIMITED list give us “Minari” twice??
Why does the same approach of “chaining filters” have different behavior when asking these two questions??
Congratulations. We’ve discovered the Entrance to a Rabbit Hole.
A Peek into Sam’s Brain
(Please feel free to skip to the next section…)
For the project I was working on, I needed to figure out how to give Django instructions so that the output list didn’t have duplicates. I’m also trying to learn more about how Django works. Not to mention, I’m an extremely Curious person.
My personal experience with this Rabbit Hole went something like this:
- I need to find a way to not output duplicates
- Dig dig dig.
- Okay, I found a way to not output duplicates.
- But these two ways seem so similar. I read the Django documentation and it’s not clear why they’re different.
- Should I let it go? Or should I Keep Digging?
- Oh look, some diverging paths appeared!
- This seems like poor interface design. Am I missing something obvious? Is this a legacy reason?
- Am I using Django wrong? Should I be doing something different?
- Well, I am trying to learn Django. Probably would be Worth It to learn more. Let’s Keep Digging.
- Dig dig dig.
- Okay. It seems like this is caused by the fact that this is a reverse foreign key relationship.
- But why?
- Should I keep going?
- Or just accept that if I’m doing a query with a reverse foreign key relationship, I can’t chain filters?
- Well, like I said, I’m trying to learn Django. Let’s Keep Digging.
- Dig dig dig.
- Okay. It seems like this is caused by how Django generates SQL queries under the hood, which is then related to how SQL executes Inner Joins.
- Ooof. I understand this deeper layer. But they’re still just symptoms. I still don’t really understand the root cause.
- I mean, it feels important to understand how SQL works.
- But also, I’m already pretty good at using SQL for practical purposes.
- Maybe this isn’t the highest priority thing to be investigating right now.
- Okay. Maybe just a quick look…
- Dig dig dig.
- Hmmm. Having a hard time finding useful answers.
- Some of these Stack Overflow answers are so bad omg.
- Should I create a new, isolated, Django project to test this myself?
- Nope nope nope. That’s enough.
- So unsatisfied, though.
- Okay. I need to Stop Digging.
Should we go down the Rabbit Hole?
If you spend much time doing software engineering, this sort of thing happens ALL THE TIME. It can be both exhilarating and exhausting.
But here’s the thing about Rabbit Holes: you never know how deep they go, you never know how many diverging paths they may have, and you never know whether there will be anything satisfying at the end.
I’m constantly struggling with deciding how deep to go down Rabbit Holes… It’s so hard to know When To Stop Digging.
Footnote: Solution to Question 2
For other Curious persons, you can remove duplicates by passing multiple arguments to the same filter, rather than chaining filters. Like so:
Movie.objects.filter(
    award__authority="Academy Awards",
    award__award="Best Picture",
)
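For the extra Curious, here is a rough sketch of why the two versions differ, written as raw SQL run through Python's built-in sqlite3 module. When filters span a one-to-many connection like Movie to Awards, each chained filter() call gets its own join against the Awards book, while a single filter() call with both arguments shares one join. The SQL below is my hand-written approximation, not Django's exact output. The table names and the "Some Other Award" rows are made up so that each movie has more than one Academy Awards entry.

```python
import sqlite3

# Illustrative in-memory tables. The "Some Other Award" rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE award (
        award_id INTEGER PRIMARY KEY,
        movie_id INTEGER REFERENCES movie(movie_id),
        authority TEXT,
        award TEXT
    );
    INSERT INTO movie VALUES (123, 'Minari'), (124, 'Dune'), (125, 'CODA');
    INSERT INTO award VALUES
        (4567, 123, 'Academy Awards', 'Best Picture'),
        (4568, 124, 'Academy Awards', 'Best Picture'),
        (4569, 125, 'Academy Awards', 'Best Picture'),
        (9001, 123, 'Academy Awards', 'Some Other Award'),
        (9002, 124, 'Academy Awards', 'Some Other Award'),
        (9003, 125, 'Academy Awards', 'Some Other Award');
""")

# Chained filters, approximately: one join per filter() call.
# Every "Academy Awards" row pairs up with every "Best Picture" row.
chained = conn.execute("""
    SELECT m.title
    FROM movie m
    INNER JOIN award a1 ON a1.movie_id = m.movie_id
    INNER JOIN award a2 ON a2.movie_id = m.movie_id
    WHERE a1.authority = 'Academy Awards' AND a2.award = 'Best Picture'
    ORDER BY m.movie_id
""").fetchall()

# Single filter with two arguments, approximately: one shared join.
# Both conditions must hold for the same award row.
single = conn.execute("""
    SELECT m.title
    FROM movie m
    INNER JOIN award a ON a.movie_id = m.movie_id
    WHERE a.authority = 'Academy Awards' AND a.award = 'Best Picture'
    ORDER BY m.movie_id
""").fetchall()

chained_titles = [row[0] for row in chained]
single_titles = [row[0] for row in single]
print(chained_titles)
print(single_titles)
```

Note that the two versions are not just the same question with and without duplicates. The chained form asks for movies with some Academy Awards entry and some Best Picture entry, possibly two different list items, while the single form asks for one list item that is both.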