When to Stop Digging
Unlimited Rabbit Hole Entrances
One of the best/worst parts of software engineering is that every day, you encounter endless Rabbit Hole Entrances.
You’re constantly coming across things that seem interesting, things that don’t behave the way you’d expect, things that seem worth investigating.
A very important skill is the ability to determine which Rabbit Holes to go down, and once you do start the journey, at what point to turn back.
Big books of data
Here’s an example of a Rabbit Hole Entrance I recently came across. I’m using an Imperfect Analogy so that a technical background isn’t required to understand it.
Let’s say we have a big book of Movies.
Imagine this book is simply a giant list, where each item has some basic info, like the movie name, release year, runtime, etc.
For example:
title=Minari; release_year=2020; runtime_minutes=150; movie_id=123
title=Dune; release_year=2021; runtime_minutes=155; movie_id=124
title=CODA; release_year=2021; runtime_minutes=111; movie_id=125
...
(many, many more list items)
Now let’s say we have a big book of Awards.
Imagine this book is simply a giant list, where each item has some basic info, like the granting authority (“Academy Awards”), the award name (“Best Picture”), the Movie connected to that award, whether the Movie won the award, etc.
For example:
movie_id=123; authority=Academy Awards; award=Best Picture; won=no; award_id=4567
movie_id=124; authority=Academy Awards; award=Best Picture; won=no; award_id=4568
movie_id=125; authority=Academy Awards; award=Best Picture; won=yes; award_id=4569
...
(many, many more list items)
(Note: at the beginning of the Awards book is a Special Page that tells the reader how the list items in the Awards book are connected to the list items in the Movies book. The Fancy Word for this is “foreign key relationship”, but the Fancy Word isn’t important for this story.)
Our helpful robot, Django
Let’s say we have a robot that has memorized the contents of the big book of Movies and the big book of Awards. (Plus a lot of other fancy stuff that isn’t important for this story.)
Assuming we know how to talk in this robot’s language, we can give the robot instructions on what we want to know, and it will give us an output.
Let’s call this robot Django.
Our First Question (Success!)
Let’s say we want a list of movies that were released in 2021 and have a runtime of 111 minutes. (Note that we only need the big book of Movies for this question.)
Here’s one way to write instructions to give to Django:
results = (Movie.objects
    .filter(release_year=2021)
    .filter(runtime_minutes=111)
)
The above instructions basically say:
- Take all the movies in the big book
- Filter that big list to only include movies released in 2021 to give a new smaller list
- And then filter that new smaller list to only include movies with a runtime of 111 minutes
Django will give us a nice final output list of movies based on our instructions.
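For anyone who wants to see the same idea without the robot, here is a rough sketch in plain Python. It uses only the three sample entries from the big book of Movies and ordinary lists, which is not how Django actually does it (Django turns the instructions into a database query), so treat it as an analogy for the analogy.

```python
# The three sample entries from the big book of Movies, as plain dictionaries.
movies = [
    {"title": "Minari", "release_year": 2020, "runtime_minutes": 150, "movie_id": 123},
    {"title": "Dune", "release_year": 2021, "runtime_minutes": 155, "movie_id": 124},
    {"title": "CODA", "release_year": 2021, "runtime_minutes": 111, "movie_id": 125},
]

# First filter: keep only movies released in 2021 (a new, smaller list).
released_2021 = [m for m in movies if m["release_year"] == 2021]

# Second filter: from that smaller list, keep only movies that run 111 minutes.
results = [m for m in released_2021 if m["runtime_minutes"] == 111]

titles = [m["title"] for m in results]
print(titles)  # ['CODA']
```

Each step can only shrink the list, which is the intuition that makes the next question so surprising.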
Our Second Question (Confusion…)
Now let’s say we want a list of movies that were nominated for the Academy Award for Best Picture. (Note that we need both big books to answer this question.)
Here are our instructions to Django:
results = (Movie.objects
    .filter(award__authority="Academy Awards")
    .filter(award__award="Best Picture")
)
Django has given us an output list based on our instructions again… but this list has duplicate items!
In other words, our list has “Minari” twice, “Dune” twice, and “CODA” twice.
Um. What the heck, Django!
Django made me facepalm
The big book of Movies only has “Minari” once, so why does an instruction that is “filtering” that list into a SMALLER, MORE LIMITED list give us “Minari” twice??
Why does the same approach of “chaining filters” have different behavior when asking these two questions??
Congratulations. We’ve discovered the Entrance to a Rabbit Hole.
A Peek into Sam’s Brain
(Please feel free to skip to the next section…)
For the project I was working on, I needed to figure out how to give Django instructions so that the output list didn’t have duplicates. I’m also trying to learn more about how Django works. Not to mention, I’m an extremely Curious person.
My personal experience with this Rabbit Hole went something like this:
- I need to find a way to not output duplicates
- Dig dig dig.
- Okay, I found a way to not output duplicates.
- But these two ways seem so similar. I read the Django documentation and it’s not clear why they’re different.
- Should I let it go? Or should I Keep Digging?
- Oh look, some diverging paths appeared!
- This seems like poor interface design. Am I missing something obvious? Is this a legacy reason?
- Am I using Django wrong? Should I be doing something different?
- Well, I am trying to learn Django. Probably would be Worth It to learn more. Let’s Keep Digging.
- Dig dig dig.
- Okay. It seems like this is caused by the fact that this is a reverse foreign key relationship.
- But why?
- Should I keep going?
- Or just accept that if I’m doing a query with a reverse foreign key relationship, I can’t chain filters?
- Well, like I said, I’m trying to learn Django. Let’s Keep Digging.
- Dig dig dig.
- Okay. It seems like this is caused by how Django generates SQL queries under the hood, which is then related to how SQL executes Inner Joins.
- Ooof. I understand this deeper layer. But they’re still just symptoms. I still don’t really understand the root cause.
- I mean, it feels important to understand how SQL works.
- But also, I’m already pretty good at using SQL for practical purposes.
- Maybe this isn’t the highest priority thing to be investigating right now.
- Okay. Maybe just a quick look…
- Dig dig dig.
- Hmmm. Having a hard time finding useful answers.
- Some of these Stack Overflow answers are so bad omg.
- Should I create a new, isolated, Django project to test this myself?
- Nope nope nope. That’s enough.
- So unsatisfied, though.
- Okay. I need to Stop Digging.
Should we go down the Rabbit Hole?
If you spend much time doing software engineering, this sort of thing happens ALL THE TIME. It can be both exhilarating and exhausting.
But here’s the thing about Rabbit Holes: you never know how deep they go, you never know how many diverging paths they may have, and you never know whether there will be anything satisfying at the end.
I’m constantly struggling with deciding how deep to go down Rabbit Holes… It’s so hard to know When To Stop Digging.
Footnote: Solution to Question 2
For other Curious persons, you can remove duplicates by passing multiple arguments to the same filter, rather than chaining filters. Like so:
Movie.objects.filter(
    award__authority="Academy Awards",
    award__award="Best Picture",
)
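For the extra Curious, here is a rough sketch of what seems to be going on one layer down. It uses Python's built-in sqlite3 module instead of Django, and the SQL is hand-written to approximate the kind of query Django generates for each version, not copied from Django's real output. The table names and the "Some Other Award" entries are made up for illustration. The sketch assumes each movie has more than one Academy Awards entry, because that is when the duplicates show up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE award (
        award_id INTEGER PRIMARY KEY,
        movie_id INTEGER REFERENCES movie(movie_id),
        authority TEXT,
        award TEXT
    );
    INSERT INTO movie VALUES (123, 'Minari'), (124, 'Dune'), (125, 'CODA');
    INSERT INTO award VALUES
        (4567, 123, 'Academy Awards', 'Best Picture'),
        (4568, 124, 'Academy Awards', 'Best Picture'),
        (4569, 125, 'Academy Awards', 'Best Picture'),
        -- Hypothetical extra entries: a second Academy Awards row per movie.
        (9001, 123, 'Academy Awards', 'Some Other Award'),
        (9002, 124, 'Academy Awards', 'Some Other Award'),
        (9003, 125, 'Academy Awards', 'Some Other Award');
""")

# Approximation of the chained filters: each filter gets its OWN join
# to the award table, so the two conditions may match different rows.
chained_sql = """
    SELECT m.title FROM movie m
    JOIN award a1 ON a1.movie_id = m.movie_id
    JOIN award a2 ON a2.movie_id = m.movie_id
    WHERE a1.authority = 'Academy Awards' AND a2.award = 'Best Picture'
    ORDER BY m.movie_id
"""

# Approximation of the single filter: one join, and both conditions
# must be true of the SAME award row.
single_sql = """
    SELECT m.title FROM movie m
    JOIN award a ON a.movie_id = m.movie_id
    WHERE a.authority = 'Academy Awards' AND a.award = 'Best Picture'
    ORDER BY m.movie_id
"""

chained = [row[0] for row in conn.execute(chained_sql)]
single = [row[0] for row in conn.execute(single_sql)]

print(chained)  # ['Minari', 'Minari', 'Dune', 'Dune', 'CODA', 'CODA']
print(single)   # ['Minari', 'Dune', 'CODA']
```

In the chained version, every Academy Awards entry for a movie gets paired with that movie's Best Picture entry, so two Academy Awards entries means two output rows. In the single-filter version there is only one copy of the Awards book in play, so each movie shows up once.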