One of the most important skills to learn as a programmer is how to debug an issue. More often than not your day will be spent debugging, from small syntax errors you made while typing too quickly to larger errors that do not fail critically and actually perform a task, they just perform that task wrong. In fact, the bugs you will spend the most time on are the ones that only fail once in a while. You may even encounter a “heisenbug”, which seems to change its behavior just because you are looking at it. The bug isn’t really reacting to you; the name is a way of saying that the act of observing it, such as adding logging or attaching a debugger, can change the conditions enough to hide the issue.
No matter what kind of bug you encounter, you will need to solve it somehow, since the number of times you can claim a bug is a feature is very low, and only once in a great while does a bug fail in a serendipitous way.
So how do we track down bugs? How do you improve this skill? Simple! We find bugs, track them down, and fix them. The only way to get better at it is to practice. That being said, I do have a particular methodology when tracking down a bug. I follow these simple steps.
1. In what way is this failing?
2. How do I expect this to respond when working correctly?
3. What is the simplest solution to get us from A to B and produce the fix?
4. Start tracking.
   - Find the most likely point of failure and start there.
   - Keep tracking until you get no feedback.
   - Once you get no feedback, start to pull the code apart.
Steps 1 through 3 can be answered very quickly, and you usually have an answer long before you start looking at the bug. In most cases the answers are passed down from a product manager or from the customer support person who submitted the ticket. If you find the bug before anyone else sees it, they are still useful questions to ask, but I think they are pretty self-explanatory. Which brings us to tracking.
How do we get started? Where do we start tracking?
We start with the user interface by replicating the bug as the user would see it. This can be done via TDD, by writing a series of tests that replicate the issue, or it can be done by simply loading up a browser and clicking on things. I recommend the former.
Using tests to drive your debugging is a great way to duplicate the error reliably and to know when you have fixed it. Once you have an appropriately crafted bug test, you can run it and reproduce the error on demand. It will save you the time of flipping between your code and the browser window.
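As a rough illustration, here is a minimal sketch of a bug test in plain Ruby. The `discounted_price` method and its bug are made up for this example, and in a real project the check would live in whatever test framework you already use.

```ruby
# Hypothetical code under test: it should take a percentage off the price,
# but it subtracts the percentage as a flat amount instead.
def discounted_price(price, percent)
  price - percent # bug: should be price - (price * percent / 100.0)
end

# The bug test states the expected behaviour once, so every run
# reproduces the failure the same way until the fix lands.
def bug_reproduced?
  expected = 180.0 # 10% off 200.0
  actual = discounted_price(200.0, 10)
  actual != expected
end

puts bug_reproduced? ? "still failing" : "fixed"
```

Once the method is corrected, the same check flips to "fixed" without anyone opening a browser.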
So we start with the user interface and dig our way deeper. This will most often tell us which controller, which action, or which file we should be looking in. Once we know where to look, we start to dig into why it’s failing. We can start to push and pull on things. This is where you ask yourself, “What is the most likely point of failure?” If that is not obvious, we start to pull the code apart by inserting aborts and raising errors, checking that we get the results we hope for. This should show us which logic gates are working and which ones are being passed over, allowing us to eliminate chunks of code that are not executing during the failure and rule them out as possible opportunities for fixes.
We keep moving the aborts or raises into different chunks of code and dump our variables, looking for the ones that hold incorrect data. Given X we expect Z, but somewhere along the way it gets turned into Y, and that causes the failure we see. Once we find the source of the issue, we can start to work our way toward a fix.
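To make the checkpoint idea concrete, here is a small sketch. The `shipping_cost` method and the order hash are invented for illustration. Suppose we expected standard shipping but were charged the express rate, so we drop a temporary `raise` into the express branch to see whether it runs.

```ruby
# Hypothetical method with two branches. We expected the standard rate
# for this order, so we place a temporary checkpoint in the express branch.
def shipping_cost(order)
  if order[:express]
    raise "checkpoint: express branch reached with #{order.inspect}"
    25 # unreachable while the checkpoint is in place
  else
    5
  end
end

begin
  # The flag arrives as the string "false", which Ruby treats as truthy.
  shipping_cost({ express: "false" })
  puts "no checkpoint hit"
rescue RuntimeError => e
  puts e.message
end
```

The checkpoint fires, which rules out the else branch and points at the condition itself: the flag is a string, not a boolean.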
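Dumping variables between steps might look something like the sketch below. The three methods are hypothetical stand-ins for layers in your own code; the point is only to record each intermediate value so you can see where the expected value goes wrong.

```ruby
# Hypothetical three-step pipeline; one step corrupts the value.
def parse(raw)
  raw.strip
end

def normalize(name)
  name.downcase.capitalize
end

def greet(name)
  "Hello, #{name}!"
end

# Run each step and record the intermediate value so they can be compared.
def trace(raw)
  steps = {}
  steps[:parse] = parse(raw)
  steps[:normalize] = normalize(steps[:parse])
  steps[:greet] = greet(steps[:normalize])
  steps
end

trace("  McDonald ").each { |step, value| puts "#{step}: #{value.inspect}" }
```

Here the value is still "McDonald" after `parse` and becomes "Mcdonald" after `normalize`, so `parse` and `greet` can be ruled out.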
Do we patch it and just manipulate the data in a way that makes it not fail? Or do we keep tracking it to the root of the cause and fix it there?
For instance, we expect to get the statement “The quick brown fox jumps over the fence”. Instead we get “The quick black fox jumps over the barn”.
Given this example, one fix might be to replace the values in the output string, final_str, just before it gets output. This might work, but in most cases it would be a fix for this one case and would not respond well given a different input. Instead we need to look at the manipulate_data function that produces final_str, which is the real cause of the issue.
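A quick sketch of why the substitution patch is fragile. The `patch_output` helper is hypothetical and only stands in for whatever last-minute replacement you might be tempted to write.

```ruby
# Symptom patch: swap the known-bad words in final_str just before output.
def patch_output(final_str)
  final_str.gsub("black", "brown").gsub("barn", "fence")
end

# It produces the expected sentence for the one case we know about...
puts patch_output("The quick black fox jumps over the barn")

# ...but it also rewrites a sentence that was already correct.
puts patch_output("The black cat sat in the barn")
```

The second call prints "The brown cat sat in the fence", so the patch has created a new bug while hiding the old one.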
It turns out the manipulate_data function didn’t do anything except return the string every time. In that case the correct course is not to replace the values just before they are output. The correct course is to alter the manipulate_data function to perform its intended job, or to remove it entirely since it’s vestigial.
In the above example the real issue is two things: a useless function and a misassignment of the str variable.
The above example is an oversimplification, but this is in general how bugs can be created. You may not see the definition of a function, so it may not be apparent why the function is failing, but once you find the definition it becomes more apparent why it’s not performing as you expect.
I showed you this example so you do not get in the habit of fixing the symptoms instead of the root cause of the issue. If I had implemented a string substitution fix, I would only be fixing the code for this one issue, and it wouldn’t work for any other case. I would be fixing the symptoms this time and passing the issue to the next engineer, or band-aiding it until I am forced to come back to it. This is a recipe for bad code and an inflexible architecture.
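For illustration only, here is a hypothetical reconstruction of those two problems and one possible root-cause fix. It assumes that str was simply assigned the wrong sentence and that manipulate_data returns its argument untouched. The names `buggy_output` and `fixed_output` are mine and exist just to make the sketch runnable.

```ruby
# Hypothetical reconstruction of the buggy code, for illustration only.
def manipulate_data(str)
  str # returns its input unchanged, so the call adds nothing
end

def buggy_output
  str = "The quick black fox jumps over the barn" # misassigned value
  final_str = manipulate_data(str)
  final_str
end

# Root-cause fix: assign the intended value and drop the vestigial call.
def fixed_output
  final_str = "The quick brown fox jumps over the fence"
  final_str
end

puts buggy_output
puts fixed_output
```

If manipulate_data were meant to do real work, the other route would be to implement that work rather than delete the call.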
In most cases the best course is to keep tracking the error and possible causes back into the layers of functions and fix the root cause. If you follow the steps above you should be able to track most errors back through your code and most code bases until you reach the root.
Happy Hacking and I hope this gave you some insight.