Problem-Solving Like a Senior Dev
Real-world problem-solving examples, common pitfalls, and how you can improve this skill
If you look at a developer skill roadmap like the ones on roadmap.sh, you’ll find mostly hard skills like coding, algorithms, databases, etc. However, if you ask on Reddit “What are some skills junior developers should learn?”, you’ll see that most replies are about soft skills (and not just for junior developers).
Since moving into a management position, I have found that strong soft skills make it a lot easier to thrive in any organization.
In this series, I’m going to explore the soft skills that make great software engineers. They include problem-solving, effective communication, being proactive, etc. Among them all, the most important skill is problem-solving.
Key Takeaways 💡
Identify the problem (not the symptoms) before taking action.
Common practices: breaking down problems into smaller actionable items & always coming up with multiple solutions
To improve, set up a problem-solving framework for yourself and follow it.
Real-World Example: Troubleshooting Server Performance Issues
Let’s use a real-world example here.
You either receive an alert from your system, or see some abnormal metrics on the dashboard. You are in the on-call rotation and it’s time to take a look at this issue. Let's explore how to tackle this common challenge step by step.
In this example, I’m using a simplified version of McKinsey’s problem-solving framework (which is a lot easier to remember). It includes:
Defining problems & symptoms
Breaking it down
Exploring multiple solutions
1. Defining Problems (not Symptoms)
First things first, you need to figure out what's really going on. Is the server actually slow, or is it due to external factors?
At this step, it’s important to know which is a symptom, and which is the actual problem. Here are some classic examples:
Example #1:
🤒 Symptom: Database query is slow
🏃 Quick solution: Adding more CPU & memory to the DB server.
🔎 Underlying Cause: Some indexes are missing in tables.
💡 Actual solution: Add indexes to common read queries.
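One way to confirm that a missing index is the underlying cause is to ask the database for its query plan before and after adding the index. Here is a minimal sketch using Python's built-in sqlite3 module; the orders table, its columns, and the index name are hypothetical, and other databases expose a similar EXPLAIN command with different output.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan descriptions SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return [row[-1] for row in rows]

# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, sql, (42,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))   # the plan should now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

If the plan still shows a scan after adding the index, the index does not match the query, and it's worth digging further before reaching for more hardware.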
Example #2:
🤒 Symptom: Insufficient memory / OOM (out of memory)
🏃 Quick solution: Restart the processes every day.
🔎 Underlying Cause: Memory use increases gradually, which could indicate a memory leak.
💡 Actual solution: Find the problematic code and fix/replace it.
In short, seeing the symptoms does not always mean that we know what problem to resolve. Avoid jumping to conclusions and keep digging.
Ask yourself: do you fully understand the problem? If not, what you are looking at is likely only a symptom. In the above examples, the quick solutions only throw CPU/memory resources at the problem without telling us what the actual issue is.
It’s also useful to keep looking at other resources and metrics. On a dashboard, that could mean checking for network issues, API availability, the system status of 3rd-party service providers, and so on.
2. Breaking It Down Into Smaller Problems
For big problems, it’s usually better to break them down into smaller issues. In the example of troubleshooting a performance issue, they could be:
[Right now] If it is impacting users now, how can we stop the impact ASAP?
[Short-term] How can we fix it properly?
[Long-term] How can we prevent similar issues from happening again?
Once we have these separate, smaller problems, we can tell which solution fits which problem best.
3. Exploring Multiple Solutions & Tradeoffs
You've narrowed down the problem, so it's time to brainstorm solutions. For the slow database query example, some common fixes are:
Adding CPU/memory
Optimizing DB queries
Load balancing across multiple servers
Caching
Each solution has pros and cons. Scaling up is quick but costly. Optimizing code takes time but can have long-term benefits. Make a list of options and weigh them against your budget, time constraints, and long-term goals.
There is rarely an answer that resolves a problem perfectly. Try to come up with multiple solutions, consider their tradeoffs, and pick the ones that work best for your situation.
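If it helps to make the weighing explicit, a small weighted scoring table is one way to do it. The sketch below is only an illustration: the criteria, weights, and 1-to-5 scores are made-up placeholders, not measurements, so plug in your own estimates.

```python
# All weights and scores are made-up placeholders; replace them with your own.
# Scores run from 1 (poor) to 5 (good) for each criterion.
WEIGHTS = {"speed_to_ship": 0.5, "cost": 0.2, "long_term_benefit": 0.3}

OPTIONS = {
    "Add CPU/memory":      {"speed_to_ship": 5, "cost": 2, "long_term_benefit": 2},
    "Optimize DB queries": {"speed_to_ship": 2, "cost": 4, "long_term_benefit": 5},
    "Load balancing":      {"speed_to_ship": 2, "cost": 2, "long_term_benefit": 4},
    "Caching":             {"speed_to_ship": 3, "cost": 4, "long_term_benefit": 3},
}

def rank_options(options, weights):
    """Return (name, weighted_score) pairs, highest score first."""
    scored = {
        name: sum(weights[criterion] * score for criterion, score in scores.items())
        for name, scores in options.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

for name, score in rank_options(OPTIONS, WEIGHTS):
    print(f"{name}: {score:.2f}")
```

With these placeholder numbers, weights that favor shipping quickly put adding CPU/memory on top, while shifting the weight toward long-term benefit favors optimizing queries. The ranking is only as good as the estimates behind it, so treat it as a conversation starter rather than a verdict.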
It’s also important to keep communication open with stakeholders about the options you have at hand.
Common Pitfalls to Avoid
Solving The Wrong Problem
It's easy to jump into solutions without fully grasping the real issue. This can lead to wasted time and resources.
Using the same performance issue example, adding CPU/memory to machines can look like a good solution, but is it a permanent one? Or is it only putting a bandaid on a more serious issue?
A good practice is to be data-driven. Find data that supports your decision. If adding CPU/memory is a good solution, explain why. What’s the cause behind the increased usage of CPU/memory? It might be due to a recent release, or a traffic spike.
Validate assumptions with data before taking action. It helps us make more accurate decisions.
Overlooking Simpler Solutions
As engineers, we often love complex, elegant solutions. But sometimes, the simplest approach is the best.
Before diving into a complex solution:
Look for existing tools or libraries that might help
Consider if a simpler approach could work, even if it's not perfect
It’s about finding the right balance between simplicity and impact. In most situations, the simplest solution is also the most effective.
For example, if a simple in-memory cache built into your programming language can improve latency even slightly, it’s usually more efficient than setting up a new Redis cluster in pursuit of a bigger improvement.
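In Python, for instance, the standard library's functools.lru_cache gives you an in-memory cache with one decorator. The sketch below assumes a hypothetical slow_lookup function standing in for a slow database or API call.

```python
import time
from functools import lru_cache

def slow_lookup(user_id):
    """Hypothetical stand-in for a slow database or API call."""
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}

@lru_cache(maxsize=1024)
def get_user(user_id):
    # Results are kept in process memory, keyed by the arguments.
    return slow_lookup(user_id)

start = time.perf_counter()
get_user(1)  # cache miss: pays the full lookup cost
miss_seconds = time.perf_counter() - start

start = time.perf_counter()
get_user(1)  # cache hit: served from memory
hit_seconds = time.perf_counter() - start

print(f"miss: {miss_seconds:.4f}s, hit: {hit_seconds:.6f}s")
print(get_user.cache_info())
```

The tradeoff is that this cache lives inside a single process and has no expiry, so it suits data that can be a little stale. If several servers need to share the same cached data, that is when something like Redis starts to earn its setup cost.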
Don't get caught up in over-engineering. Your future self will thank you for keeping things simple.
How to Improve Problem-Solving Skills
Set Up A Framework And Follow It
Having a problem-solving framework is like having a trusty toolbox. It gives you a starting point when facing tough issues.
Start by understanding the most popular frameworks. Each one has its strengths in certain areas, but the most important point is to have a framework that gives your thinking process a starting point.
You’ll also have a checklist during the thinking process. For example, when digging deep into an incident, using the Root Cause Analysis (RCA) approach will allow you to follow the steps of:
✅ Start by collecting data related to the problem
✅ Write down all possible causal factors
✅ Filter out irrelevant factors and find the root cause
Find the framework that works for you the best. Stick to it, and gradually refine it as you work on different problems.
Build A Personal/Team Knowledge Base
Problem-solving is always related to experience. It’s like caching. Once you have that experience, it takes no time to figure out the solution next time. This is why senior developers are more valuable in a team 😅
I would recommend documenting issues that you have handled throughout your career, including bugs, clever code snippets, or helpful resources.
Write down company-specific information on the shared wiki, or on your personal blog if it is public information.
This knowledge base is your “cache”. The next time you run into a similar issue, you know which metrics to look at and how to handle it.
Build a Problem-Solving Network
If you don’t have enough time to build a knowledge base yet, establish a network. Find out who the authority is in each domain. Start with your immediate team. Get to know their strengths. For example: Who's great at debugging? Who knows the codebase inside out? Knowing who to turn to can save you a lot of time.
Branch out beyond your team too. For example, find out who is an expert in authentication, caching, or databases. Once you have this network, you know immediately who to talk to when running into problems.
💡 Bonus tip: offer your help to others once you become an expert in a specific domain. Share your knowledge and help others solve problems. It's a great way to strengthen your own understanding.
Thank you for reading the post!
❤️ it if you like this article.
💬 Leave a comment if you have questions. Or let me know if you are interested in knowing any specific topics.
If you want more content like this, consider subscribing to my newsletter.
See you next time 👋