Now that you understand the full scope of your outage and the error group, you can assign the error and update its status. When you assign errors within New Relic, you can transfer all the information you've gathered to the code owners. Managing your errors inbox makes working across multiple teams easier. When the process is easy, implementing a resolution becomes quick and efficient.
This tutorial guides you through managing your errors so you can deploy fixes faster:
- Learn to assign errors to the correct teams
- Update the status of your errors
From the Error group summary page, you can assign the error group to the correct team.
Assigning an error to a person or team eliminates possible miscommunications. The information that helped you solve the error is delivered directly to the code owner, allowing them to pick up where you left off.
The assignment is then delivered to the team by email.
Once the error group is assigned, you can update its status.
Updating the status has a few benefits:
- If an error group is expected, you can mark it as Ignored. Expected errors are known to you and your team. They can be non-critical bugs, or errors caused by the end user (like someone entering an incorrect password). Where you can, though, we recommend resolving expected errors rather than ignoring them: ignoring an error group doesn't stop New Relic from reporting the error in the future, so it still contributes to your data ingest.
- New Relic tracks the status of an error over time. For example, if you mark an error group as Resolved but it appears at a later point with a new deployment, New Relic will mark that error as a Regression.
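To make the status lifecycle described above concrete, here's a minimal toy model in Python. It is only a sketch of the behavior this tutorial describes, not New Relic's implementation or API: the `ErrorGroup` class, its methods, and the `UNRESOLVED` default status are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UNRESOLVED = "Unresolved"  # assumed default; not named in this tutorial
    RESOLVED = "Resolved"
    IGNORED = "Ignored"
    REGRESSION = "Regression"


@dataclass
class ErrorGroup:
    """Hypothetical stand-in for an error group and its status history."""
    name: str
    status: Status = Status.UNRESOLVED
    history: list = field(default_factory=list)

    def set_status(self, new_status):
        # Record each transition so the status can be tracked over time.
        self.history.append((self.status, new_status))
        self.status = new_status

    def record_occurrence(self):
        # A group marked Resolved that shows up again counts as a regression.
        # Groups in any other status keep their current status.
        if self.status is Status.RESOLVED:
            self.set_status(Status.REGRESSION)


group = ErrorGroup("checkout failure")
group.set_status(Status.RESOLVED)   # the team ships a fix
group.record_occurrence()           # the error reappears after a new deployment
print(group.status.value)
```

In this model an Ignored group stays Ignored when it occurs again, which mirrors the point above: ignoring an error group doesn't stop the error from being reported.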
Whether you're reducing common errors or reacting to a critical outage, you're following data that leads you to the direct cause of an error occurrence. But finding the direct cause isn't the same as finding the root cause. You may have fixed the leaking pipe that flooded your yard, but you haven't discovered what caused the crack in the first place.
When you assign error groups to teams, it's easier to hold retrospectives where everyone identifies what processes led to an outage. To bring it back to your cracked pipe: you meet with a plumber and they tell you that the trees in your yard are growing into all of your pipes. Retrospectives where everyone can look at the same data naturally lead to improvements in your team's overall workflow.
Here are some common root causes of service outages:
- Improper assurance testing in pre-production environments.
- Failing to test every function or method within a codebase to ensure the results are as expected.
- Misunderstanding an upstream dependency's requirements, capacity, or limitations. For example, a database query might run well in pre-production with smaller loads, but begin to slow under stress.
- Lack of capacity planning. Maybe your code passes all its usual tests under ordinary loads, but when demand peaks, it doesn't perform.
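To illustrate the capacity-planning point, here's a minimal sketch of a deterministic backlog simulation. The function name and all the numbers are hypothetical, chosen only for illustration: a service that keeps up at ordinary load can still build a growing backlog once demand exceeds its capacity.

```python
def simulate_backlog(arrivals_per_tick, capacity_per_tick, ticks):
    """Return the number of requests still waiting after `ticks` intervals."""
    backlog = 0
    for _ in range(ticks):
        # Requests beyond capacity wait; the backlog never drops below zero.
        backlog = max(0, backlog + arrivals_per_tick - capacity_per_tick)
    return backlog


# Ordinary load stays under capacity, so nothing queues up.
ordinary = simulate_backlog(arrivals_per_tick=80, capacity_per_tick=100, ticks=60)

# Peak load exceeds capacity, so the backlog grows every interval.
peak = simulate_backlog(arrivals_per_tick=150, capacity_per_tick=100, ticks=60)

print(ordinary, peak)
```

A test suite that only exercises the ordinary case would pass here, which is why it's worth checking behavior at expected peak demand before production.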
Root causes can be as varied as the teams that exist. The takeaway, though, is to follow the data, communicate, and dig deeper than the direct cause.
Congratulations! You've learned how to use errors inbox to track down critical errors in your apps. In this tutorial series, you learned:
- How to identify which service to start with and prioritize your error groups
- How to use stack traces and logs to determine the nature of an error
- How to assign error groups to different teams
Now that you've learned how to use errors inbox to diagnose and resolve errors, you can explore our other tutorials:
- Interested in learning more about errors inbox? Check out our errors inbox doc for some best practices.
- If you're looking to solve incidents in your infrastructure, check out our tutorial about troubleshooting host data.
- Is your app slow? Check out our tutorial about troubleshooting slow app behavior.