As mentioned in this week's Sunday Reboot, I've recovered a few of my old posts from the Wayback Machine, all from my time at the now shut-down Fixate.io (via their blog, Sweetcode). These are all freelance topics, so the style of writing is a bit different than normal (for me, at least), and the topics are a little wider-ranging than I would normally write about here. I can't remember for certain, but I think this one was originally written for Rollbar and ultimately scrapped for whatever reason.
Original Title: Error Tracking and Continuous Visibility: Why to Track Errors Across the CI/CD Pipeline
User-generated error reports are valuable for identifying the pressing issues in your application. Yet they are also notoriously unreliable and difficult to read.
To a user, every problem is a major one, and the tone of voice that often accompanies user-generated reports can be difficult to parse and diagnose. Users just want their problems fixed. Asking them to run through a series of tests to find reproduction steps and appropriate error messages can be trying at best. At worst, the users don't follow through, and you can't get the details you need to fix a problem.
This is why error tracking solutions like Rollbar and RayGun are so important. They provide the continuous visibility that you need to maintain user expectations, without the subjectivity and inconsistency of user-generated error reports.
Continuous visibility means the ability to trace errors at all stages of the CI/CD workflow. Continuous visibility is much easier than relying on post-deployment error reports to identify and attempt to trace software problems. As I explain in this post, with the help of tools like Rollbar, you can find and fix errors throughout your continuous delivery process, and, ideally, avoid releasing user-impacting bugs into production altogether.
Achieving Continuous Visibility at All Stages of the CI/CD Pipeline
Why integrate error tracking into all stages of the development process? Because integrating error tracking into development and staging environments, as well as production, provides transparency into the health of your application, eases the diagnosis of issues, and identifies problems before users even know they exist.
In addition, by integrating error tracking into your entire workflow, it becomes easier to identify environment-specific errors by tracking exactly which environments and branches your errors exist in.
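To make the idea of environment-specific errors concrete, here is a minimal sketch in Python. It assumes each error report carries an environment name and a branch name alongside a grouping key. The `ErrorEvent` class, its fields, and the `environment_specific` helper are hypothetical names made up for illustration, not the API of Rollbar or any other tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    """Hypothetical error report: a grouping key plus where it happened."""
    fingerprint: str
    environment: str
    branch: str


def environment_specific(events):
    """Map each fingerprint seen in exactly one environment to that environment."""
    environments = defaultdict(set)
    for event in events:
        environments[event.fingerprint].add(event.environment)
    return {
        fingerprint: next(iter(envs))
        for fingerprint, envs in environments.items()
        if len(envs) == 1
    }


events = [
    ErrorEvent("KeyError:checkout", "staging", "feature/cart"),
    ErrorEvent("KeyError:checkout", "production", "main"),
    ErrorEvent("TimeoutError:search", "staging", "feature/search"),
]

print(environment_specific(events))
# prints {'TimeoutError:search': 'staging'}
```

The checkout error shows up in both staging and production, so it is not reported as environment-specific. The search timeout has only been seen in staging, which suggests it is either tied to that environment or has not yet shipped.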
Test Suites are Not Enough
You might be thinking: "I run software tests pre-deployment, so I'm covered. I already have continuous visibility into my CI/CD pipeline."
In reality, however, software testing is not the same thing as error tracking, and it doesn't deliver the same type of continuous visibility. Your Jenkins (or TeamCity, or Bamboo, or whatever CI server you prefer) integration tests help you do what their name implies: test code as you integrate it into your codebase. Meanwhile, software usability, performance, and quality tests using platforms like Selenium or Cucumber help you verify that your application works as expected, but they won't find every application error or identify the root cause of the errors they do find.
In short, a typical test suite may show you what failed, and possibly even how. Yet not every exception results in a failed test. Plus, in the case of poorly written tests—which happens to the best of us—or integration tests, a successful test may trigger an exception that you never see because you aren't looking for it.
APM Is Not Enough
Just as pre-deployment software test suites aren't enough to provide continuous visibility, post-deployment monitoring also fails to deliver all of the visibility you need in order to guarantee a positive user experience.
SolarWinds, Nagios, Splunk and the like are all great tools, and you should include them in your stack. But don't make the mistake of thinking that they cover all of your post-deployment visibility needs. They will help to notify you when something in your application breaks or experiences a service degradation, but in most cases they won't identify application errors. Your Web app could even be spitting out error messages that your users see. As long as the application continues to run, an APM tool generally won't notice that there is a problem.
APM tools look for performance patterns that seem out of the ordinary, and for services that stop being available. They often lack the ability to track errors at the application level unless those errors cause the application to go down entirely.
Where Error Tracking Fits In
All of the above is why error tracking is so important. Adding error tracking to all stages of your CI/CD pipeline bolsters the visibility that automated testing provides pre-deployment, and that APM provides post-deployment. Error tracking offers unique visibility that other types of tools don't provide.
In addition, error tracking tools help you not only to identify problems before they reach a production environment, but also to identify exactly when tricky issues get introduced into the codebase.
Identifying the when of an error is particularly important. Not every issue is guaranteed to be raised by customers immediately after a deployment. It can therefore be difficult to identify exactly how long something has been a problem. With a thorough testing routine that involves automated tests as well as error tracking tools like Rollbar, exceptions are far more likely to be raised almost immediately. This will give you much more insight into your application, both in and out of the test process.
When it comes to tracking errors, there is no such thing as too much data. By integrating error monitoring into your CI/CD process, you can achieve continuous visibility into a great deal more than your test suite and build process.
The Value of Error Data: Identifying Trends
Error data is useful not just for fixing a particular problem, but also for identifying trends in your development process, and tracking how well your team resolves issues as they happen. If an issue is identified in a production environment, the next step in the process should be replicating that issue in your test suite, and then pushing up a fix for it. If you can replicate a problem in your CI/CD suite, you can track the exact moment when it gets fixed, and verify that fix immediately. Once you mark the error as resolved in your error tracking solution, it can continue monitoring the error in production, and notify you if it reoccurs.
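The resolve-then-watch cycle described above can be sketched as a small in-memory tracker in Python. This is an illustration under stated assumptions, not how any particular product works: the `Tracker` and `ErrorRecord` classes, their methods, and the notification format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorRecord:
    """Hypothetical per-error state kept by the tracker."""
    fingerprint: str
    occurrences: int = 0
    resolved: bool = False


class Tracker:
    """Toy error tracker that flags errors coming back after being resolved."""

    def __init__(self):
        self.records = {}
        self.notifications = []

    def report(self, fingerprint):
        record = self.records.setdefault(fingerprint, ErrorRecord(fingerprint))
        record.occurrences += 1
        if record.resolved:
            # The error came back after a fix was marked: reopen and notify
            record.resolved = False
            self.notifications.append(f"regression: {fingerprint}")
        return record

    def resolve(self, fingerprint):
        self.records[fingerprint].resolved = True


tracker = Tracker()
tracker.report("KeyError:checkout")   # first seen in production
tracker.resolve("KeyError:checkout")  # fix verified in the CI/CD suite
tracker.report("KeyError:checkout")   # the same error shows up again

print(tracker.notifications)
# prints ['regression: KeyError:checkout']
```

The first report produces no notification, since the error is new rather than a regression. Only a report that arrives after `resolve` triggers one, which mirrors the idea of being told when a supposedly fixed problem returns.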
Ultimately, the power of exception monitoring isn't in the identification of issues, but in the tracking of them. Knowing the who, what, where, when, and why of an issue will provide far more insight into your application than ever before, and by integrating this information into other areas of your application development process—like code reviews and retrospectives—you will be far more prepared to handle issues when users run into them.