At Banzai Cloud we are building Pipeline, a feature-rich, enterprise-grade application platform built for containers on top of Kubernetes. The platform consists of many building blocks (over 30 components), but they share one commonality: they are all developed in Golang. We are very fond of Go, so in this post we’d like to share the error handling practices that our team of 20+ developers adheres to while building the Pipeline platform.
Errors are part of everyday life. We learn and get better at what we do by making mistakes, and these mistakes serve as the keystone of progress and evolution. In software development, however, errors don’t have as positive an effect.
They can cause a lot of trouble, so it’s vital to gather as much information as possible about errors, including both technical details (e.g. stack trace) and contextual details (e.g. correlation with other events in the system).
Unfortunately, error handling in Go is relatively unconventional compared to other high-level languages.
Apart from a few ground rules set by the authors (like returning a type implementing the builtin error interface as the last return value), the language prescribes very little, even though error handling is a vital area.
In this post we are going to present a few error handling practices which improve the observability of a system, and which make debugging easier.
Return custom errors
The builtin error interface requires the implementation of a single Error() string method.
When it comes to actually handling the error there is not much this interface does to help.
You could parse the error message, but that will likely result in a code smell:
errors are part of the public API and this approach would make the error message part of the API as well.
Error messages are for humans, so let’s find another way.
One alternative is to return custom error values, called sentinel errors.
These kinds of errors can be found in the Go standard library (sql.ErrNoRows, io.EOF, etc.).
They are useful in that they indicate if a certain kind of error has happened (like a database query returning nothing),
but they cannot provide any additional context, so sentinel errors are not a very flexible tool.
On the other hand, they are easy to handle, since they’re based on a simple value equality:
var name string

// QueryRow returns a *sql.Row; the error only surfaces when Scan is called.
err := db.QueryRow("SELECT name FROM users WHERE id = ?", userID).Scan(&name)
if err == sql.ErrNoRows {
	// handle record not found error
} else if err != nil {
	// something else went wrong
}
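The same pattern works for your own packages. The following is a minimal sketch, assuming a hypothetical ErrUserNotFound sentinel and a stand-in findUser function (neither comes from Pipeline); it also shows the limitation mentioned above, since the sentinel cannot say which ID was missing:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUserNotFound is a hypothetical sentinel error defined for this sketch.
var ErrUserNotFound = errors.New("user not found")

// findUser is a stand-in for a real lookup; it only knows user 1.
func findUser(id int64) (string, error) {
	if id == 1 {
		return "alice", nil
	}

	// The sentinel carries no information about the requested ID.
	return "", ErrUserNotFound
}

func main() {
	_, err := findUser(42)
	if err == ErrUserNotFound {
		fmt.Println("no such user")
	} else if err != nil {
		fmt.Println("unexpected error:", err)
	}
}
```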
Sometimes communicating the type of the error is simply not enough; more context is necessary to properly handle the error. Sticking to the example above, an imaginary user repository would handle a specific user not being found by returning the user ID as part of the error:
type UserNotFound struct {
	ID int64
}

func (e *UserNotFound) Error() string {
	return fmt.Sprintf("user with ID %d not found", e.ID)
}
Handling this error is possible using type assertion:
user, err := repository.GetUser(1)

switch err := err.(type) {
case nil:
	// call succeeded, nothing to do

case *UserNotFound:
	// user not found

default:
	// unknown error
}
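Put together, a self-contained version of this approach might look like the sketch below. The getUser and describe functions are hypothetical helpers added only to make the example runnable; the error type is the one from the text:

```go
package main

import "fmt"

// UserNotFound mirrors the error type shown in the text.
type UserNotFound struct {
	ID int64
}

func (e *UserNotFound) Error() string {
	return fmt.Sprintf("user with ID %d not found", e.ID)
}

// getUser is a hypothetical repository lookup that only knows user 1.
func getUser(id int64) (string, error) {
	if id != 1 {
		return "", &UserNotFound{ID: id}
	}

	return "alice", nil
}

// describe branches on the concrete error type and reads its fields.
func describe(err error) string {
	switch e := err.(type) {
	case nil:
		return "ok"

	case *UserNotFound:
		return fmt.Sprintf("missing user %d", e.ID)

	default:
		return "unknown error: " + e.Error()
	}
}

func main() {
	_, err := getUser(7)
	fmt.Println(describe(err)) // missing user 7
}
```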
The two solutions given above are effective when working with specific implementations, but they are less than optimal when dealing with interfaces. Normally, in Go, contracts are defined by the caller and, thanks to the implicit interface implementation, that’s an easy thing to do. However, as soon as a contract requires a specific type or value to be returned, the implementing packages become strongly coupled to the package defining the errors, making the API fragile and less easy to use.
To overcome this problem we can combine custom error types with the following idea: assert errors based on behavior, not by type.
This way we can define an API for errors without exporting the specific types:
type userNotFound interface {
	UserNotFound() (bool, int64)
}

func IsUserNotFound(err error) (bool, int64) {
	if e, ok := err.(userNotFound); ok {
		return e.UserNotFound()
	}

	return false, 0
}

user, err := repository.GetUser(1)

if ok, id := IsUserNotFound(err); ok {
	// user not found, log the user id
}
And create the error types in the implementation package:
type userNotFound struct {
	id int64
}

func (e *userNotFound) Error() string {
	return fmt.Sprintf("user with ID %d not found", e.id)
}

func (e *userNotFound) UserNotFound() (bool, int64) {
	return true, e.id
}
The caller still defines the contract and reflects the error’s behavior, but the implementations are no longer coupled to it.
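To see both sides working together, here is a single-file sketch. In a real project the interface and the error type would live in different packages; because they share a file here, the implementation type is renamed to the hypothetical missingUser:

```go
package main

import (
	"errors"
	"fmt"
)

// userNotFound is the caller-side contract from the text.
type userNotFound interface {
	UserNotFound() (bool, int64)
}

// IsUserNotFound reports whether err exposes the behavior, and for which ID.
func IsUserNotFound(err error) (bool, int64) {
	if e, ok := err.(userNotFound); ok {
		return e.UserNotFound()
	}

	return false, 0
}

// missingUser stands in for the unexported error type of an implementation
// package. The name is an assumption made for this sketch.
type missingUser struct {
	id int64
}

func (e *missingUser) Error() string {
	return fmt.Sprintf("user with ID %d not found", e.id)
}

func (e *missingUser) UserNotFound() (bool, int64) {
	return true, e.id
}

func main() {
	var err error = &missingUser{id: 7}
	if ok, id := IsUserNotFound(err); ok {
		fmt.Println("user not found:", id)
	}

	// An unrelated error does not satisfy the contract.
	ok, _ := IsUserNotFound(errors.New("boom"))
	fmt.Println(ok)
}
```

The caller never names missingUser; it only asks whether the error behaves like a "user not found" error.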
Don’t just return errors
When going up in the application stack, it’s important to handle each and every error, even if it means returning them eventually.
Let’s take a look at the following code:
func Authorize(u *User) error {
	err := authorizer.Authorize(u)
	if err != nil {
		return err
	}

	return nil
}
An obvious improvement to this function would be to simply return the result of the authorizer.Authorize(u) call:
func Authorize(u *User) error {
	return authorizer.Authorize(u)
}
Now imagine that you call the REST interface of the application and receive the following error:
no such file or directory
Good luck finding the root cause. :)
You could probably trace the error back to the Authorize function, but having just a little extra information might save you a lot of time:
authorization failed: no such file or directory
One way to do that is by simply creating a new error:
func Authorize(u *User) error {
	err := authorizer.Authorize(u)
	if err != nil {
		return fmt.Errorf("authorization failed: %v", err)
	}

	return nil
}
Unfortunately, this solution doesn’t work well with the opaque errors described above, as the original error is lost: the new error keeps only the message, not the type or behavior of the original.
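The loss is easy to demonstrate. The sketch below formats an error with %v, exactly as the Authorize example does, and then checks whether a type assertion still succeeds. The annotate, isPathError and sample helpers, as well as the policy.json path, are made up for illustration:

```go
package main

import (
	"fmt"
	"os"
)

// annotate adds a message the same way the Authorize example does.
func annotate(err error) error {
	return fmt.Errorf("authorization failed: %v", err)
}

// isPathError reports whether err is still an *os.PathError.
func isPathError(err error) bool {
	_, ok := err.(*os.PathError)
	return ok
}

// sample builds an error by hand, standing in for a failed file read.
func sample() error {
	return &os.PathError{Op: "open", Path: "policy.json", Err: os.ErrNotExist}
}

func main() {
	fmt.Println(isPathError(sample()))           // true
	fmt.Println(isPathError(annotate(sample()))) // false
}
```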
Wrap errors
Sometimes we need to be able to access the original error and add context at the same time.
A commonly accepted practice in Go is to wrap the error in another, while keeping the original:
type authorizationError struct {
	operation string

	// original error
	err error
}

func (e *authorizationError) Error() string {
	return fmt.Sprintf("authorization failed during %s: %v", e.operation, e.err)
}
That’s nice, but we need a way to access the original error. To keep the flexibility of opaque errors let’s define a common contract for this as well:
type causer interface {
	Cause() error
}

func Cause(err error) error {
	if e, ok := err.(causer); ok {
		return e.Cause()
	}

	return nil
}
This way any error that implements this interface exposes the original error for further examination:
func (e *authorizationError) Cause() error {
	return e.err
}
Of course it is possible that we might need to wrap errors more than once, so with a little change we can make our Cause function return the root cause of the error:
func Cause(err error) error {
	for err != nil {
		cause, ok := err.(causer)
		if !ok {
			break
		}

		err = cause.Cause()
	}

	return err
}
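As a quick check, the sketch below wraps an error twice and unwraps it again. The wrapped type and the errDisk value are hypothetical; any type with a Cause method would do:

```go
package main

import (
	"errors"
	"fmt"
)

type causer interface {
	Cause() error
}

// Cause walks the chain until it reaches an error that wraps nothing.
func Cause(err error) error {
	for err != nil {
		cause, ok := err.(causer)
		if !ok {
			break
		}

		err = cause.Cause()
	}

	return err
}

// wrapped is a hypothetical general-purpose wrapper used only in this sketch.
type wrapped struct {
	msg string
	err error
}

func (w *wrapped) Error() string { return w.msg + ": " + w.err.Error() }
func (w *wrapped) Cause() error  { return w.err }

// errDisk plays the role of the root cause.
var errDisk = errors.New("disk full")

func main() {
	err := &wrapped{"request failed", &wrapped{"save failed", errDisk}}

	fmt.Println(err)                   // request failed: save failed: disk full
	fmt.Println(Cause(err) == errDisk) // true
}
```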
Extracting context from an error
Go’s philosophy surrounding errors dictates that we should handle them gracefully whenever possible. We can do a couple of things to adhere to that philosophy (e.g. retrying an operation), but sometimes our best option is to let the error bubble up to the top of the application and handle it there.
At that level, the most common way of handling an error is logging it. When using a structured logger, information is usually represented as a set of key-value pairs. But at this juncture we don’t want to handle specific errors anymore, so we need a generic solution to extract data from errors.
As usual, let’s create a contract on the caller side:
type contextor interface {
	Context() map[string]interface{}
}

func Context(err error) map[string]interface{} {
	// make returns a usable map; new would return a pointer to a nil map
	ctx := make(map[string]interface{})

	if e, ok := err.(contextor); ok {
		ctx = e.Context()
	}

	return ctx
}
Errors implementing this interface can provide context as key-value pairs.
Using the above code with logrus:
err := doSomething()

logrus.WithFields(logrus.Fields(Context(err))).Error(err)
In combination with the causer interface from the previous example, we can gather context from each wrapped error:
func Context(err error) map[string]interface{} {
	ctx := make(map[string]interface{})

	for err != nil {
		if e, ok := err.(contextor); ok {
			for key, value := range e.Context() {
				ctx[key] = value
			}
		}

		cause, ok := err.(causer)
		if !ok {
			break
		}

		err = cause.Cause()
	}

	return ctx
}
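Here is a runnable sketch of that traversal. The withContext wrapper, the sample function and the keys used are assumptions made for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

type causer interface {
	Cause() error
}

type contextor interface {
	Context() map[string]interface{}
}

// Context merges the key-value pairs of every error in the chain.
func Context(err error) map[string]interface{} {
	ctx := make(map[string]interface{})

	for err != nil {
		if e, ok := err.(contextor); ok {
			for key, value := range e.Context() {
				ctx[key] = value
			}
		}

		cause, ok := err.(causer)
		if !ok {
			break
		}

		err = cause.Cause()
	}

	return ctx
}

// withContext is a hypothetical wrapper that attaches key-value pairs.
type withContext struct {
	err error
	ctx map[string]interface{}
}

func (w *withContext) Error() string                   { return w.err.Error() }
func (w *withContext) Cause() error                    { return w.err }
func (w *withContext) Context() map[string]interface{} { return w.ctx }

// sample builds a two-level chain with one overlapping key ("attempt").
func sample() error {
	base := errors.New("connection refused")
	inner := &withContext{base, map[string]interface{}{"host": "db-1", "attempt": 1}}

	return &withContext{inner, map[string]interface{}{"attempt": 2, "user_id": 42}}
}

func main() {
	fmt.Println(Context(sample()))
}
```

One consequence of walking from the outermost error inwards is that, when two levels use the same key, the value closest to the root cause wins ("attempt" ends up as 1 here). Whether that is the desired precedence depends on the application.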
Alright, give me the tools!
The techniques and patterns explained here work fine on their own or in combination, and many of them can be reused between projects… however, if something can be reused, there is already an open source library for it.
github.com/pkg/errors
This package is a drop-in replacement for the errors package in the standard library. It provides generic tools that work with wrapped errors, implements the “causer pattern” and provides an interface for adding an extra message and a stack trace to the error.
Here is a detailed example for using the package:
package main

import (
	"fmt"
	"os"

	"github.com/pkg/errors"
)

type stackTracer interface {
	StackTrace() errors.StackTrace
}

func main() {
	err := bar()
	if err != nil {
		fmt.Println(err) // Output: bar went wrong: foo went wrong

		if err, ok := err.(stackTracer); os.Getenv("DEBUG") != "" && ok {
			fmt.Printf("%+v", err.StackTrace()[0:2]) // top two frames
		}
	}
}

type fooError struct{}

func (*fooError) Error() string {
	return "foo went wrong"
}

func foo() error {
	return &fooError{}
}

func bar() error {
	err := foo()
	if err != nil {
		return errors.Wrap(err, "bar went wrong")
	}

	return nil
}
For more examples and use cases check the godoc page of the library.
github.com/goph/emperror
The emperror package depends on github.com/pkg/errors and implements the rest of the patterns mentioned in this article.
For example, you can add context to an error without creating a new type:
func doSomething() error {
	return emperror.With(
		errors.New("some error"),
		"key", "value",
	)
}
As you can see, the context in this case is not a map, but a variadic interface{} slice.
This interface is based on go-kit’s logger interface.
It allows you to implement any higher level logging/error handling solution, wherein the context is not necessarily represented as key-value pairs.
Emperror also introduces an “error handler”:
type handler interface {
	Handle(err error)
}
An error handler is the final destination of an error if every attempt to handle it has failed. It typically forwards the error to a logger, an error reporting service, etc.
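A handler of this shape is simple to write yourself. The sketch below uses only the standard library and is not taken from emperror: logHandler, newLogHandler, doSomething and run are hypothetical names, and the handler simply writes the error to a log.Logger:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"log"
	"strings"
)

type handler interface {
	Handle(err error)
}

// logHandler is a hypothetical handler backed by a standard library logger.
type logHandler struct {
	logger *log.Logger
}

// newLogHandler writes to w without timestamps to keep the output stable.
func newLogHandler(w io.Writer) *logHandler {
	return &logHandler{logger: log.New(w, "", 0)}
}

// Handle logs the error; nil errors are ignored.
func (h *logHandler) Handle(err error) {
	if err == nil {
		return
	}

	h.logger.Printf("unhandled error: %v", err)
}

// doSomething is a stand-in for an operation that fails.
func doSomething() error {
	return errors.New("some error")
}

// run wires everything together and returns what the handler wrote.
func run() string {
	var out strings.Builder

	var h handler = newLogHandler(&out)
	if err := doSomething(); err != nil {
		h.Handle(err)
	}

	return out.String()
}

func main() {
	fmt.Print(run())
}
```

Because callers depend only on the small handler interface, the logger could later be swapped for an error reporting service without touching them.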
Conclusion
Error handling in Go is not the easiest thing to accomplish, despite it being such an important topic. However, the above pattern implementations and tools can help to make things easier, and, if you take a closer look, you may find they are not too different from other languages.
Further reading
https://dave.cheney.net/2016/04/27/dont-just-check-errors-handle-them-gracefully