
Circuit Breaker Pattern

In one of the previous posts we introduced the Eight Pillars of Fault-tolerant Systems, and today we will discuss the Circuit Breaker pattern.

The circuit breaker is a design pattern used in software development to prevent repeated requests to external services that are likely to fail. It can improve the stability and resiliency of applications that interact with unreliable dependencies and can prevent cascading failures in distributed systems. Services protected by circuit breakers can degrade gracefully rather than fail catastrophically if dependencies are unstable.

The name comes from an electrical circuit breaker, which trips and cuts power when the current exceeds safe levels. In a similar way, the circuit breaker design pattern stops making requests to a failing service once a failure threshold is reached.

Circuit breakers prevent cascading failures and allow time for recovery. Key benefits include:

  • Preventing repeated failed requests to a failing service. This protects downstream dependencies and prevents wasting resources.
  • Providing fallback behavior instead of blocking. The application can present an error message or retrieve data from a cache rather than just waiting.
  • Giving failing services time to recover and reset. The circuit remains open for a configured wait duration before trial requests begin again.
  • Reducing logging and monitoring overhead. Rapid failures are contained locally by the circuit breaker rather than bombarding logs.
  • Enabling graceful degradation over complete system failure. Parts of the system reliant on the failed service can degrade but other unaffected parts remain functional.

How Circuit Breakers Work

A circuit breaker acts as a proxy around calls to a downstream service. It has three distinct states that control the traffic flow:

Closed State

In the closed state, requests are allowed through to the downstream service as normal. The circuit breaker is essentially transparent, passing requests and responses unchanged. This is the default normal operation when the circuit is healthy.

Circuit Breaker: Closed State

Open State

If failures reach a configured threshold, the circuit breaker "trips" into the open state. All further requests are immediately rejected without being sent to the downstream service. The circuit breaker acts as an open circuit: no traffic flows through.

This open state allows the downstream service time to recover, preventing a buildup of congestion from repetitive failed calls. The application can redirect requests and display fallbacks, avoiding cascading failures.

Circuit Breaker: Open State

Half-Open State

After the circuit breaker remains in the open state for a set duration, it transitions into a half-open state. A limited number of requests are let through as a trial to test if the downstream service has recovered.

If the trial requests succeed, the circuit breaker resets back to the closed state. If they continue to fail, the circuit immediately reverts to the open state. This trial period allows intermittent issues to resolve without flooding an unstable service with traffic.

Circuit Breaker: Half-Open State
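
Before we look at a full implementation, the three states and their transitions can be distilled into a few lines of code. The sketch below is a hypothetical helper (nextState and its parameters are made up for illustration; the constants match the implementation later in this post) and only encodes the transition rules described above:

import "time"

// The three states of the breaker.
const (
	StateClosed = iota
	StateOpen
	StateHalfOpen
)

// nextState distills the transition rules: Closed -> Open when failures reach
// the threshold, Open -> HalfOpen once the wait duration has elapsed, and
// HalfOpen -> Closed after enough successful trials (or back to Open on any
// trial failure).
func nextState(state, failures, successes, failureLimit, successLimit int,
	openedAt time.Time, wait time.Duration, lastCallFailed bool) int {
	switch state {
	case StateClosed:
		if failures >= failureLimit {
			return StateOpen
		}
	case StateOpen:
		if time.Since(openedAt) >= wait {
			return StateHalfOpen
		}
	case StateHalfOpen:
		if lastCallFailed {
			return StateOpen
		}
		if successes >= successLimit {
			return StateClosed
		}
	}
	return state
}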

Configuration Settings

Two key settings control the behavior of a circuit breaker:

  • Failure Threshold - The failure percentage threshold that trips the circuit to open. For example, >50% failures in the last 10 seconds.
  • Wait Duration - The time a circuit breaker remains in open state before transitioning to half-open for trials. For example, 30 seconds.

Lower failure thresholds and longer wait durations result in a circuit breaker that opens more frequently and remains open for longer periods. The settings should be tuned based on the volatility of the downstream service and desired failure modes.

Simple Implementation in Golang

As we usually do in our blog posts, let's take the theoretical concept and see how we can implement it using Golang:

package main

import (
	"errors"
	"sync"
	"time"
)

// Constants representing the fundamental states of a CircuitBreaker.
const (
	StateClosed = iota
	StateOpen
	StateHalfOpen
)

// CircuitBreaker struct captures the state and behavior of the breaker.
type CircuitBreaker struct {
	state          int          // The current state of the circuit breaker.
	failures       int          // Count of recent failures.
	successes      int          // Count of recent successes.
	nextReset      time.Time    // Time to reset the circuit breaker state.
	mtx            sync.Mutex   // Mutex to ensure thread-safety.
	failureLimit   int          // Threshold of failures to trip the breaker.
	successLimit   int          // Threshold of successes to reset the breaker.
	halfOpenMaxReq int          // Maximum requests in half-open state.
	timeout        time.Duration // Duration before moving from open to half-open.
}

// NewCircuitBreaker initializes a new CircuitBreaker with given parameters.
func NewCircuitBreaker(failureLimit, successLimit, halfOpenMaxReq int, timeout time.Duration) *CircuitBreaker {
	return &CircuitBreaker{
		state:          StateClosed,
		failureLimit:   failureLimit,
		successLimit:   successLimit,
		halfOpenMaxReq: halfOpenMaxReq,
		timeout:        timeout,
	}
}

// Call executes a given function through the circuit breaker.
// If the circuit is in the open state and hasn't timed out, the function won't be executed.
func (cb *CircuitBreaker) Call(fn func() error) error {
	cb.mtx.Lock()          // Lock to ensure thread-safety.
	defer cb.mtx.Unlock()  // Unlock after the function call.

	// If the circuit is open, reject calls until the wait duration has elapsed,
	// then transition to half-open to allow a limited number of trial requests.
	if cb.state == StateOpen {
		if time.Now().Before(cb.nextReset) {
			return errors.New("circuit breaker is open")
		}
		cb.state = StateHalfOpen
		cb.successes = 0
	}

	// If in half-open state but the maximum number of trial requests has been made.
	if cb.state == StateHalfOpen && cb.successes >= cb.halfOpenMaxReq {
		return errors.New("max requests reached in half-open state")
	}

	// Execute the function.
	err := fn()
	
	// If the function execution results in an error.
	// Note: failures accumulate until the breaker resets; a production
	// implementation would typically track failures over a rolling window.
	if err != nil {
		cb.failures++
		// Any failure in half-open, or reaching the threshold in closed,
		// (re)opens the circuit.
		if cb.state == StateHalfOpen || cb.failures >= cb.failureLimit {
			cb.state = StateOpen
			cb.nextReset = time.Now().Add(cb.timeout) // Set the next reset time.
		}
		return err
	}

	// If in half-open state and function executes successfully.
	if cb.state == StateHalfOpen {
		cb.successes++
		if cb.successes >= cb.successLimit {  // If successes exceed threshold.
			cb.reset()
		}
	}
	return nil
}

// reset resets the CircuitBreaker's counts and sets its state to closed.
func (cb *CircuitBreaker) reset() {
	cb.failures = 0
	cb.successes = 0
	cb.state = StateClosed
}

func main() {
	// 3 failures trip the breaker; 2 successful trials (out of up to 2 allowed) close it again.
	cb := NewCircuitBreaker(3, 2, 2, 5*time.Second)
	// Add your function calls using cb.Call(...)
}

In the above code:

  1. CircuitBreaker struct captures the state of the breaker, failures and successes count, and the configuration parameters.
  2. The Call method executes a given function. While the circuit is open it rejects the call immediately; once the wait duration has elapsed it lets a limited number of trial requests through in the half-open state. If a call results in an error, the failure count is incremented.
  3. The circuit breaker transitions between states based on the failure count, success count, and the pre-defined thresholds.

Now let's add an example of calling an unreliable external API. We'll use a mock function and a new main that replaces the stub above (this also requires adding "fmt" and "math/rand" to the imports):

func mockAPI() error {
	if rand.Float32() > 0.7 { // 30% chance of failure.
		return errors.New("API request failed")
	}
	return nil
}

func main() {
	rand.Seed(time.Now().UnixNano())
	cb := NewCircuitBreaker(3, 2, 2, 5*time.Second)

	for i := 0; i < 20; i++ {
		err := cb.Call(mockAPI)
		if err != nil {
			fmt.Printf("Request %d failed with: %s\n", i+1, err)
		} else {
			fmt.Printf("Request %d succeeded\n", i+1)
		}
		time.Sleep(500 * time.Millisecond)
	}
}

When you run this code:

  1. The mockAPI function simulates our external API. It has a 30% chance to fail.
  2. In the main function, we loop 20 times, making requests to our mock API via the circuit breaker.
  3. If the API fails consistently (with our failure threshold set to 3), the circuit breaker will open, and further requests will be blocked until the timeout period has elapsed.
  4. You'll observe the circuit breaker in action, sometimes allowing requests and sometimes blocking them based on the failure rates.

This is the result:

Request 1 failed with: API request failed
Request 2 succeeded
Request 3 succeeded
Request 4 succeeded
Request 5 succeeded
Request 6 failed with: API request failed
Request 7 succeeded
Request 8 succeeded
Request 9 succeeded
Request 10 succeeded
Request 11 failed with: API request failed
Request 12 failed with: circuit breaker is open
Request 13 failed with: circuit breaker is open
Request 14 failed with: circuit breaker is open
Request 15 failed with: circuit breaker is open
Request 16 failed with: circuit breaker is open
Request 17 failed with: circuit breaker is open
Request 18 failed with: circuit breaker is open
Request 19 failed with: circuit breaker is open
Request 20 failed with: circuit breaker is open
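
In the run above, every call made while the breaker is open simply surfaces an error to the caller. In a real application you would typically pair the breaker with a fallback, as mentioned earlier: serve a cached or default value instead of failing. Here is a hypothetical sketch built on the Call method from our implementation (callWithFallback and its parameters are made up for illustration):

// callWithFallback runs the primary function through the circuit breaker and,
// if the breaker rejects the call or the call itself fails, returns a fallback
// value instead of an error.
func callWithFallback(cb *CircuitBreaker, primary func() (string, error), fallback string) string {
	var result string
	err := cb.Call(func() error {
		r, err := primary()
		if err != nil {
			return err
		}
		result = r
		return nil
	})
	if err != nil {
		// Breaker is open or the call failed: degrade gracefully.
		return fallback
	}
	return result
}

For example, callWithFallback(cb, fetchPrice, lastKnownPrice) could keep serving the last known price while the pricing service is down, instead of propagating "circuit breaker is open" errors to users.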

Available Circuit Breaker Libraries

Thankfully, there are a number of libraries available in various languages to help implement the circuit breaker pattern without having to reinvent the wheel. Below are some of the prominent circuit breaker libraries across different programming languages:

Java

  • Resilience4j - Provides circuit breakers, rate limiters, retries and more. Integrates with Spring Boot.
  • Hystrix - A Netflix library that implements circuit breaker and thread isolation patterns (now in maintenance mode).

Scala

  • akka-circuitbreaker - Akka's circuit breaker pattern implementation, which is useful when combined with its actor model.

Go

  • Gobreaker - A straightforward implementation of the circuit breaker pattern.
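
As a quick illustration, here is roughly how the configuration settings discussed earlier map onto gobreaker. This is a sketch based on the library's documented Settings and Execute API (the thresholds are arbitrary); check the project's README for the authoritative usage:

package main

import (
	"fmt"
	"time"

	"github.com/sony/gobreaker"
)

func main() {
	cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name:    "example-api",
		Timeout: 30 * time.Second, // wait duration before moving from open to half-open
		ReadyToTrip: func(counts gobreaker.Counts) bool {
			// Trip when more than 50% of recent requests have failed.
			failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
			return counts.Requests >= 10 && failureRatio > 0.5
		},
	})

	result, err := cb.Execute(func() (interface{}, error) {
		// Call the downstream service here and return its result or error.
		return "ok", nil
	})
	fmt.Println(result, err)
}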

JavaScript

  • Opossum - Circuit breaker for Node.js based on the Netflix Hystrix library.

Python

  • PyBreaker - Circuit breaker implemented as a Python context manager.
  • CircuitBreaker - Python circuit breaker package with rate limiting and fallback support.

Ruby

  • CircuitBox - Ruby gem implementing fully featured circuit breakers.

Faulty Breakers

Circuit breakers seem great in theory - they fail fast and prevent cascading failures in distributed systems. But as with everything, there is a catch: they can actually make service degradation worse if not designed properly.

The key gotcha is that circuit breakers assume that a service is either entirely up or entirely down. But modern distributed systems often degrade partially, via mechanisms like sharding or cell-based architectures.

Take a sharded NoSQL database for example. If one shard becomes overloaded, requests to that shard will start failing while other shards are still operational. An over-eager circuit breaker could trip and treat the whole service as down.

Now requests that would have succeeded are also failing unnecessarily. The circuit breaker has transformed partial degradation into total failure.
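
One common mitigation is to scope breakers to the failure domain rather than to the whole service, for example one breaker per shard or per host. Here is a hypothetical sketch that reuses the CircuitBreaker type from earlier; shardBreakers and the shard-routing details are made up for illustration:

// shardBreakers keeps one circuit breaker per shard so that an overloaded
// shard trips only its own breaker without cutting off traffic to healthy shards.
type shardBreakers struct {
	mtx      sync.Mutex
	breakers map[string]*CircuitBreaker
}

func newShardBreakers() *shardBreakers {
	return &shardBreakers{breakers: make(map[string]*CircuitBreaker)}
}

// Call routes the request through the breaker belonging to the given shard,
// creating it lazily on first use.
func (s *shardBreakers) Call(shardKey string, fn func() error) error {
	s.mtx.Lock()
	cb, ok := s.breakers[shardKey]
	if !ok {
		cb = NewCircuitBreaker(3, 2, 2, 5*time.Second)
		s.breakers[shardKey] = cb
	}
	s.mtx.Unlock()
	return cb.Call(fn)
}

With this, a failure storm on one shard opens only that shard's breaker, while requests to the remaining shards continue to flow.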

Check out this great blog post for more details and other potential solutions to this problem.

Conclusion

The circuit breaker pattern is an invaluable tool for building resilient applications that interact with unreliable dependencies. It enables graceful degradation rather than cascading failures when downstream services are unstable.

Implementations are available across many languages and frameworks, so make sure to put them to good use in your systems.