The Fail Fast Principle

In a previous post we introduced the Eight Pillars of Fault-tolerant Systems. Today we will discuss the fail fast principle.

The fail fast principle is a design pattern in software development: an application reports any exception immediately rather than trying to continue execution. The aim is to detect and propagate failures right away so that localized faults do not cascade across system components.

Applying fail fast principles in distributed architectures provides several advantages:

  • Localizes failures - Stopping a failing component quickly contains the issue before it cascades. Failures stay isolated to specific services.
  • Reduces debugging costs - When processes terminate immediately at the source of errors, it's easier to pinpoint root causes based on crash logs and traces.
  • Allows graceful degradation - When services shut down rapidly, load balancers can route traffic to healthy nodes. The overall system remains operational (in a degraded mode).
  • Improves reliability - By assuming processes can crash at any time, developers build more resilient systems. Failures are handled gracefully.

Practical Examples

Let's consider three scenarios where the fail fast pattern applies.

Failing Fast with Network Calls

Network communication between services is prone to timeouts and failures. Make requests fail fast by setting short timeouts and immediately returning errors:

// Timeout after 100ms
client := &http.Client{Timeout: 100 * time.Millisecond}

resp, err := client.Get("http://remote-service")
if err != nil {
  // Return the error to the caller right away instead of retrying here.
  return fmt.Errorf("request failed: %w", err)
}
defer resp.Body.Close()

This prevents the system from waiting on delayed responses or retrying failed requests that are unlikely to succeed. Without aggressive downstream timeouts, your service keeps these connections open, which can exhaust sockets and other resources and bring the service to a halt.
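The same idea can be expressed with a per-request deadline instead of a client-wide timeout. The following is a minimal sketch, not a definitive implementation: `fetchStatus`, `tryServer`, `isTimeout`, `callTimeout` and `slowDelay` are hypothetical names, and the slow dependency is simulated with a local `httptest` server.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

const (
	// callTimeout is an illustrative budget; tune it to the dependency's real latency.
	callTimeout = 100 * time.Millisecond
	// slowDelay simulates a dependency that answers far too late.
	slowDelay = 2 * time.Second
)

// fetchStatus calls url and gives up once callTimeout has elapsed.
func fetchStatus(ctx context.Context, url string) (int, error) {
	ctx, cancel := context.WithTimeout(ctx, callTimeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, fmt.Errorf("build request: %w", err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// Surface the failure immediately; the caller decides what to do next.
		return 0, fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// isTimeout reports whether err was caused by the deadline expiring.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// tryServer runs fetchStatus against a local test server that waits for
// delay before replying.
func tryServer(delay time.Duration) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(delay):
		case <-r.Context().Done(): // the client gave up
		}
	}))
	defer srv.Close()
	return fetchStatus(context.Background(), srv.URL)
}

func main() {
	code, err := tryServer(0)
	fmt.Println("prompt server:", code, err)

	_, err = tryServer(slowDelay)
	fmt.Println("slow server timed out:", isTimeout(err))
}
```

Because the error is wrapped with `%w`, callers can still distinguish a timeout from other failures and choose between returning an error, serving a fallback, or shedding load.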

Validating Startup Health Checks

Services should check dependent resources like databases during initialization and terminate early if unavailable:

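// sql.Open only validates its arguments; it does not connect yet.
// The DSN is a placeholder, assuming the go-sql-driver/mysql driver.
db, err := sql.Open("mysql", "user:password@tcp(localhost:3306)/dbname")
if err != nil {
  log.Fatalf("invalid database configuration: %v", err)
}

// Ping forces a real connection, so an unreachable database stops startup here.
if err := db.Ping(); err != nil {
  log.Fatalf("database unavailable: %v", err)
}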

Failing fast on startup ensures components don't stay up in a degraded mode. It also reduces debugging costs and MTTR, provided proper monitoring and alerting are in place.
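When a service has several dependencies, the startup check can be generalized. The sketch below assumes a hypothetical `check` type whose `probe` function would wrap something like `db.PingContext` in a real service. Here the probes are stand-ins (`healthy`, `unreachable`) so the example runs on its own, and `startupTimeout` is an illustrative value.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"
)

// startupTimeout bounds the whole startup verification (illustrative value).
const startupTimeout = 2 * time.Second

// check is a hypothetical startup probe for one dependency.
type check struct {
	name  string
	probe func(ctx context.Context) error
}

// verifyDependencies runs each probe in order and returns the first failure,
// so the caller can abort startup instead of running in a degraded mode.
func verifyDependencies(checks []check) error {
	ctx, cancel := context.WithTimeout(context.Background(), startupTimeout)
	defer cancel()

	for _, c := range checks {
		if err := c.probe(ctx); err != nil {
			return fmt.Errorf("dependency %q unavailable: %w", c.name, err)
		}
	}
	return nil
}

// healthy and unreachable are stand-in probes used only for this sketch.
func healthy(ctx context.Context) error     { return nil }
func unreachable(ctx context.Context) error { return errors.New("connection refused") }

func main() {
	checks := []check{
		{name: "database", probe: healthy},
		{name: "cache", probe: healthy},
	}
	if err := verifyDependencies(checks); err != nil {
		// Exit with a non-zero status so the orchestrator sees the failure.
		log.Fatalf("startup aborted: %v", err)
	}
	log.Println("all dependencies reachable, starting server")
}
```

Naming the failed dependency in the error gives whoever reads the crash log a starting point without further digging.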

Securing APIs with Request Validation

APIs should validate headers, auth tokens, and payload before handling requests:

func authenticate(r *http.Request) error {
  token := r.Header.Get("Auth-Token")
  if token == "" {
    return fmt.Errorf("no auth token provided")
  }
  // Validate token...

  return nil
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
  if err := authenticate(r); err != nil {
    http.Error(w, "authentication failed", http.StatusUnauthorized)
    return // stop here so an unauthenticated request is never processed
  }

  // Process request
}

Defensive programming with proper request validation is fundamental to secure cloud-native applications. The fail fast principle says to reject bad inputs early before any damage is done.
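One way to apply this across every endpoint is to put the validation in middleware, so handlers only ever see requests that passed. This is a rough sketch under stated assumptions: `validate`, `failFast` and `statusFor` are hypothetical names, the `Auth-Token` header follows the example above, the JSON content-type rule is an added assumption, and real token verification is left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// validate rejects a request before any handler logic runs.
// It only checks that a token is present; verifying it is out of scope here.
func validate(r *http.Request) (int, error) {
	if r.Header.Get("Auth-Token") == "" {
		return http.StatusUnauthorized, fmt.Errorf("no auth token provided")
	}
	isJSON := strings.HasPrefix(r.Header.Get("Content-Type"), "application/json")
	if r.Method == http.MethodPost && !isJSON {
		return http.StatusUnsupportedMediaType, fmt.Errorf("expected a JSON body")
	}
	return http.StatusOK, nil
}

// failFast wraps next so that invalid requests never reach it.
func failFast(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if status, err := validate(r); err != nil {
			http.Error(w, err.Error(), status)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request through the middleware and
// returns the resulting HTTP status code.
func statusFor(method, token, contentType string) int {
	handler := failFast(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))

	req := httptest.NewRequest(method, "/orders", strings.NewReader("{}"))
	if token != "" {
		req.Header.Set("Auth-Token", token)
	}
	if contentType != "" {
		req.Header.Set("Content-Type", contentType)
	}
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("POST", "", "application/json"))       // missing token
	fmt.Println(statusFor("POST", "secret", "text/plain"))       // wrong content type
	fmt.Println(statusFor("POST", "secret", "application/json")) // accepted
}
```

The three calls in `main` report 401, 415 and 200. The cheap header checks run first, so a bad request is rejected before the service reads the body or touches any downstream resource.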

Best practices

Incorporating the fail fast pattern into your software can add overhead and, applied carelessly, can even make things less stable, so apply it deliberately.

Backoff Strategies

Backoff strategies matter when clients retry against a failed component or service that is being restarted. Spacing retries out prevents a thundering herd problem, where all clients retry simultaneously and overload the recovering service.
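As an illustration of how retries can be spaced out, here is a minimal sketch of exponential backoff with random jitter, one widely used technique. The names `backoffCeiling` and `backoffDelay` and the `baseDelay` and `maxDelay` values are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay = 100 * time.Millisecond // ceiling for the first retry (illustrative)
	maxDelay  = 5 * time.Second        // upper bound for any retry (illustrative)
)

// backoffCeiling returns the upper bound for the given zero-based attempt:
// baseDelay doubled once per attempt, capped at maxDelay.
func backoffCeiling(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// backoffDelay picks a random delay in [0, ceiling), so clients that failed
// at the same moment do not all retry at the same moment.
func backoffDelay(attempt int) time.Duration {
	return time.Duration(rand.Int63n(int64(backoffCeiling(attempt))))
}

func main() {
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Printf("attempt %d: wait up to %v, chose %v\n",
			attempt, backoffCeiling(attempt), backoffDelay(attempt))
	}
}
```

The growing ceiling gives a recovering service progressively more breathing room, and the randomness spreads the retries of many clients across that window.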

Two common backoff approaches are:
