The Site Reliability Engineering role is evolving rapidly. As we enter 2024, here are my five predictions for what we may see in the world of SRE this year:
With many companies looking to cut costs due to worsening economic conditions, dedicated SRE roles may be seen as expendable - so SRE headcount and budgets could be reduced. Many organizations are transitioning to an Amazon-like model, where SWEs "do it all". Infrastructure management, operational hardening, incident tracking and being on call are becoming part of the developer's job, so reliability engineers will slowly be pushed out or will have to transition into development. We could already see this trend in the 2023 layoffs, including at SRE-minded companies like Google.
This combination of factors means the SRE job market will likely tighten considerably in 2024. Openings will be harder to find and competition will be steeper. SREs will need to clearly demonstrate their value to stay relevant.
The economic realities of running workloads on major public clouds like AWS, GCP and Azure will lead companies to look for alternatives. The costs of using public cloud infrastructure and services have been climbing, eating into budgets. As companies look to reduce spending, running applications on public clouds may no longer make economic sense. We'll see a migration back towards private data centers, colocation facilities, and on-prem infrastructure. SREs skilled in on-prem operations, bare metal provisioning, etc. will be in higher demand.
While the benefits and operational costs of Kubernetes have been questioned a lot recently, it has become the clear leader as the orchestration platform of choice for containerized workloads. Engineers and companies are heavily invested in Kubernetes workflows and tools, both in the cloud and on-prem. As companies look to further invest in the efficiency of infrastructure and application management, SREs will need strong Kubernetes expertise.
(and fewer SREs)
While automated code generation promises improved developer productivity, it also poses new reliability challenges. As more code is generated by AI systems, companies may end up with insufficiently supervised software. With fewer SREs around to establish robust testing and deployment practices, outages caused by bugs in AI-generated code could become more frequent. Companies will be caught off guard by disruptions caused by their overreliance on AI. Quick mitigation of these outages will be problematic as well, since it is fundamentally harder to fix issues in AI-written code.
In 2024, unifying infrastructure, applications, data, and services under common APIs and self-service platforms will accelerate.
These platforms will provide standardized building blocks and streamlined workflows so engineering teams can quickly build, connect and deploy applications without wasting time on infrastructure complexity. Platforms will handle provisioning, networking, monitoring, access controls, and other operational aspects behind the scenes.
With job opportunities for traditional SRE roles declining, many SREs will look to transition into platform engineering positions. The broad technical skills required by platform roles align well with the strengths many SREs already have. However, to successfully land a platform engineering role, you will need to skill up on software development as well. Programming skills will become mandatory for those looking to get into platform engineering.
There you have it - my predictions for what's in store for SREs and reliability engineering in 2024.