DevOps roles seem impossible to attain. They're everywhere and nowhere. Let's go through the steps to make yourself employable for that DevOps career!
Part One: ClickHouse Failures, by Marcel Birkner. ... Fixing The ClickHouse Node Failure On Distributed Systems - A How-To Guide.
Why the collection of practices we know today as DevOps and SRE (Site Reliability Engineering) is becoming the norm for modern systems management.
I believe this time can best be used by studying for the certification you have always dreamt of. I used it to get my AWS Solutions Architect Associate certification. Here are my takeaways from the exam.
How to Operate Less and Innovate More Using Observability and AI. Take a look at a few ways DevOps pros and SRE teams can leverage observability and AI to operate less and innovate more.
Read this article to learn 3 lessons from the biggest outages of IBM Cloud, T-Mobile, and GitHub. The second quarter of 2020 was marked by several serious outages at prominent services including IBM Cloud, GitHub, Slack, Zoom and even T-Mobile (Source: StatusGator Report).
A lot of tech companies struggle with creating an effective and efficient on-call schedule internally for their product and service, which results in longer downtimes when something goes wrong. They often over-burden their team members with repeated on-call duty, resulting in team fatigue. Here’s how to create an on-call schedule that your team might just love.
Spawning an AWS EKS cluster has never been easier, and the options are many: CloudFormation, Terraform, or CDK. For the lazy, you can even use the great CLI utility eksctl from Weaveworks.
Imagine that you have a good idea and decide to create a digital solution. The service is innovative and you have no competitors. After some months you see your user base grow exponentially. More and more features are added in each release.
In this article, I’ll go over some of the tricks we used to make our cloud sandbox safe, reliable, and low maintenance.
A delivery pipeline is the thing that takes freshly written software out of the hands of a developer, and turns it into running services, potentially accessible to the public.
You will find a lot of articles with good tips on how to ace the CKA exam. You should definitely read them! They sure helped me a lot when I was preparing for the exam. But now that I am certified, I can say in retrospect that some of this advice may not be worth your time. In this article I will focus on tips I would call debatable: they do not give you a real competitive advantage and may even impede your success. Let’s start!
In this blog, we’ll talk about what to look for in SRE tools, and how they’ll help you on your journey to reliability excellence.
In this blog, we are going to see how two containers can communicate with each other using Docker networking. One of the most important things that Docker ...
Making DevOps and SRE work for you: SRE and DevOps are trending buzzwords, especially in startup ecosystems. What exactly are they? What is the difference?
It’s important to minimize alert or pager fatigue as much as possible, for the health and well-being of your team members. Here are 5 tips to help.
Reduce Engineering Problems With a Resiliency Mindset: To reach your optimal state of resilience, there are some crucial SRE best practices you should adopt to strengthen your processes.
The primary role of the Site Reliability Engineer is to identify and manage asset risks that could adversely affect plan or business operations.
I’ll explain what SRE is and why SRE helps maintain software quality in production systems. I will also discuss how DevOps and SRE relate to each other.