AWS’ DevOps Guru Uses
Machine Learning to Detect
Erik Rush | Jun. 16, 2021
Late last year, Amazon Web Services (AWS) unveiled DevOps Guru, a managed operations service. It uses machine learning to identify operational issues and automatically recommend specific fixes. DevOps Guru was designed to aggregate and analyze application metrics, logs, events and traces to distinguish normal operating patterns from behaviors that deviate from them. Such deviations include database I/O over-utilization, under-provisioned compute capacity and memory leaks.
According to Amazon, “DevOps Guru automatically ingests operational data from your AWS applications and provides a single dashboard to visualize issues in your operational data.” Users get started by choosing what DevOps Guru should cover, either specific CloudFormation stacks or their entire AWS account. Amazon says this improves application availability and reliability with no manual setup or machine learning expertise.
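As a rough sketch of what selecting coverage by CloudFormation stack might look like in code, the Python below builds a request for the DevOps Guru UpdateResourceCollection API and, optionally, sends it with boto3. The helper names and the stack names in the usage note are hypothetical, and boto3 with configured AWS credentials is assumed only for the final call.

```python
def build_coverage_request(stack_names, action="ADD"):
    """Build keyword arguments for DevOps Guru's UpdateResourceCollection call."""
    if action not in ("ADD", "REMOVE"):
        raise ValueError("action must be 'ADD' or 'REMOVE'")
    if not stack_names:
        raise ValueError("at least one stack name is required")
    return {
        "Action": action,
        "ResourceCollection": {"CloudFormation": {"StackNames": list(stack_names)}},
    }


def enable_coverage(stack_names, client=None):
    """Send the request; assumes boto3 and AWS credentials when no client is passed."""
    if client is None:
        import boto3  # imported lazily so the builder above works without boto3

        client = boto3.client("devops-guru")
    return client.update_resource_collection(**build_coverage_request(stack_names))
```

A call such as enable_coverage(["checkout-service", "orders-api"]) would add those two (hypothetical) stacks to coverage. Keeping the request builder separate from the network call makes the payload easy to inspect before anything is sent.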
Quickly Identify Anomalies
When DevOps Guru identifies anomalous app behavior that could cause service disruptions or outages (e.g., behavior stemming from code or configuration changes or resource limit deficiencies), it generates alerts containing details such as the resources affected, timelines of events and recommendations for the appropriate fix(es). These alerts are delivered through Amazon’s Simple Notification Service (SNS) and partner integrations (e.g., PagerDuty or Atlassian’s Opsgenie).
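For readers who want to see how SNS delivery might be wired up, here is a minimal sketch that registers an existing SNS topic as a DevOps Guru notification channel through the AddNotificationChannel API. The helper names and the topic ARN in the usage note are hypothetical, and the ARN check is a simple illustrative sanity test rather than full validation.

```python
def build_sns_channel_config(topic_arn):
    """Build keyword arguments for DevOps Guru's AddNotificationChannel call."""
    # Illustrative sanity check only; AWS performs the real validation.
    if not (topic_arn.startswith("arn:") and ":sns:" in topic_arn):
        raise ValueError("expected an SNS topic ARN")
    return {"Config": {"Sns": {"TopicArn": topic_arn}}}


def register_sns_channel(topic_arn, client=None):
    """Register the topic and return the channel ID from the response."""
    if client is None:
        import boto3  # assumes boto3 and AWS credentials are available

        client = boto3.client("devops-guru")
    response = client.add_notification_channel(**build_sns_channel_config(topic_arn))
    return response["Id"]
```

With a topic in place, register_sns_channel("arn:aws:sns:us-east-1:123456789012:devops-guru-alerts") would return the ID of the new channel. The account number and topic name there are placeholders.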
According to Swami Sivasubramanian (head of AWS’ Machine Learning group), “Customers have asked us to continue adding services around areas where we can apply our own expertise on how to improve application availability and learn from the years of operational experience that we have acquired running Amazon.com. With Amazon DevOps Guru, we have taken our experience and built specialized machine learning models that help customers detect, troubleshoot, and prevent operational issues while providing intelligent recommendations when issues do arise.” Sivasubramanian says that this allows teams to benefit immediately from operational best practices.
DevOps Guru not only analyzes system and application data to detect variances, but also groups the results into operational insights that include snapshots of application behavior, anomalous metrics and recommendations for remediation, according to AWS. It also correlates related application and infrastructure metrics tied to problems such as Web app latency spikes, bad code deployments, disk space limitations and memory leaks.
Less Work & More Secure
All of this leads to fewer redundant alarms and helps users focus on high-severity issues. Users can see configuration change histories and deployment events, along with system and user activity, which lets them generate prioritized lists of likely causes for operational issues in the Amazon DevOps Guru console.
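Outside the console, the same "high severity first" idea can be approximated in a few lines. The sketch below sorts insight records by their Severity field and, optionally, pulls ongoing reactive insights with the ListInsights API. The sample records are invented for illustration, the helper names are hypothetical, and this ordering is a simple stand-in, not how DevOps Guru itself ranks likely causes.

```python
SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

# Hypothetical sample data, shaped like entries in a ListInsights response.
SAMPLE_INSIGHTS = [
    {"Id": "a1", "Name": "Disk space running low", "Severity": "LOW"},
    {"Id": "b2", "Name": "Web app latency spike", "Severity": "HIGH"},
    {"Id": "c3", "Name": "Memory growth after deployment", "Severity": "MEDIUM"},
]


def prioritize(insights):
    """Return insights ordered from highest to lowest severity (stable sort)."""
    unknown = len(SEVERITY_RANK)  # unrecognized severities sort last
    return sorted(insights, key=lambda i: SEVERITY_RANK.get(i.get("Severity"), unknown))


def fetch_ongoing_reactive_insights(client=None):
    """Fetch the first page of ongoing reactive insights, highest severity first."""
    if client is None:
        import boto3  # assumes boto3 and AWS credentials are available

        client = boto3.client("devops-guru")
    response = client.list_insights(StatusFilter={"Ongoing": {"Type": "REACTIVE"}})
    return prioritize(response.get("ReactiveInsights", []))


if __name__ == "__main__":
    for insight in prioritize(SAMPLE_INSIGHTS):
        print(insight["Severity"], insight["Name"])
```

Only the first page of results is read here; a fuller version would follow the NextToken value in the response to page through the rest.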
When paired with CodeGuru (another Amazon machine learning-powered developer tool that provides intelligent recommendations and identifies an application’s most expensive lines of code), DevOps Guru gives users the automated benefits of machine learning for operational data, so that developers can more easily improve application availability and reliability.
AWS’ DevOps Guru also provides intelligent recommendations with remediation steps and integration with AWS Systems Manager for runbook and collaboration tooling, giving users the ability to maintain applications and manage infrastructure for their deployments more effectively.
Order of the Cipher is an Amazon Web Services (AWS) training company and a novel approach to cybersecurity training that combines theatrical presentation with proven teaching techniques. We’ve mastered Amazon Web Services, and we’ve perfected how to showcase the versatility and capability of AWS technology in a manner that provides real-world immersion experiences that prepare students to expertly navigate the AWS ecosystem.