Development has moved to the cloud in a big way. Modern engineering teams have built continuous integration pipelines, pulling together code repositories, continuous integration platforms, testing, orchestration, and monitoring tools within and across cloud platforms. We bolster this with mostly-automated, closed-loop DevOps workflows that emphasize speed and efficiency. Indeed, our software delivery process today looks nothing like it did even a few years ago. Code that we used to release a few times a year, we now push out many times a day.
The more we develop software in the cloud, the more we need DevOps security. We’re increasingly turning to DevSecOps tools like Prisma, ShiftLeft, and Wiz. Here’s a simplified look.
And those tools? Well, they’re delivering… but perhaps a little too well. Detections are off the charts, alerting us to flaws, exposed secrets, and infrastructure misconfigurations that put our companies, software, and customers at risk. But it’s not like we can stop development. We can’t slow it down. We can’t go backward in any way. We can only go forward, and that means growth in every direction — more development projects, more cloud consumption, more cloud security tools, and, unfortunately, more alerts. Here’s what that looks like for one of our customers.
Beyond detecting lots of issues, we’re also detecting the same issues with duplicate alerts, as well as finding that a single root cause can underlie several different issues. Worst of all, we’re not able to see these issues aggregated into a single view, giving our swivel chair a good workout as we toggle between tools and try to prioritize and sequence the work. Check out this example of multi-layer duplicates from another customer’s recent Log4j remediation process.
But the problem is bigger than just having an out-of-control alert backlog. When we dig in and try to resolve detected issues, the process is so manual and slow that we cannot make a dent. First off, it’s not always clear who even has the action item to fix the issue. One result of the software industry’s shift to modern, microservices-based architectures is that developers now work autonomously, with individuals or small teams on the hook to build and release services independently.
The outcome is more distributed engineering teams, and less ability for an outsider (such as a security analyst) to track down the person responsible for responding to a security detection. And when we do find the code owner, we put a lot on their plate: research each issue from scratch, often with little or no context, and then manually remediate the problem in the code base or infrastructure, often in a bespoke, one-off way that may not take care of it for good.
Which brings us to our final problem: prevention. It’s one thing to fix a security issue in one place (say, in a code base in a repository), but it’s quite another to fix it everywhere and make sure it doesn’t come back. Even if you catch the problem in your current code, who’s to say developers won’t re-introduce it later?
Between our growing alert backlogs, a remediation process that’s slow and manual, and a lack of guardrails preventing flaws from recurring, our cloud security process isn’t just problematic; it’s broken.
What’s needed is sustainable cloud security. Just as detection tools do a great job of finding problems, fixing those problems needs to be smart, automated, and developer-friendly. It needs to fit into engineering teams’ existing workflow, massively streamline their process, and meaningfully cut their time-to-remediation.
For cloud security remediation to be sustainable, it needs to do four important things for security and development teams.
1. MAP & VISUALIZE
First, we need to visualize the code-to-production pipeline and the resources within it. We need a “heat map” of sorts to see how code moves through the pipeline and where the security issues are popping up. Seeing where the problems are helps us home in on those areas first.
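Conceptually, the raw data behind that heat map is just a tally of alerts per pipeline stage. Here is a minimal sketch in Python; the stage names, rule IDs, and record shape are invented for illustration, not any particular tool’s schema:

```python
from collections import Counter

# Hypothetical alert records; fields and values are illustrative only.
alerts = [
    {"stage": "repo", "rule": "hardcoded-secret"},
    {"stage": "repo", "rule": "vulnerable-dependency"},
    {"stage": "ci", "rule": "unsigned-artifact"},
    {"stage": "runtime", "rule": "public-s3-bucket"},
    {"stage": "runtime", "rule": "public-s3-bucket"},
]

# Tally alerts by pipeline stage -- the raw counts behind a "heat map".
heat = Counter(a["stage"] for a in alerts)

for stage, count in heat.most_common():
    print(f"{stage:8} {'#' * count}")
```

Even this toy view makes it obvious which stages deserve attention first.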
2. NORMALIZE & DE-DUPLICATE
Second, we need to normalize and de-duplicate the many alerts from DevSecOps pipeline tools. This means being able to compare details about code flaws and misconfigurations to collapse the queue to a fraction of “unique” alerts. Cutting the queue makes the work manageable and lets us shift focus from alerts to root causes.
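The core of de-duplication is computing a normalized fingerprint for each alert that ignores which scanner reported it. A minimal sketch, assuming a made-up alert schema (the tool names, rule IDs, and resource paths are placeholders):

```python
# Hypothetical raw alerts from two different scanners reporting the same finding.
raw_alerts = [
    {"tool": "scanner_a", "rule": "CVE-2021-44228", "resource": "svc/payments"},
    {"tool": "scanner_b", "rule": "CVE-2021-44228", "resource": "svc/payments"},
    {"tool": "scanner_a", "rule": "open-security-group", "resource": "vpc/prod"},
]

def fingerprint(alert):
    # Normalize to the fields that identify a unique finding,
    # deliberately excluding the reporting tool.
    return (alert["rule"], alert["resource"])

# Group raw alerts under their fingerprint: each key is one unique finding.
unique = {}
for alert in raw_alerts:
    unique.setdefault(fingerprint(alert), []).append(alert)

print(f"{len(raw_alerts)} raw alerts -> {len(unique)} unique findings")
```

Real products use richer fingerprints (package versions, file hashes, cloud account IDs), but the shape of the problem is the same: collapse many tool-specific alerts into a short list of unique findings.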
3. FIND ROOT CAUSE & OWNER
Third, we should be able to correlate information from our code and cloud resources to understand — with a high degree of accuracy — each issue’s root cause, code owner, and configuration drift, plus available context such as issue severity, exploitation, or relationships. Making these details available puts a massive dent in the manual work of remediation.
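Owner attribution can be as simple as matching an alert’s file path against an ownership table, similar in spirit to a CODEOWNERS file. The paths and team names below are hypothetical, and the longest-prefix rule is a simplification of real CODEOWNERS matching:

```python
# Hypothetical path-prefix -> owner mapping (CODEOWNERS-style).
owners = {
    "services/payments/": "@payments-team",
    "infra/terraform/": "@platform-team",
}

def find_owner(path):
    # Longest matching prefix wins (a simplification of CODEOWNERS
    # precedence); fall back to a triage queue when no one matches.
    matches = [prefix for prefix in owners if path.startswith(prefix)]
    return owners[max(matches, key=len)] if matches else "@security-triage"

alert = {"rule": "hardcoded-secret", "path": "services/payments/api/config.py"}
print(find_owner(alert["path"]))
```

Correlating this with git history and cloud resource tags is what gets the accuracy high enough to route fixes automatically instead of hunting for owners by hand.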
4. STREAMLINE THE FIX
Fourth, we need a streamlined, ideally auto-generated, fix that works regardless of alert source, cloud provider, or the language in which the code or configuration is written.
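One way to think about auto-generated fixes is as a library of remediation templates keyed by rule ID, applied to the offending configuration to produce a reviewable patch. The rule IDs and config shape here are invented for the example; real remediations would emit pull requests against the actual code or IaC files:

```python
# Hypothetical fix templates: each maps a rule ID to a function that
# returns a corrected copy of the offending configuration.
FIXES = {
    "public-s3-bucket": lambda cfg: {**cfg, "acl": "private"},
    "unencrypted-volume": lambda cfg: {**cfg, "encrypted": True},
}

def remediate(rule_id, config):
    """Return a fixed copy of the config, or None if no template exists."""
    fix = FIXES.get(rule_id)
    return fix(config) if fix else None

fixed = remediate("public-s3-bucket", {"bucket": "logs", "acl": "public-read"})
print(fixed)
```

Because the template is keyed by the normalized rule rather than by scanner, the same fix applies no matter which tool raised the alert, and applying it as a pull request keeps developers in their existing review workflow.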
by Julie O’Brien, based on an interview with Senior Solutions Engineer Matt Brown
Summary, remediation details, and how to reduce risk in your cloud development pipelines