Continuous security and reliability in iterative development
A Buyer's Lens Report by Mike Kavis
This report underwritten by: Evident.io
11/10/2014

Table of Contents
1. Executive Summary
2. The rise of DevOps and continuous deployment
3. Why existing security procedures fall short
4. Balancing automation and human testing
5. Operational and logistical impacts of continuous testing
6. Security at scale
7. Key takeaways
8. About Mike Kavis
9. About Gigaom Research
10. Copyright
1 Executive Summary Rapid delivery poses new and more frequent security challenges, requiring an entirely different set of solutions. Chief among them is a move from waterfall-style testing methods to a more adaptive, continuous, DevOps-appropriate approach. DevOps and continuous delivery allow businesses to deploy software far more frequently than in the past, increasing consistency, predictability, and ultimately, quality. With iterative development, the deltas between builds are much smaller, reducing the likelihood of catastrophic errors. Bugs are smaller and easier to fix if caught in time. However, though rapid release cycles introduce smaller bugs, they produce them far more frequently, and bugs that evade detection can grow into serious problems. While functional problems can often be detected through regular use, security vulnerabilities are harder to spot. In companies that deploy many times per day, traditional security procedures such as static scans can often take longer than the life of the build, and excessive human interaction can rob highly automated DevOps projects of the very agility they were designed to create. To deliver on its goals, IT must create protocols that model and address security concerns as code is deployed. This report will help IT executives and development teams understand the new approaches to security required in a continuous-deployment environment. Key findings include:

- Today's cloud architectures are much more complex and distributed than the architectures previously built on premises, so new approaches to security are required for managing the additional complexity.
- Since infrastructure as code allows virtual machines to be provisioned and de-provisioned within minutes, keeping track of security vulnerabilities without automation is impossible.
- Companies are deploying more frequently due to the adoption of continuous deployment, resulting in frequent changes to the underlying infrastructure. They must continually ensure that their environments are secure and compliant.
- Threats are becoming more sophisticated. The old model of performing annual assessments and security scans is no longer adequate for protecting today's environments. Monitoring for compliance and security must be a continuous effort.
2 The rise of DevOps and continuous deployment Agility is a competitive advantage. To achieve it, companies are embracing the DevOps model and moving from monolithic deployments to a continuous-deployment model. This method allows them to create smaller change sets that simultaneously increase deployment frequency and decrease the risk of service disruption caused by faulty deployments. However, increasing the frequency of changes to production environments makes it extremely challenging to ensure that systems are secure and not vulnerable to attack. Defining some terminology helps us better understand the dilemma. The DevOps movement encourages communication and collaboration between the development team and the operations organization that supports it. DevOps was born out of the frustration IT teams felt when they were battling with fragile systems that continued to decrease in quality and reliability as changes were introduced into production. Since deployments were so painful, teams would bundle changes into very large releases in an attempt to minimize the number of times a production system would undergo change. But as a result, the business and customers had to wait a long time even for the simplest new features and bug fixes. DevOps provides speed to market with more frequent releases, while improving the overall quality and reliability of systems. The key to accomplishing this goal is creating a collaborative environment in which developers, operations, and security professionals work together with common goals, rather than working in silos with distinct handoffs among groups. With DevOps, teams strive to identify waste in the system and then remove it. Examples of waste might be inconsistent environments, manual testing, manual and non-repeatable deployment processes, or any other factor that slows down release cadence or increases the likelihood of introducing a defect. 
In an attempt to remove waste from the development lifecycle, many organizations first embrace continuous integration. Continuous integration (CI) is a software-development practice in which each team member integrates their work at least daily, which typically results in multiple integrations per day across the team. Each integration triggers an automated build and test run to catch errors that may have been inadvertently introduced into the system. With CI, a build fails if any of the automated tests fail. Once continuous integration is mastered, many IT shops move toward continuous delivery (CD), which automates the deployment of each successful build into consistent, automatically provisioned environments. This eliminates the issues with inconsistent environments that plague so many IT projects.
As companies master CD, the next step is fully automating production in a process known as continuous deployment. Some companies create one-touch continuous deployments while others, usually those that deploy multiple times a day, allow deployments to occur each time a CI and CD process successfully completes. Continuous deployment allows IT to bring business requirements to market quickly. With the use of automation, continuous deployment also greatly reduces the quality and reliability issues that fragile systems create. But everything has tradeoffs. Increasing the frequency of change to production systems increases the challenges of ensuring that new security vulnerabilities are not being introduced. Old methods of monitoring security are no longer effective in a continuous-deployment model.
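The CI-to-CD-to-continuous-deployment chain described above can be sketched as a gated pipeline: each stage runs only if the previous one succeeds, and a deployment happens automatically when every stage passes. The sketch below is a minimal Python illustration; the stage names and pass/fail checks are hypothetical placeholders, not any specific tool's API.

```python
# Minimal sketch of a gated deployment pipeline. Each stage stands in for real
# automation (build/test, environment deployment, production rollout); a
# failure at any stage halts the pipeline before production is touched.

def run_pipeline(stages):
    """Run stages in order; stop at the first failure.

    Returns (completed_stage_names, failed_stage_name_or_None).
    """
    completed = []
    for name, stage in stages:
        if not stage():
            return completed, name   # deployment does not proceed
        completed.append(name)
    return completed, None

stages = [
    ("ci", lambda: True),        # build succeeds and all automated tests pass
    ("cd", lambda: True),        # build deploys cleanly to an automated environment
    ("deploy", lambda: False),   # simulate a failed production deployment
]

completed, failed_at = run_pipeline(stages)
print("completed:", completed, "failed at:", failed_at)
```

Because the production stage fires automatically whenever CI and CD complete, automated security checks belong in the stage list right alongside the functional tests; there is no later manual gate to catch what the pipeline misses.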
3 Why existing security procedures fall short In the past, changing infrastructure was a tedious and labor-intensive process that organizations avoided unless absolutely necessary. Developers requested changes through a dedicated team if they needed new infrastructure or modifications to existing infrastructure. The company's systems administrators were gatekeepers and kept close tabs on physical and virtual infrastructure, its location and status, and what software was running on it. They also had control over which ports were open, how the firewall was configured, which patch levels were applied, and so on. The result was an enormous amount of control and auditability, but also a lack of flexibility that became crippling as the value of agility increased. In the era of cloud computing, developers can create infrastructure with code, and companies now make frequent changes to infrastructure for any number of reasons. This allows businesses to tune and adapt their infrastructure much more nimbly. But nimbleness comes at a cost. Systems administrators lose much of their visibility into and control of the infrastructure as developers start provisioning and managing the resources. In fact, in many IT organizations, systems administrators may not even be involved if developers leverage public cloud providers like AWS. Many companies are building self-sufficient teams that handle development, administration, and operations under the guidelines the security team provides. These changes require a more proactive approach to security monitoring. Running monthly or quarterly audit scans is no longer a best practice because a report is sometimes invalid within hours or even minutes of being created. Enterprises must scan continuously. Today's cloud architectures are much more complex than in years past. Prior to the cloud era, many applications were built on an n-tier architecture that included a web tier, an application tier, and a database tier.
These tiers scaled vertically by adding bigger servers or increasing the memory, disk, or CPU within the servers. Modern cloud architectures are made up of many smaller servers that typically scale horizontally by adding more servers. Some systems are even built to auto-scale. With auto-scaling, virtual machines may spin up and down as traffic fluctuates. This happens without any human intervention, which means manual tracking or scheduled vulnerability scans are no longer sufficient, and may even be impossible to execute within the constraints of a release. Another challenge with today's cloud architectures is the increase in required regulatory controls. In the past, administrators would perform an annual audit of the data center and test the security controls against regulations such as HIPAA, SOC 2, PCI, and FERPA. Now organizations are delivering their applications as software-as-a-service (SaaS) solutions, and each iteration of the application must be compliant with relevant regulations. Now that compliance has moved from the data center to the application, performing a single audit once a year is impractical because of the rate of change in today's applications. In-scope applications must be audited throughout the year, which means companies have less time to prepare and must complete the audits faster so that they avoid disrupting the benefits of their more rapid release schedules. Continuous security monitoring is critical for staying in compliance.
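To see why scheduled scans fall short against auto-scaled infrastructure, consider a toy model in which instances spin up and down between scan windows. The timestamps and instance records below are invented for illustration, not real telemetry.

```python
# Toy model: a monthly scheduled scan only observes instances that are alive
# at the moment the scan runs. Auto-scaled instances that live and die
# between scans are never inspected at all. All data here is illustrative.

SCAN_TIMES = [0, 720]  # scheduled scans at hour 0 and hour 720 (30 days apart)

instances = [
    {"id": "i-baseline", "up": 0,   "down": 2000},  # long-lived, seen by scans
    {"id": "i-burst-1",  "up": 100, "down": 103},   # auto-scaled, alive 3 hours
    {"id": "i-burst-2",  "up": 400, "down": 410},   # auto-scaled, alive 10 hours
]

def seen_by_scheduled_scan(inst):
    """True if the instance was alive during at least one scheduled scan."""
    return any(inst["up"] <= t < inst["down"] for t in SCAN_TIMES)

never_scanned = [i["id"] for i in instances if not seen_by_scheduled_scan(i)]
print("never scanned:", never_scanned)
```

A continuous monitor, by contrast, observes provisioning events as they happen rather than sampling the fleet on a calendar, so short-lived instances are checked during the only window in which checking them is possible.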
4 Balancing automation and human testing Traditionally, companies have relied heavily on manual inspection of hardware and software to ensure the appropriate security controls are in place, but as companies have begun consuming infrastructure-as-a-service (IaaS), manual inspection of rapidly changing virtual environments has proved cumbersome and error prone. The rate at which changes occur to both the software and the virtual infrastructure is so great that staffing for the required inspection frequency is no longer feasible. Security inspection must become an automated process in order to reduce the risks created by the frequency of changes occurring in production. Speed to market and agility are becoming competitive advantages. Companies are embracing the DevOps movement as a way to get new features and bug fixes to market at a much faster rate than before. This desire to move faster challenges traditional change-management processes and often requires more trust in automation that is built into the software development lifecycle (SDLC). Companies are wrestling with removing manual review gates and repetitive peer reviews so that they can condense time to market. This gives governing bodies fewer opportunities to perform in-depth reviews and requires a higher level of trust that the SDLC will adhere to security standards and best practices. Both security enforcement and change control are shifting from manual stop-gates that halt the flow of development to automated post-mortem audits that discover vulnerabilities after the fact. While that prospect may seem alarming, manually enforcing security is a failed strategy. Consider that developers are consuming cloud services such as AWS, while Amazon and other cloud providers invest millions of dollars every year securing their infrastructure and their APIs.
Developers must still build the proper security controls into their applications, but staying current with security best practices for all of the cloud services is a daunting task. Every cloud provider releases a steady stream of new APIs and continually adds functionality to existing ones. Staffing a security team with knowledge of all the best practices that even one cloud vendor requires is nearly impossible, and becomes less possible still with multiple providers. A more realistic approach is to leverage a continuous security-monitoring solution that maintains up-to-date business rules for each cloud provider and scans the environment to enforce them. Another challenge to manual testing is the transient nature of infrastructure. Since infrastructure is code, companies are vulnerable to new scenarios in which someone with malicious intent could spin up a server, launch an attack, steal information or infiltrate other systems, and then destroy the server before a human can detect any of the activity.
With continuous monitoring, this activity can be detected immediately and the proper personnel can be alerted in time to stop the malicious activity. The odds of catching this activity by performing manual monitoring range from slim to none.
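A continuous monitor of the kind described here is, at its core, a loop that applies a library of provider-specific rules to a live inventory of resources and raises findings for anything out of policy. The sketch below illustrates that shape in Python; the resource records, field names, and the single rule are invented for illustration and do not reflect any vendor's actual schema.

```python
# Hedged sketch of rule-based continuous scanning. Each rule inspects one
# resource and returns a description of the problem, or None if the resource
# is clean. The data model here is invented for illustration only.

def scan(resources, rules):
    """Apply every rule to every resource; collect the findings."""
    findings = []
    for res in resources:
        for rule in rules:
            problem = rule(res)
            if problem:
                findings.append({"resource": res["id"], "issue": problem})
    return findings

def ssh_open_to_world(res):
    """Flag firewall rules that expose SSH (port 22) to the whole internet."""
    for ingress in res.get("ingress", []):
        if ingress["port"] == 22 and ingress["cidr"] == "0.0.0.0/0":
            return "port 22 open to 0.0.0.0/0"
    return None

resources = [
    {"id": "sg-web",   "ingress": [{"port": 443, "cidr": "0.0.0.0/0"}]},
    {"id": "sg-admin", "ingress": [{"port": 22,  "cidr": "0.0.0.0/0"}]},
]

findings = scan(resources, [ssh_open_to_world])
for f in findings:  # in a real monitor, this step would alert on-call staff
    print(f"{f['resource']}: {f['issue']}")
```

Run continuously against a live inventory instead of a static list, this loop is what replaces the periodic manual audit: the rule library carries the provider-specific expertise, and the scan cadence matches the rate of change.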
5 Operational and logistical impacts of continuous testing As developers have started moving to push-button deployment methodologies and striving to release software more frequently (even daily), testing has become quite a challenge. The days of developers throwing code over the wall and waiting weeks for feedback from testers are long gone. In today's world, all phases of testing must be automated. But test automation by itself is not enough. Testing must be performed continuously, even after a product is deployed into production. A primary security challenge is minimizing the window a successful intruder has to compromise a system. The key to reducing that time is detecting malicious activity as early as possible, yet the industry seems to be failing at exactly that. In the following figure, Verizon's 2014 Data Breach Investigations Report shows that the gap between time-to-compromise and time-to-discovery is widening: attackers are penetrating systems faster each year, while defenders are not detecting breaches any more quickly.
Speed of compromise v. discovery Source: 2014 Data Breach Investigations Report (DBIR), Verizon

This issue goes far beyond detection. Mitigating these risks once they're detected must happen faster. Continuous monitoring tools not only detect issues, but also provide the remediation steps for resolving them. How can a company retain the most up-to-date knowledge of security best practices and remediation techniques when the technology is changing faster than ever before? Must each company hire a world-class security team, or is it wiser to invest in security-monitoring technologies whose core competency is security? While exceptions always exist, for most businesses the latter seems the much wiser choice. Companies should focus on their own core competencies and leverage best-in-breed solutions built by world-class experts who live and breathe security.
6 Security at scale It is critical that businesses enforce good security hygiene from day one. As applications scale in the cloud, vulnerabilities multiply along with the infrastructure, and a company's exposure grows with them. The longer issues go undetected, and the longer bad practices keep being introduced into highly scalable systems, the greater the risk. For example, assume a development team has built a highly scalable system on AWS that can detect peak loads and automatically provision additional resources on the fly in minutes. The auto-scaling process works by leveraging blueprints (infrastructure as code) and launching new instances on demand. If those blueprints contain code that creates, or allows for, vulnerabilities, every automatically provisioned resource inherits them, so exposure grows with each new instance. In high-scaling environments, ensuring that systems implement the necessary security controls becomes increasingly challenging. Continuous security monitoring is critical for high-scaling architectures. Another important use case is monitoring for human error, particularly now that companies are embracing the cloud. In many enterprises, administrators are responsible for implementing security controls for various cloud services. For example, a company using AWS may have a team that controls all access using identity and access management (IAM). That team typically provides security guard rails and then delegates individual AWS accounts to the various development or product teams to manage their day-to-day work. This model is put in place because it is too expensive to scale up an organization by embedding security experts in every development team. Instead, companies implement this shared-responsibility model and must learn to trust that the development teams are knowledgeable enough to implement the appropriate AWS security best practices.
In order to gain this trust, companies are leveraging continuous security-monitoring solutions that scan applications in real time and alert the appropriate personnel when they discover gaps in security. These tools also educate developers by providing the appropriate remediation instructions, so that they know exactly what the issue is and how to fix it. Even if a company has the industry's top talent, its security is only as good as its worst vulnerability. Just one security gap can give intruders access that leads to a catastrophic event. Even security giant RSA has been hacked. Regardless of the talent level in an organization, as complexity increases and systems continue to scale, keeping systems secure is a continuous, full-time task.
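The blueprint risk described earlier in this section is easy to see in a toy model: one insecure setting in an auto-scaling template is stamped onto every instance launched from it. The blueprint format and field names below are invented for illustration; real templates (for example, AWS CloudFormation templates or launch configurations) look different.

```python
# Toy model: every auto-scaled instance inherits its blueprint verbatim, so a
# single insecure setting replicates across the whole fleet. The blueprint
# schema here is invented for illustration only.

BLUEPRINT = {
    "image": "base-image-v1",
    "ingress": [{"port": 22, "cidr": "0.0.0.0/0"}],  # SSH open to the world
}

def launch(blueprint, count):
    """Simulate a scaling event: each new instance is a copy of the blueprint."""
    return [{"id": f"i-{n:04d}", **blueprint} for n in range(count)]

def exposes_ssh(instance):
    """True if the instance's firewall rules expose SSH to the internet."""
    return any(rule["port"] == 22 and rule["cidr"] == "0.0.0.0/0"
               for rule in instance["ingress"])

fleet = launch(BLUEPRINT, 50)            # a traffic spike adds 50 instances
exposed = sum(exposes_ssh(i) for i in fleet)
print(f"{exposed} of {len(fleet)} instances expose SSH to the internet")
```

The flip side is the remedy: fixing the blueprint once fixes every future instance, which is why continuous monitors that check templates as well as running resources catch these problems earliest.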
Staying compliant and minimizing risks can no longer be adequately accomplished using traditional methods. Companies must invest in continuous security monitoring to survive and thrive in the current dynamic and evolving era of cloud computing.
7 Key takeaways

- DevOps, CD, cloud computing, and other practices aimed at rapid deployments are beneficial and inevitable, but they increase the variety and number of potential attack vectors.
- Today's architectures are much more complex and distributed than ever before, so traditional security practices are ineffective and inefficient.
- Due to the rate of change in today's environments, periodic security audit reports are obsolete within days or hours of completion.
- As the severity, complexity, and frequency of external threats increase, minimizing the time between a vulnerability being introduced and being mitigated is crucial.
- Good security hygiene that is designed in from the start enables companies to scale security effectively for the next generation of web applications.
8 About Mike Kavis Mike Kavis is an Analyst for Gigaom Research and a thought leader in the world of cloud computing and enterprise architecture. He is the author of Architecting the Cloud: Design Decisions for Cloud Computing Service Models (IaaS, PaaS, SaaS) and was the CTO of the winner of the 2010 AWS Global Startup Challenge. He is a principal architect at Cloud Technology Partners and an active technical advisor for several startups.
9 About Gigaom Research Gigaom Research gives you insider access to expert industry insights on emerging markets. Focused on delivering highly relevant and timely research to the people who need it most, our analysis, reports, and original research come from the most respected voices in the industry. Whether you're beginning to learn about a new market or are an industry insider, Gigaom Research addresses the need for relevant, illuminating insights into the industry's most dynamic markets. Visit us at: research.gigaom.com. © Giga Omni Media 2014. "Gigaom" is a trademark of Giga Omni Media. For permission to reproduce this report, please contact research-sales@gigaom.com.