Resiliency is not just about avoiding failure. It also involves accepting that failures will happen, then building and automating the next steps that allow the application to respond to the event and return to a fully functioning or optimal state as quickly as possible.
A fully resilient application can adapt to unforeseen events that disrupt the IT environment and automatically initiate the fault recovery or graceful degradation processes defined for it. It continues to function normally (or as close to normally as possible) even when multiple components, or core components, of the system fail.
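The idea of automated recovery followed by graceful degradation can be sketched in a few lines. This is only an illustration under stated assumptions: `call_with_recovery`, `primary`, and `fallback` are hypothetical names that do not come from any particular framework, and the retry count and delays are arbitrary.

```python
import time


def call_with_recovery(primary, fallback, retries=3, base_delay=0.1, sleep=time.sleep):
    """Try `primary` a few times, then degrade to `fallback`.

    Returns a (result, degraded) tuple so the caller can tell which path ran.
    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(retries):
        try:
            return primary(), False
        except Exception:
            # Back off exponentially before the next attempt, but do not
            # wait after the final failed attempt.
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: serve a reduced but still useful response.
    return fallback(), True
```

A caller might pass a live database query as `primary` and a cached or read-only view as `fallback`, so users still get a response, flagged as degraded, while the failed component recovers.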
The extent of application resiliency in the cloud and its relative importance to business continuity depend on various goals, requirements, and constraints that are influenced by the type of workload, the role of the users, and the scale and technical capabilities of the organization.
There are three different kinds of drivers that motivate an IT organization to build resilient apps:
Business drivers:
- Cost savings on IT infrastructure, deployment and operations
- Best user experience and minimal app downtime
- Meeting user demands at times of peak and extended usage
- Maximum QoS and availability
- Retaining user trust
- Flexibility to adapt to changing market demands
Development drivers:
- Maximizing time spent on adding new features
- Reducing time spent on troubleshooting
- Following the latest industry practices and trends in development
Operations drivers:
- Optimal resource consumption
- Reducing the frequency and impact of disruptions and failures
- Ability to recover quickly from failures
- Increasing automation
Ultimately, resilient applications improve the availability of the system, which is the primary indicator of the health of the IT deployment.
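To make the link between fast recovery and availability concrete, here is a small sketch based on the common steady-state definition of availability as mean time between failures (MTBF) divided by MTBF plus mean time to repair (MTTR). The function names are hypothetical and the figures in the note below are made up for illustration.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_per_year_hours(avail, hours_per_year=8760):
    """Expected yearly downtime implied by a given availability."""
    return (1 - avail) * hours_per_year
```

With an assumed MTBF of 999 hours and an MTTR of 1 hour, availability works out to 0.999, or roughly 8.76 hours of downtime per year. Shortening recovery time raises availability even if failures occur just as often.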
Factors that affect application resiliency
Application resiliency mandates a well-thought-out hybrid cloud strategy and planning at all levels of the architecture. It influences, and is influenced by, how the IT infrastructure and network are laid out and how the data and storage systems are designed.
“Access to shared infrastructure, data and application resources in the cloud play a critical role in helping organizations navigate disruptions,” said Rick Villars, Group VP, Worldwide Research at IDC.
“In the coming years, enterprises’ ability to govern a growing portfolio of cloud services will be the foundation for introducing greater automation into business and IT processes while also becoming more digitally resilient.”
There are a few constraints that limit the ability of the app to scale and deliver high performance. Developers, product designers and system architects must take care to minimize these constraints and to avoid introducing new ones or worsening existing ones:
- Hardware and software dependencies
- Dependencies on other apps
- Licensing restrictions
- Lack of skills in development teams
- Organizational resistance to change
Apart from these, there are challenges in planning for application resiliency that are specific to cloud environments. While the strategies used to build resilience could be similar to those used for traditional data centers, the implementations differ quite a bit.
Cloud systems favor scaling “out” to a larger number of nodes, whereas traditional IT architecture favors scaling “up” to a bigger, more powerful node. Because the workload is spread across many nodes, developers can code in graceful degradation of the application in case of a node failure. Teams can also avoid large service buys by provisioning capacity in smaller units. In an on-premises private cloud deployment, VMs and load balancers might provide enough support.
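A minimal sketch of the scale-out pattern described above, assuming each node can be modeled as a callable that handles a request. `NodePool` and its methods are hypothetical names for illustration. A real deployment would normally rely on a load balancer with health checks rather than application code like this.

```python
class NodePool:
    """Round-robin dispatcher over a pool of small, interchangeable nodes."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)        # node name -> callable(request)
        self.healthy = set(self.nodes)  # nodes currently believed to be up
        self._order = list(self.nodes)
        self._next = 0

    def add_node(self, name, handler):
        """Scale out: add capacity one small unit at a time."""
        if name not in self.nodes:
            self._order.append(name)
        self.nodes[name] = handler
        self.healthy.add(name)

    def handle(self, request):
        """Send the request to the next healthy node, skipping failed ones."""
        for _ in range(len(self._order)):
            name = self._order[self._next % len(self._order)]
            self._next += 1
            if name not in self.healthy:
                continue
            try:
                return self.nodes[name](request)
            except Exception:
                # Mark the node as failed and try the next one.
                self.healthy.discard(name)
        # No node could serve the request: degrade instead of crashing.
        return {"status": "degraded", "body": None}
```

When one node fails, requests are routed to the remaining nodes. The caller sees a degraded response only when every node is down. This sketch never returns a failed node to the pool, so a real system would also need a periodic health check that restores recovered nodes.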