Recovery testing is one of the most important methods we have in the world of testing (and one of my favorites). It belongs to the non-functional side of testing and is executed at different levels of the tested application.
When we perform recovery tests, we want to validate two crucial aspects of the software. The first is the time it takes the user to return to regular work after a failure (hardware or software) occurs. The second is the way the software recovered: Did we lose data? Is the software as functional as it was before the failure?
Why should we run recovery tests?
We need to run recovery tests because both computers (the hardware side) and applications (the software side) should be able to recover from unexpected failures in a reasonable time. Our task as testers is to make sure that the software can return to a functional state within the expected timeframe.
Now, think about software that fails to recover after an unexpected failure. Would you want to use it again? Would you want to use software that wastes your time? Would you work with software that may lose crucial data when it fails?
There is only one answer: NO! You will not use software that you cannot trust, and if you cannot trust it, you will not buy it. And from the company’s perspective? Untrusted software may cause a loss of profits, reputation, etc.
Recovery testing helps us figure out whether the tested application can recover from unexpected system failures. In addition, recovery tests bring some benefits that are specific to this type of testing, among them:
- You will verify, during test execution, whether the recovery plan actually works.
- Recovery testing will allow you to understand the effectiveness of the recovery process.
- Recovery testing will allow you to understand the weak points of the system.
- You can eliminate software risks at earlier stages of the SDLC.
- Recovery testing will allow you to provide answers to a client who wants to understand the losses in case of a failure (time estimation, user experience, data corruption, etc.).
- Recovery testing requires the tester to investigate the software architecture and supported platforms in depth.
- Recovery testing will allow you to understand how the software behaves in case of a failure.
When to Use Recovery Testing
I already explained that we run recovery tests to make sure that the application can recover from unexpected failures. That’s great, but the main reason we design and execute this kind of test is client requests.
Recovery testing should be executed whenever the client who buys the application specifies that the continuity of the application’s regular flow is critical for them. In such a case, the client will specify the exact requirements of the recovery process (recovery timelines, data integrity, etc.) that will allow them to gain confidence in the software. For example:
- Data integrity shouldn’t be affected during the recovery process.
- The software should return to being fully functional within 7 minutes.
- If there is a loss of data, it must be minimal.
- After recovering from a failure, the user is able to continue his work from the last working point.
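Requirements like these are easier to test once they are written down as concrete, checkable values. The following is only a minimal sketch of that idea: the class names, the field names and the `check_recovery` helper are all my own assumptions, and "minimal data loss" is turned into a hypothetical `max_lost_records` number that the client would have to agree on.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecoveryRequirements:
    """What the client expects (hypothetical fields for this sketch)."""
    max_recovery_seconds: float  # e.g. the 7-minute limit
    max_lost_records: int        # "minimal" data loss, made concrete


@dataclass(frozen=True)
class RecoveryObservation:
    """What the tester measured after simulating a failure."""
    recovery_seconds: float
    lost_records: int
    data_intact: bool
    resumed_from_last_point: bool


def check_recovery(req: RecoveryRequirements, obs: RecoveryObservation) -> List[str]:
    """Return a list of violated requirements (empty means the test passed)."""
    violations = []
    if not obs.data_intact:
        violations.append("data integrity was affected")
    if obs.recovery_seconds > req.max_recovery_seconds:
        violations.append(
            f"recovery took {obs.recovery_seconds:.0f}s, "
            f"limit is {req.max_recovery_seconds:.0f}s"
        )
    if obs.lost_records > req.max_lost_records:
        violations.append(
            f"lost {obs.lost_records} records, limit is {req.max_lost_records}"
        )
    if not obs.resumed_from_last_point:
        violations.append("user could not continue from the last working point")
    return violations


# Example: a 7-minute limit and (as an assumption) no tolerated record loss.
requirements = RecoveryRequirements(max_recovery_seconds=7 * 60, max_lost_records=0)
observation = RecoveryObservation(
    recovery_seconds=310, lost_records=0, data_intact=True, resumed_from_last_point=True
)
print(check_recovery(requirements, observation) or "all requirements met")
```

Returning the full list of violations, rather than stopping at the first one, gives the client a complete picture of what a single failure cost.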
The testing challenge
The testing challenge of recovery tests is huge: as software testers, we need to master both the tested software and the architectures that support it. When we combine the two, the challenge becomes even harder.
To understand these challenges, I created a high-level flow of six stages that demonstrates the basic steps involved in recovery testing:
Master your software - Master the architecture and flow of the tested software. If you do, you will become familiar with its weak points and the areas that are most likely to cause software failures.
Master the supported architectures - Understanding the supported environments will allow you to understand the integration and compatibility issues that may arise when the client uses the software.
Know the expectations - The third stage is dedicated to specific questions that you must be able to answer before you can design your recovery testing scenarios. The questions are:
- What are the client’s expectations from the software in case of a failure?
- Are there any special operations involved in the recovery process?
- What are the expectations from the software in case of a failure?
- What is the expected time to recover (partial or full)?
- What is the expected way to recover?
Master the risks - A failure can be triggered by unexpected events at any time during regular use. Just think about the environmental parameters that continually change over time:
- Operating systems that are not fully adapted to the code behind the software.
- Hardware failures that may occur without any direct relation to the software.
- Hardware issues that will affect the software (low disk space, low memory, etc.).
- Large scale environments that push the software to its limits.
- Clients that will use the software in different ways than expected.
A good testing process will find as many risks as possible, analyze them, and understand the recovery methods that are available for each one of them.
Recovery vs. Failures - Based on the previous steps, we can start to create a table that lists the failures and, for each one of them, the recovery solution. A good coverage matrix will allow the tester to be effective and precise during test execution.
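As a rough illustration, such a failure-to-recovery table could be kept as a small data structure so that gaps in coverage are easy to spot. This is just a sketch under my own assumptions: the `MatrixRow` class, the helper functions, the recovery solutions and the test IDs are all invented for the example, and only the failure names echo the risks listed earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MatrixRow:
    failure: str   # what goes wrong
    recovery: str  # how the software is expected to recover
    test_ids: List[str] = field(default_factory=list)  # tests simulating this failure


# Illustrative rows only; recovery solutions and test IDs are made up.
MATRIX = [
    MatrixRow("Low disk space", "Pause writes and resume once space is freed", ["REC-001"]),
    MatrixRow("Low memory", "Restart the service and reload the last saved state",
              ["REC-002", "REC-003"]),
    MatrixRow("Hardware failure", "Fail over to a standby machine"),  # no test yet
]


def uncovered_failures(matrix: List[MatrixRow]) -> List[str]:
    """Failures that have a recovery solution but no test that simulates them."""
    return [row.failure for row in matrix if not row.test_ids]


def coverage_ratio(matrix: List[MatrixRow]) -> float:
    """Fraction of listed failures covered by at least one test."""
    if not matrix:
        return 0.0
    covered = sum(1 for row in matrix if row.test_ids)
    return covered / len(matrix)


print("Uncovered:", uncovered_failures(MATRIX))
print(f"Coverage: {coverage_ratio(MATRIX):.0%}")
```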
Testing - This part is the easiest one. If the previous steps were performed intelligently, all the tester needs to do is follow the matrix and execute the tests that were designed specifically to simulate the failures and validate the recovery mechanism.
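To make "simulate the failure and validate the recovery" a bit more tangible, here is a small, self-contained sketch. It does not test a real product: `process_items` is a hypothetical task that saves a checkpoint after every item, the failure is simulated with an exception instead of a real crash, and the time limit is an arbitrary value chosen for the example.

```python
import json
import os
import tempfile
import time


class SimulatedCrash(Exception):
    """Stands in for an unexpected failure in the middle of the work."""


def load_checkpoint(path):
    """Return the results saved so far, or an empty list if nothing was saved."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path, done):
    """Write to a temporary file first, then swap it in with os.replace."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(done, f)
    os.replace(tmp, path)


def process_items(items, checkpoint_path, crash_at=None):
    """Hypothetical task: upper-case each item, resuming from the checkpoint."""
    done = load_checkpoint(checkpoint_path)
    for index in range(len(done), len(items)):
        if crash_at is not None and index == crash_at:
            raise SimulatedCrash(f"failure injected before item {index}")
        done.append(items[index].upper())
        save_checkpoint(checkpoint_path, done)
    return done


def run_recovery_test(items, crash_at, max_recovery_seconds):
    """Inject a failure, restart the task, and report how the recovery went."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        try:
            process_items(items, path, crash_at=crash_at)
        except SimulatedCrash:
            pass  # the failure we wanted to simulate
        survived = load_checkpoint(path)  # data that outlived the failure
        start = time.monotonic()
        result = process_items(items, path)  # the "restart"
        elapsed = time.monotonic() - start
    return {
        "survived_before_restart": len(survived),
        "result": result,
        "recovery_seconds": elapsed,
        "within_time_limit": elapsed <= max_recovery_seconds,
    }


report = run_recovery_test(["a", "b", "c", "d"], crash_at=2, max_recovery_seconds=5.0)
print(report)
```

The report captures the two aspects discussed at the beginning of the post: how long the recovery took, and whether the work continued from the last saved point without losing data.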
What are the success factors for “Recovery” testing?
- The user can get a valid explanation of the error and its cause (logs, events, etc.).
- The software returns to the point of failure and continues the original task.
- The system recovers quickly and efficiently, following the expected process.
- After the tests, you can confirm that the recovery plan is working as expected.
- The tester can supply the exact timelines for recovering the lost data (if any).
- You can confirm that the user can use the software without data loss.
- Ability to eliminate unknown system behaviors.
- Increased confidence in the product.
- Increased stability of the product.