Here's another great story from the field, courtesy of iTKO lead solution strategist Ken Ahrens. (You may remember the paper he wrote with me, "Virtualizing Over-Utilized Systems," a few months ago.)
When all else (literally) fails... try simulating from Production
One of our banking customers recently told us a success story about how they used Virtualize to solve an issue related to a production outage. We were pretty surprised because LISA is primarily used for dev and test in the SDLC, and not typically in Operations and Production. But they said for certain that their production defect was solved because they were able to use Virtualize to recreate the problem scenario.
However, Virtual Services were not their first approach. When the Operations people identified that customers were experiencing a performance-related issue, they immediately contacted the Production Support team. After collecting information about the issue, the team triaged it and found that the customer problems were caused by a performance issue in the middleware.
Next, they immediately tried to recreate the problem by generating load in their test lab, but they found that they could not reproduce it there. The test lab's version of the mainframe could not support the transaction load, so the back-end capacity was never high enough to recreate the problem in the middleware. At this point, several developers and testers had spent more than a week trying to reproduce the issue, and they still didn’t know exactly what was causing the problem.
They reached out to the bank's performance team (who are avid LISA users and virtualization experts) to ask for help running the stress tests. The performance team mentioned that for very high-volume load tests, they used Virtual Services to simulate the back-ends. The simulated back-ends had significantly higher capacity than the legacy mainframe system in the lab, so there was no bottleneck behind the middleware under test.
The only problem was that the data was already set up in the live systems, and the Virtual Services used different data. Typically, this is the kind of problem that can take days to fix, with DBAs manually copying records from one system to another and using conventional test data management (TDM) methods to ready all of the data for a test. Indeed, in this case that effort had already been made once. So instead of copying all that data to a new system, one of the performance team engineers looked at sample responses from Production and used LISA to automatically generate a thousand responses across a handful of Virtual Services in a couple of hours.
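To make the idea concrete, here is a minimal sketch of generating many canned responses from a single sample captured in Production. This is not LISA's API or the bank's actual method. The function name, the field names (accountId, status, balance, currency), and the ID format are all hypothetical, chosen only to illustrate the approach.

```python
import json
import random


def generate_responses(sample, count, seed=0):
    """Build `count` canned responses from one sample response.

    Each response copies the sample and varies the fields that a
    request would be matched on. All field names are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    responses = {}
    for i in range(count):
        account_id = f"ACCT{i:06d}"
        response = dict(sample)  # shallow copy is enough for a flat record
        response["accountId"] = account_id
        response["balance"] = round(rng.uniform(0, 10000), 2)
        responses[account_id] = response
    return responses


# One sample response, shaped like something observed in Production (made up here).
sample = {"accountId": "ACCT000000", "status": "OK", "balance": 125.50, "currency": "USD"}

responses = generate_responses(sample, 1000)
print(len(responses))
print(json.dumps(responses["ACCT000042"], sort_keys=True))
```

A stub back-end could then look up the incoming request's account ID in this table and return the matching response, with no live data copied between systems.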
Immediately, they were able to generate the proper load on the middleware system, using the Virtual Service back-ends to eliminate the bottleneck. They were quickly able to recreate the problem and identify the proper fix. Through the use of Virtual Services, the typical resolution time for this type of problem went from weeks to just days (over a weekend, actually).
The moral? When all else (literally) fails, support problem isolation and resolution with Virtual Services instead of trying to replicate what you don't need. It's predictable, scalable -- and just works!
