Here is part three of the interview with host Mike Lippis of the Outlook Series. We have broken the 45-minute podcast interview into five parts for easier reading. This section covers performance management and testing.
Mike: “When it comes to really achieving business value from IT investments, lots of folks will really look at performance. Can you give us some background on some current performance management strategies that are out there?”
John: “I can compare that to what our traditional approach to performance management has been. In the old world, entire applications were produced, performance-tested in a lab, and then released for production. This worked okay when the thing we built was a 100% self-contained thing.
“But now go back to the analogy of a service-based architecture, where I am producing applications that consume services from you, Mike. What do you do to performance test or to manage the performance of your service? You are no longer a complete application. I can’t ask you to produce your service and send it individually into the performance lab if the performance lab hasn’t retooled. I’ve got to think about performance testing at the services layer. I’ve got to think about performance testing even at the orchestration layer. By that I mean there may be two or three services that collaborate to make one business process or even one sub-process of a business function.
“Once we start to think about performance management, now that I’ve broken my big-box application into a number of components, I’ve got to do the same thing with performance. I’ve got to break my run-time SLA (Service Level Agreement) into component-level SLAs. The SLA is basically the performance and resource expectations that I’ve put on my applications. So, Mike, I might need to do something that we call decomposing the SLA, where my application must run at 2 seconds or less response time for the user.
“If 2 seconds is what I have -- I need to get your performance under 2 seconds so that I can meet my SLA. Your response time is a component of my response time, because I am invoking you -- I am dependent on your service. What I need to do is negotiate (for lack of a better phrase) your SLA with you--we call that decomposing the SLA. If you need .5 seconds, then my application can consume 1.5 seconds, and therefore I can meet my overall SLA. This is in many ways a different way of thinking about performance than before. We used to just think about 2 seconds -- we dropped the application in a lab to see if we had 2 seconds. Now Mike’s team has to go to the lab and has to meet .5 seconds, and my team has to go to the lab to meet not 2 seconds but 1.5 seconds. I’ve got to make sure that my 1.5 seconds will combine with your .5 seconds.
“Make sense?”
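As an illustration of the arithmetic John describes (not something taken from the interview or from any product), here is a minimal Python sketch of decomposing an SLA. The names `ComponentSla`, `remaining_budget_ms`, and `meets_sla` are hypothetical, times are held as whole milliseconds, and the sketch assumes each dependency is called once and in sequence, so the budgets simply add up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentSla:
    """Response-time budget negotiated with one dependency (hypothetical type)."""
    name: str
    budget_ms: int


def remaining_budget_ms(end_to_end_sla_ms: int, dependencies: list[ComponentSla]) -> int:
    """Return the time left for the consuming application itself.

    Assumption: every dependency is invoked once, sequentially, so the
    dependency budgets are subtracted directly from the end-to-end SLA.
    """
    reserved = sum(dep.budget_ms for dep in dependencies)
    remaining = end_to_end_sla_ms - reserved
    if remaining <= 0:
        raise ValueError("dependency budgets leave no time for the application")
    return remaining


def meets_sla(end_to_end_sla_ms: int, own_ms: int, dependency_ms: list[int]) -> bool:
    """Check measured times (application plus dependencies) against the SLA."""
    return own_ms + sum(dependency_ms) <= end_to_end_sla_ms


if __name__ == "__main__":
    # The interview's numbers: a 2-second SLA and a 0.5-second service.
    mikes_service = ComponentSla(name="mikes_service", budget_ms=500)
    print(remaining_budget_ms(2000, [mikes_service]))  # 1500
```

With these numbers, each team gets its own lab target: 500 ms for the service and 1500 ms for the consuming application. Parallel or repeated calls would need a different combination rule than simple addition.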
Mike: “Once you have an application completed, a lot of times they are difficult to tune for performance. What are you specifically doing to address that?”
John: “This notion of taking our SLA apart into components actually helps us do this. It does so for a couple of reasons. The biggest is: if we can performance test within the life cycle, if we can do our testing as a part of our own development, then we can find and resolve our performance issues much more quickly. In my experience, and I bet many people in the performance field will nod in agreement, so many times if we build an app that does not scale and then try to tune it into scalability, the best we’ll do is a 10 or 20% improvement. You can’t add hardware and make a product that is un-scalable all of a sudden scalable--it just doesn’t work. If you had known earlier where those scalability issues were, then you could have built in the scalability along the way a whole lot easier.
“This is essentially the same reason why we iterate from a requirements definition standpoint, because the more we iterate through the scope of our work, the better we are at delivering on it and the faster we are. The same is true with performance: we have to iterate and we have to do performance testing at the component level in order to get that performance up and the scalability up.”
Mike: “Are you doing anything to address the challenge of discontinuous development cycles?”
John: “Yes. You may be building your component at a different lifecycle pace than my application, even though I am a consumer of your component (usually we call that a service). Our challenge is that I’ve got to have access to your service, maybe at a time when you don’t even have it ready for me. You might have downstream dependencies of your own. As soon as I think of you as “Order Management” -- you might turn around and say: Yes, I do order management, but I sit on top of the customer database that I don’t own, and I sit on top of the ERP system that I don’t own either. So, you have your own set of dependencies. If we actually put it on a whiteboard, most large-scale enterprises would literally fill a whiteboard with very small boxes, all interdependent on each other and all iterating on their own life cycle. It drives me nuts just thinking about how anything actually works!
“So in order for us to support this kind of distributed lifecycle, we’re doing a lot more virtualizing of the components underneath each other.
“What we mean by this is: since I am invoking your service, I am dependent on your service, but what if your service isn’t available to me yet, what if I need you to change your service and your changes aren’t ready yet? What if I need to do performance testing of my application on an interim basis but I don’t have scalable production quality hardware to run your service on? Or, maybe worse, maybe your service talks to the mainframe, but I don’t have availability of mainframe resources to do this? All these are what we call constraints. These are real world constraints which we are dependent upon.
“What we do is: we virtualize the behavior of these things. What I will do is construct a model (our LISA Virtualize product is really good at doing this) -- it constructs a model of the behavior of your service, and once that model exists I can run that model in a virtual environment, it doesn’t have to have the real world constraints; it’s a virtual thing. So, if I need it early, I’ve got it. If I need it to scale to 500 transactions a second, I can do that. Even though I don’t have production type hardware for your real service, I can virtualize it and get the kind of performance that I’m looking for.”
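To make the idea of a behavioral model concrete, here is a minimal Python sketch of a stand-in for an unavailable service. It is an illustration only and does not represent how LISA Virtualize works; the class `VirtualService`, its methods, and the request keys are all hypothetical. The stand-in replays recorded responses and waits for a configurable latency before answering.

```python
from __future__ import annotations

import time
from typing import Callable


class VirtualService:
    """Hypothetical stand-in that replays recorded responses for a dependency."""

    def __init__(
        self,
        latency_seconds: float = 0.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._responses: dict[str, dict] = {}
        self._latency_seconds = latency_seconds
        # The sleep function is injectable so tests can run without waiting.
        self._sleep = sleep

    def record(self, request_key: str, response: dict) -> None:
        """Store the response the real service would give for this request."""
        self._responses[request_key] = response

    def invoke(self, request_key: str) -> dict:
        """Return the recorded response after the configured delay."""
        if request_key not in self._responses:
            raise KeyError(f"no recorded behavior for {request_key!r}")
        response = self._responses[request_key]
        self._sleep(self._latency_seconds)
        return response


if __name__ == "__main__":
    order_service = VirtualService(latency_seconds=0.5)
    order_service.record("get_order:42", {"status": "shipped"})
    print(order_service.invoke("get_order:42"))
```

Because the stand-in has no downstream dependencies, it is available whenever the consuming application needs it, which is the property John is describing. A realistic model would also need to cover request matching, state, and throughput, none of which this sketch attempts.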
Mike: “It sounds like you would be able to provision the right hardware and platform for a given solution then.”
John: “Right, because what we do is dramatically reduce what that requires. A classic example for one of our customers is: I’ve got an application that is a portal-type application that invokes a set of services. Some of those services talk to SAP. SAP is a great ERP system and many large organizations use it; but of course we don’t want a lot of performance and load testing type transactions against that system. We don’t always have availability when we need it. We don’t always have the data in the database when we want it -- SAP testing is a very complex field -- so there are all kinds of real world constraints. What we do is we virtualize the behavior of SAP and in doing so, your portal has access to what it needs 24/7 and I don’t have to provision the entire SAP environment to do it. All I have to do is provision my own application. We can’t virtualize away you -- but we can virtualize away your dependency. And once we do that, we decouple you from a lot of the challenges that those constraints bring.”
Mike: “When it comes to performance labs, it seems like a lot of them really rate data management as a very big problem. What can you tell us about dealing with that challenge in leveraging LISA?”
John: “One of the great challenges with data in performance labs … is really two challenges. One is that you can never talk to just one database in a performance test. You are usually dealing with multiple data sources. For example (as I mentioned a minute ago), order management is dealing with a customer database in one system and with the orders database, let’s say, underneath SAP -- so here are two completely independent systems that are now interdependent on each other with regard to data. Now here I am in a performance lab and I’m trying to support hundreds of scenarios of real world use of my application. In order to do that, I need the data between those two databases populated in a certain way and synchronized between each other, and whatever changes are made in one database, I need those changes to be reflected appropriately in the other database. Do you know how often this happens? Much, much less frequently than Christmas!
“What we have is a dynamic that we see all the time: you’ll go into the performance lab wanting to do some 100 different scenarios of use. These are all the use cases you want to be sure are performing and scalable in your application; but by the time you figure out just how to get all the data to synchronize between these two systems, the best you’ll come out with is about 20. Now the problem is that you’ll performance test just 20 scenarios, but you’ll go to production with all 100. You’ll find out the hard way whether the other 80 are a problem or not, because your real world users will be doing those other 80 (the ones you couldn’t performance test) and the problem is: if you have scalability or performance or resource issues in those other 80 … then you have a problem.”
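The gap between planned and testable scenarios can be made visible before a test run. The following Python sketch is an illustration under stated assumptions, not something described in the interview: the function `split_scenarios` and its data shapes are hypothetical, and it treats a scenario as runnable only when every customer it needs exists in the customer store and also has data in the orders store.

```python
from __future__ import annotations


def split_scenarios(
    scenarios: dict[str, set[str]],
    customer_ids: set[str],
    order_customer_ids: set[str],
) -> tuple[list[str], list[str]]:
    """Partition scenario names into (runnable, blocked).

    scenarios maps a scenario name to the customer IDs it needs.
    customer_ids are the IDs present in the customer database.
    order_customer_ids are the customer IDs that have rows in the orders database.
    """
    # Only customers present in both stores are usable by a scenario.
    synchronized = customer_ids & order_customer_ids
    runnable: list[str] = []
    blocked: list[str] = []
    for name, required in scenarios.items():
        if required <= synchronized:
            runnable.append(name)
        else:
            blocked.append(name)
    return runnable, blocked


if __name__ == "__main__":
    runnable, blocked = split_scenarios(
        scenarios={"reorder": {"c1"}, "bulk_order": {"c1", "c2"}},
        customer_ids={"c1", "c2"},
        order_customer_ids={"c1"},
    )
    print(runnable, blocked)
```

The blocked list corresponds to the scenarios that would otherwise reach production untested, so it is the list worth reporting.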
Mike: “When it comes to best practices in the area of performance and load testing, what can you tell us about decomposing performance expectations?”
John: “I gave a quick example of this earlier, I’ll elaborate on this a little bit.
“Let’s say the production SLA for my application is 2 seconds … that’s the example that I gave previously … where my response time to my user is 2 seconds … but one of the things that my application does is consume services from your team, Mike. I need to negotiate your response time as a part of my response time. So what I do is decompose that SLA. What I’ll do is say, ‘okay, I’ve got to make sure that I can get Mike’s service calls in a short enough amount of time so that my application overall’ -- once you add up all of the time consumed -- ‘doesn’t trip over our expected response time.’
“So we help by giving the customer the ability to do two things. The first is: we’ll help them do performance testing at the component level. Mike, you can performance test your service. You can say, ‘hey John, I can hit my half-second response time metric independent of you. I owe you the right functionality at the right performance. I can functionally test this system to make sure it works and I can performance test my service (independent of the rest of the world) at .5 seconds so I have done my job.’
“Now, I come into the picture and I can do performance testing on my application and make sure I hit my SLA as well. In order for me to hit my SLA, I have to take you out of the mix. That’s why we talk a lot about virtualization in the performance lab.
“So second, I can simulate your service’s behavior and I can even simulate your behavior at .5 seconds. I can say, ‘alright Mike’s virtual service; I want you to behave like Mike’s virtual service, and I want you to take .5 seconds whenever you respond to me.’ By the way, I can even say ‘what if.’ What if your response time takes .6 seconds, what does that do to my overall response time? What if you’re actually running a little faster at .4 seconds …what does that do to my response time?
“Sometimes the fastest way for me to get better scalability is to negotiate you a little faster so that my application hits its SLA: stuff like that. So that’s why performance testing as a component and virtualizing married together really give you a solid foundation for performance management in SOA.”
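The "what if" questions John raises reduce to a small calculation. The sketch below is illustrative only; the function name `what_if` is hypothetical, times are whole milliseconds to avoid floating-point rounding, and it assumes the application's own 1.5 seconds and the service's response time simply add.

```python
from __future__ import annotations


def what_if(
    own_ms: int,
    dependency_latencies_ms: list[int],
    sla_ms: int,
) -> list[tuple[int, int, bool]]:
    """For each candidate dependency latency, return
    (dependency latency, overall response time, whether the SLA is met).

    Assumption: one sequential call to the dependency per user request.
    """
    results = []
    for dep_ms in dependency_latencies_ms:
        total_ms = own_ms + dep_ms
        results.append((dep_ms, total_ms, total_ms <= sla_ms))
    return results


if __name__ == "__main__":
    # The interview's cases: the service at .4, .5, and .6 seconds.
    for dep_ms, total_ms, ok in what_if(1500, [400, 500, 600], sla_ms=2000):
        print(f"service {dep_ms} ms -> overall {total_ms} ms, SLA met: {ok}")
```

Under these assumptions, the service at .6 seconds pushes the overall response time to 2.1 seconds and breaks the 2-second SLA, while .4 seconds leaves a tenth of a second to spare. In practice the latency values would be fed into a virtual service and the overall time measured rather than computed.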
Mike: “We had mentioned provisioning before, how does that come into play in terms of optimizing provisioning relative to managing performance; should one be done before the other and how is that relevant?”
John: “The relevance is that we need to be able to capture virtual services as a part of the provisioning activity. Where in the past I needed the entire SAP system to be available for provisioning, now I just need the virtual services that encapsulate the behavior of SAP. I need that to be the provisioned item instead of the whole SAP system.
“By the way, when you think about these large-scale applications, often there are not only many servers with a very specific and complex kind of configuration, but there are huge databases underneath these big systems. So very frequently we have customers tell us that they can’t do development work against this massive system because it’s got a terabyte-sized database underneath it. In order for me to get access to it, I have to wait for the window from midnight to 2 a.m., because that’s the only time they will give me on the system. When I try to do performance testing, they refuse to let me because, in order to avoid impacting production, they limit the amount of non-production activity on that particular system. These are the very same kinds of constraints we talked about earlier.
“So, back to provisioning, if I need access to this particular database or this particular large application that spans many servers, I encapsulate all that behavior into one virtual service and I can provision that one unit to represent that entire set of systems.”
This concludes part three. In the next post we cover virtualization and testing.