Thursday, March 29, 2007

Service Level Automation Deconstructed: Measuring Service Levels

This is the first of three in my series analyzing the key assumptions behind Service Level Automation. Specifically, today I want to focus on the measurement of business systems, and the concepts behind translating those measurements into service level metrics.

Rather than trying to cover this topic (and the other topics in this series) exhaustively in a single post, what I am going to do is provide a "first look" post now, then use labels when follow-up posts have related information. The label for this topic will be "measure".

In my next installment, I'll introduce analysis of those metrics against the service level objectives (SLOs) the business requires. That post and future related posts will be labeled "analyze".

In the final installment of the series, I'll describe the techniques and technologies available to digitally manipulate these systems so that they run within SLO parameters. Posts related to that topic will be labeled "respond".

As noted earlier, my objective is to survey the technologies, academic research, and related work on each of these topics in an attempt to enlighten you about the science and technology that enables service level automation.

How do we measure quality of service?

Measuring quality of service is a complex problem, though not because it is hard to take measurements of information systems and business functionality. I (and I bet you) could list dozens of technical measurements that can be made on an application or service that would reflect some aspect of its current health (a small collection sketch follows this list). For example:

  • System statistics such as CPU utilization or free disk space, as reported by SNMP
  • Response to a ping or HTTP request
  • Checksum processing on network data transfers
  • Any of dozens of Web Services standards
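As an illustration only, here is a minimal Python sketch of collecting the first two kinds of measurement: it samples CPU and disk utilization locally with the third-party psutil library (standing in for an SNMP poll) and times an HTTP request against a hypothetical health-check URL.

```python
# Illustrative only: sample a few base health metrics. psutil stands in
# for an SNMP poll; the health-check URL is a hypothetical example.
import time
import urllib.request

import psutil  # third-party: pip install psutil


def http_response_time(url: str, timeout: float = 5.0) -> float:
    """Time an HTTP GET; slow or failed responses signal degraded service."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read(1)  # confirm the service actually returns data
    return time.monotonic() - start


cpu_pct = psutil.cpu_percent(interval=1)     # system-wide CPU utilization, %
disk_pct = psutil.disk_usage("/").percent    # root filesystem usage, %
latency = http_response_time("http://example.com/health")  # assumed endpoint
print(f"cpu={cpu_pct:.0f}% disk={disk_pct:.0f}% http={latency * 1000:.0f}ms")
```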

The real problem is that human perception of quality of service isn't (typically) based on any one of these measurements, but on a combination of measurements, where the specific combination may change based on when and how a given business function is being used.

For example, how do you measure the utilization of a Citrix environment? Measuring sessions/instances is a good start, but--as noted before with Windows Terminal Services (WTS)--what happens when all sessions consume a large amount of CPU at once? CPU utilization, in turn, could fluctuate wildly as sessions become more or less active. Then again, what about memory utilization or I/O throughput? Either could become critical completely independently of the others already mentioned.

No, what is needed is something more mathematical: one index (or a small set of indexes) computed from a combination of the base metrics retrieved from the managed system.
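Purely as a sketch of what such an index might look like (not any particular product's formula), here is one way to combine normalized base metrics into a single load index between 0 and 1; the metric names, capacity ceilings, and weights are all invented for the example.

```python
# A hedged sketch of a composite health index: each base metric is
# normalized against an assumed capacity ceiling, then combined with
# weights. All names, ceilings, and weights here are illustrative.

CEILINGS = {"sessions": 100, "cpu_pct": 90.0, "mem_pct": 85.0, "io_mbps": 200.0}
WEIGHTS = {"sessions": 0.25, "cpu_pct": 0.35, "mem_pct": 0.25, "io_mbps": 0.15}


def health_index(sample: dict) -> float:
    """Return a 0..1 load index; values near 1 mean the system is saturated."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        utilization = min(sample[name] / CEILINGS[name], 1.0)
        score += weight * utilization
    return score


# Example: a Citrix-like host with many active, CPU-hungry sessions.
sample = {"sessions": 80, "cpu_pct": 88.0, "mem_pct": 60.0, "io_mbps": 40.0}
print(f"health index: {health_index(sample):.2f}")  # about 0.75 here
```

A weighted maximum is a reasonable alternative to the weighted sum, since (as in the Citrix example above) a single saturated resource can degrade the user experience all by itself.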

There are tools that do this. They range from the basic capabilities available in a good automation tool to the sophisticated evaluation and computation available in a more specialized monitoring tool.

What I am still searching for are standard metrics being collected by these tools, especially industry standard metrics and/or indexes that demonstrate the health of a datacenter or its individual components. I'll talk more about what I find in the future, but welcome you to contribute here with links, comments, etc. to point me in the right direction.

Monday, March 26, 2007

Service Level Automation is green--when done right

Vinay Pai has a post today that I think spells out one of the key benefits of Service Level Automation. Remember, the key aspect of SLA is knowing what capacity is required to meet target operational goals. A basic SLA environment will ensure that only the necessary capacity is active at any time.

So, if capacity is not being used at any given point in time, why have it consume power or cooling at all? Some "automation" products require underused physical infrastructure to remain running in order to support their management layers--just in case. This is unfortunate, as an idle hypervisor is just idle capacity. It's not serving a current business need.

A truly efficient SLA platform is aware of the power controller states of each of its physical servers, and can power down unused servers. Servers are only turned on when they are needed to meet some aspect of the system's service level goals.
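As a hedged sketch of that idea (not any particular vendor's implementation), the loop below powers on just enough IPMI-managed servers to meet current demand and powers the rest down. The hostnames, credentials, demand figure, and per-server capacity are all assumptions for illustration.

```python
# Sketch of the power-control decision only, assuming servers whose
# BMCs answer to ipmitool. Hostnames, credentials, and the per-server
# capacity figure are hypothetical.
import subprocess

SERVERS = ["node01.example.com", "node02.example.com", "node03.example.com"]
SESSIONS_PER_SERVER = 50  # assumed capacity of one physical server


def set_power(host: str, state: str) -> None:
    """Issue an IPMI chassis power command ('on' or 'off') to a host's BMC."""
    subprocess.run(
        ["ipmitool", "-H", host, "-U", "admin", "-P", "secret",
         "chassis", "power", state],
        check=True,
    )


def reconcile(current_demand: int) -> None:
    """Power on just enough servers for demand; power the rest down."""
    needed = -(-current_demand // SESSIONS_PER_SERVER)  # ceiling division
    for i, host in enumerate(SERVERS):
        set_power(host, "on" if i < needed else "off")


reconcile(current_demand=70)  # 70 sessions -> 2 servers on, 1 powered off
```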

Note Vinay's description of the QA labs at Cassatt. As you might expect, he pushes the limits of what an SLA datacenter must endure, yet can always scale his power and cooling needs to his current workload. Can you say that about your datacenter?

Thursday, March 22, 2007

Service Level Automation Deconstructed: Introduction

Service Level Automation starts with three simple premises (sketched as a closed loop after this list):

* The factors contributing to software service quality can be measured electronically.

* Runtime targets indicating high quality of service can be defined for those measurements.

* Systems involved in delivering software functionality can be manipulated to keep those measurements within the runtime targets.
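Taken together, these premises describe a closed control loop. Below is a bare-bones sketch of that loop; the metric, the target value, and the responses are placeholders, since the point is only the measure/analyze/respond structure.

```python
# Skeleton of the three premises as a closed loop. The measurement,
# the SLO target, and the responses are placeholders for illustration.
import time


def measure() -> float:
    """Premise 1: sample a metric electronically, e.g. mean response time (s)."""
    return 0.4  # placeholder reading


SLO_MAX_RESPONSE_TIME = 0.5  # premise 2: a runtime target for that metric


def respond(observed: float) -> None:
    """Premise 3: manipulate the system to keep the metric within target."""
    if observed > SLO_MAX_RESPONSE_TIME:
        print("over target: add capacity")
    else:
        print("within target: hold steady or release capacity")


for _ in range(3):  # a few iterations for illustration
    respond(measure())
    time.sleep(1)
```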

I think the support for each of these premises should be explored more deeply, so I plan to begin a little survey of the relevant technologies and academic research over the next few weeks. The idea is to get a good sense of what standards/technologies/concepts/etc. can be used to meet the requirements of each premise. I also hope to discuss how a system smart enough to take advantage of them can save a large datacenter both in direct costs and in losses due to service level failures.

Why Service Level Automation? I wrote about this earlier. However, as a quick reminder, think of service level automation as meeting this objective:

Delivering the quantity and quality of service flow required by the business using the minimum resources required to do so.

I've been quite busy both at work and at home, so I'm hoping to use this exercise as a way to increase my posting frequency. Stay tuned for more.

Wednesday, March 21, 2007

5 things...

The latest blogosphere social phenomenon has reached me doubly this week. Both Ken Oesterich and Ken Wallich have tagged me in the ongoing "5 things tag" that has been sweeping the blogging community (especially the tech bloggers).

Here are five things most people do not know about me:

  1. I was born in Reading, England.
  2. I play pretty decent guitar. I don't know that many songs by other people (a problem when whipping out the guitar at parties), but I have several original works that I think hold their own very nicely against most pop drivel. Lately, however, I have been working on "Tears in Heaven" by Eric Clapton.
  3. I played Mr. Antrobus in Thornton Wilder's "The Skin of Our Teeth" in high school. I was a geeky, awkward teenager trying to play a 40-year-old man, and was the only member of the primary cast not to win an award for my performance in that show. Now that I am 40, I wonder what the hell was so hard...
  4. My computing career started in fifth grade in Cedar Rapids, Iowa. I was lucky enough to get into a science-focused program at a nearby elementary school, and one of the kids' moms was one of the first BASIC programmers at Rockwell Collins, the aviation electronics firm. She came to our school once a week and taught us the basics of variables, loops, conditional statements and subroutines. Very cool. I got caught a bunch of times programming on the teletype terminal in the back of the classroom while I should have been listening to the teacher. Later, my luck continued as the father of one of my close neighborhood friends bought the fifth (or something like it) Apple II computer in the state of Iowa. We would program in BASIC every day after school, and tried to get into writing games and such.
  5. Later, in college, I was determined to be a Music/Computer Science double major...for all of one semester. I didn't practice the music stuff enough, so I got a low grade there, and I hated my systems organization class, so I lost interest in computer science. (Dumb reason, now that I look back, but it worked out.) Instead, I started taking every math and physics class that I could, and finished with a Mathematics/Physics double major. The day of graduation, I swore to my friends, "I will NEVER be a computer programmer for a living." Two and a half years later, I was coding C for a small manufacturing company. (Do not try to predict the future, even your own. It's pointless. Setting goals is OK, but be willing to float a bit with the breeze.)

Now, let me introduce you to five more, randomly selected from my corner of the blogosphere:

  • My mom.
  • Katie Tierney, a former colleague with excellent technical intuition who is proving herself to be a hell of a "head of household" as well.
  • Rama Roberts, another former colleague whose blog never fails to entertain and enlighten.
  • Management guru Tom Peters, who reinforces my drive to amaze both my employers and customers by being a service professional first and foremost.
  • Alessandro Perilli, author of the virtualization-focused virtualization.info blog.