Tuesday, September 04, 2007

An easy way to get started with SLAuto

It's been an interesting week, leading up to the Labor Day weekend, but as of this morning I get to talk more openly about one project that has been taking a great deal of my time. As I have blogged about Service Level Automation ("SLAuto"), it may have dawned on some of you that achieving nirvana here means changing a lot about your current architecture and practices.

For example, decoupling software from hardware is easy to say, but requires significant planning and execution to implement (though this can be simplified somewhat with the right platform). Building the correct monitors, policies and interfaces is also time intensive work that requires the correct platform for success. However, as noted before, the biggest barriers to implementing SLAuto and utility computing are cultural.

There is an opportunity out there right now to introduce SLAuto without all of the trappings of utility computing, especially the difficult decoupling of software from hardware. It is an opportunity that the Silicon Valley is going ga-ga over, and it is a real problem with real dollar costs for every data center on the planet.

The opportunity is energy consumption management, aka the "green data center".

Rather than pitch Cassatt's solution directly, I prefer to talk about the technical opportunity as a whole. So let's evaluate what is going on in the "GDC" space these days. As I see it, there are three basic technical approaches to "green" right now:
  1. More efficient equipment, e.g. more power efficient chips, server architectures, power distribution systems, etc.
  2. More efficient cooling, e.g. hot/cold aisles, liquid cooling, outside air systems, etc.
  3. Consolidation, e.g. virtualization, mainframes, etc.

Still, there is something obvious missing here: no matter which of these technologies you consider, not one of them is actually going to turn off unused capacity. In other words, while everyone is working to build a better light bulb or to design your lighting so you need fewer bulbs, no one is turning off the lights when no-one is in the room.

That's where SLAuto comes in. I contend that there are huge tracks of computing in any large enterprise where compute capacity runs idle for extended periods. Desktop systems are certainly one of the biggest offenders, as are grid computing environments that are not pushed to maximum capacity at all times. However, possibly the biggest offender in any organization that does in-house development, extensive packaged system customization or business system integration is the dev/test environment.

Imagine such a lab where capacity that will be unused each evening/weekend, or for all but two weeks of a typical development cycle, or at all times except when testing a patch to a three year old rev of product, was shut down until needed. Turned off. Non-operational. Idle, but not idling.

Of course, most lab administrators probably feel extremely uncomfortable with this proposition. How are you going to do this without affecting developer/QA productivity? How do you know its OK to turn off a system? Why would my engineers even consider allowing their systems to be managed this way?

SLAuto addresses these concerns by simply applying intelligence to power management. A policy-based approach means a server can be scheduled for shutdown each evening (say, at 7PM), but be evaluated before shutdown against a set of policies that determine whether it is actually OK to complete the shut down.

Some example policies might be:

  • Are certain processes running that indicate a development/build/test task is still underway?
  • Is a specific user account logged in to the system right now?
  • Has disk activity been extremely low for the last four hours?
  • Did the owner of the server or one of his/her designated colleagues "opt-out" of the scheduled shutdown for that evening?

Once these policies are evaluated, we can see if the server meets the criteria to be shut down as requested. If not, keep it running. Such a system needs to also provide interfaces for both the data center administrators and the individual server owners/users to control the power state of their systems at all times, set policies and monitor power activities for managed servers.

I'll talk more about this in the coming week, but I welcome your input. Would you shut down servers in your lab? Your grid environment? Your production environment? What are your concerns with this approach? What policies come to mind that would be simple and/or difficult to implement?

4 comments:

Anonymous said...

Nonsense. Switching from controlling virtualization and utility computing to becoming a giant on/off switch?

James Urquhart said...

On the contrary, SLAuto applied to the central problems of the time: virtualization sprawl, resource optimization and power consumption management. In fact, you've helped highlight the fact that Service Level Automation is a technical approach that is applicable to a variety of infrastructure management problems.

LanceW said...

Anything that makes it easier to introduce service level automation into a data centre sounds good to me.

Once it is place and found to be useful the NEXT project might use the more advanced features.

Anonymous said...

With this new blog and announcement, I bid you a fond farewell. I will contact you off-line as I don't like these anonymous postings. We are no longer actively monitoring Cassatt, and will turn to suitable alternatives.