The Wisdom of Clouds: 2006

Monday, December 18, 2006

The Organic Cluster

Several weeks have gone by since my last post (OK, several months...), so lots to talk about in the next few days. I want to start the conversation again by talking about one of the really cool advantages of Service Level Automation in enterprise distributed application architectures.

We've been talking about distributed application architectures, and how their tight coupling to underlying hardware and middleware has left most production applications running in infrastructure "silos", where spare capacity is locked up and unavailable to other systems.

Traditionally, nowhere has this been more obvious than with your database servers. Not designed to scale horizontally, these servers have relied on heavy overprovisioning to be absolutely sure that performance was consistently high, regardless of actual demand. Do you need more capacity for your database than the current server provides? Then buy a bigger box and figure out how to migrate the database without crippling the business.

The exception to this story (right now) is ORACLE 10g RAC (Real Application Clusters). RAC is a grid-based database engine, which uses clustering technology to allow the RDBMS to be distributed across several servers. This increases availability greatly, and allows for an easier upgrade path when new hardware is required.

Unfortunately, as designed, the ORACLE cluster still requires the administrator to allocate excess capacity in case of high demand. Each node of the cluster must be running at all times (in the default architecture), which means each server must be dedicated to RAC whether it is needed or not.

A good Service Level Automation environment gives the administrator an interesting new capability, however. Because the ORACLE cluster can run with fewer than the maximum number of servers defined for the cluster (which is how the database keeps running when a server node is lost), it is possible to capture the cluster in an image library, and then allocate nodes only according to actual demand. No need for weird code or configuration changes to ORACLE, and no need to have spare capacity "dedicated" to RAC.

If more capacity is needed, the SLA environment will grab it from the pool of spare capacity available to all applications, not just RAC. When demand is detected to exceed the safety margins of the current "live" set of servers, the SLA environment boots up a new node, and RAC "rediscovers" its "lost" node. When demand falls away, the SLA environment shuts down an unneeded node, and RAC just detects that a node went down, but keeps on chugging.
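
To make the boot/shut-down decision above concrete, here is a minimal sketch in Python. It is not Cassatt's or ORACLE's actual logic; the ClusterPolicy fields, the notion of a per-node capacity, and the 25% safety margin are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterPolicy:
    """Hypothetical policy for one clustered service (illustrative names)."""
    min_nodes: int         # fewest nodes the cluster is allowed to run with
    max_nodes: int         # number of nodes defined for the cluster
    node_capacity: float   # load one node can carry, e.g. transactions/sec
    safety_margin: float   # fraction of each node kept as headroom, e.g. 0.25


def desired_nodes(demand: float, policy: ClusterPolicy) -> int:
    """Return how many nodes should be live for the observed demand."""
    usable_per_node = policy.node_capacity * (1.0 - policy.safety_margin)
    needed = math.ceil(demand / usable_per_node) if demand > 0 else 0
    # Never go below the minimum or above what the cluster defines.
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def plan_action(live_nodes: int, demand: float, policy: ClusterPolicy) -> str:
    """Describe the step the automation would take next."""
    target = desired_nodes(demand, policy)
    if target > live_nodes:
        return f"boot {target - live_nodes} node(s)"
    if target < live_nodes:
        return f"shut down {live_nodes - target} node(s)"
    return "no change"


policy = ClusterPolicy(min_nodes=1, max_nodes=4,
                       node_capacity=1000.0, safety_margin=0.25)
print(plan_action(live_nodes=2, demand=1600.0, policy=policy))  # boot 1 node(s)
print(plan_action(live_nodes=3, demand=500.0, policy=policy))   # shut down 2 node(s)
```

The cluster itself never sees this policy; from its point of view a node simply reappears or goes away, which is the whole point of the approach.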

Lest you think this is a pipe dream, my employer Cassatt has this running in its labs and has indeed provided proof-of-concept to prospective customers. And they like it. Which is another reason why Service Level Automation is changing the way IT runs.

Monday, October 16, 2006

Two important links...

I have two new links to trumpet:

Me doing the Cassatt schtick for the world to see. (Note the great hair!)
My new Service Level Automation del.icio.us page, with links to a variety of interesting sites related to service level automation, virtualization and (okay, I have to be loyal to my employers) Cassatt. (I have added this to the links section on the right of the http://servicelevelautomation.blogspot.com landing page)

Thursday, October 12, 2006

The Datacenter is Dead! (Or Just Mutating Badly!)

I used to be a member of an object oriented programming user group in St. Paul, MN run by a professor at St. Thomas University (if I recall correctly--I can't even remember his name). This man was a tireless organizer of what was then a critical forum for fostering MN software development expertise. He was also a frequent speaker to the group, and one speech always comes immediately to mind when I remember the "good old days".

This computer science professor stood in front of a highly attentive audience one evening and declared "data is dead!"

His point was that if we modified our models of how computers stored data persistently to use an "always executing" approach, the need for databases to manage storage and retrieval of data from block-based storage would be made obsolete. (I view "always executing" systems much like your cell phone or Palm device today; when you turn on your device, applications remain in the state they were in when you last shut it off.)

It's funny to remember how much we thought objects were going to replace everything, given the intense dependency we have on relational databases today. But his arguments forced us to really think about the relation between the RDBMS and object oriented applications. One result of years of this thinking, for example, is Hibernate.

Jonathan Schwartz, my beloved leader in a former life, recently blogged about the future of the datacenter, contending that the days of large, centralized computing facilities are numbered. In other words, "the datacenter is dead".

His contention is that the push towards edge and peer computing with "fail in place" architectures would make central facilities tended by technology priests obsolete. Ultimately, his point is that we should reexamine current enterprise architectures given the growing ubiquity of these new technologies.

I have to say, I think he makes a good argument...up to a point. My problem is that he seems to ignore two things:

Data has to live somewhere (i.e. data is certainly not dead)
People expect predictable service levels from shared services--the more critical those service levels, the more critical that those service levels can be guaranteed.

Rather, I think that the days of the company-owned datacenter are beginning to wane, and that the future is in a combination of edge computing and commercial computing utilities which will offer service delivery at guaranteed service levels.

I think it's good news that, in order to achieve such a vision, we must take baby steps from the static, siloed, humans-as-service-level-managers approach of today's IT shops.

As you may have guessed from my previous blogs:

I believe the first of these steps is to shed dependencies between software services and infrastructure components.
Following that, we need to begin to turn monitors into meters, capturing usage data both for real-time correction of service level violations and for analysis of usage and incident trends.
Finally, we need the automation tools that guarantee these service levels to operate across organization boundaries, allowing businesses to drive the behavior (and associated cost) of their services wherever they may run in an open computing capacity marketplace.
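
The "monitors into meters" step can be sketched in a few lines of Python. This is only an illustration of the idea: the Meter class, its field names, and the response-time numbers are hypothetical, not taken from any real monitoring product.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Meter:
    """Keeps every reading, so it serves real-time checks and trend analysis."""
    name: str
    limit: float                                  # service level limit for the metric
    samples: List[float] = field(default_factory=list)

    def record(self, value: float) -> bool:
        """Store a reading; return True if it violates the limit right now."""
        self.samples.append(value)
        return value > self.limit

    def violation_rate(self) -> float:
        """Fraction of all recorded readings that exceeded the limit."""
        if not self.samples:
            return 0.0
        over = sum(1 for s in self.samples if s > self.limit)
        return over / len(self.samples)

    def average(self) -> float:
        """Mean of all recorded readings (0.0 if nothing recorded yet)."""
        return mean(self.samples) if self.samples else 0.0


meter = Meter(name="response_ms", limit=200.0)
for reading in (150.0, 250.0, 100.0, 300.0):
    if meter.record(reading):
        print(f"violation: {reading} ms")   # hook for real-time correction
print(meter.violation_rate())               # 0.5
```

A plain monitor would stop at the violation message; the meter also keeps the history that trend analysis needs.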

The cool thing to think about is how SLA applies to the edge devices, though. Can we guarantee that necessary processing will occur both in backend data and services utilities as well as our edge and interface devices? How about in a peer network environment, especially one where one organization does not own or manage all of the computing capacity running the service?

No, neither data nor the datacenter is dead; they are just evolving quickly enough that they may soon be unrecognizable...

Thursday, October 05, 2006

InfoWorld: IT's Virtual Asset Economy

Check out:

http://www.infoworld.com/article/06/10/04/41OPcurve_1.html

Hmmmm… Service Level Automation, anyone?

“When money is distributed to managers for IT-related purchases, that capital goes to IT with the investor’s minimum requirements attached. Ideally, those requirements will be expressed in terms that are accessible to the investors…”

Great concept. Almost a "commons" (in the 18th century farming sense) for computing resources. Certainly many similarities to commodity market models as well (e.g. options, trading, etc.).

Tuesday, October 03, 2006

Service Virtualization defined

The biggest issue I've had with server virtualization vendors has nothing to do with the applicability of their products. I believe firmly that hardware virtualization is key to truly optimizing capacity usage and costs in a datacenter environment. No, the problem I have is with the rather insane notion that solving your hardware utilization problems solves your IT problems. In other words, as long as you can manipulate servers, your costs will be minimized and compliance with service levels will be a no-brainer.

That's crazy.

My argument starts with the observation that it's not server utilization service levels that businesses care about; what really matters is the quality and availability of the services that run the business. From a business perspective--from the view of the CEO and CFO--it's not how many servers you use and how you use them, it's how many orders you gather and how cheaply you gather them.

So, this focus by the VM companies on hardware and manipulating servers (virtual or not) falls short of meeting the goals of the business. Look closely at what VMware, Virtual Iron and even XenSource are offering:

Virtual Servers. This is the core of their value proposition, and it's by far the most valuable tool they deliver. As we established earlier, this is needed technology.
Virtual Server Management. VirtualCenter, Virtualization Manager, etc. provide key tools for managing virtual servers.
Automation. Tools to provision, expand, move and replace servers based on current observed conditions.

What's missing from all this? I'll give you a hint: where's the word "service" above?

Virtual machine technologies have no concept of a service, or even an application. They barely have the concept of an OS. This is by design; if they focus on the hardware virtualization problem, they have a fairly simple, bounded problem--just make some software behave exactly like its emulated hardware platform. That way, you can cover existing software installations with minimal effort, and don't have to worry about the vagaries of application, network and storage configuration. All of that can happen outside of a simple virtualization wrapper targeted at making the virtual servers work in the physical reality.

The true holy grail for IT, in my opinion, is service virtualization. I will define service virtualization as technology that decouples a set of functionality (a web service, an application, etc.) from any of the computing resources required to execute that functionality, regardless of whether those resources are themselves hardware or software. What we ultimately want to do is to optimize the delivery of this functionality by whatever metric is deemed important by the business.

Thus, I applaud the validation of policy-based automation, decoupling of physical hardware from software and automated response to server load and failure that the VM companies are clearly giving. However, I caution each of you to consider closely whether automating server management is enough, or if service virtualization is the better path.

Monday, September 25, 2006

Sidenote: BEA World and SLA

I didn't get a chance to post last week, primarily because much of the week was divided between the usual POC planning scramble and attendance/exhibiting at BEAWorld2006 in San Francisco. I took notes from some of the Wed. morning keynotes, and I thought I'd share the biggest observations here.

Mark Carges, Executive Vice President, Business Interaction Division, spent an hour talking a little about JRockit and WebLogic RealTime, and a lot about Business Process Automation and Enterprise Service Buses. His primary focus seemed to be to introduce BEA into the "Process as a Service" space, as well as an interesting focus on the on-line, ad-hoc collaboration space (aka "Web 2.0"). They had some cool wiki/forum/identity tagging tools, for instance. Check their website for BEA Enterprise 2.0 and Workspace 360 if you want to know more.

The "expert panel" discussion included Rob Levy, BEA CTO; Cliff Booth, VP of Enterprise Architecture; Paul Patrick, VP and Chief Architect of AquaLogic; Larry Cable, Chief Architect of WebLogic; and Annie Shum, VP and SOA "visioneer".

The best thing to come out of this discussion was the observation that SOA and BPA are driving organizations to involve their operations groups much earlier in the development/deployment game. The primary reason seems to be the need for innovative infrastructure planning to support dynamic needs. Sound familiar?

I even asked at a later "SOA Reality Check" session where the featured BEA customers saw operations in their development/deployment cycle, and both were very adamant that they are now being forced to consider infrastructure much earlier in the game, not because they need to acquire hardware for each new service, but precisely because it is too expensive to do so.

All in all, a good show. Still a year or two away from SOI and SLA dominating the conversation, but signs that things will get there.

Wednesday, September 13, 2006

Service Level Automation Glossary

No decent glossary for this subject exists on the web, so here is my stab at the basic terms of interest:

Service - Any function provided by the IT organization on behalf of any "customer", including internal business customers and external revenue customers. In the context of SLA, a service is typically a software application component delivering functionality to humans or other software components.

Service Level - Any measurement of how any component of an IT organization or its infrastructure is performing for its customers.

Service Level Goal - A target or limit measurement against which actual service levels are measured. Typically, service level goals are used to constrain the value (or range of values) at which a service level is considered acceptable.

Service Level Automation (or SLA) - Digitally managing IT resources to service level goals without human intervention wherever possible. Adjustments can be made to the deployment, capacity, or configuration of IT applications and infrastructure as needed to meet these goals.

Service Virtualization - Technology that decouples a set of functionality (a web service, an application, etc.) from any of the computing resources required to execute that functionality, regardless of whether those resources are themselves hardware or software.

I will update this as I go, and repost it occasionally. Please feel free to comment on this post with suggestions for terms and/or definitions that should be covered.

Tuesday, September 12, 2006

Service Level Automation Defined

What is it that businesses really want from their IT organizations? Bill Coleman tells the story of the CIO of a Fortune 500 Silicon Valley firm that put it this way:

"I measure myself on only two things, how many quality services I provide to my business, and how cheaply I do it."

The guts of what IT is all about is service levels. Each organization establishes--either explicitly or implicitly--target goals for how IT resources will perform to meet business objectives. Service level goals can be made for any measurable trait of the IT infrastructure, including value, utilization, quality, performance, availability and cost (both to acquire and to operate). These goals can be set at a technical level (e.g. CPU utilization, transactions per second on a database, etc.) or business level (e.g. number of orders processed per day, percentage of orders resulting in a complaint, etc.).
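
One way to picture a service level goal is as a named metric with an acceptable range. The sketch below is my own illustration, not any product's API; the ServiceLevelGoal class, the metric names and the threshold values are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceLevelGoal:
    """A target range for one measurable trait (hypothetical structure)."""
    metric: str
    minimum: Optional[float] = None   # lowest acceptable value, if any
    maximum: Optional[float] = None   # highest acceptable value, if any

    def is_met(self, measured: float) -> bool:
        """True when the measured service level falls inside the range."""
        if self.minimum is not None and measured < self.minimum:
            return False
        if self.maximum is not None and measured > self.maximum:
            return False
        return True


# A technical-level goal and a business-level goal, with made-up thresholds.
cpu_goal = ServiceLevelGoal(metric="cpu_utilization", maximum=0.80)
orders_goal = ServiceLevelGoal(metric="orders_per_day", minimum=10000)

print(cpu_goal.is_met(0.65))      # True
print(cpu_goal.is_met(0.93))      # False
print(orders_goal.is_met(12500))  # True
```

The same structure covers both kinds of goal, which is the point: the automation does not need to care whether a number came from a CPU counter or an order system.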

Now, computers are supposed to automate the functions and processes required to meet business objectives. So, why are there so few solutions for optimizing IT processes to meet these same objectives?

A big part of the problem is the tight coupling of software to hardware that I have discussed in several recent blogs. If it's expensive to reallocate resources to meet new business needs, then we will tend to minimize the number of changes we allow in the datacenter. Optimizing anything manually takes weeks or months of planning, and is usually too little too late anyway.

There are a variety of point solutions to specific steps in the IT process. Provisioning is a good example, as is trouble ticketing. However, none of these automates anything based on meeting service levels; they just cheapen the human processes already in place--processes that tend to be focused more on saving an individual time than on optimizing the datacenter as a whole.

Focusing on adjusting the environment to meet business goals takes a whole new way of thinking. Decoupling software from hardware, etc., is a first step. Once we've done that, we need to leverage relatively recent technological advances that allow us to deliver the software to the hardware in an automated, optimal fashion...

Friday, September 08, 2006

Decoupling software from hardware: a story

Imagine two companies merging into one—two production datacenters, two sets of architecture standards, and two unique installed software baselines.

In any modern corporate merger, one key element of success is using economies of scale to boost the value of each company’s technology property. To achieve this, the combined organizations’ applications must be deployed into the most cost effective operations environment possible.

How would you handle this in your company today? My guess is that your applications remain tightly coupled to the hardware systems they run on, so moving applications also means moving hardware. Achieving an efficiency such as consolidating the combined company’s IT to one datacenter is insanely expensive, as you not only have to deal with the logistics of moving the hardware from location to location, but you have to:

Find real estate, power, air conditioning capacity, and expertise in the surviving datacenter to support the relocated hardware.

Configure the surviving datacenter’s networks and storage, as well as the relocated software payloads (including application, middleware platform (if any) and operating system) to allow the applications to actually work.

Begin the arduous process of consolidating the software stacks of both the indigenous and relocated applications, including porting applications to approved platforms, resolving functional overlap between applications and integrating functionality to meet business process needs.

The unfortunate truth is that you would put together a large scale project plan involving thousands or millions of dollars for logistics, datacenter capacity and human resources. The capital and operational expenditures would be overwhelming, and this project plan would take months (possibly years) to implement. What an incredible burden on the success of the merger.

Now, consider a model where both companies' software is decoupled from the computing resources it runs on. In the ideal example, the application and data images from the decommissioned datacenter would simply be moved into the surviving data center and executed on available computing capacity there. As the granularity being relocated is applications (or even individual services), these components can simply be deployed on the OS/middleware stacks already supported in the surviving datacenter. The business owners of the relocated applications don't care what platforms the applications run on, as long as they run.

If OSes or middleware need to be migrated as well for technical or business reasons, they too can be delivered in portable software payload images and allocated to computing resources as needed. In fact, each layer of any software stack can be managed and delivered as a separate, portable “building block” that does not need extensive installation or configuration to operate on completely new computing resources.

If additional capacity is needed to support the relocated applications, only the number of systems required to provide that capacity would need to be moved. Only a fraction of the excess capacity from the decommissioned data center would need to be shipped. The rest could be sold or discarded. In fact, as both the surviving and decommissioned data centers would have their own excess capacity for load spikes, failover, etc., it is possible to reduce the number of computing resources in the combined datacenter just by consolidating that excess capacity. (Excess capacity includes unused CPU time, memory, storage and network bandwidth.)
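
A quick back-of-the-envelope calculation shows why pooling excess capacity shrinks the server count. The figures below are invented, and the sketch assumes the two sites never spike at the same moment, so a pooled reserve only needs to cover the larger of the two original reserves. Real capacity planning would need much more care than this.

```python
import math
from dataclasses import dataclass


@dataclass
class Datacenter:
    """Hypothetical capacity figures, in arbitrary units of work."""
    steady_load: float   # capacity consumed by normal demand
    reserve: float       # excess capacity held for spikes and failover


def servers_for(load: float, reserve: float, per_server: float) -> int:
    """Whole servers needed to carry a load plus its reserve."""
    return math.ceil((load + reserve) / per_server)


def servers_saved(a: Datacenter, b: Datacenter, per_server: float) -> int:
    """Servers freed by replacing two separate reserves with one pooled reserve."""
    separate = (servers_for(a.steady_load, a.reserve, per_server)
                + servers_for(b.steady_load, b.reserve, per_server))
    pooled = servers_for(a.steady_load + b.steady_load,
                         max(a.reserve, b.reserve), per_server)
    return separate - pooled


surviving = Datacenter(steady_load=60, reserve=40)       # 10 servers on its own
decommissioned = Datacenter(steady_load=30, reserve=20)  # 5 servers on its own
print(servers_saved(surviving, decommissioned, per_server=10))  # prints 2
```

With these made-up numbers the combined datacenter needs 13 servers instead of 15, and only 3 of the decommissioned site's 5 would have to be shipped.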

Granted, this ideal is far from being realized (though all of the components are there in one form or another—more about that later), and there are additional fixed costs associated with decommissioning a datacenter. The cost of supporting the relocated software in its new home is also a constant. However, think about how much money was saved in the above example. Think about the amount of time and pain that was removed from a datacenter consolidation effort. This is just one example of the advantages of decoupling software from hardware, operating systems and middleware.

Next I'll talk about what technologies are contributing to decoupling software from hardware today. I'll also connect all of this to the subject of this blog, service level automation. Stay tuned...

Friday, September 01, 2006

Loosening the Bonds Between Software and Hardware

Why is software still so tightly coupled with hardware?

The days when the applications run on a computer provided their own system control software are long gone.

We have introduced operating systems to allow applications to be built without specific knowledge of the hardware they will run on.

We have created software layers to separate the application from the operating system in the quest for write once, run anywhere.

Our operating systems have gotten more intelligent at delaying the coupling with hardware to the last second. Case in point, most *NIX installations can be moved from compatible bare-metal server to bare-metal server and booted without modification.

Lately, we have even introduced a software layer between the hardware and the operating system to allow any operating system to be decoupled from its hardware. (Though now the OS is coupled to virtual hardware instead.) This is critical in a non-portable OS like Windows.

Yet, despite all of these advances, most of us use the following deployment model:

Take a new server (or remove an existing server from user access)
Install an OS (if it's not there already)
Install any libraries and/or containers required for the application(s) (if not there already)
Install/upgrade the application(s)
Test like crazy
Put the server in the data center (or allow user access again)

Once this deployment is completed, that server is forever hosting that application or its future upgrades. Almost never do we actually repurpose a server for another application, because the cost of doing each of those steps listed above is so high.

Now, to be fair, virtual server technology is allowing us to be more flexible in how we use physical hardware, as we can move OS/container/application stacks from one machine to another. But I want you to note that this requires that each physical server involved be loaded with a hypervisor for that VM technology; in other words, the physical server remains tightly coupled to the VM environment, and can only be used for VM hosting.

(I've always wondered why we are working so hard to move our systems to commodity physical hardware, but are just fine with deploying those same systems on proprietary virtual hardware, but never mind.)

Ideally, I would like to see us have the capability to move software stacks across physical resources without making those physical resources dedicated to *any* single software technology. In the Linux (and other Unix flavor) world, this is actually somewhat possible, but we are a long way from this in the Windows world. It's going to take standardized HAL (Hardware Abstraction Layer) and power management interfaces and OSes that can adjust to varying hardware configurations, and it will ultimately require operations automation tools to take full advantage.

Stay tuned for a vision of what the world could be like if we achieved this level of decoupling...

Thursday, August 31, 2006

On finding a revolution...

For over 15 years, I have been involved in the development, deployment and management of distributed application systems. Most of my career has been in the application development space, working as a systems analyst for a small manufacturer and a large systems company, a consultant/contractor for a variety of development projects, and as senior principal consultant for Forte Software, Inc. before its acquisition by Sun Microsystems.

A lot of my focus was on large scale distributed systems, such as online reservation systems, automatic toll collection systems and CRM environments. In addition to the difficult problem of building applications that could handle tens of thousands or even millions of users, I was often exposed to the problem of delivering these applications to a distributed infrastructure that could, in turn, deliver the application's potential capacity.

It was with this experience in mind that I set out looking for "the next great opportunity" in distributed software development earlier this spring. My initial feelers were telling me that there was tremendous momentum in the open source space, specifically around AJAX and the new scripting languages (Perl, Python, Ruby on Rails, etc.). However, I was disappointed by the strong sense that I had in each of these opportunities that I had already "been there, done that". I wanted something truly new.

Around this time I was introduced to Cassatt Corporation, the latest venture of Bill Coleman, the legendary former CEO of BEA Systems. Bill is perhaps best known for building BEA from zero to one billion dollars in revenue in record time. A powerful accomplishment, yes, but I was also aware of his tremendous reputation as a visionary, so I was very interested in talking to his team about what he was up to. The answer was the "next great thing" I was looking for.

Cassatt is one of a few companies (if not the only one) working to deliver a true Service Oriented Infrastructure. Nice hype words there, but the basic question being addressed at Cassatt is, if we have made great advances to decouple software components from each other, why has so little been done to decouple the software from its host hardware?

In future blogs, I will address this question, and discuss in depth a vision of Service Level Automation in the datacenter. Service Level Automation focuses on allowing businesses to define the required runtime characteristics of an application, and having an intelligent infrastructure maintain the resources and capacity necessary to achieve those characteristics.

My intention is to have this blog discuss a wide variety of both technical and social implications of this vision, and not focus as much on Cassatt's role in all of this. However, I should disclose that I liked Cassatt's vision so much, that I joined the company, and am honored to be part of the birth of such a critical yet disruptive technology.