Wednesday, November 28, 2007

Run Book Automation and SLAuto

I am attending the Gartner Data Center Conference at the MGM Grand convention center in Las Vegas this week. In between repeating the Active Power Management spiel over and over to this mostly excellent technical audience, I was able to take some time to catch David Williams' rundown of the Run Book Automation (RBA) market. RBA seems closely related to Service Level Automation (SLAuto) in my book, so I wanted to see where the overlap really is and isn't.

David's presentation was excellent, in that it provided a concise overview of what RBA is and isn't--think process automation for IT operations processes--and where both vendors and IT organizations are in the definition, planning and implementation of RBA systems. Here are my notes from the session:
  • RBA=>Really just process automation for infrastructure and application management
  • RBA systems must be integrated into existing and new management infrastructures
  • Integration issues are more cultural than technical--IT organizations must be prepared to redefine operational boundaries to answer the question "Who owns what process?". (This will be a future blog topic, as it strikes me that this is exactly the issue that SLAuto implementers are struggling with.)
  • Early users were addressing Fault Correction/Issue Resolution/High Availability/DR type processes
  • Now RBA is predominantly adopted for Change and Configuration Management, with Fault Correction a somewhat distant second. The reason is that it's easier to actually see the effects of Change/Config Management process automation than Fault Correction automation, especially if there are still human steps in the FC processes.
  • BPM must be considered a very separate system from RBA. RBA is a very focused task set with different reporting and human interface requirements than BPM systems, which must be much more general and open to extension.
  • Good RBA systems should have process development and monitoring as separate user interfaces. Combining the two is not scalable.
  • Monitoring should provide not only current state, but also estimates for when a process will complete (see the sketch after these notes)
  • IT organizations are overwhelmingly looking at their current IT infrastructure partners to provide this function, not start-ups
  • RBA implementation is not an emergency yet, as the tools need time to mature and IT organizations need time to handle the cultural "homework" required for a successful implementation
  • Of the audience members with voting machines, 39% had no plans to implement RBA, while 21% had plans for 2008. The others either already had some RBA or were between evaluation and implementation now.
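
On the completion-estimate point above: here's a minimal sketch of the idea, purely my own illustration rather than anything from the session or any RBA product's API, showing how a monitoring view might extrapolate an ETA from runbook progress.

```python
# Hypothetical sketch: extrapolate a runbook's completion time from the
# average duration of the steps finished so far. Not any vendor's API.
import time

def estimate_completion(start_time, steps_done, steps_total):
    """Return an estimated completion timestamp, or None if no progress yet."""
    if steps_done == 0:
        return None                      # nothing to extrapolate from yet
    avg_per_step = (time.time() - start_time) / steps_done
    return start_time + avg_per_step * steps_total

# Example: 3 of 10 steps done after roughly 6 minutes of elapsed time.
started = time.time() - 6 * 60
eta = estimate_completion(started, steps_done=3, steps_total=10)
if eta:
    print("Estimated completion:", time.ctime(eta))
```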

If you are at the conference, stop by the Cassatt booth tonight or Thursday and introduce yourself. If not, I'll try to give an update on a couple of other sessions I attended in the next day or two.

Sunday, November 25, 2007

Beating the Utility Computing Lockdown, Part 3

Sorry for the delay, folks, but the holidays called...

I promised to go over the options that one has when considering how to evolve from typical statically managed server environments to a utility computing model. I've thought a lot about this, and I see essentially two options:

  1. Deployment directly into a third party capacity utility
  2. Adoption of utility computing technologies in your own data center

As a quick refresher from the first two parts of this series, I want to note that this is not an easy decision by any means. Each approach has advantages for some classes of applications and services, and disadvantages for others.

For those starting out from scratch, with no data center resources of their own to depreciate, option 1 probably sounds like the best option. If you can get the service levels you want without buying the servers necessary to run them--which leads to needing people to operate them, which leads to management systems to coordinate the people, and so on--and you can get those service levels at a cost that beats owning your own infrastructure, then by all means take a look at managed hosting providers, such as Amazon (yeah, I'm starting to treat them as a special case of this category), Rackspace, etc. Most of the biggies are offering some sort of "capacity on demand" model, although most (though not all) are focused on giving you access to servers which you have to provision and operate manually.

Just be aware that when you choose your vendor, you choose your poison. The lock-in issues I have described in my previous posts are very real, and can end up being very costly. There are no standards for server payload, application or data portability between different vendors of utility computing services. Once you buy in to your capacity choice, factor in that a failure to deliver service on that vendor's part may result in a costly redeployment and retesting of your entire stack, at your expense!

For this reason, I think anyone with an existing IT infrastructure that is interested in gaining the benefits of capacity as a utility should start with option 2. I also think option 2 applies to "green field" build-outs with big security and privacy concerns. This approach has the following benefits for such organizations (assuming you choose the right platform):

  • Existing infrastructure can be utilized to deliver the utility. Little or no additional hardware is required.
  • Applications can be run unmodified, though you may need to address minor start up and shutdown scripting issues when you capture your software images.
  • Projects can be converted one or two at a time, allowing iterative approaches to addressing technical and cultural issues as they arise. (Don't minimize the cultural issues--utility computing touches every aspect of your IT organization.)
  • Data remains on your premises, allowing existing security and privacy policies to work with minimal changes.
  • Anyone with a reasonable background in system administration, software deployment and/or enterprise architecture can get the ball rolling.

I've been personally involved in a few of these projects in the last couple of years, and I can tell you that the work to move an application to Amazon and then build the infrastructure to monitor and automate management of those applications is at least as much as it ends up taking to convert ones own infrastructure to a platform that already provides that monitoring and automation. You may sound cool at the water cooler talking about EC2 and S3, but you've done little to actually reduce the operations costs of a complex software environment.

If you are intimidated now by the amount of work and thought that must go into addressing utility computing, I don't blame you. Its not as easy as it sounds. Don't let any vendor tell you otherwise. However, there are ways to ease into the effort.

One way is to find a problem that you must address immediately in your existing environment with a quick ROI, and address that problem with a solution that introduces some basic utility computing concepts. One of these, perhaps the most impressive financially today, is power. Others have covered the economics here in depth, but let me just note that applying automated management policies to server power is a no-brainer in a cyclical usage environment. Dev/test labs, grid computing farms and large web application environments are excellent candidates for turning off unneeded capacity without killing the availability of those applications.
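
To make the power policy concrete, here is a rough sketch of the kind of loop I'm describing. It is purely illustrative: the thresholds are arbitrary, and get_utilization, power_off and power_on are hypothetical stand-ins for whatever monitoring and out-of-band power control you actually use (not Cassatt's or anyone else's API).

```python
# Illustrative only: a naive power policy for a pool with cyclical usage.
# The helpers below are hypothetical stand-ins for real monitoring and
# out-of-band power control (IPMI, managed PDUs, etc.).
import random
import time

def get_utilization(server):
    return random.uniform(0.0, 1.0)      # stand-in for a monitoring query

def power_off(server):
    print("powering off", server)        # stand-in for an IPMI/PDU call

def power_on(server):
    print("powering on", server)

MIN_ACTIVE = 2                           # never shrink below this floor
LOW_WATER, HIGH_WATER = 0.20, 0.75       # pool-wide CPU utilization bounds

def rebalance(active, standby):
    """Shed idle capacity or recruit it back based on average pool load."""
    load = sum(get_utilization(s) for s in active) / len(active)
    if load < LOW_WATER and len(active) > MIN_ACTIVE:
        victim = active.pop()            # drain, then power off one server
        power_off(victim)
        standby.append(victim)
    elif load > HIGH_WATER and standby:
        recruit = standby.pop()          # bring capacity back before SLAs slip
        power_on(recruit)
        active.append(recruit)

active = ["web%02d" % i for i in range(1, 7)]
standby = []
for _ in range(3):                       # a few evaluation cycles, for show
    rebalance(active, standby)
    time.sleep(1)
```

The real payoff, of course, comes when a policy like this is driven by the service levels of the applications on the pool, not raw utilization alone.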

I realize it might sound like I'm tooting Cassatt's horn here, but I am telling you, as a field technologist with real experience trying to get utility computing going in some of the most dynamic and forward-thinking data centers in the country, that this approach is a win-win for the CxOs of your company as well as the grunts on the ground. If you don't like power management as a starter approach, however, there are many others: data center migration, middleware license management, hardware failover and disaster recovery are just a few that can show real ROI in the short term while getting your IT department on the road to capacity as a utility today. All of these can be handled by a variety of vendors, though Cassatt certainly gives you one of the best paths directly from a starting approach to a complete capacity-as-a-utility platform.

One final note for those who may think I've ignored the options for third-party utility computing besides "HaaS" (Hardware as a Service) vendors. I realize that moving into SaaS, FaaS, PaaS, or WaaS (Whatever as a Service) can give you many advantages over owning your own infrastructure as well, and I certainly applaud those who find ways to trim cost while increasing service through these approaches.

However, the vendor lock-in story is just as sticky in these cases, especially when it comes to the extremely valuable data generated by SaaS applications. Just be sure to push any vendor you select to support standards for porting that data/service/application/whatever to another provider if required. They won't like it, but if enough prospective customers balk at lock-in, they'll find innovative ways to assure your continued ownership of your data, probably while still making it more expensive for you to move than to stay put. Still, that's better than not having any control over your data at all...

Tuesday, November 06, 2007

Beating the Utility Computing Lockdown, Part 2

Well, not long after I posted part 1 of this series, Bert noted that he agreed with my assessment of lock-in, then proceeded to explain how his grid platform (a competitor to my employer's) was the answer.

Now, Bert is just having fun cross-promoting on a blog with ties to a competitor, but I think it's only fair to note that no one has a platform that avoids vendor lock-in in utility computing today. The best that someone like 3TERA (or even Cassatt) can do is give you some leverage between the organizations that are utilizing their platform; however, to get the portability he speaks of, you have to lock your servers (and possibly load balancers, storage, and so on) into that platform. (Besides, as I understand it, 3TERA is really only portable at the "data center" level, not the individual server level. I suppose you could define a bunch of really small "data centers" for each application component, but in a SOA world, that just seems cumbersome to me.)

Again, what is needed is a truly open, portable, ubiquitous standard for defining virtual "components" and their operation-level configurations that can be ported and run across a wide variety of virtualization, hardware and automation platforms. (Bert, I've been working on Cassatt--are you willing to push 3TERA to submit, cooperate on and/or agree to such a standard in the near future?) As I said once before, I believe the file system is the perfect place to start, as you can always PXE boot a properly defined image on any compatible physical or virtual machine, regardless of the vendor. (This is true for every platform except Windows--c'mon Redmond, get with the program!) However, I think the community will have the final say here, and the Open Virtualization Format (OVF) is a hell of a start. (It still lacks any tracking of operation-level configurations, such as "safe" CPU and memory utilization thresholds, SNMP traps to monitor for heartbeats, etc.)
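
To show what I mean by operation-level configurations, here is the kind of metadata I'd want carried alongside a portable image definition. Every field name below is my own invention for illustration; none of it comes from OVF or any shipping product.

```python
# Hypothetical only: operational metadata to travel with a portable image.
# None of these fields are part of OVF or any vendor's format.
image_descriptor = {
    "image": {
        "name": "order-service-appserver",
        "boot": "pxe",                          # bootable on any compatible box or VM
        "root_filesystem": "nfs://images/order-service/root",
    },
    "operational_policy": {
        "cpu_utilization_safe_max": 0.70,       # act (scale or alert) above this
        "memory_utilization_safe_max": 0.80,
        "heartbeat": {
            "protocol": "snmp",
            "oid": "1.3.6.1.4.1.99999.1.1",     # hypothetical enterprise OID
            "interval_seconds": 60,
        },
        "min_instances": 2,                     # the service level floor
    },
}
```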

Unfortunately, those standards aren't baked yet. So, here's what you can do today to avoid vendor lock-in with a capacity provider tomorrow. Begin with a utility computing platform that you can use in your existing environment today. Ideally, that platform:
  1. Does not require you to modify the execution stack of your application and server images, e.g.:
    • no agentry of any kind that isn't already baked into the OS
    • no requirement to run on virtualization if that isn't appropriate or cost effective
  2. Uses a server/application/whatever imaging format that is open enough to "uncapture" or translate to a different format by hand if necessary. (Again, I like our approach of just capturing a sample server file system and "generalizing" it for replication as needed; it's reversible, if you know your OS well. See the sketch after this list.)
  3. Is supported by a community or business that is committed to supporting open standards wherever appropriate and will provide a transition path from any proprietary approach to the open approach when it is available.
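
On the "uncapture"/generalize point in item 2: as a rough illustration of the idea (my own sketch of the general technique, not Cassatt's actual capture tooling), generalizing a captured Linux root filesystem mostly means stripping the host-specific bits before replication, something like this.

```python
# Rough illustration of "generalizing" a captured Linux root filesystem so it
# can be replicated: strip the host-specific bits before reuse. My own sketch
# of the idea, not any vendor's actual capture process.
import glob
import os

CAPTURED_ROOT = "/images/captured/webserver01"   # hypothetical capture location

HOST_SPECIFIC = [
    "etc/ssh/ssh_host_*",                # SSH host keys: regenerate per instance
    "etc/udev/rules.d/*persistent-net*", # MAC-address-bound NIC naming
    "etc/hostname",                      # assign at boot instead
    "var/log/*",                         # stale logs from the sample host
]

def generalize(root):
    """Remove host-specific files from a captured root filesystem tree."""
    for pattern in HOST_SPECIFIC:
        for path in glob.glob(os.path.join(root, pattern)):
            if os.path.isfile(path):
                print("removing", path)
                os.remove(path)

generalize(CAPTURED_ROOT)
```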

I used to be concerned that customers would ask why they should convert their own infrastructure into a utility (if it was their goal to use utility computing technology to reduce their infrastructure footprint). I now feel comfortable that the answer is simply that there is no safe alternative for large enterprises at this time. Leaving aside the issue of security (e.g., can you trust your most sensitive data to S3?) and the fact that there is little or no automation available to actually reduce your cost of operations in such an environment, there are many risks to consider with respect to how deeply you are willing to commit to a nascent marketplace today.

I encourage all of you to get started with the basic concepts of utility computing. Next, I want to talk about ways to cost-justify this activity with your business, and a little about the relationship between utility computing and data center efficiency.

Monday, November 05, 2007

Beating the Utility Computing Lockdown

If you haven't seen it yet, there is an interesting little commotion going on in the utility computing blogosphere. Robert X. Cringely and Nick Carr, with the help of Ashlee Vance at The Register, are having fun picking apart the announcement that Google is contributing to the MySQL open source project. Cringely started the fun with a conspiracy theory that I think holds some weight, though--as the others point out--perhaps not as literally as he states it. In my opinion, Cringely, Carr and Vance accurately raise the question, "will you get locked into your choice of utility computing capacity vendor, whether you like it or not?"

I've discussed my concerns about vendor lock-in before, but I think it's becoming increasingly clear that the early capacity vendors are out to lock you in to their solutions as quickly and completely as possible. And I'm not just talking about pure server capacity (aka "HaaS") vendors, such as Amazon or the bevy of managed hosting providers that have announced "utility computing" solutions lately. I'm talking about SaaS vendors, such as Salesforce.com, and PaaS vendors such as Ning.

Why is this a problem? I mean, after all, these companies are putting tremendous amounts of money into building the software and datacenter platforms necessary to deliver the utility computing vision. The problem, quite frankly, is that while lock-in can increase the profitability of the service provider, it is not always as beneficial for the customer. I'm not one to necessarily push the mantra "everything should be commodity", but I do believe strongly that no one vendor will get it entirely right, and no one customer will always choose the right vendor for them the first time out.

With regard to vendor lock-in and "openness", Ning is an interesting case in point; I noticed with interest last week Marc Andreessen's announcements regarding Ning and the OpenSocial API. First, let me get on the record as saying that OpenSocial is a very cool integration standard. A killer app is going to come out of social networking platforms, and OpenSocial will allow the lucky innovator to spread the cheer across all participating networks and network platforms. That being said, however, note that Marc announced nothing about sharing data across platforms. In social networking, the data is what keeps you on the platform, not the executables.

(Maybe I'm an old fogey now, but I think the reason I've never latched on to Facebook or MySpace is that I started with LinkedIn many years ago, and though most of my contacts there are professional, quite a few of my personal contacts are also captured there. Why start over somewhere else?)

In the HaaS world, software payloads (including required data) are the most valuable components to the consumer of capacity, and most HaaS vendors do little (or nothing) to ease the effort it takes to provision a server with the appropriate OS, your applications, data, any utilities or tools you want available, security software, and so on. So there is little incentive for the HaaS world to ease transitions between vendors until a critical mass is reached where the pressure to commoditize breaks the lock-in barrier. All of the "savings" purported by these vendors will be limited to what they can save you over hosting it yourself in your existing environment.

SaaS also has data portability issues, which have been well documented elsewhere. Most companies that have purchased ERP and CRM services online have seen this eventuality coming, though most if not all have yet to feel that pain.

Where am I going with all this? I want to reiterate my call for both server and data level portability standards in the utility computing world, with a target of avoiding the pain to customers that lock-in can create. I want the expense of choosing a capacity or application vendor to be the time it takes to research them, compare competitors and sign up for the service. If I have to completely re-provision my IT environment to change vendors, then that becomes the overwhelming cost, and I will never be able to move.

Truth is, open standards don't guarantee that users will flee one environment for another at the drop of a hat. Look at SQL as an example. When I worked for Forte Software many years ago, we had the ability to swap back-end RDBMS vendors without changing code, long before JDBC or Hibernate. The funny thing is, in six years of working with that product, not one customer changed databases just because the other guy was cheaper. I grant you that there were other costs to consider, but I really believe that the best vendors with the best service at the right price will keep loyal customers whether or not they implement lock-in features.

For HaaS needs, there are alternatives to going out of house for cheap capacity. Most notably, virtualization and automation with the right platforms could let you get those 10 cents/CPU-hour rates with the datacenter you already own (see the rough arithmetic below). The secret is to use capital equipment more effectively and efficiently while reducing the operations expenses required to keep that equipment running. In other words, if you worry about how you will maintain control over your own data and applications in a HaaS/SaaS world, turn your own infrastructure into a utility of your own.
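
To put rough numbers behind that claim, here's a back-of-the-envelope calculation. All of the figures are my own assumptions for illustration, not anyone's published pricing.

```python
# Back-of-the-envelope only: what an in-house CPU-hour costs at different
# utilization levels. All figures are assumed for illustration.
server_cost = 4000.0              # assumed purchase price of a dual-CPU server
amortization_years = 3
ops_multiplier = 2.0              # assumed overhead: power, space, admin labor
cpus = 2
total_cpu_hours = cpus * amortization_years * 365 * 24

for utilization in (0.15, 0.50, 0.80):
    cost_per_cpu_hour = (server_cost * ops_multiplier) / (total_cpu_hours * utilization)
    print("%.0f%% utilization -> $%.2f per CPU-hour"
          % (utilization * 100, cost_per_cpu_hour))
```

At the low utilization typical of statically managed servers, the in-house rate comes out several times the utility price; driving utilization up and operations overhead down is exactly what closes that gap.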

That's not to say I see no value in Amazon, Google, et al. Rather, I think the market should approach their offerings with caution, making sure that the time and expense it takes to build their business technology platforms is not repeated when their capacity partners fail to deliver. Once portability technologies are common and supported broadly, then the time will come to rapidly shut down "private" corporate datacenters and move capacity to the computing "grid". More on this process later.