Tuesday, December 18, 2007

Some more skepticism about Amazon WS as a business plan

I've been interested in Don McAskill's review of Amazon's SimpleDB in light of SmugMug's future plans. He is very positive that he can use this service for what it is intended for: infinitely scalable storage and retrieval of small, structured data sets. His post is a good one if you want to get a clearer idea of what you should and shouldn't do with SimpleDB.

However, I worry for Don. As a growing number of voices have been pointing out, committing your business growth to Amazon, especially as a startup, may not be a great thing. Kevin Burton, founder and CEO of spinn3r and Tailrank, notes that this depends on what your processing and bandwidth profiles are, but there are a large number of services that would do better to buy capacity from, say, a traditional managed hosting facility.

Burton uses the term "vendor lock-in" a few times, which certainly echoes comments that Simon and I have been making recently. But Burton brings up an additional point about bandwidth costs that I think has to be carefully considered before you jump on the Amazon-as-professional-savior bandwagon. He notes that for his bandwidth-intensive business, Amazon would cost 3X what it currently costs spinn3r to access the net.

Burton goes on to suggest an alternative that he would love to see happen: bare metal capacity as a service. Similar to managed hosting, the idea would be for the system vendors to lease systems for a cost somewhat above what it would take to buy the system, but broken down over 2-3 years. Since the creditworthiness of most startups is an issue, lease default concerns can be mitigated by keeping the systems on the vendor's premises. Failure to pay would result in blocked access to the systems, for both the customer and their customers.
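
To put rough numbers on the leasing idea, here is a back-of-the-envelope sketch. Burton gives no figures, so the purchase price, the 20% vendor premium and the 36-month term below are all made up for illustration, and the function name is hypothetical.

```python
def monthly_lease_payment(purchase_price, premium_rate, term_months):
    """Spread the purchase price plus a vendor premium evenly over the term."""
    total_cost = purchase_price * (1 + premium_rate)
    return total_cost / term_months


# Hypothetical numbers: a $3,000 server, a 20% premium, a 36-month lease.
payment = monthly_lease_payment(3000.0, 0.20, 36)
print(f"${payment:.2f} per month")
```

A real lease would also have to price in financing, support and bandwidth, none of which this sketch attempts.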

I like this concept as a hybrid between the "cloud" concepts and traditional server ownership. Startups can get the capacity they need without committing capital that could be used to hire expertise instead. On the negative side, however, this does nothing to reduce operational costs at the server level, other than eliminating rack/stack costs. And Burton says nothing about how such an operation would charge for bandwidth, one of his key concerns about Amazon.

There have been a few other voices that have countered Kevin, and I think they should definitely be heard as this debate grows. Jay at thecapacity points out the following:
[B]usiness necessitates an alternate reality and if expediency, simplicity and accuracy mean vendor constraint, so be it.
I agree with this, but I think that it is critical that businesses choose to be locked in with open eyes, and a "disaster recovery" plan should something go horribly wrong. Remember, it wasn't that long ago that Amazon lost a few servers accidentally.

(Jay seems to agree with this, as he ends his post with:
When companies talk about outsourcing these components, or letting a vendor’s software product dictate their business & IT processes… I always check to make sure my lightsaber is close.
This is in reference to Marc Hedlund's post, “Jedi’s build their own lightsabers”.)

Nitin Borwankar, a strong proponent of Amazon SimpleDB, commented on Kevin's post that SimpleDB is a long tail play, and that the head of the data world would probably want to run on their own servers. This is an incredibly interesting statement, as it seems to suggest that even though SimpleDB scales almost infinitely from a technical perspective, it doesn't so much from a business perspective.

On a side note, it's been a while since I spoke about complexity theory and computing, but let me just say that this tension between "Get'r done" and "ye kanna take our freedom!" is exactly the kind of tension that you want in a complex system. As long as utility/cloud computing stays at the phase change between these two needs, we will see fabulous innovation that allows computing technologies to remain a vibrant and ever-innovating ecosphere.

I love it.

Friday, December 14, 2007

Lessons in schizophrenia from Sun's customers

From Don McAskill (via Robert Scoble) and Jonathon Schwartz comes fascinating insight into the complete dichotomy that is the IT world today. What I find especially interesting is the range of personalities, from the paranoid (e.g. "anything Sun does in the Intel/Linux/Windows space is bad for SPARC/Solaris") to the zealot (e.g. "screw what got you here, open everything up and do it low/zero margin").

I'm not surprised that the CTO audience (as represented by McAskill) was more eager to push Sun into new technologies than the CIOs. First, Sun has always been a company by engineers, for engineers, to engineer. Their sales success has come from selling to a technical audience, not a business audience. (Contrast this with IBM, or even Microsoft at the department level.) Second, CIOs are always struggling to keep up with the cost of implementing new technologies, while CTOs are being pushed to discover and implement them--in part to keep their own technical staff's skills relevant in the modern marketplace.

That's not to say that CIOs aren't technology conscious or CTOs don't care about the bottom line--I grossly exaggerated to make a point, after all--but the tendencies indicated by Jonathon aren't surprising in this light.

What I find especially fascinating, however, is that even though both the business and technical cases are made for utility/cloud computing, it's the grunts that are blocking implementation in even the most forward-thinking data center. Again, utility computing touches everything and everyone, and that is scary as hell, even to a hard-core techie.

Thursday, December 13, 2007

"The techno-utility complex" and cloud lock-in

OK, so in the process of commenting on Nick Carr's post, "The techno-utility complex", I came up with a term I like: cloud lock-in. This goes to my earlier conversation about vendor lock-in in the capacity on demand world--aka cloud computing. I like the term because it reinforces the truth: there is no single compute cloud, and the early leaders in the space don't want there to be one. Rather, they are hell bent on collecting your dimes every hour, and making it damn expensive for you to move your revenue stream elsewhere.

My advice stands: if you are greenfield, and data security and access are less of an issue for you, go for EC2, the managed hosting "cloud bank", or the coming offerings from Google or Microsoft. However, if you want to take a more conservative approach towards gaining the economic benefits of utility computing, make your own cloud first.

Wednesday, December 12, 2007

Software fluidity and system security

I came across this fascinating conversation tonight between Rich Miller (whom I've exchanged blogs with before) and Greg Ness regarding the intense relationship between network integrity, system security and VM-based software fluidity. What caught my attention about this conversation is the depth at which Rich, Greg and others have thought about this branch of system security--what Greg refers to as Virtsec. I learned a lot by reading both authors' posts.

I know nothing about network security or virtualization security, frankly, but I know a little about network virtualization and the issues that users have in replicating secure computing architectures across pooled computing resources. Rich makes an observation in the discussion that I want to comment on from a purely philosophical point of view:

Consider this: It's not only network security, but also network integrity that must be maintained when supporting the group migration of VMs. If one wants to move an N-tier application using VMware's VMotion, one wants a management framework that permits movement only when the requirements of the VM "flock" making up the application are met by the network that underpins the secondary (destination) location. By that, I mean:

  • First, the assemblage of VMs need to arrive intact.

    If, because of a change in the underpinning network, a migration "flight plan" no longer results in a successful move by all the piece parts, that's trouble. If disaster strikes, you don't want to find that out when invoking the data center's business continuity procedure. All the VMs that take off from the primary location, need to land at the secondary.

  • Second, the assemblage's internal connections as well as connections external to the "flock" must continue to be as resilient in their new location as they were in their original home.

    If the use of VMotion for an N-tier application results in the a new instance of the application that ostensibly runs as intended, but is susceptible to an undetected, single point of network failure in its new environment, someone in the IT group's network management team will be looking for a new job.
Here is exactly where I believe application architectures are suddenly critical to the problem of software fluidity. In a well contained multi-tier application (a very turn-of-the-millennium concept) it is valid to consider the migration of the "flock" as a network integrity problem. However, when it comes to the modern world of SOA, BPM and application virtualization, suddenly application integrity becomes a dynamic discovery issue which is only partly dependent on network access.

In other words, I believe most modern enterprise software systems can't rely on the "infrastructure" to keep their components running when they are moved around the cloud. It's not good enough to say "gee, if I get these running on one set of VMs, I shouldn't have to worry about what happens if those VMs get moved". Rich hints strongly at understanding this, so I don't mean to accuse him of "not getting it". However, I wonder what Replicate Technologies is prepared to tell their clients about how they need to review their application architectures to work in such a highly dynamic environment. I'd love to hear it from Rich.
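
To make the distinction concrete, here is a minimal sketch of an all-or-nothing "pre-flight" check for a flock of VMs that looks at application-level dependencies as well as network-level ones. Everything in it is hypothetical: the `VM` and `Site` classes, the network and service names, and the check itself are illustrations only, not VMotion's API or anything Replicate Technologies ships.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    required_networks: set = field(default_factory=set)   # network-level needs
    required_services: set = field(default_factory=set)   # application-level needs


@dataclass
class Site:
    name: str
    networks: set = field(default_factory=set)
    discoverable_services: set = field(default_factory=set)


def preflight(flock, destination):
    """Return a list of problems; an empty list means the whole flock may move."""
    problems = []
    for vm in flock:
        for net in sorted(vm.required_networks - destination.networks):
            problems.append(f"{vm.name}: network '{net}' missing at {destination.name}")
        for svc in sorted(vm.required_services - destination.discoverable_services):
            problems.append(f"{vm.name}: service '{svc}' not discoverable at {destination.name}")
    return problems


def can_migrate(flock, destination):
    """All-or-nothing: every VM must pass, or nothing takes off."""
    return not preflight(flock, destination)


# Hypothetical flock: the network checks pass, but the web tier cannot
# discover a service it depends on at the destination.
flock = [VM("web", {"dmz"}, {"orders-service"}), VM("db", {"backend"})]
secondary = Site("secondary", networks={"dmz", "backend"})
print(can_migrate(flock, secondary))
print(preflight(flock, secondary))
```

In this example the network-only view would clear the move, while the application-aware view blocks it, which is the gap described above.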

Also, from Greg, I'd be interested in knowing if he's thought beyond the effects on network security of virtsec to the effects on application security. At the very least, I think an increasing dependency on dynamic discovery of required resources (e.g. services, data sources, EAI etc.) means an increased need for virtsec to be application aware as well as network aware. I apologize if I'm missing a virtsec 101 concept here, as I haven't yet read all that Greg has written about the subject, but I'm disturbed that the little I've read so far seems to assume that VMs can be managed purely as servers, a common mistake when considering end-to-end Service Level Automation (SLAuto) needs.

I intend to keep one eye on this discussion, as virtsec is clearly a key element of software fluidity in a SOA/BPM/VM world. It is even more critical in a true utility/cloud computing world, where your software would ideally move undetected between capacity offered by entirely disparate vendors. It's no good chasing the cheapest capacity in the world if it costs you your reputation as a security-conscious IT organization.

By the way, Rich, I dropped the virtual football after your last post in our earlier conversation...I just found a blog entry I wrote about 6 months ago during that early discussion about VM portability standards that I never posted. Aaaargh... My apologies, because I liked what you were saying, though I was concerned about non-VM portability and application awareness at the time as well. I continue to follow your work.

Monday, December 10, 2007

User Experience and Fluidity (er, Patration...)

First, thanks to Simon for reminding me about his seminal post defining a new key industry term, patration, to describe what I call software fluidity. Is he serious? Only your use of the term in everyday life will tell... [Insert cheesy smiley face here.]

Second, if you haven't run across it yet, check out the debate between Robert Scoble/Nick Carr and Michael Krigsman/the Enterprise Irregulars about the need for enterprise software vendors to learn from the "drive to sexiness" of consumer software. My personal opinion? I've worked in enterprise software for years, and I still don't understand why engineers take no pride in making something amazing to install, learn and use. All the effort goes into command line tools and cool functions, little goes into human experience. How has Apple remained relevant all of these years? A focus on sexiness, without losing sight of functionality.

I agree with Nick, sexiness and functionality/stability are not mutually exclusive--except in the eyes of most enterprise software vendors...

Wednesday, December 05, 2007

Oracle makes DBs fluid(?)

Well, Oracle is making a play at making databases portable via virtualization. This was a problem in the pure VMware world, as no one was comfortable with running their production databases in a VM. I'm not saying that is instantly solved by any means, but "certified by Oracle" is a hell of a pitch...


How fluid is your software?

I come from a software development background, and I can never quite get the itch to build the perfect architecture out of my system. That's partly why it is so hard for me to blog about power, even though it is an absolutely legitimate topic, and a problem that needs to be attacked from many fronts. However, power is not a software issue, it is a hardware and facilities issue, and my heart just isn't there when it comes to pontificating.

All is not lost, however, as Cassatt still plays in the utility computing infrastructure world, and I get plenty of exposure to dynamic provisioning, service level automation (SLAuto) and the future of capacity on demand. And to that end, I've been giving a lot of thought to the question of what, if any, software architecture decisions should be made with utility computing in mind.

While a good infrastructure platform won't require wholesale changes to your software architecture (and none at all, if you are willing to live with consequences that will become obvious later in this discussion), the very concept of making software mobile--in the changing capacity sense, not the wireless device sense--must lead all software engineers and architects to contemplate what happens when their applications are moved from one server to another, or even one capacity provider to another. There are a myriad of issues to be considered, and I aim to cover just a few of them here.

The term I want to introduce to describe the ability of application components to migrate easily is "fluidity". The definition of the term fluidity includes "the ability of a substance to flow", and I don't think it's much of a stretch to apply the term to software deployments. We talk about static and dynamic deployments today, and a fluid software system is simply one that can be moved easily without breaking the functionality of the system.

An ideally fluid system, in my opinion, would be one that could be moved in its entirety or in pieces from one provider to another without interruption. As far as I know, nobody does this. (3TERA claims they can move a data center, but as I understand it you must stop the applications to execute the move.) However, for purely academic reasons, let's analyze what it would take to do this:

  1. Software must be decoupled from physical hardware. There are currently two ways to do this that I know of:
    • Run the application on a virtual server platform
    • Boot the file system dynamically (via PXE or similar) on the appropriate capacity
  2. Software must be loosely coupled to "external" dependencies. This means all software must be deployable without hard-coded references to "external" systems on which it is dependent. External systems could be other software processes on the same box, but the most critical elements to manage here are software processes running on other servers, such as services, data conduits, BPMs, etc.
  3. Software must always be able to find "external" dependencies. Loose coupling, as most of you know, is sometimes easier said than done, especially in a networked environment. Critical here is that the software can locate, access and negotiate communication with the external dependencies. Service registries, DNS and CMDB systems are all tools that can be used to help systems maintain or reestablish contact with "external" dependencies.
  4. System management and monitoring must "travel" with the software. It's not appropriate for a fluid environment to become a Schrödinger's box, where the state of the system becomes unknown until you can reestablish measurement of its function. I think this may be one of the hardest requirements to meet, but at first blush I see two approaches:
    • Keep management systems aware of how to locate and monitor systems as they move from one system to another.
    • Allow monitoring systems to dynamically "rediscover" systems when they move, if necessary. (Systems maintaining the same IP address, for instance, may not need to be rediscovered.)
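
Requirements 2 and 3 can be sketched in a few lines. This is only an illustration under stated assumptions: `ServiceRegistry` is a toy in-memory stand-in for whatever service registry, DNS or CMDB lookup you actually use, and the class names, service name and addresses are all hypothetical.

```python
class ServiceRegistry:
    """Toy in-memory stand-in for a service registry, DNS or CMDB lookup."""

    def __init__(self):
        self._locations = {}

    def register(self, service_name, host, port):
        # Called by (or on behalf of) a component each time it lands somewhere new.
        self._locations[service_name] = (host, port)

    def resolve(self, service_name):
        try:
            return self._locations[service_name]
        except KeyError:
            raise LookupError(f"no known location for {service_name!r}") from None


class OrderClient:
    """A component that knows its dependency by logical name only."""

    def __init__(self, registry):
        self._registry = registry

    def inventory_endpoint(self):
        # Resolve at call time rather than baking an address in at deploy time.
        host, port = self._registry.resolve("inventory")
        return f"http://{host}:{port}/"


registry = ServiceRegistry()
client = OrderClient(registry)

registry.register("inventory", "10.0.0.5", 8080)
print(client.inventory_endpoint())

registry.register("inventory", "10.1.0.9", 8080)  # the service has been moved
print(client.inventory_endpoint())
```

The client code never changes when the inventory service moves; only the registry entry does. The same lookup could serve the monitoring systems in requirement 4 when they need to "rediscover" a moved component.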

This is just a really rough first cut of this stuff, but I wanted to put this out there partly to keep writing, and partly to get feedback from those of you with insights (or "incites") into the concept of software fluidity.

In future posts I'll try to cover what products, services and (perhaps most importantly) standards are important to software fluidity today. I also want to explore whether "standard" SOA and BPM architectures actually allow for fluidity. I suspect they generally do, but I would not be surprised to find some interesting implications when moving from static SOA to fluid SOA, for instance.

Respond. Let me know what you think.

Off-topic: It's official, rock beats paper and scissors

Having some fun with Google Trends, I ran the following comparison:

which yielded the following result:

where blue is rock, red is paper and yellow is scissors.
As I read this, rock has consistently beaten paper and scissors for four years running.
Of course, my assertion is based on Google's observation that Trends can predict the future.

Wednesday, November 28, 2007

Run Book Automation and SLAuto

I am attending the Gartner Data Center Conference at the MGM Grand convention center in Las Vegas this week. In between repeating the Active Power Management spiel over and over again to this mostly excellent technical audience, I was able to take some time to catch David Williams's rundown of the Run Book Automation market. RBA seems very related to Service Level Automation (SLAuto) in my book, so I wanted to see where the overlap really is and isn't.

David's presentation was excellent, in that it provided a concise overview of what RBA is and isn't--think process automation for IT operations processes--and where both vendors and IT organizations are in the definition, planning and implementation of RBA systems. Here are my notes from the session:
  • RBA=>Really just process automation for infrastructure and application management
  • RBA systems must be integrated into existing and new management infrastructures
  • Integration issues are more cultural than technical--IT organizations must be prepared to redefine operational boundaries to answer the question "Who owns what process?". (This will be a future blog topic, as it strikes me that this is exactly the issue that SLAuto implementers are struggling with.)
  • Early users were addressing Fault Correction/Issue Resolution/High Availability/DR type processes
  • Now RBA is predominantly adopted for Change and Configuration Management, with Fault Correction a somewhat distant second. The reason is that it's easier to actually see the effects of Change/Config Management process automation than Fault Correction automation, especially if there are still human steps in the FC processes.
  • BPM must be considered a very separate system from RBA. RBA is a very focused task set with different reporting and human interface requirements than BPM systems, which must be much more general and open to extension.
  • Good RBA systems should have process development and monitoring as separate user interfaces. Combining the two is not scalable.
  • Monitoring should provide not only current state, but also estimates for when a process will complete
  • IT organizations are overwhelmingly looking at their current IT infrastructure partners to provide this function, not start-ups
  • RBA implementation is not an emergency yet, as the tools need time to mature and IT organizations need time to handle the cultural "homework" required for a successful implementation
  • Of the audience members with voting machines, 39% had no plans to implement RBA, while 21% had plans for 2008. The others either already had some RBA or were between evaluation and implementation now.

If you are at the conference, stop by the Cassatt booth tonight or Thursday and introduce yourself. If not, I'll try to give an update on a couple of other sessions I attended in the next day or two.

Sunday, November 25, 2007

Beating the Utility Computing Lockdown, Part 3

Sorry for the delay, folks, but the holidays called...

I promised to go over the options that one has when considering how to evolve from typical statically managed server environments to a utility computing model. I've thought a lot about this, and I see essentially two options:

  1. Deployment directly into a third party capacity utility
  2. Adoption of utility computing technologies in your own data center

As a quick refresher to the other two parts of this series, I want to note that this is not an easy decision, by any means. Each approach has advantages for some classes of applications/services, and disadvantages to others.

For those starting out from scratch, with no data center resources of their own to depreciate, option 1 probably sounds like the best option. If you can get the service levels you want without buying the servers necessary to run them--which leads to needing people to operate them, which leads to management systems to coordinate the people, and so on--and you can get those service levels at a cost that beats owning your own infrastructure, then by all means take a look at managed hosting providers, such as Amazon (yeah, I'm starting to treat them as a special case of this category), Rackspace, etc. Most of the biggies are offering some sort of "capacity on demand" model, although most (though not all) are focused on giving you access to servers which you have to provision and operate manually.

Just be aware that when you choose your vendor, you choose your poison. The lock-in issues I have described in my previous posts are very real, and can end up being very costly. Be aware that there are no standards for server payload, application or data portability between different vendors of utility computing services. Once you buy in to your capacity choice, factor in that a failure to deliver service on that vendor's part may result in a costly redeployment and testing of your entire stack at your expense!

For this reason, I think anyone with an existing IT infrastructure that is interested in gaining the benefits of capacity as a utility should start with option 2. I also think option 2 applies to "green field" build-outs with big security and privacy concerns. This approach has the following benefits for such organizations (assuming you choose the right platform):

  • Existing infrastructure can be utilized to deliver the utility. Little or no additional hardware is required.
  • Applications can be run unmodified, though you may need to address minor start up and shutdown scripting issues when you capture your software images.
  • Projects can be converted one or two at a time, allowing iterative approaches to addressing technical and cultural issues as they arise. (Don't minimize the cultural issues--utility computing touches every aspect of your IT organization.)
  • Data remains on your premises, allowing existing security and privacy policies to work with minimal changes.
  • Anyone with a reasonable background in system administration, software deployment and/or enterprise architecture can get the ball rolling.

I've been personally involved in a few of these projects in the last couple of years, and I can tell you that the work to move an application to Amazon and then build the infrastructure to monitor and automate management of that application is at least as much as it ends up taking to convert one's own infrastructure to a platform that already provides that monitoring and automation. You may sound cool at the water cooler talking about EC2 and S3, but you've done little to actually reduce the operations costs of a complex software environment.

If you are intimidated now by the amount of work and thought that must go into addressing utility computing, I don't blame you. It's not as easy as it sounds. Don't let any vendor tell you otherwise. However, there are ways to ease into the effort.

One way is to find a problem that you must address immediately in your existing environment with a quick ROI, and address that problem with a solution that introduces some basic utility computing concepts. One of these, perhaps the most impressive financially today, is power. Others have covered the economics here in depth, but let me just note that applying automated management policies to server power is a no-brainer in a cyclical usage environment. Dev/test labs, grid computing farms and large web application environments are excellent candidates for turning off unneeded capacity without killing the availability of those applications.
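
As a rough illustration of what an automated power policy for a cyclical environment might look like, here is a minimal sketch. The function, the busy window and the load threshold are all hypothetical; this is not how Cassatt or any other product actually decides.

```python
from datetime import time


def desired_power_state(now, busy_start, busy_end, current_load, load_threshold):
    """Decide whether a server in a cyclical-usage pool should be on or off.

    Keep the server on during the busy window, or any time it is still
    carrying meaningful load; otherwise it is a candidate for power-off.
    Assumes the busy window does not wrap past midnight.
    """
    in_busy_window = busy_start <= now < busy_end
    if in_busy_window or current_load >= load_threshold:
        return "on"
    return "off"


# Hypothetical dev/test lab: busy 08:00-18:00, "idle" means under 10% load.
busy_start, busy_end, threshold = time(8, 0), time(18, 0), 0.10

print(desired_power_state(time(10, 0), busy_start, busy_end, 0.02, threshold))
print(desired_power_state(time(23, 0), busy_start, busy_end, 0.02, threshold))
print(desired_power_state(time(23, 0), busy_start, busy_end, 0.50, threshold))
```

The load check is what keeps the policy from killing availability: a server that is still doing real work after hours stays on.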

I realize it might sound like I'm tooting Cassatt's horn here, but I am telling you as a field technologist with real experience trying to get utility computing going in some of the most dynamic and forward-thinking data centers in the country, that this approach is a win-win for the CxOs of your company as well as the grunts on the ground. If you don't like power management as a starter approach, however, there are many others: data center migration, middleware license management, hardware failover and disaster recovery are just a few that can show real ROI in the short term, while getting your IT department on the road to capacity as a utility today. All of these can be handled by a variety of vendors, though Cassatt certainly gives you one of the best paths directly from a starting approach to a complete capacity as a utility platform.

One final note for those who may think I've ignored multiple options for third party utility computing besides "HaaS" (Hardware as a Service) vendors. I realize that moving into SaaS, FaaS, PaaS, or WaaS (Whatever as a Service) can give you many advantages over owning your own infrastructure as well, and I certainly applaud those that find ways to trim cost while increasing service through these approaches.

However, the vendor lock-in story is just as sticky in these cases, especially when it comes to the extremely valuable data generated by SaaS applications. Just be sure to push any vendor you select to support standards for porting that data/service/application/whatever to another provider if required. They won't like it, but if enough prospective customers balk at lock-in, they'll find innovative ways to assure your continued ownership of your data, probably while still making it more expensive for you to move than stay put. Still, that's better than not having any control over your data at all...

Tuesday, November 06, 2007

Beating the Utility Computing Lockdown, Part 2

Well, not long after I posted part 1 of this series, Bert noted that he agreed with my assessment of lock-in, then proceeded to note how his (competitive to my employer's) grid platform was the answer.

Now, Bert is just having fun cross promoting on a blog with ties to a competitor, but I think it's only fair to note that no one has a platform that avoids vendor lock-in in utility computing today. The best that someone like 3TERA (or even Cassatt) can do is give you some leverage between the organizations that are utilizing their platform; however, to get the portability he speaks of, you have to lock your servers (and possibly load balancers, storage, etc-etc-etc) into that platform. (Besides, as I understand it, 3TERA is really only portable at the "data center" level, not the individual server level. I suppose you could define a bunch of really small "data centers" for each application component, but in a SOA world, that just seems cumbersome to me.)

Again, what is needed is a truly open, portable, ubiquitous standard for defining virtual "components" and their operation level configurations that can be ported and run between a wide variety of virtualization, hardware and automation platforms. (Bert, I've been working on Cassatt--are you willing to push 3TERA to submit, cooperate on and/or agree to such a standard in the near future?) As I said once before, I believe the file system is the perfect place to start, as you can always PXE boot a properly defined image on any compatible physical or virtual machine, regardless of the vendor. (This is true for every platform except for Windows--c'mon Redmond, get with the program!) However, I think the community will have the final say here, and the Open Virtual Format is a hell of a start. (It still lacks any tracking of operation level configurations, such as "safe" CPU and memory utilization thresholds, SNMP traps to monitor for heartbeats, etc.)
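
To give a feel for what "operation level configurations" traveling with a component could mean, here is a minimal sketch. It is purely illustrative: the field names, thresholds and image path are assumptions, and none of this comes from the Open Virtual Format or any vendor's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationalProfile:
    max_safe_cpu: float        # fraction of CPU, 0.0-1.0
    max_safe_memory: float     # fraction of memory, 0.0-1.0
    heartbeat_seconds: int     # how often a heartbeat is expected


@dataclass(frozen=True)
class ComponentDescriptor:
    name: str
    image_path: str            # where the captured file system image lives
    profile: OperationalProfile


def violations(descriptor, cpu, memory, seconds_since_heartbeat):
    """Compare observed readings against the thresholds that travel with the image."""
    p = descriptor.profile
    found = []
    if cpu > p.max_safe_cpu:
        found.append("cpu")
    if memory > p.max_safe_memory:
        found.append("memory")
    if seconds_since_heartbeat > p.heartbeat_seconds:
        found.append("heartbeat")
    return found


# Hypothetical component: safe up to 80% CPU and 75% memory, heartbeat every 30s.
web = ComponentDescriptor("web-tier", "/images/web-tier", OperationalProfile(0.80, 0.75, 30))
print(violations(web, cpu=0.91, memory=0.40, seconds_since_heartbeat=12))
```

The point of the sketch is only that the thresholds live in the descriptor, so any platform that can read it knows what "healthy" means for that component.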

Unfortunately, those standards aren't baked yet. So, here's what you can do today to avoid vendor lock-in with a capacity provider tomorrow. Begin with a utility computing platform that you can use in your existing environment today. Ideally, that platform:
  1. Does not require you to modify the execution stack of your application and server images, e.g.:
    • no agentry of any kind that isn't already baked into the OS,
    • no requirement to run on virtualization if that isn't appropriate or cost effective.
  2. Uses a server/application/whatever imaging format that is open enough to "uncapture" or translate to a different format by hand if necessary. (Again, I like our approach of just capturing a sample server file system and "generalizing" it for replication as needed. It's reversible, if you know your OS well.)
  3. Is supported by a community or business that is committed to supporting open standards wherever appropriate and will provide a transition path from any proprietary approach to the open approach when it is available.

I used to be concerned that customers would ask why they should convert their own infrastructure into a utility (if it was their goal to use utility computing technology to reduce their infrastructure footprint). I now feel comfortable that the answer is simply that there is no safe alternative for large enterprises at this time. Leaving aside the issue of security (e.g. can you trust your most sensitive data to S3?) and the fact that there is little or no automation available to actually reduce your cost of operations in such an environment, there are many risks to consider with respect to how deeply you are willing to commit to a nascent marketplace today.

I encourage all of you to get started with the basic concepts of utility computing. I want to talk next about ways to cost justify this activity with your business, and talk a little about the relationship between utility computing and data center efficiency.

Monday, November 05, 2007

Beating the Utility Computing Lockdown

If you haven't seen it yet, there is an interesting little commotion going on in the utility computing blogosphere. Robert X. Cringely and Nick Carr, with the help of Ashlee Vance at The Register, are having fun picking apart the announcement that Google is contributing to the MySQL open source project. Cringely started the fun with a conspiracy theory that I think holds some weight, though--as the others point out--perhaps not as literally as he states it. In my opinion, Cringely, Carr and Vance accurately raise the question, "will you get locked into your choice of utility computing capacity vendor, whether you like it or not?"

I've discussed my concerns about vendor lock in before, but I think its becoming increasingly clear that the early capacity vendors are out to lock you in to their solution as quickly and completely as possible. And I'm not just talking about pure server capacity (aka "HaaS") vendors, such as Amazon or the bevy of managed hosting providers that have announced "utility computing" solutions lately. I'm talking about SaaS vendors, such as Salesforce.com, and PaaS vendors such as Ning.

Why is this a problem? I mean, after all, these companies are putting tremendous amounts of money into building the software and datacenter platforms necessary to deliver the utility computing vision. The problem, quite frankly, is that while lock-in can increase the profitability of the service provider, it is not always as beneficial for the customer. I'm not one to necessarily push the mantra "everything should be commodity", but I do believe strongly that no one vendor will get it entirely right, and no one customer will always choose the right vendor for them the first time out.

With regard to vendor lock-in and "openness", Ning is an interesting case in point; I noticed with interest last week Marc Andreessen's announcements regarding Ning and the Open Social API. First, let me get on the record as saying that Open Social is a very cool integration standard. A killer app is going to come out of social networking platforms, and Open Social will allow the lucky innovator to spread the cheer across all participating networks and network platforms. That being said, however, note that Marc announced nothing about sharing data across platforms. In social networking, the data is what keeps you on the platform, not the executables.

(Maybe I'm an old fogey now, but I think the reason I've never latched on to Facebook or MySpace is because I started with LinkedIn many years ago, and though most of my contacts are professional, quite a few of my personal contacts are also captured there. Why start over somewhere else?)

In the HaaS world, software payloads (including required data) are the most valuable components to the consumer of capacity. Yet most HaaS vendors do little (or nothing) to ease the effort it takes to provision a server with the appropriate OS, your applications, data, any utilities or tools you want available, security software, etc. So there is little incentive for the HaaS world to ease transition between vendors until a critical mass is reached where the pressure to commoditize breaks the lock-in barrier. All of the "savings" purported by these vendors will be limited to what they can save you over hosting it yourself in your existing environment.

SaaS also has data portability issues, which have been well documented elsewhere. Most companies that have purchased ERP and CRM services online have seen this eventuality, though most if not all have yet to feel that pain.

Where am I going with all this? I want to reiterate my call for both server and data level portability standards in the utility computing world, with a target of avoiding the pain to customers that lock-in can create. I want the expense of choosing a capacity or application vendor to be the time it takes to research them, compare competitors and sign up for the service. If I have to completely re-provision my IT environment to change vendors, then that becomes the overwhelming cost, and I will never be able to move.

Truth is, open standards don't guarantee that users will flee one environment for another at the drop of a hat. Look at SQL as an example. When I worked for Forte Software many years ago, we had the ability to swap back end RDBMS vendors without changing code long before JDBC or Hibernate. The funny thing is, in six years of working with that product, not one customer changed databases just because the other guy was cheaper. I grant you that there were other costs to consider, but I really believe that the best vendors with the best service at the right price for that service will keep loyal customers whether or not they implement lock-in features.

For HaaS needs, there are alternatives to going out of house for cheap capacity. Most notably, virtualization and automation with the right platforms could let you get those 10 cents/CPU-hour rates with the datacenter you already own. The secret is to use capital equipment more effectively and efficiently while reducing the operations expenses required to keep that equipment running. In other words, if you worry about how you will maintain control over your own data and applications in a HaaS/SaaS world, turn your own infrastructure into a SaaS.

That's not to say I never see a value for Amazon, Google, et al. Rather, I think the market should approach their offerings with caution, making sure that the time and expense it takes to build their business technology platforms is not repeated when their capacity partners fail to deliver. Once portability technologies are common and supported broadly, then the time will come to rapidly shut down "private" corporate datacenters and move capacity to the computing "grid". More on this process later.

Monday, October 15, 2007

Is your software ready for utility computing?

I've been seeing more thoughts on the effect of utility computing on software architectures lately, and one very well stated argument comes from Alistair Croll, Vice President of Product Management and co-founder of Coradiant, a performance tool company. Though clearly self-serving, his message is simple: if you are going to pay by the cycle--or even just share cycles between applications--you'd better make sure your software takes as few cycles as possible to do its job well.

This is one of the unforeseen effects of "paying for what you use", and I have to say it's an effect that should scare the heck out of most enterprise IT departments. Although I would argue part of that fear should come from the exposure of lousy coding in most custom applications, the worst part is the lack of control most organizations will have over the lousy coding in the packaged applications they purchased and installed. Suddenly, algorithms matter again in all phases of software development, not just computing intensive steps.

The worst offender here will probably be the user interface components: Java Swing, AJAX and even browser applications themselves. To the extent that these are hosted from centralized computing resources (and even most desktops fall into this category in some visionaries' eyes), then the incredible amount of constant cycling, polling and unnecessary redrawing will be painfully obvious in the next 10 years or so.

I have always been a strong proponent for not over-engineering applications. If you can meet the business's ongoing service levels with an architecture that cost "just enough" to implement, you were golden in my book. However, utility computing changes the mathematics here significantly, and that key phrase of "meet the business's ongoing service levels" comes much more into play. Ongoing service levels now include optimizations to the cost of executing the software itself; something that could be masked in an underutilized, siloed-stack world.

The performance/optimization guys must be loving this, because they now have a product that should see immediate increase in demand. If you are building a new business application today, you had better be:
  1. Building for a service-based, highly distributed, utility infrastructure world, and
  2. Making sure your software is as cheap to run as possible.

Number 2 above itself implies a few key things. Your software had better be:

  • as standards based as possible--making it possible for any computing provider to successfully deploy, integrate and monitor your application;
  • as simple to install, migrate and upgrade remotely as possible--to allow for cheap deployment into a competitive computing market;
  • as efficient to execute as possible--each function should take as few cycles as possible to do its job.
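To put rough numbers on "as few cycles as possible", here is a back-of-the-envelope sketch. The 10 cents/CPU-hour price is the HaaS rate mentioned elsewhere on this blog; the request volume and per-request CPU times are invented purely for illustration, and `monthly_cpu_cost` is a hypothetical helper, not any vendor's billing formula.

```python
# Illustrative only: the workload numbers below are made-up assumptions.
PRICE_PER_CPU_HOUR = 0.10  # dollars; the 10 cents/CPU-hour rate quoted for HaaS capacity


def monthly_cpu_cost(cpu_seconds_per_request: float,
                     requests_per_month: int,
                     price_per_cpu_hour: float = PRICE_PER_CPU_HOUR) -> float:
    """Return the monthly charge for the CPU time a workload consumes."""
    cpu_hours = cpu_seconds_per_request * requests_per_month / 3600.0
    return cpu_hours * price_per_cpu_hour


requests = 50_000_000                       # hypothetical monthly request volume
sloppy = monthly_cpu_cost(0.20, requests)   # assumed 200 ms of CPU per request
tuned = monthly_cpu_cost(0.05, requests)    # assumed 50 ms of CPU per request
print(f"sloppy: ${sloppy:,.2f}  tuned: ${tuned:,.2f}  saved: ${sloppy - tuned:,.2f}")
```

Under per-cycle pricing the bill scales linearly with CPU time per request, so every wasted poll or redraw shows up directly on the invoice rather than hiding in an underutilized server.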

The cost dynamics will be interesting to note, especially their effects on the agile processes, SOA, and ITIL movements. I will keep a careful tab on this, and will share my ongoing thoughts in future posts.

Thursday, October 04, 2007

Links - 10/4/2007

A Classic Introduction to SOA (DanNorth.net): Thanks to Jack van Hoof, I was led to this brilliant article on modelling SOAs in business terms. (Check out the PDF, the graphics and layout make it an even more fun read.) Rather than spend a bunch of words "me too"-ing Jack and Dan, let me just say that this is exactly the technique I have always used to design service oriented architectures, ever since my days in the mid-90s designing early service oriented architectures at Forte Software.

Classic examples of where this led to better design were the frequent arguments that I would have with customers and partners about where to put the "hire" method in a distributed architecture. Most of the "object oriented architects" I worked with would immediately jump to the conclusion that the "hire" method should be on the Employee class. However, if you sat down and modelled the hiring process, the employee never hired himself or herself. What would happen is the hiring manager would send the information about the employee to the HR office, who would then receive more information, create a new employee file and declare the new employment to the tax authorities. Thus, the "hire" method needed to be on the HR service, with the call coming from the application (or service) initiating employment (i.e. the hiring manager in software form), passing the employee object (or a representation of that object) for processing.

Without exception, that approach led to better architectures than trying to map every method that had any relation to a class of objects directly on the class itself.
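A minimal sketch of the "hire" example, assuming made-up class and field names (`Employee`, `HRService`, `employee_files`, `tax_declarations`); it only illustrates where the method lives, not how any real HR system works:

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    """Plain data about a person; note there is no hire() method here."""
    name: str
    title: str


@dataclass
class HRService:
    """Owns the hiring process, as the HR office does in the real world."""
    employee_files: dict = field(default_factory=dict)
    tax_declarations: list = field(default_factory=list)

    def hire(self, employee: Employee) -> int:
        """Create the employee file and record a declaration for the tax authorities."""
        employee_id = len(self.employee_files) + 1
        self.employee_files[employee_id] = employee
        self.tax_declarations.append(employee_id)  # stand-in for the real filing
        return employee_id


# The "hiring manager in software form" initiates the call and passes the employee.
hr = HRService()
new_id = hr.hire(Employee(name="Pat Example", title="Engineer"))
```

The employee object stays a passive representation that is handed to the service, which mirrors the business process: the employee never hires himself or herself.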

Twilight of the CIO (RoughType: Nicholas Carr): Man, Nick is in rare form now that he is back from his blogging hiatus. His thesis here is that, with the advent of technologies that can be more easily managed outside of IT, and with IT departments doing less R&D and more shepherding outsourced and SaaS infrastructure, the need for the CIO role is diminishing--which I react to with mixed feelings.

On the one hand, there is no doubt that small and mid-sized non-high-tech businesses are going to have less need for a voice representing technical infrastructure issues on the executive board. There will still need to be management (as the first comment to Nick's post alludes to), but they will be a lot like the facilities guy in most businesses today--simply shepherding the services hired by the business.

(Perhaps the "centralized/decentralized pendulum" is shifting wildly, with decentralization this time actually resulting in business systems residing outside of IT entirely?)

On the other hand, I'm not seeing the "simplified" nature of technology happening yet in most mid- to large-sized businesses. Cassatt sells utility computing platform software--basically an operating system for your data center. Resources are pooled and distributed as needed to meet the business's needs (as defined in SLAs assigned to software). We make it easy to cut tremendous amounts of waste, rigidity and manual labor out of the basic data centers. CIOs love this vision, and drive technical changes in the customers we work with. However, implementations still take a long time. Why? Because most existing infrastructures are about 10 years behind the desired state of the art the IT department is trying to achieve. Also because it's not just a technical change, it's a cultural change. (By the way, so is SaaS.)

I fear that the lack of technical leadership on the executive team will actually hinder adoption of these critical new technologies and other technologies only being thought of now, or in the future. What I think ultimately needs to happen is that high level technical critical thinking skills need to be taught to the rest of the line-of-business executives, so that interesting new technologies will drive interesting new business opportunities in the years to come.

This goes to Marc Andreessen's recent post on how to prepare for a great career. Don't rest on your technical skills, or your business skills, but work hard to develop both. (Marc is another blogger who has been on a streak lately...read his career series and learn from someone who knows a little about success.)

Friday, September 28, 2007

The IT Power Divide

The electric grid and the computing grid (RoughType: Nicholas Carr): Nicholas describes the incredible disconnect between IT's perception of power as an issue:

"...[O]nly 12% of respondents believe that the energy efficiency of IT equipment is a critical purchasing criterion."

and the actual scale of the issue in reality:

"...[A] journeyman researcher named David Sarokin has taken a crack at estimating the overall amount of energy required to power the country's computing grid...[which] amounts to about 350 billion kWh a year, representing a whopping 9.4% of total US electricity consumption."

Amen, brother. In fact, the reason you haven't heard from me as often in the last two to three weeks is that I have been steadfastly attending a variety of conferences and customer prospect meetings discussing Active Power Management and SLAuto. What I've learned is that there are deep divides between the IT and facility views of electrical efficiency:
  • IT doesn't see the electric bill, so they think power is mostly an upfront cost issue (building a data center with enough power to handle eventual needs) and an ongoing capacity issue (figuring out how to divide power capacity among competing needs). However, their bottom line remains meeting the service needs of the business.

  • Facilities doesn't see the constantly changing need for information technology of the business, and sees electricity mostly as an upfront capacity issue (determining how much power to deliver to the data center based on square footage and proposed kW/sq ft) and an ongoing cost issue (managing the monthly electric bill). The bottom line in this case is value, not business revenue.

Thus, IT believes that once they get a 1 MW data center, they should figure out how to efficiently use that 1 MW--not how to squeeze efficiencies out of the equipment to run at some number measurably below 1 MW. Meanwhile, facilities gets excited about any technology that reduces overall power consumption and maintains excess power capacity, but lacks the insight into what approaches can be taken that will not impact the business's bottom line.

With an SLAuto approach to managing power for data centers, both organizations can be satisfied--if they would only take the time to listen to each other's needs. IT can get a technical approach that has minimal (or zero) effect on system productivity, while facilities sees a more "optimal" power bill every month. Furthermore, facilities can finally integrate IT into the demand curtailment programs offered by their local power utilities, which can generate significant additional rebates for the company.

Let me know what you think here. Am I off base? Do you speak regularly with your facilities/IT counterpart, and actively search for ways to reduce the cost of electricity while meeting service demand?

Monday, September 24, 2007

Service-Oriented Everything...

Agility Principle: Service-Oriented Network Architecture (eBiz: Mark Milinkovich, Director, Service-Oriented Network Architecture, Cisco Systems): Cisco is touting the network as the center of the universe again, but this article is pretty close to the truth about software and infrastructure architectures we are moving to. Most importantly, Mark points out that there is a three layer stack that actually binds applications to infrastructure:
  • Applications layer - includes all software used for business purposes (e.g., enterprise resource planning) or collaboration (e.g., conferencing). As Web-based applications rely on the Extensible Markup Language (XML) schema and become tightly interwoven with routed messages, they become capable of supporting greater collaboration and more effective communications across an integrated networked environment.

  • Integrated network services layer - optimizes communications between applications and services by taking advantage of distributed network functions such as continuous data protection, multiprotocol message routing, embedded QoS, I/O virtualization, server load balancing, SSL VPN, identity, location and IPv6-based services. Consider how security can be enhanced with the interactive services layer. These intelligence network-centric services can be used by the application layer through either transparent or exposed interfaces presented by the network.

  • Network systems layer - supports a wide range of places in the network such as branch, campus and data center with a broad suite of collaborative connectivity functions, including peer-to-peer, client-to-server and storage-to-storage connectivity. Building on this resilient and secure platform provides an enterprise with the infrastructure on which services and applications can reliably and predictably ride.
Of course, he's missing a key layer:
Physical infrastructure layer - represents the body of physical (and possibly virtual) infrastructure components that support the applications, network services and network systems, not to mention the storage environment, management environment and, yes, Service Level Automation (SLAuto) environment.
It is important to note that, while the network may be becoming a computer in its own right, it still requires physical infrastructure to run. And all of these various application, integrated network, and network systems services that Mark mentions not only depend on this infrastructure, but can actually be loosely coupled to the physical layer in a way that augments the agility of all four layers.

For example, imagine a world where your software provisioning is completely decoupled from your hardware provisioning. In other words, adding an application to your production data center doesn't require you to predict exactly what load the application is going to add to the network, server or storage capacity. Rather, you simply load the application into the SLAuto engine, let traffic start to arrive, measure the stress on existing capacity, and order additional hardware as required. Or, better yet, order hardware at the end of a quarter based on trend analysis from the previous quarter. No need for the software teams and the hardware teams to even talk to each other.

I will admit that it is unlikely that many IT departments will ever get to that "pie-in-the-sky" scenario--for some the risk of not guessing high enough on capacity overwhelms the cost of predicting short to medium term load. However, SLAuto allows you to get past the problems of siloed systems, such as "hitting the ceiling" in allocated capacity. Even if the SLAuto environment runs out of excess physical capacity, it can borrow the capacity it needs for high priority systems from lower priority applications.
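As a rough illustration of "borrowing" capacity by priority, here is a small sketch. Everything in it is an assumption made for the example: the `App` record, the rule that a higher number means higher priority, and the `borrow_capacity` helper. It is not Cassatt's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    priority: int   # higher number = more important (assumed convention)
    servers: int    # servers currently allocated


def borrow_capacity(apps: list[App], needy: App, wanted: int, free_pool: int) -> int:
    """Give `needy` up to `wanted` servers; return how many were actually granted.

    Spare servers are used first. If the pool runs dry, servers are taken from
    strictly lower-priority apps, lowest priority first.
    """
    granted = min(wanted, free_pool)
    donors = sorted((a for a in apps if a.priority < needy.priority),
                    key=lambda a: a.priority)
    for donor in donors:
        if granted == wanted:
            break
        take = min(donor.servers, wanted - granted)
        donor.servers -= take
        granted += take
    needy.servers += granted
    return granted


billing = App("billing", priority=9, servers=4)
reports = App("reports", priority=3, servers=3)
sandbox = App("sandbox", priority=1, servers=2)
got = borrow_capacity([billing, reports, sandbox], billing, wanted=4, free_pool=1)
```

A real policy engine would presumably leave each donor a minimum footprint and hand capacity back when the spike passes; this sketch drains donors completely to keep the idea visible.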

The best part is that, since the SLAuto environment tracks every action it takes, there are easy ways to get reports showing everything from capacity utilization trend analysis to cost of infrastructure for a given application.

Back to Mark's article, though. It is good to see some consensus in the industry on where we are moving, even if each vendor is trying to spin it as if they are the heart of the new platform. In the end though, if the network is indeed the computer, the network and the data center will need operating systems. Mark has entire sections dedicated to designing for application awareness (this is where most data center automation technologies fall woefully short), and designing for virtualization (including all aspects of infrastructure virtualization). He is right on the money here, but there needs to be something that coordinates the utilization of all of these virtualized resources. This is where SLAuto comes in.

Most importantly, don't forget to integrate SLAuto into all four layers. Make sure that each "high" layer talks to the layers below it in a way that decouples the higher layer from the lower layer. Make sure that each lower layer uses that information to determine what adjustments it needs to make (including, possibly, to send the information to an even lower layer). And make sure your physical infrastructure layer is supported by an automation environment that can adjust capacity usage quickly and painlessly as applications, services and networks demand.

As you prepare your service oriented architecture of the future, don't forget the operations aspects. We are on the brink of an automated computing world that will change the cost of IT forever. However, it will only work for you if you take all of the components involved in meeting service levels/operation levels into account.

Monday, September 10, 2007

Links - 09/10/2007

Brave New World (Isabel Wang): I can't begin to express how sorry I am to see Isabel Wang leave the discussion, as her voice has been one of the clearest expressions of the challenges before the MSP community. However, I understand her need to go where her heart takes her, and I wish her the best of luck in all of her endeavors.

(Let me also offer my condolences to Isabel and the entire 3TERA community for the loss of their leader and visionary, Vlad Miloushev. His understanding of the utility computing opportunity for MSPs will also be missed.)

MTBF: Fear and Loathing in the Datacenter (Aloof Architecture: Aloof Schipperke): Aloof discusses his mixed feelings about my earlier post on changing the mindset around power cycling servers. I understand his fears, and hear his concerns; MTBF (or more to the point, MTTF) isn't a great indicator of actual service experience. However, even by conservative standards, the quality and reliability of server components has improved vastly in the last decade. Does that mean perfection? Nope. But as Aloof notes, our bad experiences get ingrained in the culture, so we overcompensate.

CIOs Uncensored: Whither The Role Of The CIO? (InformationWeek: John Sloat): Nice generality, Bob! Seriously, does he really expect that *every* IT organization will shed its data centers for service providers? What about defense? Banking? Financial markets? While I believe that most IT shops are going to go to a general contractor/architect role, I think there is still a big enough market for enterprise data centers that markets to support them will go on for years to come.

That being said, most of you out there should look at your own future with a service-oriented computing (SOC?) world in mind.

Thursday, September 06, 2007

Fear and the Right Thing

An interesting thing about diving into the Active Power Management game is the incredible amount of FUD surrounding the simple act of turning a computer off. Considering the fact that:

  • server components manufactured in the last several years have very high MTBF values (measured in hundreds of thousands or even millions of hours),
  • the servers you will buy starting soon will all turn off pieces of themselves automagically, and
  • no one thinks twice about turning off laptop or desktop computers or their components,

the myth lives on that turning servers off and on is bad.

I understand where this concern comes from. Older disk drives were notoriously susceptible to problems with spin-up and spin-down. Don't get me started on power supplies in the late eighties and early nineties. My first job was as a sys admin for a small college where I regularly bought commodity 386 chassis power supplies and Seagate ST-220 disk drives. Even in the mid-nineties, older servers would all too frequently mysteriously die while we were restarting the system for an OS upgrade or after a system move.

Add to this the fact that enterprise computing went through its (first?) mainframe stage, where powering things off was contrary to the goal of using it as much as possible, and you get a cultural mentality in IT that uptime is king, even if system resources will be idle for great periods of time.

These days, though, the story has greatly changed. As Vinay documented, Cassatt starts and stops all of its QA and build servers every day. In over 18,826 power cycles, not a single system failed. In my interactions at customer sites, there have been zero failures in thousands of power cycles. Granted, that's not a scientific study, but it goes to the point that unexpected component failure is not a common occurrence during power cycles any more.

Of course, I'm looking for hard data about the effect of power cycling nodes to supplement Vinay's data and support my own anecdotal experience. If you have hard data about the effect of power cycling on system reliability, I would love to hear from you.

For the rest of us, let's pay attention to the realities of our equipment capabilities, and look at the real possibility that powering off a server is often the right thing to do.

Tuesday, September 04, 2007

An easy way to get started with SLAuto

It's been an interesting week, leading up to the Labor Day weekend, but as of this morning I get to talk more openly about one project that has been taking a great deal of my time. As I have blogged about Service Level Automation ("SLAuto"), it may have dawned on some of you that achieving nirvana here means changing a lot about your current architecture and practices.

For example, decoupling software from hardware is easy to say, but requires significant planning and execution to implement (though this can be simplified somewhat with the right platform). Building the correct monitors, policies and interfaces is also time intensive work that requires the correct platform for success. However, as noted before, the biggest barriers to implementing SLAuto and utility computing are cultural.

There is an opportunity out there right now to introduce SLAuto without all of the trappings of utility computing, especially the difficult decoupling of software from hardware. It is an opportunity that Silicon Valley is going ga-ga over, and it is a real problem with real dollar costs for every data center on the planet.

The opportunity is energy consumption management, aka the "green data center".

Rather than pitch Cassatt's solution directly, I prefer to talk about the technical opportunity as a whole. So let's evaluate what is going on in the "GDC" space these days. As I see it, there are three basic technical approaches to "green" right now:
  1. More efficient equipment, e.g. more power efficient chips, server architectures, power distribution systems, etc.
  2. More efficient cooling, e.g. hot/cold aisles, liquid cooling, outside air systems, etc.
  3. Consolidation, e.g. virtualization, mainframes, etc.

Still, there is something obvious missing here: no matter which of these technologies you consider, not one of them is actually going to turn off unused capacity. In other words, while everyone is working to build a better light bulb or to design your lighting so you need fewer bulbs, no one is turning off the lights when no one is in the room.

That's where SLAuto comes in. I contend that there are huge tracts of computing in any large enterprise where compute capacity runs idle for extended periods. Desktop systems are certainly one of the biggest offenders, as are grid computing environments that are not pushed to maximum capacity at all times. However, possibly the biggest offender in any organization that does in-house development, extensive packaged system customization or business system integration is the dev/test environment.

Imagine such a lab where capacity that will be unused each evening/weekend, or for all but two weeks of a typical development cycle, or at all times except when testing a patch to a three year old rev of product, was shut down until needed. Turned off. Non-operational. Idle, but not idling.

Of course, most lab administrators probably feel extremely uncomfortable with this proposition. How are you going to do this without affecting developer/QA productivity? How do you know it's OK to turn off a system? Why would my engineers even consider allowing their systems to be managed this way?

SLAuto addresses these concerns by simply applying intelligence to power management. A policy-based approach means a server can be scheduled for shutdown each evening (say, at 7PM), but be evaluated before shutdown against a set of policies that determine whether it is actually OK to complete the shut down.

Some example policies might be:

  • Are certain processes running that indicate a development/build/test task is still underway?
  • Is a specific user account logged in to the system right now?
  • Has disk activity been extremely low for the last four hours?
  • Did the owner of the server or one of his/her designated colleagues "opt-out" of the scheduled shutdown for that evening?

Once these policies are evaluated, we can see if the server meets the criteria to be shut down as requested. If not, keep it running. Such a system needs to also provide interfaces for both the data center administrators and the individual server owners/users to control the power state of their systems at all times, set policies and monitor power activities for managed servers.
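The policy check described above might be sketched like this. The `ServerState` fields, the process and account names, and the four-hour threshold wiring are all illustrative assumptions; a real SLAuto platform would gather this state from its own monitors.

```python
from dataclasses import dataclass, field


@dataclass
class ServerState:
    """Snapshot of one lab server (hypothetical fields for this example)."""
    running_processes: set = field(default_factory=set)
    logged_in_users: set = field(default_factory=set)
    hours_of_low_disk_activity: float = 0.0
    opted_out_tonight: bool = False


BUSY_PROCESSES = {"make", "ant", "junit"}   # hypothetical build/test process names
PROTECTED_USERS = {"release-eng"}           # hypothetical account that blocks shutdown


def reasons_to_stay_up(s: ServerState) -> list:
    """Return every policy that vetoes tonight's scheduled shutdown."""
    reasons = []
    if s.running_processes & BUSY_PROCESSES:
        reasons.append("build/test task still running")
    if s.logged_in_users & PROTECTED_USERS:
        reasons.append("protected user logged in")
    if s.hours_of_low_disk_activity < 4:
        reasons.append("recent disk activity")
    if s.opted_out_tonight:
        reasons.append("owner opted out")
    return reasons


def ok_to_shut_down(s: ServerState) -> bool:
    """The 7PM job powers the server off only if no policy objects."""
    return not reasons_to_stay_up(s)


idle = ServerState(hours_of_low_disk_activity=6.0)
busy = ServerState(running_processes={"make", "sshd"}, hours_of_low_disk_activity=6.0)
print(ok_to_shut_down(idle), reasons_to_stay_up(busy))
```

Returning the list of reasons, rather than a bare yes/no, gives server owners and administrators something to see in the monitoring interface when a shutdown is skipped.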

I'll talk more about this in the coming week, but I welcome your input. Would you shut down servers in your lab? Your grid environment? Your production environment? What are your concerns with this approach? What policies come to mind that would be simple and/or difficult to implement?

Tuesday, August 28, 2007

Links - 08/28/2007

Don't Worry, It's Safe to Power off that Server and Power It on Again (Vinay Pai): Vinay posts on one of the biggest myths in data center operations: the "Mean Time Between Failure" myth. In short, if you ran 1000 servers for 3 years, there is a .06% chance that any power supply would fail. From this, he notes that dual power supplies are an inefficient solution (from a green standpoint) to a decidedly minor problem. Remember this point for some of my future posts--this myth is busted, and knowing this opens you to some very quick and simple power efficiency practices.
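For readers who want to play with the MTBF arithmetic themselves, here is a rough sketch. It assumes a constant failure rate (the usual exponential model), and the MTBF values are illustrative rather than taken from Vinay's post, so it makes no attempt to reproduce his figure. It also says nothing about power-cycle stress, which MTBF does not measure.

```python
import math

HOURS_PER_YEAR = 24 * 365


def failure_probability(mtbf_hours: float, years: float) -> float:
    """P(at least one failure) for one component, assuming a constant failure rate."""
    return 1.0 - math.exp(-(years * HOURS_PER_YEAR) / mtbf_hours)


for mtbf in (100_000, 500_000, 1_000_000):   # illustrative MTBF values, in hours
    p = failure_probability(mtbf, years=3)
    print(f"MTBF {mtbf:>9,} h -> {p:.1%} chance of failing within 3 years")
```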

Lowering Barriers to Entry: Open Source and the Enterprise (The Future of Software: Stephen O’Grady): Stephen, of Red Monk fame, argues that the real value of open source software is not its price or code quality, but the ease with which it can be introduced into an enterprise. According to Stephen, open source puts the power of software acquisition into the hands of developers and architects. Would you agree? Is there an equivalent possibility for open source hardware? SaaS? Utility computing? Or will those drive the pendulum back to central management control?

Scalability != concurrency (Ted Leung on the Air): Given my past as an enterprise software development junkie, this article is particularly interesting to me. A little debate is breaking out about the shortcomings of Java in a highly concurrent hardware model, and there seem to be a few upstart languages making a name for themselves. I was not aware of Erlang, but you can bet I will spend some time reading about it now (despite its apparent shortcomings--see below). For those interested in utility computing and SLAuto, remember the goal is to deliver the functionality required by the business using the most cost effective resource set necessary to do so. Software is a big part of the problem here, as inefficient programs (or even VMs, it seems) can minimize the impact of better hardware technologies. Keep an eye on this, and if you are writing software to run in a SaaS / utility computing world, consider the importance of concurrency to gaining cost effective scale.

http://www.russellbeattie.com/blog/java-needs-an-overhaul (Russell Beattie): This is the article that triggered Ted's comments above. An interesting breakdown from one developer's point of view, with a brief overview of Erlang's shortcomings as well.

Thursday, August 23, 2007

Links - 08/23/2007

AWS and Web 2.0 Mapping (WeoGeo: Paul Bissett): Paul, founder of WeoGeo, a company focused on geospatial solutions, made a comment in this post that I thought needed sharing:

Mapping, particularly quantitative mapping like GIS, and AWS go together like peanut butter and jelly (I have 3 small kids who have been out of school all summer, so this was the first analogy that came to mind). The utility computing of EC2 and the large web-addressable disk storage of S3 provide opportunities for developing and sharing of mapping products that previously were cost prohibitive.

A solar-powered data center saves energy its own way (SearchDataCenter.com: Mark Fontecchio): Frankly, I think this may actually be a wave of the future. Time to buy cheap real estate in the California desert, folks. (No access to the grid required! Drill a well for water, dig a septic tank, and Internet access is your only utility need.) Given the success AISO.net is having here, I would be surprised if more small-medium sized data centers don't pop up where the sun shines almost every day.

Hell, with the spaceports going up in New Mexico, that state's deserts might not be a bad place to place your bet either.

Wednesday, August 22, 2007

Business doesn't ask for utility computing, part 2

Bob Warfield (of the stealth mode company, SmoothScan) called me out on the admittedly flippant argument I put out for IT ownership of infrastructure strategy and architecture. I made this argument specifically in the face of business units (and even hosting clients) who are extremely resistant to sharing "their" servers with anyone. Bob's response is insightful, probably exactly how BUs will respond, and deserves a careful response.

IT is a service organization. (Translation: You work for us, and we're more mission critical than you are. You are replaceable by VAR/SIs and by SaaS. Be careful when getting uppity with us.)

Damn straight. We are a service organization, and as such our sole purpose of costing our enterprise money is to meet your functional and service level requirements in the most cost effective way possible.

However, your statement does not explain why you shouldn't share servers. If we demonstrate conclusively that we can better meet your service levels at a lower cost with virtualization and/or utility computing, clearly it is in the financial interest of the company for you to pursue the concept further. In the same vein, if you can prove that you can get the business functionality you need for cheaper through a SaaS solution, we should help you make that happen.

You have not always met your SLAs and delivered your projects on time and on budget. In fact, there is at least one major nightmare project on everyone's mind at any time. (Hey, it's software, what else is new, it wasn't our fault, part of it is business' fault because of how they spec'd their requirements and then failed to deliver, yada, yada. But, fair or not, IT gets the blame. IT has more glass on its house than anyone.)

We completely agree--IT has often failed to deliver (or been party to delivery failures). However, because we are focusing on infrastructure issues, let's let the SOA guys describe how they will mitigate software delivery failures.

There are two key forms of project failures for IT infrastructure:

  1. Failing to acquire, install, configure and provision hardware in a timely fashion
  2. Failing to meet agreed upon SLAs when operating that hardware and its software payloads.

Assuming we physically receive the hardware in a timely fashion, we then must use automation to greatly reduce the cost of getting new systems up and running quickly. Whether or not systems are shared by business units, this need is there.

In fact, because we are utilizing resource pooling in a utility computing model, it will often be possible to provision your software without requiring you to wait for any associated hardware. Want to get a quick beta marketing program up and running in a matter of hours? Give us the binaries and we will find capacity for it today. We'll determine the need to add additional capacity to the system later, once we see trends in overall infrastructure demand.
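As a rough illustration of what "finding capacity" could mean in a pooled model, here is a minimal sketch of first-fit placement. The host names, the core counts and the `place` function are all hypothetical, and a real utility computing platform would weigh far more than free CPU cores.

```python
from typing import Dict, Optional


def place(workload_cores: int, free_cores: Dict[str, int]) -> Optional[str]:
    """First-fit placement: pick the first host with enough free cores.

    Returns the chosen host name, or None if the pool has no room,
    which is the signal to go buy more capacity.
    """
    for host, free in free_cores.items():
        if free >= workload_cores:
            # Reserve the cores on the chosen host.
            free_cores[host] = free - workload_cores
            return host
    return None


# Hypothetical shared pool: host name -> cores currently unused.
free = {"host-a": 1, "host-b": 4, "host-c": 8}

chosen = place(2, free)        # the quick beta marketing program
too_big = place(16, free)      # nothing in the pool can take this one

print(chosen, too_big, free)
```

The point of the sketch is only that the business unit hands over a payload and a size, not a purchase order; which box it lands on is the utility's problem.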

As far as service levels go, responses to violations have to be automated. No more waiting for someone to respond to a pager call--if a server dies in the middle of the night, SLAuto systems must quickly restart or replace the failed system. With automation involved in meeting your SLA needs on shared systems, we aim to remove the dependency on "human time" and the limits of siloed resources, which are what was killing us before.
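To make the "restart or replace" idea a little more concrete, here is a minimal sketch of a single policy pass over a shared pool. Everything in it is an assumption for illustration: the `Server` and `Pool` classes, the `try_restart` stand-in and the `recovers_on_restart` flag simulate what a real SLAuto system would do with actual health probes and power controls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    name: str
    healthy: bool = True
    recovers_on_restart: bool = True  # simulation knob, not a real probe
    restarts: int = 0


def try_restart(server: Server) -> bool:
    """Stand-in for a real power-cycle; returns True if the server came back."""
    server.restarts += 1
    server.healthy = server.recovers_on_restart
    return server.healthy


@dataclass
class Pool:
    active: List[Server]
    spares: List[Server] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    def enforce(self) -> None:
        """One policy pass: restart failed servers, replace those that stay down."""
        for i, server in enumerate(self.active):
            if server.healthy:
                continue
            if try_restart(server):
                self.events.append(f"restarted {server.name}")
            elif self.spares:
                replacement = self.spares.pop(0)
                self.active[i] = replacement
                self.events.append(f"replaced {server.name} with {replacement.name}")
            else:
                # Only now does a human get involved.
                self.events.append(f"no spare for {server.name}; paging a human")


pool = Pool(
    active=[
        Server("web-1"),
        Server("web-2", healthy=False),
        Server("web-3", healthy=False, recovers_on_restart=False),
    ],
    spares=[Server("spare-1")],
)
pool.enforce()
print(pool.events)
```

Note that the spare comes from a shared pool rather than from a box reserved for one business unit, which is where the siloed model runs out of options first.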

BTW, our new friend (insert name of Enterprise Software Sales Guy) has told us all about these topics, so we're knowledgeable too, and we think you ought to listen to us. (Very dangerous game for the Enterprise Sales guy, but if IT already shut him down, this is exactly how they'll play it in many cases because they have nothing to lose.)

Again, unless Mr. Enterprise Software Sales Guy was selling you something that manages infrastructure (in which case, why do you care?), what he is selling doesn't impact the decision of why or why not to share servers with other business units in an IT utility. If he is telling you it does matter, he'd better be able to demonstrate how his product will beat the ROI we are projecting from utility computing. Oh, and that is a BIG number.

We still remember those times when you put the needs and requirements of your own organization ahead of our business needs. You wouldn't choose the app we wanted because it didn't fit your "standards". The app you did choose stinks and our competitors, who use the app we wanted, are now running rings around us. (Yep, it happens. I've seen IT frequently choose an inferior or even unacceptable app because they didn't care and had the power to ram it down the business' throats. When it blew up, IT either lost credibility or the business suffered depending on how the culture worked. This happens at all kinds of companies, large and small, successful or not.)

In my earlier example, I made it clear that you have the right to push as much as you want for functionality and aesthetics. Applications are the point where both originate, and we fully support your demands that we not make your application decisions for you (but hope we can make them with you). However, architecture is a different story, and infrastructure architecture is about as far removed from functionality and aesthetics as you can get (except in relation to service levels, but we already covered that). Again, if we deliver the functionality you want at the service levels you require in the most cost efficient way reasonably possible, then you shouldn't care.

Oh, and by the way, we take full responsibility for reporting to you the SLA compliance levels and associated cost of infrastructure as we move forward, so you can determine on your own if we are really achieving our goal. Of course, that might come in the form of a bill...

The core of Bob's comment boils down to the following:

You won't wean the business from sticking their nose into IT's business so long as these cultural factors persist. Earning the right to be a trusted deliverer carries with it the responsibility to be trustworthy.

Have you been trustworthy?

If not, even if it wasn't your fault, consider a more consensus oriented approach. After all, the speeches described above boil down to "do it because I say so". I try to avoid that with my kids where possible, and it is possible almost 100% of the time.

To that I reply "mea culpa". I was stating my case in a much less friendly tone than I would in real life to make my point. You are right that all business relationships must (ideally) be consensus driven. However, in the end the cost savings driven by a multi-tenant approach (be it utility computing, virtualization or SaaS) can't be achieved if each business unit demands owning its own server.

One last thing: it has been my experience in the last year or two that business units are much more open to sharing infrastructure within a company. As long as data security is maintained, business application owners are most concerned about the same things IT is--delivering required functionality at required service levels for as little cost as possible.

Sharing infrastructure with another company, however, is an entirely different story.

Tuesday, August 21, 2007

Links - 08/21/2007

MaaS - Money as a Service (Roman Stanek's Push-Button Thinking): The analogy of banking to software usage is a good one. As Roman says:
"[A]s nobody would keep their money at home stuffed in a mattress anymore, I don't expect users to go through the pains of installs, upgrades, re-installs and maintenance of complex software products."
Sure enough. However, note that there are always some entities that keep their own cash handy: banks, for one, not to mention government treasuries. In the same vein, I think there will always be certain non-IT organizations that will maintain their own data centers, such as financial firms with proprietary IT that enables competitive advantage, as well as law enforcement and national defense.

Millions of Square Feet (RackLabs: Lew Moorman): This is an interesting example of how computing needs translate directly into physical space. I'm interested in knowing what RackSpace / RackLabs' view of the business model for utility computing is, but at the very least we can see that giant compute farms are most definitely in our future.

Tech's own data centers are their green showrooms (InfoWorld: Robert Mullins, IDG News Service): This article covers the "eat your own dog food" approach that both Sun and Fujitsu are taking in terms of energy efficient computing. It is interesting to me, however, that none of the solutions described simply turn unused equipment off...

Monday, August 20, 2007

Business doesn't ask for utility computing, either...

Call for more EA collaboration (Enterprise Architecture: From Incite comes Insight...: James McGovern) and
SOA and EDA: SOA-selling battle goes on in blogosphere (SOA and EDA: Jack van Hoof): Interesting discussion regarding Jack's post, "SOA and EDA: Business doesn't ask for SOA". There seems to be a little bit of backlash to the argument that no one should have to sell SOA to the business. However, James puts it wonderfully when he presents the following observation:
Imagine finding a carpenter with thirty years of experience and having him ask you whether it is OK if he uses a nailer instead of the trusty hammer and nail. Wouldn't this feel absurd?
Absolutely. IT architecture is actually very rarely a business issue. This is as true in infrastructure as it is in software. Which is why arguments from the business that "I don't want to share my server with anyone" shouldn't hold a lick of weight with IT. If you encounter that kind of resistance in your world, just fire back the following:

"As long as I am meeting your service levels, how I deliver them is not your concern. Like the relationship between home builder and client, we are responsible for delivering the product you pay for to required building codes (meaning IT technology governance, not business "want to haves") and contractual quality specifications (SLAs).

Feel free to "drive by the property" occasionally to see our progress (and comment on aesthetic and feature completeness concerns), but trust our professional experience to design and build the required infrastructure. As a cost center, believe that it is in our interest to drive down costs, passing the savings on to you."

This argument would probably hold true for the hosting-client relationship as well...

Friday, August 17, 2007

Plumbers are plumbers, dude...

Allan Leinwand, a venture partner with Panorama Capital, founder of Vyatta, and the former CTO of Digital Island, posted an interesting article about what it will take for today's telecom service providers to become major players in the Internet of the future. As Allan puts it:
If there’s one thing that service providers denounce, it’s being classified as the plumbers and pipe fitters of the Internet, destined to move bits between co-location facilities. With the software-as-a-service (SaaS) and Web 2.0 revolutions in full swing, service providers are pounding the table, insisting that they have evolved beyond the mundane task of moving bits to become “service provider 2.0” companies.
Allan goes on to demonstrate that the true advantage that these SPs have over startups is their understanding of scale, though he is less than certain that they will be able to take advantage of the opportunity.

I believe the telecom providers have never moved beyond being the plumbers, though innovative plumbers that have figured out all kinds of ways to charge you for every turn of a faucet. Doubt me? Just look at the Web 1.0 world. Every single Internet access provider I have used has offered me a "home page" of their making, with supposedly advanced services for accessing mail, news, search and other key features of the early Internet. And in every case, I quickly replaced their tired page with either my My Yahoo page or Google. Not a single one was able to offer me anything innovative enough to see them as leading edge technology in the Web content space.

The same will be true for SaaS (Software as a Service), FaaS (Frameworks as a Service) and PaaS (Platform as a Service). They may be great at scaling network architectures, pretty damn good at scaling computing infrastructures (making one or more Bells a player in the compute capacity space), but they haven't got a clue how to provide the art that makes Internet content compelling. I've worked with telecoms and Internet access providers in the past, and I wouldn't trust them to create an ERP package, social networking site or even an online photo album that would hold a candle to Salesforce.com, Facebook or Flickr respectively.

It all comes down to the layering that Isabel Wang points out some major players are evangelizing these days. To quote Isabel:

Amazon and Microsoft made me realize that Internet infrastructure solutions should be - will be - delivered in 4 layers:

(a) Data centers/physical servers/virtualization software

(b) Utility computing fabric comprised of large pools of servers across multiple facilities

(c) Application frameworks, such as Amazon's web services APIs

(d) Shared services, such as identity management and social networking

Damn straight. Think about the implications of the above. To expand on those definitions a little bit, if you want to cover all of the bases in the Web 3.0 world, you have to deliver:
  • servers (physical and virtual) with supporting network, storage, power, cooling, etc. systems
  • automation and management intelligence to deliver service levels in an optimal fashion (insert SLAuto here) on that infrastructure
  • some killer APIs/frameworks/GUIs to allow customers to apply the infrastructure to their needs
  • all of those core capabilities that customers will require but will not want to build/support themselves (such as the things that Isabel notes, but also there is some SLAuto here as well)

The SPs that Allan references are great at Isabel's layer (a), and have a head start on delivering (b). However, when you move to (c), all of a sudden most service providers fall down. Even the wireless guys rely on Java / Microsoft / Nokia / etc. to provide this interface on their networks. Today, there are no telecoms, hosting providers or other Internet service providers that come even close to handling (d).

Is anyone handling all four layers? Sure, the software companies that know how to scale: Google, Amazon, Microsoft, Salesforce.com, eBay, etc. These guys worked from the top down to build their businesses: they wanted to provide applications to the masses, so they had to build (d), (c) and (b) in order to keep the required (a) manageable. Some (soon most?) of these guys are working to make a buck off of the rest of us with their technology.

It took startups--quickly growing startups, mind you--to work through the pain of being dominant Web 3.0 pioneers. However, even they don't own the entire infrastructure stack needed to do truly dynamic web computing, and they are really still pretty damn primitive. (For example, while many of these vendors have internal automation to make their own lives easier, they offer their customers little or no automation for their own use.)

Telecoms will always own the networks that in turn make the rest of our online lives possible. They may also acquire companies that know how to do the software infrastructure side a bit better--identity infrastructure especially seems like a good telecom business; after all, what is a phone number other than a numeric user ID? But they will not likely be the owners of the social networks of the future. They probably will never be the dominant capacity providers in the utility computing world. However, owning the network is a powerful position to have.

Network neutrality, anyone?

Update: You should read the comments to Allan's article, as well. Lots of very smart people saying much the same as my long post, but in far fewer words. :)