Showing posts with label utility computing. Show all posts

Tuesday, June 24, 2008

"Follow the Law" Meme Hits the Big Time

A few days ago, I checked in to my w3counter dashboard to see who was linking to my blog, and I discovered a very intelligent continuation of the "Follow the Law Computing" meme written by Greg Ness (also found on his blog). Greg's addition of the "spice trails" analogy was something new to me, and raised some interesting thoughts about what the historical significance of the cloud will be to worldwide wealth distribution. There certainly has been a limited but significant wealth effect created by the Internet itself, but will the ability to physically move data and/or compute loads accelerate these trends?

Noting that I should blog about this on the plane at some point during my trip to Austin this week, I dutifully bookmarked the article for later. I had no chance to look at traffic on Monday, so it was with great shock that when I got online this morning I saw a hockey stick graph. I investigated, and then my heart skipped a beat.

As of now, today, quotes from my "Follow the Law" post make up Nick Carr's latest post. Nick weaves together the work of Bill Thompson (which I also reference), myself and Greg to provide a clear, concise discussion of the concept of what he calls "itinerant computing". (Damn, he's good at coining these terms, isn't he?)

Ever since I discovered Nick's blog early in my career at Cassatt, I've wanted to get his attention. The Big Switch was an eye opening read--if only because it served as a good counterpoint to Bill Coleman's optimistic vision. He made me look at utility computing and cloud computing with a more critical eye, and I wanted to add to his body of knowledge. I am honored to have done so in a small way.

Surprisingly, though, that wasn't the whole hockey stick trigger. Greg's post was picked up by a site called Seeking Alpha, a site I must admit I had never heard of before. Apparently a high traffic investment site (connected to Jim Cramer?), Seeking Alpha drove a record traffic load to my humble blog through a rebroadcast of Greg's post. Rereading that, I noticed that there is a very strong business message there that may in fact be the actual historical significance of "itinerant computing": the flow of data and computing is simply an enabler of new business models and competitive advantages that change the face of global wealth. Being a resident of what is essentially a suburb of the Silicon Valley, I can't help but think there is more downside than upside to that story.

Finally, as I looked at the other referrers to this blog, I found an excellent summary of all of the "Follow" computing options: Follow the Sun, Follow the Moon and Follow the Law. Kevin Kelly gives very good basic definitions of each concept, and then makes the following observation:

"Most likely different industries adopt a different scenario. Maybe financial follows the moon, while commerce follows the sun, and entertainment follows the law. A single computing environment (One Machine) should not suggest homogeneity. A meadow is not homogeneous, but it does act as a coherent ecological system.

Another way to dissect the daily rhythm of the One Machine is to trace the three distinct waves of energy, data, and computation as they flow through the planetary "cloud." Each probably has its own pathways."

Amen, brother. I'll go even further. Maybe the customer server systems of a financial company follow the sun, the analytics systems follow the moon, and the trading systems follow the law. I do not mean to suggest at all that every distributed compute task will benefit from follow the law concepts. In fact, I would suggest that there are other "Follow" options that will be created over the coming decades.

All of this leads to the question of software fluidity...

Tuesday, June 10, 2008

Eucalyptus and You

Last Friday night I came across a post by Sam Dean of OStatic, titled "Eucalyptus: Unsung Open Source Infrastructure for Cloud Computing", and my jaw fell to the floor. Here it was, the project I wondered why no one was building; a project focused on replicating Amazon APIs in an open source cluster environment. The more I read Sam's post, the more I thought "Man, is this project in the right place at the right time."

I immediately Twittered the link, and was retweeted by no less than Don MacAskill and Dion Hinchcliffe in a matter of minutes. A few hours later, Simon posted his excitement, and then this morning I came across an analysis by Todd Hoff of highscalability.com that I think sums up what we know today quite nicely. Todd heard about this through the Cloud Computing group on Google Groups, and that thread was kicked off by Khazret Sapenov, himself a very prolific cloud thinker.

This is big stuff, despite the skepticism of some cloud fanatics who can't grok why "private clouds" (I am beginning to like that term) are legitimate. I most certainly don't fall into that particular camp, having real experience working with customers who realize that they have to start with an in-house cloud to satisfy corporate and legal mandates. Ideally, though, this infrastructure would allow them to migrate all or portions of their applications out of house when the time and technology are right. If Eucalyptus can pull this off and really provide a killer Amazon clone for private deployments, they may become the core technology for an awful lot of enterprise SLAuto platforms in years to come.

Of course, they are a hell of a long way from achieving that. Todd's post gives a fairly good overview of what Eucalyptus is, but there is still much to do from the technical, functional and marketing standpoints. For example:

  • As the Eucalyptus team notes themselves, it's still missing key command line tools.
  • It doesn't appear to be an infrastructure optimization approach, but rather a straightforward clustering approach. Thus, all of your capacity likely must remain running continuously when using the out-of-the-box functionality. I'd like to see them tackle SLAuto when they have the Amazon tools completed.
  • It is thoroughly dependent on the Rocks cluster project. Knowing my enterprise IT friends, this won't "go down easy" for any of them.

Interestingly enough, while I was writing this, the Eucalyptus home page was temporarily unavailable. I hope this means that it is overwhelmed with interest. I'd really like to see this community grow substantially, and for the project to evolve very quickly from where it is now.

Simon's observations about portability are really at the heart of my excitement. Realistically, the Eucalyptus team has simply started a journey of 1000 miles with this single step. Congratulations, guys, on setting the pace.

Friday, May 30, 2008

It just keeps getting cloudier and cloudier

Looking for inspiration, I checked out my latest Google Alerts for "cloud computing" and found an interesting--perhaps even disturbing--trend: people are locking in their definitions of cloud computing. The problem is these definitions are largely inconsistent.

First, allow me to make a confession. In my own storied attempt to define cloud computing, I certainly sounded definitive in my definition. For example, I stated:

Cloud computing describes a systems architecture. Period. This particular architecture assumes nothing about the physical location, internal composition or ownership of its component parts. It represents the entire computing stack from software to hardware, though system boundaries (e.g. where does one system stop and another begin) may be difficult to define. Components are simply integrated or consumed as need requires and economics allow.

For what it's worth, I have found myself shifting a little; not so much on the definition, but on what exactly it defines. Given the largely consensus opinion that Cloud Computing refers to a service model, I am willing to concede that the description above really describes a "Cloud Oriented Architecture" for a complex integrated environment. The true definition of cloud computing is still evolving in my mind.

Now, back to the posts at hand. What I believe I am seeing these days is a split between two camps: the "cloud computing is only about services" camp, and the "cloud computing is getting whatever you need from the Internet" camp.

An example of the former comes from Randy Bias at NeoTactics:
"There seems to be a group myopia around so-called ‘cloud computing’ and its definitions. What we’re really talking about are ‘cloud services’ of which, ‘computing’, is only a subset. It gets worse when you have people talking about Software as a Service (SaaS) as a ‘cloud’ service. Things continue to become murkier when the SaaS crowd, bloggers, and reporters start making up new definitions for cloud services using SaaS-like terms such as Platform as a Service (PaaS) and Infrastructure as a Service (IaaS)."
Scott Wilson of The CIO Weblog adds the following:
"When I think of a service as cloud computing, it is characterized by being an offering of nearly unlimited capacity (although it may be billed differently at different utilizations) which has some sort of generic utility but beyond certain minimal architectural requirements there should be no inherent specificity in what it may or should do. It may be a service of a certain type of utility, perhaps storage, raw processing capability, or data storage, but in the same way that a datacenter does not restrict what servers you may host with them, it should not restrict what sort of data you store, process, or serve."

[Some definition links removed]
Sort of a "cloud services have a cloudy definition" kind of definition.

One of the best examples of the latter comes from ProductionScale's Joseph Kent Langley:
"Cloud Computing (Figure 1.0) is a commercial extension of computing resources like computation cycles and storage offered as a metered service similar to a physical public utility like electricity, water, natural gas, or telephone network. It enables a computing system to acquire or release computing resources on demand in a manner such that the loss of any one component of the system will not cause total system failure. Cloud computing also allows the deployment of software applications into an environment running the necessary technology stack for the purposes of development, staging, or production of a software application. It does all this in a way that minimizes the necessary interaction with the underlying layers of the technology stack. In this way cloud computing obfuscates much of the complexity that underlies Software as a Service (SaaS) or batch computing software applications. To explain better though, let's simplify that and break it down this definition to its constituent parts."


Langley's definition is more closely aligned with utility computing, but may be best summarized as an "if you can run it on the Internet, it's a cloud" definition.

Of course, there is also James Governor's famous list of requirements.

All of which leads to a gap in terminology that gets filled by whatever reaches the vacuum at the moment: what do you call a "cloud-like" infrastructure in a private data center? As I noted to the Google Groups Cloud Computing alias:
"[H]ere (is) how I arrived at that conclusion:

  • If "grid computing" is about running job-based tasks in a MPP model (e.g. HPC) (as it seems to be defined for many), and
  • If "utility computing" is a business model for providing computing on an as-needed, bill-for-what-you-use basis, and
  • If "cloud computing" is a market model describing services provided over the Internet (which it is for most of the Web 2.0 world), and
  • If "virtualization" describes providing software layers in the execution stack to decouple software from the hard resources it depends on (and it is important to note for the purposes of this argument that "resource-pooled" does NOT require virtualization in this sense; it is quite possible to run your software on bare metal server pools, as we did at Cassatt)
  • Then, what do we call the systems/infrastructure model where resources are pooled together, and used for a variety of workloads, including both job-based and "always running" tasks (such as web applications, management and monitoring applications, security applications, etc.)?
Do we redefine "grid" to cover the expanded role of resource-pooled computing (as 3TERA seems wont to do)? Do we leverage "utility computing" as an adjective for platforms that can deliver that business model for those that own infrastructure (as Cassatt and IBM tend to do)? Does the term "virtualization" represent a broader view than how VMWare, Microsoft and Citrix are defining it? Is there another term (such as "resource-pooled computing"--ugh) that would better serve the discussion?"
I'm still hunting for the answer to that one.

However, in terms of my definition of cloud computing, I have to say I lean towards the "anything you can run on the Internet" camp, as it--to me--best represents what an actual drawing of a cloud means in a system diagram. Just "go to the cloud" and get what you need, whether it's a complete CRM system or a simple purchasing service. This eliminates a million potential grey areas at the boundaries of the "only about services" definition. Is PayPal a cloud service? Why or why not?

I'd love to hear from those of you that are beginning to see some consensus in online communities about what constitutes a cloud or cloud service and what doesn't. In the meantime, I am settling down for another long summer of fog (this is the Bay Area, after all), though I'll have plenty of company, I'm sure.

Thursday, May 22, 2008

Cassatt Announces Active Response 5.1 with Demand Based Policies

Ken Oestreich blogged recently about the very cool, probably landmark release of Cassatt that just became available, Cassatt Active Response 5.1. He very eloquently runs down the biggest feature--demand based policies--so I won't repeat all of that here. What I thought I would do instead is relate my personal thoughts on monitoring based policies and how they are the key disruptive technology for data centers today.

To be sure, everyone is talking about server virtualization in the data center market today, and that's fine. Its core short-term benefit, physical system consolidation and increased utilization, is key for cost-constrained IT departments, and features such as live motion and automatic backup are creating new opportunities that should be carefully considered. However, virtualization alone is limited in its applications, and does little to actually optimize a data center over time. (This is why VMware is emphasizing management over just virtualizing servers these days.)

The technology that will make the long term difference is resource optimization: applying automation technologies to tuning how and when physical and virtual infrastructure is used to solve specific business needs. It is the automation software that will really change the "deploy and babysit" culture of most data centers and labs today. The new description will be more like "deploy and ignore".

To really optimize resource usage in real time, the automation software must use a combination of monitoring (aka "measure"), a policy engine or other logic system (aka "analyze") and interfaces to the control systems of the equipment and software it is managing (aka "respond"). It turns out that the "respond" part of the equation is actually pretty straightforward--lots of work, but straightforward. Just write "driver"-like components that know how to talk to various data center equipment (e.g. Windows, DRAC, Cisco NX-OS, NetApp Data ONTAP, etc.), and that handle error conditions by responding directly or by forwarding the information to the policy engine.

The other two, however, require more immediate configuration by the end user. Measure and analyze, in fact, are where the entire set of Service Level Automation (SLAuto) parameters are defined and executed on. So, this is where the key user interface between the SLAuto system and end user has to happen.
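
To make the measure/analyze/respond split concrete, here is a minimal Python sketch of one pass through such a loop. It is an illustration only, not Cassatt's design: the `Policy` and `RecordingDriver` classes, the `control_pass` function, the `active_sessions` metric and the host names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Metrics = Dict[str, float]


@dataclass
class Policy:
    """One rule for the "analyze" step: a condition plus the action it requests."""
    name: str
    condition: Callable[[Metrics], bool]
    action: str


@dataclass
class RecordingDriver:
    """Stand-in for a "driver"-like component; it only records what it was asked to do."""
    calls: List[Tuple[str, str]] = field(default_factory=list)

    def execute(self, host: str, action: str) -> None:
        self.calls.append((host, action))


def control_pass(hosts: Iterable[str],
                 measure: Callable[[str], Metrics],
                 policies: List[Policy],
                 driver: RecordingDriver) -> List[Tuple[str, str]]:
    """Run one measure -> analyze -> respond pass and return (host, policy) pairs that fired."""
    triggered = []
    for host in hosts:
        metrics = measure(host)                       # measure
        for policy in policies:
            if policy.condition(metrics):             # analyze
                driver.execute(host, policy.action)   # respond
                triggered.append((host, policy.name))
    return triggered


# Hypothetical sample data: one busy host and one with nobody using it.
SAMPLE_METRICS = {"web-1": {"active_sessions": 12.0}, "test-7": {"active_sessions": 0.0}}
no_sessions = Policy("no-sessions", lambda m: m["active_sessions"] == 0, "power_off")
driver = RecordingDriver()
result = control_pass(SAMPLE_METRICS, SAMPLE_METRICS.__getitem__, [no_sessions], driver)
```

The driver is the only piece that would touch real equipment, which is why the sketch keeps it behind a single `execute` call; the policy list is the part an end user would configure.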

What Cassatt has announced is a new user interface to define demand based policies as the end user sees fit. For example, what defines an idle server? Some systems use very little CPU while they wait for something to happen (at which point they get much busier), so simply measuring CPU isn't good enough in those cases. Ditto for memory in systems that are compute intensive but handle very little state.

What Cassatt did that is so brilliant (and so unique) is to allow the end user to leverage the full range of SNMP attributes for their OS, as well as JMX and even scripts running on the monitored system to create expressions that define an idle metric that is right for that system. For example, on a test system you may in fact say that a system is idle when the master test controller software indicates that no test is being run on that box. On another system, you may say it's idle when no user accounts are currently active. It's up to you to define when to attempt to shut down a box, or reduce capacity for a scale-out application.

Even when such an "idle" system is identified, Cassatt gives you the ability to go further and write some "spot checks" to make sure the system is actually OK to shut down. For example, in the aforementioned test system, Cassatt may determine that it's worth trying to power down a system, but a spot check could then determine whether a given process is still running or an administrator account is currently logged in to the box, either of which would tell Cassatt to ignore that system for now.
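
The two-stage decision described above (a user-defined idle expression, then spot checks that can veto the shutdown) can be sketched in a few lines of Python. This is not Cassatt's policy syntax, which I am not reproducing here; the function names and the metric names (`tests_running`, `admin_sessions`, `build_processes`) are made up for illustration.

```python
from typing import Callable, Dict, Iterable

Metrics = Dict[str, float]
Check = Callable[[Metrics], bool]


def should_power_down(metrics: Metrics, is_idle: Check, spot_checks: Iterable[Check]) -> bool:
    """Return True only if the idle expression holds and every spot check passes."""
    if not is_idle(metrics):
        return False
    return all(check(metrics) for check in spot_checks)


# Hypothetical idle expression for a test box: the test controller reports no test running.
def no_test_running(m: Metrics) -> bool:
    return m["tests_running"] == 0


# Hypothetical spot checks: each returns True when it is safe to proceed.
def no_admin_logged_in(m: Metrics) -> bool:
    return m["admin_sessions"] == 0


def build_process_stopped(m: Metrics) -> bool:
    return m["build_processes"] == 0


SPOT_CHECKS = [no_admin_logged_in, build_process_stopped]

quiet_box = {"tests_running": 0, "admin_sessions": 0, "build_processes": 0}
admin_on_box = {"tests_running": 0, "admin_sessions": 1, "build_processes": 0}
busy_box = {"tests_running": 2, "admin_sessions": 0, "build_processes": 0}
```

With these samples, only `quiet_box` would be powered down; `admin_on_box` looks idle but is vetoed by a spot check, and `busy_box` never reaches the spot checks at all.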

I know of no one else that has this level of GUI configurable monitor/analyze/respond sophistication today. If anyone wants to challenge that, feel free. Now that I no longer work at Cassatt, I'd be happy to learn about (and write about) alternatives in the marketplace. Just remember that it has to be easy to configure and execute these policies, and scripting the policies themselves is not good enough.

It is clear from the rush to release resource optimization products for the cloud, such as RightScale, Scalr, and others, that this will be a key feature for distributed systems moving forward. In my opinion, Cassatt has launched itself into the lead spot for on-premises enterprise utility computing. I can't wait to see who responds with the next great advancement.

Disclaimer: I am a Cassatt shareholder (or soon will be).

Tuesday, May 20, 2008

A Funny Thing Happened On The Way to the Apple Store...

Part of the fun of joining my new employer is their open policy for selecting the laptop of your choice. Of course, being a lover of technologies that enable one to be technically lazy, I chose a MacBook Pro. It should arrive in a few days.

However, I was beginning to feel like I needed another beefed up system of my own at home to act as a multi-guest virtual "server farm" for various experiments, etc., that may include scale-out benchmarking, interesting integration issues, etc. My initial thought was an 8-core Mac Pro loaded with memory and disk, which would have set me back about $6500. So I asked Luis what he thought, and he said, "Don't bother. Whenever I need a bunch of servers to test with, I generally find [Amazon] EC2 works perfectly fine."

You could have heard the head slap a mile away.

With all of my focus being on enterprise computing the last two years, I had totally lost sight of the "individual" applications of a cloud like EC2. I no longer have to think about building up a server farm of my own, or purchase a big honkin' dual Quad-core tower, or even reserve space on the corporate "cluster library". I just need my credit card, my Amazon account, and a little time with the "Getting Started" tutorial, and I have all the server resources I need at a price that is a fraction of buying the big box, with billing that allows me to easily expense work-related computing. Damn, I love the modern world!
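
For the curious, the "fraction of the price" claim is easy to sanity check with a break-even calculation. Only the $6500 figure comes from this post; the hourly rate and instance count below are placeholder assumptions, so plug in current pricing before drawing any conclusions.

```python
def break_even_hours(purchase_price: float, hourly_rate: float, instances: int) -> float:
    """Hours of rented capacity that cost the same as buying the box outright."""
    if hourly_rate <= 0 or instances <= 0:
        raise ValueError("hourly_rate and instances must be positive")
    return purchase_price / (hourly_rate * instances)


PURCHASE_PRICE = 6500.0      # the Mac Pro estimate from this post
ASSUMED_HOURLY_RATE = 0.10   # hypothetical per-instance rate in dollars, not a quoted price
ASSUMED_INSTANCES = 8        # hypothetical: one rented instance per core of the big box

hours = break_even_hours(PURCHASE_PRICE, ASSUMED_HOURLY_RATE, ASSUMED_INSTANCES)
days = hours / 24
print(f"Break-even after about {hours:.0f} instance-cluster hours ({days:.0f} days of continuous use)")
```

The sketch ignores bandwidth, storage, power and resale value, so treat the result as a rough order of magnitude for experiments that run a few hours at a time.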

Now, all of this probably seems so obvious to all of you out there, and it probably cracks you up to see a cloud computing blogger miss this opportunity to "reach for the clouds", so to speak. However, I think this is indicative of the change that both individuals and enterprises must go through to take advantage of this new breed of technologies.

I, like many Fortune 500 IT departments, am an old school client-server/SOA guy. I have a "use the right tool for the job" mentality, driven by years of pain trying to force procedural pegs into SOA holes. This mentality leads to a "best of breed" bias that leads one to worry about the ground up implementation of any software solution. If a tool was found that reliably hid some of that implementation, that was awesome and incredibly helpful to productivity. However, one needed to still understand how the server worked with the OS worked with the middleware worked with the application implementation to be comfortable to go to production.

To me, Amazon, Mosso, Cassatt and others are indicative of a major change in this mentality. With reliable shared configurations of systems (or a reliable systematic infrastructure for matching compute tasks to disparate resources that can handle those tasks), application developers now need to know less and less about the server, networking and storage part of the equation. Now, with the focus from the OS on up the stack, developers can start shopping for the infrastructure that makes economic sense for the problem they are trying to solve. The trick, of course, is to remember there are alternatives to buying your own servers.

So, this week I started to play with Amazon EC2, S3 and Cloud Services' new instance management tool, Cloud Studio. Let me just say, I am incredibly impressed with what I've done so far, which is little more than creating, starting and terminating instances (with a little between-machine networking thrown in for fun). Even using Amazon's command line tools, it is a pretty straightforward process to get either a 32-bit or 64-bit server, but when you add the visual cues of Cloud Studio, it just becomes so simple it boggles the mind.

Now, there are definitely disadvantages to using Amazon for some problems. Windows support is out, for instance. (Anyone have a good suggestion for a true on-demand pricing option for Windows? Mosso would work, I hear, but they have a fixed upfront price that is a little steep for my general needs.) Also, any work that involves large amounts of data transfer ups the ante greatly. (Kevin Burton talked about this some time ago--see his note about bandwidth pricing just below the last quote, about halfway down.) However, I will never again forget to consider the cloud before "own your own" for any computing task I have in my personal world.

Hmmm. I wonder if I can get my wife to use Zoho now...

Saturday, May 17, 2008

Blog Title Change: Leveraging the Wisdom of Clouds

As I discussed in my last post, the change of jobs gives me the opportunity to broaden the coverage of this blog somewhat beyond the basic topic of delivering SLAuto to enterprise data centers. To more completely reflect this, and (quite frankly) to increase visibility to those searching for information about cloud computing and utility computing, I have changed the title and description of this blog.

Now titled "The Wisdom of Clouds" (with absolute apologies to James Surowiecki and his great book, The Wisdom of Crowds) this blog will discuss cloud computing, utility computing, SaaS, PaaS and HaaS as they relate to both the enterprise and individual users. This really isn't much of a departure from the topics covered in the last year or so--in fact, I considered sub-titling the blog "Covering your *aaSes since 2006"--but the explicit description allows more people to more readily discover my ramblings.

For those who have been following this blog for some time, as well as those who have just discovered it, I thank you. I hope you will join me in creating and shaping "the wisdom of clouds".

Monday, May 12, 2008

Project Caroline: "Sweet" project, or Sun's savior?

A few days ago there was significant coverage of Project Caroline, Sun's new open source cloud computing platform and service offering. While seemingly taking a page directly out of Google's playbook, Caroline is actually interesting for a few key differences (adapted from Rich Zippel's blog):

  • It is an open source research project, not an actual product offering at this time. This means Sun's services are offered for free. Of course, there is one catch with regards to the Sun offering: you must have a Grid account, and you will be charged for resources used on that grid.

  • The source code for the entire stack is freely available today. Not just the programming APIs, as in Google's case, but the entire stack. If you are comfortable using Glassfish, Postgres, and "limiting" languages to Java, Perl, Python, Ruby, and PHP, you can start your own Caroline-compatible cloud computing company today. Just remember, it's a research project, so all of this is subject to change.

  • In some ways, this is what you would expect from Sun: an engineering research project touted as the future of computing. No charge for the software, etc., but note that Sun can actually monetize this through the Grid-hosted offering.

I still hold some Sun stock, so I'm actually a little excited about the possibility that there may be an actual new revenue stream here. Could you imagine, Sun actually branching out from pure hardware? The timing is good too, as they may have a better prescription than their more successful competitors, at a time when sales to corporate data centers may be hitting their peak. If handled well (which is a big "if" with Sun), this could guarantee a growing revenue stream for decades to come, even if corporate IT nearly stops buying servers.

I haven't played with Caroline yet, but I think Sun is at least marketing the platform I hoped that Google, or Microsoft, or Adobe, or someone out there would have built. Yeah, it's Sun, so it's probably a computer science dissertation project to configure and manage the thing, but who else is doing five languages on industry standard infrastructure with RDBMS support?

I'm hoping to get around to evaluating this in some detail in the next couple of weeks, so stay tuned.

Tuesday, May 06, 2008

One advantage of utility computing infrastructure: Heisenberg's Uncertainty Principle applied to computing

I was casually browsing my Google Reader pages (which can also be followed on FriendFeed) when I came across a gem from Data Center Knowledge: apparently Peter Gabriel's web site servers were stolen from their hosting provider. All content and hardware gone, and fans left with nothing but an apology page.

Now, I'm a huge Gabriel fan, so this was interesting in part because I feel for the guy and hope nothing of great value was stored on those servers. However, my interest was piqued by the realization that this highlights one of the key values of decoupling software from hardware. To illustrate this advantage, I'd like to paraphrase Heisenberg's famous Uncertainty Principle:

In shared resource computing, you can locate the server, but you cannot firmly define what is running on the server (over time); conversely, you can define the software image, but it is difficult to firmly locate which server it is running on (over time).

Thus, if someone comes into a data center that is sharing server resources in a utility-computing-like model and steals a server, they will very likely get no data whatsoever. Conversely, if they want the data, they have to steal all of the storage associated with the server image, which in many environments is spread amongst several physical drives; is dependent on the network infrastructure in which it is running; and is useless without both a compatible server to execute it, and a compatible management system to deliver it to that server.

To me, this greatly enhances system security over dedicated server models. If Gabriel's stuff had been PXE booted on random servers around the hosting center, from distributed storage systems, he may have foiled his thief's plans. He certainly would have made it much more technically difficult for them.
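
A toy model in Python makes the argument easier to see. It is a sketch under simplifying assumptions of my own: each boot picks a server at random, and image blocks are striped round-robin across drives. Real schedulers and storage systems are more involved, and all names here are hypothetical.

```python
import random
from typing import Dict, Iterable, List


def simulate_placements(servers: List[str], boots: int, seed: int = 0) -> List[str]:
    """Pick the server an image lands on for each boot (assumed uniformly random)."""
    rng = random.Random(seed)
    return [rng.choice(servers) for _ in range(boots)]


def stripe_blocks(block_count: int, drives: List[str]) -> Dict[int, str]:
    """Map each image block to a drive using simple round-robin striping."""
    return {block: drives[block % len(drives)] for block in range(block_count)}


def recovered_fraction(layout: Dict[int, str], stolen_drives: Iterable[str]) -> float:
    """Fraction of an image's blocks that sit on the stolen drives."""
    stolen = set(stolen_drives)
    hits = sum(1 for drive in layout.values() if drive in stolen)
    return hits / len(layout)


SERVERS = ["s1", "s2", "s3", "s4", "s5"]
placements = simulate_placements(SERVERS, boots=20, seed=1)

layout = stripe_blocks(8, ["d0", "d1", "d2", "d3"])
one_drive = recovered_fraction(layout, ["d0"])
all_drives = recovered_fraction(layout, ["d0", "d1", "d2", "d3"])
```

In this model, the image has no fixed home among the servers, and a thief holding one drive of four gets only a quarter of the blocks; getting everything requires taking every drive in the stripe set.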

The more I learn about decoupling software from hardware, whether through server virtualization or policy-based dynamic deployment, the more I think it's a no-brainer for most computing applications. Plus, it makes SLAuto possible--which has its own benefits, of course.

Saturday, May 03, 2008

Thinking about SLAuto in a frenzied cloud

I've been quite silent for a week or two, mostly because of my responsibilities as a sales engineer; doing my part in closing key deals for my employer. I've spent this time sitting in meetings, installing and configuring software, and measuring power savings in large dev/test lab installations. (By large I mean hundreds approaching thousands of servers.) All in all, it's been a successful couple of weeks, but it's kept me from keeping too close an eye on the big news coming out of the cloud and utility computing markets.

However, as I thought about this more, I realized that I have drifted significantly from my core subject, Service Level Automation (or SLAuto), in the last six months or so--mostly due to the incredible burst of cloud computing innovation to be announced and/or delivered in that time frame. I still believe that there are two key components to an open cloud market that scales:

  • Portable platforms that allow customers to change vendors on a whim
  • Automation that takes action to acquire, release or replace services based on pre-determined service targets

The latter, simply said, is SLAuto.
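
As a rough illustration of "acquire, release or replace based on pre-determined service targets", here is a small Python sketch. The `ServiceTarget` fields, the latency thresholds and the `decide` function are hypothetical choices of mine, not any vendor's API; a real system would weigh many more signals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Action = Tuple[str, Optional[str]]


@dataclass
class ServiceTarget:
    """A pre-determined service target (all fields are illustrative assumptions)."""
    max_latency_ms: float      # above this, add capacity
    release_below_ms: float    # below this, capacity may be given back
    min_instances: int         # never shrink below this floor


def decide(target: ServiceTarget, latency_ms: float,
           instance_count: int, failed_instances: List[str]) -> List[Action]:
    """Return the actions needed to steer the service back toward its target."""
    # Replace anything that has failed, regardless of load.
    actions: List[Action] = [("replace", name) for name in failed_instances]
    if latency_ms > target.max_latency_ms:
        actions.append(("acquire", None))
    elif latency_ms < target.release_below_ms and instance_count > target.min_instances:
        actions.append(("release", None))
    return actions


target = ServiceTarget(max_latency_ms=500.0, release_below_ms=100.0, min_instances=2)
overloaded = decide(target, latency_ms=750.0, instance_count=3, failed_instances=["i-3"])
underused = decide(target, latency_ms=40.0, instance_count=3, failed_instances=[])
at_floor = decide(target, latency_ms=40.0, instance_count=2, failed_instances=[])
```

The gap between the two latency thresholds is deliberate: without it, a service hovering near a single threshold would acquire and release capacity on alternating passes.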

Of course, what is happening is sort of the nascent birth of cloud computing technologies, where the DNA hasn't had a chance to recombine to build long term survivability into any given "species" yet. We all knew that AWS was doing cool things, but who knew that they would cross the chasm in terms of customer demand as completely as they did? Yet, there is no portability story for Amazon (at least not off of Amazon); and the market forming for SLAuto (see RightScale and others) is tightly tied to the Amazon platform.

The rest of the "big" announcements are worse: Microsoft has no concept of management in Live Mesh (other than synchronization) that I can see, and Google and Yahoo are both building platforms with developers in mind, where service levels are a business agreement, not a platform differentiator. I understand we are taking baby steps here, but I wonder how long it is before corporate IT realizes that they are both a) locked in (at least in an economic sense), and b) paying too much to operate software that doesn't even run in their data center.

Now, I say all of this, but truth be told, most corporate IT shops don't do SLAuto today. So, why should this change in the cloud? I hinted at it earlier: scale. Not scale of functional execution or data access, as we usually think of the term, but scale of market--the speed at which companies will need to respond to the ever evolving marketplace for cloud services and platforms. As the self-professed "open" nature of Google's and Yahoo's platforms becomes more of a reality, combined with true innovation in "industry" standard APIs (for capacity management, code platforms and feature integration), there is little doubt that pressure will be on the IT shop to optimize the cost of delivering business services to the rest of the company. Again, I argue that this cannot be done without SLAuto. Prove me wrong.

I am really concerned that SLAuto is still considered "bleeding edge" in most IT shops. It's not rocket science, and the future of IT cost management almost certainly has to be built around it. On the other hand, perhaps as some of these customers I worked with the last couple of weeks serve as references to the value of SLAuto--at least in terms of energy costs--more of them will understand its urgency.

Wednesday, April 23, 2008

Moshing on the Mesh

Ray Ozzie is a rock star, but his band's latest album probably seems a little inaccessible at first. At least, that's the way I read the initial response to Microsoft's announcements at Web2.0 this week. Ray and the Mister Softy Band have released to "airplay"--at least a little--the mysterious cloud strategy that many of us have been anxiously awaiting for some time now. While arriving at the show fashionably late, the Mister Softy Band is laying a groove that will address the consumer market in ways that strongly challenge Google, be interesting to business, and demonstrate even more clearly how AWS is more of a hosting platform than anything.

Details of the announcement are everywhere, but here are the highlights for me:

  • The core of the concept is a virtual desktop hosted in Microsoft's data centers, to which you connect any compatible device (PC, mobile, etc., but Windows only for now).
  • Within that desktop, folders can be created which allow you to store whatever you want to share (documents, photos, videos, music, etc.) among your devices.
  • Folders can even be shared with other friends or family members using a social network built into the mesh.
  • The mesh uses a two way RSS/ATOM mechanism (FeedSync) to sync not only files, but also applications between devices.
That last item is key, because while this may look at the start like nothing more than a grandiose social network with storage, it's actually much more than that. Ray's vision is to provide a platform for developers that can leverage the syncing capability, along with some other framework components, to build applications that truly live within and through the mesh.

This is ambitious as hell, and I have to give "the band" credit for their vision. While tried and true MS "lock-'em-in...lock-'em-all-in" hardcore, it is a completely different sound than what Google, Amazon and even Intuit have released. It's a place to live in the cloud, rather than simply a stopping point. And, while the open source community is rightfully skeptical, there are hundreds of thousands of Microsoft loyal developers out there who will make this thing work for them. That, in turn, creates a market that the rest of the cloud would do well to keep an eye on.

So, now I see the following experiments in the nascent cloud market:
  • Amazon: Pure Capacity-On-Demand with scalable components available à la carte
  • Mosso: Pure Capacity-On-Demand in a hosted model with flat rate for normal usage
  • Google: Platform-as-a-Service targeted at Internet facing web applications and optimizing developer experience for highly scalable web application development and deployment
  • Intuit: Platform-as-a-Service targeted at Internet facing financial applications using their QuickBooks platform
  • Microsoft: Virtual Desktop and Platform-as-a-Service targeted at providing a complete online compute environment from an end user point of view
Update: Bob Warfield at SmoothSpan has a post that is making me rethink some of my enthusiasm for the mesh.

Wednesday, April 09, 2008

What Google App Engine is NOT

Simon Wardley wrote a post discussing the Google App Engine announcement as a "first step" for them in "the web as an operating system space". Simon is right, but as I commented on the post:

As I just noted on my blog, perhaps it is critical to look at this from the perspective of web businesses, rather than from enterprise IT's perspective. From the former angle, this is disruptive and revolutionary; from the latter, it's a no-op at this point, except perhaps for externally facing web apps.
Simon then wrote an interesting post in response, describing the opportunity that Google has created by open sourcing the App Engine SDK. His core premises can be summed up in the following quote:
Now, whilst Google hasn't provided their environment as open sourced, it has provided an open sourced SDK that "emulates all of the App Engine services on your local computer". This appears, though I'm not a python expert, to contain all the primitives and information needed to build a compatible environment to GoogleAppEngine. This allows for companies, vendors and ISPs to create competing but compatible systems. It's almost as if Google has offered a blueprint for a web operating environment and asked the rest of the community to come compete with them.
And here I have to say, "Well, true, as far as web application hosting goes. But we all know the enterprise is WAY more than that." I think if a commercial product came out that allowed anyone to build a high-scale web environment, with data storage, development tools and operations interfaces within their own infrastructure, that would be very cool. But, as someone who really understands the utility computing space, I want everyone to be clear that this wouldn't help scalability or optimizing resource usage in the following key IT areas:
  1. Portal Services - Yes, an archaic concept to some, but still a critical strategy for delivering work functionality and key information to most knowledge workers. Note that Google does not provide portal support, nor support ANY standard portal interfaces, though you may be able to hack that in Python.

  2. SOA architectures - While it is theoretically possible to build a REST service in App Engine, there is no mechanism to host any other form of services. Yes, you could theoretically leverage services external to the Python app, but this would probably require services and GUI to be located in the same network, to avoid latency issues. Not to mention the fact that there is nothing resembling a messaging infrastructure, or Enterprise Service Bus.

  3. Business Process Automation - This is one of the key tactics for gaining business agility, in my opinion, and while I wouldn't doubt someone will write an app to do BPA/I in App Engine, it will be expensive from a resource usage perspective (lots of in/out traffic, storage for quiesced processes and so on).

  4. EAI - Enterprise integration is still the most customized element of IT today, and, as noted in the last two points, there is nothing provided by Google at this point to help with data or application level integration; no data transformation (à la Informatica), no messaging engine, no business process automation, etc.

  5. HPC - Yes, Google is amazingly scalable, but they went out of their way to insist that App Engine is not a grid. It is not designed to--nor do you have the quota to allow you to--send arbitrary compute intensive jobs to the engine for processing.

  6. Server and desktop virtualization - No one does desktop in the cloud today, as far as I know, but Google doesn't even provide virtual servers--useful for hosting and maintenance of legacy applications, if nothing else. I suppose you could run out and convert your productivity apps to Google Apps, your email to GMail, etc., but what about print services?
Not to mention the fact that Google provides no service level guarantees (though I think they will probably do something here when they go GA), no premium support, no integration services, no live customer support (that I know of); in other words, there is a distinct lack of a "throat to choke" here.

Thus, I think most enterprises need to look at Amazon and Google services as just that--services that can be leveraged within their own architectures when it makes sense, rather than wonder-tools that can replace their entire IT infrastructure expenditure. Again, there is probably more bang for the buck today in converting that existing infrastructure into a utility, unless your data center hosts only web-facing applications...but then there is the expense of rewriting them entirely in Python, which may cancel out a tremendous amount of the cost benefits of using App Engine.

So, Simon, I share your excitement about the future of scalable web applications, but my point remains--this is largely a no-op for most enterprise IT organizations.

Friday, April 04, 2008

John Willis Honors Me with Inaugural Cloud Cafe Podcast

I am the inaugural guest in John Willis's Cloud Cafe podcast series. I couldn't be more honored.

Those of you who have been following this whole "what is Cloud Computing" debate may have had the opportunity to see the conversations between several bloggers regarding how to define cloud computing and related technologies. John Willis, of the John Willis ESM Blog, is making a key contribution by taking on the challenge of classifying vendors in this space. As I had some issues with his classification of Cassatt, he thought the best way to resolve that was to invite me to launch his new series.

Two things were resolved in this podcast.

First, I learned first hand what a classy guy John is. He handled the interview very well, let me talk my butt off (a talent I got from my minister mother, I think) and had several observations over the course of the conversation that showed his tremendous experience in the enterprise systems management space. I feel quite sheepish that I ever hinted that he wasn't being forthright with his audience. Lesson gratefully learned; apology gladly offered.

Second, John and I were always much closer in our visions of cloud computing, utility computing and enterprise systems than it might have appeared at first. Our conversation ranged from the aforementioned "what is cloud computing" question, to topics such as:

  • the relationship between cloud and utility computing,
  • the cultural challenge facing enterprises seeking the economic returns of these technologies,
  • how cloud and utility computing revolutionize performance and capacity planning, and
  • where Hadoop and CloudDB fit into all of this.
In the end, I think John and I agreed that cloud computing is more than just virtualization on the Internet. I very much enjoyed the conversation, and I hope you will take the time to listen to this podcast.

Got questions or comments? Post them here or on John's blog; I will check both.

Finally, I will be working to get Cassatt's entry in John's classifications updated as a result of the discussion.

Friday, March 28, 2008

MapReduce reaches adolescence

I have to admit I find myself growing more impressed with the MapReduce (and related algorithm) community every day. I spent the better part of an hour watching Stu Hood of Rackspace/Mailtrust discussing MapReduce, Mailtrust's use of it for daily log processing, and comparing it to SQL. I'm a MapReduce newbie, so I was happy to find Stu's overview clear, careful and at a level I could grasp.
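For fellow newbies, here is a minimal sketch of the map/shuffle/reduce pattern applied to log processing. To be clear about the assumptions: this is plain single-process Python, not Hadoop's API, and the log format and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical ones I made up for illustration, not anything Mailtrust described.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map step: emit a (status, 1) pair for each well-formed log line.

    Assumes a hypothetical format: "<time> <status> <recipient>".
    """
    fields = line.split()
    if len(fields) >= 2:
        yield fields[1], 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# Illustrative input only; not real log data.
LOG = [
    "10:00:01 DELIVERED a@example.com",
    "10:00:02 BOUNCED b@example.com",
    "10:00:03 DELIVERED c@example.com",
]
counts = run_job(LOG)
```

In a real framework the map and reduce calls would be spread across many nodes and the shuffle would move data over the network; the shape of the two user-supplied functions is the part that carries over.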

His overview of Hadoop (an open source implementation of a MapReduce framework) was equally enlightening, and I learned that Hadoop is more than the framework; it includes a distributed file system as well. This is where I think SLAuto starts to become important, as it will be critical not only to monitor which systems in a Hadoop cluster are alive at any time (thus providing access to their storage), but also to correct failures by remounting disks on additional nodes, provisioning new nodes to meet increased data loads, etc. Granted, I know just enough to be dangerous here, but I would bet that I could sell the value of SLAuto in a MapReduce environment.

Another interesting overview of the MapReduce space comes from Greg Linden. (Damn, now I've mentioned Greg twice in a row...my groupie tendencies are really showing these days! -) Greg points us to notes taken at the Hadoop Summit by James Hamilton, an architect on the Windows Live Platform Services team. I haven't read through them all yet, but I like the breakdown of many of the big projects getting a lot of coverage among techies these days: Yahoo's PIG and HBase, as well as Microsoft's DRYAD. Missing is CouchDB, but I plan to watch Jan Lehnardt's talks [1][2] on that as soon as I get a moment.

Again, the reason MapReduce is being covered in a blog about Service Level Automation and utility computing is that as soon as I see "tens of thousands of nodes", I also see "no way human beings can meet the SLAs without automation". At least not without significant costs compared to automating. System provisioning, monitoring, autonomic scaling and fail-resistance are not built in to Hadoop; they are simply easy to support. Something else is needed to provide SLAuto support at the infrastructure layers.

Tuesday, March 25, 2008

Greg Linden on the Cloud

Greg Linden, of Geeking with Greg fame, was interviewed on Mix about his work in search personalization, recommendation engines and cloud computing. Most of the interview is only sort of interesting, but what really perked my ears up was Greg's observation that anyone scaling a software environment to thousands or tens of thousands of servers will likely continue to run their own data centers, if only because they will want to tweak the hardware to meet their specific needs.

Initially, I thought of this as just another example of a class of data center that will not be quickly (if ever) moved to a third party capacity vendor. Based on examples like Kevin Burton's fine tuning of Spinn3r's infrastructure using Solid State Drives (SSD) instead of RAID and traditional disks, it even seems like there would be many such applications. Ta da! It is proven that there will always be private data centers!

Yet, the more I think about it, I wonder if I wouldn't pay Google's staff to run my Map/Reduce infrastructure, even if it used tens of thousands of servers. I mean, where is the economic boundary between when it is cheaper to purchase your computing from clouds that already have your needed expertise versus hiring staff with specialized skills to meet those same needs?

Alternatively, is this kind of thing a business opportunity for a "boutique" cloud vendor? "Come to Bob's MapReduce Heaven. We'll keep your Hadoop systems running for $99.95, or my name isn't Bob Smith!"

I'll just leave it at that. I'm tired tonight, and coherence has left the building.

Sunday, March 23, 2008

An amazing resource for scalable systems architectures

I don't know why I hadn't heard of these guys before, but I'm in love with the content at highscalability.com. In post after post, feature after feature, there is more to learn here about everything from architecting software to optimize Amazon Web Services costs, to possibly the greatest collection of articles on real-life scalable architectures ever assembled. I have a feeling I will lose a few hours of sleep in the next few nights trying to read everything I can here.

I noted the inevitability of architecting specifically for utility (or cloud) computing some months ago.

Saturday, March 22, 2008

Eric Schmidt: Please believe me...

ZDNet Asia covered comments from Eric Schmidt of Google regarding the trust issues that enterprises must address before adopting cloud computing. He made these comments during a recent visit to Sydney, Australia. I find the comments interesting, because it signals for me the first public acknowledgment of the challenges that Google faces in selling the enterprise on the cloud vs. in-house applications.

Of course, he couches it in terms of how to choose Google Apps over Microsoft Office, but the heart of the issue--trust--applies to just about any choice between traditional "I own it all" IT, and "renting" from the cloud--including compute capacity. (By the way, is anyone still claiming that Google Apps does not compete with Microsoft Office?)

As Eric notes for the Apps/Office debate:

"At some point in your firm, someone is going to say: 'Well maybe there is an alternative in the enterprise', and they're going to do an evaluation. And they're going to say the cloud computing model has its strengths and weaknesses."
This seems consistent for all cloud computing choices: in each case, the IT organization (or even the business) will need to evaluate the costs/benefits of moving data and functionality to the cloud versus maintaining traditional desktop/server systems. Up to now, I agree with Eric, but then he goes on to say:
"What assurances [do you have] that the information you have in your computer is safe--that it is properly stored and so forth? So it's important to understand that you really are making trade offs of one versus the other."
Assuming I am understanding this right, Eric seems to be saying, "Hey, your data isn't really all that secure on your PC, so why don't you just trust us that we will do better?" Ah, there is the rub.

I believe most enterprises would answer,
"Well, if data is misappropriated on my in-house systems, I can hunt down and fire those responsible, and the original copy of the data is still in my control. If Google (or someone who compromises Google) misappropriates my data in the cloud, I can go after the guilty parties, but if I no longer trust Google, I now have a legal battle on my hands to get my data back and get Google to completely delete it from their systems."
This partially gets to data portability, which some are trying to address, but it is not a solved problem yet. However, even with portability, it's the "completely delete it from their systems" part that I may never trust without clear and explicit legal consequences and vendor auditing. Until I have full control over where my data resides (at least in terms of vendors) and when and where I can move it and how it gets removed from storage that I no longer wish to utilize, I am putting a lot at risk by moving data outside of my firewalls.

At its heart, I think Eric's statement gets at the core of what Google has ahead of them in terms of delivering Apps to large, established enterprises. I don't doubt that Google will both develop and acquire technology that overcomes many of the security concerns that large enterprises have, but I continue to believe that we will see a major legal case in the next 5 years where a large corporation has to fight in court to get their data from a SaaS/cloud computing provider.

If it were me, I'd look to get cloud-like economics from my existing infrastructure. This is done by utilizing software architectures that are multitenant capable (SOA is a good place to start), and by implementing utility computing type infrastructure in your own data center. No matter how nicely Eric asks, be careful of what you are getting into if you put your sensitive data in the cloud.

Wednesday, February 27, 2008

Enterprise Architecture, Business Continuity and Integrating the Cloud

(Update: Throughout the original version of this post, I had misspelled Mr. Vambenepe's name. This is now corrected.)

William Vambenepe, a product architect at Oracle focusing on enterprise management of applications and middleware, pointed me to a blog by David Linthicum on IntelligentEnterprise that makes the case for why enterprise architects must plan for SaaS. In a very high level, but well reasoned post, Linthicum highlights why SaaS systems should be considered a part of enterprise architectures, not tangential to them.

As Vambenepe points out, perhaps the most interesting observation from Linthicum is the following:

Third, get in the mindset of SaaS-delivered systems being enterprise applications, knowing they have to be managed as such. In many instances, enterprise architects are in a state of denial when it comes to SaaS, despite the fact that these SaaS-delivered systems are becoming mission-critical. If you don't believe that, just see what happens if Salesforce.com has an outage.

I don't want to simply repeat Vambenepe's excellent analysis, and I absolutely agree with him. So let me just add something about SLAuto.

Take a look at Vambenepe's immediate response:
I very much agree with this view and the resulting requirements for us vendors of IT management tools.
Now add the comments from Microsoft's Gabriel Morgan that I discussed a couple of weeks ago.
Take for example Microsoft Word. Product Features such as Import/Export, Mail Merge, Rich Editing, HTML support, Charts and Graphs and Templates are the types of features that Customer 1.0 values most in a product. SaaS Products are much different because Customer 2.0 demands it. Not only must a product include traditional product features, it must also include operational features such as Configure Service, Manage Service SLA, Manage Add-On Features, Monitor Service Usage Statistics, Self-Service Incident Resolution as well.
Gabriel's point boiled down to the following equation:
Service Offering = (Product Features) + (Operational Features)
which I find to be entirely in agreement with Linthicum and Vambenepe.

As I am wont to do, let me push "Operational Features" as far as I think they can go.

In the end, what customers want from any service--software, infrastructure or otherwise--is control over the balance of quality, cost and time-to-market. Quality is measured through specific metrics, typically called service level metrics. Service level agreements (SLAs) are commitments to maintain service level metrics within commonly agreed boundaries and rules. In the end, all of these "operational features" are about allowing the end user to either

  1. define the service level metrics and/or their boundaries (e.g. define the SLA), or
  2. define how the system should respond if a metric fails to meet the SLA.

Item "2" is SLAuto.
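To make the two items concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the ServiceLevelObjective class, the enforce function and the add_capacity response are hypothetical names, not part of any vendor's product or API. Item 1 is the data (a metric plus its boundaries); item 2 is the callback that runs when the metric leaves those boundaries.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class ServiceLevelObjective:
    """Item 1: a service level metric and its agreed boundaries."""
    metric: str
    lower: Optional[float] = None   # None means "no lower bound"
    upper: Optional[float] = None   # None means "no upper bound"

    def is_met(self, value: float) -> bool:
        """True when the observed value sits inside the agreed bounds."""
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True

def enforce(slo: ServiceLevelObjective, value: float,
            on_violation: Callable[[ServiceLevelObjective, float], str]):
    """Item 2: run the configured response only when the SLA is violated."""
    if slo.is_met(value):
        return None
    return on_violation(slo, value)

def add_capacity(slo: ServiceLevelObjective, value: float) -> str:
    """Hypothetical response; a real system would provision resources here."""
    return f"scale out: {slo.metric} observed at {value}"

# Illustrative SLA: responses must complete within 500 ms.
response_time = ServiceLevelObjective(metric="response_time_ms", upper=500.0)
```

Calling enforce(response_time, 320.0, add_capacity) does nothing, while an observed 750.0 triggers the response. The point of keeping the response as a pluggable callback is that the customer, not the vendor, decides what happens on a violation.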

I would argue that what you don't want is a closed loop SLAuto offering from any of your vendors. In fact, I propose right here, right now, that we need a standard (and, I am sure Simon Wardley would argue, open source) protocol or set of protocols for the following:
  1. Defining service level metrics (probably already exists?)
  2. Defining SLA bounds and rules (may also exist?)
  3. Defining alerts or complex events that indicate that an SLA was violated

Vendors could then use these protocols to build Operational Features that support a distributed SLAuto fabric, where the ultimate control over what to do in severe SLA violations can be controlled and managed outside of any individual provider's infrastructure, preferably at a site of the customer's choosing. This "customer advocate" SLAuto system would then coordinate with all of the customer's other business systems' individual SLAuto to become the automated enforcer of business continuity. In the end, that is the most fundamental role of IT, whether it is distributed or centralized, in any modern, information driven business.
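Since no such protocol exists, the following is purely a sketch of what the "customer advocate" idea might look like, under loudly stated assumptions: the JSON message layout, the "sla.violation" type string, the violation_event function and the CustomerAdvocate class are all invented here for illustration. A provider emits a violation event; a coordinator running at a site of the customer's choosing decides what to do about it.

```python
import json

def violation_event(provider, metric, observed, lower, upper):
    """Provider side: serialize a hypothetical SLA-violation alert as JSON."""
    return json.dumps({
        "type": "sla.violation",          # assumed message type, not a standard
        "provider": provider,
        "metric": metric,
        "observed": observed,
        "bounds": {"lower": lower, "upper": upper},
    }, sort_keys=True)

class CustomerAdvocate:
    """Customer side: maps (provider, metric) pairs to customer-chosen actions."""

    def __init__(self):
        self._policies = {}
        self.log = []                      # audit trail of decisions taken

    def register(self, provider, metric, action):
        """Record which action to run when this provider violates this metric."""
        self._policies[(provider, metric)] = action

    def receive(self, message):
        """Decode an event and apply the matching policy, or escalate."""
        event = json.loads(message)
        action = self._policies.get((event["provider"], event["metric"]))
        decision = action(event) if action else "escalate to operator"
        self.log.append((event["provider"], event["metric"], decision))
        return decision

# Illustrative wiring; provider names and actions are made up.
advocate = CustomerAdvocate()
advocate.register("crm-saas", "availability_pct",
                  lambda event: "fail over to standby provider")
message = violation_event("crm-saas", "availability_pct", 97.2, 99.5, None)
```

Feeding message to advocate.receive runs the registered policy; an event with no registered policy falls through to a human. The design choice worth noting is that the policy table lives with the customer, so the ultimate decision sits outside any single provider's infrastructure.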

"Nice, James," you say. "Very pretty 'pie-in-the-sky' stuff, but none of it exists today. So what are we supposed to do now?"

Implement SLAuto internally in your own data centers with your existing systems, that's what. Integrate SLAuto for SaaS as you come to understand the Operational Feature APIs from your vendors, and as those vendors, your SLAuto vendor and/or your systems talent develop interfaces into your own SLAuto infrastructure.

Evolve towards nirvana, don't try to reach it by taking tabs of vendor acid.

If you want more advice on how to do all of this, drop me a line (james dot urquhart at cassatt dot com) or comment below.

Monday, February 25, 2008

HPC in the Cloud

Check out Blue Collar Computing. High Performance Computing is one area that should really benefit from utility computing models. Imagine gaining access to the world's most powerful computers (with reasonable assistance from experts on programming and deploying on those systems) at a price made reasonable by paying only your "share" of resource usage costs.

Cool to see someone try this business model out for real.

Wednesday, February 20, 2008

Data Goes SLAuto at Oracle

Thanks to Steve Jones, check out this presentation from David Chappell, Oracle VP and CTO of SOA, titled "Next-Generation Grid Enabled SOA". (A shorter written article can be found at SOA Magazine's site.) Chappell outlines the work that Oracle is doing on turning the traditional model of application scalability on its head; instead of a fixed amount of database resources and scaling the applications/services horizontally, scale the database (using a cool complex adaptive systems approach) and alleviate much of the need to scale apps and services (except for CPU bound services). For someone like me, that's mind blowing.

Add to that the fact that the data management functions are relatively homogeneous (though the infrastructure may not be) and aware of their resource utilization, and you can see why they are claiming a certain amount of hardware-metric based SLAuto.

(Hardware metric based SLAuto is based on measurements of hardware components, such as CPU utilization, memory utilization and so on. Software-based SLAuto usually uses business metrics such as transaction rates, active accounts, etc. to make scaling decisions.)
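The difference between the two flavors is easiest to see side by side. This is a rough sketch only: the function names, the 60% CPU target and the per-node transaction capacity are assumptions I picked for illustration, not figures from Oracle or anyone else.

```python
import math

def nodes_for_cpu(current_nodes, avg_cpu_pct, target_cpu_pct=60):
    """Hardware-metric rule: resize the pool so average CPU returns to target.

    Utilization is given in whole percent (e.g. 90 for 90%).
    """
    return max(1, math.ceil(current_nodes * avg_cpu_pct / target_cpu_pct))

def nodes_for_transactions(transactions_per_sec, capacity_per_node):
    """Business-metric rule: size the pool from the transaction rate alone."""
    return max(1, math.ceil(transactions_per_sec / capacity_per_node))

# Four nodes running hot at 90% CPU, with a 60% target.
hardware_decision = nodes_for_cpu(current_nodes=4, avg_cpu_pct=90)

# 1,250 transactions/sec against an assumed 200 transactions/sec per node.
business_decision = nodes_for_transactions(1250, 200)
```

The hardware rule reacts to how busy the boxes are, whatever the cause; the business rule reacts to demand directly and needs someone to have measured what a node can carry. Either one produces a target node count that the automation layer then has to make true.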

The catch? Well, everything must be written to use the "Data Grid" if it's to take advantage of these capabilities. Legacy applications need not apply. (Could be the deal killer for David's "Not your MOM's Bus" concept.)

It seems to me that if Oracle wants this approach to catch on, it should open source a reference implementation as soon as possible. I'm not an expert at the most recent data processing approaches, but it would seem to me that Map-Reduce approaches would be complementary to the Data Grid. However, Hadoop implementations would generally only be integrated with a data grid if there was an open source alternative. Otherwise, MySQL will continue to be the first choice. Open Source would also speed up integration between the data grid and infrastructure automation such as Cassatt and its competitors.

Dave hints at a URL for more info on the Oracle site, but I can't find it. If anyone tracks it down, I would appreciate any help I can get.

Thursday, February 14, 2008

Latency: Obstacle to cloud computing, or opportunity?

I was challenged in the comments to my Cloud Computing Heats Up post regarding my criticism of pupwhines' post that in turn criticized cloud computing. The anonymous author of the comment thought I was too hard on pupwhines, and wanted to know what my response specifically was to the challenge latency presents to distributed computing. I responded there, but I want to expand a little bit on the topic, as it is indeed important to understand, and backs my contention that there will need to be some software architectural changes made to leverage the cloud system.

(Quick note: I've alluded to this before, but I strongly believe there is no one cloud, but a bunch of siloed clouds today with *some* limited integration between them. More of a frontal system, really.)

Latency is an issue in most IT application environments today. There is no question that "traditional" tiered application design scales well at the processing layer, but has real issues at the data layer. There is simply no easy way to manage a traditional relational database architecture over a widely distributed environment. Pupwhines' contention that joining a table between two SaaS vendor implementations would be a disaster is right on. In modern technology terms, it would be insane.

However, this is the disruptive aspect of cloud computing: the architectures you know and love are no longer necessarily best practices in a world where your functionality and capacity are:

  1. not necessarily your own,
  2. not necessarily integrated, and
  3. splayed out across this 5.1 × 10^8 km² rock we live on
There are new technical advances being made today in the companies that already rely on cloud principles (think Google, Amazon, Microsoft, etc.). These advances will change the way you design and deploy software, but they will enable a world where proximity of data means less and less.

In fact, you probably already leverage one of these technical changes: increased bandwidth. Indulge me in some autobiographical narrative to illustrate.

Back in the late '90s, while I was a Senior Principal Consultant with Forte Software, Inc, the legendary(?) distributed application development platform vendor, one of my key roles was advising clients on how to best architect for high performance, high scalability and high availability. Forte was an early service oriented architecture, but it ran on the 10Mb/100Mb networks of the time. Thus, the rule for message passing between components (UI<->service or service<->service) was (in order of priority):
  1. Send as few messages over the network as possible
  2. Send the smallest messages possible

Thus, it was better to send large messages once than many small messages, but you wanted to optimize each message as much as possible.
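A back-of-the-envelope model shows why the first rule outranked the second. This is a simplified sketch with assumed numbers: it charges a fixed latency per message and ignores protocol overhead, and the function name transfer_time_ms and the 1 ms latency figure are my own illustrative choices, not measurements from any Forte deployment.

```python
def transfer_time_ms(messages, bytes_per_message, latency_ms, bandwidth_mbps):
    """Estimate total time to move a payload split across some number of messages.

    Cost = (fixed latency paid once per message) + (time on the wire for the bytes).
    """
    per_message_overhead_ms = messages * latency_ms
    payload_bits = messages * bytes_per_message * 8
    wire_time_ms = payload_bits * 1000 / (bandwidth_mbps * 1_000_000)
    return per_message_overhead_ms + wire_time_ms

# Moving 1,000 rows of 200 bytes over a 10 Mb network with 1 ms per message.
chatty = transfer_time_ms(messages=1000, bytes_per_message=200,
                          latency_ms=1, bandwidth_mbps=10)
batched = transfer_time_ms(messages=1, bytes_per_message=200_000,
                           latency_ms=1, bandwidth_mbps=10)

# The same batched payload on a 1 Gb network.
batched_gigabit = transfer_time_ms(messages=1, bytes_per_message=200_000,
                                   latency_ms=1, bandwidth_mbps=1000)
```

Under these assumptions the chatty version spends most of its time paying per-message latency, while the batched version is dominated by wire time, which is exactly the part that faster networks shrink.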

To this end, best practice was to create data services and to actually deploy these services directly on the database server hardware. It was more important to process the relational mapping of data into the object mapping according to need in a timely fashion--thus avoiding unnecessary network traffic--than to divide processing responsibility so that there were no custom application components running on the RDBMS hardware.

Fast forward to the 1G/10G networks of today. From what I am seeing, it is actually considered bad practice to do what I described above. While at Sun, I actually got admonished by a (very competent) manager for suggesting the way around Sun Access Manager's horrible performance was to deploy the identity server and database on the same box (with our custom login and registration UIs deployed on separate, horizontally scalable servers). Pure architectural heresy. He was right in many ways: doing so would have put the business logic tier into a horizontally locked architecture, but that wasn't his point. "We don't deploy our software on our database servers" was the gist of his argument.

So, faster networks have already changed the so-called "laws of physics" that software architects must design around. Given this, it seems easy to postulate that additional advances in network bandwidth will open additional opportunities for architectural change. In fact, they already have; check out Gigaspaces for a cool (though controversial) alternative to horizontally replicated service architectures.

Will bandwidth really grow at a rate that will make a difference to the current IT generation(s)? Many postulate it has to, even if the core network operators resist. As I noted in my response, Cisco's new Nexus 7000 series is a sign of times to come. Does anyone deny that 40G and 100G networks have the opportunity to change the laws of physics? (Disclaimer: I know just enough about