Monday, June 30, 2008

Why cloud computing doesn't get us out of the woods yet...

Jesse Robbins (a modern Renaissance man if ever there was one) quoted from a post by Theo Schlossnagle, author of Scalable Internet Architectures and President and CEO of OmniTI, in which Schlossnagle notes the challenge brought by highly popular sites linking to average traffic sites, and its implications for scalable Internet architectures, including cloud computing.

As he carefully documents (using his blog site and two events triggered by Digg and the New York Times, respectively), the nature of "spikes" on the Internet has changed dramatically. First, he shows a graph of traffic to his blog over a two-day period in March of 2008. Then he goes on to point out:
"What isn't entirely obvious in the above graphs? These spikes happen inside 60 seconds. The idea of provisioning more servers (virtual or not) is unrealistic. Even in a cloud computing system, getting new system images up and integrated in 60 seconds is pushing the envelope and that would assume a zero second response time. This means it is about time to adjust what our systems architecture should support. The old rule of 70% utilization accommodating an unexpected 40% increase in traffic is unraveling. At least eight times in the past month, we've experienced from 100% to 1000% sudden increases in traffic across many of our clients."
Stop and pay attention to that. The onset of traffic to near peak levels can take place in less than 60 seconds!

Sure, you can get "unlimited" capacity from an Amazon/Mosso/GoGrid/whatever on demand, but can you provision that extra capacity fast enough to meet demand? Clearly automation is not enough to guarantee that you will never lose a user. If this is critical to you, then running at a reduced utilization is probably the only really good answer. (Another possibility is implementing "warm" systems that primarily do another task, but that can be enslaved into a high-traffic situation with little or no manual intervention--and that don't require a reboot.)

I'm not sure I have a great answer for what to do here, but I think anyone who buys into "capacity on demand" should know that this capacity takes time to allocate, and that demand may outstrip supply for seconds or minutes. Nothing about the cloud can avoid the trade-off between utilization and "reactability".
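To make that trade-off concrete, here is a back-of-the-envelope sketch in Python. All of the numbers are invented for illustration (none come from Theo's data); the point is simply that when provisioning takes longer than the spike takes to arrive, the only lever left is idle headroom.

```python
# Back-of-the-envelope: how much headroom do you need to ride out a
# traffic spike while new capacity is still provisioning?
# Every number below is an illustrative assumption, not a measurement.

baseline_rps = 100.0          # steady-state requests per second
spike_multiplier = 10.0       # a "1000%" spike, per the worst case cited
spike_onset_s = 60            # spike reaches peak within 60 seconds
provision_time_s = 300        # time to boot and integrate a new instance
capacity_per_node_rps = 50.0  # what one node can serve

peak_rps = baseline_rps * spike_multiplier
nodes_at_baseline = baseline_rps / capacity_per_node_rps
nodes_needed_at_peak = peak_rps / capacity_per_node_rps

if provision_time_s > spike_onset_s:
    # New capacity arrives too late: the fleet you already have must
    # absorb the peak, which forces a very low steady-state utilization.
    required_nodes = nodes_needed_at_peak
else:
    required_nodes = nodes_at_baseline

print(f"nodes required up front: {required_nodes:.0f}")
print(f"implied steady-state utilization: {nodes_at_baseline / required_nodes:.0%}")
```

With these made-up numbers you would have to run the whole fleet at 10% utilization just to survive the first minute, which is exactly the opposite of the efficiency story the cloud is supposed to tell.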

Update: The folks at Project Caroline ran some simple tests, and feel they would be ready for this scenario. Ron Mann spells it all out for you, and it is an interesting read.

Tuesday, June 24, 2008

"Follow the Law" Meme Hits the Big Time

A few days ago, I checked in to my w3counter dashboard to see who was linking to my blog, and I discovered a very intelligent continuation of the "Follow the Law Computing" meme written by Greg Ness (also found on his blog). Greg's addition of the "spice trails" analogy was something new to me, and raised some interesting thoughts about what the historical significance of the cloud will be for worldwide wealth distribution. There certainly has been a limited but significant wealth effect created by the Internet itself, but will the ability to physically move data and/or compute loads accelerate these trends?

Noting that I should blog about this on the plane at some point during my trip to Austin this week, I dutifully bookmarked the article for later. I had no chance to look at traffic on Monday, so it was with great shock that, when I got online this morning, I saw a hockey stick graph. I investigated, and then my heart skipped a beat.

As of now, today, quotes from my "Follow the Law" post make up Nick Carr's latest post. Nick weaves together the work of Bill Thompson (which I also reference), myself and Greg to provide a clear, concise discussion of the concept of what he calls "itinerant computing". (Damn, he's good at coining these terms, isn't he?)

Ever since I discovered Nick's blog early in my career at Cassatt, I've wanted to get his attention. The Big Switch was an eye-opening read, if only because it served as a good counterpoint to Bill Coleman's optimistic vision. He made me look at utility computing and cloud computing with a more critical eye, and I wanted to add to his body of knowledge. I am honored to have done so in a small way.

Surprisingly, though, that wasn't the whole hockey stick trigger. Greg's post was picked up by a site called Seeking Alpha, a site I must admit I had never heard of before. Apparently a high-traffic investment site (connected to Jim Cramer?), Seeking Alpha drove a record traffic load to my humble blog through a rebroadcast of Greg's post. Rereading that post, I noticed that there is a very strong business message there that may in fact be the actual historical significance of "itinerant computing": the flow of data and computing is simply an enabler of new business models and competitive advantages that change the face of global wealth. Being a resident of what is essentially a suburb of the Silicon Valley, I can't help but think there is more downside than upside to that story.

Finally, as I looked at the other referrers to this blog, I found an excellent summary of all of the "Follow" computing options: Follow the Sun, Follow the Moon and Follow the Law. Kevin Kelly gives very good basic definitions of each concept, and then makes the following observation:
"Most likely different industries adopt a different scenario. Maybe financial follows the moon, while commerce follows the sun, and entertainment follows the law. A single computing environment (One Machine) should not suggest homogeneity. A meadow is not homogeneous, but its does act as a coherent ecological system.

Another way to dissect the daily rhythm of the One Machine is to trace the three distinct waves of energy, data, and computation as they flow through the planetary "cloud." Each probably has its own pathways."

Amen, brother. I'll go even further. Maybe the customer service systems of a financial company follow the sun, the analytics systems follow the moon, and the trading systems follow the law. I do not mean to suggest at all that every distributed compute task will benefit from follow-the-law concepts. In fact, I would suggest that there are other "Follow" options that will be created over the coming decades.

All of this leads to the question of software fluidity...

Sunday, June 22, 2008

"Follow the Law Computing" on Google Groups: Cloud Computing

Not long after my post outlining my theory of an unexplored economic concern for moving compute loads in a cloud computing environment, a discussion popped up on the Google Groups Cloud Computing group. In this thread, which started out covering BI issues in the cloud, the question of moving data to computing versus moving computing to the data came up. It is a priceless thread, and one that showed me that I have not been the only one thinking about the technology of migrating workloads in the cloud.

The first message that popped out at me was one by Chuck Wegrzyn, apparently of Twisted Storage:
"How does the "cloud" protect data going from the owner to the computing service without being compromised (read that as sniffed)? Will a computing service in country A have the right to impose restrictions on data from another country (even if the results of the computing don't affect the citizens of country A)? An so on. "
He goes on to say, in a separate message:
"While I think trans-national data movement will be an area that requires governance of some kind I think that companies can get around the problem in other ways. I think it just requires looking at the problem in a different way.

I'd think the approach is to keep the data still and move the computing to it. The idea is to see the thousands of machines it takes to hold the petabytes worth of data as the compute cloud. What needs to move to it is the programs that can process the data. I've been working on this approach for the last 3 years (Twisted Storage). "
Bingo! This is what I think is going to start happening as well. Move compute loads to where the legal and regulatory environment is most favorable, and leave the (highly contentious) data where it is.

Khaz Sapenov even has a name for this pattern:
"This is valid approach, that I personally called "Plumber Pattern", when application, encapsulated in some kind of container (e.g. virtual machine image) is marshalled to secure data islands to iteratively do its unique work (say, do a matches on some criterium in Interpol, FBI, CIA, MI5 and other databases, all distributed across continents). Due to utterly confidential nature of these types of data, it is impossible to move them to public storage (at least this time). Above-mentioned case might be
extrapolated to some lines of business as well with reduced privacy/security requirements. "
I have no idea where the term "plumber" comes into this, but it somehow seems to work. More importantly, Khaz gives an excellent use case for a compute problem where the data cannot move for legal and national security reasons, but an authorized (or unauthorized--gulp) software stack could move from data center to data center to compute an aggregate report.
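To illustrate the pattern, here is a minimal sketch in Python. The site names, records, and matching rule are entirely hypothetical; the point is that only the query and the aggregate result travel, while the raw data never leaves its island.

```python
# Toy sketch of the "Plumber Pattern": the computation visits each
# secure data island; the underlying records never leave their site.
# Sites, data, and the matching criterion below are all invented.

SITES = {
    "agency_a": ["alice", "bob", "mallory"],
    "agency_b": ["mallory", "trent"],
    "agency_c": ["eve", "mallory", "bob"],
}

def run_locally(records, criterion):
    """Stand-in for code executed inside the remote site's perimeter."""
    return sum(1 for record in records if criterion(record))

def plumber_query(criterion):
    # Only the per-site counts (not the data itself) are marshalled back.
    return {site: run_locally(records, criterion)
            for site, records in SITES.items()}

if __name__ == "__main__":
    hits = plumber_query(lambda name: name == "mallory")
    print("matches per site:", hits, "total:", sum(hits.values()))
```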

Marc Evans even points out that we already have some open source compute algorithms that can serve as a starting point to address these problems:
"In my experiences(sic), there are cases where having the data / computation as close to the customer edge as possible is what is required for an acceptable user experience. In other cases, the relationship of the user / data / computation is not important. Most often, there is a mix of both. One of the ideas behind Hadoop as I understand it is to bring the computation to the data location, while also providing for the data to be in several locations. The scheduler is critical to making good use of
data locality. So yes, I believe that what you are looking for does exist within Hadoop at a minimum, though I also believe that there is alot of room to evolve the techniques that it uses. "
Jim Peters then asks a simple, but loaded question:
"Even if the cloud providers come up with excellent answers to the security and reliability questions, who's going to trust them? Credit card numbers are one thing, but cloud data is something else entirely. "
At this point, Ray Nugent adds what I think is the quintessential economic consideration:
"Security is really a business issue. Each layer of security should cost no more than the data is worth. So the concept of "secure enough" becomes important. What security is appropriate for a given type of data and is it more or less secure in the cloud than in the corp DC? Is data inherently "less secure" by virtue of being in the cloud than, say, an employees laptop or flash dongle or "on the wire"? I don't think corporate data centers are a secure as you're suggesting they are..."
"Secure enough" is, I think, where its at. Perhaps a new term is needed: "Avoid the Risk Computing"?

Anyway, the discussion goes on from there, and I suggest you read the thread yourself. This is a key topic for cloud computing, and I think there is a good chance that one or more of the biggest technology companies of the early to mid 21st century will be hatched from discussions like these.

(This group, by the way, is absolutely awesome, and each thread is packed with intelligent and insightful messages. If you care about cloud computing, you need to join.)

Tuesday, June 17, 2008

Say what, Mashable?!? Why Microsoft won't be acquiring Amazon

Adam Ostrow, a Mashable blogger, decided to close his eyes and guess what the big acquisitions of the future might be now that Yahoo-Microsoft is dead. He offered two guesses: Google acquiring CBS; and Microsoft acquiring (ahem) Amazon.

His argument for Google/CBS is quite sound. I happen to agree that CBS has built up a hell of an offline advertising delivery business (with solid television, print and outdoor advertising businesses). Google could drive advertising automation in each of these businesses and really solidify itself as a one stop shop for brand development. Oh, and they have pretty neat software too.

However, the Microsoft/Amazon thing is so ill conceived that I had to address it here. Here is the core of Adam's argument:
"If Microsoft were to buy Amazon, they would be in an excellent position to push the cloud computing concept even further ahead. In remaining the dominant desktop OS, Microsoft still in reality has the largest developer community – developers that are also thinking long and hard about how to move their applications off the desktop and into the cloud. There are also the tens of millions of small and medium sized businesses that are currently Microsoft customers that are without a web strategy - Amazon Web Services fits perfectly into serving this segment too.

Finally, let us not forget Google is already moving in this direction too with App Engine. Microsoft needs to make a play in this space – and Amazon is the quickest way to do so (in addition to adding a company that grew revenue from $10.7 to $14.8 billion last year)"

First, let's be clear that the retail business would have to be spun off quickly. Not only (as Adam points out) does Microsoft not need to enter retail (e.g. their failed attempt at a showcase store in San Francisco's Metreon), but Amazon's retail side requires a skill set that really wouldn't interest Ballmer in the least. Let's not even get into conflicts of interest in customer support, etc.

I think the bigger issue, though, is that the Amazon team responsible for making the web services business work does not seem at all interested in ditching open source for a single-vendor OS solution. If Microsoft were to acquire Amazon for the web services business (especially EC2/S3), I believe they would lose the team that knows how to make that business happen. Frankly, I don't see that Mister Softy has the talent to further that model themselves.

(If they do, then why the hell don't they counter EC2 with a Windows-friendly service, instead of targeting the consumer-happy Mesh concept?)

And, finally, how many times must it be said that Google App Engine is not a direct competitor to Amazon EC2/S3? The very fact that you can get a GAE clone to run on Amazon, but not the other way around, should tell you something about the two markets. (Not to mention the observation that MySQL will probably NEVER run on GAE.) GAE is a PaaS play, Amazon EC2/S3 is an IaaS play. Period. I believe that Google and Amazon still have the capacity to be very close partners in the cloud computing space over the coming years. It probably won't happen, but the capacity is there.

If Microsoft wants to capture compute capacity, I'd say Rackspace is a better target. A huge player with both Linux- and Windows-friendly services, Mosso for a cloud, and no core business functions outside of what Microsoft would specifically be looking to acquire. Plus an outstanding customer base that actually wants to interact directly with its vendor (providing MUCH better upsell opportunities for MSFT), and some serious hosting expertise.

I know it was fun to think about two big consumer names perhaps marrying in the deal of the decade, but c'mon Adam. If Microsoft backed away from the cultural and business incongruities of the Yahoo deal, why would they willingly seek out a larger problem in Amazon?

Update: It does occur to me, in Adam's defense, that if Amazon were to voluntarily spin off the AWS business into a separate company, that the economics would change for Microsoft. How interested Jeff Barr is in doing this remains to be seen, though.

Thursday, June 12, 2008

"Follow the law" computing

A few days ago, Nick Carr worked his usual magic in analyzing Bill Thompson's keen observation that every element of "the cloud" eventually boils down to a physical element in a physical location with real geopolitical and legal influences. This problem was first brought to my attention in a blog post by Leslie Poston noting that the Canadian government has refused to allow public IT projects to use US-based hosting environments for fear of security breaches authorized via the Patriot Act. Nick added another example with the following:
Right before the manuscript of The Big Switch was shipped off to the printer ("manuscript" and "shipped off" are being used metaphorically here), I made one last edit, adding a paragraph about France's decision to ban government ministers from using Blackberrys since the messages sent by the popular devices are routinely stored on servers sitting in data centers in the US and the UK. "The risks of interception are real," a French intelligence official explained at the time.
I hadn't thought too much about the political consequences of the cloud since first reading Nick's book, but these stories triggered a vision that I just can't shake.

Let me explain. First, some setup...

One of the really cool visions that Bill Coleman used to talk about with respect to cloud computing was the concept of "follow the moon"; in other words, moving running applications globally over the course of an earth day to where processing power is cheapest--on the dark side of the planet. The idea was originally about operational costs in general, but these days Cassatt and others focus this vision around electricity costs.

The concept of "moving" servers around the world was greatly enhanced by the live motion technologies offered by all of the major virtualization infrastructure players (e.g. VMotion). With these technologies (as you all probably know by now), moving a server from one piece of hardware to another is as simple as clicking a button. Today, most of that convenience is limited to within a single network, but with upcoming SLAuto federation architectures and standards that inter-LAN motion will be greatly simplified over the coming years.

(It should be noted that "moving" software running on bare metal is possible, but it requires "rebooting" the server image on another physical box.)

The key piece of the puzzle is automation. Whether simple runbook-style automation (automating human-centric processes) or all-out SLAuto, automation allows for optimized decision making across hundreds, thousands or even tens of thousands of virtual machines. Today, most SLAuto is blissfully unaware of runtime cost factors, such as cost of electricity or cost of network bandwidth, but once the elementary SLAuto solutions are firmly established, this is naturally the next frontier to address.
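To give a feel for what a cost-aware SLAuto decision might look like once those factors are in play, here is a hypothetical sketch: pick the cheapest-power data center that can still meet a latency service level. The data centers, prices, and latencies are invented, and no real SLAuto product or API is implied.

```python
# Hypothetical "follow the moon" placement policy: run the workload
# where power is cheapest, subject to a latency service level.
# The data centers, prices, and latencies below are all invented.

DATA_CENTERS = [
    {"name": "us_west",  "power_cost_kwh": 0.11, "latency_ms": 40},
    {"name": "eu_north", "power_cost_kwh": 0.07, "latency_ms": 95},
    {"name": "ap_east",  "power_cost_kwh": 0.05, "latency_ms": 180},
]

def choose_placement(max_latency_ms):
    # Only consider sites that keep the service level intact...
    eligible = [dc for dc in DATA_CENTERS if dc["latency_ms"] <= max_latency_ms]
    if not eligible:
        raise RuntimeError("no data center can meet the latency SLA")
    # ...then prefer the one with the cheapest electricity.
    return min(eligible, key=lambda dc: dc["power_cost_kwh"])

print(choose_placement(max_latency_ms=100)["name"])   # -> eu_north
print(choose_placement(max_latency_ms=200)["name"])   # -> ap_east
```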

But hold on...

As the articles I noted earlier suggest, early cloud computing users have discovered a hitch in the giddy-up: the borders and politics of the world DO matter when it comes to IT legislation.

If law will in fact have such an influence on cloud computing dynamics, it occurs to me that a new cost factor might outshine simple operations when it comes to choosing where to run systems; namely, legality itself. As businesses seek to optimize business processes to deliver the most competitive advantage at the lowest costs, it is quite likely that they will seek out ways to leverage legal loopholes around the world to get around barriers in any one country.

Now, this is just pie-in-the-sky thinking on my part, and there are 1000 holes here, but I think it's worth going through the exercise of thinking this out. The problem is complicated, as there are different laws that apply to data and to the processing being done on that data (as well as, in some jurisdictions, the record keeping about both the data and the processing). However, there are technical solutions available today for both data and processing that could allow a company to mix and match the geographies that give them the best legal leverage for the services they wish to offer:
  • Database Sharding/Replication

    Conceptually, the simplest way to keep from violating any one jurisdiction's data storage or privacy laws is to not put the data in that jurisdiction. This would be hard to do, if not for some really cool database sharding frameworks being released to the community these days.

    Furthermore, replicate the data in multiple jurisdictions, but use the best local instance of that data for processing happening in a given jurisdiction. In fact, by replicating a single data exchange into multiple jurisdictions at once, it becomes possible to move VMs from place to place without losing (read-only, at least) access to that data. (A rough sketch of this kind of jurisdiction-aware routing follows the list below.)

  • VMotion/LiveMotion

    From a processing perspective, once you have solved legal access to the data from each jurisdiction, you can move your complete processing state from place to place as processing requires, without missing a beat. In fact, with networks getting as fast as they are, transfer times across the heart of the Internet may be almost as fast as on a LAN, and those times are usually measured in the low hundreds of milliseconds.

    So, run your registration process in the USA, your banking steps in Switzerland, and your gambling algorithms in the Bahamas. Or, market your child-focused alternative reality game in the US, but collect personal information exclusively on servers in Madagascar. It may still be technically illegal from a US perspective, but who do they prosecute?
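Here is a rough sketch of the routing idea referenced above. Nothing in it reflects a real sharding framework; the jurisdictions, record classes, and rules are all invented to show how storage law could drive both replica selection and where the processing step runs.

```python
# Hypothetical jurisdiction-aware routing: each record class is pinned to
# the jurisdictions allowed to store it, and processing is dispatched to a
# compute site inside one of those jurisdictions. The rules are invented.

STORAGE_RULES = {
    "payment_records": {"CH"},            # keep the banking steps in Switzerland
    "registration":    {"US", "EU"},      # replicated to both jurisdictions
    "gameplay_events": {"US", "EU", "BS"},
}

COMPUTE_SITES = {
    "US": "dc-virginia",
    "EU": "dc-dublin",
    "CH": "dc-zurich",
    "BS": "dc-nassau",
}

def site_for(record_class, preferred_jurisdiction):
    allowed = STORAGE_RULES[record_class]
    # Use the caller's preferred jurisdiction if the data may live there,
    # otherwise fall back to any jurisdiction where a replica is allowed.
    jurisdiction = (preferred_jurisdiction
                    if preferred_jurisdiction in allowed
                    else sorted(allowed)[0])
    return COMPUTE_SITES[jurisdiction]

print(site_for("registration", "US"))      # -> dc-virginia (local replica)
print(site_for("payment_records", "US"))   # -> dc-zurich (data cannot leave CH)
```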

Again, I know there are a million roadblocks here, but I also know both the corporate world and underworld have proven themselves determined and ingenious technologists when it comes to these kinds of problems.

As Leslie noted, our legislators must understand the economic impact of a law meant for a physical world on an online reality. As Nick noted, we seem to be treading into that mythical territory marked on maps with the words "Here Be Dragons", and the dragons are stirring.

Tuesday, June 10, 2008

Eucalyptus and You

Last Friday night I came across a post by Sam Dean of OStatic, titled "Eucalyptus: Unsung Open Source Infrastructure for Cloud Computing", and my jaw fell to the floor. Here it was: the project I had been wondering why no one was building, a project focused on replicating the Amazon APIs in an open source cluster environment. The more I read Sam's post, the more I thought "Man, is this project in the right place at the right time."

I immediately Twittered the link, and was retweeted by no less than Don MacAskill and Dion Hinchcliffe in a matter of minutes. A few hours later, Simon posted his excitement, and then this morning I came across an analysis by Todd Hoff of highscalability.com that I think sums up what we know today quite nicely. Todd heard about this through the Cloud Computing group on Google Groups, and that thread was kicked off by Khazret Sapenov, himself a very prolific cloud thinker.

This is big stuff, despite the skepticism of some cloud fanatics who can't grep why "private clouds" (I am beginning to like that term) are legitimate. I most certainly don't fall into that particular camp, having real experience working with customers who realize that they have to start with an in-house cloud to satisfy corporate and legal mandates. Ideally, though, this infrastructure would allow them to migrate all or portions of their applications out of house when the time and technology are right. If Eucalyptus can pull this off and really provide a killer Amazon clone for private deployments, they may become the core technology for an awful lot of enterprise SLAuto platforms in years to come.
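If Eucalyptus really does clone the Amazon APIs, the practical payoff is that existing EC2 client code should be able to target a private endpoint by changing connection details only. A hypothetical sketch using the boto library follows; the access keys, endpoint host, port, and path are placeholders, and I have not verified this against a live Eucalyptus cluster.

```python
# Hypothetical: the same boto-based client code talks to either Amazon EC2
# or a private, EC2-compatible front end, differing only in connection details.
import boto

def connect(private=False):
    if private:
        # Placeholder endpoint for an in-house, EC2-compatible cloud controller.
        return boto.connect_ec2("my-access-key", "my-secret-key",
                                is_secure=False,
                                host="cloud.internal.example.com",
                                port=8773,
                                path="/services/Eucalyptus")
    return boto.connect_ec2("my-access-key", "my-secret-key")

conn = connect(private=True)
for image in conn.get_all_images():
    print(image.id)
```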

Of course, they are a hell of a long way from achieving that vision. Todd's post gives a fairly good overview of what Eucalyptus is, but there is still much to do from the technical, functional and marketing standpoints. For example:
  • As the Eucalyptus team itself notes, it's still missing key command-line tools.
  • It doesn't appear to be an infrastructure optimization approach, but rather a straightforward clustering approach. Thus, all of your capacity likely must remain running continuously when using the out-of-the-box functionality. I'd like to see them tackle SLAuto when they have the Amazon tools completed.
  • It is thoroughly dependent on the Rocks cluster project. Knowing my enterprise IT friends, this won't "go down easy" for any of them.
Interestingly enough, while I was writing this, the Eucalyptus home page was temporarily unavailable. I hope this means that it is overwhelmed with interest. I'd really like to see this community grow substantially, and for the project to evolve very quickly from where it is now.

Simon's observations about portability are really at the heart of my excitement. Realistically, the Eucalyptus team has simply started a journey of 1000 miles with this single step. Congratulations, guys, on setting the pace.

Tuesday, June 03, 2008

VMWare (Finally) Joins the Cloud Computing Race

Update: VMWare has now launched a Virtual Data Center Operating System (VDC-OS), which I believe would rely heavily on the technologies discussed here.

I missed it when it was actually announced, but VMWare's acquisition of B-Hive is very big news, IMHO, for the enterprise cloud/utility computing market. (Thanks to GridToday for the heads up.) I'm not a huge fan of VMWare's automation technology to date (DRS/HA scalability is a joke), but this acquisition may change that perception over time. If nothing else, it signals VMWare's entrance into the "serious" cloud/utility computing infrastructure market. (Still looking for that term...)

Taking a very Cassatt-like approach to policy-based automation (even using the term Service Level Automation throughout its marketing literature), B-Hive Conductor claims to be an all-virtualization approach to the problem. Agent-less from a software standpoint, the B-Hive "controller" is a virtual appliance that runs in your VMWare infrastructure and manages the creation, destruction and replacement of VMs according to policy. (Disclaimer: I am a Cassatt alumnus.)
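To give a feel for what "according to policy" means in practice, here is a toy sketch of the kind of rule such a controller might evaluate. This is not B-Hive's actual API or policy language, just an illustration of SLAuto-style logic; the threshold numbers are invented.

```python
# Toy SLAuto-style policy: add a VM when measured response time violates
# the service level, retire one when there is clear headroom.
# Thresholds and actions are illustrative, not any vendor's defaults.

SLA_RESPONSE_MS = 250

def evaluate(measured_ms, running_vms, min_vms=2, max_vms=20):
    if measured_ms > SLA_RESPONSE_MS and running_vms < max_vms:
        return "create_vm"
    if measured_ms < SLA_RESPONSE_MS * 0.5 and running_vms > min_vms:
        return "destroy_vm"
    return "no_change"

print(evaluate(measured_ms=400, running_vms=4))   # -> create_vm
print(evaluate(measured_ms=90,  running_vms=4))   # -> destroy_vm
```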

The result is that VMWare now has the pieces/parts for a serious "fog computing" offering (the term being a tongue-firmly-in-cheek reference to cloud computing behind the firewall). With features like dependency mapping between servers; monitoring and reporting; and SLAuto, B-Hive is for real and should push VMWare into the production data center in a big way. Granted, they will still take the Microsoft-modeled approach of an all-proprietary suite, but--hey--people still buy Microsoft for production data centers as well, so there is no accounting for taste.

What is VMWare's key advantage? VirtualCenter. Take the following quotes from Forrester's James Staten in GridToday's feature story:
"Forrester’s Staten agrees that the ability of B-hive's tool to tell the load balancer what to do is “very valuable,” but notes that automation on this level is far beyond what most virtualization users would be able to digest at this point. “What we see in virtual environments is they're going up the maturity curve, where we have a large number of customers who are starting to make the move from tactically trying it out to strategically implementing it,” he explained. “And they’re not yet, even in the strategically implementing it phase, ready to start automating it.”

The key, Staten believes, is for VMware to make its automation tools ready before users get to that point so they can start learning about and trusting the tools. B-hive was experiencing some early success, he said, but “putting this into [VMware] VirtualCenter will make it possible for a lot more customers to start learning about and getting familiar with automation and actually using it.” In addition, he says, the promise of added support and certification, as well as the presence of VMware's large R&D team, will add a level of comfort for prospective users."
This is an excellent strategy for introducing SLAuto into data centers everywhere. The big problem that Cassatt and others have had in speeding up the sales cycle is that SLAuto touches everything, and as such needs to be sold to damn near everyone in IT before a project can be successful. By allowing the end contributors to dabble with the automation on their own, before committing to a production project, there is a much higher likelihood that (someday) automation will be adopted. It doesn't bring the revenue any earlier, but I would think it would significantly bring down the cost of sales.

So, is VMWare going to pull away from the cloud/utility infrastructure pack? Perhaps, but not today, or even tomorrow. Give it months, in which time some of its biggest rivals in this space will advance their own technologies. If nothing else, I hope this acquisition will light a fire under the butts of those who figured VMWare would be relegated to "just a feature". Or do the same for those enterprises that fear letting go of minute by minute control to an automation platform.