Wednesday, July 16, 2008

Watch out for Cisco, kids!

What is the most important enabler of distributed computing architectures, such as cloud oriented architectures? What is the one thing that has to be in ample supply before the other elements of the data center come into play? Is it the number of servers or CPU power available for computing? Is it the size and speed of the disks and network storage devices? Is it the distributed software architectures themselves?

My answer? None of the above. It's network bandwidth, baby, all the way.

Why? Well, let's break down where the costs of distributed systems lie. We all know that CPU capabilities double roughly every couple of years, and we also know that disk I/O slows those CPUs down, but not at the rate that network I/O typically does. When designing distributed systems, you must first be aware of network latency and control traffic between components to have any chance in heck of meeting rigorous transaction rate demands. The old rule at Forte Software, for what it's worth, was:

  • First reduce the number of messages as much as possible
  • Then reduce the size of those messages as much as possible
Increase adherence to those rules, and your software would outperform less optimized applications every time. It was easy to look like a performance tuning genius in those days.
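
To see why the message count comes first in that rule, here is a minimal back-of-the-envelope sketch in Python. The cost model (a fixed per-message latency plus payload bytes divided by bandwidth) and every number in it are my own illustrative assumptions, not Forte measurements.

```python
def transfer_time(num_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Estimated seconds to send the messages one after another.

    Assumed model: every message pays one fixed latency cost,
    plus its payload size divided by the available bandwidth.
    """
    per_message = latency_s + bytes_per_message / bandwidth_bytes_per_s
    return num_messages * per_message


# Hypothetical link: 1 ms of latency, 100 MB/s of bandwidth.
LATENCY = 0.001
BANDWIDTH = 100_000_000

chatty = transfer_time(1000, 1_000, LATENCY, BANDWIDTH)    # 1,000 small messages
batched = transfer_time(10, 100_000, LATENCY, BANDWIDTH)   # same bytes in 10 messages
trimmed = transfer_time(10, 50_000, LATENCY, BANDWIDTH)    # then halve the payload

print(f"chatty:  {chatty:.4f} s")
print(f"batched: {batched:.4f} s")
print(f"trimmed: {trimmed:.4f} s")
```

With these made-up numbers, batching cuts the estimate from roughly 1.01 seconds to 0.02, while halving the payload afterward only moves it from 0.02 to 0.015. Fewer messages first, smaller messages second.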

What is exciting about today's environment, however, is that network technology is changing rapidly. Bandwidth speeds are increasing quickly (though not as fast as CPU speeds), and this high speed bandwidth is becoming more ubiquitous worldwide. Inter-data-center speeds are increasingly mind boggling, and WAN optimization apparently has removed much of the fear of moving real-time traffic between geographically disparate environments.

All of this is a huge positive to cloud oriented architectures. When you design for the cloud, you want to focus on a few key things:
  • Software fluidity - The ability of the software to run cleanly in a dynamic infrastructure, where the server, switch port, storage and possibly even the IP address changes day by day or minute by minute.

  • Software optimization - Because using a cloud service costs money, whether billed by the CPU hour, the transaction or the number of servers used, you want to be sure you are getting your money's worth when leveraging the cloud. That means both optimizing the execution profile of your software, and the use of external cloud services by the same software.

  • Scalability - This is well established, but clearly your software must be able to scale to your needs. Ideally, it should scale infinitely, especially in environments with highly unpredictable usage volume (such as the Internet).

Achieving any of these in an environment where your network bandwidth is constricting your options is nearly impossible.

Oh, and one more thing. The network is the first element of your data center that sees load, failure and service level compliance. Think about it--without the eyes of the network, all of your other data center elements become black boxes (though often physical ones, with those annoying beeps and little blinking orange lights). What are the nerves in the data center nervous system? Network cables, I would say.

Today I saw two really good posts about possible network trends driven by the cloud, and how Cisco's new workhorse leverages "virtualized" bandwidth and opens the door to commodity cloud capacity. The first is a post by Douglas Gourlay of Cisco, which simply looks at the trends that got us to where we are today, and further trends that will grease the skids for commodity clouds. I am especially interested in the following observations:
"8) IP Addressing will move to IPv6 or have IPv4 RFCs standardized that allow for a global address device/VM ID within the addressing space and a location/provider sensitive ID that will allow for workload to be moved from one provider to another without changing the client’s host stack or known IP address ‘in flight’. Here’s an example from my friend Dino.

9) This will allow workload portability between Enterprise Clouds and Service Provider Clouds.

10) The SP community will embrace this and start aggressively trying to capture as much footprint as possible so they can fill their data centers to near capacity allowing for them to have the maximum efficiency within their operation. This holds to my rule that ‘The Value of Virtualization is compounded by the number of devices virtualized’.

11) Someone will write a DNS or a DNS Coupled Workload exchange. This will allow the enterprise to effectively automate the bidding of workload allocation against some number or pool of Service Providers who are offering them the compute, storage, and network capacity at a given price. The faster and more seamless the above technologies make the shift of workload from one provider to another the simpler it is in the end for an exchange or market-based system to be the controlling authority for the distribution of workload and thus $$$’s to the provider who is most capable of processing the workload."

The possibility that IP addresses could successfully travel with their software payloads is incredibly powerful to me, and I think would change everything for both "traditional" VM users, as well as the virtual appliance world. The possibility that my host name could travel with my workload, even as it is moved in real time from one vendor to another is, of course, cloud computing nirvana. To see someone who obviously knows something about networking and networking trends spell out this possibility got my attention.

(Those who see a fatal flaw in Doug's vision are welcome to point it out in the comments section below, or on Doug's blog.)

The second post is from Hurwitz analyst, Robin Bloor, who describes in brilliant detail why Cisco's Nexus 7000 series is different, and why it could very well take over the private cloud game. As an architecture, it essentially makes the network OS the policy engine for controlling provisioning and load balancing, though with bandwidth speeds that blow away today's standards (10G today, but room for 40G and 100G standards in the future). Get to those speeds, and all of a sudden something other than network bandwidth is your restricting function in scaling a distributed application.

I have been cautiously excited about the Nexus announcement from the start. Excited because the vision of what Nexus will be is so compelling to me, for all of the reasons I describe above. (John Chambers, CEO of Cisco, communicates that vision in a video that accompanied the Nexus 5000 series launch.) Cautious, because it reeks of old-school enterprise sales mentality, with Cisco hoping to "own" whole corporate IT departments by controlling both how software runs, and what hardware and virtualization can be bought to run it on. Lock-in galore, and something the modern, open source aware corporate world may be a little uneasy about.

That being said, as Robin put it, "In summary: The network is a computer. And if you think that’s just a smart-ass bit of word play: it’s not."

Robin further explains Cisco's vision as follows:

"Cisco’s vision, which can become reality with the Nexus, is of a data center that is no longer defined by computer architecture, but by network architecture. This makes sense on many levels. Let’s list them in the hope of making it easier to understand.

  1. Networks have become so fast that in many instances it is practical to send the data to the program, or to send the program to the data, or to send both the program and the data somewhere else to execute. Software architecture has been about keeping data and process together to satisfy performance constraints. Well Moore’s Law reduced the performance issue and Metcalfe’s Law opened up the network. All the constraints of software architecture reduced and they continue to reduce. Distributing both software and data becomes easier by the year.
  2. Software is increasingly being delivered as a service that you connect to. And if it cannot deliver the right performance characteristics in the place where it lives, you move it to a place where it can.
  3. Increasingly there is more and more intelligence being placed on the switch or on the wire. Of course Cisco has been adding intelligence to the switch for years. Those Cisco firewalls and VPNs were exactly that. But also, in the last 5 years, agentless software (for example some Intrusion Detection products) has become prominent. Such applications simply listen to the network and initiate action if they “don’t like what they hear”. The point is that applications don’t have to live in server blade cabinets. You can put them on switches or you could put them onto server boards that sit in a big switch cabinet. They’re very portable.
  4. The network needs an OS (or NOS). Whether Cisco has the right OS is a point for debate, but the network definitely needs an OS and the OS needs to perform the functions that Cisco’s NX-OS carries out. It also needs to do other things to like optimize and load balance all the resources in a way that corresponds to the service level needs of the important business transactions and processes it supports. Personally, I do not see how that OS can do anything but span the whole network - including the switches."
Would all applications run this way? Probably not. But those mission critical, highly distributed, performance-is-everything apps you provide for your customers, or partners, or employees, or even large data sets, are extremely good candidates for this way of thinking.

Oh, and I wouldn't be surprised if Google, Microsoft, et al. agreed (though not necessarily as Cisco customers).

Does Nexus work? I have no idea. But I am betting that, as private clouds are built, the idea that servers are the center of the universe will be tested greatly, and the incredibly important role of the network will become more and more apparent. And when it does, Cisco may have positioned themselves to take advantage of the fun that follows.

It's just too bad that it is another single-vendor, closed source offering that will probably take 5-7 years (minimum) to replicate in the open source world. At the very least, I hope Cisco is paying attention to Doug's observation that:
"[T]here will be a standardization of the hypervisor ‘interface’ between the VM and the hypervisor. This will allow a VM created on Xen to move to VMWare or Hyper-V and so on."
I hope they are openly seeking to partner with OVF or another virtualization/cloud standard to ensure portability to and from Nexus.

However, I would rather have this technology in a proprietary form than not at all, so way to go Cisco, and I will be watching you closely--via the network, of course.

Monday, July 07, 2008

Cloudware: Standard to Watch, or Another Self-Interested Enterprise Play

Rich Miller of Replicate Technologies, Inc. and Telematique fame wrote a post the other day that explored 3TERA's Cloudware vision with a highly critical eye:

"In September of last year, as I was preparing (mentally and emotionally) to get Replicate started on its current path, I considered issues of portability and interoperability in the virtualized datacenter. I posted a few comments about OVF but one in particular drew the attention of Bert Armijo of 3tera.

At that time, Bert indicated that he thought it "... too early for a standard,...", with a (perfectly arguable) claim that standards are often "... a trade-off to gain interoperability in exchange for stifling innovation." He went on to say that "(w)e haven't adequately explored the possibilities in utility computing." He then provided a critique of OVF. (Whether I agree with that critique or not is immaterial to this post, and the subject for another time.)

At the end of June, 3tera announced their Cloudware vision for a standards-based interoperable utility infrastructure. Since the arrival of Cloudware, there have been a number of venues at which "cloud computing" and interoperability has been on the minds of the cognoscenti... Structure08 and Velocity being the most heavily covered. In the past few weeks, there have also been claims, and counter-claims of support... and to be fair, the disputed claims of support were made by others, not by 3tera.

So... what's changed, Bert? Why is "now the time" to create the standard for interoperable cloud computing? What's happened in 9 - 10 months that has so changed the field, that these efforts don't also stifle innovation?"
Bert responded in the comments:
"Last year what most people meant when they talked about a standard for "cloud computing" was a portable virtual machine format. While that's important, it's not cloud computing. What's changed in the past 10 months is that there are now a number of companies offering workable services that have a vision beyond merely hosting virtual machines."
That would be a wonderful explanation, if it weren't for the fact that Bert is blatantly using Cloudware to promote 3TERA's AppLogic as the core architecture of the "standard". Here is what Larry Dignan of ZDNet's Between the Lines reported when Bert first hinted about Cloudware:
"Initially, 3Tera’s AppLogic software will play a prominent role in the Cloudware Architecture, but that’s because these efforts initially need at least one vendor championing the effort."
In other words, AppLogic gets a huge head start, defines what the platform should look like, do and not do, etc., and uses Cloudware as a vehicle to thrust itself into the "de facto standard" spot for (at the very least) infrastructure clouds (aka HaaS).

Right there is the crux of the argument for open source standards versus simply open standards. I briefly interviewed for a position with a giant software company to be a representative on various SOA standards bodies. The focus of that team was to promote their engineers' solutions to the rest of the body, and to master the art of negotiating the best position possible for that technology. In other words, if the company invented it, it was this team's job to turn it into a standard, or at least make sure the adopted standard would support their technology or protocols. The traditional standards game is one of diplomacy, negotiation and gamesmanship largely because it is an environment where vendors are pitting their self-interests against each other.

For 3TERA to base Cloudware on AppLogic's existing architecture and functionality is purely self-interest on 3TERA's part. If they had wanted to promote openness equally among potential vendors, they would open source AppLogic outright, and switch to a solid open source business model. Alternatively, they would throw significant resources and IP into an existing open source project. To "open" their own architecture (thereby forcing others to conform to it), but not share the implementation, is simply driving competitive advantage for themselves.

This is why I think you saw such quick refutation of supposed support for Cloudware when the erroneous Forbes article was published about the effort. (Again, this was not 3TERA's fault, and Bert should not be blamed for this error.) The other cloud management and provider platform companies are rightfully eyeing this with some skepticism, many saying outright that they have faith that the standard will appear through market forces.

I almost hate to write this post, because it inevitably reflects badly on 3TERA, and I am actually a huge admirer of their product marketing. Bert set the stage for "private clouds"--though he didn't use that term--even when my employer at the time had a perfectly viable solution, but was struggling to find the right message for the right audience. Their demonstration of moving an entire virtual data center with a single command can't be beat. As far as I know, they have had great success (relative to others) in the hosting space, but have not yet penetrated the larger enterprises (though they are trying). In truth, the hosting story alone is why I think they are the only ones that can claim some portability for end customers.

(There are reportedly issues with the scalability of the platform, but I have no proof of that other than 2nd hand information from former Cassatt customers that tried and rejected 3TERA. Besides, scalability issues can always be fixed in future releases.)

Cloudware, however, bugs me to no end, and I hope 3TERA will either turn it into a legitimate open source project (based on the AppLogic code) or spare us the pain of vendor brinkmanship and offer Cloudware as an AppLogic specific framework, but not an open standard.

By the way, I would have had that standards body diplomat role if I had been willing to move north...

Update 7/8/2008: William Vambenepe points out in the comments below that there is an existing set of threads about Cloudware on his blog and John William's blog. The comments to these blogs are worth a read, as they lay out the debate from all sides.

Sunday, July 06, 2008

Which Sun Do You Orbit?

I love cloud computing. I love the concept, I love many of the implementations, and I love the opportunity that such a major disruption creates for entrepreneurs and tech giants alike. There is much to be excited about, though the market is in its infancy.

Or markets, if you look closely. Simon noted that at his Opscon presentation this year he ended up on the receiving end of an extended diatribe from a gentleman who was arguing determinedly that software would never be portable between Amazon EC2 and Google AppEngine (which is probably very true). Simon's response was right on the money:

"I must admit I was somewhat perplexed at why this person ever thought they would and why they were talking to me about it. I explained my view but I also thought that I'd reiterate the same points here.

From the ideas of componentisation, the software stack contains three main stable layers of subsystems from the application to the framework to hardware. This entire software stack is shifting from a product to a service based economy (due to commoditisation of IT) and this will eventually lead to numerous competitive utility computing markets based upon open sourced standards at the various layers of this stack.

These markets will depend upon substitutability (which includes portability and interoperability) between providers. For example you might have multiple providers offering services which match the open SDK of Google App Engine or another market with providers matching Eucalyptus. What you won't get is substitutability from one layer of the stack (e.g. the hardware level where EC2 resides) to another (e.g. the framework level where GAE resides). They are totally different things: apples and pears."

I want to take Simon's "stack" theory and refine it further. Look at the layers of the stack, and note that there appears to be a relatively small number of companies in each that can actually drive a large following to their particular set of "standards". In the platform space, of course Google's Python-focused (for now) restricted library set is where much of the focus is, but no one has counted out Bungee Labs yet, nor is anyone ignoring what Yahoo might do in this space. Each vendor has their framework (as Simon rightly calls the platform itself), but each has a few followers building tools, extensions, replications and other projects aimed at both benefiting from and extending the benefits of the platform. The diagram below identifies many of the current central players, or "suns", that exist in each technology stack today:

Credit: Kent Langley, ProductionScale

I call these communities of central players and satellites "solar systems" (though perhaps it would be more accurate to call them "nodes and edges", as we will see later).

In each solar system--say the Google AppEngine solar system--you will find an enthusiastic community of followers who thoroughly learn the platform, push its limits, and frequently (though not in every case) find economic and productivity benefits that keep them coming back. Furthermore, the most successful satellite projects will attract their own satellites, and an ever changing environment will form, though the original central players will likely maintain their role for decades (basically until the market is disrupted by an even better technical paradigm).

You already see a very strong Amazon system forming. RightScale, Enomalism and ELASTRA are all key satellites to AWS's sun. Now you are even starting to hear about satellites of satellites in that space, such as GigaSpace's partnership with RightScale. However, if you look closely at this system, you begin to see the breakdown in the strict interpretation of this analogy, as several of the players (CohesiveFT's ElasticServer On-Demand, for example) are starting to address multiple suns in a particular "stack". Thus my earlier comment that perhaps a nodal analogy is somewhat better.

The key here is that for some time to come, technologies created for the cloud will be attached to one or two so-called solar systems in the stack the technology addresses. Slowly standards will start to appear (as one solar system begins to dominate or subsume the others), and eventually the stack will play as a commodity market, though (I would argue) still centered around one key player. By the time this happens, some cross pollination of the stacks themselves will start to happen (as has already happened with the prototype of GAE running in EC2), at which point new gaps in standards will be identified. This is going to take probably two decades to play out entirely, at which point the cloud market will probably already face a major disruptive alternative (or "reinvention").

I say this not to be cynical nor to pontificate for pontification's sake. I say this because I believe developers are already starting to choose their "solar system", and thus their technological options are already being dictated by which satellite technologies apply to their chosen sun. Recognizing this as OK, in fact natural to the process, and acknowledging that religious wars between platforms--or at least stacks--are kind of pointless, will make for a better climate to accelerate the consolidation of technical platforms into a small set of commodity markets. Then the real fun begins.

Of course, I'm a big fan of religious wars myself...

Thursday, July 03, 2008

Is Amazon Google's biggest threat?

This is a bit of a stretch, but a Greg Linden post, "Amazon page recommendations", suggests that Amazon may be offering a new service soon, one that could turn Amazon into one of the core information sources on the Internet. Greg points to a post by Brady Forrest of O'Reilly Radar that outlines the service in more detail.

From Brady's post:

"Amazon is turning its personalization engine towards webpages. You can test it on your site via the new Page Recommender Widget (sorry if the link doesn't work you, it's only open to affiliates). The widget only considers pages on your website. As you can see from the screenshot above, it shows a combination of products and webpages.

Amazon provides the following info:

In order to generate page recommendations, the Page Recommender Widget must be placed on every page of your site that you'd like to be recommended. Page recommendations will appear in the widget over time, as Amazon analyzes traffic patterns on your site. You'll typically see recommendations for your most popular pages first, with the remainder of your site filling in over time. The length of this time depends on the characteristics of your web site. During this period, we'll still display individually targeted Amazon products in the widget.

The widget learns from your visitors and how they move through your site. If you only have a couple of pages the widget won't do much for you. I do not know if the widget restricts recommended pages to the same domain or if all of an affiliate ID's sites will be included. I wonder if a visitor's Amazon history will be used by the Recommendation Engine."

Brady goes on to theorize that this may be the beginnings of a new recommendation web service from Amazon, and I think he may be on to something. Amazon has perhaps the most sophisticated usage tracking software out there on its retail sites, and no one really cares because the data is used to enhance the shopping experience so much. I can imagine that a service which allows any site to determine context and preferences for any given user (or at least the users with Amazon IDs) would be highly profitable.

Now the stretch. Is this building up an extension of human knowledge that not only tracks what information a user seeks, but what they actually use? Given that, is there the long term potential to beat Sergey and Larry to a specialized brain extension, as described in Nick Carr's The Atlantic article, "Is Google Making Us Stupid?":

"Where does it end? Sergey Brin and Larry Page, the gifted young men who founded Google while pursuing doctoral degrees in computer science at Stanford, speak frequently of their desire to turn their search engine into an artificial intelligence, a HAL-like machine that might be connected directly to our brains. “The ultimate search engine is something as smart as people—or smarter,” Page said in a speech a few years back. “For us, working on search is a way to work on artificial intelligence.” In a 2004 interview with Newsweek, Brin said, “Certainly if you had all the world’s information directly attached to your brain, or an artificial brain that was smarter than your brain, you’d be better off.” Last year, Page told a convention of scientists that Google is “really trying to build artificial intelligence and to do it on a large scale.”"
Now imagine Amazon actually anticipating your interests before you even realize them consciously simply by tracking the context in which you "live" online. Is that AI enough for you?

Now, Google does some similar tracking with its Web History service, so I'm probably way off here. However, I get the sense that Amazon Web Services is pushing Amazon to think in terms of a larger vision, one in which it plays a central part in any and all commercial activities on the web, making it the smartest marketing machine on the planet--smarter in that sense than even the mighty Google itself.

Monday, June 30, 2008

Why cloud computing doesn't get us out of the woods yet...

Jesse Robbins (a modern Renaissance man if ever there was one) quoted from a post by Theo Schlossnagle, author of Scalable Internet Architectures and President and CEO of OmniTI, in which Schlossnagle notes the challenge brought by highly popular sites linking to average traffic sites, and its implications for scalable Internet architectures, including cloud computing.

As he carefully documents (using his blog site and two events triggered by Digg and the New York Times respectively), the nature of "spikes" in the Internet has changed dramatically. First, he shows a graph of traffic to his blog over a two day period in March of 2008. Then he goes on to point out:

"What isn't entirely obvious in the above graphs? These spikes happen inside 60 seconds. The idea of provisioning more servers (virtual or not) is unrealistic. Even in a cloud computing system, getting new system images up and integrated in 60 seconds is pushing the envelope and that would assume a zero second response time. This means it is about time to adjust what our systems architecture should support. The old rule of 70% utilization accommodating an unexpected 40% increase in traffic is unraveling. At least eight times in the past month, we've experienced from 100% to 1000% sudden increases in traffic across many of our clients."
Stop and pay attention to that. The onset of traffic to near peak levels can take place in less than 60 seconds!

Sure, you can get "unlimited" capacity from an Amazon/Mosso/GoGrid/whatever on demand, but can you provision that extra capacity fast enough to meet demand? Clearly automation is not enough to guarantee that you will never lose a user. If this is critical to you, then running at a reduced utilization is probably the only really good answer. (Another possibility is implementing "warm" systems that primarily do another task, but that can be enslaved into a high traffic situation with little or no manual intervention--and don't require a reboot.)

I'm not sure I have a great answer for what to do here, but I think anyone who buys into "capacity on demand" should know that this capacity takes time to allocate, and that demand may outstrip supply for seconds or minutes. Nothing about the cloud can avoid the trade-off between utilization and "reactability".
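
The arithmetic behind that trade-off is easy to sketch. The Python below is a rough illustration only: it assumes capacity is a single number and that load scales linearly with traffic, which real systems rarely do. The function names are mine.

```python
def utilization_after_spike(current_utilization, increase_pct):
    """Utilization after traffic grows by increase_pct percent (linear load assumed)."""
    return current_utilization * (1 + increase_pct / 100)


def max_safe_utilization(increase_pct):
    """Highest steady-state utilization that absorbs the spike without passing 100%."""
    return 1 / (1 + increase_pct / 100)


# The old rule of thumb: 70% utilization accommodates a 40% increase.
print(f"70% + 40% spike -> {utilization_after_spike(0.70, 40):.0%}")

# The spikes Theo describes: 100% to 1000% sudden increases.
for spike in (100, 1000):
    print(f"{spike}% spike -> run at or below {max_safe_utilization(spike):.1%}")
```

Under those assumptions, 70% utilization lands at 98% after a 40% increase, which is why the old rule worked. Surviving a 100% spike with no time to provision means idling at 50%, and a 1000% spike means running at about 9%.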

Update: The folks at Project Caroline ran some simple tests, and feel they would be ready for this scenario. Ron Mann spells it all out for you, and it is an interesting read.

Tuesday, June 24, 2008

"Follow the Law" Meme Hits the Big Time

A few days ago, I checked in to my w3counter dashboard to see who was linking to my blog, and I discovered a very intelligent continuation of the "Follow the Law Computing" meme written by Greg Ness (also found on his blog). Greg's addition of the "spice trails" analogy was something new to me, and raised some interesting thoughts about what the historical significance of the cloud will be to worldwide wealth distribution. There certainly has been a limited but significant wealth effect created by the Internet itself, but will the ability to physically move data and/or compute loads accelerate these trends?

Noting that I should blog about this on the plane at some point during my trip to Austin this week, I dutifully bookmarked the article for later. I had no chance to look at traffic on Monday, so it was with great shock that when I got on line this morning I saw a hockey stick graph. I investigated, and then my heart skipped a beat.

As of now, today, quotes from my "Follow the Law" post make up Nick Carr's latest post. Nick weaves together the work of Bill Thompson (which I also reference), myself and Greg to provide a clear, concise discussion of the concept of what he calls "itinerant computing". (Damn, he's good at coining these terms, isn't he?)

Ever since I discovered Nick's blog early in my career at Cassatt, I've wanted to get his attention. The Big Switch was an eye opening read--if only because it served as a good counterpoint to Bill Coleman's optimistic vision. He made me look at utility computing and cloud computing with a more critical eye, and I wanted to add to his body of knowledge. I am honored to have done so in a small way.

Surprisingly, though, that wasn't the whole hockey stick trigger. Greg's post was picked up by a site called Seeking Alpha, a site I must admit I had never heard of before. Apparently a high traffic investment site (connected to Jim Cramer?), Seeking Alpha drove a record traffic load to my humble blog through a rebroadcast of Greg's post. Rereading that, I noticed that there is a very strong business message there that may in fact be the actual historical significance of "itinerant computing": the flow of data and computing is simply an enabler of new business models and competitive advantages that change the face of global wealth. Being a resident of what is essentially a suburb of the Silicon Valley, I can't help but think there is more downside than upside to that story.

Finally, as I looked at the other referrers to this blog, I found an excellent summary of all of the "Follow" computing options: Follow the Sun, Follow the Moon and Follow the Law. Kevin Kelly gives very good basic definitions of each concept, and then makes the following observation:

"Most likely different industries adopt a different scenario. Maybe financial follows the moon, while commerce follows the sun, and entertainment follows the law. A single computing environment (One Machine) should not suggest homogeneity. A meadow is not homogeneous, but its does act as a coherent ecological system.

Another way to dissect the daily rhythm of the One Machine is to trace the three distinct waves of energy, data, and computation as they flow through the planetary "cloud." Each probably has its own pathways."

Amen, brother. I'll go even further. Maybe the customer service systems of a financial company follow the sun, the analytics systems follow the moon, and the trading systems follow the law. I do not mean to suggest at all that every distributed compute task will benefit from follow the law concepts. In fact, I would suggest that there are other "Follow" options that will be created over the coming decades.

All of this leads to the question of software fluidity...

Sunday, June 22, 2008

"Follow the Law Computing" on Google Groups: Cloud Computing

Not long after my post outlining my theory of an unexplored economic concern for moving compute loads in a cloud computing environment, a discussion popped up on the Google Groups Cloud Computing group. In this thread, which started out covering BI issues in the cloud, the question of moving data to computing versus moving computing to the data came up. It is a priceless thread, and one that showed me that I have not been the only one thinking about the technology of migrating workloads in the cloud.

The first message that popped out at me was one by Chuck Wegrzyn, apparently of Twisted Storage:

"How does the "cloud" protect data going from the owner to the computing service without being compromised (read that as sniffed)? Will a computing service in country A have the right to impose restrictions on data from another country (even if the results of the computing don't affect the citizens of country A)? An so on."

He goes on to say, in a separate message:

"While I think trans-national data movement will be an area that requires governance of some kind I think that companies can get around the problem in other ways. I think it just requires looking at the problem in a different way.

I'd think the approach is to keep the data still and move the computing to it. The idea is to see the thousands of machines it takes to hold the petabytes worth of data as the compute cloud. What needs to move to it is the programs that can process the data. I've been working on this approach for the last 3 years (Twisted Storage)."

Bingo! This is what I think is going to start happening as well. Move compute loads to where the legal and regulatory environment is most favorable, and leave the (highly contentious) data where it is.

Khaz Sapenov even has a name for this pattern:

"This is valid approach, that I personally called "Plumber Pattern", when application, encapsulated in some kind of container (e.g. virtual machine image) is marshalled to secure data islands to iteratively do its unique work (say, do a matches on some criterium in Interpol, FBI, CIA, MI5 and other databases, all distributed across continents). Due to utterly confidential nature of these types of data, it is impossible to move them to public storage (at least this time). Above-mentioned case might be extrapolated to some lines of business as well with reduced privacy/security requirements."

I have no idea where the term "plumber" comes into this, but it somehow seems to work. More importantly, Khaz gives an excellent use case for a compute problem where the data cannot move for legal and national security reasons, but an authorized (or unauthorized--gulp) software stack could move from data center to data center to compute an aggregate report.
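
For the code-minded, here is a rough sketch of how I read Khaz's pattern: the job travels to each data island, runs against the local records, and only a small aggregate comes back. Everything in it (`DataIsland`, `count_matches`, `aggregate_report`, the sample records) is made up for illustration and is not taken from the thread or from any real product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]
Job = Callable[[List[Record]], int]


@dataclass
class DataIsland:
    """Hypothetical site whose records are never copied out."""
    name: str
    records: List[Record]

    def run(self, job: Job) -> int:
        # The job executes where the data lives; only its return value leaves.
        # A real system would also have to enforce that the job cannot
        # smuggle raw records out. This sketch only assumes it.
        return job(self.records)


def count_matches(predicate: Callable[[Record], bool]) -> Job:
    """Build a job that counts local records satisfying the predicate."""
    def job(records: List[Record]) -> int:
        return sum(1 for r in records if predicate(r))
    return job


def aggregate_report(islands: List[DataIsland], job: Job) -> Dict[str, int]:
    """Send the same job to every island and collect the per-island counts."""
    return {island.name: island.run(job) for island in islands}


# Made-up sample data standing in for two confidential databases.
islands = [
    DataIsland("agency_a", [{"name": "smith"}, {"name": "jones"}]),
    DataIsland("agency_b", [{"name": "smith"}, {"name": "smith"}]),
]

report = aggregate_report(islands, count_matches(lambda r: r["name"] == "smith"))
total = sum(report.values())
```

The point of the design is that what crosses a border is the program and a handful of counts, never the records themselves.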

Marc Evans even points out that we already have some open source compute algorithms that can serve as a starting point to address these problems:

"In my experiences(sic), there are cases where having the data / computation as close to the customer edge as possible is what is required for an acceptable user experience. In other cases, the relationship of the user / data / computation is not important. Most often, there is a mix of both. One of the ideas behind Hadoop as I understand it is to bring the computation to the data location, while also providing for the data to be in several locations. The scheduler is critical to making good use of data locality. So yes, I believe that what you are looking for does exist within Hadoop at a minimum, though I also believe that there is alot of room to evolve the techniques that it uses."

Jim Peters then asks a simple, but loaded question:

"Even if the cloud providers come up with excellent answers to the security and reliability questions, who's going to trust them? Credit card numbers are one thing, but cloud data is something else entirely."

At this point, Ray Nugent adds what I think is the quintessential economic consideration:

"Security is really a business issue. Each layer of security should cost no more than the data is worth. So the concept of "secure enough" becomes important. What security is appropriate for a given type of data and is it more or less secure in the cloud than in the corp DC? Is data inherently "less secure" by virtue of being in the cloud than, say, an employees laptop or flash dongle or "on the wire"? I don't think corporate data centers are a secure as you're suggesting they are..."

"Secure enough" is, I think, where it's at. Perhaps a new term is needed: "Avoid the Risk Computing"?

Anyway, the discussion goes on from there, and I suggest you read the thread yourself. This is a key topic for cloud computing, and I think there is a good chance that one or more of the biggest technology companies of the early to mid 21st century will be hatched from discussions like these.

(This group, by the way, is absolutely awesome, and each thread is packed with intelligent and insightful messages. If you care about cloud computing, you need to join.)