Showing posts with label software fluidity. Show all posts

Wednesday, July 16, 2008

Watch out for Cisco, kids!

What is the most important enabler of distributed computing architectures, such as cloud oriented architectures? What is the one thing that has to be in ample supply before the other elements of the data center come into play? Is it the number of servers or CPU power available for computing? Is it the size and speed of the disks and network storage devices? Is it the distributed software architectures themselves?

My answer? None of the above. It's network bandwidth, baby, all the way.

Why? Well, let's break down where the costs of distributed systems lie. We all know that CPU capabilities double roughly every couple of years, and we also know that disk I/O slows those CPUs down, but not at the rate that network I/O typically does. When designing distributed systems, you must first be aware of network latency and control traffic between components to have any chance in heck of meeting rigorous transaction rate demands. The old rule at Forte Software, for what it's worth, was:

  • First reduce the number of messages as much as possible
  • Then reduce the size of those messages as much as possible
Increase adherence to those rules, and your software would outperform less optimized applications every time. It was easy to look like a performance tuning genius in those days.
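
A rough way to see why the rules come in that order is to model transfer time as a fixed latency per message plus payload size divided by bandwidth. The sketch below is only an illustration: the function name, the 1 ms latency and the 100 MB/s bandwidth are assumptions made up for the example, not numbers from Forte.

```python
def transfer_seconds(messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Estimate the time to send `messages` sequential messages.

    Simplified model (an assumption for illustration): every message pays
    one fixed network latency plus its payload size divided by bandwidth.
    """
    per_message = latency_s + bytes_per_message / bandwidth_bytes_per_s
    return messages * per_message


# Hypothetical network: 1 ms latency, 100 MB/s of usable bandwidth.
LATENCY_S = 0.001
BANDWIDTH = 100_000_000

# The same 1 MB of payload, sent as 1,000 small messages or as one batch.
chatty = transfer_seconds(1000, 1_000, LATENCY_S, BANDWIDTH)
batched = transfer_seconds(1, 1_000_000, LATENCY_S, BANDWIDTH)

# Rule two: once the message count is down, shrink the payload.
batched_and_smaller = transfer_seconds(1, 500_000, LATENCY_S, BANDWIDTH)
```

Under these assumed numbers the chatty version takes about a second, nearly all of it latency, while the single batched message takes about 11 ms. Halving the payload only becomes worth the effort after the message count has been reduced.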

What is exciting about today's environment, however, is that network technology is changing rapidly. Bandwidth is increasing quickly (though not as fast as CPU speeds), and this high speed bandwidth is becoming more ubiquitous worldwide. Inter-data-center speeds are increasingly mind boggling, and WAN optimization apparently has removed much of the fear of moving real-time traffic between geographically disparate environments.

All of this is a huge positive to cloud oriented architectures. When you design for the cloud, you want to focus on a few key things:
  • Software fluidity - The ability of the software to run cleanly in a dynamic infrastructure, where the server, switch port, storage and possibly even the IP address changes day by day or minute by minute.

  • Software optimization - Because using a cloud service costs money, whether billed by the CPU hour, the transaction or the number of servers used, you want to be sure you are getting your money's worth when leveraging the cloud. That means both optimizing the execution profile of your software, and the use of external cloud services by the same software.

  • Scalability - This is well established, but clearly your software must be able to scale to your needs. Ideally, it should scale infinitely, especially in environments with highly unpredictable usage volume (such as the Internet).

Achieving any of these in an environment where your network bandwidth is constricting your options is nearly impossible.
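
To make the fluidity point concrete, here is a minimal sketch of one habit that helps: clients look up a service's location by logical name on every call instead of remembering an IP address. The `ServiceRegistry` class is a hypothetical in-memory stand-in for DNS or a service directory, and the names and addresses are invented for the example.

```python
class ServiceRegistry:
    """Hypothetical in-memory stand-in for DNS or a service directory."""

    def __init__(self):
        self._addresses = {}

    def register(self, name, host, port):
        # Called by the infrastructure whenever a service lands somewhere new.
        self._addresses[name] = (host, port)

    def resolve(self, name):
        # Looked up on every call; the client never caches the address.
        return self._addresses[name]


def endpoint_for(registry, name):
    """Build the URL a client would call, using the service's current location."""
    host, port = registry.resolve(name)
    return f"http://{host}:{port}/"


registry = ServiceRegistry()
registry.register("orders", "10.0.0.5", 8080)
before = endpoint_for(registry, "orders")

# The infrastructure moves the service; the client code does not change.
registry.register("orders", "10.0.7.21", 8080)
after = endpoint_for(registry, "orders")
```

The design choice is simply that the only thing the client hard-codes is the logical name "orders"; everything that can change minute by minute lives behind the lookup.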

Oh, and one more thing. The network is the first element of your data center that sees load, failure and service level compliance. Think about it--without the eyes of the network, all of your other data center elements become black boxes (though often physical ones, with those annoying beeps and little blinking orange lights). What are the nerves in the data center nervous system? Network cables, I would say.

Today I saw two really good posts about possible network trends driven by the cloud, and how Cisco's new workhorse leverages "virtualized" bandwidth and opens the door to commodity cloud capacity. The first is a post by Douglas Gourlay of Cisco, which simply looks at the trends that got us to where we are today, and further trends that will grease the skids for commodity clouds. I am especially interested in the following observations:
"8) IP Addressing will move to IPv6 or have IPv4 RFCs standardized that allow for a global address device/VM ID within the addressing space and a location/provider sensitive ID that will allow for workload to be moved from one provider to another without changing the client’s host stack or known IP address ‘in flight’. Here’s an example from my friend Dino.

9) This will allow workload portability between Enterprise Clouds and Service Provider Clouds.

10) The SP community will embrace this and start aggressively trying to capture as much footprint as possible so they can fill their data centers to near capacity allowing for them to have the maximum efficiency within their operation. This holds to my rule that ‘The Value of Virtualization is compounded by the number of devices virtualized’.

11) Someone will write a DNS or a DNS Coupled Workload exchange. This will allow the enterprise to effectively automate the bidding of workload allocation against some number or pool of Service Providers who are offering them the compute, storage, and network capacity at a given price. The faster and more seamless the above technologies make the shift of workload from one provider to another the simpler it is in the end for an exchange or market-based system to be the controlling authority for the distribution of workload and thus $$$’s to the provider who is most capable of processing the workload."

The possibility that IP addresses could successfully travel with their software payloads is incredibly powerful to me, and I think would change everything for both "traditional" VM users, as well as the virtual appliance world. The possibility that my host name could travel with my workload, even as it is moved in real time from one vendor to another is, of course, cloud computing nirvana. To see someone who obviously knows something about networking and networking trends spell out this possibility got my attention.

(Those who see a fatal flaw in Doug's vision are welcome to point it out in the comments section below, or on Doug's blog.)
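
Doug's item 11, the workload exchange, is at heart a matching problem: given bids from several providers, pick one for each workload. A minimal sketch of that selection step, assuming the only criteria are price and available capacity, might look like the following. The `Bid` type, the provider names and the prices are all hypothetical, and a real exchange would also have to weigh service levels, location and trust.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    provider: str              # hypothetical provider name
    price_per_cpu_hour: float  # asking price in dollars
    cpus_available: int        # capacity the provider is offering


def pick_provider(bids, cpus_needed):
    """Return the cheapest bid that can hold the whole workload, or None."""
    eligible = [b for b in bids if b.cpus_available >= cpus_needed]
    if not eligible:
        return None
    return min(eligible, key=lambda b: b.price_per_cpu_hour)


bids = [
    Bid("provider-a", 0.12, 64),
    Bid("provider-b", 0.09, 16),
    Bid("provider-c", 0.10, 128),
]
winner = pick_provider(bids, cpus_needed=32)
```

Here provider-b is cheapest but too small for a 32-CPU workload, so the workload goes to provider-c.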

The second post is from Hurwitz analyst, Robin Bloor, who describes in brilliant detail why Cisco's Nexus 7000 series is different, and why it could very well take over the private cloud game. As an architecture, it essentially makes the network OS the policy engine for controlling provisioning and load balancing, though with bandwidth speeds that blow away today's standards (10G today, but room for 40G and 100G standards in the future). Get to those speeds, and all of a sudden something other than network bandwidth is your restricting function in scaling a distributed application.

I have been cautiously excited about the Nexus announcement from the start. Excited because the vision of what Nexus will be is so compelling to me, for all of the reasons I describe above. (John Chambers, CEO of Cisco, communicates that vision in a video that accompanied the Nexus 5000 series launch.) Cautious, because it reeks of old-school enterprise sales mentality, with Cisco hoping to "own" whole corporate IT departments by controlling both how software runs, and what hardware and virtualization can be bought to run it on. Lock-in galore, and something the modern, open source aware corporate world may be a little uneasy about.

That being said, as Robin put it, "In summary: The network is a computer. And if you think that’s just a smart-ass bit of word play: it’s not."

Robin further explains Cisco's vision as follows:

"Cisco’s vision, which can become reality with the Nexus, is of a data center that is no longer defined by computer architecture, but by network architecture. This makes sense on many levels. Let’s list them in the hope of making it easier to understand.

  1. Networks have become so fast that in many instances it is practical to send the the data to the program, or to send the program to the data, or to send both the program and the data somewhere else to execute. Software architecture has been about keeping data and process together to satisfy performance constraints. Well Moore’s Law reduced the performance issue and Metcalfe’s Law opened up the network. All the constraints of software architecture reduced and they continue to reduce. Distributing both software and data becomes easier by the year.
  2. Software is increasingly being delivered as a service that you connect to. And if it cannot deliver the right performance characteristics in the place where it lives, you move it to a place where it can.
  3. Increasingly there is more and more intelligence being placed on the switch or on the wire. Of course Cisco has been adding intelligence to the switch for years. Those Cisco firewalls and VPNs were exactly that. But also, in the last 5 years, agentless sotware (for example some Intrusion Detection products) has become prominent. Such applications simply listen to the network and initiate action if they “don’t like what they hear”. The point is that applications don’t have to live in server blade cabinets. You can put them on switches or you could put them onto server boards that sit in a big switch cabinet. They’re very portable.
  4. The network needs an OS (or NOS). Whether Cisco has the right OS is a point for debate, but the network definitely needs an OS and the OS needs to perform the functions that Cisco’s NX-OS carries out. It also needs to do other things to like optimize and load balance all the resources in a way that corresponds to the service level needs of the important business transactions and processes it supports. Personally, I do not see how that OS can do anything but span the whole network - including the switches."
Would all applications run this way? Probably not. But those mission critical, highly distributed, performance-is-everything apps you provide for your customers, or partners, or employees, or even large data sets, are extremely good candidates for this way of thinking.

Oh, and I wouldn't be surprised if Google, Microsoft, et al. agreed (though not necessarily as Cisco customers).

Does Nexus work? I have no idea. But I am betting that, as private clouds are built, the idea that servers are the center of the universe will be tested greatly, and the incredibly important role of the network will become more and more apparent. And when it does, Cisco may have positioned themselves to take advantage of the fun that follows.

It's just too bad that it is another closed source, single-vendor offering that will probably take 5-7 years (minimum) to replicate in the open source world. At the very least, I hope Cisco is paying attention to Doug's observation that:
"[T]here will be a standardization of the hypervisor ‘interface’ between the VM and the hypervisor. This will allow a VM created on Xen to move to VMWare or Hyper-V and so on."
I hope they are openly seeking to partner with OVF or another virtualization/cloud standard to ensure portability to and from Nexus.

However, I would rather have this technology in a proprietary form than not at all, so way to go Cisco, and I will be watching you closely--via the network, of course.

Tuesday, June 24, 2008

"Follow the Law" Meme Hits the Big Time

A few days ago, I checked in to my w3counter dashboard to see who was linking to my blog, and I discovered a very intelligent continuation of the "Follow the Law Computing" meme written by Greg Ness (also found on his blog). Greg's addition of the "spice trails" analogy was something new to me, and raised some interesting thoughts about what the historical significance of the cloud will be to worldwide wealth distribution. There certainly has been a limited but significant wealth effect created by the Internet itself, but will the ability to physically move data and/or compute loads accelerate these trends?

Noting that I should blog about this on the plane at some point during my trip to Austin this week, I dutifully bookmarked the article for later. I had no chance to look at traffic on Monday, so it was with great shock that when I got on line this morning I saw a hockey stick graph. I investigated, and then my heart skipped a beat.

As of now, today, quotes from my "Follow the Law" post make up Nick Carr's latest post. Nick weaves together the work of Bill Thompson (which I also reference), myself and Greg to provide a clear, concise discussion of the concept of what he calls "itinerant computing". (Damn, he's good at coining these terms, isn't he?)

Ever since I discovered Nick's blog early in my career at Cassatt, I've wanted to get his attention. The Big Switch was an eye opening read--if only because it served as a good counterpoint to Bill Coleman's optimistic vision. He made me look at utility computing and cloud computing with a more critical eye, and I wanted to add to his body of knowledge. I am honored to have done so in a small way.

Surprisingly, though, that wasn't the whole hockey stick trigger. Greg's post was picked up by a site called Seeking Alpha, a site I must admit I had never heard of before. Apparently a high traffic investment site (connected to Jim Cramer?), Seeking Alpha drove a record traffic load to my humble blog through a rebroadcast of Greg's post. Rereading that, I noticed that there is a very strong business message there that may in fact be the actual historical significance of "itinerant computing": the flow of data and computing is simply an enabler of new business models and competitive advantages that change the face of global wealth. Being a resident of what is essentially a suburb of the Silicon Valley, I can't help but think there is more downside than upside to that story.

Finally, as I looked at the other referrers to this blog, I found an excellent summary of all of the "Follow" computing options: Follow the Sun, Follow the Moon and Follow the Law. Kevin Kelly gives very good basic definitions of each concept, and then makes the following observation:

"Most likely different industries adopt a different scenario. Maybe financial follows the moon, while commerce follows the sun, and entertainment follows the law. A single computing environment (One Machine) should not suggest homogeneity. A meadow is not homogeneous, but its does act as a coherent ecological system.

Another way to dissect the daily rhythm of the One Machine is to trace the three distinct waves of energy, data, and computation as they flow through the planetary "cloud." Each probably has its own pathways."

Amen, brother. I'll go even further. Maybe the customer service systems of a financial company follow the sun, the analytics systems follow the moon, and the trading systems follow the law. I do not mean to suggest at all that every distributed compute task will benefit from follow the law concepts. In fact, I would suggest that there are other "Follow" options that will be created over the coming decades.

All of this leads to the question of software fluidity...

Sunday, June 22, 2008

"Follow the Law Computing" on Google Groups: Cloud Computing

Not long after my post outlining my theory of an unexplored economic concern for moving compute loads in a cloud computing environment, a discussion popped up on the Google Groups Cloud Computing group. In this thread, which started out covering BI issues in the cloud, the question of moving data to computing versus moving computing to the data came up. It is a priceless thread, and one that showed me that I have not been the only one thinking about the technology of migrating workloads in the cloud.

The first message that popped out at me was one by Chuck Wegrzyn, apparently of Twisted Storage:

"How does the "cloud" protect data going from the owner to the computing service without being compromised (read that as sniffed)? Will a computing service in country A have the right to impose restrictions on data from another country (even if the results of the computing don't affect the citizens of country A)? An so on. "
He goes on to say, in a separate message:
"While I think trans-national data movement will be an area that requires governance of some kind I think that companies can get around the problem in other ways. I think it just requires looking at the problem in a different way.

I'd think the approach is to keep the data still and move the computing to it. The idea is to see the thousands of machines it takes to hold the petabytes worth of data as the compute cloud. What needs to move to it is the programs that can process the data. I've been working on this approach for the last 3 years (Twisted Storage). "
Bingo! This is what I think is going to start happening as well. Move compute loads to where the legal and regulatory environment is most favorable, and leave the (highly contentious) data where it is.

Khaz Sapenov even has a name for this pattern:
"This is valid approach, that I personally called "Plumber Pattern", when application, encapsulated in some kind of container (e.g. virtual machine image) is marshalled to secure data islands to iteratively do its unique work (say, do a matches on some criterium in Interpol, FBI, CIA, MI5 and other databases, all distributed across continents). Due to utterly confidential nature of these types of data, it is impossible to move them to public storage (at least this time). Above-mentioned case might be extrapolated to some lines of business as well with reduced privacy/security requirements. "
I have no idea where the term "plumber" comes into this, but it somehow seems to work. More importantly, Khaz gives an excellent use case for a compute problem where the data cannot move for legal and national security reasons, but an authorized (or unauthorized--gulp) software stack could move from data center to data center to compute an aggregate report.
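
Here is a minimal sketch of the pattern as I read Khaz's description: the same piece of logic visits each data island in turn, and only an aggregate result leaves each island. Everything in it is an assumption for illustration. The island names and records are invented, and a plain function call stands in for what would really be a container or VM image being shipped to the data.

```python
def count_matches(records, predicate):
    """Runs *inside* a data island; only the count leaves the island."""
    return sum(1 for record in records if predicate(record))


def plumber_run(islands, predicate):
    """Visit each island in turn and collect one aggregate per island.

    `islands` maps a (hypothetical) island name to its local records.
    The raw records are never copied out; only the counts are returned.
    """
    return {name: count_matches(records, predicate)
            for name, records in islands.items()}


islands = {
    "island-eu": [{"name": "smith"}, {"name": "jones"}],
    "island-us": [{"name": "smith"}, {"name": "smith"}, {"name": "lee"}],
}
report = plumber_run(islands, lambda record: record["name"] == "smith")
total = sum(report.values())
```

The point of the shape is that what crosses each border is a few bytes of program going in and a few bytes of answer coming out, never the contentious data itself.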

Marc Evans even points out that we already have some open source compute algorithms that can serve as a starting point to address these problems:
"In my experiences(sic), there are cases where having the data / computation as close to the customer edge as possible is what is required for an acceptable user experience. In other cases, the relationship of the user / data / computation is not important. Most often, there is a mix of both. One of the ideas behind Hadoop as I understand it is to bring the computation to the data location, while also providing for the data to be in several locations. The scheduler is critical to making good use of data locality. So yes, I believe that what you are looking for does exist within Hadoop at a minimum, though I also believe that there is alot of room to evolve the techniques that it uses. "
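
To show what "making good use of data locality" can mean for a scheduler, here is a toy sketch. It is not Hadoop's scheduler or API, just a greedy two-pass assignment I made up for illustration: tasks first go to a free node that already holds a replica of their data, and whatever is left over runs remotely. The block and node names are hypothetical.

```python
def assign_tasks(block_locations, free_nodes):
    """Assign each block's task to a node, preferring nodes that hold the block.

    block_locations: dict of block id -> list of nodes holding a replica.
    free_nodes: set of nodes with a free slot (one task per node here).
    Returns dict of block id -> (node, is_local).
    """
    free = set(free_nodes)
    assignments = {}
    remote = []
    # First pass: place tasks next to a replica of their data.
    for block, holders in block_locations.items():
        local = [node for node in holders if node in free]
        if local:
            free.remove(local[0])
            assignments[block] = (local[0], True)
        else:
            remote.append(block)
    # Second pass: leftover tasks run wherever a slot remains.
    for block in remote:
        if not free:
            break
        node = sorted(free)[0]
        free.remove(node)
        assignments[block] = (node, False)
    return assignments


block_locations = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
assignments = assign_tasks(block_locations, {"n1", "n2", "n4"})
```

In this run b1 grabs n1, which forces b2 to run remotely on n2 even though b1 could have used n2 instead. That is exactly the kind of room for better techniques that Marc is talking about.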
Jim Peters then asks a simple, but loaded question:
"Even if the cloud providers come up with excellent answers to the security and reliability questions, who's going to trust them? Credit card numbers are one thing, but cloud data is something else entirely. "
At this point, Ray Nugent adds what I think is the quintessential economic consideration:
"Security is really a business issue. Each layer of security should cost no more than the data is worth. So the concept of "secure enough" becomes important. What security is appropriate for a given type of data and is it more or less secure in the cloud than in the corp DC? Is data inherently "less secure" by virtue of being in the cloud than, say, an employees laptop or flash dongle or "on the wire"? I don't think corporate data centers are a secure as you're suggesting they are..."
"Secure enough" is, I think, where it's at. Perhaps a new term is needed: "Avoid the Risk Computing"?

Anyway, the discussion goes on from there, and I suggest you read the thread yourself. This is a key topic for cloud computing, and I think there is a good chance that one or more of the biggest technology companies of the early to mid 21st century will be hatched from discussions like these.

(This group, by the way, is absolutely awesome, and each thread is packed with intelligent and insightful messages. If you care about cloud computing, you need to join.)

Thursday, June 12, 2008

"Follow the law" computing

A few days ago, Nick Carr worked his usual magic in analyzing Bill Thompson's keen observation that every element of "the cloud" eventually boils down to a physical element in a physical location with real geopolitical and legal influences. This problem was first brought to my attention in a blog post by Leslie Poston noting that the Canadian government has refused to allow public IT projects to use US-based hosting environments for fear of security breaches authorized via the Patriot Act. Nick added another example with the following:

Right before the manuscript of The Big Switch was shipped off to the printer ("manuscript" and "shipped off" are being used metaphorically here), I made one last edit, adding a paragraph about France's decision to ban government ministers from using Blackberrys since the messages sent by the popular devices are routinely stored on servers sitting in data centers in the US and the UK. "The risks of interception are real," a French intelligence official explained at the time.
I hadn't thought too much about the political consequences of the cloud since first reading Nick's book, but these stories triggered a vision that I just can't shake.

Let me explain. First, some setup...

One of the really cool visions that Bill Coleman used to talk about with respect to cloud computing was the concept of "follow the moon"; in other words, moving running applications globally over the course of an earth day to where processing power is cheapest--on the dark side of the planet. The idea was originally about operational costs in general, but these days Cassatt and others focus this vision around electricity costs.

The concept of "moving" servers around the world was greatly enhanced by the live motion technologies offered by all of the major virtualization infrastructure players (e.g. VMotion). With these technologies (as you all probably know by now), moving a server from one piece of hardware to another is as simple as clicking a button. Today, most of that convenience is limited to within a single network, but with upcoming SLAuto federation architectures and standards, inter-LAN motion will be greatly simplified over the coming years.

(It should be noted that "moving" software running on bare metal is possible, but it requires "rebooting" the server image on another physical box.)

The key piece of the puzzle is automation. Whether simple runbook-style automation (automating human-centric processes) or all-out SLAuto, automation allows for optimized decision making across hundreds, thousands or even tens of thousands of virtual machines. Today, most SLAuto is blissfully unaware of runtime cost factors, such as cost of electricity or cost of network bandwidth, but once the elementary SLAuto solutions are firmly established, this is naturally the next frontier to address.
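
As a minimal sketch of what a cost-aware "follow the moon" decision could look like, the code below picks the sites where it is currently night and orders them by electricity price. The site names, UTC offsets, prices and the 22:00 to 06:00 definition of night are all assumptions for illustration; fixed offsets are used instead of real time zones to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone


def is_night(utc_now, utc_offset_hours, start=22, end=6):
    """True if the local hour at the given UTC offset falls in [start, end)."""
    # Shifting the UTC timestamp by the offset gives the local wall-clock hour.
    local = utc_now + timedelta(hours=utc_offset_hours)
    return local.hour >= start or local.hour < end


def follow_the_moon(utc_now, sites):
    """Return the names of sites where it is night now, cheapest power first.

    sites: dict of site name -> (utc_offset_hours, price_per_kwh).
    """
    dark = [(price, name) for name, (offset, price) in sites.items()
            if is_night(utc_now, offset)]
    return [name for price, name in sorted(dark)]


# Hypothetical sites with fixed UTC offsets and electricity prices.
sites = {
    "san-jose": (-8, 0.12),
    "frankfurt": (1, 0.15),
    "singapore": (8, 0.10),
}
now = datetime(2008, 6, 12, 21, 0, tzinfo=timezone.utc)
candidates = follow_the_moon(now, sites)
```

At 21:00 UTC it is 13:00 at the first site, 22:00 at the second and 05:00 at the third, so the last two qualify and the cheaper one is listed first. An SLAuto engine would run a decision like this continuously and trigger the live migration itself.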

But hold on...

As the articles I noted earlier suggest, early cloud computing users have discovered a hitch in the giddy-up: the borders and politics of the world DO matter when it comes to IT legislation.

If law will in fact have such an influence on cloud computing dynamics, it occurs to me that a new cost factor might outshine simple operations when it comes to choosing where to run systems; namely, legality itself. As businesses seek to optimize business processes to deliver the most competitive advantage at the lowest costs, it is quite likely that they will seek out ways to leverage legal loopholes around the world to get around barriers in any one country.

Now, this is just pie-in-the-sky thinking on my part, and there are 1000 holes here, but I think it's worth going through the exercise of thinking this out. The problem is complicated, as there are different laws that apply to data and the processing being done on that data (as well as, in some jurisdictions, the record keeping about both the data and the processing). However, there are technical solutions available today for both data and processing that could allow a company to mix and match the geographies that give them the best legal leverage for the services they wish to offer:
  • Database Sharding/Replication

    Conceptually, the simplest way to keep from violating any one jurisdiction's data storage or privacy laws is to not put the data in the jurisdiction. This would be hard to do, if not for some really cool database sharding frameworks being released to the community these days.

    Furthermore, replicate the data in multiple jurisdictions, but use the best-case instance of that data for processing happening in a given jurisdiction. In fact, by replicating a single data exchange into multiple jurisdictions at once, it becomes possible to move VMs from place to place without losing (read-only, at least) access to that data.

  • VMotion/LiveMotion

    From a processing perspective, once you solve legally accessing the data from each jurisdiction, you can now move your complete processing state from place to place as processing requires, without losing a beat. In fact, with networks getting as fast as they are, transfer times at the heart of the Internet may be almost as fast as on a LAN, and those times are usually measured in the low hundreds of milliseconds.

    So, run your registration process in the USA, your banking steps in Switzerland, and your gambling algorithms in the Bahamas. Or, market your child-focused alternative reality game in the US, but collect personal information exclusively on servers in Madagascar. It may still be technically illegal from a US perspective, but who do they prosecute?
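
The mix-and-match idea above boils down to a placement table: for each processing step, which jurisdictions may it run in, and which of my sites sits in one of them? Here is a minimal sketch under that assumption. The policy table, jurisdiction codes and site names are hypothetical, and deciding what actually belongs in the `allowed` table is a job for lawyers, not for code.

```python
def place_steps(allowed, sites):
    """Map each processing step to the first site in a permitted jurisdiction.

    allowed: dict of step name -> set of jurisdiction codes where it may run.
    sites: list of (site name, jurisdiction code), in order of preference.
    Raises ValueError if a step has no permitted site.
    """
    placement = {}
    for step, jurisdictions in allowed.items():
        for site, code in sites:
            if code in jurisdictions:
                placement[step] = site
                break
        else:
            # The inner loop found no site in any permitted jurisdiction.
            raise ValueError(f"no permitted site for step {step!r}")
    return placement


# Hypothetical policy table and site list, invented for illustration.
allowed = {
    "registration": {"US", "CA"},
    "payments": {"CH"},
    "reporting": {"US", "CH"},
}
sites = [("dc-zurich", "CH"), ("dc-virginia", "US")]
placement = place_steps(allowed, sites)
```

The same table could just as easily drive which shard or replica of the data a step reads from, which is where the sharding and live motion pieces meet.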

Again, I know there are a million roadblocks here, but I also know both the corporate world and underworld have proven themselves determined and ingenious technologists when it comes to these kinds of problems.

As Leslie noted, our legislators must understand the economic impact of a law meant for a physical world on an online reality. As Nick noted, we seem to be treading into that mythical territory marked on maps with the words "Here Be Dragons", and the dragons are stirring.

Friday, May 09, 2008

What Will It Take to Form a Cloud Computing Market?

A few weeks ago I joined the Google Groups Cloud Computing group as a charter member of sorts. Today there is an excellent thread going on regarding the various elements needed to make a cloud market work. It started with Reuven Cohen of Enomaly proposing the need for a new marketing term...er, technology concept...he calls a Virtual Private Cloud. The idea here is to create a logical container for a variety of resources located across multiple data centers and technology, making it all appear as a single homogeneous computing environment with security, management (SLAuto?), etc.

However, what has spawned from that original proposal is a wide ranging, but thoughtful discussion of what is needed to allow for an open market for compute resources in the cloud. Participating are several of the vendors supplying various services for Amazon EC2 today (and, I'm sure, a wide range of others in the future), and several end users of primarily "Computational Grid" technologies.

I sent several responses. Here are some highlights:

  • Mike Culver pointed out that
    "Condor (http://www.cs.wisc.edu/condor/) is a project that enables this sort of scenario. But in so doing, things go full sircle [sic], and suddenly the paradigm is the old days of mainframe computing where the notion of "job" is separate from the underlying computing resource."
    I replied
    "I hate the notion of every software executable being a "job". [With most] high availability applications running additional instances [of "permanent" processes] in excess capacity at another site is a distinctly possible scenario.

    I prefer the term "software payload" to describe what gets moved from cloud provider to cloud provider, at least at the HaaS level."
    The idea of a "job" is just not applicable to many user-facing applications.

  • Larry Ludwig of hosting provider Empowering Media notes
    "Application scaling IMHO will always involve a mixture of automated systems and programing changes to the application. I don't think this aspect of cloud computing can ever be completely automated.

    The typical "throwing hardware at it" works up to a point and in cloud computing will be no different since the a cloud system is still based upon the Von Neumann architecture. There is a point where it becomes more than a sysadmin/scaling challenge. Programming changes will need to made to the specific application. What scales to X, won't scale to Y because of a different bottleneck."
    And I agree, adding
    "I see two architectural problems to be managed when building an application for the cloud:

    - Scalability - which you cover below

    - Fluidity - which is the ability to move an application, *application tier* or service between cloud infrastructures without rewriting or reconfiguring the software payload"
    I blog extensively about software fluidity in these posts.

  • Geoffrey Fox notes that much work has been done to analyze and design Computational Grid economies. I assumed that Geoffrey was talking about grid computing models based on splitting up "jobs", so I noted:
    "The problem is partially that Computational Grid computing is a subclass of the Cloud Computing metrics and standard problem. See earlier notes about long running processes versus "jobs"."
    In other words, I'm not sure how far people have gone to analyze enterprise computing on a grid versus just HPC, etc.
There is so much more to read on this thread and others in this rapidly growing group; not the least of which will be the responses to my thoughts here. If you haven't joined yet, and you are interested in cloud computing strategy and tactics, I recommend that you get involved.

Tuesday, May 06, 2008

One advantage of utility computing infrastructure: Heisenberg's Uncertainty Principle applied to computing

I was casually browsing my Google Reader pages (which can also be followed on FriendFeed) when I came across a gem from Data Center Knowledge: apparently Peter Gabriel's web site servers were stolen from their hosting provider. All content and hardware gone, and fans left with nothing but an apology page.

Now, I'm a huge Gabriel fan, so this was interesting in part because I feel for the guy and hope nothing of great value was stored on those servers. However, my interest was piqued by the realization that this highlights one of the key values of decoupling software from hardware. To illustrate this advantage, I'd like to paraphrase Heisenberg's famous Uncertainty Principle:

In shared resource computing, you can locate the server, but you cannot firmly define what is running on the server (over time); conversely, you can define the software image, but it is difficult to firmly locate which server it is running on (over time).
Thus, if someone comes into a data center that is sharing server resources in a utility-computing-like model and steals a server, they will very likely get no data whatsoever. Conversely, if they want the data, they have to steal all of the storage associated with the server image, which in many environments is spread amongst several physical drives; is dependent on the network infrastructure in which it is running; and is useless without both a compatible server to execute it, and a compatible management system to deliver it to that server.

To me, this greatly enhances system security over dedicated server models. If Gabriel's stuff had been PXE booted on random servers around the hosting center, from distributed storage systems, he may have foiled his thief's plans. He certainly would have made it much more technically difficult for them.

The more I learn about decoupling software from hardware, whether through server virtualization or policy-based dynamic deployment, the more I think it's a no-brainer for most computing applications. Plus, it makes SLAuto possible--which has its own benefits, of course.

Saturday, May 03, 2008

Thinking about SLAuto in a frenzied cloud

I've been quite silent for a week or two, mostly because of my responsibilities as a sales engineer, doing my part in closing key deals for my employer. I've spent this time sitting in meetings, installing and configuring software, and measuring power savings in large dev/test lab installations. (By large I mean hundreds approaching thousands of servers.) All in all, it's been a successful couple of weeks, but it's kept me from keeping too close an eye on the big news coming out of the cloud and utility computing markets.

However, as I thought about this more, I realized that I have drifted significantly from my core subject, Service Level Automation (or SLAuto), in the last six months or so--mostly due to the incredible burst of cloud computing innovation to be announced and/or delivered in that time frame. I still believe that there are two key components to an open cloud market that scales:

  • Portable platforms that allow customers to change vendors on a whim
  • Automation that takes action to acquire, release or replace services based on pre-determined service targets

The latter, simply said, is SLAuto.
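To make "acquire, release or replace" a little more concrete, here is a minimal sketch of what a single policy pass might look like. Everything in it is an assumption for illustration: the `Instance` fields, the `plan_actions` name, the 200 ms response-time target and the "release when under half the target" rule are hypothetical, not taken from any particular SLAuto product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    healthy: bool
    avg_response_ms: float  # observed service level for this instance


def plan_actions(instances: List[Instance],
                 target_ms: float = 200.0,
                 release_below: float = 0.5,
                 min_instances: int = 1) -> List[str]:
    """Return the actions one policy pass would take (hypothetical policy)."""
    actions = []

    # Replace: any failed instance is swapped for a fresh one.
    for inst in instances:
        if not inst.healthy:
            actions.append(f"replace {inst.name}")

    healthy = [i for i in instances if i.healthy]
    if not healthy:
        return actions

    avg = sum(i.avg_response_ms for i in healthy) / len(healthy)
    if avg > target_ms:
        # Acquire: the service target is being missed, so add capacity.
        actions.append("acquire 1")
    elif avg < target_ms * release_below and len(healthy) > min_instances:
        # Release: comfortably under target, so give capacity back.
        actions.append("release 1")
    return actions
```

A real system would run a pass like this on a schedule and hand the resulting actions to whatever provisions capacity; the point is only that the decision is driven by a pre-determined target rather than by a person watching a dashboard.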

Of course, what is happening is sort of the nascent birth of cloud computing technologies, where the DNA hasn't had a chance to recombine to build long term survivability into any given "species" yet. We all knew that AWS was doing cool things, but who knew that they would cross the chasm in terms of customer demand as completely as they did? Yet, there is no portability story for Amazon (at least not off of Amazon); and the market forming for SLAuto (see RightScale and others) is tightly tied to the Amazon platform.

The rest of the "big" announcements are worse: Microsoft has no concept of management in Live Mesh (other than synchronization) that I can see, and Google and Yahoo are both building platforms with developers in mind, where service levels are a business agreement, not a platform differentiator. I understand we are taking baby steps here, but I wonder how long it is before corporate IT realizes that they are both a) locked in (at least in an economic sense), and b) paying too much to operate software that doesn't even run in their data center.

Now, I say all of this, but truth be told, most corporate IT shops don't do SLAuto today. So, why should this change in the cloud? I hinted at it earlier: scale. Not scale of functional execution or data access, as we usually think of the term, but scale of market--the speed at which companies will need to respond to the ever evolving marketplace for cloud services and platforms. As the self-professed "open" nature of Google and Yahoo's platforms becomes more of a reality, combined with true innovation in "industry" standard APIs (for capacity management, code platforms and feature integration), there is little doubt that pressure will be on the IT shop to optimize the cost of delivering business services to the rest of the company. Again, I argue that this cannot be done without SLAuto. Prove me wrong.

I am really concerned that SLAuto is still considered "bleeding edge" in most IT shops. It's not rocket science, and the future of IT cost management almost certainly has to be built around it. On the other hand, perhaps as some of the customers I worked with over the last couple of weeks serve as references for the value of SLAuto--at least in terms of energy costs--more IT shops will understand its urgency.

Wednesday, April 09, 2008

What Google App Engine is NOT

Simon Wardley wrote a post discussing the Google App Engine announcement as a "first step" for them in "the web as an operating system space". Simon is right, but as I commented on the post:

As I just noted on my blog, perhaps it is critical to look at this from the perspective of web businesses, rather than from enterprise IT's perspective. From the former angle, this is disruptive and revolutionary; from the latter, it's a no-op at this point, except perhaps for externally facing web apps.
Simon then wrote an interesting post in response, describing the opportunity that Google has created by open sourcing the App Engine SDK. His core premises can be summed up in the following quote:
Now, whilst Google hasn't provided their environment as open sourced, it has provided an open sourced SDK that "emulates all of the App Engine services on your local computer". This appears, though I'm not a python expert, to contain all the primitives and information needed to build a compatible environment to GoogleAppEngine. This allows for companies, vendors and ISPs to create competing but compatible systems. It's almost as if Google has offered a blueprint for a web operating environment and asked the rest of the community to come compete with them.
And here I have to say, "Well, true, as far as web application hosting goes. But we all know the enterprise is WAY more than that." I think if a commercial product came out that allowed anyone to build a high-scale web environment, with data storage, development tools and operations interfaces within their own infrastructure, that would be very cool. But, as someone who really understands the utility computing space, I want everyone to be clear that this wouldn't help scalability or optimizing resource usage in the following key IT areas:
  1. Portal Services - Yes, an archaic concept to some, but still a critical strategy for delivering work functionality and key information to most knowledge workers. Note that Google does not provide portal support, nor support ANY standard portal interfaces, though you may be able to hack that in Python.

  2. SOA architectures - While it is theoretically possible to build a REST service in App Engine, there is no mechanism to host any other form of services. Yes, you could theoretically leverage services external to the Python app, but this would probably require services and GUI to be located in the same network, to avoid latency issues. Not to mention the fact that there is nothing resembling a messaging infrastructure, or Enterprise Service Bus.

  3. Business Process Automation - This is one of the key tactics for gaining business agility, in my opinion, and while I wouldn't doubt someone will write an app to do BPA/I in App Engine, it will be expensive from a resource usage perspective (lots of in/out traffic, storage for quiesced processes and so on).

  4. EAI - Enterprise integration is still the most customized element of IT today, and, as noted in the last two points, there is nothing provided by Google at this point to help with data or application level integration; no data transformation (à la Informatica), no messaging engine, no business process automation, etc., etc., etc.

  5. HPC - Yes, Google is amazingly scalable, but they went out of their way to insist that App Engine is not a grid. It is not designed to--nor do you have the quota to allow you to--send arbitrary compute intensive jobs to the engine for processing.

  6. Server and desktop virtualization - No one does desktop in the cloud today, as far as I know, but Google doesn't even provide virtual servers--useful for hosting and maintenance of legacy applications, if nothing else. I suppose you could run out and convert your productivity apps to Google Apps, your email to GMail, etc., but what about print services?
Not to mention the fact that Google provides no service level guarantees (though I think they will probably do something here when they go GA), no premium support, no integration services, no live customer support (that I know of); in other words, there is a distinct lack of a "throat to choke" here.

Thus, I think most enterprises need to look at Amazon and Google services as just that--services that can be leveraged within their own architectures when it makes sense, rather than wonder-tools that can replace their entire IT infrastructure expenditure. Again, there is probably more bang for the buck today in converting that existing infrastructure into a utility, unless your data center hosts only web-facing applications...but then there is the expense of rewriting them entirely in Python, which may cancel out a tremendous amount of the cost benefits of using App Engine.

So, Simon, I share your excitement about the future of scalable web applications, but my point remains--this is largely a no-op for most enterprise IT organizations.

Tuesday, April 08, 2008

Google App Engine: Forte Software for the Cloud?

I was rather harsh on Google App Engine last night, and I think with good reason. However, as I read more about it today, I am realizing that there is more to this product for web businesses than there is for your typical enterprise. Looking at it from that angle, let me talk about the compelling aspects of App Engine for those developing the types of applications that environment is intended to support.

Let me start with some history. In the mid to late nineties, I was a consultant for Forte Software, the Paul Butterworth led distributed application development and deployment tools company. Forte was an amazing company to work for, but it had an even more compelling product to work with.

The basic concept was derived from a simple development scenario. Paul envisioned allowing a developer to:

  1. Write applications as if they were monolithic, locally executable applications

  2. Name specific objects in the application as "service objects" to act as key interface points (important later)

  3. Test those applications in a local-only configuration

  4. Use a GUI tool to partition the application by dragging and dropping the service objects around the environment as necessary. Developers could also configure service objects to be replicated for load balancing, failover or both.

  5. Test execute the application in its distributed configuration

  6. Deploy and operate the finished application in its final partitioned configuration

  7. Monitor the distributed application and its components for both availability and performance characteristics
Though based on a 4GL at a time when Java was pushing for "open languages", Forte proved to be a very popular tool in a variety of extremely high scalability settings: OnStar, EZPass, Marriott online reservations and the New York state sex offender web site, to name but a few.

It wasn't the 4GL that made the product compelling (though it was very good), and certainly not the developer GUI (that was well below average), but this end-to-end developer experience that made the product a winner.

Now flash forward to today, and the TechCrunch article covering their developer's experience in developing and deploying a decent little app in about 4 hours, including deciding on requirements, writing code, debugging, deploying and "launching" on the crunchbase.com domain. In reading through their step by step activities, I was struck hard by the similarities with the Forte experience, with a few positive differences:
  • The tools are now open source themselves, and based on an open source language

  • The need for application partitioning is largely eliminated. Note I said largely, as if you are using a service-based architecture, you will have to hand-code the outbound calls to any services via Google's URL API.

  • Deployment and monitoring is automatic. You never have to worry about what was deployed where when. The capacity is just there (up to your quota).
Now, all of this comes with a cost (which was true of Forte as well): you must agree to living in a proprietary world. In a later post, I am going to talk about another cost (which is common with other platforms): start-up lock-in; suffice to say, your lock-in isn't just the available languages or the libraries you *must* use, but it's also the dependency on all of that infrastructure automation that is Google's and Google's alone.

There are also many key application components which seem logically locked into Google: identity, domain management, monitoring and data storage/retrieval. Not necessarily a bad thing, but developers should go in with their eyes wide open.

However, if time to market is your biggest concern, and all you care about is cool web application capabilities, then you now have two choices: Amazon (via Heroku and Zend, for instance) and Google (via App Engine). Each has its language and its limitations, but the experience is largely the same. (I haven't checked to see if the "launch"--e.g. domain assignment--capabilities of Heroku or Zend match Google's, though, and it doesn't appear that identity services are covered at all.)

None of these really give you service level guarantees, so SLAuto doesn't really apply. However, service levels will be assumed, so if you care, start looking at SLAuto tools that may help in the future.

Again, all of this probably does not apply to enterprise IT, but it's a hell of a compelling story for web developers.

Monday, April 07, 2008

Google announces ultimate cloud lock-in platform

I was about to write a long post about how all the big guys are starting with storage as a cloud service (based on the rumor that Google was going to announce BigTable as their first cloud service, and HP's new offering), when I took the time to watch Scoble's (unintentionally) multi-part coverage [1] [2] [3] of the mysterious Google announcement (on Qik). And--just to screw with me--do they announce a data-only offering? Of course not, they announce Google App Engine.

Update: Here is a link to the official Google coverage of the announcement on YouTube.

What is Google App Engine? Well, detailed coverage is all over the web; see:

Mike Arrington (TechCrunch)
What this all means: Google App Engine is designed for developers who want to run their entire application stack, soup to nuts, on Google resources. Amazon, by contrast, offers more of an a la carte offering with which developers can pick and choose what resources they want to use.
Bob Warfield (SmoothSpan) [1] [2] [3]
However, the short-short version is it is a complete scalable and manageable runtime environment to build, test and run scalable web applications. (I don't say "highly scalable" for reasons that will be clear later.) This environment is made up of the following five core components (today):
  1. Scalable Serving Infrastructure - Basically the Google infrastructure, including everything but the Python code and web templates themselves

  2. Python Runtime - All of the infrastructure to deliver and execute your application in a distributed environment

  3. Software Development Kit - Allows you to code your application on your local system before deploying to Google.

  4. Web-based Admin Console - A web application including at least simplistic version management (including rollback), running system statistics and errors, access to the datastore (see below) and access to log files

  5. Datastore - BigTable storage (I don't know enough about BigTable yet to say more)
All of this delivered in a free (as of the beta) limited-scale package:

500MB storage
200 Megacycles CPU
10GB Bandwidth In/Out

Should be around 5 million page views a month for the average web application. This is a reasonable scale, but would not qualify as "highly scalable" in most large web properties' books.

What does this add up to, in my opinion? The ultimate cloud lock-in story. (As background, watch Scoble's first video from about 3:17-5:25.) Not a single thing in your web application will not be dependent on Google if you use this technology--not even your Python code. (For proof, check out the "includes" in the coding demo--at around 8:44 of the first video.) Everything you do will depend on a piece of Google intellectual property. Your datastore is BigTable, your operations environment is Web Operations Center, etc., etc., etc.

This isn't cloud computing, it's just a cool web app hosting tool. OK, I exaggerate. It is cloud, but it's exactly the kind of cloud most enterprises should avoid. If you are building a web business, and this tickles your fancy, go for it. You can't beat the price, and you've got to love the feature set. If you are a Fortune 500 looking for where to launch your next CRM interface, forget it. There are safer ships to sail than this--e.g. Amazon EC2 (et al.), Mosso, etc.; better yet, convert what you have.

If it sounds like I am being reactionary to this announcement, I suppose I am in a way. Unfortunately, I have spent a lot of time thinking about how today's high-scale business systems will move to the cloud, and I think the market needs more maturity before this can be done safely. You need flexibility of the type and architecture of your application, and which components you choose to leverage. There is no such choice with Google.

The best part of Scoble's coverage was when he talked to two developers at the end (~18:15). One (Michael Malone) notes the biggest problem is "lock-in". The woman standing next to him (Mia Culver) calls it a "proprietary platform".

I love it. There is no fooling this savvy, open source focused market. If you want to win hearts and minds, be open. When the hell are we going to get that application portability standard we've been demanding, eh?

(On a side note, the required demo for cloud application development is now to build a web app from scratch and deploy it so the audience can access it from their laptops in 5-8 minutes. Google did it tonight, and Heroku did it at the Cloud Demo Night earlier this month.)

Some more of my notes from the announcement:
Can't do:
  1. No write to file system. (Reads OK, so you can use props files, etc.)

  2. No direct web calls (instead utilizes "URL fetch" API)

  3. No threads (single thread only, but distributed across multiple systems)

  4. Python only first language, looking for input on next language to attack (must have runtime that can be "hardened")
Administration Console gives the ability to see and manipulate running app code (by version) and data

Is the identity environment for all hosted apps Google login? Is everyone comfortable with this?

The initial 10,000 beta accounts may already be gone.

Quota based, no ability to grow past above for now.

Also, no "offline processing" today, but looking into it for future. (Sounds like batch stuff, etc.)
I have an interesting experiment I wish I could get to. I want to marry Scalr, the open source Amazon EC2 automation environment with a policy-based SLAuto environment to get the ultimate in flexible, open and coding agnostic autonomic operations, both in the cloud and "at home". Anyone want to beat me to it? (Come to think of it, why is Google still hosting Scalr now that App Engine is live? Hmmmm....)

Wednesday, March 19, 2008

The Social Enterprise Opportunity

I want to begin today with a quick shout-out to my fellow bloggers at Data Center Knowledge. In a recent post, they identified me as one of the bloggers they follow for cloud and utility computing, and I'm honored to be included among such a strong list of bloggers. (Rich Miller, who posted the list, is no slouch himself.) Update: I violated the cardinal rule of Internet social networking: assuming a given name applies to one person. Rich Miller from Data Center Knowledge is not the same Rich Miller that writes Telematique. My apologies to both.

One of those bloggers is Phil Wainwright, whose Software as Services blog is one of my regular reads. He is the most aggressive, forward thinker in the SaaS space, and he very often sees opportunity that most of us miss. (Phil's blog is also a great way to stay on top of the companies and technologies that specifically support the SaaS market.)

Phil recently wrote an interesting post about SaaS and Web 2.0 concepts, titled "Enter the socialprise", in which he points out that the very nature of an "enterprise" is changing thanks to the Internet and cloud computing concepts. He notes that loyalty between individuals is replacing corporate loyalty, and that social networking on the Internet is creating a new work economy for individual knowledge workers.

He then goes on to challenge enterprise computing models:

But enterprise computing is still designed for the old, stovepipe model in which every transaction took place within the same firm. There’s no connection with the social automation that’s happening between individuals. Many enterprises even resist talking about social networking. And even when an application vendor adds some kind of social networking features, there’s always the suspicion that they’re just painting social lipstick on a stovepipe pig.

This yawning chasm is an opportunity for a new class of applications to emerge that can harness the social networks between individuals and make them relevant to the enterprise. Or perhaps reinvent a new kind of enterprise, better suited to the low-friction reality of the connected Web. Enter the socialprise.

The example he gives of a company leveraging this is InsideView, which is creating a very cool sales intelligence application that integrates with major SaaS CRM vendor products to aggregate information from a variety of online sources into a single prospect activity dashboard. This is an incredibly cool example of how rich data about individuals within and across firms can be used at an enterprise level.

Another similar company that struck me was JobScience, which is one of the companies whose blog is in the Data Center Knowledge list referenced above. JobScience is using force.com to create a rich social intelligence engine for Salesforce.com customers. Their product, aptly called Genius, is an excellent example of what they are able to do. Read the post for all the features, but my favorite is:
The Genius Tracker. Not only does the tracker pop up to tell me an email recipient has just opened my email, or is visiting my web site, but the more important intelligence this gives me is that this prospect is is online and engaged with our solution. If a sales rep can call 40 people in a day, and a blast to 5000 prospects shows me that 40 of those prospects are online and engaged, it doesn’t take a genius to figure out who to call. That rep’s going to have a much more productive day calling people who they know are in the office. Less voicemails, less brushoffs, less calls to people who don’t work there anymore.
Bordering on privacy issues, I know, but an amazing level of detail, and invaluable if used wisely. More importantly, it goes to show what is possible in a stable, shared application environment.

By the way, this direct integration with a given CRM platform by a "value added extender" is an interesting twist to the dependency issues that Bob Warfield writes about on the SmoothSpan blog. JobScience's products are services that become a feature of the destination both visually and functionally. Bob's point about being a component provider to the actual product is well taken, and I wonder if the only exit strategy for these guys is acquisition by Salesforce. What else can they hope for as a company dependent on force.com? Talk about cloud lock-in.

Wednesday, February 20, 2008

Data Goes SLAuto at Oracle

Thanks to Steve Jones, check out this presentation from David Chappell, Oracle VP and CTO of SOA, titled "Next-Generation Grid Enabled SOA". (A shorter written article can be found at SOA Magazine's site.) Chappell outlines the work that Oracle is doing to turn the traditional model of application scalability on its head: instead of fixing the amount of database resources and scaling the applications/services horizontally, scale the database (using a cool complex adaptive systems approach) and alleviate much of the need to scale apps and services (except for CPU bound services). For someone like me, that's mind blowing.

Add to that the fact that the data management functions are relatively homogeneous (though the infrastructure may not be), and aware of their resource utilization, and you can see why they are claiming a certain amount of hardware-metric based SLAuto.

(Hardware metric based SLAuto is based on measurements of hardware components, such as CPU utilization, memory utilization and so on. Software-based SLAuto usually uses business metrics such as transaction rates, active accounts, etc. to make scaling decisions.)
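To illustrate the difference, here is a rough sketch of the two kinds of scaling decision side by side. The function names, the 60% CPU target and the 250 transactions per second per server figure are all made-up assumptions for the example, not numbers from Oracle or any SLAuto tool.

```python
import math


def servers_for_cpu(current_servers: int, avg_cpu_pct: float,
                    target_cpu_pct: float = 60.0) -> int:
    """Hardware-metric policy: size the pool so average CPU lands near target."""
    return max(1, math.ceil(current_servers * avg_cpu_pct / target_cpu_pct))


def servers_for_transactions(tx_per_sec: float,
                             tx_per_server: float = 250.0) -> int:
    """Business-metric policy: size the pool from the transaction rate."""
    return max(1, math.ceil(tx_per_sec / tx_per_server))
```

With these assumed numbers, four servers averaging 90% CPU would be resized to six under the hardware policy, while the business policy ignores CPU entirely and asks for four servers at 1,000 transactions per second. The hardware version needs no knowledge of the application, which is why a homogeneous data layer is a natural fit for it.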

The catch? Well, everything must be written to use the "Data Grid" if it's to take advantage of these capabilities. Legacy applications need not apply. (Could be the deal killer for David's "Not your MOM's Bus" concept.)

It seems to me that if Oracle wants this approach to catch on, it should open source a reference implementation as soon as possible. I'm not an expert at the most recent data processing approaches, but it would seem to me that Map-Reduce approaches would be complementary to the Data Grid. However, Hadoop implementations would generally only be integrated with a data grid if there was an open source alternative. Otherwise, MySQL will continue to be the first choice. Open Source would also speed up integration between the data grid and infrastructure automation such as Cassatt and its competitors.

Dave hints at a URL for more info on the Oracle site, but I can't find it. If anyone tracks it down, I would appreciate any help I can get.

Thursday, February 14, 2008

Latency: Obstacle to cloud computing, or opportunity?

I was challenged in the comments to my Cloud Computing Heats Up post regarding my criticism of pupwhines' post that in turn criticized cloud computing. The anonymous author of the comment thought I was too hard on pupwhines, and wanted to know what my response specifically was to the challenge latency presents to distributed computing. I responded there, but I want to expand a little bit on the topic, as it is indeed important to understand, and backs my contention that there will need to be some software architectural changes made to leverage the cloud system.

(Quick note: I've alluded to this before, but I strongly believe there is no one cloud, but a bunch of siloed clouds today with *some* limited integration between them. More of a frontal system, really.)

Latency is an issue in most IT application environments today. There is no question that "traditional" tiered application design scales well at the processing layer, but has real issues at the data layer. There is simply no easy way to manage a traditional relational database architecture over a widely distributed environment. Pupwhines' contention that joining a table between two SaaS vendor implementations would be a disaster is right on. In modern technology terms, it would be insane.

However, this is the disruptive aspect of cloud computing: the architectures you know and love are no longer necessarily best practices in a world where your functionality and capacity is:

  1. not necessarily your own,
  2. not necessarily integrated, and
  3. splayed out across this 5.1×10⁸ km² rock we live on
There are new technical advances being made today in the companies that already rely on cloud principles (think Google, Amazon, Microsoft, etc.). These advances will change the way you design and deploy software, but they will enable a world where proximity of data means less and less.

In fact, you probably already leverage one of these technical changes: increased bandwidth. Indulge me in some autobiographical narrative to illustrate.

Back in the late '90s, while I was a Senior Principal Consultant with Forte Software, Inc, the legendary(?) distributed application development platform vendor, one of my key roles was advising clients on how to best architect for high performance, high scalability and high availability. Forte was an early service oriented architecture, but it ran on the 10Mb/100Mb networks of the time. Thus, the rule for message passing between components (UI<->service or service<->service) was (in order of priority):
  1. Send as few messages over the network as possible
  2. Send the smallest messages possible

Thus, it was better to send large messages once than many small messages, but you wanted to optimize each message as much as possible.
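A back-of-the-envelope model shows why the rules come in that order. This is only a sketch under stated assumptions: messages are sent one after another, each pays a fixed 1 ms of latency, the link is 100 Mb/s, and protocol overhead is ignored. The function name and default values are mine, chosen for illustration.

```python
def transfer_time_ms(num_messages: int, bytes_per_message: int,
                     latency_ms: float = 1.0,
                     bandwidth_mbps: float = 100.0) -> float:
    """Estimate the time to send messages sequentially over one link.

    Cost = fixed latency per message + time to push the bits onto the wire.
    """
    total_bits = num_messages * bytes_per_message * 8
    bits_per_ms = bandwidth_mbps * 1000  # 1 Mb/s is 1000 bits per millisecond
    return num_messages * latency_ms + total_bits / bits_per_ms


# The same 100,000 bytes of payload, sent three ways:
chatty = transfer_time_ms(1000, 100)      # many small messages
batched = transfer_time_ms(1, 100_000)    # rule 1: one large message
trimmed = transfer_time_ms(1, 50_000)     # rule 2: then halve its size
```

Under these assumptions the chatty version takes about a second (almost all of it latency), batching brings that down to roughly 9 ms, and halving the message size saves only another 4 ms. Cutting the message count is where the big win is; shrinking messages is the fine tuning.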

To this end, best practice was to create data services and to actually deploy these services directly on the database server hardware. It was more important to process the relational mapping of data into the object mapping according to need in a timely fashion--thus avoiding unnecessary network traffic--than to divide processing responsibility so that there were no custom application components running on the RDBMS hardware.

Fast forward to the 1G/10G networks of today. From what I am seeing, it is actually considered bad practice to do what I described above. While at Sun, I actually got admonished by a (very competent) manager for suggesting the way around Sun Access Manager's horrible performance was to deploy the identity server and database on the same box (with our custom login and registration UIs deployed on separate, horizontally scalable servers). Pure architectural heresy. He was right in many ways: doing so would have put the business logic tier into a horizontally locked architecture, but that wasn't his point. "We don't deploy our software on our database servers" was the gist of his argument.

So, faster networks have already changed the so-called "laws of physics" that software architects must design around. Given this, it seems easy to postulate that additional advances in network bandwidth will open additional opportunities for architectural change. In fact, they already have; check out Gigaspaces for a cool (though controversial) alternative to horizontally replicated service architectures.

Will bandwidth really grow at a rate that will make a difference to the current IT generation(s)? Many postulate it has to, even if the core network operators resist. As I noted in my response, Cisco's new Nexus 7000 series is a sign of times to come. Does anyone deny that 40G and 100G networks have the opportunity to change the laws of physics? (Disclaimer: I know just enough about networking to be dangerous, so I may be overstating the case...but change is still clearly on the horizon.)

Even if network bandwidth doesn't change at all, or any additional bandwidth is chewed up by demand at existing rates, there are other software architectural advances that will revolutionize certain kinds of computing. I spoke in my response about MapReduce and its open source implementation, Hadoop. For processing large, distributed data loads, this architecture is eliminating boundaries created by traditional scaleout RDBMS-based approaches. Google has used this approach to tie data from every one of its properties (including acquired properties, such as Blogger) into a single user identity and profile. Talk about a distributed join problem...
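For readers who haven't seen the pattern, here is a toy, single-process sketch of a MapReduce-style join on user id. It is not Hadoop's API and certainly not how Google does it; the record layout, the function names and the sample data are all hypothetical. In a real framework the map and reduce functions run on many machines and the shuffle step moves data between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (source property, user id, value)


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Tuple[str, str]]]:
    """Map: emit each record keyed by user id, wherever it came from."""
    for source, user_id, value in records:
        yield user_id, (source, value)


def shuffle(pairs: Iterable[Tuple[str, Tuple[str, str]]]) -> Dict[str, List[Tuple[str, str]]]:
    """Shuffle: group all emitted values by key (the framework's job)."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[Tuple[str, str]]]) -> Dict[str, Dict[str, str]]:
    """Reduce: merge everything known about a user into one profile."""
    return {user: dict(values) for user, values in grouped.items()}


# Made-up records from two separate "properties".
records = [
    ("blog", "u1", "post-42"),
    ("mail", "u1", "inbox"),
    ("blog", "u2", "post-7"),
]
profiles = reduce_phase(shuffle(map_phase(records)))
```

No table is ever joined to another table here. Each source only has to emit its own records keyed by user, which is what makes the approach tolerant of data that lives in different places.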

(Another quick note: hats off to Google and Yahoo respectively for their work in this space. I know from my past life at Sun what a pain in the whatever this is, and I love the seamlessness I experience on these sites.)

One other major advancement is the increasing sophistication of business integration technologies, from traditional application integration (force.com, boomi.com, BizTalk, Lombardi, etc.) to data integration options (Informatica, Business Objects, etc.) to subscription data propagation techniques. These integration options can allow one to go back to some of the basics I spoke of before: do as much processing as possible on SaaS vendor A's infrastructure before sharing the relevant data with vendor B. Not as perfect as a join in many cases, but in a service oriented world, a common, required approach for most.

Perhaps the most important point I want to make today, however, is that today--in the modern IT era--many of these technologies are either future tech or not what was used to build existing applications. Given that, what does an existing datacenter do? Stick to my recommendation; convert your own datacenter into a utility/cloud today, and begin to leverage the maturing compute grid/cloud computing ecosystem as it and your applications mature.

Wednesday, February 06, 2008

Cloud computing heats up

Today's reading has been especially interesting, as it has become clear that a) "cloud computing" is a concept that more and more IT people are beginning to understand and dissect, and b) there is the corresponding denial that comes with any disruptive change. Let me walk you through my reading to demonstrate.

I always start with Nick Carr, and today he did not disappoint. It seems that IBM has posited that a single (distributed) computer could be built that could run the entire Internet, and expand as needed to meet demand. Of course, this would require the use of Blue Gene, an IBM technology, but man does it feed right into Nick's vision of the World Wide Computer. To Nick's credit, he seems skeptical--I know I am. However, it is a worthy thought experiment to think how one would design distributed computing to be more efficient if one had control over the entire architecture from chip to system software. (Er, come to think of it, I could imagine Apple designing a compute cloud...)

I then came across an interesting breakdown of cloud computing by John M Willis, who appears to contribute to redmonk. He breaks down the cloud according to "capacity-on-demand" options, and is one of the few to include a "turn your own capacity into a utility" component. Unfortunately, he needs a little education on these particular options, but I did my best to set him straight. (I appreciate his kind response to my comment.) If you are trying to understand how to break down the "capacity-on-demand" market, this post (along with the comments) is an excellent starting place.

Next on the list was a GigaOm post by Nitin Borwankar stating his concept of "Data Property Rights" and expressing some skepticism about the "data portability" movement. At first I was concerned that he was going to make an argument that reinforced certain cloud lock-in principles, but he actually makes a lot of sense. I still want to see Data Portability as an element of his basic rights list, but he is correct when he says if the other elements are handled correctly, data portability will be a largely moot issue (though I would argue it remains a "last resort" property right).

Dana Blankenhorn at ZDNet/open-source covers a concept being put forth by Etelos, a company I find difficult to describe, but that seems to be an "application-on-demand" company (interesting concept). "Opportunity computing", as described by Etelos CEO Danny Kolke, is the complete set of software and infrastructure required to meet a market opportunity on a moment's notice. “Opportunity computing is really a superset of utility computing,” Kolke notes. Blankenhorn adds,


"It’s when you look at the tools Kolke is talking about that you begin to get the picture. He’s combining advertising, applications, the cash register, and all the relationships which go into those elements in his model. "

In other words, it seems like prebuilt ecommerce, CRM and other applications that can quickly be customized and deployed as needed, to the hosting solution of your choice. My experience with this kind of thing is that it is impossible to satisfy all of the people, all of the time, but I'm fascinated by the concept. Sort of Platform as a Service with a twist.

Finally, the denial. The blog "pupwhines" remains true to its name as its author whimpers about how Nick "has figured out that companies can write their own code and then run it in an outsourced data center." Those of you who have been following utility/cloud computing know that this misses the point entirely. It's not outsourcing capacity that is new, but the way it is outsourced--no contracts for labor, no work-order charges for capacity changes, etc. In other words, just pay for the compute time.

With SLAuto, it gets even more interesting as you would just tell the cloud "run this software at these service levels", and the who, what, where and how would be completely hidden from you. To equate that with the old IBM/Accenture/{Insert Indian company here} mode of outsourcing is like comparing your electric utility to renting generators from your neighbors. (OK, not a great analogy, but you get the picture.)

Another interesting data point for measuring the booming interest in utility and cloud computing is the fact that my Google Alerts emails for both terms have grown from one or two links a day to five or more links each and every day. People are talking about this stuff because the economics are so compelling it's impossible not to. Just remember to think before you jump on in.

Tuesday, January 29, 2008

One Step To Prepare For Cloud Computing

Some of you may be wondering why I am making such a big stink about software architecture on a blog about service level automation (SLAuto). Well, as Todd Biske points out, "the relationships (and potentially collisions) between the worlds of enterprise system management, business process management, web service management, business activity monitoring, and business intelligence" are easier to resolve if the appropriate access to metrics is provided for a software service. For SLAuto, this means the more feedback you can provide from the service, process, data and infrastructure levels of your software architecture, the easier it is to automate service level compliance.

Let's look at a few examples for each level:

  • Service/Application: From the end user's perspective, this is what service levels are all about. Key metrics such as transaction rates (how many orders/hour, etc.), response times, error rates, and availability are what the end users of a service (e.g. consumers, business stakeholders, etc.) really care about.
  • Business Process: Business process metrics can warn the SLAuto environment about cross-service issues, business rule violations or other extraordinary conditions in the process cycle that would warrant capacity changes at the BPM or service levels.
  • Data Storage/Management: Primarily, this layer can inform the SLAuto system about storage needs and storage provisioning, which in turn is critical to automated deployment of applications into a dynamic environment.
  • Infrastructure: This is the most common form of metric used to make SLAuto decisions today. Such metrics as CPU utilization, memory utilization and I/O rates are commonly used in both virtualized and non-virtualized automated environments.
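
To make the four levels concrete, here is a minimal sketch of how metrics from each one might be checked against service thresholds to decide on an action. Everything in it is an assumption made for illustration (the metric names, the limits, the action names, and the `Threshold` and `evaluate` helpers); it does not describe any particular SLAuto product.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """One policy rule: take `action` when `metric` rises above `limit`."""
    metric: str
    limit: float
    action: str


def evaluate(metrics, thresholds):
    """Return the actions whose thresholds are exceeded by current metrics."""
    actions = []
    for rule in thresholds:
        value = metrics.get(rule.metric)
        # Metrics that were not reported are skipped rather than guessed at.
        if value is not None and value > rule.limit:
            actions.append(rule.action)
    return actions


# Hypothetical policy with one rule per level described above.
policy = [
    Threshold("response_time_ms", 500, "add_app_instance"),      # service
    Threshold("process_backlog", 100, "scale_bpm_engine"),       # process
    Threshold("storage_used_pct", 85, "provision_storage"),      # data
    Threshold("cpu_utilization_pct", 80, "add_server_capacity"), # infrastructure
]

# Hypothetical snapshot of current measurements.
current = {
    "response_time_ms": 850,
    "process_backlog": 12,
    "storage_used_pct": 92,
    "cpu_utilization_pct": 40,
}

triggered = evaluate(current, policy)
print(triggered)
```

Note that in this snapshot the infrastructure metric looks healthy while the service-level metric does not, which is exactly why feedback from the higher levels matters.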

As noted, digital measurement of these data points can feed an SLAuto policy engine to trigger capacity adjustment, failure recovery or other applicable actions as necessary to remain within defined service thresholds. While most of the technology required to support SLAuto is available, the truth is that the monitoring/metrics side of things is the most uncharted territory. As an action item, I ask all of you to take Todd's words of wisdom into account, and design not only for functionality, but also manageability. This will aid you greatly in the quest to build fluid systems that can best take ad