The Wisdom of Clouds: April 2008

Tuesday, April 29, 2008

Yahoo! Goes Social with PaaS Offering

Well, no time to really expound on this, but I thought it was important to highlight: Yahoo! announced a PaaS offering at Web 2.0, and it is yet another interesting twist on a theme. The best overview I found is a video of Yahoo! CTO Ari Balogh's keynote at Web 2.0.

What sets Yahoo!'s offering apart (at least in theory--it isn't all delivered yet) is the focus on turning all of Yahoo's properties, services and content into:

An open, API-based, mashup-ready smorgasbord of development opportunity, complete with a development environment and optional hosting in their infrastructure.
A completely interconnected social network that differentiates itself by being a feature, not a destination. This, for me, is a wise move on Yahoo's part, as no one else is willing to say their network is simply a part of the overall user experience of a destination, rather than a destination that users must consciously choose to navigate to in order to use its advantages.

I think Yahoo! is looking at an interesting play, though you have to wonder how they will steal developer mind-share from Microsoft and Google--that is, unless they become either Microsoft or Google...

More in the next few days when I have time. Till then, stop by my main page to check out what I am reading on a day-to-day basis, with some commentary. You can comment yourself on my page at FriendFeed.

Wednesday, April 23, 2008

Moshing on the Mesh

Ray Ozzie is a rock star, but his band's latest album probably seems a little inaccessible at first. At least, that's the way I read the initial response to Microsoft's announcements at Web 2.0 this week. Ray and the Mister Softy Band have released to "airplay"--at least a little--the mysterious cloud strategy that many of us have been anxiously awaiting for some time now. While arriving at the show fashionably late, the Mister Softy Band is laying down a groove that will address the consumer market in ways that strongly challenge Google, be interesting to business, and demonstrate even more clearly how AWS is more of a hosting platform than anything else.

Details of the announcement are everywhere, but here are the highlights for me:

The core of the concept is a virtual desktop hosted in Microsoft's data centers, to which you connect any compatible device (PC, mobile, etc., but Windows only for now).
Within that desktop, folders can be created which allow you to store whatever you want to share (documents, photos, videos, music, etc.) among your devices.
Folders can even be shared with other friends or family members using a social network built into the mesh.
The mesh uses a two-way RSS/Atom mechanism (FeedSync) to sync not only files, but also applications, between devices.

That last item is key, because while at first this may look like nothing more than a grandiose social network with storage, it's actually much more than that. Ray's vision is to provide a platform for developers that can leverage the syncing capability, along with some other framework components, to build applications that truly live within and through the mesh.
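To make the two-way syncing idea a little more concrete, here is a greatly simplified sketch in Python. It is not the FeedSync specification: the `SyncItem` fields and the tie-breaking rule are assumptions, loosely modeled on the idea that each feed item carries its own update count, so any two devices can merge each other's feeds and arrive at the same result.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SyncItem:
    """Hypothetical, simplified feed entry carrying its own sync metadata."""
    item_id: str
    updates: int   # number of times this item has been modified
    by: str        # endpoint (device) that made the latest modification
    content: str


def merge_feeds(local: Dict[str, SyncItem], remote: Dict[str, SyncItem]) -> Dict[str, SyncItem]:
    """Two-way merge: for each item, keep the version with more updates.

    Ties are broken by the 'by' field so that every device picks the same
    winner, whichever direction the merge runs in.
    """
    merged = dict(local)
    for item_id, theirs in remote.items():
        mine = merged.get(item_id)
        if mine is None or (theirs.updates, theirs.by) > (mine.updates, mine.by):
            merged[item_id] = theirs
    return merged


laptop = {"notes.txt": SyncItem("notes.txt", 2, "laptop", "draft v2")}
phone = {
    "notes.txt": SyncItem("notes.txt", 3, "phone", "draft v3"),
    "photo.jpg": SyncItem("photo.jpg", 1, "phone", "<jpeg bytes>"),
}

# Each device merges the other's feed; both end up with the same view.
on_laptop = merge_feeds(laptop, phone)
on_phone = merge_feeds(phone, laptop)
print(on_laptop == on_phone, on_laptop["notes.txt"].content)  # True draft v3
```

Because the winner depends only on metadata carried with each item, merging in either direction yields the same feed, which is what lets peers sync without a central master.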

This is ambitious as hell, and I have to give "the band" credit for their vision. While it is tried-and-true MS "lock-'em-in...lock-'em-all-in" hardcore, it is a completely different sound than what Google, Amazon and even Intuit have released. It's a place to live in the cloud, rather than simply a stopping point. And, while the open source community is rightfully skeptical, there are hundreds of thousands of loyal Microsoft developers out there who will make this thing work for them. That, in turn, creates a market that the rest of the cloud would do well to keep an eye on.

So, now I see the following experiments in the nascent cloud market:

Amazon: Pure Capacity-On-Demand with scalable components available a la carte
Mosso: Pure Capacity-On-Demand in a hosted model with flat rate for normal usage
Google: Platform-as-a-Service targeted at Internet facing web applications and optimizing developer experience for highly scalable web application development and deployment
Intuit: Platform-as-a-Service targeted at Internet facing financial applications using their QuickBooks platform
Microsoft: Virtual Desktop and Platform-as-a-Service targeted at providing a complete online compute environment from an end user's point of view

Update: Bob Warfield at SmoothSpan has a post that is making me rethink some of my enthusiasm for the mesh.

Friday, April 18, 2008

Intuit Makes Play for Finances In The Cloud

It's end of quarter time here at Camp Cassatt, and my Sales Engineering role will probably overwhelm my Technology Evangelist/Field Technologist/Trendy-Title-of-the-Month role. This, unfortunately, means fewer posts at an incredibly fast-moving, "heady" (as they say) time in autonomic/utility/cloud computing.

The biggest news of a very news-filled several days is that Intuit will announce today the launch of a PaaS (Platform as a Service) offering for developers wishing to target the 3-million-strong QuickBase market. I came across this news via Bob Warfield at SmoothSpan, and his analysis of the coming announcement is thorough and very intriguing. You may or may not know this, but Intuit is completely changing its business model, moving away from shrink-wrap and going completely towards a SaaS/PaaS model.

The PaaS offering (officially the "QuickBase Developer Program") is in private beta right now (request an invite), and is perhaps most closely aligned with Salesforce.com's Force.com offering; namely, it provides a platform for developers to add value to Intuit's existing and upcoming online products. The platform uses Adobe's Flex for client development and QuickBase for the server, a decision that clearly meets with Bob's approval. (Hmmm. Which to learn first, Python or Flex?) See Bob's post for the nitty-gritty details, however.

I am most excited about the philosophical difference between Intuit's approach and Salesforce.com's approach. First and foremost, Intuit seems to be stating up front that they are committed to supporting data portability, and keeping it relatively simple for developers to move data off of their platform. Bill Lucchini, VP and GM of Intuit QuickBase, was directly quoted as saying:

We never want to lock anyone in. But we want the customer to choose us because we offer more value. That’s why we didn’t create our own language like Apex, we chose Flex. We won’t stop anybody from leaving. Vendors have to double down and work harder to keep customers loyal.

Now, the truth is that if no one else uses Flex (except maybe Adobe) and you are using Intuit's libraries, the code is locked in...at least until Intuit either open sources or licenses its platform components to other capacity providers. However, data is another issue, and having a PaaS vendor commit to data portability is incredibly refreshing.

Update: Bob and I have had a little comment exchange [1] [2] [3] on whether or not there really is lock-in here. Bob makes a good point about the portability of a Flex client, but I counter that it is the end-to-end functionality that is locked in to Intuit, not just the client, and that this is probably OK with a large class of potential users. Just not large enterprises.

Tuesday, April 15, 2008

Google App Engine: How AppDrop Does and Does Not Affect Lock-in

The cloud computing blogging world is abuzz with the news that Chris Anderson has built GAE-compliant hosting on Amazon EC2 as an interesting experiment. This is an experiment that I've actually been looking at myself, but with a day job that is intensely busy right now, I haven't had a chance to get to it.

What did Chris do? In short, he got a copy of the GAE SDK running on Amazon virtual machines, with some modifications. There are limitations:

It will not scale (none of the Google "secret sauce").
It does not support email at this time (though the source code is available for anyone who wants to add it).
It could go down at any time, either due to an EC2 outage (not very likely), or because whoever is paying for this doesn't want to foot the bill anymore (much more likely).

The implications of this are big, however. A couple of days ago, in an email exchange with Simon Wardley, I did a little analysis of what it would take to do a scalable version of this in open source. I don't have the depth of knowledge necessary to identify all of the relevant projects, but here's what I came up with:

I think you are indeed correct, though I think a lot of the attention has been on what we discussed earlier: what is GAE *not* capable of doing (right now), and with regards to what it *is* capable of doing, how do you take advantage of Google's amazing infrastructure.
That being said, I think you are on to something potentially big: through the open source SDK, there is an opening for any other hosting company or "OS for the data center" company to provide a "GAE-to-go" solution. All of the SDK would have to be supported--and there is a lot that is specific to Google right now--and the solution would need to be at least comparably scalable, etc.
The good news is that, as you run down the SDK, there are many simple solutions possible, or even existing open source projects available:
The Python Runtime - This is just a Python interpreter with a series of rules imposed, running on an autonomic, scalable infrastructure. The interpreter can be harvested from the SDK (with some modification, I am sure), and the scalable infrastructure can be damn near anything with a policy-based management component.
The Datastore API - Google's BigTable data store is a distributed storage system that Google commonly uses alongside Map/Reduce. I am not exactly sure how well this maps to Hadoop, but that is where I would start. Regardless, I would expect that wrapping and/or extending Hadoop to support GQL is the minimum work involved.
The Users API - This may indeed be the biggest problem area, but as I read the API, it hides complexities such as generating login/logout URLs and accessing account data. Assuming I read the docs correctly, other than "nickname" and "email", Google assumes nothing about what defines a user. Thus, Google's accounts do not have to be used for the APIs to be functional. However, will the community expect a shared identity store, or at least the ability to choose to use Google's accounts?
The URL Fetch API - This was implemented for both security and scalability reasons. Not sure what exists to map it to, but I think it is a function of the identity infrastructure, and how you scale the Python Runtime. In other words, you'd need to map these functions to the appropriate mechanisms in each of the other infrastructure elements.
The Mail API - I would assume there is something close out there, but if not you would need to wrap a scalable email system with the Python classes defined in the API. Doesn't seem overly hard.
Finally, given the fact that the source code for the "faker" dev environment is open source, there is a lot of basic sample code for many of these "wrapper" APIs. The trick is to find developers that know how to do this at high scale--perhaps request participation at highscalability.org?

Now, the potential "gotchas" here are that you are working within the same limitations that Google has set for itself, can only "officially" extend the API when Google adds something or agrees to implement your requirements (they own the "open source community"), and you would need to test in a real-world high scale environment, which could be expensive (though perhaps, ironically, Amazon could be of some use here).
By the way, none of this solves VMware's portability issues, Amazon EC2's portability issues or even Cassatt's and/or our competitors' portability issues. It simply provides a portable web application environment that uses a "sandbox" approach for application execution. Again, a start (and an exciting one), but only a piece of the overall puzzle. Frankly, I think an Amazon portability story would be much more generally interesting to enterprise IT. But, that's just me.

I would think that Chris or others that see the possibilities will get on this. Hell, they are already probably "on this".
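As an illustration of the "Python interpreter with a series of rules imposed" idea, here is a minimal sketch of one such rule, mirroring App Engine's restriction that applications may read from, but not write to, the file system. The names `sandboxed_open` and `is_write_mode` are hypothetical, and a check written in plain Python like this is not a real security boundary; a production "GAE-to-go" runtime would have to enforce such rules inside a hardened interpreter.

```python
import builtins

# Mode characters that imply writing, appending, exclusive creation, or update.
WRITE_FLAGS = frozenset("wax+")

# Keep a reference to the unrestricted built-in for the sandbox to delegate to.
_real_open = builtins.open


def is_write_mode(mode: str) -> bool:
    """True if an open() mode string would allow modifying the file system."""
    return bool(WRITE_FLAGS & set(mode))


def sandboxed_open(file, mode="r", *args, **kwargs):
    """Hypothetical sandbox rule: allow reads (e.g. props files), refuse writes."""
    if is_write_mode(mode):
        raise PermissionError(f"sandbox policy forbids writing {file!r} (mode={mode!r})")
    return _real_open(file, mode, *args, **kwargs)


try:
    sandboxed_open("cache.dat", "w")  # refused before anything touches the disk
except PermissionError as err:
    print(err)
```

Each of the other APIs in the list above would need a comparable rule or wrapper, which is where most of the real work lies.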

By the way, as part of a post questioning Google's lock-in issues, Tim O'Reilly at O'Reilly Radar made an interesting observation about why even if the code is portable, an AppEngine application today is still "locked-in" to Google's site. To set up his argument, he quotes venture capitalist Brad Feld:

At *this* moment in time, it would be difficult to move apps off of AppEngine. Doing that in EC2 is trivial. This, to me, is the biggest issue, as I believe it could make startups less-interesting from an acquisition perspective by anyone other than Google. This will most likely change as people develop compatibility layers. However, Google has yet to provide any information about how to migrate data from their datastore the best I can tell. If you have a substantial amount of data, you can't just write code to dump it because they will only let any request run for a short period before they terminate it.

Tim then goes on to say:

This last point is really very serious. I've been warning for some time that the first phase of Web 2.0 is the acquisition of critical mass via network effects, but that once companies achieve that critical mass, they will be tempted to consolidate their position, leading ultimately to a replay of the personal computer industry's sad decline from an open, energetic marketplace to a controlled economy.

What remains to be seen is how Google's plans to allow for larger data transfers will affect this (see Phil Wainewright's post covering the business-readiness of GAE). If they allow unlimited fast transfer--not necessarily for free, but at a reasonable price--they will establish themselves as a truly open platform, competing on their amazing infrastructure and technology innovation. Now, combined with a compatible open source platform, that would be game changing.
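Brad Feld's point about short request limits suggests the usual workaround, at least in principle: export the data in small, resumable batches, with each request returning the key where the next one should pick up. The sketch below uses a plain Python dict as a hypothetical stand-in for the datastore; `export_batch` and `export_all` are illustrative names, not part of any Google API, and a real exporter would use an indexed, key-ordered query rather than sorting everything on each call.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a datastore: 25 entities with sortable keys.
DATASTORE: Dict[str, dict] = {f"user{i:04d}": {"n": i} for i in range(25)}


def export_batch(after_key: Optional[str], limit: int) -> Tuple[List[Tuple[str, dict]], Optional[str]]:
    """One short request: return up to `limit` entities with keys after `after_key`,
    plus the key to resume from (None when the export is complete)."""
    keys = sorted(k for k in DATASTORE if after_key is None or k > after_key)
    page = keys[:limit]
    rows = [(k, DATASTORE[k]) for k in page]
    next_key = page[-1] if len(keys) > limit else None
    return rows, next_key


def export_all(limit: int = 10) -> List[Tuple[str, dict]]:
    """Client-side loop: issue one bounded request per batch until done."""
    exported: List[Tuple[str, dict]] = []
    cursor: Optional[str] = None
    while True:
        rows, cursor = export_batch(cursor, limit)
        exported.extend(rows)
        if cursor is None:
            return exported


rows = export_all(limit=10)
print(len(rows), "entities exported in batches of 10")
```

The catch, of course, is that this only helps if the platform exposes a key-ordered query that each short request can run, and that the number of round trips for a large data set is affordable.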

Wednesday, April 09, 2008

What Google App Engine is NOT

Simon Wardley wrote a post discussing the Google App Engine announcement as a "first step" for Google in the "web as an operating system" space. Simon is right, but as I commented on the post:

As I just noted on my blog, perhaps it is critical to look at this from the perspective of web businesses, rather than from enterprise IT's perspective. From the former angle, this is disruptive and revolutionary; from the latter, it's a no-op at this point, except perhaps for externally facing web apps.

Simon then wrote an interesting post in response, describing the opportunity that Google has created by open sourcing the App Engine SDK. His core premises can be summed up in the following quote:

Now, whilst Google hasn't provided their environment as open sourced, it has provided an open sourced SDK that "emulates all of the App Engine services on your local computer". This appears, though I'm not a python expert, to contain all the primitives and information needed to build a compatible environment to GoogleAppEngine. This allows for companies, vendors and ISPs to create competing but compatible systems. It's almost as if Google has offered a blueprint for a web operating environment and asked the rest of the community to come compete with them.

And here I have to say, "Well, true, as far as web application hosting goes. But we all know the enterprise is WAY more than that." I think if a commercial product came out that allowed anyone to build a high-scale web environment, with data storage, development tools and operations interfaces within their own infrastructure, that would be very cool. But, as someone who really understands the utility computing space, I want everyone to be clear that this wouldn't help scalability or optimizing resource usage in the following key IT areas:

Portal Services - Yes, an archaic concept to some, but still a critical strategy for delivering work functionality and key information to most knowledge workers. Note that Google does not provide portal support, nor does it support ANY standard portal interfaces, though you may be able to hack that in Python.
SOA architectures - While it is theoretically possible to build a REST service in App Engine, there is no mechanism to host any other form of services. Yes, you could theoretically leverage services external to the Python app, but this would probably require services and GUI to be located in the same network, to avoid latency issues. Not to mention the fact that there is nothing resembling a messaging infrastructure, or Enterprise Service Bus.
Business Process Automation - This is one of the key tactics for gaining business agility, in my opinion, and while I wouldn't doubt someone will write an app to do BPA/I in App Engine, it will be expensive from a resource usage perspective (lots of in/out traffic, storage for quiesced processes and so on).
EAI - Enterprise integration is still the most customized element of IT today, and, as noted in the last two points, there is nothing provided by Google at this point to help with data or application level integration; no data transformation (ala Informatica), no messaging engine, no business process automation, etc., etc., etc.
HPC - Yes, Google is amazingly scalable, but they went out of their way to insist that App Engine is not a grid. It is not designed to--nor do you have the quota to allow you to--send arbitrary compute intensive jobs to the engine for processing.
Server and desktop virtualization - No one does desktop in the cloud today, as far as I know, but Google doesn't even provide virtual servers--useful for hosting and maintenance of legacy applications, if nothing else. I suppose you could run out and convert your productivity apps to Google Apps, your email to GMail, etc., but what about print services?

Not to mention the fact that Google provides no service level guarantees (though I think they will probably do something here when they go GA), no premium support, no integration services, no live customer support (that I know of); in other words, there is a distinct lack of a "throat to choke" here.

Thus, I think most enterprises need to look at Amazon and Google services as just that--services that can be leveraged within their own architectures when it makes sense, rather than wonder-tools that can replace their entire IT infrastructure expenditure. Again, there is probably more bang for the buck today in converting that existing infrastructure into a utility, unless your data center hosts only web-facing applications...but then there is the expense of rewriting them entirely in Python, which may cancel out a tremendous amount of the cost benefits of using App Engine.

So, Simon, I share your excitement about the future of scalable web applications, but my point remains--this is largely a no-op for most enterprise IT organizations.

Tuesday, April 08, 2008

Google App Engine: Forte Software for the Cloud?

I was rather harsh on Google App Engine last night, and I think with good reason. However, as I read more about it today, I am realizing that there is more to this product for web businesses than there is for your typical enterprise. Looking at it from that angle, let me talk about the compelling aspects of App Engine for those developing the types of applications that environment is intended to support.

Let me start with some history. In the mid to late nineties, I was a consultant for Forte Software, the Paul Butterworth-led distributed application development and deployment tools company. Forte was an amazing company to work for, but it had an even more compelling product to work with.

The basic concept was derived from a simple development scenario. Paul envisioned allowing a developer to:

Write applications as if they were monolithic, locally executable applications
Name specific objects in the application as "service objects" to act as key interface points (important later)
Test those applications in a local-only configuration
Use a GUI tool to partition the application by dragging and dropping the service objects around the environment as necessary. Developers could also configure service objects to be replicated for load balancing, failover or both.
Test execute the application in its distributed configuration
Deploy and operate the finished application in its final partitioned configuration
Monitor the distributed application and its components for both availability and performance characteristics
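The heart of that workflow is that the calling code never changes; only the partitioning does. Here is a minimal Python sketch of the idea, with hypothetical `ServiceObject` and `Partition` classes standing in for Forte's service objects and its drag-and-drop partitioning (Forte did this in its own 4GL, with nothing resembling these classes). Replicas are picked round-robin for load balancing, and a failed replica is skipped for failover.

```python
import itertools
from typing import Callable, List


class ServiceObject:
    """Hypothetical stand-in for a named service object living on one node."""

    def __init__(self, name: str, handler: Callable[[str], str]):
        self.name = name
        self.handler = handler
        self.up = True  # flipped to False to simulate a failed node

    def call(self, request: str) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.handler(request)


class Partition:
    """Routes calls across replicas: round-robin for load balancing,
    skipping unavailable replicas for failover."""

    def __init__(self, replicas: List[ServiceObject]):
        self.replicas = replicas
        self._next = itertools.cycle(range(len(replicas)))

    def call(self, request: str) -> str:
        for _ in range(len(self.replicas)):
            replica = self.replicas[next(self._next)]
            try:
                return replica.call(request)
            except ConnectionError:
                continue  # failover: try the next replica
        raise ConnectionError("all replicas are down")


def make_pricing(node: str) -> ServiceObject:
    return ServiceObject(f"Pricing@{node}", lambda req: f"{node}: price for {req}")


# Same calling code, two configurations: local-only, then replicated.
local = Partition([make_pricing("local")])
distributed = Partition([make_pricing("nodeA"), make_pricing("nodeB")])
distributed.replicas[0].up = False      # simulate a failed node
print(distributed.call("room-42"))      # prints "nodeB: price for room-42"
```

Moving from the local configuration to the distributed one is purely a deployment decision, which is the property that made the Forte experience feel so similar to what App Engine now does automatically.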

Though based on a 4GL at a time when Java was pushing for "open languages", Forte proved to be a very popular tool in a variety of extremely high scalability settings: OnStar, EZPass, Marriott online reservations, the New York state sex offender web site to name but a few.

It wasn't the 4GL that made the product compelling (though it was very good), and certainly not the developer GUI (that was well below average); it was this end-to-end developer experience that made the product a winner.

Now flash forward to today, and the TechCrunch article covering their developer's experience in developing and deploying a decent little app in about 4 hours, including deciding on requirements, writing code, debugging, deploying and "launching" on the crunchbase.com domain. In reading through their step by step activities, I was struck hard by the similarities with the Forte experience, with a few positive differences:

The tools are now open source themselves, and based on an open source language
The need for application partitioning is largely eliminated. Note that I said largely: if you are using a service-based architecture, you will have to hand-code the outbound calls to any services via Google's URL Fetch API.
Deployment and monitoring is automatic. You never have to worry about what was deployed where when. The capacity is just there (up to your quota).

Now, all of this comes with a cost (which was true of Forte as well): you must agree to live in a proprietary world. In a later post, I am going to talk about another cost (which is common with other platforms): start-up lock-in; suffice it to say, your lock-in isn't just the available languages or the libraries you *must* use, but it's also the dependency on all of that infrastructure automation that is Google's and Google's alone.

There are also many key application components which seem logically locked into Google: identity, domain management, monitoring and data storage/retrieval. Not necessarily a bad thing, but developers should go in with their eyes wide open.

However, if time to market is your biggest concern, and all you care about is cool web application capabilities, then you now have two choices: Amazon (via Heroku and Zend, for instance) and Google (via App Engine). Each has its language and its limitations, but the experience is largely the same. (I haven't checked whether the "launch" capabilities of Heroku or Zend, e.g. domain assignment, match Google's, though, and it doesn't appear that identity services are covered at all.)

None of these really give you service level guarantees, so SLAuto doesn't really apply. However, service levels will be assumed, so if you care, start looking at SLAuto tools that may help in the future.

Again, all of this probably does not apply to enterprise IT, but it's a hell of a compelling story for web developers.

Monday, April 07, 2008

Google announces ultimate cloud lock-in platform

I was about to write a long post about how all the big guys are starting with storage as a cloud service (based on the rumor that Google was going to announce BigTable as their first cloud service, and HP's new offering), when I took the time to watch Scoble's (unintentionally) multi-part coverage [1] [2] [3] of the mysterious Google announcement (on Qik). And--just to screw with me--do they announce a data-only offering? Of course not, they announce Google App Engine.

Update: Here is a link to the official Google coverage of the announcement on YouTube.

What is Google App Engine? Well, detailed coverage is all over the web; see:

Mike Arrington (TechCrunch)
What this all means: Google App Engine is designed for developers who want to run their entire application stack, soup to nuts, on Google resources. Amazon, by contrast, offers more of an a la carte offering with which developers can pick and choose what resources they want to use.
Bob Warfield (SmoothSpan) [1] [2] [3]

However, the short-short version is it is a complete scalable and manageable runtime environment to build, test and run scalable web applications. (I don't say "highly scalable" for reasons that will be clear later.) This environment is made up of the following five core components (today):

Scalable Serving Infrastructure - Basically the Google infrastructure, including everything but the Python code and web templates themselves
Python Runtime - All of the infrastructure to deliver and execute your application in a distributed environment
Software Development Kit - Allows you to code your application on your local system before deploying to Google.
Web-based Admin Console - A web application including at least simplistic version management (including rollback), running system statistics and errors, access to the datastore (see below) and access to log files
Datastore - BigTable storage (I don't know enough about BigTable yet to say more)

All of this delivered in a free (as of the beta) limited-scale package:

500MB storage
200 Megacycles CPU
10GB Bandwidth In/Out

That should be enough for around 5 million page views a month for the average web application. This is a reasonable scale, but it would not qualify as "highly scalable" in most large web properties' books.
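A quick back-of-the-envelope calculation shows why. Assuming a 30-day month, evenly spread traffic, and a purely hypothetical peak of ten times the average, 5 million page views a month works out to only a couple of requests per second:

```python
# Assumptions: a 30-day month, evenly spread traffic, and a hypothetical
# peak-to-average ratio of 10 (real web traffic is often spikier).
PAGE_VIEWS_PER_MONTH = 5_000_000
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 24 * 60 * 60
PEAK_FACTOR = 10

per_day = PAGE_VIEWS_PER_MONTH / DAYS_PER_MONTH
per_second = per_day / SECONDS_PER_DAY
peak_per_second = per_second * PEAK_FACTOR

print(f"{per_day:,.0f} views/day, {per_second:.2f}/s average, ~{peak_per_second:.1f}/s at assumed peak")
# prints "166,667 views/day, 1.93/s average, ~19.3/s at assumed peak"
```

Even with a generous peak factor, that is small-site territory compared with what large web properties handle.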

What does this add up to, in my opinion? The ultimate cloud lock-in story. (As background, watch Scoble's first video from about 3:17-5:25.) Nothing in your web application will be independent of Google if you use this technology--not even your Python code. (For proof, check out the "includes" in the coding demo--at around 8:44 of the first video.) Everything you do will depend on a piece of Google intellectual property. Your datastore is BigTable, your operations environment is Web Operations Center, etc., etc., etc.

This isn't cloud computing, it's just a cool web app hosting tool. OK, I exaggerate. It is cloud, but it's exactly the kind of cloud most enterprises should avoid. If you are building a web business, and this tickles your fancy, go for it. You can't beat the price, and you've got to love the feature set. If you are a Fortune 500 looking for where to launch your next CRM interface, forget it. There are safer ships to sail than this--e.g. Amazon EC2 (et al.), Mosso, etc.; better yet, convert what you have.

If it sounds like I am being reactionary to this announcement, I suppose I am in a way. Unfortunately, I have spent a lot of time thinking about how today's high-scale business systems will move to the cloud, and I think the market needs more maturity before this can be done safely. You need flexibility in the type and architecture of your application, and in which components you choose to leverage. There is no such choice with Google.

The best part of Scoble's coverage was when he talked to two developers at the end (~18:15). One (Michael Malone) notes the biggest problem is "lock-in". The woman standing next to him (Mia Culver) calls it a "proprietary platform".

I love it. There is no fooling this savvy, open source focused market. If you want to win hearts and minds, be open. When the hell are we going to get that application portability standard we've been demanding, eh?

(On a side note, the required demo for cloud application development is now to build a web app from scratch and deploy it so the audience can access it from their laptops in 5-8 minutes. Google did it tonight, and Heroku did it at the Cloud Demo Night earlier this month.)

Some more of my notes from the announcement:

Can't do:
No writes to the file system. (Reads are OK, so you can use props files, etc.)

No direct web calls (the "URL fetch" API is used instead)

No threads (single thread only, but distributed across multiple systems)

Python is the only language at first; Google is looking for input on the next language to attack (it must have a runtime that can be "hardened")
Administration Console gives the ability to see and manipulate running app code (by version) and data

Is the identity environment for all hosted apps Google login? Is everyone comfortable with this?

The initial 10,000 beta accounts may already be gone.

Quota-based, with no ability to grow past the limits above for now.

Also, no "offline processing" today, but looking into it for future. (Sounds like batch stuff, etc.)

I have an interesting experiment I wish I could get to. I want to marry Scalr, the open source Amazon EC2 automation environment with a policy-based SLAuto environment to get the ultimate in flexible, open and coding agnostic autonomic operations, both in the cloud and "at home". Anyone want to beat me to it? (Come to think of it, why is Google still hosting Scalr now that App Engine is live? Hmmmm....)
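One way to picture the "policy-based SLAuto" half of that experiment is a small rule that compares an observed service level against a target and decides how much capacity a tier should have. The `Policy` fields and the one-server-at-a-time rule below are hypothetical; in the experiment described above, the resulting server count would be handed to Scalr (or any other EC2 automation) to act on, and no real Scalr interface is shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical service-level policy for one application tier."""
    target_ms: float               # response-time service level to maintain
    min_servers: int
    max_servers: int
    scale_down_ratio: float = 0.5  # only shrink when comfortably under target


def decide(policy: Policy, servers: int, observed_ms: float) -> int:
    """Return the server count the policy calls for, one step at a time."""
    if observed_ms > policy.target_ms:
        return min(servers + 1, policy.max_servers)   # breaching: add capacity
    if observed_ms < policy.target_ms * policy.scale_down_ratio:
        return max(servers - 1, policy.min_servers)   # idle capacity: release it
    return servers                                    # within band: hold steady


p = Policy(target_ms=200, min_servers=2, max_servers=10)
print(decide(p, servers=4, observed_ms=350))  # 5
print(decide(p, servers=4, observed_ms=80))   # 3
print(decide(p, servers=4, observed_ms=150))  # 4
```

Stepping one server at a time and leaving a band between the scale-up and scale-down thresholds keeps a rule like this from thrashing; a real SLAuto system would also weigh cost, priorities and provisioning time.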

Friday, April 04, 2008

John Willis Honors Me with Inaugural Cloud Cafe Podcast

I am the inaugural guest in John Willis's Cloud Cafe podcast series. I couldn't be more honored.

Those of you who have been following this whole "what is Cloud Computing" debate may have had the opportunity to see the conversations between several bloggers regarding how to define cloud computing and related technologies. John Willis, of the John Willis ESM Blog, is making a key contribution by taking on the challenge of classifying vendors in this space. As I had some issues with his classification of Cassatt, he thought the best way to resolve that was to invite me to launch his new series.

Two things were resolved in this podcast.

First, I learned first hand what a classy guy John is. He handled the interview very well, let me talk my butt off (a talent I got from my minister mother, I think) and had several observations over the course of the conversation that showed his tremendous experience in the enterprise systems management space. I feel quite sheepish that I ever hinted that he wasn't being forthright with his audience. Lesson gratefully learned; apology gladly offered.

Second, John and I were always much closer in our visions of cloud computing, utility computing and enterprise systems than it might have appeared at first. Our conversation ranged from the aforementioned "what is cloud computing" question to topics such as:

the relationship between cloud and utility computing,
the cultural challenge facing enterprises seeking the economic returns of these technologies,
how cloud and utility computing revolutionize performance and capacity planning, and
where Hadoop and CloudDB fit into all of this.

In the end, I think John and I agreed that cloud computing is more than just virtualization on the Internet. I very much enjoyed the conversation, and I hope you will take the time to listen to this podcast.

Got questions or comments? Post them here or on John's blog; I will check both.

Finally, I will be working to get Cassatt's entry in John's classifications updated as a result of the discussion.