Saturday, October 11, 2008

The PaaS Spectrum: Choosing Your Coding Cloud

Platform as a Service is a fascinating space to me. As I noted in one of my reviews of Google AppEngine when it was released, a good distributed platform comes with a distinct development experience. Such a platform both supports simple development-test cycles and reduces the complexity of delivering highly scalable, reliable applications to a complex data center. With the right platform (and there are many), a development team can leapfrog the hours of pain and effort required to stitch together hardware, software, networking and storage into a bulletproof web application.

At CloudCamp Silicon Valley earlier this month, a group of us discussed this in some depth. About thirty representatives of cloud vendors and customers alike engaged in a lively discussion of the elements of cloud-oriented architectures and how to choose the right one.

I spoke (perhaps too much) about software fluidity, and someone noted that many PaaS offerings limit that fluidity rather than enable it. Think Google AppEngine or Force.com or Bungee Connect. They are great products, but not exactly built to make your application portable and dynamic. (Google and others are open sourcing all or part of their platforms, but the ecosystem to port applications isn't there yet. See below.) So, the conclusion went, you choose some PaaS offerings when time to market, not portability, is the central concern of your project. Others (possibly including IaaS, if you want to be technical) make sense when portability is your primary concern, but you'll have to do more work to get your application out the door.

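This creates a spectrum on which PaaS offerings fall. Portability-first offerings sit on the left, and productivity-first, all-in-one platforms sit on the right.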

This makes perfect sense to me. Choose an "all-in-one" platform from a single software+service+infrastructure vendor, and the vendor can hide much of the complexity of coding, deploying and operating your application from you. On the other hand, select an infrastructure-only vendor that offers "standard" OS images (typically based on one of the major Linux distros) but little else. You can port your applications to your heart's content, but you'll have to configure database connections, middleware memory parameters, etc. yourself. Some platforms will lie somewhere in the middle of this spectrum, but the difference between the edges is striking, and most platforms will fall toward one end or the other.

For an example of a relatively "extreme right" platform, take a look at Force.com and its Apex language. Force.com is the application platform provided by (and closely tied to) Salesforce.com. How much productivity does it provide? Well, Rich Unger, an engineer at Salesforce.com (and one of the participants in the CloudCamp SV discussion), has an excellent blog that covers Apex. Here's one example he gives:
"Database operations don't have to establish a connection (much less manage a connection pool) [in Apex]. Also, the object-relational mapping is built into the language. It's even statically typed. Let's say you want the first and last names of all your contacts. In Java, there are many ways to set this up, depending on whether you're using JPA, straight JDBC, entity beans, etc. In general, you must do at least these four things:

  1. Write an entity class, and annotate it or map it to a DB table in an XML file
  2. Configure the DB and connection pool
  3. Acquire a connection in the client code
  4. Perform the query

In Apex, you'd just do #4:

Contact[] mycontacts = [select firstname, lastname from Contact];
for (Contact c : mycontacts) {
    System.debug(c.firstname + ' ' + c.lastname);
}

That's it. You could even shorten this by putting the query right into the for loop. The language knows how to connect to the DB. There's no configuration involved. I'm not hiding any XML files. Contact is a standard data type. If you wanted a custom data type, you'd configure that through the Salesforce.com UI (no coding). Adding the new type to the DB automatically configures the O-R mapping. Furthermore, if you tried:

Account[] myaccounts = [select firstname, lastname from Contact];

...it wouldn't even compile. Static typing right on down to the query. Try that by passing strings into JDBC!"

Freakin' brilliant! That is, as long as you want to write an application that uses the Salesforce.com databases and runs on the Force.com infrastructure. It's not code that you can run on AppEngine or EC2.
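For contrast, here is roughly what the portable version of Rich's four Java steps looks like with straight JDBC. This is a minimal sketch of my own, not Rich's code. The `ContactLister` class, the `Contact` record and the `jdbc:h2:mem:contacts` URL are hypothetical placeholders. The sketch skips connection pool setup entirely, and it only returns rows if a matching JDBC driver is on the classpath and a `Contact` table exists. It would run on EC2 or GoGrid as happily as on a laptop, which is the point. Notice, though, that the query is just a string the compiler never checks.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class ContactLister {

    // Step 1: an entity class. With JPA this would carry @Entity/@Table
    // annotations or an XML mapping; here it is a plain record mapped by hand.
    public record Contact(String firstName, String lastName) {}

    // Step 2: configuration. Hypothetical URL; a real deployment would also
    // configure a connection pool rather than use DriverManager directly.
    static final String DEFAULT_URL = "jdbc:h2:mem:contacts";

    static String fullName(Contact c) {
        return c.firstName() + " " + c.lastName();
    }

    static List<Contact> loadContacts(String jdbcUrl) throws SQLException {
        List<Contact> contacts = new ArrayList<>();
        // Step 3: acquire a connection (needs a JDBC driver on the classpath).
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             // Step 4: perform the query. The SQL is only a string, so a typo
             // in a table or column name surfaces at runtime, not compile time.
             ResultSet rs = stmt.executeQuery("select firstname, lastname from Contact")) {
            while (rs.next()) {
                contacts.add(new Contact(rs.getString("firstname"), rs.getString("lastname")));
            }
        }
        return contacts;
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : DEFAULT_URL;
        try {
            for (Contact c : loadContacts(url)) {
                System.out.println(fullName(c));
            }
        } catch (SQLException e) {
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```

Four steps, two of which (the mapping and the configuration) Apex simply makes disappear.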

On the other hand, I've been working with GoGrid for a little while, getting Alfresco to run in a clustered configuration on their standard images. The experience has been amazing, helped along by two things. First, GoGrid gives you root access to the virtual server (very cool!). Second, the standard Alfresco Enterprise download (a trial version is available for free) contains a Tomcat instance. It installs with a tar command, a single properties file change and a database script. So, combine a CentOS 64-bit image with Alfresco 2.2 Enterprise, make sure iptables has port 8080 open, and away you go. The best thing is that, in theory, I should be able to grab the relevant files from that CentOS image, copy them to a similar image on, say, Flexiscale, and be up and running in minutes. However, I did have to manage some very techie things. For instance, I had to edit iptables, and I had to know how to confirm that I had the right Java version for Tomcat.
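Those techie checks are easy enough to script. Below is a minimal pre-flight sketch in Java. It reports the JVM's major version and tests whether a TCP port accepts connections. The `PreflightCheck` class and the minimum Java version of 5 are assumptions for illustration only, so check the Tomcat and Alfresco documentation for the real requirement. Run the version check on the server itself. Run the port check from another machine, since loopback traffic usually gets through no matter what your iptables rules say.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PreflightCheck {

    // Parses java.version strings in both the legacy "1.6.0_07" form and the
    // newer "17.0.2" or "21-ea" form into a major version number.
    static int majorVersion(String version) {
        String[] parts = version.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // Returns true if something accepts TCP connections on host:port,
    // e.g. Tomcat on 8080 once the firewall allows it.
    static boolean portOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int minJava = 5; // hypothetical minimum; confirm against your Tomcat docs
        int major = majorVersion(System.getProperty("java.version"));
        System.out.println("Java major version " + major + (major >= minJava ? " OK" : " too old"));

        String host = args.length > 0 ? args[0] : "localhost";
        boolean open = portOpen(host, 8080, 2000);
        System.out.println("Port 8080 on " + host + (open ? " is reachable" : " is not reachable"));
    }
}
```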

By the way, your choice of PaaS provider affects long-term operational issues in a similar way. If you have root access to the server, you must handle a measurable share of the issues driven by configuration changes in your system over time. On the other hand, suppose your code runs on a complete stack that the vendor maintains for backward compatibility and that hides configuration issues from you from the get-go. In that case, you may not have to do much of anything to keep your system running at reasonable service levels.

Today, the choice is up to you.

I wonder, though, whether this spectrum has to be so spread out. As I wrote recently, I see a huge opportunity for application middleware vendors, such as GigaSpaces, BEA and JBoss, to provide a "portability layer." Such a layer would both reduce configuration through prebuilt app server/OS images and let the application on top of the app server be ported to just about any instance of that server in the cloud. (The middleware option would likely require more configuration than the earlier Apex example. For instance, the application server and/or the application itself would have to be "pointed" at the database server.)
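To make that "pointing" concrete: if the application looks up its database location at startup instead of hard-coding it, the same build can move between prebuilt images unchanged. Here is a minimal sketch. The setting names `db.url` (a JVM system property) and `DB_URL` (an environment variable) are hypothetical, as is the default MySQL URL. None of them is a convention of any particular middleware vendor.

```java
import java.util.Map;

public class DbConfig {

    // Hypothetical setting names, chosen for illustration only.
    static final String PROPERTY_NAME = "db.url";
    static final String ENV_NAME = "DB_URL";

    // Resolve the JDBC URL from, in order: a -Ddb.url system property,
    // a DB_URL environment variable, then the supplied default.
    static String resolveJdbcUrl(Map<String, String> env, String defaultUrl) {
        String fromProperty = System.getProperty(PROPERTY_NAME);
        if (fromProperty != null && !fromProperty.isBlank()) {
            return fromProperty;
        }
        String fromEnv = env.get(ENV_NAME);
        if (fromEnv != null && !fromEnv.isBlank()) {
            return fromEnv;
        }
        return defaultUrl;
    }

    public static void main(String[] args) {
        // Hypothetical default; each cloud image would override it.
        String url = resolveJdbcUrl(System.getenv(), "jdbc:mysql://localhost:3306/alfresco");
        System.out.println("Connecting to " + url);
    }
}
```

You could launch with `-Ddb.url=...` on one provider and set `DB_URL` in the image on another, and the application code would not change. Passing the environment in as a `Map` also keeps the lookup easy to test.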

Google AppEngine should, in theory, be on this list. However, while Google has open sourced the API and the development "simulator", it has not released source for the true middleware itself, the so-called Google "magic dust". Implementing a truly scalable alternative AppEngine platform is an exercise left to the reader. Has anyone built a true AppEngine-compatible alternative infrastructure yet? I hear rumors of what's to come, but as far as I know, nothing exists today. So AppEngine is not yet portable. To be fair, there is no JBoss "cloud edition" yet, either. GigaSpaces is the only vendor I've seen actively pursue this route.

While we wait for more flexible options, you have a choice to make. Do you need it now at all costs? Do you need it portable at all costs? Or do you need something in between? Where you fall in the PaaS spectrum is entirely up to you.