What did Chris do? In short, he got the GAE SDK running on Amazon virtual machines with some modification. There are limitations:
- It will not scale (none of the Google "secret sauce").
- It does not support email at this time (though the source code is available for anyone who wants to add it).
- It could go down at any time, either due to an EC2 outage (not very likely), or because whoever is paying for this doesn't want to foot the bill anymore (much more likely).
I think you are indeed correct, though I think a lot of the attention has been on what we discussed earlier: what is GAE *not* capable of doing (right now), and with regard to what it *is* capable of doing, how do you take advantage of Google's amazing infrastructure? I would think that Chris, or others who see the possibilities, will get on this. Hell, they are probably already "on this".
That being said, I think you are on to something potentially big: through the open source SDK, there is an opening for any other hosting company or "OS for the data center" company to provide a "GAE-to-go" solution. All of the SDK would have to be supported--and there is a lot that is specific to Google right now--and the solution would need to be at least comparably scalable, etc.
The good news is that, as you run down the SDK, there are many simple solutions possible, or even existing open source projects available:
The Python Runtime - This is just a Python interpreter with a series of rules imposed, running on an autonomic, scalable infrastructure. The interpreter can be harvested from the SDK (with some modification, I am sure), and the scalable infrastructure can be damn near anything with a policy-based management component.
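To give a flavor of what "a series of rules imposed" means in practice, here is a minimal sketch of a sandbox policy layer: an import hook that refuses disallowed modules. The three-module block list is purely hypothetical; the real SDK enforces a much longer, carefully curated set of restrictions.

```python
import sys

# Hypothetical block list for illustration only; the real GAE runtime's
# rules are far more extensive (no sockets, no subprocesses, no C extensions).
BLOCKED_MODULES = {"socket", "subprocess", "ctypes"}

class SandboxImportBlocker:
    """A sys.meta_path finder that refuses to import disallowed modules."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname.split(".")[0] in BLOCKED_MODULES:
            raise ImportError("module %r is not allowed in the sandbox" % fullname)
        return None  # defer everything else to the normal import machinery

# Install the policy in front of the standard import system.
sys.meta_path.insert(0, SandboxImportBlocker())
```

Installed this way, an `import subprocess` inside application code fails with an ImportError, while allowed modules import normally--which is roughly the behavior the SDK's dev environment fakes.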
The Datastore API - Google's BigTable data store is based on Map/Reduce. I am not exactly sure how well this maps to Hadoop, but that is where I would start. Regardless, I would expect wrapping and/or extending Hadoop to support GQL is the minimum involved.
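To make the "wrap a back-end with GQL" idea concrete, here is a toy sketch of the translation layer involved, assuming a made-up single-clause WHERE grammar. A real port would compile queries into Hadoop (or other back-end) jobs rather than filter an in-memory list.

```python
import operator

# Toy operator table for a GQL-like WHERE clause.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def run_gql(query, table):
    """Evaluate 'SELECT * FROM <kind> WHERE <prop> <op> <value>' over dicts."""
    tokens = query.split()
    prop, op, value = tokens[5], tokens[6], tokens[7]
    value = int(value) if value.isdigit() else value.strip("'")
    return [row for row in table if OPS[op](row[prop], value)]

greetings = [{"author": "chris", "views": 10}, {"author": "tim", "views": 3}]
print(run_gql("SELECT * FROM Greeting WHERE views > 5", greetings))
# → [{'author': 'chris', 'views': 10}]
```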
The Users API - This may indeed be the biggest problem area, but as I read the API, the hidden complexities are things such as generating login/logout URLs and accessing account data. Assuming I read the docs correctly, other than "nickname" and "email", Google assumes nothing about what defines a user. Thus, Google's accounts do not have to be used for the APIs to be functional. However, will the community expect a shared identity store, or at least the ability to choose to use Google's accounts?
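The surface a clone has to cover really is small. The method names below mirror the documented API (`User.nickname()`, `User.email()`, `create_login_url()`); the login path and the way the nickname is derived are my own illustrative assumptions, since the API promises little beyond those two fields.

```python
from urllib.parse import quote

class User:
    """Minimal stand-in for the Users API's User object."""

    def __init__(self, email):
        self._email = email

    def email(self):
        return self._email

    def nickname(self):
        # The API guarantees little beyond nickname and email, so a clone
        # is free to derive the nickname however it likes; this is one guess.
        return self._email.split("@")[0]

def create_login_url(dest_url):
    # Hypothetical login endpoint; a clone would point this at its own
    # identity system (or optionally at Google's accounts).
    return "/_ah/login?continue=" + quote(dest_url, safe="")

u = User("chris@example.com")
print(u.nickname(), create_login_url("/admin"))
# → chris /_ah/login?continue=%2Fadmin
```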
The URL Fetch API - This was implemented for both security and scalability reasons. Not sure what exists to map it to, but I think it is a function of the identity infrastructure, and how you scale the Python Runtime. In other words, you'd need to map these functions to the appropriate mechanisms in each of the other infrastructure elements.
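As a sketch of what "mapping these functions to the appropriate mechanisms" might look like, here is a policy check applied before any outbound fetch. The scheme allow-list and size cap are illustrative assumptions, not GAE's actual limits.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}   # URL Fetch only speaks HTTP(S)
MAX_RESPONSE_BYTES = 1024 * 1024      # illustrative cap, not GAE's real limit

def check_fetch_policy(url):
    """Raise ValueError if an outbound fetch would violate sandbox policy."""
    scheme = urlparse(url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError("scheme %r not allowed by the fetch sandbox" % scheme)

def clip_response(body):
    """Truncate a response body to the sandbox's size limit."""
    return body[:MAX_RESPONSE_BYTES]
```

A clone would call `check_fetch_policy()` before handing the request to whatever scalable HTTP-proxy tier it runs, and `clip_response()` on the way back.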
The Mail API - I would assume there is something close out there, but if not, you would need to wrap a scalable email system with the Python classes defined in the API. Doesn't seem overly hard.
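The wrapper really is thin: the core of the Mail API is essentially one function with a `send_mail(sender, to, subject, body)` shape. In this sketch the "back-end" is an in-memory outbox for illustration; a real clone would hand the message to smtplib or a queue in front of a scalable mail farm.

```python
# In-memory stand-in for a real mail back-end.
OUTBOX = []

def send_mail(sender, to, subject, body):
    # A real wrapper would hand this to an SMTP relay or a mail queue;
    # here we just record it so the signature mapping is visible.
    OUTBOX.append({"sender": sender, "to": to, "subject": subject, "body": body})

send_mail("admin@example.com", "chris@example.com", "hi", "It works.")
print(len(OUTBOX))  # → 1
```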
Finally, given the fact that the source code for the "faker" dev environment is open source, there is a lot of basic sample code for many of these "wrapper" APIs. The trick is to find developers that know how to do this at high scale--perhaps request participation at highscalability.org?
Now, the potential "gotchas" here are that you are working within the same limitations that Google has set for itself, you can only "officially" extend the API when Google adds something or agrees to implement your requirements (they own the "open source community"), and you would need to test in a real-world high scale environment, which could be expensive (though perhaps, ironically, Google could be of some use here).
By the way, none of this solves VMWare's portability issues, EC2's portability issues, or even Cassatt's and/or our competitors' portability issues. It simply provides a portable web application environment that uses a "sandbox" approach for application execution. Again, a start (and an exciting one), but only a piece of the overall puzzle. Frankly, I think a portability story would be much more generally interesting to enterprise IT. But, that's just me.
By the way, as part of a post questioning Google's lock-in issues, Tim O'Reilly at O'Reilly Radar made an interesting observation about why, even if the code is portable, an AppEngine application today is still "locked in" to Google's site. To set up his argument, he quotes venture capitalist Brad Feld:
At *this* moment in time, it would be difficult to move apps off of AppEngine. Doing that in EC2 is trivial. This, to me, is the biggest issue, as I believe it could make startups less interesting from an acquisition perspective by anyone other than Google. This will most likely change as people develop compatibility layers. However, Google has yet to provide any information about how to migrate data from their datastore, the best I can tell. If you have a substantial amount of data, you can't just write code to dump it, because they will only let any request run for a short period before they terminate it.

Tim then goes on to say:
This last point is really very serious. I've been warning for some time that the first phase of Web 2.0 is the acquisition of critical mass via network effects, but that once companies achieve that critical mass, they will be tempted to consolidate their position, leading ultimately to a replay of the personal computer industry's sad decline from an open, energetic marketplace to a controlled economy.

What remains to be seen is how Google's plans to allow for larger data transfers will affect this (see Phil Wainewright's post covering the business-readiness of GAE). If they allow unlimited fast transfer--not necessarily for free, but at a reasonable price--they will establish themselves as a truly open platform, competing on their amazing infrastructure and technology innovation. Now, combined with a compatible open source platform, that would be game changing.
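Feld's "you can't just write code to dump it" problem is the classic short-deadline constraint, and the usual workaround is a resumable, batched export: each request handles one small page of entities and records the key it stopped at. A toy simulation of that pattern, with a made-up in-memory datastore standing in for the real one:

```python
def export_batch(datastore, start_after=None, batch_size=2):
    """Return one page of (key, value) pairs plus the cursor to resume from."""
    keys = sorted(k for k in datastore if start_after is None or k > start_after)
    page = keys[:batch_size]
    return [(k, datastore[k]) for k in page], (page[-1] if page else None)

store = {"k1": "a", "k2": "b", "k3": "c"}
dumped, cursor = [], None
while True:
    # Each loop iteration stands in for one short-lived request that
    # finishes well inside the deadline and hands off its cursor.
    page, cursor = export_batch(store, cursor)
    if not page:
        break
    dumped.extend(page)
print(dumped)
# → [('k1', 'a'), ('k2', 'b'), ('k3', 'c')]
```

Workable, but clearly a compatibility-layer hack compared to the sanctioned bulk-transfer story Google has yet to provide.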