Showing posts with label architecture. Show all posts
Showing posts with label architecture. Show all posts

Tuesday, May 20, 2008

A Funny Thing Happened On The Way to the Apple Store...

Part of the fun of joining my new employer is their open policy for selecting the laptop of your choice. Of course, being a lover of technologies that enable one to be technically lazy, I chose a MacBook Pro. It should arrive in a few days.

However, I was beginning to feel like I needed another beefed up system of my own at home to act as a multi-guest virtual "server farm" for various experiments, etc., that may include scale-out benchmarking, interesting integration issues, etc. My initial thought was a 8-core Mac Pro loaded with memory and disk, which would have set me back about $6500. So I asked Luis what he thought, and he said, "Don't Bother. Whenever I need a bunch of servers to test with, I generally find [Amazon] EC2 works perfectly fine."

You could have heard the head slap a mile away.

With all of my focus being on enterprise computing the last two years, I had totally lost sight of the "individual" applications of a cloud like EC2. I no longer have to think about building up a server farm of my own, or purchase a big honkin' dual Quad-core tower, or even reserve space on the corporate "cluster library". I just need my credit card, my Amazon account, and a little time with the "Getting Started" tutorial, and I have all the server resources I need at a price that is a fraction of buying the big box, with billing that allows me to easily expense work-related computing. Damn, I love the modern world!

Now, all of this probably seems so obvious to all of you out there, and it probably cracks you up to see a cloud computing blogger miss this opportunity to "reach for the clouds", so to speak. However, I think this is indicative of the change that both individuals and enterprises must go through to take advantage of these new breed of technologies.

I, like may Fortune 500 IT departments, am an old school client-server/SOA guy. I have a "use the right tool for the job" mentality, driven by years of pain trying to force procedural pegs into SOA holes. This mentality leads to a "best of breed" bias that leads one to worry about the ground up implementation of any software solution. If a tool was found that reliably hid some of that implementation, that was awesome and incredibly helpful to productivity. However, one needed to still understand how the server worked with the OS worked with the middleware worked with the application implementation to be comfortable to go to production.

To me, Amazon, Mosso, Cassatt and others are indicative of a major change in this mentality. With reliable shared configurations of systems (or a reliable systematic infrastructure for matching compute tasks to disparate resources that can handle those tasks), application developers now need to know less and less about the server, networking and storage part of the equation. Now, with the focus from the OS on up the stack, developers can start shopping for the infrastructure that makes economic sense for the problem they are trying to solve. The trick, of course, is to remember there are alternatives to buying your own servers.

So, this week I started to play with Amazon EC2, S3 and Cloud Services' new instance management tool, Cloud Studio. Let me just say, I am incredibly impressed with what I've done so far, which is little more than creating, starting and terminating instances (with a little between machine networking thrown in for fun). Even using Amazon's command line tools, it is a pretty straight forward process to get either a 32-bit or 64-bit server, but when you add the visual cues of Cloud Studio, it just becomes so simple it boggles the mind.

Now, there are definitely disadvantages to using Amazon for some problems. Windows support is out, for instance. (Anyone have a good suggestion for a true on-demand pricing option for Windows? Mosso would work, I hear, but they have a fixed upfront price that is a little steep for my general needs.) Also, any work that involves large amounts of data transfer ups the ante greatly. (Kevin Burton talked about this some time ago--see his note about bandwidth pricing just below the last quote, about half way down.) However, I will never again forget to consider the cloud before "own your own" for any computing task I have in my personal world.

Hmmm. I wonder if I can get my wife to use Zoho now...

Tuesday, April 08, 2008

Google App Engine: Forte Software for the Cloud?

I was rather harsh on Google App Engine last night, and I think with good reason. However, as I read more about it today, I am realizing that there is more to this product for web businesses than there is for your typical enterprise. Looking at it from that angle, let me talk about the compelling aspects of Apps Engine for those developing the types of applications that environment is intended to support.

Let me start with some history. In the mid to late nineties, I was a consultant for Forte Software, the Paul Butterworth led distributed application development and deployment tools company. Forte was an amazing company to work for, but it had an even more compelling product to work with.

The basic concept was derived from a simple development scenario. Paul invisioned allowing a developer to:

  1. Write an applications as if they were monolithic, locally executable applications

  2. Name specific objects in the application as "service objects" to act as key interface points (important later)

  3. Test those applications in a local-only configuration

  4. Use a GUI tool to partition the application by dragging and dropping the service objects around the environment as necessary. Developers could also configure service objects to be replicated for load balancing, failover or both.

  5. Test execute the application in its distributed configuration

  6. Deploy and operate the finished application in its final partitioned configuration

  7. Monitor the distributed application and its components for both availability and performance characteristics
Though based on a 4GL at a time that Java was pushing for "open languages", Forte proved to be a very popular tool in a variety of extremely high scalability settings: OnStar, EZPass, Marriott online reservations, the New York state sex offender web site to name but a few.

It wasn't the 4GL that made the product compelling (though it was very good), and certainly not the developer GUI (that was well below average), but this end-to-end developer experience that made the product a winner.

Now flash forward to today, and the TechCrunch article covering their developer's experience in developing and deploying a decent little app in about 4 hours, including deciding on requirements, writing code, debugging, deploying and "launching" on the crunchbase.com domain. In reading through their step by step activities, I was struck hard by the similarities with the Forte experience, with a few positive differences:
  • The tools are now open source themselves, and based on an open source language

  • The need for application partitioning is largely eliminated. Note I said largely, as if you are using a service-based architecture, you will have to hand-code the outbound calls to any services via Google's URL API.

  • Deployment and monitoring is automatic. You never have to worry about what was deployed where when. The capacity is just there (up to your quota).
Now, all of this comes with a cost (which was true of Forte as well): you must agree to living in a proprietary world. In a later post, I am going to talk about another cost (which is common with other platforms): start-up lock-in; suffice to say, your lock-in isn't just the available languages or the libraries you *must* use, but its also the dependency on all of that infrastructure automation that is Google's and Google's alone.

There are also many key application components which seem logically locked into Google: identity, domain management, monitoring and data storage/retrieval. Not necessarily a bad thing, but developers should go in with their eyes wide open.

However, if time to market is your biggest concern, and all you care about is cool web application capabilities, then you now have two choices: Amazon (via Heroku and Zend, for instance) and Google (via App Engine). Each has its language and its limitations, but the experience is largely the same. (I haven't checked to see if the "launch"--e.g. domain assignment--capabilities of Heroku or Zend, match Google's, though, and it doesn't appear that identity services are covered at all.)

None of these really give you service level guarantees, so SLAuto doesn't really apply. However, service levels will be assumed, so if you care, start looking at SLAuto tools that may help in the future.

Again, all of this probably does not apply to enterprise IT, but its a hell of a compelling story for web developers.

Friday, April 04, 2008

John Willis Honors Me with Inaugural Cloud Cafe Podcast

I am the inaugural guest in John Willis's Cloud Cafe podcast series. I couldn't be more honored.

Those of you who have been following this whole "what is Cloud Computing" debate may have had the opportunity to see the conversations between several bloggers regarding how to define cloud computing and related technologies. John Willis, of the John Willis ESM Blog, is making a key contribution by taking on the challenge of classifying vendors in this space. As I had some issues with his classification of Cassatt, he thought the best way to resolve that was to invite me to launch his new series.

Two things were resolved in this podcast.

First, I learned first hand what a classy guy John is. He handled the interview very well, let me talk my butt off (a talent I got from my minister mother, I think) and had several observations over the course of the conversation that showed his tremendous experience in the enterprise systems management space. I feel quite sheepish that I ever hinted that he wasn't being forthright with his audience. Lesson gratefully learned; apology gladly offered.

Second, John and I were always much closer in our visions of cloud computing, utility computing and enterprise systems than it might have appeared at first. Our conversation raged from the aforementioned "what is cloud computing" question, to topics such as:

  • the relationship between cloud and utility computing,
  • the cultural challenge facing enterprises seeking the economic returns of these technologies,
  • how cloud and utility computing revolutionize performance and capacity planning, and
  • where Hadoop and CloudDB fit into all of this.
In the end, I think John and I agreed that cloud computing is more than just virtualization on the Internet. I very much enjoyed the conversation, and I hope you will take the time to listen to this podcast.

Got questions or comments? Post them here or on John's blog; I will check both.

Finally, I will be working to get Cassatt's entry in John's classifications updated as a result of the discussion.

Friday, March 28, 2008

MapReduce reaches adolescence

I have to admit I find myself growing more impressed with the MapReduce (and related algorithm) community every day. I spend the better part of an hour watching Stu Hood of Rackspace/Mailtrust discussing MapReduce, Mailtrust's use of it for daily log processing, and comparing it to SQL. I'm a MapReduce newbie, so I was happy to find Stu's overview clear, careful and at a level I could grasp.

His overview of Hadoop (an open source implementation of a MapReduce framework) was equally enlightening, and I learned that Hadoop is more than the framework, but it includes a distributed file system as well. This is where I think SLAuto starts to become important, as it will be critical not only to monitor which systems in a Hadoop cluster are alive at any time (thus providing access to their storage), but also to correct failures by remounting disks on additional nodes, provisioning new nodes to meet increased data loads, etc. Granted, I know just enough to be dangerous here, but I would bet that I could sell the value of SLAuto in a MapReduce environment.

Another interesting overview of the MapReduce space comes from Greg Linden. (Damn, now I've mentioned Greg twice in a row...my groupie tendencies are really showing these days! -) Greg points us to notes taken at the Hadoop Summit by James Hamilton, an architect on the Windows Live Platform Services team. I haven't read through them all yet, but I like the breakdown of many of the big projects getting a lot of coverage among techies these days: Yahoo's PIG and HBase, as well as Microsoft's DRYAD. Missing is CouchDB, but I plan to watch Jan Lehnardt's talks [1][2] on that as soon as I get a moment.

Again, the reason MapReduce is being covered in a blog about Service Level Automation and utility computing is that as soon as I see "tens of thousands of nodes", I also see "no way human beings can meet the SLAs without automation". At least not without significant costs compared to automating. System provisioning, monitoring, autonomic scaling and fail-resistance are not built in to Hadoop, they are simply easy to support. Something else is needed to provide SLAuto support at the infrastructure layers.

Tuesday, March 25, 2008

Greg Linden on the Cloud

Greg Linden, of Geeking with Greg fame, was interviewed on Mix about his work in search personalization, recommendation engines and cloud computing. Most of the interview is only sort of interesting, but what really perked my ears up was Greg's observation that anyone scaling a software environment to thousands or tens of thousands of servers will likely continue to run their own data centers, if only because they will want to tweak the hardware to meet their specific needs.

Initially, I thought of this as just another example of a class of data center that will not be quickly (if ever) moved to a third party capacity vendor. Based on examples like Kevin Burton's fine tuning of Spinn3r's infrastructure using Solid State Drives (SSD) instead of RAID and traditional disks, it even seems like there would be many such applications. Ta da! It is proven that there will always be private data centers!

Yet, the more I think about it, I wonder if I wouldn't pay Google's staff to run my Map/Reduce infrastructure, even if it used tens of thousands of servers. I mean, where is the economic boundary between when it is cheaper to purchase your computing from clouds that already have your needed expertise versus hiring staff with specialized skills to meet those same needs?

Alternatively, is this kind of thing a business opportunity for a "boutique" cloud vendor? "Come to Bob's MapReduce Heaven. We'll keep your Hadoop systems running for $99.95, or my name isn't Bob Smith!"

I'll just leave it at that. I'm tired tonight, and coherence has left the building.

Sunday, March 23, 2008

An amazing resource for scalable systems architectures

I don't know why I hadn't heard of these guys before, but I'm in love with the content at highscalability.com. In post after post, feature after feature, there is more to learn here about everything from architecting software to optimize Amazon Web Services costs, to possibly the greatest collection of articles on real-life scalable architectures ever assembled. I have a feeling I will lose a few hours of sleep in the next few nights trying to read everything I can here.

I noted the inevitability of architecting specifically for utility (or cloud) computing some months ago.

Wednesday, February 27, 2008

Enterprise Architecture, Business Continuity and Integrating the Cloud

(Update: Throughout the original version of this post, I had misspelled Mr. Vambenepe's name. This is now corrected.)

William Vambenepe, a product architect at Oracle focusing on enterprise management of applications and middleware, pointed me to a blog by David Linthicum on IntelligentEnterprise that makes the case for why enterprise architects must plan for SaaS. In a very high level, but well reasoned post, Linthicum highlights why SaaS systems should be considered a part of enterprise architectures, not tangential to them.

As Vambenepe points out, perhaps the most interesting observation from Linthicum is the following:

Third, get in the mindset of SaaS-delivered systems being enterprise applications, knowing they have to be managed as such. In many instances, enterprise architects are in a state of denial when it comes to SaaS, despite the fact that these SaaS-delivered systems are becoming mission-critical. If you don't believe that, just see what happens if Salesforce.com has an outage.

I don't want to simply repeat Vambenepe's excellent analysis, and I absolutely agree with him. So let me just add something about SLAuto.

Take a look at Vambenepe's immediate response:
I very much agree with this view and the resulting requirements for us vendors of IT management tools.
Now add the comments from Microsoft's Gabriel Morgan that I discussed a couple of weeks ago.
Take for example Microsoft Word. Product Features such as Import/Export, Mail Merge, Rich Editing, HTML support, Charts and Graphs and Templates are the types of features that Customer 1.0 values most in a product. SaaS Products are much different because Customer 2.0 demands it. Not only must a product include traditional product features, it must also include operational features such as Configure Service, Manage Service SLA, Manage Add-On Features, Monitor Service Usage Statistics, Self-Service Incident Resolution as well.
Gabriel's point boiled down to the following equation:
Service Offering = (Product Features) + (Operational Features)
which I find to be entirely in agreement with Linthicum and Vambenepe.

As I am wont to do, let me push "Operational Features" as far as I think they can go.

In the end, what customers want from any service--software, infrastructure or otherwise--is control over the balance of quality, cost and time-to-market. Quality is measured through specific metrics, typically called service level metrics. Service level agreements (SLAs) are commitments to maintain service level metrics within commonly agreed boundaries and rules. In the end, all of these "operational features" are about allowing the end user to either

  1. define the service level metrics and/or their boundaries (e.g. define the SLA), or
  2. define how the system should respond if a metric fails to meet the SLA.

Item "2" is SLAuto.

I would argue that what you don't want is a closed loop SLAuto offering from any of your vendors. In fact, I propose right here, right now, that a standard (and, I am sure Simon Wardley would argue, open source) protocol or set of protocols for the following:
  1. Defining service level metrics (probably already exists?)
  2. Defining SLA bounds and rules (may also exist?)
  3. Defining alerts or complex events that indicate that an SLA was violated

Vendors could then use these protocols to build Operational Features that support a distributed SLAuto fabric, where the ultimate control over what to do in severe SLA violations can be controlled and managed outside of any individual provider's infrastructure, preferably at a site of the customer's choosing. This "customer advocate" SLAuto system would then coordinate with all of the customer's other business systems' individual SLAuto to become the automated enforcer of business continuity. In the end, that is the most fundamental role of IT, whether it is distributed or centralized, in any modern, information driven business.

"Nice, James," you say. "Very pretty 'pie-in-the-sky' stuff, but none of it exists today. So what are we supposed to do now?"

Implement SLAuto internally in your own data centers with your existing systems, that's what. Integrate SLAuto for SaaS as you understand the Operational Feature APIs from your vendors, and those vendors, your SLAuto vendor and/or your systems talent can develop interfaces into your own SLAuto infrastructure.

Evolve towards nirvana, don't try to reach it by taking tabs of vendor acid.

If you want more advice on how to do all of this, drop me a line (james dot urquhart at cassatt dot com) or comment below.

Wednesday, February 20, 2008

Data Goes SLAuto at Oracle

Thanks to Steve Jones, check out this presentation from David Chappell, Oracle VP and CTO of SOA, titled "Next-Generation Grid Enabled SOA". (A shorter written article can be found in at SOA Magazine's site.) Chappell outlines the work that Oracle is doing at turning the traditional model of application scalability on its head; instead of a fixed amount of database resources and scaling the applications/services horizontally, scale the database (using a cool complex adaptive systems approach) and alleviate much of the need to scale apps and services (except for CPU bound services). For someone like me, that's mind blowing.

Add to that the fact that the data management functions are relatively homogenous (though the infrastructure may not be), and aware of its resource utilization, and you can see why they are claiming a certain amount of hardware-metric based SLAuto.

(Hardware metric based SLAuto is based in measurements of hardware components, such as CPU utilization, memory utilization and so on. Software-based SLAuto usually uses business metrics such as transaction rates, active accounts, etc. to make scaling decisions.)

The catch? Well, everything must be written to use the "Data Grid" if its to take advantage of these capabilities. Legacy applications need not apply. (Could be the deal killer for David's "Not your MOM's Bus" concept.)

It seems to me that if Oracle wants this approach to catch on, it should open source a reference implementation as soon as possible. I'm not an expert at the most recent data processing approaches, but it would seem to me that Map-Reduce approaches would be complimentary to the Data Grid. However, Hadoop implementations would generally only be integrated with a data grid if there was an open source alternative. Otherwise, MySQL will continue to be the first choice. Open Source would also speed up integration between the data grid and infrastructure automation such as Cassatt and its competitors.

Dave hints at a URL for more info on the Oracle site, but I can't find it. If anyone tracks it down, I would appreciate any help I can get.

Thursday, February 14, 2008

Latency: Obstacle to cloud computing, or opportunity?

I was challenged in the comments to my Cloud Computing Heats Up post regarding my criticism of pupwhines' post that in turn criticized cloud computing. The anonymous author of the comment thought I was too hard on pupwhines, and wanted to know what my response specifically was to the challenge latency presents to distributed computing. I responded there, but I want to expand a little bit on the topic, as it is indeed important to understand, and backs my contention that there will need to be some software architectural changes made to leverage the cloud system.

(Quick note: I've alluded to this before, but I strongly believe there is no one cloud, but a bunch of siloed clouds today with *some* limited integration between them. More of a frontal system, really.)

Latency is an issue in most IT application environments today. There is no question that "traditional" tiered application design scales well at the processing layer, but has real issues at the data layer. There is simply no easy way to manage a traditional relational database architecture over a widely distributed environment. Pupwhines' contention that joining a table between two SaaS vendor implementations would be disaster is right on. In modern technology terms, it would be insane.

However, this is the disruptive aspect of cloud computing: the architectures you know and love are no longer necessarily best practices in a world where your functionality and capacity is:

  1. not necessarily your own,
  2. not necessarily integrated, and
  3. splayed out across this 5.1×108 km2 rock we live on
There are new technical advances being made today in the companies that already rely on cloud principles (think Google, Amazon, Microsoft, etc.). These advances will change the way you design and deploy software, but they will enable a world where proximity of data means less and less.

In fact, you probably already leverage one of these technical changes: increased bandwidth. Indulge me in some autobiographical narrative to illustrate.

Back in the late '90s, while I was a Senior Principal Consultant with Forte Software, Inc, the legendary(?) distributed application development platform vendor, one of my key roles was advising clients on how to best architect for high performance, high scalability and high availability. Forte was an early service oriented architecture, but it ran on the 10Mb/100Mb networks of the time. Thus, the rule for message passing between components (UI<->service or service<->service) was (in order of priority):
  1. Send as few messages over the network as possible
  2. Send the smallest messages possible

Thus, it was better to send large messages once than many small messages, but you wanted to optimize each message as much as possible.

To this end, best practices was to create data services and to actually deploy these services directly on the database server hardware. It was more important to process the relational mapping of data into the object mapping according to need in a timely fashion--thus avoiding unecessary network traffic--than to divide processing responsibility so that there was no custom application components running on the RDBMS hardware.

Fast forward to the 1G/10G networks of today. From what I am seeing, it is actually considered bad practice to do what I described above. While at Sun, I actually got admonished by a (very competent) manager for suggesting the way around Sun Access Managers horrible performance was to deploy the identity server and database on the same box (with our custom login and registration UIs deployed on separate, horizontally scalable servers). Pure architectural heresy. He was right in many ways: doing so would have put the business logic tier into horizontally locked architecture, but that wasn't his point. "We don't deploy our software on our database servers" was the gist of his argument.

So, faster networks have already changed the so-called "laws of physics" that software architects must design around. Given this, it seems easy to postulate that additional advances in network bandwidth will open additional opportunities for architectural change. In fact, it already has; check out Gigaspaces for a cool (though controversial) alternative to horizontally replicated service architectures.

Will bandwidth really grow at a rate that will make a difference to the current IT generation(s)? Many postulate it has to, even if the core network operators resist. As I noted in my response, Cisco's new Nexus 7000 series is a sign of times to come. Does anyone deny that 40G and 100G networks have the opportunity to change the laws of physics? (Disclaimer: I know just enough about networking to be dangerous, so I may be overstating the case...but change is still clearly on the horizon.)

Even if network bandwidth doesn't change at all, or any additional bandwidth is chewed up by demand at existing rates, there are other software architectural advances that will revolutionize certain kinds of computing. I spoke in my response about MapReduce and its open source implementation, Hadoop. For processing large, distributed data loads, this architecture is eliminating boundaries created by traditional scaleout RDBMS-based approaches. Google has used this approach to tie data from every one of its properties (including acquired properties, such as Blogger) into a single user identity and profile. Talk about a distributed join problem...

(Another quick note: hats off to Google and Yahoo respectively for their work in this space. I know from my past life at Sun what a pain in the whatever this is, and I love the seamlessness I experience on these sites.)

One other major advancement is the increasing sophistication of business integration technologies, from traditional application integration (force.com, boomi.com, BizTalk, Lombardi, etc.) to data integration options (Informatica, Business Objects, etc.) to subscription data propogation techniques. These integration options can allow one to go back to some of the basics I spoke of before: do as much processing as possible on Saas vendor A's infrastructure before sharing the relevant data with vendor B. Not as perfect as a join in many cases, but in a service oriented world, a common, required approach for most.

Perhaps the most important point I want to make today, however, is that today--in the modern IT era--many of these technologies are either future tech or not what was used to build existing applications. Given that, what does an existing datacenter do? Stick to my recommendation; convert your own datacenter into a utility/cloud today, and begin to leverage the maturing compute grid/cloud computing ecosystem as it and your applications mature.

Tuesday, February 12, 2008

Green Aware Programming

monkchips writes about "green aware programming", as coined by Christopher O’Connor, vice-president strategy and market management, Tivoli. I responded and pointed out that "green = cheap" in the utility computing world.

Tuesday, January 29, 2008

One Step To Prepare For Cloud Computing

Some of you may be wondering why I am making such a big stink about software architecture on a blog about service level automation (SLAuto). Well, as Todd Biske points out, "the relationships (and potentially collisions) between the worlds of enterprise system management, business process management, web service management, business activity monitoring, and business intelligence" are easier to resolve if the appropriate access to metrics is provided for a software service. For SLAuto, this means the more feedback you can provide from the service, process, data and infrastructure levels of your software architecture, the easier it is to automate service level compliance.

Let's look at a few examples for each level:

  • Service/Application: From the end user's perspective, this is what service levels are all about. Key metrics such as transaction rates (how many orders/hour, etc.), response times, error rates, and availability are what the end users of a service (e.g. consumers, business stakeholders, etc.) really care about.
  • Business Process: Business process metrics can warn the SLAuto environment about cross-service issues, business rule violations or other extraordinary conditions in the process cycle that would warrant capacity changes at the BPM or service levels.
  • Data Storage/Management: Primarily, this layer can inform the SLAuto system about storage needs and storage provisioning, which in turn is critical to automated deployment of applications into a dynamic environment.
  • Infrastructure: This is the most common form of metric used to make SLAuto decisions today. Such metrics as CPU utilization, memory utilization and I/O rates are commonly used in both virtualized and non-virtualized automated environments.

As noted, digital measurement of these data points can feed an SLAuto policy engine to trigger capacity adjustment, failure recover or other applicable actions as necessary to remain within defined service thresholds. While most of the technology required to support SLAuto is available, the truth is that the monitoring/metrics side of things is the most uncharted territory. As an action item, I ask all of you to take Todd's words of wisdom into account, and design not only for functionality, but also manageability. This will aid you greatly in the quest to build fluid systems that can best take advantage of utility infrastructure today.

Thursday, January 24, 2008

Data propagation and software fluidity

Jon Udell has an interesting post commenting on Jeff Jonas' explaination of Out-bound Record-level Accountability in Information Sharing Systems. The central thesis of Jeff's post is that tracking who specifically received a given datum is very expensive yet highly necessary in many applications. The example given is that of a user who wishes to no longer receive email from a site they have an account with, or any of the other sites that the original site shared that preference with. How does the original site know who to contact? The high cost is a result of the difficulty in tracking who data has been forwarded to.

John replies very simply that a "publish" model, much like blogging, might be the answer. "Data blogging", coined by fellow blogger Galvin Carr, refers essentially to the problem of syndication, but Udell projects that to a much wider arena of data types. As he notes, there is much evidence out there that "push" models are generally only applicable to edge systems calling "inward". "Publish and Subscribe"-type pull models are far easier to implement when running "outward" from the cloud to edge systems (as well as, generally, within the cloud--aka event-driven architectures).

There are two valuable results of this approach:

  1. The originating system can require users of data to subscribe with a unique identity, and each "pull" of published data could be tracked (if necessary) to identify who is up to date and who isn't.
  2. For software fluidity purposes, it further decouples the originating system from its subscribers, meaning both the subscribers and the originating system can be "moved" from physical environment to physical environment with no loss of communication. The most negative action that could take place here is if the originating publisher's DNS name changed in the course of the move, but redirects and other techniques could even mitigate that issue.

I am commenting on this, of course, largely for the second item. Access to data, services and even edge devices must be very loosely coupled to work in a cloud computing world. This is one great example of how you could architect for that eventuality, in my opinion.

Tuesday, December 18, 2007

Some more skepticism about Amazon WS as a business plan

I've been interested in Don McAskill's review of Amazon's SimpleDB in light of SmugMug's future plans. He is very positive that he can use this service for what it is intended for; infinitely scalable storage and retrieval of small, structured data sets. His post is a good one if you want to get a clearer idea of what you can and shouldn't do with SimpleDB.

However, I worry for Don. As a growing number of voices have been pointing out, committing your business growth to Amazon, especially as a startup, may not be a great thing. Kevin Burton, founder and CEO of spinn3r and Tailrank, notes that this depends on what your processing and bandwidth profiles are, but there are a large number of services that would do better to buy capacity from, say, a traditional managed hosting facility.

Burton uses the term "vendor lock-in" a few times, which certainly echos comments that Simon and I have been making recently. But Burton brings up an additional point about bandwidth costs that I think have to be carefully considered before you jump on the Amazon-as-professional-savior bandwagon. He notes that for his bandwidth intensive business, Amazon would cost 3X what it currently costs spinn3r to access the net.

Burton goes on to suggest an alternative that he would love to see happen: bare metal capacity as a service. Similar to managed hosting, the idea would be for the system vendors to lease systems for a cost somewhat above what it would take to buy the system, but broken down over 2-3 years. Since the credit worthiness of most startups is an issue, lease default concerns can be mitigated by keeping the systems on the vendor's premises. Failure to pay would result in blocked access to the systems, for both the customer and their customers.

I like this concept as a hybrid between the "cloud" concepts and traditional server ownership. Startups can get the capacity they need without committing capital that could be used to hire expertise instead. On the negative side, however, this does nothing to reduce operational costs at the server levels, other than eliminating rack/stack costs. And Burton says nothing about how such an operation would charge for bandwidth, one of his key concerns about Amazon.

There have been a few other voices that have countered Kevin, and I think they should definitely be heard as this debate grows. Jay at thecapacity points out the following:

[B]usiness necessitates an alternate reality and if expediency, simplicity and accuracy mean vendor constraint, so be it.
I agree with this, but I think that it is critical that businesses choose to be locked in with open eyes, and a "disaster recovery" plan should something go horribly wrong. Remember, it wasn't that long ago that Amazon lost a few servers accidentally.

(Jay seems to agree with this, as he ends his post with:
When companies talk about outsourcing these components, or letting a vendor’s software product dictate their business & IT processes… I always check to make sure my lightsaber is close.
This is in reference to Marc Hedlund's post, “Jedi’s build their own lightsabers”.)

Nitin Borwankar, a strong proponent of Amazon SimpleDB commented on Kevin's post that SimpleDB is a long tail play, and that the head of the data world would probably want to run on their own servers. This is an incredibly interesting statement, as it seems to suggest that even though SimpleDB scales almost infinitely from a technical perspective, it doesn't so much from a business prospective.

On a side note, its been a while since I spoke about complexity theory and computing, but let me just say that this tension between "Get'r done" and "ye kanna take our freedom!" is exactly the kind of tension what you want in a complex system. As long as utility/cloud computing stays at the phase change between these two needs, we will see fabulous innovation that allows computing technologies to remain a vibrant and ever innovating ecosphere.

I love it.

Thursday, December 13, 2007

"The techno-utility complex" and cloud lock-in

OK, so in the process of commenting on Nick Carr's post, "The techno-utility complex", I came up with a term I like: cloud lock-in. This goes to my earlier conversation about vendor lock-in in the capacity on demand world--aka cloud computing. I like the term because it reinforces the truth: there is no single compute cloud, and the early leaders in the space don't want there to be one. Rather, they are hell bent on collecting your dimes every hour, and making it damn expensive for you to move your revenue stream elsewhere.

My advice stands: if you are greenfield, and data security and access are less of an issue for you, go for EC2, the managed hosting "cloud bank", or the coming offerings from Google or Microsoft. However, if you want to take a more conservative approach towards gaining the economic benefits of utility computing, make your own cloud first.

Wednesday, December 12, 2007

Software fluidity and system security

I came across this fascinating conversation tonight between Rich Miller (whom I've exchanged blogs with before) and Greg Ness regarding the intense relationship between network integrity, system security and VM-based software fluidity. What caught my attention about this conversation is the depth at which Rich, Greg and others have thought about this branch of system security--what Greg refers to as Virtsec. I learned a lot by reading both authors' posts.

I know nothing about network security or virtualization security, frankly, but I know a little about network virtualization and the issues that users have in replicating secure computing architectures across pooled computing resources. Rich makes a comment in the discussion that I want to comment on from a purely philosophical point of view:

Consider this: It's not only network security, but also network integrity that must be maintained when supporting the group migration of VMs. If one wants to move an N-tier application using VMware's VMotion, one wants a management framework that permits movement only when the requirements of the VM "flock" making up the application are met by the network that underpins the secondary (destination) location. By that, I mean:

  • First, the assemblage of VMs need to arrive intact.

    If, because of a change in the underpinning network, a migration "flight plan" no longer results in a successful move by all the piece parts, that's trouble. If disaster strikes, you don't want to find that out when invoking the data center's business continuity procedure. All the VMs that take off from the primary location, need to land at the secondary.


  • Second, the assemblage's internal connections as well as connections external to the "flock" must continue to be as resilient in their new location as they were in their original home.

    If the use of VMotion for an N-tier application results in the a new instance of the application that ostensibly runs as intended, but is susceptible to an undetected, single point of network failure in its new environment, someone in the IT group's network management team will be looking for a new job.
Here is exactly where I believe application architectures are suddenly critical to the problem of software fluidity. In a well contained multi-tier application (a very turn-of-the-millennium concept) it is valid to consider the migration of the "flock" as a network integrity problem. However, when it comes to the modern world of SOA, BPM and application virtualization, suddenly application integrity becomes a dynamic discovery issue which is only partly dependent on network access.

In other words, I believe most modern enterprise software systems can't rely on the "infrastructure" to keep their components running when they are moved around the cloud. Its not good enough to say "gee, if I get these running on one set of VMs, I shouldn't have to worry about what happens if those VMs get moved". Rich hints strongly at understanding this, so I don't mean to accuse him of "not getting it". However, I wonder what Replicate Technologies is prepared to tell their clients about how they need to review their application architectures to work in such a highly dynamic environment. I'd love to hear it from Rich.

Also, from Greg, I'd be interested in knowing if he's thought beyond the effects on network security of virtsec to the effects on application security. At the very least, I think an increasing dependency on dynamic discovery of required resources (e.g. services, data sources, EAI etc.) means an increased need for virtsec to be application aware as well as network aware. I apologize if I'm missing a virtsec 101 concept here, as I haven't yet read all that Greg has written about the subject, but I'm disturbed that the little I've read so far seems to assume that VMs can be managed purely as servers, a common mistake when considering end-to-end Service Level Automation (SLAuto) needs.

I intend to keep one eye on this discussion, as virtsec is clearly a key element of software fluidity in a SOA/BPM/VM world. It is even more critical in a true utility/cloud computing world, where your software would ideally move undetected between capacity offered by entirely disparate vendors. Its no good chasing the cheapest capacity in the world if it costs you your reputation as a security conscious IT organization.

By the way, Rich, I dropped the virtual football after your last post in our earlier conversation...I just found a blog entry I wrote about 6 months ago during that early discussion about VM portability standards that I never posted. Aaaargh... My apologies, because I liked what you were saying, though I was concerned about non-VM portability and application awareness at the time as well. I continue to follow your work.

Monday, December 10, 2007

User Experience and Fluidity (er, Patration...)

First, thanks to Simon in reminding me about his seminal post defining a new key industry term, patration, to define what I call software fluidity. Is he serious? Only your use of the term in every day life will tell... [Insert cheezy smiley face here.]

Second, if you haven't run across it yet, check out the debate between Robert Scoble/Nick Carr and Michael Krigsman/the Enterprise Irregulars about the need for enterprise software vendors to learn from the "drive to sexiness" of consumer software. My personal opinion? I've worked in enterprise software for years, and I still don't understand why engineers take no pride in making something amazing to install, learn and use. All the effort goes into command line tools and cool functions, little goes into human experience. How has Apple remained relevant all of these years? A focus on sexiness, without losing sight of functionality.

I agree with Nick, sexiness and functionality/stability are not mutually exclusive--except in the eyes of most enterprise software vendors...

Wednesday, December 05, 2007

How fluid is your software?

I come from a software development background, and I can never quite get the itch to build the perfect architecture out of my system. That's partly why it is so hard for me to blog about power, even though it is an absolutely legitimate topic, and a problem that needs to be attacked from many fronts. However, power is not a software issue, it is a hardware and facilities issue, and my heart just isn't there when it comes to pontificating.

All is not lost, however, as Cassatt still plays in the utility computing infrastructure world, and
I get plenty of exposure to dynamic provisioning, service level automation (SLAuto) and the future of capacity on demand. And to that end, I've been giving a lot of thought to the question of what, if any, software architecture decisions should be made with utility computing in mind.

While a good infrastructure platform won't require wholesale changes to your software architecture (and none at all, if you are willing to live with consequences that will become obvious later in this discussion), the very concept of making software mobile--in the changing capacity sense, not the wireless device sense--must lead all software engineers and architects to contemplate what happens when their applications are moved from one server to another, or even one capacity provider to another. There are a myriad of issues to be considered, and I aim to cover just a few of them here.

The term I want to introduce to describe the ability of application components to migrate easily is "fluidity". The definition of the term fluidity includes "the ability of a substance to flow", and I don't think its much of a stretch to apply the term to software deployments. We talk about static and dynamic deployments today, and a fluid software system is simply one that can be moved easily without breaking the functionality of the system.

An ideally fluid system, in my opinion, would be one that could be moved in its entirety or in pieces from one provider to another without interruption. As far as I know, nobody does this. (3TERA claims they can move a data center, but as I understand it you must stop the applications to execute the move.) However, for purely academic reasons, let's analyze what it would take to do this:


  1. Software must be decoupled from physical hardware. There are currently two ways to do this that I know of:
    • Run the application on a virtual server platform
    • Boot the file system dynamically (via PXE or similar) on the appropriate capacity
  2. Software must loosely coupled from "external" dependencies. This means all software must be deployable without hard coded reference to "external" systems on which it is dependent. External systems could be other software processes on the same box, but the most critical elements to manage here are software processes running on other servers, such as services, data conduits, BPMs, etc.
  3. Software must always be able to find "external" dependencies. Loose coupling, as most of you know, is sometimes easier said than done, especially in a networked environment. Critical here is that the software can locate, access and negotiate communication with the external dependencies. Service registries, DNS and CMDB systems are all tools that can be used to help systems maintain or reestablish contact with "external" dependencies.
  4. System management and monitoring must "travel" with the software. Its not appropriate for a fluid environment to become a Schrodinger's box, where the state of the system becomes unknown until you can reestablish measurement of its function. I think this may be one of the hardest requirements to meet, but at first blush I see two approaches:
    • Keep management systems aware of how to locate and monitor systems as they move from one system to another.
    • Allow monitoring systems to dynamically "rediscover" systems when they move, if necessary. (Systems maintaining the same IP address, for instance, may not need to be rediscovered.)

This is just a really rough first cut of this stuff, but I wanted to put this out there partly to keep writing, and partly to get feedback from those of you with insights (or "incites") into the concept of software fluidity.

In future posts I'll try to cover what products, services and (perhaps most importantly) standards are important to software fluidity today. I also want to explore whether "standard" SOA and BPM architectures actually allow for fluidity. I suspect they generally do, but I would not be suprised to find some interesting implications when moving from static SOA to fluid SOA, for instance.

Respond. Let me know what you think.

Monday, October 15, 2007

Is your software ready for utility computing?

I've been seeing more thoughts on the effect of utility computing on software architectures lately, and one very well stated argument comes from Alistair Croll, Vice President of Product Management and co-founder of Coradiant, a performance tool company. Though clearly self-serving, his message is simple: if you are going to pay by the cycle--or even just share cycles between applications--you'd better make sure your software takes as few cycles as possible to do its job well.

This is one of the unforeseen effects of "paying for what you use", and I have to say its an effect that should scare the heck out of most enterprise IT departments. Although I would argue part of that fear should come from the exposure of lousy coding in most custom applications, the worst part is the lack of control most organizations will have over the lousy coding in the packaged applications they purchased and installed. Suddenly, algorithms matter again in all phases of software development, not just computing intensive steps.

The worst offender here will probably the the user interface components: Java SWING, AJAX and even browser applications themselves. To the extent that these are hosted from centralized computing resources (and even most desktops fall into this category in the some visionaries' eyes), then the incredible amount of constant cycling, polling and unnecessary redrawing will be painfully obvious in the next 10 years or so.

I have always been a strong proponent for not over-engineering applications. If you can meet the business's ongoing service levels with an architecture that cost "just enough" to implement, you were golden in my book. However, utility computing changes the mathematics here significantly, and that key phrase of "meet the business's ongoing service levels" comes much more into play. Ongoing service levels now include optimizations to the cost of executing the software itself; something that could be masked in a underutilized, siloed-stack world.

The performance/optimization guys must be loving this, because they now have a product that should see immediate increase in demand. If you are building a new business application today, you had better be:

  1. Building for a service-based, highly distributed, utility infrastructure world, and
  2. Making sure your software is a cheap to run as possible.

Number 2 above itself implies a few key things. Your software had better be:

  • as standards based as possible--making it possible for any computing provider to successfully deploy, integrate and monitor your application;
  • as simple to install, migrate and upgrade remotely as possible--to allow for cheap deployment into a competitive computing market;
  • as efficient to execute as possible--each function should take as few cycles as possible to do its job

The cost dynamics will be interesting to note, especially their effects on the agile processes, SOA, and ITIL movements. I will keep a careful tab on this, and will share my ongoing thoughts in future posts.

Thursday, October 04, 2007

Links - 10/4/2007

A Classic Introduction to SOA (DanNorth.net): Thanks to Jack van Hoof, I was led to this brilliant article on modelling SOAs in business terms. (Check out the PDF, the graphics and layout make it an even more fun read.) Rather than spend a bunch of words "me too"-ing Jack and Dan, let me just say that this is exactly the technique I have always used to design service oriented architectures, ever since my days in the mid-90s designing early service oriented architectures at Forte Software.

Classic examples of where this led to better design were the frequent arguments that I would have with customers and partners about where to put the "hire" method in a distributed architecture. Most of the "object oriented architects" I worked with would immediately jump to the conclusion that the "hire" method should be on the Employee class. However, if you sat down and modelled the hiring process, the employee never hired himself or herself. What would happen is the hiring manager would send the information about the employee to the HR office, who would then receive more information, create a new employee file and declare the new employment to the tax authorities. Thus, the "hire" method needed to be on the HR service, with the call coming from the application (or service) initiating employment (i.e. the hiring manager in software form), passing the employee object (or a representation of that object) for processing.

Without exception, that approach led to better architectures than trying to map every method that had any relation to a class of objects directly on the class itself.


Twilight of the CIO (RoughType: Nicholas Carr): Man, Nick is in rare form now that his is back from his blogging hiatus. His thesis here is that, with the advent of technologies that can be more easily managed outside of IT, and with IT departments doing less R&D and more shepherding outsourced and SaaS infrastructure, the need for the CIO role is diminishing--which I react to with mixed feelings.

On the one hand, there is no doubt that small and mid-sized non-high-tech businesses are going to have less need for a voice representing technical infrastructure issues on the executive board. There will still need to be management (as the first comment to Nick's post alludes to), but they will be a lot like the facilities guy in most businesses today--simply shepherding the services hired by the business.

(Perhaps the "centralized/decentralized pendulum" is definitely shifting wildly, with decentralization this time actually resulting in business systems residing outside of IT entirely?)

On the other hand, I'm not seeing the "simplified" nature of technology happening yet in most mid- to large-sized businesses. Cassatt sells utility computing platform software--basically an operating system for your data center. Resources are pooled and distributed as needed to meet the businesses needs (as defined in SLAs assigned to software). We make it easy to cut tremendous amounts of waste, rigidity and manual labor out of the basic data centers. CIOs love this vision, and drive technical changes in the customers we work with. However, implementations still take a long time. Why? Because most existing infrastructures are about 10 years behind the desired state of the art the IT department is trying to achieve. Also because its not just a technical change, its a cultural change. (By the way, so is SaaS.)

I fear that the lack of technical leadership on the executive team will actually hinder adoption of these critical new technologies and other technologies only being thought of now, or in the future. What I think ultimately needs to happen is that high level technical critical thinking skills need to be taught to the rest of the line-of-business executives, so that interesting new technologies will drive interesting new business opportunities in the years to come.

This goes to Marc Andreesen's recent post on how to prepare for a great career. Don't rest on your technical skills, or your business skills, but work hard to develop both. (Marc is another blogger who has been on a streak lately...read his career series and learn from someone who knows a little about success.)

Monday, September 10, 2007

Links - 09/10/2007

Brave New World (Isabel Wang): I can't begin to express how sorry I am to see Isabel Wang leave the discussion, as her voice has been one of the clearest expressions of the challenges before the MSP community. However, I understand her need to go where her heart takes her, and I wish her the best of luck in all of her endeavors.

(Let me also offer my condolences to Isabel and the entire 3TERA community for the loss of their leader and visionary, Vlad Miloushev. His understanding of the utility computing opportunity for MSPs will also be missed.)

MTBF: Fear and Loathing in the Datacenter (Aloof Architecture: Aloof Schipperke): Aloof discusses his mixed feelings about my earlier post on changing the mindset around power cycling servers. I understand his fears, and hear his concerns; MTBF (or more to the point, MTTF) isn't a great indicator of actual service experience. However, even by conservative standards, the quality and reliability of server components has improved vastly in the last decade. Does that mean perfection? Nope. But as Aloof notes, our bad experiences get ingrained in the culture, so we overcompensate.

CIOs Uncensored: Whither The Role Of The CIO? (InformationWeek: John Sloat): Nice generality, Bob! Seriously, does he really expect that *every* IT organization will shed its data centers for service providers? What about defense? Banking? Financial markets? While I believe that most IT shops are going to go to a general contractor/architect role, I think there is still a big enough market for enterprise data centers that markets to support them will go on for years to come.

That being said, most of you out there should look at your own future with a service-oriented computing (SOC?) world in mind.