Monday, August 13, 2007

Data X.0, and why you need SLAuto now

Data 2.0: How the Web disrupts our relational database world (GigaOM: The Future of Software): I'm a little annoyed that there is a claim here that distributed data is only rev 2 of database technology, but otherwise this is an important trend to keep tabs on if you are interested in enterprise software architectures. The following statement from the article says it best:
Relational databases are to software what mainframes are to networked hardware: the monolithic beast at the core that needs magic incantations from high priests to run, and consumes unsuspecting junior engineers for breakfast.
The truth is we have depended on giant centralized relational stores for a very long time now, but we've had the luxury of building databases with a basically "one-owner" model for the data they contain. However, as the web (and, I guess, social networking software) blows away that concept, and the data that drives our applications becomes necessarily distributed both intra- and inter-organization, we are being forced to expand the technology around data management.

What does this have to do with SLAuto? Well, how are you going to handle the increased management complexity that all of this brings? How are you going to monitor the health and performance of dozens or hundreds (or thousands!) of distinct data sources that make up your key applications and services? Worse yet, how are you going to scale the infrastructure to handle varied--and likely unpredictable--demand?

I would argue that you need some level of SLAuto in place before creating or depending on a distributed data infrastructure. I admit this is something I haven't put a lot of thought into, but the sheer complexity of such an infrastructure calls out for intelligent automation of scaling and failure recovery. If you start exploding your data infrastructure without some level of automation around it, you are doomed to performing the same management tasks manually, over and over. With that many more "atoms" to manage, the effect on productivity may well be overwhelming.
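To make "some level of automation" a little more concrete, here is a minimal sketch in Python of the kind of policy decision an SLAuto layer might make for each data source. Everything in it is an assumption made for illustration: the `SourceStatus` fields, the 200 ms latency target, the replica limits and the action names are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass
class SourceStatus:
    """Hypothetical snapshot of one data source, as a monitor might report it."""
    name: str
    healthy: bool
    latency_ms: float
    replicas: int


def plan_action(status, latency_target_ms=200.0, max_replicas=8):
    """Pick one action for a data source based on an assumed service level target."""
    if not status.healthy:
        # Failure recovery takes priority over any scaling decision.
        return "restart"
    if status.latency_ms > latency_target_ms and status.replicas < max_replicas:
        # Service level is being missed and there is room to add capacity.
        return "scale_out"
    if status.latency_ms < 0.25 * latency_target_ms and status.replicas > 1:
        # Well under target: give capacity back.
        return "scale_in"
    return "none"


def plan_all(statuses, **policy):
    """Apply the same policy to every data source in the fleet."""
    return {s.name: plan_action(s, **policy) for s in statuses}


# Made-up fleet used only to exercise the policy.
fleet = [
    SourceStatus("orders-db", True, 350.0, 2),
    SourceStatus("profile-store", False, 0.0, 3),
    SourceStatus("log-archive", True, 20.0, 4),
    SourceStatus("search-index", True, 120.0, 2),
]
plan = plan_all(fleet)
for name, action in plan.items():
    print(f"{name}: {action}")
```

The point of the sketch is only that the decision is the same few lines whether the fleet holds four data sources or four thousand, which is exactly the work you do not want to be doing by hand.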

I guarantee that Yahoo, Google and Amazon have the management and monitoring in place to keep these distributed data "ecosystems" running smoothly. In fact, the article even notes that Yahoo is using "Hadoop" for "the massive data mining of Webserver logs." I'm not sure, but I bet there are some management tasks that depend on this data mining.

Don't wait for the complexity to overwhelm you--implement Service Level Automation.
