To Rule the Clouds Takes Software: Why Amazon SimpleDB is a Huge Next Step
Posted by Bob Warfield on December 15, 2007
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them…
There is much interesting cloud-related news in the blogosphere. Various pundits are sharing a back and forth on the potential for cloud centralization to result in just a very few datacenters and what that might mean. The really big news is Amazon’s fascinating new addition to their cloud platform of SimpleDB. Let’s talk about what it all means.
Sun’s CTO, Greg Papadopoulos, has been predicting that the earth’s compute resources will resolve into about “five hyperscale, pan-global broadband computing services giants” — with Sun, in its version of this future scenario, the primary supplier of hardware and operations software to those giants. The last was channeled via Phil Wainewright, who goes on to ask, “What is it about a computing grid that’s inherently “more centralized” in nature?” He feels that Nick Carr has missed the mark and swallowed Sun’s line hook, line, and sinker. For his part, Carr’s only crime was to seize on a good story, because at the same dinner, another Sun executive, Subodh Bapat, was telling Carr that sometime soon a major datacenter failure would have “major national effects.” The irony is positively juicy with Sun talking out both sides of their proverbial mouths.
The tradeoff that Carr and Wainewright are worried about is one of economies of scale that favor centralization versus flexibility and resiliency that favor decentralization. Where they differ is that Carr sees economies of scale winning in a world where IT matters less and less and Wainewright favors the superior architectural possibilities of decentralization. Is datacenter centralization inexorable? In a word, yes, but it may not boil down to just 5 data center owners, and it may take quite a while for the forces at work to finish this evolution. The factors that determine who the eventual winners will be are also quite interesting, and have the potential to change a lot of landscapes that today are relatively isolated. Let’s consider what the forces of centralization are.
First, there is a huge migration of software underway to the cloud. By that I mean software that is never installed on your machine or in your company’s datacenter: it resides in the cloud and comes to you via the browser. Examples include SaaS on the business side and the vast armada of consumer Web 2.0 products such as Facebook. No category is safe from this trend, not even the traditional bastions, as should be clear from the growing crop of Microsoft Office competitors that reside in the cloud.
Second, this migration leads to centralization. The mere act of building around a cloud architecture, even if it is a private cloud in your own company’s datacenter, leads to centralization. After all, software is moving off your desktop and into that datacenter. When many companies are aggregated into a single datacenter, into a SaaS multi-tenant architecture, for example, further centralization occurs. When you offer a ubiquitous service to the masses, as is the case with something like Google, the requirements to deliver that can lead to some of the largest datacenter operations in the land.
Third, there are the aforementioned economies of scale. Google has grown so large that it now builds its own special-purpose switches and servers to enable it to grow more cheaply. The big web empires are all built on the notion of scaling out rather than scaling up, and they run on commodity hardware. Because they have so many servers, automating their care and feeding has been baked into their DNA. Not so with most corporate datacenters, which are just beginning to see the fruits of crude generic technologies like virtualization that seek to be all things to all people. Virtualization is a great next step for them, but there are bigger steps ahead yet that will further reduce costs.
Fourth, the ultimate irony is that centralization begets centralization through network effects. This is the story of the big consumer web properties. Every person that joins a social network adds more value to the network than the prior person did, so the value of the network grows much faster than its membership. This connectedness is facilitated most easily in today’s world by centralization. Vendors that start to get traction increase their network effects in various ways: Amazon charges to bring data in and out of their cloud, but not to transfer between services within the cloud.
Lastly, there are green considerations at work. The biggest costs associated with datacenters these days are around electricity and cooling. Microsoft is building a data center in Siberia, which is both cold and pretty central to Asia. Consider this: given the speed of light over a fiber connection, what is the cost of latency in having a data center somewhere far north (and cold) in Canada like Winnipeg versus far south (and hot) like Austin, Texas? It’s 1349 miles, which, as the photon travels (186,000 miles per second), is about 7.25 milliseconds one way (real fiber is somewhat slower than light in a vacuum, but the order of magnitude holds). The world’s fastest hard drive, the nifty Mtron solid state disks I’m now coveting thanks to Engadget and Kevin Burton, can only write a paltry 80K or so bytes in that time: not even enough for one photo at decent resolution. So consider a ring of datacenter clusters built in colder regions. Centralized computing is up north where the cold that computers like is nearly free for the asking: just open a window many days. Or come closer. Put it up on a mountain peak. Immerse it near a hydro dam and get the juice cheaper too. It doesn’t matter. Laying fiber is pretty cheap compared to paying the energy bills.
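For anyone who wants to check the arithmetic, here is a rough Python sketch of the Winnipeg-to-Austin calculation. The distance and the 186,000 miles per second figure come from the paragraph above; the refractive index of 1.47 for glass fiber is my assumption (a typical textbook value), and the function name is made up for illustration. It ignores routing detours, switches, and everything else that makes real networks slower.

```python
# Back-of-the-envelope one-way propagation delay between two datacenters.
# Only straight-line propagation is modeled; routers, switches and indirect
# fiber routes would all add to these numbers.

SPEED_OF_LIGHT_MILES_PER_S = 186_000   # vacuum figure used in the text
FIBER_REFRACTIVE_INDEX = 1.47          # assumption: typical value for silica fiber
WINNIPEG_TO_AUSTIN_MILES = 1349        # distance quoted in the text


def one_way_latency_ms(distance_miles, refractive_index=1.0):
    """Return the one-way propagation delay in milliseconds.

    A refractive index of 1.0 means light in a vacuum; larger values
    slow the signal down proportionally.
    """
    speed_miles_per_s = SPEED_OF_LIGHT_MILES_PER_S / refractive_index
    return distance_miles / speed_miles_per_s * 1000.0


vacuum_ms = one_way_latency_ms(WINNIPEG_TO_AUSTIN_MILES)
fiber_ms = one_way_latency_ms(WINNIPEG_TO_AUSTIN_MILES, FIBER_REFRACTIVE_INDEX)

print(f"vacuum: {vacuum_ms:.2f} ms one way")
print(f"fiber:  {fiber_ms:.2f} ms one way, {2 * fiber_ms:.2f} ms round trip")
```

Even with the fiber slowdown and a full round trip, the delay stays in the low tens of milliseconds, which is the point: geography costs you very little compared to the energy bill.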
The next question is trickier: how do these clouds compete? Eventually, they will become commoditized, and they will compete on price, but we are a long way from that point: at least 10 years. Before that can happen, customers have to agree on what the essential feature sets are for this “product”. I believe this is where software comes into play, and that should be a matter of great concern for the hosting providers of today whose expertise largely does not revolve around software as a way to add value. As Eric Schmidt said (via Nick Carr) when he started saying Google would enter this market:
For clouds to reach their potential, they should be nearly as easy to program and navigate as the Web. This, say analysts, should open up growing markets for cloud search and software tools—a natural business for Google and its competitors.
Some will immediately react with, “Hold it a minute, what about the hardware? What about the network?” The best of the cloud architectures will commoditize those considerations away. In fact, commoditization will start down at the bottom of the technology stack and work its way up. The first stage of that, BTW, is already almost over. That was the choice of CPU. MIPS? PowerPC? SPARC? No, Intel/AMD are the winners. The others still exist (not all of them!), but they’ve peaked and are on their way down at various terminal velocities. Their owners need to milk them for profit, but it would be a losing battle to invest there. Even Macs now carry Intel inside, and Sun now carries the ticker symbol “JAVA”, a not-so-subtle hat tip to the importance of software.
Hardware boxes are largely a dead issue too. There is too little opportunity to differentiate for very long and the CPUs dictate an awful lot of what must be done. Dell is an assembler and marketer of the lowest cost components delivered just in time lest they devalue in inventory. Sun still pushes package design, and it may have some relevance to centralization, but this will be commoditized because of centralization.
Next up will be the operating system. Again, we’re pretty far down the path of Linux. Corporations still carry a lot of other things inside their firewalls, but the clouds will be populated almost exclusively with Linux, and we could already see that has happened if we could get reliable statistics on it. Linux defines the base minimum of what a cloud offering has to provide: utility computing instances running Linux. This is exactly what Amazon’s EC2 offers.
What else does the cloud need? Reliable archival storage. Again, Amazon offers this with S3. Cloud consumers are adopting it in droves because it makes sense. It’s a better deal than a raw disk array because it adds value versus that disk array for archival storage. The value is in the form of resiliency and backup. Put the data on S3 and forget about those problems. This begins the commoditization of storage. Is it any wonder that EMC bought VMware and that a software offering is now most of their market cap? Hardware guys, put on your thinking caps, this will get much worse. What software assets do you bring to the table?
3Tera is a service I’ve talked about before that has a very similar offering available from multiple hosting partners of theirs. They create a virtual SAN that you can back up and mirror at the click of a mouse. They let you configure Linux instances to your heart’s content. Others will follow. IBM’s Blue Cloud offers much the same. This collection is today’s blueprint for what the Cloud offers in terms of a platform.
But this platform is a moving target, and it will keep moving up the stack. Amazon just announced another rung up with SimpleDB. For most software that goes into the Cloud, once you have an OS and a file system, the next thing you want to see is a database. Certainly when I attended Amazon Startup Project, the availability of a robust database solution was the number one thing folks wanted to see Amazon bring out. The GM of EC2 promised me that this was on the way and that there would be several announcements before the end of the year. First we saw the availability of EC2 instances that had more memory, disk, and CPU, so that they’d make better database hosts. SimpleDB is much more ambitious. It’s a replacement, designed from the ground up to live in a cloud computing world, for the conventional database as embodied in products like MySQL and Oracle. At one stroke it solves a lot of very interesting problems that used to challenge would-be EC2 users around the database.
Since Amazon brought out its S3 storage service, I’ve seen many many startups give up data centers altogether.
Tell me why the same thing won’t happen here.
There is no doubt in my mind that all startups will give up having datacenters altogether before this ends. However, before we get too carried away in assuming that SimpleDB gives us that opportunity, let’s drop back and consider what its limitations are:
– It is similar to a relational database, but there are significant differences. Code will have to be reworked to run there, even if it doesn’t run afoul of the other issues.
– Latency is a problem when your database is in another datacenter from the rest of your code. Don MacAskill brings this one up, and all I can say is that this is another network effect that leads to more centralization. If you like SimpleDB, it’s another reason to bring all of your code inside Amazon’s cloud.
– All fields are strings, and they are limited to 1024 characters. Savvy developers can use those 1024 characters to hold a pointer to a file of unlimited size on S3, or use other methods like combining fields to get around this limit. Mind you, a lot can be done with that, but it is again a difference from traditional RDBMS systems and it means more work for developers who must overcome the limitation.
– There are no joins; if you want them (and many proponents of hugely scalable sites view joins as evil), you have to roll your own.
– Transactions and consistency are also absent. Reads are not guaranteed to be fully up to date with writes.
– There is no indexing, nor a whole host of other trappings that database aficionados have gotten comfortable with.
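To make a few of these limitations concrete, here is a minimal Python sketch of the kind of workarounds developers end up writing. To be clear, this is not the SimpleDB API: everything runs against plain in-memory dictionaries, and every name in it (`encode_int`, `encode_text`, `decode_text`, `hash_join`, `blob_store`, the `blob:` prefix) is hypothetical. The only number taken from the list above is the 1024 character limit. It shows three things: zero-padding numbers so that string comparison sorts them correctly, spilling oversized values to a separate store (standing in for S3) and keeping only a pointer, and rolling your own join on the client side.

```python
# Hypothetical stand-ins only: this does NOT call SimpleDB or S3.

MAX_VALUE_CHARS = 1024   # per-field limit mentioned in the list above
blob_store = {}          # stand-in for S3: key -> large payload


def encode_int(n, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return str(n).zfill(width)


def encode_text(item_id, attr, text):
    """Keep short text inline; spill long text to the blob store and return a pointer.

    Caveat of this naive scheme: inline text that itself starts with "blob:"
    would be misread by decode_text.
    """
    if len(text) <= MAX_VALUE_CHARS:
        return text
    key = f"{item_id}/{attr}"
    blob_store[key] = text
    return f"blob:{key}"


def decode_text(value):
    """Follow a blob pointer if there is one, otherwise return the inline value."""
    if value.startswith("blob:"):
        return blob_store[value[len("blob:"):]]
    return value


def hash_join(left, right, key):
    """Client-side equi-join of two lists of dict rows on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


customers = [
    {"customer_id": "c1", "name": "Ada"},
    {"customer_id": "c2", "name": "Bob"},
]
orders = [
    {"order_id": "o1", "customer_id": "c1", "total_cents": encode_int(995)},
    {"order_id": "o2", "customer_id": "c1", "total_cents": encode_int(12000)},
]
joined = hash_join(orders, customers, "customer_id")

long_review = "x" * 5000
stored = encode_text("o1", "review", long_review)
```

None of this is hard, but it is code you did not have to write when the database did it for you, and it does nothing about the consistency caveat: a read issued right after a write may still return the old value.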
Mind you, serious web software is created within these limitations, including some at Amazon itself. In exchange for living with them, you get massively scalable database access at good performance and very cheaply. And, as TechCrunch says, you may be able to get rid of one of the highest cost IT operations jobs around, database administration, which lowers your costs even further. Remember my analysis showing that SaaS vendors need to achieve 16:1 operations cost advantages over conventional software, and you can see this is a big step in that direction already.
There is no doubt that cloud computing will be massively disruptive, and that Amazon are well on their way in the race to plant their flag at the top of the mountain. The pace of progress for Amazon Web Services has been blistering this year, and much more hype-free than what we’ve gotten from the likes of Google and Facebook when it comes to platform speak. It’s almost odd that we haven’t heard more from these other players, and especially from the likes of Google. GigaOm says that SimpleDB completes the Amazon Web Services Trifecta. They go on to say that Amazon’s announcements have the feel of a well thought out long term strategy, while Google’s sound like an ad hoc grab bag of tools. I think that’s true, and perhaps reflective of Google’s culture, which is hugely decentralized to the point of giving developers 20% free time to work on projects of their choosing. The problem is that such a culture can more easily give us a grab bag of applications, as Google has, than it can provide a well-designed platform, as Amazon has. Or, as Mathew Ingram puts it, while everyone else was talking about it, Amazon went ahead and did it.
I’ve talked to a dozen or so startups that are eagerly working with the Amazon Web Services and having great success, as well as some frustrations. They require rethinking the old ways. Integrity issues are particularly different in this brave new world, as are issues of latency. That matters to how a lot of folks think about their applications. Because of the learning curve, I don’t plan to go out and short Oracle immediately, but the sand has started running in the hourglass. There will be more layers added to the cloud, and over time it will become harder and harder to ignore. There will be economic advantage to those who embrace the new ways, and penalties for those who don’t. This is a bet-your-business drama that’s unfolding, make no mistake. At the very least, you need to get yourself educated about what these kinds of services offer and what they mean for application architecture.
Businesses located low in the stack I’ve mentioned will be hit hard if they don’t have a strategy to embrace and win a piece of the cloud computing New Deal. We’re talking hardware manufacturers like Sun, Dell, IBM, and HP. Software infrastructure comes next. Applications that depend on low cost delivery, aka SaaS, are also very much in the crosshairs, although probably at a slightly later date.
Welcome to the brave new world of utility cloud computing. Long live the server, the server is dead!