SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the 'platforms' Category


Amazon Ran Out of Capacity

Posted by smoothspan on February 18, 2008

As I suggested in my original post on the topic, Amazon’s recent S3 outage was due to running out of capacity.  Specifically, they ran out of authentication capacity.  In part, this problem was due to the fact that Amazon wasn’t monitoring exactly this part of their capacity envelope very well.  High Contrast has the Amazon quote telling us that it was also due to just a few customers radically increasing their load on the system in an unpredictable way:

the surge was caused by at least one very large customer plus several other customers suddenly and unexpectedly increasing their usage. 

So far, most of the pundits are in something of a denial mode.  They argue that nothing really new and interesting is happening here.  All services go down, including the electric company.  Vinnie Merchandani says corporate data centers have been going down a lot more often than 99.999% uptime allows for since forever.  Folks like Nick Carr seem to feel the biggest issue in this outage was that users didn’t have timely information and Amazon is fixing that.

This all misses a bigger point.  What these writers are doing is attempting to apply the old standards and methods against the new world of Cloud Computing.  The trouble is, there is something genuinely new at work here that goes beyond the inevitability of some outages and the need to be more transparent with customers about what is going on.  The problem Amazon and other would-be cloud platform purveyors face is predictability.  The world they deal in is radically less predictable than corporate data centers of old because the Internet today has much lower friction and higher connectivity between different web sites that make load spikes increasingly sudden and intense.  There is a cascade of dominoes effect that is enabled by the low friction web that wasn’t nearly so twitchy in the past. 

The premise of any large computing infrastructure is that by sharing the load across many customers (and in Amazon’s case, sharing excess capacity from their core retail business), we enable headroom for such load spikes.  But how realistic is that concept?

Consider this Alexa plot of CNN and Flickr traffic over time:

[Figure: Alexa plot of CNN and Flickr traffic over time]

Do these two curves look predictable to you?  Take CNN, for example.  To handle the big spikes requires 2-3x overload capacity.  Flickr is a little less crazy except for one massive event that involved a doubling in a very short time.  This latter event was permanent in its effect, so if you were counting on temporarily borrowing some headroom, you would have had to keep it in place indefinitely and grow from there.  Ironically, that chart was brought to my attention at Amazon Startup Project, where they used it to sell the idea that by using Amazon Web Services a startup gets unlimited headroom it could never afford to purchase on its own.

These charts are displaying non-linear behaviour, the hardest of all phenomena to predict.  This non-linearity is becoming more and more common because the Internet has become extremely viral.  It is crosslinked, the very meaning of the word “web”, and messages travel along the links with almost no friction.  Viral has become a virtue, and much of the current innovation is focused around how to make the viral spread of information more likely.  Social Networks are all about such behaviour.  Take a look again at those CNN spikes.  Now let’s imagine your cloud computing infrastructure is hosting a bunch of different blogging, micro-blogging, video, photo sharing, and other social sites.  The CNN spikes no doubt represent something newsworthy happening.  The greatest likelihood is that each spike will be echoed at some level across all of these sites that are in the business of spreading information.  Friction has been lowered to the point it is almost non-existent when it comes to the spread of memes on the Internet.  We have major spikes from world events, such as the assassination of a world leader.  On the Internet, we can also have major spikes from such inane moments as Scoble shedding tears of delight over new Microsoft secret software.  And the whole thing is wired together.  That one tear on Scoble’s cheek breeds a thousand or more accounts ranging from poking fun to trying to guess what this secret software is.  There is a ravenous beast poised over the keyboard waiting for something interesting to pass onto its network of other ravenous beasts.

This is decidedly non-linear behaviour and impossible to predict.  The answer is that major cloud computing infrastructure providers will need to have considerable excess capacity available on tap at all times to avoid outages.  Take Amazon.  Web bandwidth to their web services now exceeds the total traffic to all of their other properties.  What might have once been a nice remaindering business allowing them to resell their excess capacity is now driving the need for more capacity.  They have just a few choices.  They can invest in a lot more hardware and lower the margins on their business, or they can implement some strategies to limit the availability of the service to some customers.  It strains credulity to think they’ll limit capacity to their retail business.  How will they decide?  Tiered pricing of some kind? 
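The pooling argument, and why correlated spikes undermine it, can be made concrete with a few lines of Python.  This is only a sketch: the load figures are invented for illustration (a baseline of 10 units per customer with one 3x spike each), and the function names are hypothetical, not anything Amazon uses.

```python
def required_capacity(loads_by_customer):
    """Capacity a shared provider needs: the peak of the summed load."""
    totals = [sum(step) for step in zip(*loads_by_customer)]
    return max(totals)


def sum_of_peaks(loads_by_customer):
    """Capacity needed if every customer provisioned for its own peak."""
    return sum(max(load) for load in loads_by_customer)


# Hypothetical hourly loads in arbitrary units (assumed, not measured data).
# Each customer idles at 10 and spikes once to 30.
independent = [
    [10, 30, 10, 10, 10, 10],
    [10, 10, 10, 30, 10, 10],
    [10, 10, 10, 10, 10, 30],
]
# Same customers, but one news event makes them all spike in the same hour.
correlated = [
    [10, 10, 30, 10, 10, 10],
    [10, 10, 30, 10, 10, 10],
    [10, 10, 30, 10, 10, 10],
]

for name, loads in (("independent", independent), ("correlated", correlated)):
    shared = required_capacity(loads)
    separate = sum_of_peaks(loads)
    print(f"{name}: shared={shared} separate={separate} saved={separate - shared}")
```

With these made-up numbers, sharing saves real capacity only while the spikes land in different hours.  Once a single event drives every customer at once, the shared provider needs exactly as much hardware as the customers would have bought separately, which is the situation the low-friction web makes more likely.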

Think in terms of other unexpected networked events.  I’m reminded of financial markets and the law of unintended consequences.  Look at today’s housing market.  Remember Long Term Capital, a hedge fund with Nobel Laureates who had mathematical proofs they would continue making money.  Right up until they unpredictably went bankrupt.  BTW, this sort of thing used to happen with the electrical grid too.  In both cases, the financial markets and the electrical grid, elaborate means were put into place to artificially inject friction to damp the machine’s oscillations before it could destroy itself.  There are elaborate rules in the stock exchanges about shorting stocks that are falling.  They inject a form of friction back into those markets to prevent total free fall. 

Perhaps this points the way to new technology for Cloud Computing infrastructure.  A gentle injection of the right kind of friction at the right point for a limited time might prevent suddenly massive spikes and outages.  It’s an area ripe for innovation.  Meanwhile, Amazon could sorely use some competition.  If a customer could contract for emergency capacity from elsewhere, or even better, if the Cloud Computing Providers could share slack capacity as the electrical companies do, it would be tremendously helpful when the inevitable load spikes arrive.
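One familiar way to read “a gentle injection of friction” is rate limiting.  The sketch below is a token bucket in Python, offered as one possible interpretation rather than anything Amazon is known to do; the class name, the rate, and the burst size are all assumptions made for illustration.  Time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow short bursts, but cap the sustained request rate."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # tokens added back per second
        self.capacity = burst         # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start full
        self.last = 0.0               # time of the previous call

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical customer limit: 2 requests/second sustained, bursts of 5.
bucket = TokenBucket(rate_per_sec=2, burst=5)

# A sudden spike of 10 requests in the same instant.
spike = [bucket.allow(now=0.0) for _ in range(10)]
print(spike.count(True), "allowed,", spike.count(False), "deferred")

# One second later the bucket has refilled by two tokens.
later = [bucket.allow(now=1.0) for _ in range(3)]
print(later)
```

The friction here is temporary and local: the spiking customer is slowed for a moment while everyone else carries on, and normal service resumes as the bucket refills.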

Posted in Web 2.0, amazon, data center, platforms, saas | 3 Comments »

MySQL and BEA: Oracle and Sun Will Be At Each Other’s Throats!

Posted by smoothspan on January 16, 2008

Big news today is that Sun is buying MySQL and Oracle is buying BEA. This creates a couple of strange bedfellows to say the least. BEA is inextricably wrapped up in Sun’s Java business (is it really a business or just a hobby given the revenues it doesn’t produce?) which gives a reason for the two to get closer together. On the other hand, there is hardly a bigger threat to Oracle’s core database server business imaginable than MySQL, which has got to push the two companies further apart. What a tangled web!  Is Sun leaving Oracle to its own devices in order to pursue cloud computing?  Sure looks like it!

Let’s analyze these moves a bit. I want to start with BEA and Oracle.

As we all know, Oracle started that courtship dance not long ago and was rebuffed for not offering enough.  Amusingly, they closed almost exactly at the midpoint of the prices the two argued were “fair” at the outset.  Meanwhile, the recession is really setting in, stock prices are falling, and Oracle’s offer went up.  Since Cisco’s John Chambers mused about IT spending slowing, it has become a widely accepted article of faith that this will happen. So shall it be said, so shall it be written, Mr. Chambers. That’s a very bad thing for BEA, which is primarily selling to that market. The corporate IT market is their bread and butter for a number of reasons. Many ISV’s and web companies will look to Open Source solutions like Tomcat or JBoss with which to reduce costs. Corporate IT wants the superior support of a big player like BEA. The darker truth is that big Java seems to be falling out of favor among the bleeding edge crowd. Java itself gets a lot of criticism, but is strong enough to take it. J2EE is another matter, though there is still a huge amount of it going on. There is also the matter of the steady ascendency of RESTful architecture while BEA is one of the lynchpins of Big SOA.  There is already posturing about the importance of BEA to Oracle Fusion.  If it is so important, Fusion may be born with an obsolete architecture from day one. 

The long and the short is that any competent tea leaf reader (is there any such thing?) would conclude that this was a good move for BEA to let themselves be bought before their curve has crested too much more. For Oracle’s part, it’s a further opportunity to consolidate their Big Corporate IT Hegemony and to feed their acquisition-based growth machine. I am not qualified to say whether they paid too much or not, but I do think the value curve for BEA is falling and will continue to fall post-acquisition. They are way late on the innovation curve, which looks to me like it has already fallen.  In short, BEA is a pure bean counting exercise: milk the revenue tail as efficiently as possible and then move on.  For this Oracle paid $8.5B.  Not surprisingly, even though it is a much bigger transaction, there is much less about it on the blogosphere as I write this than about the other transaction.

Speaking of which, let’s turn to the Sun+MySQL combination.  Jonathan Schwartz gets a bit artsy with his blog post introducing the acquisition, which he calls “Teach dolphins to fly.”  The metaphor is apropos.  Schwartz says that MySQL is the biggest up-and-coming database in the world of network computing (that’s how we say cloud computing without offending the dolphins that haven’t figured out how to fly yet).  What Sun will bring to the table is credibility, solidity, and support.  He talks about the Fortune 500 needing all that in the guise of:

Global Enterprise Support for MySQL - so that traditional enterprises looking for the same mission critical support they’ve come to expect with proprietary databases can have that peace of mind with MySQL, as well.

That business of “proprietary databases” means Oracle.  Jonathan just fired a good sized projectile across your bow, Mr. Ellison.  What do you think of that? 

I know what I think.  Getting my tea leaf reading union card back out, I compare these two big acquisitions and walk away with a view that Oracle paid $8.5B to carve up an older steer and have a BBQ while Sun paid $1B to buy the most promising race horse to win the Kentucky Derby.  What a brilliant move for Sun!  Now they’ve united a couple of the big elements out there, Java being one and MySQL the other.  They could stand to add a decent scripting language, but unlike Microsoft’s typical tactics, they’ve learned not to ply a scorched earth policy towards other platforms, so they are peacefully coexisting until a better cohabitation arrangement comes along. 

We talked a little about the Oracle transaction being a good deal for BEA:  it’s a lucrative exit from declining fortunes.  What about MySQL?  Zack Urlocker comments about the rumor everyone knew, that MySQL had been poised to go public.  Let me tell you: this is a far better move.  Savvy private companies get right to the IPO altar, and then they find someone to buy them for a premium over what they would go out at.  What they gain in return is potentially huge.  The best possible example of this was VMware.  Now look where they are.  I will argue that would not have been possible without the springboard of EMC.  At least not this quickly.   Sun offers the same potential for MySQL.  It is truly the biggest open source deal in history.  It’s also a watershed liquidity event for a highly technical platform based offering from a sea of consumer web offerings.  The VC’s have been pretty tepid about new deals like MySQL.  Perhaps this will help more innovations to get funded.

What do others have to say about the deal?

 - Tim O’Reilly echoes the themes of big open source and the importance of the database to the platform.

 - Larry Dignan picks up on my rather combative title theme by pointing out that it puts Sun at war with the major DB vendors:  Microsoft, IBM and Oracle.  Personally, I think any overt combat will hurt those three.  The Open Source movement holds the higher moral ground and it just won’t be good PR to buck that too publicly.  Dignan sounds like he is making a little light of Schwartz’s conference call remark that it is the most important acquisition in Sun’s history, but I think that is no exaggeration on Jonathan’s part.  This is a hugely strategic move that affects every aspect of how Sun interfaces with the world computing ecosystem including its customers, many partners, and its future.  When Dignan asks what else Sun needs, I would argue a decent scripting language.  Since Google already has Python in hand, what about buying a company like Zend to get a leg up on PHP?  Last point from Larry is he asks, “If Sun makes MySQL more enterprise acceptable does that diminish its mojo with startups? Does it matter?”  Bottom line: improvements for the Enterprise in no way diminish what makes MySQL attractive to startups, providing Sun minds its manners.  So far it has been a good citizen.  With regard to, “Does it matter?”  Yes, it matters hugely.  MySQL is tapped into all the megatrends that lead to the future.  Startups are a part of that.  Of course that matters.

One other thought I’ve had:  what if Sun decides to build the ultimate database appliance?  I’m talking about order it, plug your CAT5 cable in, and forget about it.  Do for databases what disk arrays did for storage.  That seems to me a powerful combination.  Database servers require a painful amount of care and feeding to install and administer properly.  If Sun can convert them to appliances, it kills two birds with one stone.  First, it becomes a powerful incentive to buy more Sun hardware.  This will even help more fully monetize MySQL, which apparently only gets revenue from 1 in 10,000 users.  Second, it could radically simplify and commoditize a piece of the software and cloud computing fabric that is currently expensive and painful.  Such a move would be a radical revolution that would perforce drive a huge revenue opportunity for Sun.  They have enough smart people between Sun and MySQL to pull it off if they have the will. 

Conclusion

Sun has made an uncannily good move in acquiring MySQL.  As Wired points out:

One company that won’t be thrilled by the news is Oracle, makers of the Oracle database which has managed to seduce a large segment of the enterprise market into the proprietary Oracle on the basis that the open source options lacked support.

With Sun backing the free MySQL option (and offering paid support) Oracle suddenly looks a bit expensive.

How else can you simultaneously lay a bet on owning a substantial piece of the computing fabric that all future roads are pointing to and send a big chill down Larry Ellison’s spine for the low low price of just $1B?  Awesome move, Jonathan!

Related Articles

VARGuy says the acquisition means Sun finally matters again.  $1B is cheap to “finally matter again!”

Posted in Open Source, Partnering, Web 2.0, business, enterprise software, platforms, saas, soa, strategy | 8 Comments »

Is Programming Like Music or Engineering, and Must it Be Unintuitive?

Posted by smoothspan on January 8, 2008

Two different blog posts hit me today with one of those unconscious knee jerk desires to disagree.  First was Joel Spolsky wishing that undergraduate computer science programs would quit spending so much time on formalisms and become more like a Juilliard music program.  Second was Raganwald contending that if a language is really going to push the envelope of “better”, it must by necessity be even less intuitive.  Taken individually, I probably could have swallowed these pills without comment, but wouldn’t you know it: both wound up back to back in my feed reader.  Somehow they tickled the same mysterious place in my subconscious, but I had to actually write this diatribe to understand how.

Let’s start with Raganwald.  On the face of it, the logic is pretty sound.  It’s not unlike one of my favorite quotations, which goes:

The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself.  Therefore all progress depends on the unreasonable man. 
George Bernard Shaw

The essential thesis that Reg works from is that all programming languages are equally powerful, because they’re Turing complete.  As such, he wants to talk about what makes one language better than another.  He concludes at one point that since all languages are equally powerful, and since better is subjective, better languages are the ones that make you a better programmer, which in and of itself is subjective.  Also, making you a better programmer by definition means moving you out of your comfort zone.  Because of all that, “better” languages have to be unintuitive.  “QED”, as we used to say.

Right at this point I’ve had too many almost-rights go by to be able to sit still.  First, “better” does not have to be subjective.  We could choose to quantify it in some way.  I know that as a profession, programmers have a tremendous love hate relationship with various metrics to the point where we largely throw up our hands and dismiss them all as useless.  Sorry, but I am not in that camp.  I think there are many useless metrics, but not all are useless any more than all evaluations of every programming language are hopelessly subjective personal opinions.  I am even happy to accept fairly simple-minded quantifications such as being able to write similar things in fewer lines of code.  And please, don’t waste our time with silly contrived examples where fewer lines of code result in backwards progress.  In general, and with appropriate subjective oversight to weed out the silly off-point examples, I am convinced that fewer lines of code is a good thing.  Here is a harder one:  how about fewer clearer lines of code?  That’s going to be harder to quantify as a metric, and I can accept that. 

Now here is where I guess Reg and I part company:  is a language that needs fewer clearer lines of code necessarily less intuitive?  Reg seems to say “yes”, that language will be less intuitive.  I say that by the very construction, i.e. the lines are clearer, the answer must be “no.”  Is the only resolution that it is impossible to create a language using fewer clearer lines of code?  Gosh, I hope not, because that seems to say we can’t build better languages unless you want to quarrel with “fewer” or “clearer” not being better.  That’s my “QED”, but let’s tease things apart further, because it gets interesting.

If nothing else, this quantification business reflects a lack of good formalisms being available for a lot of squishy things like clarity of code.  There is another phenomenon at work too with this word formalism.  BTW, you can substitute “model” there if it helps make what I’m saying clearer.  There may be no single formalism or even a small set that is best for all problems solved by all programming languages.  Rather, we may have different fairly independent domains with happy little language ecosystems that work best in one or a few, but that may be terrible in others.  I am surprised and amused at the apparently fractal nature of this formalism by domain concept.  Whenever we think we’re close to finding the universal formalism with which to create the best possible language, we build something like Lisp.  After it’s built, and we use it for a while, we discover that what such universal formalisms are really good at is creating domain-specific languages, and that they are not so good at many other things.  So in a sense, they’re lousy tools for any particular problem,  but they make it easy to create a special tool that gives the best possible solution to the sub-domain.  See what I mean by fractal?  There’s something very cool about that fractal idea that just feels right.  If you agree, it probably means we’re communicating.

Is it any surprise that it takes many formalisms to make the disparate problems we face soluble with fewer clearer lines of code?  I’m not surprised.  I’ve fiddled with just a few domains and languages in my career, and there are many differences.  Here are some I’ve played with: 

-  Database problems like the inherently parallel and set theoretic languages including the rather homely but oh-so-practical SQL. 

-  Bootstrapping everything from nothing can be done very easily with Forth.

-  Certain kinds of graphics cry out for the purity of mathematical languages with an ability to manipulate matrices and deal with collections of graphical entities.  Some like Fortran for math, but I’ve always liked APL and spreadsheets.  I never played with Mathematica enough, but it is likely another alternative.  Tying something like APL and spreadsheets together with graphics in a suitably expressive way would probably make for a wonderful graphics language.  I’ve certainly seen interesting graphics come out of Mathematica, so maybe it’s been done.

-  User interface is a surprisingly parallel problem space dealing with events.  Most frameworks make this painful.  I’ll bet there is an opportunity for a great language in this niche.  I like Adobe’s Flex a lot, but it is far from perfect.  Constraint oriented programming (Thinglab!) is closer, so I am very hopeful about Adobe Thermo raising the bar further.

-  Low-level systems programming of memory allocators, process schedulers, and the like loves languages from the C family.   The less successful branch that included things like Modula-2 and Eiffel is not bad either.  There are likely more recent developments I’m not familiar with.

-  Application domain specific programming likes Ruby. 

-  String processing is another world of fascinating special-purpose formalisms such as regular expressions and weird parser generator languages.

-  Creating new languages and pure computer science concepts seems to benefit hugely from Lisp. 

All of the above are sharply affected by the choice of accompanying frameworks and libraries, so they have to be considered as a whole.

This is why I’m so convinced that polyglot systems are the way to go.  Whether the polyglot family has some underlying “assembly language” like Lisp or even Ruby is almost immaterial.  The virtual machine can also be the underlying unifying theme, so long as the various polyglot languages can communicate with one another. 
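A small taste of the polyglot idea is a general-purpose host language handing the set-oriented part of a problem to SQL.  The sketch below uses Python’s standard `sqlite3` module; the `posts` table and its rows are hypothetical and exist only to show the contrast between the two styles.

```python
import sqlite3

# In-memory database with a hypothetical table of blog posts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (author TEXT, comments INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("ann", 3), ("bob", 8), ("ann", 4), ("cy", 1)],
)

# Set-oriented: describe the result and let the database work out the how.
sql_totals = dict(
    conn.execute("SELECT author, SUM(comments) FROM posts GROUP BY author")
)

# Imperative: walk the rows one at a time and keep the tallies by hand.
loop_totals = {}
for author, comments in conn.execute("SELECT author, comments FROM posts"):
    loop_totals[author] = loop_totals.get(author, 0) + comments

print(sql_totals == loop_totals)
print(sorted(sql_totals.items()))
```

Both halves produce the same totals.  The point is only that each language is doing the job its formalism suits, and the two communicate through a thin, well-defined boundary.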

A last comment on this way of thinking is that I feel Design Patterns are simply ways of imposing or adding those formalisms a particular language doesn’t directly support or doesn’t support well.  There’s been a lot of back and forth on this in the blogosphere, and it seems controversial.  Some say that a perfect language would need no design patterns.  I don’t think so.  I hope I’ve pointed out adequately that the domains require different languages so that there are no “perfect” languages.  From my perspective, you could take any design pattern and come up with a language construct that builds the pattern into the language.  Whether it makes sense or not is another issue.

Languages can become too baroque and obtuse when they are burdened with too many formalisms.  This is particularly true if the formalisms are at war with one another.  Too many ways to do the same thing may not be helpful.  I’d better stop, because I’m rapidly heading back to my polyglot programming soapbox.  Give me a set of very concise and relatively small languages for well-defined problem domains.  Perhaps I’ll take one “general purpose”  (Jack of all Trades, master of none?) language to fill in the gaps.

Which brings me to Joel Spolsky’s post.  He wants to forget a lot of the formal training and immerse undergrads in large programming projects coordinated by talented teachers.  Joel is responding to a lamentation that schools are too quick to press kids into Java and don’t teach enough of the old formalisms (now you see why I’ve used that peculiar word so much!).  His answer is to create what he views as a “Juilliard” style curriculum where the students build relatively complex software in teams under the watchful eye of talented teachers and without so much of the formalisms.  In this model, students would sign up to build a real piece of software of some kind, perhaps a game or social network.

I had one of those formal computer science educations.  I only have the Bachelor’s degree, but I completed all of the work for a PhD as well.  I just felt it would be more fun to write a business plan for my first startup than a thesis (something which maybe Joel and I agree on given his post).   We covered the formalisms in spades.  There was also an opportunity to take on some large software projects, but that was perhaps 1/3 of the curriculum. 

You are probably familiar with the observation many have made that the top 5% of programmers are perhaps 20 times more productive than the average programmer.  I know this to be true, and have seen it many times.  What does this have to do with the discussion?  Simply this: the amount of experience does not seem to matter in determining whether you’re in that 5%.  It does not seem to be possible to teach one who is not a 5% performer how to get there.  I won’t try to prove that here, take my word for it, or at least ask yourself what it means if it is true.

What is more relevant is my observations on watching that 5% group.  What did they do better?  I used to say the thing that really set them apart was a facility with factoring of all kinds.  Those were the words I used, but factoring has so many meanings these days that I feel I should clarify.  The top performers were really good at flipping problems around and viewing them from many angles.  The top players could do it from extreme “meta” positions that were layers of abstraction away from the core problem.  These people lived and breathed isomorphisms, whether or not they had any idea what the word meant.  I’ve described the application of isomorphisms to creativity in an earlier post on “The Medici” effect.  It’s worth a reread.

By contrast, the average performers tended to view everything according to a very limited set of models or formalisms they grasped.  They would try to hammer every problem into one of those few models.  In the worst cases, the models they were able to understand and employ were limited simply to basic structured programming constructs.  With difficulty they could get control flow.  They were not especially good at breaking things into functions or procedures, let alone modules.  Object oriented programming was often a bridge too far. 

We had several classes where this distinction became obvious.  One was a survey of programming languages that included Lisp, APL, SNOBOL, Prolog, and a couple of others I’ve forgotten.  We learned each new language and had to complete a model program that demonstrated we had really grasped the new models and formalisms being offered.  Another was a course wherein we built up an entire language using the fundamental Turing constructs.  The language created was our own, and everybody’s was different, but we had to build it up from the basics of Turing.  This course was done in Lisp.  It was a 300-level undergraduate course, but it was a definite weed-out course that showed who had it and who didn’t.  The last one was the hardcore algorithms course.  Once again, to understand a full range of interesting algorithms required applying many different formalisms.

You’ve probably guessed my problem with Joel’s approach.  It’s probably fine if you are seeking to create a vocational school for factory programmers.  But, the formalisms are the real meat.  It isn’t the lines of code.  The programmers who got the formalisms could grind out more code than any half dozen of the others combined.  They did so almost automatically and unconsciously.  I suspect Juilliard doesn’t work quite as Joel envisions either.  Do students there stretch their thinking about music’s “formalisms”, or do they spend most of their time perfecting a single arrangement and composition?

People, go out and learn some new languages.  Learn some new design patterns.  Dig out those formalisms.  It is surprising how intuitive many of them will be.  The more you pick up, the more intuitive future ones will be.  Divorce yourself from a particular language.  That’s what’s holding you up and making things unintuitive.  The best news is that because you don’t have to write reams of code à la Joel, you can learn faster and in your spare time.  The emphasis is on variety more so than volume.

QED

Posted in platforms, software development | 4 Comments »

IBM Trying to Keep Up With the Cloud Jones’s

Posted by smoothspan on January 3, 2008

Can you tell that the whole cloud computing thing is ratcheting up a few notches in intensity?  I blame Amazon, who’ve rolled out a ton of initiatives and gotten lots of traction among startups.  But we ain’t seen nothin’ yet, friends.

Already there are signs that others are feeling like the train is leaving the station.  One of the more interesting is that IBM is bringing on CouchDB’s Damien Katz to work on the project full time.  It seems to me that IBM is making this move to ensure that they have an answer for Amazon’s SimpleDB in the form of CouchDB.  Thanks to Patrick Logan for pointing this out in his own blog post.

We’re going to see this pace continue to accelerate, and we’re going to see those who want to be players jockeying to make sure that they have all the elements in their Cloud Platform Suite.  It’s still too early to tell what the exact combination of ingredients for success will be, but so far it looks like Amazon is the head chef when we see others trying to emulate what they’ve done.

Meanwhile this is fantastic news for developers and startups that want to embrace these technologies.  The danger in things like Amazon’s Web Services is that they are so unique that you become utterly dependent on them.  The more others offer the same sort of services, the more competition can work its magic and make the whole scene more vibrant, cheaper, and innovative.

Viva les Cloud Computing!

Posted in amazon, data center, grid, platforms, saas, strategy | 1 Comment »

Coté’s Excellent Description of the Microsoft Web Rift

Posted by smoothspan on January 2, 2008

Coté’s latest RedMonk post perfectly captures my reservations about Microsoft, which I refer to as their “rift with the web”.  Here are the relevant passages:

Microsoft frameworks are plagued by lock-in fears. That is, you’re either a 100% Microsoft coder or a 0% Microsoft coder. Sure, that’s an exaggeration, but the more nuanced consequences are that something intriguing like Astoria will play best with Microsoft coders, unlike Amazon’s web services which will play well with any coder.

This thing he calls “lock-in fear” and the extreme polarization (encouraged by Microsoft’s rhetoric, tactics, and track record) that you’re either all-Microsoft or no-Microsoft is my “web rift”.  I’ve written about this a couple of times before, and it never fails to raise the ire of some Microsoft fan or other.

This particular RedMonk post is chiefly concerned, it seems to me, with making sure that Microsoft’s Astoria doesn’t disappear under the continuous din of innovation Amazon is putting up for us lately around their cloud computing services.  The essential new thing about Astoria in Coté’s mind is that it is a RESTful framework rather than a .NET framework:

If you’re just coding to a URL, that’s not quit so bad as coding to a .Net library and all the Microsoft baggage and tool-chain needed to support that.

I agree, but I think Coté has lumped together two issues that don’t have to be: the issue of how software components communicate versus whether a solution is hosted/SaaS or not.  One can imagine components communicating RESTfully without any need to have them be hosted in SaaS fashion.  My own bias would be to go ahead and host, but it’s not a requirement and to put the two together as one is conflating the issue (don’t you love that word “conflate” that has drifted into common use here in the valley?).  Coté’s contention that hosting itself will reduce fears of lock-in is also pretty hard to swallow.  While I am again a whole-hearted advocate, giving your software over to a hosted environment and all of its attendant APIs (RESTful or not) is a big step towards lock-in, no matter how you look at it.  This is again an area where Microsoft’s old school monopolist behavior won’t serve it well.  There will be fear, perhaps unreasonable, that Microsoft will take unfair advantage if handed the keys to your kingdom by hosting on their cloud infrastructure.  The problem is that they’ve failed to conduct themselves as the Swiss do in matters of banking.  They are voracious competitors and seemingly always will be.  It isn’t enough for them to win, others must lose.

There are several other interesting points made in the post.  For example, on the issue of hosting, Coté wonders why big companies are so slow to launch.  It’s very true, we’ve seen it for all the big players.  The answer is perhaps that they have more to lose and are more likely to reach the win-lose decision point too quickly and with too much momentum to gracefully correct the problems.  Startups have a distinct advantage in this.  It’s not clear to me why more big companies don’t fund startups expressly to deal with the issue with the intention of acquiring them later if things work out.

Coté also makes a hugely important point about the value of self-service:

My sense is that unless it’s all delivered as a URL with dead-simple docs and pricing (check out the page for SimpleDB), any given technology won’t work out at web-scale.

Put another way, these new technologies need to be completely self-service. If a developer has to ever talk with a human from the company or team offering the project, something has gone wrong.

Self-service is a crucial part of viral growth potential in today’s world.  Any company that releases a product without at least a semblance of a plan for how to make it self-service at some point down the road is laying the foundations for failure.

Related Articles

Latest Microsoft Office service packs decommits support for Older File Formats: Especially Competitors

This is typical of Microsoft’s “we don’t have to be nice, we’re the phone company” (apologies to Lily Tomlin) behavior.  To access your old files requires you to delve into the registry.  Microsoft claims these old files are a security risk.  It’s tacky, it’s more lock-in, and it’s more evidence that MSFT is up to their same old tricks.

Posted in amazon, platforms, saas, soa, software development, strategy | 8 Comments »

Eventual Consistency Is Not That Scary

Posted by smoothspan on December 22, 2007

Amazon’s new SimpleDB offering, like many other post-modern databases such as CouchDB, offers massive scaling potential if users will accept eventual consistency.  It feels like a weighty decision.  Cast in the worst possible light, eventual consistency means the database will sometimes return the wrong answer in the interests of allowing it to keep scaling.  Gasp!  What good is a database that returns the wrong answer?  Why bother? 

Often waiting for the write answer (sorry, that inadvertent slip makes for a good pun so I’ll leave it in place) returns a different kind of wrong answer.  Specifically, it may not return an answer at all.  The system may simply appear to hang.

How does all this come about?  Largely, it’s a function of how fast changes in the database can be propagated to the point they’re available to everyone reading from the database.  For small numbers of users (i.e. we’re not scaling at all), this is easy.  There is one copy of the data sitting in a table structure, we lock up the readers so they can’t access it whenever we change that data, and everyone always gets the right answer.  Of course, solving simple problems is always easy.  It’s solving the hard problems that lands us the big bucks.  So how do we scale that out?  When we reach a point where we are delivering that information from that one single place as fast as it can be delivered, we have no choice but to make more places to deliver from.  There are many different mechanisms for replicating the data and making it all look like one big happy (but sometimes inconsistent) database.  Let’s look at them.

Once again, this problem may be simpler when cast in a certain way.  The most common and easiest approach is to keep one single structure as the source of truth for writing, and then replicate out changes to many other databases for reading.  All the common database software supports this.  If your single database could handle 100 users consistently, you can imagine if those 100 users were each another database you were replicating to, suddenly you could handle 100 * 100 users, or 10,000 users.  Now we’re scaling.  There are schemes to replicate the replicated and so on and so forth.  Note that in this scenario, all writing must still be done on the one single database.  This is okay, because for many problems, perhaps even the majority, readers far outnumber writers.  In fact, this works so well, that we may not even use databases for the replication.  Instead, we might consider a vast in-memory cache.  Software such as memcached does this for us quite nicely, with another order of magnitude performance boost since reading things in memory is dramatically faster than trying to read from disk.

Okay, that’s pretty cool, but is it consistent?  This will depend on how fast you can replicate the data.  If you can get every database and cache in the system up to date between consecutive read requests, you are sure to be consistent.  In fact, it just has to get done between read requests for any piece of data that changed, which is a much lower bar to hurdle.  If consistency is critical, the system may be designed to inhibit reading until changes have propagated.  It takes some very clever algorithms to do this well without throwing a spanner into the works and bringing the system to its knees performance-wise.

Still, we can get pretty far.  Suppose your database can service 100 users with reads and writes and keep it all consistent with appropriate performance.  Let’s say we replace those 100 users with 100 copies of your database to get up to 10,000 users.  It’s now going to take twice as long.  During the first half, we’re copying changes from the Mother Server to all of the children.  The second half we’re serving the answers to the readers requesting them.  Let’s say we can keep the overall time the same just by halving how many are served.  So the Mother Server talks to 50 children.  Now we can scale to 50 * 50 = 2500 users.  Not nearly as good, but still much better than not scaling at all.  We can go 3 layers deep and have Mother serve 33 children who each serve 33 grandchildren to get to 33 * 33 * 33 = 35,937 users.  Not bad, but Google’s founders can still sleep soundly at night.  The reality is we probably can handle a lot more than 100 on our Mother Server.  Perhaps she’s good for 1000.  Now the 3-layered scheme will get us all the way to 333*333*333 = over 36 million.  That starts to wake up the sound sleepers, or perhaps makes them restless.  Yet, that also means we’re using over 100,000 servers too: 1 Mother talks to 333 children who each have 333 grandchildren.  It’s a pretty wasteful scheme.
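If you want to check my arithmetic, here is a small Python sketch of the layered scheme.  The function name and the simplifying assumption that every server talks to the same number of downstream machines are mine, purely for illustration.

```python
def tiered_capacity(fanout: int, tiers: int) -> tuple[int, int]:
    """Return (users_served, servers_needed) for a replication tree.

    Assumes every server, including the Mother Server, talks to `fanout`
    downstream machines, and that only the bottom tier serves end users.
    """
    servers = sum(fanout ** level for level in range(tiers))  # 1 + f + f^2 ...
    users = fanout ** tiers
    return users, servers


for fanout, tiers in [(50, 2), (33, 3), (333, 3)]:
    users, servers = tiered_capacity(fanout, tiers)
    print(f"fanout={fanout:>3} tiers={tiers}: {users:>10,} users on {servers:>7,} servers")
```

The last row is the one that hurts: the server count grows almost as fast as the user count divided by the fanout.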

Well, let’s bring in Eventual Consistency to reduce the waste.  Assume you are a startup CEO.  You are having a great day, because you are reading the wonderful review of your service in Techcrunch.  It seems like the IPO will be just around the corner after all that gushing does its inevitable work and millions suddenly find their way to your site.  Just at the peak of your bliss, the CTO walks in and says she has good news and bad news.  The bad news is the site is crashing and angry emails are pouring in.  The other bad news is that to fix it “right”, so that the data stays consistent, she needs your immediate approval to purchase 999 servers so she can set up a replicated scheme that runs 1 Mother Server (which you already own) and 999 children.  No way, you say.  What’s the good news?  With a sly smile, she tells you that if you’re willing to tolerate a little eventual consistency, your site could get by on a lot fewer servers than 999.

Suppose you are willing to have it take twice as long as normal for data to be up to date.  The readers will read just as fast, it’s just that if they’re reading something that changed, it won’t be correct until the second consecutive read or page refresh.  So, our old model that had the system able to handle 1,000 users, and replicated to 999 servers to handle 1 million users used to have to go to 3 tiers (333 * 333 * 333) to get to the next level at 36 million and still serve everything consistently and just as fast.  If we relax the “just as fast”, we can let our Mother Server handle 2,000 at half the speed to get to 2000 * 1000 = 2 million users on 3 tiers with 2000 servers instead of 100,000 servers to get to 36 million. If we run 4x slower on writes, we can get 4000*1000 = 4 million users with 4000 servers.  Eventually things will bog down and thrash, but you can see how tolerating Eventual Consistency can radically reduce your machine requirements in this simple architecture.  BTW, we all run into Eventual Consistency all the time on the web, whether or not we know it.  I use Google Reader to read blogs and WordPress to write this blog.  Any time a page refresh shows you a different result when you didn’t change anything, you may be looking at Eventual Consistency.  Even if you suspect others changed something, Google Reader still comes along frequently and says an error occurred and asks me to refresh.  It’s telling me they relied on Eventual Consistency and I have an inconsistent result.
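To make “not correct until the second read” concrete, here is a toy sketch in Python.  The Primary and Replica classes are hypothetical stand-ins for the Mother Server and one of her children, not any real database’s API; the only point is that a write is shipped right away but applied later.

```python
from collections import deque


class Replica:
    """A read-only copy that applies changes only when replicate() runs."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes shipped by the primary, not yet applied

    def read(self, key):
        return self.data.get(key)

    def replicate(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value


class Primary:
    """The single writable copy (the Mother Server in the text)."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.pending.append((key, value))  # ship now, apply later


replica = Replica()
primary = Primary([replica])

primary.write("stock:widget", 5)
first_read = replica.read("stock:widget")   # stale: change not applied yet
replica.replicate()
second_read = replica.read("stock:widget")  # caught up
print(first_read, second_read)
```

The first read comes back empty even though the write already succeeded on the primary; the refresh gets the right answer.  The longer you let replicate() wait, the more reads each replica can serve in between.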

As I mentioned, these approaches can still be wasteful of servers because of all the data copies that are flowing around.  This leads us to wonder, “What’s the next alternative?”  Instead of just using servers to copy data to other servers, which is a prime source of the waste, we could try to employ what’s called a sharded or Federated architecture.  In this approach, there is only one copy of each piece of data, but we’re dividing up that data so that each server is only responsible for a small subset of it.  Let’s say we have a database keeping up with our inventory for a big shopping site.  It’s really important to have it be consistent so that when people buy, they know the item was in stock.  Hey, it’s a contrived example and we know we can cheat on it, but go with it.  Let’s further suppose we have 100,000 SKU’s, or different kinds of items in our inventory.  We can divide this across 100 servers by letting each server be responsible for 1,000 items.  Then we write some code that acts as the go-between with the servers.  It simply checks the query to see what you are looking for, and sends your query to the correct sub-server.  Voila, you have a sharded architecture that scales very efficiently.  Our replicated model would blow out 99 copies from the 1 server, and it could be about 50 times faster (or handle 50x the users as I use a gross 1/2 time estimate for replication time) on reads, but it was no faster at all on writes.  That wouldn’t work for our inventory problem because writes are so common during the Christmas shopping season.
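The go-between code can be surprisingly small.  Here is a minimal Python sketch, assuming SKUs are plain integers from 0 to 99,999 and that each shard owns a contiguous block of 1,000 of them.  The names and the dictionaries standing in for database servers are my own invention for illustration.

```python
SKUS_PER_SHARD = 1_000
NUM_SHARDS = 100


def shard_for(sku: int) -> int:
    """Range-based routing: SKUs 0-999 go to shard 0, 1000-1999 to shard 1, ..."""
    if not 0 <= sku < SKUS_PER_SHARD * NUM_SHARDS:
        raise ValueError(f"unknown SKU {sku}")
    return sku // SKUS_PER_SHARD


class ShardedInventory:
    def __init__(self):
        # One dict per shard stands in for one database server.
        self.shards = [dict() for _ in range(NUM_SHARDS)]

    def set_stock(self, sku, quantity):
        self.shards[shard_for(sku)][sku] = quantity

    def get_stock(self, sku):
        return self.shards[shard_for(sku)].get(sku, 0)


inventory = ShardedInventory()
inventory.set_stock(42_017, 3)
print(shard_for(42_017), inventory.get_stock(42_017))
```

Both the write and the read touch exactly one shard, which is why writes scale here and did not in the replicated model.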

Now what are the pitfalls of sharding?  First, there is some assembly required.  Actually, there is a lot of assembly required.  It’s complicated to build such architectures.  Second, it may be very hard to load balance the shards.  Just dividing up the product inventory across 100 servers is not necessarily helpful.  You would want to use a knowledge of access patterns to divide the products so the load on each server is about the same.  If all the popular products wound up on one server, you’d have a scaling disaster.  These balances can change over time and have to be updated, which brings more complexity.  Some say you never stop fiddling with the tuning of a sharded architecture, but at least we don’t have Eventual Consistency.  Hmmm, or do we?  If you can ever get into a situation where there is more than one copy of the data and the one you are accessing is not up to date, Eventual Consistency could rear up as a design choice made by the DB owners.  In that case, they just give you the wrong answer and move on.
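Here is one simple way to use access patterns when placing products, sketched in Python.  It is a greedy heuristic I picked for illustration (hottest product first, onto whichever shard is least loaded so far), and the request rates are made-up numbers, not measurements from any real site.

```python
import heapq


def balance(request_rates: dict[str, int], num_shards: int) -> list[list[str]]:
    """Greedy placement: hottest product first, always onto the least-loaded shard."""
    heap = [(0, shard) for shard in range(num_shards)]  # (load, shard index)
    placement = [[] for _ in range(num_shards)]
    for product, rate in sorted(request_rates.items(), key=lambda kv: -kv[1]):
        load, shard = heapq.heappop(heap)
        placement[shard].append(product)
        heapq.heappush(heap, (load + rate, shard))
    return placement


rates = {"console": 900, "doll": 500, "socks": 300, "tie": 200, "mug": 100}
print(balance(rates, 2))
```

With these numbers the hot console ends up sharing a shard with only the mug, and both shards carry the same total load.  Of course, the moment the rates change you get to do it all over again.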

How can this happen in the sharded world?  It’s all about that load balancing.  Suppose our load balancer needs to move some data to a different shard.  Suppose the startup just bought 10 more servers and wants to create 10 additional shards.  While that data is in motion, there are still users on the site.  What do we tell them?  Sometimes companies can shut down the service to keep everything consistent while changes are made.  Certainly that is one answer, but it may annoy your users greatly.  Another answer is to tolerate Eventual Consistency while things are in motion with a promise of a return to full consistency when the shards are done rebalancing.  Here is a case where the Eventual Consistency didn’t last all that long, so maybe that’s better than the case where it happens a lot.
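How much data is “in motion” depends heavily on how items are assigned to shards.  As a rough Python illustration, assume (and this is purely my assumption, not a description of any particular system) that a SKU’s shard is just its number modulo the shard count, and see what happens when 100 shards become 110.

```python
def shard_for(sku: int, num_shards: int) -> int:
    # Hypothetical placement rule: plain modulo on the SKU number.
    return sku % num_shards


skus = range(100_000)
moved = sum(1 for sku in skus if shard_for(sku, 100) != shard_for(sku, 110))
print(f"{moved:,} of {len(skus):,} SKUs change shards ({moved / len(skus):.1%})")
```

Under this naive rule the large majority of the inventory has to change servers just to add 10 machines, which is a long window in which to either shut down or tolerate stale answers.  Smarter placement schemes exist to shrink that window, at the price of yet more architecture.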

Note that consistency is often in the eye of the beholder.  If we’re talking Internet users, ask yourself how much harm there would be if a page refresh delivered a different result.  In many applications, the user may even expect or welcome a different result.  An email program that suddenly shows mail after a refresh is not at all unexpected.  That the user didn’t know the mail was already on the server at the time of the first refresh doesn’t really hurt them.  There are cases where absolute consistency is very important.  Go back to the sharded database example.  It is normal to expect every single product in the inventory to have a unique id that lets us find that part.  Those ids have to be unique and consistent across all of the shards.  It is crucially important that any id changes are up to date before anything else is done or the system can get really corrupted.  So, we may create a mechanism to generate consistent ids across shards.  This adds still more architectural complexity.
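One possible mechanism, sketched in Python under the assumption that the number of shards is fixed and known to every shard: give each shard its own slice of the id space so that no coordination is needed to stay unique.  The class name is hypothetical.

```python
import itertools


class ShardIdGenerator:
    """Hands out ids that cannot collide with ids minted on another shard."""

    def __init__(self, shard_number: int, num_shards: int):
        self.shard_number = shard_number
        self.num_shards = num_shards
        self.counter = itertools.count()

    def next_id(self) -> int:
        # Shard k only ever produces k, k + N, k + 2N, ...
        return next(self.counter) * self.num_shards + self.shard_number


generators = [ShardIdGenerator(k, 3) for k in range(3)]
ids = [g.next_id() for _ in range(4) for g in generators]
print(ids)
```

The catch is exactly the one above: the scheme bakes in the shard count, so adding shards later means revisiting it.  That is the extra architectural complexity showing up again.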

There are nightmare scenarios where it becomes impossible to shard efficiently.  I will oversimplify to make it easy and not necessarily correct, but I hope you will get the idea.  Suppose you’re dealing with operations that affect many different objects.  The objects are divided into shards naturally when examined individually, but the operations between the objects span many shards.  Perhaps the relationships between shards are incompatible to the extent that there is no way to shard them across machines such that every single operation doesn’t hit many shards instead of a single shard.  Hitting many shards will invalidate the sharding approach.  In times like this, we will again be tempted to opt for Eventual Consistency.  We’ll get to hitting all the shards in our sweet time, and any accesses before that update is finished will just live with inconsistent results.  Such scenarios can arise where there is no obvious good sharding algorithm, or where the relationships between the objects (perhaps it’s some sort of real time collaborative application where people are bouncing around touching objects unpredictably) are changing much too quickly to rebalance the shards.  One really common case of an operation hitting many shards is queries.  You can’t anticipate all queries such that any of them can be processed within a single shard unless you sharply limit the expressiveness of the query tools and languages.
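The query problem is easy to see in a toy Python example.  The three dictionaries below are made-up shards keyed by SKU; a lookup by key goes to one shard, while a query on anything else has to visit all of them and merge the results.

```python
shards = [
    {"sku-1": {"name": "red mug", "stock": 4}, "sku-2": {"name": "blue tie", "stock": 0}},
    {"sku-3": {"name": "red socks", "stock": 9}},
    {"sku-4": {"name": "green doll", "stock": 2}, "sku-5": {"name": "red hat", "stock": 1}},
]


def lookup(sku, shard_index):
    """Key lookup: touches exactly one shard."""
    return shards[shard_index].get(sku)


def search(predicate):
    """Ad hoc query: has to visit every shard and merge the partial results."""
    hits, shards_touched = [], 0
    for shard in shards:
        shards_touched += 1
        hits.extend(sku for sku, item in shard.items() if predicate(item))
    return sorted(hits), shards_touched


red_items, touched = search(lambda item: item["name"].startswith("red"))
print(red_items, touched)
```

Every such query costs as many round trips as there are shards, so the work per query grows with the cluster instead of shrinking.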

I hope you come away from this discussion with some new insights:

-  Inconsistency derives from having multiple copies of the data that are not all in sync.

-  We need multiple copies to scale.  This is easiest for reads.  Scaling writes is much harder.

-  We can keep copies consistent at the expense of slowing everything down to wait for consistency.  The savings in relaxing this can be quite large.

-  We can somewhat balance that expense with increasingly complex architecture.  Sharding is more efficient than replication, but gets very complex and can still break down, for example. 

-  It’s still cheaper to allow for Eventual Consistency, and in many applications, the user experience is just as good.

Big web sites realized all this long ago.  That’s why sites like Amazon have systems like SimpleDB and Dynamo that are built from the ground up with Eventual Consistency in mind.  You need to look very carefully at your application to know what’s good or bad, and also understand what the performance envelope is for the Eventual Consistency.  Here are some thoughts from the blogosphere:

Dare Obasanjo

The documentation for the PutAttributes method has the following note

Because Amazon SimpleDB makes multiple copies of your data and uses an eventual consistency update model, an immediate GetAttributes or Query request (read) immediately after a DeleteAttributes or PutAttributes request (write) might not return the updated data.

This may or may not be a problem depending on your application. It may be OK for a del.icio.us style application if it took a few minutes before your tag updates were applied to a bookmark but the same can’t be said for an application like Twitter. What would be useful for developers would be if Amazon gave some more information around the delayed propagation such as average latency during peak and off-peak hours.

Here I think Dare’s example of Twitter suffering from Eventual Consistency is interesting.  In Twitter, we follow micro-blog postings.  What would be the impact of Eventual Consistency?  Of course it depends on the exact nature of the consistency, but let’s look at our replicated reader approach.  Recall that in the Eventual Consistency version, we simply tolerate that we allow reads to come in so fast that some of the replicated read servers are not up to date.  However, they are up to date with respect to a certain point in time, just not necessarily the present.  In other words, I could read at 10:00 am and get results on one server that are up to date through 10:00 am and on another results only up to date through 9:59 am.  For Twitter, depending on which server my session is connected to, my feeds may update a little behind the times.  Is that the end of the world?  For Twitter users, if they are engaged in a real time conversation, it means the person with the delayed feed may write something that looks out of sequence to the person with the up to date feed whenever the two are in a back and forth chat.  OTOH, if Twitter degraded to that mode rather than taking longer and longer to accept input or do updates, wouldn’t that be better?

Erik Onnen

Onnen wrote a post called “Socializing Eventual Consistency” that has two important points.  First, many developers are not used to talking about Eventual Consistency.  The knee jerk reaction is that it’s bad, not the right thing, or an unnecessary compromise for anyone but a huge player like Amazon.  It’s almost like a macho thing.  Onnen lacked the right examples and vocabulary to engage his peers when it was time to decide about it.  Hopefully all the chatter about Amazon’s SimpleDB and other massively scalable sites will get more familiarity flowing around these concepts.  I hope this article also makes it easier.

His other point is that when push comes to shove, most business users will prefer availability over consistency.  I think that is a key point.  It’s also a big takeaway from the next blog:

Werner Vogels

Amazon’s CTO posted to try to make Eventual Consistency and its trade offs more clear for all.  He lays a lot of good theoretical groundwork that boils down to explaining that there are tradeoffs and you can’t have it all.  This is similar to the message I’ve tried to portray above.  Eventually, you have to keep multiple copies of the data to scale.  Once that happens, it becomes harder and harder to maintain consistency and still scale.  Vogels provides a full taxonomy of concepts (i.e. Monotonic Write Consistency et al) with which to think about all this and evaluate the trade offs.  He also does a good job pointing out how often even conventional RDBMSs wind up dealing with inconsistency.  Some of the best (and least obvious to many) examples include the idea that your mechanism for backups is often not fully consistent.  The right answer for many systems is to require that writes always work, but that reads are only eventually consistent.

Conclusion

I’ve covered a lot of consistency related tradeoffs involved in database systems for large web architectures.  Rest assured that unless you are pretty unsuccessful, you will have to deal with this stuff.  Get ahead of the curve and understand for your application what the consistency requirements will be.  Do not start out being unnecessarily consistent.  That’s a premature optimization that can bite you in many ways.  Relaxing consistency as much as possible while still delivering a good user experience can lead to radically better scaling as well as making your life simpler.  Eventual Consistency is nothing to be afraid of.  Rather, it’s a key concept and tactic to be aware of.

Personally, I would seriously look into solutions like Amazon’s SimpleDB while I was at it.

Posted in amazon, data center, enterprise software, grid, platforms, soa, software development | 4 Comments »

What if Twitter Was Built on Amazon’s Cloud?

Posted by smoothspan on December 18, 2007

There was recent bellyaching in the blogosphere again about Twitter being down.  Dave Winer grumbles, “What other basic form of communication goes down for 12 hours at a time?”  There are various comments, and in the end, apparently it was about their moving ISP’s.  Twitter themselves had this to say:

Twitter is humming along now after a late night. Our team worked earnestly into the night and morning on our largest and most complex maintenance project ever. Everything went pretty much according to plan except for one thing: an incorrect switch.

The switch in question caps traffic an unacceptable level. In order to correct this, we’ll need to get some hardware installed. Unfortunately, that means we’re not done with our datacenter move just yet. This type of work can be frustrating but it’s all towards Twitter’s highest goal: reliability.

Such moves are never easy, they always include a hitch of some kind, and the Twitter customer base is hopelessly addicted to the medium so Twitter hears about it whenever they turn the thing off for any period of time.  I look at this and for me it’s just one more reason I wouldn’t want to own a datacenter.

Suppose your service, or maybe even Twitter, was built on Amazon’s Cloud or some other Utility Computing solution.  You don’t own the servers, you are renting them.  If loads go up, you can simply rent more in direct proportion to the loads and on 10 minutes notice.  A recent High Scalability article on scaling Twitter shows they don’t really have all that many servers:

  • 1 MySQL Server (one big 8 core box) and 1 slave. Slave is read only for statistics and reporting.
  • 8 Sun X4100s.

10 boxes, in other words.  Now it comes time to upgrade.  Much pain and frustration.  To do it well, and without interruption, they really need 2 complete copies of their infrastructure.  This way, they can prepare the new version and start cutting users over to it while leaving the old one running.  When everyone is over, the old system can be decommissioned.  For many startups, owning twice as much hardware as they use is just out of the question.  The more successful they become, the more expensive it becomes to entertain such a luxury.  Not so on a utility computing service like Amazon’s.  Purchase the use of twice as many servers for just as long as it takes for a successful upgrade and then cut them loose afterward.

    There are detractors to the Amazon approach out there, but do we really think it would make Twitter much less reliable?  What if it made it much more reliable?

    Here’s another thought that runs rampant:  how well would Amazon’s new SimpleDB work for a service like Twitter?  It seems tailor-made.  Certainly the notion of a “texty” database with up to 1024 characters per field seems like a fit.  It would be fascinating to see some of the Twitterati put up a Twitter clone on Amazon’s Web Services using SimpleDB just to see how well it works and how quickly it could be put together.  Given the platform and the requirements of the application, it seems like it would not be that hard to do the experiment.  It would certainly make for an interesting test of how well Amazon’s infrastructure really works.

    Posted in Web 2.0, data center, ec2, grid, platforms | 1 Comment »

    To Rule the Clouds Takes Software: Why Amazon SimpleDB is a Huge Next Step

    Posted by smoothspan on December 15, 2007

    One Ring to rule them all, One Ring to find them,
    One Ring to bring them all and in the darkness bind them…

    J. R. R. Tolkien

    There is much interesting cloud-related news in the blogosphere.  Various pundits are sharing a back and forth on the potential for cloud centralization to result in just a very few datacenters and what that might mean.  The really big news is Amazon’s fascinating new addition to their cloud platform of SimpleDB.  Let’s talk about what it all means.

    Sun’s CTO, Greg Papadopoulos, has been predicting that the earth’s compute resources will resolve into about “five hyperscale, pan-global broadband computing services giants” — with Sun, in its version of this future scenario, the primary supplier of hardware and operations software to those giants. The last was channeled via Phil Wainewright, who goes on to ask, “What is it about a computing grid that’s inherently “more centralized” in nature?”  He feels that Nick Carr has missed the mark and swallowed Sun’s line hook, line, and sinker.  For his part, Carr’s only crime was to seize on a good story, because at the same dinner, another Sun executive, Subodh Bapat, was telling Carr that sometime soon a major datacenter failure would have “major national effects.”  The irony is positively juicy with Sun talking out both sides of their proverbial mouths.

    The tradeoff that Carr and Wainewright are worried about is one of economies of scale that favor centralization versus flexibility and resiliency that favors decentralization.  Where they differ is that Carr sees economies of scale winning in a world where IT matters less and less and Wainewright favors the superior architectural possibilities of decentralization.  Is datacenter centralization inexorable?  In a word, yes, but it may not boil down to just 5 data center owners, and it may take quite a while for the forces at work to finish this evolution.  The factors that determine who the eventual winners will be are also quite interesting, and have the potential to change a lot of landscapes that today are relatively isolated.  Let’s consider what the forces of centralization are.

    First, there is a huge migration of software underway to the cloud.  In other words, software that is never installed on your machine or in your company’s datacenter.  It resides in the cloud and comes to you via the browser.  Examples include SaaS on the business side and the vast armada of consumer Web 2.0 products such as Facebook.  No category is safe from this trend, not even traditional bastions as should be clear from the growing crop of Microsoft Office competitors that reside in the cloud.

    Second, this migration leads to centralization.  The mere act of building around a cloud architecture, even if it is a private cloud in your own company’s datacenter, leads to centralization.  After all, software is moving off your desktop and into that datacenter.  When many companies are aggregated into a single datacenter, into a SaaS multi-tenant architecture, for example, further centralization occurs.  When you offer a ubiquitous service to the masses, as is the case with something like Google, the requirements to deliver that can lead to some of the largest datacenter operations in the land. 

    Third, there are the afore-mentioned economies of scale.  Google has grown so large that it now builds its own special-purpose switches and servers to enable it to grow more cheaply.  The big web empires are all built on the notion of scaling out rather than scaling up, and they run on commodity hardware.  Because they have so many servers, automating their care and feeding has been baked into their DNA.  Not so with most corporate datacenters that are just beginning to see the fruits of crude generic technologies like virtualization that seek to be all things to all people.  Virtualization is a great next step for them, but there are bigger steps ahead yet that will further reduce costs.

    Fourth, the ultimate irony is that centralization begets centralization through network effects.  This is the story of the big consumer web properties.  Every person that joins a social network adds more value to the network than the prior person did.  The value of the network grows exponentially.  This connectedness is facilitated most easily in today’s world by centralization.  Vendors that start to get traction increase their network effects in various ways:  Amazon charges to bring data in and out of their cloud, but not to transfer between services within the cloud.

    Lastly, there are green considerations at work.  The biggest costs associated with datacenters these days are around electricity and cooling.  Microsoft is building a data center in Siberia, which is both cold and pretty central to Asia.  Consider this:  given the speed of light over a fiber connection, what is the cost of latency in having a data center somewhere far north (and cold) in Canada like Winnipeg versus far south (and hot) like Austin, Texas?  It’s 1349 miles, which, as the photon travels (186,000 miles per second) is about 7.2 milliseconds.  The world’s fastest hard drive, the nifty Mtron solid state disks I’m now coveting thanks to Engadget and Kevin Burton, can only write a paltry 80K or so bytes in that time:  not even enough for one photo at decent resolution.  So consider a ring of datacenter clusters built in colder regions.  Centralized computing is up north where the cold that computers like is nearly free for the asking: just open a window many days.  Or come closer.  Put it up on a mountain peak.  Immerse it near a hydro dam and get the juice cheaper too.  It doesn’t matter.  Laying fiber is pretty cheap compared to paying the energy bills.
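For anyone who wants to redo the latency arithmetic, here it is as a few lines of Python.  The first figure uses the vacuum speed of light, as I did above.  The second applies a rough 0.67 factor for light in glass fiber, which is my own ballpark assumption rather than a measured number for any particular route.

```python
MILES = 1349
LIGHT_MILES_PER_SECOND = 186_000  # speed of light in a vacuum, as used in the text
FIBER_FACTOR = 0.67               # assumed: light in glass travels at roughly 2/3 of c

one_way_ms = MILES / LIGHT_MILES_PER_SECOND * 1000
fiber_one_way_ms = one_way_ms / FIBER_FACTOR

print(f"vacuum, one way: {one_way_ms:.2f} ms")
print(f"fiber, one way:  {fiber_one_way_ms:.2f} ms")
```

Either way the answer stays in the neighborhood of ten milliseconds, which is small next to what a cold climate saves on the power bill.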

    The next question is trickier: how do these clouds compete?  Eventually, they will become commoditized, and they will compete on price, but we are a long ways from that point.  At least 10 years or more.  Before that can happen, customers have to agree on what the essential feature sets are for this “product”.  I believe this is where software comes into play, and that should be a matter of great concern for the hosting providers of today whose expertise largely does not revolve around software as a way to add value.   As Eric Schmidt said (via Nick Carr) when he started saying Google would enter this market:

    For clouds to reach their potential, they should be nearly as easy to program and navigate as the Web. This, say analysts, should open up growing markets for cloud search and software tools—a natural business for Google and its competitors.

    Some will immediately react with, “Hold it a minute, what about the hardware?  What about the network?”  The best of the cloud architectures will commoditize those considerations away.  In fact, commoditization will start down at the bottom of the technology stack and work its way up.  The first stage of that, BTW, is already almost over.  That was the choice of CPU.  MIPS?  PowerPC?  SPARC?  No, Intel/AMD are the winners.  The others still exist (not all of them!), but they’ve peaked and are on their way down at various terminal velocities.  Their owners need to milk them for profit, but it would be a losing battle to invest there.  Even Macs now carry Intel inside, and Sun now carries the ticker symbol “JAVA”, a not-so-subtle hat tip to the importance of software.

    Hardware boxes are largely a dead issue too.  There is too little opportunity to differentiate for very long and the CPUs dictate an awful lot of what must be done.  Dell is an assembler and marketer of the lowest cost components delivered just in time lest they devalue in inventory.  Sun still pushes package design, and it may have some relevance to centralization, but this will be commoditized because of centralization.

    Next up will be the operating system.  Again, we’re pretty far down the path of Linux.  Corporations still carry a lot of other things inside their firewalls, but the clouds will be populated almost exclusively with Linux, and we would already see that this has happened if reliable statistics were available.  Linux defines the base minimum of what a cloud offering has to provide:  utility computing instances running Linux.  This is exactly what Amazon’s EC2 offers.

    What else does the cloud need?  Reliable archival storage.  Again, Amazon offers this with S3.  Cloud consumers are adopting it in droves because it makes sense.  It’s a better deal than a raw disk array because it adds value versus that disk array for archival storage.  The value is in the form of resiliency and backup.  Put the data on S3 and forget about those problems.  This begins the commoditization of storage.  Is it any wonder that EMC bought VMware and that a software offering is now most of their market cap?  Hardware guys, put on your thinking caps, this will get much worse.  What software assets do you bring to the table?

    3Tera is a service I’ve talked about before that has a very similar offering available from multiple hosting partners of theirs.  They create a virtual SAN that you can backup and mirror at the click of a mouse.  They let you configure Linux instances to your heart’s content.  Others will follow.  IBM’s Blue Cloud offers much the same.  This collection is today’s blueprint for what the Cloud offers in terms of a platform.

    But this platform is a moving target, and it will keep moving up the stack.  Amazon just announced another rung up with SimpleDB.  For most software that goes into the Cloud, once you have an OS and a file system, the next thing you want to see is a database.  Certainly when I attended Amazon Startup Project, the availability of a robust database solution was the number one thing folks wanted to see Amazon bring out.  The GM of EC2 promised me that this was on the way and that there would be several announcements before the end of the year.  First we saw the availability of EC2 instances that had more memory, disk, and CPU, so that they’d make better database hosts.  SimpleDB is much more ambitious.  It’s a replacement for the conventional database, as embodied in products like MySQL and Oracle, that was designed from the ground up to live in a cloud computing world.  At one stroke it solves a lot of very interesting problems that used to challenge would-be EC2 users around the database.

    Along the lines of my list of factors that drive data center centralization, Phil Windley says the economics are impossible to stop.  Scoble asks whether MySQL, Oracle, and SQL Server are dead:

    Since Amazon brought out its S3 storage service, I’ve seen many many startups give up data centers altogether.

    Tell me why the same thing won’t happen here.

    There is no doubt in my mind that all startups will give up having datacenters altogether before this ends.  However, before we get too far ahead of ourselves in assuming that SimpleDB gives us that opportunity, let’s drop back and consider what its limitations are:

    - It is similar to a relational database, but there are significant differences.  Code will have to be reworked to run there, even if it doesn’t run afoul of the other issues.

    - Latency is a problem when your database is in another datacenter from the rest of your code.  Don MacAskill brings this one up, and all I can say is that this is another network effect that leads to more centralization.  If you like SimpleDB, it’s another reason to bring all of your code inside Amazon’s cloud.

    - All fields are strings, and they are limited to 1024 characters.  Savvy developers can use those 1024 characters to store a pointer to a file of unlimited size on S3, or use other methods like combining fields, to get around this limit.  Mind you, a lot can be done with that, but it is again a difference from traditional RDBMS systems, and it means more work for developers who must overcome the limitation.

    - There are no joins; if you want them (and many proponents of hugely scalable sites view joins as evil), you have to roll your own.

    - Transactions and consistency are also absent.  Reads are not guaranteed to be fully up to date with writes.

    - There is no indexing, nor a whole host of other trappings that database aficionados have gotten comfortable with.
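
    To make a couple of these limitations concrete, here is a rough sketch in Python of what “rolling your own” can look like.  It is illustrative only:  plain dictionaries stand in for SimpleDB domains, no Amazon API is called, and every name in it (make_item, client_side_join, the photo keys) is hypothetical.  It shows three workarounds:  zero-padding numbers so they sort correctly as strings, storing a short key that points at a large object kept elsewhere, and doing the join in application code:

```python
# Illustrative only: plain dicts stand in for SimpleDB domains; no AWS API is called.
MAX_VALUE_LEN = 1024  # per-value limit described in the list above


def encode_int(n, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return str(n).zfill(width)


def make_item(**attrs):
    """Build an item whose values are all strings within the size limit."""
    item = {}
    for name, value in attrs.items():
        text = encode_int(value) if isinstance(value, int) else str(value)
        if len(text) > MAX_VALUE_LEN:
            raise ValueError(f"{name} is too long; store a pointer instead")
        item[name] = text
    return item


def client_side_join(left, right, key):
    """Hash join done in application code, since the store offers none."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical data: the photos themselves live elsewhere (e.g. on S3);
# each item only stores a short key that points at the object.
users = [make_item(user_id=7, name="alice"), make_item(user_id=12, name="bob")]
photos = [
    make_item(user_id=7, photo_key="photos/0001.jpg", size_bytes=250000),
    make_item(user_id=7, photo_key="photos/0002.jpg", size_bytes=90000),
]
joined = client_side_join(users, photos, "user_id")
```

    The padding matters because, compared as plain strings, "7" sorts after "12".  None of this is hard, but it is work the database used to do for you.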

    Mind you, serious web software is created within these limitations, including some at Amazon itself.  In exchange for living with them, you get massively scalable database access at good performance and very cheaply.  And, as TechCrunch says, you may be able to get rid of one of the highest cost IT operations jobs around, database administration, and your costs are even lower.  Remember my analysis that shows SaaS vendors need to achieve 16:1 operations cost advantages over conventional software and you can see this is a big step in that direction already.

    There is no doubt that cloud computing will be massively disruptive, and that Amazon is well on its way in the race to plant its flag at the top of the mountain.  The pace of progress for Amazon Web Services has been blistering this year, and much more hype free than what we’ve gotten from the likes of Google and Facebook when it comes to platform speak.  It’s almost odd that we haven’t heard more from these other players, and especially from the likes of Google.  GigaOm says that SimpleDB completes the Amazon Web Services Trifecta.  They go on to say that Amazon’s announcements have the feel of a well thought out long term strategy, while Google’s make it sound like an ad hoc grab bag of tools.  I think that’s true, and perhaps reflective of Google’s culture, which is hugely decentralized to the point of giving developers 20% free time to work on projects of their choosing.  The problem is that such a culture can more easily give us a grab bag of applications, as Google has, than it can provide a well-designed platform, as Amazon has.  Or, as Mathew Ingram puts it, while everyone else was talking about it, Amazon went ahead and did it.

    I’ve talked to a dozen or so startups that are eagerly working with the Amazon Web Services and having great success, as well as some frustrations.  They require rethinking the old ways.  Integrity issues are particularly different in this brave new world, as are issues of latency.  That matters to how a lot of folks think about their applications.  Because of the learning curve, I don’t plan to go out and short Oracle immediately, but the sand has started running in the hourglass.  There will be more layers added to the cloud, and over time it will become harder and harder to ignore.  There will be economic advantage to those who embrace the new ways, and penalties for those who don’t.  This is a bet-your-business drama that’s unfolding, make no mistake.  At the very least, you need to get yourself educated about what these kinds of services offer and what they mean for application architecture.

    Businesses located low in the stack I’ve mentioned will be hit hard if they don’t have a strategy to embrace and win a piece of the cloud computing New Deal.  We’re talking hardware manufacturers like Sun, Dell, IBM, and HP.  Software infrastructure comes next.  Applications that depend on low cost delivery, aka SaaS, are also very much in the crosshairs, although probably at a slightly later date.

    Welcome to the brave new world of utility cloud computing.  Long live the server, the server is dead!

    Related Articles

    Amazon Raises the Cloud Platform Bar Again With DevPay

    Coté’s Excellent Description of the Microsoft Web Rift :  Nice post on cloud computing at Microsoft

    Posted in Web 2.0, amazon, data center, ec2, grid, platforms, saas | 10 Comments »

    Scalability is a Requirement for Startups

    Posted by smoothspan on December 6, 2007

    Dharmesh Shah wonders whether startups should ignore scalability:

    You’re worrying about scalability too early. Don’t blow your limited resources on preparing for success. Instead, spend them on increasing the chances that you’ll actually succeed.

    It’s an interesting question: should startups worry about scalability, or does that get in the way of finding a proper product/market fit?  If you’ve read my blog much you’ll know that I view achieving that product/market fit as the highest priority for a startup, and I’m not alone:  Marc Andreessen says it too.  I think this is so important that I have advocated some relatively radical architectural ramifications to help facilitate the flexibility of a product so it can evolve towards that ideal even faster.

    But where does scalability fit in?  Can you achieve that product/market fit without it?  For most startups, I think it is either difficult to verify a true product/market fit without it, or worse, you may achieve it only to immediately fall to earth, a victim of poor user experience.  There are certainly plenty of examples of companies that started out great, seemed to have that product/market fit, but got into persistent hot water because they couldn’t scale out a good user experience when their site began to take off.  Fred Wilson wrote recently about his love/hate relationship with Technorati, which has been a good example of this.

    Here is another question, “How much success do you need to verify product/market fit?”  Signing up a few customers to a beta, or even having a large beta, is not really enough in my opinion.  It’s pretty easy to get a ton of people to try something that sounds sexy and is promoted well.  The question is whether that really takes well enough.  Marc Andreessen’s Ning is a good example.  When they launched their original product it required a fair amount of custom programming to create a custom Social Network.  They had 30,000 social networks created even so, but the service wasn’t taking off.  Michael Arrington was calling it R.I.P.  Then they released a version that eliminated the need for programming and suddenly the product/market fit was there and it took off like a rocket, crossing 100,000 social networks in record time.  Clearly Ning had to deal with scalability before they could learn much about their product/market fit.

    Google is another great example of this.  They had to scale from day one because of the problem they were solving.  Om Malik says their infrastructure and ability to scale is actually their strategic advantage.  Certainly the nature of the problem Google wanted to solve required scalability from day one.  This is how Aloof Schipperke wants to view the question when he says, “Scalability is a requirement, not an optimization.”  It’s a bit of a double entendre.  One could say it is a requirement that all startups deal with it, or one could say startups need to evaluate whether scalability is a requirement in their domain.  I’m in that latter camp.  Figure out what success really looks like.  When do you know you have product/market fit?  Be conservative.  What are the requirements to get there?  Aloof lumps scalability in with other “ilities”.  Can your startup reach product/market fit without security, for example?  The answers may surprise you if you’re really honest about it.

    Chances are, you may have to do more to be sure about product/market fit than you are comfortable with in release 1.0.  You’ll need a phased plan for how to get there.  Lest you use this as an excuse to ignore scalability until the last minute, keep in mind that these phased plans should have short milestones.  Quarterly or six-month iterations at most.  Scaling a really poorly architected application can amount to a painful rewrite.  So do a phasing plan for scaling.  What are the big ticket items you’ll need to enable early so that scaling later is not too hard?  There are a few well-known touchpoints that can make scalability easier.  I’m not going to go over all of them, you know what I’m talking about:  statelessness, RESTful web services, and beware the database.  If you don’t know about these things, get some people on your team who do!  It’s not hard to start out with a plan in mind about your eventual scalability and just make sure that along the way you don’t inadvertently shoot yourself in the foot.  It usually boils down to securing the two ends of the puzzle with good scalability:

    - How will the client side web servers scale?

    - How will the database back end scale?

    Make a plan for what it will look like when it’s done, and put phased milestones in place to get there over time. 
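
    For anyone who hasn’t run into statelessness before, here is a minimal sketch of the idea in Python.  The names (SessionStore, handle_add_to_cart) are hypothetical and the in-memory store is only a stand-in for whatever shared back end you choose; the point is that the handler keeps nothing between requests, so any web server in the pool can take any request:

```python
class SessionStore:
    """Stand-in for a shared store (database, cache tier); hypothetical."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so callers cannot mutate the store by accident.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = dict(state)


def handle_add_to_cart(store, session_id, item):
    """Stateless handler: everything it needs arrives as arguments."""
    state = store.load(session_id)
    cart = state.get("cart", []) + [item]
    state["cart"] = cart
    store.save(session_id, state)
    return {"status": 200, "cart_size": len(cart)}


shared = SessionStore()
# Two "servers" are just two calls here; neither keeps anything in between.
first = handle_add_to_cart(shared, "s1", "book")
second = handle_add_to_cart(shared, "s1", "lamp")
```

    Notice that this pushes the scaling problem onto the shared store, which is exactly why the two questions above, web servers and database, are the ends of the puzzle to secure.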

    Here’s another key issue.  Dharmesh’s original question assumes scalability and user experience compete for scarce resources.  Ed Sim somewhat follows this path too when he writes that it’s hard to sell Scalability.  Aren’t we talking about the tradeoffs between UI/Features and Infrastructure (web or DB)?  Are the same engineers really doing both things?  It seems to me a lot more common to have a “front end” or application group and a “back end” or infrastructure group, even if “group” is a bit grandiose for a couple of people.  Take the opportunity to map out how the modules produced by these two groups will communicate.  Make that communication architecturally clean so the groups are decoupled.  Make the communication work the way it will when you build out scalability, but then don’t build it out at first.  This will enable the infrastructure group’s agenda to decouple from the user experience guys.
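
    As a sketch of what that clean communication might look like, consider the Python below.  Everything in it is hypothetical (the PriceLookup interface, the two back ends, the sample prices) and it is not a description of any real system.  The front end depends only on a small interface; the back end starts as a single in-memory table and can later be swapped for a partitioned version without the front-end code changing:

```python
from typing import Protocol
import zlib


class PriceLookup(Protocol):
    """The only thing the front end knows about the back end."""

    def price_for(self, item_id: str) -> float: ...


class SingleBoxLookup:
    """Day-one back end: one in-memory table."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def price_for(self, item_id):
        return self._prices[item_id]


class ShardedLookup:
    """Later back end: same interface, data partitioned across shards."""

    def __init__(self, prices, shard_count):
        self._shards = [{} for _ in range(shard_count)]
        for item_id, price in prices.items():
            self._shard_for(item_id)[item_id] = price

    def _shard_for(self, item_id):
        # crc32 gives a hash that is stable across processes, unlike hash().
        return self._shards[zlib.crc32(item_id.encode()) % len(self._shards)]

    def price_for(self, item_id):
        return self._shard_for(item_id)[item_id]


def render_price(backend: PriceLookup, item_id: str) -> str:
    """Front-end code: depends only on the interface."""
    return f"{item_id}: ${backend.price_for(item_id):.2f}"


PRICES = {"camera": 199.0, "tripod": 35.5}
```

    Calling render_price with either back end gives the same answer, which is the whole point:  the infrastructure group can rebuild what sits behind the interface on its own schedule.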

    BTW, if you’re thinking the true competition between the two is you want to hire all user experience people with your capital and no infrastructure, that just sounds like a bad idea to me.  It’s hard to deliver good user experience if your infrastructure is lousy, buggy, and doesn’t perform.  There are ample studies that show the speed with which your application serves up pages is a big contributor to user experience as well. 

    I’ve gone down this path before of having essentially two small teams and making sure there was clean communication between their code from the start.  My company PriceRadar was lucky enough to land a partnership with Ask Jeeves early on.  Part of the deal was we had to pass a load test that showed we could handle 10,000 simultaneous users hammering our application.  At the time, most of my experience and developers were from the Microsoft world, so we were .NET all the way.  I remember meeting with the advisory board for a company called iSharp.  It was an all-star cast of web application CTOs and VPs of Engineering.  We went around the table to hear what everyone was doing.  I was the only Microsoft guy in the room, and the Unix crowd just laughed when I told them we had to pass this big load test.  Ask Jeeves’ CTO was there as well as the fellow in charge of AOL Instant Messenger and about 10 others.  They flatly said it was impossible on Unix.  In less than a month we had it all working with a distributed grid architecture.  The front end guys were never even involved and changed little or no code.  The back end guys didn’t sleep much, but they emerged triumphant.  And the entire team was about 10 developers, per my small team mentality.

    Yes, Virginia, you should worry about your scalability, but it need not be all consuming.  You can handle it.

    Posted in Web 2.0, data center, grid, multicore, platforms, saas, strategy | 2 Comments »

    A Kindle User After My Own Heart

    Posted by smoothspan on November 29, 2007

    Go read Josh Taylor’s post on how he took a Kindle to the Caribbean and why he has fallen into “deep like” for the device after that.  Being able to travel without a suitcase full of books was the first lightbulb that lit for me when I heard about Kindle.  The truth is, I’d seen an eBook a