SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the ‘multicore’ Category

Single Tenant, Multitenant, Private and Public Clouds: Oh My!

Posted by Bob Warfield on August 27, 2010

My head is starting to hurt with all the back and forth among my Enterprise Irregulars buddies about the relationships between the complex concepts of Multitenancy, Private, and Public Clouds.  A set of disjoint conversations and posts came together like the whirlpool in the bottom of a tub when it drains.  I was busy with other things and didn’t get a chance to really respond until I was well and truly sucked into the vortex.  Apologies for the long post, but so many wonderful cans of worms finally got opened that I just have to try to deal with a few of them.  That’s why I love these Irregulars!

To start, let me rehash some of the many memes that had me preparing to respond:

Josh Greenbaum’s assertion that Multitenancy is a Vendor, not a Customer Issue.  This post includes some choice observations like:

While the benefits that multi-tenancy can provide are manifold for the vendor, these rationales don’t hold water on the user side.

That is not to say that customers can’t benefit from multi-tenancy. They can, but the effects of multi-tenancy for users are side-benefits, subordinate to the vendors’ benefits. This means, IMO, that a customer that looks at multi-tenancy as a key criteria for acquiring a new piece of functionality is basing their decision on factors that are not directly relevant to their TCO, all other factors being equal.

and:

Multi-tenancy promises to age gracelessly as this market matures.

Not to mention:

Most of the main benefits of multi-tenancy – every customer is on the same version and is updated simultaneously, in particular – are vendor benefits that don’t intrinsically benefit customers directly.

The implication being that someone somewhere will provide an alternate technology very soon that works just as good or better than multitenancy.  Wow.  Lots to disagree with there.  My ears are still ringing from the sound of the steel gauntlet that was thrown down.

-  Phil Wainewright took a little of the edge of my ire with his response post to Josh, “Single Tenancy, the DEC Rainbow of SaaS.”  Basically, Phil says that any would-be SaaS vendor trying to create an offering without multitenancy is doomed as the DEC Rainbow was.  They have some that sort of walks and quacks like a SaaS offering but that can’t really deliver the goods.

-  Well of course Josh had to respond with a post that ends with:

I think the pricing and services pressure of the multi-tenant vendors will force single-tenant vendors to make their offerings as compatible as possible. But as long as they are compatible with the promises of multi-tenancy, they don’t need to actually be multi-tenant to compete in the market.

That’s kind of like saying, “I’m right so long as nothing happens to make me wrong.”  Where are the facts that show this counter case is anything beyond imagination?  Who has built a SaaS application that does not include multitenancy but that delivers all the benefits?

Meanwhile back at the ranch (we EI’s need a colorful name for our private community where the feathers really start to fly as we chew the bones of some good debates), still more fascinating points and counterpoints were being made as the topic of public vs private clouds came up (paraphrasing):

-  Is there any value in private clouds?

-  Do public clouds result in less lock-in than private clouds?

-  Are private clouds and single tenant (sic) SaaS apps just Old School vendors attempts to hang on while the New Era dawns?  Attempts that will ultimately prove terribly flawed?

-  Can the economics of private clouds ever compete with public?

-  BTW, eBay now uses Amazon for “burst” loads and purchases servers for a few hours at a time on their peak periods.  Cool!

-  Companies like Eucalyptus and Nimbula are trying to make Private Clouds that are completely fungible with Public Clouds.  If you  in private cloud frameworks like these means you have
to believe companies are going to be running / owning their own servers for a long time to come even if the public cloud guys take over a number of compute workloads.  The Nimbula guys built EC2 and they’re no dummies, so if they believe in this, there must be something to it.

-  There are two kinds of clouds – real and virtual.  Real clouds are multi-tenant. Virtual clouds are not. Virtualization is an amazing technology but it can’t compete with bottoms up multi-tenant platforms and apps.

Stop!  Let me off this merry go-round and let’s talk.

What It Is and Why Multitenancy Matters

Sorry Josh, but Multitenancy isn’t marketing like Intel Inside (BTW, do you notice Intel wound up everywhere anyway?  That wasn’t marketing either), and it matters to more than just vendors.  Why?

Push aside all of the partisan definitions of multitenancy (all your customers go in the same table or not).   Let’s look at the fundamental difference between virtualization and multitenancy, since these two seem to be fighting it out.

Virtualization takes multiple copies of your entire software stack and lets them coexist on the same machine.  Whereas before you had one OS, one DB, and one copy of your app, now you may have 10 of each.  Each of the 10 may be a different version entirely.  Each may be a different customer entirely, as they share a machine.  For each of them, life is just like they had their own dedicated server.  Cool.  No wonder VMWare is so successful.  That’s a handy thing to do.

Multitenancy is a little different.  Instead of 10 copies of the OS, 10 copies of the DB, and 10 copies of the app, it has 1 OS, 1 DB, and 1 app on the server.  But, through judicious modifications to the app, it allows those 10 customers to all peacefully coexist within the app just as though they had it entirely to themselves.

Can you see the pros and cons of each?  Let’s start with cost.  Every SaaS vendor that has multitenancy crows about this, because its true.  Don’t believe me?  Plug in your VM software, go install Oracle 10 times across 10 different virtual machines.  Now add up how much disk space that uses, how much RAM it uses when all 10 are running, and so on.  This is before you’ve put a single byte of information into Oracle or even started up an app.  Compare that to having installed 1 copy of Oracle on a machine, but not putting any data into it.  Dang!  That VM has used up a heck of a lot of resources before I even get started!

If you don’t think that the overhead of 10 copies of the stack has an impact on TCO, you either have in mind a very interesting application + customer combination (some do exist, and I have written about them), or you just don’t understand.  10x the hardware to handle the “before you put in data” requirements are not cheap.  Whatever overhead is involved in making that more cumbersome to automate is not cheap.  Heck, 10x more Oracle licenses is very not cheap.  I know SaaS companies who complain their single biggest ops cost is their Oracle licenses. 

However, if all works well, that’s a fixed cost to have all those copies, and you can start adding data by customers to each virtual Oracle, and things will be okay from that point on.  But, take my word for it, there is no free lunch.  The VM world will be slower and less nimble to share resources between the different Virtual Machines than a Multitenant App can be.  The reason is that by the time it knows it even needs to share, it is too late.  Shifting things around to take resource from one VM and give it to another takes time.  By contrast, the Multitenant App knows what is going on inside the App because it is the App.  It can even anticipate needs (e.g. that customer is in UK and they’re going to wake up x hours before my customers in the US, so I will put them on the same machine because they mostly use the machine at different times).

So, no, there is not some magic technology that will make multitenant obsolete.  There may be some new marketing label on some technology that makes multitenancy automatic and implicit, but if it does what I describe, it is multitenant.  It will age gracefully for a long time to come despite the indignities that petty competition and marketing labels will bring to bear on it.

What’s the Relationship of Clouds and Multitenancy?

Must Real Clouds be Multitenant?

Sorry, but Real Clouds are not Multitenant because they’re based on Virtualization not Multitenancy in any sense such as I just defined.  In fact, EC2 doesn’t share a core with multiple virtual machines because it can’t.  If one of the VM’s started sucking up all the cycles, the other would suffer terrible performance and the hypervisors don’t really have a way to deal with that.  Imagine having to shut down one of the virtual machines and move it onto other hardware to load balance.  That’s not a simple or fast operation.  Multi-tasking operating systems expect a context switch to be as fast as possible, and that’s what we’re talking about.  That’s part of what I mean by the VM solution being less nimble.  So instead, cores get allocated to a particular VM.  That doesn’t mean a server can’t have multiple tenants, just that at the granularity of a core, things have to be kept clean and not dynamically moved around. 

Note to rocket scientists and entrepreneurs out there–if you could create a new hardware architecture that was really fast at the Virtual Machine load balancing, you would have a winner.  So far, there is no good hardware architecture to facilitate a tenant swap inside a core at a seamless enough granularity to allow the sharing.  In the Multicore Era, this would be the Killer Architecture for Cloud Computing.  If you get all the right patents, you’ll be rich and Intel will be sad.  OTOH, if Intel and VMWare got their heads together and figured it out, it would be like ole Jack Burton said, “You can go off and rule the universe from beyond the grave.”

But, it isn’t quite so black and white.  While EC2 is not multitenant at the core level, it sort of is at the server level as we discussed.  And, services like S3 are multitenant through and through.  Should we cut them some slack?  In a word, “No.”  Even though an awful lot of the overall stack cost (network, cpu, and storage) is pretty well multitenant, I still wind up installing those 10 copies of Oracle and I still have the same economic disadvantage as the VM scenario.  Multitenancy is an Application characteristic, or at the very least, a deep platform characteristic.  If I build my app on Force.com, it is automatically multitenant.  If I build it on Amazon Web Services, it is not automatic.

But isn’t there Any Multitenant-like Advantage to the Cloud?  And how do Public and Private Compare?

Yes, there are tons of benefits to the Cloud, and through an understanding and definition of them, we will tease out the relationship of Public and Private Clouds.  Let me explain…

There are two primary advantages to the Cloud:  it is a Software Service and it is Elastic.  If you don’t have those advantages, you don’t have a Cloud.  Let’s drill down.

The Cloud is a Software Service, first and foremost.  I can spin up and control a server entirely through a set of API’s.  I never have to go into a Data Center cage.  I never have to ask someone at the Data Center to go into the Cage (though that would be a Service, just not a Software Service, an important distinction).  This is powerful for basically the same reasons that SaaS is powerful versus doing it yourself with On-prem software.  Think Cloud = SaaS and Data Center = On Prem and extrapolate and you’ll have it. 

Since Cloud is a standardized service, we expect all the same benefits as SaaS:

- They know their service better than I do since it is their whole business.  So I should expect they will run it better and more efficiently.

- Upgrades to that service are transparent and painless (try that on your own data center, buddy!).

- When one customer has a problem, the Service knows and often fixes it before the others even know it exists.  Yes Josh, there is value in SaaS running everyone on the same release.  I surveyed Tech Support managers one time and asked them one simple question:  How many open problems in your trouble ticketing system are fixed in the current release?  The answers were astounding–40 to 80%.  Imagine a world where your customers see 40 to 80% fewer problems.  It’s a good thing!

- That service has economic buying power that you don’t have because it is aggregated across many customers.  They can get better deals on their hardware and order so much of it that the world will build it precisely to their specs.  They can get stuff you can’t, and they can invest in R&D you can’t.  Again, because it is aggregated across many customers.  A Startup running in the Amazon Cloud can have multipe redundant data centers on multiple continents.  Most SaaS companies don’t get to building multiple data centers until they are way past having gone public. 

-  Because it is a Software Service, you can invest your Ops time in automation, rather than in crawling around Data Center cages.  You don’t need to hire anyone who knows how to hot swap a disk or take a backup.  You need peeps who know how to write automation scripts.  Those scripts are a leveragable asset that will permanently lower your costs in a dramatic way.  You have reallocated your costs from basic Data Center grubbing around (where does this patch cable go, Bruce?), an expense, to actually building an asset.

The list goes on.

The second benefit is Elasticity.  It’s another form of aggregation benefit.  They have spare capacity because everyone doesn’t use all the hardware all the time.  Whatever % isn’t utilized, it is a large amount of hardware, because it is aggregated.  It’s more than you can afford to have sitting around idle in your own data center.  Because of that, they don’t have to sell it to you in perpetuity.  You can rent it as you need it, just like eBay does for bursting.  There are tons of new operational strategies that are suddenly available to you by taking advantage of Elasticity.

Let me give you just one.  For SaaS companies, it is really easy to do Beta Tests.  You don’t have to buy 2x the hardware in perpetuity.  You just need to rent it for the duration of the Beta Test and every single customer can access their instance with their data to their heart’s content.  Trust me, they will like that.

What about Public Versus Private Clouds?

Hang on, we’re almost there, and it seems like it has been a worthwhile journey.

Start with, “What’s a Private Cloud?”  Let’s take all the technology of a Public Cloud (heck, the Nimbulla guys built EC2, so they know how to do this), and create a Private Cloud.  The Private Cloud is one restricted to a single customer.  It’d be kind of like taking a copy of Salesforce.com’s software, and installing it at Citibank for their private use.  Multitenant with only one tenant.  Do you hear the sound of one hand clapping yet?  Yep, it hurts my head too, just thinking about it.  But we must.

Pawing through the various advantages we’ve discussed for the Cloud, there are still some that accrue to a Cloud of One Customer:

-  It is still a Software Service that we can control via API’s, so we can invest in Ops Automation.  In a sense, you can spin up a new Virtual Data Center (I like that word better than Private Cloud, because it’s closer to the truth) on 10 minutes notice.  No waiting for servers to be shipped.  No uncrating and testing.  No shoving into racks and connecting cables.  Push a button, get a Data Center.

-  You get the buying power advantages of the Cloud Vendor if they supply your Private Cloud, though not if you buy software and build  your Private Cloud.  Hmmm, wonder what terminology is needed to make that distinction?  Forrester says it’s either a Private Cloud (company owns their own Cloud) or a Hosted Virtual Private Cloud.  Cumbersome.

But, and this is a huge one, the granularity is huge, and there is way less Elasticity.  Sure, you can spin up a Data Center, but depending on its size, it’s a much bigger on/off switch.  You likely will have to commit to buy more capacity for a longer time at a bigger price in order for the Cloud Provider to recoup giving you so much more control.  They have to clear other customers away from a larger security zone before you can occupy it, instead of letting your VM’s commingle with other VM’s on the same box.  You may lose the more multitenant-like advantages of the storage cluster and the network infrastructure (remember, only EC2 was stuck being pure virtual). 

What Does it All Mean, and What Should My Company Do?

Did you see Forrester’s conclusion that most companies are not yet ready to embrace the Cloud and won’t be for a long time?

I love the way Big Organizations think about things (not!).  Since their goal is preservation of wealth and status, it’s all about risk mitigation whether that is risk to the org or to the individual career.  A common strategy is to take some revolutionary thing (like SaaS, Multitenancy, or the Cloud), and break it down into costs and benefits.  Further, there needs to be a phased modular approach that over time, captures all the benefits with as little cost as possible.  And each phase has to have a defined completion so we can stop, evaluate whether we succeeded, celebrate the success, punish those who didn’t play the politics well enough, check in with stakeholders, and sing that Big Company Round of Kumbaya.  Yay!

In this case, we have a 5 year plan for CIO’s.  Do you remember anything else, maybe from the Cold War, that used to work on 5 year plans?  Never mind.

It asserts that before you are ready for the Cloud, you have to cross some of those modular hurdles:

A company will need a standardized operating procedure, fully-automated deployment and management (to avoid human error) and self-service access for developers. It will also need each of its business divisions – finance, HR, engineering, etc – to be sharing the same infrastructure.  In fact, there are four evolutionary stages that it takes to get there, starting with an acclimation stage where users are getting used to and comfortable with online apps, working to convince leaders of the various business divisions to be guinea pigs. Beyond that, there’s the rollout itself and then the optimization to fine-tune it.

Holy CYA, Batman!  Do you think eBay spent 5 years figuring out whether it could benefit from bursting to the Cloud before it just did it?

There’s a part of me that says if your IT org is so behind the times it needs 5 years just to understand it all, then you should quit doing anything on-premise and get it all into the hands of SaaS vendors.  They’re already so far beyond you that they must have a huge advantage.  There is a another part that says, “Gee guys, you don’t have to be able to build an automobile factory as good as Toyota to be able to drive a car.”

But then sanity and Political Correctness prevail, I come back down to Earth, and I realize we are ready to summarize.  There are 4 levels of Cloud Maturity (Hey, I know the Big Co IT Guys are feeling more comfortable already, they can deal with a Capability and Maturity Model, right?):

Level 1:  Dabbling.  You are using some Virtualization or Cloud technology a little bit at your org in order to learn.  You now know what a Machine Image is, and you have at least seen a server that can run them and swapped a few in and out so that you experience the pleasures of doing amazing things without crawling around the Data Center Cage.

Level 2:  Private Cloud.  You were impressed enough by Level 1 that you want the benefits of Cloud Technology for as much of your operation as you can as fast as you can get it.  But, you are not yet ready to relinquish much of any control.  For Early Level 2, you may very well insist on a Private Cloud you own entirely.  Later stage Level 2 and you will seek a Hosted Virtual Private Cloud.

Level 3:  Public Cloud.  This has been cool, but you are ready to embrace Elasticity.  You tripped into it with a little bit of Bursting like eBay, but you are gradually realizing that the latency between your Data Center and the Cloud is really painful.  To fix that, you went to a Hosted Virtual Private Cloud.  Now that your data is in that Cloud and Bursting works well, you are realizing that the data is already stepping outside your Private Cloud pretty often anyway.  And you’ve had to come to terms with it.  So why not go the rest of the way and pick up some Elasticity?

Level 4:  SaaS Multitenant.  Eventually, you conclude that you’re still micromanaging your software too much and it isn’t adding any value unique to your organization.  Plus, most of the software you can buy and run in your Public Cloud world is pretty darned antiquated anyway.  It hasn’t been rearchitected since the late 80′s and early 90′s.  Not really.  What would an app look like if it was built from the ground up to live in the Cloud, to connect Customers the way the Internet has been going, to be Social, to do all the rest?  Welcome to SaaS Multitenant.  Now you can finally get completely out of Software Operations and start delivering value.

BTW, you don’t have to take the levels one at a time.  It will cost you a lot more and be a lot more painful if you do.  That’s my problem with the Forrester analysis.  Pick the level that is as far along as you can possibly stomach, add one to that, and go.  Ironically, not only is it cheaper to go directly to the end game, but each level is cheaper for you on a wide scale usage basis all by itself.  In other words, it’s cheaper for you to do Public Cloud than Private Cloud.  And it’s WAY cheaper to go Public Cloud than to try Private Cloud for a time and then go Public Cloud.  Switching to a SaaS Multitenant app is cheaper still.

Welcome to crazy world of learning how to work and play well together when efficiently sharing your computing resources with friends and strangers!

Posted in amazon, cloud, data center, ec2, enterprise software, grid, multicore, platforms, saas, service | 14 Comments »

Salesforces Switches to Dell/Linus. What’s Next, MySQL Over Oracle?

Posted by Bob Warfield on July 15, 2008

Salesforce will be unplugging the last of their Sun Solaris servers from their SaaS operations this week, according to TechCrunchIT.  That’s quite a big change for Salesforce, and a bit of a PR blow for Sun.  It reflects some important operational realities that the rest of the industry and corporate IT should be watching carefully.

First, vertical scaling is hard in the multicore crisis era.  When cpus no longer get twice as fast with every Moore Cycle, scaling is harder to come by and hardware gets commoditized.  Future scaling has to come from software architecture changes.  Horizontal Scaling, in other words, not Vertical Scaling.  The multicore crisis brings us to an era of many small computers rather than fewer more powerful computers, and its up to the software guys to figure that out.

Second, for a SaaS company, the cost of service delivery is an absolutely critical factor.  Once you have software that runs well and scales horizontally on cheap commodity hardware, you’ve created a huge cost advantage for yourself.  As we speak, the cost to deliver service for the various public SaaS companies is all over the map, but Salesforce has always had one of the lowest if not the lowest cost on the map.  This allows them to either show greater profitability or reinvest the savings in faster growth.

This brings me to my other point.  How long can it be before they investigate swapping out Oracle for MySQL?  As the TechCrunchIT article mentions, Salesforce started with Oracle but there’s been no mention recently about the current status.  It would be a logical further development in reducing costs if they had chosen to eliminate or were working on eliminating the cost of Oracle licenses.  For many SaaS vendors, this is a huge piece of their Cost of Services. 

Can you build industrial grade software without Oracle?  In a word, yes.  Many highly scalable web sites have done so and lived to tell the tale.  It’s more work, but once you’ve done the work the payoff can mean huge savings.  At a prior employer we were actually quite surprised to test several Open Source DB’s and learn their performance was actually not that far off of Oracle’s.  My current employer, Helpstream, has built everything on an Open Source stack and the benefits have been enormous.

How long will it be before we’re hearing that Salesforce has dropped Oracle too?  What’s your company doing to leverage commodity hardware and Open Source databases?

Related Articles

Fellow Enterprise Irregulars Dennis Howlett, Vinnie Mirchandani, and Thomas Foydel (who first raised the point about brand comfort) make some excellent points on this subject, particularly the issue of switching off Oracle.

One key point that I have personally heard before is that SaaS vendors like to offer customers the comfort of knowing the solution runs on Oracle versus Open Source.  It’s a more conservative stance that plays well in the Enterprise.  Brand matters.  Sun is working hard on the MySQL brand, but they certainly haven’t caught Oracle yet.  As Vinnie puts it, the question is, “whether SaaS vendors benefit from at least the perception that Oracle is more “bullet proof” or do SaaS customers just want results (high uptime, performance etc) and don’t really care what the underlying  technology is – especially if the economics are more attractive?”

Dennis adds some other unique thinking.  If Salesforce wants to be acquired by Oracle then it should stick with the Oracle stack.  The only thing I’d add there is it’s pretty easy to switch from MySQL back to Oracle and much harder to do the reverse.  I think they’d be fine in an acquisition if they were simple careful not to emphasize a switch to MySQL.  They did work reasonably hard to keep the Dell switch quiet, so they may already be on that path.

The second thought Dennis has is that licensing is very complex on these things.  Here again I have to agree.  Just dealing with the legals and other aspect of a relationship with a large Enterprise vendor/technology partner is expensive for a startup.

In both cases, Vinnie’s post title fits:  Does SaaS need Oracle more than Oracle needs SaaS?

Good insights, guys!

Posted in multicore, platforms, saas | 8 Comments »

Are Custom Chips An Answer to the Multicore Crisis?

Posted by Bob Warfield on March 20, 2008

Stacey Higginbotham wrote an interesting piece that made me wonder.  Apparently there are lots of less than cutting edge chip fabs out there that people want to keep running.  It got me to wonder.  In an age where smaller features on chips translate only to more transistors, but not neccesarily faster transistors, is the ability to have more transistors as economically valuable?  Particularly if we can’t put them to use?  The problem is Moore’s Law these days translates to more cores, not faster clock speeds.  Nearly all the software out there can’t make use of the extra cores yet, and there is a lot of discussion about how the world may have to completely retool software to make use of lots of cores.  Meanwhile, Intel sails on with 6 core chips on the horizon, more to come, and not a very good idea what to do with them.

What if what separates the latest “good” fabs from older “obsolete” fabs is not longer that valuable?  Maybe the value is less from ever smaller feature sizes and more from new chip types?  That would shift the economics from favoring giants like Intel capable of building ever more expensive fabs to those with the IP to design more new chips on fab processes that are “good enough” for lots of interesting applications.  As big as Intel is, maybe driving faster CPU’s is not the most lucrative pasttime at this point in the technology curve.

It used to be that a general purpose CPU that constantly doubled in speed every 18 months (Moore’s Law) was the place to invest.  It was a truly general purpose device capable of running all sorts of software.  There have been special purpose chips built too for maximum performance in various areas:  graphics coprocessors and various network chips are good examples.  Suppose we can make special purpose chips for almost any purpose.  I once talked to a startup that had built a hardware search accelerator for example.  They vanished into the Government spy world never to be heard from again, but it is intriguing. 

If you could create a dirt cheap special purpose chip, what would it do?  What would be the market for it?

Before the dawn of RISC there was much interest in hardware accelerators for specific languages.  Lisp machines were one such.  I remember reading a quote from Alan Kay that modern machines don’t run dynamic languages like Lisp and Smalltalk as much faster than the old machines like the Dorado as their newfound clockspeeds would imply they should.  He hinted that radically different hardware architectures could greatly benefit such languages.  I couldn’t find more on that than this quote that says the new generation is only about 50x faster than the old machines, which is a pretty poor showing indeed.

Today we have a renaissance in the interpreted and scripting languages that are descendants of languages like Lisp and Smalltalk.  Languages like Ruby on Rails, Python, and PHP are very mainstream and might benefit.  One wonders whether even Java might benefit.  Would a chip optimized to run the virtual machines of one of these languages without regard to compatibility with the old x86 world be able to run them a lot faster?  Would a chip that runs Java 10x faster than the fastest available cores from Intel be valuable at a time when Java has stopped getting faster via Moore’s Law? 

It seems to me it would.

Posted in multicore | 1 Comment »

Google Reports iPhone Usage 50x Other Handsets; Amazon S3 Goes Down: Low Friction Has a Cost

Posted by Bob Warfield on February 15, 2008

As I write this post there are two articles that caught my eye.  For most, the iPhone and Amazon’s Web Services have little to do with one another, but I see a bit of a pattern here that’s interesting.

Slash Lane of Apple Insider reports that Google was shocked that is was seing 50 times more search requests coming from Apple iPhones than any other mobile handset — a revelation so astonishing that the company originally suspected it had made an error culling its own data.  It’s an amazing statistic, really.  But I can attest to hitting Google quite a lot myself whenever I’m out and about and killing time before the next meeting.  In fact, I am very pleased to have my bookmarks out on a web page rather than in my browser so I can easily access all of my favorite sites from whatever device is at hand.  The iPhone is quite a credible web browser.  I can’t wait for the 3G version and higher speeds.

Following closely on my read of the iPhone piece is Nick Carr’s article about an Amazon S3 outage.  Nothing all that earth-shattering or unexpected, just that S3 was out for several hours this morning, beginning at 7:30am EST.  The gist of the article is that while the outage was to be expected, Amazon did a poor job keeping users informed of what was going on and providing explanations after the fact.  Carr is right, of course, but business is always embarassed when things go wrong and the first (and wrong) human instinct is to be shy about details.

Why do these two go together?  I’ll give you a hint:  the tales of Facebook applications reaching millions of users in an incredibly short time also goes with the theme I’m thinking of.  That theme has to do with friction.  Friction is my word for all the factors that slow adoption.  The time needed for word of mouth, decisionmaking, purchase, installation, getting through the learning curve, and finally being a first class citizen of whatever community results is governed by the degree of friction.

One of the things the Internet does is reduce friction.  In its most extreme, friction actually reverses and becomes a propelling force.  We call that viral marketing.  Most of the innovations in this second Internet round (post-bubble) have been focused on reducing friction.  Social Networks, for example, dramatically reduce the friction of networking.  Twitter dramatically reduces the friction of blogging, right down to limiting the article length to 140 characters so you don’t have to labor over the wordsmithing.

While it’s harder, the web is also a powerful means of reducing friction for more physical things.  The iPhone and Amazon Web Services are two great examples.  In an extremely short time the iPhone has racked up 50x the usage of other competing handsets for the Internet.  The traffic to AWS in approximately the same short time now exceeds the combined traffic for all other Amazon properties.

While the web itself helped to spread the word, I think it is no coincidence that these two have a lot to do with the web and offer a lot of value back to the web.  It’s what some folks call a virtuous circle.  Look for more of these as time goes on.

Now that cost side.  These growth rates are not predictable.  Nobody would have guessed that either business would get so big so fast.  In fact, many guessed just the opposite.  Even if you did guess it could happen, it would only be a guess that it could, not that it would.  A prudent business would not invest in infrastructure built to the level and assumption that it would happen.  That means there will be painful outages from time to time.  Hopefully, the infrastructure owners will take those outages as signs that its time to double down and extend their projections of what might happen much further up and to the right.  Those that succeed in keeping hold of the Tiger by the Tail will survive and prosper.

Posted in amazon, data center, grid, Marketing, multicore, Web 2.0 | 5 Comments »

Software Testing in the Multicore Cloud Computing Era With Replay Solutions

Posted by Bob Warfield on February 11, 2008

I had the opportunity to visit Jonathan Lindo, CEO and co-founder of Replay Solutions last week and I came away impressed.  This Hummer Winblad and Partech backed startup has some fascinating new technology to help with software testing and debugging.  I like to think of their software as a time machine for complex software.  With it, you can go back and recreate the circumstances that led to a bug, and thereby figure out what has happened.  Their software works by turning your J2EE application into a black box, and monitoring everything that comes and goes into or out of the box.  Using their proprietary algorithms, the data required to do this is actually kept very small.  So small, that the company got its start helping game companies to monitor their software using the same technology.  They’re still doing a business in that market, and you can imagine the software has to be pretty unintrusive if its not going to interfere with a game.  And so it is. 

It works this magic by tapping into and instrumenting the Java code.  This sounds a lot like what my old alma mater Pure Software did with their memory leak detection.  What’s nice about it is that no access to source code is required.  In the demo, Jonathan fired up an app server (they support Tomcat and JBoss, and soon WebLogic), lit up their instrumentation module, and from that point on just used the software being tested normally.  Of course in the demo, using the software “normally” eventually led to a crash.  It was the classic ugly Java stack dump that tells you very little about what actually happened–just the thing to annoy both the user and the developers.

Replay Solutions to the rescue.  Jonathan likes to think of it as “Tivo for Software.”  Looking at the screen one sees a screenshot of every HTML rendering to the screen.  This makes it easy to tell where in the recorded dump you are and what the user was doing at the time.  The developer can set breakpoints in their code and then use ReplayDIRECTOR (that’s what the software is called) to bring the program up to the point of failure.  This can be done over and over until the programmer has figured out what went wrong.

Sounds cool, but why is this software an essential tool for the Multicore Cloud Computing Era?  Think about it.  In the old days, reproducing bugs was hard enough.  It could take days to find the exact set of steps needed to make a bug reproducible.  And until the bug is reproducible, it’s nearly impossible to fix.  Now fast forward to the Multicore Cloud Computing Era.  You’ve got hundreds or even thousands of simultaneous users running against a hundred or more CPU’s.  There are many many processes running.  Developers recognize this as a nightmare situation, because it becomes impossible to reproduce bugs in such a world.  How would you ever get all of those users to do exactly the same thing twice?  Add to that all the other crazy timing-related issues and it’s darned near impossible to track down many kinds of bugs on such software.

I talked over with Jonathan what I thought was a really cool scenario.  Would it be possible to set up ReplayDIRECTOR to continuously monitor a big SaaS or Web 2.0 system?  The answer, surprisingly, is that it is completely possible.  Suddenly, we can make these kinds of bugs reproducible.  But it gets even better.  ReplayDIRECTOR will reproduce the problem on far less hardware than the original system.  That’s another big issue to be faced with such systems–the cost of providing a duplicate environment for testing.  With Replay, the “black box” can be just the J2EE server.  All of the other pieces are simulated.

If I were currently involved with a J2EE-architecture piece of Enterprise Software, I would definitely be trying to get into Replay’s Beta Testing program.

Posted in multicore, saas, Web 2.0 | Leave a Comment »

Apple, MacWorld, User Experience, and the Multicore Crisis

Posted by Bob Warfield on January 16, 2008

Looking over the parachute drops of information from MacWorld, I was struck by some underlying themes.  I won’t bore you with a recitation of the huge amount of surface level activity: plenty of better more firsthand places to get that.  But some of those first hand sources excited some patterns I’m familiar with.

First, the multicore crisis bit.  I’ve written about it before, but let me recap.  What is the multicore crisis?  It is a wave of change that is being unleashed by virtue of the fact that microprocessors have stopped getting faster every 18 months.  Instead of gaining a faster clock speed with free benefits for all at scarcely any effort, we get more cores.  That ain’t bad, but it takes considerable effort at the software end to take advantage of the additional cores.  For the most part, we are far from keeping up with the availability of those cores.  For emphasis, here is a graph of Intel clock speeds that vividly shows just how long the curve has been flattened out:

Clock Speed Timeline

We’ve had another year in 2007 while the curve remained flat.

What does this have to do with Apple and MacWorld?  Well, on a simple vein, it was the multicore crisis checking in that caused Mathew Ingram to write, “Hey, Steve–you broke the Internet.”  He was remarking about how Twitter was virtually unusable for hours.  Twitter has become somewhat of an unwilling canary in the coal mine: if something is hot and getting traffic, Twitter seems bound to go down.  Why?  Because it is a victim of the Multicore Crisis.  The system’s architecture isn’t scaling.  It may be a software problem, i.e. it is not designed to take advantage of enough cpu’s, or an infrastructure problem, i.e. it can only take advantage of the cpus Twitter has physically bought and installed in their data center.  These can both be overcome.  Software can be made to take advantage of lots more processors.  Services like Amazon and others offer let you scale up to many more cpu’s on short notice without having to buy physical hardware.  Failure to provide for both these contingencies is succumbing to the Multicore Crisis.

Twitter was not unique.  Mathew’s blog was very slow to come up when I tried to access the article, having been Techmemed.  He mentions Fake Steve Jobs got creamed and couldn’t make CoverIt Live work (Zoli mentions CoveritLive was CoveritDead).  The Apple store was down at one point too.

Scoble tells a similar story:  Engadget was up but very slow, Qik’s macworld channel was up and down, Mogulus was slow to unreachable.  Live video was hard to come by.  TUAW fairly unreachable.  There were a couple sites that passed muster including TechCrunch (bravo!) and MacRumorsLive.  TechCrunch hammers Twitter for being down.  Again.  If, as its pundits like to think, Twitter will play a signficiant role in reporting events, it needs to work all the time.  It is, after all, a communication channel.  Moreover, it’s a communication channel under constant scrutiny.

This brings me to a point I want to make about the Multicore Crisis and The Big Switch (what Nick Carr calls the trend to move to Cloud Computing).  These two megatrends are combining to change what the important core competencies are to succeed.  Once upon a time, it was enough just to be able to lash together all the myriad pieces needed to create a web application with a good user design.  You could count on Moore’s Law to make machines faster and your customer growth was slow enough that scalability could be comfortable pushed out into the future as a high quality problem to deal with if you succeeded.  That’s no longer the case.  The ability for new ideas to catch on has become viral on the web for a variety of reasons, not the least of which is that so many more people are on the web and they’re interconnected in so many more ways than simple e-mail, search, and web browsing.

There is another, more subtle manifestation of all this.  The new MacBook Air personifies this.  In the Multicore era: user experience is the new black for hardware.  Why?  Well, in the old days, everyone wanted to upgrade every two years.  For a while, I bought a new PC every year.  And it was worth it.  The new machines were significantly faster than the old.  In a world where the upgrade cycle is so short, you want to buy cheap hardware.  Result?  Dell wins big.  They’re the best at building their hardware cheap, so you can buy it more often, so you can get that speed.  Dell was driven by the Need for Speed, and the relative ease with which Moore’s Law delivered it.

Times have changed.  In an era when you probably won’t upgrade every two years, let alone every year, it makes sense to look at something other than speed.  I have an idea, how about looking at the User Experience?  Is the machine sexier?  Does it do cool things?  I love the Air’s ability to “borrow” a disk drive via WiFi from a nearby machine as well as its ability to handle iPhone-like gestures on its touch pad.  Combining Apple’s trademark radically uber-cool Industrial Design with genuine usability innovation is a winning formula.  If it gets you to buy a new machine when you otherwise would be happy to stand pat, they win.  The fact that so much of what one does on a computer is via the Internet combined with the rise of very effective virtualization software has radically lowered the barriers to PC/Windows users buying a Mac as well.  The latter is the Big Switch component.

That’s two significant changes brought on by the Multicore Crisis and The Big Switch.  What is your company doing to get ahead of these trends before some competitor uses them to ride right over your business?

Posted in data center, multicore, saas, strategy, user interface, Web 2.0 | 3 Comments »

Scalability is a Requirement for Startups

Posted by Bob Warfield on December 6, 2007

Dharmesh Shah wonders whether startups should ignore scalability:

You’re worrying about scalability too early. Don’t blow your limited resources on preparing for success. Instead, spend them on increasing the chances that you’ll actually succeed.

It’s an interesting question: should startups worry about scalability, or does that get in the way of finding a proper product/market fit?  If you’ve read my blog much you’ll know that I view achieving that product/market fit as the highest priority for a startup, and I’m not alone, Marc Andreesen says it too.  I think this is so important that I have advocated some relatively radical architectural ramifications to help facilitate the flexibility of a product so it can evolve towards that ideal even faster.

But where does scalability fit in?  Can you achieve that product/market fit without it?  For most startups, I think it is either difficult to verify a true product/market fit without it, or worse, you may achieve it only to immediately fall to earth a victim of poor user experience.  There are certainly plenty of examples of companies that started out great, seemed to have that product/market fit, but got into persistent hot water because they couldn’t scale out a good user experience when their site began to take off.  Fred Wilson writes recently about his love/hate relationship with Technorati, which has been a good example of this.

Here is another question, “How much success do  you need to verify product/market fit?”  Signing up a few customers to a beta, or even having a large beta is not really enough in my opinion.  It’s pretty easy to get a ton of people to try something that sounds sexy and is promoted well.  The question is whether that really takes well enough.  Marc Andreesen’s Ning is a good example.  When they launched their original product it required a fair amount of custom programming to create a custom Social Network.  They had 30,000 social networks created even so, but the service wasn’t taking off.  Michael Arrington was calling it R.I.P.  Then they released a version that eliminated the need for programming and suddenly the product/market fit was there and it took off like a rocket, crossing 100,000 social networks in record time.  Clearly Ning had to deal with scalability before they could learn much about their product/market fit.

Google is another great example of this.  They had to scale from day one because of the problem they were solving.  Om Malik says their infrastructure and ability to scale is actually their strategic advantage.  Certainly the nature of the problem Google wanted to solve required scalability from day one.  This is how Aloof Schipperke wants to view the question when he says, “Scalability is a requirement, not an optimization.”  It’s a bit of a double entendre.  One could say it is a requirement that all startups deal with it, or one could say startups need to evaluate whether scalability is a requirement in their domain.  I’m in that latter camp.  Figure out what success really looks like.  When do you know you have product/market fit?  Be conservative.  What are the requirements to get there?  Aloof lumps scalability in with other “ilities”.  Can your startup reach product/market fit without security, for example?  The answers may surprise you if you’re really honest about it.

Chances are, you may have to do more to be sure about product/market fit than you are comfortable with in release 1.0.  You’ll need a phased plan for how to get there.  Lest you use this as an excuse to ignore scalability until the last minute, keep in mind that these phased plans should have short milestones.  Quarterly or six month iterations at most.  Scaling a really poorly architected application can amount to a painful rewrite.  So do a phasing plan for scaling.  What are the big ticket items you’ll need to enable early so that scaling later is not too hard?  There are a few well-known touchpoints that can make scalability easier.  I’m not going to go over all of them, you know what I’m talking about:  statelessness, RESTful web services, and beware the database.  If you don’t know about these things, get some people on your team who do!  It’s not hard to start out with a plan in mind about your eventual scalablity and just make sure that along the way you don’t inadvertently shoot yourself in the foot.  It usually boils down to securing the two ends of the puzzle with good scalability:

- How will the client side web servers scale?

- How will the database back end scale?

Make a plan for what it will look like when it’s done, and put phased milestones in place to get there over time. 

Here’s another key issue.  Dharmesh’s original question assumes scalability and user/experience compete for scarce resources.  Ed Sim somewhat follows this path too when he writes that it’s hard to sell Scalability.  Aren’t we talking about the tradeoffs between UI/Features and Infrastructure (web or DB)?  Are the same engineers really doing both things?  It seems to me a lot more common to have a “front end” or application group and a “back end” or infrastructure group, even if “group” is a bit grandiose for a couple of people.  Take the opportunity to map out how the modules produced by these two groups will communicate.  Make that communication architecturally clean so the groups are decoupled.  Make the communication work the way it will when you build out scalability, but then don’t build it out at first.  This will enable the infrastructure group’s agenda to decouple from the user experience guys. 

BTW, if you’re thinking the true competition between the two is you want to hire all user experience people with your capital and no infrastructure, that just sounds like a bad idea to me.  It’s hard to deliver good user experience if your infrastructure is lousy, buggy, and doesn’t perform.  There are ample studies that show the speed with which your application serves up pages is a big contributor to user experience as well. 

I’ve gone down this path before of having essentially two small teams and making sure there was clean communication between their code from the start.  My company PriceRadar was lucky enough to land a partnership with AskJeeves early on.  Part of the deal was we had to pass a load test that showed we could handle 10,000 simultaneous users hammering our application.  At the time, most of my experience and developers were from the Microsoft world, so we were .NET all the way.  I remember meeting with the advisory board for a company called iSharp.  It was an all-star cast of web application CTO’s and VP’s of Engineering.  We went around the table to hear what everyone was doing.  I was the only Microsoft guy in the room, and the Unix crowd just laughed when I told them we had to pass this big load test.  AskJeeve’s CTO was there as well as the fellow in charge of AOL Instant Messenger and about 10 others.  They flat said it was impossible on Unix.  In less than a month we had it all working with a distributed grid architecture.  The front end guys were never even involved and changed little or no code.  The back end guys didn’t sleep much, but they emerged triumphant.  And the entire team was about 10 developers, per my small team mentality.

Yes Virginia, you should worry about your scalability, but it need not be all consuming.  You can handle it.

Posted in data center, grid, multicore, platforms, saas, strategy, Web 2.0 | 2 Comments »

A Pile of Lamps Needs a Brain

Posted by Bob Warfield on October 28, 2007

Continuing the discussion of a Pile of Lamps (a clustered Lamp stack in more prosaic terms), Aloof Schipperke writes about how such a thing might manage its consumption of machines on a utility computing fabric:

Techniques for managing large sets of machines tend to either highly centralized or highly decentralized. Centralized solutions tend to come from system administration circles as ways to cope with large quantities of machines. Decentralized solutions tend to come from the parallel computing space where algorithms are designed to take advantage of large quantities of machines.

Neither approach tends to provide much coupling between management actions and application conditions. Neither approach seems well adapted for any form of semi-intelligent dynamic configuration of multi-layer web application. Neither of them seem well suited for non-trivial quantities of loosely coupled LAMP stacks.

Aloof has been contemplating whether a better approach might be to have the machines converse amongst themselves in some way.  He envisions machines getting together when loads become too challenging and deciding to spawn another machine to take some of the load on.

Let’s drop back and consider this more generally.  First, we have a unique capability emerging in hosted utility grids.  These range from systems like Amazon’s Web Services to 3Tera’s ability to create grids at their hosting partners.  It started with the grid computing movement which sought to use “spare” computers on demand, and has now become a full blown commercially available service.  Applications can order and provision a new server literally on 10 minutes notice, use it for a period of time, and then release the machine back to the pool only paying for the time they’ve used.  This differs markedly from stories such as iLike’s, who had to drive around in a truck borrowing servers everywhere they could, and then physically connect them up.  Imagine how much easier it could have been to push a button and bring on the extra servers on 10 minutes notice as they were needed.

Second, we have the problem of how to manage such a system.  This is Aloof’s problem.  Just because we can provision a new machine on 10 minutes notice doesn’t mean a lot of other things:

  • It doesn’t mean our application is architected to take advantage of another machine. 
  • It doesn’t mean we can reconfigure our application to take advantage in 10 minutes.
  • It doesn’t mean we have a system in place that knows when it’s time to add a machine, or take one back off.

This requires another generation of thinking beyond what’s typically been implemented.  New variable cost infrastructure has to trickle down into fixed cost architectures.  For me, this sort of problem always boils down to finding the right granularity of “object” to think about.  Is the machine the object?  Whether or not it is, our software layers must take account of machines as objects because that’s how we pay for them.

So to attack this problem, we need to understand a collection of questions:

  1. What is to be our unit of scalability?  A machine?  A process?  A thread?  A component of some kind?  At some level, the unit has to map to a machine so we can properly allocate on a utility grid.
  2. How do we allocate activity to our scalability units?  Examples include load balancing and database partitioning.  Abstractly, we need some hashing function that selects the scalability unit to allocate work (data, compute crunching, web page serving, etc.) to.
  3. What is the mechanism to rebalance?  When a scalability unit reaches saturation by some measure, we must rebalance the system.  We change the hashing function in #2 and we have a mechanism to redistribute without losing anything while the process is happening.  We also must understand how we measure saturation or load for our particular domain.

Let’s cast this back to the world of a Pile of Lamps.  A traditional Lamp stack scaling effort is going to view each component of the stack separately.  The web piece is separate from the data piece, so we have different answers for the 3 issues on each of the 2 tiers.  Pile of Lamps changes how we factor the problem.  If I understand the concept correctly, instead of independently scaling the two tiers, we will simply add more Lamp clusters, each of which is a quasi-independent system.

This means we have to add a #4 to the first 3.  It was implicit anyway:

    4.  How do the scaling units communicate when the resources needed to finish some work are not all present within the scaling unit?

Let’s say we’re using a Pile of Lamps to create a service like Twitter.  As long as the folks I’m following are on the same scaling unit as me, life is good.  But eventually, I will follow someone on another scaling unit.  If the Pile of Lamps is clever, it makes this transparent in some way.  If it can do that, the other three issues are at least things we can go about doing behind the scenes without bothering developers to handle it in their code.  If not, we’ll have to build a layer into our application code that makes it transparent for most of the rest of the code.

I think Aloof’s musings about whether #3 can be done as conversations between the machines will be clearer if the Pile of Lamps idea is mapped out more fulling in terms of all 4 questions.

Posted in grid, multicore, platforms, strategy, Web 2.0 | 1 Comment »

Pile O’ LAMPs: What Would Fielding Say?

Posted by Bob Warfield on October 21, 2007

I’ve been pondering the Pile O’ Lamps concept that I first read about in Aloof Architecture and Process Perfection.  Read the posts yourself for the horse’s mouth, but to me, the Pile O’ Lamps concept is basically asking whether a computing grid of LAMP stacks is a worthwhile architectural construct that could be highly reusable for a variety of applications.  I say grid, because in my mind, it achieves maximal potential if deployed flexibly on a utility computing fabric such as Amazon EC2 where it can automatically flex to a larger cluster based on load requirements.  If it is fixed in size by configuration (which still means changeable, just not as quickly and automatically), I guess it would be more proper to call it a LAMP cluster.

LAMP refers to Linux as the OS, Apache as the web server, mySQL as the database, and a “P” language (usually PHP or Python) as the langauge used to implement the application.  It has become almost ubiquitious as a superfast way to bring up a new web application.  There are some shortcomings, but by and large, it remains one of the simplest ways to get the job done and still have the thing continue to work if you move into the big time.  A Pile of Lamps architecture would presumably simplify scaling by building it in at the outset rather than trying to tack it on later.

In general, I love the idea.  People are effectively doing what it calls for all the time anyway, they just do so in an ad hoc manner.  I got ambitious this Sunday morning and thought I’d drag out Fielding’s Dissertation and see how the idea stacks up.  If you’ve never had a look at Roy Fielding’s Architectural Styles and the Design of Network-Based Software Architectures, you missed out on a beautiful piece of work from the man that co-designed the Internet protocols.  This particular document sets forth the REST (Representational State Transfer) architecture.  What’s cool about it is that Fielding has a framework that he uses to evaluate the various components of REST that is applicable to a lot of other network architecture problems.  See Chapter 3 of the Dissertation for details, but that is my favorite part of the document. 

His concept is to create a scorecard for various network architectural components, and then use that scorecard together with the domain requirements of the design problem to arrive at an optimal architecture.  He says that’s how he got to REST, and it certainly seems to make sense as you read the Dissertation.  Here is a rendition of his ranking criteria for the models he considers:

Fielding Framework

A “0″ means the architectural style is beneficial to some domains and not others.  Positive means the style has benefit and negative means it is a poorer choice.

The components that make up REST look like this:

RESTful according to Fielding

There are 3 components that go into it:

  • Layered Cached Stateless Client Server:  The row marked LCS+C$SS
  • Uniform Interface, which isn’t in the original Fielding taxonomy, but which he says adds the qualities listed.
  • Code on Demand:  This is the ability of the web to send code to the client for execution based on what it requests.  So, for example, Flash or AJAX.

The “RESTful Result” is simply the total of the other attributes.  You can see it hits pretty darned well on most of the categories with the exception of Network efficiency.  As noted, this primarily means it isn’t suited to extremely fine grained communication, but is fine for a web page.  Pretty cool framework, eh?

Incidentally, Fielding’s framework really dumps on CORBA for all the right reasons.  Give it a read to see why.

Now let’s look at the Pile of Lamps.  Note that we aren’t trying to compare it to REST–they solve different problems.  Fielding tells us to do the analysis based on our domain, so put aside the RESTful scores, they aren’t meaningful to compare to anything but REST competitors.  Here is the result for Pile of Lamps:

Pile of Lamps

I view the LAMP stack as Layered Client Server, which is already a decent protocol.  A Pile of Lamps seems to me is basically adding a cached and replicated capability to the LAMP stack, so I add the cached/replicated repository to the equation.  You can see that it amplifies the LAMP stack while taking nothing away.  Basically, it makes it more efficient, more scalable, and it delivers those benefits in a simple way.  This makes total sense to me, given the concept. 

One can use the framework to fiddle with other potential additions to the Pile of Lamps idea.  For example, what if statelessness were pervasive in this paradigm?  I leave further refinement of the idea to readers, commenters, and the original authors, but it looks promising to me.  I’d also encourage others to delve into Fielding’s work.  It has application well beyond just describing REST.

Related Articles

A Pile of Lamps Needs a Brain

Posted in grid, multicore, Open Source, platforms, software development, strategy, Web 2.0 | 3 Comments »

Amazon Beefs Up EC2 With New Options

Posted by Bob Warfield on October 16, 2007

I’ve been a big fan of Amazon’s Web Services for quite a while and attended their Startup Project, which is an afternoon seeing what it can do and hearing from entrepreneurs who’ve built on this utility computing fabric.  Read my writeup on the Startup Project for more.  Amazon has been steadily rolling out improvements, such as the addition of SLA’s for the S3 storage service.  Today, there is big news in the Amazon EC2 camp:

Amazon has just announced two new instance types for their EC2 utility computing service.  The original type will continue to be available as the “small” type.  The “large” type has four times the CPU, RAM, and Disk Storage, while the “extra large” has eight times the CPU, RAM, and Disk.  The large and extra large also sport 64 bit cpus.  Supersize your EC2!

Why do this?  Because the original small instance was a tad lightweight for database activity with just 1.7GB of RAM while the extra large at 15GB is about right.  Imagine a cluster of the extra large instances running memcached and you can see how this going to dramatically improve the possibilities for hosting large sites.

One of the neat things about this new announcement is pricing.  They’ve basically linearly scaled pricing.  Whereas a small instance costs 10 cents per instance hour, the extra large has 8x the capacity and costs 8×10 cents or 80 cents per hour.

What’s next?  These new instances open a lot of possibilities, but Amazon still doesn’t have painless persistence for databases like mySQL.  If you are running mySQL on an extra large instance and the server goes down for whatever reason, all the data on it is lost and you have to rebuild a new machine around some form of hot backup or failover.  That exercise has been left to the user.  It’s doable: you have to solve the problem in any data center of what you plan to do if the disk totally crashes and no data can be recovered.  However, folks have been vocally requesting a better solution from Amazon where the data doesn’t go away and the machine can be rebooted intact.  I was told by the EC2 folks at the Startup Project to expect 3 announcements before the end of the year that were related.  I’m guessing this is the first such announcement and two more will follow. 

There’s tremendous excitement right now around these kinds of offerings.  They virtualize the data center to reduce the cost and complexity of setting up the infrastructure to do web software.  They allow you to flex capacity up or down and pay as you go.  Amazon is not the only such option.  I’ll be reporting on some others shortly.  It’s hard to see how it makes sense to build your own data center without the aid of one of these services any more. 

Posted in amazon, ec2, grid, multicore, platforms, saas, software development, Web 2.0 | 2 Comments »

 
Follow

Get every new post delivered to your Inbox.

Join 36 other followers