SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the ‘data center’ Category

Big Data is a Small Market Compared to Suburban Data

Posted by Bob Warfield on February 2, 2013

BurbsBig Data is all the rage, and seem to be one of the prime targets for new entrepreneurial ventures since VC-dom started to move from Consumer Internet to Enterprise recently.  Yet, I remain skeptical about Big Data for a variety of reasons.  As I’ve noted before, it seems to be a premature optimization for most companies.  That post angered the Digerati who are quite taken with their NoSQL shiny objects, but there have been others since who reach much the same conclusion.  The truth is, Moore’s Law scales faster than most organizations can scale their creation of data.  Yes, there are some few out of millions of companies that are large enough to really need Big Data and yes, it is so fashionable right now that many who don’t need it will be talking about it and using it just so they can be part of the new new thing.  But they’re risking the problems many have had when they adopt the new new thing for fashion rather than because it solves real problems they have.

This post is not really about Big Data, other than to point out that I think it is a relatively small market in the end.  It’ll go the way of Object Oriented Databases by launching some helpful new ideas, the best of which will be adopted by the entrenched vendors before the OODB companies can reach interesting scales.  So it will be with Hadoop, NoSQL, and the rest of the Big Data Mafia.  For those who want to get a head start on the next wave, and on a wave that is destined to be much more horizontal, much larger, and of much greater appeal, I offer the notion of Suburban Data.

While I shudder at the thought of any new buzzwords, Suburban Data is what I’ve come up with when thinking about the problem of massively parallel architectures that are so loosely coupled (or perhaps not coupled at all) that they don’t need to deal with many of the hard consistency problems of Big Data.  They don’t care because what they are is architectures optimized to create a Suburb of very loosely coordinated and relatively small collections of data.  Think of Big Data’s problems as being those of the inner city where there is tremendous congestion, real estate is extremely expensive, and it makes sense to build up, not out.  Think Manhattan.  It’s very sexy and a wonderful place to visit, but a lot of us wouldn’t want to live there.  Suburban Data, on the other hand, is all about the suburbs.  Instead of building giant apartment buildings where everyone is in very close proximity, Suburban Data is about maximizing the potential of detached single family dwellings.  It’s decentralized and there is no need for excruciatingly difficult parallel algorithms to ration scarce services and enforce consistency across terabytes.

Let’s consider a few Real World application examples.

WordPress.com is a great place to start.  It consists of many instances of WordPress blogs.  Anyone who likes can get one for free.  I have several, including this Smoothspan Blog.  Most of the functionality offered by wp.com does not have to coordinate between individual blogs.  Rather, it’s all about administering a very large number of blogs that individually have very modest requirements on the power of the underlying architecture.  Yes, there are some features that are coordinated, but the vast majority of functionality, and the functionality I tend to use, is not.  If you can see the WordPress.com example, web site hosting services are another obvious example.  They just want to give out instances as cheaply as possible.  Every blog or website is its own single family home.

There are a lot of examples along these lines in the Internet world.  Any offering where the need to communicate and coordinate between different tenants is minimized is a good candidate.  Another huge area of opportunity for Suburban Data are SaaS companies of all kinds.  Unless a SaaS company is exclusively focused on extremely large customers, the requirements of an average SaaS instance in the multi-tenant architecture are modest.  What customers want is precisely the detached single family dwelling, at least that’s what they want from a User Experience perspective.  Given that SaaS is the new way of the world, and even a solo bootstrapper can create a successful SaaS offering, this is truly a huge market.  The potential here is staggering, because this is the commodity market.

Look at the major paradigm shifts that have come before and most have amounted to a very similar (metaphorically) transition.  We went from huge centralized mainframes to mini-computers.  We went from mini-computers to PC’s.  Many argue we’re in the midst of going from PC’s to Mobile.  Suburban Data is all about how to create architectures that are optimal for creating Suburbs of users.

What might such architectures look like?

First, I think it is safe to say that while existing technologies such as virtualization and the increasing number of server hardware architectures being optimized for data center use (Facebook and Google have proprietary hardware architectures for their servers) are a start, there is a lot more that’s possible and the job has hardly begun.  To be the next Oracle in the space needs a completely clean sheet design from top to bottom.  I’m not going to map the architecture out in great detail because its early days and frankly I don’t know all the details.  But, let’s Blue Sky a bit.

Imagine an architecture that puts at least 128 x86 compatible (we need a commodity instruction set for our Suburbs) cores along with all the RAM and Flash Disc storage they need onto the equivalent of a memory stick for today’s desktop PC’s.  Because power and cooling are two of the biggest challenges in modern data centers, the Core Stick will use the most miserly architectures possible–we want a lot of cores with reasonable but no extravagant clock speeds.  Think per-core power consumption suitable for Mobile Devices more than desktops.  For software, let’s imagine these cores run an OS Kernel that’s built around virtualization and the needs of Suburban Data from the ground up.  Further, there is a service layer running on top of the OS that’s also optimized for the Suburban Data world but has the basics all ready to go:  Apache Web Server and MySQL.  In short, you have 128 Amazon EC2 instances potent enough to run 90% of the web sites on the Internet.  Now let’s create backplanes that fit a typical 19″ rack set up with all the right UPS and DC power capabilities the big data centers already know how to do well.  The name of the game will be Core Density.  We get 128 on a memory stick, and let’s say 128 sticks in a 1U rack mount, so we can support 16K web instances in one of those rack mounts.

There will many valuable problems to solve with such architectures, and hence many opportunities for new players to make money.  Consider what has to be done to reinvent hierarchical storage manage for such architectures.  We’ve got a Flash local disc with each core, but it is probably relatively small.  Hence we need access to storage on a hierarchical basis so we can consume as much as we want and it seamlessly works.  Or, consider communicating with and managing the cores.  The only connections to the Core Stick should be very high speed Ethernet and power.  Perhaps we’ll want some out of band control signals for security’s sake as well.  Want to talk to one of these little gems, just fire up the browser and connect to its IP address.  BTW, we probably want full software net fabric capabilities on the stick.

It’ll take quite a while to design, build, and mature such architectures.  That’s fine, it’ll give us several more Moore cycles in which to cement the inevitability of these architectures.

You see what I mean when I say this is a whole new ballgame and a much bigger market than Big Data?  It goes much deeper and will wind up being the fabric of the Internet and Cloud of tomorrow.

Posted in business, cloud, data center, enterprise software, multicore, platforms, saas, service | 2 Comments »

NoSQL is a Premature Optimization

Posted by Bob Warfield on July 22, 2011

No SQL databasesThere’s been a lot of back and forth lately from the NoSQL crowd around Michael Stonebreaker’s contention that reliance on relational technology and MySQL has trapped Facebook in a ‘fate worse than death.’   This was reported in a GigaOm post by Derrick Harris.  Harris reports in a later post that most of the reaction to Stonebreaker’s contention was negative:

By and large, the responses weren’t positive. Some singled out Stonebraker as out of touch or as just trying to sell a product. Some pointed to the popularity of MySQL as evidence of its continued relevance. Many questioned how Stonebraker dare question the wisdom of Facebook’s top-of-the-line database engineers.

Harris, Jim Starkey, Paul Mikesell, and Curt Monash all take a stab at rehabilitating Stonebreaker’s argument in the second post.  Their argument boils down to, “Yeah, Facebook did it, but only because they have great engineers, spent a fortune, and endured a lot of pain.  There are easier ways.”

Sorry fellas, time to annoy the digerati again, and so soon after bashing Social Media.  I disagree with their contention, which is well expressed in the article by this Jim Starkey quote:

If a company has plans for its web application to scale and start driving a lot of traffic, Starkey said, he can’t imagine why it would build that new application using MySQL.

In fact, I would argue that starting with NoSQL because you think you might someday have enough traffic and scale to warrant it is a premature optimization, and as such, should be avoided by smaller and even medium sized organizations.  You will have plenty of time to switch to NoSQL as and if it becomes helpful.  Until that time, NoSQL is an expensive distraction you don’t need.

The best example I see for why that’s the way to look at NoSQL comes from Netflix, which is mentioned towards the end of the article.  I went through several expositions by Netflix engineers on their experience transitioning from an Oracle Relational data center to one based on NoSQL in the form of Amazon’s SimpleDB and then later Cassandra (the latter is still an ongoing transition as I understand it).  You’re welcome to read the same sources, I’ve listed them at the bottom.

Netflix decided to move to the Cloud in late 2008 to early 2009 after an outage prompted them to consider what it would take to engineer their way to significantly higher up time.  They concluded they couldn’t build data centers fast enough, and that as soon as one was built it was swamped for capacity and out of date.  They agree with Amazon’s Werner Vogels that building data centers represented “undifferentiated heavy lifting”, and was therefore to be avoided, so they bet heavily on the Cloud.  These are smart technologists who have been very transparent about their experiences, so it’s worth learning from them.  Werner Vogels reaction to Stonebreaker’s remarks about Facebook are an apt way to start:

Scaling data systems in real life has humbled me.  I would not dare criticize an architecture that holds social graphs of 750M and works.

The gist of the argument for NoSQL being a premature optimization is straightforward and rests on 3 points:

Point 1:  NoSQL technologies require more investment than Relational to get going with. 

The remarks from Netflix are pretty clear on this.  From the Netflix “Tech” blog:

Adopting the non-relational model in general is not easy, and Netflix has been paying a steep pioneer tax while integrating these rapidly evolving and still maturing NoSQL products. There is a learning curve and an operational overhead.

Or, as Sid Anand says, “How do you translate relational concepts, where there is an entire industry built up on an understanding of those concepts, to NoSQL?’

Companies embarking on NoSQL are dealing with less mature tools, less available talent that is familiar with the tools, and in general fewer available patterns and know-how with which to apply the new technology.  This creates a greater tax on being able to adopt the technology.  That sounds a lot like what we expect to see in premature optimizations to me.

Point 2:  There is no particular advantage to NoSQL until you reach scales that require it.  In fact it is the opposite, given Point 1.

It’s harder to use.  You wind up having to do more in your application layer to make up for what Relational does that NoSQL can’t that you may rely on.  Take consistency, for example.  As Anand says in his video, “Non-relational systems are not consistent.  Some, like Cassandra, will heal the data.  Some will not.  If yours doesn’t, you will spend a lot of time writing consistency checkers to deal with it.”  This is just one of many issues involved with being productive with NoSQL.

Point 3:  If you are fortunate enough to need the scaling, you will have the time to migrate to NoSQL and it isn’t that expensive or painful to do so when the time comes.

The root of premature optimization is engineers hating the thought of rewriting.  Their code has to do everything just exactly right the first time or its crap code.  But what about the idea you don’t even understand the problem well enough to write “good” code at first.  Maybe you need to see how users interact with it, what sorts of bottlenecks exist, and how the code will evolve.  Perhaps your startup will have to pivot a time or two before you’ve even started building the right product.  Wouldn’t it be great to be able to use more productive tools while you go through that process?  Isn’t that how we think about modern programming?

Yes it is, and the only reason not to think that way is if we have reason to believe that a migration will be, to use Stonebreaker’s words, “a fate worse than death.”  The trouble is, it isn’t a fate worse than death.  And yes, it will help to have great engineers, but by the time you get to the volumes that require NoSQL, you’ll be able to afford them, and even then, it isn’t that bad.

Netflix’s story is a great one in this respect.  They went about their NoSQL migration in a clever way.  They built a bi-directional replication between Oracle and SimpleDB, and then they started moving over one app at a time.   They did this against a mature system rather than a new buggy untested by users system.  As a result, things went pretty quickly and pretty smoothly.  That’s how engineers are supposed to work: bravo Netflix!

I have a note out to Adrian Cockcroft to ask how long it took, but already I have found a reference to Sid Anand doing the initial “forklifting” of a billion records from Oracle to Simple DB in about 9 months, and they went on from there.  When Sid Anand was asked what the most complex query was to convert from Oracle to NoSQL he said, “There weren’t really any.”  He went on to say you wouldn’t convert your transactional data anyway, and that was pretty much it.

Conclusion

The world loves to see things in black and white.  It sells more papers.  Therefore, because some situations benefit from NoSQL for scaling, we hear a hue and cry that everyone must embrace NoSQL immediately.  Poppycock.  You can go a long long way with SQL-based approaches, they’re more proven, they’re cheaper, and they’re easier.  Start out there and if the horse you’re riding is strong enough to carry you to NoSQL scaling levels you can tackle that when the time comes.  Meanwhile, avoid premature optimizations.  You don’t have time for them.  Let all these guys with NoSQL startups make their money elsewhere.  You need to stay agile and focused on your next minimum viable deliverable.

Extra! Extra!

This post is now, at least for a time, the biggest post ever for Smoothspan.  Be sure to check out my follow up post:  Biggest Post Ever Redux:  NoSQL as a More Flexible Solution?

Articles on the Netflix NoSQL Transition

Sid Anand’s Experience “Forklifting” the Data from Oracle into SimpleDB

Adrian Cockcroft’s NoSQL Migration Slides

Sid Anand’s QCon Video on NoSQL at Netflix

Posted in cloud, data center | 47 Comments »

CIO’s: You’ve Got a Target On Your Back, Use It!

Posted by Bob Warfield on May 24, 2011

Bummer of a birthmarkThis week’s InfoBoom post is all about security and Sony’s recent epidemic of hacker attacks.  I can’t imagine any CIO watching the Sony/Hacker drama unfold wouldn’t be wondering whether it could happen to their organization. I was reminded of it again when I read this morning about “Another day, another attack on Sony.”

Clearly Sony had very much underestimated their risks, but isn’t it likely almost everyone has too? So far, Sony has estimated $171 million in costs relating to these attacks.  In this post, I look at whether a strategy similar to Netflix’s “Chaos Monkey” might help CIO’s feel a little more secure.

Check it out on InfoBoom.

Posted in business, cloud, data center, strategy | Leave a Comment »

Gartner: The Cloud is Not a Contract

Posted by Bob Warfield on January 12, 2011

There is a bit of a joust on between Gartner, GigaOm, and likely others over the recent Gartner Magic Quadrant for Cloud Infrastructure.  The Internet loves a good fight!

Gartner launched their magic quadrant with some fanfare on December 22.  Immediately after the holidays, on January 4, GigaOm’s Derrick Harris threw down the gauntlet by bluntly saying, “Gartner just flat got it wrong.”  Can’t get much more black and white than that.  His reasoning is as follows:

Initially, it seems inconceivable that anybody could rank IaaS providers and not list Amazon Web Services among the leaders. Until, that is, one looks at Gartner’s ranking criteria, which is skewed against ever placing a pure cloud provider in that quadrant. Large web hosts and telcos certainly have a wider range of offerings and more enterprise-friendly service levels, but those aren’t necessarily what cloud computing is all about. Cloud IaaS is about letting users get what they need, when they need it — ideally, with a credit card. It doesn’t require requisitioning servers from the IT department, signing a contract for any predefined time period or paying for services beyond the computing resources.

I have to say, he is right.  It is obvious and absurd not to rank Amazon Web Services at least among the leaders.  If you’re going to take that step, it’s a bold one, and needs to be expressed up front with no ambiguity and leading with a willingness to have a big discussion about it.  Gartner didn’t do that either.  They just presented their typical blah, blah, blah report.  For weaknesses, which presumably got Amazon moved out of the ranks of leaders, they offer the following:

  • No managed services.
  • No collocation, dedicated nonvirtualized servers (often used for databases), or private non-Internet connectivity.
  • The weakest cloud compute SLA of any of the evaluated competing public cloud compute services.  They offer 99.95% uptimes instead of the 99.99% of many others and the penalties are capped.
  • Support and other items are unbundled.
  • Amazon’s offering is developer-centric, rather than enterprise-oriented, although it has significant traction in large enterprises. Its services are normally purchased online with a credit card; traditional corporate invoicing must be negotiated as a special request. Prospective customers who want to speak with a sales representative can fill out an online form to request contact; Amazon does have field sales and solutions engineering. Amazon will negotiate and sign contracts known as Enterprise Agreements, but customers often report that the negotiation process is frustrating.

My first reaction to reading those negatives is they make a pretty good list of criteria for differentiating an old-fashioned managed hosting data center from a real Cloud service.  Does Gartner understand what the Cloud really is, what it is about, and how to engage with it successfully? 

For her part, the lead analyst, Lydia Leong, responded the day after the GigaOm post.  Here response, predictably, is to disagree with Derrick’s quoted paragraph above, saying:

I dispute Derrick’s assertion of what cloud IaaS is about. I think the things he cites above are cool, and represent a critical shake-up in thinking about IT access, but it’s not ultimately what the whole cloud IaaS market is about.

Lemme get this straight, the Cloud IaaS market is about (since I will negate Derrick’s remarks that Lydia disagrees with):

  • Eliminating Pure Cloud vendors from serious consideration.  You must have non-Cloud offerings to play.
  • Eliminating the self-service aspect of letting users get what they need, when they need it–ideally, with a credit card.
  • Eliminating the possibility for self-service without a contract negotiation.

Newsflash for you folks at Gartner: the Cloud is Not a Contract.  It is a Service, but it is a not a legion of Warm Bodies.  It’s not about sucking up with field sales and solutions engineering (“You’re a handsom and powerful man!”).

I can understand that Lydia’s clients mention the need for elaborate contracts with detailed provisions unique to their circumstances.  When that happens, and when it is so at odds with the landscape of a fundamentally new development that respecting it will prevent you from naming legitimate leaders like Amazon as leaders, there are two ways you can proceed.  The easy thing is to cave to your clients since they’re paying the bills and concoct a scenario where the clients get what they think they want.  The hard thing is to show some leadership, earn your fees, and explain to the client, or at least to the vendors slighted, why your recommendation is right. 

Let’s put on our analyst hats, leave Gartner’s misguided analysis, and look at the issue squarely of, “How should we be looking at the issue of Contracts and the Cloud?”

As I have already said, “The Cloud is not about contacts.”  What it is about is commoditization through scale and through sharing of resources which leads to what we call elasticity.  That’s the first tier.  The second tier is that it is about the automation of operations through API’s, not feet in the datacenter cages.  All the rest is hype.  It is this unique combination of: scale, sharing of resources, elasticity, and the automation of ops through API’s that makes the Cloud a new paradigm.  That’s how the Cloud delivers savings.  It’s not that hard to understand once you look at it in those terms.

Now what is the impact of contracts on all that?  First, a contract cannot make an ordinary datacenter into a Cloud no matter who owns it unless it addresses those issues.  Clouds are Clouds because they have those qualities and not because some contract or marketer has labeled them as such.  Second, arbitrary contracts have the power to turn Clouds into ordinary hosted data centers: 

A contract can destroy a Cloud’s essential “Cloudness”!

I wanted to put that in bold and set apart because it is important.  When you, Mr handsom and powerful IT leader, are negotiating hard with your Cloud vendor, you have the power to destroy what it was you thought you were buying.  Your biggest risk is not that they will say, “No”, it is that they might say, “Yes” if you wave a big enough check.  Those who have made the mistake of getting exactly what they want on a big Enterprise Software implementation that wound up going very far wrong because what you wanted was not what the software really did will know what I am talking about.

How do we avoid having a contract destroy “Cloudness?”  This is simple:

Never sign a contract with your Cloud provider that interferes with their ability to commoditize through scale, sharing, and automation of operations.

If they are smart, the Cloud provider will never let it get to that stage.  This is one reason Amazon won’t negotiate much in a contract.  Negotiating fees for services offered is fine.  That does not interfere with the critical “Cloudness” qualities (I am steadfastly refusing the term Cloudiness so as not to deprecate my central thesis!).  BTW, there are very close corollaries for SaaS, which is why they are also much more limited relative to On-prem vendors in what they can negotiate and why they try to hew to the side that Amazon has.  This stuff didn’t get invented for Cloud or out of disdain for customers, there are real economic impacts.

Let’s try a simple example.  Your firm wants to secure Cloud resources from a provider who has some sort of maintenance outage provision for the infrastructure.  It’s a made up example for Cloud (mostly), but is on point for SaaS and easy to understand, so let me continue.  Your firm finds that maintenance window to be unacceptable because you have aligned your own maintenance windows with those of another vendor.  If you accept the Cloud vendor’s window, you will now have two windows to present your constituents and that is unacceptable.  So you want to negotiate a change in the contracts.  Sounds very innocent, doesn’t it?  I’ve been through this exact scenario at SaaS companies where customers wanted this to be done for the same legitimate and logical reasons. 

But consider it from the Cloud provider’s point of view.  If they have a special maintenance window for you, they have to take a portion of their infrastructure and change how it works.  Unless they have other customers that want exactly the same terms, they will have to dedicate that infrastructure to your use.  Can you see the problem?  We have now eliminated the ability to scale that infrastructure through sharing and scale.  In addition, depending on how their automated operations function, it may or may not be applicable to your specialized resources.  It isn’t just a matter of changing a schedule for some ops people–the automated ops is either set up to deal with this kind of flexibility or it isn’t.

That was an example for a maintenance window, but any deviation you negotiate in a contract that impacts scale, sharing, or automated ops can have the same impact.  Here are more examples:

-  You want to change the disk quotas or cpus on your instances.

-  Your SLA requirements damage some baked-in aspect of the providers automated ops infrastructure.  This is easy to have happen.  You insist on network bandwidth that requires a different switching fabric or whatever.

-  You want to limit how and when machines can be patched by the provider.

-  You want to put your machines on private subnets as Gartner suggests should be possible and has many who think the idea of a Private Cloud is Marketing BS decry.

That list can be a mile long.  When you get done ruling out all the things you really can’t negotiate without un-Clouding your Cloud, you’re going to see that relatively simple contracts such as you can already negotiate with Amazon are all that’s left.  Congratulations!  Unlike Gartner, you now understand what a Cloud is and how to take advantage of it. 

And remember, the next time you’re negotiating a Cloud contract, be careful what you wish for–you just might get it.

Postscript

Lydia Leong, one of the Gartner analysts who did the Magic Quadrant discussed here, has responded.  I read her response as boiling down to agreement with a lot of what I’ve said, excusing the disagreements on the fact that the Magic Quadrant concerned both Cloud and Web Hosting and calling attention to Gartner’s plan to do a mid-year pure-Cloud Magic Quadrant.   The latter is a good idea, and one would think had to be at least partially motivated by more negative feedback than just my own about this current Magic Quadrant.  Try to compare Cloud vendors against Web Hosting vendors on the same quadrant is an apples-to-oranges exercise at best and misleading at worst.

In any event, I thank Lydia for her gracious response and look forward to seeing the Cloud Pure Play Magic Quadrant.  That’s where the meat will be!

Posted in business, cloud, data center, saas, service, strategy | 3 Comments »

What is the Sound of One Cloud Thunderclapping?

Posted by Bob Warfield on November 1, 2010

I just finished reading Randy Bias’s piece, “Elasticity is NOT #Cloud Computing … Just Ask Google“, and I must admit, it takes me back to all sorts of questions of a vaguely unsettling nature that have been bubbling below the surface of my cloudy thinking for some time.

If Elasticity is NOT Cloud Computing, and I profoundly disagree with that statement, then what the heck is Cloud Computing?  I admit to already having been pretty skeptical about whether Private Clouds are really Cloud Computing, but I got over that from an understanding of elasticity and private clouds.  However, if we take away Elasticity, and make whatever is left Private, I’m not sure we have anything profound whatsoever.  How does it differ from an outsourced data center, for example?

Randy seems to have come to his position about elasticity by looking at the large web operators like Amazon, Google and Salesforce who offer Cloud platforms of one kind or another (IaaS, PaaS, and the rest of the alphabet soup).  He’s interested in their history, why they built the kind of infrastructure they did, and why that turned out to be an ideal platform from which to offer Cloud Computing offerings.  My problem is that the mere fact that Amazon did not get the Elasticity benefit from its Cloud that AWS customers can does not in any way diminish the revolution that is the Cloud and that comes with Elasticity.  Without customers, these fancy Amazon and Google data centers are literally the sound of One Cloud Thunderclapping (because Clouds don’t just clap in the sense of the old Zen saying).  I don’t see how you can regard those data centers as “Cloud” at all.  A term more like “commodity computing” data center, perhaps with some form of “virtual” thrown in makes more sense.  Interestingly, it is the addition of the customers to their data centers that exposed Amazon and Google to a little elasticity.  Why?  Because now they have customers to help pay for excess computing capacity and hence elasticity.   What a beautiful thing, and another strong argument for why elasticity changes everything. 

The term “Cloud”, in any sense I’ve ever heard it used, is a service of some kind provided by one organization and shared (this sharing is critical) by many other independent customer organizations.  Thinking back on the idea that Amazon’s data center wasn’t cloud until they flipped the switch and started sharing it with customers (that’s the point where they got elasticity too), you can see that the ideas of “sharing” and “independepent customers” are critical.  Without both, there is no way to pay for the excess capacity that is elasticity, or at the very least, if the sharing is the sort pushed by companies like VMWare to up server utilization, it isn’t quite the same as having real net positive dollars coming in from paying customers, versus simple cost savings.

This is where I get into trouble with the Private Cloud concept.  Yeah sure, people are interested in building data centers that use technology similar to what the “Cloud” vendor’s data centers use.  Yeah sure, there is some benefit to that.  Those technologies were developed by organizations that needed to commoditize their data center services into as cheap an offering as possible.  But to call that Cloud Computing sure seems like gratuitous marketing to me.  If nothing else, it radically understates the challenge those organizations faced.  A little bit of storage virtualization, a healthy dose of VMWare-style machine virtualization, and do you really have the full benefits of Cloud Computing?  Not in my book.  You just have a modern data center. 

There are steps beyond that to take advantage of.  And the advantages are enough of a quantum leap that it isn’t fair to award the accolade “Cloud” for anything less.  If you do, you’re just engaging in gratuitous marketing, trying to draft somebody else’s hype and momentum.  I am already on record as arguing for the definition of Cloud being two things: 

-  It is Software as a Service.  I sometimes refer to this as “Ops as an API”, meaning rather than crawling around cages and machine racks, you perform ops by making API calls on your Cloud infrastructure.

-  It is Elastic.  Yep, I refuse to separate Elasticity from the Cloud, though Randy’s piece wants to leave it more in the camp of simply “Data Center as a Service”.

So then what really is a Private Cloud, and can we still call it a Cloud?  If you include the Software as a Service Piece, and the Elasticity, I would say “yes”.  The difference between a Public and Private Cloud is simply that in the Private case, the Customer paid the service provider to decouple their Private Cloud from the rest of the Public Cloud so no network traffic can get into the Private Cloud without its full knowledge and consent.  This is no biggie–it’s a subnet with the additional proviso that we don’t run anyone else’s Cloud Software but the one owner of the Private Cloud insider that subnet.  This is actually a pretty good deal for all concerned.  The Private Cloud user still gets nearly all the benefits of the Public Cloud user.  They do not get quite a good a price as they must monopolize the hardware to a greater extent than their Public brethren, but it is still a great deal.  Elasticity is still available to them, as is “Ops as an API”.  If your Private Cloud lacks either one of those characteristics, you have a modern data center, but it isn’t Cloud, despite what your vendor’s marketing people want to tell you.

Why do a Private Cloud? 

Largely to reduce perceived risks.  The risks boil down to security and performance.  The Private Cloud is presumably more secure, and its performance is more predictable because no unknown Public Cloud tenants can get into the Private Cloud and do unknown things.

There is one more advantage I will mention for the Cloud, which very much does include Elasticity, versus the Modern Data Center, which uses some Cloud technology, but has no Elasticity and is not a Cloud:

You Cloud Vendor may have more buying power than you, more technology they have amortized over more customers, and hence a lower cost to deliver the service than even relatively large corporate data centers can attain.  Hey, if it was easy, everyone would be doing it, instead of just everyone claiming to do it.

Posted in cloud, data center, platforms, saas, strategy | 4 Comments »

Half of VMWare’s Customer Base Should’ve Bought SaaS

Posted by Bob Warfield on October 4, 2010

According to Network World (which got its data from a VMWare customer survey), half of VMWare’s customer base is using the product to virtualize Microsoft Exchange.

Why in the world didn’t these customers buy a SaaS mail product?  For all the hassle of managing Exchange servers in your own data center, it’s just not worth it.  The savings could have been tremendous and a lot could’ve been offloaded.  Anyone who has used a product like Google’s G-Mail knows the spam filtering is 10x better than what anyone can afford with individual Exchange servers too.

Come to that, why isn’t Microsoft migrating Exchange customers into its own Cloud as fast as it can?

‘Nuff said.

Posted in business, cloud, data center, saas | 1 Comment »

Single Tenant, Multitenant, Private and Public Clouds: Oh My!

Posted by Bob Warfield on August 27, 2010

My head is starting to hurt with all the back and forth among my Enterprise Irregulars buddies about the relationships between the complex concepts of Multitenancy, Private, and Public Clouds.  A set of disjoint conversations and posts came together like the whirlpool in the bottom of a tub when it drains.  I was busy with other things and didn’t get a chance to really respond until I was well and truly sucked into the vortex.  Apologies for the long post, but so many wonderful cans of worms finally got opened that I just have to try to deal with a few of them.  That’s why I love these Irregulars!

To start, let me rehash some of the many memes that had me preparing to respond:

Josh Greenbaum’s assertion that Multitenancy is a Vendor, not a Customer Issue.  This post includes some choice observations like:

While the benefits that multi-tenancy can provide are manifold for the vendor, these rationales don’t hold water on the user side.

That is not to say that customers can’t benefit from multi-tenancy. They can, but the effects of multi-tenancy for users are side-benefits, subordinate to the vendors’ benefits. This means, IMO, that a customer that looks at multi-tenancy as a key criteria for acquiring a new piece of functionality is basing their decision on factors that are not directly relevant to their TCO, all other factors being equal.

and:

Multi-tenancy promises to age gracelessly as this market matures.

Not to mention:

Most of the main benefits of multi-tenancy – every customer is on the same version and is updated simultaneously, in particular – are vendor benefits that don’t intrinsically benefit customers directly.

The implication being that someone somewhere will provide an alternate technology very soon that works just as good or better than multitenancy.  Wow.  Lots to disagree with there.  My ears are still ringing from the sound of the steel gauntlet that was thrown down.

-  Phil Wainewright took a little of the edge of my ire with his response post to Josh, “Single Tenancy, the DEC Rainbow of SaaS.”  Basically, Phil says that any would-be SaaS vendor trying to create an offering without multitenancy is doomed as the DEC Rainbow was.  They have some that sort of walks and quacks like a SaaS offering but that can’t really deliver the goods.

-  Well of course Josh had to respond with a post that ends with:

I think the pricing and services pressure of the multi-tenant vendors will force single-tenant vendors to make their offerings as compatible as possible. But as long as they are compatible with the promises of multi-tenancy, they don’t need to actually be multi-tenant to compete in the market.

That’s kind of like saying, “I’m right so long as nothing happens to make me wrong.”  Where are the facts that show this counter case is anything beyond imagination?  Who has built a SaaS application that does not include multitenancy but that delivers all the benefits?

Meanwhile back at the ranch (we EI’s need a colorful name for our private community where the feathers really start to fly as we chew the bones of some good debates), still more fascinating points and counterpoints were being made as the topic of public vs private clouds came up (paraphrasing):

-  Is there any value in private clouds?

-  Do public clouds result in less lock-in than private clouds?

-  Are private clouds and single tenant (sic) SaaS apps just Old School vendors attempts to hang on while the New Era dawns?  Attempts that will ultimately prove terribly flawed?

-  Can the economics of private clouds ever compete with public?

-  BTW, eBay now uses Amazon for “burst” loads and purchases servers for a few hours at a time on their peak periods.  Cool!

-  Companies like Eucalyptus and Nimbula are trying to make Private Clouds that are completely fungible with Public Clouds.  If you  in private cloud frameworks like these means you have
to believe companies are going to be running / owning their own servers for a long time to come even if the public cloud guys take over a number of compute workloads.  The Nimbula guys built EC2 and they’re no dummies, so if they believe in this, there must be something to it.

-  There are two kinds of clouds – real and virtual.  Real clouds are multi-tenant. Virtual clouds are not. Virtualization is an amazing technology but it can’t compete with bottoms up multi-tenant platforms and apps.

Stop!  Let me off this merry go-round and let’s talk.

What It Is and Why Multitenancy Matters

Sorry Josh, but Multitenancy isn’t marketing like Intel Inside (BTW, do you notice Intel wound up everywhere anyway?  That wasn’t marketing either), and it matters to more than just vendors.  Why?

Push aside all of the partisan definitions of multitenancy (all your customers go in the same table or not).   Let’s look at the fundamental difference between virtualization and multitenancy, since these two seem to be fighting it out.

Virtualization takes multiple copies of your entire software stack and lets them coexist on the same machine.  Whereas before you had one OS, one DB, and one copy of your app, now you may have 10 of each.  Each of the 10 may be a different version entirely.  Each may be a different customer entirely, as they share a machine.  For each of them, life is just like they had their own dedicated server.  Cool.  No wonder VMWare is so successful.  That’s a handy thing to do.

Multitenancy is a little different.  Instead of 10 copies of the OS, 10 copies of the DB, and 10 copies of the app, it has 1 OS, 1 DB, and 1 app on the server.  But, through judicious modifications to the app, it allows those 10 customers to all peacefully coexist within the app just as though they had it entirely to themselves.

Can you see the pros and cons of each?  Let’s start with cost.  Every SaaS vendor that has multitenancy crows about this, because its true.  Don’t believe me?  Plug in your VM software, go install Oracle 10 times across 10 different virtual machines.  Now add up how much disk space that uses, how much RAM it uses when all 10 are running, and so on.  This is before you’ve put a single byte of information into Oracle or even started up an app.  Compare that to having installed 1 copy of Oracle on a machine, but not putting any data into it.  Dang!  That VM has used up a heck of a lot of resources before I even get started!

If you don’t think that the overhead of 10 copies of the stack has an impact on TCO, you either have in mind a very interesting application + customer combination (some do exist, and I have written about them), or you just don’t understand.  10x the hardware to handle the “before you put in data” requirements are not cheap.  Whatever overhead is involved in making that more cumbersome to automate is not cheap.  Heck, 10x more Oracle licenses is very not cheap.  I know SaaS companies who complain their single biggest ops cost is their Oracle licenses. 

However, if all works well, that’s a fixed cost to have all those copies, and you can start adding data by customers to each virtual Oracle, and things will be okay from that point on.  But, take my word for it, there is no free lunch.  The VM world will be slower and less nimble to share resources between the different Virtual Machines than a Multitenant App can be.  The reason is that by the time it knows it even needs to share, it is too late.  Shifting things around to take resource from one VM and give it to another takes time.  By contrast, the Multitenant App knows what is going on inside the App because it is the App.  It can even anticipate needs (e.g. that customer is in UK and they’re going to wake up x hours before my customers in the US, so I will put them on the same machine because they mostly use the machine at different times).

So, no, there is not some magic technology that will make multitenant obsolete.  There may be some new marketing label on some technology that makes multitenancy automatic and implicit, but if it does what I describe, it is multitenant.  It will age gracefully for a long time to come despite the indignities that petty competition and marketing labels will bring to bear on it.

What’s the Relationship of Clouds and Multitenancy?

Must Real Clouds be Multitenant?

Sorry, but Real Clouds are not Multitenant because they’re based on Virtualization not Multitenancy in any sense such as I just defined.  In fact, EC2 doesn’t share a core with multiple virtual machines because it can’t.  If one of the VM’s started sucking up all the cycles, the other would suffer terrible performance and the hypervisors don’t really have a way to deal with that.  Imagine having to shut down one of the virtual machines and move it onto other hardware to load balance.  That’s not a simple or fast operation.  Multi-tasking operating systems expect a context switch to be as fast as possible, and that’s what we’re talking about.  That’s part of what I mean by the VM solution being less nimble.  So instead, cores get allocated to a particular VM.  That doesn’t mean a server can’t have multiple tenants, just that at the granularity of a core, things have to be kept clean and not dynamically moved around. 

Note to rocket scientists and entrepreneurs out there–if you could create a new hardware architecture that was really fast at the Virtual Machine load balancing, you would have a winner.  So far, there is no good hardware architecture to facilitate a tenant swap inside a core at a seamless enough granularity to allow the sharing.  In the Multicore Era, this would be the Killer Architecture for Cloud Computing.  If you get all the right patents, you’ll be rich and Intel will be sad.  OTOH, if Intel and VMWare got their heads together and figured it out, it would be like ole Jack Burton said, “You can go off and rule the universe from beyond the grave.”

But, it isn’t quite so black and white.  While EC2 is not multitenant at the core level, it sort of is at the server level as we discussed.  And, services like S3 are multitenant through and through.  Should we cut them some slack?  In a word, “No.”  Even though an awful lot of the overall stack cost (network, cpu, and storage) is pretty well multitenant, I still wind up installing those 10 copies of Oracle and I still have the same economic disadvantage as the VM scenario.  Multitenancy is an Application characteristic, or at the very least, a deep platform characteristic.  If I build my app on Force.com, it is automatically multitenant.  If I build it on Amazon Web Services, it is not automatic.

But isn’t there Any Multitenant-like Advantage to the Cloud?  And how do Public and Private Compare?

Yes, there are tons of benefits to the Cloud, and through an understanding and definition of them, we will tease out the relationship of Public and Private Clouds.  Let me explain…

There are two primary advantages to the Cloud:  it is a Software Service and it is Elastic.  If you don’t have those advantages, you don’t have a Cloud.  Let’s drill down.

The Cloud is a Software Service, first and foremost.  I can spin up and control a server entirely through a set of API’s.  I never have to go into a Data Center cage.  I never have to ask someone at the Data Center to go into the Cage (though that would be a Service, just not a Software Service, an important distinction).  This is powerful for basically the same reasons that SaaS is powerful versus doing it yourself with On-prem software.  Think Cloud = SaaS and Data Center = On Prem and extrapolate and you’ll have it. 

Since Cloud is a standardized service, we expect all the same benefits as SaaS:

- They know their service better than I do since it is their whole business.  So I should expect they will run it better and more efficiently.

- Upgrades to that service are transparent and painless (try that on your own data center, buddy!).

- When one customer has a problem, the Service knows and often fixes it before the others even know it exists.  Yes Josh, there is value in SaaS running everyone on the same release.  I surveyed Tech Support managers one time and asked them one simple question:  How many open problems in your trouble ticketing system are fixed in the current release?  The answers were astounding–40 to 80%.  Imagine a world where your customers see 40 to 80% fewer problems.  It’s a good thing!

- That service has economic buying power that you don’t have because it is aggregated across many customers.  They can get better deals on their hardware and order so much of it that the world will build it precisely to their specs.  They can get stuff you can’t, and they can invest in R&D you can’t.  Again, because it is aggregated across many customers.  A Startup running in the Amazon Cloud can have multipe redundant data centers on multiple continents.  Most SaaS companies don’t get to building multiple data centers until they are way past having gone public. 

-  Because it is a Software Service, you can invest your Ops time in automation, rather than in crawling around Data Center cages.  You don’t need to hire anyone who knows how to hot swap a disk or take a backup.  You need peeps who know how to write automation scripts.  Those scripts are a leveragable asset that will permanently lower your costs in a dramatic way.  You have reallocated your costs from basic Data Center grubbing around (where does this patch cable go, Bruce?), an expense, to actually building an asset.

The list goes on.

The second benefit is Elasticity.  It’s another form of aggregation benefit.  They have spare capacity because everyone doesn’t use all the hardware all the time.  Whatever % isn’t utilized, it is a large amount of hardware, because it is aggregated.  It’s more than you can afford to have sitting around idle in your own data center.  Because of that, they don’t have to sell it to you in perpetuity.  You can rent it as you need it, just like eBay does for bursting.  There are tons of new operational strategies that are suddenly available to you by taking advantage of Elasticity.

Let me give you just one.  For SaaS companies, it is really easy to do Beta Tests.  You don’t have to buy 2x the hardware in perpetuity.  You just need to rent it for the duration of the Beta Test and every single customer can access their instance with their data to their heart’s content.  Trust me, they will like that.

What about Public Versus Private Clouds?

Hang on, we’re almost there, and it seems like it has been a worthwhile journey.

Start with, “What’s a Private Cloud?”  Let’s take all the technology of a Public Cloud (heck, the Nimbulla guys built EC2, so they know how to do this), and create a Private Cloud.  The Private Cloud is one restricted to a single customer.  It’d be kind of like taking a copy of Salesforce.com’s software, and installing it at Citibank for their private use.  Multitenant with only one tenant.  Do you hear the sound of one hand clapping yet?  Yep, it hurts my head too, just thinking about it.  But we must.

Pawing through the various advantages we’ve discussed for the Cloud, there are still some that accrue to a Cloud of One Customer:

-  It is still a Software Service that we can control via API’s, so we can invest in Ops Automation.  In a sense, you can spin up a new Virtual Data Center (I like that word better than Private Cloud, because it’s closer to the truth) on 10 minutes notice.  No waiting for servers to be shipped.  No uncrating and testing.  No shoving into racks and connecting cables.  Push a button, get a Data Center.

-  You get the buying power advantages of the Cloud Vendor if they supply your Private Cloud, though not if you buy software and build  your Private Cloud.  Hmmm, wonder what terminology is needed to make that distinction?  Forrester says it’s either a Private Cloud (company owns their own Cloud) or a Hosted Virtual Private Cloud.  Cumbersome.

But, and this is a huge one, the granularity is huge, and there is way less Elasticity.  Sure, you can spin up a Data Center, but depending on its size, it’s a much bigger on/off switch.  You likely will have to commit to buy more capacity for a longer time at a bigger price in order for the Cloud Provider to recoup giving you so much more control.  They have to clear other customers away from a larger security zone before you can occupy it, instead of letting your VM’s commingle with other VM’s on the same box.  You may lose the more multitenant-like advantages of the storage cluster and the network infrastructure (remember, only EC2 was stuck being pure virtual). 

What Does it All Mean, and What Should My Company Do?

Did you see Forrester’s conclusion that most companies are not yet ready to embrace the Cloud and won’t be for a long time?

I love the way Big Organizations think about things (not!).  Since their goal is preservation of wealth and status, it’s all about risk mitigation whether that is risk to the org or to the individual career.  A common strategy is to take some revolutionary thing (like SaaS, Multitenancy, or the Cloud), and break it down into costs and benefits.  Further, there needs to be a phased modular approach that over time, captures all the benefits with as little cost as possible.  And each phase has to have a defined completion so we can stop, evaluate whether we succeeded, celebrate the success, punish those who didn’t play the politics well enough, check in with stakeholders, and sing that Big Company Round of Kumbaya.  Yay!

In this case, we have a 5 year plan for CIO’s.  Do you remember anything else, maybe from the Cold War, that used to work on 5 year plans?  Never mind.

It asserts that before you are ready for the Cloud, you have to cross some of those modular hurdles:

A company will need a standardized operating procedure, fully-automated deployment and management (to avoid human error) and self-service access for developers. It will also need each of its business divisions – finance, HR, engineering, etc – to be sharing the same infrastructure.  In fact, there are four evolutionary stages that it takes to get there, starting with an acclimation stage where users are getting used to and comfortable with online apps, working to convince leaders of the various business divisions to be guinea pigs. Beyond that, there’s the rollout itself and then the optimization to fine-tune it.

Holy CYA, Batman!  Do you think eBay spent 5 years figuring out whether it could benefit from bursting to the Cloud before it just did it?

There’s a part of me that says if your IT org is so behind the times it needs 5 years just to understand it all, then you should quit doing anything on-premise and get it all into the hands of SaaS vendors.  They’re already so far beyond you that they must have a huge advantage.  There is a another part that says, “Gee guys, you don’t have to be able to build an automobile factory as good as Toyota to be able to drive a car.”

But then sanity and Political Correctness prevail, I come back down to Earth, and I realize we are ready to summarize.  There are 4 levels of Cloud Maturity (Hey, I know the Big Co IT Guys are feeling more comfortable already, they can deal with a Capability and Maturity Model, right?):

Level 1:  Dabbling.  You are using some Virtualization or Cloud technology a little bit at your org in order to learn.  You now know what a Machine Image is, and you have at least seen a server that can run them and swapped a few in and out so that you experience the pleasures of doing amazing things without crawling around the Data Center Cage.

Level 2:  Private Cloud.  You were impressed enough by Level 1 that you want the benefits of Cloud Technology for as much of your operation as you can as fast as you can get it.  But, you are not yet ready to relinquish much of any control.  For Early Level 2, you may very well insist on a Private Cloud you own entirely.  Later stage Level 2 and you will seek a Hosted Virtual Private Cloud.

Level 3:  Public Cloud.  This has been cool, but you are ready to embrace Elasticity.  You tripped into it with a little bit of Bursting like eBay, but you are gradually realizing that the latency between your Data Center and the Cloud is really painful.  To fix that, you went to a Hosted Virtual Private Cloud.  Now that your data is in that Cloud and Bursting works well, you are realizing that the data is already stepping outside your Private Cloud pretty often anyway.  And you’ve had to come to terms with it.  So why not go the rest of the way and pick up some Elasticity?

Level 4:  SaaS Multitenant.  Eventually, you conclude that you’re still micromanaging your software too much and it isn’t adding any value unique to your organization.  Plus, most of the software you can buy and run in your Public Cloud world is pretty darned antiquated anyway.  It hasn’t been rearchitected since the late 80’s and early 90’s.  Not really.  What would an app look like if it was built from the ground up to live in the Cloud, to connect Customers the way the Internet has been going, to be Social, to do all the rest?  Welcome to SaaS Multitenant.  Now you can finally get completely out of Software Operations and start delivering value.

BTW, you don’t have to take the levels one at a time.  It will cost you a lot more and be a lot more painful if you do.  That’s my problem with the Forrester analysis.  Pick the level that is as far along as you can possibly stomach, add one to that, and go.  Ironically, not only is it cheaper to go directly to the end game, but each level is cheaper for you on a wide scale usage basis all by itself.  In other words, it’s cheaper for you to do Public Cloud than Private Cloud.  And it’s WAY cheaper to go Public Cloud than to try Private Cloud for a time and then go Public Cloud.  Switching to a SaaS Multitenant app is cheaper still.

Welcome to crazy world of learning how to work and play well together when efficiently sharing your computing resources with friends and strangers!

Posted in amazon, cloud, data center, ec2, enterprise software, grid, multicore, platforms, saas, service | 14 Comments »

Is the Open Stack Cloud Announcement a Big Deal?

Posted by Bob Warfield on July 19, 2010

Open stack MIGHT be a big deal, it awaits adoption to see.

In discussions with the Enterprise Irregulars, the question came up of whether it was a good analog to view Open Stack as the “Android” to Amazon’s “iPhone” (where is their antennagate?).  This is an interesting metaphor as much for what it tells us about where it doesn’t fit as where it does.  It is a good analog to Android in the sense it gives a lot of helpless hosters a shot at the Cloud much as Android gives helpless handset makers a shot.  Similar to the Android, a lot still depends on how well the handset guys do their part, how many great apps wind up on the platform, and on how well the market likes the combined offerings.  Substitute “hosters” for handset guys, and let the other two stand when talking about Open Stack.

I can’t emphasize the point that there is more here than just the Cloud Software.  This goes to the essence of Software as a Service.  It isn’t just software, it’s the whole Service.  In a related discussion, someone came out with the line that SAP ByD was “real but maybe not cost effective.”  ByD has been plagued by delay and the company has said the delays where because the architecture could not be delivered for a low enough cost.  Clearly SAP can write an ERP system.  But just as clearly, a SaaS system that works, but is not cost effective is not a SaaS system at all because the service can’t be delivered.  It’s the sound of one hand clapping.  Yet, a lot of otherwise reasonable folks just can’t fathom that distinction. 

The inability to fathom the difference with the hosters may fall to the other side.  SAP gets software, but apparently not delivering it as a service.  In some sense, there may be a lot of data center providers who understand how to deliver a service, but not a Cloud.  This is why I am harping on all that it takes beyond the software to get it right.  Amazon has clearly gotten it right, and while many argue it is very early in the Cloud market, yet their momentum gets harder and harder to catch as each day passes by.  I look to Salesforce.com as an example for where we are in the market.  Not long ago they passed $1B in revenues.  That’s a big accomplishment for a SaaS company, yet not really big at all for a Software company.  Does anyone really believe they can catch Salesforce at CRM?  Even a really big company?  The EI’s, who argue it is early yet for SaaS ERP, and hence okay for SAP to be so late, hold up Salesforce as an example of too far ahead for SAP or Oracle to catch.  For those that argue it is very early in the Cloud, consider:

-  Amazon will be at $1B (apparently critical mass for SaaS CRM) in the not too distant future.  Can a competitor get it together and grow fast enough to keep that gap from going over $1B with Amazon?  If $1B isn’t the danger zone, what is?  How long did vendors like DEC let the IBM PC gain momentum before it was too late to catch them?  How long did the PC have multiple operating systems before Microsoft’s advantage was too great?  None of these are exact analogies, but there is a critical mass.  If it is reached without meaningful competition, Amazon has won until the next paradigm shift and their opportunity to succumb to Innovator’s Dilemma.

-  Unlike the Android metaphor, Rackspace ain’t no Google.  It lacks the resources on both the marketing and the development side to build the buzz and build the innovation that Android has.  Strong Brownie points for invoking NASA, BTW, for your buzz side.  Lots of us geeks still love the space program.  Not clear we love NASA for what has happened to it though.  Or, as I am fond of saying, “We can no longer fly supersonically as civilians, put men on the moon, or get through an airport without removing our shoes.  Progress?  What progress?!??”

-  Also unlike Android, it really isn’t clear why Open Stack is great.  Open ain’t enough, particularly with a group arguing that Amazons APIs ought to be a standard and Amazon continuously innovating and cutting prices while many can’t seem to even get in the game.  If the Amazon API is available from more than one vendor, it starts to be pretty open.  Rackspace wants to spark up the “avoid lock-in by choosing us debate“, but pre-Open Stack, Rackspace was the one locking you in more than Amazon.  I guess this is a great example of Jean Louis Gasee’s admonition that if you can’t fix it, feature it (great article on antennagate, BTW).

Ironically, after I published the blog post on Amazon API’s becoming a standard, and hearing a great hue and cry about all the things it couldn’t do, Amazon launched a whole raft of new features.  Cluster Compute Instances, in particular, offer the ability to couple servers in a low latency subnet for cluster computing.  It’s pitched as being all about making Cray-Supercomputers-On-Demand available to all comers (some cool ideas about what I’d do with that!), but ironically, the low latency is exactly what a lot of the detractors of Amazon-API-as-Standard said couldn’t be done.  I know Amazon didn’t build it in response to my blog’s comments (LOL!), but I chuckle at how it came out a couple days later and is focused on exactly the problem being complained about.

BTW, sorry for the OT, but read this guy James Hamilton’s blog for lots of good scoop on scaling and data center architecture.

Getting back to the Rackspace Open Stack announcement, there is a lot of nervousness in the Not-Amazon-But-Wanna-Be-Cloud-Kings herd.  It’s understandable.  Not much to point to for scale in the Cloud world but Amazon.  Concerns that it may be running away with the show.  Concerns that the early decision to wait for someone else to figure out the Cloud because we can always jump in if it looks real enough may have been a bad decision (that Innovator’s Dilemma is a B*atch to face!).  There is an interesting post on GigaOm about VMWare looking ahead to the day when server virtualization might not matter because of Cloud Computing.  That’s an ideal line of thought for VMWare CEO Paul Maritz.  After all, his alma mater Microsoft was always worried back in the day about who might “Microsoft” them.  Of course it happened anyway, multiple times and I personally think they did it to themselves by thinking of the problem as holding on to existing markets at all cost.  Innovator’s Dilemma strikes again.

Nevertheless, the herd is restless.  You can smell the fear.  Open Stack is a good response, at least it is something, and from a company that actually is in the Cloud.  As I said in the beginning, it will be a function of how well the Kieretsu cooperate.  Cloud Computing, like SaaS, is holistic.  Barney partnerships can’t make it go.  If the partners aren’t pretty darned good at working together, this announcement could simply turn out to be one of those frequent bits of “the enemy of my enemy…” desperation marriages of convenience we see so often. 

It’s all about how much wood they can get behind the new arrow.  If I had to make a prediction, it isn’t a Big Deal for the Cloud.  It will largely give the “Private Cloud” (which isn’t the Cloud, but that’s another post) new ammunition without affecting Amazon much at all.

Bears close watching.  Good times for customers in Cloud Computing–we love competition!

Posted in amazon, cloud, data center | 5 Comments »

WordPress and the Dark Side of Multitenancy

Posted by Bob Warfield on June 11, 2010

Quite a bit of hubbub over WordPress’s recent outage.  A number of high profile blogs including Techcrunch, GigaOm, CNN, and your very own SmoothSpan use WordPress.  Matt Mullenweg told Read/WriteWeb:

“The cause of the outage was a very unfortunate code change that overwrote some key options in the options table for a number of blogs. We brought the site down to prevent damage and have been bringing blogs back after we’ve verified that they’re 100% okay.”

Apparently, WordPress has three data centers, 1300 servers, and is home to on the order of 10 million blogs.   Techcrunch is back and talking about it, but as I write this, GigaOm is still out.  Given the nature of the outage, WordPress presumably has to hand tweak that option information back in for all the blogs that got zapped.  If it is restoring from backup, that can be painful too.

While one can lay blame at the doorstep of whatever programmer made the mistake, the reality is that programmers make mistakes.  It is unavailable.  The important question is what has been done from an Operations and Architecture standpoint that either mitigates or compounds the likelihood such mistakes cause a problem.  In this case, I blame multitenancy.  When you can make a single code change that zaps all you customers very quickly like this, you had to have help from your architecture to pull it off.

Don’t get me wrong, I’m all for multitenancy.  In fact, it’s essential for many SaaS operations.  But, companies need to have a plan to manage the risks inherent in multitenancy.  The primary risk is the rapidity with which rolling out a change can affect your customer base.   When operations are set up so that every tenant is in the same “hotel”, this problem is compounded, because it means everyone gets hit.

What to do?

First, your architecture needs to support multiple hotels, and it needs to include tools that make it easy for your operations personnel to manage which tenants are in which hotels, which codelines run on which hotels (more on that one in a minute), and to rapidly rehost tenants to a different hotel, if desired.  These capabilities pave the way for a tremendous increase in operational flexibility that makes it far easier to do all sorts of things and possible to do some things that are completely impossible with a single hotel. 

Second, I highly encourage the use of a Cloud data center, such as Amazon Web Services.  Here again, the reason is operational flexibility.  Spinning up more servers rapidly for any number of reasons is easy to do, and you take the cost of temporarily having a lot more servers (for example, to give your customers a beta test of a new release) off the table because it is so cheap to temporarily have a lot of extra servers.

Last step: use a feathered release cycle.  When you roll out a code change, no matter how well-tested it is, don’t deploy to all the hotels.  A feathered release cycle delivers the code change to one hotel at a time, and waits an appropriate length of time to see that nothing catastrophic has occurred.  It’s amazing what a difference a day makes in understanding the potential pitfalls of a new release.  Given the operational flexibility of being able to manage multiple hotels, you can adopt all sorts of release feathering strategies.  Start with smaller customers, start with brand new customers, start with your freemium customers, and start out by beta testing customers are all possibilities that can result in considerable risk mitigation for the majority of your customer base.

If you’re a customer looking at SaaS solutions, ask about their capacity for multiple hotels and release feathering.  It just may save you considerable pain.

Posted in business, cloud, customer service, data center, saas | 10 Comments »

Minimizing the Cost of SaaS Operations

Posted by Bob Warfield on March 29, 2010

SaaS software is much more dependent on being run by the numbers than conventional on-premises software because the expenses are front loaded and the costs are back loaded.  SAP learned this the hard way with its Business By Design product, for example.  If you run the numbers, there is a high degree of correlation between low-cost of delivering the service and high growth rates among public SaaS companies.  It isn’t hard to understand–every dollar spent delivering the service is a dollar that can’t be spent to find new customers or improve the service.

So how do you lower your cost to deliver a SaaS service? 

At my last gig, Helpstream, we got our cost down to 5 cents per seat per year.  I’ve talked to a lot of SaaS folks and nobody I’ve yet met got even close.  In fact, they largely don’t believe me when I tell them what the figures were.  The camp that is willing to believe immediately wants to know how we did it.  That’s the subject of this “Learnings” blog post.  The formula is relatively complex, so I’ll break it down section by section, and I’ll apologize up front for the long post.

Attitude Matters:  Be Obsessed with Lowering Cost of Service

You get what you expect and inspect.  Never a truer thing said than in this case.  It was a deep-seated part of the Helpstream culture and strategy that Cost of Service had to be incredibly low.  So low that we could exist on an advertising model if we had to.  While we never did, a lot was invested in the critical up front time when it mattered to get the job done.  Does your organization have the religion about cutting service cost, or are there 5 or 6 other things that you consider more important?

Go Multi-tenant, and Probably go Coarse-grained Multi-tenant

Are you betting you can do SaaS well enough with a bunch of virtual machines, or did you build a multi-tenant architecture?  I’m skeptical about your chances if you are in the former camp unless your customers are very very big.  Even so, the peculiar requirements of very big customers (they will insist on doing things their way and you will cave) will drive your costs up.

Multi-tenancy lets you amortize a lot of costs so that they’re paid once and benefit a lot of customers.  It helps smooth loads so that as one customer has a peak load others probably don’t.  It clears the way to massive operations automation which is much harder in a virtual machine scenario.

Multi-tenancy comes in a lot of flavors.  For this discussion, let’s consider fine-grained versus coarse-grained.  Fine grain is the Salesforce model.  You put all the customers together in each table and use a field to extract them out again.  Lots of folks love that model, even to a religious degree that decrees only this model is true multi-tenancy.  I don’t agree.  Fine grained is less efficient.  Whoa!  Sacrilege!  But true, because you’re constantly doing the work of separating one tenant’s records from another.  Even if developers are protected from worrying about it by clever layering of code, it can’t help but require more machine resources to constantly sift records.

Coarse-grained means every customer gets their own database, but these many databases are all on the same instance of the database server.  This is the model we used at Helpstream.  It turns out that a relatively vanilla MySQL architecture can support thousands of tenants per server.  That’s plenty!  Moreover, it requires less machine resources and it scales better.  A thread associated with a tenant gets access to the one database right up front and can quit worrying about the other customers right then.  A server knows that the demands on a table only come from one customer and it can allocate cpus table by table.  Good stuff, relatively easy to build, and very efficient.

The one down side of coarse grain I have discovered is that its hard to analyze all the data across customers because it’s all in separate tables.  Perhaps the answer is a data warehouse constructed especially for the purpose of such analysis that’s fed from the individual tenant schemas.

Go Cloud and Get Out of the Datacenter Business

Helpstream ran in the Amazon Cloud using EC2, EBS, and S3.  We had help from OpSource because you can’t run mail servers in the Amazon Cloud–the IP’s are already largely black listed due to spammers using Amazon.  Hey, spammers want a low-cost of ops too!

Being able to spin up new servers and storage incrementally, nearly instantly (usually way less than 10 minutes for us to create a  new multi-tenant “pod”), and completely from a set of API’s radically cuts costs.  Knowing Amazon is dealing with a lot of the basics like the network infrastructure and replicating storage to multiple physical locations saves costs.  Not having to crawl around cages, unpack servers, or replace things that go bad is priceless. 

Don’t mess around.  Unless your application requires some very special hardware configuration that is unavailable from any Cloud, get out of the data center business.  This is especially true for small startups who can’t afford things like redundant data centers in multiple locations.  Unfortunately, it is a hard to impossible transition for large SaaS vendors that are already thoroughly embedded in their Ops infrastructure.  Larry Dignan wrote a great post capturing how Helpstream managed the transition to Amazon.

Build a Metadata-Driven Architecture

I failed to include this one in my first go-round because I took it for granted people build Metadata-driven architectures when they build Multi-tenancy.  But that’s only partially true, and a metadata-driven architecture is a very important thing to do.

Metadata literally means data about data.  For much of the Enterprise Software world, data is controlled by code, not data.  Want some custom fields?  Somebody has to go write some custom code to create and access the fields.  Want to change the look and feel of a page?  Go modify the HTML or AJAX directly.

Having all that custom code is anathema, because it can break, it has to be maintained, its brittle and inflexible, and it is expensive to create.  At Helpstream, we were metadata happy, and proud of it.  You could get on the web site and provision a new workspace in less than a minute–it was completely automated.  Upgrades for all customers were automated.  A tremendous amount of customization was available through configuration of our business rules platform.  Metadata gives your operations automation a potent place to tie in as well.

Open Source Only:  No License Fees!

I know of SaaS businesses that say over half their operating costs are Oracle licenses.  That stuff is expensive.  Not for us.  Helpstream had not a single license fee to pay anywhere.  Java, MySQL, Lucene, and a host of other components were there to do the job.

This mentality extends to using commodity hardware and Linux versus some fancy box and an OS that costs money too.  See for example Salesforce’s switch.

Automate Operations to Death

Whatever your Operations personnel do, let’s hope it is largely automating and not firefighting.  Target key areas of operational flexibility up front.  For us, this was system monitoring, upgrades, new workspace provisioning, and the flexibility to migrate workspaces (our name for a single tenant) to different pods (multi-tenant instances). 

Every time there is a fire to be fought, you have to ask several questions and potentially do more automation:

1.  Did the customer discover the problem and bring it to our attention?  If so, you need more monitoring.  You should always know before your customer does.

2.  Did you know immediately what the problem was, or did you have to do a lot of digging to diagnose?  If you had to do digging, you need to pump up your logging and diagnostics.  BTW, the most common Ops issue is, “Your service is too slow.”  This is painful to diagnose.  It is often an issue with the customer’s own network infrastructure for example.  Make sure to hit this one hard.  You need to know how many milliseconds were needed for each leg of the journey.  We didn’t finish this one, but were actively thinking of implementing capabilities like Google uses to tell with code at the client when a page seems slow.  Our pages all carried a comment that told how long it took at the server side.  By comparing that with a client side measure of time, we would’ve been able to tell whether it was “us” or “them” more easily.

3.  Did you have to perform a manual operation or write code to fix the problem?  If so, you need to automate whatever it was.

This all argues for the skillset needed by your Ops people, BTW.  It also argues to have Ops be a part of Engineering, because you can see how much impact there is on the product’s architecture.

Hit the Highlights of Efficient Architecture

Without going down the rathole of premature optimization, there is a lot of basic stuff that every architecture should have.  Thread pooling.  Good clean multi-threading that isn’t going to deadlock.  Idempotent operations and good use of transactions with rollback in the face of errors.  Idempotency means if the operation fails you can just do it again and everything will be okay.  Smart use of caching, but not too much caching.  How does your client respond to dropped connections?  How many round trips does the client require to do a high traffic page?

We used Java instead of one of the newer slower languages.  Sorry, didn’t mean to be pejorative, and I know this is a religious minefield, but we got value from Java’s innate performance.  PHP or Python are pretty cool, but I’m not sure they are what you want to squeeze every last drop of operations cost out of your system.  The LAMP stack is cheap up front, but SaaS is forever.

Carefully Match Architecture with SLA’s

The Enterprise Software and IT World is obsessed with things like failover.  Can I reach over and unplug this server and automatically failover to another server without the users ever noticing?  That’s the ideal.  But it may be a premature optimization for your particular application.  Donald Knuth says, “97% of the time: premature optimization is the root of all evil.” 

Ask yourself how much is enough?  We settled on 10 minutes with no data loss.  If our system crashed hard and had to be completely restarted, it was good enough if we could do that in less than 10 minutes and no loss of data during that time.  That meant no failover was required, which greatly simplified our architecture. 

To implement this, we ran a second MySQL replicated from the main instance and captured EBS backup snapshots from that second server.  This took the load of snapshotting off the main server and gave us a cheaper alternative to a true full failover.  If the main server died, it could be brought back up again in less than 10 minutes with the EBS volume mounted and away we would go.  The Amazon infrastructures makes this type of architecture easy to build and very successful.  Note that with coarse-grained multi-tenancy, one could even share the backup server across multiple multi-tenant instances.

Don’t Overlook the Tuning!

Tuning is probably the first thing you thought of with respect to cutting costs, right?  Developers love tuning.  It’s so satisfying to make a program run faster or scale better.  That’s probably because it is an abstract measure that doesn’t involve a customer growling about something that’s making them unhappy.

Tuning is important, but it is the last thing we did.  It was almost all MySQL tuning too.  Databases are somewhat the root of all evil in this kind of software, followed closely by networks and the Internet.  We owe a great debt of gratitude to the experts at Percona.  It doesn’t matter how smart you are, if the other guys already know the answer through experience, they win.  Percona has a LOT of experience here, folks.

Conclusion

Long-winded, I know.  Sorry about that, but you have to fit a lot of pieces together to really keep the costs down.  The good news is that a lot of these pieces (metadata-driven architecture, cloud computing, and so on) deliver benefits in all sorts of other ways besides lowering the cost to deliver the service.  Probably the thing I am most proud of about Helpstream was just how much software we delivered with very few developers.  We never had more than 5 while I was there.  Part of the reason for that is our architecture really was a “whole greater than the sum of its parts” sort of thing.  Of course a large part was also that these developers were absolute rock stars too!

Posted in cloud, data center, ec2, enterprise software, platforms, saas, software development | 5 Comments »

 
Follow

Get every new post delivered to your Inbox.

Join 324 other followers

%d bloggers like this: