SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the 'grid' Category


A Pile of Lamps Needs a Brain

Posted by smoothspan on October 28, 2007

Continuing the discussion of a Pile of Lamps (a clustered Lamp stack in more prosaic terms), Aloof Schipperke writes about how such a thing might manage its consumption of machines on a utility computing fabric:

Techniques for managing large sets of machines tend to be either highly centralized or highly decentralized. Centralized solutions tend to come from system administration circles as ways to cope with large quantities of machines. Decentralized solutions tend to come from the parallel computing space where algorithms are designed to take advantage of large quantities of machines.

Neither approach tends to provide much coupling between management actions and application conditions. Neither approach seems well adapted for any form of semi-intelligent dynamic configuration of multi-layer web application. Neither of them seem well suited for non-trivial quantities of loosely coupled LAMP stacks.

Aloof has been contemplating whether a better approach might be to have the machines converse amongst themselves in some way.  He envisions machines getting together when loads become too challenging and deciding to spawn another machine to take some of the load on.

Let’s drop back and consider this more generally.  First, we have a unique capability emerging in hosted utility grids.  These range from systems like Amazon’s Web Services to 3Tera’s ability to create grids at their hosting partners.  It started with the grid computing movement, which sought to use “spare” computers on demand, and has now become a full-blown, commercially available service.  Applications can order and provision a new server literally on 10 minutes notice, use it for a period of time, and then release the machine back to the pool, paying only for the time they’ve used.  This differs markedly from stories such as that of iLike, who had to drive around in a truck borrowing servers everywhere they could, and then physically connect them up.  Imagine how much easier it could have been to push a button and bring on the extra servers on 10 minutes notice as they were needed.

Second, we have the problem of how to manage such a system.  This is Aloof’s problem.  Just because we can provision a new machine on 10 minutes notice doesn’t mean a lot of other things:

  • It doesn’t mean our application is architected to take advantage of another machine. 
  • It doesn’t mean we can reconfigure our application to take advantage in 10 minutes.
  • It doesn’t mean we have a system in place that knows when it’s time to add a machine, or take one back off.
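The third bullet, knowing when to add a machine or take one back off, can be sketched as a simple threshold rule.  This is only an illustration under stated assumptions: the function name `desired_machine_count`, the 75% and 30% thresholds, and the choice of average utilization as the load measure are all hypothetical, not something taken from Aloof’s post or from any particular grid provider.

```python
def desired_machine_count(current: int, avg_utilization: float,
                          high: float = 0.75, low: float = 0.30,
                          min_machines: int = 1) -> int:
    """Return how many machines we would like, given average utilization (0.0 to 1.0).

    Hypothetical policy: ask for one more machine above `high`, release one
    below `low`. The gap between the two thresholds keeps the system from
    adding and removing the same machine over and over.
    """
    if avg_utilization > high:
        return current + 1
    if avg_utilization < low and current > min_machines:
        return current - 1
    return current
```

With four machines, a reading of 0.9 asks for a fifth, a reading of 0.1 releases one, and anything in between leaves the count alone.  A real system would also have to allow for the 10 minutes it takes a new machine to arrive.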

This requires another generation of thinking beyond what’s typically been implemented.  New variable cost infrastructure has to trickle down into fixed cost architectures.  For me, this sort of problem always boils down to finding the right granularity of “object” to think about.  Is the machine the object?  Whether or not it is, our software layers must take account of machines as objects because that’s how we pay for them.

So to attack this problem, we need to understand a collection of questions:

  1. What is to be our unit of scalability?  A machine?  A process?  A thread?  A component of some kind?  At some level, the unit has to map to a machine so we can properly allocate on a utility grid.
  2. How do we allocate activity to our scalability units?  Examples include load balancing and database partitioning.  Abstractly, we need some hashing function that selects the scalability unit to allocate work (data, compute crunching, web page serving, etc.) to.
  3. What is the mechanism to rebalance?  When a scalability unit reaches saturation by some measure, we must rebalance the system.  That means changing the hashing function in #2, and it means we need a mechanism to redistribute work without losing anything while the process is happening.  We also must understand how we measure saturation or load for our particular domain.
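Questions 2 and 3 can be made concrete with a small sketch.  I have swapped in consistent hashing here as one well-known way to pick a scalability unit and to limit how much work moves when a unit is added; it is my choice for illustration, not something the Pile of Lamps posts prescribe.  The class name `HashRing`, the unit names, and the `replicas` setting are all assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable hash so placement does not change between runs.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Maps work keys (user ids, page names, ...) onto scalability units."""

    def __init__(self, units, replicas: int = 100):
        self.replicas = replicas
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> unit name
        for unit in units:
            self.add_unit(unit)

    def add_unit(self, unit: str) -> None:
        # Each unit gets several points on the ring to even out the load.
        for i in range(self.replicas):
            point = _hash(f"{unit}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = unit

    def unit_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if we run off the end.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Rebalancing: add a fourth unit and see which keys change owner.
ring = HashRing(["lamp-1", "lamp-2", "lamp-3"])
keys = [f"user{i}" for i in range(1000)]
before = {k: ring.unit_for(k) for k in keys}
ring.add_unit("lamp-4")
after = {k: ring.unit_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is that every key in `moved` lands on the new unit and nothing shuffles between the old ones, so the redistribution step in #3 only has to copy data in one direction.  A plain modulo hash would be simpler but would reshuffle most keys every time the unit count changed.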

Let’s cast this back to the world of a Pile of Lamps.  A traditional Lamp stack scaling effort is going to view each component of the stack separately.  The web piece is separate from the data piece, so we have different answers for the 3 issues on each of the 2 tiers.  Pile of Lamps changes how we factor the problem.  If I understand the concept correctly, instead of independently scaling the two tiers, we will simply add more Lamp clusters, each of which is a quasi-independent system.

This means we have to add a #4 to the first 3.  It was implicit anyway:

    4.  How do the scaling units communicate when the resources needed to finish some work are not all present within the scaling unit?

Let’s say we’re using a Pile of Lamps to create a service like Twitter.  As long as the folks I’m following are on the same scaling unit as me, life is good.  But eventually, I will follow someone on another scaling unit.  If the Pile of Lamps is clever, it makes this transparent in some way.  If it can do that, the other three issues are at least things we can go about doing behind the scenes without bothering developers to handle it in their code.  If not, we’ll have to build a layer into our application code that makes it transparent for most of the rest of the code.
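Here is a rough sketch of what that transparency layer might look like for the Twitter-like example.  Everything in it is hypothetical: `ScalingUnit` stands in for one Lamp cluster, `Pile` is an invented routing layer, and the in-memory dictionaries stand in for each cluster’s database.

```python
from collections import defaultdict


class ScalingUnit:
    """Stand-in for one Lamp cluster holding the posts of its own users."""

    def __init__(self, name: str):
        self.name = name
        self.posts = defaultdict(list)  # user -> list of (timestamp, text)

    def add_post(self, user: str, timestamp: int, text: str) -> None:
        self.posts[user].append((timestamp, text))

    def posts_for(self, users):
        return [(ts, u, text) for u in users for ts, text in self.posts.get(u, [])]


class Pile:
    """Hypothetical routing layer that hides which unit a user lives on."""

    def __init__(self, units):
        self.units = {u.name: u for u in units}
        self.home = {}  # user -> unit name

    def place_user(self, user: str, unit_name: str) -> None:
        self.home[user] = unit_name

    def post(self, user: str, timestamp: int, text: str) -> None:
        self.units[self.home[user]].add_post(user, timestamp, text)

    def timeline(self, following):
        # Group followed users by home unit, ask each unit once, then merge.
        by_unit = defaultdict(list)
        for user in following:
            by_unit[self.home[user]].append(user)
        merged = []
        for unit_name, users in by_unit.items():
            merged.extend(self.units[unit_name].posts_for(users))
        return sorted(merged, reverse=True)  # newest first


pile = Pile([ScalingUnit("lamp-1"), ScalingUnit("lamp-2")])
pile.place_user("alice", "lamp-1")
pile.place_user("bob", "lamp-2")
pile.post("alice", 1, "hello")
pile.post("bob", 2, "hi from another unit")
feed = pile.timeline(["alice", "bob"])
```

The application code only ever calls `post` and `timeline`.  Whether the people I follow live on my scaling unit or on another one is the routing layer’s problem, which is the property question #4 is asking for.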

I think Aloof’s musings about whether #3 can be done as conversations between the machines will be clearer if the Pile of Lamps idea is mapped out more fully in terms of all 4 questions.

Posted in Web 2.0, grid, multicore, platforms, strategy

Interview With 3Tera’s Peter Nickolov and Bert Armijo, Part 3

Posted by smoothspan on October 26, 2007

Overview

3Tera is one of the new breed of utility computing services such as Amazon Web Services. If you missed Part 1 or Part 2 of the interview, they’re worth a read!

As always in these interviews, my remarks are parenthetical, any good ideas are those of the 3Tera folks, and any foolishness is my responsibility alone.

Utility Computing as a Business

You’ve sold 100 customers in just a year, what’s your Sales and Marketing secret?

3Tera:  We’re still in the early growth phase, our true hockey stick is yet to come, and we expect growth to accelerate.  Right now we’re focused on getting profitable.

We don’t have a secret, really.  We have a very good story to tell.  We’re attending lots of conferences, we’re buying AdWords, we’re getting the word out through bloggers like yourself, and we’re getting a lot of referrals from happy customers.

The truth is, the utility computing story is big.  People hear about Amazon and they start looking at it, and pretty soon they find us.  It’s going to get a lot bigger.  If you read their blogs, Jonathan Schwartz at Sun and Steve Ballmer at Microsoft are out talking to hosters.  Hosting used to be viewed as a lousy business, but the better hosters today are growing at 30-40% a year.  This is big news.

Bob:  (I think their growth in just a year has been remarkable for any company, and speaks highly to the excitement around these kinds of offerings.  Utility computing is the wave of the future, there is a ton of software moving into the clouds, and the economics of managing the infrastructure demand vendors take a look at offerings like 3Tera.  We’re only going to see this trend getting stronger.)

Tell us more about your business model

3Tera:  We offer both hosted (SaaS) and on-premises versions.  As we said, 80% choose the hosted option.  The other 20% are large enterprises that want to do things in their own data center.  British Telecom is an example of that.

We sell directly on behalf of our hosting providers, and there are also hosting providers that have reseller licenses.  Either way, the customer sees one bill from whoever sold them the grid.

Bob:  (This is quite an interesting hybrid business model.  Giving customers the option to take things on-premises is interesting, but even more interesting is how few actually take that approach:  just 20%, and those mostly larger enterprises.  It would make sense to me for a vendor looking to offer both models to draw a line that forces on-premises only for the largest deals anyway.  3Tera’s partnering model with the hosting providers is also quite interesting.)

How do you see the hosting and infrastructure business changing over time?

3Tera:  There are huge forces at work for centralization.  Today, if you are running less than 1000 servers, you should be hosting because you just can’t do it cost effectively yourself.  Over time, that number is going up due to a couple of factors.

First, there is starting to be a lot of regulation that affects data centers.  Europe is already there and the US is not far behind.  There are lots of rules surrounding privacy and data retention, for example.  If I take your picture to make a badge so you can visit, I have to ask your permission.  I have to follow regulations that dictate how long I can keep that picture on file before I dispose of it.  All of this is being expressed as certifications for data centers such as SAS-70.  There are other, more stringent standards out there and on the way.  The cost of adhering to these in your own data center is prohibitive.  Why do it if you can use a hosted data center that has already made the investment and gotten it done?

Second, there are simple physics.  More and more datacenters are a function of electricity.  That’s power for the machines and power for the cooling.  I talked to a smaller telco near here recently that was planning to do an upgrade to their datacenter.  This was not a new datacenter, just an upgrade, and not that big a data center by telco standards.

The upgrade involved needing an additional 10 megawatts of power.  The total budget was something like $100 million.  These are big numbers.  The amount of effort required to get approval for another 10 megawatts alone is staggering.  There are all kinds of regulations, EPA sign offs, and the like required.

Longer-term, once you remove the requirement for humans to touch the servers, it opens up possibilities.  Why do we put data centers in urban areas?  So people can touch their machines.  If people didn’t have to touch them, we’d put the data centers next to power plants.  We’d change the physical topology and cooling requirements to be much more efficient.

We want people to think of servers the way they think about fluorescent tubes in the office.  If a light goes out, you don’t start paging people and rushing around 24×7 to fix it.  You probably don’t fix it at all.  You wait until 6 or 8 are out and then you send someone around to do it all at once, so it’s cost effective.  Meanwhile, there is enough light available from other tubes so you can live without it.  It’s the same with servers once they’re part of a grid.

Conclusion

The changes in the industry mentioned at the end of the interview are quite interesting.  Legislation is not one I had heard about, but it makes total sense.  Power density is something I’d heard about from several sources including the blogosphere, but also more directly.  I met with one SaaS vendor’s Director of IT Operations who said the growth at their datacenter is extremely visible, and he mentioned they think about it in terms of backup power.  When the SaaS vendor first set up at the colo facility, it had 2 x 2 Megawatt backup generators.  The last time my friend was there that number had grown to 24 units generating about 50 megawatts of backup power.  For perspective, an average person in the US uses about 12,000 watts, so 50 megawatts is enough for a city of over 4,000 people.
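The city-size comparison is simple division, spelled out below using only the two figures from the paragraph above.  The variable names are mine.

```python
backup_power_watts = 50_000_000   # about 50 megawatts of backup power, from the post
watts_per_person = 12_000         # the per-person figure used in the post

people_supported = backup_power_watts / watts_per_person
print(round(people_supported))    # 4167
```

That is where “a city of over 4,000 people” comes from.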

Another fellow I had coffee with this morning runs all the product development and IT for a large, well-known consumer-focused company on the web.  He mentioned they now did all of their datacenter planning around power consumption, and had recently changed some architectures to reduce that consumption, even to the point of asking one of their hardware vendors to improve the machinery along those lines.

These kinds of trends are only going to lead to further increases in datacenter centralization and more computing moving into the cloud to increase efficiency, centralize management to make it cheaper, and load balance so fewer watts of energy need be consumed idling.

Posted in Web 2.0, data center, grid, platforms, saas

Interview With 3Tera’s Peter Nickolov and Bert Armijo, Part 2

Posted by smoothspan on October 24, 2007

Overview

3Tera is one of the new breed of utility computing services such as Amazon Web Services.  If you missed Part 1 of the interview, it’s worth a read!

As always in these interviews, my remarks are parenthetical, any good ideas are those of the 3Tera folks, and any foolishness is my responsibility alone.

What can a customer do with your system, and what does it do for them?

3Tera:  AppLogic converts applications, all of the load balancers and databases, firewalls and Apache servers, to metadata. This is done simply by drawing the topology of the application online, as if it were a whiteboard.  Once that’s done, your application becomes completely self-contained. It’s portable and scalable. In fact, it literally doesn’t exist as an app until you run it, at which point AppLogic instantiates it on the grid.
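Bob: (To make “converting an application to metadata” a little more tangible, here is a toy sketch of a topology described purely as data.  It is emphatically not AppLogic’s actual format or API, which the interview does not describe.  The names `Appliance` and `AppTopology`, the fields, and the cpu numbers are all invented for illustration.)

```python
from dataclasses import dataclass, field


@dataclass
class Appliance:
    name: str
    kind: str            # e.g. "load_balancer", "web", "database"
    cpus: float = 1.0


@dataclass
class AppTopology:
    """An application described purely as data: boxes and the wires between them."""
    name: str
    appliances: dict = field(default_factory=dict)   # name -> Appliance
    links: list = field(default_factory=list)        # (from_name, to_name)

    def add(self, appliance: Appliance) -> None:
        self.appliances[appliance.name] = appliance

    def connect(self, src: str, dst: str) -> None:
        if src not in self.appliances or dst not in self.appliances:
            raise KeyError("both ends of a link must already be in the topology")
        self.links.append((src, dst))

    def total_cpus(self) -> float:
        return sum(a.cpus for a in self.appliances.values())


# A hypothetical 2-tier LAMP application drawn as three boxes and two wires.
app = AppTopology("two-tier-lamp")
app.add(Appliance("lb", "load_balancer", cpus=0.5))
app.add(Appliance("web", "web", cpus=1.0))
app.add(Appliance("db", "database", cpus=1.0))
app.connect("lb", "web")
app.connect("web", "db")
```

Once the application is just a description like this, copying it, scaling its cpu numbers, or instantiating it somewhere else become operations on data rather than on machines.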

A 2-tier LAMP stack application can be set up in 5 to 10 minutes.  Once you’re done, you can operate your application as easily as you open a spreadsheet. Starting, stopping, backup and even copying are all single commands, even for applications running on hundreds of cpus. You can also scale resources up and down, from say 1 ½ cpus to 40 cpus, which covers a range of say 6 to 3000 users. You can make copies, or even move it to another data center. 

To make it easy to use existing software, we have what we call Virtual Appliances, which are a combination of virtual machine, virtual network and virtual storage that act together. You can install almost any Linux software in a virtual appliance and then manage the appliance as a unit, which is better than having to go machine by machine.

Applications are then created by simply dragging and dropping Virtual Appliances on an online whiteboard, and become a type of template.  We offer a bunch of these already configured for lots of common applications like the LAMP Stack, and there’s even one for hosting Facebook applications that somebody did.  Probably half our customers bring everything up with these pre-defined Virtual Appliance templates and they never change anything there, they just run.

In a couple of weeks we’ll introduce new functionality we call Dynamic Appliances, a sort of Infrastructure mash-up, that lets you package data center operations in the same way.  You can implement performance-based SLAs, offsite backups, and virtually any other policy.  Once added to an application, the app will then manage that function for itself, becoming more or less autonomous.

Our larger Enterprise customers have told us how hard it is (impossible really) to implement standard policy across several hundred applications, but we make it easy with Dynamic Appliances because you’re dealing with the needs of just the specific application.

The bottom line is we eliminate the need for human hands to touch the servers.  You can do it all remotely, and we make it possible to automate most things.  You can configure, instrument, and manage even the largest online services with a web browser.

Bob: (I’ve spoken to a number of people about the 3Tera system, and they all confirm how expensive and painful it is to set up and manage servers.  Jesse Robbins over on O’Reilly Radar recently wrote a post called “Operations is a competitive advantage” that talks about exactly what 3Tera offers.)

What kind of technology is behind these capabilities?

3Tera:  There’s quite a lot, as you can imagine.  Let’s take storage as just one example.  We decided up front that we needed to run entirely on commodity hardware because it keeps our customers’ costs low.  As a result, there are no hardware SANs or NAS storage; we work entirely off the local disks attached to the servers in the grid.

But, we also felt users needed redundancy and high-performance. The solution - we’ve written a virtual SAN that runs within the grid, controlling all the available storage out on the grid. Our volume manager runs on top of this and makes mirroring totally painless.  If you lose your disk or a server, access to data isn’t interrupted and we’ll mirror that same data again as well. 
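Bob: (Here is a toy model of the mirroring behavior just described: data stays readable when a server is lost and is then mirrored again onto a spare.  This is my own sketch, not 3Tera’s implementation, and the class `MirroredVolume`, its methods, and the server names are all hypothetical.)

```python
class MirroredVolume:
    """Toy model: every block is kept on the local disks of two different servers."""

    def __init__(self, servers, copies: int = 2):
        if len(servers) < copies:
            raise ValueError("need at least as many servers as copies")
        self.copies = copies
        self.disks = {s: {} for s in servers}   # server -> {block_id: data}
        self.mirrors = list(servers[:copies])   # servers currently holding the volume

    def write(self, block_id: int, data: bytes) -> None:
        # Every write goes to all current mirrors.
        for server in self.mirrors:
            self.disks[server][block_id] = data

    def read(self, block_id: int) -> bytes:
        # Any surviving mirror can serve the read.
        for server in self.mirrors:
            if block_id in self.disks[server]:
                return self.disks[server][block_id]
        raise KeyError(block_id)

    def fail_server(self, dead: str) -> None:
        """Drop a server, then copy the data to a spare so we are mirrored again."""
        self.disks.pop(dead, None)
        if dead not in self.mirrors:
            return
        self.mirrors.remove(dead)
        spares = [s for s in self.disks if s not in self.mirrors]
        if spares and self.mirrors:
            survivor, spare = self.mirrors[0], spares[0]
            self.disks[spare].update(self.disks[survivor])
            self.mirrors.append(spare)


vol = MirroredVolume(["s1", "s2", "s3"])
vol.write(0, b"orders table")
vol.fail_server("s1")          # lose one of the two mirrors
data = vol.read(0)             # still readable from the survivor
```

After the failure the volume is back to two copies, on `s2` and `s3`, without the application having noticed.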

People complain that Amazon has no persistent storage, but Amazon didn’t need to have persistence for their application, so you can’t blame them.  If you choose the same architecture as them, it works great, but if not, you have a lot more work to do.  The trouble is, all the apps we see need persistence for their databases, so we gave it to them.  We’re offering infrastructure that matches the common architectures everyone uses.

The other important foundation for the service is our component assembly technology. This is what allows AppLogic to capture infrastructure definitions using the whiteboard metaphor and package the application. More importantly, it’s what allows AppLogic to then convert that into running applications on the grid.

Bob: (I thought the virtual SAN was very cool.  It will be interesting to see how Amazon addresses the persistence problem.  There are several companies working on a solution for MySQL, but I suspect Amazon has their own internal efforts as well.  OTOH, Amazon’s S3 has a lot of fault tolerance that seems to go a bit beyond 3Tera’s out of box capabilities.  The truth is that a finished complex application will require a number of different capabilities in this area ranging from the immediate problem of keeping the database up to the problems of making sure there are good off-site backups and replication.)

Who are your competitors?

There really isn’t much of anyone unless you want to think of Amazon as a competitor.  We don’t, though, because most users are coming from colocation or managed service providers today.

Challenges for Utility Computing

What concerns do your prospects have when you first meet with them?

3Tera:  Their first concern is how long it will take to learn the system.  It turns out it’s really easy, and most users are up and running in a couple of weeks.  Just to make sure customers know we’re going to make them successful, we put together a program we call the Assured Success Plan.  It’s designed to take a customer from zero to full production in 2 to 4 weeks.  We charge $300/month for it.

Customers who sign up for the Assured Success Plan get a 1:1 relationship with an assigned 3Tera engineer.  They communicate via WebEx and teleconference.  Their first session is an orientation, and their homework is to just install some app on a grid.  The engineer and customer choose which app.

The second session, they go over the homework, and then they start talking about how to fit the customer’s app onto a grid.  By the third session, they’re ready to try to install the app.  The customer is asked to make a first go of it, and then the 3Tera engineer goes over how the customer did it and gives feedback on how to do better.

The fourth session is really interesting.  Here the customer tests failures and learns to deal with them.  It’s really easy with our management tools to simulate a failure anywhere in the grid.  So the customer practices failure management and recovery.  Most customers never get a chance to thoroughly test these failures when building out with physical servers because the time and cost is prohibitive, so we’re adding a lot of value going through the exercise.  The customer, meanwhile, is getting comfortable that they can manage the system when the chips are down.

Bob: (The assurance program sounds like an excellent and very practical introduction and training for the system.  This is classic SaaS.  Why do a proof of concept if you can be up and running in the same time?)

Do customers stay paying for ongoing service?  How do you do Tech Support?

We made the Assured Success Plan cheap enough that we think customers will like to keep it running. After the initial consultations, it converts to pre-paid support.  At the same time, we offer support in units of 5 hours for $300.  Most customers buy at least a unit of support as part of their contract, and we make it easy for them to flex that up if they’re having an unusual amount of trouble.

What about costs?  How do customers get comfortable with what you cost?

First, understand, in many cases they can’t do it at all without a service like ours.  Our service is an enabler.  We make it possible to fund a data center pay-as-you-go on a credit card.  Even in Enterprises, we see a lot of customers go hosted, label it as a “proof of concept” to keep IT happy, but once things are running, they never go and move it to their own data center.

Second, our hosting provider partners are brutally efficient at cutting costs and passing those savings along.  They know they have to be competitive and they’re good at it.  We add a surcharge on top of that, but it’s offset by your savings in management overhead.  You don’t have to buy or write your own infrastructure software, and you can manage your grid with far fewer people.  One person part-time can run 100 servers easily.  A full time person could run probably 500 servers.  Do the math on what a person costs and you’re net/net way ahead.

Our most popular configuration is an 8 server.  It costs about $5000/month as a grid.  If you shopped very carefully, you might save 30% on that.  However, you’d have none of the management advantages we offer.  You’d spend that difference very rapidly either developing infrastructure to automate management with scripts, or hiring people to do it by hand.
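Bob: (“Do the math” is easy enough to take literally.  The grid price and the 30% figure below come from the interview.  The administrator cost is purely my assumption for illustration, so plug in your own number.)

```python
grid_cost_per_month = 5000        # 8-server grid, from the interview
possible_savings_percent = 30     # "you might save 30% on that"

savings_per_month = grid_cost_per_month * possible_savings_percent // 100
savings_per_year = savings_per_month * 12

# ASSUMPTION: fully loaded annual cost of one administrator (hypothetical figure).
admin_cost_per_year = 100_000

# What fraction of one administrator the hardware savings would pay for.
admin_fraction_covered = savings_per_year / admin_cost_per_year
```

Shopping carefully saves $1,500 a month, or $18,000 a year.  Under my assumed administrator cost, that pays for less than a fifth of one person, which is the comparison 3Tera is inviting.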

Bob:  (I asked one Director of Operations for a SaaS vendor what they thought of a 30% markup for this kind of service.  He laughed and said they would make that back in productivity savings so quickly that 30% would never be an issue to them.)

What’s a nightmare story you helped a customer with?

Imagine a largish bank.  What do you suppose their procedure and policy is to release code?  They need 1000 signatures over 9 months!  Most of the signatures involve internal manual processes.  They’d developed numerous layers of management to ensure stability and now they can’t get it done internally any other way. Because of this, many projects simply couldn’t get started. The cost wasn’t justifiable.  AppLogic eliminates many of those manual processes because the application is packaged as a unit. No one needs to touch the servers, the firewalls, or the load balancers. Plus, test applications can run in a hosted configuration, outside the corporate data center where there’s no interference with production systems. When they’re completed, they can be migrated inside with a single command. Thus, many of the issues with test deployments just don’t apply.

So this customer was able to move their project, which was dead in the water internally, into PoC. 

Next Installment

Next installment we’ll wrap up with a discussion of SaaS Sales and Marketing and Business Model thoughts from 3Tera.  Be sure to click the “subscribe” link at the top left of the blog page so you don’t miss out on future posts!

Posted in Web 2.0, data center, grid, platforms, saas

Interview With 3Tera’s Peter Nickolov and Bert Armijo, Part 1

Posted by smoothspan on October 22, 2007

Overview

3Tera is one of the new breed of utility computing services such as Amazon Web Services.  I recently had dinner with Peter Nickolov, COO and CTO, and Bert Armijo, SVP of Sales and Product Management to hear their story.  I’ve been watching utility computing closely, and believe it is the wave of the future for hosting.  Check out the 3Tera web site and at least get a look at the screen shots of their visual data center design and management tool, and I think you’ll agree it’s pretty exciting.

As always in these interviews, my remarks are parenthetical, any good ideas are those of the 3Tera folks, and any foolishness is my responsibility alone.

Introduction to 3Tera

What’s your basic elevator pitch to a customer?

3Tera:  We offer Google-like infrastructure for everyone else.  Running grids to power utility computing lets you run the standard infrastructure software you are used to, including the LAMP stack, Oracle RAC, and most anything else, in a virtual datacenter together with the tools to make it very easy to manage that datacenter.

Bob: (As 3Tera’s site says, “If servers are cheap and open source is free, why does it cost so much?”  The answer is you have to manage those servers.  If this is done manually, it is horrendously expensive, and eventually becomes impossible with enough servers.  The alternative is creating automated infrastructure, which is also extremely difficult.  3Tera brings you a “virtual datacenter” where that infrastructure is built-in for a modest cost on top of your normal hosting fees.)

How is your product sold?  By seat/month?  Other metric?

3Tera:  We sell by the server since our software is installed on the server to create the virtualization.  We sell both hosted and on-premises depending on where you want to put your servers.  80% of our customers go for hosted, but the price is roughly the same either way. 

How many customers have you sold?

3Tera:  In our first year since being live, we’ve sold over 100 customers.  90% of them are in production today, and 10% are doing development and proof of concept work.

What kinds of customers come to you?

3Tera:  Most of our customers are doing SaaS and Web 2.0 projects, and there are tons of them out there. They come in all sizes, from very small startups all the way up to Enterprises, including British Telecom.

Bob:  (I found the rapid adoption of 3Tera to be nothing short of astounding.  Since the interview, I have mentioned 3Tera to everyone I’ve come into contact with who would be a potential customer and the responses made it clear how 3Tera has grown so quickly.  There is a burning need for any mechanism that simplifies hosting and data center operations and makes it possible to grow in a utility computing pay-as-you-go way.  Other than Amazon’s Web Service, which everyone I know is watching very closely, and OpSource which one person mentioned, most were not aware of such a service that is ready today.)

What stage is 3Tera at?

3Tera:  We were founded in August, 2004.  We started Beta test in March, 2006, and our oldest customer (International News Media) went live during that Beta, so we’ve had live customers for about 18 months now.  We brought the service out of Beta and launched it in September of 2006, and now AppLogic is at release 2.2 with around 100 production applications deployed.

We are headquartered in Southern California, but we also have offices in Bulgaria, Israel, the Bay Area, and Canada.  Bulgaria and Israel are developers, we have Tech Support in Canada, and the Bay Area is primarily a sales office.

We never would have thought of doing multiple locations in the past for a startup, but times have changed, communication is much better, and you go where the talent is. 

We live on Skype and WebEx!

How were you capitalized?

3Tera:  The company is funded about half and half by management and angels.  We had successes in other startups that let us fund it in this way.  We have no professional institutional investors at this stage.

How did you come up with the idea?

3Tera:  It was a need that we saw while discussing how folks scale online services.  After a while it became clear much of the difficulty occurred because these distributed systems lacked a mechanism to unify the lifecycle of the application.

There really isn’t much new under the sun.  Mainframes were doing virtualization a long time ago, and I guess you could say we’re the modern, visual equivalent of JCL, LOL.

What is Utility Grid Computing and What Does 3Tera Offer?

What fundamental problem are you really solving here, hosting?

3Tera:  No, we’re not a hosting company. Rather we allow access to the inventory of hardware that hosting companies operate. 

Hosting providers are brutally efficient at operating hardware.  A small hoster has maybe 2000-3000 machines, and large hosters can have 50,000 or more.  Their volume allows them to drive out every last penny and deliver you a server at the lowest possible cost. The problem has been that while this was great for standard web hosting there was no way to use that resource to power large scalable services. 

Take a small web startup.  Two or three guys can take the LAMP stack and write a cool piece of software.  Then they have to host it on something. They can start out on a single server, but pretty soon they’ll need to add more. Then they’ll move to colo and before they know it they’ll find they don’t have any time left to write software.  They spend nearly all of it chasing around the machines and keeping them running. 

Even a well-funded startup will find it ridiculously hard and expensive to manage, say, 100 machines.  You can’t get past 100 servers writing scripts and hiring warm bodies.  Machines are failing constantly due to MTBF (Mean Time Between Failures).  Things go into gridlock. Large internet companies have found the way to get beyond 100 machines; they write their own infrastructure software to manage the process.  You have to automate failover, backups, and all the hundreds of other processes involved in keeping the thing going.
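Bob: (A quick back-of-the-envelope sketch shows why failures feel constant at 100 machines.  The formula assumes independent machines with a constant failure rate, and the 30,000 hour MTBF is a number I made up for illustration, not one 3Tera quoted.)

```python
HOURS_PER_YEAR = 24 * 365   # 8760


def expected_failures_per_year(machines: int, mtbf_hours: float) -> float:
    """Expected failures per year for a fleet.

    Assumes independent machines with a constant failure rate, so the
    fleet-wide rate is simply machines / MTBF failures per hour.
    """
    return machines * HOURS_PER_YEAR / mtbf_hours


# ASSUMPTION: 30,000 hours is a hypothetical per-machine MTBF.
failures = expected_failures_per_year(100, 30_000)
days_between_failures = 365 / failures
```

Under those assumptions a 100-machine shop sees about 29 failures a year, or one roughly every 12 to 13 days, and the interval shrinks in proportion as the fleet grows.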

The beauty of 3Tera is we provide all of that in a world-class easy to use web-based system, but you don’t have to be at 100 servers to take advantage.  The three guys in the web startup can take advantage right away and it grows with them. 

Bob: (The bottom line is 3Tera accelerates the evolution of a web company.  We’ve all read countless tales of what happens when these companies are “TechCrunched” and have to scramble.  You’ve probably heard the iLike story about driving around the Bay Area borrowing servers to keep up with demand after launching their Facebook app.  We’ve seen companies get large only to have service levels fall off rapidly.  Few companies successfully manage to create the kind of automation and virtualization capabilities 3Tera offers up front.)

Do you have a real story about how that worked?

3Tera:  Sure.  We had a company go live on our system very quickly.  Then one day, 3 months later, the System Administrator quit.  The CTO called us up and said, “I understand we’re running something called 3Tera, I lost my SysAdmin, can you help me find someone experienced on your system to hire?”  So we suggested he take the Grid University training so he could manage the system while he looked to hire.  He was skeptical, but took a couple of the online courses.  We called him back 3 weeks later to ask how he was doing, and he told us it had been so easy to manage the servers that he was just doing it himself and saw no need to hire a SysAdmin.

Bob: (The 3Tera guys assure me one person can run 100 servers in their spare time with the system.  When you look at what IT people cost, these people costs are nearly always the largest cost item in datacenter operations.  I asked the Director of Operations for one SaaS vendor what they thought those people costs were relative to hardware.  In his mind, people were perhaps 60% of the overall costs.  A system that can drastically reduce the people costs, as well as providing a better service could really make the difference for a lot of datacenters.)

Next Installment

We got the basics on what 3Tera is.  For the next installment, I want to drill down deeper on the service, understand some of the technology they’ve built, and start to see this from the customer’s eyes.  Be sure to click the “subscribe” link at the top left of the blog page so you don’t miss out on these future posts!
 

Posted in Web 2.0, data center, grid, platforms, saas, strategy | 5 Comments »

Pile O’ LAMPs: What Would Fielding Say?

Posted by smoothspan on October 21, 2007

I’ve been pondering the Pile O’ Lamps concept that I first read about in Aloof Architecture and Process Perfection.  Read the posts yourself for the horse’s mouth, but to me, the Pile O’ Lamps concept is basically asking whether a computing grid of LAMP stacks is a worthwhile architectural construct that could be highly reusable for a variety of applications.  I say grid, because in my mind, it achieves maximal potential if deployed flexibly on a utility computing fabric such as Amazon EC2 where it can automatically flex to a larger cluster based on load requirements.  If it is fixed in size by configuration (which still means changeable, just not as quickly and automatically), I guess it would be more proper to call it a LAMP cluster.

LAMP refers to Linux as the OS, Apache as the web server, mySQL as the database, and a “P” language (usually PHP, Perl, or Python) as the language used to implement the application.  It has become almost ubiquitous as a superfast way to bring up a new web application.  There are some shortcomings, but by and large, it remains one of the simplest ways to get the job done and still have the thing continue to work if you move into the big time.  A Pile of Lamps architecture would presumably simplify scaling by building it in at the outset rather than trying to tack it on later.

In general, I love the idea.  People are effectively doing what it calls for all the time anyway, they just do so in an ad hoc manner.  I got ambitious this Sunday morning and thought I’d drag out Fielding’s Dissertation and see how the idea stacks up.  If you’ve never had a look at Roy Fielding’s Architectural Styles and the Design of Network-Based Software Architectures, you missed out on a beautiful piece of work from one of the principal authors of the HTTP specification.  This particular document sets forth the REST (Representational State Transfer) architecture.  What’s cool about it is that Fielding has a framework that he uses to evaluate the various components of REST that is applicable to a lot of other network architecture problems.  See Chapter 3 of the Dissertation for details, but that is my favorite part of the document.

His concept is to create a scorecard for various network architectural components, and then use that scorecard together with the domain requirements of the design problem to arrive at an optimal architecture.  He says that’s how he got to REST, and it certainly seems to make sense as you read the Dissertation.  Here is a rendition of his ranking criteria for the models he considers:

Fielding Framework

A “0” means the architectural style is beneficial to some domains and not others.  Positive means the style has benefit and negative means it is a poorer choice.

The components that make up REST look like this:

RESTful according to Fielding

There are 3 components that go into it:

  • Layered Cached Stateless Client Server:  The row marked LCS+C$SS
  • Uniform Interface, which isn’t in the original Fielding taxonomy, but which he says adds the qualities listed.
  • Code on Demand:  This is the ability of the web to send code to the client for execution based on what it requests.  So, for example, Flash or AJAX.

The “RESTful Result” is simply the total of the other attributes.  You can see it hits pretty darned well on most of the categories with the exception of Network efficiency.  As noted, this primarily means it isn’t suited to extremely fine grained communication, but is fine for a web page.  Pretty cool framework, eh?
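To make the bookkeeping concrete, here is a minimal Python sketch of how a scorecard like this can be totaled.  The property names loosely follow the kinds of criteria Fielding uses, but the component names and every score below are made-up placeholders for illustration, not the values from his tables.

```python
from collections import Counter

# Hypothetical scores: +1 = benefit, -1 = liability, missing = neutral.
# These numbers are placeholders for illustration, NOT Fielding's values.
COMPONENT_SCORES = {
    "component_a": {"scalability": 1, "simplicity": 1, "network_efficiency": -1},
    "component_b": {"simplicity": 1, "visibility": 1},
    "component_c": {"network_efficiency": 1, "visibility": -1},
}


def combine(components, scores=COMPONENT_SCORES):
    """Total the per-property scores of the chosen components."""
    total = Counter()
    for name in components:
        total.update(scores[name])  # update() adds, and keeps zero/negative totals
    return dict(total)


if __name__ == "__main__":
    result = combine(["component_a", "component_b", "component_c"])
    for prop, score in sorted(result.items()):
        print(f"{prop:20s} {score:+d}")
```

Swapping components in and out of the list passed to `combine` is the code equivalent of adding or removing a row from the scorecard and re-reading the totals.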

Incidentally, Fielding’s framework really dumps on CORBA for all the right reasons.  Give it a read to see why.

Now let’s look at the Pile of Lamps.  Note that we aren’t trying to compare it to REST–they solve different problems.  Fielding tells us to do the analysis based on our domain, so put aside the RESTful scores, they aren’t meaningful to compare to anything but REST competitors.  Here is the result for Pile of Lamps:

Pile of Lamps

I view the LAMP stack as Layered Client Server, which is already a decent architectural style.  A Pile of Lamps, it seems to me, basically adds a cached and replicated capability to the LAMP stack, so I add the cached/replicated repository to the equation.  You can see that it amplifies the LAMP stack while taking nothing away.  Basically, it makes it more efficient, more scalable, and it delivers those benefits in a simple way.  This makes total sense to me, given the concept.

One can use the framework to fiddle with other potential additions to the Pile of Lamps idea.  For example, what if statelessness were pervasive in this paradigm?  I leave further refinement of the idea to readers, commenters, and the original authors, but it looks promising to me.  I’d also encourage others to delve into Fielding’s work.  It has application well beyond just describing REST.

Related Articles

A Pile of Lamps Needs a Brain

Posted in Open Source, Web 2.0, grid, multicore, platforms, software development, strategy | 3 Comments »

Amazon Beefs Up EC2 With New Options

Posted by smoothspan on October 16, 2007

I’ve been a big fan of Amazon’s Web Services for quite a while and attended their Startup Project, which is an afternoon seeing what it can do and hearing from entrepreneurs who’ve built on this utility computing fabric.  Read my writeup on the Startup Project for more.  Amazon has been steadily rolling out improvements, such as the addition of SLA’s for the S3 storage service.  Today, there is big news in the Amazon EC2 camp:

Amazon has just announced two new instance types for their EC2 utility computing service.  The original type will continue to be available as the “small” type.  The “large” type has four times the CPU, RAM, and Disk Storage, while the “extra large” has eight times the CPU, RAM, and Disk.  The large and extra large also sport 64 bit cpus.  Supersize your EC2!

Why do this?  Because the original small instance was a tad lightweight for database activity with just 1.7GB of RAM while the extra large at 15GB is about right.  Imagine a cluster of the extra large instances running memcached and you can see how this is going to dramatically improve the possibilities for hosting large sites.

One of the neat things about this new announcement is pricing.  They’ve basically linearly scaled pricing.  Whereas a small instance costs 10 cents per instance hour, the extra large has 8x the capacity and costs 8×10 cents or 80 cents per hour.
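The arithmetic is simple enough to sketch in a few lines of Python.  The 10 cent price and the 1x/4x/8x capacity multiples come from the announcement as described above; the price for the large instance is not quoted here, so the 40 cent figure the code produces is just what strictly linear scaling would imply.

```python
SMALL_PRICE_CENTS = 10  # per instance hour for the small type

# Capacity multiples relative to the small instance.
CAPACITY_MULTIPLE = {"small": 1, "large": 4, "extra large": 8}


def hourly_price_cents(instance_type):
    """Price under strictly linear scaling (an assumption for 'large')."""
    return SMALL_PRICE_CENTS * CAPACITY_MULTIPLE[instance_type]


def cents_per_capacity_unit(instance_type):
    """Cost of one 'small-instance worth' of capacity per hour."""
    return hourly_price_cents(instance_type) / CAPACITY_MULTIPLE[instance_type]


if __name__ == "__main__":
    for name in CAPACITY_MULTIPLE:
        print(name, hourly_price_cents(name), cents_per_capacity_unit(name))
```

With linear pricing the cost per unit of capacity is the same for every type, so the choice between one extra large and eight smalls comes down to architecture rather than price.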

What’s next?  These new instances open a lot of possibilities, but Amazon still doesn’t have painless persistence for databases like mySQL.  If you are running mySQL on an extra large instance and the server goes down for whatever reason, all the data on it is lost and you have to rebuild a new machine around some form of hot backup or failover.  That exercise has been left to the user.  It’s doable: you have to solve the problem in any data center of what you plan to do if the disk totally crashes and no data can be recovered.  However, folks have been vocally requesting a better solution from Amazon where the data doesn’t go away and the machine can be rebooted intact.  I was told by the EC2 folks at the Startup Project to expect 3 announcements before the end of the year that were related.  I’m guessing this is the first such announcement and two more will follow. 

There’s tremendous excitement right now around these kinds of offerings.  They virtualize the data center to reduce the cost and complexity of setting up the infrastructure to do web software.  They allow you to flex capacity up or down and pay as you go.  Amazon is not the only such option.  I’ll be reporting on some others shortly.  It’s hard to see how it makes sense to build your own data center without the aid of one of these services any more. 

Posted in Web 2.0, amazon, ec2, grid, multicore, platforms, saas, software development | 2 Comments »

To Escape the Multicore Crisis, Go Out Not Up

Posted by smoothspan on September 29, 2007

Of course, you should never go up in a burning building; go out instead.  Amazon’s Werner Vogels sees the Multicore Crisis in much the same way:

Only focusing on 50X just gives you faster Elephants, not the revolutionary new breeds of animals that can serve us better.

Vogels is writing there about Michael Stonebraker’s claims that he can demonstrate a database architecture that outperforms conventional databases by a factor of 50X.  Stonebraker is no one to take lightly: he’s accomplished a lot of innovation in his career so far and he isn’t nearly done.  He advocates replacing the Oracle (and mySQL) style databases (which he calls legacy databases) with a collection of special purpose databases that are optimized for particular tasks such as OLTP or data warehousing.  It’s not unlike the concept that I and others have talked about, which suggests that the one-language-fits-all paradigm is all wrong and you’d do better to adopt polyglot programming.

I like Stonebraker’s work.  While I want the ability to scale out to any level that Vogels suggests, I will take the 50X improvement as a basic building block and then scale that out if I can.  That’s a significant scaling factor even looked at in the terms of the Multicore Language Timetable.  It’s roughly 8 years of Moore’s Cycles.  I’m also mindful that databases are the doorway to the I/O side of the equation which is often a lot harder to scale out.  Backing an engine that’s 50X faster sucking the bits off the disk with memcached ought to lead to some pretty amazing performance.
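How many years a 50X speedup is “worth” depends on the doubling period you assume.  Here is a back-of-the-envelope check in Python, assuming performance doubles every 18 months; that period is an assumption here, and a 24-month period gives a larger number.

```python
import math


def moore_years(speedup, months_per_doubling=18):
    """Years of steady doubling needed to reach the given speedup."""
    doublings = math.log2(speedup)  # 50X is a bit under six doublings
    return doublings * months_per_doubling / 12


if __name__ == "__main__":
    print(round(moore_years(50), 1))                          # 18-month doubling
    print(round(moore_years(50, months_per_doubling=24), 1))  # 24-month doubling
```

Under the 18-month assumption this comes out to about eight and a half years; under 24 months it is a little over eleven.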

But Vogels is right, in the long term we need to see different beasts than the elephants.  It was with that thought in mind that I’ve been reading with interest articles about Sequoia, an open source database clustering technology that makes a collection of database servers look like one more powerful server.  It can be used to increase performance and reliability.  It’s worth noting that Sequoia can be installed for any Java app using JDBC without modifying the app.  Their clever monicker for their technology is RAIDb:  Redundant Array of Inexpensive Databases.  There are different levels of RAIDb just as there are RAID levels that allow for partitioning, mirroring, and replication.  The choice of level or combinations of levels governs whether your application gets more performance, more reliability, or both.
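To give a feel for the mirroring idea, here is a toy Python sketch: writes go to every backend and reads rotate across them.  This is not Sequoia’s API or implementation; the `MirroredDatabases` class is a hypothetical name, and in-memory SQLite connections stand in for separate database servers.

```python
import itertools
import sqlite3


class MirroredDatabases:
    """Toy mirroring controller: write to every backend, read from one."""

    def __init__(self, backend_count):
        # Each in-memory SQLite connection stands in for a separate server.
        self.backends = [sqlite3.connect(":memory:") for _ in range(backend_count)]
        self._next_reader = itertools.cycle(self.backends)

    def write(self, sql, params=()):
        # Mirroring: every backend applies the same change.
        for db in self.backends:
            db.execute(sql, params)
            db.commit()

    def read(self, sql, params=()):
        # Reads rotate across backends, spreading the query load.
        db = next(self._next_reader)
        return db.execute(sql, params).fetchall()


if __name__ == "__main__":
    cluster = MirroredDatabases(backend_count=3)
    cluster.write("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    cluster.write("INSERT INTO users (name) VALUES (?)", ("alice",))
    for _ in range(3):  # each read is served by a different backend
        print(cluster.read("SELECT id, name FROM users"))
```

The trade-off shows up even in the toy: reads get cheaper as you add backends, while every write costs more because it has to land everywhere.  A partitioning level would make the opposite trade.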

Sequoia is not a panacea, but for some types of benchmarks such as TPC-W, it shows a nearly linear speedup as more cpus are added.  It seems likely a combination of approaches such as Stonebraker’s specialized databases for particular niches and clustering approaches like Sequoia all running on a utility computing fabric such as Amazon’s EC2 will finally break the multicore logjam for databases.

Posted in Open Source, amazon, ec2, grid, multicore, platforms, software development | 3 Comments »

Who Doesn’t Love Java? (You’d Be Surprised! -and- Part 2 of the Tool/Platform Rants)

Posted by smoothspan on September 17, 2007

When Sun’s Jonathan Schwartz announced that he was changing Sun’s ticker symbol to JAVA, I wasn’t surprised that a lot of folks saw it as a silly move (nor surprised to see Jonathan being defensive).  Now don’t get me wrong:  I like Java.  My last gig involved prodigious amounts of gnarly Java code in a highly scalable Enterprise grid computing application.  The thing is, I have a problem with the idea that a single language can be all things to all people.   

In addition to Java, I’ve used a lot of other languages, and I thought it would be interesting to see who else does too:

Google:  Created a massive foundation for their massively scalable search engine in C++.  The foundation includes MapReduce (a way to employ thousands of CPU’s and avoid the Multicore Crisis), BigTable (their substitute for a database), and the Google File System.  It all runs on a few hundred thousand Linux commodity boxes.  To be sure, Google has grown so large they now employ some Java, but C++ is the big enchilada.

Yahoo:  Pushed into PHP back in 2002 in order to improve productivity and quit “reinventing the wheel”.  Translation:  They didn’t want 70% of their coding to be wasted.  Their presentation on this decision is quite interesting.

YouTube:  Written in Python, a hip, relatively recent scripting language.  Google also makes extensive use of Python.

Facebook:  The clever fellows at Facebook are definitely technologists (maybe that’s why they got a platform together ahead of their peers) and built a fancy RPC (remote procedure call) core technology that lets them use almost any language they want.  Shades of Google core tech commitment, eh?  When they talk about what languages are used primarily, Java lands in last place.  The pecking order is PHP, C++, Perl, Python, and Java.

MySpace:  Built on the Microsoft .NET Stack, so of course no Java there.  Are these boys gluttons for punishment or what?  Yet it proves that it can be done and that it will scale.

Digg:  Digg is built with the PHP scripting language, which is the “P” (or at least one possible “P”) in those LAMP stacks you hear so much about.

Wikipedia:  Like Digg, Wikipedia was also built on PHP.

Amazon:  I was surprised and intrigued to learn that Amazon is language agnostic.  Like Google and Facebook, they’ve invested in some core technology that lets a lot of languages play together freely.  Werner Vogels goes on to say, “Developers are like artists; they produce their best work if they have the freedom to do so, but they need good tools.” 

Flickr:  Everyone’s favorite photo sharing service relies largely on PHP and Perl, with one twitchy systems programming service written in Java. 

Croquet:  A massively multiplayer game environment done in Smalltalk.  Who woulda thunk it?

There are many more, but you get the point:  Some, if not the vast majority, of the web’s biggest movers and shakers have decided not to make Java their sole language and others have excluded it entirely!  Sure, the immediate reaction is that there will always be some C++ zealots who prefer their language to Java, but that doesn’t change the list all that much.  What we do see are a lot of “P” languages:  Python, PHP, and Perl.  What’s up with that?

Recall Part 1 of the Tool/Platform Rant Series, which talked about how 70% of the Software You Build is Wasted?  These hugely successful web properties have taken steps to reduce that 70% waste, and while there are many factors that contributed to their success, I believe a desire for increased productivity played a significant role.  The days when lower-level languages like Java and C++ were essential for performance are coming to an end too, courtesy of the Multicore Crisis.

There are other ways to skin this cat of over-reliance on a language that’s too low level without giving it up entirely.  Who doesn’t feel like technology gave Google a tremendous edge?  One can argue that C++ isn’t really the language of Google, rather, MapReduce, BigTable, and the Google File System are their language.  C++ is just the assembly code used to write the modules that these other platforms mash up.  In fact, it makes sense to think that C, Java, and C++ are all just portable assembly languages.  By doing this, Google has been able to focus a much greater proportion of its resources on creating a proprietary edge for itself.  So have all the others on the list.

It gets more radical:  Amazon doesn’t care what language its developers use.  According to Werner Vogels:

I think part of the chaotic nature—the emerging nature—of Amazon’s platform is that there are many tools available, and we try not to impose too many constraints on our engineers. We provide incentives for some things, such as integration with the monitoring system and other infrastructure tools. But for the rest, we allow teams to function as independently as possible.

You have to ask yourself in light of all this, when are the Curly Braced Languages most and least appropriate?  The decision criteria are shockingly stark:

  • You Need a Curly Braced Language When You Are Building Something Totally Proprietary That’s Critical to Your Business And Can’t Be Built With Any Other Language
  • Run Away From Curly Braced Languages And Choose Something Else Every Other Time

Another thing:  notice the companies profiled above that have created their own Component Architecture Frameworks or Core Technology.  Google has core technology in spades.  Facebook has a system that lets them write components in any language they choose.  Amazon also allows multiple languages so long as a few core functions are adhered to.  Isn’t that interesting?  These guys want to be Polyglot Programmers, and they’re investing valuable resources to make that a reality.  This is all part and parcel of the “meta” thing too:  the realization that to fully utilize computers, you must be facile at manipulating the fabric of the programs themselves, and not just the data they are processing.  Adopting a polyglot view positions these vendors better to offer a platform, because it means they can allow others to use the languages of their choice.  There are a lot of benefits to being language agnostic!

Polyglot Programming is becoming increasingly well accepted in an era when the Curly Braced Languages (Java, C++, et al) bring little to the Web 2.0, SaaS, and Multicore Crisis parties.  Those languages are too low-level.  There is no support there for multi-tenancy, web pages, security, scaling, or any of the myriad of problems one encounters when building a Web 2.0 or SaaS business.  You have two choices:  write all of that yourself and pay the 70% overhead tax, or become a Polyglot Programmer and minimize the overhead by choosing the best tools for the task and leaving your Curly Braced Power Tools safely in the workshop only to be brought out when some highly custom proprietary work needs to be done.

Related Articles

ESB vs REST (Another Case for Multi-Language Programming)

Posted in Web 2.0, grid, multicore, platforms, saas, software development, strategy | 11 Comments »

Guido is Right to Leave the GIL in Python, Not for Multicore but for Utility Computing

Posted by smoothspan on September 14, 2007

There’s been a lot of back and forth in the Python community over something called the “GIL” or Global Interpreter Lock.  Probably the best “get rid of the GIL” argument comes from Juergen Brendel’s post.  Guido, the benevolent dictator of Python, has responded in his own blog that the GIL is here to stay:  he doesn’t think it is a problem, nor does he think trying to remove it is even the right choice.  Both combatants have been eloquent in expressing their views.  As is often the case, they’re optimizing to different design centers and likely will have to agree to disagree.

Now let’s try to pick apart this issue in a way that everyone can understand and make sense of for large scalability issues in the world of SaaS and Web 2.0.  Note that my arguments may be invalid if your scaling regime is much smaller, but as we’ve seen for sites like Twitter, big time scaling is hard and has to be thought about carefully.

First, a quick explanation on the GIL.  The GIL is a single lock inside the Python interpreter that a thread must hold before it can run Python code or touch Python objects.  Only one thread may hold it at a time, so the other threads have to wait their turn.

Whoa!  That sounds like Python has no ability to scale for multiple cores at all!  How can that be a good thing?  You can see where all the heat is coming from in this discussion.  The GIL just sounds bad, and one blogger refers to it jokingly as the GIL of Doom.

Yet all is not lost.  One can access multiple cpu’s using processes, and the processes run in parallel.  Experienced parallel programmers will know the difference between a process and a thread is that the process has its own state, while threads share their state with other threads.  Hence a thread can reach out and touch the other thread’s objects.  Python is making sure that when that touch happens, only one thread can touch at a time.  Processes don’t have this problem because their communication is carefully controlled and every process has its own objects.
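Here is a minimal sketch of the process approach using Python’s standard `concurrent.futures` module.  The prime-counting workload and the helper names (`count_primes`, `split_range`) are hypothetical, chosen only because the work is CPU-bound; the point is that each worker process has its own interpreter and its own objects, and only arguments and results cross the boundary.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """CPU-bound work: count primes in [lo, hi) by trial division."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Cut [lo, hi) into roughly `parts` contiguous chunks (assumes hi > lo)."""
    step = (hi - lo + parts - 1) // parts
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


if __name__ == "__main__":
    chunks = split_range(0, 200_000, parts=4)
    # Each chunk runs in its own process with its own interpreter and GIL;
    # only the (lo, hi) tuples and the integer counts are passed between them.
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(total)
```

Run the same chunks through a thread pool instead and the GIL would keep the workers from computing in parallel, which is exactly the complaint in this debate.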

Why do programmers care about threads versus processes?  In theory, threads are lighter weight and they can perform better than a process.  We used to argue back and forth at Oracle about whether to use threads or processes, and there were a lot of trade offs, but it often made sense to go for threads. 

So why won’t Guido get rid of the GIL?  Well, for one thing, it was tried and it didn’t help.  A new interpreter was written with fine-grained locking that minimized the times when multiple threads were locked out.  It ran about half as fast (or worse on Linux) as the GIL version for most applications.  The reason is that it made many more lock calls, and lock is a slow operating system function.  The way Guido put this was that on a 2 processor machine, Python would run only slightly faster than on a single processor machine, and he saw that as too much overhead.  Now I’ve commented before that we need to waste more hardware in the interest of higher parallelism, and this factor of 2 goes away as soon as you run on a quad core cpu, so why not nix the GIL?  BTW, those demanding the demise of the GIL seem to feel that since Java can run faster and supports threads, the attempt at removing the GIL must have been flawed and there is a better way.

I find myself in a funny quandary on this one, but ultimately agreeing with Guido.  There is little doubt that the GIL creates a scalability speed bump, but that speed bump is localized at the low end of the scalability space.  If you want even more scalability, you still have to do as Guido recommends and use processes and sockets or some such to communicate between them.  I also note that a lot of authorities feel that it is much harder to program threads than processes, and they call for shared nothing access.  Highly parallel languages like Erlang are focused on a process model for that reason, not a thread model.

Let me explain what all that means.  Threads run inside the same virtual machine, and hence run on the same physical machine.  Processes can run on the same physical machine or in another physical machine.  If you architect your application around threads, you’ve done nothing to access multiple machines.  So, you can scale to as many cores as are on the single machine (which will be quite a few over time), but to really reach web scales, you’ll need to solve the multiple machine problem anyway.

As Donald Knuth says, “premature optimization is the root of all evil (or at least most of it) in programming.”  Threads are a premature optimization when you need massive scaling, while processes lead to greater scalability.  If you’re planning to use a utility computing fabric, such as Amazon EC2, you’ll want processes.  In this case, I’m with Guido, because I think utility computing is more important in the big picture than optimizing for the cores on a single chip.  Take a look at my blog post on Amazon Startup Project to see just a few things folks are doing with this particular utility computing fabric.


Posted in Web 2.0, amazon, data center, ec2, grid, multicore, platforms, saas, software development | No Comments »

Twitter Scaling Story Mirrors the Multicore Language Timetable, Yields 10000% Speedup

Posted by smoothspan on September 14, 2007

There’s a great story over on the High Scalability blog about how Twitter became 10000% faster:

For us, it’s really about scaling horizontally - to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January.

This is the story I wanted to tell in Multicore Language Timetable:  a faster language pales in comparison to a more scalable language.  In this case, Twitter didn’t have that luxury, Ruby wasn’t more scalable, but it did have sufficient facilities that they could rearchitect their app for horizontal scaling, which is utilization of more cores.  It’s also the story of how the Multicore Crisis is here today and many of you have already experienced it.

Twitter learned several interesting lessons along the way that I’ve been hearing more and more:

  • Don’t let the database be a bottleneck.  We had the same view at Callidus, the Enterprise Software company I last worked at.  We built a grid architecture and managed to offload enough so that a typical configuration was 75% Java grid computing array and 25% database.  This was for a radically more database-intensive (financial processing) business application than most Web 2.0 apps like Twitter.
  • You have to build it yourself.  Unfortunately, there’s a lot of “almost” technology out there that doesn’t quite work.  That’s really unfortunate because everyone keeps hitting this horizontal scaling problem and having to reinvent the wheel:  70% of the software you write is still wasted.
  • Conventional database wisdom often tragically impairs scalability:  More and more companies are denormalizing to minimize joins and leaving relational integrity as a problem solved outside the database. 
  • “Most performance comes not from language but from application design.”  That’s a quote from the article, but I maintain it is also an artifact of using languages designed for a fundamentally different problem than what web scale applications face today.  Because the languages aren’t meant to solve the scaling problem, we shouldn’t be surprised that they don’t.

Interestingly, Twitter still has 1 single mySQL DB for everything.  It is massively backed up by in-memory caches that run on many machines, but at some point it can become the bottleneck too.  They’ve worked hard to de-emphasize it, but ultimately they have to figure out how to horizontally scale that DB.
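The general shape of “one database backed up by in-memory caches” is often called cache-aside.  Here is a toy Python sketch of it; this is not Twitter’s code, the class and table names are hypothetical, a plain dict stands in for a memcached-style cluster, and in-memory SQLite stands in for the mySQL server.

```python
import sqlite3


class CachedUserStore:
    """Cache-aside reads in front of a single database."""

    def __init__(self, db):
        self.db = db
        self.cache = {}    # stand-in for a memcached-style cluster
        self.db_reads = 0  # counts how often the database is actually hit

    def get_name(self, user_id):
        if user_id in self.cache:  # cache hit: no database work
            return self.cache[user_id]
        self.db_reads += 1         # cache miss: fall through to the DB
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        name = row[0] if row else None
        self.cache[user_id] = name
        return name

    def set_name(self, user_id, name):
        self.db.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name)
        )
        self.db.commit()
        self.cache.pop(user_id, None)  # invalidate so the next read is fresh


def make_store():
    """Build a store around a fresh in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return CachedUserStore(db)


if __name__ == "__main__":
    store = make_store()
    store.set_name(1, "alice")
    for _ in range(1000):
        store.get_name(1)
    print(store.db_reads)  # reads that reached the database
```

Reads scale with the cache, but every write still lands on the one database, which is why that DB can eventually become the bottleneck again.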


Posted in Web 2.0, data center, grid, multicore, software development | 2 Comments »