SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the ‘ec2’ Category

To Escape the Multicore Crisis, Go Out Not Up

Posted by Bob Warfield on September 29, 2007

Of course, you should never go up in a burning building; go out instead.  Amazon’s Werner Vogels sees the Multicore Crisis in much the same way:

Only focusing on 50X just gives you faster Elephants, not the revolutionary new breeds of animals that can serve us better.

Vogels is writing there about Michael Stonebraker’s claims that he can demonstrate a database architecture that outperforms conventional databases by a factor of 50X.  Stonebraker is no one to take lightly:  he’s accomplished a lot of innovation in his career so far and he isn’t nearly done.  He advocates replacing the Oracle (and mySQL) style databases (which he calls legacy databases) with a collection of special purpose databases that are optimized for particular tasks such as OLTP or data warehousing.  It’s not unlike the idea I and others have talked about, that the one-language-fits-all paradigm is all wrong and you’d do better to adopt polyglot programming.

I like Stonebraker’s work.  While I want the ability to scale out to any level that Vogels suggests, I will take the 50X improvement as a basic building block and then scale that out if I can.  That’s a significant scaling factor even in terms of the Multicore Language Timetable:  it’s nearly 8 years of Moore’s Law cycles.  I’m also mindful that databases are the doorway to the I/O side of the equation, which is often a lot harder to scale out.  Backing an engine that’s 50X faster at sucking the bits off the disk with memcached ought to lead to some pretty amazing performance.

But Vogels is right; in the long term we need to see different beasts than the elephants.  It was with that thought in mind that I’ve been reading with interest articles about Sequoia, an open source database clustering technology that makes a collection of database servers look like one more powerful server.  It can be used to increase performance and reliability.  It’s worth noting that Sequoia can be installed for any Java app using JDBC without modifying the app.  Their clever moniker for their technology is RAIDb:  Redundant Array of Inexpensive Databases.  There are different levels of RAIDb, just as there are RAID levels, that allow for partitioning, mirroring, and replication.  The choice of level or combination of levels governs whether your application gets more performance, more reliability, or both.
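
Sequoia itself is Java middleware that slips in beneath JDBC, so the app never knows it’s there.  As a rough sketch of the RAIDb-1 (mirroring) idea, not Sequoia’s actual implementation, here’s the pattern in Python with sqlite3 standing in for the backend databases:  writes are mirrored to every replica, reads are spread across them round-robin.

```python
# Toy RAIDb-1 (mirroring): every write goes to all replicas, reads round-robin.
# sqlite3 connections stand in for real database servers.
import itertools
import sqlite3

class MirroredDb:
    def __init__(self, replicas):
        self.replicas = replicas                   # one connection per backend
        self._reader = itertools.cycle(replicas)   # round-robin read balancing

    def execute_write(self, sql, params=()):
        for conn in self.replicas:                 # mirror the write everywhere
            conn.execute(sql, params)
            conn.commit()

    def execute_read(self, sql, params=()):
        conn = next(self._reader)                  # spread reads for throughput
        return conn.execute(sql, params).fetchall()

db = MirroredDb([sqlite3.connect(":memory:") for _ in range(3)])
db.execute_write("CREATE TABLE t (id INTEGER, name TEXT)")
db.execute_write("INSERT INTO t VALUES (?, ?)", (1, "sequoia"))
print(db.execute_read("SELECT * FROM t"))          # served by a single replica
```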

Sequoia is not a panacea, but for some types of benchmarks, such as TPC-W, it shows a nearly linear speedup as more CPUs are added.  It seems likely that a combination of approaches (Stonebraker’s specialized databases for particular niches and clustering like Sequoia, all running on a utility computing fabric such as Amazon’s EC2) will finally break the multicore logjam for databases.

Posted in amazon, ec2, grid, multicore, Open Source, platforms, software development | 4 Comments »

Guido is Right to Leave the GIL in Python, Not for Multicore but for Utility Computing

Posted by Bob Warfield on September 14, 2007

There’s been a lot of back and forth in the Python community over something called the “GIL”, or Global Interpreter Lock.  Probably the best “get rid of the GIL” argument comes from Juergen Brendel’s post.  Guido, the benevolent dictator of Python, has responded in his own blog that the GIL is here to stay, and that he doesn’t think it is a problem nor even the right choice to try to remove it.  Both combatants have been eloquent in expressing their views.  As is often the case, they’re optimizing to different design centers and likely will have to agree to disagree.

Now let’s try to pick apart this issue in a way that everyone can understand and make sense of for large scalability issues in the world of SaaS and Web 2.0.  Note that my arguments may be invalid if your scaling regime is much smaller, but as we’ve seen for sites like Twitter, big time scaling is hard and has to be thought about carefully.

First, a quick explanation of the GIL.  The GIL is a lock in the Python interpreter that forces threads to take turns:  only one thread may execute Python code, and hence touch Python objects, at a time.

Whoa!  That sounds like Python has no ability to scale for multiple cores at all!  How can that be a good thing?  You can see where all the heat is coming from in this discussion.  The GIL just sounds bad, and one blogger refers to it jokingly as the GIL of Doom.

Yet all is not lost.  One can access multiple CPUs using processes, and the processes run in parallel.  Experienced parallel programmers will know that the difference between a process and a thread is that each process has its own state, while threads share state with other threads.  Hence a thread can reach out and touch another thread’s objects.  Python is making sure that when that touch happens, only one thread can touch at a time.  Processes don’t have this problem because their communication is carefully controlled and every process has its own objects.
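
Here’s a minimal demonstration of that difference, using the modern concurrent.futures module (which postdates this post):  the same CPU-bound loop run on a thread pool and a process pool.  Under the GIL the threads take turns, so only the process version speeds up as cores are added.

```python
# CPU-bound work: threads are serialized by the GIL, processes run in parallel.
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):                 # pure-Python loop; holds the GIL the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, jobs=4, n=2_000_000):
    start = time.time()
    with executor_cls(max_workers=jobs) as ex:
        list(ex.map(burn, [n] * jobs))
    return time.time() - start

if __name__ == "__main__":
    print("threads:   %.2fs" % timed(ThreadPoolExecutor))   # roughly serial time
    print("processes: %.2fs" % timed(ProcessPoolExecutor))  # divided across cores
```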

Why do programmers care about threads versus processes?  In theory, threads are lighter weight and they can perform better than a process.  We used to argue back and forth at Oracle about whether to use threads or processes, and there were a lot of trade-offs, but it often made sense to go for threads.

So why won’t Guido get rid of the GIL?  Well, for one thing, it was tried and it didn’t help.  A new interpreter was written with fine-grained locking that minimized the times when multiple threads were locked out.  For most applications it ran twice as slowly as the GIL version (or worse on Linux).  The reason is that all the extra lock calls added up:  locking is a slow operating system operation.  The way Guido put this was that on a 2-processor machine, Python would run only slightly faster than on a single-processor machine, and he saw that as too much overhead.  Now I’ve commented before that we need to waste more hardware in the interest of higher parallelism, and this factor of 2 goes away as soon as you run on a quad-core CPU, so why not nix the GIL?  BTW, those demanding the demise of the GIL seem to feel that since Java can run faster and supports threads, the attempt at removing the GIL must have been flawed and there is a better way.
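
You can get a feel for why with a toy measurement:  the loop below does the same work with and without taking a lock around every increment.  This is illustrative, not Guido’s actual benchmark, and the exact ratio varies by platform.

```python
# Rough cost of fine-grained locking: the same loop, bare vs. locked per step.
import threading
import time

def count(n, lock=None):
    total = 0
    if lock is None:
        for _ in range(n):
            total += 1
    else:
        for _ in range(n):
            with lock:           # acquire/release on every single operation
                total += 1
    return total

N = 1_000_000
t0 = time.time()
count(N)
t1 = time.time()
count(N, threading.Lock())
t2 = time.time()
print("no locks: %.3fs   with locks: %.3fs" % (t1 - t0, t2 - t1))
```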

I find myself in a funny quandary on this one, but ultimately agreeing with Guido.  There is little doubt that the GIL creates a scalability speed bump, but that speed bump is localized at the low end of the scalability space.  If you want even more scalability, you still have to do as Guido recommends and use processes and sockets or some such to communicate between them.  I also note that a lot of authorities feel it is much harder to program threads than processes, and they call for shared-nothing architectures.  Highly parallel languages like Erlang are focused on a process model for that reason, not a thread model.

Let me explain what all that means.  Threads run inside the same virtual machine, and hence run on the same physical machine.  Processes can run on the same physical machine or on another physical machine.  If you architect your application around threads, you’ve done nothing to access multiple machines.  So you can scale to as many cores as are on the single machine (which will be quite a few over time), but to really reach web scales, you’ll need to solve the multiple machine problem anyway.
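
In Python, the processes-and-sockets route Guido recommends can look like the sketch below.  The multiprocessing.connection module wraps sockets with message framing, and the same code works whether the worker lives on your machine or across the data center.  The worker runs in a background thread here purely to keep the example self-contained; the address and payload are made up.

```python
# A coordinator sends a job to a worker over a socket and reads back the result.
import threading
from multiprocessing.connection import Listener, Client

ADDRESS, AUTHKEY = ("localhost", 6001), b"demo"  # point at a remote host to scale out
listener = Listener(ADDRESS, authkey=AUTHKEY)    # bind before the client connects

def worker():
    conn = listener.accept()
    n = conn.recv()                              # receive a job
    conn.send(sum(i * i for i in range(n)))      # send back the result
    conn.close()

threading.Thread(target=worker, daemon=True).start()

client = Client(ADDRESS, authkey=AUTHKEY)        # the coordinator side
client.send(1_000_000)
print("result:", client.recv())
client.close()
listener.close()
```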

As Donald Knuth says, “premature optimization is the root of all evil.”  Threads are a premature optimization when you need massive scaling, while processes lead to greater scalability.  If you’re planning to use a utility computing fabric, such as Amazon EC2, you’ll want processes.  In this case, I’m with Guido, because I think utility computing is more important in the big picture than optimizing for the cores on a single chip.  Take a look at my blog post on the Amazon Startup Project to see just a few things folks are doing with this particular utility computing fabric.


Posted in amazon, data center, ec2, grid, multicore, platforms, saas, software development, Web 2.0 | 4 Comments »

Amazon Startup Project Report

Posted by Bob Warfield on September 13, 2007

I attended the Silicon Valley edition of the Amazon Startup Project today.  This is their second such event, the first having been hosted in home-town Seattle.  The event took place at the Stanford University Faculty Club and was well attended:  they basically filled the hall.  The agenda included an opening by Andy Jassy, Sr VP for Amazon Web Services, a discussion of the services themselves by Amazon Evangelist Mike Culver, a series of talks by various startups using the services, a conversation with Kleiner Perkins VC Randy Komisar, and closing remarks by Jassy again.  Let me walk through what I picked up from the various segments.

First up were the two talks by Amazon folk, Jassy and Mike Culver.  Jassy kept it pretty light, didn’t show slides, and generally set a good tone for what Amazon is trying to accomplish.  The message from him is that they’re in it for the long haul, they’ve been doing APIs for years, and the world should expect this to be a cash-generating business for Amazon relatively shortly.  That’s good news, as I have sometimes heard folks wonder whether this is just remaindering infrastructure they can’t use or whether they are in fact serious.  The volumes of data and CPU they’re selling via these services are enormous and growing rapidly.

Mike Culver’s presentation basically walked through the different Amazon Web Services and tried to give a brief overview of what they were, why you’d want such a thing, and examples of who was using them.  I had several takeaways from Mike’s presentation.  First, his segment on EC2 (Elastic Compute Cloud, the service that sells CPU time) was the best.  His discussion of how hard it can be to estimate and prepare for the volumes and scaling you may encounter was spot on.  Some of the pithier bullets included:

  • Be prepared to scale down as well as up.
  • Queue everything and scale out the servicing of the queues (see the sketch just below).
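
That second bullet is the heart of scale-out design:  producers drop work on a queue and never wait, while you add or remove workers to match demand.  Here’s a minimal sketch of the pattern, with Python’s in-process queue.Queue standing in for a real service like Amazon SQS:

```python
# Producers enqueue; any number of workers drain the queue independently.
import queue
import threading

jobs = queue.Queue()

def worker(worker_id):
    while True:
        item = jobs.get()              # blocks until work arrives
        if item is None:               # poison pill: shut this worker down
            break
        print("worker %d handled %r" % (worker_id, item))
        jobs.task_done()

workers = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for w in workers:
    w.start()

for request in ["resize photo", "send email", "update index"]:
    jobs.put(request)                  # producers never wait on the servicing

jobs.join()                            # wait for the queue to drain
for _ in workers:
    jobs.put(None)                     # one shutdown signal per worker
```

Scaling out then means raising the worker count (or pointing workers at more machines), not rewriting the app.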

He showed a series of Alexa traffic slides that were particularly good.  First he showed CNN’s traffic:

CNN Traffic

As you can see, there are some significant peaks and valleys.  In theory, you’d need to build for the peaks and eat the cost of overcapacity for the valleys if you build your own data center.  With a utility computing fabric like Amazon’s you can scale up and down to deal with the demand.  He next overlaid Flickr onto this data:

Flickr Traffic

Flickr’s problem is a little different.  They went along for a while and then hit a huge spike in Q2 2006.  Imagine having to deal with that sort of spike by installing a bunch of new physical hardware.  Imagine how unhappy your customers would be while you did it and how close you would come to killing your staff.  Spikes like that are nearly impossible to anticipate.  CNN has bigger spikes, but they go away pretty rapidly.  Flickr had a sustained uptick.

The last view overlaid Facebook onto the graph:

Facebook Traffic

Here we see yet another curve shape: exponential growth that winds up dwarfing the other two in a relatively short time.  Amazon’s point is that unless you have a utility computing fabric to draw on, you’re at the mercy of trying to chase one of these unpredictable curves, and you’re stuck between two ugly choices:  be behind the curve and making your customers and staff miserable with a series of painful firedrills, or be ahead of the curve and spend the money to handle spikes that may not be sustained, thereby wasting valuable capital.  Scaling is not just a multicore problem, it’s a crisis of creating a flexible enough infrastructure that you can tweak on a short time scale and pay for it as you need it.

One of the things Mike slid in was the idea that Amazon’s paid-for machine images are a form of SaaS.  To use EC2, you first come up with a machine image.  The image is a snapshot of the machine’s disk that you want to boot.  Amazon now has a service where you can put these images up and people pay you money to use them, while Amazon gets a cut.  The idea that these things are like SaaS is a bit far fetched.  By themselves they would be Software without much Service.  However, the thought I had was that they’re really more like Web Appliances.  Some folks have tried to compare SaaS and Appliance software; I still think it doesn’t wash for lack of Service in the appliance, but this Amazon thing is a lot cleaner way to deliver an appliance than having to ship a box.  Mike should change his preso to push it more like appliances!

All of the presentations were good, but the best ones for me were by the startup users of the services.  What was great about them was that they pulled no punches.  The startups got to talk about both the good and bad points of the service, and it wasn’t too salesy about either Amazon or what the startups were doing.  It was more like, “Here’s what you need to know as you’re thinking about using this thing.”  I’ll give a brief summary of each:

Jon Boutelle, CTO, Slideshare

The Slideshare application is used to share slideshows on the web, SaaS-style.  Of course Jon’s preso was done using slideware.  His catchy title was “How to use S3 to avoid VC.”  His firm bootstrapped with minimal capital, and his point is not that you have to get the lowest possible price per GB (Amazon isn’t that), but that the way the price is charged matters a lot more to a bootstrapping firm.  In his firm’s case, they get the value out of S3 about 45 days before they have to pay for it.  In fact, they get their revenue from Google AdSense in advance of their billing from Amazon, so cash flow is good!

He talked about how they got “TechCrunched” and the service just scaled up without a problem.  Many startups have been “TechCrunched” and found it brought the service to its knees because they got slammed by a wall of traffic, but not here.

Joyce Park, CTO, Renkoo/BoozeMail

Joyce was next up and had a cool app/widget called BoozeMail.  It’s a fun service that you can use whether or not you’re on Facebook to send a friend a “virtual drink”.  Joyce gave a great overview of what was great and what was bad about Amazon Web Services.  The good is that it has scaled extremely well for them.  She ran through some of their numbers that I didn’t write down, but they were very large.  The bad is that there have been some outages, and it’s pretty hard to run things like mySQL on AWS (more about that later).

BoozeMail is using a Federated Database Architecture that tracks the senders and receivers on multiple DB servers.  The sender/receiver lists are broken down into groups, and they will not necessarily wind up on the same server.  At one point, they lost all of their Amazon machines simultaneously because they were all part of the same rack.  This obviously makes failover hard and they were not too happy about it. 

Persistence problems with Amazon are one of the thorniest issues to work through.  Your S3 data is safe, but an EC2 instance could fall over at any time without much warning.  Apparently Renkoo is beta testing under non-disclosure some technology that makes this better, although Joyce couldn’t talk about it.  More later.

Something she mentioned that the others echoed is that disk access for EC2 is very slow.  Trying to get your data into memory cache is essential, and writes are particularly slow.  Again, more on the database aspects in a minute, but help is on the way.
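
The standard answer to slow EC2 disks is the cache-aside pattern the panelists were describing:  check memcached first, and pay the disk price only on a miss.  A hedged sketch, assuming the python-memcached client and a memcached node on localhost:

```python
# Cache-aside: serve hot keys from memory, touch the slow disk only on a miss.
import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def slow_db_read(key):
    # stand-in for a read that hits EC2's slow local disk
    return "value-for-" + key

def get(key):
    value = mc.get(key)                # fast path: memory
    if value is None:                  # miss: pay the disk cost once...
        value = slow_db_read(key)
        mc.set(key, value, time=300)   # ...then cache it for five minutes
    return value

print(get("user:42"))                  # first call hits the "database"
print(get("user:42"))                  # second call comes from cache
```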

Sean Knapp, President of Technology, Ooyala

Ooyala is a cool service that lets you select objects in high-quality video.  The demo given at Startup Day was clicking on a football player who was about to make a touchdown to learn more about him.  Sean spent most of his preso showing what Ooyala is.  It is clearly an extremely impressive app, and it makes deep use of Amazon Web Services to virtually eliminate any need for doing their own hosting.  The message seemed to be if these guys can make their wild product work on Amazon, you certainly can too.

Don MacAskill, CEO, Smugmug

I’ve been reading Don’s blog for a while now, so I was pleased to get a chance to meet him finally.  Smugmug is a high end photo sharing service.  It charges for use SaaS-style, and is not an advertising supported model.  As I overheard Don telling someone, “You can offer a lot more when people actually pay you something than you can if you’re just getting ad revenue.”  Consequently, his customer base includes some tens of thousands of professional photographers who are really picky about their online photo experience.

Smugmug has been through several generations of Amazon architectures, and may be the oldest customer I’ve come across.  They started out viewing Amazon as backup and morphed until today Amazon is their system of record and source of data that doesn’t have to be served too fast.  They use their own data center for the highest traffic items.  The architecture makes extensive use of caching, and apparently their caches get a 95% hit rate.

Don talked about an area he has blogged on in the past, which is how Amazon saves him money that goes right to the bottom line.

Don’s summary on Amazon:

  • A startup can’t go wrong using it initially
  • Great for “store a lot” + “serve a little”
  • More problematic for “serve a lot”

There are performance issues with the architecture around serve a lot and Don feels they charge a bit too much (though not egregiously) for bandwidth.  His view is that if you use more than a Gigabit connection, Amazon may be too expensive, but that they’re fine up to that usage level.

His top feature requests:

-  Better DB support/persistence

-  Control over where physically your data winds up to avoid the “my whole rack died” problem that Joyce Park talked about.

The Juicy Stuff and Other Observations

At the end of the startup presentations, they opened up the startup folks to questions from the audience.  Without a doubt, the biggest source of questions surrounded database functionality:

-  How do we make it persist?

-  How do we make it fast?

-  Can we run Oracle?  Hmmm…

It’s so clear that this is the biggest obstacle to greater Amazon adoption.  Fortunately, it’s also clear it will be fixed.  I overheard one of the Amazon bigwigs telling someone to expect at least 3 end-of-year announcements to address the problem.  What is less clear is whether the announcements would be:

a)  Some sort of mySQL service all bundled up neatly

b)  Machine configurations better suited to DB use:  more spindles and memory were mentioned as desirable

c)  Some solution to machines just going poof!  In other words, persistence at least at a level where the machine can reboot, access the data on its disk, and take off again without being reimaged.

d)  Some or all of the above.

Time will tell, but these guys know they need a solution.
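
In the meantime, the workaround most teams use is the one option (c) implies:  treat S3 as the durable copy and snapshot the database to it on a schedule, so a vanished instance costs you minutes of data rather than all of it.  A rough sketch using today’s boto3 client (which postdates this post); the bucket name is a made-up placeholder:

```python
# Snapshot a local mySQL database to S3 so an EC2 "poof" isn't fatal.
import subprocess
import time

import boto3

s3 = boto3.client("s3")
BUCKET = "my-db-snapshots"                 # hypothetical bucket name

def snapshot_mysql():
    dump = "/tmp/backup-%d.sql" % int(time.time())
    with open(dump, "w") as out:           # mysqldump writes the full database
        subprocess.run(["mysqldump", "--all-databases"], stdout=out, check=True)
    s3.upload_file(dump, BUCKET, dump.lstrip("/"))  # durable even if EC2 dies

snapshot_mysql()
```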

The other observation I will make is one that echoes Don’s observation on Smugmug:  I’m sure seeing a lot of Mac laptops out in the world.  3 of the 4 presenters were sporting Macs, and 2 of them had been customized with their company logos on the cover.  Kewl!


Posted in amazon, data center, ec2, grid, multicore, Partnering, platforms, saas, software development, strategy, venture, Web 2.0 | 12 Comments »

Persistent mySQL Now Available for Amazon EC2/S3 Junkies

Posted by Bob Warfield on September 2, 2007

There are now two companies, Elastra and RightScale, who are offering solutions for Persistent mySQL on Amazon EC2/S3.  This is a significant development in utility computing because most companies wishing to use Amazon’s platform would have to solve this thorny problem before they could get on with doing something interesting.   Having an off-the-shelf solution makes it that much easier to adopt the platform.

Some are concerned about the price or about getting locked into Amazon, but I think these are relatively safe bets.  First, we already have 2 players, and Amazon will likely offer a solution of its own.  Hence the price will stabilize to a lower point in a competitive marketplace.  Second, mySQL is the API here, not Amazon.  Any utility computing service that wants to make a go will have to support mySQL in some form or fashion, so rehosting may not even be that bad.  Hence the lock-in is minimal.
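
To see why the lock-in is minimal, note that the application only ever speaks the mySQL protocol; the provider is just a hostname in the config.  A hedged sketch with the mysql-connector-python driver, hostnames invented for illustration:

```python
# The app depends on mySQL's wire protocol, not on Amazon: rehosting is a
# config change, not a rewrite.
import mysql.connector

DB_HOST = "db.some-utility-provider.example.com"  # swap for another provider

conn = mysql.connector.connect(
    host=DB_HOST, user="app", password="secret", database="orders"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM orders")
print(cursor.fetchone())
conn.close()
```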

Thanks to High Scalability for the heads up!

Posted in amazon, data center, ec2, grid, multicore, platforms, saas, Web 2.0 | 1 Comment »

You’ve Already Had a Multicore Crisis and Just Didn’t Realize It!

Posted by Bob Warfield on August 30, 2007

As I was driving along pondering the imponderables, I suddenly realized the folks talking about the Multicore Crisis have gotten it all wrong.  For those who haven’t heard of it, the Multicore Crisis is basically concern about what happens as chipmakers shift from being able to deliver ever-faster clock speeds according to Moore’s Law to delivering ever more processor cores on the same chip.  The crisis comes about because it’s much harder to write truly parallel software than it is to just let the chip get faster and run conventional software twice as fast every 18-24 months.  No less a figure than Microsoft’s Craig Mundie has proclaimed that we are 10 years away from having the proper languages and other tools to efficiently harness the hardware that will exist in a multicore world.

Some of the pundits in the blogosphere have argued that we have plenty of time to get ready for the Multicore Crisis, and that all the hubbub today is just hype and hand wringing.  They will do projections that say it’s easy with a couple of cores to just give one to the OS, save the other for the app, and see an immediate speedup.  By the time there are enough cores on a chip that this quits working, 10 years will have gone by and we’ll have all those great new tools needed to harness the big chips.  There are some pretty good rebuttals for this already, BTW.

Never mind that quad core chips have already shipped, motherboards are cheaply available to put two of these together in a “V8” 8-core configuration, and 8-core chips are nearly here from Intel and already here from Sun.  Never mind that Intel has an 80 core chip in their labs and there are startups looking at 64 cores in the relatively near term.  Let’s also forget that with 4 cores shipped now and 8 cores due out next year we will see 64 cores in more like 6 years than 10, according to standard Moore’s Law rates.  Despite all that, it’s all going to be okay.  Really!

Here is my problem with all this back and forth:  we’ve already hit the Multicore Brick Wall without leaving skid marks and most people just don’t realize it!  I hear the crowd out there now, beyond the klieg lights, grumbling in the dark, “What’s he on about now?”  Patience please.  What multicore teaches us is that someday soon we will expect software to scale linearly.  That Alpha Geek Speak means if I double the number of available cores, I want my software to run twice as fast.  Hallelujah!  I’m back to getting twice the speed every 18-24 months just like in the heyday of Moore’s Law.  In the post-clockspeed-doubling world that’s coming, this will be a requirement, or all computing progress grinds to a halt (that means the money stops:  true crisis), or so say the Multicored Chicken Littles.
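
The arithmetic behind that expectation is unforgiving.  Amdahl’s law says the serial fraction of your program caps the speedup no matter how many cores you add, which is why “scale linearly” is such a demanding requirement:

```python
# Amdahl's law: speedup = 1 / (serial + parallel/cores).
def amdahl_speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for cores in (2, 4, 8, 64):
    # with just 5% of the work serialized, 64 cores deliver ~15x, not 64x
    print("%2d cores -> %.1fx" % (cores, amdahl_speedup(0.05, cores)))
```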

Linear Scalability is hard to do, but ironically, it is nothing new.  Guess what?  We’ve already been fighting with “scalability” for a long time.  Can you see where I’m going with this?  Let me give you some examples.

Once upon a time eBay was plagued by terrible outages.  Analysts stated that this was due to eBay’s failure to build a redundant, scalable web architecture.  One of my startups was located on eBay’s campus in Campbell, and the story we heard at the local Starbucks was interesting.  It seems eBay had built out their original architecture around the idea of running a 3rd party search engine on a mainframe.  Eventually, they reached a point where they had purchased the largest mainframe Sun had to offer.  Unfortunately, being a Red Shifted business, they were growing at a rate faster than Moore’s Law, and hence faster than Sun could provide them more powerful machines!  Or, as eBay themselves put it in a presentation on their architectural evolution, “By November 1999, the database servers approached their limits of physical growth.”

In August of 1999, Meg Whitman hired Maynard Webb on the heels of all this to fix it.  The fix (despite many protestations that at least some of the problem was due to issues with eBay’s vendors like Sun) boiled down to rearchitecting the very fabric of eBay to allow for:

    “clustering the servers for greater availability, dividing the workload among its Oracle databases”

Wow!  Deja Vu all over again.  They needed to find a way to harness more cores to keep up with the load:  eBay had a Multicore Crisis in 1999!  

When I worked for Oracle, we used to employ the Multicore Crisis to make sure our server won the benchmarks against competitors.  It was easy.  Just insist on running the benchmark on a server that had more CPUs than Microsoft SQL Server could utilize.  If Oracle could run 2x the CPUs and keep them all efficiently humming away, we would run 2x as fast on the same hardware.  As I recall, at first SQL Server could utilize just 4 cores.  At some point, and after a lot of pain, they upped it to 8.  I’ve worked on big Enterprise projects where we successfully harnessed well over 100 CPUs.

Which brings me to my last company, Callidus Software.  We used scalability as a powerful competitive weapon.  We had built a grid computing infrastructure to run our incentive compensation software.  The competition literally had to throw in the towel at certain volume levels.  Beyond here there be scalability dragons.  There’s nothing quite like competing in a deal where you know your competition can’t produce a single happy reference at the volume levels the prospect requires.

More recently, the Skype VOIP service was down for an extended time due to what was basically a scaling problem.  Microsoft forced some updates through to Windows users, Windows had to reboot (what else is new), and suddenly there were millions of rebooted machines trying to log onto Skype all at the same time.  Skype’s explanation was:

Our software’s peer-to-peer network management algorithm was not tuned to take into account a combination of high load and supernode rebooting.

Consider the costs to businesses that depend on Skype.  Closer to home, investors in eBay, Skype’s owner, saw a loss of $1B in market value as the drama unfolded.  A Multicore Crisis can be really bad for your business!  As more and more of the computing world turns to centralized models like SaaS and Web 2.0, it becomes more important than ever to solve the Multicore Crisis, or at least the Scalability Crisis, for these businesses to succeed.

If we want to move beyond this, SaaS and Web 2.0 sites have to be architected for massive scalability, particularly if they’re built on cost-effective Lintel (Linux on commodity Intel boxes) architectures like so many of these sites are.  In addition, companies need to invest in utility computing at the hosting end so they can rapidly increase (or decrease) the hardware they have on line when demand hits.  One example of a utility computing service would be Amazon’s EC2 and S3 services that let you dynamically provision a machine in their data center in about 10 minutes.
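
For a sense of how lightweight that provisioning is, the whole operation reduces to a single API call.  A hedged sketch using today’s boto3 EC2 client rather than the 2007-era API; the image id is a made-up placeholder:

```python
# Launch instances on demand: capacity becomes an API call, not a purchase order.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-12345678",       # hypothetical machine image
    InstanceType="m1.small",      # the classic 10-cents-an-hour size
    MinCount=1,
    MaxCount=4,                   # ask for more when demand spikes
)
for inst in response["Instances"]:
    print("launched", inst["InstanceId"])
```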

Have you ever encountered massive outages on a new and rapidly growing service?  Perhaps a newly minted Web 2.0 startup?  Perhaps you’ve been really unlucky and encountered the problem as your company tried to install a mission-critical piece of Enterprise Software.  Post a comment here to share your experiences.  I know many of you have already had a Multicore Crisis, and now you know what to look for.

For those who are thinking you’ll worry about the Multicore Crisis in 10 years when it’s an easy problem to solve, remember:

You’ve already had a Multicore Crisis and just didn’t know it!

Related Articles:

A Picture of the Multicore Crisis:  See a timeline of it unfolding.

Multicore Language Timetable


Posted in amazon, ec2, grid, multicore, platforms, saas, software development, Web 2.0 | 8 Comments »

Why Don’t Search Startups Share Data, Part 2

Posted by Bob Warfield on August 22, 2007

I mentioned in an earlier post that search startups ought to look into a divide and conquer approach when crawling the web.  After all, one of the biggest complaints about a lot of interesting search services is that they don’t find as much as Google does.  TechCrunch, for example, complains that Microsoft’s new Tafiti produces search results that are “not as relevant as Google or Yahoo”.  And yet, they also admit Tafiti is beautiful (as an aside, it is very cool and worth a look to see what Microsoft’s Flex killer, Silverlight, can do for a web site).  If the Alt search sites band together to do the basic crawling and crunching using Google’s MapReduce-style algorithms (possibly based on the open-sourced Hadoop Yahoo is pushing), they could share one of the bigger costs of being in business and ameliorate the huge advantage in reach that the biggest players have over them.
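
For those who haven’t seen the MapReduce style in action, here’s a toy version of the relevant computation:  the map phase emits (word, url) pairs from crawled pages, and the reduce phase groups them into the inverted index a search engine serves from.  Hadoop’s job is to run exactly this pattern across thousands of machines; the data here is obviously made up.

```python
# Toy MapReduce: build an inverted index (word -> set of urls) from pages.
from collections import defaultdict

pages = {
    "http://a.example": "red shift utility computing",
    "http://b.example": "utility computing crisis",
}

def map_phase(url, text):
    for word in text.split():
        yield word, url                   # emit (key, value) pairs

def reduce_phase(pairs):
    index = defaultdict(set)
    for word, url in pairs:               # group values by key
        index[word].add(url)
    return index

pairs = (p for url, text in pages.items() for p in map_phase(url, text))
index = reduce_phase(pairs)
print(sorted(index["computing"]))         # both pages mention "computing"
```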

ZDNet bloggers Dan Farber and Larry Dignan ask whether open-sourced Hadoop can give Yahoo the leverage it needs to close the gap with Google.  Their first words are that “Open source is always friend to the No. 2 player in a market and always the enemy of the top dog.”  I don’t think Hadoop by itself is enough, but if Yahoo were to create a collaborative search service, maybe it would be.  In fact, what if search were much more like Facebook, only more open (Hey, if Scoble can do it with a hotel, I can do it with a search engine!)?  In a manner similar to my “Web Hosting Plan for World Domination”, Yahoo could undertake a plan for “Search Engine World Domination”.  Here’s how it would work:

-  Yahoo builds up the Hadoop Open Source infrastructure for Web Crawling.  Alt Search engines can tie back into that to get the raw data and avoid doing their own crawling.  Even GigaOm says “The biggest hindrance to any search start-up taking on Google (or Microsoft, Ask or Yahoo for that matter) is the high cost of infrastructure.”  Let’s share those costs and further defray them by having a big player like Yahoo help out.

-  Yahoo can also offer up the Hadoop scaffolding to do any massively parallel processing these Alt Search Engines need to compute their indices.  Think of it as being like Amazon’s EC2 and S3, but purpose-built to simplify search engines.  People are already asking Amazon for Search Engine AMIs, so there is clearly interest.

-  Now here is the Facebook piece of the puzzle:  Yahoo needs to turn this whole infrastructure play into a Social Networking play.  That means they offer Search Widgets to any Social Network that wants them, and they let you personalize your own search experience by collecting the widgets you like.  Most importantly, Yahoo creates basic widgets that reflect their current search offering, but they allow the Alt Search Engines to make widgets that package their search functionality.  Take a look at Tafiti and see how it lets you select different “views”.  Those views are widgets!

-  Yahoo gets a big new channel for its ads, and it graciously shares the revenues with the widget builders because that’s what makes the world go round.  Perhaps they even have virtual dollars that can be used to pay for the infrastructure using ad revenue, although I personally think they should give away as much infrastructure as possible to attract the Alt Search crowd to their platform.

Don Dodge, meanwhile, is wondering what the exit strategy is for the almost 1,000 startups out there trying to peddle alternative search engines.  It sure seems to me that creating this search widget social network world solves a big problem for Yahoo and at the same time creates a lot of new opportunity for the exit strategy of these engines.  Suddenly, they have access to large volumes of data they couldn’t afford and a distribution channel in which to build an audience.

Open Source Swarm Competition in the Search Engine Space is Born!


Posted in amazon, business, ec2, grid, Marketing, Open Source, Partnering, software development, user interface, venture, Web 2.0 | 3 Comments »

Are You Red Shifted? (aka Do you use Utility Computing, Web 2.0, and Every Other Cool Thing???)

Posted by Bob Warfield on August 21, 2007

Sun’s CTO Greg Papadopoulos has been espousing what he calls the Red Shift theory of computing.  Aside from having its own Wikipedia entry, the Red Shift theory has something to offer to a variety of audiences.  It gives permission to believe the computing industry is entering another period of hypergrowth.  It provides commentary on Moore’s Law and which types of problems may or may not encounter the Multicore Crisis.  It gives a reason to believe Sun can once again regain its lost glories by leading the Red Shifted contingents.  And lastly, it provides yet another way to talk about whether your organization is a hip Web 2.0 “Red Shifted” organization, or whether you’re one of those oh-so-yesterday “Blue Shifted” deals.

What then is the Red Shift theory?  For starters, red and blue shift have to do with Doppler effects on light that tell us whether stars or galaxies are moving towards us or away from us, and hence whether the universe is expanding or contracting.  Ignore all of that; it has little to do with the theory at hand, which simply uses the terminology as packaging.  Simply put, Papadopoulos postulates that demand for computing resources is segmented into a hyper-growth “Red Shifted” segment and a much slower growing “Blue Shifted” segment.  The definitions here have “fast” being growth that is much faster than Moore’s Law and “slow” being growth that is much slower.

Slow growth is fueled by demand that grows, well, slowly.  This demand is basically spurred by the use of computers to manage conventional financial transactions.  In other words, this segment is what most of the Enterprise Software Industry does today.  If much of your computer usage is built around this kind of activity, you live in a Blue Shifted world.  We shouldn’t expect much to happen in this world, and if we are bored with Enterprise Software today, it’s because too much of it is doing the basic plumbing for the Blue Shifted world.  That world isn’t suddenly going to wake up and start going gangbusters again; it’s done.  In fact, consolidation, virtualization, and more power-efficient components will be the dominant activities as enterprises try to reduce costs for core applications and services.  Virtualization et al will further reduce growth as the Blue Sector figures out how to use what it has ever more efficiently.  Growth here has regressed to the mean of GDP growth, which is slow indeed by computer industry standards.  End of an era.

The Red Shifted world is the more exciting world.  The thought leader for Red Shift is Web 2.0.  Not far out of the limelight are such applications as financial market simulations, drug industry research, and computer animation.  The shift to SaaS (growing 43 percent annually, according to a recent report by RBC Capital Markets), while it involves moving Blue Zone applications, will also deliver Red Shifted growth  because of the rate of conversion.  Their demand for computing is slated to increase at a rate faster than Moore’s Law, which is voracious indeed.  To make matters even more interesting, Papadopoulos goes on to argue that companies who embrace Red Shifted applications will grow much faster than those that stick to their Blue Shift knitting.  To paraphrase Will Smith, “I gotta get me summa dat Red Shift!”

Of course the theory goes on to describe a bright future for Sun which is well positioned to deliver scale efficient infrastructure, which they call “brutally” efficient infrastructure.  Microsoft and IDC agree with the vision, so they see something bright in their future around it too.  Sun goes on to project that at some point in the not too distant future, there will be just 5 massive data centers worldwide doing all of this business.  Wow!  Of course they haven’t heard about my web hosting plan for world domination yet, so maybe there will be 5+1 where the “+1” is a consortium of much smaller vendors delivering a shared utility computing fabric.

Personally, I like and agree with many aspects of the Red Shift theory.  I’ve said many times during my Enterprise career that Moore’s Law has passed up the growth in financial transactions and that this will lead to a cheapening of infrastructure cost for conventional Enterprise applications.  It’s about time too.  The centralization and skillset focus that SaaS brings will deliver further economies of scale and make traditional computing still cheaper.  I also agree that bringing in the Web 2.0 collaborative-connectedness paradigm is another world changer, and one we are much closer to taking the first steps on than SaaS.

There are some aspects of the theory I wonder about, however.  For example, Sun is obsessed that their “brutal efficiency” mantra means Big Iron/Big Servers.  When you have a hammer, everything is a nail.  The trouble is that market experience seems to imply commodity computing is a lot cheaper than Big Iron.  Google, Yahoo, Amazon, et al are built on Lintel (Linux + Intel compatible) machines that are cheap.  Their infrastructure lashes together thousands of these boxes.  My experience at Callidus Software with our grid-computing based Enterprise software was that it was only the database that benefited from expensive Big Iron boxes.  Moreover, having written our app to run on its own mini-grid, we tended to minimize the DB as much as possible, to the point where only 25% of the CPUs had to be Big Iron DB-class machines.  Interestingly, the 25% often cost as much as the remaining 75%!  Because of this, I’m much more sold on utility computing infrastructures delivered on commodity Lintel stacks.  Just the sort of thing Amazon offers with EC2 and S3.

The other interesting viewpoint I came across was a number of folks who had concluded that the practical upshot of the Red Shift theory is that the future belongs to the database, not the processor.  The argument is that if you’re data intensive, you are by definition in the Red Zone, so companies like credit card processors are there.  I don’t agree.  Database scaling will be an important axis for Red Shifted companies to master, but the database is an effect, not a cause, and not all data-intensive activities like credit card processing will necessarily grow at rates faster than Moore’s Law.  I do think we’ll see some fundamental changes in how the world works with databases, and perhaps those changes will break Oracle’s hegemony at the high end, but the DB is the tail wagging the dog.  I also think the data has huge value, and we’re just beginning to think about the idea that the data may in fact be more valuable even than the software, particularly in a Web 2.0 context.  However, first you have to be doing something that generates all those data volumes and delivers truly valuable data, and that gets us back to the original Red Shift argument about which kinds of apps qualify.

This is all just another side of the whole Multicore Crisis too.  After all, what do you do if your business is growing faster than Moore’s Law and you just bought the biggest machine they make?  Lest you laugh, this is actually what happened to eBay during the bad old days of their service outages.  We were located on their campus in Campbell, and I used to watch the satellite crews set up to interview the eBayers about what had gone wrong.  Fundamentally, they had a monolithic architecture doing their auction search, and once they had it running on the biggest mainframe Sun could sell them, they were stuck.  This precipitated a total rewrite to get things to horizontally scale.  I’m sure it was a harrowing experience, but we will see it play out over and over again as the Red Shift collides with the Multicore Crisis.

It’s an exciting world we live in!



Posted in amazon, business, data center, ec2, grid, Marketing, multicore, saas, Web 2.0 | Leave a Comment »

How Does Virtualization Impact Hosting Providers? (A Secret Blueprint for Web Hosting World Domination)

Posted by Bob Warfield on August 16, 2007

I’ve written in the past about data centers growing ever larger and more complex in the era of SaaS and Web 2.0.  My friend Chris Cabrera, CEO of SaaS provider Xactly, recently commented along similar lines  when asked about the VMWare IPO. 

Now Isabel Wang who really understands the hosting world has written a great post on the impact of virtualization (in the wake of VMWare’s massive IPO) on the web hosting business.  I took away several interesting messages from Isabel’s post:

-  Virtualization will be essential to the success of Hosters because it lets them offer their service more economically by upping server utilization.  It’s an open question whether those economies are passed to the customer or the bottom line.

-  These technologies help address the performance and scalability issues that keep a lot of folks awake at night.  Amazon’s Bezos and Microsoft’s Ray Ozzie realize this, and that’s why they’re rushing full speed ahead into this market.  They’ve solved the problems for their organizations and see a great opportunity to help others and make money along the way.

-  The market has moved on from crude partitioning techniques to much more sophisticated and flexible approaches.  Virtualization in data centers will be layered, and will involve physical server virtualization, a utility computing fabric comprised of pools of servers across multiple facilities, application frameworks such as Amazon Web Services, and Shared Services such as identity management.  This complexity tells us the virtualization wars are just beginning and VMWare isn’t even close to locking it all up, BTW.

-  This can all be a little threatening to the established hosting vendors.  Much of their expertise is tied up in building racks of servers, keeping them cool, and hot swapping the things that break.  The new generation requires them to develop sophisticated software infrastructure, which is not something they’ve been asked to do in the past.  It may wind up being something they don’t have the expertise to do either.  These are definitely the ingredients of paradigm shifts and disruptive technologies!

We’re talking about nothing less than utility computing here, folks.  It’s a radical step-up in the value hosting can offer, and it fits what customers really want to achieve.  Hosting customers want infinite variability, fine granularity of offering, and real-time load tracking without downtime like the big crash in San Fran that recently took out a bunch of Web 2.0 companies.  They want help creating this flexibility in their own applications.  They want billing that is cost effective and not monolithic.  Billing that lets them buy (sorry to use this here) On-demand.  After all, their own businesses are selling On-demand and they want to match expenses to revenue as closely as possible to create the hosting equivalent of just in time inventory. Call it just in time scaling or just in time MIPS.  Most of all, they want to focus their energies on their distinctive competencies and let the hoster solve these hard problems painlessly on their behalf.

When I read what folks like Amazon and the Microsofties have to say about it, I’m reminded of the Intel speeches of yore  that talked about how chip fabs would become so expensive to build that only a very few companies would have the luxury of owning them and Intel would be one of those companies.  Google, for example, spends $600 million on each data center.  Big companies love to use big infrastructure costs to create the walls around their very own gardens!  Why should the hosting world be any different?

The trouble is, the big guys also have a point.  To paraphrase a particular blog title, “Data centers are a pain in the SaaS”.  They are a pain in the Web 2.0 too.  Or, as Amazon.com Chief Technology Officer, Werner Vogels said, “Building data centers requires technologists and engineering staff to spend 70% of their efforts on undifferentiated heavy lifting.”

Does this mean the big guys like Amazon and Microsoft (and don’t forget others like Sun Grid) will use software layers atop their massive data centers to massively centralize and monopolize data centers?  Here’s where it gets interesting, and I see winning strategies for both the largest and smaller players.

First, the big players worry about how to beat each other, not the little guys.  Amazon knows Microsoft will come gunning for them, because they must.  Can Amazon really out innovate Microsoft at software?  Maybe.  The world needs an alternative to Microsoft anyway.  But the answer when competing against players like Microsoft and IBM has historically been to play the “Open System vs. Monolithic Proprietary System” card.  It has worked time and time again, even allowing the open system to beat better products (sorry Sun, the Apollo was better way back when!).

How does Amazon do this to win the massive data center wars?  It’s straightforward:  they place key components of Amazon Web Services into the Open Source community while keeping critical gatekeeping functions closed and under their control.  This lets them “franchise” out AWS to other data centers.  If you are a web hoster and you can offer to resell capacity that is accessible with Amazon’s APIs, wouldn’t that be an attractive way to quit worrying so much about it?  Wouldn’t it make the Amazon API dramatically more attractive if you knew there would be other players supporting it?

Amazon, meanwhile, takes a smaller piece of a bigger pie.  They charge their franchisees for the key pieces they hold onto to make the whole thing work.  Perhaps they keep the piece needed to provision a server and get back an IP and charge a small tax to bring a new server for EC2 or S3 online in another data center.  How about doing the load balancing and failover bits?  Wouldn’t you like it if you could buy capacity accessed through a common API that can fail over to any participating data center in the world?  How about being able to change your SaaS datacenter to take advantage of better pricing simply by reprovisioning any or all of the machines in your private cloud to move?  How about being able to tell your customers your SaaS or Web 2.0 offering is that much safer for them to choose because it is data center agnostic?

BTW, any of the big players could opt to play this trump card.  It just means getting out of the “I want to own the whole thing” game of chicken and taking that smaller piece of a bigger pie.  Would you buy infrastructure from Google or Yahoo if they offered such a deal?  Why not?  Whoever opens their system gains a big advantage over those who keep theirs monolithic.  It answers many of the objections raised in an O’Reilly post about what to do if Amazon decides to get out of the business or has a hiccup.

Second, doesn’t that still mean the smaller players of less than Amazon/Google/Microsoft stature are out in the cold?  Not yet.  Not if they act quickly, before the software layers needed to get to first base become too deep and too many have adopted those layers.  What the smaller players need to do is immediately launch a collaborative Open Source project to develop Amazon-compatible APIs that anyone can deploy.  Open Source trumps Open System, which trumps Closed Monoliths.  It leverages a larger community to act in their own enlightened self-interest to solve a problem no single one of these players can probably afford to solve on their own.  Moreover, this is the kind of problem the Uber Geeks love to work on, so you’ll get some volunteers.

Can it be done?  I haven’t looked at it in great detail, but the APIs look simple enough today that I will argue it is within the scope of a relatively near-term Open Source initiative.  This is especially true if a small consortium got together and started pushing.  One comment from that same O’Reilly blog post said, “From an engineering standpoint, there’s not much magic involved in EC2.  Will you suffer for a while without the nifty management interface? Sure. Could you build your own using Ruby or PHP in a few days? Yep.”  I don’t know if it’s that easy, but it sure sounds doable.  By the way, the “nifty management interface” is another gatekeeper Amazon might hold on to and monetize.

But wait, won’t Amazon sue?  Perhaps.  Perhaps it tips their hand to Open Source it themselves.  Legal protection of APIs is hard.  The players could start from a different API and simply build a connector that lets their different API also work seamlessly with Amazon and arrive at the same endpoint:  developers who write to that API can use Amazon or any other provider that supports it.

You only need three services to get going:  EC2, S3, and a service Amazon should have provided that I will call the “Elastic Data Cloud”.  It offers mySQL without the pain of losing your data if the EC2 instance goes down.  By the way, this is also something a company bent on dominating virtualization or data center infrastructure could undertake; it is something a hardware vendor could build and sell to favor their hardware; and it’s something some other player could go after.  The mySQL service, for example, would make sense for mySQL themselves to build.  One can envision similar services and their associated machine images being a requirement after some point if you want to sell to SaaS and Web companies.  Big Enterprise might undertake to use this set of APIs and infrastructure to remainder unused capacity in their data centers (unlikely, they’re skittish), help them manage their data centers (yep, they need provisioning solutions), use outsourcers to get apps distributed and hardened for disaster recovery, and the like.

So there you have it, hosting providers, virtualizers, and software vendors:  a blueprint for world domination.  I hope you go for it. I’m building stuff I’d like to host on such a platform, and I’m sure others are too!

Note that the game is already afoot with Citrix having bought XenSource.  Why does this put things in play?  Because Amazon EC2 is built around Xen.  Hmmmm…

Some late breaking news: 

There’s been a lot of blogging lately over whether Yahoo’s support of Open Sourced Hadoop will help them close the gap against Google.  As ZDNet points out, “Open source is always friend to the No. 2 player in a market and always the enemy of the top dog.”  That’s basically my point on the Secret Blueprint for Web Hosting World Domination.


Posted in amazon, business, data center, ec2, grid, multicore, Open Source, Partnering, saas, venture, Web 2.0 | 8 Comments »

Amazon is the Hardware and OS Vendor of SaaS

Posted by Bob Warfield on May 27, 2007

Continuing the “Total SaaS Enterprise” theme, where every aspect of computing in an enterprise is purchased as SaaS except, perhaps, the laptops and internet connection (but then see www.centerbeam.com!), how do Amazon’s Web Services fit in?

AWS offers several services at this time.  The ones I want to talk about are EC2 (the “Elastic Compute Cloud”), S3 (“Simple Storage Service”), and SQS (“Simple Queue Service”).  Using EC2, one can get control of individual machines roughly equivalent to a 1.7GHz x86 processor, 1.75GB of RAM, 160GB of local disk, and 250Mb/s of network bandwidth.  These machines are paid for by the hour at a rate of 10 cents an hour, with additional charges for connectivity outside the Amazon world.  Communications inside, between EC2 machines or S3, are free.  S3 offers the equivalent sort of service for bulk storage, offered at a rate of 15 cents per gigabyte per month, with charges to move data in and out of S3; cleverly, it is cheaper to move data in than out.  Lastly, SQS is a messaging system that charges microcents per reliable, queued message sent between processes.  For example, it would make an effective way for your EC2 machines to communicate with one another, or perhaps for machines outside the Amazon world to communicate into their EC2 resources.
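
Here’s roughly what SQS glue code looks like, sketched with today’s boto3 client (the 2007 service spoke a SOAP/Query API, but the shape is the same); the queue name is a made-up placeholder.

```python
# One EC2 machine enqueues work; another receives, processes, and deletes it.
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.create_queue(QueueName="work-queue")["QueueUrl"]

sqs.send_message(QueueUrl=queue_url, MessageBody="resize photo 42")

msgs = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for m in msgs.get("Messages", []):
    print("got:", m["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
```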

What a cool concept!  And in fact, despite the fact it is relatively new, it has captured the imagination of many developers out there.  In fact, when I checked this morning, I got more hits on Google for Amazon EC2 than for Salesforce AppExchange, despite the fact that AppExchange has been available for much longer and EC2 is still in early beta test.

I got to thinking about the whole concept, and I like it a lot.  When looking at where to place it in the pantheon of SaaS offerings, it seems to me that what Amazon is offering is the equivalent of what Hardware and OS vendors offer under perpetual license.   The difference is you don’t have to install it, pay for HVAC to cool it, and so on.  The classic advantages of SaaS are available even for raw hardware. 

So my elevator pitch for Amazon is “Amazon Web Services makes it the SaaS of Hardware and OS vendors”.

Posted in amazon, ec2, saas, software development | Leave a Comment »

 