Amazon Startup Project Report
Posted by Bob Warfield on September 13, 2007
I attended the Silicon Valley edition of the Amazon Startup Project today. This is their second such event, the first having been hosted in home-town Seattle. The event took place at the Stanford University Faculty Club and was well attended: they basically filled the hall. The agenda included an opening by Andy Jassy, Senior VP for Amazon Web Services, a discussion on the services themselves by Amazon Evangelist Mike Culver, a series of discussions by various startups using the services, a conversation with Kleiner Perkins VC Randy Komisar, and closing remarks by Jassy again. Let me walk through what I picked up from the various segments.
First up were the two talks by Amazon folk, Jassy and Mike Culver. Jassy kept it pretty light, didn’t show slides, and generally set a good tone for what Amazon is trying to accomplish. The message from him is that they’re in it for the long haul, they’ve been doing APIs for years, and the world should expect this to be a cash-generating business for Amazon relatively shortly. That’s good news, as I have sometimes heard folks wonder whether this is just remaindering infrastructure they can’t use or whether they are in fact serious. The volumes of data and CPU they’re selling via these services are enormous and growing rapidly.
Mike Culver’s presentation basically walked through the different Amazon Web Services and tried to give a brief overview of what they were, why you’d want such a thing, and examples of who was using them. I had several takeaways from Mike’s presentation. First, his segment on EC2 (Elastic Compute Cloud, the service that sells CPU time) was the best. His discussion of how hard it can be to estimate and prepare for the volumes and scaling you may encounter was spot on. Some of the pithier bullets included:
- Be prepared to scale down as well as up.
- Queue everything and scale out the servicing of the queues.
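The "queue everything" advice can be sketched in a few lines. This is a toy, single-process illustration of the pattern (producers enqueue work, and you scale out by pointing more workers at the same queue); in practice the queue would be a service like Amazon SQS and the workers would be separate machines, not threads.

```python
import queue
import threading

work = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    """Drain the shared queue until a sentinel arrives."""
    while True:
        item = work.get()
        if item is None:          # sentinel: shut this worker down
            work.task_done()
            break
        with results_lock:
            results.append(item * 2)   # stand-in for real processing
        work.task_done()

# "Scaling out" is just starting more workers against the same queue.
workers = [threading.Thread(target=worker) for _ in range(4)]
for t in workers:
    t.start()

for i in range(10):
    work.put(i)
for _ in workers:
    work.put(None)                # one sentinel per worker
work.join()
print(sorted(results))            # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The payoff is that the producer never has to know how many workers exist, which is exactly what lets you add or remove capacity as traffic moves.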
He showed a series of Alexa traffic slides that were particularly good. First he showed CNN’s traffic:
As you can see, there are some significant peaks and valleys. In theory, you’d need to build for the peaks and eat the cost of overcapacity for the valleys if you build your own data center. With a utility computing fabric like Amazon’s you can scale up and down to deal with the demand. He next overlaid Flickr onto this data:
Flickr’s problem is a little different. They went along for a while and then hit a huge spike in Q2 2006. Imagine having to deal with that sort of spike by installing a bunch of new physical hardware. Imagine how unhappy your customers would be while you did it and how close you would come to killing your staff. Spikes like that are nearly impossible to anticipate. CNN has bigger spikes, but they go away pretty rapidly. Flickr had a sustained uptick.
The last view overlaid Facebook onto the graph:
Here we see yet another curve shape: exponential growth that winds up dwarfing the other two in a relatively short time. Amazon’s point is that unless you have a utility computing fabric to draw on, you’re at the mercy of trying to chase one of these unpredictable curves, and you’re stuck between two ugly choices: be behind the curve and making your customers and staff miserable with a series of painful firedrills, or be ahead of the curve and spend the money to handle spikes that may not be sustained, thereby wasting valuable capital. Scaling is not just a multicore problem, it’s a crisis of creating a flexible enough infrastructure that you can tweak on a short time scale and pay for it as you need it.
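The build-for-the-peak tradeoff above is easy to put numbers on. Everything below is made up purely to illustrate the shape of the argument; real 2007 server and utility pricing differed.

```python
# Toy illustration of the peak-vs-utility tradeoff. All numbers are
# hypothetical: they just show why idle peak capacity wastes capital.

peak_servers = 100       # capacity you'd buy to survive the worst spike
avg_servers = 20         # capacity you actually use most of the month
hours = 24 * 30          # one month of hours
cost_per_server_hour = 0.10   # hypothetical rate, same for both models

# Own the peak: you pay for 100 servers whether or not they're busy.
own_the_peak = peak_servers * hours * cost_per_server_hour

# Utility model: you pay only for what you actually run.
pay_as_you_go = avg_servers * hours * cost_per_server_hour

print(own_the_peak, pay_as_you_go)   # 7200.0 1440.0
```

With a 5x gap between peak and average load, the owned data center costs 5x more for the same served traffic, and that is before counting the firedrill scenario where the spike exceeds what you bought.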
One of the things Mike slid in was the idea that Amazon’s paid-for images were a form of SaaS. To use EC2, you first come up with a machine image. The image is a snapshot of the machine’s disk that you want to boot. Amazon now has a service where you can put these images up and people pay you money to use them, while Amazon gets a cut. The idea that these things are like SaaS is a bit far-fetched. By themselves they would be Software without much Service. However, the thought I had was that they’re really more like Web Appliances. Some folks have tried to compare SaaS and Appliance software–I still think it doesn’t wash for lack of Service in the appliance, but this Amazon thing is a lot cleaner way to deliver an appliance than having to ship a box. Mike should change his preso to push it more like appliances!
All of the presentations were good, but the best ones for me were by the startup users of the services. What was great about them was that they pulled no punches. The startups got to talk about both the good and bad points of the service, and it wasn’t too salesy about either Amazon or what the startups were doing. It was more like, “Here’s what you need to know as you’re thinking about using this thing.” I’ll give a brief summary of each:
Jon Boutelle, CTO, Slideshare
The Slideshare application is used to share slideshows on the web, SaaS-style. Of course Jon’s preso was done using slideware. His catchy title was “How to use S3 to avoid VC.” His firm bootstrapped with minimal capital, and his point is not that you have to get the lowest possible price per GB (Amazon isn’t the cheapest), but that how the price is charged matters a lot more to a bootstrapping firm. In his firm’s case, they get the value out of S3 about 45 days before they have to pay for it. In fact, they get their revenue from Google AdSense in advance of their billing from Amazon, so cash flow is good!
He talked about how they got “TechCrunched” and the service just scaled up without a problem. Many startups have been “TechCrunched” and found it brought the service to its knees because they got slammed by a wall of traffic, but not here.
Joyce Park, CTO, Renkoo/BoozeMail
Joyce was next up and had a cool app/widget called BoozeMail. It’s a fun service that you can use whether or not you’re on Facebook to send a friend a “virtual drink”. Joyce gave a great overview of what was great and what was bad about Amazon Web Services. The good is that it has scaled extremely well for them. She ran through some of their numbers that I didn’t write down, but they were very large. The bad is that there have been some outages, and it’s pretty hard to run things like MySQL on AWS (more about that later).
BoozeMail is using a Federated Database Architecture that tracks the senders and receivers on multiple DB servers. The sender/receiver lists are broken down into groups, and they will not necessarily wind up on the same server. At one point, they lost all of their Amazon machines simultaneously because they were all part of the same rack. This obviously makes failover hard and they were not too happy about it.
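The routing step in a federated architecture like that can be sketched simply. This is the generic hash-partitioning idea, not Renkoo’s actual code; the shard names and the use of a hash over the user ID are my own illustrative assumptions.

```python
import hashlib

# Hypothetical shard pool; in the talk's terms, each name is a DB server.
SHARDS = ["db0", "db1", "db2", "db3"]

def shard_for(user_id: str) -> str:
    """Map a user to a shard by hashing their ID. Deterministic, so the
    same user always lands on the same server, but a sender and receiver
    will often land on different ones, which is what makes cross-shard
    queries (and failover) the hard part."""
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

print(shard_for("alice"), shard_for("bob"))
```

The "my whole rack died" complaint follows directly from this design: losing several shards at once breaks the system in ways that losing one replicated shard would not.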
Persistence problems with Amazon are one of the thorniest issues to work through. Your S3 data is safe, but an EC2 instance could fall over at any time without much warning. Apparently Renkoo is beta testing under non-disclosure some technology that makes this better, although Joyce couldn’t talk about it. More later.
Something she mentioned that the others echoed is that disk access for EC2 is very slow. Trying to get your data into memory cache is essential, and writes are particularly slow. Again, more on the database aspects in a minute, but help is on the way.
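The "get your data into memory cache" advice is the classic read-through pattern: check memory first, touch the slow disk only on a miss. The sketch below uses a plain dict as the cache and another dict standing in for the slow EC2 disk; in practice memcached would play the cache’s role.

```python
# Stand-ins: SLOW_STORE represents slow EC2 disk/DB reads, cache is RAM.
SLOW_STORE = {"user:1": "Joyce", "user:2": "Sean"}
cache = {}
stats = {"hits": 0, "misses": 0}

def get(key):
    """Read-through cache: serve from memory when possible, and populate
    the cache on a miss so the next read skips the slow path."""
    if key in cache:
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1
    value = SLOW_STORE.get(key)   # the expensive disk read to avoid
    cache[key] = value
    return value

get("user:1")        # miss: goes to the slow store
get("user:1")        # hit: served from memory
print(stats)         # {'hits': 1, 'misses': 1}
```

The higher the hit rate, the less the slow disk matters, which is why the presenters kept coming back to this point.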
Sean Knapp, President of Technology, Ooyala
Ooyala is a cool service that lets you select objects on high quality video. The demo given at Startup Day was clicking on a football player who was about to make a touchdown to learn more about him. Sean spent most of his preso showing what Ooyala is. It is clearly an extremely impressive app, and it makes deep use of Amazon Web Services to virtually eliminate any need for doing their own hosting. The message seemed to be if these guys can make their wild product work on Amazon, you certainly can too.
Don MacAskill, CEO, Smugmug
I’ve been reading Don’s blog for a while now, so I was pleased to get a chance to meet him finally. Smugmug is a high end photo sharing service. It charges for use SaaS-style, and is not an advertising supported model. As I overheard Don telling someone, “You can offer a lot more when people actually pay you something than you can if you’re just getting ad revenue.” Consequently, his customer base includes some tens of thousands of professional photographers who are really picky about their online photo experience.
Smugmug has been through several generations of Amazon architectures, and may be the oldest customer I’ve come across. They started out viewing Amazon as backup and morphed until today Amazon is their system of record and source of data that doesn’t have to be served too fast. They use their own data center for the highest traffic items. The architecture makes extensive use of caching, and apparently their caches get a 95% hit rate.
Don talked about an area he has blogged on in the past, which is how Amazon saves him money that goes right to the bottom line.
Don’s summary on Amazon:
- A startup can’t go wrong using it initially
- Great for “store a lot” + “serve a little”
- More problematic for “serve a lot”
There are performance issues with the architecture around serve a lot, and Don feels they charge a bit too much (though not egregiously) for bandwidth. His view is that if you push more than a gigabit of sustained bandwidth, Amazon may be too expensive, but that they’re fine up to that usage level.
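It’s worth doing the back-of-the-envelope math on why a sustained gigabit is the point where bandwidth pricing starts to bite. The per-GB rate below is hypothetical, purely to show the scale of the bill, not Amazon’s actual 2007 price.

```python
# A sustained 1 Gbps connection, converted to GB transferred per month.
gbps = 1.0
seconds_per_month = 86400 * 30
gb_transferred = gbps / 8 * seconds_per_month   # bits -> bytes -> GB

price_per_gb = 0.15    # hypothetical $/GB transfer rate for illustration

monthly_bill = gb_transferred * price_per_gb
print(round(gb_transferred), round(monthly_bill))   # 324000 48600
```

At roughly 324 TB a month, even a modest per-GB rate turns into tens of thousands of dollars, while a flat-rate gigabit pipe from a colo provider was a fraction of that, which is presumably the comparison behind Don’s threshold.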
His top feature requests:
– Better DB support/persistence
– Control over where physically your data winds up to avoid the “my whole rack died” problem that Joyce Park talked about.
The Juicy Stuff and Other Observations
At the end of the startup presentations, the startup folks took questions from the audience. Without a doubt, the biggest source of questions surrounded database functionality:
– How do we make it persist?
– How do we make it fast?
– Can we run Oracle? Hmmm…
It’s so clear that this is the biggest obstacle to greater Amazon adoption. Fortunately, it’s also clear it will be fixed. I overheard one of the Amazon bigwigs telling someone to expect at least three end-of-year announcements to address the problem. What is less clear is whether the announcements would be:
a) Some sort of MySQL service all bundled up neatly
b) Machine configurations better suited to DB use: more spindles and memory were mentioned as desirable
c) Some solution to machines just going poof! In other words, persistence at least at a level where the machine can reboot, access the data on its disk, and take off again without being reimaged.
d) Some or all of the above.
Time will tell, but these guys know they need a solution.
The other observation I will make is one that echoes Don’s observation on Smugmug: I’m sure seeing a lot of Mac laptops out in the world. 3 of the 4 presenters were sporting Macs, and 2 of them had been customized with their company logos on the cover. Kewl!