SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the ‘multicore’ Category

Amazon Beefs Up EC2 With New Options

Posted by Bob Warfield on October 16, 2007

I’ve been a big fan of Amazon’s Web Services for quite a while and attended their Startup Project, which is an afternoon seeing what it can do and hearing from entrepreneurs who’ve built on this utility computing fabric.  Read my writeup on the Startup Project for more.  Amazon has been steadily rolling out improvements, such as the addition of SLA’s for the S3 storage service.  Today, there is big news in the Amazon EC2 camp:

Amazon has just announced two new instance types for their EC2 utility computing service.  The original type will continue to be available as the “small” type.  The “large” type has four times the CPU, RAM, and Disk Storage, while the “extra large” has eight times the CPU, RAM, and Disk.  The large and extra large also sport 64 bit cpus.  Supersize your EC2!

Why do this?  Because the original small instance was a tad lightweight for database activity with just 1.7GB of RAM while the extra large at 15GB is about right.  Imagine a cluster of the extra large instances running memcached and you can see how this going to dramatically improve the possibilities for hosting large sites.

One of the neat things about this new announcement is pricing.  They’ve basically linearly scaled pricing.  Whereas a small instance costs 10 cents per instance hour, the extra large has 8x the capacity and costs 8×10 cents or 80 cents per hour.

What’s next?  These new instances open a lot of possibilities, but Amazon still doesn’t have painless persistence for databases like mySQL.  If you are running mySQL on an extra large instance and the server goes down for whatever reason, all the data on it is lost and you have to rebuild a new machine around some form of hot backup or failover.  That exercise has been left to the user.  It’s doable: you have to solve the problem in any data center of what you plan to do if the disk totally crashes and no data can be recovered.  However, folks have been vocally requesting a better solution from Amazon where the data doesn’t go away and the machine can be rebooted intact.  I was told by the EC2 folks at the Startup Project to expect 3 announcements before the end of the year that were related.  I’m guessing this is the first such announcement and two more will follow. 

There’s tremendous excitement right now around these kinds of offerings.  They virtualize the data center to reduce the cost and complexity of setting up the infrastructure to do web software.  They allow you to flex capacity up or down and pay as you go.  Amazon is not the only such option.  I’ll be reporting on some others shortly.  It’s hard to see how it makes sense to build your own data center without the aid of one of these services any more. 

Posted in amazon, ec2, grid, multicore, platforms, saas, software development, Web 2.0 | 2 Comments »

To Escape the Multicore Crisis, Go Out Not Up

Posted by Bob Warfield on September 29, 2007

Of course, you should never go up in a burning building, go out instead.  Amazon’s Werner Voegels sees the Multicore Crisis in much the same way:

Only focusing on 50X just gives you faster Elephants, not the revolutionary new breeds of animals that can serve us better.

Voegels is writing there about Michael Stonebreaker’s claims that he can demonstrate a database architecture that outperforms conventional databases by a factor of 50X.  Stonebreaker is no one to take lightly: he’s accomplished a lot of innovation in his career so far and he isn’t nearly done.  He advocates replacing the Oracle (and mySQL) style databases (which he calls legacy databases) with a collection of special purpose databases that are optimized for particular tasks such as OLTP or data warehousing.  It’s not unlike the concept myself and others have talked about that suggests that the one-language-fits-all paradigm is all wrong and you’d do better to adopt polyglot programming.

I like Stonebreaker’s work.  While I want the ability to scale out to any level that Voegels suggests, I will take the 50X improvement as a basic building block and then scale that out if I can.  That’s a significant scaling factor even looked at in the terms of the Multicore Language Timetable.  It’s nearly 8 years of Moore’s Cycles.  I’m also mindful that databases are the doorway to the I/O side of the equation which is often a lot harder to scale out.  Backing an engine that’s 50X faster sucking the bits off the disk with memcached ought to lead to some pretty amazing performance.

But Voegels is right, in the long term we need to see different beasts than the elephants.  It was with that thought in mind that I’ve been reading with interest articles about Sequoia, an open source database clustering technology that makes a collection of database servers look like one more powerful server.  It can be used to increase performance and reliablity.  It’s worth noting that Sequoia can be installed for any Java app using JDBC without modifying the app.  Their clever monicker for their technology is RAIDb:  Redundant Array of Inexpensive Databases.  There are different levels of RAIDb just as there are RAID levels that allow for partitioning, mirroring, and replication.  The choice of level or combinations of levels governs whether your applications gets more performance, more reliability, or both.

Sequoia is not a panacea, but for some types of benchmarks such as TPC-W, it shows a nearly linear speedup as more cpus are added.  It seems likely a combination of approaches such as Stonebreaker’s specialized databases for particular niches and clustering approaches like Sequoia all running on a utility computing fabric such as Amazon’s EC2 will finally break the multicore logjam for databases.

Posted in amazon, ec2, grid, multicore, Open Source, platforms, software development | 4 Comments »

Memcached: When You Absolutely Positively Have to Get It To Scale the Next Day

Posted by Bob Warfield on September 23, 2007

Why should “Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0″ care about a piece of technology like memcached?  Because it just might save your bacon, that’s why. 

Suppose your team have just finished working feverishly to implement “Virus-a-Go-Go”, your new Facebook widget that is guaranteed to soar to the top of the charts.  You launch it, and sure enough you were right.  Zillions of hits are suddenly raining down on your new widget. 

But you were also wrong.  You woefully underestimated in your architecture what it would take to handle the traffic that is now beating your pitiful little servers into oblivion.  Angry email is flowing so fast it threatens to overwhelm your mail servers.  Worse, someone has published your phone number in a blog post, and now it rings continuously.  Meanwhile, your software developers are telling you that your problem is in your use of the database (seems to be the problem so much of the time, doesn’t it?), and the architecture is inherently not scalable.  Translation:  they want a lot of time to make things faster, time you don’t have.  What to do, what to do?

With appologies to Federal Express, the point of my title is that memcached may be one of the fastest things you can retrofit to your software to make it scale.  Memcached when properly used has the potential to increase performance by hundreds or sometimes even thousands of times.  I don’t know if you can quite manage it overnight, but desperate times call for desperate measures.  Hopefully next time you are headed for trouble, you’ll start out with a memcached architecture in advance and buy yourself more time before you hit a scaling crunch.  Meanwhile, let me tell you more about this wonder drug, memcached.

What is memcached? 

Simply put, memcached sits between your database and whatever is generating too many queries on it and attempts to avoid repetitious queries by caching the answer in memory.  If you ask it for something it already knows, it retrieves it very quickly from memory.  If you ask it for something it doesn’t know, it gets it from the database, copies it into the cache for future reference and hands it over.  Someone once said, it doesn’t matter how smart you are, if the other guy already knows the answer to a hard question, he can blurt it out before you can figure it out.  And this is exactly what memcached does for you. 

The beautiful thing about memcached is that it can usually be added to your software without huge structural changes being necessary.  It sits as a relatively transparent layer that does the same thing your software has always done, but just a whole lot faster.  Most of the big sites use memcached to good effect.  Facebook, for example, uses 200 quad core machines that each have 16GB of RAM to create a 3 Terabyte memcached that apparently has a 99% hit rate.

Here’s another beautiful thought: memcached gives you a way to leverage lots of machines easily instead of rewriting your software to eliminate the scalability bottleneck.  You can run it on every spare machine you can lay hands on, and it soaks up available memory on those machines to get smarter and smarter and faster and faster.  Cool!  Think of it as a short term bandaid to help you overcome your own personal Multicore Crisis.

What’s Needed

Getting the memcached software is easy.  The next step I would take is to go read some case studies from folks using tools similar to yours and see how they’ve integrated memcache into their architectural fabric.  Next, you need to look for the right place to apply leverage with memcache within your application.  Memcached takes a string to use as a key, and returns a result associated with the key.  Some possibilities for keys include:

-  A sql query string

-  A name that makes sense:  <SocialNetID>.<List of Friends>

The point is that you are giving memcached a way to identify the result you are looking for.  Some keys are better than others–think about your choice of key carefully!

Now you can insert calls to memcached into your code in strategic places and it will begin to search the cache.  You’ll also need to handle the case where the cache has no entry by detecting it and telling memcached to add the missing entry.  Be careful not to create a race condition!  This happens when multiple hits on the cache (you’re using a cache because you expect multiple hits, right?) cause several processes to compete with who gets to put the answer into memcached.  There are straightforward solutions available, so don’t panic.

Last step?  You need to know when the answer is no longer valid and be prepared to tell memcached about it so it doesn’t give you back an answer that’s out of date.  This can sometimes be the hardest part, and giving careful thought to what sort of key you’re using is important to making this step easier.  Think carefully about how often its worth updating the cache too.  Sometimes availability is more important than consistency.  The more you update, the fewer hits on the cache will be made between updates, which will slow down your system.  Sometimes things don’t have to be right all the time.  For example, do you really need to be able to lookup which of your friends are online at the moment and be right every millisecond?  Perhaps it is good enough to be right every two minutes.  It’s certainly much easier to cache something changing on a two minute interval.

Memcached is Not Just for DB Scaling

The DB is often the heart of your scalability problem, but memcached can be used to cache all sorts of things.  An alternate example would be to cache some computation that is expensive, must be performed fairly often, and whose answer doesn’t change nearly as often as it is asked for.   Sessions can also be an excellent candidate for storage in memcached although strictly speaking, this is typically more DB caching.

Downsides to memcached

  • Memcached is cache hit frequency dependant.  So long as your application’s usage patterns are such that a given entry in the cache gets hit multiple times, you’re good.  But a cache won’t help if every access is completely different.  In fact, it will hurt, because you pay the cost to look in the cache and fail before you can go get the value from the DB.  Because of this, you will need to verify that what you’re caching actually has this behaviour.  If it doesn’t, you’ll need to think of another solution.
  • Memcached is not secure by itself, so it must either be deployed inside your firewall or you’ll need to spend time building layers around it to make it secure.
  • Memcached needs memory.  The more the merrier.  Remember that Facebook is using 16GB machines.  This is not such a happy story for something like Amazon EC2 at the moment, where individual nodes get very little memory.  I have heard Amazon will be making 3 announcements by end of year to help DB users of EC2.  Perhaps one of these will involve more memory for more $$$ on an EC2 instance.  That would help both the DB and your chances of running memcached on EC2.  There are other problems with memcached on EC2 as well, such as a need to use it with consistent hashing to deal with machines coming and going, and the question of latency if all the servers are not in the EC2 cloud. 
  • Memcached is slower than local caches such as APC cache since it is distributed.  However, it has the potential to store a lot more objects since it can harness many machines.  Consider whether your application benefits from a really big cache, or whether some of the objects aren’t better off with smaller, higher performance, local caches.

Alternatives to memcached

  • Other caching mechanisms are available.  Be sure you understand the tradeoffs, and don’t take someone else’s benchmarks for granted.  You need to run your own! 

  • Static Web Pages.  Sometimes you can pre-render dynamic behaviour for a web page and cache the static web pages instead.  If it works out, it should be faster than memcached which only caches a portion of work required to render the page, usually just the DB portion.  However, rendering the whole page is a lot of work, so you probably only want to consider it if the page will be hit a lot and changed very little.  I’ve seen tag landing pages (e.g. show me all the links associated with a particular tag) done this way and only updated once a day or a couple of times a day.  You may also not have a convenient way to do static pages depending on how your application works.  The good news is static pages work great with Amazon S3, so you have a wonderful infrastructure on which to build a static page cache.

Conclusion

Memcached would seem to be an absolutely essential tool for scaling web software, but I noticed something funny while researching this article.  Scaling gets 765,013 hits on Google Blog Search.  Multicore (related to scaling) gets 88,960 hits.  Memcached only gets 15,062.  I can only hope that means most people already know about it and find the concepts easy enough to grasp that they don’t consider it worth writing about.  If only 1 in  50 of those writing about scaling know about memcached, its time more learned.

Go forward and scale out using memcached, but remember that its a tactical weapon.  You should also be thinking about how to take the next step, which is breaking down your DB architecture so it scales more easily using techniques like sharding, partitioning, and federated architectures.  This will let you apply multiple cores to the DB problem.  I’ll write more about that in a later post.

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in multicore, saas, software development, strategy, Web 2.0 | 11 Comments »

And Now For Something Completely Different: Visual Programming Cracks the Multicore Barrier

Posted by Bob Warfield on September 19, 2007

I’ve been admiring LabView, a visual programming language created by a company called National Instruments.  To be precise, the language is actually called “G”.  These guys didn’t set out to solve the multicore problem, in fact LabView was introduced way back in 1986 before anyone was worried about multicore.  It’s used in data acquisition, instrument control, and industrial automation tasks. 

Jim Kring called my attention to LabView in a couple of different ways.  First, I came across his great book, LabView for Everyone, in a Borders Books.  It’s a wonderful introduction to the potential for this language, but you won’t find the book in the programming section–it’s over by the Engineering books.  Second, Jim and I have corresponded about the Multicore Crisis for a little while and he recently posted on his great blog about how LabView is one potential answer to the problem.

Why is LabView able to scale more easily than conventional curly braced languages?  The answer is simple, and perhaps surprising: because the language says very little about the sequence of execution.  It is what’s called a Dataflow Language.  You’re almost certainly familiar with the world’s most common dataflow language already: spreadsheets.  Like spreadsheets, LabView has a very simple way of specifying sequence of execution.  An object in LabView executes whenever all of its inputs are ready.  This corresponds to a spreadsheet where a formula may be recalculated as soon as any other cells it depends on have been recalculated.  So, in theory, every single cell in the spreadsheet that creates an input for a certain cell may be computed in parallel.  Likewise with the objects in LabView.  Conventional languages, by contrast, consist of lists of steps that must be evaluated strictly in order.

Here is the other amazing thing about these two.  People may complain about how hard it is to learn parallel languages like Erlang, but who complains about learning spreadsheets?  They’re easy.  LabView is so easy that National Instruments has managed to build an almost $700M a year business around it.  Their EBITDA income was a little over $100M on that, and they are growing at about 20% a year.  Now Sun won’t break out Java’s revenues, but this company is nearly 10% of Sun’s size.  I have to figure that if Java were close to $700M they’d want to talk about it.  How’s that for the biggest language company you’ve never heard of?

When we compare LabView to something like Erlang, it shows up pretty well.  Recursion is a fundamental construct in many parallel languages like Erlang, but it isn’t necessarily a fundamental construct for many human minds.  Yet the idea of delegation, which is one form of parallelism, and of an assembly line, another form of parallelism often called pipelining, is very natural for people, and is inherent in dataflow languages such as spreadsheets and LabView.

There are other languages like this floating around too.  Yahoo’s Pipes is one designed for doing mashups that’s very cool.  The ease of use seems to carry over into this domain too as I read various examples:

The list goes on, but there seems to be a lot of potential unleashed when you quit thinking about things as a linear list to be solved strictly in order and break loose a bit more parallelism.  Like I said, it’s something completely different to think about.

Posted in multicore, platforms, ria, software development, Web 2.0 | Leave a Comment »

Who Doesn’t Love Java? (You’d Be Surprised! -and- Part 2 of the Tool/Platform Rants)

Posted by Bob Warfield on September 17, 2007

When Sun’s Jonathan Schwartz announced that he was changing Sun’s ticker symbol to JAVA, I wasn’t surprised that a lot of folks saw it as a silly move (nor surprised to see Jonathan being defensive).  Now don’t get me wrong:  I like Java.  My last gig involved prodigious amounts of gnarly Java code in a highly scalable Enterprise grid computing application.  The thing is, I have a problem with the idea that a single language can be all things to all people.   

In addition to Java, I’ve used a lot of other languages, and I thought it would be interested to see who else does too:

GoogleCreated a massive foundation for their massively scalable search engine in C++.  The foundation includes MapReduce (a way to employ thousands of CPU’s and avoid the Multicore Crisis), BigTable (their subsitute for a database), and the Google File System.  It all runs on a few hundred thousand Linux commodity boxes.  To be sure, Google has grown so large they now employ some Java, but C++ is the big enchilada.

YahooPushed into PHP back in 2002 in order to improve productivity and quit “reinventing the wheel”.  Translation:  They didn’t want 70% of their coding to be wasted.  Their presentation on this decision is quite interesting.

YouTubeWritten in Python, a hip relatively recent scripting language.  Google also makes extensive use of Python.

Facebook:  The clever fellows at Facebook are definitely technologists (maybe that’s why they got a platform together ahead of their peers) and built a fancy RPC (remote procedure call) core technology that lets them use almost any language they want.  Shades of Google core tech commitment, eh?  When they talk about what languages are used primarily, Java lands in last place.  The pecking order is PHP, C++, Perl, Python, and Java.

MySpaceBuilt on the Microsoft .NET Stack, so of course no Java there.  Are these boys a glutton for punishment or what?  Yet it proves that it can be done and that it will scale.

DiggDigg is built with the PHP scripting language, which is the “P” (or at least one possible “P”) in those LAMP stacks you hear so much about.

Wikipedia:  Like Digg, Wikipedia was also built on PHP

Amazon:  I was surprised and intrigued to learn that Amazon is language agnostic.  Like Google and Facebook, they’ve invested in some core technology that lets a lot of languages play together freely.  Werner Vogels goes on to say, “Developers are like artists; they produce their best work if they have the freedom to do so, but they need good tools.” 

Flickr:  Everyone’s favorite photo sharing service relies largely on PHP and Perl, with one twitchy systems programming service written in Java. 

Croquet:  A massively multiplayer game environment done in Smalltalk.  Who woulda thunk it?

There are many more, but you get the point:  Some, if not the vast majority, of the web’s biggest movers and shakers have decided not make Java their sole language and others have excluded it entirely!  Sure, the immediate reaction is that there will always be some C++ zealots who prefer their language to Java, but that doesn’t change the list all that much.  What we do see are a lot of “P” languages:  Python, PHP, and Perl.  What’s up with that?

Recall Part 1 of the Tool/Platform Rant Series, which talked about how 70% of the Software You Build is Wasted?  These hugely successful web properties have taken steps to reduce that 70% waste, and while there are many factors that contributed to their success, I believe a desire for increased productivity played a significant role.  The days when these languages were essential for performance are coming to an end too, courtesy of the Multicore Crisis.

There are other ways to skin this cat of over-reliance on a language that’s too low level without giving it up entirely.  Who doesn’t feel like technology gave Google a tremendous edge?  One can argue that C++ isn’t really the language of Google, rather, MapReduce, BigTable, and the Google File System are their language.  C++ is just the assembly code used to write the modules that these other platforms mash up.  In fact, it makes sense to think that C, Java, and C++ are all just portable assembly languages.  By doing this, Google has been able to focus a much greater proportion of its resources to create a proprietary edge for itself.  So have all the others on the list.

It gets more radical:  Amazon doesn’t care what language its developers use.  According to Werner Vogels:

I think part of the chaotic nature—the emerging nature—of Amazon’s platform is that there are many tools available, and we try not to impose too many constraints on our engineers. We provide incentives for some things, such as integration with the monitoring system and other infrastructure tools. But for the rest, we allow teams to function as independently as possible.

You have to ask yourself in light of all this, when are the Curly Braced Languages most and least appropriate?  The decision criteria are shockingly stark:

  • You Need a Curly Braced Language When You Are Building Something Totally Proprietary That’s Critical to Your Business And Can’t Be Built With Any Other Language
  • Run Away From Curly Braced Languages And Choose Something Else Every Other Time

Another thing:  notice the companies profiled above that have created their own Component Architecture Frameworks or Core Technology.  Google has core technology in spades.  Facebook has a system that lets them write components in any language they choose.  Amazon also allows multiple languages so long as a few core functions are adhered to.  Isn’t that interesting?  These guys want to be Polyglot Programmers, and they’re investing valuable resources to make that a reality.  This is all part and parcel of the “meta” thing too.  The realization that to fully utilize computers, you must be facile at manipulating the fabric of the programs themselves, and not just the data they are processing.  Adopting a polyglot view positions these vendors better to offer a platform, because it means they can allow others to use the languages of their choice.  There are a lot of benefits to being language agnostic!

Polyglot Programming is becoming increasingly well accepted in an era when the Curly Braced Languages (Java, C++, et al) bring little to the Web 2.0, SaaS, and Multicored Crisis parties.  Those languages are too low-level.  There is no support there for multi-tenancy, web pages, security, scaling, or any of the myriad of problems one encounters when building a Web 2.0 or SaaS business.  You have three choices:  write all of that yourself and pay the 70% overhead tax or become a Polyglot Programmer and minimize the overhead by choosing the best tools for the task and leaving your Curly Braced Power Tools safely in the workshop only to be brought out when some highly custom proprietary work needs to be done.

Related Articles

ESB vs REST (Another Case for Multi-Language Programming)

Posted in grid, multicore, platforms, saas, software development, strategy, Web 2.0 | 11 Comments »

Guido is Right to Leave the GIL in Python, Not for Multicore but for Utility Computing

Posted by Bob Warfield on September 14, 2007

There’s been a lot of back and forth in the Python community over something called the “GIL” or Global Interpreter Lock.  Probably the best “get rid of the GIL” argument comes from Juergen Brendel’s post.  Guido, the benevolent dictator of Python has responded in his own blog that the GIL is here to stay and he doesn’t think it is a problem nor that it’s even the right choice to try to remove it.  Both combatants have been eloquent in expressing their views.  As is often the case, they’re optimizing to different design centers and likely will have to agree to disagree.

Now let’s try to pick apart this issue in a way that everyone can understand and make sense of for large scalability issues in the world of SaaS and Web 2.0.  Note that my arguments may be invalid if your scaling regime is much smaller, but as we’ve seen for sites like Twitter, big time scaling is hard and has to be thought about carefully.

First, a quick explanation on the GIL.  The GIL is a bit of code that causes multiple Python threads to have to wait before an object can be accessed.  Only one thread may access an object at a time. 

Whoa!  That sounds like Python has no ability to scale for multiple cores at all!  How can that be a good thing?  You can see where all the heat is coming from in this discussion.  The GIL just sounds bad, and one blogger refers to it jokingly as the GIL of Doom.

Yet all is not lost.  One can access multiple cpu’s using processes, and the processes run in parallel.  Experienced parallel programmers will know the difference between a process and a thread is that the process has its own state, while threads share their state with other threads.  Hence a thread can reach out and touch the other thread’s objects.  Python is making sure that when that touch happens, only one thread can touch at a time.  Processes don’t have this problem because their communication is carefully controlled and every process has its own objects.

Why do programmers care about threads versus processes?  In theory, threads are lighter weight and they can perform better than a process.  We used to argue back and forth at Oracle about whether to use threads or processes, and there were a lot of trade offs, but it often made sense to go for threads. 

So why won’t Guido get rid of the GIL?  Well, for one thing, it was tried and it didn’t help.  A new interpreter was written with fine-grained locking that minimized the times when multiple threads were locked out.  It ran twice as slow (or worse on Linux) for most applications as the GIL version.  The reason is that having more lock calls was slower:  lock is a slow operating system function.  The way Guido put this was that on a 2 processor machine, Python would run slightly faster than on a single processor machine, and he saw that as too much overhead.  Now I’ve commented before that we need to waste more hardware in the interest of higher parallelism, and this factor of 2 goes away as soon as you run on a quad core cpu, so why not nix the GIL?  BTW, those demanding the demise of the GIL seem to feel that since Java can run faster and supports threads, that the attempt at removing the GIL must have been flawed and there is a better way.

I find myself in a funny quandry on this one, but ultimately agreeing with Guido.  There is little doubt that the GIL creates a scalability speed bump, but that speed bump is localized at the low end of the scalability space.  If you want even more scalability, you still have to do as Guido recommends and use processes and sockets or some such to communicate between them.  I also note that a lot of authorities feel that it is also much harder to program threads than processes, and they call for shared nothing access.  Highly parallel languages like Erlang are focused on a process model for that reason, not a thread model.

Let me explain what all that means.  Threads run inside the same virtual machine, and hence run on the same physical machine.  Processes can run on the same physical machine or in another physical machine.  If you architect your application around threads, you’ve done nothing to access multiple machines.  So, you can scale to as many cores are on the single machine (which will be quite a few over time), but to really reach web scales, you’ll need to solve the multiple machine problem anyway.

As Donald Knuth says, “premature optimization is the heart of all evil in programming.”  Threads are a premature optimization when you need massive scaling, while processes lead to greater scalability.  If you’re planning to use a utility computing fabric, such as Amazon EC2, you’ll want processes.  In this case, I’m with Guido, because I think utility computing is more important in the big picture than optimizing for the cores on a single chip.  Take a look at my blog post on Amazon Startup Project to see just a few things folks are doing with this particular utility computing fabric.

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in amazon, data center, ec2, grid, multicore, platforms, saas, software development, Web 2.0 | 4 Comments »

Twitter Scaling Story Mirrors the Multicore Language Timetable, Yields 10000% Speedup

Posted by Bob Warfield on September 14, 2007

There’s a great story over on the High Scalability blog about how Twitter became 10000% faster:

For us, it’s really about scaling horizontally – to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January.

This is the story I wanted to tell in Multicore Language Timetable:  a faster language pales in comparison to a more scalable language.  In this case, Twitter didn’t have that luxury, Ruby wasn’t more scalable, but it did have sufficient facilities that they could rearchitect their app for horizontal scaling, which is utilization of more cores.  It’s also the story of how the Multicore Crisis is here today and many of you have already experienced it.

Twitter learned several interesting lessons along the way that I’ve been hearing more and more:

  • Don’t let the database be a bottleneck.  We had the same view at Callidus, the Enterprise Software company I last worked at.  We build a grid architecture and managed to offload enough so that a typical configuration was 75% Java grid computing array and 25% database.  This was for a radically more database-intensive (financial processing) business application than most Web 2.0 apps like Twitter.
  • You have to build it yourself.  Unfortunately, there’s a lot of “almost” technology out there that doesn’t quite work.  That’s really unfortunate because everyone keeps hitting this horizontal scaling problem and having to reinvent the wheel:  70% of the software you write is still wasted.
  • Conventional database wisdom often tragically impairs scalability:  More and more companies are denormalizing to minimize joins and leaving relational integrity as a problem solved outside the database. 
  • “Most performance comes not from language but from application design.”  That’s a quote from the article, but I maintain it is also an artifact of using languages designed for a fundamentally different problem than what web scale applications face today.  Because the languages aren’t meant to solve the scaling problem, we shouldn’t be surprised that they don’t.

Interestingly, Twitter still has 1 single mySQL DB for everything.  It is massively backed up by in-memory caches that run on many machines, but at some point it can become the bottleneck too.  They’ve worked hard to de-emphasize it, but ultimately they have to figure out how to horizontally scale that DB.

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in data center, grid, multicore, software development, Web 2.0 | 2 Comments »

Amazon Startup Project Report

Posted by Bob Warfield on September 13, 2007

I attended the Silicon Valley edition of the Amazon Startup Project today.  This is their second such event, the first having been hosted in home-town Seattle.  The event took place at the Stanford University Faculty and was well attended: they basically filled the hall.  The agenda included an opening by Andy Jassy, Sr VP for Amazon Web Services, a discussion on the services themselves by Amazon Evangelist Mike Culver, a series of discussions by various startups using the services, a conversation with Kleiner Perkins VC Randy Komisar, and closing remarks by Jassy again.  Let me walk through what I picked up from the various segments.

First up were the two talks by Amazon folk, Jassy and Mike Culver.  Jassy kept it pretty light, didn’t show slides, and generally set a good tone for what Amazon is trying to accomplish.  The message from him is they’re in it for the long haul, they’ve been doing API’s for years, and the world should expect this to be a cash generating business for Amazon relatively shortly.  That’s good news as I have sometimes heard folks wonder whether this is just remaindering infrastructure they can’t use or whether they are in fact serious.  The volumes of data and cpu they’re selling via these services are enormous and growing rapidly.

Mike Culver’s presentation basically walked through the different Amazon Web Services and tried to give a brief overview of what they were, why you’d want such a thing, and examples of who was using them.  I had several takeaways from Mike’s presentation.  First, his segment on EC2 (Elastic Compute Cloud–the service that sells CPU’s) was the best.  His discussion of how hard it can be to estimate and prepare for the volumes and scaling you may encounter was spot on.  Some of the pithier bullets included:

  • Be prepared to scale down as well as up.
  • Queue everything and scale out the servicing of the queues.

He showed a series of Alexa traffic slides that were particularly good.  First he showed CNN’s traffic:

CNN Traffic

As you can see, there are some significant peaks and valleys.  In theory, you’d need to build for the peaks and eat the cost of overcapacity for the valleys if you build your own data center.  With a utility computing fabric like Amazon’s you can scale up and down to deal with the demand.  He next overlaid Flickr onto this data:

Flickr Traffic

Flickr’s problem is a little different.  They went along for a while and then hit a huge spike in Q206.  Imagine having to deal with that sort of spike by installing a bunch of new physical hardware.  Imagine how unhappy your customers would be while you did it and how close you would come to killing your staff.  Spikes like that are nearly impossible to anticipate.  CNN has bigger spikes, but they go away pretty rapidly.  Flickr had a sustained uptick. 

The last view overlaid Facebook onto the graph:

Facebook Traffic

Here we see yet another curve shape: exponential growth that winds up dwarfing the other two in a relatively short time.  Amazon’s point is that unless you have a utility computing fabric to draw on, you’re at the mercy of trying to chase one of these unpredictable curves, and you’re stuck between two ugly choices:  be behind the curve and making your customers and staff miserable with a series of painful firedrills, or be ahead of the curve and spend the money to handle spikes that may not be sustained, thereby wasting valuable capital.  Scaling is not just a multicore problem, it’s a crisis of creating a flexible enough infrastructure that you can tweak on a short time scale and pay for it as you need it.

One of the things Mike slid in was the idea that Amazon’s paid for images were a form of SaaS.  To use EC2, you first come up with a machine image.  The image is a snapshot of the machine’s disk that you want to boot.  Amazon now has a service where you can put these images up and people pay you money to use them, while Amazon gets a cut.  The idea that these things are like SaaS is a bit far fetched.  By themselves they would be Software without much Service.  However, the thought I had was that they’re really more like Web Appliances.  Some folks have tried to compare SaaS and Appliance software–I still think it doesn’t wash for lack of Service in the appliance, but this Amazon thing is a lot cleaner way to deliver an appliance than having to ship a box.  Mike should change his preso to push it more like appliances!

All of the presentations were good, but the best ones for me were by the startup users of the services.  What was great about them was that they pulled no punches.  The startups got to talk about both the good and bad points of the service, and it wasn’t too salesy about either Amazon or what the startups were doing.  It was more like, “Here’s what you need to know as you’re thinking about using this thing.”  I’ll give a brief summary of each:

Jon Boutelle, CTO, Slideshare

The Slideshare application is used to share slideshows on the web, SaaS-style.  Of course Jon’s preso was done using slideware.  His catchy title was “How to use S3 to avoid VC.”  His firm bootstrapped with minimum capital, and his point is not that you have to get the lowest possible price per GB (Amazon isn’t that), but that the way the price is charged matters a lot more to a bootstrapping firm.  In his firm’s case, they get the value out of S3 about 45 days before they have to pay for it.  In fact, they get their revenue from Google AdSense in advance of their billing from Amazon, so cash flow is good!

He talked about how they got “TechCrunched” and the service just scaled up without a problem.  Many startups have been “TechCrunched” and found it brought the service to its knees because they got slammed by a wall of traffic, but not here.

Joyce Park, CTO, Renkoo/BoozeMail

Joyce was next up and had a cool app/widget called BoozeMail.  It’s a fun service that you can use whether or not you’re on Facebook to send a friend a “virtual drink”.  Joyce gave a great overview of what was great and what was bad about Amazon Web Services.  The good is that it has scaled extremely well for them.  She ran through some of their numbers that I didn’t write down, but they were very large.  The bad is that there have been some outages, and its pretty hard to run things like mySQL on AWS (more about that later).

BoozeMail is using a Federated Database Architecture that tracks the senders and receivers on multiple DB servers.  The sender/receiver lists are broken down into groups, and they will not necessarily wind up on the same server.  At one point, they lost all of their Amazon machines simultaneously because they were all part of the same rack.  This obviously makes failover hard and they were not too happy about it. 

Persistence problems with Amazon are one of the thorniest issues to work through.  Your S3 data is safe, but an EC2 instance could fall over at any time without much warning.  Apparently Renkoo is beta testing under non-disclosure some technology that makes this better, although Joyce couldn’t talk about it.  More later.

Something she mentioned that the others echoed is that disk access for EC2 is very slow.  Trying to get your data into memory cache is essential, and writes are particularly slow.  Again, more on the database aspects in a minute, but help is on the way.

Sean Knapp, President of Technology, Ooyala

Ooyala is a cool service that let’s you select objects on high quality video.  The demo given at Startup Day was clicking on a football player who was about to make a touchdown to learn more about him.  Sean spent most of his preso showing what Ooyala is.  It is clearly an extremely impressive app, and it makes deep use of Amazon Web Services to virtually eliminate any need for doing their own hosting.  The message seemed to be if these guys can make their wild product work on Amazon, you certainly can too.

Don MacAskill, CEO, Smugmug

I’ve been reading Don’s blog for a while now, so I was pleased to get a chance to meet him finally.  Smugmug is a high end photo sharing service.  It charges for use SaaS-style, and is not an advertising supported model.  As I overheard Don telling someone, “You can offer a lot more when people actually pay you something than you can if you’re just getting ad revenue.”  Consequently, his customer base includes some tens of thousands of professional photographers who are really picky about their online photo experience.

Smugmug has been through several generations of Amazon architectures, and may be the oldest customer I’ve come across.  They started out viewing Amazon as backup and morphed until today Amazon is their system of record and source of data that doesn’t have to be served too fast.  They use their own data center for the highest traffic items.  The architecture makes extensive use of caching, and apparently their caches get a 95% hit rate.

Don talked about an area he has blogged on in the past, which is how Amazon saves him money that goes right to the bottom line.

Don’s summary on Amazon:

  • A startup can’t go wrong using it initially
  • Great for “store a lot” + “serve a little”
  • More problematic for “serve a lot”

There are performance issues with the architecture around serve a lot and Don feels they charge a bit too much (though not egregiously) for bandwidth.  His view is that if you use more than a Gigabit connection, Amazon may be too expensive, but that they’re fine up to that usage level.

His top feature requests:

-  Better DB support/persistence

-  Control over where physically your data winds up to avoid the “my whole rack died” problem that Joyce Park talked about.

The Juicy Stuff and Other Observations

At the end of the startup presentations, they opened up the startup folks to questions from the audience.  Without a doubt, the biggest source of questions surrounded database functionality:

-  How do we make it persist?

-  How do we make it fast?

-  Can we run Oracle?  Hmmm…

It’s so clear that this is the biggest obstacle to greater Amazon adoption.  Fortunately, its also clear it will be fixed.  I overheard one of the Amazon bigwigs telling someone to expect at least 3 end of year announcements to address the problem.  What is less clear is whether the announcements would be:

a)  Some sort of mySQL service all bundled up neatly

b)  Machine configurations better suited to DB use:  more spindles and memory was mentioned as desireable

c)  Some solution to machines just going poof!  In other words, persistence at least at a level where the machine can reboot, access the data on its disk, and take off again without being reimaged.

d)  Some or all of the above.

Time will tell, but these guys know they need a solution.

The other observation I will make is one that echoes Don’s observation on Smugmug:  I’m sure seeing a lot of Mac laptops out in the world.  3 of the 4 presenters were sporting Macs, and 2 of them had been customized with their company logos on the cover.  Kewl!

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in amazon, data center, ec2, grid, multicore, Partnering, platforms, saas, software development, strategy, venture, Web 2.0 | 12 Comments »

What’s the Killer App for Multicore?

Posted by Bob Warfield on September 11, 2007

I saw this question posed on MulticoreEra, so I thought I’d take a crack at it.

Let me start out with a contrarian view of the whole Multicore Thing.  Many have called it the Multicore Crisis, and I am definitely part of the “Crisis” crowd because I think software is not keeping up with hardware.  It’s too hard to write software that takes advantage of a lot of cores.  So here is my contrarian proposition: if it’s too hard, maybe the first killer app won’t be parallel at all.  Maybe it takes a lot of serial killer apps and just a few parallel killers to get us started.

Your desktop suddenly sprouts 4 cores.  Windows uses 1, your current app uses another, maybe something in the background uses a third, but what do you do with the 4th core?  And next year, what to do with cores 5, 6, 7, and 8?  I suggest we look for the Killer App to be something that runs in the background all the time, has an insatiable appetite for cpu cycles, and is completely indispensible: once we get it, we can’t live without it.  That app may or may not be parallel itself.  If its not, we’ll just run a bunch of differnet ones on the different cores to take advantage.

One last disclaimer:  lots of Multicore Killer App possibilities for the server end, this post is focused on the desktop.

Without further ado, here is my list of possibilities:

Be Speculative

What if your computer was always one step ahead?  Speculative execution has been a mainstay of cpu design for a long time, but this would be speculative execution of software on your PC.  What the system needs is some idea of what you might want to do next.  This could be as simple as knowing which files you’ve accessed recently.  Based on that, the system would know which apps you’re most likely to run.  Imagine if on startup, each core took one of the most popular apps you like to run and started to bring it up behind the scenes.  If you actually ask for the app, it pops up immediately because it was already there. 

The same trick can be employed with data.  Let’s get the data I’m likely to want into RAM and off the disk (or off the Net).  It will be much faster to access it there if it’s called for.

Programmers Need Compiler Farms/Testers Need Test Farms

This is one of those insatiable desire problems if ever there was one.  Make the compile of a single file parallel or running a single test script in parallel might be hard.  However, most programs have many files and many test scripts.  With a little creative scheduling, we can do a lot of that work in parallel even though most of the software uses only one core at a time.

Indexing

If you’re like me, anything that makes it easier to find what you’re looking for on your computer would be indispensible.  But since we’re talking about exotic multicore stuff, how about getting a little more exotic and creative?  I’m thinking of a background task that trolls through your digital photos and uses face recognition to provide an index of who is in the picture.  How cool would that be?  You’d start this program out with a nucleus of starter definitions.  Perhaps your family members and closest friends would be identified manually in a few pictures.  The program would then go off looking at pictures and looking to do a couple of things:

1)  Identify the face regions.  It creates a list of cropped rectangles for each picture, one rectangle per face.

2)  Identify faces.  It matches the regions to known faces.  There’d be a relevancy ranking:  72% chance this one Aunt Jane and only 17% chance its really Uncle Joe.

3)  It asks the user to confirm any faces where the relevancy is too low.  In so doing, it would learn how to identify a particular face better.  Likewise, you’d be able to easily correct its mistakes and it would learn there too.

4)  It builds a list of unidentified faces and ranks them by frequency.  Periodically, it would ask you to identify those that are the most common unknown persons.

Now, any time you want to look at your photos, you’d get captions to go with the photos telling who the people are.  The technology to do this exists today, and it seems to me would make for a killer app.

Tend the Garden

While we’re indexing, there’s a lot of other tasks one could undertake tending the garden of local data.  Clearly I can be on the lookout for malicious offenders: viruses, spyware, and all that sort of thing.  I can rearrange the data on the disk to defrag it and optimize it for better performance.  I can back it up over the web to keep it safe.  I can prune by looking for candidates for deletion.  I can encrypt to keep prying eyes from accessing the data too easily.

Hang Out a Shingle

Even if you don’t have software to take advantage of extra cores, maybe someone else does.  You can hang out a shingle (digitally, of course) whenever you have idle cores for rent.  Anyone whom you allow to share your cores gets to use some of them.  Perhaps it works along the lines of a Hadoop.  Heck, you could envision a search firm letting it’s users do both the web crawling and the search indexing during the normal course of their web browsing.  What a good opportunity to periodically ask them a question to help make results better.

Endless Possibilities, OS Help Wanted

There are endless possibilities here.  Right away it seems that we’ll want some sort of OS help to enable proper sharing of system resources among all these different apps that have suddenly sprouted.  The thing I’d worry most about is disk access.  When my antivirus program fires up it ruins performance of most other apps by hogging the disk.  The OS knows which app has focus and needs to be smart about parceling out that scarce disk resource.  Greater sophistication will be needed there to fully maximize the Multicore Experience.

Posted in grid, multicore, ria, user interface | Leave a Comment »

Using Mindmaps to Explore User Interaction and Create QA Test Plans

Posted by Bob Warfield on September 9, 2007

Using Mindmaps to Explore User Interaction is a cool idea.  Basically, you depict the flow through the UI as a visual map:

WordPress UI Mindmap

Doing so can give you an instant idea of how hard or easy it is to navigate the UI to key functionality.  I would venture to add that studying the “traffic patterns” along the mindmap’s highways and optimizing them the way you would a web site is probably a good way to tune up a UI.

Now here is a newsflash:  you can buy a piece of software that will automatically generate these mindmaps.  It’s called Rational Test Factory.  Here is a UI Map from Test Factory:

Test Factory UI Map

Same principle, with each UI object called out inside its container (window) and links showing how you navigate.  Test Factor is a product I built once upon a time at a startup I founded that was called Integrity QA Software.  There were a number of unique aspects to the product.  First was its ability to “spider” or crawl a user interface and produce these maps.  Second, it had the ability to automatically create test suites by using the map as a template to generate tests. 

QA people take note, because these maps are good for a lot besides UI design and optimization.  A mind map like this of a UI is exactly the map you need to plan your testing campaign!  I could see putting a big map up on a wall and using it to cross reference your test plans and see that you were getting proper coverage across the whole UI.

A third thing we did that was very powerful was we defined behaviours at each node.  A button is obvious, but menus and type-ins are more interesting.  For example, we made it easy to cycle through a bunch of test files.  The type-in for the file name would be given a property that was a list of test files.  Push a button, and a script got generated to crank through all those files.  Being able to define allowable inputs (for example, what is the range of valid numbers) is hugely powerful for testers.  But it would also be hugely powerful for UI and even program design.  This tool was aimed at testing, but at some point we planned to use it as a UI design tool and do code generation to create a UI from one of these maps.  Never got to it.

The fourth really cool thing this product did was to use Genetic Algorithms to generate optimal test scripts.  This was some seriously cutting edge stuff.  The UI maps gave us templates and made it easy to generate test scripts.  The role of the Genetic Algorithm was to evolve better and better scripts.  “Better” was defined by a fitness function.  We used code coverage and script length.  The shortest script that tested the most code was deemed the best script.  QA people loved this thing.  If the developer’s changed the UI, they could just remap and regenerate new scripts.  Add new code?  Regenerate scripts to test the new code.

The last way cool thing was that this was the first product I build with a grid architecture.  We had a demo room with a rack of 4 by 4 (16) PC’s and two really comfortable chairs set in front of the rack.  On a table between the chairs was the 17th PC that served as a control console.  We’d sit a customer down in one chair and the demoer in the other.  The customer’s software would be loaded.  Hit a button and all 16 machines started mapping the UI in parallel.  The customer’s eyes would literally bug out watching all that activity on their familiar product.  Then testing would begin.  Bugs would start being reported on the console.

I haven’t kept up with Rational’s doing with the product, but at the time it wasn’t suited to testing web applications because it was built before there were many.

It certainly is a fabulous example of how to use multicore to solve a hard problem.  The graphical programming aspect is also intriguing.  It begs for that UI design and code generation tool we never built.

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in grid, multicore, ria, software development, user interface | 2 Comments »

 
Follow

Get every new post delivered to your Inbox.

Join 265 other followers

%d bloggers like this: