Folding@Home on the “Multi-Core Crisis” and Enterprise Software
Posted by Bob Warfield on June 6, 2007
Tim O’Reilly picked up on an exchange about programming for multi-core computers between Andrew Donoho and Adam Beberg. The gist: Donoho believes a major speed bump lies in the near future, when the next increment in computer performance will require massively parallel programming, while Beberg (of http://folding.stanford.edu/, a massively parallel research project that harnesses computers across the Internet to simulate a supercomputer) says those days have already come and gone and the problems are well understood.
It’s interesting to contemplate this discussion in the context of Enterprise software. At my previous employer, Callidus Software, we solved huge scalability problems by means of grid computing, which is essentially the technology used by Folding@Home, though we didn’t try to spread computers all over the Internet! In essence, we were able to harness large numbers of commodity computers running a common Java application to simulate a huge mainframe-class computer. In some cases we ran a hundred or more CPUs. The advantages to the customer were several.
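The core idea is simple enough to sketch. This is not Callidus code — the real grid ran a common Java application across many separate machines — but here a thread pool stands in for the grid nodes, partitioning a batch of transactions, computing partial results in parallel, and combining them:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch only: a hypothetical stand-in for a compute grid, with threads
// playing the role of grid nodes. The pattern is the same either way:
// partition the work, fan it out, and combine the partial results.
public class GridSketch {
    // Split the transaction amounts into one chunk per "node", compute each
    // chunk's commission total in parallel, then sum the partials.
    public static double totalCommission(double[] amounts, double rate, int nodes) {
        ExecutorService grid = Executors.newFixedThreadPool(nodes);
        List<Future<Double>> partials = new ArrayList<>();
        int chunk = (amounts.length + nodes - 1) / nodes;
        for (int start = 0; start < amounts.length; start += chunk) {
            final int s = start, e = Math.min(start + chunk, amounts.length);
            partials.add(grid.submit(() -> {
                double sum = 0;
                for (int i = s; i < e; i++) sum += amounts[i] * rate;
                return sum;
            }));
        }
        double total = 0;
        try {
            for (Future<Double> p : partials) total += p.get();
        } catch (InterruptedException | ExecutionException ex) {
            throw new RuntimeException(ex);
        }
        grid.shutdown();
        return total;
    }

    public static void main(String[] args) {
        double[] txns = {1000, 2500, 400, 7100};
        System.out.println(totalCommission(txns, 0.05, 2));
    }
}
```

Because the work partitions cleanly by payee or transaction, adding nodes adds throughput almost linearly, which is exactly why commodity boxes could stand in for a mainframe.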
First, we were able to solve their problem, which involved computing sales compensation for some of the largest sales forces in the world; companies like Allstate, United Health Group, Sprint Nextel, and the like. These customers had up to several hundred thousand payees and many millions of transactions that had to be analyzed, with commissions calculated through extremely complex business logic that varied from customer to customer. Our system made it simple because our product, TrueComp, had a rules language that let customers create their own business logic in a language similar to Excel formulas.
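TrueComp’s actual rules language was far richer than anything shown here, but the underlying idea can be illustrated with a hypothetical tiered-commission rule: the business logic lives in data the customer can edit, not in code the vendor must change:

```java
// Hypothetical sketch of the idea behind a rules language: the commission
// logic (tiered rates here) is expressed as data rather than hard-coded,
// so each customer can change it without touching the calculation engine.
// These tiers and rates are invented for illustration.
public class TieredRule {
    // Sales at or above THRESHOLDS[i] earn RATES[i] on the amount in that tier.
    static final double[] THRESHOLDS = {0, 50_000, 100_000};
    static final double[] RATES = {0.02, 0.04, 0.06};

    public static double commission(double sales) {
        double total = 0;
        for (int i = 0; i < THRESHOLDS.length; i++) {
            double upper = (i + 1 < THRESHOLDS.length)
                    ? THRESHOLDS[i + 1] : Double.MAX_VALUE;
            double inTier = Math.min(sales, upper) - THRESHOLDS[i];
            if (inTier > 0) total += inTier * RATES[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(commission(120_000));
    }
}
```

A rule like this also parallelizes trivially: each payee’s commission depends only on that payee’s transactions, so the engine can evaluate rules across as many CPUs as the grid provides.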
Second, the grid system turned out to be extremely valuable for the operational flexibility it gave IT departments. We had one customer that started out with a 32-CPU Solaris SPARC cluster, added 40 CPUs of Wintel blade servers, and then followed on with a 24-CPU IBM AIX cluster. From our software’s perspective, these were all the same computer, even though they spanned three radically different hardware and operating system architectures! Thank you, Java: portability is thy middle name.
What does all this have to do with Tim O’Reilly’s blog post? Just that software companies need to consider how to help their users take advantage of cheap computing resources that may not even be from the same architecture family. This means first having an internal architecture that can exploit such resources, and second having user-friendly tools to help your customers take advantage of them. In this case, Callidus had a rules language that entirely hid from the customer the fact that they were dealing with so many CPUs.
Some of this is available almost for free. Web servers scale pretty well in this fashion, and products like Oracle’s database are now being offered with grid computing. Where I think the original article is right on the mark is that there are precious few Enterprise business logic layers built around these architectural ideas, and I think the Enterprise thinkers out there are going to have to work on this.
Enterprise players that don’t find ways of embracing grid architectures will find themselves increasingly at a disadvantage relative to competitors who do, because their systems won’t scale as cheaply. Hardware is a big piece of the overall cost of the solution, so this can quickly become a fatal flaw. It certainly worked to Callidus’ advantage, and it is one of the reasons they became the market leader in their space.
One thing I will say is that helping with this problem will be an integral part of anything SmoothSpan produces.