SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

A Multicore Language Timetable (Waste More Hardware!)

Posted by Bob Warfield on September 8, 2007

What!  You can’t be serious!  You actually want me to waste my hardware?!??

The logic of this all hit home as I read a line from Patrick Logan’s excellent blog:

Today we are wasting hardware running garbage collectors in order to save developer ergs. Increasingly we need to be wasting hardware running large numbers of small processes.

Java (one of the curly braced languages of note) is one language that wastes hardware to run a garbage collector.  A wickedly clever program written in C or C++ that does all of its own memory management can run faster than the Java equivalent, yet it will take a lot longer to write and test that program, and you’ll need much more skilled coders.  Hence Java’s garbage collection is a waste of hardware (because it isn’t the absolute highest performance solution) yet it is worth it because it saves the programmers so much time.  One could argue that garbage collection is Java’s biggest advantage over C++ (though others like portability also come to mind).

The same will be true when it comes to solving the Multicore Crisis and making your software scale properly to take advantage of as many CPUs and cores as are available.  The equivalent of wickedly cool programming in C will be continued use of the curly-braced languages:  they don’t waste enough hardware to make it easy to do multicore programming.

Why is it so important to favor use of multiple cores over efficiency?  Let’s get calibrated on speed first.  There’s a site called Computer Language Benchmarks Game that collects benchmarks by language.  We’ll just use it as a relative scale for some important languages, so it doesn’t matter if you have a different benchmark, the principle is the same. 

What I want to understand is how soon your favorite “inefficient” language is just as fast as the “efficient” language if the “inefficient” language is programmed for multicore and the “efficient” language is not.  I know that’s clear as mud, but bear with me as we look at this result:

[Figure: Multicore Language Timetable]

Here’s how we use the timetable.  Languages like C, C++, Java, and C# all run pretty much the same today.  I have therefore assigned them 1 core.  So long as we’re running on 1 core, for most applications, it just won’t matter from a performance standpoint which of these languages we choose.  Now look at Smalltalk and Erlang.  These are pretty exotic languages.  Most people probably think they are pretty slow compared to Java, although Erlang is designed to be really good at using lots of cores.  If Erlang can use 6 cores as easily as Java uses 1 core, then for the same level of programming effort, Erlang and Java are equivalent in performance.  Since we can get 8 cores pretty easily today (Intel is already phasing out some 4-core parts!), I show the year of their equivalence as now and they’re both in the “green” zone.

If Moore’s Law holds, in just 2 years we will have access to 16 cores on a regular basis.  So, by 2009, Python, Perl, and PHP will run just as fast on 16 cores as Java does on a single core, assuming they are parallelized and Java isn’t.  Another 2 years after that and we see JavaScript and Ruby being just as fast.
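The arithmetic behind the timetable fits in a few lines.  The relative slowdowns below are illustrative assumptions for the sake of the example, not figures from the benchmarks site: a language 16x slower than serial Java catches up once Moore’s Law delivers 16 cores, assuming it parallelizes perfectly.

```python
# Back-of-the-envelope arithmetic behind the timetable.
# Slowdown factors here are assumed, illustrative values.

def year_of_equivalence(slowdown, start_year=2007, start_cores=8):
    """Year when a language `slowdown`x slower than serial Java matches it,
    assuming perfect parallel scaling and core counts doubling every
    2 years (one Moore Cycle)."""
    year, cores = start_year, start_cores
    while cores < slowdown:
        cores *= 2
        year += 2
    return year

for lang, slowdown in [("Erlang-class", 6), ("Python-class", 16), ("Ruby-class", 30)]:
    print(lang, year_of_equivalence(slowdown))
```

With these assumed factors, a Python-class language breaks even in 2009 and a Ruby-class language in 2011, which is how the rows of the timetable line up.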

Note that I am not saying any of these languages make multicore any easier!  (but some do)

I’m also not saying that in 4 years JavaScript will wipe out Java because of the Multicore Crisis!  (but some say it could)

I’m simply suggesting that multicore will be a great leveler of languages, and that if there is a language that makes parallel programming a lot easier, that language will start to have more and more advantage over time.

It is interesting also to look at what happens when a language that’s good at using cores gets ahead.  Erlang is one such.  I’ve said it is equivalent in performance today to Java/C++.  In 2 years, it will look twice as fast as Java/C++ because it can keep using more cores.  How much faster does it have to be before there are such compelling economic advantages that we have to abandon the old languages?

Now let’s drop the other shoe.  One of the reasons people tolerate the relatively poorer performance of languages at the bottom of the list like Ruby, JavaScript, and PHP, is that they’re fast enough AND they are much easier and more productive than the old curly braced languages.  In other words, we are once again wasting some hardware in order to save developer ergs as Patrick Logan suggested.  Interestingly, the increased productivity of the LAMP stack has made venture capital much cheaper for Web 2.0 companies and enabled an explosion in innovation.  Those economic forces I mention may get here sooner than we think, and multicore can skew those economics mightily because it harnesses Moore’s Law to do its bidding. 

Imagine how potent a language will be if it can both save lots of developer ergs à la Ruby and make parallel programming much easier.  That language would bust open the doors of change.

Postscript:

Some comments folks have made on the side:

-  You never get 100% parallelization.  Absolutely true, but there are many problems that scale nearly linearly, such as sorting.  For those problems, the shortfall from 100% parallelization acts more like a bit of constant friction than a degradation of the speedup curve.  There are a bunch of folks who have succeeded at this, not the least of which would be Google, which depends on it.  What does it mean for the article?  Perhaps a slight delay in the time when parallel code in a slower language beats fast but non-parallel code.  Realistically, that delay is at most 1 Moore Cycle, or 2 years.  In many of the anecdotal cases that didn’t get 100% parallelism, you have to wonder how well parallelized the code really was.  There is a developing theory that identifies which problems can be parallelized and which cannot.  Developing for Developers has a good article on P-Completeness and the limits of parallelism.
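That “constant friction” intuition can be made concrete with Amdahl’s law, which bounds speedup by the serial fraction of the work.  A minimal sketch (the 95% figure is an assumed value, purely for illustration):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: maximum speedup on `cores` cores when only
    `parallel_fraction` of the work can be parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# At 95% parallel, 16 cores yield roughly a 9x speedup rather than 16x --
# on the order of one Moore Cycle of delay, as described above.
print(amdahl_speedup(0.95, 16))
```

A fully parallel workload (fraction 1.0) gets the ideal 16x; the 5% serial residue is the friction.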

-  Java has decent concurrency constructs and will get better.  Opinions vary strongly over whether the constructs are good.  Dr. Dobb’s has a good discussion of some techniques for parallelism that are anti-thread, for example.  Let’s just settle for the idea that Java’s constructs make concurrency possible, but not easy.  The point here is that a language that is somewhat slower than Java, but that makes concurrency much easier, can win.  Unless of course someone figures out how to make concurrency easy in Java.  The track record is not that good, however.
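For contrast, here is what “easy” can look like: a parallel map that farms pure function calls out across cores with no explicit threads or locks.  This is modern Python (which arrived after this post was written), used only to illustrate the programming model, not to claim Python had this in 2007:

```python
# A parallel map: the runtime spreads calls across worker processes,
# one per core by default, with no locks in user code.
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool() as pool:
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The entire concurrency story is one call; that is the kind of ergonomics a slower-but-parallel language can use to win.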

If you take away nothing else here, my point is that language efficiency will matter exponentially less than multicore efficiency in the very near term where performance is concerned. 

Related Articles:

A Picture of the Multicore Crisis

You’ve Already Had a Multicore Crisis



8 Responses to “A Multicore Language Timetable (Waste More Hardware!)”

  1. [...] A Multicore Language Timetable (Waste More Hardware!) [...]

  2. [...] A Multicore Language Timetable (Waste More Hardware!) [...]

  3. That my friend was a super geeky post. Onward!

  4. [...] FeaturesWeb 2.0 Personality TypesYou’ve Already Had a Multicore Crisis and Just Didn’t Realize It!A Multicore Language Timetable (Waste More Hardware!)How to Have a Happy CEO/CTO Marriage (Features vs [...]

  5. [...] Warfield over at his blog really starts to get into concurrency. He has a call to waste more hardware and he is actually serious with it. And although many of the HPC people may not like it: I think he [...]

  6. [...] the First TimeTwitter Scaling Story Mirrors the Multicore Language Timetable, Yields 10000% SpeedupA Multicore Language Timetable (Waste More Hardware!)Very few products wouldn’t get better if people just tried to make them betterAboutDell Touches the [...]

  7. jfalgout said

    Interesting take on the multi-core issue, thanks for the discussion. An approach that “wastes” hardware is dataflow. Like Erlang, it is more of a functional approach to concurrent programming and has concurrency built into the infrastructure. But, where Erlang’s concurrency is well suited to control flow, dataflow is better suited to, well, data flow (large-scale data processing). A not-too-complex dataflow graph can easily use hundreds of threads to accomplish its work. Maybe a bit overwhelming for my dual-core laptop, but not so for a monster like Azul and the 16-core, 32-core, and so on machines available soon with quad-core processors. One implementation of dataflow is DataRush. Check it out and let me know what you think.

  8. Future Programming—Art or Engineering?

    The emergence of multicore computer systems—ranging from portable and desktop computers to petascale supercomputers—is placing new demands on computer software, and is catalyzing a reexamination of the fundamental methods of computer programming.
    W…
