SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the 'multicore' Category


And Now For Something Completely Different: Visual Programming Cracks the Multicore Barrier

Posted by smoothspan on September 19, 2007

I’ve been admiring LabView, a visual programming language created by a company called National Instruments.  To be precise, the language is actually called “G”.  These guys didn’t set out to solve the multicore problem, in fact LabView was introduced way back in 1986 before anyone was worried about multicore.  It’s used in data acquisition, instrument control, and industrial automation tasks. 

Jim Kring called my attention to LabView in a couple of different ways.  First, I came across his great book, LabView for Everyone, in a Borders Books.  It’s a wonderful introduction to the potential for this language, but you won’t find the book in the programming section–it’s over by the Engineering books.  Second, Jim and I have corresponded about the Multicore Crisis for a little while and he recently posted on his great blog about how LabView is one potential answer to the problem.

Why is LabView able to scale more easily than conventional curly braced languages?  The answer is simple, and perhaps surprising: because the language says very little about the sequence of execution.  It is what’s called a Dataflow Language.  You’re almost certainly familiar with the world’s most common dataflow language already: spreadsheets.  Like spreadsheets, LabView has a very simple way of specifying sequence of execution.  An object in LabView executes whenever all of its inputs are ready.  This corresponds to a spreadsheet where a formula may be recalculated as soon as any other cells it depends on have been recalculated.  So, in theory, every single cell in the spreadsheet that creates an input for a certain cell may be computed in parallel.  Likewise with the objects in LabView.  Conventional languages, by contrast, consist of lists of steps that must be evaluated strictly in order.

Here is the other amazing thing about these two.  People may complain about how hard it is to learn parallel languages like Erlang, but who complains about learning spreadsheets?  They’re easy.  LabView is so easy that National Instruments has managed to build an almost $700M a year business around it.  Their EBITDA income was a little over $100M on that, and they are growing at about 20% a year.  Now Sun won’t break out Java’s revenues, but this company is nearly 10% of Sun’s size.  I have to figure that if Java were close to $700M they’d want to talk about it.  How’s that for the biggest language company you’ve never heard of?

When we compare LabView to something like Erlang, it shows up pretty well.  Recursion is a fundamental construct in many parallel languages like Erlang, but it isn’t necessarily a fundamental construct for many human minds.  Yet the idea of delegation, which is one form of parallelism, and of an assembly line, another form of parallelism often called pipelining, is very natural for people, and is inherent in dataflow languages such as spreadsheets and LabView.

There are other languages like this floating around too.  Yahoo’s Pipes is one designed for doing mashups that’s very cool.  The ease of use seems to carry over into this domain too as I read various examples:

The list goes on, but there seems to be a lot of potential unleashed when you quit thinking about things as a linear list to be solved strictly in order and break loose a bit more parallelism.  Like I said, it’s something completely different to think about.

Posted in Web 2.0, multicore, platforms, ria, software development | No Comments »

Who Doesn’t Love Java? (You’d Be Surprised! -and- Part 2 of the Tool/Platform Rants)

Posted by smoothspan on September 17, 2007

When Sun’s Jonathan Schwartz announced that he was changing Sun’s ticker symbol to JAVA, I wasn’t surprised that a lot of folks saw it as a silly move (nor surprised to see Jonathan being defensive).  Now don’t get me wrong:  I like Java.  My last gig involved prodigious amounts of gnarly Java code in a highly scalable Enterprise grid computing application.  The thing is, I have a problem with the idea that a single language can be all things to all people.   

In addition to Java, I’ve used a lot of other languages, and I thought it would be interested to see who else does too:

GoogleCreated a massive foundation for their massively scalable search engine in C++.  The foundation includes MapReduce (a way to employ thousands of CPU’s and avoid the Multicore Crisis), BigTable (their subsitute for a database), and the Google File System.  It all runs on a few hundred thousand Linux commodity boxes.  To be sure, Google has grown so large they now employ some Java, but C++ is the big enchilada.

YahooPushed into PHP back in 2002 in order to improve productivity and quit “reinventing the wheel”.  Translation:  They didn’t want 70% of their coding to be wasted.  Their presentation on this decision is quite interesting.

YouTubeWritten in Python, a hip relatively recent scripting language.  Google also makes extensive use of Python.

Facebook:  The clever fellows at Facebook are definitely technologists (maybe that’s why they got a platform together ahead of their peers) and built a fancy RPC (remote procedure call) core technology that lets them use almost any language they want.  Shades of Google core tech commitment, eh?  When they talk about what languages are used primarily, Java lands in last place.  The pecking order is PHP, C++, Perl, Python, and Java.

MySpaceBuilt on the Microsoft .NET Stack, so of course no Java there.  Are these boys a glutton for punishment or what?  Yet it proves that it can be done and that it will scale.

DiggDigg is built with the PHP scripting language, which is the “P” (or at least one possible “P”) in those LAMP stacks you hear so much about.

Wikipedia:  Like Digg, Wikipedia was also built on PHP

Amazon:  I was surprised and intrigued to learn that Amazon is language agnostic.  Like Google and Facebook, they’ve invested in some core technology that lets a lot of languages play together freely.  Werner Vogels goes on to say, “Developers are like artists; they produce their best work if they have the freedom to do so, but they need good tools.” 

Flickr:  Everyone’s favorite photo sharing service relies largely on PHP and Perl, with one twitchy systems programming service written in Java. 

Croquet:  A massively multiplayer game environment done in Smalltalk.  Who woulda thunk it?

There are many more, but you get the point:  Some, if not the vast majority, of the web’s biggest movers and shakers have decided not make Java their sole language and others have excluded it entirely!  Sure, the immediate reaction is that there will always be some C++ zealots who prefer their language to Java, but that doesn’t change the list all that much.  What we do see are a lot of “P” languages:  Python, PHP, and Perl.  What’s up with that?

Recall Part 1 of the Tool/Platform Rant Series, which talked about how 70% of the Software You Build is Wasted?  These hugely successful web properties have taken steps to reduce that 70% waste, and while there are many factors that contributed to their success, I believe a desire for increased productivity played a significant role.  The days when these languages were essential for performance are coming to an end too, courtesy of the Multicore Crisis.

There are other ways to skin this cat of over-reliance on a language that’s too low level without giving it up entirely.  Who doesn’t feel like technology gave Google a tremendous edge?  One can argue that C++ isn’t really the language of Google, rather, MapReduce, BigTable, and the Google File System are their language.  C++ is just the assembly code used to write the modules that these other platforms mash up.  In fact, it makes sense to think that C, Java, and C++ are all just portable assembly languages.  By doing this, Google has been able to focus a much greater proportion of its resources to create a proprietary edge for itself.  So have all the others on the list.

It gets more radical:  Amazon doesn’t care what language its developers use.  According to Werner Vogels:

I think part of the chaotic nature—the emerging nature—of Amazon’s platform is that there are many tools available, and we try not to impose too many constraints on our engineers. We provide incentives for some things, such as integration with the monitoring system and other infrastructure tools. But for the rest, we allow teams to function as independently as possible.

You have to ask yourself in light of all this, when are the Curly Braced Languages most and least appropriate?  The decision criteria are shockingly stark:

  • You Need a Curly Braced Language When You Are Building Something Totally Proprietary That’s Critical to Your Business And Can’t Be Built With Any Other Language
  • Run Away From Curly Braced Languages And Choose Something Else Every Other Time

Another thing:  notice the companies profiled above that have created their own Component Architecture Frameworks or Core Technology.  Google has core technology in spades.  Facebook has a system that lets them write components in any language they choose.  Amazon also allows multiple languages so long as a few core functions are adhered to.  Isn’t that interesting?  These guys want to be Polyglot Programmers, and they’re investing valuable resources to make that a reality.  This is all part and parcel of the “meta” thing too.  The realization that to fully utilize computers, you must be facile at manipulating the fabric of the programs themselves, and not just the data they are processing.  Adopting a polyglot view positions these vendors better to offer a platform, because it means they can allow others to use the languages of their choice.  There are a lot of benefits to being language agnostic!

Polyglot Programming is becoming increasingly well accepted in an era when the Curly Braced Languages (Java, C++, et al) bring little to the Web 2.0, SaaS, and Multicored Crisis parties.  Those languages are too low-level.  There is no support there for multi-tenancy, web pages, security, scaling, or any of the myriad of problems one encounters when building a Web 2.0 or SaaS business.  You have three choices:  write all of that yourself and pay the 70% overhead tax or become a Polyglot Programmer and minimize the overhead by choosing the best tools for the task and leaving your Curly Braced Power Tools safely in the workshop only to be brought out when some highly custom proprietary work needs to be done.

Related Articles

ESB vs REST (Another Case for Multi-Language Programming)

Posted in Web 2.0, grid, multicore, platforms, saas, software development, strategy | 11 Comments »

Guido is Right to Leave the GIL in Python, Not for Multicore but for Utility Computing

Posted by smoothspan on September 14, 2007

There’s been a lot of back and forth in the Python community over something called the “GIL” or Global Interpreter Lock.  Probably the best “get rid of the GIL” argument comes from Juergen Brendel’s post.  Guido, the benevolent dictator of Python has responded in his own blog that the GIL is here to stay and he doesn’t think it is a problem nor that it’s even the right choice to try to remove it.  Both combatants have been eloquent in expressing their views.  As is often the case, they’re optimizing to different design centers and likely will have to agree to disagree.

Now let’s try to pick apart this issue in a way that everyone can understand and make sense of for large scalability issues in the world of SaaS and Web 2.0.  Note that my arguments may be invalid if your scaling regime is much smaller, but as we’ve seen for sites like Twitter, big time scaling is hard and has to be thought about carefully.

First, a quick explanation on the GIL.  The GIL is a bit of code that causes multiple Python threads to have to wait before an object can be accessed.  Only one thread may access an object at a time. 

Whoa!  That sounds like Python has no ability to scale for multiple cores at all!  How can that be a good thing?  You can see where all the heat is coming from in this discussion.  The GIL just sounds bad, and one blogger refers to it jokingly as the GIL of Doom.

Yet all is not lost.  One can access multiple cpu’s using processes, and the processes run in parallel.  Experienced parallel programmers will know the difference between a process and a thread is that the process has its own state, while threads share their state with other threads.  Hence a thread can reach out and touch the other thread’s objects.  Python is making sure that when that touch happens, only one thread can touch at a time.  Processes don’t have this problem because their communication is carefully controlled and every process has its own objects.

Why do programmers care about threads versus processes?  In theory, threads are lighter weight and they can perform better than a process.  We used to argue back and forth at Oracle about whether to use threads or processes, and there were a lot of trade offs, but it often made sense to go for threads. 

So why won’t Guido get rid of the GIL?  Well, for one thing, it was tried and it didn’t help.  A new interpreter was written with fine-grained locking that minimized the times when multiple threads were locked out.  It ran twice as slow (or worse on Linux) for most applications as the GIL version.  The reason is that having more lock calls was slower:  lock is a slow operating system function.  The way Guido put this was that on a 2 processor machine, Python would run slightly faster than on a single processor machine, and he saw that as too much overhead.  Now I’ve commented before that we need to waste more hardware in the interest of higher parallelism, and this factor of 2 goes away as soon as you run on a quad core cpu, so why not nix the GIL?  BTW, those demanding the demise of the GIL seem to feel that since Java can run faster and supports threads, that the attempt at removing the GIL must have been flawed and there is a better way.

I find myself in a funny quandry on this one, but ultimately agreeing with Guido.  There is little doubt that the GIL creates a scalability speed bump, but that speed bump is localized at the low end of the scalability space.  If you want even more scalability, you still have to do as Guido recommends and use processes and sockets or some such to communicate between them.  I also note that a lot of authorities feel that it is also much harder to program threads than processes, and they call for shared nothing access.  Highly parallel languages like Erlang are focused on a process model for that reason, not a thread model.

Let me explain what all that means.  Threads run inside the same virtual machine, and hence run on the same physical machine.  Processes can run on the same physical machine or in another physical machine.  If you architect your application around threads, you’ve done nothing to access multiple machines.  So, you can scale to as many cores are on the single machine (which will be quite a few over time), but to really reach web scales, you’ll need to solve the multiple machine problem anyway.

As Donald Knuth says, “premature optimization is the heart of all evil in programming.”  Threads are a premature optimization when you need massive scaling, while processes lead to greater scalability.  If you’re planning to use a utility computing fabric, such as Amazon EC2, you’ll want processes.  In this case, I’m with Guido, because I think utility computing is more important in the big picture than optimizing for the cores on a single chip.  Take a look at my blog post on Amazon Startup Project to see just a few things folks are doing with this particular utility computing fabric.

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in Web 2.0, amazon, data center, ec2, grid, multicore, platforms, saas, software development | No Comments »

Twitter Scaling Story Mirrors the Multicore Language Timetable, Yields 10000% Speedup

Posted by smoothspan on September 14, 2007

There’s a great story over on the High Scalability blog about how Twitter became 10000% faster:

For us, it’s really about scaling horizontally - to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January.

This is the story I wanted to tell in Multicore Language Timetable:  a faster language pales in comparison to a more scalable language.  In this case, Twitter didn’t have that luxury, Ruby wasn’t more scalable, but it did have sufficient facilities that they could rearchitect their app for horizontal scaling, which is utilization of more cores.  It’s also the story of how the Multicore Crisis is here today and many of you have already experienced it.

Twitter learned several interesting lessons along the way that I’ve been hearing more and more:

  • Don’t let the database be a bottleneck.  We had the same view at Callidus, the Enterprise Software company I last worked at.  We build a grid architecture and managed to offload enough so that a typical configuration was 75% Java grid computing array and 25% database.  This was for a radically more database-intensive (financial processing) business application than most Web 2.0 apps like Twitter.
  • You have to build it yourself.  Unfortunately, there’s a lot of “almost” technology out there that doesn’t quite work.  That’s really unfortunate because everyone keeps hitting this horizontal scaling problem and having to reinvent the wheel:  70% of the software you write is still wasted.
  • Conventional database wisdom often tragically impairs scalability:  More and more companies are denormalizing to minimize joins and leaving relational integrity as a problem solved outside the database. 
  • “Most performance comes not from language but from application design.”  That’s a quote from the article, but I maintain it is also an artifact of using languages designed for a fundamentally different problem than what web scale applications face today.  Because the languages aren’t meant to solve the scaling problem, we shouldn’t be surprised that they don’t.

Interestingly, Twitter still has 1 single mySQL DB for everything.  It is massively backed up by in-memory caches that run on many machines, but at some point it can become the bottleneck too.  They’ve worked hard to de-emphasize it, but ultimately they have to figure out how to horizontally scale that DB.

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in Web 2.0, data center, grid, multicore, software development | 2 Comments »

Amazon Startup Project Report

Posted by smoothspan on September 13, 2007

I attended the Silicon Valley edition of the Amazon Startup Project today.  This is their second such event, the first having been hosted in home-town Seattle.  The event took place at the Stanford University Faculty and was well attended: they basically filled the hall.  The agenda included an opening by Andy Jassy, Sr VP for Amazon Web Services, a discussion on the services themselves by Amazon Evangelist Mike Culver, a series of discussions by various startups using the services, a conversation with Kleiner Perkins VC Randy Komisar, and closing remarks by Jassy again.  Let me walk through what I picked up from the various segments.

First up were the two talks by Amazon folk, Jassy and Mike Culver.  Jassy kept it pretty light, didn’t show slides, and generally set a good tone for what Amazon is trying to accomplish.  The message from him is they’re in it for the long haul, they’ve been doing API’s for years, and the world should expect this to be a cash generating business for Amazon relatively shortly.  That’s good news as I have sometimes heard folks wonder whether this is just remaindering infrastructure they can’t use or whether they are in fact serious.  The volumes of data and cpu they’re selling via these services are enormous and growing rapidly.

Mike Culver’s presentation basically walked through the different Amazon Web Services and tried to give a brief overview of what they were, why you’d want such a thing, and examples of who was using them.  I had several takeaways from Mike’s presentation.  First, his segment on EC2 (Elastic Compute Cloud–the service that sells CPU’s) was the best.  His discussion of how hard it can be to estimate and prepare for the volumes and scaling you may encounter was spot on.  Some of the pithier bullets included:

  • Be prepared to scale down as well as up.
  • Queue everything and scale out the servicing of the queues.

He showed a series of Alexa traffic slides that were particularly good.  First he showed CNN’s traffic:

CNN Traffic

As you can see, there are some significant peaks and valleys.  In theory, you’d need to build for the peaks and eat the cost of overcapacity for the valleys if you build your own data center.  With a utility computing fabric like Amazon’s you can scale up and down to deal with the demand.  He next overlaid Flickr onto this data:

Flickr Traffic

Flickr’s problem is a little different.  They went along for a while and then hit a huge spike in Q206.  Imagine having to deal with that sort of spike by installing a bunch of new physical hardware.  Imagine how unhappy your customers would be while you did it and how close you would come to killing your staff.  Spikes like that are nearly impossible to anticipate.  CNN has bigger spikes, but they go away pretty rapidly.  Flickr had a sustained uptick. 

The last view overlaid Facebook onto the graph:

Facebook Traffic

Here we see yet another curve shape: exponential growth that winds up dwarfing the other two in a relatively short time.  Amazon’s point is that unless you have a utility computing fabric to draw on, you’re at the mercy of trying to chase one of these unpredictable curves, and you’re stuck between two ugly choices:  be behind the curve and making your customers and staff miserable with a series of painful firedrills, or be ahead of the curve and spend the money to handle spikes that may not be sustained, thereby wasting valuable capital.  Scaling is not just a multicore problem, it’s a crisis of creating a flexible enough infrastructure that you can tweak on a short time scale and pay for it as you need it.

One of the things Mike slid in was the idea that Amazon’s paid for images were a form of SaaS.  To use EC2, you first come up with a machine image.  The image is a snapshot of the machine’s disk that you want to boot.  Amazon now has a service where you can put these images up and people pay you money to use them, while Amazon gets a cut.  The idea that these things are like SaaS is a bit far fetched.  By themselves they would be Software without much Service.  However, the thought I had was that they’re really more like Web Appliances.  Some folks have tried to compare SaaS and Appliance software–I still think it doesn’t wash for lack of Service in the appliance, but this Amazon thing is a lot cleaner way to deliver an appliance than having to ship a box.  Mike should change his preso to push it more like appliances!

All of the presentations were good, but the best ones for me were by the startup users of the services.  What was great about them was that they pulled no punches.  The startups got to talk about both the good and bad points of the service, and it wasn’t too salesy about either Amazon or what the startups were doing.  It was more like, “Here’s what you need to know as you’re thinking about using this thing.”  I’ll give a brief summary of each:

Jon Boutelle, CTO, Slideshare

The Slideshare application is used to share slideshows on the web, SaaS-style.  Of course Jon’s preso was done using slideware.  His catchy title was “How to use S3 to avoid VC.”  His firm bootstrapped with minimum capital, and his point is not that you have to get the lowest possible price per GB (Amazon isn’t that), but that the way the price is charged matters a lot more to a bootstrapping firm.  In his firm’s case, they get the value out of S3 about 45 days before they have to pay for it.  In fact, they get their revenue from Google AdSense in advance of their billing from Amazon, so cash flow is good!

He talked about how they got “TechCrunched” and the service just scaled up without a problem.  Many startups have been “TechCrunched” and found it brought the service to its knees because they got slammed by a wall of traffic, but not here.

Joyce Park, CTO, Renkoo/BoozeMail

Joyce was next up and had a cool app/widget called BoozeMail.  It’s a fun service that you can use whether or not you’re on Facebook to send a friend a “virtual drink”.  Joyce gave a great overview of what was great and what was bad about Amazon Web Services.  The good is that it has scaled extremely well for them.  She ran through some of their numbers that I didn’t write down, but they were very large.  The bad is that there have been some outages, and its pretty hard to run things like mySQL on AWS (more about that later).

BoozeMail is using a Federated Database Architecture that tracks the senders and receivers on multiple DB servers.  The sender/receiver lists are broken down into groups, and they will not necessarily wind up on the same server.  At one point, they lost all of their Amazon machines simultaneously because they were all part of the same rack.  This obviously makes failover hard and they were not too happy about it. 

Persistence problems with Amazon are one of the thorniest issues to work through.  Your S3 data is safe, but an EC2 instance could fall over at any time without much warning.  Apparently Renkoo is beta testing under non-disclosure some technology that makes this better, although Joyce couldn’t talk about it.  More later.

Something she mentioned that the others echoed is that disk access for EC2 is very slow.  Trying to get your data into memory cache is essential, and writes are particularly slow.  Again, more on the database aspects in a minute, but help is on the way.

Sean Knapp, President of Technology, Ooyala

Ooyala is a cool service that let’s you select objects on high quality video.  The demo given at Startup Day was clicking on a football player who was about to make a touchdown to learn more about him.  Sean spent most of his preso showing what Ooyala is.  It is clearly an extremely impressive app, and it makes deep use of Amazon Web Services to virtually eliminate any need for doing their own hosting.  The message seemed to be if these guys can make their wild product work on Amazon, you certainly can too.

Don MacAskill, CEO, Smugmug

I’ve been reading Don’s blog for a while now, so I was pleased to get a chance to meet him finally.  Smugmug is a high end photo sharing service.  It charges for use SaaS-style, and is not an advertising supported model.  As I overheard Don telling someone, “You can offer a lot more when people actually pay you something than you can if you’re just getting ad revenue.”  Consequently, his customer base includes some tens of thousands of professional photographers who are really picky about their online photo experience.

Smugmug has been through several generations of Amazon architectures, and may be the oldest customer I’ve come across.  They started out viewing Amazon as backup and morphed until today Amazon is their system of record and source of data that doesn’t have to be served too fast.  They use their own data center for the highest traffic items.  The architecture makes extensive use of caching, and apparently their caches get a 95% hit rate.

Don talked about an area he has blogged on in the past, which is how Amazon saves him money that goes right to the bottom line.

Don’s summary on Amazon:

  • A startup can’t go wrong using it initially
  • Great for “store a lot” + “serve a little”
  • More problematic for “serve a lot”

There are performance issues with the architecture around serve a lot and Don feels they charge a bit too much (though not egregiously) for bandwidth.  His view is that if you use more than a Gigabit connection, Amazon may be too expensive, but that they’re fine up to that usage level.

His top feature requests:

-  Better DB support/persistence

-  Control over where physically your data winds up to avoid the “my whole rack died” problem that Joyce Park talked about.

The Juicy Stuff and Other Observations

At the end of the startup presentations, they opened up the startup folks to questions from the audience.  Without a doubt, the biggest source of questions surrounded database functionality:

-  How do we make it persist?

-  How do we make it fast?

-  Can we run Oracle?  Hmmm…

It’s so clear that this is the biggest obstacle to greater Amazon adoption.  Fortunately, its also clear it will be fixed.  I overheard one of the Amazon bigwigs telling someone to expect at least 3 end of year announcements to address the problem.  What is less clear is whether the announcements would be:

a)  Some sort of mySQL service all bundled up neatly

b)  Machine configurations better suited to DB use:  more spindles and memory was mentioned as desireable

c)  Some solution to machines just going poof!  In other words, persistence at least at a level where the machine can reboot, access the data on its disk, and take off again without being reimaged.

d)  Some or all of the above.

Time will tell, but these guys know they need a solution.

The other observation I will make is one that echoes Don’s observation on Smugmug:  I’m sure seeing a lot of Mac laptops out in the world.  3 of the 4 presenters were sporting Macs, and 2 of them had been customized with their company logos on the cover.  Kewl!

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in Partnering, Web 2.0, amazon, data center, ec2, grid, multicore, platforms, saas, software development, strategy, venture | 12 Comments »

What’s the Killer App for Multicore?

Posted by smoothspan on September 11, 2007

I saw this question posed on MulticoreEra, so I thought I’d take a crack at it.

Let me start out with a contrarian view of the whole Multicore Thing.  Many have called it the Multicore Crisis, and I am definitely part of the “Crisis” crowd because I think software is not keeping up with hardware.  It’s too hard to write software that takes advantage of a lot of cores.  So here is my contrarian proposition: if it’s too hard, maybe the first killer app won’t be parallel at all.  Maybe it takes a lot of serial killer apps and just a few parallel killers to get us started.

Your desktop suddenly sprouts 4 cores.  Windows uses 1, your current app uses another, maybe something in the background uses a third, but what do you do with the 4th core?  And next year, what to do with cores 5, 6, 7, and 8?  I suggest we look for the Killer App to be something that runs in the background all the time, has an insatiable appetite for cpu cycles, and is completely indispensible: once we get it, we can’t live without it.  That app may or may not be parallel itself.  If its not, we’ll just run a bunch of differnet ones on the different cores to take advantage.

One last disclaimer:  lots of Multicore Killer App possibilities for the server end, this post is focused on the desktop.

Without further ado, here is my list of possibilities:

Be Speculative

What if your computer was always one step ahead?  Speculative execution has been a mainstay of cpu design for a long time, but this would be speculative execution of software on your PC.  What the system needs is some idea of what you might want to do next.  This could be as simple as knowing which files you’ve accessed recently.  Based on that, the system would know which apps you’re most likely to run.  Imagine if on startup, each core took one of the most popular apps you like to run and started to bring it up behind the scenes.  If you actually ask for the app, it pops up immediately because it was already there. 

The same trick can be employed with data.  Let’s get the data I’m likely to want into RAM and off the disk (or off the Net).  It will be much faster to access it there if it’s called for.

Programmers Need Compiler Farms/Testers Need Test Farms

This is one of those insatiable desire problems if ever there was one.  Make the compile of a single file parallel or running a single test script in parallel might be hard.  However, most programs have many files and many test scripts.  With a little creative scheduling, we can do a lot of that work in parallel even though most of the software uses only one core at a time.

Indexing

If you’re like me, anything that makes it easier to find what you’re looking for on your computer would be indispensible.  But since we’re talking about exotic multicore stuff, how about getting a little more exotic and creative?  I’m thinking of a background task that trolls through your digital photos and uses face recognition to provide an index of who is in the picture.  How cool would that be?  You’d start this program out with a nucleus of starter definitions.  Perhaps your family members and closest friends would be identified manually in a few pictures.  The program would then go off looking at pictures and looking to do a couple of things:

1)  Identify the face regions.  It creates a list of cropped rectangles for each picture, one rectangle per face.

2)  Identify faces.  It matches the regions to known faces.  There’d be a relevancy ranking:  72% chance this one Aunt Jane and only 17% chance its really Uncle Joe.

3)  It asks the user to confirm any faces where the relevancy is too low.  In so doing, it would learn how to identify a particular face better.  Likewise, you’d be able to easily correct its mistakes and it would learn there too.

4)  It builds a list of unidentified faces and ranks them by frequency.  Periodically, it would ask you to identify those that are the most common unknown persons.

Now, any time you want to look at your photos, you’d get captions to go with the photos telling who the people are.  The technology to do this exists today, and it seems to me would make for a killer app.

Tend the Garden

While we’re indexing, there’s a lot of other tasks one could undertake tending the garden of local data.  Clearly I can be on the lookout for malicious offenders: viruses, spyware, and all that sort of thing.  I can rearrange the data on the disk to defrag it and optimize it for better performance.  I can back it up over the web to keep it safe.  I can prune by looking for candidates for deletion.  I can encrypt to keep prying eyes from accessing the data too easily.

Hang Out a Shingle

Even if you don’t have software to take advantage of extra cores, maybe someone else does.  You can hang out a shingle (digitally, of course) whenever you have idle cores for rent.  Anyone whom you allow to share your cores gets to use some of them.  Perhaps it works along the lines of a Hadoop.  Heck, you could envision a search firm letting it’s users do both the web crawling and the search indexing during the normal course of their web browsing.  What a good opportunity to periodically ask them a question to help make results better.

Endless Possibilities, OS Help Wanted

There are endless possibilities here.  Right away it seems that we’ll want some sort of OS help to enable proper sharing of system resources among all these different apps that have suddenly sprouted.  The thing I’d worry most about is disk access.  When my antivirus program fires up it ruins performance of most other apps by hogging the disk.  The OS knows which app has focus and needs to be smart about parceling out that scarce disk resource.  Greater sophistication will be needed there to fully maximize the Multicore Experience.

Posted in grid, multicore, ria, user interface | No Comments »

Using Mindmaps to Explore User Interaction and Create QA Test Plans

Posted by smoothspan on September 9, 2007

Using Mindmaps to Explore User Interaction is a cool idea.  Basically, you depict the flow through the UI as a visual map:

WordPress UI Mindmap

Doing so can give you an instant idea of how hard or easy it is to navigate the UI to key functionality.  I would venture to add that studying the “traffic patterns” along the mindmap’s highways and optimizing them the way you would a web site is probably a good way to tune up a UI.

Now here is a newsflash:  you can buy a piece of software that will automatically generate these mindmaps.  It’s called Rational Test Factory.  Here is a UI Map from Test Factory:

Test Factory UI Map

Same principle, with each UI object called out inside its container (window) and links showing how you navigate.  Test Factor is a product I built once upon a time at a startup I founded that was called Integrity QA Software.  There were a number of unique aspects to the product.  First was its ability to “spider” or crawl a user interface and produce these maps.  Second, it had the ability to automatically create test suites by using the map as a template to generate tests. 

QA people take note, because these maps are good for a lot besides UI design and optimization.  A mind map like this of a UI is exactly the map you need to plan your testing campaign!  I could see putting a big map up on a wall and using it to cross reference your test plans and see that you were getting proper coverage across the whole UI.

A third thing we did that was very powerful was we defined behaviours at each node.  A button is obvious, but menus and type-ins are more interesting.  For example, we made it easy to cycle through a bunch of test files.  The type-in for the file name would be given a property that was a list of test files.  Push a button, and a script got generated to crank through all those files.  Being able to define allowable inputs (for example, what is the range of valid numbers) is hugely powerful for testers.  But it would also be hugely powerful for UI and even program design.  This tool was aimed at testing, but at some point we planned to use it as a UI design tool and do code generation to create a UI from one of these maps.  Never got to it.

The fourth really cool thing this product did was to use Genetic Algorithms to generate optimal test scripts.  This was some seriously cutting edge stuff.  The UI maps gave us templates and made it easy to generate test scripts.  The role of the Genetic Algorithm was to evolve better and better scripts.  “Better” was defined by a fitness function.  We used code coverage and script length.  The shortest script that tested the most code was deemed the best script.  QA people loved this thing.  If the developer’s changed the UI, they could just remap and regenerate new scripts.  Add new code?  Regenerate scripts to test the new code.

The last way cool thing was that this was the first product I build with a grid architecture.  We had a demo room with a rack of 4 by 4 (16) PC’s and two really comfortable chairs set in front of the rack.  On a table between the chairs was the 17th PC that served as a control console.  We’d sit a customer down in one chair and the demoer in the other.  The customer’s software would be loaded.  Hit a button and all 16 machines started mapping the UI in parallel.  The customer’s eyes would literally bug out watching all that activity on their familiar product.  Then testing would begin.  Bugs would start being reported on the console.

I haven’t kept up with Rational’s doing with the product, but at the time it wasn’t suited to testing web applications because it was built before there were many.

It certainly is a fabulous example of how to use multicore to solve a hard problem.  The graphical programming aspect is also intriguing.  It begs for that UI design and code generation tool we never built.

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in grid, multicore, ria, software development, user interface | 2 Comments »

A Multicore Language Timetable (Waste More Hardware!)

Posted by smoothspan on September 8, 2007

What!  You can’t be serious!  You actually want me to waste my hardware?!??

The logic of this all hit home as I read a line from Patrick Logan’s excellent blog:

Today we are wasting hardware running garbage collectors in order to save developer ergs. Increasingly we need to be wasting hardware running large numbers of small processes.

Java (one of the curly braced languages of note) is one language that wastes hardware to run a garbage collector.  A wickedly clever program written in C or C++ that does all of its own memory management can run faster than the Java equivalent, yet it will take a lot longer to write and test that program, and you’ll need much more skilled coders.  Hence Java’s garbage collection is a waste of hardware (because it isn’t the absolute highest performance solution) yet it is worth it because it saves the programmers so much time.  One could argue that garbage collection is Java’s biggest advantage over C++ (though others like portability also come to mind).

The same will be true when it comes to solving the Multicore Crisis and making your software scale properly to take advantage of as many cpu’s and cores as are available.  The equivalent of wickedly cool programming in C will be continued use of the curly braced languages:  they don’t waste enough hardware to make it easy to do multicore programming.

Why is it so important to favor use of multiple cores over efficiency?  Let’s get calibrated on speed first.  There’s a site called Computer Language Benchmarks Game that collects benchmarks by language.  We’ll just use it as a relative scale for some important languages, so it doesn’t matter if you have a different benchmark, the principle is the same. 

What I want to understand is how soon your favorite “inefficient” language is just as fast as the “efficient” language if the “inefficient” language is programmed for multicore and the “efficient” language is not.  I know that’s clear as mud, but bear with me as we look at this result:

Multicore Language Timetable

Language Timetable

Here’s how we use the timetable.  Languages like C, C++, Java and C#, all run pretty much the same today.  I have therefore assigned them 1 core.  So long as we’re running on 1 core, for most applications, it just won’t matter from a performance standpoint which of these languages we choose.  Now look at Smalltalk and Erlang.  These are pretty exotic langauges.  Most people probably think they are pretty slow compared to Java, although Erlang is designed to be really good at using lots of cores.  If Erlang can use 6 cores as easily as Java uses 1 core, then for the same level of programming effort, Erlang and Java are equivalent in performance.  Since we can get 8 cores pretty easily today (Intel is already phasing out some 4 cores!), I show the year of their equivalence as now and they’re both in the “green” zone.

If Moore’s Law holds, in just 2 years, we will have access to 16 cores on a regular basis.  So, by 2009, Python, Perl, and PHP will run just as fast in 16 cores as Java does on a single core, assuming the are parallelized and Java isn’t.  Another 2 years after that and we see JavaScript and Ruby being just as fast.

Note that I am not saying any of these languages make multicore any easier!  (but some do)

I’m also not saying that in 4 years JavaScript will wipe out Java because of the Multicore Crisis!  (but some say it could)

I’m simply suggesting that multicore will be a great leveler of languages, and that if there is a language that makes parallel programming a lot easier, that language will start to have more and more advantage over time.

It is interesting also to look at what happens when a language that’s good at using cores gets ahead.  Erlang is one such.  I’ve said it is equivalent in performance today to Java/C++.  In 2 years, it will now look twice as fast as Java/C++ because it can keep using cores.  How much faster does it have to be before there are such compelling economic advantages that we have to abandon the old languages?

Now let’s drop the other shoe.  One of the reasons people tolerate the relatively poorer performance of languages at the bottom of the list like Ruby, JavaScript, and PHP, is that they’re fast enough AND they are much easier and more productive than the old curly braced languages.  In other words, we are once again wasting some hardware in order to save developer ergs as Patrick Logan suggested.  Interestingly, the increased productivity of the LAMP stack has made venture capital much cheaper for Web 2.0 companies and enabled an explosion in innovation.  Those economic forces I mention may get here sooner than we think, and multicore can skew those economics mightily because it harnesses Moore’s Law to do its bidding. 

Imagine how potent a language will be if it can both save lots of developer ergs ala Ruby and make parallel programming much easier.  That language would bust open the doors of change.

Postscript:

Some comments folks have made on the side:

-  You never get 100% parallelization.  Absolutely true, but there are many problems that are linearly scalable, such as sorting.  The lack of 100% parallelization becomes more of a bit of constant friction than a degradation in the O(n) speedup for those problems.  There are a bunch of folks who have succeeded at this, not the least of which would be Google which depends on it.  What does it mean for the article?  Perhaps a slight delay in the time when parallel code on a slower language beats fast but non-parallel code.  Realistically, that delay is at most 1 Moore Cycle or 2 years.  Many of the anecdotal cases that didn’t get 100% parallelism you have to wonder how well parallelized the code really was.  There is a developing theory that is identifying which problems can be parallelized and which cannot.  Developing for Developers has a good article on P-Completeness and the limits of parallelism.

-  Java has decent concurrency constructs and will get better.  Opinions strongly vary over whether the constructs are good.  Dr Dobbs has a good discussion of some techniques for parallelism that’s anti-thread for example.  Let’s just settle for the idea that Java’s constructs make concurrency possible, but not easy.  The point here is that a language that is somewhat slower than Java, but that makes concurrency much easier can win.  Unless of course someone figures out how to make concurrency easy in Java.  The track record is not that good, however.

If you take away nothing else here, my point is that language efficiency will matter exponentially less than multicore efficiency in the very near term where performance is concerned. 

Related Articles:

A Picture of the Multicore Crisis

You’ve Already Had a Multicore Crisis

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in Web 2.0, grid, multicore, platforms, software development, venture | 8 Comments »

Twitter Experienced a Multicore Crisis…

Posted by smoothspan on September 7, 2007

From their blog:

During our scheduled maintenance window we experienced some unexpected heavy traffic at 3a PT. We’ve been working throughout the night to resolve the issues and get back to 100% functionality as soon as possible.

Sounds like a Multicore Crisis.  Been a lot of it going around lately!

Related Articles:

 As Paul Harvey says, here is the rest of the story.  Twitter did have a multicore crisis and rearchitected to move past it.

Posted in Web 2.0, multicore | No Comments »

A Picture of the Multicore Crisis

Posted by smoothspan on September 6, 2007

I’ve written that most people Have Already Encountered the Multicore Crisis But Just Didn’t Realize It.  Now I thought I would post a chart that shows graphically how the crisis has been here for longer than people might suppose:

Clock Speed Timeline

The Multicore Crisis has to do with a shift in the behavior of Moore’s Law.  The law basically says that we can expect to double the number of transistors on a chip every 18-24 months.  For a long time, it meant that clock speeds, and hence the ability of the chip to run the same program faster, would also double along the same timeline.  This was a fabulous thing for software makers and hardware makers alike.  Software makers could write relatively bloated software (we’ve all complained at Microsoft for that!) and be secure in the knowledge that by the time they finished it and had it on the market for a short while, computers would be twice as fast anyway.  Hardware makers loved it because with machines getting so much faster so quickly people always had a good reason to buy new hardware.

Alas this trend has ground to a halt!  It’s easy to see from the chart above that relatively little progress has been made since the curve flattens out around 2002.  Here we are 5 years later in 2007.  The 3GHz chips of 2002 should be running at about 24 GHz, but in fact, Intel’s latest Core 2 Extreme is running at about 3 GHz.  Doh!  I hate when this happens!  In fact, Intel made an announcement in 2003 that they were moving away from trying to increase the clock speed and over to adding more cores.  Four cores are available today, and soon there will be 8, 16, 32, or more cores.

What does this mean?  First, Moore’s Law didn’t stop working.  We are still getting twice as many transistors.  The Core 2 now includes 2 complete CPU’s for the price of one!  However, unless you have software that’s capable of taking advantage of this, it will do you no good.  It turns out there is precious little software that benefits if we look at articles such Jeff Atwood’s comparison of 4 core vs 2 core performance.  Blah!  Intel says that software has to start obeying Moore’s Law.  What they mean is software will have to radically change how it is written to exploit all these new cores.  The software factories are going to have to retool, in other words.

With more and more computing moving into the cloud on the twin afterburners of SaaS and Web 2.0, we’re going to see more and more centralized computing built on utility infrastructure using commodity hardware.  That’s means we have to learn to use thousands of these little cores.  Google did it, but only with some pretty radical new tooling.

You’ll see more from me on the multicore topic (you can always click it in the tag cloud on the left), but I did this chart on a lark and thought it showed the problem pretty clearly so I wanted to get a post out.

Related Articles:

Multicore Language Timetable

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Posted in grid, multicore, software development | 19 Comments »