Guido is Right to Leave the GIL in Python, Not for Multicore but for Utility Computing
Posted by Bob Warfield on September 14, 2007
There’s been a lot of back and forth in the Python community over something called the “GIL” or Global Interpreter Lock. Probably the best “get rid of the GIL” argument comes from Juergen Brendel’s post. Guido, the benevolent dictator of Python, has responded in his own blog that the GIL is here to stay: he doesn’t think it is a problem, nor that trying to remove it is even the right choice. Both combatants have been eloquent in expressing their views. As is often the case, they’re optimizing for different design centers and will likely have to agree to disagree.
Now let’s try to pick apart this issue in a way that everyone can understand and make sense of for large scalability issues in the world of SaaS and Web 2.0. Note that my arguments may be invalid if your scaling regime is much smaller, but as we’ve seen for sites like Twitter, big time scaling is hard and has to be thought about carefully.
First, a quick explanation of the GIL. The GIL is a lock inside the Python interpreter that forces threads to take turns: only one thread may execute Python bytecode at any given moment, no matter how many cores the machine has.
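To make that concrete, here’s a small sketch of my own (not from either post) showing what the GIL means for CPU-bound threads: the answers come out right, but the two threads take turns rather than running simultaneously.

```python
import threading

# CPU-bound work. Under the GIL, threads running this get correct
# answers, but they take turns executing bytecode -- so on a multi-core
# machine the pair runs no faster than doing the work one after another.
def sum_squares(n):
    return sum(i * i for i in range(n))

results = {}

def worker(name, n):
    results[name] = sum_squares(n)

threads = [threading.Thread(target=worker, args=(name, 200000))
           for name in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["a"] == results["b"])  # True: correct, just not parallel
```

Timing the two threads against a plain sequential loop on a multi-core box shows essentially no speedup, which is the whole complaint.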
Whoa! That sounds like Python has no ability to scale for multiple cores at all! How can that be a good thing? You can see where all the heat is coming from in this discussion. The GIL just sounds bad, and one blogger refers to it jokingly as the GIL of Doom.
Yet all is not lost. One can access multiple CPUs using processes, and the processes run in parallel. Experienced parallel programmers will know the difference between a process and a thread: a process has its own state, while threads share their state with other threads. Hence a thread can reach out and touch another thread’s objects. Python is making sure that when that touch happens, only one thread touches at a time. Processes don’t have this problem because their communication is carefully controlled and every process has its own objects.
Why do programmers care about threads versus processes? In theory, threads are lighter weight and can perform better than processes. We used to argue back and forth at Oracle about whether to use threads or processes, and there were a lot of trade-offs, but it often made sense to go for threads.
So why won’t Guido get rid of the GIL? Well, for one thing, it was tried and it didn’t help. A patched interpreter was built with fine-grained locking that minimized the times when threads were locked out. It ran roughly twice as slow (or worse on Linux) as the GIL version for most applications, because all those extra lock operations are expensive: a lock is a comparatively slow operating system primitive. The way Guido put it, on a 2-processor machine Python would run only slightly faster than on a single-processor machine, and he saw that as too much overhead. Now, I’ve commented before that we need to waste more hardware in the interest of higher parallelism, and this factor of 2 goes away as soon as you run on a quad-core CPU, so why not nix the GIL? BTW, those demanding the demise of the GIL seem to feel that since Java can run faster and supports threads, the attempt at removing the GIL must have been flawed and there must be a better way.
I find myself in a funny quandary on this one, but ultimately I agree with Guido. There is little doubt that the GIL creates a scalability speed bump, but that speed bump is localized at the low end of the scalability space. If you want even more scalability, you still have to do as Guido recommends and use processes, communicating through sockets or some such. I also note that many authorities feel threads are much harder to program correctly than processes, and they call for shared-nothing architectures instead. Highly parallel languages like Erlang are built on a process model, not a thread model, for exactly that reason.
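Guido’s “processes and sockets” recipe can be sketched in a few lines. This Unix-only toy of mine (using `os.fork` and a socket pair; it won’t run on Windows) has a parent hand a job to a child process and read back the answer:

```python
import os
import socket

# Parent and child share nothing but this socket pair.
parent_end, child_end = socket.socketpair()

pid = os.fork()
if pid == 0:
    # Child process: its own copy of the interpreter's state, its own
    # objects. It receives a number, does the work, sends the result.
    parent_end.close()
    n = int(child_end.recv(64).decode())
    child_end.sendall(str(sum(range(n))).encode())
    child_end.close()
    os._exit(0)
else:
    # Parent: send the job over the socket, then wait for the result.
    child_end.close()
    parent_end.sendall(b"1000")
    answer = int(parent_end.recv(64).decode())
    parent_end.close()
    os.waitpid(pid, 0)
    print(answer)  # 499500
```

The key point is that nothing here cares whether the child is on the same machine: swap the socket pair for a TCP connection and the same code spans boxes, which threads can never do.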
Let me explain what all that means. Threads run inside the same virtual machine, and hence on the same physical machine. Processes can run on the same physical machine or on another one. If you architect your application around threads, you’ve done nothing to let it span multiple machines. So you can scale to as many cores as a single machine offers (which will be quite a few over time), but to really reach web scale, you’ll need to solve the multiple-machine problem anyway.
As Donald Knuth put it, “premature optimization is the root of all evil” in programming. Threads are a premature optimization when you need massive scaling, while processes lead to greater scalability. If you’re planning to use a utility computing fabric, such as Amazon EC2, you’ll want processes. In this case, I’m with Guido, because I think utility computing is more important in the big picture than optimizing for the cores on a single chip. Take a look at my blog post on the Amazon Startup Project to see just a few things folks are doing with this particular utility computing fabric.