Guido is Right to Leave the GIL in Python, Not for Multicore but for Utility Computing

Posted by Bob Warfield on September 14, 2007

There’s been a lot of back and forth in the Python community over something called the “GIL,” or Global Interpreter Lock. Probably the best “get rid of the GIL” argument comes from Juergen Brendel’s post. Guido, the benevolent dictator of Python, has responded on his own blog that the GIL is here to stay: he doesn’t think it is a problem, nor does he think that trying to remove it is even the right choice. Both combatants have been eloquent in expressing their views. As is often the case, they’re optimizing for different design centers and will likely have to agree to disagree.

Now let’s try to pick this issue apart in a way that everyone can understand, and see what it means for large-scale scalability in the world of SaaS and Web 2.0. Note that my arguments may not apply if your scaling regime is much smaller, but as we’ve seen with sites like Twitter, big-time scaling is hard and has to be thought about carefully.

First, a quick explanation of the GIL. The GIL is a single lock inside the standard Python interpreter (CPython) that a thread must hold before it can run Python code or touch Python objects. Only one thread may execute Python code at a time, no matter how many cores the machine has.

Whoa! That sounds like Python has no ability to scale for multiple cores at all! How can that be a good thing? You can see where all the heat is coming from in this discussion. The GIL just sounds bad, and one blogger refers to it jokingly as the GIL of Doom.
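A minimal sketch of what this means in practice, assuming the standard CPython interpreter with the GIL enabled. The function names are made up for illustration. It runs the same pure-Python, CPU-bound loop twice, first back to back and then in two threads. With the GIL in place, the threaded run typically takes about as long as the sequential one instead of half the time, though the exact numbers depend on your machine and Python version.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound busy work."""
    while n > 0:
        n -= 1


def time_sequential(n, jobs=2):
    """Run the job `jobs` times, one after another, and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, jobs=2):
    """Run the job in `jobs` threads at once and return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    print(f"sequential: {time_sequential(n):.2f}s")
    print(f"threaded:   {time_threaded(n):.2f}s")
```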

Yet all is not lost. One can access multiple CPUs using processes, and the processes run in parallel. Experienced parallel programmers will know that the difference between a process and a thread is that a process has its own state, while threads share their state with other threads. Hence a thread can reach out and touch another thread’s objects. Python is making sure that when that touch happens, only one thread can touch at a time. Processes don’t have this problem because their communication is carefully controlled and every process has its own objects.
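The difference between shared and private state can be sketched in a few lines. This is only an illustration: the names `shared_log`, `append_entry`, `child_main` and `demo` are made up, and the sketch assumes a POSIX system where the `multiprocessing` "fork" start method is available. The thread appends to the very same list the parent sees, while the child process appends to its own copy and has to send the result back explicitly over a pipe.

```python
import multiprocessing
import threading

shared_log = []  # an ordinary Python list living in the parent process


def append_entry(label):
    """Append to the module-level list from whichever thread or process runs this."""
    shared_log.append(label)


def child_main(conn):
    """Runs in the child process: change the child's copy, then report it back."""
    append_entry("from child process")
    conn.send(list(shared_log))
    conn.close()


def demo():
    shared_log.clear()

    # A thread shares the parent's objects, so its append is visible here.
    worker = threading.Thread(target=append_entry, args=("from thread",))
    worker.start()
    worker.join()

    # A process has its own objects. With the "fork" start method (POSIX only,
    # an assumption of this sketch) the child starts with a copy of the list.
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    child = ctx.Process(target=child_main, args=(child_conn,))
    child.start()
    child_view = parent_conn.recv()  # data only comes back through explicit messages
    child.join()

    return list(shared_log), child_view


if __name__ == "__main__":
    parent_view, child_view = demo()
    print("parent sees:", parent_view)
    print("child saw:  ", child_view)
```

The parent's list never picks up the child's entry, which is exactly the "every process has its own objects" property described above.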

Why do programmers care about threads versus processes? In theory, threads are lighter weight and can perform better than processes. We used to argue back and forth at Oracle about whether to use threads or processes, and there were a lot of trade-offs, but it often made sense to go for threads.

So why won’t Guido get rid of the GIL? Well, for one thing, it was tried and it didn’t help. A new interpreter was written with fine-grained locking that minimized the times when multiple threads were locked out. It ran twice as slow (or worse on Linux) for most applications as the GIL version. The reason is that having more lock calls was slower: lock is a slow operating system function. The way Guido put it, the GIL-free Python on a two-processor machine would run only slightly faster than the GIL version on a single-processor machine, and he saw that as too much overhead. Now, I’ve commented before that we need to waste more hardware in the interest of higher parallelism, and this factor of 2 goes away as soon as you run on a quad-core CPU, so why not nix the GIL? BTW, those demanding the demise of the GIL seem to feel that since Java can run faster and supports threads, that the attempt at removing the GIL must have been flawed and there is a better way.
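The arithmetic behind that trade-off can be written down as a back-of-the-envelope model. This is a deliberately idealized sketch: it assumes the threads scale perfectly across cores with no lock contention, which is optimistic, and it takes the "twice as slow" figure above as the per-thread slowdown. The function name is made up for illustration.

```python
def relative_throughput(cores, per_thread_slowdown=2.0):
    """Idealized throughput of a GIL-free interpreter, relative to the
    GIL interpreter on one core (defined as 1.0).

    Assumes perfect scaling across cores, which real programs rarely reach.
    """
    return cores / per_thread_slowdown


if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        print(f"{cores} core(s): {relative_throughput(cores):.1f}x")
```

Under these assumptions two cores only get you back to where the GIL version started, and the gains show up from four cores onward.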

I find myself in a funny quandary on this one, but I ultimately agree with Guido. There is little doubt that the GIL creates a scalability speed bump, but that speed bump is localized at the low end of the scalability space. If you want even more scalability, you still have to do as Guido recommends and use processes that communicate over sockets or some similar mechanism. I also note that a lot of authorities feel that threads are much harder to program than processes, and they call for shared-nothing designs. Highly parallel languages like Erlang are built around a process model for that reason, not a thread model.

Let me explain what all that means. Threads run inside the same virtual machine, and hence on the same physical machine. Processes can run on the same physical machine or on another one. If you architect your application around threads, you’ve done nothing to access multiple machines. So you can scale to as many cores as there are on a single machine (which will be quite a few over time), but to really reach web scale, you’ll need to solve the multiple-machine problem anyway.
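Here is a rough sketch of the processes-and-sockets style, assuming Python 3.8 or later. Everything in it is illustrative: the one-JSON-document-per-line protocol and the names `WORKER_SOURCE` and `run_job` are invented for this example, and the worker is started on the same machine over the loopback address. The point is that the worker shares nothing with the parent and only talks through a socket, so in principle it could just as well be started on another box.

```python
import json
import socket
import subprocess
import sys

# The worker runs in a separate interpreter. It shares no objects with the
# parent; everything it knows arrives over the socket.
WORKER_SOURCE = r"""
import json, socket, sys

host, port = sys.argv[1], int(sys.argv[2])
with socket.create_connection((host, port)) as conn, \
     conn.makefile("r", encoding="utf-8") as reader, \
     conn.makefile("w", encoding="utf-8") as writer:
    for line in reader:                      # one JSON job per line
        numbers = json.loads(line)
        result = {"sum_of_squares": sum(n * n for n in numbers)}
        writer.write(json.dumps(result) + "\n")
        writer.flush()
"""


def run_job(numbers, host="127.0.0.1"):
    """Start a worker process, send it one job over a socket, return its reply."""
    with socket.create_server((host, 0)) as server:  # port 0: the OS picks a free port
        server.settimeout(10)
        port = server.getsockname()[1]
        worker = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, host, str(port)]
        )
        conn, _addr = server.accept()
        with conn, \
             conn.makefile("r", encoding="utf-8") as reader, \
             conn.makefile("w", encoding="utf-8") as writer:
            writer.write(json.dumps(numbers) + "\n")
            writer.flush()
            reply = json.loads(reader.readline())
        # Closing the connection gives the worker end-of-file, so it exits.
        worker.wait(timeout=10)
    return reply


if __name__ == "__main__":
    print(run_job([1, 2, 3]))
```

A real deployment would need authentication, error handling and a pool of long-lived workers, none of which this sketch attempts.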

As Donald Knuth says, “premature optimization is the root of all evil” in programming. Threads are a premature optimization when you need massive scaling, while processes lead to greater scalability. If you’re planning to use a utility computing fabric, such as Amazon EC2, you’ll want processes. In this case, I’m with Guido, because I think utility computing is more important in the big picture than optimizing for the cores on a single chip. Take a look at my blog post on the Amazon Startup Project to see just a few things folks are doing with this particular utility computing fabric.

This entry was posted on September 14, 2007 at 3:09 pm and is filed under amazon, data center, ec2, grid, multicore, platforms, saas, software development, Web 2.0.

4 Responses to “Guido is Right to Leave the GIL in Python, Not for Multicore but for Utility Computing”

blaisorblade said

January 7, 2009 at 5:55 am
> So why won’t Guido get rid of the GIL? Well, for one thing, it was tried and it didn’t help. A new interpreter was written with fine-grained locking that minimized the times when multiple threads were locked out. It ran twice as slow (or worse on Linux) for most applications as the GIL version.

> The reason is that having more lock calls was slower: lock is a slow operating system function.
What you say would imply that to take a lock you need a (slow) system call, and that’s wrong.

The problem in CPython is reference counting. Reference counting is slow to begin with (no efficient VM uses it nowadays; see Java/.NET), and converting the operations on the counts to atomic ones makes it even slower. No wonder Java doesn’t need a GIL. So, Python is paying the cost of refcounting twice.
Also, for an atomic increment you don’t need to call the OS, and nowadays even on Linux locks are fast, so that an uncontended lock is acquired in a few instructions and one atomic operation. It’s still not free, but it is much cheaper.
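As a rough illustration of the reference-counting point, here is a sketch specific to CPython. `sys.getrefcount` is a real standard-library call, while the function name `refcount_delta` is made up for this example. It shows that even trivial operations such as an assignment or putting an object in a list update the object's reference count. Without a GIL, each of those updates would have to be made safe against other threads.

```python
import sys


def refcount_delta():
    """Return how much an object's reference count grows after two new references."""
    obj = object()
    before = sys.getrefcount(obj)
    alias = obj        # a plain assignment adds one reference
    holder = [obj]     # storing it in a container adds another
    after = sys.getrefcount(obj)
    return after - before


if __name__ == "__main__":
    print("extra references counted:", refcount_delta())
```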

jdcioccio said

April 29, 2009 at 10:07 pm
Most of the expense in locking is not necessarily the number of CPU instructions required; it’s the expense of the memory barrier that needs to be put in place to guarantee consistency.

See http://en.wikipedia.org/wiki/Memory_barrier for more information.

- blaisorblade said
  
  September 8, 2010 at 12:57 pm
  @Jdcioccio: if locking were a system call, it would be much slower, on the order of thousands of cycles instead of around 100 for an atomic instruction, which includes a memory barrier (at least on x86). And when I wrote “atomic instruction”, it was implicit that it was an expensive one, because that’s obvious in the field. My bad for not being clearer.
  Also, IIRC biased locking, as used in HotSpot, allows reacquiring a lock even more cheaply (read barriers are probably still required, but they are almost free). Java volatile fields also allow much cheaper reads, which only require that loads are not reordered, either by the compiler or by the CPU, and that is infinitely cheaper (see http://g.oswego.edu/dl/jmm/cookbook.html: reading a volatile needs no StoreLoad barriers, and all other barriers are free on x86).
  
  > BTW, those demanding the demise of the GIL seem to feel that since Java can run faster and supports threads, that the attempt at removing the GIL must have been flawed and there is a better way.
  
  I agree completely, and I don’t have “feelings” but concrete considerations. Without refcounting, a GIL-less Python would be much faster. That’s not enough, however: the internal dictionaries must become thread-safe, and doing that with standard locks would be slow. But as argued, there are better ways, among them removing most internal dictionaries (the ones used for fields) altogether, as Google’s V8 does for JavaScript.
  
How do threads work in Python, and what are common Python-threading specific pitfalls? « Programmers Goodies said

July 4, 2011 at 5:12 am
[…] http://smoothspan.wordpress.com/2007/09/14/guido-is-right-to-leave-the-gil-in-python-not-for-multico… […]
