comment 0

Readers/Writer Lock for Twisted

twisted_rwlock_400x400Some times, you can get an exception that occurs on a recurrent basis but really hard to reproduce. By the look and feel of such issues, and also after deep study of the logs, it can appear this is a very classic race condition case, where you have several threads that want to access to a data while this data is being changed by someone else.

The obvious answer to such issue is to implement a Lock, where data is locked each time anyone want to access it, preventing any other to read or write this share while it is “locked”. A more clever approach is to lock only on write: Readers/Writer Lock.

If you are using mutlithreading in Python, you have some options. But if you are using Twisted and want to lock a share that are access but several, concurrent deferreds, using this pattern, you don’t have any other option than to develop your own. I present in the rest of this post the module I have developed to bring Readers/Writer Lock to Twisted: txrwlock.

Reminding about Deferreds and multithreading

Twisted is a very good framework that assume most software actually spend its time waiting for external events, like requesting a URL and waiting for its response. It  allows developers to run several logical flow of instructions, that will run inside a single loop, calling reactor. This flow of instruction is usually executed inside a Thread that is provided by the operating system. The “Reactor” acts like a “minischeduler” and runs in a single system thread. This is called green thread, coroutine, promise or deferred, and this allows to simulate mutlithreading within a single real thread.

A call to a coroutine / promise / deferred does not execute the code of the function, but it returns an object with all information to execute this method later.

For a very good tutorial about async/await, see this tutorial.

When we have several “parallel” flows of instructions, green thread, threads or multi processing, things get messy when data that are shared between all of them flows need to be changed.

Twisted Deferred

A “Twisted program” is a program that uses Deferreds, which is a called “promise” on Javascript or “coroutine” on AsyncIO. It is the unit of execution that is scheduled by the Twisted Reactor (equivalent to the “event loop” of AsyncIO)

Monolithic program:

drawit-diagram-8

Multithreading program:

drawit-diagram-10

As you can see, threads can be put at rest (waiting for network request, and so on).

Twisted will “mix” several fake thread (“green thread”) inside a the main Reactor thread, scheduling each “block” one after the other, and executing code from another block (“deferred”) while the other are waiting, typically for external resources. The main thread can also be in idle mode when no deferred are to be executed (for example, when a web server has no input request to handle).drawit-diagram-9We see on this diagram that all jobs are “interleaved”, and the unit of execution in this case is the Deferred.

A deferred cannot be interrupted by the reactor, but between 2 chained deferred, the reactor might scheduler another deferred from another greenthread.

Said otherwise, a data cannot be changed during the execution of a single deferred (if no multithreading is used on the program).

Inside a Twisted reactor, the program is actually divided into chain of deferreds that are dynamically generated in reaction to an event, generally a network requests.

drawit-diagram-11

Nothing prevent, for example for a response to HTTP server, to have only one single big deferred:

drawit-diagram-4

But if the process is more complex, or may involve other network requests, we do not want to block the reactor so other requests can be processed while waiting for the external resources.

drawit-diagram-3

Twisted and Asyncio provides an easy way for transparently chaining deferred/future. In the rest of this paper, I will only speak about the inlineCallbacks syntax. I don’t like dealing with addCallback manually unless forced to a limitation from inlineCallbacks.

Here how it looks:

This will be translated actually to 3 chained deferreds like this:

drawit-diagram-12

Let’s take a simpler example with data access:

At line 8, we call a function and at line 9 we yield another deferred. The call to aSynchronousFunction will be executed directly and the reactor will not be able to interrupt it. I am not speaking about interruptions by other threads or processes that can interrupt  aSynchronousFunction at any given time of. But until the end of the execution of line 6, the flow of execution is continuous. We call it a synchronous call.

At line 9, the call to anAsynchronousFunction looks different with the yield keyword. This is how asynchronous calls are made using the “inlineCallbacks” pattern on Twisted. It basically tells to the reactor “I need to call this method, which is a special method that return a deferred, so you can schedule any other deferred in between if you need”.

inlineCallbacks and asyncio are wonderful, because both hides the complexity of the management of deferreds into Python code that looks like classic function.

Like we can see on the example above, what will occur if someone else (another deferred from another greenthread) have changed a ? We might want to use the same data (here, server address and port) during the whole execution of myFunc. To do so, we can copy the structure to a local variable or implement a lock.

Sharing Data

Deferred has the same problem about data sharing like with multi-threading. When deferred are chained, data might change from one deferred to another one. In the above example, the a variable might be changed another deferred than  anAsynchronousFunction, thanks to the reactor scheduling.

Multi-threading provide semaphore, locks and mutexes to protect data from being changed when it should not. It allows temporarily restrict the ability to access, read or write, some data.

Twisted also provides a simple lock mechanism, to avoid two Deferreds to read or write at the same time: DeferredLock. But when we have lot more readers deferred than writer, using DeferredLock will block each time one deferred request read access. We cannot have multiple readers accessing to the same time to this data. When data are just read, simultaneous readers can read data at the same time. Locking will have too much bad impact.

Reader/Writer Lock Pattern

This synchronization primitive allows to let any readers read data in “parallel” (in the Twisted reactor), but once a writer arrives to change the protected data, all incoming readers start being stacked in a waiting queue. The writer itself waits for all reads that are underway to fully execute until no reader is left. If another writer arrives, it is stacked and will wait for the complete execution of previous writer.

drawit-diagram-13

So, this mechanism ensure readers can be executed at the same time, and no reader is executed when a writer want to change anything.

I have written a Deferred version of the Readers/Writer Lock for Twisted. It has been highly inspirated by this Python Recipe. It can be found on this Github repository.

It is well suited for protecting a share that needs to stay coherent inside a deferred even if there are several deferred calls, and where there are lot of readers of this share and much less writers.

Usage

An Inlinecallbacks deferred that needs “read” access to a share use the following pattern:

An Inlinecallbacks deferred that needs “write” access to a share uses the following pattern:

 

Extra

You might want to be really careful when starting dealing with locks, you will have the same issues than with normal thread, for example Dead Locks. You can read this blog post too if  you don’t like threads at all.