There is a highly strategic thing to do when dealing with a multi-developer, multi-branch project, especially if you intend to start automating merges between branches: enforce a strict structure for the import statements in your Python scripts, and in particular import one and only one element per line. Here is why.
Sometimes you can get an exception that occurs on a recurrent basis but is really hard to reproduce. From the look and feel of such issues, and after a deep study of the logs, it can turn out to be a very classic race condition, where several threads try to access a piece of data while it is being changed by someone else.
The obvious answer to such an issue is to implement a lock, where the data is locked whenever anyone wants to access it, preventing anyone else from reading or writing it while it is “locked”. A more clever approach is to block only on write: a Readers/Writer lock.
If you are using multithreading in Python, you have some options. But if you are using Twisted and want to protect a shared resource accessed by several concurrent Deferreds with this pattern, you have no option but to develop your own. In the rest of this post I present the module I developed to bring a Readers/Writer Lock to Twisted: txrwlock.
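To make the pattern concrete, here is a minimal threading-based sketch of a Readers/Writer lock (txrwlock itself targets Twisted Deferreds, not threads; this only illustrates the idea, and the class and method names are my own):

```python
import threading

class ReadersWriterLock:
    """Allow many concurrent readers; writers get exclusive access."""

    def __init__(self):
        self._readers = 0
        self._counter_lock = threading.Lock()  # protects the reader counter
        self._writer_lock = threading.Lock()   # held while anyone writes

    def acquire_read(self):
        with self._counter_lock:
            self._readers += 1
            if self._readers == 1:
                self._writer_lock.acquire()    # first reader blocks writers

    def release_read(self):
        with self._counter_lock:
            self._readers -= 1
            if self._readers == 0:
                self._writer_lock.release()    # last reader lets writers in

    def acquire_write(self):
        self._writer_lock.acquire()            # exclusive: no readers, no writers

    def release_write(self):
        self._writer_lock.release()
```

Any number of readers can hold the lock at once, while a writer waits until the last reader is gone; that is exactly the "lock only on write" behavior described above.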
I am working on a set of patches for the Apache Spark project to ease the deployment of complex Python programs with external dependencies. One should be able to deploy a job as easily as it should be, and wheels make this really easy.
Deployment is never a fascinating task; we as developers want our code to work in production exactly as it does on our machines. Python was never really good at deployment, but in recent years it has become easier and more standardized to package a project, describe its dependencies in a unified way, and have them installed properly with Pip, isolated inside a virtualenv. It is, however, not obvious at first sight for non-Pythonistas, and there are several tasks to perform to make everything automatic for a Python package developer, and so for a PySpark developer as well.
I describe in this blog post some thoughts on how PySpark should allow users to deploy full Python applications, and not merely simple Python scripts, by handling wheels and isolated virtual environments.
The main idea behind this proposal is to let developers control the Python environment deployed on the executors, instead of being jailed by what happens to be installed in the Spark executors’ Python environment. If you agree with this approach, please add a comment on the JIRA ticket to speed up its integration into Spark 2.x.
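For readers unfamiliar with wheels: packaging a job boils down to a small `setup.py` (or equivalent) from which Pip can build a `.whl`. A minimal sketch, with a hypothetical project name and an example dependency:

```python
# setup.py -- minimal packaging sketch (project name and dependency are
# hypothetical, for illustration only)
from setuptools import setup, find_packages

setup(
    name="my_pyspark_job",       # hypothetical job package
    version="0.1.0",
    packages=find_packages(),    # picks up every package in the source tree
    install_requires=[
        "requests",              # example external dependency
    ],
)
```

Running `pip wheel .` (or `python setup.py bdist_wheel`) then produces a wheel that, under this proposal, could be shipped to the executors together with its dependencies inside an isolated virtualenv.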
I currently have 6 pull requests on the Apache Spark project, mostly code housekeeping on the PySpark module… some of them open for 2 months!
This post describes an idea for a data processing framework built in Python, inspired by state-of-the-art actor systems such as Akka. It is a bit like a restricted version of a lambda architecture.
This could be used in ETL, data extraction, or any custom warehouse process, where data is pushed or pulled from one side, needs some obscure processing which can involve fetching more data from somewhere else, and is then stored in a database or storage area.
I don’t have a name for this framework yet; I like how “Akka” is a short palindrome. Maybe I’ll find a nice palindrome name in the near future.
I’ll start with a high-level overview of the Lambda Architecture and the Actor Model, where I found some inspiration, and then describe how I would like my system to differ from this model.
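The actor idea at the heart of this proposal can be sketched in a few lines of Python: each actor owns a mailbox and processes its messages one at a time, so no locking is needed inside an actor. This is only an illustrative sketch (the class names are mine, not the framework’s):

```python
import queue
import threading

class Actor:
    """Minimal actor: one mailbox, one worker thread, sequential handling."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        self._mailbox.put(message)       # asynchronous: never blocks the sender

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:          # poison pill stops the actor
                break
            self.receive(message)        # messages are handled one by one

    def receive(self, message):
        raise NotImplementedError

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()

class Accumulator(Actor):
    """Example actor that sums every number it receives."""

    def __init__(self):
        self.total = 0
        super().__init__()

    def receive(self, message):
        self.total += message
```

Because each actor serializes its own message handling, state such as `total` is never accessed concurrently; that property is what makes actor systems attractive for the kind of pipeline described above.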
I am learning Scala.
I am sad to see that, once again, like OCaml, like Rust, like CoffeeScript, Scala does not like the return statement: it is implicit, as the evaluation of the last expression in a block.
Some languages allow you, or even force you, to declare the return value of your function without any visual marker, simply by its being the last statement in the function. This is what we can call an implicit return.
This article gives 8 reasons why this is bad design and should be avoided at all costs in general code, and gives the only acceptable case where return might be omitted. For readability’s and maintenance’s sake, always use the return keyword of your language in your functions. Always.
Markdown and reStructuredText (ReST) are two heavily used markup languages on the Web. While I would at first have expected the markup language from Wikipedia to spread, given the high popularity of that site, it is actually these two others that became successful.
Markdown is heavily used in the blogging world, while ReST comes from the Python world. However, Markdown suffers from fragmented and incompatible implementations, while ReST has a good specification and hooks for extensions.
I personally much prefer ReST. The following page will compare both markup syntaxes from a highly subjective point of view, in favor of ReST.
Since I heard a presentation about the Rust language (actually, read about it on LinuxFr), I have become really interested in this new language. It has a lot of good promises, and if it delivers on them properly, I hope it will become my main programming language.