Does Python packaging have a left-pad problem?
How did this happen, and what is code reuse?
Could this happen with Python packaging?
In short, yes. Imagine for a moment that the requests library, which is one of the most use libraries in the Python world, was suddenly taken off line. What would the consequences be? Immediately many Continuous Integration (CI) tasks would fail. Modern software development loves to use CI as a way to validate changes made to software. In the Python community CI is a way of life for many projects, and many rely on requests (and other packages) to help abstract away the low-level details of the problem they are working on. In the case of left-pad, there were several options to choose from. The code was small enough that each project could have implemented its own version easily. In the case of requests, this would not be such an easy proposal. Requests is wrapping the functionality of urllib/urllib2/urllib3 in a beautiful way. Recreating that per-project would not be quick or bug free. If groups tried to recreate a common version of it you would most likely witness several splinter versions of the code coming into existence. This is not exactly uncommon in the open source world. The end results… well, for this thought experiment I feel was can assume the greater good would win out, but the important thing to realize is that it would take time to happen.
So what is the solution to this problem?
The world needs things like npm and pypi. They are essential to the growth of their respective communities. At the same time, the authors of the programs there should be free to take their creations out of circulation (if they so choose). I had a co-worker once who was militant about not relying on 3rd party sites like pypi. He took it to the extreme of checking each dependency into our version control. For somethings like (python packaging) this didn’t seem too terrible, it was a good way to track the versions of these libraries we were using. For other things (compiled binaries of major open source projects)… this seemed really extreme. Over the years I’ve seen some approaches that seemed like a nice middle ground. Specifically with
pip --freeze > requirements.txt which will produce a text file that lists out all of your dependencies and their explicit versions. I’ve also seen an approaches that used specific commit hashes to ensure that the code itself was from a known state (e.g. the code could not have been tampered with or the commit blockchain would have been invalidated). But those approaches don’t help with when the source of the library just disappears. In fact, the blockchain approach would prevent someone from substituting in a new version of the library because the commit hashes would not match up. (That is usually considered a good thing, but in this case it would be a pain because everyone in the world would suddenly have to update their dependencies to reflect this new hash.) A possible solution to this is what the Go Language (go-lang) is moving to: a vendors package. While attending a talk by Kelsey Hightower at the Great Wide Open conference, I heard him give a brief overview of how the vendors package was born out of a desire to keep malicious code out of go-lang projects at Google.
Side note: go-lang projects compile into a binary that contains all of the libraries needed for the code to execute. There is no dynamic linking at run time.
As a result, any libraries that the program needs can be checked into the vendors directory and then it becomes a part of the project (under its version control). [caption id=”attachment_611” align=”aligncenter” width=”256”] Maybe this little fella can help out.[/caption] This hearkens back to my coworker’s idea of putting everything into version control. I don’t know the full details of how this works in go-lang, but it still makes me a little uneasy about having a ton of other people’s code living in my repository. But at the same time it will prevent the left-pad problem of disappearing code.
Python packaging is not immune from the problems experienced in npm recently. While code reuse is a great thing, care must be taken to ensure that builds of our software are truly repeatable. Perhaps if Python follows the example of go-lang and creates a similar “vendor” standard then python packaging problems will become a thing of the past.tags: