Spark's Work

Inconsistent Package Names in Python: pip install and import

A few days ago, I was trying to do some heavy-duty data analysis in Python. Naturally, it would be more efficient to run my scripts in parallel, which called for the following dependency:

import multiprocessing

I also needed to run my scripts on a server where package dependencies had to be installed. However, whenever I tried to run pip install multiprocessing, I got error messages that simply did not make sense. It turned out that I needed to run the following instead:

pip install multiprocess

Now isn't this ridiculous!

Apparently the issue of inconsistent package naming is well known in Python (see here), but before finding that page, I had already wasted well over half an hour trying to figure out whether something was wrong with the package, my server, or my Python installation.

A similar issue came up later when I tried to install the pickle package: the command turned out to be pip install pickle-mixin. This time, though, I quickly figured out the cause and resolved it easily.

This is very annoying because I have never encountered such a problem in Julia or R, both of which have consistent package names. I suppose I will just have to make a note of this and always keep it in mind.