Spark's Work

Avoiding Jupyter Notebooks in Data Science

At the time of writing, I am almost halfway through my third year working as a Data Scientist, although in my case that's just a fancy name for a statistician/actuary-in-name who codes a lot.

As I learn more about coding and work more with data, certain strong opinions naturally form; my negative opinions about R are one example. This post is about why I avoid Jupyter Notebooks in data science.

I started out with Jupyter Notebooks when learning Python (and also, very briefly, Spyder, which is even worse in my opinion). They were great at the beginning, but became a nightmare once I started versioning code with Git. In short, it is impossible to properly version control a Jupyter Notebook, since code and output (be it text or plots) are mixed together in one messy JSON file. In hindsight, this was the first warning sign that I should shift away from notebooks.
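To make the versioning problem concrete, here is a minimal sketch using only Python's standard library. The notebook dictionary is hand-written to follow the nbformat 4 layout, and `strip_outputs` and `serialize` are hypothetical helpers of my own, not part of Jupyter. It shows that merely re-running a cell changes the saved file even though the code is identical, and that the two versions only compare equal once outputs and execution counts are removed.

```python
import json

# A hand-written, minimal notebook in the nbformat 4 layout (illustrative only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        }
    ],
}


def strip_outputs(nb):
    """Return a copy of the notebook without outputs or execution counts."""
    cleaned = json.loads(json.dumps(nb))  # deep copy via a JSON round trip
    for cell in cleaned["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def serialize(nb):
    """Serialize the notebook the way it would be written to disk as JSON."""
    return json.dumps(nb, indent=1, sort_keys=True)


# Simulate re-running the same cell: the code is unchanged, only the counter moves.
rerun = json.loads(json.dumps(notebook))
rerun["cells"][0]["execution_count"] = 4

print(serialize(notebook) == serialize(rerun))
print(serialize(strip_outputs(notebook)) == serialize(strip_outputs(rerun)))
```

Stripping outputs before committing makes the diffs readable, but it also throws away the results, which is a large part of why people keep notebooks in the first place.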

There are many other reasons, well elaborated on by others, such as reproducibility, maintainability, and coding habits in general. This page, like the R one, will be a collection of arguments against Jupyter Notebooks, although I imagine it will be much shorter.