Spark's Work

Choosing between .ipynb and .py Files

Python is currently the de facto standard language for data science in industry. For data analysis and model experimentation, Jupyter notebooks (.ipynb files) are perhaps the first choice for many data scientists. A notebook is interactive and easy to modify, which makes it ideal for ad-hoc analysis and for trying out different ideas with the data.

However, my main issue with .ipynb files is that they are hard to version control (although not entirely impossible, see here). A notebook is stored as JSON that includes cell outputs, execution counts and metadata, so even re-running a cell without changing any code produces a noisy diff. In addition, whenever code in a notebook is ready to be organized and reused, it has to be refactored into separate Python scripts, which introduces the possibility of errors and inconsistencies.
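To make the version-control problem concrete, here is a minimal, standard-library-only sketch. It assumes the nbformat 4 JSON layout (a `cells` list whose code cells carry `outputs` and `execution_count`) and uses a hypothetical helper, `strip_outputs`, to show that two runs of the same code differ only in execution state. Dedicated tools such as nbstripout automate this kind of cleanup; the sketch below is an illustration, not their implementation.

```python
import copy


def strip_outputs(nb: dict) -> dict:
    """Return a copy of a parsed .ipynb dict with outputs and execution counts cleared."""
    cleaned = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        # In nbformat 4, only code cells carry "outputs" and "execution_count"
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical one-cell notebook, executed once...
run_1 = {"cells": [{
    "cell_type": "code", "execution_count": 1, "metadata": {},
    "outputs": [{"output_type": "stream", "name": "stdout", "text": ["255\n"]}],
    "source": ["print(120 + 135)"],
}]}
# ...and then simply re-run later, with the code itself unchanged
run_2 = copy.deepcopy(run_1)
run_2["cells"][0]["execution_count"] = 7

print(run_1 == run_2)  # False: the stored JSON differs, so git reports a change
print(strip_outputs(run_1) == strip_outputs(run_2))  # True: the remaining content is identical
```

Stripping outputs before committing removes much of the diff noise, but the code still lives inside JSON strings, which is part of why plain .py files remain easier to review.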

Recently, I have gradually shifted to using .py scripts as an alternative to Jupyter notebooks, thanks to the interactive mode for Python in VSCode (see here). As with any script, my code is now much easier to version control (and lint), while the interactive workflow is preserved: code can be run in a separate Interactive Window tab for ad-hoc Python. In addition, it is much easier to refactor exploratory .py code (used in place of notebooks) into proper .py scripts (which are actually intended to be run multiple times).
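For readers who have not used this mode: VS Code treats lines beginning with `# %%` in a .py file as cell boundaries, and each cell can be run on its own in the Interactive Window; a `# %% [markdown]` header marks a Markdown cell. The sketch below shows what such a file might look like. The data, the file's purpose and the `summarize` helper are all hypothetical, chosen only for illustration.

```python
# %% [markdown]
# # Daily sales exploration (hypothetical data)

# %%
import statistics

# Made-up sample data, for illustration only
daily_sales = [120, 135, 98, 150, 142]


# %%
def summarize(values: list[float]) -> dict[str, float]:
    """Small helper that could later be moved into a reusable module as is."""
    return {
        "mean": statistics.mean(values),
        "max": max(values),
        "min": min(values),
    }


# %%
# Each "# %%" line starts a new cell; in VS Code, this cell can be sent
# to the Interactive Window without re-running the cells above it.
summary = summarize(daily_sales)
print(summary)
```

Because the cell markers are ordinary comments, the same file also runs top to bottom with a plain `python` invocation, diffs like any other script, and lets a helper such as `summarize` be moved into a module without conversion.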

Meanwhile, data science work is rarely a one-person job, so it is important to be considerate of the tools others prefer and of the team's established ways of working. Fortunately, there are many tools that make it easy to convert between .ipynb and .py files.
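As a rough illustration of what these conversion tools do, below is a minimal, standard-library-only sketch that turns a parsed .ipynb file into the percent format understood by VS Code's interactive mode. It rests on simplifying assumptions: the nbformat 4 layout, outputs and metadata simply discarded, and non-code cells written as commented lines under a `# %% [<cell type>]` header. The function names are hypothetical. Dedicated tools such as Jupytext handle both directions and many edge cases, for example `jupytext --to py:percent notebook.ipynb` and `jupytext --to notebook notebook.py`.

```python
import json


def _source_text(cell: dict) -> str:
    """Return a cell's source as one string (nbformat allows a str or a list of str)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def notebook_to_percent(nb: dict) -> str:
    """Convert a parsed .ipynb dict into percent-format Python source.

    Simplified sketch: outputs and metadata are dropped; non-code cells
    (markdown, raw) are emitted as commented lines under "# %% [<type>]".
    """
    chunks = []
    for cell in nb.get("cells", []):
        text = _source_text(cell).rstrip("\n")
        cell_type = cell.get("cell_type", "code")
        if cell_type == "code":
            chunks.append("# %%\n" + text)
        else:
            # Prefix every line with "# " (trailing whitespace trimmed)
            commented = "\n".join(("# " + line).rstrip() for line in text.splitlines())
            chunks.append(f"# %% [{cell_type}]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook, inlined for illustration. For a real file:
#     with open("analysis.ipynb", encoding="utf-8") as f: nb = json.load(f)
example_nb = json.loads("""
{"cells": [
  {"cell_type": "markdown", "metadata": {}, "source": ["# Sales analysis"]},
  {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [],
   "source": ["total = 120 + 135\\n", "print(total)"]}
 ],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
""")
print(notebook_to_percent(example_nb))
```

Going the other way (percent-format .py back to .ipynb) is left to tools like Jupytext here, since faithfully restoring metadata and cell headers is where most of the edge cases live.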