Spark's Work

Book Review: Data Science at the Command Line

A while ago, I came across a free e-book called Data Science at the Command Line, written by Jeroen Janssens. Recently, I gave it a quick, light read. My immediate thoughts after reading it are:

Overall, the book is quite well written: a friendly and easy introduction, abundant code examples with good explanations, and even an accompanying Docker image so that readers can try out everything in it. The book's structure and writing style are definitely suitable for someone like me, coming from statistics / data analytics with little to no knowledge of many great (or in this case, basic) programming tools.

Still, the command line cannot do everything or, I'd venture to say, even most things in data analysis. It also comes down to personal preference: I am still much more comfortable and efficient analyzing data interactively, either by running a script line by line or, more conveniently, in a notebook.

At the very least, when my code is run interactively, it is much easier for me to scroll up, check what code has been run, and make changes here and there. I wouldn't call it fun to move the cursor through multiple lines in the command line interface only to change, e.g., one parameter in a 10-line visualization function. Not to mention the other great utilities that come with well-developed and well-maintained tools (e.g. the Julia extension in VSCode).

That said, certain tasks are best done at the command line, such as running a script in the background to fit multiple machine learning models (see e.g. here for what I have recently learned about parsing command-line arguments). Obviously, it wouldn't be convenient to keep a Jupyter notebook open in a browser tab for hours just to wait for a model to be fitted!
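To make the idea concrete, here is a minimal sketch of such a script in Python, using the standard-library argparse module. It is only an illustration, not something taken from the book: the "models" (a mean and a median estimator), the `--models` and `--values` options, and the function names are all made up for this example, and a real script would fit actual machine learning models instead.

```python
import argparse
import statistics

# Hypothetical toy "models": each one maps a list of numbers to a single
# fitted constant. They stand in for real machine learning models.
MODELS = {
    "mean": statistics.mean,
    "median": statistics.median,
}


def parse_args(argv=None):
    """Parse command-line arguments (argv=None means use sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Fit several toy models in one run."
    )
    parser.add_argument(
        "--models",
        nargs="+",
        choices=sorted(MODELS),
        default=["mean"],
        help="names of the models to fit",
    )
    parser.add_argument(
        "--values",
        nargs="+",
        type=float,
        required=True,
        help="data points to fit the models on",
    )
    return parser.parse_args(argv)


def fit_all(model_names, values):
    """Fit every requested model and return a dict of name -> estimate."""
    return {name: MODELS[name](values) for name in model_names}


def main(argv=None):
    args = parse_args(argv)
    for name, estimate in fit_all(args.models, args.values).items():
        print(f"{name}: {estimate}")


if __name__ == "__main__":
    main()
```

Assuming the script is saved as fit_models.py (a hypothetical name), it could be left running in the background with something like `nohup python fit_models.py --models mean median --values 1 2 6 > fit.log 2>&1 &`, and the results checked later in fit.log.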

In short, I really like this book! I'd also like to use the command line more often in my work, whenever it actually is the most convenient and suitable tool for the task.

In the near future, I will probably compile a list of useful resources on learning and using the command line. I have already gone through two YouTube videos that are extremely useful (and so is this channel):