Spark's Work

Quick Conversion from Jupyter Notebooks to HTML Reports

In an earlier post, I compared .ipynb notebooks and .py scripts for data analysis in Python. In short, each has its own merits and best use cases.

One of my recent projects requires generating and summarizing a lot of results. To generate the results, I wrote flexible .py scripts (e.g. with command line arguments), which are clearly more efficient than notebooks for this kind of batch work.

After running those scripts, I have hundreds of tables (as CSV files) and plots (as PNG files) to summarize into reports. On a small scale (say, just a few plots), it is easy to go through the output files manually and copy and paste them into a document, but this becomes infeasible when there are many files.

In this case, Jupyter notebooks come in very handy. While many people run code and interactively examine results in one single notebook, we can also use a notebook only to summarize results, without running any heavy-duty code in it (that part is done separately, and more conveniently, with .py scripts).

Assuming a bunch of tables and plots are ready, we can load them and display them nicely with the IPython.display module (following the examples here and here).

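import pandas as pd
from IPython.display import display, Image, HTML

# Load a table (the file name 'some_table.csv' is a placeholder)
df = pd.read_csv('some_table.csv')

# Show a table stored as a dataframe
display(HTML(df.to_html()))

# Show a plot
display(Image(filename='some_plot.png'))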
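
With hundreds of files, the two display calls above would typically sit inside a loop. The following is a minimal sketch of that idea, not the exact code from my project: the helper names collect_outputs and show_outputs are made up for illustration, and it assumes all tables and plots sit in one flat directory as .csv and .png files.

```python
from html import escape
from pathlib import Path


def collect_outputs(results_dir):
    """Return sorted lists of the CSV tables and PNG plots in results_dir."""
    root = Path(results_dir)
    tables = sorted(root.glob("*.csv"))
    plots = sorted(root.glob("*.png"))
    return tables, plots


def show_outputs(results_dir):
    """Display every table and plot in a notebook cell, each under its file name."""
    # Imported here so collect_outputs stays usable without pandas or IPython.
    import pandas as pd
    from IPython.display import display, Image, HTML

    tables, plots = collect_outputs(results_dir)
    for path in tables:
        display(HTML(f"<h3>{escape(path.name)}</h3>"))
        display(HTML(pd.read_csv(path).to_html(index=False)))
    for path in plots:
        display(HTML(f"<h3>{escape(path.name)}</h3>"))
        display(Image(filename=str(path)))
```

In a notebook cell, calling show_outputs('results') (where 'results' is a placeholder directory name) would then render everything in one go. Sorting the paths keeps the report order stable between runs.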

Finally, we use the nbconvert package to convert the Jupyter notebook into an HTML report (or another output format). The --execute option runs all cells before the output file is generated, and the --no-input option hides the input code cells so that only their outputs appear in the report.

python3 -m nbconvert input_notebook.ipynb \
    --to html --output output_report.html \
    --execute --no-input
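
If there are several summary notebooks, the same command can be issued from a small Python script. This is only a sketch under a few assumptions: the function names nbconvert_command and convert_all and the "_report.html" naming scheme are hypothetical, and the script simply shells out to the command shown above rather than using nbconvert's Python API.

```python
import subprocess
import sys
from pathlib import Path


def nbconvert_command(notebook, output_name):
    """Build the nbconvert call shown above as an argument list."""
    return [
        sys.executable, "-m", "nbconvert", str(notebook),
        "--to", "html", "--output", output_name,
        "--execute", "--no-input",
    ]


def convert_all(notebook_dir):
    """Convert every .ipynb file in notebook_dir to an HTML report."""
    for notebook in sorted(Path(notebook_dir).glob("*.ipynb")):
        # Hypothetical naming scheme: analysis.ipynb -> analysis_report.html
        cmd = nbconvert_command(notebook, f"{notebook.stem}_report.html")
        # check=True stops the batch as soon as one conversion fails.
        subprocess.run(cmd, check=True)
```

Using sys.executable instead of a hard-coded python3 runs nbconvert with the same interpreter (and hence the same environment) as the script itself.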

Of course, the same procedure can be done in R with .Rmd files and in Julia with .jmd files.