Spark's Work

Parsing Command Line Arguments in Julia

When running data analysis scripts from the command line, I used to hardcode their inputs, which is clunky and error-prone. After doing more hands-on data analysis, I have become more comfortable with the command line, and in particular with command-line arguments, which turn out to be very powerful and helpful.

As a very simple example, suppose I want to print a series of numbers whose range is only known when the script is run. To print 1 to 10 with hardcoded values, I would write a script print_1.jl like this:

for i in 1:1:10
    println(i)
end

Then I would run it with the following command:

$ julia print_1.jl

For another task that prints out 5 to 15, I would need to either modify the script above, or duplicate it and change 1:1:10 to 5:1:15, and then save it as another script print_2.jl.

Such modifications work fine for a simple task like printing numbers, but they do not scale to larger and more complex scripts, e.g. training a machine learning model that takes multiple inputs such as model files, data files and hyperparameters. In that case, copying and editing scripts by hand quickly becomes infeasible and is likely to introduce errors.
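One way to avoid duplicating scripts is to read the values from `ARGS`, the built-in vector of strings that Julia fills with whatever follows the script name on the command line. The following is only a minimal sketch, assuming the values are passed positionally in the order start, ending, step; the helper name `parse_range_args` is hypothetical.

```julia
# Hypothetical helper: read start, ending and step as positional arguments,
# falling back to defaults (1, 10, 1) when an argument is missing.
function parse_range_args(args::AbstractVector{<:AbstractString})
    start  = length(args) >= 1 ? parse(Int, args[1]) : 1
    ending = length(args) >= 2 ? parse(Int, args[2]) : 10
    step   = length(args) >= 3 ? parse(Int, args[3]) : 1
    return start, ending, step
end

function main(args)
    start, ending, step = parse_range_args(args)
    for i in start:step:ending
        println(i)
    end
end

# ARGS holds the arguments given after the script name.
main(ARGS)
```

Saved under a hypothetical name such as print_args.jl, this could be run as `julia print_args.jl 5 15` to print 5 to 15. Positional parsing like this has no option names, help text or input validation, so it becomes hard to manage as the number of inputs grows.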

Luckily, the ArgParse.jl package helps with parsing arguments from the command line.

For the task of printing numbers, we can specify arguments such as the starting value (--start), the ending value (--ending) and the step value (--step) as inputs from the command line when running the script.

A sample script print_numbers.jl is given below, which is largely adapted from the examples on the documentation site of ArgParse.jl.

using Pkg
Pkg.activate(".")
using ArgParse

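# Declare the accepted options and parse them from the command line.
function parse_commandline()
    s = ArgParseSettings()
    # ArgParse.jl 1.0 renamed this macro from @add_arg_table (now deprecated).
    @add_arg_table! s begin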
        "--start"
            help = "starting value of array"
            arg_type = Int
            default = 1
        "--ending"
            help = "ending value of array"
            arg_type = Int
            default = 10
        "--step"
            help = "step value of array"
            arg_type = Int
            default = 1
    end
    return parse_args(s)
end

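function main()
    # @show prints the parsed dictionary, which is handy for checking the inputs.
    @show parsed_args = parse_commandline()

    # Keys match the option names without the leading dashes.
    start = parsed_args["start"]
    step = parsed_args["step"]
    ending = parsed_args["ending"]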
    
    for i in start:step:ending
        println(i)
    end
end

main()

Now things are much more convenient, since I only need this single script for all my printing tasks.
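To illustrate how one script covers the earlier 5 to 15 task, the sketch below passes an explicit argument vector to `parse_args` in place of what the shell would supply. It rebuilds the same three options as print_numbers.jl so that it is self-contained, and it assumes ArgParse.jl 1.0 or later, where the macro is spelled `@add_arg_table!`. The function name `build_settings` is hypothetical.

```julia
using ArgParse

# Same three options as print_numbers.jl, rebuilt here for a standalone sketch.
function build_settings()
    s = ArgParseSettings()
    @add_arg_table! s begin
        "--start"
            arg_type = Int
            default = 1
        "--ending"
            arg_type = Int
            default = 10
        "--step"
            arg_type = Int
            default = 1
    end
    return s
end

# The explicit vector stands in for the arguments typed on the command line.
parsed = parse_args(["--start", "5", "--ending", "15"], build_settings())

# --step was not given, so it falls back to its default of 1.
numbers = collect(parsed["start"]:parsed["step"]:parsed["ending"])
println(numbers)
```

On the command line this corresponds to `julia print_numbers.jl --start 5 --ending 15`. Any option left out falls back to its default, and ArgParse.jl also generates a `--help` message from the `help` strings in the argument table.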

Finally, argument parsing is also available in other languages popular for statistical analysis, e.g. Click in Python and argparse in R.