When running data analysis scripts from the command line, I used to hardcode certain arguments, which is clunky and error-prone. Recently, after doing more hands-on data analysis, I have become more comfortable with the command line, and especially with command line arguments, which turn out to be very powerful and helpful.
To use a very simple example, suppose I want to print a series of numbers, but the series is only specified when running the script. If I want to print 1 to 10, I would write a script print_1.jl like this:
for i in 1:1:10
    println(i)
end
Then, I would run it using the following command line:
$ julia print_1.jl
For another task that prints out 5 to 15, I would need to either modify the script above, or duplicate it, change 1:1:10 to 5:1:15, and save the copy as another script, print_2.jl.
Such modifications may work fine for a simple task like printing numbers, but they get complicated for larger and more complex scripts, e.g. when training a machine learning model that requires multiple inputs such as model files, data files and hyperparameters. In that case, hand-copying and modifying the scripts quickly becomes infeasible and is likely to introduce manual errors.
Luckily, there is a package, ArgParse.jl (see its documentation), which helps with parsing arguments from the command line.
For the task of printing numbers, we can specify arguments such as the starting value (--start), the ending value (--ending) and the step value (--step) as inputs from the command line when running the script.
A sample script print_numbers.jl is given below, which is largely adapted from the examples on the documentation site of ArgParse.jl.
using Pkg
Pkg.activate(".")  # use the project environment in the current directory
using ArgParse

function parse_commandline()
    s = ArgParseSettings()
    # ArgParse 1.0 and later name this macro @add_arg_table!;
    # the older @add_arg_table spelling is deprecated.
    @add_arg_table! s begin
        "--start"
            help = "starting value of array"
            arg_type = Int
            default = 1
        "--ending"
            help = "ending value of array"
            arg_type = Int
            default = 10
        "--step"
            help = "step value of array"
            arg_type = Int
            default = 1
    end
    return parse_args(s)
end

function main()
    # @show prints the parsed arguments, keyed by option name
    @show parsed_args = parse_commandline()
    start = parsed_args["start"]
    step = parsed_args["step"]
    ending = parsed_args["ending"]
    for i in start:step:ending
        println(i)
    end
end

main()
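To check the parsing logic without typing a command each time, the arguments can be supplied as a vector of strings. The snippet below is a minimal sketch rather than part of the original script: it assumes ArgParse 1.0 or later (hence the @add_arg_table! spelling), and build_settings is a helper name introduced here only for illustration.

```julia
using ArgParse

# Hypothetical helper: builds the same three options as print_numbers.jl.
function build_settings()
    s = ArgParseSettings()
    @add_arg_table! s begin
        "--start"
            help = "starting value of array"
            arg_type = Int
            default = 1
        "--ending"
            help = "ending value of array"
            arg_type = Int
            default = 10
        "--step"
            help = "step value of array"
            arg_type = Int
            default = 1
    end
    return s
end

# parse_args accepts an explicit vector of strings in place of the real command line.
parsed = parse_args(["--start", "2", "--ending", "10", "--step", "2"], build_settings())

# Options that are left out fall back to their defaults.
defaults = parse_args(String[], build_settings())

numbers = collect(parsed["start"]:parsed["step"]:parsed["ending"])
println(numbers)
```

Running this from the Julia REPL should behave like calling the script with the same flags, which makes it easy to try out new options before wiring them into main().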
Now things become much more convenient, since I need only this one script for all my printing tasks.
If I want to print 1, 2, ..., 9:
$ julia print_numbers.jl --start 1 --ending 9 --step 1
If I want to print 2, 4, 6, 8, 10:
$ julia print_numbers.jl --start 2 --ending 10 --step 2
Since I have set some default arguments, the following will print out 1, 2, ..., 10:
$ julia print_numbers.jl
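The same pattern carries over to the larger scripts mentioned earlier, such as model training with data files and hyperparameters. The following is only a sketch under assumptions: the option names (--data, --model, --lr, --epochs), their defaults, and the function parse_training_args are all hypothetical, and ArgParse 1.0 or later is assumed. It shows a required option alongside typed options with defaults.

```julia
using ArgParse

# Hypothetical argument table for a training script; all names are illustrative.
function parse_training_args(args)
    s = ArgParseSettings()
    @add_arg_table! s begin
        "--data"
            help = "path to the training data file"
            arg_type = String
            required = true      # the script refuses to run without it
        "--model"
            help = "path where the trained model is written"
            arg_type = String
            default = "model.bin"
        "--lr"
            help = "learning rate"
            arg_type = Float64
            default = 0.01
        "--epochs"
            help = "number of passes over the data"
            arg_type = Int
            default = 10
    end
    return parse_args(args, s)
end

# In a real script this would be parse_training_args(ARGS).
opts = parse_training_args(["--data", "train.csv", "--lr", "0.001"])
println(opts["data"], " ", opts["lr"], " ", opts["epochs"])
```

Marking --data as required means a forgotten input is reported by the parser at startup rather than surfacing later as a missing-file error.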
Finally, argument parsing is also available in other popular languages for statistical analysis, e.g. Click in Python and argparse in R (see their respective documentation).