When running data analysis scripts from the command line, I used to hardcode certain arguments, which is clunky and error-prone. Recently, after doing more hands-on data analysis, I have become more comfortable with the command line, and especially with command-line arguments, which turn out to be very powerful and helpful.
To use a very simple example, suppose I want to print a series of numbers, but the series is only specified when running the script. If I want to print 1 to 10, I would write a script print_1.jl like this:
for i in 1:1:10
    println(i)
end
Then, I would run it using the following command line:
$julia print_1.jl
For another task that prints out 5 to 15, I would need to either modify the script above, or duplicate it, change the range 1:1:10 to 5:1:15, and then save the copy as another script.
Such modification may work fine for a simple task like printing numbers, but things will get complicated for larger and more complex scripts, e.g. when training a machine learning model which requires multiple inputs such as model files, data files and hyperparameters. In this case, hand-copying and modifying the scripts will quickly become infeasible and will likely lead to manual errors.
Luckily, there is a package, ArgParse.jl, which helps with parsing arguments from the command line.
For the task of printing numbers, we can specify arguments such as the starting value (--start), the ending value (--ending) and the step value (--step) as inputs from the command line when running the script.
A sample script print_numbers.jl is given below, which is largely adapted from the examples on the documentation site of ArgParse.jl.
using Pkg
Pkg.activate(".")  # use the project environment in the current directory
using ArgParse

function parse_commandline()
    s = ArgParseSettings()
    # Declare the accepted options, their types and default values
    @add_arg_table! s begin
        "--start"
            help = "starting value of array"
            arg_type = Int
            default = 1
        "--ending"
            help = "ending value of array"
            arg_type = Int
            default = 10
        "--step"
            help = "step value of array"
            arg_type = Int
            default = 1
    end
    return parse_args(s)
end

function main()
    parsed_args = parse_commandline()
    start = parsed_args["start"]
    step = parsed_args["step"]
    ending = parsed_args["ending"]
    for i in start:step:ending
        println(i)
    end
end

main()
Now things become much more convenient, since I only need this one single script for all my printing tasks.
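The parsing step can also be checked without a shell. The sketch below assumes ArgParse.jl is installed in the active environment and relies on parse_args accepting an explicit vector of strings in place of the real command line. The helper name build_settings is hypothetical and not part of print_numbers.jl.

```julia
using ArgParse  # assumes ArgParse.jl is installed in the active environment

# Same three options as in print_numbers.jl; build_settings is a hypothetical helper name.
function build_settings()
    s = ArgParseSettings()
    @add_arg_table! s begin
        "--start"
            help = "starting value of array"
            arg_type = Int
            default = 1
        "--ending"
            help = "ending value of array"
            arg_type = Int
            default = 10
        "--step"
            help = "step value of array"
            arg_type = Int
            default = 1
    end
    return s
end

# An explicit vector of strings stands in for the real command line.
parsed = parse_args(["--start", "2", "--step", "2"], build_settings())

println(parsed["start"])   # value supplied in the vector above
println(parsed["ending"])  # not supplied, so the default is used
println(collect(parsed["start"]:parsed["step"]:parsed["ending"]))
```

Options that are not supplied fall back to their defaults, and the result is a dictionary keyed by the option names without the leading dashes.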
If I want to print 1, 2, ..., 9:
$julia print_numbers.jl --start 1 --ending 9 --step 1
If I want to print 2, 4, 6, 8, 10:
$julia print_numbers.jl --start 2 --ending 10 --step 2
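One detail worth keeping in mind when choosing --step: a Julia range start:step:ending contains the ending value only when the step lands on it exactly. A small sketch using plain ranges, with no ArgParse involved (the variable names are illustrative only):

```julia
# The ending value is included only if the step lands on it exactly.
evens = collect(2:2:10)    # the step reaches 10
odds = collect(1:2:10)     # the step overshoots 10, so the range stops earlier
same = collect(5:1:15) == collect(5:15)  # a step of 1 can be left out

println(evens)
println(odds)
println(same)
```

So running the script with --start 1 --ending 10 --step 2 would not print 10.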
Since I have set some default arguments, the following will print out 1, 2, ..., 10: