Spark's Work

Julia Performance Tip: Avoid Any Type in a Customized Struct

Recently, I was optimizing some Julia code of mine that consumed a lot of memory and ran quite slowly. Following the Performance Tips in the official manual, in particular the tips on type declarations and type stability, I was able to pinpoint one of the memory bottlenecks: the use of the Any type in a custom struct.

My original code essentially boils down to the following.

struct MyBadType
    a::Array{Any}
end

When I create a new instance of MyBadType from a reasonably large vector (which the constructor converts to an Array{Any}), the memory consumption looks like this.

my_bad_instance = MyBadType(collect(1:100_000))
# MyBadType(Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  99991, 99992, 99993, 99994, 99995, 99996, 99997, 99998, 99999, 100000])

varinfo(r"my_bad_instance")

#   name                 size summary  
#   ––––––––––––––– ––––––––– –––––––––
#   my_bad_instance 1.526 MiB MyBadType
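
To see where the extra memory comes from, the rough sketch below compares the same values stored with a concrete and with an abstract element type. It is an illustration, not part of my original code: the names ints and boxed are made up for this example, and Base.summarysize is used because it reports the same kind of size estimate that varinfo shows. The exact byte counts depend on the Julia version and platform, so no numbers are listed.

```julia
# Same 100_000 values, stored with a concrete and an abstract element type.
ints  = collect(1:100_000)   # Vector{Int64}: values stored inline in the array
boxed = Vector{Any}(ints)    # Vector{Any}: one pointer per element, values boxed separately

# isbitstype reports whether values of a type are plain, inline-storable bits.
println(isbitstype(eltype(ints)))    # true
println(isbitstype(eltype(boxed)))   # false

# Estimated size in bytes of everything reachable from each array.
println(Base.summarysize(ints))
println(Base.summarysize(boxed))     # larger, since the boxed values are counted too
```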

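One way to reduce the memory consumption is to declare exactly what element type the vector should have, which in this particular case cuts the memory roughly in half (781.297 KiB versus 1.526 MiB). The reason is that an Array{Any} stores a pointer to a separately allocated value for each element, whereas an Array{Float64} stores the 8-byte values directly.

Note that there is an implicit type conversion here: the default constructor converts collect(1:100_000) from Vector{Int64} to Vector{Float64} when it is passed to the struct.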

struct MyGoodType
    a::Array{Float64}
end
my_good_instance = MyGoodType(collect(1:100_000))
# MyGoodType([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0 … 99991.0, 99992.0, 99993.0, 99994.0, 99995.0, 99996.0, 99997.0, 99998.0, 99999.0, 100000.0])

varinfo(r"my_good_instance")

#   name                    size summary   
#   –––––––––––––––– ––––––––––– ––––––––––
#   my_good_instance 781.297 KiB MyGoodType
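
The implicit conversion mentioned above can be checked directly. The following is a minimal sketch; FloatHolder is a hypothetical stand-in for MyGoodType, renamed so the snippet can be pasted into any session. It also shows that an input that already has the declared type is stored as-is, without a copy.

```julia
# Hypothetical stand-in for MyGoodType (same field declaration).
struct FloatHolder
    a::Array{Float64}
end

v = collect(1:5)        # Vector{Int64}
h = FloatHolder(v)      # the default constructor calls convert on the argument

println(eltype(v))      # Int64
println(eltype(h.a))    # Float64
println(h.a === v)      # false: the conversion allocated a new array

w = rand(5)                       # already a Vector{Float64}
println(FloatHolder(w).a === w)   # true: nothing to convert, so no copy is made
```

The second case matters for memory: when the input needs converting, the original vector and the converted copy both exist until the original is garbage collected.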

An even better approach is to make the struct parametric, so that the element type becomes a type parameter. Julia then infers the parameter from the vector passed in when a new instance is created, and the user can also specify it explicitly.

struct MyBetterType{T}
    a::Array{T}
end
my_better_instance = MyBetterType(collect(1:100_000))
# MyBetterType{Int64}([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 99991, 99992, 99993, 99994, 99995, 99996, 99997, 99998, 99999, 100000])

varinfo(r"my_better_instance")
#   name                      size summary            
#   –––––––––––––––––– ––––––––––– –––––––––––––––––––
#   my_better_instance 781.297 KiB MyBetterType{Int64}
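
As a small sketch of the two ways to construct the parametric struct, the snippet below repeats the MyBetterType definition so that it runs on its own. The variable names inferred and explicit are made up for this example, and the printed Int64 assumes a 64-bit system.

```julia
# Same definition as MyBetterType above, repeated so the snippet is self-contained.
struct MyBetterType{T}
    a::Array{T}
end

inferred = MyBetterType(collect(1:5))            # T is inferred from the argument
explicit = MyBetterType{Float64}(collect(1:5))   # T is given by the caller; values are converted

println(typeof(inferred))     # MyBetterType{Int64}
println(typeof(explicit))     # MyBetterType{Float64}
println(eltype(explicit.a))   # Float64
```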

Notice that I can now pass vectors of different numeric types (Int64, Float64, etc.) without changing the struct definition.

my_better_instance_float = MyBetterType(rand(100_000))
# MyBetterType{Float64}([0.3799954925351946, 0.09394504229817657, 0.9777517079162116, 0.016242447464765775, 0.990499183701726, 0.4424052738990033, 0.7675470847284869, 0.13617624001850415, 0.2265064097636631, 0.8815719623179058 … 0.19912083332829478, 0.23186628406715581, 0.23339379488864898, 0.3912429397894809, 0.39492205317362883, 0.5167797738208597, 0.389639621406601, 0.8499027356361832, 0.36235758448613853, 0.8784435419352293])

varinfo(r"my_better_instance_float")
#   name                            size summary              
#   –––––––––––––––––––––––– ––––––––––– –––––––––––––––––––––
#   my_better_instance_float 781.297 KiB MyBetterType{Float64}
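
One caveat, offered as a side note rather than something the measurements above depend on: Array{T} without a dimension is still not a concrete type, because the number of dimensions is left open. If the field is always one-dimensional, declaring it as Vector{T} (an alias for Array{T,1}) makes the field type fully concrete. The sketch below checks this with isconcretetype; the struct name MyVectorType is hypothetical.

```julia
println(isconcretetype(Array{Any}))        # false
println(isconcretetype(Array{Float64}))    # false: the dimension is unspecified
println(isconcretetype(Vector{Float64}))   # true: Vector{T} is Array{T,1}

# Hypothetical variant of MyBetterType with the dimension fixed to 1.
struct MyVectorType{T}
    a::Vector{T}
end

x = MyVectorType(rand(100_000))
println(typeof(x))                                  # MyVectorType{Float64}
println(isconcretetype(fieldtype(typeof(x), :a)))   # true
```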