View on GitHub

LRMoE Demo

A quick demo of the LRMoE Package in R

Real Data: Parallel Fitting

Author: Spark Tseung

Last Modified: Sept 15, 2020

Introduction

In this document, we focus on parallelly fitting LRMoE models.

Running LRMoEFit in Parallel

Fitting LRMoE to a large dataset may be quite computationally intensive. For example, it takes around 5 hours to fit a 4-component LRMoE on the French auto insurance dataset.

In such case, parallel computing with the R package doParallel can help significantly shorten the total runtime. The following shows how to structure a code file for running LRMoE in parallel. Note that X and Y are matrix data files saved in here. The lists model.list and function do_fitting will be discussed in separate sections.

library(doParallel)
library(LRMoE)

# Load data
load("X.Rda")
load("Y.Rda")
load("model_list.Rda")

# Specify how many models to fit in parallel
ncore = 5 # Depends on computation resource
n.run = length(model.list)

# Make computing clusters: standard procedure
cl = makePSOCKcluster(ncore)
registerDoParallel(cl)

# Call fitting functions in parallel
ecm.table = foreach(b=1:n.run) %dopar% {do_fitting(Y, X, b, model.list[[b]])}

# Stop computing clusters: standard procedure
stopCluster(cl)

Generating model.list

The model.list contains a list of models to run, where each model is structured as a list of model specification and parameter initialization. For example, the following shows the LRMoE model with three zero-inflated lognormal component distributions, where the initialization values are returned by the CMMInit function.

model.list = NULL

# Initialize model "lll" and append it to the "model.list" to run

n.comp = 3
dim.m = 1
n.covar = 30

model.name = "lll"

comp.dist = matrix( c("ZI-lnorm", "ZI-lnorm", "ZI-lnorm"),
                    nrow = dim.m, byrow = TRUE)

alpha.init = matrix(0, nrow = n.comp, ncol = n.covar)
alpha.init[,1] = c(log(0.35), log(0.29), log(0.36)) - log(0.36)

zero.init = matrix( c(0.96, 0.96, 0.96),
                    nrow = dim.m, byrow = TRUE)

params.init = list( list( c(6.34, 1.41), c(6.43, 1.43), c(6.53, 1.38) )
                  )

hyper.alpha = 5

hyper.params = list(
                    list( c(5, 5, 5, 5, 5, 200),
                          c(5, 5, 5, 5, 5, 200),
                          c(5, 5, 5, 5, 5, 200) )
                    )

model.list[[length(model.list)+1]] = list(model.name = model.name,
      n.comp = n.comp, comp.dist = comp.dist,
      alpha.init = alpha.init, 
      zero.init = zero.init, params.init = params.init,
      hyper.alpha = hyper.alpha, hyper.params = hyper.params)

Structuring do_fitting

The do_fitting below will call the LRMoEFit function for fitting an LRMoE model. The fitted model will be saved as a .Rda file, and the intermediate update of parameter values are output into a .txt file.

It is optional to use the mailR package to get email updates of the running status. The mail code may be inserted in the do_fitting function.

do_fitting = function(Y, X, b, model)
{
  # Output file names: may be modified
  model.name = toString(paste(model$model.name, sep=""))
  rda.name = toString(paste(model.name, ".Rda", sep=""))
  output.name = toString(paste(model.name, ".txt", sep=""))

  # Open new file to save intermediate update of parameter values
  sink(file = output.name, append = FALSE, type = c("output", "message"), split = FALSE)

  # Call fitting function
  tryCatch({model.fit = LRMoE::LRMoEFit(Y = Y, X = X, n.comp = model$n.comp,
              comp.dist = model$comp.dist,
              alpha.init = model$alpha.init, 
              zero.init = model$zero.init, params.init = model$params.init,
              penalty = TRUE,
              hyper.alpha = model$hyper.alpha, hyper.params = model$hyper.params,
              eps = 0.05, ecm.iter.max = 500,
              print = TRUE)
             save(model.fit, file = rda.name)
            },
           error=function(e){"Error!"; print("Error!")}
           )
  
  # Save intermediate update of parameter values
  sink()
  
  # Optional: use mailR to get running status. Code is omitted.

}