`@formula` Macro in `StatsModels.jl` for Model Matrices

A noticeable shortcoming of the LRMoE.jl package, as of version 0.3.2, is the lack of support for a `formula` interface.

For example, when fitting a (G)LM in R, one can simply call the following without worrying about the data types of the columns. This is particularly handy when `x2` is categorical (e.g. with levels `A`, `B` and `C`).

`lm_r = lm(y ~ x1 + x2, data = df)`

The same holds for `GLM.jl` in Julia (see here).

`lm_julia = GLM.lm(@formula(y ~ x1 + x2), df)`

Whenever I wanted to fit an LRMoE on the same `df`, I used to manually code up a feature engineering function to convert the categorical variable `x2` into dummy variables (assuming `A` as the reference level for `x2`).

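```
using DataFrames

function feature_engineering(df)
    # start from an intercept column of ones
    features = DataFrame(intercept = fill(1, nrow(df)))
    features.x1 = df.x1
    # dummy-code x2, with "A" as the reference level
    features.x2_B = Int.(df.x2 .== "B")
    features.x2_C = Int.(df.x2 .== "C")
    return features
end
```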

This can be quite tedious and error-prone. Fortunately, the `StatsModels.jl` package provides a `@formula` macro that can be used to specify the model matrix. This can be quite handy when combined with the `CategoricalArrays.jl` and `DataFrames.jl` packages.

```
using DataFrames, CategoricalArrays

# If x2 is already stored as a CategoricalArray, a plain copy is enough
# df_copy = copy(df)
# Otherwise, build a new DataFrame so that the original df is left unchanged
df_copy = df[:, [:x1]]
df_copy.x2 = CategoricalArray(df.x2; levels=["A", "B", "C"], ordered=true)
df_copy.y = df.y
```
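As a quick sanity check on the level ordering, the sketch below uses a small made-up vector (not the `df` from this post) to show how the `levels` argument fixes the order regardless of the order in which values appear. This matters because, to my understanding, the default dummy coding in `StatsModels.jl` treats the first level as the reference level.

```julia
using CategoricalArrays

# Hypothetical toy values, chosen only for illustration
x2 = CategoricalArray(["B", "A", "C", "A"]; levels=["A", "B", "C"], ordered=true)

# The levels follow the order given in `levels`, not the order of appearance
x2_levels = levels(x2)

# Integer position of each observation's level within `x2_levels`
x2_codes = levelcode.(x2)

println(x2_levels)
println(x2_codes)
```

If a different reference level is wanted, reordering the `levels` argument so that it comes first should be enough.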

Next, the `@formula` macro can be called to generate the model matrix (see here for more details).

```
using StatsModels

# set up a formula
fml = @formula(y ~ x1 + x2)
# resolve each term against the column types found in the data
df_fml_schema = StatsModels.apply_schema(
    fml,
    StatsModels.schema(fml, df_copy)
)
# get y and X
y, X = StatsModels.modelcols(df_fml_schema, df_copy)
# convert y to a matrix, which is needed for LRMoE
y = reshape(y, length(y), 1)
# keep track of the column names
y_col, X_col = StatsModels.coefnames(df_fml_schema)
```
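To see what these calls produce, here is a minimal end-to-end sketch on a made-up four-row data frame (the `toy` data and all names derived from it are hypothetical). One caveat worth checking: as far as I can tell, `apply_schema` only adds an implicit intercept when a model type is passed as a third argument, so this sketch writes `1 +` explicitly to get an intercept column comparable to the manual feature engineering.

```julia
using DataFrames, CategoricalArrays, StatsModels

# Hypothetical toy data, not taken from the original post
toy = DataFrame(
    y  = [1.0, 2.0, 3.0, 4.0],
    x1 = [0.5, 1.5, 2.5, 3.5],
    x2 = categorical(["A", "B", "C", "B"]; levels=["A", "B", "C"]),
)

# Explicit `1 +` requests an intercept column (assumption: none is added
# implicitly when apply_schema is called without a model type)
toy_fml = @formula(y ~ 1 + x1 + x2)
toy_applied = apply_schema(toy_fml, schema(toy_fml, toy))

# Response vector and model matrix
toy_y, toy_X = modelcols(toy_applied, toy)

# Names of the response and of the model matrix columns
toy_y_name, toy_X_names = coefnames(toy_applied)

println(size(toy_X))
println(toy_X_names)
```

Comparing `toy_X_names` against the columns of the hand-written `feature_engineering` output is an easy way to confirm that both approaches encode `x2` the same way.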

Now, the `y` and `X` matrices can be directly used to fit an LRMoE model.

© Spark Tseung 2020-2023. Last modified: March 12, 2023.

Website built with Franklin.jl and the Julia programming language,

plus some help from LeXtudio.
