Macro in `StatsModels.jl` for Model Matrices

A noticeable shortcoming of the `LRMoE.jl` package, as of version 0.3.2, is the lack of support for a `formula` interface.

For example, when fitting a (G)LM in R, one can simply call the following without worrying about the data types of the columns. This is particularly handy when `x2` is categorical (e.g. with levels `A`, `B` and `C`).

`lm_r = lm(y ~ x1 + x2, data = df)`

The same is true of `GLM.jl` in Julia (see here).

`lm_julia = GLM.lm(@formula(y ~ x1 + x2), df)`

Whenever I wanted to fit an LRMoE on the same `df`, I used to hand-write a feature engineering function to convert the categorical variable `x2` into dummy variables (assuming `A` as the reference level for `x2`).

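```
using DataFrames

function feature_engineering(df)
    # start from an intercept column of ones
    features = DataFrame(intercept = fill(1, nrow(df)))
    features.x1 = df.x1
    # dummy variables for x2, with "A" as the reference level
    features.x2_B = Int.(df.x2 .== "B")
    features.x2_C = Int.(df.x2 .== "C")
    return features
end
```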

This is tedious and error-prone. Fortunately, the `StatsModels.jl` package provides a `@formula` macro that can be used to specify the model matrix. It works well in combination with the `CategoricalArrays.jl` and `DataFrames.jl` packages.

```
using DataFrames, CategoricalArrays

# If x2 is already stored as a CategoricalArray, a plain copy is enough
# df_copy = copy(df)
# Otherwise, build a new DataFrame so that the original df is left untouched
df_copy = df[:, [:x1]]
# levels are listed explicitly, so "A" (the first level) is the reference
df_copy.x2 = CategoricalArray(df.x2; levels=["A", "B", "C"], ordered=true)
df_copy.y = df.y
```
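The order of the levels matters here: with the default dummy coding in `StatsModels.jl`, the first level is the one treated as the reference. The snippet below is a minimal sketch on made-up data (the vector `raw` and the variable names are illustrative, not from the code above) showing how to inspect and reorder levels with `levels` and `levels!` from `CategoricalArrays.jl`.

```julia
using CategoricalArrays

raw = ["B", "A", "C", "A"]  # toy data, for illustration only

# Levels listed explicitly: "A" comes first
x2_cat = CategoricalArray(raw; levels=["A", "B", "C"])
first_level_before = first(levels(x2_cat))

# Reorder the levels in place so that "C" comes first instead
levels!(x2_cat, ["C", "A", "B"])
first_level_after = first(levels(x2_cat))
```

Reordering only changes the bookkeeping of the levels; the stored values themselves stay the same.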

Next, the `@formula` macro can be called to generate the model matrix (see here for more details).

```
using StatsModels

# set up a formula; the intercept is spelled out explicitly, since no
# model type is passed to apply_schema to add one implicitly
fml = @formula(y ~ 1 + x1 + x2)
df_fml_schema = StatsModels.apply_schema(
    fml,
    StatsModels.schema(fml, df_copy)
)
# get y and X
y, X = StatsModels.modelcols(df_fml_schema, df_copy)
# convert y to a matrix, which is needed for LRMoE
y = reshape(y, length(y), 1)
# keep track of the column names
y_col, X_col = StatsModels.coefnames(df_fml_schema)
```

Now, the `y` and `X` matrices can be directly used to fit an LRMoE model.
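To make the result concrete, here is a small self-contained sketch of the same workflow on a made-up four-row data set. The data and the `_toy` names are purely illustrative, and the intercept is written explicitly in the formula on the assumption that an intercept column is wanted in `X`.

```julia
using DataFrames, CategoricalArrays, StatsModels

# Toy data for illustration only (not from the original post)
df_toy = DataFrame(
    y  = [1.0, 2.0, 3.0, 4.0],
    x1 = [0.5, 1.5, 2.5, 3.5],
    x2 = CategoricalArray(["A", "B", "C", "B"]; levels=["A", "B", "C"]),
)

# Intercept written explicitly (assumption: an intercept column is wanted)
fml_toy = @formula(y ~ 1 + x1 + x2)
fml_toy = apply_schema(fml_toy, schema(fml_toy, df_toy))

y_toy, X_toy = modelcols(fml_toy, df_toy)
y_name, X_names = coefnames(fml_toy)

# One row per observation; columns follow the order given in X_names
@show size(X_toy)
@show X_names
```

Comparing `X_names` against the columns of `X_toy` is a quick sanity check that `A` was indeed dropped as the reference level and that the dummy columns for `B` and `C` line up with the data.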

