Your standard algorithm for tabular data?
September 10, 2024
This lecture is based on the following open access materials:
Source code: https://github.com/anthology-of-data-science/lecture-gam-ebm
Daniel Kapitan, Generalized Additive Models and Explainable Boosting Machines.
This work is licensed under CC BY-SA 4.0
\[ y_{i} = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i} \]
\[\begin{align} p(x_{i})& = \frac{e^{y_{i}}}{1 + e^{y_{i}}}\\ \log \left( {\frac{p(x_{i})}{1 - p(x_{i})}} \right) & = \beta_{0} + \beta_{1} x_{i} \end{align}\]
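To make both models concrete, a minimal scikit-learn sketch on simulated data (the variable names and simulated coefficients are illustrative, not from the lecture):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))

# Linear regression: y = beta_0 + beta_1 * x + noise
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
lin = LinearRegression().fit(X, y)
print(lin.intercept_, lin.coef_)  # estimates of beta_0, beta_1

# Logistic regression: the same linear predictor, fit on the log-odds scale
p = 1 / (1 + np.exp(-(1.0 + 2.0 * X[:, 0])))
z = rng.binomial(1, p)
logit = LogisticRegression().fit(X, z)
print(logit.intercept_, logit.coef_)
```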
\[
y_{i} = \beta_{0} + \beta_{1} x_{i} + \beta_{2} x_{i}^{2} + \beta_{3} x_{i}^{3} + \ldots +\epsilon_{i}
\]
\[\begin{align} p(x_{i})& = \frac{e^{y_{i}}}{1 + e^{y_{i}}}\\ \log\left({\frac{p(x_{i})}{1 - p(x_{i})}}\right)& = \beta_{0} + \beta_{1} x_{i} + \beta_{2} x_{i}^{2} + \beta_{3} x_{i}^{3} + \ldots \end{align}\]
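Polynomial (logistic) regression is the same model fit on an expanded basis. A sketch using scikit-learn's `PolynomialFeatures`, again on simulated data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
z = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] ** 3 - X[:, 0]))))

# degree=3 expands x into [x, x^2, x^3]; the logistic model is then
# linear in these basis functions, exactly as in the equation above.
poly_logit = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LogisticRegression(),
).fit(X, z)
```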
Wage data
Piecewise cubic: 2 × 4 coefficients → 8 degrees of freedom (DoF)
Continuous cubic (no gaps): one extra constraint → 7 DoF
Cubic spline: require 1st and 2nd derivatives to be continuous → two extra constraints → 5 DoF
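To see those degrees of freedom in code: patsy's `bs()` builds a cubic spline basis with a chosen `df`, which ordinary least squares can then fit. A sketch on simulated data (the `age`/`wage` names are illustrative):

```python
import numpy as np
from patsy import dmatrix
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
age = rng.uniform(18, 80, size=300)
wage = 50 + 30 * np.sin(age / 15) + rng.normal(scale=5, size=300)

# Cubic spline basis with df=5, matching the 5 DoF above; patsy places
# the interior knots automatically at quantiles of `age`.
basis = dmatrix("bs(age, df=5, degree=3) - 1", {"age": age},
                return_type="dataframe")
fit = LinearRegression().fit(basis, wage)
```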
\[ {\color{green}\sum^{n}_{i=1} {\left(y_{i} - g(x_{i}) \right)}^{2}} + {\color{#ff4f5e} \lambda \int g''(t)^{2}dt} \]
Same principle as Lasso and Ridge regression: \({\color{green} loss} + {\color{#ff4f5e} penalty}\)
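SciPy (≥ 1.10) minimizes exactly this penalized objective; `make_smoothing_spline` takes the penalty weight `lam` (the \(\lambda\) above) directly. A small sketch on simulated data:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=200)

# lam -> 0 interpolates the data; large lam shrinks g towards a
# straight line (zero second derivative everywhere).
g = make_smoothing_spline(x, y, lam=1.0)
y_smooth = g(x)
```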
\[ y_{i} = \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \ldots + \beta_{p} x_{ip} + \epsilon_{i} \]
\[\begin{align} y_{i} &= \beta_{0} + f_{1}(x_{i1}) + f_{2}(x_{i2}) + \ldots + f_{p}(x_{ip}) + \epsilon_{i} \\ y_{i} &= \beta_{0} + \sum^{p}_{j=1} f_{j}(x_{ij}) + \epsilon_{i} \end{align}\]
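In Python, pygam writes this model down almost literally: one smooth term per feature, summed. A minimal sketch on simulated data (assuming pygam is installed):

```python
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = (np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + X[:, 2]
     + rng.normal(scale=0.1, size=300))

# One smooth f_j per feature; the GAM is the sum of these plus an intercept.
gam = LinearGAM(s(0) + s(1) + s(2)).fit(X, y)
```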
Wage data
\(wage = \beta_0 + f_1(year) + f_2(age) + f_3(education)\)
\(f_1\): four degrees of freedom, \(f_2\): five degrees of freedom
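A pygam sketch of this specification; the Wage data itself is not bundled here, so `X` is assumed to hold year, age and an integer-coded education column, and `n_splines` is only a rough stand-in for the quoted degrees of freedom (pygam controls effective DoF via its `lam` penalty):

```python
from pygam import LinearGAM, s, f

# s(0): smooth of year, s(1): smooth of age,
# f(2): a step function per education level.
gam = LinearGAM(s(0, n_splines=4) + s(1, n_splines=5) + f(2))
# gam.fit(X, y)  # X columns: [year, age, education], y: wage
```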
Wage data
\(\log\left({{p(x)}/{1 - p(x)}}\right) = \beta_0 + \beta_1 \times year + f_2(age) + f_3(education)\)
\(f_2\): five degrees of freedom
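The logistic version swaps the link function and keeps year linear; in pygam that is a `LogisticGAM` with a linear term `l(0)`. Same assumed column layout as in the sketch above:

```python
from pygam import LogisticGAM, l, s, f

# logit link; year enters linearly, age as a smooth,
# education as a factor term.
gam = LogisticGAM(l(0) + s(1, n_splines=5) + f(2))
# gam.fit(X, high_earner)  # high_earner: a binary target derived from wage
```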
\(\color{green}\bigtriangleup\) You can fit a non-linear \(f_j\) to each \(X_j\), so non-linear relationships are modelled automatically (no need for manual transformation of each variable)
\(\color{green}\bigtriangleup\) Using non-linear functions potentially results in more accurate predictions
\(\color{green}\bigtriangleup\) Because the model is additive, you can examine the effect of each feature \(X_j\) on the response \(Y\) individually
\(\color{green}\bigtriangleup\) Smoothness of functions can be summarized via degrees of freedom
\(\color{orange}\bigtriangledown\) The additive model may be too restrictive: interactions between features are not included unless you add them manually
\(\color{orange}\bigtriangledown\) Can be computationally expensive for many features
\[ g(E[y]) = \beta_0 + {\color{#00458b} \sum f_i(x_{i})} + {\color{#6e008b} \sum f_{ij}(x_{i}, x_{j})} \]
\(g(E[y]):\)
link function, identity for regression, logit for logistic regression
\({\color{#00458b} \sum f_i(x_{i})}:\)
GAM, but now using shallow trees as basis functions
\({\color{#6e008b} \sum f_{ij}(x_{i}, x_{j})}:\)
pairwise interactions
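The interpret package implements this model as a glassbox estimator; its `interactions` parameter controls how many pairwise terms \(f_{ij}\) are added. A minimal sketch on synthetic data:

```python
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# Each shape function f_i is learned with shallow boosted trees on one
# feature; interactions=10 allows up to ten pairwise terms f_ij(x_i, x_j).
ebm = ExplainableBoostingClassifier(interactions=10).fit(X, y)
ebm.explain_global()  # per-feature shape functions and interaction heatmaps
```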