Your standard algorithm for tabular data?
September 10, 2024
This lecture is based on the following open access materials:
Source code: https://github.com/anthology-of-data-science/lecture-gam-ebm
Daniel Kapitan, Generalized Additive Models and Explainable Boosting Machines.
This work is licensed under CC BY-SA 4.0
\[ y_{i} = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i} \]
\[\begin{align} p(x)& = \frac{e^{y_{i}}}{1 + e^{y_{i}}}\\ \log \left( {\frac{p(x)}{1 - p(x)}} \right) & = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i} \end{align}\]
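The two lines of the logistic model are inverses of each other: the logistic function maps the linear predictor to a probability, and the log-odds (logit) maps it back. A minimal sketch of that round trip follows; the coefficients `beta0` and `beta1` are hypothetical values chosen only for illustration.

```python
import math


def logistic(eta):
    """Map a linear predictor eta to a probability; equals e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Log-odds of a probability p; the inverse of logistic()."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, not taken from the lecture.
beta0, beta1 = -3.0, 0.05

for x in (20, 60, 100):
    eta = beta0 + beta1 * x          # linear predictor
    p = logistic(eta)                # probability
    print(f"x={x:3d}  eta={eta:+.2f}  p={p:.3f}  logit(p)={logit(p):+.2f}")
```

The last column reproduces `eta`, which is why the model is linear on the log-odds scale even though it is non-linear in the probability.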
\[
y_{i} = \beta_{0} + \beta_{1} x_{i} + \beta_{2} x_{i}^{2} + \beta_{3} x_{i}^{3} + \ldots +\epsilon_{i}
\]
\[\begin{align} p(x)& = \frac{e^{y_{i}}}{1 + e^{y_{i}}}\\ \log\left({\frac{p(x)}{1 - p(x)}}\right)& = \beta_{0} + \beta_{1} x_{i} + \beta_{2} x_{i}^{2} + \beta_{3} x_{i}^{3} + \ldots +\epsilon_{i} \end{align}\]
Wage data
Piecewise cubic: 2 x 4 coefficients –> 8 degrees of freedom (DoF)
Continuous cubic (no gaps): one extra constraint –> 7 DoF
Cubic spline: require 1st and 2nd derivative to be continuous –> two extra constraints –> 5 DoF
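The degrees-of-freedom counts above follow from simple bookkeeping: coefficients per piece times number of pieces, minus one for every continuity constraint imposed at each knot. A small sketch of that arithmetic follows; the function name and its arguments are illustrative, not part of the lecture code.

```python
def piecewise_dof(n_knots, degree=3, constraints_per_knot=0):
    """Degrees of freedom of a piecewise polynomial.

    constraints_per_knot: 0 = unconstrained pieces,
                          1 = continuous function,
                          3 = continuous function, 1st and 2nd derivative.
    """
    n_pieces = n_knots + 1
    coefficients = n_pieces * (degree + 1)
    return coefficients - n_knots * constraints_per_knot


print(piecewise_dof(1, constraints_per_knot=0))  # piecewise cubic  -> 8
print(piecewise_dof(1, constraints_per_knot=1))  # continuous cubic -> 7
print(piecewise_dof(1, constraints_per_knot=3))  # cubic spline     -> 5
```

With K knots, a cubic spline therefore has 4(K + 1) - 3K = K + 4 degrees of freedom.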
\[ {\color{green}\sum^{n}_{i=1} {\left(y_{i} - g(x_{i}) \right)}^{2}} + {\color{#ff4f5e} \lambda \int g''(t)^{2}dt} \]
Same principle as Lasso and Ridge regression: \({\color{green} loss} + {\color{#ff4f5e} penalty}\)
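To make the loss/penalty trade-off concrete, the sketch below evaluates the objective for two candidate fits on a tiny, made-up data set. It assumes an evenly spaced grid and approximates \(g''\) with central second differences, so it illustrates the objective only and is not a smoothing-spline solver.

```python
def penalized_loss(y, g, lam, h=1.0):
    """Squared-error loss plus lam times a discretised roughness penalty.

    y: observed values, g: candidate fitted values on the same evenly
    spaced grid with spacing h.
    """
    loss = sum((yi - gi) ** 2 for yi, gi in zip(y, g))
    # Central second differences approximate g''(t) at the interior points.
    second = [(g[k - 1] - 2 * g[k] + g[k + 1]) / h ** 2
              for k in range(1, len(g) - 1)]
    penalty = sum(d ** 2 for d in second) * h   # Riemann sum of g''(t)^2
    return loss + lam * penalty


y = [1.0, 3.0, 2.0, 4.0, 3.0]          # made-up observations
wiggly = list(y)                        # interpolates every point
straight = [1.6, 2.1, 2.6, 3.1, 3.6]    # least-squares line through y

for lam in (0.0, 0.05, 1.0):
    print(lam, penalized_loss(y, wiggly, lam), penalized_loss(y, straight, lam))
```

With \(\lambda = 0\) the interpolating fit wins because only the loss counts; as \(\lambda\) grows, the straight line wins because its second derivative is zero.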
\[ y_{i} = \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \ldots + \beta_{p} x_{ip} + \epsilon_{i} \]
\[\begin{align} y_{i} &= \beta_{0} + f_{1}(x_{i1}) + f_{2}(x_{i2}) + \ldots + f_{p}(x_{ip}) + \epsilon_{i} \\ y_{i} &= \beta_{0} + \sum^{p}_{j=1} f_{j}(x_{ij}) + \epsilon_{i} \end{align}\]
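One common way to fit such an additive model is backfitting: cycle over the features and re-estimate each \(f_j\) on the partial residuals left by the others. The sketch below is a minimal illustration under loud assumptions: it uses a crude bin-average smoother instead of splines, and synthetic data, so it shows the mechanics and not the fitting procedure used for the lecture's figures.

```python
import math
import random


def bin_smoother(x, r, n_bins=10):
    """Crude smoother (assumption): average r within equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0
    idx = [min(int((xi - lo) / width), n_bins - 1) for xi in x]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for k, ri in zip(idx, r):
        sums[k] += ri
        counts[k] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [means[k] for k in idx]


def backfit(X, y, n_iter=20):
    """X is a list of feature columns. Returns beta0 and fitted f_j values."""
    n, p = len(y), len(X)
    beta0 = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: what the intercept and the other f_k leave over.
            r = [y[i] - beta0 - sum(f[k][i] for k in range(p) if k != j)
                 for i in range(n)]
            fj = bin_smoother(X[j], r)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]   # centre each f_j around zero
    return beta0, f


# Synthetic data: two non-linear effects that add up.
random.seed(0)
n = 500
x1 = [random.uniform(-2, 2) for _ in range(n)]
x2 = [random.uniform(-2, 2) for _ in range(n)]
y = [math.sin(2 * a) + b ** 2 + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

beta0, f = backfit([x1, x2], y)
fitted = [beta0 + f[0][i] + f[1][i] for i in range(n)]
mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
var = sum((yi - beta0) ** 2 for yi in y) / n
print(f"MSE {mse:.3f} vs. variance of y {var:.3f}")
```

Plotting `f[0]` against `x1` and `f[1]` against `x2` would give one shape plot per feature, which is the per-feature view that makes additive models easy to inspect.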
Wage data
\(wage = \beta_0 + f_1(year) + f_2(age) + f_3(education)\)
\(f_1\): four degrees of freedom, \(f_2\): five degrees of freedom
Wage data
\(\log\left({{p(x)}/{1 - p(x)}}\right) = \beta_0 + \beta_1 \times year + f_2(age) + f_3(education)\)
\(f_2\): five degrees of freedom
\(\color{green}\bigtriangleup\) You can fit a non-linear \(f_j\) to each \(X_j\), so non-linear relationships are modeled automatically (no need for manual transformations)
\(\color{green}\bigtriangleup\) Using non-linear functions potentially results in more accurate predictions
\(\color{green}\bigtriangleup\) Because the model is additive, you can examine the effect of each feature \(X_j\) on the response \(Y\) individually
\(\color{green}\bigtriangleup\) Smoothness of functions can be summarized via degrees of freedom
\(\color{orange}\bigtriangledown\) The additive form may be too restrictive: interactions between features are not included
\(\color{orange}\bigtriangledown\) Can be computationally expensive for many features
Source: Python Geeks
\[ g(E[y]) = \beta_0 + {\color{#00458b} \sum f_i(x_{i})} + {\color{#6e008b} \sum f_{ij}(x_{i}, x_{j})} \]
\(g(E[y]):\)
link function: identity for regression, logit for logistic regression
\({\color{#00458b} \sum f_i(x_{i})}:\)
GAM, but now using shallow trees as basis functions
\({\color{#6e008b} \sum f_{ij}(x_{i}, x_{j})}:\)
pairwise interactions
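The idea of building each \(f_i\) from shallow trees can be sketched as cyclic boosting: visit the features round-robin, fit a depth-1 tree (a stump) on one feature to the current residuals, and add it with a small learning rate to that feature's shape function. The code below is a simplified illustration for the identity link only. It omits binning, bagging and the pairwise terms, and every function name in it is hypothetical and not the API of any EBM library.

```python
import math
import random


def fit_stump(x, r):
    """Depth-1 tree on one feature: best cut and mean residual on each side."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    n, total = len(r), sum(r)
    best_gain, best = 0.0, None
    left_sum = 0.0
    for pos in range(1, n):
        left_sum += r[order[pos - 1]]
        lo, hi = x[order[pos - 1]], x[order[pos]]
        if lo == hi:
            continue  # cannot cut between identical values
        left_mean = left_sum / pos
        right_mean = (total - left_sum) / (n - pos)
        # Larger gain means smaller squared error after the split.
        gain = pos * left_mean ** 2 + (n - pos) * right_mean ** 2
        if gain > best_gain:
            best_gain, best = gain, ((lo + hi) / 2, left_mean, right_mean)
    return best


def fit_cyclic(X, y, n_rounds=200, lr=0.1):
    """X is a list of feature columns; returns intercept, stumps per feature, fit."""
    n = len(y)
    beta0 = sum(y) / n
    pred = [beta0] * n
    stumps = [[] for _ in X]
    for _ in range(n_rounds):
        for j, xj in enumerate(X):          # round-robin: one feature at a time
            residual = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(xj, residual)
            if stump is None:
                continue
            cut, left, right = stump
            stumps[j].append((cut, lr * left, lr * right))
            pred = [pi + lr * (left if v <= cut else right)
                    for pi, v in zip(pred, xj)]
    return beta0, stumps, pred


def shape(stumps_j, v):
    """Evaluate the learned shape function f_j at value v."""
    return sum(left if v <= cut else right for cut, left, right in stumps_j)


# Synthetic data with two additive, non-linear effects.
random.seed(0)
n = 300
x1 = [random.uniform(-2, 2) for _ in range(n)]
x2 = [random.uniform(-2, 2) for _ in range(n)]
y = [math.sin(2 * a) + b ** 2 + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

beta0, stumps, pred = fit_cyclic([x1, x2], y)
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / n
var = sum((yi - beta0) ** 2 for yi in y) / n
print(f"MSE {mse:.3f} vs. variance of y {var:.3f}")
```

Because every stump touches a single feature, the sum of stumps for feature \(i\) is a piecewise-constant function of \(x_i\) alone, so the model stays additive and each `shape(stumps[j], v)` can be plotted on its own.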
Searching cuts on the input space of \(x_i\) and \(x_j\). Left: a heat map of the target for different values of \(x_i\) and \(x_j\); \(c_i\) and \(c_j\) are the cuts for \(x_i\) and \(x_j\), respectively. Right: an extremely simple predictor for modeling the pairwise interaction.
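The cut search in the caption can be illustrated with a brute-force sketch: for every candidate pair \((c_i, c_j)\), predict the target by its mean in each of the four resulting quadrants and keep the pair with the smallest residual sum of squares. The candidate grid, the synthetic data and the function names are assumptions made for illustration; a practical implementation would use a much more efficient search than this double loop.

```python
import random


def quadrant_rss(xi, xj, r, ci, cj):
    """RSS when each of the four quadrants around (ci, cj) is predicted by its mean."""
    sums, squares, counts = [0.0] * 4, [0.0] * 4, [0] * 4
    for a, b, v in zip(xi, xj, r):
        q = 2 * (a > ci) + (b > cj)      # quadrant index 0..3
        sums[q] += v
        squares[q] += v * v
        counts[q] += 1
    # Per quadrant: sum(v^2) - (sum v)^2 / n is the RSS around the mean.
    return sum(s2 - s * s / c for s, s2, c in zip(sums, squares, counts) if c)


def best_cuts(xi, xj, r, candidates_i, candidates_j):
    """Exhaustively search the candidate cuts; returns (ci, cj) with lowest RSS."""
    scored = ((quadrant_rss(xi, xj, r, ci, cj), ci, cj)
              for ci in candidates_i for cj in candidates_j)
    return min(scored)[1:]


# Synthetic target with a pure interaction: high only when both features are large.
random.seed(1)
n = 400
xi = [random.uniform(-1, 1) for _ in range(n)]
xj = [random.uniform(-1, 1) for _ in range(n)]
r = [(1.0 if a > 0.3 and b > -0.5 else 0.0) + random.gauss(0, 0.05)
     for a, b in zip(xi, xj)]

grid = [k / 10 for k in range(-9, 10)]   # candidate cuts -0.9, -0.8, ..., 0.9
print(best_cuts(xi, xj, r, grid, grid))
```

The lower the best RSS for a feature pair, the stronger the evidence that the pair carries an interaction worth adding as an \(f_{ij}\) term.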
Test set AUCs (%) across ten datasets, averaged over five runs. The best number in each row is in bold.
The dataset contains 14,199 cases of pneumonia collected from 78 hospitals between July 1987 and December 1988.
EBM shape function of “heart rate” for predicting pneumonia mortality risk. Left: missing values result in an unrealistically high risk score. Right: corrected risk score.
Left: retirement at age 67 acts as a confounder, resulting in a sharp increase in risk; the social effect of doctors trying harder to cure centenarians results in lower risk. Right: patients with a history of asthma have a lower pneumonia mortality risk than the general population, since they are admitted directly to the ICU and get more aggressive care, thereby lowering their risk of death.
Left: patients get treated when blood urea nitrogen (BUN) reaches ~50. When BUN goes over 100, dialysis is given. Right: patients in the ICU get treated at systolic blood pressures (SBP) of 175, 200 and 255.
Left: possible improvement by moving dialysis treatment to a BUN of 80. Right: adjusting “inappropriate” treatment thresholds, shown as flattened red lines.