From foundational models to GenAI systems
June 20, 2025
\[ \begin{align} p(x, y) & = p(x|y)p(y) \newline & = p(y|x)p(x) \end{align} \]
This is the product rule; equating the two factorizations and dividing by \( p(x) \) gives Bayes’ rule: \( p(y|x) = p(x|y)p(y)/p(x) \).
The sum rule marginalizes out \( y \):

\[ p(x) = \sum_y p(x, y) \]
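The product and sum rules can be checked numerically on a small joint table. This is a minimal sketch with a made-up 2×2 joint distribution; the array values are illustrative, not from the text.

```python
import numpy as np

# Hypothetical 2x2 joint distribution p(x, y); rows index x, columns index y.
p_xy = np.array([[0.1, 0.2],
                 [0.3, 0.4]])

# Sum rule: marginals are obtained by summing out the other variable.
p_x = p_xy.sum(axis=1)             # p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)             # p(y) = sum_x p(x, y)

# Product rule: conditionals are joint divided by the marginal.
p_x_given_y = p_xy / p_y           # p(x|y) = p(x, y) / p(y)
p_y_given_x = p_xy / p_x[:, None]  # p(y|x) = p(x, y) / p(x)

# Both factorizations recover the same joint, which is Bayes' rule.
assert np.allclose(p_x_given_y * p_y, p_xy)
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)
```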
A Markov (memoryless) stochastic process is independent of its history given the most recent states:
\[ \begin{align} & P(\text{fun} \mid \text{coding in Python is}) \newline & \approx P(\text{fun} \mid \text{Python is}) \end{align} \]
A second-order Markov model conditions each token on the previous two only (the \( p(x_2 | x_1) \) factor is needed before the product can start at \( d=3 \)):

\[ p(\mathbf{x}) = p(x_1)\, p(x_2 | x_1) \prod_{d=3}^{D} p(x_d | x_{d-1}, x_{d-2}) \]
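The second-order factorization can be estimated from trigram counts. A minimal sketch on a toy corpus (the corpus and the `p_next` helper are illustrative, not from the text):

```python
from collections import defaultdict

# Toy corpus; a second-order (trigram) Markov model conditions each
# token on the previous two tokens only.
corpus = "coding in python is fun and coding in python is easy".split()

# Count how often each word follows each pair of preceding words.
counts = defaultdict(lambda: defaultdict(int))
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    counts[(a, b)][c] += 1

def p_next(prev2, prev1, word):
    """Estimate p(x_d = word | x_{d-2} = prev2, x_{d-1} = prev1)."""
    ctx = counts[(prev2, prev1)]
    total = sum(ctx.values())
    return ctx[word] / total if total else 0.0

print(p_next("python", "is", "fun"))  # 0.5: "python is" is followed by "fun" once and "easy" once
```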
Replacing the fixed window with a recurrent hidden state lets every token condition on the full history compressed into \( h_{d-1} \):

\[ p(\mathbf{x}) = p(x_1) \prod_{d=2}^{D} p(x_d | \mathrm{RNN}(x_{d-1}, h_{d-1})) \]
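The RNN factorization can be sketched with a plain NumPy recurrent cell. This is a toy illustration with random, untrained weights and a uniform \( p(x_1) \); the sizes `V`, `H` and the function `rnn_log_prob` are assumptions for the example, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 5, 8  # hypothetical vocabulary size and hidden size

# Randomly initialised RNN parameters (a sketch, not trained weights).
W_xh = rng.normal(scale=0.1, size=(H, V))
W_hh = rng.normal(scale=0.1, size=(H, H))
W_hy = rng.normal(scale=0.1, size=(V, H))

def rnn_log_prob(tokens):
    """log p(x) = log p(x_1) + sum_d log p(x_d | RNN(x_{d-1}, h_{d-1}))."""
    h = np.zeros(H)
    log_p = np.log(1.0 / V)  # uniform p(x_1) for simplicity
    for prev, nxt in zip(tokens, tokens[1:]):
        x = np.zeros(V)
        x[prev] = 1.0
        h = np.tanh(W_xh @ x + W_hh @ h)  # h_d = RNN(x_{d-1}, h_{d-1})
        logits = W_hy @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()              # softmax over next token
        log_p += np.log(probs[nxt])
    return log_p

print(rnn_log_prob([0, 1, 2]))  # a finite negative log-probability
```

Unlike the trigram model, the hidden state `h` carries information from the entire prefix, so the conditional is not limited to a fixed context window.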
Use-case: text classification
Use-case: next token prediction