Overview
This series of posts collects my study notes as I work through the book "Understanding Deep Learning".
This post covers Chapter 16, Normalizing flows.
1. Likelihood and Invertibility
Chapter 16 focuses on Normalizing Flows, which transform probability distributions through invertible mappings.
Unlike VAEs, which optimize an approximation (a lower bound) to the likelihood, Flows allow the exact likelihood to be computed, which I found particularly impressive.
Specifically, a complex data distribution Pr(\boldsymbol{x} | \boldsymbol{\phi}) is obtained by passing a simple base distribution p(\boldsymbol{z}) through an invertible transformation f(\boldsymbol{z}, \boldsymbol{\phi}).
The price is that every layer must be invertible and its Jacobian determinant must be cheap to compute, which is a nontrivial constraint on the architecture.
In return, this structural restriction gives Flow models a mathematical completeness (an exact density and an exact inverse) that is often lacking in other generative models.
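To make the exact-likelihood claim concrete, here is a minimal one-dimensional sketch in NumPy. The map x = sinh(a z + b), the parameter values, and all function names are my own assumptions chosen for illustration, not something taken from the book. The likelihood uses the change-of-variables rule: the density of x is the base density at z = f^{-1}(x) divided by |df/dz|.

```python
import numpy as np

def forward(z, phi):
    # Hypothetical invertible map x = f(z, phi); requires a != 0.
    a, b = phi
    return np.sinh(a * z + b)

def inverse(x, phi):
    # Exact inverse of forward: z = (arcsinh(x) - b) / a.
    a, b = phi
    return (np.arcsinh(x) - b) / a

def log_likelihood(x, phi):
    # log Pr(x | phi) = log p(z) - log |df/dz|, with z = f^{-1}(x).
    a, b = phi
    z = inverse(x, phi)
    log_base = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)  # standard normal base
    # df/dz = a * cosh(a z + b), and cosh(arcsinh(x)) = sqrt(1 + x^2).
    log_abs_jac = np.log(np.abs(a)) + 0.5 * np.log1p(x**2)
    return log_base - log_abs_jac

phi = (0.8, 0.3)  # illustrative parameter values

# Sampling: draw z from the base distribution and push it through f.
rng = np.random.default_rng(0)
samples = forward(rng.standard_normal(5), phi)

# Likelihood: the density should integrate to roughly 1 over a wide grid.
xs = np.linspace(-200.0, 200.0, 400001)
dx = xs[1] - xs[0]
total_mass = np.sum(np.exp(log_likelihood(xs, phi))) * dx
print("samples:", samples)
print("integral of density:", total_mass)
```

The same two functions serve both directions: `forward` for sampling and `inverse` plus the Jacobian term for scoring data. In a real flow the Jacobian term is a determinant, which is why its cost matters.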
2. Residual Flows and Contraction Mapping
In the section on Residual Flows, the use of the Banach fixed-point theorem to guarantee that a residual layer can be inverted was particularly intriguing.
While most neural networks rely on empirically observed stability, here the condition is stated mathematically: the Lipschitz constant of f must be less than 1.
In the equation
y = z + f[z],
if f is a contraction mapping, the iteration z ← y − f[z] always converges to a single fixed point, which is the unique z that produced y.
This provides strong intuition for why Residual Flows can use a flexible network f and still remain invertible: a balance between expressivity and stability.
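The fixed-point argument can be checked numerically. Below is a small sketch, assuming a toy f made of one linear layer followed by tanh, with the weight matrix rescaled so its spectral norm is 0.5. The layer, the sizes, and the iteration count are illustrative choices of mine, not the construction used in the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual branch: rescale W so that its spectral norm is 0.5.
W = rng.normal(size=(4, 4))
W *= 0.5 / np.linalg.norm(W, 2)
b = rng.normal(size=4)

def f(z):
    # tanh is 1-Lipschitz, so f has Lipschitz constant at most 0.5 (< 1).
    return np.tanh(W @ z + b)

def residual_forward(z):
    # y = z + f[z]
    return z + f(z)

def residual_inverse(y, num_iters=50):
    # Fixed-point iteration z <- y - f[z]; the error shrinks by at least
    # a factor of 0.5 per step because f is a contraction.
    z = y.copy()
    for _ in range(num_iters):
        z = y - f(z)
    return z

z_true = rng.normal(size=4)
y = residual_forward(z_true)
z_rec = residual_inverse(y)
print("max reconstruction error:", np.max(np.abs(z_rec - z_true)))
```

If the rescaling line is removed so that the Lipschitz constant can exceed 1, the iteration is no longer guaranteed to converge, which is exactly the condition the theorem imposes.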
3. GLOW and Image Synthesis
The section on GLOW demonstrated the practical power of Flow-based models.
While GLOW resembles GANs in that it can generate realistic images, its strength lies in being a probabilistically interpretable generative model.
It processes 256×256×3 image tensors using Coupling Layers and 1×1 Convolutions, progressively reducing resolution through a multi-scale architecture.
In the latent space, interpolating between the codes of two encoded faces and decoding the result yields smooth, natural transitions, which is possible because invertibility maps every image exactly to a latent code and back.
However, GLOW-generated samples are slightly lower in perceptual quality compared to GANs.
This trade-off seems reasonable given the structural constraints imposed by invertibility, which prioritize mathematical precision over visual sharpness.
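The coupling layers mentioned above are what keep both the inverse and the Jacobian determinant cheap. Here is a minimal sketch of an affine coupling layer on a flat vector, assuming a tiny stand-in network (`scale_and_shift`, a name I made up) in place of the convolutional networks GLOW actually uses. The dimensions and weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 6          # total dimension (hypothetical)
d = D // 2     # size of the half that passes through unchanged
W_s = rng.normal(scale=0.1, size=(d, D - d))
W_t = rng.normal(scale=0.1, size=(d, D - d))

def scale_and_shift(h1):
    # Stand-in for an arbitrary network; it never needs to be inverted.
    log_s = np.tanh(h1 @ W_s)
    t = h1 @ W_t
    return log_s, t

def coupling_forward(h):
    # First half is copied; second half is scaled and shifted
    # using quantities computed from the first half only.
    h1, h2 = h[:d], h[d:]
    log_s, t = scale_and_shift(h1)
    out = np.concatenate([h1, h2 * np.exp(log_s) + t])
    # The Jacobian is triangular, so log|det| is just the sum of log-scales.
    log_det = np.sum(log_s)
    return out, log_det

def coupling_inverse(out):
    # h1 is available unchanged, so the same log_s and t can be recomputed.
    h1, o2 = out[:d], out[d:]
    log_s, t = scale_and_shift(h1)
    return np.concatenate([h1, (o2 - t) * np.exp(-log_s)])

h = rng.normal(size=D)
out, log_det = coupling_forward(h)
h_rec = coupling_inverse(out)
print("max reconstruction error:", np.max(np.abs(h_rec - h)))
print("log|det J|:", log_det)
```

Because half of the input is left untouched by each layer, practical models alternate which half is transformed or mix the channels in between, which is the role the 1×1 convolutions play in GLOW.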
4. The Fundamental Significance of Flow
Ultimately, Flow represents an attempt to model data in a mathematically exact manner.
The same model supports both exact likelihood evaluation and sample generation, aiming to overcome the instability of GANs and the approximation limits of VAEs.
Whereas other generative models rely on heuristics to produce “good” samples, Normalizing Flows generate data through explicit probability computation, pursuing a direction closer to mathematically proving what “good” means rather than intuitively approximating it.
Reference
[1] Prince, S. J. D. (2023). Understanding Deep Learning. The MIT Press. Retrieved from http://udlbook.com