Energy-based Conditional Diffusion Models

An in-depth exploration of Energy-based Conditional Diffusion Models (EBCDMs), which leverage conditional energy functions to generate samples conditioned on specific inputs through a diffusion process.
Author: Bahman Moraffah
Estimated Reading Time: 10 min
Published: 2021

Background on Energy-based Models (EBMs)

Energy-based Models (EBMs) are a class of probabilistic models where the probability of a configuration (e.g., a data point) is defined in terms of an energy function. The energy function assigns lower energies to more probable or desirable configurations. Mathematically, the probability distribution over data \( x \) is given by the Boltzmann distribution:

\(p_{\theta}(x) = \frac{\exp(-E_{\theta}(x))}{Z_{\theta}},\)

where \( E_{\theta}(x) \) is the energy function parameterized by \( \theta \), and \( Z_{\theta} = \int \exp(-E_{\theta}(x)) dx \) is the partition function ensuring normalization.
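For intuition, consider the quadratic energy \( E_{\theta}(x) = \frac{\|x - \mu\|^2}{2\sigma^2} \) for \( x \in \mathbb{R}^d \): the partition function is \( Z_{\theta} = (2\pi\sigma^2)^{d/2} \) and \( p_{\theta} \) is exactly the Gaussian \( \mathcal{N}(\mu, \sigma^2 I) \). For a general neural-network energy, however, \( Z_{\theta} \) has no closed form, which is why training and sampling typically work with quantities that do not require it, such as the score \( \nabla_x \log p_{\theta}(x) = -\nabla_x E_{\theta}(x) \).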

Energy-based Conditional Diffusion Models (EBCDMs)

The key idea of Energy-based Conditional Diffusion Models (EBCDMs) is to define a conditional energy function \( E_{\theta}(x, y) \) that measures the compatibility between a data sample \( x \) and a condition \( y \). The conditional probability distribution is then defined using the Boltzmann distribution:

\(p_{\theta}(x | y) = \frac{\exp(-E_{\theta}(x, y))}{Z_{\theta}(y)},\)

where \( Z_{\theta}(y) = \int \exp(-E_{\theta}(x, y)) dx \) is the partition function, which is generally intractable to compute exactly. In EBCDMs, the energy function is parameterized by a neural network, and the model is trained by optimizing a loss function that encourages low energy for sample-condition pairs that occur together in the data and high energy for mismatched pairs.
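As a concrete illustration, here is a minimal PyTorch sketch of such a conditional energy network. The specific architecture (concatenating \( x \) with a continuous condition vector \( y \), the layer widths, and the activation) is an assumption made for illustration; in practice the condition is often embedded first, e.g., a class label or a text encoding.

import torch
import torch.nn as nn

class ConditionalEnergyNet(nn.Module):
    """Scalar conditional energy E_theta(x, y); lower values mean x and y are more compatible."""
    def __init__(self, x_dim: int, y_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),  # one scalar energy per (x, y) pair
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

# The unnormalized conditional log-probability is -E_theta(x, y);
# the normalizing constant -log Z_theta(y) is intractable and left implicit.
energy_net = ConditionalEnergyNet(x_dim=2, y_dim=2)
x, y = torch.randn(4, 2), torch.randn(4, 2)
unnormalized_log_prob = -energy_net(x, y)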

Loss Function

EBCDMs are trained by minimizing a loss function over the parameters of the energy network; common choices include contrastive divergence, noise-contrastive estimation, and score matching. The conditional score matching loss can be formulated as:

\(\mathcal{L}(\theta) = \mathbb{E}_{x, y \sim p_{\text{data}}(x, y)} \left[ \frac{1}{2} \left\| \nabla_x E_{\theta}(x, y) + \nabla_x \log p_{\text{data}}(x | y) \right\|^2 \right] \)

where \( p_{\text{data}}(x, y) \) is the joint distribution of data samples and conditions, and \( p_{\text{data}}(x | y) \) is the true conditional distribution of data given the condition. Since \( \nabla_x \log p_{\theta}(x | y) = -\nabla_x E_{\theta}(x, y) \), minimizing this loss drives the model's conditional score toward the data score; because the data score is unknown for real datasets, practical training relies on surrogates such as denoising score matching.
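To make the loss concrete, the following sketch evaluates it for a toy case in which \( p_{\text{data}}(x | y) = \mathcal{N}(x; y, \sigma^2 I) \), so the true conditional score \( -(x - y)/\sigma^2 \) is available in closed form. The toy data model, the optimizer settings, and the reuse of the ConditionalEnergyNet sketch above are illustrative assumptions rather than part of the method; the training loop corresponds to step 1 of the algorithm in the next section.

import torch

def conditional_score_matching_loss(energy_net, x, y, sigma=1.0):
    # nabla_x E_theta(x, y) via autograd; create_graph=True keeps the loss differentiable in theta.
    x = x.detach().requires_grad_(True)
    grad_E = torch.autograd.grad(energy_net(x, y).sum(), x, create_graph=True)[0]
    true_score = -(x - y) / sigma**2  # nabla_x log p_data(x | y) for the toy Gaussian
    return 0.5 * ((grad_E + true_score) ** 2).sum(dim=-1).mean()

# Step 1 (Training): gradient descent on L(theta) using data samples and their conditions.
energy_net = ConditionalEnergyNet(x_dim=2, y_dim=2)
optimizer = torch.optim.Adam(energy_net.parameters(), lr=1e-3)
for _ in range(1000):
    y = torch.randn(64, 2)      # conditions
    x = y + torch.randn(64, 2)  # data: x ~ N(y, I), i.e. sigma = 1
    loss = conditional_score_matching_loss(energy_net, x, y, sigma=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()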

Conditional Generation with EBCDMs

The following algorithm describes the conditional generation process using EBCDMs:

Training:
1. Train the energy-based model by optimizing the loss function \(\mathcal{L}(\theta)\) using gradient descent.
   This step requires samples from the data distribution and their corresponding conditions.

Sampling:
2. To generate samples conditioned on a specific condition \(y\), use a diffusion process that starts from noise and
   gradually denoises the samples, guided by the gradients of the energy function (see the sketch after this algorithm):
   a. Initialize \(x_T\) with random noise.
   b. For each step \(t = T, T-1, \ldots, 1\):
      i.   Update \(x_{t-1}\) using a Langevin dynamics step:
           $$ x_{t-1} = x_t - \frac{\eta_t}{2} \nabla_x E_{\theta}(x_t, y) + \sqrt{\eta_t} \epsilon_t, $$
           where \(\eta_t\) is the step size, and \(\epsilon_t \sim \mathcal{N}(0, I)\) is Gaussian noise.

Output:
3. The final iterate \(x_0\) is the generated sample conditioned on the specified condition \(y\).
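Below is a minimal sketch of the sampling loop in step 2, again assuming the ConditionalEnergyNet and trained energy_net from the sketches above. It uses a constant step size \( \eta \) for simplicity, whereas the algorithm allows a per-step schedule \( \eta_t \), typically annealed toward zero as \( t \) decreases; the step count and step size here are illustrative assumptions.

import torch

def conditional_langevin_sample(energy_net, y, x_dim, n_steps=200, eta=1e-2):
    x = torch.randn(y.shape[0], x_dim)  # x_T: pure Gaussian noise
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad_E = torch.autograd.grad(energy_net(x, y).sum(), x)[0]   # nabla_x E_theta(x_t, y)
        x = x - 0.5 * eta * grad_E + eta**0.5 * torch.randn_like(x)  # Langevin update
    return x.detach()

# Draw 8 samples, each conditioned on its own condition vector y.
y = torch.randn(8, 2)
samples = conditional_langevin_sample(energy_net, y, x_dim=2)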
