
Diffusion Models: Study Notes

I. Probabilistic Modeling and Mathematical Derivation

1. Forward Diffusion Process

Markov chain assumption
Diffusion models assume that the data $x_0$ is gradually corrupted into pure noise $x_T$ by a Markov chain that adds Gaussian noise step by step:
$$q(x_{1:T}|x_0) = \prod_{t=1}^T q(x_t|x_{t-1})$$
where the single-step transition probability is:
$$q(x_t|x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)$$

Cumulative noise coefficients (key derivation)
Introducing $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^t \alpha_s$, the state at any timestep $t$ can be written as:
$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$
Derivation:

  1. Recursive expansion. Starting from the initial data $x_0$, unroll the forward process step by step:
$$\begin{aligned} x_1 &= \sqrt{\alpha_1}\,x_0 + \sqrt{\beta_1}\,\epsilon_0, \\ x_2 &= \sqrt{\alpha_2}\,x_1 + \sqrt{\beta_2}\,\epsilon_1 = \sqrt{\alpha_2\alpha_1}\,x_0 + \sqrt{\alpha_2\beta_1}\,\epsilon_0 + \sqrt{\beta_2}\,\epsilon_1, \\ x_3 &= \sqrt{\alpha_3}\,x_2 + \sqrt{\beta_3}\,\epsilon_2 = \sqrt{\alpha_3\alpha_2\alpha_1}\,x_0 + \sqrt{\alpha_3\alpha_2\beta_1}\,\epsilon_0 + \sqrt{\alpha_3\beta_2}\,\epsilon_1 + \sqrt{\beta_3}\,\epsilon_2. \end{aligned}$$

  2. By induction, the expression at any timestep $t$ is:
$$x_t = \sqrt{\prod_{s=1}^t \alpha_s}\;x_0 + \sum_{k=0}^{t-1} \sqrt{\beta_{t-k} \prod_{m=1}^k \alpha_{t-m+1}}\;\epsilon_k.$$

  3. Merge the variance terms using the additivity of independent Gaussians.

If $\epsilon_1,\dots,\epsilon_t \sim \mathcal{N}(0,I)$ are mutually independent, then:

$$\sum_{k=1}^t a_k\epsilon_k \sim \mathcal{N}\!\left(0,\ \Big(\sum_{k=1}^t a_k^2\Big) I\right)$$

  4. Applying this property, the squared coefficients telescope (e.g. for $t=2$: $\alpha_2\beta_1 + \beta_2 = \alpha_2(1-\alpha_1) + (1-\alpha_2) = 1-\alpha_2\alpha_1$), giving:
$$\sum_{k=0}^{t-1} \sqrt{\beta_{t-k}\prod_{m=1}^{k}\alpha_{t-m+1}}\;\epsilon_k \sim \mathcal{N}\!\left(0,\ \Big[1 - \prod_{s=1}^t \alpha_s\Big] I\right)$$

  5. Final closed-form solution:
$$q(x_t|x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\right)$$
where $\bar{\alpha}_t = \prod_{s=1}^t \alpha_s$.
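This closed form is what makes training efficient: $x_t$ can be sampled in one shot without iterating the chain. A minimal PyTorch sketch, assuming the standard DDPM linear $\beta$ schedule (the schedule values and the `q_sample` name are illustrative, not from the original notes):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # assumed linear beta schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t

def q_sample(x0: torch.Tensor, t: torch.Tensor):
    """Draw x_t ~ q(x_t | x_0) directly via the closed form."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    eps = torch.randn_like(x0)                   # epsilon ~ N(0, I)
    xt = torch.sqrt(a_bar) * x0 + torch.sqrt(1 - a_bar) * eps
    return xt, eps
```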

2. Reverse Denoising Process

  1. Constructing the variational lower bound (ELBO)

The goal is to maximize the log-likelihood $\log p_\theta(x_0)$. Introduce the variational distribution $q(x_{1:T}|x_0)$:
$$\log p_\theta(x_0) \geq \mathbb{E}_{q(x_{1:T}|x_0)}\!\left[ \log \frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)} \right] = \text{ELBO}$$

The joint distribution factorizes as:
$$p_\theta(x_{0:T}) = p(x_T)\prod_{t=1}^T p_\theta(x_{t-1}|x_t)$$

Expanding in detail:
$$\text{ELBO} = \mathbb{E}_q\!\left[ \log p(x_T) + \sum_{t=1}^T \log \frac{p_\theta(x_{t-1}|x_t)}{q(x_t|x_{t-1})} \right]$$

  2. Converting to KL-divergence terms
    By the Markov property, $q(x_t|x_{t-1}) = q(x_t|x_{t-1},x_0)$. Rewriting each factor with Bayes' rule, $q(x_t|x_{t-1},x_0) = \frac{q(x_{t-1}|x_t,x_0)\,q(x_t|x_0)}{q(x_{t-1}|x_0)}$, and telescoping the $q(x_t|x_0)$ ratios yields:
$$\text{ELBO} = \mathbb{E}_q\!\left[ \log p(x_T) - \log q(x_T|x_0) + \sum_{t=2}^T \log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)} + \log p_\theta(x_0|x_1) \right]$$

Analysis of the key terms:
• $D_{KL}\big(q(x_T|x_0)\,\|\,p(x_T)\big)$: a constant (no trainable parameters), so it does not affect optimization

• $\sum_{t=2}^T \mathbb{E}_{q(x_t|x_0)}\big[D_{KL}\big(q(x_{t-1}|x_t,x_0)\,\|\,p_\theta(x_{t-1}|x_t)\big)\big]$: the main optimization target

• $\mathbb{E}_{q(x_1|x_0)}\big[-\log p_\theta(x_0|x_1)\big]$: the final reconstruction term

  3. Deriving the true posterior
    By Bayes' theorem:
$$q(x_{t-1}|x_t,x_0) = \frac{q(x_t|x_{t-1},x_0)\,q(x_{t-1}|x_0)}{q(x_t|x_0)}$$
    Substituting the Gaussian expressions and simplifying gives:
$$q(x_{t-1}|x_t,x_0) = \mathcal{N}\big(x_{t-1};\ \tilde{\mu}_t(x_t,x_0),\ \tilde{\beta}_t I\big)$$
    where:
$$\tilde{\mu}_t = \frac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,x_0, \qquad \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t$$
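Since these quantities depend only on the noise schedule, the posterior coefficients can be precomputed once. A sketch reusing the assumed schedule from the earlier snippet (the $\bar{\alpha}_0 := 1$ convention is standard):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)
# \bar{alpha}_{t-1}, with the convention \bar{alpha}_0 = 1
alphas_cumprod_prev = torch.cat([torch.ones(1), alphas_cumprod[:-1]])

# Coefficients of x_t and x_0 in the posterior mean \tilde{mu}_t
coef_xt = torch.sqrt(alphas) * (1 - alphas_cumprod_prev) / (1 - alphas_cumprod)
coef_x0 = torch.sqrt(alphas_cumprod_prev) * betas / (1 - alphas_cumprod)
# Posterior variance \tilde{beta}_t
posterior_var = (1 - alphas_cumprod_prev) / (1 - alphas_cumprod) * betas
```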

3. Parameterization and Loss Simplification

  1. Mean parameterization trick
    Parameterize the target mean $\mu_\theta$ as:
$$\mu_\theta(x_t,t) = \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t,t) \right)$$

Derivation:
Using the closed form to write $x_0 = \big(x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_t\big)/\sqrt{\bar{\alpha}_t}$ and substituting into the true posterior mean gives:
$$\tilde{\mu}_t = \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_t \right)$$
Replacing $\epsilon_t$ with the neural network's prediction $\epsilon_\theta(x_t,t)$ yields the parameterization above.

  2. Computing the KL term
    The KL divergence between the two Gaussians $q(x_{t-1}|x_t,x_0)$ and $p_\theta(x_{t-1}|x_t)$ is:
$$D_{KL} = \frac{1}{2\sigma_t^2}\,\|\tilde{\mu}_t - \mu_\theta\|^2 + \frac{1}{2}\left( \frac{\tilde{\beta}_t}{\sigma_t^2} - 1 - \ln\frac{\tilde{\beta}_t}{\sigma_t^2} \right)$$

Simplifying assumption:
• Fix the variance $\sigma_t^2 = \tilde{\beta}_t$, so the second term vanishes and the KL reduces to:

$$D_{KL} = \frac{1}{2\sigma_t^2}\,\|\tilde{\mu}_t - \mu_\theta\|^2$$

  3. Final form of the loss
    Substituting the parameterized mean into the KL term:
$$\|\tilde{\mu}_t - \mu_\theta\|^2 = \left\| \frac{\beta_t}{\sqrt{\alpha_t}\sqrt{1-\bar{\alpha}_t}}\,(\epsilon_t - \epsilon_\theta) \right\|^2 = \frac{\beta_t^2}{\alpha_t(1-\bar{\alpha}_t)}\,\|\epsilon_t - \epsilon_\theta\|^2$$

Weighting (the simplified loss):
These weight coefficients differ substantially across timesteps (as the sketch below illustrates); empirically, simply dropping them improves performance:
$$\mathcal{L}_{\text{simple}} = \mathbb{E}_{t,x_0,\epsilon}\,\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\big)\big\|^2$$
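To see how uneven the dropped weights are, a quick sketch (under the same assumed linear schedule) evaluates the full VLB weight $\lambda_t = \beta_t^2/\big(2\sigma_t^2\,\alpha_t(1-\bar{\alpha}_t)\big)$ with $\sigma_t^2 = \tilde{\beta}_t$ across timesteps:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)
alphas_cumprod_prev = torch.cat([torch.ones(1), alphas_cumprod[:-1]])
posterior_var = (1 - alphas_cumprod_prev) / (1 - alphas_cumprod) * betas  # \tilde{beta}_t

# Per-timestep VLB weight; the first step is excluded because
# \tilde{beta}_1 = 0 under the \bar{alpha}_0 = 1 convention
w = betas[1:] ** 2 / (2 * posterior_var[1:] * alphas[1:] * (1 - alphas_cumprod[1:]))
print(w.min().item(), w.max().item())  # varies by over an order of magnitude
# L_simple replaces all of these weights with 1
```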

Physical interpretation: the forward process gradually corrupts data into noise, and the reverse process learns to remove that noise step by step. The network actually predicts the noise component added at each step; iterating the denoising generates data.

4. Sampling Procedure (DDPM)

  1. Reverse iteration formula
    Starting from $p_\theta(x_{t-1}|x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t,t),\ \sigma_t^2 I\big)$:
$$x_{t-1} = \mu_\theta(x_t,t) + \sigma_t z, \qquad z \sim \mathcal{N}(0,I)$$

Substituting the parameterized mean:
$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta \right) + \sigma_t z$$

  2. Variance choices
    With $\sigma_t^2 = \beta_t$:
$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta \right) + \sqrt{\beta_t}\,z$$

With $\sigma_t^2 = \tilde{\beta}_t$:
$$x_{t-1} = \tilde{\mu}_t + \sqrt{\tilde{\beta}_t}\,z$$

  3. DDIM sampling formula
    Introduce a non-Markovian forward process:
$$q_\sigma(x_{t-1}|x_t,x_0) = \mathcal{N}\!\left( \sqrt{\bar{\alpha}_{t-1}}\,x_0 + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\cdot\frac{x_t-\sqrt{\bar{\alpha}_t}\,x_0}{\sqrt{1-\bar{\alpha}_t}},\ \sigma_t^2 I \right)$$

Deterministic sampling (when $\sigma_t = 0$):
$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\underbrace{\left( \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta}{\sqrt{\bar{\alpha}_t}} \right)}_{\text{predicted } x_0} + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\,\epsilon_\theta$$

II. Network Architecture and Implementation Details

All code below is simplified for clarity.
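The snippets in this section reference the noise-schedule tensors (`betas`, `alphas_cumprod`, `alphas_cumprod_prev`) and a few imports as globals. A minimal setup sketch, assuming the standard DDPM linear schedule:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # linear beta schedule
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t
alphas_cumprod_prev = torch.cat([torch.ones(1), alphas_cumprod[:-1]])  # \bar{alpha}_{t-1}
```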

1. U-Net Architecture

Core components

```python
class DenoiseNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # Timestep embedding: sinusoidal encoding followed by an MLP
        self.t_embed = nn.Sequential(
            SinusoidalPositionEmbeddings(dim),
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )
        # Downsampling path
        self.down_blocks = nn.ModuleList(
            [ResBlock(3, dim), ResBlock(dim, dim * 2), ResBlock(dim * 2, dim * 4)]
        )
        self.downs = nn.ModuleList(
            [Downsample(dim), Downsample(dim * 2), Downsample(dim * 4)]
        )
        # Bottleneck
        self.mid = ResBlock(dim * 4, dim * 4)
        # Upsampling path (input channels account for concatenated skips)
        self.ups = nn.ModuleList(
            [Upsample(dim * 4), Upsample(dim * 2), Upsample(dim)]
        )
        self.up_blocks = nn.ModuleList(
            [ResBlock(dim * 8, dim * 2), ResBlock(dim * 4, dim), ResBlock(dim * 2, 3)]
        )

    def forward(self, x, t):
        t = self.t_embed(t)  # embed the timestep once
        skips = []
        for block, down in zip(self.down_blocks, self.downs):
            x = block(x, t)
            skips.append(x)  # save features for the skip connections
            x = down(x)
        x = self.mid(x, t)
        for up, block in zip(self.ups, self.up_blocks):
            x = up(x)
            x = torch.cat([x, skips.pop()], dim=1)  # U-Net skip connection
            x = block(x, t)
        return x
```

Key technical points

  1. Timestep embedding: a sinusoidal position encoding maps the discrete timestep to a continuous vector

```python
class SinusoidalPositionEmbeddings(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):
        half_dim = self.dim // 2
        # Geometric progression of frequencies, as in Transformer position encodings
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=t.device) * -emb)
        emb = t[:, None] * emb[None, :]
        return torch.cat([emb.sin(), emb.cos()], dim=-1)
```
  2. Residual block design: each residual block injects a projection of the time embedding

```python
class ResBlock(nn.Module):
    def __init__(self, in_c, out_c):
        super().__init__()
        # Project the time embedding (dim 64 here) to this block's channel count
        self.mlp = nn.Sequential(nn.Linear(64, out_c), nn.GELU())
        self.conv = nn.Sequential(
            nn.Conv2d(in_c, out_c, 3, padding=1),
            nn.GroupNorm(8, out_c),
            nn.GELU(),
            nn.Conv2d(out_c, out_c, 3, padding=1),
            nn.GroupNorm(8, out_c),
        )
        # 1x1 conv so the residual addition works when in_c != out_c
        self.skip = nn.Conv2d(in_c, out_c, 1) if in_c != out_c else nn.Identity()

    def forward(self, x, t):
        h = self.conv(x)
        h = h + self.mlp(t)[:, :, None, None]  # add time embedding per channel
        return h + self.skip(x)                # residual connection
```

2. Training and Sampling Algorithms

Training step

```python
def train_step(batch):
    # `model` and `optimizer` are assumed to be defined globally
    optimizer.zero_grad()
    # Sample a random timestep for each example
    t = torch.randint(0, T, (batch.size(0),))
    # Forward noising via the closed form: x_t = sqrt(a_bar) x_0 + sqrt(1 - a_bar) eps
    alpha_bar = alphas_cumprod[t][:, None, None, None]
    noise = torch.randn_like(batch)
    noisy = torch.sqrt(alpha_bar) * batch + torch.sqrt(1 - alpha_bar) * noise
    # Predict the noise and regress it against the true noise (L_simple)
    pred = model(noisy, t)
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Sampling loop (DDPM)

```python
@torch.no_grad()
def sample(steps=1000):
    x = torch.randn(1, 3, 64, 64)  # start from pure noise
    for t in reversed(range(steps)):
        ts = torch.full((1,), t, dtype=torch.long)
        pred_noise = model(x, ts)
        # Per-step coefficients
        alpha = 1 - betas[t]
        alpha_bar_prev = alphas_cumprod_prev[t]
        # Posterior mean: mu = (x - beta_t / sqrt(1 - a_bar_t) * eps) / sqrt(alpha_t)
        x = (1 / torch.sqrt(alpha)) * (
            x - (betas[t] / torch.sqrt(1 - alphas_cumprod[t])) * pred_noise
        )
        if t > 0:
            # Add noise with the posterior variance \tilde{beta}_t
            noise = torch.randn_like(x)
            x += torch.sqrt(
                (1 - alpha_bar_prev) * betas[t] / (1 - alphas_cumprod[t])
            ) * noise
    return x.clamp(-1, 1)
```

III. Loss Function and Optimization

1. Theoretical Loss

The full variational lower bound:
$$\mathcal{L}_{\text{VLB}} = \underbrace{D_{KL}\big(q(x_T|x_0)\,\|\,p(x_T)\big)}_{\text{constant}} + \sum_{t=2}^T \mathbb{E}_{q(x_t|x_0)}\big[D_{KL}\big(q(x_{t-1}|x_t,x_0)\,\|\,p_\theta(x_{t-1}|x_t)\big)\big] + \mathbb{E}_{q(x_1|x_0)}\big[-\log p_\theta(x_0|x_1)\big]$$

2. Simplified Loss in Practice

In practice a denoising score-matching objective is used:
$$\mathcal{L}_{\text{simple}} = \mathbb{E}_{t,x_0,\epsilon}\left[ \big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\big)\big\|^2 \right]$$

Hyperparameter notes:
• Timestep sampling strategy: uniform sampling vs. importance sampling (see the sketch below)

• Loss weighting: the original paper recommends no weighting; later work such as Progressive Distillation adjusts it
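An illustrative sketch of importance sampling over timesteps, in the spirit of Improved DDPM (the EMA bookkeeping and function names here are hypothetical simplifications, not the paper's exact scheme):

```python
import torch

T = 1000
loss_ema = torch.ones(T)  # running per-timestep loss estimate

def sample_timesteps(batch_size: int) -> torch.Tensor:
    # Draw t proportionally to the estimated loss, so hard timesteps are visited more often
    probs = loss_ema / loss_ema.sum()
    return torch.multinomial(probs, batch_size, replacement=True)

def update_loss_ema(t: torch.Tensor, losses: torch.Tensor, decay: float = 0.99):
    # Exponential moving average of each sampled timestep's loss
    for ti, li in zip(t.tolist(), losses.tolist()):
        loss_ema[ti] = decay * loss_ema[ti] + (1 - decay) * li
```

A full implementation would also divide each sample's loss by $T\,p_t$ to keep the objective unbiased under the non-uniform sampling.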


IV. Extended Models and Recent Improvements

1. DDIM (Denoising Diffusion Implicit Models)

Core improvements
• Introduces a non-Markovian forward process, allowing faster, deterministic sampling

• The generative process satisfies:

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,f_\theta(x_t,t) + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\,\epsilon_\theta(x_t,t) + \sigma_t\epsilon$$
where $f_\theta(x_t,t)$ denotes the predicted $x_0$.

Advantages
• Sampling can be reduced to 50-100 steps (vs. the original 1000)

• Efficiency improves while generation quality is preserved
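A minimal deterministic DDIM sampler sketch ($\sigma_t = 0$), assuming the schedule globals and `model` from section II; the evenly strided timestep subsequence is one common, illustrative choice:

```python
@torch.no_grad()
def ddim_sample(num_steps=50):
    # Evenly strided subsequence of the original T timesteps, from T-1 down to 0
    times = torch.linspace(T - 1, 0, num_steps).long()
    x = torch.randn(1, 3, 64, 64)
    for i, t in enumerate(times):
        ts = torch.full((1,), int(t), dtype=torch.long)
        eps = model(x, ts)
        a_bar = alphas_cumprod[t]
        # \bar{alpha} of the next (earlier) timestep; 1.0 at the end of the chain
        a_bar_prev = alphas_cumprod[times[i + 1]] if i + 1 < num_steps else torch.tensor(1.0)
        # Predict x_0, then take a deterministic step (sigma_t = 0)
        x0_pred = (x - torch.sqrt(1 - a_bar) * eps) / torch.sqrt(a_bar)
        x = torch.sqrt(a_bar_prev) * x0_pred + torch.sqrt(1 - a_bar_prev) * eps
    return x.clamp(-1, 1)
```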

2. Stable Diffusion

Key innovations

  1. Latent-space diffusion: run the diffusion in the latent space of a VAE, greatly reducing computation
  2. Cross-attention: enables text-conditioned image generation
```python
class CrossAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(768, 2 * dim)  # text-encoder output dimension is 768
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, context):
        Q = self.q(x)                             # queries from image features
        K, V = self.kv(context).chunk(2, dim=-1)  # keys/values from text tokens
        # Scaled dot-product attention
        attn = (Q @ K.transpose(-2, -1)) * (1.0 / math.sqrt(Q.size(-1)))
        attn = F.softmax(attn, dim=-1)
        return self.proj(attn @ V)
```

3. Conditional Diffusion Models

Implementation approaches
• Concatenate the conditioning information: $p_\theta(x_{t-1}|x_t, y)$

• Classifier guidance: correct the noise prediction at sampling time

$$\hat{\epsilon}_\theta(x_t,t) = \epsilon_\theta(x_t,t) - \sqrt{1-\bar{\alpha}_t}\,\gamma\,\nabla_{x_t}\log p_\phi(y|x_t)$$
where $\gamma$ is the guidance scale. A sketch of this correction is given below.
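A sketch of the correction inside one sampling step, assuming a classifier `classifier(x, t)` trained on noisy images and returning class logits (a hypothetical interface, as is the function name):

```python
def guided_noise(x, t, y, gamma=1.0):
    # Gradient of log p_phi(y | x_t) with respect to x_t, via the classifier
    with torch.enable_grad():
        x_in = x.detach().requires_grad_(True)
        logits = classifier(x_in, t)  # hypothetical classifier on noisy inputs
        log_prob = F.log_softmax(logits, dim=-1)[torch.arange(len(y)), y].sum()
        grad = torch.autograd.grad(log_prob, x_in)[0]
    # Shift the predicted noise against the classifier gradient
    eps = model(x, t)
    return eps - torch.sqrt(1 - alphas_cumprod[t]) * gamma * grad
```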


Conclusion

Through the symmetric pair of forward corruption and reverse reconstruction, diffusion models establish a powerful generative framework:

  1. Theoretical strengths: a rigorous mathematical derivation guarantees mode coverage, avoiding the mode collapse that plagues GANs
  2. Implementation traits: the U-Net architecture is a natural fit for pixel-level prediction, and time embeddings let a single network share parameters across all steps
  3. Application reach: combinations with language models (DALL·E 2) and cross-modal generation (Stable Diffusion) show strong potential

VAE vs. diffusion models:

| Property | VAE | Diffusion Model |
| --- | --- | --- |
| Generation quality | Often blurry | High fidelity |
| Training stability | Easy to train | Needs careful tuning |
| Theoretical guarantees | ELBO optimization | Rigorous derivation via score matching |
| Sampling speed | One-shot generation | Iterative sampling (10-1000 steps) |

Reference implementations:
• PyTorch implementation: https://github.com/lucidrains/denoising-diffusion-pytorch

• Official Stable Diffusion: https://github.com/CompVis/stable-diffusion
