A Dense Reward View on Aligning Text-to-Image Diffusion with Preference

Existing preference-alignment methods for text-to-image diffusion treat preference as a sparse reward attached only to the final image. We recast alignment as a dense-reward problem along the denoising trajectory, and derive an objective that consistently improves alignment quality and training stability over sparse-reward baselines.
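As a toy illustration of the sparse-vs-dense distinction (not the paper's actual objective; all names and the discounting scheme here are hypothetical), the sparse view places the entire preference score on the final denoising step, while a dense view spreads discounted credit to every intermediate step:

```python
# Hypothetical sketch: contrast sparse terminal reward with dense
# per-step reward along a denoising trajectory of T steps.

T = 5                   # number of denoising steps in the toy trajectory
gamma = 0.9             # discount factor used to spread credit backwards
terminal_reward = 1.0   # preference score observed only on the final image

# Sparse view: all credit sits on the last denoising step.
sparse = [0.0] * (T - 1) + [terminal_reward]

# Dense view: every intermediate step t also receives discounted credit,
# so early denoising steps get a learning signal too.
dense = [terminal_reward * gamma ** (T - 1 - t) for t in range(T)]

print(sparse)  # [0.0, 0.0, 0.0, 0.0, 1.0]
print(dense)
```

Under this sketch, a step far from the end still receives a nonzero (if smaller) reward, which is the intuition behind the denser credit assignment.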

arXiv · Code

Recommended citation: S. Yang*, T. Chen*, and M. Zhou. A Dense Reward View on Aligning Text-to-Image Diffusion with Preference. ICML, 2024. (*equal contribution) · https://arxiv.org/abs/2402.08265