Published in International Conference on Learning Representations (ICLR), 2024

Abstract

Fine-tuning text-to-image models using reward functions trained on human feedback data has emerged as a powerful approach for aligning model behavior with human intent. However, excessive optimization with such reward models, which are only proxy objectives, can degrade the performance of the fine-tuned models, a phenomenon commonly referred to as reward overoptimization. We introduce the Text-Image Alignment Assessment (TIA2) benchmark, a diverse collection of text prompts, images, and human annotations, for studying the issue in depth. We evaluate several state-of-the-art reward models for text-to-image generation on our benchmark and find that they are often not well-aligned with human assessment. We empirically demonstrate that overoptimization can occur when a poorly aligned reward model is used as a fine-tuning objective. To address this, we introduce a simple method, TextNorm, for inducing confidence calibration in reward models by normalizing the scores across prompts that are semantically different from the original prompt. We demonstrate that using the confidence-calibrated scores in fine-tuning effectively reduces the risk of overoptimization.
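
As a rough illustration of the idea, the sketch below computes a TextNorm-style calibrated score by normalizing a reward model's output for the original prompt against its outputs for a set of semantically different (contrastive) prompts. The reward_fn interface, the contrastive_prompts argument, and the softmax with a temperature parameter are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def textnorm_score(reward_fn, image, prompt, contrastive_prompts, temperature=1.0):
    """Sketch of a confidence-calibrated reward.

    reward_fn(image, prompt) is an assumed interface returning a scalar
    text-image alignment score. The raw reward for the original prompt is
    normalized (here via a softmax) against rewards for the contrastive
    prompts, so the result reflects relative confidence rather than an
    uncalibrated magnitude.
    """
    prompts = [prompt] + list(contrastive_prompts)
    rewards = np.array([reward_fn(image, p) for p in prompts]) / temperature
    # Softmax over the original and contrastive prompts; index 0 is the
    # calibrated score for the original prompt.
    exp = np.exp(rewards - rewards.max())
    return exp[0] / exp.sum()

During fine-tuning, a calibrated score of this form would stand in for the raw reward as the objective, so that images are favored only when their reward for the intended prompt is clearly higher than for the contrastive alternatives.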

BibTeX

@inproceedings{kim2024confidenceaware,
  title={Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models},
  author={Kyuyoung Kim and Jongheon Jeong and Minyong An and Mohammad Ghavamzadeh and Krishnamurthy Dj Dvijotham and Jinwoo Shin and Kimin Lee},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=Let8OMe20n}
}
