PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation
Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization for the cross-modal embedding of the retrieved images and recommended keywords, and supports users in specifying multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of our system, suggesting it facilitates prompt engineering and improves the creativity support of the generative text-to-image model.
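The retrieve-then-recommend idea described above can be illustrated with a minimal sketch. This is not the authors' model: the tiny in-memory `CORPUS` stands in for DiffusionDB, bag-of-words cosine similarity stands in for whatever embedding the system uses, and scoring keywords by how many retrieved prompts contain them is a simplified proxy for the paper's importance and relevance measures. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for prompts stored in DiffusionDB (assumption).
CORPUS = [
    "a castle on a hill, fantasy, highly detailed, digital painting",
    "a castle at sunset, fantasy, concept art, highly detailed",
    "portrait of a cat, studio lighting, photorealistic",
    "a dragon over a castle, fantasy, concept art, trending on artstation",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, k=3):
    """Return the k corpus prompts most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]

def recommend_keywords(query, corpus, k=3, top_n=5):
    """Rank comma-separated phrases from the retrieved prompts.

    A phrase is skipped when all of its words already appear in the query.
    The score is the number of retrieved prompts containing the phrase
    (a simplified proxy, not the paper's scoring).
    """
    query_terms = set(tokenize(query))
    counts = Counter()
    for prompt in retrieve(query, corpus, k):
        phrases = {part.strip().lower() for part in prompt.split(",")}
        for phrase in phrases:
            if phrase and not set(tokenize(phrase)) <= query_terms:
                counts[phrase] += 1
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

if __name__ == "__main__":
    for keyword, score in recommend_keywords("a castle, fantasy", CORPUS):
        print(keyword, score)
```

In this toy setting, a user who typed "a castle, fantasy" would be shown style phrases that recur across similar prompts, which they could append to refine the next generation.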