id |
caadria2024_293 |
authors |
Xu, Weishun, Li, Mingming and Yang, Xuyou |
year |
2024 |
title |
Can Generative AI Models Count? Finetuning Stable Diffusion for Architecture Image Generation with Designated Floor Numbers Using a Small Dataset |
doi |
https://doi.org/10.52842/conf.caadria.2024.1.089 |
source |
Nicole Gardner, Christiane M. Herr, Likai Wang, Hirano Toshiki, Sumbul Ahmad Khan (eds.), ACCELERATED DESIGN - Proceedings of the 29th CAADRIA Conference, Singapore, 20-26 April 2024, Volume 1, pp. 89–98 |
summary |
Despite the increasing popularity of off-the-shelf text-to-image generative artificial intelligence models in early-stage architectural design practice, general-purpose models struggle with domain-specific tasks such as generating buildings with the correct number of floors. We hypothesise that this problem is mainly caused by the lack of floor-number information in standard training sets. To overcome the often-dodged problem of creating a text-image pair dataset large enough to finetune the original model in design research, we propose using the BLIP method, drawing on both its understanding and its generation capabilities, for the automated labelling and captioning of online images. A small dataset of 25,172 text-image pairs created with this method is used to finetune an off-the-shelf Stable Diffusion model for 10 epochs with affordable computing power. Compared with the base model, which has a less than 20% chance of generating the correct number of floors, the finetuned model has an over 50% overall chance of producing the correct floor number and an 87.3% chance of keeping the floor-count discrepancy within one storey. |
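The two success rates in the summary (exact floor-count accuracy and accuracy within one storey) can be illustrated with a minimal Python sketch. It assumes each generated image is paired with the floor number requested in its prompt and a floor number counted on the output image; the function name `floor_count_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
"""Sketch of exact and within-one-storey floor-count accuracy.

Assumption: each generated image has a requested (target) floor number and a
counted floor number. Names and sample data are hypothetical illustrations.
"""


def floor_count_metrics(targets: list[int], counted: list[int]) -> dict[str, float]:
    """Return the exact-match rate and the within-one-storey rate."""
    if len(targets) != len(counted):
        raise ValueError("targets and counted must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    n = len(targets)
    # Exact match: the generated building has precisely the requested floors.
    exact = sum(t == c for t, c in zip(targets, counted)) / n
    # Tolerance match: the discrepancy is at most one storey in either direction.
    within_one = sum(abs(t - c) <= 1 for t, c in zip(targets, counted)) / n
    return {"exact": exact, "within_one": within_one}


if __name__ == "__main__":
    # Hypothetical evaluation of eight generated images.
    requested = [3, 3, 5, 5, 8, 8, 10, 10]
    observed = [3, 4, 5, 7, 8, 9, 10, 12]
    print(floor_count_metrics(requested, observed))
    # prints {'exact': 0.5, 'within_one': 0.75}
```

Because the tolerance rate counts every exact match as well, it is always at least as high as the exact rate, which is consistent with the summary reporting over 50% exact and 87.3% within one storey.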
keywords |
text-to-image generation, model finetuning, stable diffusion, automated labelling |
series |
CAADRIA |
email |
|
full text |
file.pdf (3,617,387 bytes) |
references |
|
Chen, J., Wang, D., Shao, Z., Zhang, X., Ruan, M., Li, H., & Li, J. (2023)
Using Artificial Intelligence to Generate Master-Quality Architectural Designs from Text Descriptions
, Buildings, 13(9), Article 2285. https://doi.org/10.3390/buildings13092285
|
Deshpande, R. (2023)
Generative Pre-Trained Transformers for 15-Minute City Design
, HUMAN-CENTRIC - Proceedings of the 28th CAADRIA Conference, (pp. 595-604). https://doi.org/10.52842/conf.caadria.2023.1.595
|
Goodfellow, I., Bengio, Y., & Courville, A. (2016)
Deep Learning
, MIT Press
|
Jia, C., Yang, Y., Xia, Y., Chen, Y.-T., Parekh, Z., Pham, H., Le, Q., Sung, Y.-H., Li, Z., & Duerig, T. (2021)
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
, Proceedings of the 38th International Conference on Machine Learning, 4904-4916. https://proceedings.mlr.press/v139/jia21b.html
|
Kim, F. C. (2023)
Text2Form Diffusion: Framework for learning curated architectural vocabulary
, Digital Design Reconsidered - Proceedings of the 41st Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe 2023), (Vol 1, pp. 79-88). https://doi.org/10.52842/conf.ecaade.2023.1.079
|
Kuru, J. (2023)
Training Non-Typical Character Models for Stable Diffusion Utilizing Open Sources AIS
, [Honors Bachelor's Thesis, University of Arizona] https://repository.arizona.edu/handle/10150/668639
|
Li, J., Li, D., Xiong, C., & Hoi, S. (2022)
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
, Proceedings of the 39th International Conference on Machine Learning, 12888-12900. https://proceedings.mlr.press/v162/li22n.html
|
Ploennigs, J., & Berger, M. (2023)
AI art in architecture
, AI in Civil Engineering, 2(1), 8. https://doi.org/10.1007/s43503-023-00018-y
|
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022)
High-Resolution Image Synthesis with Latent Diffusion Models
, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10674-10685. https://doi.org/10.1109/CVPR52688.2022.01042
|
Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., & Aberman, K. (2023)
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 22500-22510. https://doi.org/10.1109/CVPR52729.2023.02155
|
Stigsen, M. B., Moisi, A., Rasoulzadeh, S., Schinegger, K., & Rutzinger, S. (2023)
AI Diffusion as Design Vocabulary - Investigating the use of AI image generation in early architectural design and education
, Digital Design Reconsidered - Proceedings of the 41st Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe 2023), (Vol. 2, pp. 587-596). https://doi.org/10.52842/conf.ecaade.2023.2.587
|
Turchi, T., Carta, S., Ambrosini, L., & Malizia, A. (2023)
Human-AI Co-creation: Evaluating the Impact of Large-Scale Text-to-Image Generative Models on the Creative Process
, L. D. Spano, A. Schmidt, C. Santoro, & S. Stumpf (Eds.), End-User Development (pp. 35-51). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-34433-6_3
|
last changed |
2024/11/17 22:05 |
|