CumInCAD is a Cumulative Index of publications in Computer Aided Architectural Design,
supported by the sibling associations ACADIA, CAADRIA, eCAADe, SIGraDi, ASCAAD and CAAD Futures.

id caadria2024_293
authors Xu, Weishun, Li, Mingming and Yang, Xuyou
year 2024
title Can Generative AI Models Count? Finetuning Stable Diffusion for Architecture Image Generation with Designated Floor Numbers Using a Small Dataset
doi https://doi.org/10.52842/conf.caadria.2024.1.089
source Nicole Gardner, Christiane M. Herr, Likai Wang, Hirano Toshiki, Sumbul Ahmad Khan (eds.), ACCELERATED DESIGN - Proceedings of the 29th CAADRIA Conference, Singapore, 20-26 April 2024, Volume 1, pp. 89–98
summary Despite the increasing popularity of off-the-shelf text-to-image generative artificial intelligence models in early-stage architectural design practice, general-purpose models struggle with domain-specific tasks such as generating buildings with the correct number of floors. We hypothesise that this problem is mainly caused by the lack of floor number information in standard training sets. To overcome the often-dodged problem in design research of creating a text-image pair dataset large enough to finetune the original model, we propose using the BLIP method, which supports both vision-language understanding and generation, to automatically label and caption online images. A small dataset of 25,172 text-image pairs created with this method is used to finetune an off-the-shelf Stable Diffusion model for 10 epochs with affordable computing power. Compared to the base model, which has a less than 20% chance of generating the correct number of floors, the finetuned model has an over 50% overall chance of producing the correct floor number and an 87.3% chance of keeping the floor count discrepancy within 1 storey.
keywords text-to-image generation, model finetuning, stable diffusion, automated labelling
series CAADRIA
full text file.pdf (3,617,387 bytes)
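The summary reports two figures for the finetuned model: the chance of generating the exact floor number and the chance of staying within 1 storey of it. As a minimal sketch of how such rates could be computed from paired counts, the snippet below assumes the floor count of each generated image has already been determined (for example by manual inspection). The function name `floor_count_metrics` and the sample numbers are hypothetical and are not taken from the paper, whose actual evaluation procedure is not described in this record.

```python
from typing import Sequence


def floor_count_metrics(requested: Sequence[int], generated: Sequence[int]) -> dict:
    """Return exact-match and within-one-storey rates for paired floor counts."""
    if len(requested) != len(generated):
        raise ValueError("requested and generated must have the same length")
    if not requested:
        raise ValueError("at least one pair is required")
    # Absolute discrepancy between the prompted and the observed floor count.
    diffs = [abs(r - g) for r, g in zip(requested, generated)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,       # correct floor number
        "within_one": sum(d <= 1 for d in diffs) / n,  # discrepancy of at most 1 storey
    }


# Hypothetical counts, invented for illustration only (not data from the paper).
requested = [3, 5, 8, 10]
generated = [3, 6, 8, 13]
metrics = floor_count_metrics(requested, generated)
print(metrics)
```

Note that the within-one rate includes exact matches, so it is always at least as large as the exact rate.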
last changed 2024/11/17 22:05