id |
ecaade2024_199 |
authors |
Zhong, Ximing; Liang, Jiadong; Li, Yingkai |
year |
2024 |
title |
Building-Agent: A 3D generation agent framework integrating large language models and graph-based 3D generation model |
doi |
https://doi.org/10.52842/conf.ecaade.2024.2.291 |
source |
Kontovourkis, O, Phocas, MC and Wurzer, G (eds.), Data-Driven Intelligence - Proceedings of the 42nd Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe 2024), Nicosia, 11-13 September 2024, Volume 2, pp. 291–300 |
summary |
Large language models (LLMs) show unprecedented potential in AI-driven architectural design. While LLMs can understand design tasks, they lack the ability to reason from language to three-dimensional (3D) architectural models. This paper proposes a novel 3D building generative agent framework, Building-Agent, which combines the decision-making capabilities of LLMs with the generative abilities of Graph Neural Networks (GNNs). Experiments use real design briefs and site constraints to test the Building-Agent's task-processing capabilities. The results demonstrate that the Building-Agent can accurately predict different site layout outcomes and achieve high task completion rates. Furthermore, it enables interactive 3D building layout iteration through multi-step natural language instructions. The Building-Agent comprehends and reasons about 3D spatial layouts based on the graph representations of 3D models in the modeling engine and the requirements of natural language inputs, which shows its potential to accomplish tasks with initial proficiency. Compared to previous 3D generative models that rely on human decision-making for inputting spatial constraints, the Building-Agent paves the way for AI to comprehend and complete 3D design tasks autonomously, promising a transformative impact on AI and architectural design. |
keywords |
Building-Agent, Large Language Model, Graph Generation Model, Language Comprehending, 3D Spatial Reasoning, 3D Cognitive Ability |
series |
eCAADe |
email |
|
full text |
file.pdf (2,380,690 bytes) |
references |
Avetisyan, A. et al. (2024)
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
, arXiv. Available at: http://arxiv.org/abs/2403.13064 (Accessed: 31 March 2024)
|
Chang, K.-H. et al. (2021)
Building-GAN: Graph-Conditioned Building Volumetric Design Generation
, arXiv. Available at: http://arxiv.org/abs/2104.13316 (Accessed: 26 February 2024)
|
Huang, S. et al. (2023)
Diffusion-based Generation, Optimization, and Planning in 3D Scenes
, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada: IEEE, pp. 16750-16761. Available at: https://doi.org/10.1109/CVPR52729.2023.01607
|
Jain, A. et al. (2022)
Zero-Shot Text-Guided Object Generation with Dream Fields
, arXiv. Available at: http://arxiv.org/abs/2112.01455 (Accessed: 31 March 2024)
|
Lin, C., Gao, J., Tang, L., Takikawa, T., Zeng, X., Huang, X., Kreis, K., Fidler, S., Liu, M., & Lin, T. (2022)
Magic3D: High-Resolution Text-to-3D Content Creation
, arXiv. /abs/2211.10440
|
Liu, D. et al. (2024)
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models
, arXiv. Available at: http://arxiv.org/abs/2402.03327 (Accessed: 12 March 2024)
|
Poole, B. et al. (2022)
DreamFusion: Text-to-3D using 2D Diffusion
, arXiv. Available at: http://arxiv.org/abs/2209.14988 (Accessed: 21 February 2024)
|
Sun, C. et al. (2023)
3D-GPT: Procedural 3D Modelling with Large Language Models
, arXiv. Available at: http://arxiv.org/abs/2310.12945 (Accessed: 25 February 2024)
|
Tan, W. et al. (2024)
Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study
, arXiv. Available at: http://arxiv.org/abs/2403.03186 (Accessed: 16 March 2024)
|
Weinzapfel, G. and Negroponte, N. (1976)
Architecture-by-yourself
, Proceedings of the 3rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '76), Philadelphia, Pennsylvania: ACM Press, pp. 74-78. Available at: https://doi.org/10.1145/563274.563290
|
Wu, J., Zhang, C., Xue, T., Freeman, W. T., & Tenenbaum, J. B. (2016)
Learning a probabilistic latent space of object shapes via 3D Generative-Adversarial Modelling
, arXiv. Available at: https://doi.org/10.48550/arxiv.1610.07584
|
Xu, J. et al. (2023)
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
, arXiv. Available at: http://arxiv.org/abs/2212.14704 (Accessed: 31 March 2024)
|
Yu, Y. et al. (2024)
Affordable Generative Agents
, arXiv. Available at: http://arxiv.org/abs/2402.02053 (Accessed: 25 February 2024)
|
Zeng, X., Vahdat, A., Williams, F., Gojcic, Z., Litany, O., Fidler, S., & Kreis, K. (2022)
LION: Latent Point Diffusion Models for 3D Shape Generation
, arXiv. /abs/2210.06978
|
Zhao, L. et al. (2021)
3DVG-Transformer: Relation Modelling for Visual Grounding on Point Clouds
, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada: IEEE, pp. 2908-2917. Available at: https://doi.org/10.1109/ICCV48922.2021.00292
|
Zhong, X., Koh, I. and Fricker, P. (2023)
Building-GNN: Exploring a co-design framework for generating controllable 3D building prototypes by graph and recurrent neural networks
, eCAADe 2023: Digital Design Reconsidered, Graz, Austria, pp. 431-440. Available at: https://doi.org/10.52842/conf.ecaade.2023.2.431
|
last changed |
2024/11/17 22:05 |
|