id |
caadria2024_449 |
authors |
Xie, Yuchen, Li, Yunqin, Zhang, Jiaxin, Zhang, Jiahao and Kuang, Zheyuan |
year |
2024 |
title |
Analysis of Differences in Street Visual Walkability Perception Between DCNN and ViT Model Based on Panoramic Street View Images |
source |
Nicole Gardner, Christiane M. Herr, Likai Wang, Hirano Toshiki, Sumbul Ahmad Khan (eds.), ACCELERATED DESIGN - Proceedings of the 29th CAADRIA Conference, Singapore, 20-26 April 2024, Volume 2, pp. 29–38 |
doi |
https://doi.org/10.52842/conf.caadria.2024.2.029 |
summary |
In measuring urban street Visual Walkability Perception (VWP) from Street View Images (SVIs), the VWP classification deep multitask learning (VWPCL) model based on a Deep Convolutional Neural Network (DCNN) shows notable deficiencies in recognising local features within panoramic images. To address this, the study introduces a Vision Transformer (ViT)-based VWPCL model and compares its performance with the DCNN using several methods. First, we assess basic accuracy and validity using conventional metrics such as recall and precision. Second, we apply SHAP, an interpretable machine learning method, to analyse the significance and contribution of streetscape elements. Finally, we use Grad-CAM to compare the classification results and salient features of panoramic SVIs captured from different angles at the same location, further visualising and explaining the differences in the feature elements that drive each model's classification. Findings show that the ViT-based VWPCL model mitigates image distortions in panoramic SVIs and achieves higher accuracy than the traditional DCNN framework, aligning more closely with human visual cognition. The primary contribution of this study lies in qualitatively and quantitatively comparing the performance disparities between ViT and DCNN for street VWP. |
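As a rough illustration of the comparison pipeline the summary describes, the sketch below scores one street view image with a DCNN and a ViT backbone and then visualises the DCNN's evidence with a hand-rolled Grad-CAM. This is a minimal sketch, not the authors' code: the backbones (torchvision's resnet50 and vit_b_16), the two-class walkability head, and the random tensor standing in for a preprocessed panoramic SVI are all illustrative assumptions, and the SHAP step is omitted.

import torch
import torch.nn.functional as F
from torchvision import models

def build_models(num_classes=2):
    # Illustrative backbones; the paper's exact architectures are not specified here.
    dcnn = models.resnet50(weights=None)
    dcnn.fc = torch.nn.Linear(dcnn.fc.in_features, num_classes)
    vit = models.vit_b_16(weights=None)
    vit.heads.head = torch.nn.Linear(vit.heads.head.in_features, num_classes)
    return dcnn.eval(), vit.eval()

def grad_cam(dcnn, x, class_idx):
    # Grad-CAM over the ResNet's last convolutional stage (layer4):
    # capture the stage's activations and the gradient flowing back into them.
    feats, grads = {}, {}
    h1 = dcnn.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = dcnn.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    logits = dcnn(x)
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()
    w = grads["a"].mean(dim=(2, 3), keepdim=True)   # global-average-pooled gradients
    cam = F.relu((w * feats["a"]).sum(dim=1))       # weighted feature sum, ReLU
    return cam / (cam.max() + 1e-8)                 # coarse saliency map in [0, 1]

dcnn, vit = build_models()
x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed panoramic SVI crop
with torch.no_grad():
    p_vit = vit(x).softmax(-1)   # ViT class probabilities
p_dcnn = dcnn(x).softmax(-1)     # DCNN class probabilities (graph kept for Grad-CAM)
cam = grad_cam(dcnn, x, class_idx=int(p_dcnn.argmax()))

Upsampled to the image size and overlaid on the SVI, the cam map plays the role of the Grad-CAM visualisations compared in the paper; a ViT variant would additionally need a token-to-grid reshape of its attention-block activations.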
keywords |
visual walkability perception, panoramic street view images, deep convolutional neural network, vision transformer, grad-cam |
series |
CAADRIA |
email |
|
full text |
file.pdf (3,076,252 bytes) |
references |
Bazi, Y., Bashmal, L., Rahhal, M. M. A., Dayil, R. A., & Ajlan, N. A. (2021). Vision Transformers for Remote Sensing Image Classification. Remote Sensing, 13(3), 516. https://doi.org/10.3390/rs13030516
|
Fan, Z., Zhang, F., Loo, B. P. Y., & Ratti, C. (2023). Urban visual intelligence: Uncovering hidden city profiles with street view images. Proceedings of the National Academy of Sciences, 120(27), e2220417120. https://doi.org/10.1073/pnas.2220417120
|
Huang, Y., Zhang, F., Gao, Y., Tu, W., Duarte, F., Ratti, C., Guo, D., & Liu, Y. (2023). Comprehensive urban space representation with varying numbers of street-level images. Computers, Environment and Urban Systems, 106, 102043.
|
Li, Y., Yabuki, N., & Fukuda, T. (2022). Measuring visual walkability perception using panoramic street view images, virtual reality, and deep learning. Sustainable Cities and Society, 86, 104140.
|
Li, Y., Yabuki, N., Fukuda, T., & Zhang, J. (2020). A big data evaluation of urban street walkability using deep learning and environmental sensors: a case study around Osaka University Suita campus. Proceedings of the 38th eCAADe Conference, TU Berlin, Berlin, Germany, 319-328.
|
Lu, Z., Xie, H., Liu, C., & Zhang, Y. (2022). Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets. Advances in Neural Information Processing Systems, 35, 14663-14677.
|
Zhao, C., Ogawa, Y., Chen, S., Oki, T., & Sekimoto, Y. (2023). Quantitative land price analysis via computer vision from street view images. Engineering Applications of Artificial Intelligence, 123, 106294. https://doi.org/10.1016/j.engappai.2023.106294
|
last changed |
2024/11/17 22:05 |
|