Before image-segmentation and -retrieval technology was integrated into architectural design and urban development, architects who wanted a comprehensive understanding of the overall design schemes and aesthetic styles of integrated architectural environments typically relied on discussions with peers and on the relevant literature. These conventional methods, however, proved inadequate for meeting the sensory requirements that urban design places on architectural style and green environments. Image-segmentation and -retrieval technology offers a new solution to this problem. For architects, segmenting and retrieving architectural images is a better way to acquire knowledge about integrated architectural environment design, which in turn supports the development of green cities. The key task therefore becomes the construction of an intelligent and efficient architectural image-segmentation and -retrieval model.
In recent years, deep learning–based semantic image segmentation has been applied widely across many domains. It is employed primarily to address issues such as fuzzy boundaries, low precision, and low resolution in images. When image-segmentation techniques are applied to architectural images, the model is expected not only to delineate specific architectural features accurately and to refine architectural categories but also to help designers obtain more precise design solutions.
Deep learning–based semantic segmentation of images (Ulku and Akagündüz, 2022; Hemamalini et al., 2022) has been widely adopted across domains, where it mitigates problems such as blurring and low resolution. Erdi et al. (1997) introduced an end-to-end neural network for semantic image segmentation. Li et al. (2019) proposed a U-Net structure based on fully convolutional networks (FCNs) that is better suited to fine-grained image processing. Unlike FCN, which fuses features by summation, U-Net applies repeated downsampling and upsampling operations to acquire high-level semantic information gradually. It also incorporates skip connections, which concatenate feature maps of the same resolution along the channel dimension, thereby enhancing feature fusion and significantly improving segmentation performance. Although U-Net has been successful in image segmentation, its limited ability to extract detailed contextual information has prompted a series of U-Net variants. For instance, Duan et al. (2018) designed a lightweight SegNet model that introduces a novel upsampling method for efficient image segmentation.
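The contrast between summation-based fusion and concatenation-based skip connections can be illustrated with a small sketch. The code below is not taken from any of the cited works; it uses plain Python lists as toy feature maps, and the helper names (`constant_map`, `shape`, `fuse_by_sum`, `fuse_by_concat`) are hypothetical. It only shows how each fusion rule affects the channel count.

```python
# Illustrative sketch only: a feature map is a list of channels,
# and each channel is a 2-D list of floats (height x width).

def constant_map(channels, height, width, value):
    """Build a toy feature map filled with one value (hypothetical helper)."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]

def shape(fmap):
    """Return (channels, height, width) of a toy feature map."""
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))

def fuse_by_sum(decoder, encoder):
    """Summation-style fusion: element-wise addition, channel count unchanged."""
    if shape(decoder) != shape(encoder):
        raise ValueError("summation requires identical shapes")
    return [
        [[d + e for d, e in zip(d_row, e_row)] for d_row, e_row in zip(d_ch, e_ch)]
        for d_ch, e_ch in zip(decoder, encoder)
    ]

def fuse_by_concat(decoder, encoder):
    """Concatenation-style skip connection: channels are stacked, so counts add."""
    if shape(decoder)[1:] != shape(encoder)[1:]:
        raise ValueError("concatenation requires matching height and width")
    return decoder + encoder

# Assumed toy sizes: 2 channels of 2 x 2 on each path.
decoder_features = constant_map(2, 2, 2, 1.0)
encoder_features = constant_map(2, 2, 2, 0.5)

summed = fuse_by_sum(decoder_features, encoder_features)
concatenated = fuse_by_concat(decoder_features, encoder_features)

print(shape(summed))        # (2, 2, 2)
print(shape(concatenated))  # (4, 2, 2)
```

Summation keeps the channel count fixed and mixes the two sources immediately, whereas concatenation preserves both sets of features side by side and leaves the mixing to the subsequent convolution, at the cost of a wider input to that layer.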
The UNet++ network, an extension of U-Net, represents a notable advance in image-segmentation technology. It handles the adaptive selection of sampling depth for different samples and accelerates the extraction of feature information at various levels. Its drawback is a sharp increase in the number of model parameters, which raises computational cost and places a significant demand on GPU resources (Zhou et al., 2018). As network models have deepened, Tan et al. (2021) proposed AcuNet, which uses depthwise separable convolution to reduce the number of model parameters. Trebing et al. (2021) introduced the At-UNet segmentation network, which adds an attention mechanism to U-Net and replaces traditional convolution with depthwise convolution. Cao and Zhang (2020) proposed an updated Res-UNet model for high-resolution image segmentation. He et al. (2020) presented a hybrid attention approach for effective architecture segmentation. Zhao et al. (2022) introduced an Inception v3–based image-segmentation method that improves segmentation accuracy for small target images. Zhao et al. (2017) proposed a pyramid scene-parsing network that integrates contextual information and fully exploits global features for semantic segmentation of diverse scenes. He et al. (2017) introduced Mask R-CNN, which produces high-quality segmentation masks while simultaneously performing object detection.
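The parameter saving attributed above to depthwise separable convolution can be made concrete with a short calculation. This is a minimal sketch under stated assumptions: the layer sizes (a 3 x 3 kernel, 64 input channels, 128 output channels) are chosen for illustration and are not taken from the cited works, bias terms are ignored, and the function names are hypothetical.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2-D convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel_size * kernel_size * in_channels  # one filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 channel mixing
    return depthwise + pointwise

# Assumed example layer: 3 x 3 kernel, 64 -> 128 channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)

print(standard)   # 73728
print(separable)  # 8768
```

For this assumed layer, the separable form needs roughly an eighth of the weights, which is why it is a common choice when deeper networks would otherwise become too costly.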