DOI: 10.5281/zenodo.19847402.
Vol. 57 (2026), pp. 90–99
RESEARCH ARTICLE
Research on Similarity Matching of Dazu Rock Carvings in China Based on a Trimodal Fusion of pHash, SIFT, and MSRoVGG
Xue Yang, Jie Tang *
Faculty of Humanities and Social Sciences, Macao Polytechnic University, Macao, China
* p2512711@mpu.edu.mo (corresponding author)
Abstract
Song Dynasty grottoes, represented by the Dazu Rock Carvings, exhibit pronounced artistic standardization and stylistic convergence. However, conventional similarity assessments of cultural relics remain heavily reliant on qualitative expert experience, which often results in inconsistent evaluations and lacks the quantitative rigor necessary for refined digital conservation and restoration. To address this, we propose a quantitative evaluation model based on trimodal feature fusion. The framework first employs the Perceptual Hash (pHash) algorithm for global structural screening of orthophotos, followed by the integration of the SIFT operator to extract high-dimensional texture features and perform local keypoint matching. Finally, an improved MSRoVGG deep learning architecture is incorporated to achieve deep feature verification at the semantic level. Using the Seven Buddhas of the Present Kalpa in the Dazu Rock Carvings as a case study, the results demonstrate a 100% recognition rate for identical statue types within the same niche. Furthermore, the model successfully discriminates between different iconographic types, effectively capturing semantic variations in artistic styles and vestimentary features between Buddhist and secular figures. These findings confirm the model's robust capability in typological classification and stylistic differentiation.
Keywords
Dazu Rock Carvings, China, Similarity Matching, pHash, SIFT, MSRoVGG, Trimodal Fusion.
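The first stage named in the abstract, global structural screening with a perceptual hash, can be illustrated with a minimal sketch. It follows one common pHash variant (downsample, 2D DCT, threshold the low-frequency block at its median) and is not the authors' implementation: the function names `dct_matrix`, `phash_bits`, and `hamming`, the 32×32 working size, the 8×8 hash size, and the block-average downsampling are all assumptions made for illustration, and random arrays stand in for orthophotos.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale
    return m


def phash_bits(gray: np.ndarray, hash_size: int = 8, factor: int = 4) -> np.ndarray:
    """Return hash_size**2 boolean bits for a 2-D grayscale image.

    Assumes both image dimensions are at least hash_size * factor.
    """
    side = hash_size * factor        # 32 with the default (assumed) settings
    h, w = gray.shape
    rows = np.arange(side + 1) * h // side
    cols = np.arange(side + 1) * w // side
    # Crude block-average downsampling to side x side.
    small = np.array([
        [gray[rows[r]:rows[r + 1], cols[c]:cols[c + 1]].mean() for c in range(side)]
        for r in range(side)
    ])
    d = dct_matrix(side)
    freq = d @ small @ d.T           # separable 2D DCT
    low = freq[:hash_size, :hash_size]  # keep only low frequencies
    return (low > np.median(low)).ravel()


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))


# Synthetic stand-ins for two orthophotos (hypothetical data).
rng = np.random.default_rng(0)
img_a = rng.random((64, 64)) * 255.0
img_b = rng.random((64, 64)) * 255.0

print(hamming(phash_bits(img_a), phash_bits(img_a)))  # 0: identical input
print(hamming(phash_bits(img_a), phash_bits(img_b)))  # unrelated inputs differ in many bits
```

A small Hamming distance between two hashes suggests similar global structure, which is why such a hash suits a coarse screening step before the more expensive SIFT matching and deep feature verification. Any distance threshold used for screening would have to be tuned on real data.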
Cite as
Yang, X.; J. Tang. 2026. Research on Similarity Matching of Dazu Rock Carvings in China Based on a Trimodal Fusion of pHash, SIFT, and MSRoVGG. Arqueología Iberoamericana 57: 90–99.
Received: March 16, 2026. Accepted: April 21, 2026. Published: April 30, 2026.
About the authors
Xue Yang (yangxue@cqcet.edu.cn) is a faculty member at Chongqing Polytechnic University of Electronic Technology. She is currently pursuing her doctoral degree in Cultural Heritage and Anthropology at Macao Polytechnic University. Her primary research interests include the restoration and communication of cultural heritage. She is committed to applying digital methods to the transformation, preservation, and dissemination of cultural symbols.
Jie Tang (p2512711@mpu.edu.mo) is an Associate Professor and Master's Supervisor at Guangxi University for Nationalities. She is currently pursuing her doctoral degree in Cultural Heritage and Anthropology at Macao Polytechnic University. Her research focuses on digital communication studies, with a specific emphasis on the integration of digital technology with intangible cultural heritage.
Data Availability
The data used and analyzed in this study were sourced from the Academy of Dazu Rock Carvings and are available from the author upon reasonable request.
© 2026 ARQUEOLOGIA IBEROAMERICANA. ISSN 1989-4104. CC BY 4.0 License.
Open Access Journal. Edited & Published by Pascual Izquierdo [P. I. Egea]. Graus & Gargallo, Aragon, Spain.