Integrated physics-machine learning for real-time urban photovoltaic mapping: Coupling local climate zones with 3D building models
Xingkang Chai, Jiayu Chen, Chunying Li, Pengyuan Shen, Yuqin Wang, Yang Wan, Siyuan Chen, Haida Tang
2025
Sustainable Cities and Society

Fig. 1. Workflow of this study.
Summary
This study proposes an integrated physics-machine learning framework for hourly urban photovoltaic (PV) mapping by coupling Local Climate Zones (LCZ) with 3D building models. Using the XGBoost algorithm on data from Shenzhen, the model achieved an $R^2$ of 0.99 for predicting both roof and facade PV generation. SHAP analysis identified key meteorological and morphological drivers, revealing that combined roof and facade PV systems could meet 92.86% of Shenzhen's annual electricity consumption, providing a transferable tool for urban energy planning.
Abstract
Adopting building-integrated photovoltaics in cities can alleviate energy shortages. Real-time analysis of climate and urban morphology impacts improves the accuracy of BIPV energy yield predictions for roofs and facades. This study simulated BIPV power generation in 85 realistic local climate zones in Shenzhen, China, including 10,549 buildings. Using the XGBoost algorithm and the Shapley interpretability method, the importance of meteorological parameters and urban morphology on roof and facade PV power generation was evaluated. In the generalization performance test, XGBoost achieved R² values of 0.99 in predicting roof and facade PV power generation within a 100-meter resolution grid, with RMSE values of 7.5 kW and 13.3 kW, respectively, demonstrating excellent performance. Global horizontal irradiance, diffuse horizontal irradiance, and total building floor area were found to have the most significant impact on roof PV power generation, accounting for 89.8 % of its variation. For the facade PV, global horizontal irradiance, direct radiation, diffuse radiation, average height, total building surface area, total building area, sky view factor, total building floor area of the computation model and height standard deviation of the shading area were the most influential factors, collectively accounting for 87.5 % of its variation. Finally, based on Shenzhen's building data and high temporal and spatial resolution NSRDB data, the hourly urban building PV potential of roofs and facades in Shenzhen was assessed. When PV systems are installed on roofs, facades, and both combined, the annual power generation could reach 45,256.1 GWh, 39,919.2 GWh, and 85,175.3 GWh, respectively, accounting for 49.34 %, 43.52 %, and 92.86 % of annual urban electricity consumption. This study ultimately offers a transferable methodological framework for dynamic, high-precision mapping of urban-scale PV potential. 
The framework, following localization and validation, can be applied to other cities with available 3D building data, thereby providing quantitative recommendations for policymakers and urban planners in developing BIPV cities with high energy resilience and sustainability.
1. Introduction
Cities, as epicenters of global economic activity, face severe challenges related to energy consumption and carbon emissions. Cities occupy only of the Earth’s land area but consume of global energy and emit carbon dioxide (United Nations Environment Programme, 2011). Adopting renewable energy in cities is a key solution to urban energy shortages (Li, Zhang & Liu, 2022). Photovoltaics (PV), an important method of harvesting solar energy, has unique advantages over other renewable energy sources. The sun provides a vast amount of energy, and only a small percentage of it is needed to meet human energy demands (Hu et al., 2015). However, solar energy resources are not as energy-dense as traditional fossil fuels, so large installation areas are needed to collect sufficient energy (Liu, Liu, Jiang & Zhang, 2022). The adoption of building-integrated photovoltaics (BIPV) in cities has attracted global attention (Pepermans, Driesen & Haeseldonckx, 2005). Studies have shown that of Europe’s electricity demand can be met by installing PV systems on of existing building roofs and of building facades (Ghosh, 2020). Therefore, adopting BIPV in cities can help alleviate urban energy shortages.
Current studies on urban-scale BIPV power generation mainly focus on two aspects. The first is the use of satellite imagery combined with image recognition technology. This approach evaluates the roof area in cities to estimate roof BIPV power generation potential. Zhong et al. and Lee et al. applied deep learning and satellite image recognition to study building roofs in specific regions and assessed the potential for roof BIPV power generation in these areas (Zhong, Zhang & Chen, 2021), (Lee, Iyengar & Feng, 2019). Zhang et al. used a random forest model to extract roof areas from satellite images in 354 cities in China and evaluated the carbon reduction potential of urban PV roofs (Zhang, Chen & Zhong, 2023). However, there are still limitations in using remote sensing technology to assess urban BIPV potential. First, satellite images cannot identify building facades. With the development of flexible PV modules and the gradual improvement of PV efficiency, facade PV systems are also advancing, and BIPV on urban facades is becoming increasingly economically viable (Liu, Shen & Wang, 2023). Additionally, because roof area in cities is limited, urban facades often provide more available surface area, so installing PV systems on facades can generate more electricity (Tao, Wang & Xiang, 2024). Second, satellite images cannot effectively represent the shading relationships between buildings. Building shading is a critical factor that affects the power generation potential of facade BIPV systems.
The second aspect is the assessment of urban BIPV potential by establishing realistic 3D urban models and integrating building morphology indicators. Liu et al. identified nine residential building morphologies and proposed urban energy-saving design strategies using Ladybug and multiobjective optimization plugins (Liu, Xu & Zhang, 2023), (Liu, Xu & Huang, 2023). Xu et al. extracted seven industrial building morphologies and evaluated their radiation, installation, and technical potential using Ladybug (Xu, Jiang & Xiong, 2021). Xie et al. analyzed five types of university dormitory morphologies, simulating BIPV potential and building energy consumption with Ladybug, to explore the influence of building morphology indicators on BIPV potential and energy use (M Xie, Wang & Zhong, 2023). Zhang et al. studied six urban block morphologies and simulated BIPV potential and energy consumption using Ladybug (Zhang, Xu & Shabunko, 2019). However, these studies are limited to simulations at the district scale, making it difficult to apply them to the evaluation of roof and facade PV power generation at the urban scale. Their shading models also often rely on idealized shading scenarios. Another approach uses parametric methods to generate simulated models and shading models for BIPV generation simulations. Tian et al. used Rhino-Grasshopper parametric software to generate numerous building and shading models, assessing BIPV potential at the building scale (Tian & Ooka, 2025), (Tian & Ooka, 2024). Tang et al. constructed representative Local Climate Zone (LCZ) areas based on LCZ indicator ranges and simulated BIPV generation using idealized shading buildings (Tang, Chai & Chen, 2025). Nevertheless, because they rely on parametric methods to generate simulated buildings and shading models, these studies fail to fully capture the complexity of real urban environments.
Combining Local Climate Zone (LCZ) classification with real buildings for BIPV power generation simulation can effectively resolve the aforementioned issues. LCZ classification describes various urban characteristics, including building height and urban morphology (Stewart & Oke, 2012), (Cao, Liao & Li, 2023), (Yan, Ma & He, 2022). Creating 3D models of buildings within LCZs makes it possible to accurately evaluate BIPV power generation potential through simulation (Li et al., 2025, September), (An, Chen & Shi, 2023), (Machete, Falcão & Gomes, 2018). However, due to the large scale of urban areas, assessing urban-scale BIPV power generation using physical models faces significant challenges. Machine learning methods can be coupled with 3D urban building models to estimate urban-scale BIPV power generation (Chen, Tu & Yu, 2024). Existing studies have explored the integration of LCZs with urban BIPV power generation. Kaleshwarwar et al. used LCZ to assess regional BIPV potential. However, the shading effect of surrounding buildings was not considered in the evaluation of BIPV power generation at the city scale (Kaleshwarwar & Bahadure, 2023). Chen et al. applied LCZ classification to select representative LCZ areas, created 3D models, and assessed the roof and facade solar radiation potential of different LCZs in Shenzhen (Machete, Falcão & Gomes, 2018). However, this study did not investigate the relationship between urban morphology and roof and facade solar radiation potential, nor did it account for shading from buildings outside the selected areas. Chen et al. compared roof types in 26 global cities and used LCZ classification to identify LCZ built-up areas with higher solar radiation potential (Chen & Gou, 2024). Nevertheless, this study did not evaluate solar radiation potential at the urban scale or analyze the impact of urban morphology on solar radiation potential. Due to the temporal mismatch between BIPV power generation and building energy consumption, the concept of building energy resilience has been proposed (Chen, He & Li, 2024). It highlights the need for BIPV power generation predictions at finer temporal scales and at the urban scale (Luo, Peng & Cao, 2022), (Cai & Gou, 2024), (Tang, Wang & Li, 2025). However, existing studies primarily focus on the simulation and prediction of annual BIPV power generation. They lack simulations and predictions at an hourly scale, which are essential for capturing the temporal variability of BIPV power generation and its integration into urban energy systems.
Existing studies explore and emphasize the relationship between urban and building morphology and roof BIPV power generation. However, most of these studies rely on constructing prototypes and simulating ideal shading scenarios without the complexity of real urban environments. Most studies also use typical meteorological year (TMY) data for predicting urban-scale BIPV power generation, an approach that cannot fully reflect the deviations in BIPV power generation caused by varying meteorological conditions in urban areas. In addition, as energy strategies become more refined, predicting hourly BIPV power generation at the urban scale is becoming increasingly important. To address these gaps, this study proposes a novel methodology. First, based on real urban 3D buildings, Shenzhen’s built-up areas were classified into LCZ types, and areas of each type were randomly selected as computation models. GIS software was then used to identify shading areas around these computation models. Both computation models and shading areas were modeled in Rhino-Grasshopper to simulate hourly roof and facade BIPV power generation. The urban morphology of the computation models and shading areas was quantified and, together with meteorological parameters (8760 h), used to train the XGBoost model. The SHAP method was used to explain the relationships between the variables and the model output. Finally, hourly, 2-km resolution data from the National Solar Radiation Database (NSRDB) were used to evaluate Shenzhen’s hourly roof and facade BIPV power generation. The innovations of this study are as follows:
(1) A Novel Framework for Dynamic, High-Resolution Prediction. This study proposes a methodology to shift the paradigm from traditional static, long-term (e.g., annual or monthly) assessments to dynamic, hourly-scale predictions of city-level PV generation. This high temporal resolution is crucial for effective integration with modern energy systems, such as smart grids and V2G networks.
(2) High-Fidelity Urban Environmental Modeling. Our approach introduces a modeling process grounded in real-world urban morphology. Unlike studies using idealized or parametric prototypes, this research systematically incorporates the shading effects from the complex surrounding urban fabric, thereby capturing a more realistic and intricate physical environment for PV simulation.
(3) An Integrated Physics-Informed Machine Learning Methodology. We propose a hybrid methodology that synergistically combines physics-based simulations (derived from high-fidelity 3D models) with a powerful machine learning algorithm (XGBoost). This integrated framework leverages the descriptive accuracy of physical modeling with the predictive efficiency and power of machine learning to estimate urban-scale PV potential.
(4) Interpretable Modeling for Urban Design Guidance. The study moves beyond "black-box" prediction by employing the SHAP (SHapley Additive exPlanations) method to deconstruct the model’s logic. This allows for a systematic analysis of the complex and non-linear relationships between key urban morphological indicators, meteorological variables, and hourly PV generation, offering nuanced and data-driven insights for urban planning.
While this study uses Shenzhen as a case, its primary contribution is a transferable methodological framework. By leveraging the globally standardized Local Climate Zone (LCZ) classification (Stewart & Oke, 2012), a modular workflow, and increasingly accessible global datasets like the NSRDB (Rodríguez-Pérez & Bajorath, 2020), this framework provides a clear pathway for broader regional application. It is important to note, however, that applying this framework to a new city requires localization and further validation. This involves adapting the model with local 3D building data and meteorological conditions, and subsequently retraining the machine learning model to ensure predictive accuracy in the new context.
The structure of this paper is as follows: Section 2 introduces the workflow of the study, the division of LCZ in Shenzhen, the selection of real LCZ areas and shading areas, the machine learning model, and the NSRDB data. Section 3 discusses the simulation of BIPV power generation in computation models and the SHAP analysis. Section 4 summarizes the impact of meteorological parameters and urban morphology on roof and facade BIPV power generation. It also provides limitations and future work. Section 5 concludes the main findings.
2. Methodology and materials
2.1. Workflow
Fig. 1 illustrates the workflow of this study. The assessment of urban-scale BIPV power generation potential using the LCZ classification method involves the following steps:
- First, a 100-meter resolution grid of Shenzhen was created. Based on Shenzhen’s building data, ArcGIS Pro 3.1.0 was used to calculate the main parameter indicators of the built-up areas. A distribution map of Shenzhen’s built-up LCZ areas was then generated based on the calculated parameters.
- For each LCZ type, 10 real areas were randomly selected as computation models. Additionally, a 100-meter buffer zone around each computation model was selected as the shading area. Both the computation models and the shading areas were imported into Rhino-Grasshopper to build 3D models. The Ladybug tool was used to calculate hourly solar radiation on roofs and facades, which was then converted into hourly roof and facade PV power generation.
- The building morphology indicators of both the computation models and shading areas were calculated. The urban morphology parameters of both the computation models and shading areas, along with the 8760 h of meteorological data, were collectively treated as independent variables, while the total BIPV power generation of the computation models was taken as the dependent variable to construct a comprehensive dataset.
- Among the selected computation models, one computation model was randomly chosen for each LCZ type to validate the generalization capability of the model. The remaining computation models were included in the training set for model training. Three models, Multiple Linear Regression (MLR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained. Based on the evaluation metrics for the training data, testing data, and generalization ability, XGBoost demonstrated the best performance among the three models. The SHAP method was then used for interpretability analysis.
- For all 100-meter grids in Shenzhen, the morphological indicators of the computation models and shading areas, together with the NSRDB 2-km hourly meteorological data, were used as inputs. The BIPV power generation for 8760 h in Shenzhen was then estimated using the trained XGBoost model.
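The hold-out step in the workflow above can be sketched as follows. This is a minimal illustration, not the authors' code: the model identifiers, the random seed, and the use of NumPy are assumptions, and the 80/20 ratio is inferred from the row counts reported in Section 2.3 (532,608 and 133,152 out of 665,760).

```python
import numpy as np

HOURS = 8760  # hourly rows per computation model

# Computation models per LCZ type as described in the paper:
# LCZ7 is absent and LCZ3 contributes only 5 models.
models_per_lcz = {1: 10, 2: 10, 3: 5, 4: 10, 5: 10, 6: 10, 8: 10, 9: 10, 10: 10}

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# Hypothetical model ids of the form (lcz_type, index).
all_models = [(lcz, k) for lcz, n in models_per_lcz.items() for k in range(n)]

# Hold out one randomly chosen model per LCZ type for the generalization test.
holdout = {(lcz, int(rng.integers(n))) for lcz, n in models_per_lcz.items()}
remaining = [m for m in all_models if m not in holdout]

# Random 80/20 split of the hourly rows of the remaining models.
n_rows = len(remaining) * HOURS
perm = rng.permutation(n_rows)
n_train = int(round(0.8 * n_rows))
train_idx, test_idx = perm[:n_train], perm[n_train:]

print(len(all_models), len(holdout), len(remaining))
print(len(holdout) * HOURS, len(train_idx), len(test_idx))
```

Holding out whole computation models, rather than random rows, means the generalization test uses urban morphologies the model has never seen.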
2.2. BIPV power calculation of selected LCZ models
Shenzhen is located in the Pearl River Delta region and has a subtropical monsoon climate, with an average annual temperature of C and precipitation of 1933 mm (Li, Yu & Jiang, 2019). Since the 1970s, the city has been designated as a special economic zone and has undergone rapid urbanization (Hao, Sliuzas & Geertman, 2011), (Liu, He & Wu, 2010). In 2022, the urbanization rate of Shenzhen reached . Shenzhen was chosen as the study area because it has abundant solar energy resources and a significant demand for cooling energy.
LCZ is a scheme for defining urban landscapes, offering a comprehensive classification strategy that considers land cover and physical features (Demuzere, Kittner & Bechtel, 2021). Notably, LCZ is universal for any city, allowing a city to be divided into combinations of different LCZs (Hashemi, Mills & Poerschke, 2024). In addition, each LCZ type has a specific range of parameters that can be used for physically based modelling (Ching, Aliaga & Mills, 2019), (Demuzere, Hankey & Mills, 2020). As shown in Fig. 2, the LCZ scheme has 10 categories for built types. These areas have consistent surface cover, structure, materials, and human activities. Based on Shenzhen’s boundary data, this study used ArcGIS Pro 3.1.0 to create a 100-meter grid dataset for Shenzhen as the basic analysis unit. Using Shenzhen’s building data, the average building height, building surface fraction, and sky view factor within each grid were calculated to classify Shenzhen’s Local Climate Zones. After completing the LCZ classification (Fig. 3), the "Random Selection Within Subset" tool in QGIS was used to randomly select 10 models for each LCZ type. Because LCZ7 was absent from the classified areas and LCZ3 contained only 5 models, 85 computation models, comprising 1778 buildings, were selected. Subsequently, ArcGIS Pro 3.1.0 was used to create a 100-meter radius buffer zone for each computation model. These buffer zones were intersected with Shenzhen’s building vector data to obtain 85 shading areas comprising 8771 buildings.
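The buffer step can be illustrated with a simplified sketch. The paper performs it in ArcGIS Pro on building polygons; the version below instead uses building centroids and a point-to-rectangle distance, which is an assumption made only to keep the example dependency-free. The function names, building ids, and coordinates are hypothetical.

```python
import math

def distance_to_cell(x, y, xmin, ymin, xmax, ymax):
    """Euclidean distance from a point to an axis-aligned grid cell (0 if inside)."""
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.sqrt(dx * dx + dy * dy)

def split_buildings(centroids, cell, buffer_m=100.0):
    """Separate buildings inside the computation model from those in its shading buffer."""
    inside, shading = [], []
    for building_id, (x, y) in centroids.items():
        d = distance_to_cell(x, y, *cell)
        if d == 0.0:
            inside.append(building_id)      # part of the computation model
        elif d <= buffer_m:
            shading.append(building_id)     # within the 100 m shading buffer
    return inside, shading

# A 100 m x 100 m grid cell in projected coordinates (metres), with toy centroids.
cell = (0.0, 0.0, 100.0, 100.0)
centroids = {"A": (50.0, 50.0), "B": (150.0, 50.0), "C": (-60.0, -70.0), "D": (260.0, 50.0)}
inside, shading = split_buildings(centroids, cell)
print(inside, shading)
```

A polygon-based intersection, as used in the paper, would also capture buildings whose footprint only partly overlaps the buffer.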
This study evaluated the BIPV power generation potential of different LCZs while considering the factors that limit BIPV power generation. According to previous research (Li et al., 2025, September), solar energy potential is primarily considered from three aspects: radiation potential, installation potential, and technical potential. Radiation potential refers to the distribution of solar radiation on building surfaces. It is influenced by local solar radiation resources and the morphology of urban blocks. Fig. 4 visualizes roof and facade solar radiation simulations. Installation potential refers to the distribution of solar radiation on PV panels installed on building surfaces. It is affected by the PV installation coefficient. Technical potential refers to the overall power generation efficiency of the PV system, which is primarily influenced by the photoelectric conversion rate and system efficiency coefficient (Campbell, Aschenbrenner & Blunden, 2008).
This study imported the 85 computation models and their shading areas into Rhino and distinguished between roofs and facades. Then, using the Ladybug plugin on the Grasshopper platform, the hourly solar radiation potential of roofs and facades was calculated separately. This tool can calculate the solar radiation on different building surfaces while taking into account the obstructions caused by the shading areas. The accuracy of solar radiation simulation using Ladybug has been validated in many studies. The simulation settings are shown in Table 1.
$$P = I \cdot A \cdot K \cdot \eta \qquad (1)$$

where $P$ is the PV power generation, kW; $I$ is the cumulative solar radiation on the building surface, kW; $A$ is the area where PV modules can be installed, m²; $K$ is the comprehensive efficiency factor, which is set at (Kumar & Kumar, 2017); $\eta$ is the PV module efficiency, which is set at (M Xie, Wang & Zhong, 2023).
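As a small worked illustration, the variable definitions above can be read as a product of incident radiation, installable area, and the two efficiency terms. That product form is an assumption about Eq. (1), as is treating the radiation term as a per-square-metre quantity. The function name and all numeric values below are hypothetical placeholders, not the paper's settings.

```python
def pv_power_kw(irradiance_kw_per_m2, install_area_m2, k_factor, module_efficiency):
    """PV output as irradiance x installable area x system factor x module efficiency.

    Assumed product form; the parameter names are illustrative only.
    """
    return irradiance_kw_per_m2 * install_area_m2 * k_factor * module_efficiency

# Placeholder inputs (not taken from the paper): 0.8 kW/m2 incident on 500 m2
# of installable surface, K = 0.8, module efficiency = 0.2.
p_kw = pv_power_kw(0.8, 500.0, 0.8, 0.2)
print(round(p_kw, 3))
```

Applying such a function to each hourly radiation value from the simulation would give an hourly generation series for a roof or facade.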
2.3. Data process
Fig. 2. Abridged definitions for LCZ built types (Stewart & Oke, 2012).
Table 2 shows the urban morphology indicators calculated in this study. In previous research, these indicators have been identified as being related to roof and facade PV power generation (Machete, Falcão & Gomes, 2018). Meteorological parameters were sourced from Shenzhen’s TMY data (https://climate.onebuilding.org/WMO_Region_2_Asia/CHN_China/index.html#IDGD_Guangdong-). From the TMY data, 8760 h of temperature (T), dew point temperature (Td), relative humidity (RHU), wind speed (V), global horizontal irradiance (GHI), direct normal irradiance (DNI), diffuse horizontal irradiance (DHI), and sky cover (SC) were extracted. Urban morphology indicators for both computation models and shading areas were calculated, and a PV power generation dataset was constructed using the building morphology parameters and meteorological parameters of the computation models and shading areas. To distinguish between the urban morphology indicators of computation models and shading areas, the suffix "_sm" was added to the indicators of computation models and the suffix "_sd" to the indicators of shading areas. For example, "TH_sm" represents the morphology indicator of a computation model, while "TH_sd" represents the morphology indicator of a shading area. Spearman correlation analysis was used for feature selection, and multicollinearity analysis was conducted to avoid overfitting in the model. Variables with a variance inflation factor (VIF) greater than 10 were removed. Additionally, all independent variables in the model were required to pass a significance test (Wheeler & Tiefelsdorf, 2005). Appendix 1 presents the correlation matrix between the independent variables and BIPV power generation. After preprocessing the data, one computation model was randomly selected for each LCZ type, resulting in 9 computation models and 78,840 rows of data for testing the model’s generalization ability. From the remaining 76 computation models (665,760 rows of data), 80 % of the total sample (532,608 rows) was used as the training dataset. The remaining 20 % (133,152 rows) was used for model validation and comparison.
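The multicollinearity screening can be sketched as below. The paper states only that variables with a variance inflation factor above 10 were removed; dropping the worst feature iteratively, the helper names, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # regress column j on the rest
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_high_vif(X, names, threshold=10.0):
    """Repeatedly drop the feature with the largest VIF until all are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Synthetic example: feature "c" is almost an exact linear combination of "a" and "b".
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.01 * rng.normal(size=500)
X_kept, kept = drop_high_vif(np.column_stack([a, b, c]), ["a", "b", "c"])
print(kept, vif(X_kept))
```

Recomputing the VIFs after each removal avoids discarding several features when removing one of them already resolves the collinearity.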

Fig. 3. Shenzhen 100-m grid LCZ classification.

Fig. 4. Visualization of roof and facade solar radiation simulation of computation models: (a) roof solar radiation simulation; (b) facade solar radiation simulation.
Table 1 Settings of simulation parameters.
The PV power generation potential was calculated with Eq. (1).
2.4. Model
This study used MLR, RF, and XGBoost models to quantify building morphology and meteorological parameters’ impact on roof and facade PV power generation. MLR, a widely used algorithm in BIPV power generation studies (Oukawa, Krecl & Targino, 2022), evaluates the linear relationship between independent and dependent variables. However, due to the nonlinear relationships between building morphology, meteorological parameters, and BIPV power generation, RF and XGBoost were also employed to model these complex interactions. The SHAP (SHapley Additive explanations) method was applied to interpret the models further, providing insights into the contributions of building morphology and meteorological parameters to BIPV power generation.
Assuming there are n variables in the MLR model, denoted as $x_1, x_2, \ldots, x_n$, the regression model can be expressed as shown in Eq. (2):

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_n x_{in} + \varepsilon_i \qquad (2)$$

In the equation, $y_i$ represents the dependent variable value for individual $i$, $\beta_0$ denotes the intercept coefficient, $\beta_1, \beta_2, \ldots, \beta_n$ represent the slope coefficients, and $\varepsilon_i$ is the random error term.
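A minimal sketch of fitting such a linear model by ordinary least squares is given below. The use of NumPy, the function names, and the synthetic data are assumptions; the paper does not state which library was used for MLR.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares; returns [intercept, slope_1, ..., slope_n]."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # leading column of ones for the intercept
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta

def predict_mlr(beta, X):
    """Evaluate the fitted linear model on new rows."""
    return beta[0] + np.asarray(X, dtype=float) @ beta[1:]

# Noise-free synthetic data, so the known coefficients are recovered.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1]
beta = fit_mlr(X, y)
print(np.round(beta, 6))
```

Because the model is linear in every input, it cannot represent the interaction between morphology and irradiance, which is consistent with the weaker MLR scores reported later.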
Random Forest is a powerful machine learning technique that combines multiple decision trees to predict values, offering robustness against overfitting and improving accuracy by averaging predictions from individual trees (Guo, Wu & Schlink, 2021). The Random Forest model was trained in Python, with the optimal hyperparameters determined using grid search and 10-fold cross-validation. During training, the optimal parameter values for rooftop and facade PV power generation were: max_depth , max_features , min_samples_split , and n_estimators .
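The tuning procedure (grid search scored by 10-fold cross-validation) can be sketched independently of the estimator. To stay dependency-free, the example below swaps in ridge regression as a stand-in for the Random Forest; the candidate grid, the seed, and the function names are hypothetical.

```python
import itertools
import numpy as np

def fit_ridge(X, y, alpha):
    """Stand-in estimator: ridge regression with an unpenalized intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)

def predict_ridge(w, X):
    return w[0] + X @ w[1:]

def grid_search_cv(X, y, grid, k=10, seed=0):
    """Return the parameter combination with the lowest mean k-fold RMSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    best_score, best_params = np.inf, None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        fold_rmse = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            w = fit_ridge(X[trn], y[trn], **params)
            err = y[val] - predict_ridge(w, X[val])
            fold_rmse.append(np.sqrt(np.mean(err ** 2)))
        score = float(np.mean(fold_rmse))
        if score < best_score:
            best_score, best_params = score, params
    return best_params, best_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=300)
best_params, best_score = grid_search_cv(X, y, {"alpha": [0.01, 1.0, 1000.0]})
print(best_params, round(best_score, 3))
```

With a tree ensemble, the same loop would iterate over combinations of max_depth, max_features, min_samples_split, and n_estimators instead of a single penalty value.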
Table 2 Urban Morphology Indicators and Calculation Methods.
Note: is the height of the building i; is the number of buildings within the unit block; is the surface area of building i; is the base area of building i; is the number of building floors; is the perimeter of building i; is the area of unit block; RW is the width of road i. CB is the perimeter of an individual building. CN is the total perimeter of a computation model.
Extreme Gradient Boosting (XGBoost) is an advanced boosting algorithm developed by improving Gradient Boosting Decision Trees (GBDT). XGBoost is designed to achieve maximum speed and efficiency (Zamani Joharestani, Cao & Ni, 2019). The XGBoost algorithm, composed of multiple regression trees, integrates homogeneous weak learners to create a more powerful learner (Chen & Guestrin, 2016). This can be represented by Eq. (3):

$$\hat{y}_i = \sum_{m=1}^{M} f_m(x_i) \qquad (3)$$

In the equation, $\hat{y}_i$ represents this study’s predicted BIPV power generation, $x_i$ denotes the explanatory variables, and $M$ is the number of trees. $f_m$ refers to the $m$-th CART tree, constructed to reduce the residuals of the $(m-1)$-th tree. The XGBoost model was trained with the optimal hyperparameters determined using grid search and 10-fold cross-validation. The optimal parameter values for rooftop and facade PV power generation are: max_depth , learning_rate , n_estimators , colsample_bytree , and subsample .
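The additive structure described above, where each tree is fitted to the residuals left by the trees before it, can be illustrated with a toy gradient-boosting loop. This is plain squared-error boosting with one-split stumps, not XGBoost itself: it omits XGBoost's regularization, second-order gradients, and column or row subsampling. All names and settings are hypothetical.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on a 1-D feature x for targets r."""
    best = None
    for t in np.unique(x)[:-1]:          # every threshold leaves both sides non-empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]                      # (threshold, left value, right value)

def predict_stump(stump, x):
    t, lv, rv = stump
    return np.where(x <= t, lv, rv)

def boost(x, y, n_trees=50, learning_rate=0.3):
    """Fit each stump to the current residuals and add it to the ensemble."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - pred)   # residuals of the ensemble so far
        pred = pred + learning_rate * predict_stump(stump, x)
        trees.append(stump)
    return base, learning_rate, trees

def predict(model, x):
    base, learning_rate, trees = model
    return base + learning_rate * sum(predict_stump(s, x) for s in trees)

x = np.linspace(0.0, 1.0, 100)
y = np.sin(2.0 * np.pi * x)
model = boost(x, y)
mse_base = float(np.mean((y - y.mean()) ** 2))
mse_boost = float(np.mean((y - predict(model, x)) ** 2))
print(mse_base, mse_boost)
```

The learning_rate and n_trees arguments play the same roles as the learning_rate and n_estimators hyperparameters named in the text.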
To evaluate the model’s performance, the Coefficient of Determination ($R^2$), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Variation of the Root Mean Square Error (CVRMSE) were used as metrics. Eqs. (4)–(7) give the calculation formulas for $R^2$, RMSE, MAE, and CVRMSE, respectively.
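As a sketch of how these four metrics are commonly computed, the function below uses the standard textbook definitions. These are assumed here rather than copied from the paper's equations, and expressing CVRMSE as a percentage of the observed mean is one common convention.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and CVRMSE (in percent) for paired observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    cvrmse = rmse / float(y_true.mean()) * 100.0   # RMSE relative to the observed mean
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "CVRMSE": cvrmse}

metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

RMSE and MAE carry the unit of the target (kW here), while $R^2$ and CVRMSE are dimensionless and therefore easier to compare between roof and facade models.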
2.5. SHAP model
To explain the XGBoost model, SHAP values are used to visualize the relative importance of each variable, known as the additive feature attribution method (Han, Zhao & Gao, 2022). The most significant advantage of SHAP values is their ability to represent the influence of features on each sample. The model generates a prediction value for every prediction sample, and SHAP values are assigned to each feature in the sample, indicating both positive and negative influences. The SHAP package provides functionalities such as feature importance ranking for all samples, where the X-axis represents SHAP values (absolute values, regardless of positive or negative), and feature density scatter plots, where each point represents a sample. In these scatter plots, the X-axis shows SHAP values (negative SHAP values indicate negative influence, and positive SHAP values indicate positive influence). At the same time, the Y-axis represents the magnitude of the feature variable (Parsa, Movahedi & Taghipour, 2020), (García & Aznarte, 2020), (Rodríguez-P´erez & Bajorath, 2020). This method effectively addresses the issue of multicollinearity and considers the synergistic effects of different variables. The function of SHAP values for tree-based models is described as follows.
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M-|S|-1)!}{M!} \left[ f_x(S \cup \{i\}) - f_x(S) \right]$$

where $\phi_i$ is the SHAP value of feature $i$; $N$ is the set of all features for the training set, with dimension $M$; $S$ is a permutation subset of $N$, with dimension $|S|$; $f_x(S)$ is the average predicted value of samples using the feature set $S$; $f_x(S \cup \{i\})$ is the average predicted value of samples with feature $i$ using the feature set $S$; and $\frac{|S|!\,(M-|S|-1)!}{M!}$ is the weight of the difference between samples with feature $i$ and without feature $i$ using the feature set $S$.
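The Shapley weighting described above can be illustrated by brute-force enumeration over feature subsets. This sketch is exponential in the number of features and replaces "absent" features with a single baseline value, a simplification that differs from the tree-path algorithm used by the SHAP package for tree models. The toy model and all names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values; value_fn maps a set of present feature indices to a prediction."""
    M = n_features
    phi = [0.0] * M
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for subset in combinations(others, size):
                S = set(subset)
                phi[i] += weight * (value_fn(S | {i}) - value_fn(S))
    return phi

# Toy model with one additive term and one interaction term.
def model(z):
    return 2.0 * z[0] + z[1] * z[2]

x = (1.0, 2.0, 3.0)          # instance to explain
baseline = (0.0, 0.0, 0.0)   # values used for features outside the subset

def value_fn(S):
    z = [x[j] if j in S else baseline[j] for j in range(len(x))]
    return model(z)

phi = shapley_values(value_fn, 3)
print(phi, sum(phi), model(x) - model(baseline))
```

The contributions sum to the difference between the prediction for the instance and the baseline prediction, and the interaction term is shared equally between the two features involved.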
2.6. Evaluation of Shenzhen’s BIPV potential based on NSRDB data
The NSRDB provides continuous and comprehensive solar and meteorological data, including the three most common solar radiation measurements: GHI, DNI, and DHI. These data are collected in the United States and an increasing number of international locations, with high temporal (10-minute) and spatial (2-km) resolution, accurately representing global and regional solar radiation climates (Sengupta, Xie & Lopez, 2018). The current NSRDB uses the National Renewable Energy Laboratory’s Physical Solar Model (PSM). PSM is a two-step physical modeling process: in the first step, cloud and aerosol properties are acquired, collected, and resampled, and in the second step, these properties are input into a radiative transfer model. This model includes the Fast All-Sky Radiation Model for Solar Applications (FARMS) and FARMS-NIT (Narrowband Irradiance on Tilted Surfaces) for tilted surface irradiance (Y Xie, Sengupta & Dooraghi, 2018), (Xie, Sengupta & Wang, 2019). In this study, meteorological parameters for 2018, with a temporal resolution of 60 min and a spatial resolution of 2 km, were downloaded from the NSRDB website. These data were combined with the building morphology indicators of 92,785 simulated shading areas in Shenzhen. This integration enabled the estimation of hourly roof and facade PV power generation at the urban scale.
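Combining 2-km meteorological data with 100-meter grid cells requires some rule for assigning each grid cell a meteorological series. The paper does not state the rule it used; the sketch below assumes a nearest-neighbour assignment in projected coordinates (metres), with hypothetical point locations.

```python
import numpy as np

def nearest_cell(grid_xy, nsrdb_xy):
    """Index of the closest NSRDB point for every 100 m grid-cell centre."""
    diff = grid_xy[:, None, :] - nsrdb_xy[None, :, :]   # shape (n_grid, n_nsrdb, 2)
    return (diff ** 2).sum(axis=2).argmin(axis=1)

# Hypothetical NSRDB points on a 2 km lattice and three grid-cell centres.
nsrdb_xy = np.array([[0.0, 0.0], [2000.0, 0.0], [0.0, 2000.0]])
grid_xy = np.array([[100.0, 100.0], [1900.0, 50.0], [300.0, 1800.0]])
assignment = nearest_cell(grid_xy, nsrdb_xy)
print(assignment)
```

Each grid cell would then take the 8760-hour GHI, DNI, DHI, and weather series of its assigned NSRDB point as model inputs alongside its morphology indicators.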
3. Results
3.1. Roof and facade BIPV power generation of different LCZ types
Fig. 5. Roof and facade PV power generation for different LCZ types: (a) roof PV; (b) facade PV.
Fig. 5a shows the cumulative roof PV power generation for different LCZ types across months. The roof PV power generation for each LCZ type has been averaged. The figure shows that LCZ3 has the highest roof PV power generation. LCZ3 is primarily composed of dense low-rise buildings, which results in a large roof area and thus higher roof PV power generation. The LCZ type with the lowest roof PV power generation is LCZ9. LCZ9 mainly consists of sparsely arranged small- to medium-sized buildings. Due to the smaller number of buildings and limited roof area, its roof PV power generation is relatively low. Compact building types (LCZ1, LCZ2, LCZ3) have the highest roof PV power generation, while open building types (LCZ4, LCZ5, LCZ6) have lower roof PV power generation. The primary reason is that compact building types have a larger roof area due to their higher density, contributing to higher BIPV power generation.
Fig. 5b shows the cumulative facade PV power generation for different LCZ types across various months. The facade PV power generation for each LCZ type has been averaged. The figure shows that LCZ1 has the highest facade PV power generation. LCZ1 is primarily composed of dense high-rise buildings. Due to the high building density and height, LCZ1 has a large facade area. Although shading is significant in these areas, the large facade area compensates for this, generating higher facade PV power. The LCZ type with the lowest facade PV power generation is LCZ9. LCZ9 mainly consists of sparsely arranged small- to medium-sized buildings. Due to the smaller number of buildings and limited facade area, its facade PV power generation is relatively low. The taller the buildings, the higher the facade PV power generation. The primary reason is that taller buildings have larger facade areas, which leads to higher facade PV power generation.
3.2. Model comparisons
Table 3 illustrates the performance of roof PV power generation in the MLR, RF, and XGBoost models across the training data, testing data, and generalization evaluation. From Table 3, it can be observed that XGBoost performs the best in roof PV training. Across the training data, testing data, and generalization evaluation, XGBoost achieves an R² of 0.99. Additionally, its MAE, RMSE and CVRMSE remain consistently low across all evaluations, demonstrating the excellent performance of the XGBoost model. Although the RF model achieves an R² of 0.98 in the training data, testing data, and generalization evaluation, its MAE, RMSE and CVRMSE are higher than those of the XGBoost model. The MLR model shows the most significant deviation, with an R² of 0.86 across the training data, testing data, and generalization evaluation. Furthermore, its MAE, RMSE and CVRMSE are higher than those of the RF and XGBoost models. Fig. 6 and Appendix 2 compare the predicted results and actual values for different models. The MLR model exhibits greater fitting differences in the training data, testing data, and generalization evaluation. In contrast, the linear fit lines of the XGBoost and RF models align more closely with the 1:1 line, particularly for the XGBoost model, where the predicted values for both the training and testing data are distributed on both sides of the 1:1 line.
Table 4 illustrates the performance of facade PV power generation in the MLR, RF, and XGBoost models across the training data, testing data, and generalization evaluation. From Table 4, it can be observed that XGBoost performs the best. Across the training data, testing data, and generalization evaluation, XGBoost achieves an R² of 0.99. Additionally, its MAE, RMSE and CVRMSE remain consistently low across all evaluations, demonstrating the excellent performance of the XGBoost model. The RF model achieves an R² of 0.94 in the training and testing data, and an R² of 0.95 in the generalization evaluation. However, its MAE, RMSE and CVRMSE are higher than those of the XGBoost model across all evaluations. The MLR model shows the poorest fitting performance, with an R² of 0.71 in the training and testing data, and an R² of 0.75 in the generalization evaluation. Moreover, its MAE, RMSE and CVRMSE are higher than those of both the RF and XGBoost models across all evaluations. A comparison of the predicted results and actual values for different models is shown in Fig. 7 and Appendix 3. The MLR model exhibits greater fitting differences in the training data, testing data, and generalization evaluation. In contrast, the linear fit lines of the XGBoost and RF models align more closely with the 1:1 line, particularly for the XGBoost model, where the predicted values for both the training and testing data are distributed on both sides of the 1:1 line.
Fig. 8 illustrates the hourly differences between MLR, RF, XGBoost, and actual values over a week. For both roof and facade PV power generation, XGBoost shows the smallest differences compared to the actual values. XGBoost achieves the best performance in predicting roof and facade PV power generation across all evaluation metrics, with R² values of 0.995 and 0.998, RMSE values of 4.9 kW and 6.6 kW, and MAE values of 2.4 kW and , respectively. RF follows with R² values of 0.938 and 0.982, RMSE values of 18.2 kW and 27.1 kW, and MAE values of and . MLR demonstrates the poorest performance.
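The four metrics used in this comparison can be computed as in the minimal sketch below. It assumes the common definitions, with CVRMSE taken as RMSE divided by the mean of the actual values; the exact formulas are not restated in this section, so that choice is an assumption. The function name and the sample values are illustrative and are not data from this study.

```python
import math


def regression_metrics(actual, predicted):
    """Return R2, MAE, RMSE and CVRMSE (%) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,   # undefined if all actual values are equal
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": rmse,
        # Assumed definition: RMSE normalised by the mean actual value.
        "cvrmse_pct": 100.0 * rmse / mean_actual,
    }


# Illustrative hourly generation values in kW (not taken from the paper).
actual_kw = [100.0, 200.0, 300.0, 400.0]
predicted_kw = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(actual_kw, predicted_kw)
```

Because RMSE and MAE carry the unit of the target (kW here), CVRMSE is the more convenient metric when comparing roof and facade models whose outputs differ in magnitude.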
3.3. Contribution of impact factors to urban PV
The summary plot of meteorological parameters, computation models, and shading area characteristics for roof PV power generation is shown in Fig. 9. The left side of Fig. 9 displays the global importance ranking of each factor on roof PV power generation, ordered from the most significant to the least. The right side of Fig. 9 provides a local explanation of the changes in roof PV power generation influenced by each factor. The visualization illustrates the SHAP values and their directions, where red and blue points represent high and low feature values, respectively. The figure shows that the total contributions of meteorological parameters, computation models and shading areas characteristics are , , and , respectively. This indicates that meteorological parameters significantly impact roof PV power generation. GHI has the most significant influence among the meteorological parameters, positively correlating with roof PV power generation. As GHI increases, roof PV power generation rises significantly. The computation models characteristic of TBFA also shows a positive correlation with roof PV power generation. Similarly, the meteorological parameter DHI positively correlates with roof PV power generation.
The summary plot of meteorological parameters, computation models, and shading area characteristics for facade PV power generation is shown in Fig. 10. The left side of Fig. 10 displays the global importance ranking of each factor on facade PV power generation, ordered from the most significant to the least. The right side of Fig. 10 provides a local explanation of the changes in facade PV power generation influenced by each factor. The visualization illustrates the SHAP values and their directions, where red and blue points represent high and low feature values, respectively. The figure shows that the total contributions of meteorological parameters, computation models, and shading area characteristics are , , and , respectively. Similar to roof PV, meteorological parameters have the most significant impact on facade PV power generation. However, the influence of computation models, and shading area characteristics on facade PV power generation is greater than on roof PV power generation. Among the meteorological parameters, GHI, DNI, and DHI have the most significant impact on facade PV power generation. These three meteorological variables positively correlate with facade PV power generation, indicating that as GHI, DNI, and DHI increase, facade PV power generation also increases. For computation models characteristics, AH, TBSA, and TBA are positively correlated with facade PV power generation. In contrast, SVF is negatively correlated. For shading area characteristics, HSD positively correlates with facade PV power generation.
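One plausible way to obtain group-level contributions such as those reported above is to normalise the mean absolute SHAP value of each feature and sum the shares within each group. The sketch below assumes that definition, which this section does not spell out. The feature names follow the paper's abbreviations, while the SHAP values and the function name are hypothetical.

```python
def group_contributions(shap_rows, feature_names, feature_groups):
    """Share (%) of the total mean |SHAP| attributed to each feature group."""
    n_rows = len(shap_rows)
    # Global importance of each feature: mean absolute SHAP value over samples.
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    total = sum(mean_abs.values())
    shares = {}
    for name, value in mean_abs.items():
        group = feature_groups[name]
        shares[group] = shares.get(group, 0.0) + 100.0 * value / total
    return shares


# Hypothetical SHAP values: one row per sample, one column per feature.
feature_names = ["GHI", "DHI", "TBFA_sm", "HSD_sd"]
feature_groups = {
    "GHI": "meteorological",
    "DHI": "meteorological",
    "TBFA_sm": "computation model",
    "HSD_sd": "shading area",
}
shap_rows = [
    [6.0, -1.0, 2.0, 1.0],
    [-4.0, 2.0, -2.0, -1.0],
    [5.0, -3.0, 2.0, 1.0],
]
shares = group_contributions(shap_rows, feature_names, feature_groups)
```

In practice the rows would come from a SHAP explainer applied to the trained model; the aggregation step itself stays the same.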
3.4. Nonlinear correlation analysis of impact factors
To study the nonlinear relationships between roof PV power generation and its influencing factors, scatter plots were created for GHI, TBFA, and DHI, which together account for 89.8 % of the variation in roof PV power generation, indicating their crucial role. As shown in Fig. 11, GHI is positively correlated with roof PV power generation, meaning that roof PV power generation increases significantly as GHI increases. Similarly, TBFA from computation models positively correlates with roof PV power generation, with larger TBFA values leading to higher energy output. DHI also shows a positive correlation overall, but when DHI exceeds approximately , its impact on roof PV power generation gradually diminishes.
Table 3 shows the parameter metrics of MLR, RF, and XGBoost models in the evaluation of roof PV power generation across the training data, testing data, and generalization ability evaluation.
Fig. 6. The prediction results of MLR, RF, and XGBoost models for roof PV power generation: (a) XGBoost training data; (b) XGBoost testing data; (c) XGBoost generalization ability evaluation.
Table 4 shows the parameter metrics of MLR, RF, and XGBoost models in the evaluation of facade PV power generation across the training data, testing data, and generalization ability evaluation.
Fig. 7. The prediction results of MLR, RF, and XGBoost models for facade PV power generation: (a) XGBoost training data; (b) XGBoost testing data; (c) XGBoost generalization ability evaluation.
To study the nonlinear relationships between facade PV power generation and its influencing factors, this study selected nine key indicators: GHI, DNI, DHI from meteorological parameters, AH, TBSA, TBA, SVF, TBFA from computation models morphology, and HSD from shading area morphology. These nine indicators account for 87.5 % of the variation in facade PV power generation, indicating their critical importance. As shown in Fig. 12, GHI is positively correlated with facade PV power generation when below , but its effect stabilizes when GHI exceeds this value. DNI maintains a positive correlation with facade PV power generation, with higher DNI resulting in greater energy output. Similarly, DHI is positively correlated; however, its influence is not significant below , becoming more apparent when DHI exceeds this value. Among the representative area building morphology indicators, AH positively correlates with facade PV power generation, but its growth impact diminishes when AH exceeds 40 m. TBSA exhibits a similar trend, with a positive impact below , after which its influence stabilizes. TBA generally shows a positive correlation, but this relationship becomes steady when TBA is below , showing no significant growth. TBFA demonstrates a stable relationship with facade PV power generation below but turns inversely correlated when exceeding this value. SVF displays an inverse correlation with facade PV power generation, but this relationship stabilizes when SVF ranges between 0.6 and 0.75. For the shading area building morphology indicator, HSD shows a positive correlation with facade PV power generation when greater than . However, its impact remains stable when HSD is below .
Fig. 8. Comparison of roof and facade hourly BIPV power generation prediction: (a) roof; (b) facade.
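The thresholds discussed for the SHAP dependence plots are read from where each curve flattens. A simple way to make such a reading reproducible is to average SHAP values within feature bins and locate the point after which successive bin means stop changing. The sketch below illustrates that idea only; it is not the procedure used in this study, and the function names, bin edges, tolerance, and data are all hypothetical.

```python
def binned_mean_shap(feature_values, shap_values, bin_edges):
    """Mean SHAP value of one feature within each [lo, hi) bin (None if empty)."""
    means = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [s for x, s in zip(feature_values, shap_values) if lo <= x < hi]
        means.append(sum(in_bin) / len(in_bin) if in_bin else None)
    return means


def plateau_start(bin_edges, means, tolerance):
    """Lower edge of the earliest bin from which all successive bin means
    differ by at most `tolerance`. If even the last two bins differ by more,
    the lower edge of the last bin is returned."""
    if any(m is None for m in means):
        raise ValueError("every bin must contain at least one sample")
    start = len(means) - 1
    # Walk backwards while neighbouring bin means stay within the tolerance.
    while start > 0 and abs(means[start] - means[start - 1]) <= tolerance:
        start -= 1
    return bin_edges[start]


# Synthetic example: the effect rises and then levels off (not study data).
x = [10.0, 30.0, 50.0, 70.0, 90.0]
shap_x = [-2.0, 0.0, 1.5, 2.0, 2.05]
edges = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
means = binned_mean_shap(x, shap_x, edges)
threshold = plateau_start(edges, means, tolerance=0.1)
```

The result depends on the bin width and the tolerance, so any threshold obtained this way should be reported together with those two settings.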
3.5. Calculation of roof and facade PV power generation in Shenzhen
Based on NSRDB high temporal and spatial resolution data (1 h and 2 km), Fig. 13(c–h) visualizes roof and facade PV power generation for 10:00, 13:00, and 16:00 on June 22, 2018. Spatially, BIPV power generation in Shenzhen shows significant regional differences, with the majority concentrated in the northwestern and southern areas of the city. Fig. 13a and 13b show that these regions have a higher density of buildings, providing more roof and facade areas, which leads to higher PV power generation. Temporally, PV power generation in Shenzhen also varies significantly across different times of the day. At 10:00, roof and facade PV power generation peaked at 17.14 GWh and 10.31 GWh, respectively. At 13:00, roof and facade PV power generation decreased to 5.01 GWh and 4.72 GWh, respectively. By 16:00, PV power generation was at its lowest, with roof and facade generation at only 1.46 GWh and 1.53 GWh, respectively.
This study analyzes the relationship between BIPV power generation and electricity consumption data on monthly and annual scales based on 2018 electricity consumption data (Yan, Huang & Ren, 2024). First, a 1 km grid for Shenzhen was created as the basic analysis unit. The electricity consumption and BIPV power generation data for January, July, and the entire year of 2018 were selected to explore the proportion of roof, facade, and combined roof and facade PV power generation in electricity consumption. Specifically, in the visualization, the PV power generation of roofs, facades, and their combination was divided by the electricity consumption data of the 1 km grid, resulting in the percentage of BIPV power generation relative to electricity consumption, which was used to calculate the PV consumption rate (Fig. 14). The roof and facade PV power generation data were calculated hourly based on NSRDB data, and the spatial join tool in GIS was used to summarize the roof, facade, and combined PV power generation for January, July, and the entire year. Overall, there is significant spatial and temporal inequality in BIPV consumption rates. If only roof PV is considered, the areas where roof PV power generation can meet electricity consumption in January are mainly concentrated in the northwest, south, and northeast of Shenzhen. In July, while the areas remain limited to these regions, more grids can meet electricity demand. When facade PV is considered, in January, most areas in the northwest, south, and northeast of Shenzhen can meet electricity consumption through facade PV. However, the self-sufficiency rate of facade PV decreases in July due to increased electricity consumption. From an annual perspective, if only roof PV is used, most areas in Shenzhen's northwest, south, and northeast can meet energy demand. If only facade PV is used, it mainly meets the demand in the southern part of Shenzhen. However, if roof and facade PV are combined, electricity demand can be well met in January, July, and throughout the year. Overall, if only roof PV is considered, it can meet 38.29 %, , and 49.34 % of electricity consumption in January, July, and the entire year, respectively. If only facade PV is considered, it can meet 48.39 %, , and 43.52 % of electricity consumption in January, July, and the entire year, respectively. If roof and facade PV are combined, they can meet 86.68 %, , and 92.86 % of electricity consumption in January, July, and the entire year, respectively.
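The consumption rate described above is a simple ratio, sketched below. The annual generation totals and the 49.34 % roof share are the paper's reported values; the annual consumption is back-calculated from them and is therefore an inferred quantity, not a reported one. The grid-cell values and all names are hypothetical.

```python
def consumption_rate_pct(generation_gwh, consumption_gwh):
    """PV generation as a percentage of electricity consumption."""
    if consumption_gwh <= 0:
        raise ValueError("consumption must be positive")
    return 100.0 * generation_gwh / consumption_gwh


# Annual totals reported in the paper (GWh).
roof_gwh = 45256.1
facade_gwh = 39919.2

# Back-calculated, not reported: consumption implied by the 49.34 % roof share.
implied_consumption_gwh = roof_gwh / 0.4934

facade_pct = consumption_rate_pct(facade_gwh, implied_consumption_gwh)
combined_pct = consumption_rate_pct(roof_gwh + facade_gwh, implied_consumption_gwh)

# Hypothetical 1 km grid cells: (roof GWh, facade GWh, consumption GWh).
cells = {"cell_A": (12.0, 8.0, 16.0), "cell_B": (3.0, 5.0, 20.0)}
cell_rates = {
    cell_id: consumption_rate_pct(roof + facade, demand)
    for cell_id, (roof, facade, demand) in cells.items()
}
```

With the implied consumption, the facade and combined shares agree with the reported annual percentages to within rounding, which is a quick consistency check on the totals. A cell rate above 100 % marks a grid cell whose PV generation over the period exceeds its consumption.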

Fig. 9. Importance ranking diagram of feature variables and density scatter diagram of feature variables for hourly roof BIPV power generation.
4. Discussion
This study introduced an integrated physics-machine learning framework to map urban BIPV generation at an hourly scale. The exceptionally high accuracy achieved (R² of 0.99 for both roof and facade predictions) is not merely a statistical achievement but a direct outcome of our methodological innovations. Unlike previous models that reported lower accuracies, such as Chen et al. (R² of 0.85 and 0.70), Tang et al. (R² of 0.93), and Tao et al. (R² of 0.696), our approach succeeded through a synergistic combination of critical elements. First, we conducted high-fidelity physical simulations based on real 3D building models instead of idealized prototypes. Second, we systematically included the surrounding shading area's morphology, a factor often overlooked. Third, we used the XGBoost algorithm, which is adept at capturing complex, non-linear interactions that simpler models like linear regression or even standard Random Forest might miss. This section critically discusses the novel insights derived from this robust model and their implications for urban energy planning.
4.1. Novel insights into the drivers of BIPV generation
A key contribution of this work is the deconstruction of how meteorological and morphological factors influence BIPV generation, revealing several novel, non-linear relationships that challenge or refine previous understanding.
For roofs, our findings confirm that Global Horizontal Irradiance (GHI) is the dominant driver, and that Total Building Floor Area (TBFA_sm) serves as a strong proxy for generation capacity, which aligns with previous research (Song, Cao & Yang, 2023; Lee, Lee & Lee, 2016). However, our model provides a more nuanced insight into the role of Diffuse Horizontal Irradiance (DHI). While DHI shows a positive correlation, its impact exhibits a clear non-linear saturation effect, diminishing significantly beyond approximately (Fig. 11b). This quantifiable threshold suggests that at high levels of diffuse radiation, unfavorable incident angles limit further gains in power generation, a phenomenon that simpler linear models often fail to capture (Pan, Bai & Chang, 2022). Furthermore, our analysis quantitatively confirms that the impact of surrounding building shading on roofs is minimal (contributing only of the variation, Fig. 9), providing a strong evidence base for prioritizing building-intrinsic factors in rooftop PV planning.
For facades, our study offers the most significant new insights by moving beyond generalized correlations to identify specific, previously unquantified thresholds and non-linearities. For instance, while prior studies correctly identified building height as positively correlated with facade potential (Brito, Redweik & Catita, 2019; Chatzipoulka, Compagnon & Nikolopoulou, 2016), our model reveals a critical performance plateau for Average Height (AH_sm) around 40 m (Fig. 12d). Beyond this height, the gains from increased facade area are progressively negated by inter-building shading, yielding diminishing returns and providing a novel, quantifiable guideline for urban planners. Our analysis also refines the conventional understanding of the Sky View Factor (SVF_sm), which is typically held to be inversely correlated with facade generation (Mirkovic & Alawadi, 2017; Heng, Malone-Lee & Zhang, 2017; Arboit, Diblasi & Llano, 2008). We identified a range of relative insensitivity between 0.6 and 0.75 (Fig. 12g), where changes in SVF have a minimal impact, suggesting a "sweet spot" in urban form where planners can achieve balance. Critically, our study is one of the first to systematically quantify the impact of the surrounding urban fabric's morphology. The Height Standard Deviation of the shading area (HSD_sd) was identified as a key factor whose negative impact becomes pronounced only when it exceeds (Fig. 12i). This reveals that height uniformity in the surrounding area is highly beneficial for a target building's facade PV generation. This insight transcends single-building analysis and underscores the necessity of considering the broader neighborhood context in 3D, a key advantage of our methodology.
Fig. 10. Importance ranking diagram of feature variables and density scatter diagram of feature variables for hourly facade BIPV power generation.
Fig. 11. Roof BIPV power generation SHAP dependence plots for variables in the XGBoost model: (a) GHI; (b) DHI; (c) TBFA_sm. (Note: Red indicates high SHAP values, while blue indicates low SHAP values.)
4.2. Implications to urban energy planning
The granular and non-linear insights from our model translate into more sophisticated and data-driven urban planning strategies than previously possible, moving beyond generic recommendations.
For roof PV, planning remains relatively straightforward: strategies should prioritize maximizing installable area (TBFA) in zones with high solar radiation (GHI), as our model confirms these linear drivers are dominant. For facade PV, however, our findings provide a basis for more nuanced zoning and design regulations. The discovery of a 40-meter threshold for Average Height (AH_sm) suggests that policies promoting endlessly tall, slender buildings for facade PV may be inefficient. Instead, urban design codes could encourage mid-rise typologies (up to 40 m), where the balance between facade area gain and inter-building shading is optimized. Similarly, the identification of an SVF_sm insensitivity range (0.6–0.75) provides planners with greater design flexibility in moderately dense urban blocks. Most critically, the influence of the Height Standard Deviation of the shading area (HSD_sd) provides a quantitative rationale for context-sensitive zoning, shifting planning from a building-by-building approach to a more effective neighborhood-scale energy-morphology optimization.
An hourly-scale model with this level of detail is a critical enabler for advanced urban energy systems. By accurately capturing spatiotemporal variations, planners can address the "temporal mismatch" between PV generation and real-time urban electricity demand, thereby achieving supply-demand alignment. This high-resolution mapping is not only crucial for strategically positioning energy storage stations but also provides the foundation for creating effective Vehicle-to-Grid (V2G) charging strategies, allowing electric vehicle charging needs to be synchronized with solar energy availability. Moreover, the model allows for the precise integration of BIPV systems with more refined urban energy management strategies, such as optimizing energy scheduling and balancing grid loads. Furthermore, coupling this model with large-scale climate models can enhance its predictive capabilities, enabling cities to forecast hourly BIPV generation under future climate conditions. This is vital for developing long-term energy strategies and optimizing the deployment of renewable resources. Ultimately, by providing a robust predictive foundation for the seamless integration of renewable energy, energy storage systems, and V2G technologies, our framework supports the transition toward smart, resilient, and self-sufficient urban energy networks that can minimize transmission losses, enhance local energy utilization, and achieve carbon neutrality.
Fig. 12. Facade BIPV power generation SHAP dependence plots for variables in the XGBoost model: (a) GHI; (b) DNI; (c) DHI; (d) AH_sm; (e) TBSA_sm; (f) TBA_sm; (g) SVF_sm; (h) TBFA_sm; (i) HSD_sd. (Note: Red indicates high SHAP values, while blue indicates low SHAP values.)
4.3. Scalability and application challenges
The methodology proposed in this study possesses high scalability and can be applied to cities other than Shenzhen. Its core advantage is that the entire framework is based on the Local Climate Zone (LCZ), a globally universal standard for urban morphology classification. As stated previously, LCZ offers a comprehensive classification strategy applicable to any city, allowing for the division of a city into various LCZ combinations. This means that researchers can leverage this standardized framework to conduct comparable PV potential assessments across different cities.
However, several challenges may arise when extending this method to other cities. The first is data availability. The model's accuracy is highly dependent on high-quality, city-wide 3D building vector data and high-spatiotemporal resolution meteorological data. Yet, not all cities possess such detailed and easily accessible public datasets. The second challenge is the requirement for significant computational resources. This study involved modeling over 10,000 buildings and training a machine learning model on a massive dataset, which demands powerful computational support. The final challenge is local calibration. Although the LCZ framework is universal, the specific relationship between urban morphology and PV potential may exhibit regional variations due to unique architectural styles, materials, and local climatic features in different cities. Therefore, when applying the pre-trained model directly to a new city, a degree of calibration or retraining with local data may be necessary to ensure optimal prediction accuracy.
Fig. 13. Hourly scale urban roof and facade PV power distribution predicted based on NSRDB data: (a) roof area; (b) facade area; (c) roof PV power generation at 10:00 on June 22, 2018; (d) facade PV power generation at 10:00 on June 22, 2018; (e) roof PV power generation at 13:00 on June 22, 2018; (f) facade PV power generation at 13:00 on June 22, 2018; (g) roof PV power generation at 16:00 on June 22, 2018; (h) facade PV power generation at 16:00 on June 22, 2018.
Despite these challenges, with the advancement of global urban digitalization and open data initiatives, we believe this method provides a robust and feasible technical pathway for conducting standardized, high-precision PV potential assessments in diverse urban contexts worldwide.
Fig. 14. The proportion of roof, facade, and combined PV power generation relative to electricity consumption at different times. Panels (a)–(c) show the percentage of roof PV power generation relative to electricity consumption in January 2018, July 2018, and the whole year of 2018; panels (d)–(f) show the same for facade PV; panels (g)–(i) show the same for roof and facade PV combined.
4.4. Limitations and future works
There are some limitations in this study that need to be addressed in future studies.
- Although this study constructed the XGBoost model and performed SHAP interpretability analysis using a real-world model, the SHAP analysis was limited to the training set within the XGBoost model. Therefore, the results of the SHAP analysis have certain limitations.
- This study did not consider the building envelope when assessing the BIPV power generation potential at the urban scale. Different BIPV products (such as PV glass and PV walls) have varying PV efficiencies, leading to inaccuracies when evaluating BIPV power generation at the urban scale (Xu, Chen & Ren, 2025). The study also did not account for the efficiency differences of PV modules on facade surfaces with different orientations, which may significantly affect the annual distribution of power generation. Future studies should refine the components of building envelopes to provide more accurate evaluations of urban BIPV power generation.
- This study did not fully account for the impact of different building types and structures on the installation and performance of BIPV systems. Complex structures such as sloped roofs and historical buildings impose specific constraints on BIPV deployment, which were not reflected in the current model. Future research should integrate building categorization databases and incorporate structural attributes into the modeling process to enable multidimensional predictions of BIPV suitability.
- This model cannot predict the roof and facade PV power generation for individual buildings. However, in practical applications, stakeholders such as property developers and building managers are more concerned with the power generation performance of individual buildings, for which the current model does not provide granular predictions. Therefore, future research should focus on achieving cross-scale predictions to estimate PV power generation for both individual buildings and building groups.
- This study assumes that all eligible surface areas are fully equipped with BIPV systems, without considering real-world deployment constraints such as investment capacity, policy limitations, and the cost of equipment access. Therefore, future research should develop an integrated decision-making model that incorporates economic feasibility, policy incentives, and user acceptance, in order to improve the practicality and applicability of urban BIPV power generation potential assessments.
- This study is limited to the assessment of photovoltaic power generation potential in Shenzhen and has not yet been extended to other global regions. Although the LCZ classification framework is theoretically universal and applicable to cities worldwide, and the integration of LCZ with 3D building models and machine learning shows good potential for transferability, the methodology still needs to be validated in different global contexts. In future work, we will carry out cross-regional and multi-climate zone extensions in several representative cities to test, improve, and ultimately verify the adaptability and effectiveness of the proposed method on a global scale.
- Although this study integrated multiple data sources from 2018 to ensure temporal consistency and improve the accuracy of model training, the data used are relatively outdated. With ongoing changes in urban morphology and energy systems, data from 2018 may not accurately reflect current conditions, which could affect the timeliness and generalizability of the model. Therefore, future studies should incorporate more recent datasets to enhance the model’s practical relevance and predictive capability.
5. Conclusions
This study, based on LCZ classification, utilized XGBoost to construct prediction models for hourly roof and facade PV power generation, demonstrating excellent performance. Additionally, the study employed the XGBoost model and SHAP interpretability to investigate the impact of hourly meteorological parameters, computational models, and shading area parameters on roof and facade PV power generation. High temporal and spatial resolution NSRDB data were also used to evaluate Shenzhen’s hourly roof and facade PV power generation. The main conclusions are as follows:
- For roof PV power generation, LCZ3 has the highest roof PV power generation, while LCZ9 has the lowest. For facade PV power generation, LCZ1 has the highest facade PV power generation, while LCZ9 has the lowest.
- Compared to MLR and RF, XGBoost demonstrated the best performance in predicting roof and facade PV power generation. For roof PV power generation, XGBoost achieved an R² of 0.99 across the testing data and generalization performance evaluation, with MAE values of , and , RMSE values of , and 7.5 kW, and CVRMSE values of , and , respectively. For facade PV power generation, XGBoost also achieved an R² of 0.99 across all evaluations, with MAE values of , and 6.1 kW, RMSE values of , and 13.3 kW, and CVRMSE values of , and , respectively.
- The main factors influencing roof PV power generation are GHI, DHI, and TBFA, which account for 89.8 % of the variation in roof PV power generation. For facade PV power generation, in addition to GHI, DNI, and DHI, the computation models morphology indicators AH, TBSA, TBA, SVF, and TBFA, as well as the shading area morphology indicator HSD, have the most significant influence. These nine indicators account for 87.5 % of the variation in facade PV power generation.
- Based on NSRDB data, this study evaluated the potential for hourly roof and facade PV power generation in Shenzhen. Additionally, the study assessed BIPV power generation in January, July, and the whole of 2018. Roof PV systems could generate 2442.9 GWh, 4886.9 GWh, and 45,256.1 GWh in January, July, and the entire year, meeting 38.29 %, , and 49.34 % of electricity consumption, respectively. Facade PV systems could generate 3087.7 GWh, 3553.4 GWh, and 39,919.2 GWh in January, July, and the entire year, meeting 48.39 %, , and 43.52 % of electricity consumption, respectively.
CRediT authorship contribution statement
Xingkang Chai: Writing – review & editing, Writing – original draft, Methodology, Conceptualization. Jiayu Chen: Writing – original draft, Visualization. Chunying Li: Writing – review & editing, Validation. Pengyuan Shen: Writing – review & editing, Validation. Yuqin Wang: Visualization. Yang Wan: Writing – review & editing, Visualization. Siyuan Chen: Visualization. Haida Tang: Methodology, Formal analysis, Conceptualization.
Declaration of competing interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Integrated Physics-Machine Learning for Real-Time Urban Photovoltaic Mapping: Coupling Local Climate Zones with 3D Building Models".
Appendix


Panels: (a) correlation matrix between roof PV power generation and meteorological parameters; (b) correlation matrix between roof PV power generation and the urban morphology of computation models; (c) correlation matrix between roof PV power generation and the urban morphology of shading areas; (d) correlation matrix between facade PV power generation and meteorological parameters; (e) correlation matrix between facade PV power generation and the urban morphology of computation models; (f) correlation matrix between facade PV power generation and the urban morphology of shading areas.
Appendix 1. Correlation matrix of meteorological parameters, computation models morphology, and shading area morphology with roof and façade PV power generation (Note: .

a. MLR training data
b. MLR testing data
c. MLR generalization ability evaluation
d. RF training data
e. RF testing data
f. RF generalization ability evaluation
Appendix 2. Prediction results of the MLR and RF models for roof PV power generation.

a. MLR training data
b. MLR testing data
c. MLR generalization ability evaluation
d. RF training data
e. RF testing data
f. RF generalization ability evaluation
Appendix 3. Prediction results of the MLR and RF models for facade PV power generation.
Data availability
Data will be made available on request.
References
An, Y., Chen, T., Shi, L., et al. (2023). Solar energy potential using GIS-based urban residential environmental data: A case study of Shenzhen, China. Sustainable Cities and Society, 93, Article 104547.
Arboit, M., Diblasi, A., Llano, J. C. F., et al. (2008). Assessing the solar potential of lowdensity urban environments in Andean cities with desert climates: The case of the city of Mendoza, in Argentina. Renewable Energy, 33(8), 1733–1748.
Bohner, ¨ J., & Antoni´c, O. (2009). Land-surface parameters specific to topo-climatology. Developments in soil science, 33, 195–226.
Brito, M. C., Redweik, P., Catita, C., et al. (2019). 3D solar potential in the urban environment: A case study in lisbon. Energies, 12(18), 3457.
Cai, S., & Gou, Z. (2024). Defining the energy role of buildings as flexumers: A review of definitions, technologies, and applications. Energy and Buildings, 303, Article 113821.
Campbell, M., Aschenbrenner, P., Blunden, J., et al. (2008). The drivers of the levelized cost of electricity for utility-scale photovoltaics. White paper: SunPower corporation.
Cao, R., Liao, C., Li, Q., et al. (2023). Integrating satellite and street-level images for local climate zone mapping. International Journal of Applied Earth Observation and Geoinformation, 119, Article 103323.
Chatzipoulka, C., Compagnon, R., & Nikolopoulou, M. (2016). Urban geometry and solar availability on façades and ground of real urban forms: Using London as a case study. Solar Energy, 138, 53–66.
Chen, S., & Gou, Z. (2024). City-roof coupling: Unveiling the spatial configuration and correlations of green roofs and solar roofs in 26 global cities. Cities (London, England), 147, Article 104780.
Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system [C]. //. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining (pp. 785–794).
Chen, W., He, Y., Li, N., et al. (2024). A smart platform (BEVPro) for modeling, evaluating, and optimizing community microgrid integrated with buildings,
distributed renewable energy, electricity storage, and electric vehicles. Journal of Building Engineering, 87, Article 109077.
Chen, X., Tu, W., Yu, J., et al. (2024). LCZ-based city-wide solar radiation potential analysis by coupling physical modeling, machine learning, and 3D buildings. Computers, Environment and Urban Systems, 113, Article 102176.
Ching, J., Aliaga, D., Mills, G., et al. (2019). Pathway using WUDAPT’s digital synthetic city tool towards generating urban canopy parameters for multi-scale urban atmospheric modeling [J], 28. Urban Climate, Article 100459.
Demuzere, M., Hankey, S., Mills, G., et al. (2020). Combining expert and crowd-sourced training data to map urban form and functions for the continental US. Scientific data, 7(1), 264.
Demuzere, M., Kittner, J., & Bechtel, B. (2021). LCZ Generator: A web application to create local Climate Zone maps. Frontiers in Environmental Science, 9, Article 637455.
García, M. V., & Aznarte, J. L. (2020). Shapley additive explanations for NO2 forecasting. Ecological Informatics, 56, Article 101039.
Ghosh, A. (2020). Potential of building integrated and attached/applied PV (BIPV/ BAPV) for adaptive less energy-hungry building’s skin: A comprehensive review. Journal of Cleaner Production, 276, Article 123343.
Guo, F., Wu, Q., & Schlink, U. (2021). 3D building configuration as the driver of diurnal and nocturnal land surface temperatures: Application in Beijing’s old city. Building and Environment, 206, Article 108354.
H¨antzschel, J., Goldberg, V., & Bernhofer, C. (2005). GIS-based regionalisation of radiation, temperature and coupling measures in complex terrain for low mountain ranges. Meteorological Applications, 12(1), 33–42.
Han, L., Zhao, J., Gao, Y., et al. (2022). Prediction and evaluation of spatial distributions of ozone and urban heat island using a machine learning modified land use regression method [J], 78. Sustainable Cities and Society, Article 103643.
urban villages in Shenzhen. Habitat International, 35(2), 214–224.
Hashemi, F., Mills, G., Poerschke, U., et al. (2024). A novel parametric workflow for simulating urban heat island effects on residential building energy use: Coupling local climate zones with the urban weather generator a case study of seven us cities [J], 110. Sustainable Cities and Society, Article 105568.
Heng, C. K., Malone-Lee, L. C., & Zhang, J. (2017). Relationship between density, urban form and environmental performance [M]. // Growing Compact. Routledge, 297–312.
Hu, A., Levis, S., Meehl Gerald, A., Han, W., Washington Warren, M., Oleson Keith, W., et al. (2015). Impact of solar panels on global climate. Nat Clim Change, 6, 290–294.
Kaleshwarwar, A., & Bahadure, S. (2023). Assessment of the solar energy potential of diverse urban built forms in Nagpur, India. Sustainable Cities and Society, 96, Article 104681.
Kumar, M., & Kumar, A. (2017). Performance assessment and degradation analysis of solar photovoltaic technologies: A review. Renewable and Sustainable Energy Reviews, 78, 554–587.
Lee, S., Iyengar, S., Feng, M., et al. (2019). Deeproof: A data-driven approach for solar potential estimation using rooftop imagery [C]. //. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2105–2113).
Lee, K. S., Lee, J. W., & Lee, J. S (2016). Feasibility study on the relation between housing density and solar accessibility and potential uses. Renewable energy, 85, 749–758.
Li, C., Wan, W., Huang, G., Chai, X., Li, C., & Tang, H. (2025, September). Thermal and electrical performance assessment of a bifacial photovoltaic green facade based on CFD simulation. In Building simulation, 18 pp. 2227–2249). Tsinghua University Press.
Li, Q., Yu, Y., Jiang, X., et al. (2019). Multifactor-based environmental risk assessment for sustainable land-use planning in Shenzhen, China. Science of the Total Environment, 657, 1051–1063.
Li, H., Zhang, J., Liu, X., et al. (2022). Comparative investigation of energy-saving potential and technical economy of rooftop RC and PV systems. Applied Energy, 328, Article 120181.
Liu, Y., He, S., Wu, F., et al. (2010). Urban villages under China’s rapid urbanization: Unregulated assets and transitional neighbourhoods. Habitat international, 34(2), 135–144.
Liu, X., Liu, X., Jiang, Y., & Zhang, T. (2022). PEDF (photovoltaics, energy storage, direct current, flexibility), a power distribution system of buildings for grid decarbonization: Definition, technology review, and application. CSEE J Power Energy Syst. Submitted to journal.
Liu, X., Shen, C., Wang, J., et al. (2023). Static and dynamic regulations of photovoltaic double skin facades towards building sustainability: A review. Renewable and Sustainable Energy Reviews, 183, Article 113458.
Liu, K., Xu, X., Huang, W., et al. (2023). A multi-objective optimization framework for designing urban block forms considering daylight, energy consumption, and photovoltaic energy potential. Building and Environment, 242, Article 110585.
Liu, K., Xu, X., Zhang, R., et al. (2023). Impact of urban form on building energy consumption and solar energy potential: A case study of residential blocks in Jianhu. China [J]. Energy and Buildings, 280, Article 112727.
Luo, Z., Peng, J., Cao, J., et al. (2022). Demand flexibility of residential buildings: Definitions, flexible loads, and quantification methods. Engineering, 16, 123–140.
Machete, R., Falc˜ao, A. P., Gomes, M. G., et al. (2018). The use of 3D GIS to analyse the influence of urban context on buildings’ solar energy potential. Energy and Buildings, 177, 290–302.
Mirkovic, M., & Alawadi, K. (2017). The effect of urban density on energy consumption and solar gains: The study of Abu Dhabi’s neighborhood. Energy Procedia, 143, 277–282.
Oukawa, G. Y., Krecl, P., & Targino, A. C (2022). Fine-scale modeling of the urban heat island: A comparison of multiple linear regression and random forest approaches. Science of the total environment, 815, Article 152836.
Pan, D., Bai, Y., Chang, M., et al. (2022). The technical and economic potential of urban rooftop photovoltaic systems for power generation in Guangzhou, China. Energy and Buildings, 277, Article 112591.
Parsa, A. B., Movahedi, A., Taghipour, H., et al. (2020). Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis. Accident Analysis & Prevention, 136, Article 105405.
Pepermans, G., Driesen, J., Haeseldonckx, D., et al. (2005). Distributed generation: Definition, benefits and issues. Energy policy, 33(6), 787–798.
Rodríguez-P´erez, R., & Bajorath, J. (2020). Interpretation of machine learning models using shapley values: Application to compound potency and multi-target activity predictions. Journal of computer-aided molecular design, 34(10), 1013–1026.
Sengupta, M., Xie, Y., Lopez, A., et al. (2018). The national solar radiation data base (NSRDB). Renewable and sustainable energy reviews, 89, 51–60.
Song, Z., Cao, S., & Yang, H. (2023). Assessment of solar radiation resource and photovoltaic power potential across China based on optimized interpretable machine learning model and GIS-based approaches. Applied energy, 339, Article 121005.
Stewart, I. D., & Oke, T. R (2012). Local climate zones for urban temperature studies. Bulletin of the American Meteorological Society, 93(12), 1879–1900.
Tang, H., Chai, X., Chen, J., et al. (2025). Assessment of BIPV power generation potential at the city scale based on local climate zones: Combining physical simulation, machine learning and 3D building models. Renewable Energy, 244, Article 122688.
Tang, H., Wang, Y., & Li, C. (2025). Energy-flexibility strategy for residential blocks with multiple morphologies based on energy, economy, and carbon reduction performance. Building and Environment, 268, Article 112333.
Tao, L., Wang, M., & Xiang, C. (2024). Assessing urban morphology’s impact on solar potential of high-rise facades in hong kong using machine learning: An application for fipv optimization, 117. Sustainable Cities and Society, Article 105978.
Tian, J., & Ooka, R. (2024). Evaluation of solar energy potential for residential buildings in urban environments based on a parametric approach. Sustainable Cities and Society, 106, Article 105350.
Tian, J., & Ooka, R. (2025). Prediction of building-scale solar energy potential in urban environment based on parametric modelling and machine learning algorithms. Sustainable Cities and Society, 119, Article 106057.
United Nations Environment Programme. (2011). Green economy: Cities investing in energy and resource efficiency. https://wedocs.unep.org/20.500.11822/7979.
Wheeler, D., & Tiefelsdorf, M. (2005). Multicollinearity and correlation among local regression coefficients in geographically weighted regression. Journal of Geographical Systems, 7(2), 161–187.
Xie, Y., Sengupta, M., & Dooraghi, M. (2018a). Assessment of uncertainty in the numerical simulation of solar irradiance over inclined PV panels: New algorithms using measurements and modeling tools. Solar Energy, 165, 55–64.
Xie, Y., Sengupta, M., & Dooraghi, M. (2018b). Assessment of uncertainty in the numerical simulation of solar irradiance over inclined PV panels: New algorithms using measurements and modeling tools. Solar Energy, 165, 55–64.
Xie, Y., Sengupta, M., & Wang, C. (2019). A fast all-sky radiation model for solar applications with narrowband irradiances on tilted surfaces (FARMS-NIT): Part II. The cloudy-sky model. Solar Energy, 188, 799–812.
Xie, M., Wang, M., Zhong, H., et al. (2023a). The impact of urban morphology on the building energy consumption and solar energy generation potential of university dormitory blocks., 96. Sustainable Cities and Society, Article 104644.
Xie, M., Wang, M., Zhong, H., et al. (2023b). The impact of urban morphology on the building energy consumption and solar energy generation potential of university dormitory blocks [J], 96. Sustainable Cities and Society, Article 104644.
Xu, C., Chen, S., Ren, H., et al. (2025). A novel deep learning and GIS integrated method for accurate city-scale assessment of building facade solar energy potential. Applied Energy, 387, Article 125600.
Xu, S., Jiang, H., Xiong, F., et al. (2021). Evaluation for block-scale solar energy potential of industrial block and optimization of application strategies: A case study of Wuhan, China. Sustainable Cities and Society, 72, Article 103000.
Yan, X., Huang, Z., Ren, S., et al. (2024). Monthly electricity consumption data at 1 km× 1 km grid for 280 cities in China from 2012 to 2019. Scientific Data, 11(1), 877.
Yan, Z., Ma, L., He, W., et al. (2022). Comparing object-based and pixel-based methods for local climate zones mapping with multi-source data. Remote Sensing, 14(15), 3744.
Zamani Joharestani, M., Cao, C., Ni, X., et al. (2019). PM2. 5 prediction based on random forest, XGBoost, and deep learning using multisource remote sensing data. Atmosphere, 10(7), 373.
Zhang, Z., Chen, M., Zhong, T., et al. (2023). Carbon mitigation potential afforded by rooftop photovoltaic in China. Nature Communications, 14(1), 2347.
Zhang, J., Xu, L., Shabunko, V., et al. (2019). Impact of urban block typology on building solar potential and energy use efficiency in tropical high-density city. Applied Energy, 240, 513–533.
Zhong, T., Zhang, Z., Chen, M., et al. (2021). A city-scale estimation of rooftop solar photovoltaic potential based on deep learning. Applied Energy, 298, Article 117132.

Publication Details
Journal
Sustainable Cities and Society
Publication Year
2025
Authors
Xingkang Chai, Jiayu Chen, Chunying Li, Pengyuan Shen, Yuqin Wang, Yang Wan, Siyuan Chen, Haida Tang
Categories
Urban climate and building adaptation strategies