Machine Learning–Based Fault Diagnosis in Solar Photovoltaic Systems Using Data Balancing Techniques
Abstract
As solar energy adoption continues to rise, so does the demand for reliable photovoltaic (PV) systems. Efficient and safe operation of PV systems depends on accurate fault detection, which makes fault diagnosis a critical research area. This study investigates the diagnosis of short-circuit faults in PV systems by combining machine learning algorithms with data balancing techniques. Four classifiers, Random Forest, CatBoost, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM), were used for fault classification, while three oversampling methods, Synthetic Minority Oversampling Technique (SMOTE), Random Oversampling, and Adaptive Synthetic Sampling (ADASYN), were used to address class imbalance. Two datasets were analyzed: Dataset-1 with 11 features and Dataset-2 with 13 features. For Dataset-1, LGBM achieved the highest accuracy on the imbalanced data (79.28%), which improved to 86.59% after applying SMOTE. With the two additional features in Dataset-2, fault diagnosis accuracy increased to 98.57% on the imbalanced data and reached 100% when the data were balanced with SMOTE. These findings indicate that combining LGBM with SMOTE substantially improves short-circuit fault detection performance in PV systems.
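To illustrate the balancing step described above, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the implementation used in the study: the binary labels, the two toy features, the sample values, and the helper names `smote` and `balance` are all assumptions introduced for illustration. Each synthetic fault sample is placed on the line segment between a randomly chosen minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until it matches the majority count (binary case)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)


# Hypothetical toy data: two normalized features, 8 normal samples (0), 3 fault samples (1).
X = [
    [0.95, 0.98], [0.97, 0.96], [0.93, 0.99], [0.96, 0.97],
    [0.94, 0.95], [0.98, 0.99], [0.92, 0.96], [0.99, 0.94],
    [0.40, 0.55], [0.35, 0.60], [0.45, 0.50],
]
y = [0] * 8 + [1] * 3

X_bal, y_bal = balance(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))
```

In a typical workflow, oversampling of this kind is applied to the training split only, so that the classifier is evaluated on unmodified samples; whether the study followed that protocol is not stated in the abstract.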
Cite this article as: D. Machine learning–based fault diagnosis in solar photovoltaic systems using data balancing techniques. Turk J Electr Power Energy Syst. Published online October 20, 2025. doi: 10.5152/tepes.2025.25028.
