Sci Rep. 2026 Feb 18;16(1):9724. doi: 10.1038/s41598-026-40117-1.
ABSTRACT
Accurate prediction of athlete performance is a challenging and significant problem in sports science and analytics, with applications in training design, injury prevention, and talent management. Conventional statistical models usually cannot represent the highly intricate nonlinear relationships among physiological, lifestyle, and contextual characteristics. The main aim of this study is to predict athletes' performance scores from tabular data, with emphasis on predictive validity and generalization. Interpretability is a secondary objective, addressed through SHAP-based explanations, and computational efficiency is considered a practical aspect of the selected model class; deployment feasibility is presented only as a possible application, not as a proven real-world one. The overall goal is to offer a dependable framework for identifying the main determinants of athlete performance and for supporting data-driven decision making in sports coaching. The Kaggle Athlete Performance Prediction Dataset was used, which includes demographic, training, physiological, and/or lifestyle features. Preprocessing comprised imputation, normalization, encoding, and feature engineering, followed by partitioning into training, validation, and test sets. The Gradient Regression Model was trained with tenfold cross-validation and compared against several baselines, including Linear Regression, Ridge Regression, Support Vector Regression (SVR), Random Forest, and Neural Networks. The evaluation metrics were R², RMSE, and MAE. The proposed model achieved an R² of 0.923, higher than the baselines, including Neural Networks (R² = 0.901) and Random Forest (R² = 0.887). Residual and error analysis indicated that bias and variance were kept to a minimum, and the learning dynamics showed efficient convergence and stability.
The Gradient Regression Model predicts more accurately and is also more interpretable, with applications to individualized training and performance monitoring. This study is presented as an application and evaluation work on a publicly available dataset; its contribution is a transparent end-to-end pipeline, strict validation, and an analysis that relates predictive factors to athlete performance, rather than a new learning algorithm. Future research will cover larger and longitudinal studies, along with hybrid frameworks that incorporate both biomechanical and psychological aspects.
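The evaluation protocol named in the abstract (tenfold cross-validation scored with R², RMSE, and MAE) can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline or model: the one-feature least-squares line is only a stand-in learner, the data are synthetic, and every name here (`regression_metrics`, `k_fold_indices`, `fit_line`, `cross_validate`, `hours`, `score`) is hypothetical.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for two equal-length sequences."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares on one feature (stand-in model, an assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validate(xs, ys, k=10, seed=0):
    """Average (R2, RMSE, MAE) over k held-out folds."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the other k-1 folds only, then score on the held-out fold.
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in held_out]
        scores.append(regression_metrics([ys[i] for i in held_out], preds))
    return tuple(sum(s[j] for s in scores) / k for j in range(3))


# Synthetic data (an assumption): weekly training hours vs. a performance score.
rng = random.Random(42)
hours = [rng.uniform(0, 20) for _ in range(200)]
score = [50 + 2.0 * h + rng.gauss(0, 3) for h in hours]

mean_r2, mean_rmse, mean_mae = cross_validate(hours, score, k=10)
print(f"R2={mean_r2:.3f}  RMSE={mean_rmse:.3f}  MAE={mean_mae:.3f}")
```

Each fold is held out exactly once and the model is refit on the remaining nine, so the averaged metrics estimate generalization rather than training fit. MAE never exceeds RMSE on the same predictions, which makes the pair a quick check for a few large errors dominating the score.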
PMID:41708773 | PMC:PMC13013846 | DOI:10.1038/s41598-026-40117-1