An Efficient Hybrid LASSO–WOA (hLWOA) Feature Selection Model for High-Accuracy Thyroid Disease Classification
Department of Computing & Informatics, Sir Padampat Singhania University, Udaipur, Rajasthan, 313601, India
Abstract
Thyroid disease is a common endocrine disorder that can cause serious health problems if it is not diagnosed and treated promptly. This study introduces a hybrid method, hLWOA, which applies LASSO followed by the Whale Optimization Algorithm (WOA) for efficient feature selection and thyroid disease prediction. Preprocessing the raw thyroid dataset with Z-score normalization raised classification accuracy from 99.20% to 99.47%, which underlines the value of data preprocessing. Because the dataset was highly imbalanced, SMOTE was applied so that each class contained 3,480 samples. LASSO then reduced the feature space from 25 to 9 features, and WOA refined this subset to the 4 most informative features, for a total dimensionality reduction of about 80.95%. The optimized feature subset was evaluated with several classifiers, including Random Forest (RF), XGBoost (XGB), CatBoost (CB), SVM-Linear, and SVM-Polynomial, using a 70%-30% train-test split and 5-fold cross-validation. In the experiments, the proposed hLWOA model reached a maximum classification accuracy of 99.29%, with precision, recall, F1-score, and specificity of 99.61%, 98.95%, 99.28%, and 99.62%, respectively. A comparative analysis showed that hLWOA delivers competitive predictive performance with the smallest number of features (4), whereas the compared approaches rely on larger feature sets, demonstrating its efficiency and interpretability. An ablation study confirmed the importance of the selected features: removing TSH caused the largest degradation in performance. In addition, a Friedman statistical test (p = 0.0173) indicated statistically significant differences among the evaluated classifiers. CB ranked best (2.1), followed by RF (2.3) and XGB (2.6).
These findings indicate that the proposed hLWOA framework effectively balances the trade-off between feature selection and prediction accuracy, yielding a powerful and reliable approach to the intelligent diagnosis of thyroid diseases.
Keywords
Graphical Abstract

Novelty Statement
This study introduces hLWOA, a new hybrid method that applies LASSO followed by the Whale Optimization Algorithm to achieve efficient feature selection and thyroid disease prediction.
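To make the second stage of the hybrid concrete, the following is a minimal sketch of a binary Whale Optimization search over a feature mask, written in plain Python. It is an illustration under stated assumptions, not the implementation used in this study: the function names (`binary_woa`, `toy_fitness`), the population size, the iteration count, the fixed 0.5 sigmoid threshold, the use of scalar coefficients per whale, and the relevance scores in `TOY_RELEVANCE` are all hypothetical. The nine toy scores only stand in for the nine features that remain after LASSO.

```python
import math
import random

# Hypothetical relevance scores for 9 candidate features (toy values only).
TOY_RELEVANCE = [0.9, 0.8, 0.7, 0.6, 0.05, 0.04, 0.03, 0.02, 0.01]


def toy_fitness(mask, penalty=0.1):
    """Toy objective: reward relevant features, penalize subset size."""
    gain = sum(r for r, m in zip(TOY_RELEVANCE, mask) if m)
    return gain - penalty * sum(mask)


def binary_woa(fitness, n_features, n_whales=10, n_iter=30, seed=0):
    """Return (mask, score) for the best binary feature mask found."""
    rng = random.Random(seed)

    def to_mask(position):
        # Sigmoid transfer function with a fixed 0.5 threshold (an assumption).
        return [1 if 1.0 / (1.0 + math.exp(-x)) > 0.5 else 0 for x in position]

    whales = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
              for _ in range(n_whales)]
    best_pos, best_fit = None, -math.inf

    for t in range(n_iter + 1):
        # Track the best position seen so far.
        for w in whales:
            f = fitness(to_mask(w))
            if f > best_fit:
                best_pos, best_fit = list(w), f
        if t == n_iter:
            break  # final pass only evaluates the last population

        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i, w in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore
                # around a randomly chosen whale.
                ref = best_pos if abs(A) < 1.0 else rng.choice(whales)
                new = [r - A * abs(C * r - x) for r, x in zip(ref, w)]
            else:
                # Spiral (bubble-net) move around the best position.
                l = rng.uniform(-1.0, 1.0)
                new = [abs(b - x) * math.exp(l) * math.cos(2.0 * math.pi * l) + b
                       for b, x in zip(best_pos, w)]
            # Clamp positions so the sigmoid stays numerically safe.
            whales[i] = [max(-4.0, min(4.0, v)) for v in new]

    return to_mask(best_pos), best_fit


mask, score = binary_woa(toy_fitness, n_features=9)
print("selected feature indices:", [i for i, m in enumerate(mask) if m])
print("fitness:", score)
```

In a wrapper setting, `toy_fitness` would typically be replaced by a function that trains a classifier on the columns selected by the mask and returns a validation score, possibly with a penalty on subset size. That choice of objective is an assumption here and is not specified in the text above.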

