Research papers of the week – April 29, 2024

An Explainable Deep Learning Classifier of Bovine Mastitis Based on Whole-Genome Sequence Data—Circumventing the p >> n Problem

Krzysztof Kotlarz; Magda Mielczarek; Przemyslaw Biecek; Katarzyna Wojdak-Maksymiec; Tomasz Suchocki; Piotr Topolski; Wojciech Jagusiak; Joanna Szyda
International Journal of Molecular Sciences

Ministerial score = 140.0
Journal Impact Factor (2023) = 5.6 (Q1)

The serious drawback underlying the biological annotation of whole-genome sequence data is the p >> n problem, which means that the number of polymorphic variants (p) is much larger than the number of available phenotypic records (n). We propose a way to circumvent the problem by combining a LASSO logistic regression with deep learning to classify cows as susceptible or resistant to mastitis, based on single nucleotide polymorphism (SNP) genotypes. Among several architectures, the one with 204,642 SNPs was selected as the best. This architecture was composed of two layers with, respectively, 7 and 46 units per layer implementing respective drop-out rates of 0.210 and 0.358. The classification of the test data resulted in AUC = 0.750, accuracy = 0.650, sensitivity = 0.600, and specificity = 0.700. Significant SNPs were selected based on the SHapley Additive exPlanation (SHAP). As a final result, one GO term related to the biological process and thirteen GO terms related to molecular function were significantly enriched in the gene set that corresponded to the significant SNPs. Our findings revealed that the optimal approach can correctly predict susceptibility or resistance status for approximately 65% of cows. Genes marked by the most significant SNPs are related to the immune response and protein synthesis.
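
The accuracy, sensitivity, and specificity quoted above all derive from one confusion matrix. The sketch below shows the arithmetic, treating "susceptible" as the positive class. The counts are hypothetical (the abstract does not give the test set size or class balance); they were chosen only because they reproduce the reported values. The function name classification_metrics is likewise an illustration, not something taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: susceptible cows classified correctly / incorrectly
    tn/fp: resistant cows classified correctly / incorrectly
    """
    sensitivity = tp / (tp + fn)            # share of susceptible cows detected
    specificity = tn / (tn + fp)            # share of resistant cows detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical test set: 20 susceptible and 20 resistant cows (assumed numbers).
metrics = classification_metrics(tp=12, fn=8, tn=14, fp=6)
print(metrics)
```

With equally sized classes, accuracy is the mean of sensitivity and specificity, (0.600 + 0.700) / 2 = 0.650, which matches the reported figures. That agreement is consistent with a balanced test set but does not prove one.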

DOI:10.3390/ijms25094715

 
