Newswise — A research team built seven learning models using Support Vector Machine (SVM) algorithms to distinguish flowering-time-associated genes (FTAGs) from non-FTAGs. The SVM-Kmer-PC-PseAAC model performed best (F1 score = 0.934, accuracy = 0.939, and receiver operating characteristic = 0.943). Based on this model, they created 'FTAGs_Find', a plant FTAG prediction tool, and identified 318,521 FTAGs in the protein datasets of 81 species. Notably, only 208 FTAGs were predicted in Ostreococcus lucimarinus, a non-flowering plant, indicating extensive FTAG loss. They also constructed an FTAG database, the Flowering-time Gene Database (FTGD), which gives users access to the FTAG prediction tool and the FTAG datasets. Plans involve expanding FTGD with more datasets and exploring other machine learning (ML) methods, enhancing resources for breeders and researchers in the flowering-time community.

Flowering marks a pivotal shift from vegetative to reproductive phases in higher plants, impacting crop yield and overall plant fitness. While substantial progress has been made in understanding flowering mechanisms, identifying FTAGs remains challenging. Current methods rely on costly, time-consuming and labor-intensive wet-lab experiments or resource-intensive omics technologies. Existing bioinformatics tools like BLAST+ lack comprehensive information for accurate gene recognition. In response, ML emerges as a promising solution, yet no ML model exists for FTAGs' protein sequences.

A study (DOI: 10.48130/tp-0024-0007) published in Tropical Plants on 03 April 2024 develops an ML model for precisely identifying proteins encoded by FTAGs, enhancing research efficiency in flowering-time studies.

To construct the SVM classification model for predicting FTAGs, 628 positive and 8,163 negative protein sequences underwent data preprocessing. The dataset was divided into training and test sets: 80% of the data was used to construct the SVM prediction model, while the remaining 20% formed the test set for evaluating it. Seven types of features were used to train the SVM prediction model: ACC, Kmer, PC-PseAAC, Kmer-ACC, ACC-PC-PseAAC, Kmer-PC-PseAAC, and ACC-Kmer-PC-PseAAC. Each model was optimized using a grid search over the kernel, gamma, and cost parameters. Among the models, SVM-Kmer-PC-PseAAC demonstrated superior performance. Subsequently, a local Python tool, 'FTAGs_Find', was developed based on this model, enabling proteome-wide identification of FTAGs. The tool identified 318,521 FTAGs from 2,873,697 protein sequences across 81 species. Notably, species like Sphagnum fallax exhibited significant FTAG expansion, while non-flowering plants like Ostreococcus lucimarinus showed minimal FTAG presence. Further, GO enrichment analysis in Brassica rapa revealed FTAG involvement in various flower development processes. Additionally, the constructed prediction model demonstrated an 88% recognition rate for flowering-time-related genes in B. rapa, enhancing confidence in its accuracy and reliability. Finally, the FTGD (www.sagsanno.top:8080/FTGD) was established, offering user-friendly tools for FTAG prediction, dataset browsing, and submission, aiming to facilitate comprehensive research in the field.
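For readers who want a concrete picture of the data preparation described above, the following is a minimal Python sketch, not the study's code. It shows one common way to turn a protein sequence into a Kmer (k-mer frequency) feature vector, to make an 80/20 train/test split, and to compute accuracy and F1 score. The function names, the choice of k = 2, the stratified split, and the toy sequence are all assumptions made for illustration; the release does not state these details. The SVM training and the grid search over kernel, gamma, and cost would be handled by a machine learning library and are not shown.

```python
import random
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_features(sequence, k=2):
    """Return the normalized k-mer frequency vector of a protein sequence.

    k=2 is an illustrative assumption; the vector has 20**k entries.
    """
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing non-standard residues (e.g. X) are not in the
    # vocabulary, so they are excluded from both counts and the total.
    total = sum(counts[kmer] for kmer in vocabulary)
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test sets, keeping the class ratio.

    Stratification is an assumption; the release only states an 80/20 split.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and F1 score for the positive (FTAG) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Labels mirroring the class sizes reported in the release:
# 628 positive (FTAG) and 8,163 negative (non-FTAG) sequences.
labels = [1] * 628 + [0] * 8163
train_idx, test_idx = stratified_split(labels)

# Feature vector for a made-up toy sequence (not a real FTAG protein).
toy_vector = kmer_features("MKKAILLV", k=2)
```

Because negatives outnumber positives by roughly 13 to 1 in this dataset, accuracy alone can look high even for a weak classifier, which is one reason an F1 score is typically reported alongside it.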

According to the study's lead researcher, Zhidong Li, “We are confident that the FTGD will prove to be a valuable and user-friendly resource for all researchers.”

In summary, this study used SVM algorithms to distinguish FTAGs with high accuracy, leading to the development of 'FTAGs_Find' for proteome-wide FTAG identification. Large-scale analysis across 81 species revealed the evolutionary patterns of FTAGs. The FTGD was established to give researchers easy access to the prediction tool and datasets. Looking ahead, the goal is to expand FTGD with additional datasets and explore advanced machine learning techniques to further refine the prediction model. This refinement will enhance its utility for the scientific community and contribute to broader insights into plant flowering mechanisms.

###

References

DOI

10.48130/TP-2023-0023

Original Source URL

https://doi.org/10.48130/TP-2023-0023

Funding information

This work was supported by the National Natural Science Foundation of China (32172614) and the Hainan Province Science and Technology Special Fund (ZDYF2023XDNY050). The authors thank the anonymous editor and reviewers for their valuable comments and suggestions.

About Tropical Plants

Tropical Plants (e-ISSN 2833-9851) is the official journal of Hainan University and is published by Maximum Academic Press. Tropical Plants undergoes rigorous peer review and is published in open-access format to enable swift dissemination of research findings, facilitate exchange of academic knowledge, and encourage academic discourse on innovative technologies and issues emerging in tropical plant research.

Journal Link: Tropical Plants, November 2023