13 January 2025

Machine learning has made it easier to diagnose cancer by subtype

Researchers from five countries have used machine learning algorithms to develop easy-to-use models that classify tumor samples into known molecular subtypes. The models, designed to facilitate molecular typing of patients' tumors in clinical settings, have been made publicly available. A report of the work was published in the journal Cancer Cell.

Molecular subtypes of cancer, such as those defined in The Cancer Genome Atlas (TCGA), provide information about the biological processes in a tumor, which helps to assess the patient's prognosis and choose a rational approach to treatment. However, existing approaches to classifying tumors into subtypes rely on models and clustering methods that are difficult to apply beyond the dataset on which they were developed, which makes clinical use nearly impossible.

Peter Laird of the Van Andel Institute and colleagues from Brazil, Canada, Georgia, Greece and the United States conducted a multivariate analysis of 8,791 tumors from TCGA, belonging to 106 known subtypes of 26 cancer types defined by anatomical site and histopathology, across five types of data: mutations, copy number, messenger RNA, DNA methylation and microRNA. The researchers designed subtype-balanced repeated cross-validation splits, which served as the training and test datasets for five machine learning algorithms: AKLIMATE, CloudForest, SK Grid and JADBio, each trained separately for every cancer type, and subSCOPE, trained on all cancer types at once.
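The idea of subtype-balanced repeated cross-validation can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `stratified_repeated_folds`, the fold and repeat counts, and the toy labels are all assumptions made for illustration. The sketch shuffles the samples of each subtype and deals them round-robin into folds, so that every fold holds a near-equal share of each subtype, and then repeats the procedure with a fresh shuffle.

```python
import random
from collections import defaultdict


def stratified_repeated_folds(labels, n_folds=5, n_repeats=2, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    Hypothetical helper: within each repeat, the samples of every subtype
    are spread across the folds so that per-subtype counts differ by at
    most one between folds.
    """
    rng = random.Random(seed)

    # Group sample indices by subtype label.
    by_subtype = defaultdict(list)
    for idx, label in enumerate(labels):
        by_subtype[label].append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        for label in sorted(by_subtype):
            members = by_subtype[label][:]
            rng.shuffle(members)  # new random assignment in each repeat
            # Deal the shuffled samples round-robin into the folds.
            for pos, idx in enumerate(members):
                folds[pos % n_folds].append(idx)

        for fold_no, fold in enumerate(folds):
            test_idx = sorted(fold)
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold_no, train_idx, test_idx


# Toy labels (invented): 10 samples of one subtype, 5 each of two others.
labels = ["subtype_A"] * 10 + ["subtype_B"] * 5 + ["subtype_C"] * 5
splits = list(stratified_repeated_folds(labels, n_folds=5, n_repeats=2))
```

In this toy setup every test fold contains two samples of `subtype_A` and one each of `subtype_B` and `subtype_C`, so a rare subtype is never missing from a test set.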

Using these algorithms and cross-validation, the team built classifiers optimized to identify molecular subtypes of cancer from minimal feature sets, which helps to avoid overfitting. In total, nearly 412,600 such models were created. The best ones for each cancer type, learning algorithm and data type were then selected, yielding 737 classifiers for assigning subtypes to tumors from outside the TCGA cohort. These models were containerized and made publicly available.
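Selecting one model per combination of cancer type, algorithm and data type amounts to a grouped maximum. The sketch below shows that step under stated assumptions: the article does not give the selection criterion, so the `score` field, the tie-break in favor of fewer features, and all names and numbers are hypothetical.

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    cancer_type: str
    algorithm: str
    data_type: str
    score: float     # hypothetical cross-validated score, higher is better
    n_features: int  # size of the feature set the model uses


def select_best(results):
    """Keep one model per (cancer type, algorithm, data type) group.

    Assumed rule: highest score wins; on a tie, the smaller feature set wins.
    """
    best = {}
    for r in results:
        key = (r.cancer_type, r.algorithm, r.data_type)
        current = best.get(key)
        if current is None or (r.score, -r.n_features) > (
            current.score,
            -current.n_features,
        ):
            best[key] = r
    return best


# Invented records, for illustration only.
results = [
    ModelResult("TYPE_1", "algo_x", "mRNA", 0.90, 50),
    ModelResult("TYPE_1", "algo_x", "mRNA", 0.90, 20),
    ModelResult("TYPE_1", "algo_x", "mRNA", 0.85, 5),
    ModelResult("TYPE_1", "algo_y", "mRNA", 0.80, 10),
]
best = select_best(results)
```

Here the group `("TYPE_1", "algo_x", "mRNA")` keeps the 20-feature model: it ties for the top score and uses fewer features than its rival.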

The resulting models can be used to build compact systems and kits for the genetic diagnosis of cancer in clinical trials and medical practice. The authors hope that their findings will be a first step toward bridging the gap between the extensive TCGA data library and its clinical application.
