Evaluation of Deep Learning in non-coding RNA Classification

N Amin, A McGrath, YPP Chen (2019), Evaluation of Deep Learning in non-coding RNA Classification. Nature Machine Intelligence, VOL 1 | MAY 2019 | 246–256, DOI: 10.1038/s42256-019-0051-2.

Description

Deep learning has recently been applied to ncRNA identification and classification with promising results. We evaluated deep-learning-based approaches for classifying lncRNA, small ncRNA, and circular RNA types.

Note: In our experiments, we have used the full-length version of lncADeep.

Prerequisites

Using pip

pip install biopython
pip install numpy
pip install scipy 
pip install scikit-learn 
	OR 
pip install -U scikit-learn (if numpy and scipy are already installed)
pip install matplotlib
pip install seaborn

Using conda

conda install -c anaconda biopython
conda install -c anaconda numpy
conda install -c anaconda scipy
conda install -c anaconda scikit-learn  
conda install -c anaconda matplotlib 
conda install -c anaconda seaborn
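After installing with either method, it can help to confirm that every prerequisite is importable before running the scripts. The following is a minimal sketch, not part of the repository: the names `REQUIRED` and `missing_packages` are hypothetical, and the only assumption is the usual mapping from package name to import name (for example, biopython is imported as `Bio` and scikit-learn as `sklearn`).

```python
import importlib.util

# Package name used with pip/conda -> module name used in "import ..."
REQUIRED = {
    "biopython": "Bio",
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites are importable.")
```

Any name reported as missing can then be installed with the matching pip or conda command listed above.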

Usage

Density Plot of the mouse and human datasets from GENCODE

python .\density_plot.py -i < inputFile > -o < outputFile >
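For orientation, a density plot of this kind could be produced roughly as sketched below. This is an illustration only and does not reproduce density_plot.py: it assumes the input is a FASTA file and that the plotted quantity is transcript length, and the function names `fasta_lengths` and `plot_length_density` are hypothetical.

```python
import io


def fasta_lengths(handle):
    """Return the length of every sequence in a FASTA-formatted text stream."""
    lengths = []
    current = None  # running length of the record currently being read
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def plot_length_density(lengths, output_path):
    """Save a kernel density estimate of the lengths (needs matplotlib and seaborn)."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file without needing a display
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots()
    sns.kdeplot(lengths, ax=ax)
    ax.set_xlabel("Sequence length (nt)")  # assumed quantity, see the note above
    ax.set_ylabel("Density")
    fig.savefig(output_path)
    plt.close(fig)


# Tiny in-memory example; a real run would open a GENCODE FASTA file instead.
example = io.StringIO(">tx1\nACGTAC\nGT\n>tx2\nACG\n")
lengths = fasta_lengths(example)
```

Calling `plot_length_density(lengths, "density.png")` would then write the figure; the output file name here is only an example.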

Accuracy, Precision, Recall, F1 score, and Confusion Matrix of CircDeep, lncADeep, lncRNAnet, lncFinder, and nRC on the mouse and human datasets from GENCODE.

python .\circDeep_Metrics.py -i < inputFile >
python .\lncADeep_Metrics.py -i < inputFile >
python .\lncRNAnet_Metrics.py -i < inputFile >
python .\lncFinder_Metrics.py -i < inputFile >
python .\nRC_Metrics.py -i < inputFile >
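The metrics reported by these scripts all derive from the confusion matrix. The sketch below shows the standard definitions for a binary task (for example, lncRNA versus not lncRNA). It is not taken from the repository scripts; `confusion_counts` and `binary_metrics` are hypothetical names, and labels are assumed to be encoded as 1 (positive) and 0 (negative).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Compute the standard metrics from the four confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "error_rate": 1 - accuracy,
    }


# Illustrative labels only, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(*confusion_counts(y_true, y_pred))
```

With scikit-learn installed, `sklearn.metrics.confusion_matrix`, `precision_score`, `recall_score`, and `f1_score` cover the same quantities.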

ROC and PR curves of lncADeep, lncRNAnet, and lncFinder on the human and mouse datasets from GENCODE.

python .\lnc_ROC_PRC.py
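As background on what such curves contain, the sketch below computes ROC and PR points by sweeping a decision threshold over prediction scores, then integrates the ROC curve with the trapezoidal rule. It is a generic illustration, not the code in lnc_ROC_PRC.py; the function names are hypothetical, and it assumes labels of 1 (positive) or 0 (negative) and scores where higher means more likely positive.

```python
def threshold_sweep(labels, scores):
    """Yield cumulative (tp, fp) counts as the decision threshold is lowered."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        yield tp, fp


def roc_and_pr(labels, scores):
    """Return ROC points (fpr, tpr) and PR points (recall, precision)."""
    pos = sum(1 for label in labels if label == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    roc = [(0.0, 0.0)]
    pr = []
    for tp, fp in threshold_sweep(labels, scores):
        roc.append((fp / neg, tp / pos))
        pr.append((tp / pos, tp / (tp + fp)))
    return roc, pr


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative scores only, not data from the paper.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
roc, pr = roc_and_pr(labels, scores)
roc_auc = trapezoid_area(roc)
```

With scikit-learn installed, `sklearn.metrics.roc_curve` and `sklearn.metrics.precision_recall_curve` return the same kind of points and are the more usual choice in practice.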
Note: All the datasets are in data/< algorithm name >/< species name or file name >

Download

Source code and data can be downloaded here

lncADeep Partial-length Model Experiments

Performance of the partial-length model of LncADeep is described in Table 1, Fig 1, and Fig 2.


Table 1 | Performance of lncRNAnet, partial-length model of lncADeep, and lncFinder on lncGH and lncGM datasets

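| Metric      | lncGH: lncRNAnet | lncGH: lncADeep | lncGH: lncFinder | lncGM: lncRNAnet | lncGM: lncADeep | lncGM: lncFinder |
|-------------|------------------|-----------------|------------------|------------------|-----------------|------------------|
| Accuracy    | 0.923            | 0.946           | 0.842            | 0.773            | 0.950           | 0.895            |
| Precision   | 0.758            | 0.821           | 0.593            | 0.481            | 0.832           | 0.686            |
| Recall      | 0.972            | 0.975           | 0.952            | 0.625            | 0.964           | 0.952            |
| F1 score    | 0.852            | 0.891           | 0.731            | 0.544            | 0.893           | 0.798            |
| Specificity | 0.909            | 0.821           | 0.809            | 0.814            | 0.832           | 0.879            |
| Error rate  | 0.0761           | 0.054           | 0.225            | 0.226            | 0.05            | 0.104            |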
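To make the relationship between the rows of Table 1 concrete, the sketch below recomputes the F1 score from the reported precision and recall using F1 = 2PR / (P + R). The values are copied from Table 1; the names `TABLE_1` and `f1_from` are hypothetical. Because the table values are rounded to three decimals, differences in the last digit are expected.

```python
# (dataset, tool) -> (precision, recall, reported F1), copied from Table 1
TABLE_1 = {
    ("lncGH", "lncRNAnet"): (0.758, 0.972, 0.852),
    ("lncGH", "lncADeep"): (0.821, 0.975, 0.891),
    ("lncGH", "lncFinder"): (0.593, 0.952, 0.731),
    ("lncGM", "lncRNAnet"): (0.481, 0.625, 0.544),
    ("lncGM", "lncADeep"): (0.832, 0.964, 0.893),
    ("lncGM", "lncFinder"): (0.686, 0.952, 0.798),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Absolute difference between the recomputed and the reported F1 score.
differences = {
    key: abs(f1_from(precision, recall) - reported)
    for key, (precision, recall, reported) in TABLE_1.items()
}

for (dataset, tool), diff in differences.items():
    print(f"{dataset} {tool}: |recomputed - reported| = {diff:.4f}")
```

The same pattern can be used to cross-check other derived rows against the counts or rates they are defined from.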

Fig. 1 | ROC curves for lncRNAnet, partial-length model of LncADeep and lncFinder on lncGH and lncGM datasets.
Fig. 2 | PR curves for lncRNAnet, partial-length model of LncADeep and lncFinder on lncGH and lncGM datasets.

References

Han, S. et al. LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information and physicochemical property. Brief. Bioinform. 2018, bby065 (2018).

Baek, J. et al. LncRNAnet: long non-coding RNA identification using deep learning. Bioinformatics 34, 3889–3897 (2018).

Fiannaca, A. et al. nRC: non-coding RNA classifier based on structural features. BioData Mining 10, 27 (2017).

Yang, C. et al. LncADeep: an ab initio lncRNA identification and functional annotation tool based on deep learning. Bioinformatics 34, 3825–3834 (2018). 


Copyright © 2019 Noorul Amin, Annette McGrath and Yi-Ping Phoebe Chen