The classifier is most accurate when gene expression values are quantified the same way as for the training samples, as follows:
1) We used GENCODE v34 as the reference transcriptome, filtered as described in the Methods section of the paper. The resulting reference transcriptome can be downloaded here:
gencode.v34.transcripts.selected.fa.gz
2) Quantify transcript abundance with kallisto.
After decompressing the downloaded file (gunzip gencode.v34.transcripts.selected.fa.gz), the index can be constructed like this:
kallisto index -i kallisto_index gencode.v34.transcripts.selected.fa
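and each sample can then be quantified, for example, like this:
kallisto quant -i /path/to/kallisto_index -o /path/to/output_dir --single -l 200 -s 20 --rf-stranded sample.fastq
This example is for single-end, reverse-stranded reads with an estimated average fragment length of 200 (standard deviation 20). Use paired-end or unstranded settings as appropriate for your library. Please refer to the kallisto manual for details.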
3) Sum the estimated transcript abundances to the gene level. A script that does this and builds a table from multiple samples is provided here. The resulting table can be uploaded as input to the classifier.
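When many samples need to be quantified, the kallisto quant calls from step 2 can be generated programmatically. The Python sketch below builds one command line per sample. The helper name kallisto_quant_cmd and its parameters are hypothetical. It uses only the kallisto options shown in step 2, plus -o (the output directory, which kallisto quant requires) and --fr-stranded. Check kallisto quant --help for the options supported by your kallisto version.

```python
from typing import List, Optional


def kallisto_quant_cmd(index: str, out_dir: str, fastqs: List[str],
                       stranded: Optional[str] = "rf",
                       frag_len: float = 200, frag_sd: float = 20) -> List[str]:
    """Build a kallisto quant command line for one sample (hypothetical helper).

    One FASTQ file  -> single-end mode; fragment length mean/sd must be given.
    Two FASTQ files -> paired-end mode; kallisto estimates fragment length itself.
    stranded: "rf" (--rf-stranded), "fr" (--fr-stranded) or None (unstranded).
    """
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir]
    if len(fastqs) == 1:
        cmd += ["--single", "-l", str(frag_len), "-s", str(frag_sd)]
    elif len(fastqs) != 2:
        raise ValueError("expected one (single-end) or two (paired-end) FASTQ files")
    if stranded == "rf":
        cmd.append("--rf-stranded")
    elif stranded == "fr":
        cmd.append("--fr-stranded")
    elif stranded is not None:
        raise ValueError(f"unknown strandedness: {stranded!r}")
    # Read files always go last on the kallisto quant command line.
    return cmd + list(fastqs)
```

For example, passing kallisto_quant_cmd("/path/to/kallisto_index", "out/sample1", ["sample1_R1.fastq", "sample1_R2.fastq"]) to subprocess.run(..., check=True) would quantify a paired-end, reverse-stranded sample. Returning a list rather than a single string avoids shell quoting problems with unusual file names.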
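The gene-level summation in step 3 can be sketched as follows. This sketch is not the provided script. It assumes that the target_id column of each sample's kallisto abundance.tsv keeps GENCODE's pipe-delimited FASTA header, with the transcript ID first and the gene ID second. It sums the tpm column by default, but which abundance column the classifier expects is an assumption here, so compare against the provided script before uploading a table. The function names are illustrative.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def gene_id(target_id: str) -> str:
    """Gene ID from a GENCODE-style target_id (assumed: second '|' field)."""
    fields = target_id.split("|")
    if len(fields) < 2:
        raise ValueError(f"no gene ID in target_id: {target_id!r}")
    return fields[1]


def gene_sums(rows: Iterable[Dict[str, str]], column: str = "tpm") -> Dict[str, float]:
    """Sum one kallisto abundance column over all transcripts of each gene."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[gene_id(row["target_id"])] += float(row[column])
    return dict(totals)


def gene_table(samples: Dict[str, str],
               column: str = "tpm") -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Read each sample's abundance.tsv; return sorted gene IDs and per-sample sums."""
    per_sample: Dict[str, Dict[str, float]] = {}
    for name, path in samples.items():
        with open(path, newline="") as fh:
            reader = csv.DictReader(fh, delimiter="\t")
            per_sample[name] = gene_sums(reader, column)
    genes = sorted(set().union(*per_sample.values()))
    return genes, per_sample


def write_table(out_path: str, genes: List[str],
                per_sample: Dict[str, Dict[str, float]]) -> None:
    """Write a tab-separated gene x sample table; genes absent in a sample get 0."""
    names = list(per_sample)
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["gene_id", *names])
        for g in genes:
            writer.writerow([g, *(per_sample[n].get(g, 0.0) for n in names)])
```

For example, gene_table({"sample1": "out/sample1/abundance.tsv", "sample2": "out/sample2/abundance.tsv"}) followed by write_table("genes.tsv", ...) would produce one row per gene and one column per sample. If the filtered reference uses shortened FASTA headers, gene_id would instead need an explicit transcript-to-gene mapping.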