Run the TE pipeline

For this tutorial, we use the hiPSC-CM-chr1-example dataset with the hiPSC-CM-test.yaml configuration file.

How to run the pipeline

Before running the pipeline, change the paths in the configuration file to point to the location where you extracted the data, e.g. /home/user/data/hiPSC-CM-chr1-example. You can do this with a text editor, or by modifying the file in place from within that directory (the command below substitutes the output of pwd):

sed -i 's|/path/to/your/hiPSC-CM-example|'`pwd`'|g' hiPSC-CM-test.yaml
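If you prefer to script this step, the sketch below wraps the same substitution and then checks that no placeholder is left in the file. It assumes the placeholder is exactly /path/to/your/hiPSC-CM-example, as in the sed command above; set_data_path is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Minimal sketch: replace the placeholder path in a configuration file and
# check that no occurrence of it is left behind.
# Assumption: the placeholder is exactly /path/to/your/hiPSC-CM-example.
# set_data_path is a hypothetical helper.

set_data_path() {
    local config="$1" data_dir="$2"
    local placeholder='/path/to/your/hiPSC-CM-example'

    # -i.bak edits in place and keeps the original as ${config}.bak;
    # this form works with both GNU sed and BSD/macOS sed, unlike a bare -i.
    sed -i.bak "s|${placeholder}|${data_dir}|g" "$config" || return 1

    # -F: match the placeholder as a fixed string rather than a regex.
    if grep -qF "$placeholder" "$config"; then
        echo "placeholder still present in ${config}" >&2
        return 1
    fi
}

# Usage, from inside the hiPSC-CM-chr1-example directory:
#   set_data_path hiPSC-CM-test.yaml "$(pwd)"
```

If your data path contains | or &, escape those characters first: sed uses | as the delimiter here and treats & in the replacement as the matched text.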

Important

Some default parameters were modified for this example, and the modified values are included in the configuration file. If you use this configuration file as a general template for your own data, do not forget to remove everything below the line “REMOVE BELOW THIS LINE IF YOU USE THIS CONFIGURATION FILE AS TEMPLATE FOR YOUR DATA”.
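The truncation can also be scripted rather than done by hand. The sketch below assumes the marker line contains the literal text REMOVE BELOW THIS LINE, as quoted above, and keeps the marker line itself while dropping everything after it; make_template and my-config.yaml are hypothetical names.

```shell
#!/usr/bin/env bash
# Minimal sketch: derive a template from the example configuration by keeping
# every line up to and including the marker line.
# make_template is a hypothetical helper, not part of the pipeline.

make_template() {
    local src="$1" dest="$2"
    local marker='REMOVE BELOW THIS LINE'

    # Without the marker, awk would copy the whole file, example-specific
    # parameters included, so stop early instead.
    if ! grep -qF "$marker" "$src"; then
        echo "marker not found in ${src}" >&2
        return 1
    fi

    # Print each line and stop right after the first one containing the marker.
    awk -v m="$marker" '{ print; if (index($0, m)) exit }' "$src" > "$dest"
}

# Usage:
#   make_template hiPSC-CM-test.yaml my-config.yaml
```

Review the result before use: the paths and other settings above the marker still refer to the example and need to be adapted to your data.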

For this example, you do not need to create the indices and annotation; they are included with the input data. Finally, run the abundance estimation pipeline from the hiPSC-CM-chr1-example directory (or adjust the paths accordingly):

run-htseq-workflow \
 ribo \
 hiPSC-CM-test.yaml \
 --run-all \
 --trim-rna-to-max-fragment-size \
 --ribo-config hiPSC-CM-test.yaml \
 --star-options "--quantMode GeneCounts" \
 --htseq-options "--idattr orf_id" "--additional-attr orf_type" "--additional-attr gene_name" "--stranded yes" \
 --rna-stranded reverse \
 --gtf input/ribo-ORFs.chr1.gtf \
 --rna-config hiPSC-CM-test.yaml \
 --num-cpus 4 \
 --keep-intermediate-files \
 --logging-level INFO \
 --log-file ribotools-pipeline.log

If you have access to a cluster with a cluster management and job scheduling system such as Slurm, you can add the --use-slurm option to the command above and pass additional options.

The file ribotools-pipeline.log contains the logging output of the different processing steps. The pipeline also creates two new directories, riboseq-results and rnaseq-results, each with five sub-directories: without-adapters (Flexbar output), with-rrna and without-rrna (Bowtie2 output), without-rrna-mapping (STAR output), and count-tables (final output from HTSeq). Under riboseq-results, there is one additional directory, metagene-profiles, containing the model outputs for periodicity estimation, together with the periodic read lengths and offsets.
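A quick sanity check after a run is to confirm that the directories listed above exist and that the log does not report errors. The sketch below is only an illustration: it assumes you run it from the directory where the pipeline was launched and that log lines include standard level names such as ERROR or CRITICAL; check_pipeline_output is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that the expected output directories exist and that
# the log does not report errors.
# Assumptions: run from the directory where the pipeline was launched, and
# log lines include standard level names such as ERROR or CRITICAL.
# check_pipeline_output is a hypothetical helper.

check_pipeline_output() {
    local log="${1:-ribotools-pipeline.log}"
    local status=0 results sub

    # Five sub-directories are expected under each results directory.
    for results in riboseq-results rnaseq-results; do
        for sub in without-adapters with-rrna without-rrna \
                   without-rrna-mapping count-tables; do
            if [ ! -d "${results}/${sub}" ]; then
                echo "missing: ${results}/${sub}" >&2
                status=1
            fi
        done
    done

    # metagene-profiles is only expected under riboseq-results.
    if [ ! -d riboseq-results/metagene-profiles ]; then
        echo "missing: riboseq-results/metagene-profiles" >&2
        status=1
    fi

    # Print any ERROR/CRITICAL lines (with line numbers) and flag failure.
    # A missing log file is not treated as an error here.
    if [ -f "$log" ] && grep -nE 'ERROR|CRITICAL' "$log" >&2; then
        echo "errors reported in ${log}" >&2
        status=1
    fi

    return "$status"
}

# Usage, from the hiPSC-CM-chr1-example directory:
#   check_pipeline_output ribotools-pipeline.log && echo "outputs look complete"
```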

How to estimate TE

First, prepare the sample table:

get-sample-table hiPSC-CM-test.yaml

You should now have a new directory, tea-results, containing the file sample-table.csv. Finally, fit the model to the data:

run-tea --orfCol 2 --symbolCol 3 --lfcThreshold 0 --alpha .99 hiPSC-CM-test.yaml

Under tea-results, there is now a new sub-directory containing the output of the likelihood ratio test (LRT) model for the contrast d5_vs_d1. The different classes of regulated ORFs are shown below.

[Figure: scatter plot of the different classes of regulated ORFs for the contrast d5_vs_d1 (scatter.png)]

Note

The test dataset contains only a subset of the original data. The values used for --lfcThreshold and --alpha are therefore not meaningful; they were chosen only to produce a sufficient number of features in each class. Despite this, we can already see features associated with regulatory processes relevant to cardiac maturation.