run-htseq-workflow

Submit samples for abundance estimation. The pipeline is called for every sample in the configuration file.

usage: run-htseq-workflow [-h] [--skip-periodicity-estimation]
                          [--create-orf-profiles] [--run-all]
                          [--rna-stranded {yes,reverse,no}]
                          [--trim-rna-to-max-fragment-size]
                          [--ribo-config RIBO_CONFIG]
                          [--rna-config RNA_CONFIG] [--gtf HTSEQ_GTF] [-t TMP]
                          [--overwrite] [-k] [--num-cpus NUM_CPUS] [--mem MEM]
                          [--time TIME] [--partitions PARTITIONS]
                          [--no-output] [--no-error]
                          [--stdout-file STDOUT_FILE]
                          [--stderr-file STDERR_FILE] [--do-not-call]
                          [--use-slurm]
                          [--mail-type [{NONE,BEGIN,END,FAIL,REQUEUE,ALL,STAGE_OUT,TIME_LIMIT,TIME_LIMIT_90,TIME_LIMIT_80,TIME_LIMIT_50,ARRAY_TASKS} ...]]
                          [--mail-user MAIL_USER] [--log-file LOG_FILE]
                          [--enable-ext-logging] [--log-stdout]
                          [--no-log-stderr]
                          [--logging-level {NOTSET,DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                          [--file-logging-level {NOTSET,DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                          [--stdout-logging-level {NOTSET,DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                          [--stderr-logging-level {NOTSET,DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                          [--star-executable STAR_EXECUTABLE]
                          [--star-options [STAR_OPTIONS ...]]
                          [--flexbar-options [FLEXBAR_OPTIONS ...]]
                          [--htseq-options [HTSEQ_OPTIONS ...]]
                          {rna,ribo} config

Positional Arguments

seq

Possible choices: rna, ribo

config

The yaml configuration file.
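As a minimal sketch, an invocation can be assembled from Python as an argument list, with the two positional arguments first. `build_command` and the file name `ribo-config.yaml` are hypothetical. Running the list with `subprocess.run` assumes `run-htseq-workflow` is on the `PATH`.

```python
import shlex

VALID_SEQ = ("rna", "ribo")  # choices of the positional "seq" argument


def build_command(seq, config, *options):
    """Return an argv list for run-htseq-workflow (hypothetical helper)."""
    if seq not in VALID_SEQ:
        raise ValueError(f"seq must be one of {VALID_SEQ}, got {seq!r}")
    # Positional arguments first, then any named options.
    return ["run-htseq-workflow", seq, config, *options]


if __name__ == "__main__":
    argv = build_command("ribo", "ribo-config.yaml", "--num-cpus", "4")
    print(shlex.join(argv))
    # subprocess.run(argv, check=True) would actually execute the workflow.
```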

Named Arguments

--skip-periodicity-estimation

Skip periodicity estimation and do not filter out non-periodic read lengths from the final alignment files. For Ribo-seq only (by default, only periodic read lengths are kept).

Default: False

--create-orf-profiles

Create ORF profiles for extended QC. Silently ignored if used with [--skip-periodicity-estimation]. Requires Rp-Bp index files.

Default: False

--run-all

Run Ribo-seq and RNA-seq, one after the other. RNA-seq is run only if ALL Ribo-seq jobs complete successfully. For Ribo-seq only.

Default: False
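The ordering rule of [--run-all] can be illustrated with a small sketch in which each job is a Python callable returning an exit code (0 for success). `run_all` is a hypothetical helper that only models the documented rule; it does not reflect how the tool actually schedules jobs, e.g. under SLURM.

```python
def run_all(ribo_jobs, rna_jobs):
    """Run Ribo-seq jobs, then RNA-seq jobs only if ALL Ribo-seq jobs succeed.

    Each job is a zero-argument callable returning an exit code (0 = success).
    Returns the exit codes of the Ribo-seq and RNA-seq jobs that were run.
    """
    ribo_codes = [job() for job in ribo_jobs]
    if any(code != 0 for code in ribo_codes):
        # At least one Ribo-seq job failed: RNA-seq is skipped entirely.
        return ribo_codes, []
    rna_codes = [job() for job in rna_jobs]
    return ribo_codes, rna_codes


print(run_all([lambda: 0, lambda: 1], [lambda: 0]))
```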

--rna-stranded

Possible choices: yes, reverse, no

Library strandedness for RNA-seq, when using [--run-all]. This option is passed to ‘htseq-count’ and overrides the same option passed via [--htseq-options], as well as the default value.

Default: 'no'
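A minimal sketch of the documented precedence, assuming the value is forwarded as htseq-count's own `--stranded` option and that [--htseq-options] entries are soft-quoted strings such as "--stranded yes". The helper `apply_rna_stranded` is hypothetical.

```python
import shlex


def apply_rna_stranded(htseq_options, rna_stranded="no"):
    """Drop any user-given --stranded/-s entry and append the --rna-stranded value."""
    kept = []
    for opt in htseq_options:
        tokens = shlex.split(opt)
        # Option name without any "=value" suffix, e.g. "--stranded=yes" -> "--stranded".
        name = tokens[0].split("=", 1)[0] if tokens else ""
        if name in ("--stranded", "-s"):
            continue  # overridden by --rna-stranded
        kept.append(opt)
    kept.append(f"--stranded {rna_stranded}")
    return kept


print(apply_rna_stranded(["--mode union", "--stranded yes"], "reverse"))
```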

--trim-rna-to-max-fragment-size

After adapter removal, trim RNA-seq reads using the maximum fragment size from the matching Ribo-seq samples. This requires the “periodic-offsets” files, the “matching_samples” key in the config, and the [--ribo-config] option. If the [--post-trim-length] option is passed via [--flexbar-options], it overrides this option. For RNA-seq only.

Default: False
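The precedence between this option and Flexbar's `--post-trim-length` can be sketched as follows. `resolve_post_trim_length` is a hypothetical helper, the fragment sizes are assumed to have already been read from the “periodic-offsets” files, and taking the maximum over all matching Ribo-seq samples is an assumption.

```python
import shlex


def resolve_post_trim_length(flexbar_options, ribo_max_fragment_sizes):
    """Return the RNA-seq post-trim length (hypothetical helper).

    flexbar_options: soft-quoted strings, e.g. ["--post-trim-length 30"].
    ribo_max_fragment_sizes: max fragment size of each matching Ribo-seq
        sample (assumed to come from the "periodic-offsets" files).
    """
    for opt in flexbar_options:
        tokens = shlex.split(opt)
        if len(tokens) == 2 and tokens[0] == "--post-trim-length":
            return int(tokens[1])  # an explicit Flexbar option wins
    if not ribo_max_fragment_sizes:
        raise ValueError("no matching Ribo-seq samples")
    return max(ribo_max_fragment_sizes)


print(resolve_post_trim_length([], [34, 35]))
```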

--ribo-config

The Ribo-seq config file, required by [--trim-rna-to-max-fragment-size].

--rna-config

The RNA-seq config file, if using [--run-all].
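The dependencies between the options above can be summarised as a small check. `check_options` is hypothetical; treating a missing [--rna-config] with [--run-all] as an error is an assumption, and whether the tool raises an error or silently ignores such combinations is not documented here.

```python
def check_options(seq, run_all=False, rna_config=None,
                  trim_rna_to_max_fragment_size=False, ribo_config=None):
    """Return a list of problems with the option combination (hypothetical check)."""
    problems = []
    if run_all and seq != "ribo":
        problems.append("--run-all is for Ribo-seq only")
    if run_all and rna_config is None:
        problems.append("--run-all needs --rna-config")  # assumed requirement
    if trim_rna_to_max_fragment_size and seq != "rna":
        problems.append("--trim-rna-to-max-fragment-size is for RNA-seq only")
    if trim_rna_to_max_fragment_size and ribo_config is None:
        problems.append("--trim-rna-to-max-fragment-size needs --ribo-config")
    return problems


print(check_options("rna", run_all=True))
```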

--gtf

An alternative GTF file for abundance estimation, e.g. the output of ‘get-gtf-from-predictions’ (Ribo-seq ORFs). It is passed to ‘htseq-count’ and overrides the GTF file given in the config.

file options

-t, --tmp

Where to write temporary files. If not specified, program-specific temporary locations are used.

--overwrite

Overwrite existing files.

Default: False

-k, --keep-intermediate-files

Keep intermediate files. Unless this flag is given, all intermediate files (such as discarded reads) are deleted. They are also kept when the [--do-not-call] option is given.

Default: False
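The combined effect of [-k] and [--do-not-call] on intermediate files reduces to one condition, shown here as a sketch. `delete_intermediate_files` is a hypothetical name used only for illustration.

```python
def delete_intermediate_files(keep_intermediate_files, do_not_call):
    """Intermediate files are removed only if neither -k nor --do-not-call is given."""
    return not (keep_intermediate_files or do_not_call)


# Print the full truth table.
for keep in (False, True):
    for dry_run in (False, True):
        delete = delete_intermediate_files(keep, dry_run)
        print(f"-k={keep!s:5} --do-not-call={dry_run!s:5} -> delete={delete}")
```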

slurm options

--num-cpus

The number of CPUs to use (not only for SLURM). For STAR, this is the number of threads; in general, it is the number of processes to spawn. This value should not be greater than the number of available cores. When used with SLURM, this is equivalent to: --ntasks 1 --cpus-per-task <num-cpus>.

Default: 1

--mem

Real memory required (per node), mostly for STAR genome indexing (not only for SLURM). When used with SLURM, this is equivalent to: --mem=<mem>.

Default: '2G'

--time

Set a limit on the total run time of the job allocation. This is equivalent to: --time <time>.

--partitions

Request a partition for the resource allocation. This is equivalent to: -p <partitions>.
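The resource equivalences listed above can be written out as a sketch that turns the option values into `sbatch` flags. `slurm_resource_flags` and the partition name `general` are hypothetical, and the exact command line the tool builds may differ.

```python
def slurm_resource_flags(num_cpus=1, mem="2G", time=None, partitions=None):
    """Translate resource options into sbatch flags, following the documented equivalences."""
    flags = ["--ntasks", "1", "--cpus-per-task", str(num_cpus), f"--mem={mem}"]
    if time is not None:
        flags += ["--time", time]
    if partitions is not None:
        flags += ["-p", partitions]
    return flags


print(slurm_resource_flags(8, "32G", "02:00:00", "general"))
```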

--no-output

Redirect stdout to /dev/null. This is equivalent to: --output=/dev/null. By default, stdout is redirected to --output=slurm-*.out.

Default: False

--no-error

Redirect stderr to /dev/null. This is equivalent to: --error=/dev/null. By default, stderr is redirected to --error=slurm-*.err.

Default: False

--stdout-file

Log file (stdout) if not --no-output. This is equivalent to: --output=<stdout-file>.

--stderr-file

Log file (stderr) if not --no-error. This is equivalent to: --error=<stderr-file>.
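A sketch of how the redirection options could map onto `sbatch`. Here `sbatch` is assumed to pick the stdout file with `--output` and the stderr file with `--error`. `slurm_log_flags` is hypothetical, and giving `--no-output`/`--no-error` precedence over the file options follows the "if not --no-output" / "if not --no-error" wording above.

```python
def slurm_log_flags(no_output=False, no_error=False, stdout_file=None, stderr_file=None):
    """Map the stdout/stderr options onto sbatch's --output/--error (hypothetical)."""
    flags = []
    if no_output:
        flags.append("--output=/dev/null")
    elif stdout_file is not None:
        flags.append(f"--output={stdout_file}")
    if no_error:
        flags.append("--error=/dev/null")
    elif stderr_file is not None:
        flags.append(f"--error={stderr_file}")
    return flags


print(slurm_log_flags(no_output=True, stderr_file="job.err"))
```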

--do-not-call

Do not execute the program (dry run).

Default: False

--use-slurm

Submit calls to SLURM.

Default: False

--mail-type

Possible choices: NONE, BEGIN, END, FAIL, REQUEUE, ALL, STAGE_OUT, TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80, TIME_LIMIT_50, ARRAY_TASKS

Notify user by email when certain event types occur, if --mail-user is specified.

Default: ['FAIL', 'TIME_LIMIT']

--mail-user

User to receive email notification of state changes as defined by --mail-type.
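Because notifications are only sent when [--mail-user] is given, the two mail options can be sketched together. This assumes `sbatch`'s `--mail-type` accepts a comma-separated list of event types. `slurm_mail_flags` and the address `me@example.org` are hypothetical.

```python
def slurm_mail_flags(mail_user=None, mail_type=("FAIL", "TIME_LIMIT")):
    """Return sbatch mail flags; nothing is added without a recipient."""
    if not mail_user:
        return []
    # sbatch takes the event types as one comma-separated list.
    return [f"--mail-type={','.join(mail_type)}", f"--mail-user={mail_user}"]


print(slurm_mail_flags("me@example.org"))
```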

logging options

--log-file

Log file (logging is redirected to this file, in addition to stdout and stderr, if specified).

Default: ''

--enable-ext-logging

Enable logging for external programs that may be disabled by default, e.g. CmdStanPy.

Default: False

--log-stdout

Log to stdout (in addition to a file and stderr, if specified).

Default: False

--no-log-stderr

Do not send logging to stderr.

Default: False

--logging-level

Possible choices: NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL

Logging level for all logs.

Default: 'WARNING'

--file-logging-level

Possible choices: NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL

Logging level for the log file. This option overrides --logging-level.

Default: 'NOTSET'

--stdout-logging-level

Possible choices: NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL

Logging level for stdout. This option overrides --logging-level.

Default: 'NOTSET'

--stderr-logging-level

Possible choices: NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL

Logging level for stderr. This option overrides --logging-level.

Default: 'NOTSET'
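The override rule for the per-destination logging levels can be sketched with the standard `logging` module. The sketch assumes that a value of `NOTSET` means "not overridden", so the destination falls back to [--logging-level]. The logger name and both helper functions are hypothetical.

```python
import logging
import sys


def resolve_level(specific, general):
    """A specific level overrides the general one; NOTSET is assumed to mean 'not set'."""
    return general if specific == "NOTSET" else specific


def configure_logging(logging_level="WARNING", file_logging_level="NOTSET",
                      stdout_logging_level="NOTSET", stderr_logging_level="NOTSET",
                      log_file="", log_stdout=False, no_log_stderr=False):
    logger = logging.getLogger("htseq-workflow-sketch")
    logger.setLevel(logging.DEBUG)  # let each handler do its own filtering
    logger.handlers.clear()
    if log_file:
        handler = logging.FileHandler(log_file)
        handler.setLevel(resolve_level(file_logging_level, logging_level))
        logger.addHandler(handler)
    if log_stdout:
        handler = logging.StreamHandler(sys.stdout)
        handler.setLevel(resolve_level(stdout_logging_level, logging_level))
        logger.addHandler(handler)
    if not no_log_stderr:
        handler = logging.StreamHandler(sys.stderr)
        handler.setLevel(resolve_level(stderr_logging_level, logging_level))
        logger.addHandler(handler)
    return logger


log = configure_logging(stderr_logging_level="INFO")
log.info("shown on stderr: INFO overrides the global WARNING level")
```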

STAR options

--star-executable

The name of the STAR executable.

Default: 'STAR'

--star-options

A space-delimited list of options to pass to STAR. Each option is quoted separately as in "--starOption value", using soft quotes, where starOption is the long parameter name from STAR, and value is the value given to this parameter. If specified, STAR options will override default settings.
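The soft-quoting requirement can be reproduced with a stand-in `argparse` parser. The usage line renders this option as `[--star-options [STAR_OPTIONS ...]]`, which is how `argparse` displays `nargs="*"`. `argparse` treats an argument that contains a space as a value rather than a flag, so each quoted "--starOption value" is collected as one list item. The parser below is hypothetical, not the tool's own; `--outFilterMultimapNmax` and `--alignEndsType` are ordinary STAR parameters used only as examples.

```python
import argparse

# Stand-in parser declaring the option the way the usage line renders it.
parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument("seq", choices=["rna", "ribo"])
parser.add_argument("config")
parser.add_argument("--star-options", nargs="*", default=[])

# On a shell command line, each pair would be wrapped in double quotes.
argv = [
    "ribo", "ribo-config.yaml",
    "--star-options", "--outFilterMultimapNmax 1", "--alignEndsType EndToEnd",
]
args = parser.parse_args(argv)
print(args.seq, args.star_options)
```

Because `nargs="*"` is greedy, positional arguments written after such a list would be absorbed into it. Placing the positionals first, or following the list with another named option, avoids this.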

Flexbar options

--flexbar-options

A space-delimited list of options to pass to Flexbar. Each option is quoted separately as in "--flexbarOption value", using soft quotes, where flexbarOption is the long parameter name from Flexbar, and value is the value given to this parameter. If specified, Flexbar options will override default settings.

HTSeq options

--htseq-options

A space-delimited list of options to pass to htseq-count. Each option must be quoted separately as in “--htseqOption value”, using soft quotes, where ‘--htseqOption’ is the long parameter name from htseq-count and ‘value’ is the value given to this parameter. If specified, htseq-count options will override default settings.
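"Override default settings" can be read as a merge keyed by the long option name. The same reading would apply to [--star-options] and [--flexbar-options]. `merge_options` and the default values shown are hypothetical, and the tool may combine options differently. `--format`, `--mode` and `--stranded` are standard htseq-count options.

```python
import shlex


def merge_options(defaults, user_options):
    """Merge soft-quoted user options over defaults, keyed by long option name.

    defaults: dict mapping option name -> list of value tokens.
    user_options: strings such as "--mode intersection-strict".
    """
    merged = {name: list(values) for name, values in defaults.items()}
    for opt in user_options:
        tokens = shlex.split(opt)
        if not tokens:
            continue  # ignore empty entries
        name, *values = tokens
        merged[name] = values  # a user-supplied option replaces the default
    # Flatten back into an argv fragment, keeping insertion order.
    return [tok for name, values in merged.items() for tok in (name, *values)]


defaults = {"--format": ["bam"], "--mode": ["union"]}  # hypothetical defaults
print(merge_options(defaults, ["--mode intersection-strict", "--stranded reverse"]))
```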