Quick Start
After installation, PySNAIL can be executed directly as a Python module using the following command:
$ pysnail sample_data/qsmooth.tsv --groups sample_data/groups.tsv --outdir output
The complete arguments are listed as follows (one can get this information by executing pysnail --help)
pysnail -h
usage: pysnail [-h] [-g [path]] [-m {'mean', 'median', 'auto'}]
[-t [threshold]] [-o [path]] xprs
Python implementation of Smooth-quantile Normalization Adaptation for
Inference of co-expression Links (PySNAIL)
positional arguments:
xprs Path to the expression data. The file should be
formatted as follows: the rows should represent genes
(the first row must be the sample names), and the
columns should represent samples (the first column
must be the gene names). The columns must be separated
with <tab>.
optional arguments:
-h, --help show this help message and exit
-g [path], --groups [path]
Path to the group information for each sample. The
file should have two columns without any header. The
first column should be the sample names, corresponds
to the columns in xprs. The second column should be
the group information for each sample. The two columns
must be separated by <tab>. If this argument is not
provided, the algorithm will treat all samples as one
group.
-m {'mean', 'median', 'auto'}, --method {'mean', 'median', 'auto'}
Method used compute the aggregate statistics for
quantile with same value in each group, should be
either 'mean', 'median' or 'auto'. If set to 'auto',
the algorithm is going to use median aggregation if
the proportion of the affected samples is larger or
equal to [--threshold] (default: 0.25). Default:
'median'.
-t [threshold], --threshold [threshold]
Threshold of the proportion of samples being affected
if mean aggregation is being used. The algorithm is
going to use median aggregation if the proportion of
the affected samples is larger or equal to this
threshold when [--method] is set to 'auto'. This
argument is ignored if method is specified with 'mean'
or 'median'. Default: 0.25
-c [cutoff], --cutoff [cutoff]
Cutoff used for trimmed mean when inferring quantile
distribution. (range from 0.00 to 0.25) Default: 0.15.
-o [path], --outdir [path]
Output directory for the corrected qsmooth expression
and some informative statistics. The directory
consists of a data table 'xprs_norm.tsv' with the
corrected expression levels. Default: './output'.
Reproduce Analysis in the Manuscript
The bioconductor-encodexplorer package used in the original analysis is deprecated. To reproduce the analysis, please download the ENCODE dataset from here before executing the following commands. To reproduce analysis in the manuscript:To reproduce analysis in the manuscript:
$ cd PySNAIL
$ # download the ZIP file and put it here.
$ mkdir -p manuscript_analysis/datasets/
$ unzip PySNAIL-ENCODE.zip
$ mv ENCODE manuscript_analysis/datasets/
$ snakemake --cores [n]
The result can be found in the directory manuscript_analysis. Note that it will likely take a while to download and preprocess the datasets.