Quick Start¶
The quick start is meant to test that everything is working with the installation of the probabilistic2020 package. This provides running probabilistic2020 with the minimum number of steps to execute the statistical test. For more expansive user instructions see Tutorial.
Installation¶
Please see the Installation.
Downloading Example¶
Download the quick start example data, and extract the resulting tarball.
$ wget http://karchinlab.org/data/2020+/pancreatic_example.tar.gz
$ tar xvzf pancreatic_example.tar.gz
$ cd pancreatic_example
Input files¶
Gene BED annotation¶
BED gene annotation files should contain a single reference transcript per gene. The name field in the BED file should contain the gene name (not the transcript). An example BED file containg the annotations for the largest transcripts in SNVBox is named snvboxGenes.bed.
Gene FASTA¶
Gene sequences are extracted from a genome FASTA file, and is a step that only needs to be done once. This has already been done for the example BED file provided, but if you were to use a different transcript annotation then you would need to follow the Gene FASTA.
Mutation Annotation Format (MAF) file¶
Mutations are saved in a MAF-like format. Not All fields in MAF spec are required, and columns may be in any order. Mutations for pancreatic adenocarcinoma are in the file pancreatic_adenocarcinoma.txt.
Running the Example¶
To execute the statistical test for TSG-like genes by examining elevated proportion of inactivating mutations, the tsg sub-command for probabilistic2020 is used. To limit the run time for this example, you can limit the number of iterations to 10,000 with the -n parameter. You can further speed up the example by using multiple computer cores with the -p parameter.
$ probabilistic2020 tsg \
-n 10000 \
-i snvboxGenes.fa \
-b snvboxGenes.bed \
-m pancreatic_adenocarcinoma.txt \
-o pancreatic_output_comparison.txt
Your results should match those found in the file pancreatic_output.txt. Particularly, TP53, SMAD4, ARID1A, and SMARCA4 should have a significant inactivating Benjamini-Hochberg (BH) q-value of less than .1.