Installation¶
Requirements¶
The most basic requirements are: - Conda/Mamba (or git/curl/wget, which will download mamba) - Pixi (for modular/dev installations)
These need to already be on the machine you want to install rolypoly on, as Conda/Mamba are used for managing the Python interpreter and external tools. Specific Python libraries are managed via pip. For modular installations, pixi is used to manage both PyPI and conda dependencies.
Quick and Easy - One Conda/Mamba Environment¶
Recommended for most users who want a "just works" solution and primarily intend to use rolypoly as a CLI tool in an independent environment.
We hope to have rolypoly available from bioconda in the near future. In the meantime, it can be installed with the quick_setup.sh script which will also fetch the pre-generated data rolypoly will require.
curl -O https://code.jgi.doe.gov/rolypoly/rolypoly/-/raw/main/src/setup/quick_setup.sh && \
bash quick_setup.sh
You can specify custom paths for the code, databases, and conda environment location:
bash quick_setup.sh /path/to/conda/env /path/to/install/rolypoly_code /path/to/store/databases /path/to/logfile
quick_setup.sh is called from):- database in
./rolypoly/data/- code in
./rolypoly/code/- conda environment in
./rolypoly/env/- log file in
./RolyPoly_quick_setup.log
Modular / Dev - Command-Specific Pixi Environments¶
For software developers looking to try or make use of specific rolypoly features with minimal risk of dependency conflicts.
# Install pixi first (if not already installed)
curl -fsSL https://pixi.sh/install.sh | bash
# Clone the repository
git clone https://code.jgi.doe.gov/rolypoly/rolypoly.git
cd rolypoly
# Install for specific functionality:
pixi install -e reads-only # Just read processing tools
pixi install -e assembly-only # Just assembly tools
pixi install -e complete # All tools (full functionality)
# Run commands in the appropriate environment
pixi run -e reads-only rolypoly filter-reads --help
Modular Installation Options¶
1. Command-Specific Workflows¶
Reads Processing Only - For filtering, masking, and processing reads:
Assembly Only - For genome/metagenome assembly:
RNA Annotation Only - For RNA structure prediction and ribozyme detection:
Protein Annotation Only - For protein domain annotation:
Virus Identification Only - For viral marker search and virus mapping:
2. Combined Workflow Environments¶
Basic Analysis (reads + assembly + identification):
Full Annotation Workflow:
Complete Installation (all features):
Environment Details¶
| Environment | Features | Use Case | Disk Usage |
|---|---|---|---|
default |
Core Python only | Basic testing/minimal install | ~600M |
reads-only |
Read processing tools | QC, filtering, masking | ~676M |
assembly-only |
Assembly tools | Genome/metagenome assembly | ~532M |
annotation-rna-only |
RNA annotation tools | RNA structure prediction | ~700M |
annotation-prot-only |
Protein annotation tools | Domain annotation | ~650M |
identify-only |
Virus identification tools | Viral marker search | ~650M |
basic-analysis |
Reads + Assembly + Identify | Standard virus discovery | ~1.2G |
full-annotation |
All except misc tools | Complete annotation pipeline | ~1.5G |
complete |
All production tools | Full functionality | ~1.7G |
dev |
Everything + dev tools | Development and testing | ~1.9G |
Commands by Environment¶
- reads-only supports:
rolypoly filter-reads,rolypoly shrink-reads,rolypoly mask-dna - assembly-only supports:
rolypoly assemble,rolypoly filter-contigs - annotation-rna-only supports:
rolypoly annotate-rna - annotation-prot-only supports:
rolypoly annotate-prot - identify-only supports:
rolypoly marker-search,rolypoly virus-mapping - misc-only supports:
rolypoly fetch-sra,rolypoly quick-taxonomy,rolypoly get-data
Feature-to-Tool Mapping¶
The modular installation system groups dependencies by functionality:
reads¶
- seqkit (sequence manipulation)
- bowtie (optional alignment)
- falco (quality control reports)
- pigz (parallel compression)
assembly¶
- spades (genome assembler)
- megahit (metagenome assembler)
- plass (protein-level assembler)
- mmseqs2 (sequence search)
annotation-rna¶
- infernal (RNA structure search)
- linearfold (RNA secondary structure)
- aragorn (tRNA/tmRNA detection)
- trnascan-se (tRNA detection)
annotation-prot¶
- hmmer (profile HMM search)
- mmseqs2 (sequence search)
- diamond (fast protein alignment)
identify¶
- mmseqs2 (virus database search)
- hmmer (viral marker search)
- diamond (protein similarity search)
misc¶
- ncbi-datasets-cli (NCBI data download)
- aria2 (fast downloads)
- taxonkit (taxonomy tools)
- freebayes (variant calling)
Useful Tips¶
-
Check available environments:
-
List installed packages in environment:
-
Update specific environment:
-
Remove unused environments:
Development with Pixi¶
For development work, use the pixi development environment:
git clone https://code.jgi.doe.gov/rolypoly/rolypoly.git
cd rolypoly
pixi install -e dev
pixi run -e dev rolypoly --help
Dependencies¶
Not all 3rd party software is used by all the different commands. RolyPoly includes a "citation reminder" that will try to list all the external software used by a command. The "reminded citations" are pretty printed to console (stdout) but are also written to a logfile. The bibtex file rolypoly uses for this is included in the codebase.
Click to show dependencies
Non-Python - [SPAdes](https://github.com/ablab/spades). - [seqkit](https://github.com/shenwei356/seqkit) - [datasets](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/) - [bbmap](https://sourceforge.net/projects/bbmap/) - via [bbmapy](https://github.com/urineri/bbmapy) - [megahit](https://github.com/voutcn/megahit) - [mmseqs2](https://github.com/soedinglab/MMseqs2) - [plass and penguin](https://github.com/soedinglab/plass) - [diamond](https://github.com/bbuchfink/diamond) - [pigz](https://github.com/madler/pigz) - [prodigal](https://github.com/hyattpd/Prodigal) - via pyrodigal-gv (add link) - [linearfold](https://github.com/LinearFold/LinearFold) - [HMMER](https://github.com/EddyRivasLab/hmmer) - via pyhmmer - [needletail](https://github.com/onecodex/needletail) - [infernal](https://github.com/EddyRivasLab/infernal) - [aragorn](http://130.235.244.92/ARAGORN/) - [tRNAscan-SE](http://lowelab.ucsc.edu/tRNAscan-SE/) - [bowtie1](https://github.com/BenLangmead/bowtie) - [falco](https://github.com/smithlabcode/falco/) ### Python Libraries * [polars](https://pola.rs/) * [numpy](https://numpy.org/) * [rich_click](https://pypi.org/project/rich-click/) * [rich](https://github.com/Textualize/rich) * [pyhmmer](https://github.com/althonos/pyhmmer) * [pyrodigal-gv](https://github.com/althonos/pyrodigal-gv) * [multiprocess](https://github.com/uqfoundation/multiprocess) * [requests](https://requests.readthedocs.io) * [pgzip](https://github.com/pgzip/pgzip) * [pyfastx](https://github.com/lmdu/pyfastx) * [psutil](https://pypi.org/project/psutil/) * [bbmapy](https://github.com/urineri/bbmapy) * [pymsaviz](https://github.com/aziele/pymsaviz) * [viennarna](https://github.com/ViennaRNA/ViennaRNA) * [pyranges](https://github.com/biocore-ntnu/pyranges) * [intervaltree](https://github.com/chaimleib/intervaltree) * [genomicranges](https://github.com/CoreyMSchafer/genomicranges) * [lightmotif](https://github.com/dincarnato/LightMotif) * [mappy](https://github.com/lh3/minimap2/tree/master/python)Databases used by rolypoly¶
RolyPoly will try to remind you to cite these too based on the commands you run. For more details, see the citation_reminder.py script and all_used_tools_dbs_citations