Virus Search

virus-mapping maps reads or contigs against virus databases using MMseqs2.

flowchart TD
  subgraph IN["<b>Input</b>"]
  IN1["Nucleotide FASTA/FASTQ or MMseqs DB<br> (.fa/.fasta/.fq/.fastq/.mmdb)"]
  end

  subgraph P["<b>MMseqs2 Mapping</b>"]
  CDB["Create DB (if fasta input)<br> (mmseqs createdb)"]
  SEARCH["MMseqs2 Search<br> (mmseqs search)"]
  CONVERT["Convert results<br> (tab/sam/html)"]
  CLEAN["Cleanup tmp files<br> (optional)"]
  end

  subgraph OUT["<b>Outputs</b>"]
  OTAB[".tab (tsv-like)<br> (qheader,theader,alnlen,pident,evalue)"]
  OSAM[".sam (alignment)<br> (optional)"]
  OHTML[".html (report)"]
  end

  IN1 --> CDB --> SEARCH --> CONVERT --> CLEAN
  CONVERT --> OTAB
  CONVERT --> OSAM
  CONVERT --> OHTML

  classDef inputStyle fill:#f0f9ff,stroke:#0366d6,color:#03396c;
  classDef pipelineStyle fill:#fffaf0,stroke:#b85c00,color:#7a3b00;
  classDef outputStyle fill:#f0fff4,stroke:#0b8a3e,color:#0b6624;

  class IN1 inputStyle
  class CDB,SEARCH,CONVERT,CLEAN pipelineStyle
  class OTAB,OSAM,OHTML outputStyle
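
The stages in the flowchart correspond roughly to a handful of MMseqs2 subcommands. The sketch below is an assumption about how such a pipeline could be run by hand, not rolypoly's actual implementation: the function names, the output prefix layout, the DRY_RUN and KEEP_TMP variables, and the choice of a nucleotide search (--search-type 3) are all illustrative.

```shell
#!/usr/bin/env bash
# Hand-rolled sketch of the flowchart stages (hypothetical; not rolypoly's code).
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: mmseqs_mapping_sketch QUERY_FASTA TARGET_FASTA OUT_PREFIX [THREADS]
mmseqs_mapping_sketch() {
  query_fa="$1"; target_fa="$2"; prefix="$3"; threads="${4:-1}"
  cols="qheader,theader,alnlen,pident,evalue"

  # Create DB: only needed when the inputs are FASTA/FASTQ rather than MMseqs2 DBs.
  run mmseqs createdb "$query_fa" "$prefix.queryDB" || return 1
  run mmseqs createdb "$target_fa" "$prefix.targetDB" || return 1

  # Search: assumes nucleotide query vs nucleotide target (--search-type 3).
  run mmseqs search "$prefix.queryDB" "$prefix.targetDB" "$prefix.resultDB" \
    "$prefix.tmp" --search-type 3 --threads "$threads" || return 1

  # Convert results: tabular output with the columns listed in the Outputs box,
  # then SAM (--format-mode 1) and HTML (--format-mode 3).
  run mmseqs convertalis "$prefix.queryDB" "$prefix.targetDB" "$prefix.resultDB" \
    "$prefix.tab" --format-output "$cols" || return 1
  run mmseqs convertalis "$prefix.queryDB" "$prefix.targetDB" "$prefix.resultDB" \
    "$prefix.sam" --format-mode 1 || return 1
  run mmseqs convertalis "$prefix.queryDB" "$prefix.targetDB" "$prefix.resultDB" \
    "$prefix.html" --format-mode 3 || return 1

  # Cleanup tmp files unless KEEP_TMP=1 (mirrors the optional cleanup stage).
  if [ "${KEEP_TMP:-0}" != "1" ]; then
    run rm -rf "$prefix.tmp" || return 1
  fi
}
```

Running `DRY_RUN=1 mmseqs_mapping_sketch contigs.fa custom_viruses.fa run1 4` prints the commands without requiring MMseqs2 to be installed, which is a convenient way to inspect the sequence before committing to a long search.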

Options

Common

  • -i, --input: Input nucleotide fasta/fastq or MMseqs2 database (required)
  • -o, --output: Output location (default: current_directory_RP_mapping)
  • -t, --threads: Number of threads (default: 1)
  • -M, --memory: Memory allocation (default: "6g")
  • -g, --log-file: Path to log file (default: current_directory/search_viruses_logfile.txt)
  • --keep-tmp: Keep temporary files (flag)

Database

  • --db: Database to search; one of RVMT, NCBI_Ribovirus, all, other (default: "all")
  • --db-path: Path to a custom database, either FASTA or a formatted MMseqs2 database (required if --db is 'other')
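
The dependency between --db and --db-path can be checked before launching a long run. The following is a minimal sketch of such a pre-flight check; the function name check_db_args is hypothetical, and it assumes the database names are matched exactly as spelled in the list above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the --db / --db-path combination.
# Usage: check_db_args DB [DB_PATH]
check_db_args() {
  db="${1:-all}"      # the documented default is "all"
  db_path="${2:-}"
  case "$db" in
    RVMT|NCBI_Ribovirus|all)
      return 0 ;;
    other)
      # --db other is only meaningful together with --db-path.
      if [ -z "$db_path" ]; then
        echo "--db other requires --db-path" >&2
        return 1
      fi
      if [ ! -e "$db_path" ]; then
        echo "custom database not found: $db_path" >&2
        return 1
      fi
      return 0 ;;
    *)
      echo "unknown --db value: $db" >&2
      return 1 ;;
  esac
}
```

For example, `check_db_args other custom_viruses.fa && rolypoly virus-mapping -i reads.fq --db other --db-path custom_viruses.fa` only starts the search when the custom database file exists.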

Usage

# Basic search against all databases
rolypoly virus-mapping -i contigs.fa -o virus_hits.tab

# Search against custom database
rolypoly virus-mapping -i reads.fq --db other --db-path custom_viruses.fa
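
Once a search has finished, the tabular output can be filtered with standard tools. The sketch below assumes the .tab file is tab-separated, has no header row, and uses the column order shown in the flowchart (qheader, theader, alnlen, pident, evalue). The function name filter_hits and the default thresholds are illustrative, not rolypoly defaults.

```shell
#!/usr/bin/env bash
# Keep hits whose e-value is at most MAX_EVALUE and whose alignment length is
# at least MIN_ALNLEN. Assumed columns: qheader, theader, alnlen, pident, evalue.
# Usage: filter_hits TAB_FILE [MAX_EVALUE] [MIN_ALNLEN]
filter_hits() {
  # Splitting on tabs only matters because sequence headers may contain spaces.
  awk -F '\t' -v max_e="${2:-1e-5}" -v min_len="${3:-100}" \
    '($5 + 0) <= (max_e + 0) && ($3 + 0) >= (min_len + 0)' "$1"
}
```

Assuming virus_hits.tab is the tabular output of the basic search, `filter_hits virus_hits.tab 1e-10 200 > virus_hits.filtered.tab` would keep only the stronger, longer alignments.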

Citations

This command uses the following tools and databases:

Tools

  • MMseqs2: https://doi.org/10.1038/nbt.3988

Databases

  • RVMT: https://doi.org/10.1016/j.cell.2022.08.023
      • GitHub: https://github.com/UriNeri/RVMT
      • Zenodo: https://zenodo.org/record/7368133
  • RefSeq: https://doi.org/10.1093/nar/gkv1189