=====
Usage
=====

Given a sequence database (in FASTA format), ``vpsearch build`` constructs an
optimized vantage point search tree. Building the tree is a one-time operation
and doesn't have to be done again unless the database changes.

As an illustration, we build a vantage point tree for a database of sequences
obtained by trimming the GTDB 16S database to the v3-v4 hypervariable region.
This database contains 10875 unique sequences, and can be found (in compressed
form) in the ``data/`` directory inside this repository::

    $ vpsearch build bac120_ssu_reps_r207-sliced-dedup.fa.gz
    Building for 10875 sequences...done.
    Linearizing...done.
    Database created in bac120_ssu_reps_r207-sliced-dedup.fa.db

As this is a relatively small database, the process finishes quickly, in about
10 seconds. For larger databases, such as the RDP database of full length
sequences, this may take longer. For example, building an index for the RDP
database takes about 20 minutes on a standard machine.

Once a tree has been built, unknown sequences can be looked up using the
``vpsearch query`` command. Here we supply a query file with a single sequence.
The ``query.fa`` file can also be found in the ``data/`` directory and
represents a *Lactobacillus helsingborgensis* sample whose sequence was
downloaded from RefSeq. We see that we have a perfect match with
``RS_GCF_000970855.1``, which happens to be the same sequence.
Other matches are highly similar but not identical, and represent different
species of *Lactobacillus* (*kimbladii*, *melliventris*, and *panisapium*,
respectively)::

    $ vpsearch query bac120_ssu_reps_r207-sliced-dedup.fa.db query.fa
    NR_126253.1  RS_GCF_000970855.1  100.00  253  0  0  1  253  1  253  0  1265
    NR_126253.1  RS_GCF_014323605.1  98.81   253  0  0  1  253  1  253  0  1238
    NR_126253.1  RS_GCF_013346935.1  98.02   253  0  0  1  253  1  253  0  1220
    NR_126253.1  RS_GCF_002916935.1  97.63   253  0  0  1  253  1  253  0  1211

By default, the ``vpsearch query`` command outputs the best four matches in the
database per query sequence (the number of matches can be changed with the
``-k`` parameter). Lookup is done one query sequence at a time, but multiple
queries can be processed in parallel by enabling multiple threads; use the
``-j`` option to specify the number of threads.

The ``vpsearch query`` command attempts to output its results in the standard
BLAST tabular format. The interpretation of the columns is as follows:

+------------------+--------------------+------------------------------------+
| Column name      | Example            | Notes                              |
+==================+====================+====================================+
| query ID         | NR_126253.1        |                                    |
+------------------+--------------------+------------------------------------+
| subject ID       | RS_GCF_014323605.1 |                                    |
+------------------+--------------------+------------------------------------+
| % identity       | 98.81              |                                    |
+------------------+--------------------+------------------------------------+
| alignment length | 253                |                                    |
+------------------+--------------------+------------------------------------+
| mismatches       | 0                  | currently not implemented          |
+------------------+--------------------+------------------------------------+
| gap openings     | 0                  | currently not implemented          |
+------------------+--------------------+------------------------------------+
| query start      | 1                  |                                    |
+------------------+--------------------+------------------------------------+
| query end        | 253                |                                    |
+------------------+--------------------+------------------------------------+
| subject start    | 1                  |                                    |
+------------------+--------------------+------------------------------------+
| subject end      | 253                |                                    |
+------------------+--------------------+------------------------------------+
| E-value          | 0                  | N/A (always 0)                     |
+------------------+--------------------+------------------------------------+
| bit score        | 1238               | interpreted as the alignment score |
+------------------+--------------------+------------------------------------+

Note that the numbers of mismatches and gap openings are currently not reported
in the result output; both columns are shown as 0, as in the example above.
This will be addressed in a future version of the package.
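
Because the output follows the 12-column layout described above, it is
straightforward to post-process. The following is a minimal sketch, not part of
``vpsearch`` itself: the names ``Hit`` and ``parse_hits`` are hypothetical, and
the code assumes that the fields are separated by whitespace (tabs or spaces)
and that identifiers contain no spaces.

.. code-block:: python

    from typing import List, NamedTuple


    class Hit(NamedTuple):
        """One result row, with fields named after the columns in the table."""
        query_id: str
        subject_id: str
        percent_identity: float
        alignment_length: int
        mismatches: int       # currently always 0
        gap_openings: int     # currently always 0
        query_start: int
        query_end: int
        subject_start: int
        subject_end: int
        e_value: float        # always 0
        score: float          # alignment score


    # One converter per column, in output order.
    CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


    def parse_hits(text: str) -> List[Hit]:
        """Parse the tabular output; lines without 12 fields are skipped."""
        hits = []
        for line in text.splitlines():
            fields = line.split()
            if len(fields) != len(CONVERTERS):
                continue  # e.g. blank lines or the shell prompt line
            hits.append(Hit(*(conv(f) for conv, f in zip(CONVERTERS, fields))))
        return hits


    sample = """\
    NR_126253.1 RS_GCF_000970855.1 100.00 253 0 0 1 253 1 253 0 1265
    NR_126253.1 RS_GCF_014323605.1 98.81 253 0 0 1 253 1 253 0 1238
    """
    hits = parse_hits(sample)
    print(hits[0].subject_id, hits[0].percent_identity)  # RS_GCF_000970855.1 100.0

Since the mismatch, gap opening, and E-value columns carry no information at
present, downstream filtering is best done on the percent identity or score
fields.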