API Reference

Functions

pyswrd.search(queries, targets, *, gap_open=10, gap_extend=1, scorer_name='BLOSUM62', kmer_length=3, max_candidates=30000, score_threshold=13, max_alignments=10, max_evalue=10.0, algorithm='sw', threads=0, pool=None)

Run a many-to-many search of query sequences against target sequences.

This function is a high-level wrapper around the different classes of the pyswrd library to support fast searches when all sequences are in memory.

Parameters:
  • queries (Sequences or iterable of str) – The sequences to query the target sequences with.

  • targets (Sequences or iterable of str) – The sequences to be queried with the query sequences.

  • gap_open (int) – The penalty for opening a gap in each alignment.

  • gap_extend (int) – The penalty for extending a gap in each alignment.

  • scorer_name (str) – The name of the scoring matrix to use for scoring each alignment. See Scorer for the list of supported names.

  • kmer_length (int) – The length of the k-mers to use in the SWORD heuristic filter.

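  • max_candidates (int) – The maximum number of candidates to retain in the heuristic filter.

  • score_threshold (int) – The score threshold used by the heuristic filter when generating substituted k-mers.

  • max_alignments (int) – The maximum number of alignments to report for each query.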

  • max_evalue (float) – The E-value threshold above which to discard sequences before alignment.

  • algorithm (str) – The algorithm to use to perform pairwise alignment. See pyopal.Aligner.align for more information.

  • threads (int) – The number of threads to use to run the pre-filter and alignments. If zero is given, uses the number of CPUs reported by os.cpu_count.

  • pool (ThreadPool) – A running pool instance to use for parallelization. Useful for reusing the same pool across several calls to search. If None is given, spawns a new pool based on the threads argument.

Yields:

Hit – A Hit object for each hit passing the threshold parameters. Hits are grouped by query index and sorted by E-value within each group.

Example

>>> import pyswrd
>>> queries = ["MAGFLKVVQLLAKYGSKAVQWAWANKGKILDWLNAGQAIDWVVSKIKQILGIK"]
>>> targets = [
...     "MESILDLQELETSEEESALMAASTVSNNC",
...     "MKKAVIVENKGCATCSIGAACLVDGPIPDFEIAGATGLFGLWG",
...     "MAGFLKVVQILAKYGSKAVQWAWANKGKILDWINAGQAIDWVVEKIKQILGIK",
...     "MTQIKVPTALIASVHGEGQHLFEPMAARCTCTTIISSSSTF",
... ]
>>> for hit in pyswrd.search(queries, targets):
...     cigar = hit.result.cigar()
...     print(f"target={hit.target_index} score={hit.score} evalue={hit.evalue:.1g} cigar={cigar}")
target=2 score=268 evalue=1e-33 cigar=53M
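The ordering described under Yields can be illustrated with a small pure-Python sketch that post-processes a flat list of hits. SimpleHit and order_hits are hypothetical names, not part of pyswrd, and treating max_alignments as a per-query cap is an assumption about that parameter; pyswrd.search performs its own filtering and sorting internally.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class SimpleHit:
    """Hypothetical stand-in for pyswrd.Hit with only the fields used here."""
    query_index: int
    target_index: int
    score: int
    evalue: float


def order_hits(hits, max_evalue=10.0, max_alignments=10):
    """Keep hits with an E-value at or below `max_evalue`, then yield at most
    `max_alignments` hits per query (assumed semantics), grouped by query
    index and sorted by increasing E-value (best hit first)."""
    kept = sorted(
        (hit for hit in hits if hit.evalue <= max_evalue),
        key=attrgetter("query_index"),  # groupby only merges adjacent items
    )
    for _, group in groupby(kept, key=attrgetter("query_index")):
        ranked = sorted(group, key=attrgetter("evalue"))
        yield from ranked[:max_alignments]


# Illustrative input only; these numbers are made up.
hits = [
    SimpleHit(query_index=1, target_index=0, score=50, evalue=1e-3),
    SimpleHit(query_index=0, target_index=2, score=268, evalue=1e-33),
    SimpleHit(query_index=0, target_index=1, score=30, evalue=5.0),
    SimpleHit(query_index=0, target_index=3, score=20, evalue=20.0),  # above max_evalue
]
# Prints query 0's hits (target 2, then target 1) before query 1's hit (target 0).
for hit in order_hits(hits):
    print(hit.query_index, hit.target_index, hit.evalue)
```

Sorting by query index before grouping matters because itertools.groupby only merges consecutive items with equal keys.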

Classes

pyswrd.KmerGenerator

A generator of k-mers with optional substitutions.
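One common way to generate k-mers "with substitutions" is the BLAST-style neighbourhood: every k-mer whose summed substitution score against the original k-mer reaches a threshold. The sketch below assumes this is roughly the role of score_threshold in pyswrd.search. toy_score and neighbourhood are hypothetical helpers, the scores are not BLOSUM62, and a real generator would prune the enumeration instead of trying every candidate.

```python
from itertools import product


def toy_score(a, b):
    """Hypothetical substitution scores: +5 for identity, +1 among I/L/V, -2 otherwise."""
    if a == b:
        return 5
    if a in "ILV" and b in "ILV":
        return 1
    return -2


def neighbourhood(kmer, alphabet, score, threshold):
    """Return every k-mer over `alphabet` whose summed substitution score
    against `kmer` reaches `threshold`, in the order induced by `alphabet`.

    Brute force over len(alphabet) ** len(kmer) candidates, so only
    suitable for tiny alphabets or very short k-mers.
    """
    words = []
    for candidate in product(alphabet, repeat=len(kmer)):
        total = sum(score(a, b) for a, b in zip(kmer, candidate))
        if total >= threshold:
            words.append("".join(candidate))
    return words


print(neighbourhood("ALV", "AILV", toy_score, threshold=11))
# ['AIV', 'ALI', 'ALL', 'ALV', 'AVV']
```

Lowering the threshold admits more distant substitutions, which makes the filter more sensitive but also produces more candidate matches to check.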

pyswrd.Scorer

A class storing the scoring matrix and gap parameters for alignments.

pyswrd.EValue

A class for calculating E-values from alignment scores.
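For intuition, the classic Karlin-Altschul statistics give the E-value of a local alignment score S, for a query of length m searched against a database of total length n, as E = K * m * n * exp(-lambda * S). The sketch assumes that plain form. pyswrd's EValue may apply additional corrections, and LAMBDA and K below are placeholder values, not the parameters of any real scoring matrix.

```python
import math


def karlin_altschul_evalue(score, query_length, database_length, lambda_, k):
    """Expected number of chance alignments scoring at least `score`.

    Textbook Karlin-Altschul formula E = K * m * n * exp(-lambda * S),
    without any edge-effect length correction.
    """
    return k * query_length * database_length * math.exp(-lambda_ * score)


# Placeholder statistical parameters, chosen only for illustration.
LAMBDA, K = 0.3, 0.04
for score in (40, 50, 60):
    print(score, karlin_altschul_evalue(score, 53, 10_000, LAMBDA, K))
```

Two consequences follow from the formula: each extra point of score divides E by exp(lambda), and the same score gets a proportionally larger E-value against a larger database, so a fixed max_evalue cutoff demands higher scores when searching more target residues.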

pyswrd.Sequences

A list of sequences.

pyswrd.FilterScore

The score of the heuristic filter for a single target.

pyswrd.FilterResult

The result of the heuristic filter.

pyswrd.HeuristicFilter

The SWORD heuristic filter for selecting alignment candidates.
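A common idea in k-mer prefilters is to favour targets that share many k-mers with the query on the same diagonal, that is, at a constant offset between query and target positions, since such runs tend to indicate a good local alignment. The sketch below is a simplified illustration of that idea, not necessarily how SWORD or pyswrd's FilterScore compute scores. It uses exact k-mer matches only (no substituted k-mers), scores each target by its single best diagonal, and keeps the top max_candidates targets. kmer_index and filter_candidates are hypothetical names.

```python
from collections import Counter, defaultdict


def kmer_index(targets, k):
    """Map each k-mer to the (target index, position) pairs where it occurs."""
    index = defaultdict(list)
    for t, seq in enumerate(targets):
        for j in range(len(seq) - k + 1):
            index[seq[j:j + k]].append((t, j))
    return index


def filter_candidates(query, targets, k=3, max_candidates=2):
    """Rank targets by their densest diagonal of shared k-mers and return at
    most `max_candidates` (target index, hit count) pairs, best first."""
    index = kmer_index(targets, k)
    diagonals = Counter()  # (target index, diagonal) -> number of k-mer hits
    for i in range(len(query) - k + 1):
        for t, j in index.get(query[i:i + k], ()):
            diagonals[(t, j - i)] += 1
    best = defaultdict(int)  # target index -> best diagonal hit count
    for (t, _), count in diagonals.items():
        best[t] = max(best[t], count)
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_candidates]


query = "MAGFLKVVQLLAKY"
targets = ["MESILDLQ", "MAGFLKVVQILAKY", "XXMAGFLKV"]
print(filter_candidates(query, targets, k=3, max_candidates=2))  # [(1, 9), (2, 5)]
```

Target 0 shares no 3-mer with the query and is never scored, while target 2 matches on diagonal 2 because its sequence is shifted by two residues. Only the retained candidates would then go on to full pairwise alignment.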

pyswrd.Hit

A single hit of a database search.