Citing COVID-Align

If you use this web service, please cite:

    Frederic Lemoine, Luc Blassel, Jakub Voznica, Olivier Gascuel,
    COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM,
    Bioinformatics, Volume 37, Issue 12, 15 June 2021, Pages 1761–1762,
    10.1093/bioinformatics/btaa871

Running a COVALIGN analysis

The COVID-Align interface is straightforward to use: choose a FASTA file to upload, then submit the analysis.

Optionally, you can specify a run name and an email address at which to be notified as soon as the analysis is finished.
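
Before uploading, it can help to check locally that the file parses as FASTA. The following is a minimal sketch, not part of the service: the function names `read_fasta` and `check_fasta` are hypothetical, and the set of accepted characters (IUPAC nucleotide codes plus the gap character) is an assumption about what a nucleotide FASTA file should contain, not a documented requirement of COVID-Align.

```python
from collections import Counter

# Assumption: IUPAC nucleotide codes plus '-' are acceptable characters.
ALLOWED = set("ACGTUNWSMKRYBDHV-")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_fasta(lines):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    seen = Counter()
    for header, seq in read_fasta(lines):
        seen[header] += 1
        if not seq:
            problems.append(f"{header}: empty sequence")
        bad = sorted(set(seq.upper()) - ALLOWED)
        if bad:
            problems.append(f"{header}: unexpected characters {bad}")
    problems += [f"{h}: duplicated header" for h, n in seen.items() if n > 1]
    if not seen:
        problems.append("no sequences found")
    return problems
```

A typical use would be `with open("genomes.fasta") as fh: print(check_fasta(fh))`, where `genomes.fasta` is a placeholder file name.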

Interpreting output files

COVALIGN produces four output files (see the paper for more information about these statistics):

  1. Alignment in FASTA format. This file contains all input sequences after alignment. Insertions compared to the reference profile are absent from this alignment (but counted in the statistics below);
  2. Raw aligned sequences in FASTA format. This file contains the aligned sequences concatenated per batch (10 sequences per batch). Insertions compared to the reference profile are still present;
  3. Dataset statistics (csv):
    • Genomes: Number of high quality and low quality submitted genomes.
    • Mutations: Average number of mutations, unique mutations and new mutations.
    • Gaps: Average number of gaps, unique gaps and new gaps.
    • Gap Open.: Average number of gap openings, unique gap openings and new gap openings.
    • Ins Open.: Average number of insertion openings, unique insertion openings and new insertion openings.
  4. Sequence statistics (csv):
    • Mutations
      • Mut_Unique: # Unique DNA mutations (see above definition for Unique/New).
      • Mut_New: # New DNA mutations (does not apply to GISAID sequences).
      • Mut_Ref: # DNA mutations compared to the reference genome (EPI_ISL_402124).
      • Mut_ORF: # of mutations occurring in ORFs.
      • Mut_Density: Highest number of DNA mutations in a window of size 20 (to be used to detect poor quality genomes).
      • Mut_Unique_List: List of unique mutations, as pairs of (position, nucleotide).
      • Mut_New_List: List of new mutations (does not apply to GISAID sequences).
      • Mut_ORF_List: List of mutations compared to the reference sequence, occurring in ORFs. Each mutation is represented as a triple of (position, mutated Nucleotide, name of ORF).
    • GAPS
      • Gap_Start: # Gaps (i.e. deletions) at the beginning of the sequence (not counting those in the 12 first positions of the reference sequence).
      • Gap_End: # Gaps at the end of the sequence (not counting those in the 22 last positions of the reference sequence).
      • Gap: # Gaps in the core sequence (i.e. not counting start/end gaps).
      • Gap_Unique: # Unique core gaps (see above definition for Unique/New).
      • Gap_New: # New core gaps (does not apply to GISAID sequences).
      • Gap_Opening: # Gap openings.
      • Gap_Opening_Unique: # Unique gap openings.
      • Gap_Opening_New: # New gap openings.
      • Gap_ORF: # of gaps occurring in ORFs.
      • Gap_Segment_Unique: # Unique gap segments in the core sequence, having a unique set of starting position and length (see above definition for Unique/New).
      • Gap_Segment_New: # New gap segments in the core sequence (does not apply to GISAID sequences).
      • Gap_Opening_Unique_List: List of unique opening gap positions.
      • Gap_Opening_New_List: List of new opening gap positions.
      • Gap_Segment_List: List of gap segments as pairs of (starting position, length), including gap segments at the start and end of the sequence.
      • Gap_ORF_List: List of gaps occurring in ORFs, as pairs of (position, ORF name).
    • INSERTIONS
      • Insertion: # of non-N insertions in the core sequence (i.e. not counting start/end insertions).
      • Insertion_Opening: # Core insertion openings.
      • Insertion_Opening_Unique: # Unique core insertion opening positions (see above definition for Unique/New).
      • Insertion_Opening_New: # New core insertion opening positions (does not apply to GISAID sequences).
      • Insertion_ORF: # Insertion openings in ORFs.
      • Insertion_Segment_Unique: # Unique insertion segments in the core sequence, having a unique set of opening position and length (see above definition for Unique/New).
      • Insertion_Segment_New: # New insertion segments in the core sequence (does not apply to GISAID sequences).
      • Insertion_Opening_Unique_List: List of unique opening insertion positions.
      • Insertion_Opening_New_List: List of new opening insertion positions.
      • Insertion_Segment_List: List of insertion segments as pairs of (opening position, length).
      • Insertion_ORF_List: List of insertions occurring in ORFs, as pairs of (opening position, ORF name).
    • NUCLEOTIDE CONTENTS
      • A: #A in the whole sequence.
      • C: #C
      • G: #G
      • T: #T
      • N: #N
      • W: #W
      • S: #S
      • M: #M
      • K: #K
      • R: #R
      • Y: #Y
      • Ambiguous_Bases: # of ambiguous bases.
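
To make the difference between gaps, gap openings and gap segments concrete, and to illustrate the windowed count behind Mut_Density, here is a minimal sketch. It is not the service's implementation: the function names are hypothetical, positions are assumed to be 1-based, "core" is simplified to mean any gap run that touches neither end of the aligned sequence (the 12/22-position offsets described above are not modelled), and the exact windowing convention used by the service may differ.

```python
def gap_segments(aligned, gap="-"):
    """Return gap runs as (start, length) pairs, with 1-based start positions."""
    segments = []
    start = None
    for i, ch in enumerate(aligned, start=1):
        if ch == gap:
            if start is None:
                start = i  # a new gap run opens here
        elif start is not None:
            segments.append((start, i - start))
            start = None
    if start is not None:  # gap run extending to the end of the sequence
        segments.append((start, len(aligned) - start + 1))
    return segments


def core_gap_counts(aligned, gap="-"):
    """Count gap characters and gap openings, ignoring runs touching either end."""
    n = len(aligned)
    core = [(s, length) for s, length in gap_segments(aligned, gap)
            if s != 1 and s + length - 1 != n]
    return {"Gap": sum(length for _, length in core), "Gap_Opening": len(core)}


def max_mutation_density(positions, window=20):
    """Highest number of mutated positions within any `window` consecutive positions."""
    pos = sorted(set(positions))
    best, left = 0, 0
    for right, p in enumerate(pos):
        # Shrink the window until it spans fewer than `window` positions.
        while p - pos[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best
```

For example, `gap_segments("--AC-G---T-")` yields four segments, of which only the two internal ones count towards the core totals in `core_gap_counts`: one gap opening can cover several gap positions, which is why the gap and gap-opening statistics are reported separately.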