How to reverse complement a DNA sequence in Perl

Article created: Aug 25, 2011
Article by: Jeremiah Faith

The basic function, shown below, reverses the DNA string with Perl's built-in reverse() function and then complements it with the tr operator. tr takes a list of characters to match and a list of replacement characters, and swaps each matched character for the character at the same position in the replacement list. In the example below, tr/ACGTacgt/TGCAtgca/ replaces A with T, C with G, and so on, preserving the case of each base.

sub reverse_complement {
    my $dna = shift;

    # reverse the DNA sequence
    my $revcomp = reverse($dna);

    # complement the reversed DNA sequence
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;
    return $revcomp;
}
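As a quick check of the function above, the sketch below calls it on two short, made-up sequences. It also shows a common pitfall: reverse() only reverses the string when it is evaluated in scalar context (as in the assignment inside the function). In list context it reverses the argument list instead. The function is repeated here only so the snippet runs on its own.

```perl
use strict;
use warnings;

# Same logic as reverse_complement above, repeated so this snippet is self-contained.
sub reverse_complement {
    my $dna = shift;
    my $revcomp = reverse($dna);          # scalar assignment, so the string is reversed
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;    # swap each base for its complement
    return $revcomp;
}

print reverse_complement("AAGCT"), "\n";   # prints AGCTT
print reverse_complement("acgTT"), "\n";   # prints AAcgt

# Pitfall: in list context reverse() reverses the list of arguments, not the string.
print reverse("AAGCT"), "\n";              # prints AAGCT (unchanged)
print scalar(reverse("AAGCT")), "\n";      # prints TCGAA
```

If you ever call reverse() directly inside print or another list operator, wrapping it in scalar() avoids the surprise.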

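To support all of the IUPAC nucleotide codes (e.g. S = strong = G or C) and uracil, we need to add a few more replacements to the tr lists. Note that U is complemented to A, while A is always complemented to T, so the output is written with T rather than U.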

sub reverse_complement_IUPAC {
    my $dna = shift;

    # reverse the DNA sequence
    my $revcomp = reverse($dna);

    # complement the reversed DNA sequence, including the
    # ambiguity codes (K <-> M, R <-> Y, B <-> V, D <-> H)
    $revcomp =~ tr/ABCDGHKMNRSTUVWXYabcdghkmnrstuvwxy/TVGHCDMKNYSAABWXRtvghcdmknysaabwxr/;
    return $revcomp;
}
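A small self-check for the IUPAC version might look like the sketch below. The helper name revcomp_iupac is hypothetical, and its translation list is written out here to cover every symbol in the ambiguity table, including K and M. The test sequences are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper for this sketch: same approach as reverse_complement_IUPAC,
# with a translation list covering every symbol in the ambiguity table.
sub revcomp_iupac {
    my $dna = shift;
    my $revcomp = reverse($dna);
    $revcomp =~ tr/ABCDGHKMNRSTUVWXYabcdghkmnrstuvwxy/TVGHCDMKNYSAABWXRtvghcdmknysaabwxr/;
    return $revcomp;
}

# Ambiguity codes are complemented symbol by symbol.
print revcomp_iupac("ACGTRYKM"), "\n";   # prints KMRYACGT

# For sequences without U, applying the function twice returns the input.
my $seq = "ATGCRYSWKMBDHVN";
print "round trip ok\n" if revcomp_iupac(revcomp_iupac($seq)) eq $seq;

# U complements to A, but A complements to T, so RNA does not round-trip.
print revcomp_iupac(revcomp_iupac("AUG")), "\n";   # prints ATG
```

The round-trip test is a cheap way to catch a symbol that was left out of one of the two tr lists, since any omitted code would come back uncomplemented.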

Nucleotide Ambiguity Codes

Symbol   Nucleotides        Complement
A        A                  T
C        C                  G
G        G                  C
T        T                  A
U        U                  A
R        A or G             Y
Y        C or T             R
S        C or G             S
W        A or T             W
K        G or T             M
M        A or C             K
B        C or G or T        V
D        A or G or T        H
H        A or C or T        D
V        A or C or G        B
N        A or C or G or T   N
X        A or C or G or T   X
-        gap                -
.        gap                .
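The table can also be used to screen input before reverse complementing it. The sketch below is one possible approach, not part of the original functions: is_valid_iupac is a hypothetical helper that accepts only the symbols listed above, in either case. The second half shows that tr leaves characters it was not given untouched, so gap characters keep their symbol and simply move with the reversal.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $seq contains only symbols from the
# ambiguity table (either case), including the two gap characters, else 0.
sub is_valid_iupac {
    my $seq = shift;
    return $seq =~ /\A[ACGTURYSWKMBDHVNX.\-]*\z/i ? 1 : 0;
}

print is_valid_iupac("ACGT-NNN.RY"), "\n";   # prints 1
print is_valid_iupac("ACGTZ"), "\n";         # prints 0

# tr only changes the characters it lists, so gaps pass through unchanged.
my $aligned = reverse("AC-GT.");
$aligned =~ tr/ACGTacgt/TGCAtgca/;
print "$aligned\n";                          # prints .AC-GT
```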