The basic function for reverse complementing a DNA sequence, shown below, uses Perl's built-in reverse() function to reverse the string. Because the result is assigned to a scalar, reverse() runs in scalar context and reverses the characters of the string. The function then uses the tr operator, which takes a list of characters to search for and a list of replacement characters, and replaces each matched character with the character at the same position in the replacement list. In the example below (tr/ACGTacgt/TGCAtgca/), A is replaced with T, C with G, and so on, to complement the DNA while preserving case.
sub reverse_complement {
    my $dna = shift;
    # reverse the DNA sequence
    my $revcomp = reverse($dna);
    # complement the reversed DNA sequence
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;
    return $revcomp;
}
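As a quick check, the function can be run on a short sequence. This is a minimal sketch: the input sequence is made up for illustration, and the function is repeated so the snippet runs on its own.

```perl
use strict;
use warnings;

sub reverse_complement {
    my $dna = shift;
    # scalar assignment puts reverse() in scalar context, so it reverses the string
    my $revcomp = reverse($dna);
    # swap each base for its complement, preserving case
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;
    return $revcomp;
}

my $seq = 'ATGCAAgt';    # hypothetical example sequence
my $rc  = reverse_complement($seq);

print "$rc\n";                          # prints acTTGCAT
print reverse_complement($rc), "\n";    # prints ATGCAAgt
```

For sequences containing only A, C, G and T, applying the function twice returns the original sequence, which makes a convenient sanity test.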
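To support the full set of IUPAC nucleotide codes (e.g. S = strong = G or C) as well as uracil, we need to add a few more replacements. Characters that do not appear in the tr search list, such as the gap symbols - and ., are left unchanged.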
sub reverse_complement_IUPAC {
    my $dna = shift;
    # reverse the DNA sequence
    my $revcomp = reverse($dna);
    # complement the reversed DNA sequence, including the IUPAC ambiguity codes and U
    $revcomp =~ tr/ABCDGHKMNRSTUVWXYabcdghkmnrstuvwxy/TVGHCDMKNYSAABWXRtvghcdmknysaabwxr/;
    return $revcomp;
}
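The IUPAC version can be exercised the same way. This is a sketch under a few assumptions: the test sequences are invented for illustration, and the tr lists include the K and M pair so that every symbol in the complement table is handled.

```perl
use strict;
use warnings;

sub reverse_complement_IUPAC {
    my $dna = shift;
    my $revcomp = reverse($dna);
    # search and replacement lists are position-matched: A->T, B->V, ..., K->M, M->K, ...
    $revcomp =~ tr/ABCDGHKMNRSTUVWXYabcdghkmnrstuvwxy/TVGHCDMKNYSAABWXRtvghcdmknysaabwxr/;
    return $revcomp;
}

# hypothetical sequence mixing ambiguity codes, uracil and a gap
print reverse_complement_IUPAC('AUGRYKMN-'), "\n";    # prints -NKMRYCAT

# U is complemented to A, so a second pass yields T rather than U
print reverse_complement_IUPAC(reverse_complement_IUPAC('AUG')), "\n";    # prints ATG
```

Because U maps to A but A maps back to T, reverse complementing twice does not restore an RNA sequence; the output is always written in the DNA alphabet.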
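The IUPAC symbols, the nucleotides they represent, and their complements are listed below.

| Symbol | Nucleotides | Complement |
|--------|-------------|------------|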
| A | A | T |
| C | C | G |
| G | G | C |
| T | T | A |
| U | U | A |
| R | A or G | Y |
| Y | C or T | R |
| S | C or G | S |
| W | A or T | W |
| K | G or T | M |
| M | A or C | K |
| B | C or G or T | V |
| D | A or G or T | H |
| H | A or C or T | D |
| V | A or C or G | B |
| N | A or C or G or T | N |
| X | A or C or G or T | X |
| - | gap | - |
| . | gap | . |
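The table can also be written out as a hash, which makes it easy to check that the mapping is consistent. The sketch below is an alternative to the tr-based functions rather than a replacement for them; the names %complement and reverse_complement_table are hypothetical and chosen only for this example.

```perl
use strict;
use warnings;

# complement table transcribed from the rows above (uppercase symbols only)
my %complement = (
    A => 'T', C => 'G', G => 'C', T => 'A', U => 'A',
    R => 'Y', Y => 'R', S => 'S', W => 'W', K => 'M', M => 'K',
    B => 'V', D => 'H', H => 'D', V => 'B', N => 'N', X => 'X',
    '-' => '-', '.' => '.',
);

# a symbol is self-inverse if complementing it twice gives it back
my @asymmetric = grep { $complement{ $complement{$_} } ne $_ } sort keys %complement;
print "Not self-inverse: @asymmetric\n";    # prints Not self-inverse: U

sub reverse_complement_table {
    my $dna = shift;
    my $out = '';
    # split into characters, walk them in reverse order, and complement each one
    for my $base (reverse split //, $dna) {
        my $comp = $complement{ uc $base };
        if    (!defined $comp)     { $out .= $base }       # unknown symbol: leave unchanged
        elsif ($base eq uc $base)  { $out .= $comp }       # uppercase or non-letter
        else                       { $out .= lc $comp }    # preserve lowercase
    }
    return $out;
}

print reverse_complement_table('AUGRYKMN-'), "\n";    # prints -NKMRYCAT
```

The tr operator is the faster choice for long sequences, since it works on the whole string in one pass; the hash form is mainly useful for validating or extending the table.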