Sequence editing & alignment
We use Sequencher 4.1 for sequence editing, and ClustalX, MAFFT, or one of several other programs to align difficult sequences (such as ribosomal DNA).
Trimming raw sequence
- In Sequencher, open one of the sequence files.
- Select "View chromatogram".
- Highlight the first 25 or so bases of good sequence (top row) and delete them. These bases correspond to the primer, not necessarily to sequence from your organism.
- Highlight the bases from the point where the signal becomes too messy to determine the sequence reliably, and delete them. Trimming both regions makes alignment far easier and reduces errors.
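The trimming steps above are done by eye in Sequencher. As a rough illustration of the same logic, the sketch below drops a fixed-length primer region and then cuts the read where quality falls off. It assumes per-base quality scores are available as a list of integers; the function name `trim_read`, the sliding window, and the thresholds are hypothetical choices for illustration, not Sequencher settings.

```python
def trim_read(bases, quals, primer_len=25, min_quality=20, window=10):
    """Trim a raw read: drop the primer region, then cut the messy tail.

    bases       -- string of base calls
    quals       -- list of per-base quality scores (assumed input)
    primer_len  -- number of leading bases treated as primer (about 25)
    min_quality -- hypothetical cutoff for the mean quality of a window
    window      -- hypothetical sliding-window width
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")

    # Step 1: remove the leading bases that correspond to the primer.
    bases = bases[primer_len:]
    quals = quals[primer_len:]

    # Step 2: find the first window whose mean quality is too low and
    # discard everything from the start of that window onward.
    cut = len(bases)
    for start in range(0, len(bases) - window + 1):
        mean_quality = sum(quals[start:start + window]) / window
        if mean_quality < min_quality:
            cut = start
            break

    return bases[:cut], quals[:cut]


if __name__ == "__main__":
    read = "A" * 25 + "ACGT" * 10 + "N" * 20
    scores = [40] * 65 + [5] * 20
    trimmed, _ = trim_read(read, scores)
    print(len(trimmed), trimmed)
```

A window-based cutoff is deliberately conservative: it may remove a few good bases just before the signal degrades, which is usually preferable to keeping unreliable ones.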
Labeling features (like introns)
- In Sequencher, open a contig.
- Highlight the consensus sequence corresponding to the feature.
- Go to Sequence -> Mark Selection As Feature and format the feature.
Exporting sequences
- In Sequencher, select the contig for a taxon and choose "Create New Seq From Consensus" under the Contig menu.
- Assemble the new sequence with the contig.
- Delete those parts of the new sequence that are not strongly supported by the contig sequence data (use N's for internal regions of uncertain sequence).
- Do this for all the taxa.
- Remove the consensus sequences from the contigs.
- Assemble the consensus sequences into one large contig. Edit it for gaps, etc.
- Under the File menu, go to "Import & Export..." and choose "Export Sequence(s)...", NEXUS format.
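For readers who want to check an exported file or reproduce the export outside Sequencher, the sketch below mirrors two of the steps above: trimming or masking weakly supported parts of a consensus, and writing aligned sequences as a minimal NEXUS DATA block. The function names, the use of read coverage as the measure of support, and the `min_coverage` threshold are all assumptions made for illustration; Sequencher's own export may format the file differently.

```python
def mask_consensus(consensus, coverage, min_coverage=2):
    """Trim weakly supported ends and mask weak internal bases with N.

    coverage     -- number of reads supporting each position (assumed input)
    min_coverage -- hypothetical threshold for "strongly supported"
    """
    if len(consensus) != len(coverage):
        raise ValueError("consensus and coverage must have the same length")
    supported = [i for i, c in enumerate(coverage) if c >= min_coverage]
    if not supported:
        return ""
    first, last = supported[0], supported[-1]
    return "".join(
        base if coverage[i] >= min_coverage else "N"
        for i, base in enumerate(consensus)
        if first <= i <= last
    )


def to_nexus(alignment):
    """Return a minimal NEXUS DATA block for aligned DNA sequences.

    alignment -- dict mapping taxon name to aligned sequence; names are
                 assumed to contain no whitespace or punctuation.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    nchar = lengths.pop()
    width = max(len(name) for name in alignment)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(alignment)} NCHAR={nchar};",
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for name, seq in alignment.items():
        lines.append(f"    {name.ljust(width)}  {seq}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(mask_consensus("ACGTACGT", [1, 2, 2, 1, 2, 2, 1, 1]))
    print(to_nexus({"taxon_A": "ACGT-", "taxon_B": "ACNTA"}))
```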
Alignment parameters for ClustalX
We generally run a variety of alignment parameters and then choose the alignment which seems best (see below). We use Clustal's default gap opening cost:gap extension cost value of 15:6.66, as well as the following ratios (from Maddison et al. 1999, "Phylogeny of carabid beetles as inferred from 18S ribosomal DNA" Systematic Entomology 24:103-108). Make sure you change the name of Clustal's output files whenever you change the alignment parameters, so that an earlier alignment is not overwritten.
| 50:5 | 20:5 | 15:6.66 | 15:3 | 12:7 | 10:5 | 10:2 | 8:3 | 7:2 | 5:1 | 3:2 | 3:0.5 |
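Keeping track of twelve parameter sets by hand is error-prone, so a small script can help with the bookkeeping. The sketch below builds one command line per ratio, each with its own output file name. It assumes the command-line version of Clustal (ClustalW) rather than the ClustalX interface; the executable name `clustalw2`, the input file name, and the output naming scheme are assumptions, and the option names should be checked against your installation's help output before running anything.

```python
from pathlib import Path

# Gap opening : gap extension ratios listed in the table above.
RATIOS = ["50:5", "20:5", "15:6.66", "15:3", "12:7", "10:5",
          "10:2", "8:3", "7:2", "5:1", "3:2", "3:0.5"]


def clustal_commands(infile, executable="clustalw2"):
    """Build one ClustalW command per ratio, each with a unique output file.

    The executable name and the output naming scheme are hypothetical;
    the commands are only constructed here, not run.
    """
    stem = Path(infile).stem
    commands = []
    for ratio in RATIOS:
        gap_open, gap_ext = ratio.split(":")
        # Encode the parameters in the file name, e.g. 18S_go15_ge6p66.aln
        tag = f"go{gap_open}_ge{gap_ext}".replace(".", "p")
        outfile = f"{stem}_{tag}.aln"
        commands.append([
            executable,
            f"-INFILE={infile}",
            "-ALIGN",
            "-TYPE=DNA",
            f"-GAPOPEN={gap_open}",
            f"-GAPEXT={gap_ext}",
            f"-OUTFILE={outfile}",
        ])
    return commands


if __name__ == "__main__":
    for command in clustal_commands("18S.fasta"):
        print(" ".join(command))
```

Each command could then be passed to `subprocess.run` once the options have been verified. Encoding the parameters in the file name makes it easy to tell later which alignment came from which settings.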
Profile alignment
In some cases Clustal's alignments can be quite poor: we have noticed cases where identical sections of sequence are aligned differently from each other. The alignments can be drastically improved by using a profile alignment together with secondary structure information. With this sort of alignment, the cost of inserting gaps can be set higher in some regions, such as the stems of ribosomal RNA, than in regions that are freer to vary, such as RNA loops. Known secondary structures can be downloaded from several databases, and the sequences then edited to conform to Clustal's format for profile sequences.
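To make the idea of position-specific gap costs concrete, the sketch below converts a secondary structure into a per-column string of gap-penalty weights, with higher weights in stems than in loops. It assumes the structure is given in dot-bracket notation (paired bases as parentheses, unpaired bases as dots); the function name and the weight values are hypothetical. How such a mask is written into a profile file is not shown here and should be taken from Clustal's documentation.

```python
def gap_penalty_mask(structure, stem_weight=5, loop_weight=1):
    """Turn a dot-bracket structure into a string of per-column weights.

    structure   -- dot-bracket string: "(" and ")" mark paired (stem)
                   positions, "." marks unpaired (loop) positions
    stem_weight -- hypothetical weight for stem columns (single digit)
    loop_weight -- hypothetical weight for loop columns (single digit)
    """
    for weight in (stem_weight, loop_weight):
        if not 1 <= weight <= 9:
            raise ValueError("weights must be single digits from 1 to 9")

    depth = 0  # number of currently open base pairs
    digits = []
    for symbol in structure:
        if symbol == "(":
            depth += 1
            digits.append(str(stem_weight))
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: extra ')'")
            digits.append(str(stem_weight))
        elif symbol == ".":
            digits.append(str(loop_weight))
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    if depth != 0:
        raise ValueError("unbalanced structure: unclosed '('")
    return "".join(digits)


if __name__ == "__main__":
    print(gap_penalty_mask("(((...)))..", stem_weight=9))
```

Single-digit weights are used so that the mask has exactly one character per alignment column, which makes it easy to line up against the sequences by eye.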