Sockeye Help


Tutorial: Aligning C. elegans and C. briggsae myo-2 orthologues

This tutorial describes the steps required to align orthologous genes between C. elegans and C. briggsae, view the alignment in detail, and use sequence conservation profiles to identify conserved regions.


Setup

  1. Open the data import dialog by selecting Data > Query Database/Import Data from the main menu.
  2. Connect to an available EnsEMBL database server by selecting EnsEMBL Server in the left panel, clicking the Connect button, and then clicking Next.
  3. In the next tab panel, labelled Data and Species, select Get annotations by EnsEMBL ID in the first panel.
  4. In the second panel, type 'T18D3.4', the WormBase ID for the C. elegans myo-2 gene. For more information on C. elegans genes, see www.wormbase.org.
  5. In this example we want to align the 5 kb upstream region of the gene, so change the value in the Plus 5' edit box from 1000 base pairs to 5000.


  6. Click the Finish button. After several seconds, a new track appears in 3D displaying this gene and its upstream region.
  7. Next, display repeats and gene-associated features. In the feature tree, click the small magnifying-glass icon next to the 'Genes and gene predictions' label to expand the list of features below it.


  8. Check the All box. Exons and UTRs should appear on the T18D3.4 gene after a few seconds. If they do not, make sure that 'Genes and gene predictions' is turned on by clicking the button to the left of the label until it is blue.
  9. Repeat steps 7 and 8 for the 'Repeats' label in the feature tree. No repeats appear on the T18D3.4 gene, but they will be visible once the C. briggsae orthologue is loaded.
  10. Right-click on the T18D3.4 gene in the centre of the track. In the context menu, click EnsEMBL orthologues > Get all orthologues to load the C. briggsae gene (T18D3.4 has only one orthologue). After a few seconds, the C. briggsae gene appears on a new track parallel to T18D3.4. Exons, UTRs, and repeats should be visible for this gene.



    Note: To line up the upstream regions of the genes, the new C. briggsae track is reversed since the orthologous gene is on the negative strand while T18D3.4 is not.



  11. Let's mark the upstream regions of these genes. Select both genes by holding down the CTRL key and clicking on each of them; both should be highlighted in pink. Be careful to click only on an orange gene/intron feature, not on a green exon (or a red/blue UTR); otherwise you will mark the wrong regions.
  12. Right-click on one gene and select Mark regions in the context menu.


  13. In the dialog box that appears, specify -5000 for the Region limit in 5' direction, because we want to align the 5 kb upstream regions. It's also a good idea to include some exons of the orthologues, because the 5' and 3' UTRs are often well defined in C. elegans but not in C. briggsae, so the annotated gene starts may not correspond exactly. To do this, set the Region limit in 3' direction to 500. Click OK to finish highlighting the regions. The highlighted upstream regions should now appear in a lighter blue.


  14. At this point it is a good idea to save the current work session so that you don't have to repeat this setup process at a later date. Go to Session > Save from the main menu and specify a title for the session.
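The Mark regions limits in step 13 are offsets from the gene, and for the reversed C. briggsae gene "upstream" points toward higher genomic coordinates. The Python sketch below illustrates that arithmetic under stated assumptions. It is not Sockeye's code: the function names, the 1-based coordinate convention, and the assumption that both limits are measured from the gene's 5' end are all hypothetical.

```python
from typing import Tuple

# Hypothetical sketch, not Sockeye's implementation.
# Assumes 1-based genomic coordinates and that both region limits are
# offsets from the gene's 5' end (negative = upstream, positive = into the gene).
# Results are not clipped to chromosome ends.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def marked_region(gene_start: int, gene_end: int, strand: str,
                  limit_5p: int, limit_3p: int) -> Tuple[int, int]:
    """Return the (low, high) genomic coordinates covered by the marked region."""
    if strand == "+":
        # 5' end is the lower coordinate; upstream extends to smaller positions.
        five_prime = gene_start
        return five_prime + limit_5p, five_prime + limit_3p
    if strand == "-":
        # 5' end is the higher coordinate; upstream extends to larger positions.
        five_prime = gene_end
        return five_prime - limit_3p, five_prime - limit_5p
    raise ValueError(f"unknown strand: {strand!r}")


def reverse_complement(seq: str) -> str:
    """Return a negative-strand region read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


# Example with made-up gene coordinates and the tutorial's limits (-5000, 500).
print(marked_region(10_000, 12_000, "+", -5000, 500))  # (5000, 10500)
print(marked_region(10_000, 20_000, "-", -5000, 500))  # (19500, 25000)
print(reverse_complement("AACG"))                       # CGTT
```

In this sketch, reverse-complementing the negative-strand region lets both upstream regions be compared in the same 5'->3' orientation, which is the situation the reversed C. briggsae track represents.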

Running the alignment

  1. With two highlighted regions, you can now start an alignment. Hold down the CTRL key and click on each of the highlighted blue regions. Both should turn light purple.
  2. Right-click on one of them and choose Align 2 selected regions from the context menu.


  3. The Initial Alignment Relationship Dialog asks you to specify a relationship to be used for the alignment process. This will affect how the aligned tracks are ordered and may affect how sequence conservation profiles are calculated. Specify an Orthologous relationship and click OK.


  4. The Reference Track Selection Dialog asks you to select a primary reference track, which is used together with the selected relationship to order the aligned tracks. Keep the default C. elegans track and click OK.


  5. The next dialog asks you to choose an alignment application. Choose 'Lagan' and click OK.


  6. The next dialog confirms the parameters of your input sequence regions. Note that the C. briggsae region is set on the negative strand. This is correct: the dialog assumes the negative strand because the region lies on a reversed track. Click OK.


  7. The next dialog offers parameters that are specific to the alignment method you selected (Lagan in this case). Accept the defaults and click OK.


  8. The alignment should take under 15 seconds to complete. When it finishes, two new tracks appear with the same features as the original source regions, expanded with gaps to represent the alignment. A sequence conservation profile is displayed on the reference track of the alignment, with its threshold line set at the 75th percentile (meaning 75% of the profile scores lie below this line).


  9. Because the original imported tracks are longer than the alignment tracks, the alignment is drawn at a smaller scale and shows less detail. Hide the first two tracks by clicking the blue buttons beside their names in the track list panel. Only the alignment tracks should now be visible, and they expand to fill the full length of the 3D platform.
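The tutorial does not say how Sockeye computes its conservation profile. As an illustration only, the sketch below assumes a simple sliding-window percent identity over the alignment columns (scores from 0 to 1) and places the threshold with the nearest-rank percentile method, so that at least 75% of scores are at or below it. The window size, the function names, and the percentile method are all assumptions.

```python
import math
from typing import List

# Illustrative only: Sockeye's actual profile calculation may differ.


def identity_columns(seq_a: str, seq_b: str) -> List[int]:
    """1 for each alignment column where both sequences share the same base, else 0."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [1 if a == b and a != "-" else 0
            for a, b in zip(seq_a.upper(), seq_b.upper())]


def conservation_profile(seq_a: str, seq_b: str, window: int = 5) -> List[float]:
    """Fraction of identical columns in a window centred on each column (0 to 1)."""
    ident = identity_columns(seq_a, seq_b)
    half = window // 2
    profile = []
    for i in range(len(ident)):
        lo = max(0, i - half)                 # window is truncated at the ends
        hi = min(len(ident), i + half + 1)
        profile.append(sum(ident[lo:hi]) / (hi - lo))
    return profile


def percentile_threshold(scores: List[float], pct: float) -> float:
    """Nearest-rank percentile: at least pct% of scores are <= the returned value."""
    if not scores:
        raise ValueError("empty profile")
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


print(conservation_profile("ACGT", "ACGA", window=1))  # [1.0, 1.0, 1.0, 0.0]
print(percentile_threshold([0.1, 0.4, 0.2, 0.9], 75))  # 0.4
```

A wider window smooths the profile so that isolated matching columns do not register as conservation on their own.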


Viewing the alignment in more detail

  1. To get a detailed 2D view of the alignment, right-click on one of the alignment tracks and click Alignment > Show alignment in the context menu.
  2. The dialog that opens displays the nucleotide view of the alignment columns as well as the sequence conservation profiles in 2D. In 3D, a purple band highlights the region of the alignment currently visible in the 2D dialog. This band updates dynamically as you scroll or resize the 2D alignment view.



Using the sequence conservation profile to show highly conserved regions

A 'conserved region' in a sequence corresponds to a section of the conservation profile that is above a threshold value. You can define conserved regions dynamically by changing the threshold and profile calculation parameters.
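Given a profile and a threshold, conserved regions are the contiguous runs of columns whose score lies above the threshold. A minimal Python sketch of that definition follows. The function name and the half-open (start, end) column convention are assumptions, and Sockeye may apply further rules (for example a minimum region length) that are not described here.

```python
from typing import List, Optional, Sequence, Tuple


def conserved_regions(profile: Sequence[float],
                      threshold: float) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column intervals where score > threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(profile):
        if score > threshold and start is None:
            start = i                      # a run above the threshold begins
        elif score <= threshold and start is not None:
            regions.append((start, i))     # the run ends before column i
            start = None
    if start is not None:                  # run extends to the last column
        regions.append((start, len(profile)))
    return regions


scores = [0.2, 0.9, 0.95, 0.1, 0.8]
print(conserved_regions(scores, 0.5))   # [(1, 3), (4, 5)]
print(conserved_regions(scores, 0.9))   # [(2, 3)]
```

Because every column above a higher threshold is also above a lower one, raising the threshold can only shrink, split, or remove regions, never enlarge them.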

  1. Right-click on the 3D sequence conservation profile graph at the back of the first alignment track. Select Sequence Conservation Profiles to open a dialog box that lets you configure the profiles.
  2. Click on the Settings tab and check the box in the GCR column. 'GCR' stands for Gapped Conserved Regions.


  3. Click the Apply button. The areas of the graph that are above the threshold line are projected onto the track as green conserved regions.


  4. Make the original C. elegans track visible again by clicking the grey button to the left of its name in the track list. You may have to use the navigation controls to set up the best view of the original upstream region and its aligned counterpart.
  5. In the sequence conservation profile dialog, check the box in the OCR column ('OCR' stands for Original Conserved Regions) and click the Apply button. The conserved regions are now visible on the non-aligned source regions too.


  6. If there are too many conserved regions to work with, increase the profile's threshold. In the sequence conservation profile dialog, slide the percentile slider in the Conserved Region Threshold column until a value close to 90% is displayed. Note that the value to the right of the percentile value is the actual profile score level at which the threshold will be drawn. This value ranges from 0 (no conservation) to 1 (complete conservation).
  7. Click the Apply button. In 3D, the threshold line intersecting the graph is higher and so the conserved regions are smaller.
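Projecting conserved regions onto the original sequences (the OCR option in step 5) requires translating alignment column numbers back into positions in the ungapped sequence, skipping gap characters. The Python sketch below shows one way to do this. The function names and the '-' gap symbol are assumptions, and for the reversed C. briggsae track the resulting offsets would still need converting back to genomic coordinates on the negative strand.

```python
from typing import List, Optional, Tuple


def column_to_residue(aligned: str) -> List[Optional[int]]:
    """Map each alignment column to its 0-based ungapped position (None for a gap)."""
    mapping: List[Optional[int]] = []
    pos = 0
    for ch in aligned:
        if ch == "-":
            mapping.append(None)
        else:
            mapping.append(pos)
            pos += 1
    return mapping


def project_region(aligned: str, start: int,
                   end: int) -> Optional[Tuple[int, int]]:
    """Project the half-open column interval [start, end) onto ungapped positions."""
    mapping = column_to_residue(aligned)
    residues = [p for p in mapping[start:end] if p is not None]
    if not residues:
        return None                       # the region covers only gaps here
    return residues[0], residues[-1] + 1  # half-open, like the input interval


print(project_region("AC--GT", 1, 5))   # (1, 3)
print(project_region("AC--GT", 2, 4))   # None
```

A region that falls entirely within gaps in one sequence has no counterpart in that sequence's original coordinates, which is why the sketch returns None rather than an empty interval.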



Last modified: 2004-02-05