Tutorial: Aligning C. elegans and C. briggsae myo-2 orthologues
This tutorial describes the steps required to align orthologous
genes between C. elegans and C. briggsae, view the
alignment in detail, and use sequence conservation profiles to identify
conserved regions.
Setup
- Open a data import dialog by selecting Data > Query Database/Import
Data from the main menu.
- Connect to an available EnsEMBL database server by selecting EnsEMBL
Server in the left panel, clicking the Connect button, and
then clicking Next.
- In the next tab panel labelled Data and Species, select Get
annotations by EnsEMBL ID from the first panel.
- In the second panel, type 'T18D3.4', the Wormbase ID for the C.
elegans myo-2 gene. For more information on C. elegans
genes see www.wormbase.org.
- In this example we want to align the 5kb upstream region of the
gene, so change the value in the Plus 5' edit box from 1000 base pairs
to 5000.

- Click the Finish button. After several seconds a new
track will appear in 3D
with this gene and its upstream region displayed.
- Now we want to display repeats and gene-associated
features. In the feature tree, click the mini-magnifying glass
icon next to the 'Genes and gene predictions' label to expand the list
of features below it.

- Check the All box. Exons and UTRs should appear on the
T18D3.4 gene after a few seconds. If not, make sure that 'Genes and
gene predictions' are
turned on by clicking the button to the left of the label until it is
blue.
- Repeat the previous two steps for the 'Repeats' label in the feature
tree. Repeats will not appear for the T18D3.4 gene but will be seen
when we load the C. briggsae orthologue.
- Right-click on the T18D3.4 gene in the centre of the
track. In the context menu click EnsEMBL
orthologues > Get all orthologues to load the C. briggsae
gene (there is only one orthologue for T18D3.4). After a few seconds,
the C. briggsae
gene will appear parallel to T18D3.4 on a new track. Exons, UTRs, and
repeats should be visible for this gene.

Note: To line up the upstream regions of the genes, the new C.
briggsae track is reversed since the orthologous gene is on the
negative strand while T18D3.4 is not.

- Let's mark the upstream regions of these genes.
Select both genes by holding down the CTRL key and clicking on each of
them. They should both be highlighted in pink. Be careful to
click only on an orange gene/intron feature and not a green exon (or
red/blue UTR) or else you will mark the wrong regions.
- Right-click on one gene and select Mark
regions in the context menu.

- In this dialog box, specify -5000 for the Region limit in 5'
direction because we want to align the 5kb upstream regions.
It's a good idea to also include some exons of the orthologues since
the 5’ and 3’ UTRs are often well defined in C.
elegans but not in C. briggsae. This can be done by setting
the Region limit in 3' direction value to 500. Click OK
to finish highlighting the regions. The highlighted upstream regions of
the genes should now appear in a lighter blue.

- At this point it is a good idea to save the current work session
so that you don't have to repeat this setup process at a later date. Go
to Session > Save from the main menu and specify a
title for the session.
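
The region limits entered in the Mark regions step above can be pictured as offsets on the genome. The following is a minimal sketch, not Sockeye's own code. It assumes that both limits are measured from the 5' end of the gene, that negative values point upstream, and that coordinates are 1-based. The function name `marked_region` and the example coordinates are hypothetical.

```python
def marked_region(gene_start, gene_end, strand, limit_5p=-5000, limit_3p=500):
    """Return (start, end) genomic coordinates of a marked region.

    Assumption: both limits are offsets from the gene's 5' end, with
    negative values upstream. gene_start <= gene_end on the forward axis.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 (positive) or -1 (negative)")
    if strand == 1:
        # On the positive strand the 5' end is the lower coordinate.
        anchor = gene_start
        low = anchor + limit_5p
        high = anchor + limit_3p
    else:
        # On the negative strand the 5' end is the higher coordinate,
        # so "upstream" means larger genomic coordinates.
        anchor = gene_end
        low = anchor - limit_3p
        high = anchor - limit_5p
    # Clamp to the start of the sequence.
    return max(1, low), high


# Hypothetical gene spanning 10000..12000
print(marked_region(10000, 12000, 1))   # positive strand
print(marked_region(10000, 12000, -1))  # negative strand
```

Under these assumptions, a gene on the negative strand has its upstream region at higher genomic coordinates, which is why the C. briggsae track is shown reversed to line up with C. elegans.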
Running the alignment
- With two highlighted regions to work with you can now start an
alignment. Hold down the CTRL key and click on each of the
highlighted blue regions. They should both become light purple.
- Right-click on one of them and choose Align 2 selected regions
from the context menu.

- The Initial Alignment Relationship Dialog asks you to specify
a relationship to be used for the alignment process. This will affect
how the aligned tracks are ordered and may affect how sequence
conservation profiles are calculated. Specify an Orthologous
relationship and click OK.

- The Reference Track Selection Dialog will ask you to select a
primary reference track to be used in conjunction with the selected
relationship for ordering aligned tracks. Keep the default C.
elegans track and click the OK button.

- The next dialog will ask you for an alignment application. Choose
'Lagan' and click OK.

- The next dialog will confirm the parameters of your input
sequence regions. Note that the C. briggsae region is set on
the
negative strand. This is correct and was assumed by the dialog because
the region is on a reversed track. Click OK.

- The next dialog offers parameters that are specific to the
alignment method you selected (Lagan in this case). Accept the defaults
and click OK.

- The alignment should take under 15 seconds to complete. When it
does, two
new tracks appear with the same features as the original source regions
but they are expanded with gaps to represent the alignment. A
sequence conservation profile is displayed on the reference track of
the
alignment with its threshold line set at the 75th percentile (meaning
75% of the profile scores are below this line).

- Because the original tracks that we imported are larger than the
alignment tracks, we see less detail of the alignment. Hide the
first two tracks by clicking on the blue buttons beside their names in
the track list panel. Now only the alignment tracks should be visible
and they will have expanded to fill the full length of the 3D platform.
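
To make the idea of a conservation profile and its percentile threshold concrete, here is a small sketch. It is not Sockeye's actual profile calculation, which is not described here. It assumes a sliding-window fraction of identical columns as the score (ranging from 0 to 1) and a nearest-rank percentile. The names `identity_profile` and `percentile_threshold` are hypothetical.

```python
import math


def identity_profile(seq_a, seq_b, window=5):
    """Sliding-window fraction of identical, non-gap columns (0 to 1)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both sequences carry the same nucleotide, 0 otherwise.
    matches = [1 if a == b and a != "-" else 0 for a, b in zip(seq_a, seq_b)]
    half = window // 2
    profile = []
    for i in range(len(matches)):
        lo = max(0, i - half)
        hi = min(len(matches), i + half + 1)
        profile.append(sum(matches[lo:hi]) / (hi - lo))
    return profile


def percentile_threshold(scores, pct=75):
    """Nearest-rank percentile: the score at or below which pct% of scores fall."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Two short gapped sequences, invented for illustration
profile = identity_profile("ACGT-ACGTTGA", "ACGTTACCTTGA", window=3)
print(percentile_threshold(profile, 75))
```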

Viewing the alignment in more detail
- To get a detailed 2D view of the alignment right-click on one of
the alignment tracks and click on Alignment > Show alignment
in the context menu.
- The dialog that pops up displays the nucleotide view of the
alignment columns as well as the sequence conservation profiles in 2D.
In 3D, a purple band highlights the region of the alignment currently
visible in the 2D dialog. This band will change dynamically as you
scroll or resize the 2D alignment view.

Using the sequence conservation profile to show highly conserved regions
A 'conserved region' in a sequence
corresponds to a section of the conservation profile that is above a
threshold value. You can define conserved regions dynamically by
changing the threshold and profile calculation parameters.
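
The definition above can be sketched in a few lines: a conserved region is a maximal run of profile positions whose score is above the threshold. This is an illustrative sketch only. The function name `conserved_regions` and the example scores are hypothetical, and positions are 0-based alignment columns.

```python
def conserved_regions(profile, threshold):
    """Return (start, end) column ranges, inclusive, where score > threshold."""
    regions = []
    start = None
    for i, score in enumerate(profile):
        if score > threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            regions.append((start, i - 1))  # the run ended at the previous column
            start = None
    if start is not None:
        # The profile ended while still above the threshold.
        regions.append((start, len(profile) - 1))
    return regions


scores = [0.2, 0.9, 0.95, 0.3, 0.8]
print(conserved_regions(scores, 0.75))
print(conserved_regions(scores, 0.9))
```

Raising the threshold in this example leaves fewer and shorter regions, which is the effect used later in the tutorial.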
- Right-click on the 3D sequence conservation profile graph at the
back of the first alignment track. Select Sequence Conservation
Profiles to open a dialog box that lets you configure the profiles.
- Click on the Settings tab and check the box in the GCR
column. 'GCR' stands for Gapped Conserved Regions.

- Click the Apply button. The areas of the graph that are
above the threshold line are projected onto the track as green
conserved regions.

- Make the original C. elegans track visible again by
clicking the grey button to the left of its name in the track list. You
may have to use the navigational controls to set up the best view of the
original upstream region and its aligned counterpart.
- In the sequence conservation profile dialog, check the box in the
OCR column ('OCR' stands for Original Conserved Regions) and
click the Apply button. The conserved regions are now visible
on
the non-aligned source regions too.

- If there are too many conserved regions to work with, increase
the profile's threshold. In the sequence conservation profile dialog,
slide the percentile slider in the Conserved Region Threshold column
until a value close to 90% is displayed. Note that the value to the
right of the percentile value is the actual profile score level at
which
the threshold will be drawn. This value ranges from 0 (no conservation)
to 1 (complete conservation).
- Click the Apply button. In 3D, the threshold line intersecting
the graph is higher and so the conserved regions are smaller.
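
One plausible way to picture the relationship between regions on the gapped alignment tracks (GCR) and regions on the original source tracks (OCR) is a coordinate mapping that skips gap columns. The sketch below is an assumption about how such a projection could work, not a description of Sockeye's implementation. The function name `gapped_to_original` is hypothetical, and positions are 0-based.

```python
def gapped_to_original(aligned_seq, regions):
    """Project (start, end) alignment-column ranges onto ungapped positions.

    Gap columns ('-') have no counterpart in the original sequence, so a
    region that covers only gaps is dropped.
    """
    # For every alignment column, record the ungapped position, or None for a gap.
    col_to_pos = []
    pos = 0
    for ch in aligned_seq:
        if ch == "-":
            col_to_pos.append(None)
        else:
            col_to_pos.append(pos)
            pos += 1

    mapped = []
    for start, end in regions:
        positions = [col_to_pos[c] for c in range(start, end + 1)
                     if col_to_pos[c] is not None]
        if positions:
            mapped.append((positions[0], positions[-1]))
    return mapped


# Columns 1..4 of the gapped sequence cover C, two gaps, and G
print(gapped_to_original("AC--GT", [(1, 4)]))
```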
Last modified: 2004-02-05