DNA Sequence Alignment Analysis


Analysis of DNA sequence data using MEGA and DNAsp

Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene

The first two computer classes will be a set of exercises that you can work through at your own speed, getting as far as seems reasonable. All the software is freely available (web sites are given below), and you will get copies of the sequence alignment so that you can use it as an example file and work on it later, if needed. The questions are intended to show you how to use the software for analysis of diversity within species and divergence between them, and to focus on some of the concepts covered in the lectures.

We shall analyse data on sequences of two genes from some closely related plants: the white campion, Silene latifolia, and three related species, the closely related S. dioica and the more distant relatives S. vulgaris and S. conica. S. latifolia and S. dioica have separate males and females, with an X/Y male sex-determining system, while S. vulgaris and S. conica are hermaphroditic species, and this is presumably the ancestral state.

Several genes have been found on the Y chromosome of S. latifolia. In such cases, it is interesting to test whether the Y-linked gene is evolving as expected for a functional gene, or is showing signs of losing function. One kind of test that can be done is based on analyses of the sequences. If the Y-linked gene has remained functional, its sequence should diverge from other sequences of homologous genes more slowly at non-synonymous (amino acid replacement) sites than at synonymous sites. In other words, the KA value should be lower than KS (the KA/KS ratio < 1). If the Y-linked copy is losing function (or is non-functional), its sequence should accumulate non-synonymous changes more often than the X-linked copy.
Detailed instructions and description of the data files

You will be given a file with 144 sequences of the gene (the sequences are in FASTA format, which can be read by many software packages, including both MEGA and DNAsp). These two programs are very useful for preliminary analysis of sequence data, and both are free (see web sites below). The FASTA files were made by aligning the sequences using SeAl2 (a good program for adjusting alignments by hand, available from this web site:), then exporting them in FASTA format, opening them in BioEdit and saving them again as a new FASTA file that other programs are happy with. Because MEGA gives an annoying error when one tries to save a file opened from the FASTA file produced by either of these programs, I have made a separate file in MEGA format. The file names are:

SileneXYgene4BioEditdata.fas
SileneXYgene4data.meg

You can get the files from this web site (or from a CD which will be provided):

Copy the files into the Workspace folder (and give back the CD, if you used that). NOTE that the files will not remain after you log off. The files contain sets of genomic DNA sequences from the four species: two sets (X and Y) from each of S. latifolia and S. dioica, and one sequence from each of the two hermaphrodite species, which are used purely as outgroups.

Analyses using MEGA 3.1 (Molecular Evolutionary Genetics Analysis)

You can get the software for yourself (free) from:

(1) Find the MEGA software (under the Programs menus) and start it up. In the Start menu, select the following sequence of folders: All Programs > School Applications > Science & Engineering > Biological Science > Summer School > MEGA3.1.

(2) From MEGA, open the file for gene 4, SileneXYgene4data.meg ("activate a data file"). It will ask whether it is a nucleotide sequence, so click OK. Also click OK to the question about whether it is a coding sequence. The text file editor opens, showing your sequence data.

(3) Look at the sequences, using the Display option. You will see the sequence names listed in the left-hand column. The file contains the following sequences (a total of 144). First are sequences from the two dioecious species (X and Y sequences are included from both):
Silene latifolia X (40 sequences, from various populations, labeled X4) and Y (45 sequences, from the same populations, labeled Y4)

Silene dioica X and Y (30 and 27 sequences, respectively, labeled X4D and Y4D)
Then sequences from the two hermaphrodite species (only one sequence, non-sex-linked, per species):
Silene conica
Silene vulgaris

The sequences are partial. They do not include the complete coding sequence, so the sequence does not start with the start codon, but the coding sequence does end with the stop codon (the last part is 3' non-coding sequence). The total length is 1543 nucleotides.

(4) At this point, the program does not know where the coding regions are, since the reading frame has not been specified. Gene 4 has few introns, so it is simple to analyse. In this case, the coding sequence starts with the first nucleotide of the first codon, but the first and last parts of the sequence are non-coding. To give the program this information, you need to use the first and last nucleotide positions of each non-coding region from the following table, which also gives, for each exon, the position in its codon of the first nucleotide.

MEGA has a menu to assign domains as coding or non-coding, and also to select the correct number for the position in its codon of the first nucleotide in the sequence (from the table below). This is "Select & edit genes/domains" (3rd tab from the left). Note that the stop codon starts after position 1353 and should be treated as non-coding: if you include it in a coding domain, some analyses will give an error message saying that a stop codon was found in the coding sequence. Continue to enter the relevant data for each coding and non-coding region. Then close the window.

Gene 4 information
Exon | Exon positions | Intron or non-coding positions | First nucleotide position in codon

It is often a good idea to draw a rough picture of the gene, showing the region sequenced, the length of the region, and the intron and exon positions. Now notice the labelling along the top of the window: the codons are shown.
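Assigning coding domains tells the program which alignment columns to read as codons, and in which frame. The minimal Python sketch below illustrates the idea, assuming the standard genetic code, exon ranges given as 1-based inclusive alignment positions, and the "first nucleotide position in codon" value from the table. The function names are hypothetical, and this is not how MEGA or DNAsp work internally. A gap spanning a whole codon is shown as '-', and any other codon containing a gap or ambiguous base as 'X'.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_coding(aligned_seq, exons, first_codon_pos=1):
    """Translate the coding part of one aligned sequence.

    exons: list of (start, end) alignment positions, 1-based and inclusive.
    first_codon_pos: codon position (1, 2 or 3) of the first exon's first base.
    """
    cds = "".join(aligned_seq[start - 1:end] for start, end in exons).upper()
    cds = cds[(4 - first_codon_pos) % 3:]  # skip an incomplete first codon
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon == "---":
            protein.append("-")  # alignment gap spanning a whole codon
        else:
            protein.append(CODON_TABLE.get(codon, "X"))  # partial gap or ambiguity
    return "".join(protein)


def stop_positions(protein):
    """1-based codon numbers of stop codons ('*') in a translation."""
    return [i + 1 for i, aa in enumerate(protein) if aa == "*"]


# Hypothetical toy alignment row: 2 non-coding bases, 3 codons, a TAA stop, 2 bases.
toy = "CCATGAAATGGTAAGG"
print(translate_coding(toy, [(3, 11)]))
print(stop_positions(translate_coding(toy, [(3, 14)])))
```

With the coding range ending before the stop codon, the toy sequence translates to MKW with no stops; extending the range by 3 positions puts a '*' at codon 4, which is the situation that makes some analyses report a stop codon in the coding sequence.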
Select the menu to translate the sequences, and check that the amino acids look OK and that there are no stop codons (indicated by *; the actual stop codon is at positions in the alignment). If you made a mistake, it is easy to correct: just open the window again and change the positions, but NOTE that a position cannot be assigned to 2 different domains, so you have to work around this limitation. Now look at the nucleotide sequence again. Do you notice the numerous indels (gaps)? In what regions of the sequence are they?

(5) To identify sets of sequences, you can make groups using the Edit/Select taxa and groups menu (the 2nd icon from the left in the row of icons at the top of the screen). You will see the S. latifolia X-linked sequences first, then the Y-linked set, then the X- and Y-linked sets for S. dioica. You can make named sets, which appear in the left-hand window, and transfer sequences into these from the right-hand window, using the small arrow. Make the 4 sets for these sequences. There is no need to make sets for the single S. vulgaris and S. conica sequences. Again, if you make a mistake, it is easy to correct it. When you return to the sequence viewer, you will see that the names of the groups of sequences are displayed in the left-hand column. You can save the file by selecting Write data to file and giving it a new name (e.g. adding your initials or something to indicate that the new version contains the intron-exon and species data).

(6) You can select options to have the program mark with a colour all sites of some particular type, e.g. variable sites, and to output a table showing them, which can be imported into Excel to make a figure. You must of course select the sequences to include; otherwise it will include all of them, and the variable sites will then include both (i) polymorphic sites within either of the species, including X-Y fixed differences, and (ii) differences between the species. You can select the sequences to include with the Edit/Select taxa and groups menu. Click on the box by any set you want to omit, and the tick mark should go away. This sequence (or set of sequences) will no longer be included in analyses. To reverse this decision, click to get the tick back again. To make a list of all polymorphic sites within S. latifolia (including X-Y differences), remove all other sequences from the analysis, using the function just described. Then select the function to mark variable sites. NOTE the count of the number of such sites in the bar at the bottom of the data viewer screen.
If it looks OK, you can ask it to Write data to file, choosing the option "marked sites only".

(7) Use the Construct Phylogeny function in the main MEGA window to make a tree using the sequences. You are given various options for the type of tree (we can use Neighbour-Joining, or NJ), the site type you want to analyse, the statistics you are interested in, and the region of the sequence you want to consider. It can use all sites, just synonymous sites, etc., and there are several options for evolutionary models, depending on whether the sites are coding or not, including whether to use the Jukes-Cantor correction, a correction for the saturation of sites that occurs when sequences are highly divergent. There is an option to display the bootstrap values on the tree figure (choose 1000 bootstrap replicates). If you see only S. latifolia in your tree, you probably failed to reverse the decision to restrict analyses to just this species (see (6) above); you can do this and redo the tree. If you did it correctly, the tree labels the sequences with their names and also shows the group names. If you want to save the tree, select Copy to clipboard under the Image menu. You can then paste it into PowerPoint and add text to record the details of the analysis you actually did. You might try a different analysis to see what difference it makes.
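The Jukes-Cantor correction converts the observed proportion of differing sites, p, into an estimate of the number of substitutions per site, d = -(3/4) ln(1 - 4p/3), allowing for multiple changes at the same site. A minimal Python sketch, assuming that only columns where both sequences have A, C, G or T are compared (pairwise deletion); MEGA's own gap-handling options may give slightly different p values.

```python
import math

NUCLEOTIDES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, comparing only columns where both
    sequences have A, C, G or T (pairwise deletion of gaps/ambiguities)."""
    compared = differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in NUCLEOTIDES and y in NUCLEOTIDES:
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: the JC distance is undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.01, 0.1, 0.3):
    print(f"p = {p:.2f}  ->  JC d = {jukes_cantor(p):.4f}")
```

For p = 0.01 the correction changes almost nothing, whereas p = 0.3 becomes d ≈ 0.383, which is why the correction matters mainly for highly divergent sequences.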

What do the results tell us? Here are some things to look for.

(i) Is either gene a pseudogene?

(ii) Are Y sequences less variable than X sequences? What do the trees suggest? To find numbers of polymorphisms in the two data sets, go back to (6) above and compare the numbers of X and Y polymorphisms in S. latifolia, and the number of fixed X-Y differences.

Table 1. Numbers of differences between X and Y sequences of S. latifolia, and numbers of variants in each of them.
Chromosome | Number of sequences | Number of variants
X polymorphisms | |
Y polymorphisms | |
X-Y fixed differences | |

These results and conclusions are helpful, but are only a preliminary analysis; ideally we want to quantify variability and test the significance of any possible difference. We will see how to do this with DNAsp.

(iii) Do the gene trees of these X- and Y-linked genes agree with the species tree of the species that have sex chromosomes? How do you interpret what you see?
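The counts asked for in Table 1 can also be computed directly from an alignment. The Python sketch below, assuming aligned sequences of equal length and ignoring gaps and ambiguous bases, counts sites polymorphic within the X set, sites polymorphic within the Y set, fixed X-Y differences (each set monomorphic, for different bases) and shared polymorphisms (polymorphic in both sets). It also counts sites variable in the pooled sample, which equals X + Y + fixed - shared; subtracting the X and Y counts from the pooled total therefore gives the fixed differences only when shared = 0. The function names and toy sequences are hypothetical.

```python
NUCLEOTIDES = set("ACGT")


def bases_at(seqs, i):
    """Set of unambiguous nucleotides found at alignment column i."""
    return {s[i].upper() for s in seqs if s[i].upper() in NUCLEOTIDES}


def compare_x_y(x_seqs, y_seqs):
    """Count X polymorphisms, Y polymorphisms, fixed and shared sites."""
    counts = dict(x_poly=0, y_poly=0, fixed=0, shared=0, combined=0)
    for i in range(len(x_seqs[0])):
        xb, yb = bases_at(x_seqs, i), bases_at(y_seqs, i)
        if not xb or not yb:
            continue  # no usable bases in one of the sets at this column
        x_poly, y_poly = len(xb) > 1, len(yb) > 1
        counts["x_poly"] += x_poly
        counts["y_poly"] += y_poly
        counts["shared"] += x_poly and y_poly
        counts["fixed"] += (not x_poly) and (not y_poly) and xb != yb
        counts["combined"] += len(xb | yb) > 1  # variable in the pooled sample
    return counts


x_set = ["ACGTA", "ACGTT", "ACCTA"]
y_set = ["TCGTA", "TCGTA", "TCGGA"]
c = compare_x_y(x_set, y_set)
print(c)
# Pooled variable sites = X + Y + fixed - shared
assert c["combined"] == c["x_poly"] + c["y_poly"] + c["fixed"] - c["shared"]
```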

Analyses using DnaSP

You can get the software for yourself (free) from:

(1) Find the DnaSP 4 software and start it up.

(2) From DNAsp, open the file for gene 4, SileneXYgene4BioEditdata.fas (it allows only one file to be open at a time).

(3) Look at the sequences, using the Display option. You will see the same sequences as with MEGA (if you did not do the MEGA exercise, see part (3) above for an explanation).

(4) To tell the program where the coding regions are, choose Assign coding regions from the Data menu and assign the regions (see the table above), following the instructions in the dialog box. Now notice the labelling along the top of the window: the codons and their amino acids should be shown (when introns are present, these sites are labelled N for non-coding). Also look at the alignment. To save the file after adding this information, use the File menu to save the file under a new name in Nexus format; it will appear as FileName.nex. Choosing this format ensures that these details remain available the next time DNAsp opens the file, so you don't have to do all this work over again.

(5) We will first estimate diversity, to test whether the impression of different values for the X and Y genes is correct. As with MEGA, you must first name the sequence sets, to identify them for analyses. In the Data menu, select the Define sequence sets option. You will see the 144 sequences listed in the left-hand window. First select the S. latifolia Y-linked sequences, define a set for your analyses, and name it. Then define a set of the X-linked sequences from the same species. Then make a third set with just the S. vulgaris sequence (which will be needed later), and also two more sets, one with a single S. latifolia Y sequence and one with a single X sequence. Use the Polymorphism and divergence function to analyse the diversity.
You are given various options for the site type to analyse, the statistics you are interested in, and the region of the sequence you want to consider. You must of course select the sequences to include. Click on the Data set option. You will see the sets of sequences with the names you gave them. First choose to estimate diversity for S. latifolia X-linked alleles, then Y-linked ones. If you use the default option to include divergence, the analysis will use only the regions of sequence that are present in the sequence chosen for divergence; thus, if you include divergence between the S. latifolia Y- and X-linked pair, you will have a fair comparison of the diversity of each of the two sets. Now click OK. The program will calculate diversity and divergence values for different types of sites, such as synonymous and non-synonymous sites, or intron sites. As you get the results, enter them in Table 2 (if time is running short, there is no need to complete everything: the point is to understand what the items are, and you can return to the exercise later if you want). You will find it helpful to use the Pi(a)/Pi(s) ratio option, which gives you all the different types of sites in one output screen. The numbers of polymorphisms at synonymous and non-synonymous sites are also given. Non-coding sites include intron and other non-coding sites.

Table 2. Variability within S. latifolia
Type of site | Number of sites | Number of variable sites | Within-species diversity (π)
X-linked
All site types | | | π =
Synonymous | | | π_S =
Non-synonymous | | | π_A =
Non-coding | | | π_non-coding =
Silent | | | π_silent =
Y-linked
All site types | | |
Synonymous | | |
Non-synonymous | | |
Non-coding | | |
Silent | | |

(i) Does diversity appear to differ between X- and Y-linked samples?

(6) Next, analyse divergence between the sets of sequences: S. latifolia X- and Y-linked versus the hermaphrodite species S. vulgaris, using the groups you defined earlier. Use the analysis you already used, plus DNA divergence between populations. Enter the results in Table 3. Be sure that you understand how the numbers of synonymous and non-synonymous sites are calculated, and why these are not whole numbers (a brief outline of one method is in the Selection Basics lecture, and Graur and Li's book has an outline on page 81 onwards). Briefly, the reason is that only fourfold degenerate sites have exclusively synonymous changes, and only non-degenerate sites have exclusively non-synonymous changes; at twofold degenerate sites, some changes are synonymous and some are non-synonymous, and this has to be taken into account. Thus it is not a simple matter of counting sites: the numbers are estimated, and a model is involved. For example, some methods assume that any change from one nucleotide to another is equally likely (which is untrue: transition rates in sequences are generally greater than transversion rates).
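Why site numbers are not whole numbers can be seen by counting sites codon by codon, in the style of the Nei-Gojobori method: each of the 3 positions in a codon contributes the fraction of its 3 possible changes that are synonymous. A minimal Python sketch, assuming the standard genetic code and that every single-nucleotide change is equally likely (the simplifying assumption mentioned above). It counts changes to stop codons as non-synonymous, which is one common convention; DNAsp's exact implementation may differ, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_sites(codon):
    """(synonymous, non-synonymous) sites for one sense codon."""
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    if amino_acid == "*":
        raise ValueError(f"{codon} is a stop codon")
    synonymous = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                # Each of the 3 alternatives at a position counts 1/3 of a site;
                # a change to a stop codon ('*') never matches, so it is non-synonymous.
                if CODON_TABLE[mutant] == amino_acid:
                    synonymous += 1 / 3
    return synonymous, 3 - synonymous


def sequence_sites(cds):
    """Sum synonymous and non-synonymous sites over a gap-free, in-frame CDS."""
    syn = nonsyn = 0.0
    for i in range(0, len(cds) - 2, 3):
        s, n = codon_sites(cds[i:i + 3])
        syn += s
        nonsyn += n
    return syn, nonsyn


for codon in ("TTA", "CTG", "GGG", "ATG"):
    print(codon, codon_sites(codon))
```

For example, TTA (Leu) has 2/3 of a synonymous site, CTG (Leu) has 4/3, GGG (Gly) has exactly one (its third position is fourfold degenerate), and ATG (Met) has none.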

Table 3. Divergence between S. latifolia and S. vulgaris, and between S. latifolia Y and autosomal sequences.
Type of site and comparison | Divergence (K_A, K_S, etc.) | Number of sites analysed | Ka/Ks
S. latifolia X-linked versus Y-linked
All site types | | |
Synonymous, K_S | | |
Non-synonymous, K_A | | |
Non-coding, K_non-coding | | |
Silent, K_silent | | |
X-linked versus S. vulgaris
All site types | | |
Synonymous, K_S | | |
Non-synonymous, K_A | | |
Non-coding, K_non-coding | | |
Silent, K_silent | | |
Y-linked versus S. vulgaris
All site types | | |
Synonymous, K_S | | |
Non-synonymous, K_A | | |
Non-coding, K_non-coding | | |
Silent, K_silent | | |

Some further questions.

(ii) Is the Y-linked gene degenerated?

(iii) Has the Y-linked gene undergone more non-synonymous substitutions than the X? To test this using estimated numbers of differences at synonymous and non-synonymous sites, you can use the 'Preferred and Unpreferred Synonymous Substitutions' function (also in the Analysis menu) to determine the numbers of substitutions in the S. latifolia X and Y sequences since they started to diverge from the outgroup (S. vulgaris). The analysis window has a diagram that will help you to understand the idea (use the analysis with a near and a far outgroup). Enter the results in Table 4.

Table 4. Non-synonymous and synonymous substitutions in the S. latifolia X and Y sequences, using an outgroup (S. vulgaris).
Sequence | Non-synonymous | Synonymous
Y | |
X | |
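Once numbers of synonymous and non-synonymous sites and differences are available, K_S and K_A follow by applying the Jukes-Cantor correction separately to the proportion of differences in each class of site, as in the Nei-Gojobori approach. A Python sketch, assuming the counts are already in hand (for instance from the DNAsp output; difference counts may be non-integer because they are averaged over possible pathways). The numbers in the usage lines are made up for illustration.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: the JC correction is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (K_A, K_S, K_A/K_S), JC-corrected separately for each site class."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    ratio = ka / ks if ks > 0 else float("nan")  # ratio undefined if K_S = 0
    return ka, ks, ratio


# Hypothetical counts: 6 differences at 1000 non-synonymous sites,
# 20 differences at 300 synonymous sites.
ka, ks, ratio = ka_ks(6, 1000, 20, 300)
print(f"K_A = {ka:.4f}, K_S = {ks:.4f}, K_A/K_S = {ratio:.3f}")
```

With these hypothetical counts K_A/K_S is about 0.09, the pattern expected for a gene under purifying selection on its amino acid sequence.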

McDonald-Kreitman tests

This is a very simple, but important, test that has a good chance of correctly detecting selection even when the population is subdivided (see Graur and Li pages 63-64). There is an item for it under the Analysis menu. This is not the most appropriate data set for applying this test, but I added it because of its importance. There are 2 interesting questions we could use it for.

(a) We might wonder whether the higher diversity of the X-linked locus (see above) could be due to balancing selection at this gene, and this test is one way to examine this. If so, we expect an excess of non-synonymous polymorphisms within S. latifolia relative to the number expected based on divergence from an outgroup (we can use S. vulgaris). The test uses synonymous or silent sites to take account of possible mutation rate differences (see Answers section).

(b) We could also use this test to see whether it is likely that the Y has undergone an excess of non-synonymous substitutions since the split from S. vulgaris, which might suggest that slightly deleterious mutations are accumulating, due to the low effective size of the Y-linked gene (see Brian Charlesworth's lecture).

HKA tests

(iv) Is the X-Y diversity difference statistically significant? We can do an HKA test in DNAsp, using the results in Tables 2 and 3. The analysis is in the Tools menu (don't use the one in the Analysis menu; it is for comparing 2 parts of a single gene). How about silent sites? Here is a table to enter the results needed to do the HKA test:

Quantity | All sites, X | All sites, Y | Silent sites, X | Silent sites, Y
Intra-specific variability
Number of segregating sites | | | |
Total number of sites | | | |
Sample size | | | |
Inter-species divergence from S. vulgaris
Average number of differences (D_XY) | | | |
Total number of sites | | | |

(v) What might explain the low diversity of the Y-linked gene?
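The McDonald-Kreitman test arranges counts in a 2 x 2 table: non-synonymous and synonymous changes (rows) that are either fixed differences from the outgroup or polymorphisms within the species (columns). A Python sketch using a two-sided Fisher's exact test (one common choice for small counts; DNAsp may report other statistics as well), plus the neutrality index NI = (Pn/Ps)/(Dn/Ds), where NI > 1 indicates an excess of non-synonymous polymorphism relative to divergence. Function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


def mk_test(dn, ds, pn, ps):
    """Rows: non-synonymous / synonymous; columns: fixed (D) / polymorphic (P)."""
    p_value = fisher_exact_two_sided(dn, pn, ds, ps)
    if dn > 0 and ds > 0 and ps > 0:
        ni = (pn / ps) / (dn / ds)  # neutrality index
    else:
        ni = float("nan")  # undefined when a denominator count is zero
    return p_value, ni


p_value, ni = mk_test(dn=3, ds=12, pn=10, ps=30)  # hypothetical counts
print(f"Fisher p = {p_value:.3f}, NI = {ni:.2f}")
```

The same Fisher's exact function can be used for any other 2 x 2 comparison of counts, such as non-synonymous versus synonymous substitutions on the X and Y lineages.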

(7) Raw versus net divergence. Because there is variation among the sequences in a group, the mean divergence includes two components: (I) the differences between the sequences of the two species, and (II) differences between the sequences within a species (e.g. between the S. latifolia sequences). If these latter differences were very high, we would not want to include them in estimates of divergence (since we are interested in the extent of substitutions between the species, or fixed differences). It is therefore reasonable to subtract the mean within-species diversity from the raw divergence, D_XY, to get the net divergence:

$D_a = D_{XY} - (k_{\text{species 1}} + k_{\text{species 2}})/2$

where k is the within-species diversity of each species.

To illustrate the difference between the two measures, do a divergence analysis between S. latifolia and S. dioica X- and Y-linked sequences, using the analysis DNA divergence between populations. Enter the results in Table 5.

Table 5. Divergence and net divergence from S. latifolia for all site types.
X-linked | versus S. dioica | versus S. vulgaris
D_XY | (JC = ) | (JC = )
D_a | (JC = ) | (JC = )
Y-linked | versus S. dioica | versus S. vulgaris
D_XY | (JC = ) | (JC = )
D_a | (JC = ) | (JC = )

The output shows the divergence values and also values with the Jukes-Cantor correction (labelled JC). You might want to write down both versions and consider whether this correction is required for these species. NOTE that this analysis does not calculate separate divergence values for synonymous and non-synonymous sites, but just deals with all sites. To calculate synonymous and non-synonymous divergence, you have to use the Polymorphism and divergence analysis, which does give you that option (but that analysis doesn't give net divergence; how could you estimate those values?).

A further analysis would be to compare Fst values between the two closely related dioecious species with values between populations within either species. To do this, you would need to define more sequence sets, using the same menu as before (the sequence names indicate the populations). The analysis is Gene flow and genetic differentiation. To get statistical tests of subdivision, choose the option "Perform the Permutation test" in the dialog box.
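Raw divergence D_XY is the mean proportion of differences over all between-species pairs of sequences, and within-species diversity k (π) is the same average over pairs within a species, so the net divergence D_a follows directly. A Python sketch, assuming aligned sequences and excluding any column with a gap or ambiguous base in any sequence (complete deletion, a simple choice; DNAsp's gap options may differ). No Jukes-Cantor correction is applied, a group with a single sequence (such as S. vulgaris) is given zero diversity, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product

NUCLEOTIDES = set("ACGT")


def drop_bad_columns(seqs):
    """Keep only columns where every sequence has A, C, G or T."""
    keep = [i for i, col in enumerate(zip(*seqs))
            if all(b.upper() in NUCLEOTIDES for b in col)]
    return ["".join(s[i].upper() for i in keep) for s in seqs]


def prop_diff(a, b):
    """Proportion of differing sites between two cleaned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pi(seqs):
    """Mean pairwise proportion of differences within one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # a single sequence has no within-group diversity
    return sum(prop_diff(a, b) for a, b in pairs) / len(pairs)


def dxy(group1, group2):
    """Mean proportion of differences over all between-group pairs."""
    pairs = list(product(group1, group2))
    return sum(prop_diff(a, b) for a, b in pairs) / len(pairs)


def net_divergence(group1, group2):
    """Return (D_XY, D_a) with D_a = D_XY - (pi1 + pi2) / 2."""
    cleaned = drop_bad_columns(group1 + group2)
    g1, g2 = cleaned[:len(group1)], cleaned[len(group1):]
    raw = dxy(g1, g2)
    return raw, raw - (pi(g1) + pi(g2)) / 2


print(net_divergence(["AAAA", "AAAT"], ["CCAA", "CCAA"]))
```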

(vi) Test for recombination in the X-linked genes: is there evidence for recombination?

(vii) Another X-Y gene pair was studied, and the X-Y divergence value was 2%. What could account for the different values for the two genes?

If you have time, you can try other analyses. For instance, you can compare Fst values between the two closely related dioecious species with values between populations within either species.

APPENDIX: Some tips for using MEGA and DNAsp

MEGA should open FASTA files. This is how to do it. Before starting, check that each sequence has a different name, and that names are not too long. Another tip is that DNAsp doesn't accept alignments where the first sequence starts with gaps.

(1) Open MEGA.

(2) From the File menu, open the file you need. It generally says "error # 5520 line too long on line 1". Ignore this (click OK). Then select the funny little icon (the one to the right of the print icon) with the arrow pointing downwards; this converts to MEGA format (to see text indicating the icons' meanings, move the cursor over an icon, and text will appear). A dialog box appears. Select OK (don't try to select the data format). The text file then appears in the window in MEGA format. Save it under a new name, so you can tell which version is in MEGA format (ignore the "save as type" selection window). Close all windows in the text editor. This often crashes MEGA; if so, press the Control/Alt/Delete keys simultaneously to get the Task Manager window, which allows you to quit MEGA. The file will have been saved correctly, and you can then restart the program and open the file to carry on with the analyses as below.

(3) Click the text "activate a data file", and select the file you just made. It will ask whether it is a nucleotide sequence; click OK, as it normally will be that kind of sequence. Also click OK to the question about whether it is a coding sequence. The text file editor opens again, showing your sequence data. Again, it often says "error # line XX", and the cursor moves to the offending line.
a. Sometimes this means that it thinks 2 different sequences have the same name (because the software doesn't seem to take in the complete names you gave your sequences).
You can edit the name where it stopped, or the one with the duplicate name (often the sequence before the one where it stopped), to change it slightly, or whatever is needed to make it acceptable. Then save and close the file. Repeat the process of activating the data file.
b. Generally, there is nothing wrong with the sequence names, but if you change them slightly (e.g. add x at some position), the software accepts this sequence the next time you activate the file (but it may stop at a different one, so one sometimes has to change lots of names, each time saving and closing the file, then re-activating it). When everything is satisfactory to MEGA, it will ask whether the data are coding or not. Click Yes, unless the entire sequence is non-coding.

(4) From the Data menu, choose Data explorer. This displays the entire set of sequences and (if coding sequences are included) indicates which parts are coding and which non-coding, by showing codons at the top of the window (indicated by a faint box around each 3 nucleotides), so you can check that all is as you expected, and you can note the position in the codon of the first base of each exon (needed in the next step).

(5) Enter the exon and intron positions.
a. This property is very helpful, so it is often best, after aligning and importing sequences, to work first in MEGA, to determine exon and intron positions, and note them down before opening the data in DNAsp. Clicking on the base in a column (position) gives its number in small characters at the bottom left of the Data explorer window.
b. You can check the translation with the right-hand icon. If you need to change the reading frame, use the Data menu's Select genes and domains option to tell it the first site's codon position.

(6) Set up the groups of sequences, using the Data menu. NOTE: take care that a group name is selected before trying to add sequences to a group, otherwise MEGA crashes badly.

(7) Save data to file. Use MEGA format, and give it a different name. Then, after closing the data file, the information will still be there the next time you use MEGA to open it.

Now you are ready to do analyses. The menus on the small MEGA window are simple to use, and what they do is fairly obvious. You can select which sequences to include, and you can un-select some sequences in the Data explorer window and use the remaining ones to see variable sites, and some other things that can be useful.
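The pre-import checks recommended at the start of this appendix (unique names, names that are not too long, and no leading gap in the first sequence) can be scripted before a FASTA file is given to MEGA or DNAsp. A minimal Python sketch; the 30-character limit is a hypothetical value chosen for illustration, since the handout only says names should not be too long, and the function names are also hypothetical.

```python
MAX_NAME_LENGTH = 30  # hypothetical limit; adjust to what your software accepts


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_alignment(records, max_name_length=MAX_NAME_LENGTH):
    """Return a list of problems likely to upset MEGA or DNAsp."""
    problems, seen = [], set()
    for name, _ in records:
        if name in seen:
            problems.append(f"duplicate name: {name}")
        seen.add(name)
        if len(name) > max_name_length:
            problems.append(f"name too long: {name}")
    if len({len(seq) for _, seq in records}) > 1:
        problems.append("sequences differ in length (not aligned?)")
    if records and records[0][1].startswith("-"):
        problems.append("first sequence starts with a gap (DNAsp rejects this)")
    return problems


example = ">X4_1\nACGT\n>X4_1\nAC-T\n"
print(check_alignment(read_fasta(example)))
```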

ANSWERS

MEGA

Table 1. Numbers of differences between X and Y sequences of S. latifolia, and numbers of variants in each of them.
Chromosome | Number of sequences | Number of variants
X polymorphisms | |
Y polymorphisms | 45 | 5
X-Y fixed differences | | 101

The number 101 is calculated assuming that there are no shared variants, so that the total number of variants when both X and Y sequences are included (216) is the sum of the numbers in the first 2 rows plus the number of X-Y fixed differences. Thus I subtracted the numbers of X and Y polymorphisms from this total to get the 101. It is quite simple to check that the X and Y share no polymorphic sites (if they did, this calculation would give the wrong number of fixed differences); I just listed the 5 Y polymorphic sites, had MEGA display the X polymorphisms, and looked to see if any of those sites is among the 5.

What do the results tell us? Here are some things to look for.

(i) Is either gene a pseudogene? Probably not: there are no stop codons in the coding sequence, and no frame-shifts.

(ii) Are Y sequences less variable than X sequences? Yes, but these results are only a preliminary analysis, and ideally we want to quantify variability and test the significance of any possible difference. We will see how to do this with DNAsp.

(iii) Do the gene trees of these X- and Y-linked genes agree with the species tree of the species that have sex chromosomes? Only partly: the Y-linked sequences do, but the X-linked ones do not. The higher X variability suggests that diversity was high in the ancestor, and thus lineage sorting occurred, so that different sequences are present in the X sets of both species. An alternative is that introgression has occurred, but that, for some reason, the Y does not introgress.

DNAsp

Table 2. Variability within S. latifolia
Type of site | Number of sites | Number of variable sites | Within-species diversity (π)
X-linked
All site types | | | π =
Synonymous | | | π_S =
Non-synonymous | | | π_A =
Non-coding | | | π_non-coding =
Silent | | | π_silent =
Y-linked
All site types | As above (see NOTE 1) | |
Synonymous | | 0 | 0
Non-synonymous | | 0 | 0
Non-coding | | |
Silent | | |

NOTE 1: I used the analysis including divergence between S. latifolia X and Y, and entered those results in Table 3; other choices will give different numbers in these tables.

NOTE 2: It is not simple to get numbers of variable sites. The numbers given in the output of this diversity analysis are numbers of polymorphisms, so a site that has 3 different nucleotides may be counted as 2 different polymorphisms (as at least 2 mutations must have occurred). This explains why numbers of variable sites differ when you extract them from other analyses. For example, the number of "substitutions" is 111 in the diversity analysis of X alleles, using all sites, but the analysis says that there are 103 polymorphic sites (so that is the number I put in this table); presumably 8 sites have more than 2 different nucleotides. For the coding-region sites, the two counts are the same. For non-coding regions, we should thus have 53 polymorphic sites, again suggesting 61 - 53 = 8 sites with more than 2 different nucleotides (and the McDonald-Kreitman table using all silent sites, not just synonymous ones, says 55). Then for silent sites we should have 35 + 53 = 88 (against 96 "substitutions" as counted by the program).

NOTE 3: π_A/π_S = ____ for X, but it cannot be estimated for Y (no variants).
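NOTE 2 turns on the difference between the number of segregating (polymorphic) sites and the minimum number of mutations, which counts a site with n different nucleotides as n - 1 mutations. A minimal Python sketch, assuming aligned sequences and skipping any column with a gap or ambiguous base; the function name and toy data are hypothetical.

```python
NUCLEOTIDES = set("ACGT")


def segregating_sites_and_mutations(seqs):
    """Return (S, eta): segregating sites, and the minimum number of
    mutations, counting a site with n nucleotides as n - 1 mutations.
    Columns with a gap or ambiguous base in any sequence are skipped."""
    s = eta = 0
    for column in zip(*seqs):
        bases = [b.upper() for b in column]
        if not all(b in NUCLEOTIDES for b in bases):
            continue
        n_bases = len(set(bases))
        if n_bases > 1:
            s += 1
            eta += n_bases - 1
    return s, eta


toy = ["ACGT", "ACGA", "TCGC"]  # last column has three different nucleotides
print(segregating_sites_and_mutations(toy))
```

In the toy data the last column counts as 1 segregating site but 2 mutations; 8 such sites would account for the difference between 111 and 103 described in NOTE 2.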

Table 3. Divergence between S. latifolia and S. vulgaris, and between S. latifolia Y and autosomal sequences. See NOTE 1 above.
Type of site and comparison | Divergence (K_A or K_S, with JC correction) | Number of sites analysed | Ka/Ks
S. latifolia X-linked versus Y-linked
All site types | | |
Synonymous | | |
Non-synonymous | | |
Non-coding | | |
Silent | | |
X-linked versus S. vulgaris
All site types | | |
Synonymous | | |
Non-synonymous | | |
Non-coding | | |
Silent | | |
Y-linked versus S. vulgaris
All site types | | |
Synonymous | | |
Non-synonymous | | |
Non-coding | | |
Silent | | |

Questions.

(i) Does diversity appear to differ between X- and Y-linked samples? Yes: X >> Y.

(ii) Is the Y-linked gene degenerated? Not evidently: Ka << Ks in all comparisons in Table 3 above.

(iii) Has the Y-linked gene undergone more non-synonymous substitutions than the X? No. According to the analysis using the outgroup S. vulgaris, the numbers of changes are as follows, and the difference is not significant by a 2 x 2 contingency test (see the DNAsp Tools menu).

Table 4. Non-synonymous and synonymous substitutions in the S. latifolia X and Y sequences, using an outgroup (S. vulgaris).
Sequence | Non-synonymous | Synonymous
Y | 6 | 20
X | 0 | 5

McDonald-Kreitman tests

(a) Could the higher diversity of the X-linked locus (see above) be due to balancing selection at this gene? If so, we expect an excess of non-synonymous polymorphisms within S. latifolia relative to the number expected based on divergence from an outgroup (we can use S. vulgaris). The test uses synonymous or silent sites to take account of possible mutation rate differences. The results are as follows, and the test statistic is non-significant.

X vs. S. vulgaris | Fixed | Polymorphic
Synonymous | |
Non-synonymous | 2 | 15

(b) Is it likely that the Y has undergone an excess of non-synonymous substitutions since the split with S. vulgaris, which might suggest that slightly deleterious mutations are accumulating, due to the low effective size of the Y-linked gene? The results are as follows:

Y vs. S. vulgaris | Fixed | Polymorphic
Synonymous | |
Non-synonymous | 11 | 0

The test cannot be done, because of the absence of polymorphisms. To see whether there is any likelihood that the above process might be happening, we could set that value to 1 and use DNAsp's Tools menu to do a 2 x 2 contingency table test. It is non-significant, i.e. there is no evidence for an undue amount of non-synonymous substitution. NOTE that this test is a less good one than the previous one, because all substitutions are included and we cannot separate them into those that occurred specifically in the Y-linked lineage.

(iv) Is the difference between X- and Y-linked samples statistically significant? Yes, by an HKA test using divergence from S. vulgaris. According to my results, the numbers are as in the table below, and both sets give significant results after selecting the appropriate type of gene (X- and Y-linked; think about why this is needed).

Columns: all sites (X, Y) and silent sites (X, Y).
Intra-specific variability
  Number of segregating sites (NOTE 4)
  Total number of sites
  Sample size
Inter-species divergence from S. vulgaris
  Average number of differences, DXY: all sites, X and Y (NOTE 5); silent sites, X: divergence per silent site x 328 = 51.51 (NOTE 6); silent sites, Y: divergence per silent site x 678 (NOTE 6)
  Total number of sites
NOTE 4: From the DNA diversity and divergence analysis, using S. vulgaris for divergence.
NOTE 5: I got these from the "DNA Divergence between populations" analysis.
NOTE 6: I multiplied divergence per silent site by the number of silent sites (data from Table 3).

(v) What might explain the low diversity of the Y-linked gene? One possibility is that degeneration is occurring, and that hitch-hiking events are reducing its diversity. Another possibility is a much lower effective size for the Y than the X, e.g. because strong sexual selection causes a high variance in male reproductive success. This predicts that autosomal gene diversity should be reduced relative to that of X-linked genes (because X-linked genes are carried in males 1/3 of the time, versus 1/2 of the time for autosomal genes).

(7) Raw versus net divergence.
Table 5. Divergence and net divergence from S. latifolia for all site types.
X-linked   versus S. dioica   versus S. vulgaris
  DXY      (JC = )            (JC = )
  Da       (JC = )            (JC = )
Y-linked   versus S. dioica   versus S. vulgaris
  DXY      (JC = )            (JC = )
  Da       (JC = )            (JC = )
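Table 5 contrasts raw divergence (DXY, the mean distance between sequences drawn from the two species) with net divergence (Da), which subtracts the average within-species diversity: Da = DXY - (pi_X + pi_Y)/2. The Python sketch below illustrates both quantities on a hypothetical toy alignment; the sequences and function names are illustrative and are not part of DnaSP. Passing jc=True applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), to each pairwise distance before averaging. That choice is an assumption, and DnaSP may apply the correction differently, so its JC values need not match exactly.

```python
import math
from itertools import combinations, product


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned positions that differ (gaps are not treated specially)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC correction is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def _distance(a: str, b: str, jc: bool) -> float:
    # Optionally correct each pairwise distance for multiple hits.
    d = p_distance(a, b)
    return jukes_cantor(d) if jc else d


def diversity(seqs: list[str], jc: bool = False) -> float:
    """pi: mean pairwise distance within one sample (needs at least two sequences)."""
    pairs = list(combinations(seqs, 2))
    return sum(_distance(a, b, jc) for a, b in pairs) / len(pairs)


def dxy(xs: list[str], ys: list[str], jc: bool = False) -> float:
    """DXY: mean distance over all between-sample pairs."""
    pairs = list(product(xs, ys))
    return sum(_distance(a, b, jc) for a, b in pairs) / len(pairs)


def net_divergence(xs: list[str], ys: list[str], jc: bool = False) -> float:
    """Da = DXY - (pi_X + pi_Y) / 2."""
    return dxy(xs, ys, jc) - (diversity(xs, jc) + diversity(ys, jc)) / 2


# Hypothetical toy alignment: a variable sample versus a monomorphic one.
sample_x = ["AAAAAAAAAA", "AAAAAAAAAT"]
sample_y = ["TTAAAAAAAA", "TTAAAAAAAA"]
print(dxy(sample_x, sample_y), net_divergence(sample_x, sample_y))
```

With these toy sequences DXY is 0.25 while Da is about 0.20, because half of the within-sample diversity of sample_x (pi = 0.1) is subtracted. The two measures should therefore differ more for the highly variable X-linked sample than for the Y.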

Jukes-Cantor correction is required for the more distant species for the X data (where diversity is high within S. latifolia), but not for the Y, where there are few variants within the species.
Compare FST values between the two closely related dioecious species with values between populations within either species. To do this, you would need to define more sequence sets, using the same menu as before (the sequence names indicate the populations). The analysis is "Gene Flow & Genetic Differentiation". To get statistical tests of subdivision, choose the option "Perform the Permutation Test" in the dialog box.
(vi) Test for recombination in the X-linked genes: is there evidence for recombination? The X sequences yield a minimum number of recombination events Rm = 6 (the Y sequences give zero), so the X data do not fit zero recombination. I did an analysis of diversity within the X sequence set, and then a coalescent simulation of haplotype diversity, given theta. The program takes the results from the diversity analysis and enters them in the relevant boxes; you can then run the simulation and see the 95% confidence intervals. These show that the observed value has a low probability if the simulation is run with zero recombination, but the result is not quite significant (other variants of this analysis, e.g. using the number of segregating sites, are highly significant). The Y data set fits zero recombination better, but it also fits recombination > 0, so the result is not conclusive.
(vii) Another X-Y gene pair was studied, and the X-Y divergence value was 2%. What could account for the different values for the two genes?
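The Rm value in answer (vi) is the minimum number of recombination events implied by the four-gamete test: under the infinite-sites assumption, two biallelic sites showing all four haplotype combinations cannot be explained without at least one recombination event between them. A minimal Python sketch of Hudson and Kaplan's procedure follows, assuming an ungapped alignment of equal-length, uppercase sequences. Columns with gaps, ambiguity codes or more than two nucleotides are skipped, which may not match DnaSP's own handling; the function names are hypothetical.

```python
from itertools import combinations


def biallelic_sites(seqs: list[str]) -> list[int]:
    """Column indices with exactly two nucleotides (A, C, G, T) across the sample."""
    sites = []
    for i in range(len(seqs[0])):
        states = {s[i] for s in seqs}
        if len(states) == 2 and states <= set("ACGT"):
            sites.append(i)
    return sites


def minimum_recombination_events(seqs: list[str]) -> int:
    """Hudson and Kaplan's Rm from the four-gamete test (infinite-sites assumption)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = biallelic_sites(seqs)
    # Each incompatible pair (i, j) shows all four gametes, so at least one
    # recombination event must lie somewhere between sites i and j.
    intervals = [
        (i, j)
        for i, j in combinations(sites, 2)
        if len({(s[i], s[j]) for s in seqs}) == 4
    ]
    # Rm is the largest number of non-overlapping intervals. Intervals that only
    # share an endpoint site need separate breakpoints, hence start >= last_end.
    rm, last_end = 0, -1
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            rm += 1
            last_end = end
    return rm


# Hypothetical toy haplotypes: all four gametes at two sites imply one event.
print(minimum_recombination_events(["AA", "AT", "TA", "TT"]))
```

Here the call returns 1. The greedy step (scanning intervals by their right-hand end) is valid because the largest set of non-overlapping incompatible intervals equals the smallest number of breakpoints that separates every incompatible pair of sites.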
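The 2 x 2 contingency tests used in answer (iii) and in the McDonald-Kreitman section can also be checked outside DnaSP. The Python sketch below computes a two-sided Fisher's exact test directly from hypergeometric probabilities. Using Fisher's test is an assumption: DnaSP's Tools menu may report a different statistic (such as a chi-square test), so p-values may not match exactly. The function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    With the row and column totals fixed, the top-left cell follows a
    hypergeometric distribution. The two-sided p-value sums the
    probabilities of every table that is no more probable than the
    observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(col1, row1)
    # The small tolerance keeps tables whose probability ties with the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Table 4: rows Y and X; columns non-synonymous and synonymous.
print(fisher_exact_2x2(6, 20, 0, 5))
```

With the Table 4 counts the p-value is about 0.55, consistent with the non-significant result in (iii). The four cells of a McDonald-Kreitman table (fixed or polymorphic, synonymous or non-synonymous) can be passed in the same way, and replacing a zero polymorphism count by 1, as in (b), is just a change to one argument.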


Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism )

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Biology 1406 Exam 3 Notes Structure of DNA Ch. 10 Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Proteins

More information

Gene mutation and molecular medicine Chapter 15

Gene mutation and molecular medicine Chapter 15 Gene mutation and molecular medicine Chapter 15 Lecture Objectives What Are Mutations? How Are DNA Molecules and Mutations Analyzed? How Do Defective Proteins Lead to Diseases? What DNA Changes Lead to

More information

Hierarchical Bayesian Modeling of the HIV Response to Therapy

Hierarchical Bayesian Modeling of the HIV Response to Therapy Hierarchical Bayesian Modeling of the HIV Response to Therapy Shane T. Jensen Department of Statistics, The Wharton School, University of Pennsylvania March 23, 2010 Joint Work with Alex Braunstein and

More information

Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS

Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS Lab 2/Phylogenetics/September 16, 2002 1 Read: Tudge Chapter 2 PHYLOGENETICS Objective of the Lab: To understand how DNA and protein sequence information can be used to make comparisons and assess evolutionary

More information

GUIDE TO THE TRADING PLATFORM CONTENTS. Page OVERVIEW 2. ACCOUNT SUMMARY Transfer funds Account details

GUIDE TO THE TRADING PLATFORM CONTENTS. Page OVERVIEW 2. ACCOUNT SUMMARY Transfer funds Account details GUIDE TO THE TRADING PLATFORM CONTENTS OVERVIEW 2 Page ACCOUNT SUMMARY Transfer funds Account details 3 SPREAD & BINARY MARKETS Finding your market Opening and closing trades Opening Orders Closing Orders

More information

Integrated Accounting System for Mac OS X and Windows

Integrated Accounting System for Mac OS X and Windows Integrated Accounting System for Mac OS X and Windows Program version: 6.2 110111 2011 HansaWorld Ireland Limited, Dublin, Ireland Preface Books by HansaWorld is a powerful accounting system for the Mac

More information

USING WORDPERFECT'S MERGE TO CREATE MAILING LABELS FROM A QUATTRO PRO SPREADSHEET FILE Click on a Step to move to the next Step

USING WORDPERFECT'S MERGE TO CREATE MAILING LABELS FROM A QUATTRO PRO SPREADSHEET FILE Click on a Step to move to the next Step USING WORDPERFECT'S MERGE TO CREATE MAILING LABELS FROM A QUATTRO PRO SPREADSHEET FILE Click on a Step to move to the next Step STEP 1: Create or use a Quattro Pro or Excel File. The first row must be

More information

Tutorial #7A: LC Segmentation with Ratings-based Conjoint Data

Tutorial #7A: LC Segmentation with Ratings-based Conjoint Data Tutorial #7A: LC Segmentation with Ratings-based Conjoint Data This tutorial shows how to use the Latent GOLD Choice program when the scale type of the dependent variable corresponds to a Rating as opposed

More information

A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML

A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML 9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze

More information