**Worked and annotated sample question for take-home exam**

**(to be handed out 21-Apr-06, due Fri. 28-Apr-06)**

Output from Mathematica program FstCalc.nb by David B. McDonald, Dept. Zool., U. of Wyoming, Laramie, WY 82071 dbmcd@uwyo.edu.

My "comments" are in red.

**The input matrix of allele-by-allele genotypes is:**

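Subpop. 1

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 55 | 30 | 100 |
| A2 | 0 | 45 | 90 |
| A3 | 0 | 0 | 48 |

Subpop. 2

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 35 | 85 | 160 |
| A2 | 0 | 40 | 50 |
| A3 | 0 | 0 | 80 |

Subpop. 3

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 120 | 45 | 25 |
| A2 | 0 | 80 | 35 |
| A3 | 0 | 0 | 75 |

Subpop. 4

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 40 | 130 | 70 |
| A2 | 0 | 60 | 120 |
| A3 | 0 | 0 | 75 |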

First, calculate the population sizes by summing the elements (cells) of each subpopulation's matrix. Each cell counts individuals, so this sum is the number of individuals (remember, the **number of individuals is half the number of alleles**):

e.g., *N*_{1} = 55 + 30 + 100 + 45 + 90 + 48 = 368

Repeat for the three remaining subpopulations to get the values below.

Population sizes are:

| *N*_{1} | *N*_{2} | *N*_{3} | *N*_{4} |
|---|---|---|---|
| 368 | 450 | 380 | 495 |

Next, allele frequencies (*p*_{i}).

E.g., *p*_{1} = freq(A1) = (2 × 55 + 30 + 100) / (2 × 368) = 240/736 = 0.326

That is, twice the number of A1A1 homozygotes plus the number of each type of heterozygote carrying A1 (A1A2 and A1A3), which together give the total number of A1 alleles, divided by twice the population size (2*N*_{1} is the total number of alleles in subpop. 1).

*p*_{2} = freq(A2) = (30 + 2 × 45 + 90) / (2 × *N*_{1}) = 210/736 = 0.285

Repeat for the remaining alleles and subpopulations to get the values below:

Allele frequencies (*p*_{i}) are:

| | *p*_{1} | *p*_{2} | *p*_{3} |
|---|---|---|---|
| Subpop. 1 | 0.326 | 0.285 | 0.389 |
| Subpop. 2 | 0.350 | 0.239 | 0.411 |
| Subpop. 3 | 0.408 | 0.316 | 0.276 |
| Subpop. 4 | 0.283 | 0.374 | 0.343 |
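As a cross-check, the population sizes and allele frequencies above can be reproduced with a few lines of code. This is a minimal Python sketch, not the FstCalc.nb notebook. It assumes each genotype matrix is stored as a list of rows, with the count of AiAj individuals in row i, column j (i ≤ j); the names `OBSERVED`, `population_size`, and `allele_frequencies` are hypothetical.

```python
# Genotype counts from the input matrices (hypothetical layout: row i, column j
# holds the number of AiAj individuals; entries below the diagonal are unused zeros).
OBSERVED = [
    [[55, 30, 100], [0, 45, 90], [0, 0, 48]],    # subpop. 1
    [[35, 85, 160], [0, 40, 50], [0, 0, 80]],    # subpop. 2
    [[120, 45, 25], [0, 80, 35], [0, 0, 75]],    # subpop. 3
    [[40, 130, 70], [0, 60, 120], [0, 0, 75]],   # subpop. 4
]


def population_size(m):
    """Number of individuals = sum of all genotype counts."""
    return sum(sum(row) for row in m)


def allele_frequencies(m):
    """p_i = (2 x AiAi homozygotes + every heterozygote carrying Ai) / (2N)."""
    k = len(m)
    n = population_size(m)
    freqs = []
    for i in range(k):
        copies = 2 * m[i][i]
        for j in range(k):
            if j != i:
                # heterozygote AiAj is stored in the upper triangle
                copies += m[min(i, j)][max(i, j)]
        freqs.append(copies / (2 * n))
    return freqs


for s, m in enumerate(OBSERVED, start=1):
    p = [round(x, 3) for x in allele_frequencies(m)]
    print(f"Subpop. {s}: N = {population_size(m)}, p = {p}")
```

Only the printed values are rounded; the frequencies themselves are kept at full precision, in the spirit of the rounding advice below.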

Next, the expected (Hardy-Weinberg) genotypic counts. For homozygotes this is the allele frequency squared times the population size (*p*_{i}^{2} × *N*); for heterozygotes it is twice the product of the two allele frequencies times the population size (2 × *p*_{i} × *p*_{j} × *N*).

**ROUNDING: I don’t mind rounding to integers here, because we aren’t going to use these expected counts in the F-statistic calculations that follow. Note, though, that I am keeping my allele frequencies to three decimal places for the calculations where I use them. If you round re-used numbers too much, too early, you will get off course.**

Subscript note: I am **not**
subscripting for subpopulation, just assuming we can keep track.

*s* = 1, A1A1 = *p*_{1}^{2} × *N*_{1} = 0.326^{2} × 368 = 39.11 (rounds to 39).

A1A2 = 2 × *p*_{1} × *p*_{2} × *N*_{1} = 2 × 0.326 × 0.285 × 368 = 68.38 (rounds to 68).

One more example, picked haphazardly (third subpopulation, expected count of A2A3 heterozygotes):

*s* = 3, A2A3 = 2 × *p*_{2} × *p*_{3} × *N*_{3} = 2 × 0.316 × 0.276 × 380 = 66.28 (rounds to 66).

Repeat for all 24 genotypes (six per subpopulation).

Expected genotypic counts for the populations are:

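Subpop. 1

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 39 | 68 | 93 |
| A2 | 0 | 30 | 82 |
| A3 | 0 | 0 | 56 |

Subpop. 2

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 55 | 75 | 130 |
| A2 | 0 | 26 | 88 |
| A3 | 0 | 0 | 76 |

Subpop. 3

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 63 | 98 | 86 |
| A2 | 0 | 38 | 66 |
| A3 | 0 | 0 | 29 |

Subpop. 4

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 40 | 105 | 96 |
| A2 | 0 | 69 | 127 |
| A3 | 0 | 0 | 58 |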

Subtract the expected counts from the observed counts to get the excess or deficiency in the observed counts:

*s* = 1, A1A1 =
55 - 39 = 16

A1A2 = 30 - 68 = -38

Excesses (+ numbers) or deficiencies (- numbers) of observed relative to Hardy-Weinberg genotypic counts:

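Subpop. 1

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 16 | -38 | 7 |
| A2 | 0 | 15 | 8 |
| A3 | 0 | 0 | -8 |

Subpop. 2

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | -20 | 10 | 30 |
| A2 | 0 | 14 | -38 |
| A3 | 0 | 0 | 4 |

Subpop. 3

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 57 | -53 | -61 |
| A2 | 0 | 42 | -31 |
| A3 | 0 | 0 | 46 |

Subpop. 4

| | A1 | A2 | A3 |
|---|---|---|---|
| A1 | 0 | 25 | -26 |
| A2 | 0 | -9 | -7 |
| A3 | 0 | 0 | 17 |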

For the observed - expected homozygote counts in subpopulation 1 we have +16, +15, and -8. Their sum is +23, so we **observe** an excess of homozygotes (possible evidence for inbreeding). Note that the homozygote excess of 23 is balanced by a heterozygote deficiency of 23 (in the first matrix above, the entries above the diagonal, -38, +7, and +8, sum to -23). The two must cancel because the observed and the expected counts both add up to *N*_{1} = 368.

Repeat this logic to get the following set of values:

Observed homozygote counts relative to HWE expectation (+ means homozygote excess/heterozygote deficiency, e.g., inbreeding or a Wahlund effect; - means homozygote deficiency/heterozygote excess, e.g., outbreeding):

| Pop. 1 | Pop. 2 | Pop. 3 | Pop. 4 |
|---|---|---|---|
| 23 | -2 | 145 | 8 |
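The bookkeeping in this step (subtract cell by cell, then sum the diagonal and the cells above it) can be sketched in Python. To match the hand calculation, this sketch uses the rounded integer expected counts from the table; the names are hypothetical. Because each rounded expected table still sums to *N*, the heterozygote total always comes out as the negative of the homozygote total.

```python
# Observed counts and the rounded expected counts from the tables above
# (row i, column j = AiAj; below-diagonal zeros are placeholders).
OBSERVED = [
    [[55, 30, 100], [0, 45, 90], [0, 0, 48]],
    [[35, 85, 160], [0, 40, 50], [0, 0, 80]],
    [[120, 45, 25], [0, 80, 35], [0, 0, 75]],
    [[40, 130, 70], [0, 60, 120], [0, 0, 75]],
]
EXPECTED = [
    [[39, 68, 93], [0, 30, 82], [0, 0, 56]],
    [[55, 75, 130], [0, 26, 88], [0, 0, 76]],
    [[63, 98, 86], [0, 38, 66], [0, 0, 29]],
    [[40, 105, 96], [0, 69, 127], [0, 0, 58]],
]


def deviations(obs, exp):
    """Observed minus expected for each genotype (upper triangle only)."""
    k = len(obs)
    return [[obs[i][j] - exp[i][j] if j >= i else 0 for j in range(k)]
            for i in range(k)]


def homozygote_excess(dev):
    """Sum of the diagonal: + = homozygote excess, - = homozygote deficiency."""
    return sum(dev[i][i] for i in range(len(dev)))


def heterozygote_excess(dev):
    """Sum of the cells above the diagonal."""
    k = len(dev)
    return sum(dev[i][j] for i in range(k) for j in range(i + 1, k))


for s, (o, e) in enumerate(zip(OBSERVED, EXPECTED), start=1):
    d = deviations(o, e)
    print(f"Pop. {s}: homozygotes {homozygote_excess(d):+d}, "
          f"heterozygotes {heterozygote_excess(d):+d}")
```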

Observed heterozygosities. What proportion of each subpopulation are heterozygotes? The sum of the three kinds of heterozygotes from our original input matrix, all divided by the subpopulation size.

Subpop. 1 = (30 A1A2 + 100 A1A3 + 90 A2A3) / 368 = 220/368 = 0.598

Repeat for remaining subpops.

Observed heterozygosities are:

| Pop. 1 | Pop. 2 | Pop. 3 | Pop. 4 |
|---|---|---|---|
| 0.598 | 0.656 | 0.276 | 0.646 |
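In code, observed heterozygosity is just the total above the diagonal divided by the total of the whole matrix. A minimal Python sketch, with the same hypothetical matrix layout as the input:

```python
# Observed genotype counts (row i, column j = AiAj; below-diagonal zeros unused).
OBSERVED = [
    [[55, 30, 100], [0, 45, 90], [0, 0, 48]],
    [[35, 85, 160], [0, 40, 50], [0, 0, 80]],
    [[120, 45, 25], [0, 80, 35], [0, 0, 75]],
    [[40, 130, 70], [0, 60, 120], [0, 0, 75]],
]


def observed_heterozygosity(m):
    """Proportion of individuals that are heterozygous (cells above the diagonal)."""
    k = len(m)
    n = sum(sum(row) for row in m)
    hets = sum(m[i][j] for i in range(k) for j in range(i + 1, k))
    return hets / n


print([round(observed_heterozygosity(m), 3) for m in OBSERVED])
```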

Expected heterozygosities. We use 1 minus the sum of the squared allele frequencies, *H*_{exp} = 1 - SUM[*p*_{i}^{2}], to calculate this quantity for each subpopulation.

E.g., subpop. 1 is 1 - SUM[0.326^{2} + 0.285^{2} + 0.389^{2}] = 1 - [0.1063 + 0.0812 + 0.1513] = 1 - 0.3388 = 0.6612

Repeat for remaining subpops.

Expected heterozygosities (Gene diversities) are:

| Pop. 1 | Pop. 2 | Pop. 3 | Pop. 4 |
|---|---|---|---|
| 0.661 | 0.651 | 0.658 | 0.662 |
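The same calculation as a minimal Python sketch, using the three-decimal allele frequencies from the table (as in the hand calculation); `P` and `expected_heterozygosity` are hypothetical names:

```python
# Three-decimal allele frequencies from the allele-frequency table (one row per subpop.).
P = [
    [0.326, 0.285, 0.389],
    [0.350, 0.239, 0.411],
    [0.408, 0.316, 0.276],
    [0.283, 0.374, 0.343],
]


def expected_heterozygosity(freqs):
    """Gene diversity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


print([round(expected_heterozygosity(row), 3) for row in P])
```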

Things are starting
to
get easier (or at least they will involve fewer terms). *F* = (*H*_{exp}-*H*_{obs})
/ *H*_{exp}

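Subpop. 1 *F* = (0.661 - 0.598) / 0.661 = 0.063/0.661 ≈ 0.095 (0.096 when the unrounded heterozygosities are carried through, which is the value reported below)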

Local inbreeding
coefficients
(*F*) are:

| Pop. 1 | Pop. 2 | Pop. 3 | Pop. 4 |
|---|---|---|---|
| 0.096 | -0.006 | 0.580 | 0.024 |
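Plugging the three-decimal heterozygosities into *F* shifts some values in the third decimal (0.095 rather than 0.096 for Pop. 1, for example). A minimal Python sketch that works from the raw counts instead, with no intermediate rounding, reproduces the table; the function name `local_f` is hypothetical.

```python
# Observed genotype counts (row i, column j = AiAj; below-diagonal zeros unused).
OBSERVED = [
    [[55, 30, 100], [0, 45, 90], [0, 0, 48]],
    [[35, 85, 160], [0, 40, 50], [0, 0, 80]],
    [[120, 45, 25], [0, 80, 35], [0, 0, 75]],
    [[40, 130, 70], [0, 60, 120], [0, 0, 75]],
]


def local_f(m):
    """F = (H_exp - H_obs) / H_exp for one subpopulation, from raw counts."""
    k = len(m)
    n = sum(sum(row) for row in m)
    # Allele copies: 2 x homozygotes + each heterozygote carrying the allele.
    copies = [2 * m[i][i] + sum(m[min(i, j)][max(i, j)] for j in range(k) if j != i)
              for i in range(k)]
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in copies)
    h_obs = sum(m[i][j] for i in range(k) for j in range(i + 1, k)) / n
    return (h_exp - h_obs) / h_exp


print([round(local_f(m), 3) for m in OBSERVED])
```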

Next, we calculate the *p*_{i}-bar
(across all the subpopulations). **We need to weight by population size**.

*p*_{1}-bar = (0.326 × 368 + 0.350 × 450 + 0.408 × 380 + 0.283 × 495) / (368 + 450 + 380 + 495) = 572.593/1693 = 0.338

Repeat for the other two alleles to get:

Global allele frequencies (*p*_{i}-bar) are:

| *p*_{1}-bar | *p*_{2}-bar | *p*_{3}-bar |
|---|---|---|
| 0.338 | 0.306 | 0.356 |
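The size-weighted average is easy to sketch in Python. This minimal version uses the three-decimal frequencies and the population sizes from the tables above; `weighted_mean` is a hypothetical helper.

```python
N = [368, 450, 380, 495]  # subpopulation sizes
P = [                     # three-decimal allele frequencies, one row per subpop.
    [0.326, 0.285, 0.389],
    [0.350, 0.239, 0.411],
    [0.408, 0.316, 0.276],
    [0.283, 0.374, 0.343],
]


def weighted_mean(values, weights):
    """Mean of values, each weighted by its subpopulation size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# One global frequency per allele: average that allele's column over subpops.
p_bar = [weighted_mean([row[i] for row in P], N) for i in range(3)]
print([round(x, 3) for x in p_bar])
```

Weighting by *N* matters here: an unweighted mean would let the smallest subpopulation (*N*_{1} = 368) count as much as the largest (*N*_{4} = 495).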

*H*_{I}
is the weighted average of the observed heterozygosities.

(0.598 × 368 + 0.656 × 450 + 0.276 × 380 + 0.646 × 495) / (368 + 450 + 380 + 495) = 939.914/1693 = 0.555

*H*_{I} is: 0.555

*H*_{S} is the weighted average of the expected heterozygosities across
subpopulations.

(0.661 × 368 + 0.651 × 450 + 0.658 × 380 + 0.662 × 495) / (368 + 450 + 380 + 495) = 1113.928/1693 = 0.658

*H*_{S} is: 0.658
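*H*_{I} and *H*_{S} are the same size-weighted average applied to two different lists. A minimal Python sketch using the three-decimal heterozygosities from the tables (names hypothetical):

```python
N = [368, 450, 380, 495]              # subpopulation sizes
H_OBS = [0.598, 0.656, 0.276, 0.646]  # observed heterozygosities
H_EXP = [0.661, 0.651, 0.658, 0.662]  # expected heterozygosities (gene diversities)


def weighted_mean(values, weights):
    """Mean of values, each weighted by its subpopulation size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


h_i = weighted_mean(H_OBS, N)  # average observed heterozygosity
h_s = weighted_mean(H_EXP, N)  # average expected heterozygosity within subpops.
print(f"H_I = {h_i:.3f}, H_S = {h_s:.3f}")
```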

*H*_{T} is the expected heterozygosity based on the global gene frequencies.

1 - SUM[*p*_{i}-bar^{2}] = 1 - [0.338^{2} + 0.306^{2} + 0.356^{2}] = 1 - 0.3346 = 0.665

*H*_{T} is: 0.665
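*H*_{T} uses the same gene-diversity formula as the subpopulation *H*_{exp} values, just fed the global frequencies. A minimal Python sketch using the three-decimal *p*_{i}-bar values (names hypothetical):

```python
P_BAR = [0.338, 0.306, 0.356]  # size-weighted global allele frequencies


def gene_diversity(freqs):
    """1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


h_t = gene_diversity(P_BAR)
print(f"H_T = {h_t:.3f}")
```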

*F*_{IS} = (*H*_{S} - *H*_{I}) / *H*_{S} = (0.658 - 0.555) / 0.658 = 0.103/0.658 ≈ 0.157 (0.156 when the unrounded heterozygosities are carried through)

*F*_{IS} is: 0.156

*F*_{ST} = (*H*_{T} - *H*_{S}) / *H*_{T} = (0.665 - 0.658) / 0.665 = 0.007/0.665 = 0.011

*F*_{ST} is: 0.011

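*F*_{IT} = (*H*_{T} - *H*_{I}) / *H*_{T} = (0.665 - 0.555) / 0.665 = 0.110/0.665 ≈ 0.165 (0.166 when the unrounded heterozygosities are carried through)

*F*_{IT} is: 0.166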
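Putting the whole chain together: this minimal Python sketch goes from the raw genotype counts to the three F-statistics with no intermediate rounding, which reproduces the reported 0.156, 0.011, and 0.166 (the rounded hand values above give 0.157 and 0.165 for *F*_{IS} and *F*_{IT}). It is an illustration, not the FstCalc.nb code; `allele_copies` and `f_statistics` are hypothetical names. The final assertion checks the identity (1 - *F*_{IT}) = (1 - *F*_{IS})(1 - *F*_{ST}), which follows directly from the definitions because both sides equal *H*_{I}/*H*_{T}.

```python
# Observed genotype counts (row i, column j = AiAj; below-diagonal zeros unused).
OBSERVED = [
    [[55, 30, 100], [0, 45, 90], [0, 0, 48]],
    [[35, 85, 160], [0, 40, 50], [0, 0, 80]],
    [[120, 45, 25], [0, 80, 35], [0, 0, 75]],
    [[40, 130, 70], [0, 60, 120], [0, 0, 75]],
]


def allele_copies(m):
    """Copies of each allele: 2 x homozygotes + each heterozygote carrying it."""
    k = len(m)
    return [2 * m[i][i] + sum(m[min(i, j)][max(i, j)] for j in range(k) if j != i)
            for i in range(k)]


def f_statistics(matrices):
    """Return (H_I, H_S, H_T, F_IS, F_ST, F_IT) with no intermediate rounding."""
    sizes = [sum(sum(row) for row in m) for m in matrices]
    total = sum(sizes)
    k = len(matrices[0])
    h_obs = [sum(m[i][j] for i in range(k) for j in range(i + 1, k)) / n
             for m, n in zip(matrices, sizes)]
    freqs = [[c / (2 * n) for c in allele_copies(m)]
             for m, n in zip(matrices, sizes)]
    h_exp = [1.0 - sum(p * p for p in ps) for ps in freqs]
    # Size-weighted averages across subpopulations.
    h_i = sum(h * n for h, n in zip(h_obs, sizes)) / total
    h_s = sum(h * n for h, n in zip(h_exp, sizes)) / total
    p_bar = [sum(ps[a] * n for ps, n in zip(freqs, sizes)) / total
             for a in range(k)]
    h_t = 1.0 - sum(p * p for p in p_bar)
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return h_i, h_s, h_t, f_is, f_st, f_it


h_i, h_s, h_t, f_is, f_st, f_it = f_statistics(OBSERVED)
print(f"H_I={h_i:.3f} H_S={h_s:.3f} H_T={h_t:.3f}")
print(f"F_IS={f_is:.3f} F_ST={f_st:.3f} F_IT={f_it:.3f}")
# The three F-statistics are linked: (1 - F_IT) = (1 - F_IS)(1 - F_ST).
assert abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12
```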

Conclusions:

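The allele frequencies differ among the four subpopulations. Subpops. 1 and 2 have A3 as the most common allele, subpop. 3 has A1 as the most common allele, and subpop. 4 has A2 as the most frequent allele. (Evidence: the allele-frequency table; the most common allele has frequency 0.389 (A3), 0.411 (A3), 0.408 (A1), and 0.374 (A2) in subpops. 1 to 4.)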

Only subpop. 2 has higher observed heterozygosity than expected. (Evidence: observed 0.656 vs. expected 0.651; in every other subpopulation *H*_{obs} < *H*_{exp}.)

Subpops. 2 and 4 are neither inbred nor outbred (*F* near zero), while Pop. 1 is mildly inbred and Pop. 3 is highly inbred. (Evidence: local *F* = -0.006 and 0.024 for Pops. 2 and 4, 0.096 for Pop. 1, and 0.580 for Pop. 3.)

Overall the population
shows moderate inbreeding (moderately positive *F*_{IS}).
This is largely driven by the high local *F* of Pop. 3.

The populations are only slightly differentiated genetically (fairly low value of *F*_{ST} = 0.011). *(Generally, when F_{ST} exceeds about 0.15 we conclude that differentiation is substantial. For a more rigorous answer we can compute confidence intervals on F_{ST} and see whether they overlap zero; if they do, we conclude that we have no real evidence for any structure.)*