Worked and annotated sample of question for take-home exam

(to be handed out 21-Apr-06, due Fr. 28-Apr-06)

Output from Mathematica program FstCalc.nb by David B. McDonald, Dept. Zool., U. of Wyoming, Laramie, WY 82071 dbmcd@uwyo.edu.

The input matrix of allele-by-allele genotypes is:

A1 A2  A3

55 30 100
0  45  90
0   0  48

35 85 160
0 40  50
0  0  80

120 45 25
0 80 35
0  0 75

40 130 70
0 60 120
0  0  75

First, calculate the population sizes by summing the elements/cells in each subpopulation (remember, the number of individuals is half the number of alleles) :

e.g., N1 = 55 + 30 + 100 + 45 + 90 + 48 = 368
Repeat for the three remaining subpopulations to get the values below:

Population sizes are:

N1         N2          N3         N4

368        450        380         495

Next, allele frequencies (pi, for i = 1 to k, where k is the total number of possible alleles; here k is three)

E.g., p1 = freq(A1) = (2*55 + 30 + 100) / 2*368 = 0.326

Twice the number of A1 homozygotes + one times the number of each of the types of hets (which gives us the total number of A1 alleles) divided by twice the population size (2*N1 is the total number of alleles in subpop. 1).

p2 = freq(A2) = (30 + 2*45 + 90) / 2*N1 = 0.285

Repeat for the remaining alleles and subpopulations to get the values below:

Allele frequencies (pi) are:

p1      p2   p3

Subpop. 1       0.326   0.285   0.389

Subpop. 2       0.350   0.239   0.411

Subpop. 3       0.408    0.316  0.276

Subpop. 4       0.283    0.374  0.343

Next, the genotypic counts. For homozygotes this will be the gene frequency squared times the population size, for heterozygotes that will be twice the gene frequencies squared times the population size.

ROUNDING: I don’t mind rounding to integers here, because we aren’t going to use these numbers in subsequent calculations. Note, though, that I am keeping my allele frequencies to three decimal places for the calculations where I use them. If you round re-used numbers too much, too early, you will get off course.

Subscript note: I am not subscripting for subpopulation, just assuming we can keep track.

s=1, A1A1 = p12*N1 = 0.3262*368 = 39.11 (rounds to 39).

A1A2 = 2*p1*p2*N1 = 2*0.326*0.285*368 = 68.38 (rounds to 68).

One more example, picked haphazardly: (third subpopulation, A2A3 heterozygote expected count)

s=3, A2A3 = 2*p2*p3*N3 = 2*0.316*0.276*380 = 66.28 rounds to 66

Repeat for all the 24 genotypes (six per pop.)

Expected genotypic counts for the populations are:

A1  A2  A3

A1      39  68  93
A2       0  30  82
A3       0   0  56

A1      55  75 130
A2       0  26  88
A3       0   0  76

A1      63  98  86
A2       0  38  66
A3       0   0  29

A1      40 105  96
A2       0  69 127
A3       0   0  58

Subtract the expected from the observed to get excess or deficiency in the Observed:

s = 1, A1A1 = 55 - 39 = 16

A1A2 = 30 - 68 = -38

Excesses (+ numbers) or deficiencies (- numbers) of observed relative to Hardy-Weinberg genotypic counts:

A1  A2  A3
16 -38   7
0  15   8
0   0  -8

-20  10  30
0  14 -38
0   0   4

57 -53 -61
0  42 -31
0   0  46

0  25 -26
0  -9  -7
0   0  17

For observed - expected homozygotes in subpopulation 1 we have + 16, +15, and -8. Their sum is +23, so we observe an excess of homozygotes (possible evidence for inbreeding). Note that the homozygote excess of 23 is balanced by a heterozygote deficiency of 23 (for the first matrix above, total of -38, 7, and 8 above the diagonal  is -23).

Repeat this logic to get the following set of values:

Observed homozygote counts relative to HWE expectation (- means homozygote deficiency/heterozygote excess, outbreeding or Wahlund effect):

Pop. 1    Pop. 2      Pop. 3     Pop. 4

23        -2         145         8

Observed heterozygosities. What proportion of each subpopulation are heterozygotes? The sum of the three kinds of heterozygotes from our original input matrix, all divided by the subpopulation size.

Subpop. 1 = (30 A1A2 + 90 A1A3 + 100 A2A3) / 368 = 220/368 = 0.598

Repeat for remaining subpops.

Observed heterozygosities are:

Pop. 1      Pop. 2      Pop. 3      Pop. 4

0.598        0.656       0.276       0.646

Expected heterozygosities. We use the 1 minus sum of the allele frequencies squared approach to calculate this quantity for each subpopulation.

E.g., subpop. 1 is 1 - SUM[0.3262 + 0.2852 + 0.3892] = 1-[0.1063 + 0.0812 + 0.1513] = 1-0.3388 = 0.6612

Repeat for remaining subpops.

Expected heterozygosities (Gene diversities) are:

Pop. 1      Pop. 2      Pop. 3       Pop. 4

0.661        0.651       0.658        0.662

Things are starting to get easier (or at least they will involve fewer terms). F = (Hexp-Hobs) / Hexp

Subpop. 1 F = (0.661 - 0.598) / 0.661 = 0.096

Local inbreeding coefficients (F) are:

Pop. 1      Pop. 2      Pop. 3        Pop. 4

0.096      -0.006        0.580         0.024

Next, we calculate the pi-bar (across all the subpopulations).
We need to weight by population size.

p1-bar = (0.326*368 + 0.350*450 + 0.408*380 + 0.283*495) / (368 + 450 + 380 + 495) = 0.338

Repeat for the other two alleles to get:

Global allele frequencies (pi-bar) are:

p1-bar   p2-bar      p3-bar

0.338      0.306       0.356

HI is the weighted average of the observed heterozygosities.

(0.598*368 + 0.656*450 + 0.276*380 + 0.646*495) / (368 + 450 + 380 + 495) = 0.555

HI is: 0.555

HS is the weighted average of the expected heterozygosities across subpopulations.

(0.661*368 + 0.651*450 + 0.658*380 + 0.662*495) / (368 + 450 + 380 + 495) = 0.658

HS is: 0.658

HT is the expected heterozygosity based on the global gene frequencies.

1 - SUM[pi-bar2] = 1 - [0.3382 + 0.3062 + 0.3562] = 1 - 0.3346 = 0.665

HT is: 0.665

FIS = (HS - HI) / HS = (0.658 - 0.555) / 0.658 = 0.103/0.658 = 0.156

FIS is: 0.156

FST = (HT - HS) / HT = (0.665 - 0.658) / 0.658 = 0.007/0.658 = 0.011

FST is: 0.011

FIT is: 0.166

Conclusions:

The allele frequencies differ among the four subpopulations. Subpops. 1 and 2 have the A3 allele as the most common, subpop. 3 has A1 as the most common allele, and subpop. 4 has the A2 allele most frequent. Evidence

Only subpop. 2 has higher observed heterozygosity than expected. Evidence

Subpops. 2 and 4 are neither inbred nor outbred (F near zero), while Pop. 1 is mildly inbred and Pop. 3 is highly inbred. Evidence

Overall the population shows moderate inbreeding (moderately positive FIS).  This is largely driven by the high local F of Pop. 3.

The populations are only slightly differentiated genetically (fairly low value of FST).  Generally, when FST exceeds 0.15 or so we conclude that differentiation is substantial.  For a more rigorous answer we can compute confidence intervals on FST and see whether they overlap zero -- if they do we conclude that we have no real evidence for any structure).