Heterozygosity, *H*_{Exp} (or gene diversity, *D*)


Heterozygosity is of major interest to students of genetic variation in natural populations. It is often one of the first "parameters" reported for a data set, and it can tell us a great deal about the structure and even the history of a population. For example, very low heterozygosities at allozyme loci in cheetahs and black-footed ferrets indicate severe effects of small population size: population bottlenecks or metapopulation dynamics have sharply reduced genetic variation relative to the level expected, or found, in comparable mammals. High heterozygosity means a lot of genetic variability; low heterozygosity means little. Often we compare the observed level of heterozygosity with the level expected under Hardy-Weinberg equilibrium (HWE). If observed heterozygosity is lower than expected, we look for causes such as inbreeding. If it is higher than expected, we might suspect an isolate-breaking effect (the mixing of two previously isolated populations).
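To make the observed-versus-expected comparison concrete, here is a minimal Python sketch that computes observed heterozygosity (the fraction of sampled individuals that are heterozygous) and its HWE expectation from a list of genotypes. The function name `observed_and_expected_het` and the ten-individual sample are hypothetical; the expectation uses one minus the sum of squared allele frequencies, the single-locus formula given as Eqn 4.1.

```python
from collections import Counter

def observed_and_expected_het(genotypes):
    """Return (H_O, H_E) at one locus from a list of diploid genotypes.

    Each genotype is a 2-tuple of allele labels, e.g. ("A", "a").
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies from the 2n gene copies in the sample.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [c / (2 * n) for c in counts.values()]
    # Expected heterozygosity under HWE: 1 minus the sum of squared frequencies.
    h_exp = 1.0 - sum(p * p for p in freqs)
    return h_obs, h_exp

# Hypothetical sample: 4 AA, 2 Aa, 4 aa individuals.
sample = [("A", "A")] * 4 + [("A", "a")] * 2 + [("a", "a")] * 4
h_o, h_e = observed_and_expected_het(sample)
print(h_o, h_e)  # prints 0.2 0.5: fewer heterozygotes than HWE predicts
```

A deficit like this one (0.2 observed vs. 0.5 expected) is the kind of discrepancy we would then try to attribute to a force such as inbreeding.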

Several measures of heterozygosity exist. Their values
range from zero (no heterozygosity) to nearly 1.0
(for a system with a large number of equally frequent alleles). We
will focus primarily on expected heterozygosity (*H*_{E},
or gene diversity, *D*, as Bruce Weir prefers to call it). The simplest
way to calculate it for a single locus is:

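$$H_E = 1 - \sum_{i=1}^{k} p_i^2 \qquad \text{(Eqn 4.1)}$$

where *p*_{i} is the frequency of the *i*th of the *k* alleles at the locus.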
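Eqn 4.1 translates directly into code. A minimal sketch, assuming the allele frequencies are supplied as a list that sums to 1; the name `expected_heterozygosity` and the tolerance check are our own choices, not part of any particular package.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity (gene diversity) at one locus: 1 - sum(p_i^2)."""
    total = sum(freqs)
    # Guard against frequencies that do not describe a complete locus.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allele frequencies sum to {total}, not 1")
    return 1.0 - sum(p * p for p in freqs)

print(expected_heterozygosity([0.5, 0.5]))       # 0.5
print(expected_heterozygosity([0.7, 0.2, 0.1]))  # about 0.46
```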

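For a multi-locus system, the average expected heterozygosity over *m* loci is:

$$H_E = 1 - \frac{1}{m}\sum_{l=1}^{m}\sum_{i=1}^{k_l} p_{li}^2 \qquad \text{(Eqn 4.2)}$$

where the first summation is over the *m* loci and the second is over the *k*_{l} alleles at locus *l*, with *p*_{li} the frequency of allele *i* at locus *l*.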
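Eqn 4.2 is simply the mean of the single-locus values. A minimal sketch, assuming each locus is given as its own list of allele frequencies; the function name and the three example loci are hypothetical.

```python
def mean_expected_heterozygosity(loci):
    """Average H_E over m loci: 1 - (1/m) * sum over loci and alleles of p_li^2.

    `loci` is a list of allele-frequency lists, one list per locus.
    """
    m = len(loci)
    # Outer loop runs over loci, inner loop over the alleles at each locus.
    sum_sq = sum(p * p for freqs in loci for p in freqs)
    return 1.0 - sum_sq / m

# Hypothetical loci with per-locus H_E of 0.5, 0.0 (monomorphic) and 0.75.
loci = [[0.5, 0.5], [1.0], [0.25, 0.25, 0.25, 0.25]]
print(mean_expected_heterozygosity(loci))  # about 0.417, the mean of the three
```

Including monomorphic loci (like the second one here) pulls the average down, so the set of loci assayed matters when comparing multi-locus values.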

Why does it work to take the sum of the squared gene frequencies and subtract that from one? Let’s think back to basic Hardy-Weinberg:

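$$p^2 + 2pq + q^2 = 1 \qquad \text{(Eqn 4.3)}$$

where the heterozygosity is given by 2*pq*. Rearranging, 2*pq* = 1 − (*p*^{2} + *q*^{2}): the heterozygote frequency is one minus the summed homozygote frequencies, which is Eqn 4.1 with *k* = 2.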
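The rearrangement can be checked numerically. This sketch (function names are our own) compares 2*pq* with 1 − (*p*^{2} + *q*^{2}) over a grid of *p* values and prints the resulting heterozygosity, which traces out the two-allele curve discussed next.

```python
import math

def het_2pq(p):
    """Heterozygote frequency 2pq for a two-allele locus under HWE."""
    return 2 * p * (1 - p)

def het_one_minus_homozygotes(p):
    """One minus the summed homozygote frequencies, p^2 + q^2."""
    q = 1 - p
    return 1 - (p ** 2 + q ** 2)

for i in range(11):
    p = i / 10
    # The two expressions agree (up to floating-point rounding) for every p.
    assert math.isclose(het_2pq(p), het_one_minus_homozygotes(p), abs_tol=1e-12)
    print(f"p = {p:.1f}  H = {het_2pq(p):.2f}")
```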

What does heterozygosity tell us, and what patterns
emerge as we go to multi-allelic systems? Take a two-allele example with *p*
= *q* = 0.5, so that *H*_{E} = 2(0.5)(0.5) = 0.5. Plotted against *p*,
heterozygosity for a two-allele system is a concave-down parabola
that starts at zero (when *p* = 0), rises to a maximum at *p* = 0.5,
and falls back to zero when *p* = 1. In fact,
for any multi-allelic system, heterozygosity is greatest when

$$p_1 = p_2 = p_3 = \cdots = p_k \qquad \text{(Eqn 4.4)}$$

that is, when the allele frequencies are equal. The maximum heterozygosity for a 10-allele system comes when each allele has a frequency of 0.1, giving *H*_{E} = 1 − 10(0.1)^{2} = 0.9. More generally, *k* equally frequent alleles give *H*_{E} = 1 − 1/*k*, which approaches 1.0 as *k* grows.
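A short sketch of Eqn 4.4 in action: it computes *H*_{E} for *k* equally frequent alleles and confirms that a randomly generated, unequal set of 10 frequencies falls below the 10-allele maximum. The function names and the random seed are arbitrary choices for illustration.

```python
import math
import random

def expected_heterozygosity(freqs):
    """H_E = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def max_heterozygosity(k):
    """H_E when all k alleles are equally frequent: 1 - k * (1/k)^2 = 1 - 1/k."""
    return expected_heterozygosity([1.0 / k] * k)

for k in (2, 10, 50):
    print(f"k = {k:2d}  max H_E = {max_heterozygosity(k):.3f}")  # 0.500, 0.900, 0.980

# An unequal set of 10 frequencies always gives a lower value.
rng = random.Random(1)
weights = [rng.random() for _ in range(10)]
total = sum(weights)
freqs = [w / total for w in weights]
print(expected_heterozygosity(freqs), "<", max_heterozygosity(10))
```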

**Individual’s-eye view of heterozygosity**

Here is a way that I like to think of heterozygosity
(*H*_{E} or *D*): it is the (expected) probability that
an individual will be heterozygous at a given locus (or, for a multi-locus
system, at a locus averaged over the assayed loci). For many human
microsatellite loci, for example, *H*_{E} is often above 0.85, meaning
that you have a better than 85% chance of being a heterozygote at such a locus.
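The individual's-eye view can be checked by simulation: if each individual's two alleles are drawn independently from the population's allele frequencies (random union of gametes, as HWE assumes), the fraction of heterozygous individuals should approach *H*_{E}. A minimal sketch; the locus with ten equally common alleles is hypothetical, not real microsatellite data, and `simulate_het_fraction` is our own name.

```python
import random

def simulate_het_fraction(freqs, n_individuals, seed=0):
    """Fraction of simulated individuals that are heterozygous when each
    individual's two alleles are drawn independently from `freqs`."""
    rng = random.Random(seed)
    alleles = list(range(len(freqs)))
    het = 0
    for _ in range(n_individuals):
        # Draw two gene copies with replacement, weighted by allele frequency.
        a, b = rng.choices(alleles, weights=freqs, k=2)
        het += a != b
    return het / n_individuals

freqs = [0.1] * 10  # hypothetical locus with 10 equally common alleles
print(1 - sum(p * p for p in freqs))          # H_E, about 0.9
print(simulate_het_fraction(freqs, 100_000))  # close to 0.9
```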

Now that you have a way to calculate gene diversity (expected
heterozygosity), you are ready to calculate *F*-statistics from
heterozygosities:

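$$F_{IS} = \frac{H_S - H_I}{H_S} \qquad \text{(Eqns 4.5)}$$

$$F_{ST} = \frac{H_T - H_S}{H_T}$$

$$F_{IT} = \frac{H_T - H_I}{H_T}$$

where *H*_{I} is the observed heterozygosity of individuals, averaged over subpopulations; *H*_{S} is the expected heterozygosity (Eqn 4.1) within each subpopulation, averaged over subpopulations; and *H*_{T} is the expected heterozygosity of the total population, computed from allele frequencies pooled across subpopulations.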
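Here is a minimal sketch of Eqns 4.5, assuming each subpopulation is summarized by its allele frequencies and its observed heterozygosity, and assuming (for simplicity) that all subpopulations are weighted equally when averaging. The function name and the two-subpopulation data are hypothetical. Because 1 − *F*_{IT} = *H*_{I}/*H*_{T} = (*H*_{I}/*H*_{S})(*H*_{S}/*H*_{T}), the three values must satisfy (1 − *F*_{IT}) = (1 − *F*_{IS})(1 − *F*_{ST}), which the code checks.

```python
import math

def f_statistics(subpops):
    """Return (F_IS, F_ST, F_IT) from heterozygosities.

    `subpops` is a list of (allele_freqs, observed_het) pairs, one per
    subpopulation; subpopulations are weighted equally (an assumption).
    """
    s = len(subpops)
    # H_I: observed heterozygosity averaged over subpopulations.
    h_i = sum(h_obs for _, h_obs in subpops) / s
    # H_S: expected heterozygosity within each subpopulation, averaged.
    h_s = sum(1 - sum(p * p for p in freqs) for freqs, _ in subpops) / s
    # H_T: expected heterozygosity from the mean (pooled) allele frequencies.
    n_alleles = len(subpops[0][0])
    mean_freqs = [sum(freqs[j] for freqs, _ in subpops) / s for j in range(n_alleles)]
    h_t = 1 - sum(p * p for p in mean_freqs)
    return (h_s - h_i) / h_s, (h_t - h_s) / h_t, (h_t - h_i) / h_t

# Hypothetical data: two subpopulations, two alleles each.
data = [([0.8, 0.2], 0.25), ([0.4, 0.6], 0.40)]
f_is, f_st, f_it = f_statistics(data)
print(f"F_IS = {f_is:.4f}  F_ST = {f_st:.4f}  F_IT = {f_it:.4f}")
# The three statistics are linked: (1 - F_IT) = (1 - F_IS)(1 - F_ST).
assert math.isclose(1 - f_it, (1 - f_is) * (1 - f_st))
```

Here *H*_{I} = 0.325, *H*_{S} = 0.40 and *H*_{T} = 0.48, giving *F*_{IS} ≈ 0.19, *F*_{ST} ≈ 0.17 and *F*_{IT} ≈ 0.32.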

If you run the same data through Eqns 4.5 and through an analysis program, you may ask:

"Why is the *F*_{ST} I calculate with *FSTAT* (or some other software) different from the one I calculate using Eqns 4.5?"

Answer: because analysis programs use more complex algorithms
that take into account such factors as how individuals disperse (island
model vs. stepping-stone model vs. lattice model), the mutation process
(infinite alleles model vs. stepwise mutation model), and various bias corrections (e.g., for the sample sizes of the subpopulations sampled).
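As one well-known example of a sample-size correction (not necessarily the one any particular program applies), Nei's (1978) unbiased estimator of gene diversity multiplies Eqn 4.1 by 2*n*/(2*n* − 1), where *n* is the number of diploid individuals sampled and the *p*_{i} are sample frequencies. A minimal sketch; the function names are our own.

```python
def gene_diversity(freqs):
    """Uncorrected gene diversity, Eqn 4.1: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def gene_diversity_unbiased(freqs, n_individuals):
    """Nei's (1978) small-sample correction: scale by 2n / (2n - 1)."""
    two_n = 2 * n_individuals  # number of gene copies sampled
    return two_n / (two_n - 1) * gene_diversity(freqs)

freqs = [0.5, 0.5]
for n in (5, 20, 100):
    # The correction matters most for small samples and fades as n grows.
    print(n, gene_diversity(freqs), round(gene_diversity_unbiased(freqs, n), 4))
```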