The third moments of the site frequency spectrum

The analysis of patterns of segregating (i.e. polymorphic) sites in aligned sequences is routine in population genetics. Quantities of interest include the total number of segregating sites and the number of sites carrying mutations at each frequency, the so-called site frequency spectrum. For neutrally evolving sequences, some classical results are available, including the expected value and variance of the spectrum in the Kingman coalescent model without recombination, as calculated by Fu (1995). In this work, we use similar techniques to compute the third moments of the spectrum, that is, the joint moments of the frequencies of three linked sites. Based on these results, we derive analytical expressions for the bias of Tajima’s D and other neutrality tests. As a corollary, we obtain the second moments of the frequencies of two linked mutations conditional on the presence of a third mutation of a given frequency. These moments can be used to normalise new neutrality tests that rely on these conditional spectra.
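
As a concrete illustration of the quantities named above, the following minimal sketch computes the unfolded site frequency spectrum from a small alignment and compares it with the classical neutral expectation E[xi_i] = theta / i. It assumes sequences are coded as 0 (ancestral) and 1 (derived); the function names and the toy alignment are hypothetical and are not taken from this work.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 counts the sites whose derived allele
    is carried by exactly i of the n sequences (i = 1, ..., n-1)."""
    n = len(haplotypes)
    # zip(*haplotypes) iterates over alignment columns (sites).
    counts = Counter(sum(column) for column in zip(*haplotypes))
    return [counts.get(i, 0) for i in range(1, n)]


def expected_spectrum(theta, n):
    """First moment under the standard neutral model: E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


def watterson_theta(sfs):
    """Watterson's estimator: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(sfs) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a_n


# Toy alignment (assumed data): 4 sequences, 5 sites.
haplotypes = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]

sfs = site_frequency_spectrum(haplotypes)
theta_w = watterson_theta(sfs)
print("observed spectrum:", sfs)
print("Watterson's theta:", theta_w)
print("neutral expectation:", expected_spectrum(theta_w, len(haplotypes)))
```

The monomorphic third site contributes nothing to the spectrum, so only the four segregating sites are counted. Higher moments of the spectrum, such as the variances and the third moments discussed here, have no such one-line form and are not attempted in this sketch.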

