(2015-Sep-10, 04:36:06)Krom Wrote: Are you aware Rosenberg et al (2002, 2005) clusters are said to only capture 1.53% inter-population genetic variation. >98% is captured by IBD.

Now, Rosenberg (2005) said:

"When an additional binary variable B is added—equaling one if an ocean, the Himalayas, or the Sahara must be crossed to travel between two populations, and zero otherwise—R2 increases to 0.729. The regression equation is Fst = 0.0032 + 0.0049D + 0.0153B, where D is distance in thousands of kilometers....The effect of a barrier is to add 0.0153 to Fst beyond the value predicted by geographic distance alone."

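To see what that regression implies in practice, here is a minimal sketch that plugs numbers into the quoted equation. The coefficients are the ones in the quote; the function name and the example distances are my own illustrative choices, not anything from Rosenberg et al.

```python
# Coefficients as quoted from Rosenberg et al. (2005):
#   Fst = 0.0032 + 0.0049*D + 0.0153*B
# D is distance in thousands of km, B is 1 if a barrier is crossed, else 0.
INTERCEPT = 0.0032
SLOPE_PER_1000_KM = 0.0049
BARRIER_EFFECT = 0.0153


def predicted_fst(distance_km: float, barrier: bool) -> float:
    """Predicted pairwise Fst from the quoted regression (hypothetical helper)."""
    d = distance_km / 1000.0          # the equation expects thousands of km
    b = 1 if barrier else 0           # crude yes/no barrier coding
    return INTERCEPT + SLOPE_PER_1000_KM * d + BARRIER_EFFECT * b


if __name__ == "__main__":
    # Example distances are arbitrary, chosen only for illustration.
    for km in (2000, 10000):
        no_barrier = predicted_fst(km, barrier=False)
        with_barrier = predicted_fst(km, barrier=True)
        print(km, round(no_barrier, 4), round(with_barrier, 4))
```

Whatever the distance, the two predictions differ by exactly 0.0153. That is the barrier coefficient, a constant added to pairwise Fst, and nothing more.
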
The 0.015 is just the effect of their crude, dichotomously coded barriers; it is not the percentage of the total variability between continental races (K=5). For that you have to turn back to Rosenberg (2002), which gives an AMOVA of 4.3% between regions (not 3.6, as you said earlier!) for K=5 (Blumenbach races! Remember, Linnaeus had inconstant varieties!). But what does this 4.3% actually mean? For context, try:

.....................

Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure: FST and related measures. Molecular Ecology Resources, 11(1), 5-18.

"For biallelic markers, this makes sure that FST is bounded between zero and one, with zero representing no differentiation and one representing fixation of different alleles within populations. For multiallelic markers, however, the maximum possible value is not necessarily equal to one, but is instead determined by the amount of within-population diversity (Charlesworth 1998; Hedrick 1999). The reason for this can be best understood by looking at GST, which is defined as (Nei 1987)....For highly variable loci, this can lead to a very small possible range of GST values. To illustrate this relationship, Fig. 1 gives the joint values of FST and HS found in the past 4 years in Molecular Ecology (expanded from Heller & Siegismund 2009; see also Table S1, Supporting information). Notice that the observed range of FST is always less than HS and that the range of FST becomes very small when HS is large. For example when HS = 0.9, a value that is commonly encountered for microsatellite markers, the maximum possible value of FST is 0.1. Such a value of FST is generally interpreted as representing a rather weak population structure. However, here it represents the case with maximum differentiation among the populations, meaning that the populations do not share any alleles at all."

But see also here: Verity, R., & Nichols, R. A. (2014). What is genetic differentiation, and how should we measure it—GST, D, neither or both?. Molecular ecology, 23(17), 4216-4225.

"By tracing GST to its origin – the parameter FST, and further to the inbreeding coefficient F upon which FST was built – we can identify the root cause of some of the disagreement in the literature around the measurement of population differentiation. We have found that there are two overlapping views regarding the definition of the probability of identity by descent, which in turn have rubbed off on our definitions of FST, leading to mutation dependent and mutation-independent versions. The criticism that GST is constrained is misplaced – at least if the task at hand is to estimate the mutation-dependent version of FST. If we wish to capture other aspects of the population history, such as the mutation independent version of FST, when the mutation rate is relatively high, then we will need to supplement GST with other measures that capture a different aspect of evolution. No single statistic can be informative about both parameters in this situation, as it is mathematically impossible for a single dimension to fully represent two.... Thus no statistic is differentiation, but some statistics can be used to infer differentiation. We find that mutation-independent FST is a sensible quantity to use as our definition of differentiation, although the general arguments made above are equally valid when applied to alternative definitions... Under the same model with a high mutation rate we found that GST is insufficient on its own to jointly estimate the true level of differentiation and mutation (Figure 5). Supplementing GST with either G’ST or D solved this problem, providing a distinct source of information that can be used to pull apart the confounded signals."

And here: Should I use FST, G’ST or D?

"Over the past few decades researchers have increasingly used microsatellites, due to their high level of variability and the relative ease of development and scoring in non-model systems. However, now that next-generation sequencing is getting more affordable, sequence-based markers can be assessed throughout the genome (e.g. using RAD sequencing). As we move back towards such low-mutation-rate markers as SNPs, FST becomes easier to assess reliably. On the other hand, FST and other current methods are all designed to assess one or a few markers at a time, and genomic approaches just apply these methods thousands or tens of thousands of times for markers throughout the genome. One can look for outliers, calculate means, etc., without really taking full advantage of the data. For instance, I have seen bi-modal or skewed distributions of FST and other summary statistics; clearly means and standard deviations can be misleading in these cases. My hope is that new methods for assessing divergence will focus not on individual loci but on many markers throughout the genome....

The theoretical maximum of FST = 1 can only be reached if each subpopulation is fixed for a single unique allele. If there is variability within any subpopulation, the maximum FST is (1 – HeS). Unfortunately, this limit to the maximum FST is often overlooked. The maximum value for FST is the smaller the more variable a marker is, and the effect can be especially dramatic for microsatellites, which often exhibit high HeS (over 0.9, in which case the maximum FST is only 0.1). In the extreme (yet possible) scenario of two subpopulations completely divergent (i.e., not sharing a single allele), but both with HeS approaching 1 (i.e., all individuals are expected to be heterozygous because of high allelic diversity), FST becomes meaningless, as its theoretical maximum is then 0 (see Fig. 1 in Jost 2008 for a graphical representation)"
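
The "completely divergent yet low FST" scenario in the quotes above can be checked with a toy calculation. This is a minimal sketch, assuming Nei's GST = (HT - HS) / HT, equally weighted subpopulations, and no sample-size correction. The allele frequencies are made up for illustration, and the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst(pop_freqs):
    """Return (HS, HT, GST) for a list of dicts mapping allele -> frequency.

    Assumes equal subpopulation weights and no sampling correction.
    """
    k = len(pop_freqs)
    # HS: mean within-subpopulation expected heterozygosity
    hs = sum(expected_heterozygosity(p.values()) for p in pop_freqs) / k
    # HT: expected heterozygosity of the pooled (averaged) allele frequencies
    alleles = set().union(*pop_freqs)
    pooled = [sum(p.get(a, 0.0) for p in pop_freqs) / k for a in alleles]
    ht = expected_heterozygosity(pooled)
    return hs, ht, (ht - hs) / ht


# Two populations that share no alleles, each with 10 equally common alleles
# (microsatellite-like diversity; frequencies are invented).
pop_a = {f"a{i}": 0.1 for i in range(10)}
pop_b = {f"b{i}": 0.1 for i in range(10)}
hs, ht, g = gst([pop_a, pop_b])

# Biallelic-style contrast: each population fixed for a different allele.
_, _, g_fixed = gst([{"A": 1.0}, {"B": 1.0}])

if __name__ == "__main__":
    print("multiallelic:", round(hs, 3), round(ht, 3), round(g, 3))
    print("fixed alleles:", round(g_fixed, 3))
```

With these invented frequencies HS is 0.9 and GST comes out near 0.053, below the 1 - HS ceiling of 0.1, even though the two populations share no alleles at all. The fixed-allele case gives GST = 1.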

...................

I'm not 100% sure how AMOVA, which is an F/Gst analogue, works, so I will just use F/Gst values, which were about 0.05 (rounding down). The clearest interpretation would be: (a) the between-region diversity accounts for 5% of the total diversity, (b) given an upper limit of 28% for these loci; (c) this is equivalent to a 20% mutation-independent differentiation (measured in allele sharing) (e.g., Jost's D). Now claim (a) is fine so long as you understand what F/Gst is measuring and recognize its dependency on mutation rates. But (a) is not fine if you wish to make a global claim about between-population differentiation and base it on high mutation rate loci. On this point, it is notable that Wright (1969) developed Fst for biallelic markers, which, as Meirmans and Hedrick note, actually have a range of 0 to 1. So it's not clear how well Wright's scale transfers to multiallelic markers, which are constrained by Hs. Nolan Kane notes (correctly in my opinion):

"For many situations, certainly, this can be quite problematic – for microsatellites with high heterozygosity, maximum GST is often 0.1-0.2! Clearly, in these cases Wright’s (1978) guidelines are entirely misleading, when he states that values ranging from 0-0.05 indicate “little” genetic differentiation; 0.05-0.15 is “moderate”, 0.15-0.25 is “great”, etc. This is only plausible for biallelic cases, and in other situations we cannot rely on such simple rules of thumb. (Should I use FST, G’ST or D?)"

So, to summarize: yes, human CT microsatellite F/Gst is low to moderate. But this is expected given the high heterozygosity of the markers. (Indeed, my regression line for subspecies Hs by G/Fst showed that the human microsatellite G/Fst was what one would expect for a subspeciated species with the same level of Hs!) More generally, since mutation-independent measures of differentiation give estimates which diverge substantially from those given by mutation-dependent ones, it's difficult to interpret the situation in terms of "general" differentiation. Also, since Wright's rule of thumb was based on biallelic Fst (which has a practical range of 0 to 1), it's not clear to what extent it applies to multiallelic Gst, specifically when using high mutation rate microsatellites.
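
For anyone who wants to check the arithmetic behind the figures used in this post (F/Gst of about 0.05 against a ceiling of about 0.28), here is a rough sketch. It simply divides the observed value by the stated maximum, in the spirit of a standardized G'ST. That is an assumption about how to rescale, not necessarily how the "20%" figure was derived. Jost's D proper is computed from HS and HT directly and would need those inputs. The function name is hypothetical.

```python
def standardized_gst(gst_observed: float, gst_max: float) -> float:
    """Observed GST as a fraction of its maximum attainable value.

    A simple rescaling sketch; gst_max is taken as given rather than
    computed from the data.
    """
    if not 0.0 < gst_max <= 1.0:
        raise ValueError("gst_max must be in (0, 1]")
    if not 0.0 <= gst_observed <= gst_max:
        raise ValueError("gst_observed must lie between 0 and gst_max")
    return gst_observed / gst_max


if __name__ == "__main__":
    # Figures quoted in the post: observed ~0.05, ceiling ~0.28.
    print(round(standardized_gst(0.05, 0.28), 3))
```

The ratio is roughly 0.18, i.e. in the same ballpark as the 20% mentioned above.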

Luckily, the above is somewhat tangential to the immediate discussion, which concerns expected quantitative differences. I say luckily because the issue above is too complex for me to easily convince you of the point, given your stubbornness on the topic. I say tangential because, unless I am mistaken, you are arguing that since (microsatellite) Fst is low, typical quantitative differences must also be low. Conveniently for me, rules of thumb for making these inferences have been outlined (references cited in my paper). They note that, when assessing the expected trait differences owing to neutral divergence (the default magnitude of differences), one should use low mutation rate markers of the type that likely underlie the genetic structure of the trait, such as SNPs. For SNPs, amongst humans, there happen to be moderate to large differences, depending on the groups discussed. I elaborated on this in section 4 of my paper, so I won't rehash all of the points made. What I said, though, definitively refutes these types of silly arguments which you are reiterating.

Your move.