(2015-May-07, 03:35:54) Chuck Wrote:
(2015-May-06, 10:15:17) Duxide Wrote:
First of all, the set of 315 Fst values that I calculated with VCFtools (which uses the Weir and Cockerham Fst formula) on 1000 Genomes phase 3 data for 26 populations can be seen here (https://docs.google.com/spreadsheets/d/1...sp=sharing ). I report Fst for chromosomes 1 and 21 (columns C and D). They are practically identical (r = 0.995), so either can be used to represent the whole genome. Note that these include SNPs and indels. If you use these Fst values in your paper, please cite my latest article (http://dx.doi.org/10.6084/m9.figshare.1393160 ), because they are in its supplementary material.

THERE IS INDEED MUCH CONFUSION ABOUT INTERPRETING FST AS RELATIVE BETWEEN-POPULATION VARIANCE.

It appears that the expected BETWEEN-population variance should be 2*Fst, after correcting for the inbreeding coefficient.

Davide,

Would it be possible for you to partition the global variance into between-continental-race, between-individual-within-race, and within-individual variance?

See table 4 here for an example.

"To measure the differentiation between populations, the widely used statistic FST [17] and its unbiased estimator [18] were used. FST estimates were averaged over all loci, and 95% confidence intervals (CIs) of the average FST were calculated by bootstrap resampling with 10000 replications...Along with FST, variance components were estimated to reflect intra-individual, inter-individual and inter-population differences in genetic variation."

There appear to be programs that allow for this, but no one does it. If needed, I can write to Nishiyama et al. about their method and statistical program.
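The bootstrap step in the quoted passage (95% CIs of the average FST from 10000 resamples) can be illustrated with a percentile bootstrap over loci. This is a minimal sketch, assuming per-locus Fst estimates are already available as a list; the function name `bootstrap_mean_ci` and the toy values are hypothetical and are not Nishiyama et al.'s code or data.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_reps=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean of per-locus Fst values.

    Hypothetical helper: loci are resampled with replacement, the mean of
    each replicate is recorded, and the alpha/2 and 1 - alpha/2 quantiles
    of those replicate means are returned as the confidence interval.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_reps)
    )
    lo = means[int((alpha / 2) * n_reps)]
    hi = means[int((1 - alpha / 2) * n_reps) - 1]
    return statistics.fmean(values), (lo, hi)


# Illustrative per-locus values only, not real estimates.
mean_fst, (ci_lo, ci_hi) = bootstrap_mean_ci([0.10, 0.12, 0.08, 0.15, 0.11])
```

Resampling loci (rather than individuals) treats loci as the independent units, which matches "averaged over all loci" in the quote; with only a handful of loci the interval is of course very rough.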

Also, link rot: http://dx.doi.org/10.6084/m9.figshare.1393160

The links work; there were just issues with the parentheses. My paper is here: http://dx.doi.org/10.6084/m9.figshare.1393160

and Fst values are here: https://docs.google.com/spreadsheets/d/1...sp=sharing

VCFtools uses Weir and Cockerham's 1984 formula, which actually includes within-individual variance. So it appears that Sarich and Miele's contention that Fst values have to be multiplied by 2 is wrong, as Fst calculations (at least with the formula provided by Weir and Cockerham) already account for diploidy. You can therefore just use the Fst values reported in my Excel table to indicate relative between-population differentiation. As you can see from the attached Weir and Cockerham paper, formula (1) uses three components of variance: Ea = between populations; Eb = between individuals within populations; Ec = between gametes within individuals.
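To make the three components concrete, here is a minimal Python sketch of the Weir and Cockerham (1984) per-locus estimators for one biallelic site, written with the paper's a, b, c notation (the Ea, Eb, Ec above). It assumes diploid genotype counts per population, no missing data, at least two populations, and a polymorphic site. The function names are hypothetical; this is not VCFtools' own code. θ (Fst) is a/(a+b+c), and because c (computed from observed heterozygosity) sits in the denominator, within-individual variation is already part of the estimator.

```python
from dataclasses import dataclass


@dataclass
class Components:
    a: float  # between populations
    b: float  # between individuals within populations
    c: float  # between gametes within individuals


def wc84_components(counts):
    """W&C 1984 variance components for one biallelic site.

    counts: list of (n_AA, n_Aa, n_aa) genotype counts, one tuple per
    population (assumes r >= 2 populations and mean sample size > 1).
    """
    r = len(counts)
    n = [sum(g) for g in counts]                      # individuals per pop
    p = [(2 * aa + ab) / (2 * ni) for (aa, ab, _), ni in zip(counts, n)]
    h = [ab / ni for (_, ab, _), ni in zip(counts, n)]  # observed het. freq.
    n_tot = sum(n)
    n_bar = n_tot / r
    n_c = (n_tot - sum(ni * ni for ni in n) / n_tot) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / n_tot
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / n_tot
    pq = p_bar * (1 - p_bar)
    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (
        pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar
    )
    c = h_bar / 2
    return Components(a, b, c)


def wc84_fstats(comp):
    """Return (F, theta, f), i.e. F_IT, F_ST and F_IS, from the components."""
    total = comp.a + comp.b + comp.c
    big_f = 1 - comp.c / total
    theta = comp.a / total
    small_f = 1 - comp.c / (comp.b + comp.c)
    return big_f, theta, small_f


# Two hypothetical populations of 10 individuals each.
comp = wc84_components([(5, 4, 1), (1, 4, 5)])
F, theta, f = wc84_fstats(comp)
```

For these toy counts the components work out to a = 0.61/9, b = 0.2/9 and c = 0.2, so θ = 0.61/2.61 ≈ 0.234. For a biallelic site the components are the same whichever allele is counted, so the choice of reference allele does not affect θ.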

VCFtools outputs the Fst values directly, and I cannot see the variance components in the output files. However, it should be possible to get them.

I will have to figure out whether it's possible to retrieve the individual variance components from the VCFtools output, and I'll get back to you. However, since VCFtools already calculates Fst automatically from these three variance components, is there a specific reason why you need to know their values?
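One reason to want the components is the three-way partition asked for earlier in the thread. Given per-locus a, b, c values (however they are obtained), Weir and Cockerham combine loci by summing each component before taking ratios, so the shares of the summed total give a between-population / between-individual-within-population / within-individual breakdown, and the first share is the multi-locus Fst. This is a sketch under that assumption; the function name is hypothetical, and whether it matches the method behind the quoted Nishiyama et al. table is not established here.

```python
def partition_variance(per_locus):
    """Share of the summed W&C variance at each level.

    Hypothetical helper. per_locus: iterable of (a, b, c) tuples, one per
    locus, assumed to be already estimated. Components are summed across
    loci first (ratio of sums), not averaged as per-locus ratios.
    """
    rows = list(per_locus)
    if not rows:
        raise ValueError("need at least one locus")
    sum_a = sum(a for a, _, _ in rows)
    sum_b = sum(b for _, b, _ in rows)
    sum_c = sum(c for _, _, c in rows)
    total = sum_a + sum_b + sum_c
    return {
        "between_populations": sum_a / total,  # equals multi-locus Fst
        "between_individuals_within_populations": sum_b / total,
        "within_individuals": sum_c / total,
    }


# Hypothetical components for two loci, for illustration only.
shares = partition_variance([(0.10, 0.05, 0.35), (0.05, 0.0, 0.45)])
```

With these made-up inputs the shares come out as 0.15, 0.05 and 0.80. Summing before dividing keeps loci with little variation from having as much influence as highly variable ones, which is why the ratio of sums can differ from a simple average of per-locus Fst values.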

**Attached Files**

Weir1984.pdf