(2014-Jul-29, 01:24:01)Emil Wrote:Quote: 1) I'd like to see more details of the factor/PC analyses. Were the sub-component intercorrelations explained by a single factor by the usual standards (e.g., only one factor with eigenvalue>1)? If not, how many other factors were there, can you give a substantive interpretation to them, and are they correlated with national IQ? KMO and Bartlett's test are pretty superfluous and aren't usually reported; I would mention them in a footnote only.

You can view all the details in the R code file.

The number of factors to extract was in all cases set to 1 ("nfactors=1" in the code), so no criterion for determining the number of factors to keep was used.

So the criterion used was to extract just one factor. You should state that explicitly in the paper and justify the decision. At the limit, given that your variance explained is <50%, it is possible (though extremely unlikely) that there's a second factor that explains almost as much. In that case, denoting one of them as a general factor would be arbitrary. At the very least, you should tell how many factors with eigenvalues>1 there are.
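For reference, the eigenvalue>1 count asked for above can be read off the correlation matrix directly. This is only a sketch in base R: the data are simulated stand-ins, and `y`, `n_kaiser` and `prop_var` are illustrative names, not objects from the paper's code file.

```r
# Illustrative only: simulated data stand in for the paper's indicator matrix.
set.seed(1)
n <- 200                      # hypothetical number of countries
g <- rnorm(n)                 # one common source of variance
# Six indicators that all share the common source, plus noise
y <- sapply(1:6, function(i) 0.7 * g + rnorm(n, sd = 0.7))
colnames(y) <- paste0("v", 1:6)

# Eigenvalues of the correlation matrix, returned in decreasing order
ev <- eigen(cor(y), symmetric = TRUE, only.values = TRUE)$values

n_kaiser <- sum(ev > 1)       # number of components with eigenvalue > 1
prop_var <- ev / sum(ev)      # share of total variance per component

print(round(ev, 2))
print(n_kaiser)
print(round(prop_var[1:2], 2))  # first vs. second component
```

Comparing the first two entries of `prop_var` shows at a glance whether a second component comes anywhere near the first.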

Quote:I report KMO and Bartlett's because a reviewer (Piffer) had previously requested that I include them. Clearly I cannot satisfy both of you. It seems best just to include them. They don't take up much space.

OK, but information on the decisions made on the number of factors to be extracted is much more important.

Quote:Quote: 2) "individuals have a general socioeconomic factor"

This language is confusing. 'Factor' refers to a source of variance among individuals, so individuals cannot have factors. Individuals are located at different points on a factor, or have factor scores.

What I meant is that if one analyses the data at the individual level, one will also find a general socioeconomic factor (i.e. s factor).

Changed text to:

Gregory Clark argued that there is a general socioeconomic factor which underlies their socioeconomic performance at the individual-level.\cite{clark2014}

Still badly worded. Who are the 'they' referenced? Clark does not write about a "general socioeconomic factor." He writes about "social competence." Perhaps the Social Competence Factor would be a better label for your factor, given that some of its indicators are not usually thought of as indicating socioeconomic status.

Quote:Quote: 3) "national measures of country well-doing or well-being"

excise "well-doing"

I prefer to keep both because these variables measure all kinds of things, some related to well-being (e.g. health, longevity) others to well-doing (income, number of universities).

'Well-doing' is archaic-sounding and does not mean what you think it does:

well-doing (uncountable)

1.The practice of doing good; virtuousness, good conduct.

'Well-being' implies material prosperity, too, and is quite sufficient for your purposes:

well-being (uncountable)

1.a state of health, happiness and/or prosperity

Quote:Quote: 5) Figure 1 describes the structure of the SPI, why is there no corresponding figure describing the DP?

It did not seem necessary. It is less complicated and the user can find it in the manual referenced. Do you want me to include an overview of it?

http://democracyranking.org/?page_id=590

As far as I can tell, they have 6 'dimensions', which have the weights 50, 10, 10, 10, 10, 10. I'm not sure what they do within each 'dimension', probably they average the indexes. As with the DR, SPI, the HDI also has some idiosyncratic way of combining their variables to a higher construct. For reference, the HDI is based on a geometric mean (a what?) of three indicators.
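To make the two aggregation schemes mentioned above concrete, here is a small sketch in base R. All numbers are made up; the 50/10/10/10/10/10 weights are the ones guessed at above, and treating each dimension as a single score that is then weight-averaged is an assumption, not the documented procedure of either index.

```r
# Hypothetical dimension indices on a 0-1 scale (not real HDI data)
idx <- c(health = 0.90, education = 0.80, income = 0.70)

geo_mean   <- prod(idx)^(1 / length(idx))  # geometric mean of the three indices
arith_mean <- mean(idx)                    # arithmetic mean, for comparison

# Hypothetical scores (0-100) on six dimensions, combined with the
# weights 50, 10, 10, 10, 10, 10 mentioned above (assumed scheme)
scores  <- c(70, 55, 60, 80, 65, 50)
weights <- c(50, 10, 10, 10, 10, 10)
weighted_score <- weighted.mean(scores, weights)

print(c(geometric = geo_mean, arithmetic = arith_mean))
print(weighted_score)
```

The geometric mean is never larger than the arithmetic mean, so a low value on one indicator pulls the composite down more than it would under simple averaging.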

If you are going to include an itemized description of SPI, you should tell more about DP, too.

Quote:Quote: 7) "In some cases PCA can show a general factor where none exists (Jensen and Weng, 1994 [13]). For this reason, I compared the first factor extracted via PCA to the first factors using minimum residuals, weighted least squares, generalized least squares, principle axis factoring and maximum likelihood estimation"

I don't see how the similarity of factor loadings based on different extraction methods can tell us anything about the existence of a general factor. You don't say what a 'general factor' is, but I assume it means a factor with all-positive indicator loadings regardless of extraction method. There is no such factor in your data, as indicated by the many negative loadings listed in the Appendix. Even if you reverse coded the variables so that higher values on all variables would have positive valence (e.g, "Adequate nourishment" instead of "Undernourishment"), which I think would be a good thing to do, there'd still be negative loadings on the first factor/PC (e.g., suicide rate).

A general factor need not be a perfectly general factor, just a very large one. As you mention, there are a few variables that load in the 'wrong' direction which is not found in the analysis of cognitive data.

A general factor would be disproved if there were many variables that didn't load on the first factor (i.e. with very low loadings, say <.10). This isn't the case with these national data: the mean absolute loading was high (.6-.65).

Most factor analyses will produce first factors that are substantially larger than the subsequent ones. What is the standard for "a very large" factor? The g factor is a general factor because all cognitive abilities do load positively on it.

More than a third of the loadings on the SPI factor are negative, so it's not a general factor. Similarly, if you had a cognitive test battery where some subtests loaded positively on the first unrotated factor and other subtests loaded negatively on it, there'd be no general factor, no matter how much variance the first factor explained.

Of course, in the case of the SPI factor the negative loadings are mostly an artefact of your failing to reverse code the negatively valenced variables. It's possible that this decision has some effect on all loadings.
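A quick way to check what reverse coding does to the first-component loadings is to run the extraction both ways. The sketch below uses simulated data and a plain first principal component from `eigen()`; the variable names and the helper `first_pc_loadings()` are hypothetical, and the result for this toy PCA case need not carry over to the real data or to other extraction methods.

```r
# Illustrative only: two positively and two negatively valenced indicators
set.seed(2)
n <- 150
s <- rnorm(n)                                # hypothetical common source
d <- data.frame(
  good1 =  0.8 * s + rnorm(n, sd = 0.6),
  good2 =  0.7 * s + rnorm(n, sd = 0.7),
  bad1  = -0.8 * s + rnorm(n, sd = 0.6),     # e.g. an "undernourishment"-type variable
  bad2  = -0.6 * s + rnorm(n, sd = 0.8)
)

# Loadings on the first principal component of the correlation matrix
first_pc_loadings <- function(x) {
  e <- eigen(cor(x), symmetric = TRUE)
  l <- e$vectors[, 1] * sqrt(e$values[1])
  if (l[1] < 0) l <- -l                      # orient so the first variable loads positively
  setNames(l, colnames(x))
}

raw <- first_pc_loadings(d)

# Reverse code the negatively valenced variables
d_rev <- d
d_rev$bad1 <- -d_rev$bad1
d_rev$bad2 <- -d_rev$bad2
recoded <- first_pc_loadings(d_rev)

print(round(rbind(raw, recoded), 2))
print(mean(abs(raw)))                        # mean absolute loading
```

Printing the two rows side by side shows which loadings change sign and whether their magnitudes move.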

Quote:Quote:8) How did you compute the correlations between factor loadings? The congruence coefficient rather than Pearson's r should be used: http://en.wikipedia.org/wiki/Congruence_coefficient (Or did you use factor scores?)

Congruence coefficient has some bias (e.g.). I used Pearson r.

I cannot access that paper, but whatever bias the CC has is minuscule compared to the bias that Pearson's r can produce when it is used to compare factor loadings. Look at the example on p. 100 in The g Factor by Jensen. The CC is the standard method for comparing factor loadings in EFA, and you should use it.
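To illustrate the difference, here is a toy comparison in base R. The two loading vectors are invented (they are not Jensen's example and not from the paper), and `congruence()` is a hypothetical helper implementing the usual formula, the sum of cross-products divided by the square root of the product of the sums of squares.

```r
# Congruence coefficient between two loading vectors
congruence <- function(a, b) {
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Two made-up sets of loadings for the same five variables:
# all are high and positive, differing only by small fluctuations
a <- c(0.70, 0.72, 0.68, 0.71, 0.69)
b <- c(0.69, 0.70, 0.71, 0.68, 0.72)

cc <- congruence(a, b)   # sensitive to overall pattern and magnitude
r  <- cor(a, b)          # sensitive only to deviations from each vector's mean

print(c(congruence = cc, pearson = r))
```

Here the two factors are practically identical, and the congruence coefficient is close to 1, yet Pearson's r comes out negative because it looks only at the tiny deviations around each vector's mean.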

Quote:One cannot compare loadings (easily) when using subset x whole/subset analysis, for those I used scores.

State explicitly in the paper what you are doing. Computing correlations between factors can be done in many different ways (e.g., factor scores, congruence coefficient, CFA latent factor correlations).

Quote:Quote:9) Sections 4-6 have nice graphs, but I don't see their purpose. The fact that the correlation between two linear combinations of correlated elements gets higher the more there are shared elements is self-evident. The graphs might be of use if there was a practical need to estimate the S factor using only a limited number of components, but I don't see why anyone would want to do that.

I assume that the results in sections 4-6 are based on correlations of factor/component scores, but it's nowhere specified. Are the factors extracted with replacement?

Are the results from all the 54 components based on PCA? If so, the higher correlations with PCA components could be artefactual, due to common method variance.

As you can see, there have been some methodological studies concerning the interpretation of loadings from small subsets of variables, as well as correlations between factors from different methods. These results are clearly relevant to these matters.

If you interpret a factor in a realist manner, as g is usually interpreted, then such studies are relevant, but your S factor seems completely artefactual.

Quote:I don't know what you mean "by replacement". I used the regression method to get scores. I used the fa() function from the psych package. http://www.inside-r.org/packages/cran/psych/docs/fa

Ignore the replacement issue, with 1000 iterations you of course have to reuse the variables.

Quote:The result from all the 54 variables is based on the same method used to extract from the subset. So, when using maximum likelihood (ML) to extract S from the subset of vars, the comparison S is also extracted via ML.

The subset x subset analyses are only based on PCA, however. I can replicate these with the other 5 methods if necessary. I think the differences will be slight.

I think that's OK as it is, but, again, you should make it explicit in the paper.

Quote:Quote:10) MCV correlations of 0.99 are unusual in my experience. I'd like to see the scatter plots to ascertain that there's a linear association across the range. It's possible that the high correlations are due to outliers. What happens to the MCV correlations if you reverse score the variables with negative valence?

The first thing I did when working this out was to plot them. You can do this yourself with the plot() command (the code is around lines 293-327). They are indeed very linear. I have attached one plot for each with national IQs. They are almost the same with Altinok's.

I don't know what you mean with the other question.

OK, it's very linear. I'd include at least one of the scatter plots in the paper.

The other question refers to the fact that the variables with negative loadings may have an outsized influence on the MCV correlations because they somewhat artificially increase the range of values analyzed.
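The range issue can be sketched with simulated data. Everything below is hypothetical: the latent `s`, the criterion `iq`, the loadings in `w`, and the helper `mcv()`, which correlates the vector of first-component loadings with the vector of each indicator's correlation with the criterion. It is a toy version of the method of correlated vectors, not the paper's code.

```r
set.seed(3)
n  <- 300
s  <- rnorm(n)                           # hypothetical latent S
iq <- 0.8 * s + rnorm(n, sd = 0.6)       # hypothetical criterion (national IQ)
w  <- c(0.9, 0.8, 0.7, 0.6, -0.9, -0.7, -0.5)   # assumed true loadings
x  <- sapply(w, function(b) b * s + rnorm(n, sd = sqrt(1 - b^2)))
colnames(x) <- paste0("v", seq_along(w))

# Toy MCV: correlate loadings with the indicators' correlations with the criterion
mcv <- function(x, criterion) {
  e <- eigen(cor(x), symmetric = TRUE)
  loadings <- e$vectors[, 1] * sqrt(e$values[1])
  if (loadings[1] < 0) loadings <- -loadings   # orient on the first variable
  r_crit <- as.vector(cor(x, criterion))
  list(loadings = loadings, r_crit = r_crit, r = cor(loadings, r_crit))
}

raw <- mcv(x, iq)

# Reverse score the negatively valenced indicators and repeat
x_rev <- x
x_rev[, w < 0] <- -x_rev[, w < 0]
recoded <- mcv(x_rev, iq)

print(c(mcv_raw = raw$r, mcv_recoded = recoded$r))
print(rbind(range_raw = range(raw$loadings),
            range_recoded = range(recoded$loadings)))
```

With mixed signs the loadings span both sides of zero, so the MCV correlation is computed over a much wider range than after reverse scoring; comparing the two values shows how much of a very high MCV correlation depends on that.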

Quote:Quote:11) "The analyses carried out in this paper suggest that the S factor is not quite like g. Correlations between the first factor from different subsets did not reach unity, even when extracted from 10 non-overlapping randomly picked tests (mean r’s = .874 and .902)."

Your analysis is based on different sets of observed variables, while the g studies that found perfect or nearly perfect correlations were based on analyses of latent factors which contain no error or specific variance, or on analyses of the same data set with different methods. So the results aren't comparable.

I am referring to the two Johnson studies which found that g factors extracted from different IQ batteries without overlapping tests did reach near-unity (1 or .99). This wasn't the case for these data. One problem in interpretation is that IQ batteries are deliberately put together so as to sample a broad spectrum of ability variance. Randomly chosen subsets of such tests are not.

The Johnson studies used CFA and latent factors to test if g factors from different test batteries were equivalent. You did not use CFA and latent factors, so you're comparing apples and oranges.

Quote:The best way to proceed is to obtain the MISTRA data and repeat my analyses on them. So it comes down to whether they want to share the data or not.

The MISTRA IQ correlation matrices have been published: http://www.newtreedesign2.com/isironline...RAData.pdf

Quote:Quote:12) The biggest problem in the paper is that it seems to be pretty pointless. Yes, most indicators of national well-being are highly correlated with each other and with national IQ, which means that the first PC from national well-being data must be highly correlated with national IQ, but so what? That was obvious from the outset.

What is the S factor? Do you believe it is a unitary causal factor influencing socioeconomic variables? That interpretation would give some meaning to the paper, but I think it's not a very promising idea. Socioeconomic status is normally thought of as a non-causal index reflecting the influence of various factors. Clark speaks of a "social competence" that is inherited across generations, but I don't think he views it as a unitary causal force but rather as a composite of different influences (such as IQ and personality).

I think national IQs and national socioeconomic indices are so thoroughly causally intermingled that attempting to say anything about causes and effects would require longitudinal data.

Strangely, this was the criticism that Gould also offered of the g factor (cf. Davis review). I disagree: the results are NOT obvious. Neither g, G, s nor S is obvious. Especially not the MCV results.

I agree that the MCV results are perhaps of some interest.

It is well known that nations that are richer also have better health care, less malnutrition, higher life expectancy, better educational systems, better infrastructure, etc., so the S factor is unsurprising. The g factor is surprising because many assume that various domains of intelligence are strongly differentiated. Moreover, the realist interpretation of g does not rely just on the positive manifold, but on many completely independent lines of evidence (e.g., from multivariate behavioral genetics). So my criticisms are quite unlike Gould's.

What is this general socioeconomic factor? Why is it worth studying? Is it a formative or reflective factor?

Quote:Quote:14) I couldn't find a correlation matrix of all the variables in the supplementary material. It would be useful (as an Excel file).

They would be very large (54x54 and 42x42). Anyone curious can easily obtain it in R by typing:

Code:
`write.csv(cor(y), file="y_matrix.csv")`
`write.csv(cor(z), file="z_matrix.csv")`

For your ease, I have attached both files.

There is all kinds of stuff in your supplemental materials, but if someone wants to replicate or extend your analysis, correlation matrices are what they need. I do think it's unfortunate, though, that you have not reversed the scoring of the negative variables.

Quote:Quote:15) PCA and factor analysis are really different methods, and PCA shouldn't be called factor analysis.

Semantics. Some authors use "factor analysis" as a general term as I do in this paper. Others prefer "dimensional reduction method" or "latent trait analysis" or some other term and then limit "factor analysis" to non-PCA methods.

Methodologists are adamant about the fact that PCA and FA are quite different animals. But I'm not going to insist on this.