
[ODP] The international general socioeconomic factor: Factor analyzing international

#21
I agree with you concerning the countries; sometimes naming them is awful. I always hated to work with national data. I once (2 years ago) gathered a large amount of data, notably GDP, economic freedom indices, and savings, for all years and diverse sources, but screwed up at the end with the rows that don't match. I gave up completing the data set since then. It's a lot of stress and I don't need that. However, can you send me (by mail, preferably) the data sets before they are matched using the procedure you describe? I just want to see if the matching/merging are correct.

Quote:Moving one step up, one of us showed that among 71 Danish immigrant groups ranked on 4 different measures of socioeconomic variables (crime, use of social benefits, income, education attainment) there was a large (40% of variance explained) general socioeconomic factor[2].

I have a problem with that. When you say "one of us" you refer to [2], a study by Kirkegaard and Fuerst. That seems confusing to me.

Quote:The general mental ability factor at the individual level has been termed "g" (often italicized "g"[3]), while the national-level group equivalent has been termed "G" ("big g factor").[4] Keeping in line with this terminology, one might refer to the general socioeconomic factor at the individual level as "s factor" and the group level version "S factor" (or "big s").

In that case, shouldn't it be "national-level" instead of "group-level"?

Quote:He mentions that in 26% of a sample of studies using principal components in PsychINFO, the case-to-var ratio was between 2 and 5 as it is with our two datasets.

Do you know what the recommended ratio is? I think you should say it explicitly, as it would help some people lost here (like me).

Since you use KMO, why not add a little sentence about what it is? See Andy Field's book "Discovering Statistics Using SPSS: Introducing Statistical Methods" (p. 647).

Another alternative is to use the Kaiser–Meyer–Olkin measure of sampling adequacy (KMO) (Kaiser, 1970). The KMO can be calculated for individual and multiple variables and represents the ratio of the squared correlation between variables to the squared partial correlation between variables. The KMO statistic varies between 0 and 1. A value of 0 indicates that the sum of partial correlations is large relative to the sum of correlations, indicating diffusion in the pattern of correlations (hence, factor analysis is likely to be inappropriate). A value close to 1 indicates that patterns of correlations are relatively compact and so factor analysis should yield distinct and reliable factors. Kaiser (1974) recommends accepting values greater than 0.5 as barely acceptable (values below this should lead you to either collect more data or rethink which variables to include). Furthermore, values between 0.5 and 0.7 are mediocre, values between 0.7 and 0.8 are good, values between 0.8 and 0.9 are great and values above 0.9 are superb (Hutcheson & Sofroniou, 1999).

Quote:Since I found that regardless of method and dataset used, the first factor was a general factor accounting for about 40-47% of the variance, it was interesting to know how many components one needed to measure it well. To find out, I sampled subsets of components at random from the datasets, extracted the first factor, and then correlated this with the first factor using all the components. I repeated the sampling 1000 times to reduce sampling error to almost zero.

I do not understand what's in bold.

Quote:Often authors will also argue for a causal connection from national IQ/G to country-level variables. The typical example of this is wealth (e.g. [17, 18, 19, 20, 21]). Since I know that g causes greater wealth at the individual level, and that nations can generally be considered a large group of individuals, it would be very surprising, though not impossible, if there was no causation at the group level as well.

The word "causation" is too strong. Given the annotated references, only 20 and 21 talk a little bit about causation, and it's not even clear there is strong evidence for this pattern of causation. Instead, there is evidence that the causation wealth->IQ is not well established. Perhaps my best conclusion is that it would be best to argue, at least for now, that we don't have strong evidence for either of these two patterns of causation (i.e., wealth causes IQ or the reverse). You can argue there is probably some very indirect suggestion that IQ causes wealth more than the reverse, as your articles on immigration, notably with John, seem to show this, but as I said, it's very indirect evidence.

Quote:If population differences in G is a main cause of national differences in many socioeconomic areas, then aggregating measures should increase the correlation with G, since measurement specificity averages out.

Even if G is not causal here, aggregation would also improve the correlation, no?

I think table 2 should rather be made like table 1.

Not related, but can you tell me how you managed to generate the nice graphs in figures 2-5? Concerning figures 2-5, again, have you contacted Major and his co-authors to ask for their opinion? I bet he (they) will be very interested.
#22
Hi Meng Hu,

Quote:I agree with you concerning the countries; sometimes naming them is awful. I always hated to work with national data. I once (2 years ago) gathered a large amount of data, notably GDP, economic freedom indices, and savings, for all years and diverse sources, but screwed up at the end with the rows that don't match. I gave up completing the data set since then. It's a lot of stress and I don't need that. However, can you send me (by mail, preferably) the data sets before they are matched using the procedure you describe? I just want to see if the matching/merging are correct.

It is easy to verify that it works. You can just create some datasets and merge them yourself. The code is open source, so it is available to anyone who can figure out how to install Python (use the Anaconda package).

You can also merge some actual datasets, e.g., different measures of inequality. Wikipedia has lots of these national rankings for all kinds of stuff.

https://en.wikipedia.org/wiki/List_of_in...l_rankings
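To see the matching problem concretely, here is a minimal, hypothetical sketch (not the actual open-source merger tool, whose code is in the repository) of outer-merging two country-keyed datasets and flagging the rows that fail to match; all names and values are made up for illustration:

```python
# Sketch of merging two national datasets keyed by country name and
# reporting rows that fail to match (names/values are illustrative).

def merge_by_country(a, b):
    """Outer-merge two {country: value} dicts, flagging mismatches."""
    merged, unmatched = {}, []
    for country in sorted(set(a) | set(b)):
        if country in a and country in b:
            merged[country] = (a[country], b[country])
        else:
            unmatched.append(country)  # e.g. "USA" vs "United States"
    return merged, unmatched

gdp = {"Denmark": 60, "United States": 65, "Japan": 40}
iq = {"Denmark": 98, "USA": 98, "Japan": 105}
merged, unmatched = merge_by_country(gdp, iq)
print(merged)     # only exact name matches survive
print(unmatched)  # ['USA', 'United States'] reveal the naming problem
```

A real merger then needs a table of country-name synonyms to reconcile the unmatched entries, which is exactly the tedious part the procedure in the paper automates.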

Quote: I have a problem with that. When you say "one of us" you refer to [2], a study by Kirkegaard and Fuerst. That seems confusing to me.

It was because when I wrote the paper, I had imagined that I would have a co-author. I ended up doing all the analyses myself, so there is no co-author. For that reason, some of the language was off, e.g. using "we", "our". You found another remnant of that. I have fixed it.

Quote:In that case, shouldn't it be "national-level" instead of "group-level"?

The group level versions are capitalized. The national-level versions are a subset of group-level versions. For instance, in my previous paper on immigrant groups in Denmark, it was group-level but not national-level.

Quote: Do you know what the recommended ratio is? I think you should say it explicitly, as it would help some people lost here (like me).

Since you use KMO, why not add a little sentence about what it is? See Andy Field's book "Discovering Statistics Using SPSS: Introducing Statistical Methods" (p. 647).

Another alternative is to use the Kaiser–Meyer–Olkin measure of sampling adequacy (KMO) (Kaiser, 1970). The KMO can be calculated for individual and multiple variables and represents the ratio of the squared correlation between variables to the squared partial correlation between variables. The KMO statistic varies between 0 and 1. A value of 0 indicates that the sum of partial correlations is large relative to the sum of correlations, indicating diffusion in the pattern of correlations (hence, factor analysis is likely to be inappropriate). A value close to 1 indicates that patterns of correlations are relatively compact and so factor analysis should yield distinct and reliable factors. Kaiser (1974) recommends accepting values greater than 0.5 as barely acceptable (values below this should lead you to either collect more data or rethink which variables to include). Furthermore, values between 0.5 and 0.7 are mediocre, values between 0.7 and 0.8 are good, values between 0.8 and 0.9 are great and values above 0.9 are superb (Hutcheson & Sofroniou, 1999).

The answer is: there are lots of different recommendations. Read the source I refer to; you will like it.

[11] Nathan Zhao. The minimum sample size in factor analysis, 2009. URL https://www.encorewiki.org/display/~nzha...r+Analysis

I have added a short explanation of KMO and Bartlett's test.

My rewording:

I performed principal components analyses (PCA) on both the reduced and the means-imputed datasets to examine the effect of the procedure. The correlation of factor loadings was 0.996, indicating that the procedure did not alter the structure of the data much. I performed KMO tests (a measure of sampling adequacy) on both samples, which showed that reducing the sample reduced the KMO (0.899 to 0.809). In comparison, the KMO in the DR dataset was 0.884. All values are considered ‘meritorious’.\cite[p. 225]{hutcheson1999multivariate}

Bartlett's test (which tests whether the data are suitable for factor analysis) was extremely significant in all three datasets (p$<$0.00001).
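For readers who want to check KMO figures like the ones above, the statistic can be computed directly from a correlation matrix via the anti-image (partial) correlations. A minimal numpy sketch; the matrix below is made up for illustration, not taken from the paper's datasets:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure from a correlation matrix R.
    Partial correlations come from the scaled inverse of R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)   # anti-image correlations
    np.fill_diagonal(partial, 0.0)
    r2 = R.copy()
    np.fill_diagonal(r2, 0.0)
    return (r2 ** 2).sum() / ((r2 ** 2).sum() + (partial ** 2).sum())

# Three positively intercorrelated indicators give a middling KMO
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
print(round(kmo(R), 3))  # ~0.66, 'mediocre' by the Kaiser/Field labels
```

Values near 1 mean the partial correlations are small relative to the raw ones, i.e. the correlation pattern is compact and factorable.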


Quote:I do not understand what's in bold.

It means that I did the random sampling 1000 times in my R code. In other words: 1000 times, I picked N random variables, performed a factor analysis on the data with those variables, and then correlated the scores from the first factor from this analysis with those found from the analysis of all variables.
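The procedure can be sketched in a few lines. This is not the paper's R code (which uses psych's fa()); it is a simulated Python/numpy analogue that uses first-principal-component scores as a stand-in for factor scores:

```python
import numpy as np

rng = np.random.default_rng(0)

def pc1_scores(X):
    """Scores on the first principal component of column-standardized X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

# Simulated data: 100 'countries', 20 indicators sharing one common factor
n, p = 100, 20
f = rng.standard_normal(n)
X = np.outer(f, np.full(p, 0.7)) + 0.7 * rng.standard_normal((n, p))

full = pc1_scores(X)
rs = []
for _ in range(1000):  # 1000 resamples, as described in the post
    cols = rng.choice(p, size=5, replace=False)
    r = np.corrcoef(full, pc1_scores(X[:, cols]))[0, 1]
    rs.append(abs(r))  # the sign of a principal component is arbitrary
print(round(float(np.mean(rs)), 2))
```

With indicators this saturated with the common factor, the mean correlation between the subset factor and the full factor comes out high, which is the pattern the paper's sections 4-6 quantify on real data.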

Quote: The word "causation" is too strong. Given the annotated references, only 20 and 21 talk a little bit about causation, and it's not even clear there is strong evidence for this pattern of causation. Instead, there is evidence that the causation wealth->IQ is not well established. Perhaps my best conclusion is that it would be best to argue, at least for now, that we don't have strong evidence for either of these two patterns of causation (i.e., wealth causes IQ or the reverse). You can argue there is probably some very indirect suggestion that IQ causes wealth more than the reverse, as your articles on immigration, notably with John, seem to show this, but as I said, it's very indirect evidence.

It is well established at the personal level that the causation is g→wealth (income), not so much the other way around. At the national level, it is more contentious. I don't have to argue for that here, since I'm not making such a claim in the paper.

Quote:Even if G is not causal here, aggregation would also improve the correlation, no?

Yes. It is a statistical phenomenon (cf. the Spearman-Brown formula).
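For reference, the Spearman-Brown formula predicts the reliability of an aggregate of k parallel measures from their average intercorrelation r; the numbers below are purely illustrative:

```python
def spearman_brown(r, k):
    """Reliability of a composite of k parallel measures, each with
    average intercorrelation r (the Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)

# Aggregating four indicators with average intercorrelation .4
print(round(spearman_brown(0.4, 4), 3))  # 0.727
```

So aggregation raises the correlation with whatever the indicators share, causal or not, because indicator-specific variance averages out.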

Quote:I think table 2 should rather be made like table 1.

What do you mean? There is a table called "Table 1". It is in section 5.

Quote:Not related, but can you tell me how you managed to generate the nice graphs in figures 2-5? Concerning figures 2-5, again, have you contacted Major and his co-authors to ask for their opinion? I bet he (they) will be very interested.

Yes. Look in the file Loadings analysis.ods. It is just made with LibreOffice.

I have not contacted Major et al. His email is J.T.Major@ed.ac.uk. Perhaps we should contact him. He could be an external reviewer (journal policy is that authors are allowed to recruit one external reviewer).

---

Here's a new version. I have made the small change noted above and fixed some other small issues. Due to a wrong setting in LibreOffice, the previous results figures were a little bit off. There were some slight errors with the names of some references (involving use of italics in their titles).


Attached Files
.pdf   international_socioeconomic_general.pdf (Size: 358.13 KB / Downloads: 487)
.zip   The International general socioeconomic factor.zip (Size: 1.41 MB / Downloads: 402)
#23
I contacted Major. No response so far.
#24
1) I'd like to see more details of the factor/PC analyses. Were the sub-component intercorrelations explained by a single factor by the usual standards (e.g., only one factor with eigenvalue>1)? If not, how many other factors were there, can you give a substantive interpretation to them, and are they correlated with national IQ? KMO and Bartlett's test are pretty superfluous and aren't usually reported; I would mention them in a footnote only.

2) "individuals have a general socioeconomic factor"

This language is confusing. 'Factor' refers to a source of variance among individuals, so individuals cannot have factors. Individuals are located at different points on a factor, or have factor scores.

3) "national measures of country well-doing or well-being"

excise "well-doing"

4) "(see review national in [5]."

reword that

5) Figure 1 describes the structure of the SPI, why is there no corresponding figure describing the DP?

6) "principle components analyses", "principle axis factoring"

principal, not principle

7) "In some cases PCA can show a general factor where none exists (Jensen and Weng, 1994 [13]). For this reason, I compared the first factor extracted via PCA to the first factors using minimum residuals, weighted least squares, generalized least squares, principle axis factoring and maximum likelihood estimation"

I don't see how the similarity of factor loadings based on different extraction methods can tell us anything about the existence of a general factor. You don't say what a 'general factor' is, but I assume it means a factor with all-positive indicator loadings regardless of extraction method. There is no such factor in your data, as indicated by the many negative loadings listed in the Appendix. Even if you reverse coded the variables so that higher values on all variables would have positive valence (e.g., "Adequate nourishment" instead of "Undernourishment"), which I think would be a good thing to do, there'd still be negative loadings on the first factor/PC (e.g., suicide rate).

8) How did you compute the correlations between factor loadings? The congruence coefficient rather than Pearson's r should be used: http://en.wikipedia.org/wiki/Congruence_coefficient (Or did you use factor scores?)

9) Sections 4-6 have nice graphs, but I don't see their purpose. The fact that the correlation between two linear combinations of correlated elements gets higher the more there are shared elements is self-evident. The graphs might be of use if there was a practical need to estimate the S factor using only a limited number of components, but I don't see why anyone would want to do that.

I assume that the results in sections 4-6 are based on correlations of factor/component scores, but it's nowhere specified. Are the factors extracted with replacement?

Are the results from all the 54 components based on PCA? If so, the higher correlations with PCA components could be artefactual, due to common method variance.

10) MCV correlations of 0.99 are unusual in my experience. I'd like to see the scatter plots to ascertain that there's a linear association across the range. It's possible that the high correlations are due to outliers. What happens to the MCV correlations if you reverse score the variables with negative valence?

11) "The analyses carried out in this paper suggest that the S factor is not quite like g. Correlations between the first factor from different subsets did not reach unity, even when extracted from 10 non-overlapping randomly picked tests (mean r’s = .874 and .902)."

Your analysis is based on different sets of observed variables, while the g studies that found perfect or nearly perfect correlations were based on analyses of latent factors which contain no error or specific variance, or on analyses of the same data set with different methods. So the results aren't comparable.

12) The biggest problem in the paper is that it seems to be pretty pointless. Yes, most indicators of national well-being are highly correlated with each other and with national IQ, which means that the first PC from national well-being data must be highly correlated with national IQ, but so what? That was obvious from the outset.

What is the S factor? Do you believe it is a unitary causal factor influencing socioeconomic variables? That interpretation would give some meaning to the paper, but I think it's not a very promising idea. Socioeconomic status is normally thought of as a non-causal index reflecting the influence of various factors. Clark speaks of a "social competence" that is inherited across generations, but I don't think he views it as a unitary causal force but rather as a composite of different influences (such as IQ and personality).

I think national IQs and national socioeconomic indices are so thoroughly causally intermingled that attempting to say anything about causes and effects would require longitudinal data.

13) "It is worth noting that group-level correlations need not be the same or even in the same direction as individual-level correlations. In the case of suicide, there does appear to be a negative correlation at the individual level as well."

This means that the s and S factors aren't the same. Why is suicide an indicator of socioeconomic status anyway?

14) I couldn't find a correlation matrix of all the variables in the supplementary material. It would be useful (as an Excel file).

15) PCA and factor analysis are really different methods, and PCA shouldn't be called factor analysis.
#25
Hi Dalliard. Thank you for a thorough review.

Quote: 1) I'd like to see more details of the factor/PC analyses. Were the sub-component intercorrelations explained by a single factor by the usual standards (e.g., only one factor with eigenvalue>1)? If not, how many other factors were there, can you give a substantive interpretation to them, and are they correlated with national IQ? KMO and Bartlett's test are pretty superfluous and aren't usually reported; I would mention them in a footnote only.

You can view all the details in the R code file.

The number of factors to extract was in all cases set to 1 ("nfactors=1" in the code), so no criterion for determining the number of factors to keep was used.

I report KMO and Bartlett's because a reviewer (Piffer) had previously requested that I include them. Clearly I cannot satisfy both of you. It seems best just to include them. They don't take up much space.

Quote: 2) "individuals have a general socioeconomic factor"

This language is confusing. 'Factor' refers to a source of variance among individuals, so individuals cannot have factors. Individuals are located at different points on a factor, or have factor scores.

What I meant is that if one analyses the data at the individual level, one will also find a general socioeconomic factor (i.e. s factor).

Changed text to:
Gregory Clark argued that there is a general socioeconomic factor which underlies their socioeconomic performance at the individual-level.\cite{clark2014}

Quote: 3) "national measures of country well-doing or well-being"

excise "well-doing"

I prefer to keep both because these variables measure all kinds of things, some related to well-being (e.g. health, longevity) others to well-doing (income, number of universities).

Quote: 4) "(see review national in [5]."

reword that

Changed text to:
Previous studies have correlated some of these with national IQs but not in a systematic manner (see review in \cite{lynn2012intelligence}).

Quote: 5) Figure 1 describes the structure of the SPI, why is there no corresponding figure describing the DP?

It did not seem necessary. It is less complicated and the user can find it in the manual referenced. Do you want me to include an overview of it?

http://democracyranking.org/?page_id=590

As far as I can tell, they have 6 'dimensions', which have the weights 50, 10, 10, 10, 10, 10. I'm not sure what they do within each 'dimension'; probably they average the indexes. As with the DR and SPI, the HDI also has some idiosyncratic way of combining its variables into a higher construct. For reference, the HDI is based on a geometric mean (a what?) of three indicators.
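For what it's worth, a geometric mean is just the n-th root of the product of the n subindices; a tiny sketch with made-up values standing in for the HDI's three subindices:

```python
# Geometric mean of n subindices: the n-th root of their product.
# The three values below are invented for illustration only.

def geometric_mean(xs):
    prod = 1.0
    for x in xs:
        prod *= x
    return prod ** (1 / len(xs))

health, education, income = 0.9, 0.8, 0.7
print(round(geometric_mean([health, education, income]), 3))  # 0.796
```

Unlike an arithmetic mean, the geometric mean penalizes imbalance: a country cannot fully offset a very low subindex with a very high one.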

Quote:6) "principle components analyses", "principle axis factoring"

principal, not principle

Fixed.

Quote: 7) "In some cases PCA can show a general factor where none exists (Jensen and Weng, 1994 [13]). For this reason, I compared the first factor extracted via PCA to the first factors using minimum residuals, weighted least squares, generalized least squares, principle axis factoring and maximum likelihood estimation"

I don't see how the similarity of factor loadings based on different extraction methods can tell us anything about the existence of a general factor. You don't say what a 'general factor' is, but I assume it means a factor with all-positive indicator loadings regardless of extraction method. There is no such factor in your data, as indicated by the many negative loadings listed in the Appendix. Even if you reverse coded the variables so that higher values on all variables would have positive valence (e.g, "Adequate nourishment" instead of "Undernourishment"), which I think would be a good thing to do, there'd still be negative loadings on the first factor/PC (e.g., suicide rate).

A general factor need not be a perfectly general factor, just a very large one. As you mention, there are a few variables that load in the 'wrong' direction, which is not found in the analysis of cognitive data.

A general factor would be disproved if there were a lot of variables that didn't load on the first factor (i.e., with very low loadings, say <.10). This isn't the case with these national data; the mean absolute loading was high (.6-.65).

Quote:8) How did you compute the correlations between factor loadings? The congruence coefficient rather than Pearson's r should be used: http://en.wikipedia.org/wiki/Congruence_coefficient (Or did you use factor scores?)

Congruence coefficient has some bias (e.g.). I used Pearson r.

I found an error. I had forgotten to add PCA to the comparison with the 5 other methods. It made little difference.

One cannot (easily) compare loadings when using subset x whole or subset x subset analysis; for those I used scores.

I compared scores in the full datasets before. They have a mean around .99 for both datasets using Pearson. I have now written more code to compare the loadings too, also with Pearson's. They are also .99 in both datasets.

Clearly, if the score correlations are very high, the loading correlations will also be high, and vice versa. So in that sense, doing both analyses is unnecessary.

I doubt using Spearman's or CC instead will change these results much. They are almost identical for every method in the full datasets.

However, I used the CC as requested. Results are rounded to three digits. I used this function.

Results:
Code:
> #for SPI
> factor.congruence(list(y_all.loadings),digits=3)
       PC1   MR1  WLS1  GLS1   PA1   ML1
PC1  1.000 0.997 1.000 1.000 1.000 0.997
MR1  0.997 1.000 0.997 0.997 0.997 1.000
WLS1 1.000 0.997 1.000 1.000 1.000 0.997
GLS1 1.000 0.997 1.000 1.000 1.000 0.997
PA1  1.000 0.997 1.000 1.000 1.000 0.997
ML1  0.997 1.000 0.997 0.997 0.997 1.000
> #for DR
> factor.congruence(list(z_all.loadings),digits=3)
       PC1   MR1  WLS1  GLS1   PA1   ML1
PC1  1.000 0.997 1.000 1.000 1.000 1.000
MR1  0.997 1.000 0.997 0.997 0.997 0.998
WLS1 1.000 0.997 1.000 1.000 1.000 1.000
GLS1 1.000 0.997 1.000 1.000 1.000 1.000
PA1  1.000 0.997 1.000 1.000 1.000 1.000
ML1  1.000 0.998 1.000 1.000 1.000 1.000
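Tucker's congruence coefficient, as computed by factor.congruence above, is just the uncentered cosine of two loading vectors. A stand-alone sketch with illustrative loading vectors (not taken from the paper):

```python
import math

def congruence(x, y):
    """Tucker's congruence coefficient: cosine similarity of two
    loading vectors (unlike Pearson's r, loadings are not centered)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

pc1 = [0.80, 0.70, 0.60, -0.30]   # illustrative loading vectors
ml1 = [0.78, 0.72, 0.58, -0.25]
print(round(congruence(pc1, ml1), 3))  # 0.999
```

Because loadings are not mean-centered, the CC rewards agreement in both pattern and overall level, which is why it is preferred over Pearson's r for comparing factor solutions.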


Quote:9) Sections 4-6 have nice graphs, but I don't see their purpose. The fact that the correlation between two linear combinations of correlated elements gets higher the more there are shared elements is self-evident. The graphs might be of use if there was a practical need to estimate the S factor using only a limited number of components, but I don't see why anyone would want to do that.

I assume that the results in sections 4-6 are based on correlations of factor/component scores, but it's nowhere specified. Are the factors extracted with replacement?

Are the results from all the 54 components based on PCA? If so, the higher correlations with PCA components could be artefactual, due to common method variance.

As you can see, there have been some methodological studies concerning the interpretation of loadings from small subsets of variables, as well as correlations between factors from different methods. These results are clearly relevant to those matters.

I changed the text in section 4 to:
Since I found that regardless of method and dataset used, the first factor was a general factor accounting for about 40-47\% of the variance, it was interesting to know how many components one needed to measure it well. To find out, I sampled subsets of components at random from the datasets, extracted the first factor, and then correlated the scores from it with the scores of the first factor using all the components. I repeated the sampling 1000 times to reduce sampling error to almost zero. Since there has recently been interest in comparing g factors from different factor extraction methods, I used the 6 different methods mentioned before.

I don't know what you mean by "replacement". I used the regression method to get scores. I used the fa() function from the psych package. http://www.inside-r.org/packages/cran/psych/docs/fa

The result from all the 54 variables is based on the same method used to extract from the subset. So, when using maximum likelihood (ML) to extract S from the subset of vars, the comparison S is also extracted via ML.

The subset x subset analyses are only based on PCA, however. I can replicate these with the other 5 methods if necessary. I think the differences will be slight.

Quote:10) MCV correlations of 0.99 are unusual in my experience. I'd like to see the scatter plots to ascertain that there's a linear association across the range. It's possible that the high correlations are due to outliers. What happens to the MCV correlations if you reverse score the variables with negative valence?

The first thing I did when working this out was to plot them. You can do this yourself with the plot() command (the code is around lines 293-327). They are indeed very linear. I have attached one plot for each with national IQs. They are almost the same with Altinok's.

I don't know what you mean with the other question.

Quote:11) "The analyses carried out in this paper suggest that the S factor is not quite like g. Correlations between the first factor from different subsets did not reach unity, even when extracted from 10 non-overlapping randomly picked tests (mean r’s = .874 and .902)."

Your analysis is based on different sets of observed variables, while the g studies that found perfect or nearly perfect correlations were based on analyses of latent factors which contain no error or specific variance, or on analyses of the same data set with different methods. So the results aren't comparable.

I am referring to the two Johnson studies which found that g factors extracted from different IQ batteries without overlapping tests did reach near-unity (1 or .99). This wasn't the case for these data. One problem in interpretation is that IQ batteries are deliberately put together so as to sample a broad spectrum of ability variance. Randomly chosen subsets of such tests are not. The best way to proceed would be to obtain the MISTRA data and repeat my analyses on them. So it comes down to whether they want to share the data or not.

Quote:12) The biggest problem in the paper is that it seems to be pretty pointless. Yes, most indicators of national well-being are highly correlated with each other and with national IQ, which means that the first PC from national well-being data must be highly correlated with national IQ, but so what? That was obvious from the outset.

What is the S factor? Do you believe it is a unitary causal factor influencing socioeconomic variables? That interpretation would give some meaning to the paper, but I think it's not a very promising idea. Socioeconomic status is normally thought of as a non-causal index reflecting the influence of various factors. Clark speaks of a "social competence" that is inherited across generations, but I don't think he views it as a unitary causal force but rather as a composite of different influences (such as IQ and personality).

I think national IQs and national socioeconomic indices are so thoroughly causally intermingled that attempting to say anything about causes and effects would require longitudinal data.

Strangely, this was the criticism that Gould also offered of the g factor (cf. Davis's review). I disagree; the results are NOT obvious. Neither g, G, s nor S is obvious. Especially not the MCV results.

I have no particular opinions to offer on the causal interpretations. I don't want my paper to get stuck in review due to speculative discussion of causation. I think the descriptive data are very interesting by themselves.

Quote:13) "It is worth noting that group-level correlations need not be the same or even in the same direction as individual-level correlations. In the case of suicide, there does appear to be a negative correlation at the individual level as well."

This means that the s and S factors aren't the same. Why is suicide an indicator of socioeconomic status anyway?

Not socioeconomic status. General socioeconomic factor. I'm not sure what the specific complaint is.

Quote:14) I couldn't find a correlation matrix of all the variables in the supplementary material. It would be useful (as an Excel file).

They would be very large (54x54 and 42x42). Anyone curious can easily obtain them in R by typing:

Code:
write.csv(cor(y), file="y_matrix.csv")
write.csv(cor(z), file="z_matrix.csv")


For your ease, I have attached both files.

Quote:15) PCA and factor analysis are really different methods, and PCA shouldn't be called factor analysis.

Semantics. Some authors use "factor analysis" as a general term as I do in this paper. Others prefer "dimensional reduction method" or "latent trait analysis" or some other term and then limit "factor analysis" to non-PCA methods.

--

Attached is also a new PDF with the above fixes and a new code file with the changes I made.


Attached Files Thumbnail(s)

.csv   y_matrix.csv (Size: 55.86 KB / Downloads: 441)
.csv   z_matrix.csv (Size: 33.8 KB / Downloads: 400)
.pdf   international_socioeconomic_general.pdf (Size: 358.13 KB / Downloads: 512)
#26
It wasn't possible to attach more than 5 files to a post.


Attached Files
.r   R_analysis.R (Size: 16.58 KB / Downloads: 383)
#27
(2014-Jul-29, 01:24:01)Emil Wrote:
Quote: 1) I'd like to see more details of the factor/PC analyses. Were the sub-component intercorrelations explained by a single factor by the usual standards (e.g., only one factor with eigenvalue>1)? If not, how many other factors were there, can you give a substantive interpretation to them, and are they correlated with national IQ? KMO and Bartlett's test are pretty superfluous and aren't usually reported; I would mention them in a footnote only.

You can view all the details in the R code file.

The number of factors to extract was in all cases set to 1 ("nfactors=1" in the code), so no criterion for determining the number of factors to keep was used.


So the criterion used was to extract just one factor. You should state that explicitly in the paper and justify the decision. At the limit, given that your variance explained is <50%, it is possible (though extremely unlikely) that there's a second factor that explains almost as much. In that case, denoting one of them as a general factor would be arbitrary. At the very least, you should report how many factors with eigenvalues >1 there are.
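Kaiser's eigenvalue>1 rule mentioned above is mechanical to check; a numpy sketch with a made-up correlation matrix (not the paper's data):

```python
import numpy as np

# Kaiser's rule: retain factors whose eigenvalue (of the correlation
# matrix) exceeds 1. The matrix below is invented for illustration.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
eigvals = np.linalg.eigvalsh(R)     # returned in ascending order
n_factors = int((eigvals > 1).sum())
print(np.round(eigvals[::-1], 3))   # first eigenvalue ~2 of 3 total
print(n_factors)                    # -> 1 factor by Kaiser's rule
```

Reporting this count (or a scree plot) alongside the variance explained would address the concern that a second factor of comparable size might be lurking.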

Quote:I report KMO and Bartlett's because a reviewer (Piffer) had previously requested that I include them. Clearly I cannot satisfy both of you. It seems best just to include them. They don't take up much space.

OK, but information on the decisions made on the number of factors to be extracted is much more important.

Quote:
Quote: 2) "individuals have a general socioeconomic factor"

This language is confusing. 'Factor' refers to a source of variance among individuals, so individuals cannot have factors. Individuals are located at different points on a factor, or have factor scores.

What I meant is that if one analyses the data at the individual level, one will also find a general socioeconomic factor (i.e. s factor).

Changed text to:
Gregory Clark argued that there is a general socioeconomic factor which underlies their socioeconomic performance at the individual-level.\cite{clark2014}

Still badly worded. Who are the 'they' referenced? Clark does not write about a "general socioeconomic factor." He writes about "social competence." Perhaps the Social Competence Factor would be a better label for your factor, given that some of its indicators are not usually thought of as indicating socioeconomic status.

Quote:
Quote: 3) "national measures of country well-doing or well-being"

excise "well-doing"

I prefer to keep both because these variables measure all kinds of things, some related to well-being (e.g. health, longevity) others to well-doing (income, number of universities).

'Well-doing' is archaic-sounding and does not mean what you think it does:

well-doing (uncountable)
1.The practice of doing good; virtuousness, good conduct.


'Well-being' implies material prosperity, too, and is quite sufficient for your purposes:

well-being (uncountable)
1.a state of health, happiness and/or prosperity


Quote:
Quote: 5) Figure 1 describes the structure of the SPI, why is there no corresponding figure describing the DP?

It did not seem necessary. It is less complicated and the user can find it in the manual referenced. Do you want me to include an overview of it?

http://democracyranking.org/?page_id=590

As far as I can tell, they have 6 'dimensions', which have the weights 50, 10, 10, 10, 10, 10. I'm not sure what they do within each 'dimension'; probably they average the indexes. As with the DR and the SPI, the HDI also has some idiosyncratic way of combining its variables into a higher construct. For reference, the HDI is based on a geometric mean of three indicators.

If you are going to include an itemized description of SPI, you should tell more about DP, too.
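To make the contrast between the two combination schemes concrete, here is a sketch in R. The dimension names and all scores are hypothetical; only the 50/10/10/10/10/10 weighting is taken from the DR description above:

```r
# DR-style composite: weighted arithmetic mean of six dimension scores.
dims <- c(politics = 72, gender = 65, economy = 58,
          knowledge = 80, health = 70, environment = 60)  # hypothetical scores
w <- c(50, 10, 10, 10, 10, 10) / 100                      # DR weights
dr_like <- sum(w * dims)

# HDI-style composite: geometric mean of three (hypothetical) indices.
idx <- c(health = 0.85, education = 0.78, income = 0.90)
hdi_like <- prod(idx)^(1 / length(idx))
```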

Quote:
Quote: 7) "In some cases PCA can show a general factor where none exists (Jensen and Weng, 1994 [13]). For this reason, I compared the first factor extracted via PCA to the first factors using minimum residuals, weighted least squares, generalized least squares, principal axis factoring and maximum likelihood estimation"

I don't see how the similarity of factor loadings based on different extraction methods can tell us anything about the existence of a general factor. You don't say what a 'general factor' is, but I assume it means a factor with all-positive indicator loadings regardless of extraction method. There is no such factor in your data, as indicated by the many negative loadings listed in the Appendix. Even if you reverse-coded the variables so that higher values on all variables would have positive valence (e.g., "Adequate nourishment" instead of "Undernourishment"), which I think would be a good thing to do, there'd still be negative loadings on the first factor/PC (e.g., suicide rate).

A general factor need not be a perfectly general factor, just a very large one. As you mention, there are a few variables that load in the 'wrong' direction which is not found in the analysis of cognitive data.

A general factor would be disproved if there were many variables that didn't load on the first factor (i.e., with very low loadings, say <.10). This isn't the case with these national data; the mean absolute loading was high (.60-.65).

Most factor analyses will produce first factors that are substantially larger than the subsequent ones. What is the standard for "a very large" factor? The g factor is a general factor because all cognitive abilities do load positively on it.

More than a third of the loadings on the SPI factor are negative, so it's not a general factor. Similarly, if you had a cognitive test battery where some subtests loaded positively on the first unrotated factor and other subtests loaded negatively on it, there'd be no general factor, no matter how much variance the first factor explained.

Of course, in the case of the SPI factor the negative loadings are mostly an artefact of your failing to reverse code the negatively valenced variables. It's possible that this decision has some effect on all loadings.
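Reverse coding is mechanically trivial in R. A minimal sketch, using toy data in place of the real data frame; the variable names are hypothetical stand-ins for the negatively valenced variables (e.g. undernourishment, suicide rate):

```r
# Flip the sign of negatively valenced variables before factoring.
# Toy data; in the real analysis 'bad_vars' would list the actual
# negatively valenced variables.
set.seed(1)
y <- data.frame(income       = rnorm(100),
                malnutrition = rnorm(100))   # hypothetical names
orig <- y$malnutrition
bad_vars <- "malnutrition"
y[bad_vars] <- lapply(y[bad_vars], function(x) -x)
# Loadings of the flipped variables change sign; nothing else about
# the factor solution changes except this reflection.
```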

Quote:
Quote:8) How did you compute the correlations between factor loadings? The congruence coefficient rather than Pearson's r should be used: http://en.wikipedia.org/wiki/Congruence_coefficient (Or did you use factor scores?)

Congruence coefficient has some bias (e.g.). I used Pearson r.

I cannot access that paper, but whatever bias the CC has is minuscule compared to the bias that Pearson's r can produce when it is used to compare factor loadings. Look at the example on p. 100 in The g Factor by Jensen. The CC is the standard method for comparing factor loadings in EFA, and you should use it.
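For readers unfamiliar with it, the congruence coefficient is simply an uncentered correlation. A sketch with made-up loading vectors shows why the two statistics can disagree:

```r
# Tucker's congruence coefficient: a Pearson r computed about zero
# rather than about the variable means, so it respects the absolute
# level of the loadings, not just their pattern.
congruence <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

a <- c(.70, .65, .60, .55)  # hypothetical loadings, battery 1
b <- a + .20                # same pattern, uniformly higher level
cor(a, b)                   # exactly 1: Pearson's r ignores the level shift
congruence(a, b)            # below 1: the CC registers the level shift
```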

Quote:One cannot (easily) compare loadings when using subset x whole/subset analyses; for those I used scores.

State explicitly in the paper what you are doing. Computing correlations between factors can be done in many different ways (e.g., factor scores, congruence coefficient, CFA latent factor correlations).

Quote:
Quote:9) Sections 4-6 have nice graphs, but I don't see their purpose. The fact that the correlation between two linear combinations of correlated elements gets higher the more there are shared elements is self-evident. The graphs might be of use if there was a practical need to estimate the S factor using only a limited number of components, but I don't see why anyone would want to do that.

I assume that the results in sections 4-6 are based on correlations of factor/component scores, but it's nowhere specified. Are the factors extracted with replacement?

Are the results from all the 54 components based on PCA? If so, the higher correlations with PCA components could be artefactual, due to common method variance.

As you can see, there have been some methodological studies concerning the interpretation of loadings from small subsets of variables, as well as correlations between factors from different methods. These results are clearly relevant to these matters.

If you interpret a factor in a realist manner, as g is usually interpreted, then such studies are relevant, but your S factor seems completely artefactual.

Quote:I don't know what you mean by "replacement". I used the regression method to get scores. I used the fa() function from the psych package. http://www.inside-r.org/packages/cran/psych/docs/fa

Ignore the replacement issue, with 1000 iterations you of course have to reuse the variables.
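If it helps later readers, the sampling scheme under discussion can be sketched in a few lines; the 54-variable / 10-per-subset / 1000-iteration figures are taken from the setup described in this thread:

```r
# Each iteration draws 10 of the 54 variables without replacement
# *within* the draw; across the 1000 iterations the variables are
# necessarily reused, since only 54 exist.
set.seed(1)
draws <- replicate(1000, sample(54, 10, replace = FALSE))
range(table(factor(draws, levels = 1:54)))  # every variable recurs often
```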

Quote:The result from all the 54 variables is based on the same method used to extract from the subset. So, when using maximum likelihood (ML) to extract S from the subset of vars, the comparison S is also extracted via ML.

The subset x subset analyses are only based on PCA, however. I can replicate these with the other 5 methods if necessary. I think the differences will be slight.

I think that's OK as it is, but, again, you should make it explicit in the paper.

Quote:
Quote:10) MCV correlations of 0.99 are unusual in my experience. I'd like to see the scatter plots to ascertain that there's a linear association across the range. It's possible that the high correlations are due to outliers. What happens to the MCV correlations if you reverse score the variables with negative valence?

The first thing I did when working this out was to plot them. You can do this yourself with the plot() command (the code is around lines 293-327). They are indeed very linear. I have attached one plot for each with national IQs. They are almost the same with Altinok's.

I don't know what you mean with the other question.

OK, it's very linear. I'd include at least one of the scatter plots in the paper.

The other question refers to the fact that the variables with negative loadings may have an outsized influence on the MCV correlations because they somewhat artificially increase the range of values analyzed.
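That worry can be checked directly by computing the MCV correlation with and without the negatively loaded variables. A toy sketch, with all numbers hypothetical:

```r
# Method of correlated vectors (MCV): correlate the indicators' factor
# loadings with their correlations with national IQ.
loadings  <- c(.80, .70, .60, -.50, -.40)  # hypothetical S loadings
r_with_iq <- c(.70, .68, .50, -.40, -.42)  # hypothetical r's with IQ
mcv_all <- cor(loadings, r_with_iq)

# Repeat on the positively loaded indicators only: if the MCV r drops
# a lot, the negative loadings were doing much of the work.
pos <- loadings > 0
mcv_pos <- cor(loadings[pos], r_with_iq[pos])
c(all = mcv_all, positive_only = mcv_pos)
```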

Quote:
Quote:11) "The analyses carried out in this paper suggest that the S factor is not quite like g. Correlations between the first factor from different subsets did not reach unity, even when extracted from 10 non-overlapping randomly picked tests (mean r’s = .874 and .902)."

Your analysis is based on different sets of observed variables, while the g studies that found perfect or nearly perfect correlations were based on analyses of latent factors which contain no error or specific variance, or on analyses of the same data set with different methods. So the results aren't comparable.

I am referring to the two Johnson studies, which found that g factors extracted from different IQ batteries without overlapping tests did reach near-unity (1 or .99). This wasn't the case for these data. One problem in interpretation is that IQ batteries are deliberately put together so as to sample a broad spectrum of ability variance. Randomly chosen subsets of such tests are not.

The Johnson studies used CFA and latent factors to test if g factors from different test batteries were equivalent. You did not use CFA and latent factors, so you're comparing apples and oranges.

Quote:The best way to proceed is to obtain the MISTRA data and repeat my analyses on them. So it comes down to whether they want to share the data or not.

The MISTRA IQ correlation matrices have been published: http://www.newtreedesign2.com/isironline...RAData.pdf

Quote:
Quote:12) The biggest problem in the paper is that it seems to be pretty pointless. Yes, most indicators of national well-being are highly correlated with each other and with national IQ, which means that the first PC from national well-being data must be highly correlated with national IQ, but so what? That was obvious from the outset.

What is the S factor? Do you believe it is a unitary causal factor influencing socioeconomic variables? That interpretation would give some meaning to the paper, but I think it's not a very promising idea. Socioeconomic status is normally thought of as a non-causal index reflecting the influence of various factors. Clark speaks of a "social competence" that is inherited across generations, but I don't think he views it as a unitary causal force but rather as a composite of different influences (such as IQ and personality).

I think national IQs and national socioeconomic indices are so thoroughly causally intermingled that attempting to say anything about causes and effects would require longitudinal data.

Strangely, this was also the criticism Gould offered of the g factor (cf. the Davis review). I disagree: the results are NOT obvious. Neither g, G, s nor S is obvious. Especially not the MCV results.

I agree that the MCV results are perhaps of some interest.

It is well known that nations that are richer also have better health care, less malnutrition, higher life expectancy, better educational systems, better infrastructure, etc., so the S factor is unsurprising. The g factor is surprising because many assume that various domains of intelligence are strongly differentiated. Moreover, the realist interpretation of g does not rely just on the positive manifold, but on many completely independent lines of evidence (e.g., from multivariate behavioral genetics). So my criticisms are quite unlike Gould's.

What is this general socioeconomic factor? Why is it worth studying? Is it a formative or reflective factor?

Quote:
Quote:14) I couldn't find a correlation matrix of all the variables in the supplementary material. It would be useful (as an Excel file).

They would be very large (54x54 and 42x42). Anyone curious can easily obtain them in R by typing:

Code:
write.csv(cor(y),file="y_matrix.csv")
write.csv(cor(z),file="z_matrix.csv")


For your ease, I have attached both files.

There is all kinds of stuff in your supplementary materials, but if someone wants to replicate or extend your analysis, correlation matrices are what they need. I also think it's unfortunate that you have not reversed the scoring of the negatively valenced variables.

Quote:
Quote:15) PCA and factor analysis are really different methods, and PCA shouldn't be called factor analysis.

Semantics. Some authors use "factor analysis" as a general term as I do in this paper. Others prefer "dimensional reduction method" or "latent trait analysis" or some other term and then limit "factor analysis" to non-PCA methods.

Methodologists are adamant about the fact that PCA and FA are quite different animals. But I'm not going to insist on this.
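The practical side of the distinction is easy to display: on the same data, first-PC loadings run higher than common-factor loadings, because PCA redistributes total variance (unique variance included) while FA models shared variance only. A base-R sketch on simulated data:

```r
# Four variables sharing one common component.
set.seed(1)
x <- matrix(rnorm(400), ncol = 4) + rnorm(100)

pca <- prcomp(x, scale. = TRUE)
pca_load <- pca$rotation[, 1] * pca$sdev[1]        # first-PC loadings
fa_load  <- factanal(x, factors = 1)$loadings[, 1] # first ML factor loadings
round(cbind(PCA = abs(pca_load), FA = abs(fa_load)), 2)
```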
#28
(2014-Jul-26, 17:14:16)Emil Wrote: Yes. It is a statistical phenomenon (cf. Spearman Brown formula).


I understand, but my problem is with your wording: it sounds as if, were G not causal, aggregation would not increase the correlation with G. That entire sentence should be reworked.

(2014-Jul-26, 17:14:16)Emil Wrote: It is well-established at the personal level that the causation is g→wealth (income), not much the other way. At the national level, it is more contentious. I don't have to argue for that here, since I'm not making such a claim in the paper.


OK with the last sentence, but I wanted to say it because even though, at the individual level, the wealth variable may have a moderate causal effect, things can be different at the national level, where there is more environmental variation, so you can (and should) expect a larger effect of environmental factors.

Concerning congruence coefficient, after reading this article...

Davenport, E. C. (1990). Significance testing of congruence coefficients: A good idea?. Educational and psychological measurement, 50(2), 289-296.

... I am left with the impression that it's a very bad method. You should be careful with it. (The version of the paper I have doesn't allow copy-paste, but check pages 293-295.) The congruence coefficient seems to consistently give very high values, even in situations where they should theoretically be small, or not high at all.

Regarding the question of negative loadings, I don't understand the discussion here, from either of you. My opinion is that when a variable has a small loading (such as 0.20 or less) on the first unrotated factor, regardless of direction, you should remove it, because it is a poor measure of that factor.

The same can be said about rotated factor analysis. For instance, suppose you have 3 interpretable factors and a 4th that is not interpretable at all, and one of your variables has a meaningful loading only on the 4th. In that case, remove it and redo the factor analysis, and so on, until you get something clean.

Concerning whether a variable with no loading on the first unrotated factor is a problem and should be removed (i.e., keeping only variables with large or at least modest loadings), I would appreciate it if someone here could point me to some articles on that subject, because I don't remember whether I have any.

---

EMIL OW K :

Each time I post here, I get the following:

Quote:Please correct the following errors before continuing:
The subject is too long. Please enter a subject shorter than 85 characters (currently 88).

Can you try to fix that?

edit2: I forgot to thank you for the files.
#29
Meng Hu. It means the title of your post is too long. I think the limit is hardcoded. It is easily solved by shortening the title of the post. It's useful to give them meaningful titles. My post above (#25) is called "Reply to Dalliard" since it's a reply to his criticism.
#30
Dalliard,

I will reply to most of your criticism later. I am currently visiting my girlfriend in Leipzig and I'm working from my laptop which isn't well-suited for statistical analyses.

However, one point. Yes, it is possible that the 2nd factor is almost the same size as the first. I had actually checked this because I initially did some analyses in SPSS before moving to R (it's my first time using R for a project). Here's what one can do in R:

Code:
y_ml.2 = fa(y,nfactors=2,rotate="none",scores="regression",fm="ml") #ML FA with 2 factors
y_ml.2 #display results
plot(y_ml.2$loadings[1:54],y_ml$loadings) #first factor of 2-factor solution vs. y_ml, the 1-factor solution fitted earlier
cor(y_ml.2$loadings[1:54],y_ml$loadings) #correlation between the two sets of first-factor loadings

y_ml.3 = fa(y,nfactors=3,rotate="none",scores="regression",fm="ml") #same as above, with 3 factors
y_ml.3
plot(y_ml.3$loadings[1:54],y_ml$loadings)
cor(y_ml.3$loadings[1:54],y_ml$loadings)


One will get the 2-factor and 3-factor solutions using maximum likelihood. The first factor is not completely identical across the different numbers of factors extracted, but almost so: the correlation between ML1 loadings from nfactors=1 and ML1 from nfactors=2 or 3 was .999.

With nfactors=2, the 2nd factor was much smaller: Var% for ML1 is about 41%, for ML2 about 11%.

With nfactors=3: ML1 = 41%, ML2 = 10%, ML3 = 5%.