
[ODP] Educational attainment, income, use of social benefits, crime rate and the general socioeconomic factor

#21
I used only SSA countries, but Islam is still a good predictor: R = .914. Even removing Somalia does not get rid of the correlation; it is .686 without Somalia.

I also tried dividing countries into MENAP and non-MENAP. The correlation is positive in both groups: R = .593 in non-MENAP and R = .314 in MENAP. This does not seem to fit well with Peter's hypothesis.

The predictive ability of Islam seems quite robust.
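For anyone wanting to reproduce this kind of robustness check, here is a minimal sketch (Python; the file and column names are hypothetical stand-ins, not the actual dataset):

import pandas as pd
from scipy.stats import pearsonr

# Hypothetical file and column names, purely for illustration.
df = pd.read_csv("denmark_immigrant_data.csv")   # one row per origin country
ssa = df[df["region"] == "SSA"]

# Correlation of Islam prevalence with performance, full SSA subset.
r_all, _ = pearsonr(ssa["islam_pct"], ssa["performance"])

# The same correlation with the suspected outlier removed.
no_somalia = ssa[ssa["country"] != "Somalia"]
r_drop, _ = pearsonr(no_somalia["islam_pct"], no_somalia["performance"])

print(f"with Somalia: r = {r_all:.3f}; without: r = {r_drop:.3f}")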


Attached Files Thumbnail(s)
#22
I coded countries with dichotomous variables for whether they were European, SS African, or MENAP. Then I ran a multiple regression with these variables alongside the Islam variable to see whether they had any effect on predicting performance in Denmark. SSA and MENAP had no effect in the model; European had a substantial effect, though still smaller than Islam's and in the opposite direction.

It also indicates that being MENAP in itself carries no explanatory power.
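In case it is useful, a sketch of this kind of dummy-variable regression (Python/statsmodels; the data file and variable names are again hypothetical):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("denmark_immigrant_data.csv")   # hypothetical file

# Dichotomous (dummy) variables: 1 if the country belongs to the group, else 0.
for group in ["European", "SSA", "MENAP"]:
    df[group] = (df["region"] == group).astype(int)

# Regress performance in Denmark on Islam plus the three group dummies.
model = smf.ols("performance ~ islam_pct + European + SSA + MENAP", data=df).fit()
print(model.summary())   # check which coefficients are distinguishable from zero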


Attached Files Thumbnail(s)
#23
We seem to be talking past each other. I'm not challenging your correlation. I'm saying it's an artefact of factors that are tangentially related to Islam. To be brief, Islamic polities have been less effective at monopolizing the use of violence than polities in Europe and East Asia.

When you limit your correlation to European societies, you should keep in mind that much of southeastern Europe was still part of the Ottoman Empire until just before World War I. This is why Albanians are still oriented toward amoral familialism, clannism, and increased readiness to use personal violence for defence of "honour" and "face." A major criticism of the Ottoman Empire was that it could not maintain security of life and property. Banditry was very common and the State was unable or unwilling to put an end to it. The preferred strategy was to co-opt bandits and warlords.

In the case of sub-Saharan Africa, you should keep in mind that the non-Muslim immigrants come mainly from formerly British East Africa (Uganda, Kenya, Tanzania). Many if not most of these "Africans" are of South Asian origin.
#24
Emil, O.K., I think I will give you my vote, because I don't see any particular flaws in the article, just some points I don't recommend, which I explained in my earlier comment. I replied to your comment below. Let's see if you disagree or not, but whatever the case, I don't see any reason to disapprove the publication.

Quote:You were looking at the Spearman rho only. The Pearson r is .064. The Spearman rho has p=.072, so perhaps a fluke.

I would never trust the p-value if I were you. It's not significant because the sample is not large enough; but the sample is not small either. I recommend not being mistaken about what a significance test is; it seems to me a lot of people don't know what it is. I have seen many people get a correlation of, say, between 0.1 and 0.3, but with p larger than 0.05, and conclude in the end "no correlation". That's wrong. A large p means that, whatever your correlation is, your N is not large enough to have much confidence in the result. The reason I dislike the p-value is that it cannot add much new information: the p-value (or χ²) is based on two things, sample size and effect size. You already have these two pieces of information, so the p-value does not add anything worthy of consideration.
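To make that concrete: for a correlation, p is a function of nothing but r and N, so you can recompute it from those two numbers alone. A quick sketch (Python/scipy):

import math
from scipy.stats import t

def p_from_r(r, n):
    """Two-sided p-value of a Pearson correlation, from r and N alone."""
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
    return 2 * t.sf(abs(t_stat), df=n - 2)

# The same effect size is "significant" or not depending only on N.
for n in (20, 50, 200):
    print(n, round(p_from_r(0.25, n), 4))   # p shrinks as N grows, r fixed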

And yes, you note there is a difference between r and rho. That may mean an outlier is killing your correlation.
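A toy illustration of that diagnostic (Python; the numbers are fabricated just to show the mechanism):

import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(1)
x = np.arange(20.0)
y = x + rng.normal(0, 2, size=20)        # a clean positive relationship

# One extreme discordant point drags Pearson r down far more than rho,
# because Spearman only sees its rank, not its magnitude.
x_out = np.append(x, 25.0)
y_out = np.append(y, -60.0)

print("clean:   r = %.2f, rho = %.2f" % (pearsonr(x, y)[0], spearmanr(x, y)[0]))
print("outlier: r = %.2f, rho = %.2f" % (pearsonr(x_out, y_out)[0], spearmanr(x_out, y_out)[0]))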

Quote:I have written some more in the text about PC2, but kept it in the matrix to show that it is a nonsense factor.

I know, but that's not what I said. I said that it's not necessary to correlate PC2 with the other variables.

Quote:It is because some variables measure good things and others bad things (in the context of how well the group does in Denmark). We can reverse variables so that positive values are always better and negative always worse, but it makes no difference for the math.

It will make a big difference for the interpretation; I'm sure I'm not the only one who finds your Table 11 difficult to read. Generally, a "g" factor has all positive loadings. Sometimes practitioners even remove the variables that have zero or negative loadings on PC1; they want it to be all positive.
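Both halves of this are easy to verify numerically: reversing a variable flips the sign of its PC1 loading and changes nothing else, which is why it costs nothing mathematically but buys an all-positive, readable column. A sketch with fabricated data (Python):

import numpy as np

rng = np.random.default_rng(0)
# Fabricated indicators: three "good" ones and one "bad" one (e.g. crime).
g = rng.normal(size=(100, 1))
X = g @ np.array([[0.8, 0.7, 0.6, -0.7]]) + rng.normal(0, 0.5, size=(100, 4))

def pc1_loadings(data):
    corr = np.corrcoef(data, rowvar=False)
    vals, vecs = np.linalg.eigh(corr)        # eigh returns ascending eigenvalues
    pc1 = vecs[:, -1] * np.sqrt(vals[-1])    # loadings on the first PC
    return pc1 if pc1.sum() > 0 else -pc1    # fix the arbitrary overall sign

print(pc1_loadings(X))        # the "bad" variable loads negatively
X_rev = X.copy()
X_rev[:, 3] *= -1             # reverse the "bad" variable
print(pc1_loadings(X_rev))    # same magnitudes, now all positive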

Quote:The reason to use R² in this case is that SPSS calculates the adjusted R² but not the adjusted R. In multiple regression, just adding a variable generally increases the R value, even when it is a nonsense, randomly distributed variable. This is because the regression abuses random fluctuation in the data.

Well, a lot of researchers interpret it like that: they add a new variable to the regression, this variable has a good correlation with the dependent variable, and yet the gain in R² is small, so they conclude the new variable is not important. Given that, I don't see why R² should be trusted. To get the best picture of the effect of any given variable, the best way is to examine the regression coefficient, standardized or not. It's better than R² or R.
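Both points show up in one toy example (Python/statsmodels, fabricated data): a pure-noise predictor nudges R² upward, the adjusted R² corrects for that, and the noise variable's own coefficient is the clearest giveaway.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 60
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
junk = rng.normal(size=n)                # pure noise, unrelated to y

m1 = sm.OLS(y, sm.add_constant(x)).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x, junk]))).fit()

print(m1.rsquared, m1.rsquared_adj)      # baseline model
print(m2.rsquared, m2.rsquared_adj)      # R2 can only go up; adjusted R2 is penalized
print(m2.params)                         # the junk coefficient is near zero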

P.S. Regarding the small size of the numbers in your tables, I was reading the 2nd version, not the 1st. Even in the 3rd draft, the numbers are all smaller than the letters in your text.
#25
(2014-Apr-29, 20:18:51)menghu1001 Wrote: Emil, O.K., I think I will give you my vote, because I don't see any particular flaws in the article, just some points I don't recommend, which I explained in my earlier comment. I replied to your comment below. Let's see if you disagree or not, but whatever the case, I don't see any reason to disapprove the publication.

Quote:You were looking at the Spearman rho only. The Pearson r is .064. The Spearman rho has p=.072, so perhaps a fluke.

I would never trust the p-value if I were you. It's not significant because the sample is not large enough; but the sample is not small either. I recommend not being mistaken about what a significance test is; it seems to me a lot of people don't know what it is. I have seen many people get a correlation of, say, between 0.1 and 0.3, but with p larger than 0.05, and conclude in the end "no correlation". That's wrong. A large p means that, whatever your correlation is, your N is not large enough to have much confidence in the result. The reason I dislike the p-value is that it cannot add much new information: the p-value (or χ²) is based on two things, sample size and effect size. You already have these two pieces of information, so the p-value does not add anything worthy of consideration.


I am aware of this fact about statistics. :)

I saw a horrible example of it recently in this paper: http://www.jneurosci.org/content/34/13/4567.long

Quote:OSU did not differ from CS in ethnicity (χ2(4) = 8.2, p = 0.08), age, education, or verbal IQ, but had more males (61%) than CS (43%, χ2 (1) = 4.2, p = 0.04).

p = 0.08 ... ergo, did not differ
p = 0.04 ... ergo, did differ
But the probabilities are 92% and 96%! According to the p-value analysis, neither of these is likely to be a fluke. Yet the authors seem to believe in some magic division between 0.08 and 0.04.
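For what it's worth, both p-values are easy to reproduce from the reported χ² statistics (Python/scipy), which shows how close the two really are:

from scipy.stats import chi2

print(chi2.sf(8.2, df=4))   # ~0.085 -> declared "did not differ"
print(chi2.sf(4.2, df=1))   # ~0.040 -> declared "did differ"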

Quote:
Quote:It is because some variables measure good things and others bad things (in the context of how well the group does in Denmark). We can reverse variables so that positive values are always better and negative always worse, but it makes no difference for the math.

It will make a big difference for the interpretation; I'm sure I'm not the only one who finds your Table 11 difficult to read. Generally, a "g" factor has all positive loadings. Sometimes practitioners even remove the variables that have zero or negative loadings on PC1; they want it to be all positive.

Often, elementary cognitive task/reaction time (ECT/RT) measures are coded so that they give rise to negative correlations in the correlation matrix/PCA.

Quote:
Quote:The reason to use R² in this case is that SPSS calculates the adjusted R² but not the adjusted R. In multiple regression, just adding a variable generally increases the R value, even when it is a nonsense, randomly distributed variable. This is because the regression abuses random fluctuation in the data.

Well, a lot of researchers interpret it like that: they add a new variable to the regression, this variable has a good correlation with the dependent variable, and yet the gain in R² is small, so they conclude the new variable is not important. Given that, I don't see why R² should be trusted. To get the best picture of the effect of any given variable, the best way is to examine the regression coefficient, standardized or not. It's better than R² or R.

P.S. Regarding the small size of the numbers in your tables, I was reading the 2nd version, not the 1st. Even in the 3rd draft, the numbers are all smaller than the letters in your text.

As I wrote, this is because we are embedding them as pictures rather than using LaTeX tables, which saves a lot of conversion time. I generally don't think it is a problem that readers must zoom in a bit for small numbers. The other option is to make the tables even larger, which interrupts the reading flow for those who do not closely inspect the tables (perhaps most readers).

I have attached a new version, adding the analyses about Islam as well as some general fixes.


Attached Files
.pdf   educationalattainmentetcDenmark4.pdf (Size: 752.44 KB / Downloads: 490)
#26
I ran some regressions using your attached data. Concerning Table 13, you should specify for each model the number of countries, because they obviously differ (for Islam only, N is 61, but for the other models, N is usually 48). Also, optionally, you could add some details about the normality of your residuals. Try this:

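* Model 1: regress g_se_pc1 on Islam alone; the plots below check residual normality.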
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT g_se_pc1
/METHOD=ENTER Islam
/SCATTERPLOT=(*ZRESID ,*ZPRED)
/RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).

In the first model, you see that your P-P plot is more or less normal.

However, if you use this:

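* Model 2: the same regression with GDP and Height as predictors instead of Islam.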
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT g_se_pc1
/METHOD=ENTER GDP Height
/SCATTERPLOT=(*ZRESID ,*ZPRED)
/RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).

You'll see that the P-P plot is not normal, but the deviation from normality does not seem too alarming. The P-P plot has the same shape if you use Islam + GDP + IQ + height. But the P-P plot for the IQ + height model seems to have a big problem.
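And if the plot itself seems mysterious: a normal P-P plot simply graphs the empirical cumulative proportions of the sorted, standardized residuals against the normal CDF, so points hugging the diagonal mean roughly normal residuals. A hand-rolled sketch (Python; simulated residuals in place of the real ones):

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
resid = rng.normal(size=80)        # stand-in for your regression residuals

z = np.sort((resid - resid.mean()) / resid.std())
n = len(z)
empirical = (np.arange(1, n + 1) - 0.5) / n   # observed cumulative proportions
theoretical = norm.cdf(z)                     # expected under normality

plt.plot(theoretical, empirical, "o", markersize=3)
plt.plot([0, 1], [0, 1], "k--")               # the diagonal = perfect normality
plt.xlabel("expected cumulative probability")
plt.ylabel("observed cumulative probability")
plt.show()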
#27
I have never used SPSS by typing the commands manually, just the menus. In hindsight, it is best if papers include this kind of information in the supplementary material, so that others can reproduce the results exactly.

I don't understand the statistical meaning of P-P plots. I need to study more statistics.
#28
It's an assumption of regression, although it is rarely pointed out in academic papers, sadly. I see that they usually don't care about the normality of residuals, even though they should. Here's what Andy Field says in his book (Discovering Statistics Using SPSS, 2009, p. 251):

Quote:You need to check some of the assumptions of regression to make sure your model generalizes beyond your sample:

- Look at the graph of *ZRESID plotted against *ZPRED. If it looks like a random array of dots then this is good. If the dots seem to get more or less spread out over the graph (look like a funnel) then this is probably a violation of the assumption of homogeneity of variance. If the dots have a pattern to them (i.e. a curved shape) then this is probably a violation of the assumption of linearity. If the dots seem to have a pattern and are more spread out at some points on the plot than others then this probably reflects violations of both homogeneity of variance and linearity. Any of these scenarios puts the validity of your model into question. Repeat the above for all partial plots too.

- Look at histograms and P–P plots. If the histograms look like normal distributions (and the P–P plot looks like a diagonal line), then all is well. If the histogram looks non-normal and the P–P plot looks like a wiggly snake curving around a diagonal line then things are less good! Be warned, though: distributions can look very non-normal in small samples even when they are!

Some researchers sometimes (but not always) also look at the normality of the data themselves. It's possible to evaluate univariate normality in SPSS with syntax like this:

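* Univariate checks: histograms with a normal curve, then boxplots, stem-and-leaf, and normality plots with tests.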
FREQUENCIES VARIABLES=x1 x2 x3
/FORMAT=NOTABLE
/HISTOGRAM NORMAL
/ORDER=ANALYSIS.

EXAMINE VARIABLES=x1 x2 x3
/PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
/COMPARE GROUPS
/STATISTICS DESCRIPTIVES EXTREME
/CINTERVAL 95
/MISSING PAIRWISE
/NOTOTAL.

But it's more definitive to look at how the P-P plot looks in your regressions. I wrote the following in an earlier blog post of mine:

Quote:Also, I will not put much faith on the Kolmogorov-Smirnov and the Shapiro-Wilk tests. They are very sensitive to sample size (Field, 2009, pp. 148, 788). This increases the probability of validating the null hypothesis (eg, less than 0.05) that the distribution is not normally distributed, while for testing the normality of the residuals, we want a p-value at least higher than 0.05, not less.

Quote:In large samples these tests can be significant even when the scores are only slightly different from a normal distribution. Therefore, they should always be interpreted in conjunction with histograms, P–P or Q–Q plots, and the values of skew and kurtosis.
(Edit: after rereading the passage above, I think it should be rewritten as "reject the null hypothesis that the distribution is normal".)
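That sample-size sensitivity is easy to demonstrate (Python/scipy; fabricated, mildly heavy-tailed data):

import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(3)
mild = rng.standard_t(df=10, size=2000)   # barely distinguishable from normal

# Same distribution: a small sample typically "passes" the test,
# while the large sample is typically rejected.
w_small, p_small = shapiro(mild[:50])
w_large, p_large = shapiro(mild)
print(p_small, p_large)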

Finally, you can read this here:
http://www.ats.ucla.edu/stat/spss/webboo...ssreg1.htm

Quote:Some researchers believe that linear regression requires that the outcome (dependent) and predictor variables be normally distributed. We need to clarify this issue. In actuality, it is the residuals that need to be normally distributed. In fact, the residuals need to be normal only for the t-tests to be valid. The estimation of the regression coefficients do not require normally distributed residuals. As we are interested in having valid t-tests, we will investigate issues concerning normality. … A common cause of non-normally distributed residuals is non-normally distributed outcome and/or predictor variables.
#29
We were not using t-tests, though.
#30
I know. But when it says "the t-tests to be valid", it probably means the part of the output that involves the effect size. And effectively, non-normal residuals lower the effect size(s) in your regressions.

http://pareonline.net/getvn.asp?n=2&v=8

(I have one excellent reference with details on this matter, but I couldn't remember the webpage, so I can only show you Osborne 2004 above; that's the only one I can remember so far.)
 