
[ODP] Are stereotypes about immigrants accurate in Denmark?: a large, preregistered

#11
Heiner,

Thank you for the review. We will get back to you with a reply.
#12
Heiner,

Quote:First, there is almost no introduction and no theory. It's a very Ockham-British-Lynn-like, data-driven paper. More verbalization and generally better formatting of ODP papers would be helpful.

Personally, I find it annoying to read papers with long introductions. I tend to just skim or skip them entirely. I write my own papers in the style that I would like to read: straightforward and to the point. Other reviewers, such as John Fuerst, have previously applauded this presentation style, so it is impossible to satisfy both kinds of reviewers.

Quote:Next, there is a lot of statistical analysis, and it seems the authors have analyzed everything that can be analyzed with the given data. However, data analysis should be driven by research questions and theory.

I prefer to put more analyses in one paper rather than publish multiple papers. This saves work because one does not have to write two similar introductions, repeat descriptive analyses and go through peer review twice, which can take months. The result is that papers tend to be longer than usual and resemble target papers, such as the very long paper recently published in Mankind Quarterly (>100 pages with a ~50 page reply; http://mankindquarterly.org/archive/volume.php?v=107 ). For the benefit of the reader, it keeps related analyses together, so one doesn't have to download two papers to find all the analyses done on the same dataset (until/unless someone reuses the data for some other study, that is).

Quote:Introduction: You wrote “stereotypes were moderately accurate (median correlational accuracy score = .51)”. Using Cohen’s levels for interpretation, r=.10 is usually taken as a small correlation, .30 as a medium one and .50 as a large one. For the validity of tests, r=.50 is regarded as a large correlation (Fisseni).
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Fisseni, H.-J. (2004/1997/1990). Lehrbuch der psychologischen Diagnostik. Göttingen: Verlag für Psychologie.
However, for reliability measures higher correlations are expected (r=.80 medium, r=.90 high; Fisseni). And as I recall, averaging Jussim’s (2012, his Tables 17-1 to 17-3) results gives an average correlation between stereotypes and criteria of about r=.81.
So you should discuss possible benchmarks/norms and justify the benchmark you chose for interpretation.

We have changed the wording to "fairly accurate". The correlations in the area of .80 are for consensual/aggregate stereotypes, not individual/personal stereotypes. For example, Jussim's Table 17-1 has .42, .36 and .69 as individual-level accuracy correlations (he does not say which kind of average was used, so I presume it is the arithmetic mean; we used medians because of the skewed distribution). We found mean/median individual accuracies of .43/.48, so the values are in the same ballpark. His aggregate accuracy correlations in the same table are .60, .93, .88, .93, .53, .77, .77, .68 and .72, which is close to our aggregate accuracy finding of .70. The previous studies used small, unrepresentative samples (Ns of roughly 60-90), so not too much weight should be put on them. We have expanded the discussion to include previous numerical results.
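To illustrate why the mean and median of individual accuracy scores can diverge, here is a minimal sketch using ten made-up individual accuracy correlations (not the study's data), with two careless or scale-inverting respondents forming a left tail. It also computes the share of respondents above .30 and .50, the kind of summary reported in the Discussion; the values and thresholds are assumptions for illustration only.

```python
from statistics import mean, median

# Made-up individual accuracy correlations (one per respondent); NOT study data.
# Most respondents are moderately accurate, but two careless or scale-inverting
# answers form a long left tail.
individual_r = [0.62, 0.58, 0.55, 0.51, 0.49, 0.47, 0.44, 0.40, 0.05, -0.30]

# The left tail pulls the mean down more than the median.
print(f"mean   = {mean(individual_r):.2f}")
print(f"median = {median(individual_r):.2f}")

# Share of respondents above conventional benchmarks.
for cut in (0.30, 0.50):
    share = sum(r > cut for r in individual_r) / len(individual_r)
    print(f"share with r > {cut:.2f}: {share:.0%}")
```

With a skewed distribution like this, the median is the more representative summary of the typical respondent, which is the reasoning given above for reporting medians.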

Quote:1. Disclosure. You start your questionnaire (translated to English) with:
“When people have expectations about how one or more groups are or behaves, it’s often called a “prejudice” or “stereotype”. Often it’s said that they are inaccurate, even without examining this. The purpose of this study is to examine how precise the Danes’ stereotypes of immigrants are. Which is why we ask of you to evaluate how well a plethora of immigrant groups in Denmark perform, without looking up the numbers beforehand.”
In this way the surveyed persons know what is going on, which may bias their answers and your results. It would be better to frame it more neutrally and to steer respondents' thoughts away from the subject of stereotypes.

We explained the purpose of the study at the suggestion of the pollster, who thought it might help yield higher-quality data. Note that we had a fair number of problems getting people to understand the assignment and/or fill it out honestly. It's hard to say whether this introduction helped or not. We have more stereotype accuracy studies planned, so we can test whether explaining the purpose makes a difference.
We have added a paragraph to the discussion about this issue and how it may have biased the findings.

Quote:2. Ask more questions. Ask not only about migrants and public assistance (welfare), but also about crime, divorce and family stability, job creation (anything positive about immigrants) or student performance. And also about a totally different subject, such as differences between men and women.

Stereotype accuracy studies of gender and political labels are planned for 'the next year or so', depending on time and money. We have data for the immigrant groups on other sociological outcomes (income, criminality, educational attainment), so it is possible to carry out a follow-up immigrant stereotype accuracy study using another outcome.
There are two dimensions to stereotype accuracy studies: the number of groups and the number of attributes. The present study is extreme in that it has a very large number of groups but only one attribute (a 70x1 design). For gender, by contrast, there are only two main groups (the proportion who do not consider themselves male/female is tiny, .3% in the OKCupid dataset), but we have many attributes. We plan on using about 50, making it a 2x50 design. Our attribute data are based on a collection published by a Danish newspaper that bought statistics broken down by gender for roughly 250 categories from a number of pollsters. These all come from recent, nationally representative samples of the Danish population, and the outcomes are very varied.
For political labels, we have not yet decided how many attributes or labels to use, but the design will perhaps be about 20x4, intermediate between the immigrant and gender studies. The attribute data will come from another planned study in which we ask people about their political preferences and which labels they self-identify with.
After the above has been done, one can look for general stereotype accuracy across participants. I'm not sure anyone has investigated that yet. In general, previous studies of stereotype accuracy have not been very systematic or large-scale, so there is plenty of room for improvement.

Quote:The question itself has to be stated in the paper; this is central! Currently the paper cannot be understood. (“People who live in Denmark originate from many countries. We would like you to evaluate how many people among the 30-39 year olds that you think are on public assistance, from each country of origin.”)

We have included the central question in the text (Section 2.2).

Quote:Always add page numbers to manuscripts. Always add page numbers to papers published by Open Psych. The final versions of papers should be formatted the way professional publishers (Cambridge, Elsevier, Sage, Springer) usually do it. The more professional the better.

PDF readers supply page numbers, so they are redundant when one reads a paper electronically. However, some people print papers, and for them page numbers are not redundant. We have added page numbers.
With regard to a more professional style, have a look at this recently published paper: http://openpsych.net/ODP/2016/07/putting...mposition/

Julius is offering to style up papers like this for a small fee (depending on the length of the paper). Since he is a co-author, he will style up this paper as well. I think that this markedly improves the respectability of the finalized papers and may help convince more conservative colleagues. Personally, I focus on the content, not the presentation.

Quote:I do not understand the utility of the analyses in chapter 6 (“Inter-rater agreement”). Start your analyses and paper with theory and research questions and then add only the necessary analyses.

In the review of the pilot study one or more reviewers asked for this information and hence we provide it here as well. One cannot please everybody! :)

Quote:Very good and important: what predicts stereotype accuracy. Here you found that conservatism leads to higher accuracy of judgments. However, on 'lefty' questions (e.g. gender differences), progressivism might lead to better stereotype accuracy.

In general, we didn't find many useful predictors of accuracy. The only moderately good ones were cognitive ability (r=.22) and education (r=.20). But in some contexts (e.g. with Muslims), some political preferences predicted higher accuracy (e.g. nationalism, r=-.12). You are right that in the main analyses, conservatism was slightly correlated with correlational accuracy (r=.13) but not with absolute accuracy (r=.02) and not in multivariate LASSO regression either, so the predictive validity is questionable, not general and very small to begin with.
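The difference between correlational and absolute accuracy can be shown with a toy example. This sketch assumes that correlational accuracy is the Pearson correlation between a respondent's estimates and the true percentages, and it uses mean absolute error as a stand-in for absolute accuracy; the paper's exact absolute-accuracy metric may differ, and all numbers below are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_abs_error(est, actual):
    """Average absolute distance between estimates and true values (lower = better)."""
    return sum(abs(e - a) for e, a in zip(est, actual)) / len(actual)


actual = [10, 25, 40, 55]          # hypothetical true % on public assistance
rater_shifted = [30, 45, 60, 75]   # correct ordering, but every estimate 20 points too high
rater_close = [15, 20, 45, 50]     # each estimate within 5 points, ordering slightly noisy

for name, est in [("shifted", rater_shifted), ("close", rater_close)]:
    print(name, round(pearson(est, actual), 2), mean_abs_error(est, actual))
```

The shifted respondent is perfect on correlational accuracy but poor on absolute accuracy, while the close respondent is the reverse, which is one reason the two measures need not share the same predictors.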

The previous studies discussed by Jussim that looked into correlates of accuracy were seriously underpowered and probably useless for drawing conclusions. For example, he summarizes the findings of Ashton and Esses (1999), which had a measly sample of N=94 university students. He seems unaware that interaction effects of this kind cannot reasonably be established by studies of that size.

Quote:Table 5 in chapter 7.1: Never use abbreviations; I do not understand what “DF” etc. means.

It is necessary to use abbreviations because the party names are long and also in Danish. The appendix (after the references) has information about the parties. The abbreviations are mentioned in Section 2.2, but to make the information easier to find, we have written a brief explanation in the caption of each table that lists the parties. DF, by the way, is Dansk Folkeparti (Dänische Volkspartei; Danish People's Party), the main nationalist, conservative, immigration-skeptical party. However, three smaller, more extreme parties are now gathering signatures to run in the next election, so things may change in the next few years.

Quote:Below Figure 6 (why are there no page numbers?): d=.19 has to be written as d=0.19. Values that can only lie between –1 and 1 (p, r, usually beta) should be written as “.19”; values that can be larger (d) should be written as “0.19”.

It is common practice in many areas to omit redundant digits, particularly in programming. The general reason to omit leading zeroes is the same as that to avoid padding zeroes. One could write .1 as 0.1 or 00.1 or 0.100 or 000.1000 etc. The shortest version that conveys the information is .1.
The rule of including the 0 when the number is not bounded between -1 and 1 is just something APA made up, just as Wikipedia made up some other rules, like requiring leading zeroes for numbers bounded between -1 and 1... except for baseball batting averages, where it prefers omitting them, and for hours in the 12-hour clock (2:30, not 02:30).
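For anyone who has to follow the APA convention anyway, a tiny formatting helper is enough. `fmt_stat` below is a hypothetical helper written for illustration, not part of any library: it drops the leading zero only for statistics flagged as bounded by -1 and 1 (such as r or p) and keeps it otherwise (such as d).

```python
def fmt_stat(value: float, decimals: int = 2, bounded: bool = True) -> str:
    """Format a statistic, dropping the leading zero if it is bounded by -1 and 1.

    Hypothetical helper for illustration; pass bounded=False for unbounded
    statistics such as Cohen's d.
    """
    text = f"{value:.{decimals}f}"
    if bounded:
        if text.startswith("0."):
            text = text[1:]            # "0.19"  -> ".19"
        elif text.startswith("-0."):
            text = "-" + text[2:]      # "-0.12" -> "-.12"
    return text


print(fmt_stat(0.19))                 # a correlation
print(fmt_stat(-0.12))                # a negative correlation
print(fmt_stat(0.19, bounded=False))  # an effect size d
```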

Quote:Finally, choose different outlets. Also include standard APS, APA and Elsevier journals.

When these publishers stop leeching money from the public (journal subscriptions) and stop demanding ridiculous charges for publishing (article processing charges), we will consider it. Since they are quite content to abuse the scientific reputational system for their own monetary benefit, I refuse to work for them for free. “the mountain must come to Muhammad”. https://en.wikipedia.org/wiki/Wikipedia:...nd_numbers

We are aware that this study would probably get quite a bit of attention if it were published in a mainstream journal. However, since we consider that unethical, we will have to pursue other means of getting attention, for instance by sending the paper to Jussim and other colleagues and posting it on Twitter, ResearchGate, Facebook, etc.

--

The project files have been updated with the changes mentioned above (version 15).
#13
We are still waiting for a reply from Sean, who said he would have time to review this paper. The last reminder was sent to him on Aug. 18.
#14
I have sent a reminder email to Rindermann to let him know we are waiting for his second comment.

I think we will give up on getting a review from Sean Stevens, as he did not reply to my email. However, with Rindermann, there will be 3 approvals (Bob Williams, Gerhard Meisenberg, Heiner Rindermann).
#15
Rindermann sent his review to me and I post it here:

Heiner Rindermann, via email Wrote:Thanks for the revision.

In several places (abstract, text, answer to the reviewers) you distinguish, for Jussim's data and for yours, between “mean/median individual accuracy” and “aggregate accuracy”. This is important; explain it in detail and give an example of what it means.
“So far we have examined individual-level (in)accuracy and its correlates (also called personal stereotypes). However, one can also aggregate the estimates and then examine (in)accuracy and its correlates (consensual stereotypes) (Jussim, 2012).”
These are just words, without an explanation of the meaning and without examples!

Please add in your abstract:
old:
A nationally representative sample was asked to estimate the percentage of persons aged 30-39 receiving social benefits for 70 countries of origin
new:
A nationally representative Danish sample was asked to estimate the percentage of persons aged 30-39 living in Denmark receiving social benefits for 70 countries of origin

Table 5: Still acronyms – write all out!

#16
The paper has been revised.

1.
Quote: In several places (abstract, text, answer to the reviewers) you distinguish, for Jussim's data and for yours, between “mean/median individual accuracy” and “aggregate accuracy”. This is important; explain it in detail and give an example of what it means.
“So far we have examined individual-level (in)accuracy and its correlates (also called personal stereotypes). However, one can also aggregate the estimates and then examine (in)accuracy and its correlates (consensual stereotypes) (Jussim, 2012).”
These are just words, without an explanation of the meaning and without examples!

I looked over the paper. This distinction is briefly explained in Section 8. However, I have added a worked example as well.
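A toy version of such a worked example, with made-up numbers rather than the study's data: three hypothetical respondents estimate a percentage for five hypothetical groups. Individual accuracy is each respondent's correlation with the true values; aggregate accuracy is the correlation between the group-wise averaged estimates and the true values. In this example, averaging cancels part of each respondent's idiosyncratic error, so the aggregate correlation exceeds every individual one.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up true percentages for five hypothetical origin groups.
actual = [5, 15, 30, 45, 60]

# Made-up estimates from three respondents, each noisy in a different way.
respondents = [
    [20, 10, 50, 30, 55],
    [10, 35, 20, 60, 45],
    [5, 20, 45, 35, 70],
]

# Individual (personal) accuracy: one correlation per respondent.
individual = [pearson(est, actual) for est in respondents]

# Aggregate (consensual) accuracy: average the estimates per group first.
consensus = [sum(col) / len(col) for col in zip(*respondents)]
aggregate = pearson(consensus, actual)

print("individual r:", [round(r, 2) for r in individual])
print("median individual r:", round(median(individual), 2))
print("aggregate r:", round(aggregate, 2))
```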

2.
We have updated the abstract to use your wording.

3.
It is necessary to use acronyms because the real party names (translated or not) would take up too much space. We give the full list of parties in the appendix, so readers can consult that table.
#17
Typo, p. 15: Tho individuals
 
The distinction between individual and aggregate accuracy is now clear. (It is similar to inter-rater reliability: while individual agreement between two randomly chosen raters is low, about r=.20 to .30 for students’ evaluations of instruction, the “reliability” or “objectivity” of the mean of two raters is about r=.25 to .35. It can simply be calculated using the Spearman-Brown formula: the more raters, the higher the reliability-objectivity of the mean of raters.)
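The Spearman-Brown prophecy formula referred to here predicts the reliability of the mean of k raters from the single-rater reliability r as k*r / (1 + (k - 1)*r). A minimal sketch; the r and k values below are illustrative assumptions, not figures from the study.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability of the mean of k raters given single-rater reliability r."""
    return k * r / (1 + (k - 1) * r)


# Illustrative single-rater reliabilities and panel sizes (assumed values).
for r in (0.20, 0.30):
    predicted = [round(spearman_brown(r, k), 2) for k in (1, 2, 10, 50)]
    print(f"r = {r:.2f}: k = 1, 2, 10, 50 -> {predicted}")
```

The predicted reliability rises steeply for the first few added raters and then flattens toward 1, which is the pattern behind the remark that more raters give a more reliable mean.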
 
However, it is still not clear what they mean.
 
I suggest adding sentences similar to these (write them in your own words!):
 
Individual stereotypes represent the thinking of a single person. Individual accuracy stands for the accuracy of the thinking of an average single person, here about immigrants. However, this is not what is really interesting. Much more important is the accuracy of collective thinking, the accuracy of generally shared stereotypes. When the term “stereotypes” is used, what is usually meant is not the stereotypes of single persons but patterns of thinking that are widespread in a society and influence individuals' thinking and behavior, e.g. about sex differences in mathematics vs. language or about differences in ability and crime between races. These are the relevant stereotypes, the collective ones. They affect society, culture and people. We use the term aggregate stereotypes for them, and aggregate accuracy stands for the accuracy of the typical thinking in society.
#18
Heiner,

Thanks for reviewing.


Quote: Typo, p. 15: Tho individuals

Fixed.

Quote: The distinction between individual and aggregate accuracy is now clear. (It is similar to inter-rater reliability: while individual agreement between two randomly chosen raters is low, about r=.20 to .30 for students’ evaluations of instruction, the “reliability” or “objectivity” of the mean of two raters is about r=.25 to .35. It can simply be calculated using the Spearman-Brown formula: the more raters, the higher the reliability-objectivity of the mean of raters.)
 
However, it is still not clear what they mean.
 
I suggest adding sentences similar to these (write them in your own words!):
 
Individual stereotypes represent the thinking of a single person. Individual accuracy stands for the accuracy of the thinking of an average single person, here about immigrants. However, this is not what is really interesting. Much more important is the accuracy of collective thinking, the accuracy of generally shared stereotypes. When the term “stereotypes” is used, what is usually meant is not the stereotypes of single persons but patterns of thinking that are widespread in a society and influence individuals' thinking and behavior, e.g. about sex differences in mathematics vs. language or about differences in ability and crime between races. These are the relevant stereotypes, the collective ones. They affect society, culture and people. We use the term aggregate stereotypes for them, and aggregate accuracy stands for the accuracy of the typical thinking in society.

I rewrote part of the Discussion to be:

We observed relatively high levels of accuracy. The accuracy of aggregate stereotypes was much higher (r = .70) than the median individual accuracy (r = .48), as expected based on the Spearman-Brown formula. When thinking about stereotypes, aggregate stereotypes are usually the important ones to focus on, because they represent the typical or average expectations of the population. The idiosyncratic beliefs of single persons, and any actions resulting from them, tend to average out.


In general, the present results are similar to those found in the pilot study. The only findings that did not replicate were the strong predictive validities of age and gender observed in the pilot study.

The findings fit well with the general literature on stereotype accuracy (Jussim, 2012; Jussim et al., 2015). The average correlation in social psychology has been estimated to be around .20 (Richard, Bond, & Stokes-Zoota, 2003),[1] while we found that 78% of participants had accuracy correlations above .30 and 45% had scores above .50. Previous studies of racial/ethnic stereotypes reported average accuracies of .36 to .69 for individual-level stereotypes and .53 to .93 for aggregate-level stereotypes (Jussim, 2012, p. 327).

[footnote]
[1] This value is very likely to be too large. The estimate is based on a large number of meta-analyses, most of which did not correct for the endemic publication bias in this field (Open Science Collaboration, 2015).

Let me know whether this is satisfactory.

---

Files updated.
#19
Dear Emil,
Good,
fine with me.
Heiner
#20
I read briefly through the paper and found that it is OK. I think it is ready to be published in its present form.