
[ODP] Increasing inequality in general intelligence and socioeconomic status as a res

#21
2.3. “This has repeatedly been found for so many years that it hardly bears repeating”. Too informal?
8. It should be made more clear in the text that the data reported in fig.7 refer to simulation and not actual data. Also explain that you could not find this kind of data in the military Danish draft data, because it would have been useful to test your model against the actual data.
Add these explanations before or after this paragraph “In our modeled scenario, We decided to examine IQs 130 and 70 which are usually used as the thresholds for intellectually gifted and disabled, respectively. Figure 7 shows just this, along with the ratio of disabled per gifted using the no-gains model.”
 Reply
#22
Dear Piffer,

Quote: 2.3. “This has repeatedly been found for so many years that it hardly bears repeating”. Too informal?

No rules against that.

Quote: 8. It should be made more clear in the text that the data reported in fig.7 refer to simulation and not actual data. Also explain that you could not find this kind of data in the military Danish draft data, because it would have been useful to test your model against the actual data.

It is already clear. Both the text referring to the figure and the figure caption mention it.

draft Wrote:Figure 7 shows just this, along with the ratio of disabled per gifted using the no-gains model.

...

Figure 7: Percents of disabled and gifted individuals and their ratio, Denmark 1980-2014. Results based on the no-gains model.


How would you like us to make it more clear? In the figure title? I have added "Simulation results based on census data" to the figure title.

I don't see how anyone could be confused, since the paper is about modeling and there is no mention of any actual data aside from the one army study.

Draft updated to version 9.
 Reply
#23
(2014-Dec-06, 23:50:14)Emil Wrote: Dear Piffer,

Quote: 2.3. “This has repeatedly been found for so many years that it hardly bears repeating”. Too informal?

No rules against that.


Just a suggestion; some readers (especially hard-nosed academics) may find it too informal. Personally, I don't care.

Quote:How would you like us to make it more clear? In the figure title? I have added "Simulation results based on census data" to the figure title.

I don't see how anyone could be confused, since the paper is about modeling and there is no mention of any actual data aside from the one army study.

Draft updated to version 9.

Modeling has two phases, developing (theory) and testing (empirical), so it can also entail testing the model against real data (empirical part), unless you're writing a purely theoretical paper, which does not seem to be the case. Be that as it may, I think it's fine now since you added "Simulation results based on census data" to the figure title, which makes it clearer to the reader.
I approve the paper as it is, my review is done.
 Reply
#24
I wanted to make sense of the R syntax. But it has exhausted my patience. Honestly, if R programmers expect people to even care about reviewers examining the syntax given by the researchers, they should make R simple to understand (e.g., the use of multiple [] and {} and () is exceedingly exhausting for my little brain). Unless a reviewer is a statistician and a programmer, and is willing to spare some time, he will not look at it. That's beyond my expertise. I understand R, but only simple code.

However, I understand the description of the models you use. Still, I don't get several things. You say :

Quote:Then, for each year, we calculated the composite population using the population data and their IQs (using the same data as in Section 3). The plot of the results is shown in Figure 6.

But in Figure 6, the legend reads :

Quote:Figure 6: Change in mean IQ and SD over time in Denmark modeled from population data by country of origin and national IQs.

Why am I thinking it's odd here ? Given the above paragraph, you use the actual data, and calculate the means/SD per year, so it's the observed data. In the legend of Figure 6, it is said "modeled" as if you have used some statistical modeling while you have only made a descriptive analysis. Correlation and Cohen's d for example are this kind of descriptive analysis. At the beginning of section 4, you say that the model presented in figure 6 did not assume g gains for immigrants. As I said, it's odd because in the description, you merely said you have calculated the means and SD over time. Do you mean "no g gain model" because your descriptive analysis does not incorporate IQ gain for immigrants ? I ask this question based on what I understand (not much) of your syntax below :

Code:
# IQ vector - no gains
IQ.vector = unlist((DF["IQ"]-100)/15) #standardized IQs
names(IQ.vector) = rownames(DF) #set names again
#IQ.vector = c(-2,2) #for testing purposes

#IQ vectors for gains
for (case in 1:length(IQ.vector)){ #loop over each IQ
  if (IQ.vector[case] < -0.18666667){ #is it lower than DK?
    diff.to.DK = (-0.18666667-IQ.vector[case])
    IQ.vector[case] = IQ.vector[case]+diff.to.DK*.75 #change this value for the other scenarios
  }
}


And, more importantly: when one wants to examine which model has the better "fit" to the data (I put fit in quotation marks because there are no fit indices), one needs to compare the differing models (presented in Table 2) with what is observed in the actual data. What perplexes me is that in section 5 you have plotted the no g gain scenario, in section 6 you have listed in Table 2 the expected outcomes for the different scenarios, and in section 7 you talk about the Danish military data, but I don't see the link between section 7 and sections 5-6. In section 7, you only mention that the SD of the immigrants is higher than that of the natives. I don't see how this helps to evaluate whether the "no g gain" model is better than the others.

One other question here :

Quote:The cause of the larger than expected SD is perplexing. The fact that some non-Danes are classified as 'Danish origin' means that the SD should be smaller than modeled, not larger.

Why smaller ? I'm not sure I get the idea. Also when you say "larger than expected SD" you also said that your model predicts 11.3% SD higher, but that the actual data shows 14.2% higher. But these values are still close, no ? Also, what do you mean by "Using the model to predict this value using 2003 data, gives 11.3% which is not too far off (estimated SD's 15.01 for 'western' and 16.70 for 'non-western')." ? Which model exactly ?
 Reply
#25
MH,

Quote: I wanted to make sense of the R syntax. But it has exhausted my patience. Honestly, if R programmers expect people to even care about reviewers examining the syntax given by the researchers, they should make R simple to understand (e.g., the use of multiple [] and {} and () is exceedingly exhausting for my little brain). Unless a reviewer is a statistician and a programmer, and is willing to spare some time, he will not look at it. That's beyond my expertise. I understand R, but only simple code.

The code is extensively commented, so if one knows R and statistics, one can follow it.

[], {} and () are not the same. [] selects subsets/values by index. E.g. if values = c(5,2,7), then values[1] is 5, values[2] is 2 and values[3] is 7. {} is used in control flow. () is used in calculations when the order of operations is important. (To make it worse, [[]] is how one selects an item from a list in R.)
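To make the bracket distinction concrete, here is a minimal sketch in R. The vector `values` is the one from the example above; the list `items` and its contents are made up purely for illustration.

```r
values <- c(5, 2, 7)

# [] selects elements by index
first <- values[1]            # 5
subset <- values[c(1, 3)]     # 5 7

# () forces the order of operations
grouped <- (2 + 3) * 4        # 20
ungrouped <- 2 + 3 * 4        # 14

# [[]] extracts a single item from a list
items <- list(a = 1:3, b = "text")   # hypothetical list
item.a <- items[["a"]]        # 1 2 3

# {} groups the statements that belong to a loop or an if
doubled <- numeric(0)
for (v in values) {
  doubled <- c(doubled, v * 2)
}
doubled                       # 10 4 14
```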

Quote: Why am I thinking it's odd here ? Given the above paragraph, you use the actual data, and calculate the means/SD per year, so it's the observed data. In the legend of Figure 6, it is said "modeled" as if you have used some statistical modeling while you have only made a descriptive analysis. Correlation and Cohen's d for example are this kind of descriptive analysis. At the beginning of section 4, you say that the model presented in figure 6 did not assume g gains for immigrants. As I said, it's odd because in the description, you merely said you have calculated the means and SD over time. Do you mean "no g gain model" because your descriptive analysis does not incorporate IQ gain for immigrants ? I ask this question based on what I understand (not much) of your syntax below :

I don't know what you don't understand. These are the same. Modeling is a broad term and does not imply any fancy stuff like fit indexes (as in latent trait modeling, confirmatory factor analysis etc.). It just means one is calculating based on a model of how reality works.

Wikipedia has a nice description:

Quote:Modeling and simulation (M&S) is getting information about how something will behave without actually testing it in real life. For instance, if we wanted to design a race car, but weren't sure what type of spoiler would improve traction the most, we would be able to use a computer simulation of the car to estimate the effect of different spoiler shapes on the coefficient of friction in a turn. We're getting useful insights about different decisions we could make for the car without actually building the car.

Section 4 has no reference to Figure 6. You must mean Section 5. (It is not "did not assume g gains for immigrants", it is "assumed no g gains for immigrants". These are different.)

The no gains model is the one where there are no immigrant gains, yes.

What do you understand about the syntax? The loop goes over each IQ in the IQ vector (the list of IQs for each country of origin). Then it checks if it is lower than Danish. If it is, it calculates the difference to Danish IQ, and adds a fraction of that to the IQ in the vector. In the code you quote, the fraction is .75, so it is the 75% gains model. One merely changes that value to calculate a different model (well, technically, it is one overall model with 4 parameter, but it is not important for present purposes).
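The loop described above can also be written as a small function, which may make the role of the gains fraction easier to see. This is only a sketch: `apply.gains` and the example vector are hypothetical names and made-up values, not part of the paper's syntax. The constant -0.18666667 is the standardized Danish IQ taken from the quoted code.

```r
# Hypothetical helper: move every IQ below the Danish value a given
# fraction of the way up towards the Danish value.
apply.gains <- function(iq.z, gain, dk.z = -0.18666667) {
  below <- iq.z < dk.z                                # which groups are below DK?
  iq.z[below] <- iq.z[below] + (dk.z - iq.z[below]) * gain
  iq.z
}

example.iq <- c(A = -1, B = -0.5, C = 0.2)            # made-up standardized IQs

no.gains <- apply.gains(example.iq, gain = 0)         # unchanged
gains.75 <- apply.gains(example.iq, gain = 0.75)      # 75% gains scenario
full     <- apply.gains(example.iq, gain = 1)         # everyone below DK reaches DK
```

Values above the Danish mean (group C here) are left untouched in every scenario, just as in the loop.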

Quote: And, more importantly: when one wants to examine which model has the better "fit" to the data (I put fit in quotation marks because there are no fit indices), one needs to compare the differing models (presented in Table 2) with what is observed in the actual data. What perplexes me is that in section 5 you have plotted the no g gain scenario, in section 6 you have listed in Table 2 the expected outcomes for the different scenarios, and in section 7 you talk about the Danish military data, but I don't see the link between section 7 and sections 5-6. In section 7, you only mention that the SD of the immigrants is higher than that of the natives. I don't see how this helps to evaluate whether the "no g gain" model is better than the others.

There is no data to compare against except for the army study. As you can see, the army study found that immigrants had a higher SD than predicted by the no gains model. Since the other models give a smaller SD for the immigrants, the best fitting model is the no gains one.

Quote: Why smaller ? I'm not sure I get the idea. Also when you say "larger than expected SD" you also said that your model predicts 11.3% SD higher, but that the actual data shows 14.2% higher. But these values are still close, no ? Also, what do you mean by "Using the model to predict this value using 2003 data, gives 11.3% which is not too far off (estimated SD's 15.01 for 'western' and 16.70 for 'non-western')." ? Which model exactly ?

No gains model is the default/primary. It is the one used unless otherwise stated.

When non-Danes are classified as 'Danish', this increases the raw score SD for the 'Danish' group. Since this value is in the denominator, the ratio decreases, i.e. a smaller %.
 Reply
#26
I understand R much better without curly brackets because I don't know these things (I read your link, but I understand nothing at all). And my main problem with your model with gain is this :

Code:
for (case in 1:length


I notice you use it quite often, but I don't see what is "for", what is "in 1" and why you have ":" just after. And why you would need the "length" too.

When I said "At the beginning of section 4, you say that the model presented in figure 6 did not assume g gains for immigrants" there was indeed a mistake. It was section 6, not 4 : "The critical reader will have noticed that the model assumes that there are no immigrant changes in g.".

Quote:Modeling and simulation (M&S) is getting information about how something will behave without actually testing it in real life.

I agree with that statement. But when you merely calculate means and call it modeling, I'm confused. Or perhaps my definition is too narrow.

When you say "Since the other models give a smaller SD for the immigrants, the best fitting model is the no gains one" I wonder what you are referring to. In Table 2, you mentioned these other models, but this was about the magnitude of the increase in SD over time, not whether the immigrant SD is higher than the SD of the natives. I suppose you didn't refer to this, but the statement that "the other models give a smaller SD for the immigrants" compared to the no g gain model is not made explicit in your text.

Emil Wrote: When non-Danes are classified as 'Danish', this increases the raw score SD for the 'Danish' group. Since this value is in the denominator, the ratio decreases, i.e. a smaller %.


Tell me if I'm wrong. Your model did not take into account that (some of the) non-Danes were misclassified as Danish, and so, the no g gain model gives an over-estimated SD.
 Reply
#27
Quote: I understand R much better without curly brackets because I don't know these things (I read your link, but I understand nothing at all).

What you need is to read an introduction to programming in R. Read this: http://health.adelaide.edu.au/psychology...ching/lsr/

You cannot really utilize programming without understanding simple control flow.


Quote: I notice you use it quite often, but I don't see what is "for", what is "in 1" and why you have ":" just after. And why you would need the "length" too.

You cut off the code in the example. Don't do that. Here it is:

Code:
#IQ vectors for gains
for (case in 1:length(IQ.vector)){ #loop over each IQ
  if (IQ.vector[case] < -0.18666667){ #is it lower than DK?
    diff.to.DK = (-0.18666667-IQ.vector[case])
    IQ.vector[case] = IQ.vector[case]+diff.to.DK*.75 #change this value for the other scenarios
  }
}


In R, typing any integer, a colon, and any new integer creates a vector of numbers. Very useful. Just do e.g. 1:10 in R and see.
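For anyone following along, the pieces of `for (case in 1:length(IQ.vector))` can be tried one at a time. A minimal sketch; the vector `x` is made up for illustration.

```r
one.to.ten <- 1:10          # the colon creates the integers 1 to 10

x <- c(10, 20, 30)          # made-up example vector
n <- length(x)              # number of elements in x: 3
idx <- 1:length(x)          # 1 2 3, i.e. one index per element

# "for (i in idx)" runs the body once for each value in idx,
# with i set to 1, then 2, then 3
for (i in idx) {
  x[i] <- x[i] + 1
}
x                           # 11 21 31
```

`seq_along(x)` gives the same indices as `1:length(x)` and is the safer choice when a vector might be empty, since `1:0` counts down (1, 0) instead of producing nothing.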

You appear not to understand loops. This is not the place for me to lecture you on the basics of programming. It is more effective if you use a textbook.

If there is a command you don't understand, just look it up. Type ?command (e.g. ?length) in R and it will open help.

Quote: I agree with that statement. But when you merely calculate means and call it modeling, I'm confused. Or perhaps my definition is too narrow.

When you say "Since the other models give a smaller SD for the immigrants, the best fitting model is the no gains one" I wonder what you are referring to. In Table 2, you mentioned these other models, but this was about the magnitude of the increase in SD over time, not whether the immigrant SD is higher than the SD of the natives. I suppose you didn't refer to this, but the statement that "the other models give a smaller SD for the immigrants" compared to the no g gain model is not made explicit in your text.

Your definition is too narrow, yes.

I don't understand why you don't understand it. Read the paper again perhaps? Immigrant SD is of course higher since they are composed of many groups. Any composite population of standard normal distributions with different means has an SD larger than 1 (also stated in the paper).
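The point about composite populations can be checked with a few lines of R. This is a minimal sketch, assuming each subgroup is normal with SD 1 on the standardized scale. The function `composite.sd`, the group means and the weights are made up for illustration and are not the paper's data or code.

```r
# SD of a mixture of groups: the total variance is the weighted within-group
# variance plus the weighted squared distance of each group mean from the
# grand mean.
composite.sd <- function(means, weights, sds = rep(1, length(means))) {
  w <- weights / sum(weights)              # normalize weights to proportions
  grand.mean <- sum(w * means)
  sqrt(sum(w * (sds^2 + (means - grand.mean)^2)))
}

same.means <- composite.sd(means = c(0, 0),  weights = c(1, 3))   # exactly 1
diff.means <- composite.sd(means = c(-1, 0), weights = c(1, 1))   # sqrt(1.25)
```

With equal means the composite SD stays at 1. As soon as the means differ, the between-group term is positive and the composite SD exceeds 1.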

Immigrant 'non-western' SD was ~14% larger than 'Western' SD, but the no-gains model only predicts it to be ~11% larger. The other models fare even worse. There is something more going on, perhaps differential selection for g between countries. This would increase the SD.

Quote: Tell me if I'm wrong. Your model did not take into account that (some of the) non-Danes were misclassified as Danish, and so, the no g gain model gives an over-estimated SD.

No. It gives an underestimated SD ratio between the groups.

Look, if you have: (non-western SD/western SD) and you increase western SD due to misclassification, the ratio becomes smaller.
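The arithmetic can be sketched as follows. The two SDs are the model estimates quoted earlier in the thread (15.01 for 'western', 16.70 for 'non-western'); the inflation of 0.5 points is a made-up number, used only to show the direction of the effect.

```r
sd.nonwestern <- 16.70
sd.western    <- 15.01

ratio <- sd.nonwestern / sd.western
pct.larger <- round((ratio - 1) * 100, 1)      # 11.3

# Hypothetical: misclassification inflates the 'western' SD by 0.5 points
inflated.western <- sd.western + 0.5
ratio.inflated <- sd.nonwestern / inflated.western

ratio.inflated < ratio                         # TRUE: larger denominator, smaller ratio
```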
 Reply
#28
You assume I didn't read these books and the help commands. But I already did. I told you that before. I repeat: these weren't helpful.

(2014-Dec-13, 04:15:50)Emil Wrote: Immigrant 'non-western' SD was ~14% larger than 'Western' SD, but the no-gains model only predicts it to be ~11% larger. The other models fare even worse. There is something more going on, perhaps differential selection for g between countries. This would increase the SD.


That's what I said. But I also said it is not clearly stated in your article. If the no-gains model predicts 11% larger, why not mention the % for the other models?

(2014-Dec-13, 04:15:50)Emil Wrote:
Meng Hu Wrote:Tell me if I'm wrong. Your model did not take into account that (some of the) non-Danes were misclassified as Danish, and so, the no g gain model gives an over-estimated SD.


No. It gives an underestimated SD ratio between the groups.

Look, if you have: (non-western SD/western SD) and you increase western SD due to misclassification, the ratio becomes smaller.


I understand the second sentence, but you misread my comment. I said your no gain model did not take into account racial misclassification. So logically, the no gain model predicts higher SD for immigrants than what was observed in the data, because misclassification underestimates the ratio nonwestern/western SD.

EDIT :

Concerning what is modeling and what is not: I think you should probably say in your article that the graph in Figure 6 "corresponds to the scenario we expect under the no g gain model", or something like that. As I said, Figure 6 is not really "modeled" because it's descriptive, but I can accept that it corresponds to a model of yours.

Generally, what I think is a model is something close to a "prediction" such as in regression. In this analysis, you predict the individual's outcome, not based on individual's characteristics but on group characteristics. Regression is usually understood as an aggregation method. When you hold constant several independent variables, they are held constant for the values of the entire group. We can't control an individual's characteristics. Only group characteristics.

So in my opinion, a "descriptive" statistic is not a model, but can correspond to a model you have in mind.
 Reply
#29
MH,

Quote: That's what I said. But I also said it is not clearly stated in your article. If the no-gains model predicts 11% larger, why not mention the % for the other models?

We did not calculate these values.

Quote: I understand the second sentence, but you misread my comment. I said your no gain model did not take into account racial misclassification. So logically, the no gain model predicts higher SD for immigrants than what was observed in the data, because misclassification underestimates the ratio nonwestern/western SD.

The data shows that the SD is larger than the no gains model predicts.

This smells of another language confusion. The data are biased towards a higher ratio, yet the model that produces the highest predicted NW/W-ratio still underpredicts it. I.e. it is likely that something else is going on.

Quote:EDIT :

Concerning what is modeling and what is not: I think you should probably say in your article that the graph in Figure 6 "corresponds to the scenario we expect under the no g gain model", or something like that. As I said, Figure 6 is not really "modeled" because it's descriptive, but I can accept that it corresponds to a model of yours.

Generally, what I think is a model is something close to a "prediction" such as in regression. In this analysis, you predict the individual's outcome, not based on individual's characteristics but on group characteristics. Regression is usually understood as an aggregation method. When you hold constant several independent variables, they are held constant for the values of the entire group. We can't control an individual's characteristics. Only group characteristics.

So in my opinion, a "descriptive" statistic is not a model, but can correspond to a model you have in mind.

Your definition of "model" is idiosyncratic. I have already supplied quotes supporting my use of "model".
 Reply
#30
You first need to understand what modeling is. By this, I mean how many elements are needed to explain the data. By elements, I mean "variables", e.g., interaction terms, squared and/or cubic terms, and maybe additional variables. See here, for a pictorial illustration. The purpose is to look for the most parsimonious model. If you have the main effect and squared term of age, and then you decide to add its cubic term as well, you will try to compare the two nested models by, say, a chi-squared test, and then discover that the two models don't differ significantly (I dislike p-values, but it's just for the sake of argument). You can conclude the cubic term is not necessary and that a model with squared effects of age is sufficient to explain the observed data.
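The nested comparison described here might look like this in R. It is only a sketch on simulated data: the variable names, the sample size and the coefficients are all made up, and the outcome is generated without a cubic term so that the quadratic model is the true one.

```r
set.seed(1)                                   # reproducible simulated data
age <- runif(200, min = 20, max = 60)
y <- 2 + 0.5 * age - 0.005 * age^2 + rnorm(200)

m.quad  <- lm(y ~ age + I(age^2))             # main effect + squared term
m.cubic <- lm(y ~ age + I(age^2) + I(age^3))  # adds the cubic term

# Compare the two nested models with a chi-squared test
comparison <- anova(m.quad, m.cubic, test = "Chisq")
print(comparison)
```

If the test gives no evidence that the cubic term improves the fit, the more parsimonious quadratic model is retained.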

When you fit a statistical model, you are actually comparing the models to the observed data and evaluating which one gives the best approximation (fit) to the data. Modeling makes no sense when you have no point of comparison, because a statistical model serves to predict the observed data.

Thus, you cannot say "we have modeled the trend of..." when you have, e.g., just computed the means and SDs. These are descriptive statistics. This is not a statistical model, even though it can describe your model. These are two different things, as I keep saying.

Further, Andy Field (2009, pp.32-33) "Discovering Statistics with SPSS" has a nice description of what is a model, certainly much better than the wiki you have cited.

Quote:We saw in the previous chapter that scientists are interested in discovering something about a phenomenon that we assume actually exists (a ‘real-world’ phenomenon). These real-world phenomena can be anything from the behaviour of interest rates in the economic market to the behaviour of undergraduates at the end-of-exam party. Whatever the phenomenon we desire to explain, we collect data from the real world to test our hypotheses about the phenomenon. Testing these hypotheses involves building statistical models of the phenomenon of interest.

The reason for building statistical models of real-world data is best explained by analogy. Imagine an engineer wishes to build a bridge across a river. That engineer would be pretty daft if she just built any old bridge, because the chances are that it would fall down. Instead, an engineer collects data from the real world: she looks at bridges in the real world and sees what materials they are made from, what structures they use and so on (she might even collect data about whether these bridges are damaged). She then uses this information to construct a model. She builds a scaled-down version of the real-world bridge because it is impractical, not to mention expensive, to build the actual bridge itself. The model may differ from reality in several ways – it will be smaller for a start – but the engineer will try to build a model that best fits the situation of interest based on the data available. Once the model has been built, it can be used to predict things about the real world: for example, the engineer might test whether the bridge can withstand strong winds by placing the model in a wind tunnel. It seems obvious that it is important that the model is an accurate representation of the real world. Social scientists do much the same thing as engineers: they build models of real-world processes in an attempt to predict how these processes operate under certain conditions (see Jane Superbrain Box 2.1 below). We don’t have direct access to the processes, so we collect data that represent the processes and then use these data to build statistical models (we reduce the process to a statistical model). We then use this statistical model to make predictions about the real-world phenomenon. Just like the engineer, we want our models to be as accurate as possible so that we can be confident that the predictions we make are also accurate. 
However, unlike engineers we don’t have access to the real-world situation and so we can only ever infer things about psychological, societal, biological or economic processes based upon the models we build. If we want our inferences to be accurate then the statistical model we build must represent the data collected (the observed data) as closely as possible. The degree to which a statistical model represents the data collected is known as the fit of the model.

Figure 2.2 illustrates the kinds of models that an engineer might build to represent the real-world bridge that she wants to create. The first model (a) is an excellent representation of the real-world situation and is said to be a good fit (i.e. there are a few small differences but the model is basically a very good replica of reality). If this model is used to make predictions about the real world, then the engineer can be confident that these predictions will be very accurate, because the model so closely resembles reality. So, if the model collapses in a strong wind, then there is a good chance that the real bridge would collapse also. The second model (b) has some similarities to the real world: the model includes some of the basic structural features, but there are some big differences from the real-world bridge (namely the absence of one of the supporting towers). This is what we might term a moderate fit (i.e. there are some differences between the model and the data but there are also some great similarities). If the engineer uses this model to make predictions about the real world then these predictions may be inaccurate and possibly catastrophic (e.g. the model predicts that the bridge will collapse in a strong wind, causing the real bridge to be closed down, creating 100-mile tailbacks with everyone stranded in the snow; all of which was unnecessary because the real bridge was perfectly safe – the model was a bad representation of reality). We can have some confidence, but not complete confidence, in predictions from this model. The final model (c) is completely different to the real-world situation; it bears no structural similarities to the real bridge and is a poor fit (in fact, it might more accurately be described as an abysmal fit). As such, any predictions based on this model are likely to be completely inaccurate.
Extending this analogy to the social sciences we can say that it is important when we fit a statistical model to a set of data that this model fits the data well. If our model is a poor fit of the observed data then the predictions we make from it will be equally poor.

I have highlighted the important passages.


 Reply