
# [ODP] An update on the narrowing of the black-white gap in the Wordsum

This should be: Huang & Hauser's (2001)

Quote:The wordsum correlates at 0.71 with the AGCT aptitude test, and that the wordsum has an internal consistency reliability of 0.71 for whites and 0.63 for blacks (Huang & Hauser, 2001), which is not surprising given the shortness of the test.

This should be: The wordsum correlates at 0.71 with the AGCT aptitude test, and it has an internal reliability of 0.71 for whites and 0.63 for blacks (Huang & Hauser, 2001); these reliabilities are relatively low for cognitive measures, but this is not surprising given the shortness of the test.

Quote:The usual operation

This should be: formula.

Quote:But it is clear that the d gaps in the period 1988-1993 were clearly smaller than than earlier years. Lynn has also regressed the d gaps on years. The "b" slope was -0.004, which means that over 22 years, the d gap has been reduced by 0.004*22=0.088, given that the linearity assumption holds (which was true according to Lynn). This is indeed not very large.

Try: But it is clear that the d gaps in the period 1988-1993 were smaller than in earlier years. Lynn has also regressed the d gaps on years. The "b" slope was -0.004, which means that, over 22 years, the d gap diminished by 0.004*22=0.088, given that a linearity assumption holds (which was true according to Lynn). This is indeed not very large.
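
As a quick check of the arithmetic in this passage, here is a minimal sketch in Python (the thread itself uses Stata; the variable names are mine). It only multiplies the reported slope by the 22-year span, and it assumes the trend is linear, as the text states.

```python
# Lynn's reported regression slope of the d gap on survey year (from the text).
slope_per_year = -0.004
span_years = 22  # length of the period, as given in the text

# Under a linear trend, the total change is slope * span.
total_change = slope_per_year * span_years
print(f"Change in d gap over {span_years} years: {total_change:.3f}")
```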

Quote:However, their d scores differ from Lynn's only for years 1993 and 1994. But, more importantly, they faulted Lynn for not having used cohort as the variable of time trend (which can be calculated as year minus age).

Try: However, their d scores differ from Lynn's only for years 1993 and 1994. But, more importantly, they faulted Lynn for not having used cohort as the variable for the time trend (which can be calculated as year minus age).

Quote:Here is an explanation of the two concepts. With survey year, assuming age is held constant, we are asking how are the 40-year-olds in 1980 different from the 40-year-olds in 1990. The former experienced WWII, but the latter didn't. This is the period effect. With birth cohort, assuming age is held constant, we are asking how are people born in 1950 different from people born in 1960, when they were both 40 years old. The former experienced the sexual revolution in their teenage years, but the latter didn't. This is the cohort effect. The two effects may or may not be the same thing. (I must thank Satoshi Kanazawa for the tip.)

Try: Here is an explanation of the two concepts: With survey year, assuming age is held constant, we are asking, "How are the 40-year-olds in 1980 different from the 40-year-olds in 1990?". The former experienced WWII, but the latter didn't. This is the period effect. With birth cohort, assuming age is held constant, we are asking, "How are 40 year olds born in 1950 different from 40 year olds born in 1960?". The former experienced the sexual revolution in their teenage years, but the latter didn't. This is the cohort effect. The two effects may or may not be the same thing. (I must thank Satoshi Kanazawa for the tip.)
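
The relation between the two time variables can be sketched in a few lines of Python (not the Stata used in this thread; the function name is hypothetical). It uses only the definition given above, cohort = year minus age, and the ages and years from the example.

```python
def birth_cohort(survey_year: int, age: int) -> int:
    """Birth cohort as defined in the text: survey year minus age."""
    return survey_year - age

# Period comparison: same age (40), different survey years.
period_pair = [birth_cohort(1980, 40), birth_cohort(1990, 40)]

# Cohort comparison: same age (40), different birth years.
# The survey year in which each cohort is 40 follows from year = cohort + age.
cohort_pair = [1950 + 40, 1960 + 40]

print(period_pair)  # birth years of the two groups of 40-year-olds
print(cohort_pair)  # survey years in which each cohort is observed at age 40
```

With age held at a single value, a 10-year step in survey year is also a 10-year step in birth cohort, so the two variables can only be told apart in a sample where age varies.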

Quote:Given their parameters of 2.641 for intercept, 3.037 for race, 0.024 for the slope of year, and -0.0176 for the interaction, we can predict the changes in the gap over time. This is done by computing the white trend with race*year interaction, 2.641+3.037+(0.024*24)-(0.0176*24)=5.8316, and the white trend without the interaction, 2.641+3.037+(0.024*24)=6.2540, which gives a difference of 0.4224

It's standard to round to the same level of significant digits when dealing with the same sets of numbers. So either 0.024 and -0.018 or 0.024? and -0.0176.
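
To make the calculation in the quoted passage easy to check, here is a minimal Python sketch (variable names are mine; the numbers are the ones quoted). It also shows, as an illustration only, what the difference would be if the interaction were rounded to three decimals.

```python
# Parameters reported in the quoted passage (Huang & Hauser, 2001).
intercept = 2.641
race = 3.037
year_slope = 0.024
interaction = -0.0176
years = 24

without_interaction = intercept + race + year_slope * years
with_interaction = without_interaction + interaction * years
difference = without_interaction - with_interaction

# The same difference if the interaction were rounded to three decimals (-0.018).
difference_rounded = -round(interaction, 3) * years

print(round(with_interaction, 4), round(without_interaction, 4))
print(round(difference, 4), round(difference_rounded, 4))
```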

Quote:difference (corrected for censored distribution of wordsum)

What does this mean? Also, add an article ("the") before censored.

Quote:Squared and perhaps cubed terms should have been applied to categorical variables of years and their interaction with race rather than using the continuous variable of survey year.

Why "should" they have? Was the relation non-linear? Perhaps you mean:

The authors should have checked whether using squared and perhaps cubed terms produced a better-fitting model. Doing so might have generated different results.

Quote:The finding of Huang & Hauser (2001) is interesting because it is known that the black-white IQ gap in the U.S. has not declined in the adult samples, only in the children samples (Rushton & Jensen, 2006; Dickens & Flynn, 2006).

Maybe: The finding of Huang & Hauser (2001) is interesting because it is known that the black-white IQ gap in the U.S. has not declined in adult samples but only in child and adolescent ones (Rushton & Jensen, 2006; Dickens & Flynn, 2006).

Use "the adult samples" when referring to a specific set of samples; use "adult samples" when referring to an unspecific set of samples. In this case, I think you are referring to an unspecific set. If not, you should say, e.g.:

It is known that the black-white IQ gap in the U.S. has not declined in the adult samples but only in the child samples discussed by Rushton & Jensen (2006) and Dickens & Flynn (2006).

Quote: It is possible, nonetheless, that there was a gap closing before the period analyzed by Dickens and Flynn. See Murray (2007).

Try: It is possible, nonetheless, that there was a gap closing before the period analyzed by Dickens and Flynn (see Murray, 2007).

Quote:Before deciding which method to apply, one needs to examine the distribution of the variables we will use.

Try: Before deciding which method to apply, one needs to examine the distribution of the variables one wishes to use.

Quote:An important assumption of linear regression is the normality of the data, especially the distribution of the dependent variable.

Try: An important assumption of linear regression is the normality of the data, especially with respect to the distribution of the dependent variable.

Quote:The right procedure should be to use a tobit regression (for an introduction, see, McDonald & Moffitt, 1980).

Try: The right procedure should be to use a tobit regression (for an introduction, see McDonald & Moffitt, 1980).
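
For readers unfamiliar with tobit, the idea can be sketched as a log-likelihood in which censored scores contribute a tail probability rather than a density. This is a minimal Python sketch, not the Stata routine used in the paper; the function name is hypothetical, and the limits 0 and 10 are assumed from the wordsum score range.

```python
import math
from statistics import NormalDist

def tobit_loglik(y, mu, sigma, lower=0.0, upper=10.0):
    """Two-limit tobit log-likelihood for observed scores y with
    latent means mu and a common latent standard deviation sigma."""
    std = NormalDist()  # standard normal
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi <= lower:
            # Left-censored: probability that the latent score is <= lower.
            total += math.log(std.cdf((lower - mi) / sigma))
        elif yi >= upper:
            # Right-censored: probability that the latent score is >= upper.
            total += math.log(1.0 - std.cdf((upper - mi) / sigma))
        else:
            # Uncensored: ordinary normal density of the observed score.
            total += math.log(std.pdf((yi - mi) / sigma) / sigma)
    return total

print(tobit_loglik([0.0, 6.0, 10.0], [1.0, 6.0, 9.0], sigma=2.0))
```

An estimation routine would maximize this quantity over the regression coefficients (which determine `mu`) and `sigma`; that step is left out here.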

Quote:Since the year 2000, the GSS begins to ask whether

Try: Since the year 2000, the GSS has asked whether

Quote:For respondents in survey year 2000+ I have only included the respondents who declared not being hispanic (see appendix).

Use a comma.

Quote:The variable cohort has values going from 1883 to 1994. The variable sex has the following values; male=1, female=2. The variable age has values going from 18 to 89. The variable degree has the following values; 0=lower than high school, 1=high school, 2=junior college, 3=bachelor, 4=graduate. The variable educ has values going from 0 to 20. The variable realinc has values going from 245 to 162607, and the respective numbers for log income are 5.5 and 11.99. The variable reg16 has the

For clarity place the variable names in quotes.

The variable "cohort" has values going from 1883 to 1994. The variable "sex" has the following values; male=1, female=2. The variable "age"....

Quote:According to the GSS codebook, the "white" category in variable "race" (before the year 2000) includes mexicans, spaniards and puerto ricans "who appear to be white".

Capitalize e.g., Mexican.

Quote:As for age variable, I decided to remove (set to missing data) people aged 70 or more

Try: As for the age variable...

Quote:Hence, following the recommendations of Hauser & Huang (1999) I weight the data by the variable "weight" which is the interaction of the variables "wtssall" and "oversamp", although this will not change the results.

I would use a comma.

Quote:The black-white raw score gap in cohort1 was 2.023 items correct and has become 1.001 item correct in cohort6, which means the gap has been reduced by an half, while the gap was 1.638 items correct in year1 and has become 1.333
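
The "reduced by a half" claim can be checked directly from the quoted raw gaps. A minimal Python sketch (the function name is mine; the four numbers are the ones quoted above):

```python
def relative_reduction(start_gap: float, end_gap: float) -> float:
    """Share of the starting gap that has disappeared."""
    return (start_gap - end_gap) / start_gap

cohort_reduction = relative_reduction(2.023, 1.001)  # cohort1 -> cohort6
year_reduction = relative_reduction(1.638, 1.333)    # year1 -> later value quoted

print(f"by cohort: {cohort_reduction:.1%}, by survey year: {year_reduction:.1%}")
```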

Quote:This is because the more recent cohorts are younger, and the wordsum correlates positively with age (r=0.1005). In models 3 and 4, the scores among whites have a declining trend.

Who ever reported correlations to the ten-thousandth place? Also try:

This is because the more recent cohorts are younger, and wordsum correlates positively with age (r=0.1005). In models 3 and 4, the scores among whites have a declining trend.

Quote:This is still 50% reduction

Try: This is still a 50% reduction

Quote:A subsequent analysis is done by computing the d gap (see supplementary file) within each of the category of the dummy variables.

Categories.

Quote:I split the variable wordsum into two parts

Try: I split the variable "wordsum" into two parts

Quote:Another way to investigate whether or not the improvement occurs at high levels is to conduct logistic regression with wordsum as dependent binary variable (score levels 0-7 coded 0 and score levels 8-10 coded 1)

Try: Another way to investigate whether or not the improvement occurs at high levels is to conduct logistic regression with wordsum as the dependent binary variable (score levels 0-7 coded as 0 and score levels 8-10 coded as 1)

Quote:The most notable problem with the wordsum is not to be a measure of general intelligence

Try: The most notable problem with using wordsum, in this context, is that it is not a great measure of general intelligence.

Quote:Given Huang & Hauser's (1996, pp. 7-8) discussion, there is no clear answer to this question

Try: Given Huang & Hauser's (1996, pp. 7-8) discussion, there is no clear way to determine if this has occurred.

Quote:The affirmation that the test has become harder may be true. To some extent

Try: The affirmation that the test has become harder may be true to some extent.

Quote:whites find the wordsum harder over time while the blacks would find it a little bit easier

Try: whites find the wordsum harder over time while the blacks find it a little bit easier

Quote:Generally, there is some indication that the black-white gap has been under-estimated in early cohorts. And by the same token, the magnitude of the gap narrowing.

Fragment. Try: Generally, there is some indication that the black-white gap has been under-estimated in early cohorts -- and by the same token, the magnitude of the gap narrowing.

Quote:But at the same time, the white trend could have been even flatter or turned out to be somewhat dysgenic.

I don't understand this and I would advise against using "dysgenic", since this implies a causal model, the discussion of which is outside the scope of the paper. Maybe just delete the sentence.

Quote:Granted the limitation of the wordsum test, one may wonder what is the consequence of the black-white gap decline for the genetic hypothesis proposed by Rushton & Jensen (2010). ...

I wouldn't, in this paper, discuss this. It's not directly relevant to the topic of the paper and it unnecessarily geneticizes the discussion (thus turning off potential readers). I've made the same point regarding many of Emil's discussions: Don't conflate issues e.g., the "spatial transferability hypothesis" with certain global genetic hypotheses. Delete the whole paragraph.

I'll get back to you regarding method later.
(2014-Oct-06, 20:57:19)Chuck Wrote:
(2014-Oct-06, 03:14:22)menghu1001 Wrote:

I'll get back to you regarding method later.

I thought over the statistical method; I'm fine with it. I would like to know, though, why the survey year and birth cohort methods produce such divergent results. There must be an age x survey year interaction. Could you check for this? I get what's happening, as I've looked at the results before. Basically, in 1975 older (50-75) African Americans perform much worse than middle-aged and younger (18-50) ones. During later years, the older age gap narrows. Now, one might take this as indicating a (cross-age) cohort narrowing, yet another interpretation would be that it represents an older-age narrowing, i.e., there is less of a difference between older people in 2000 than in 1975. To determine which, you would need data from same-age people in, e.g., 1925 and 2000, which you don't have (for the GSS).

Over at HV, I commented on a similar (in methodology) analysis:

"Reardon’s analysis, of course, is deeply flawed by his failure to take into account both age effects and test content effects in addition to his dubious method of deriving early comparison points. As for the latter, he, for example, derives his early points, from the 1940s, from Charles Murray’s analysis of the 1976, 1986, and 1996 Woodcock–Johnson I to III standardizations. Of course, these samples were from the 70s, 80s, and 90s. To derive magnitudes of differences from the 40s, he projects back in time based on Murray’s birth cohort analysis. These differences, based on Full scale IQ — e.g., between 70 year old Blacks and Whites in the 90s who would have been 20 or so in the 40s — are then compared with the average Math and Reading differences between 5 to 7 year olds from the Early Childhood Longitudinal Study in the late 1990s (a study which showed a large effect of age on the magnitude of the math and reading gap — see: sample 48 — and also a large general knowledge gap at very young ages — see III, Chuck (2012c). His analysis, then, is confounded by the three problems and their interactions: (1) His method of deriving early points. (2) His comparison across measures. (3) And his comparison across ages."

For your cohort analysis, you are looking at e.g., age 50 differences in 1975 and age 25 differences in 2000 and finding a large change. But it's not obvious that this is fully a cohort change in the sense of age 18 through 65 people in 1925 versus age 18 through 65 in 2000 as opposed to an age x survey interaction in the sense that older people in 1975 (but less so younger) versus older people in 2000 (which is not the same as e.g., younger people in 1925 versus 1975). Anyways, I think that you should make a note concerning this issue. Generally, it's not clear if your "cohort analysis" is better than the survey year analysis in terms of determining the true cross age cohort effect.

(The proper interpretation should be, "There is a much larger older age gap in 1975 versus 2000" as opposed to, "There was a larger 1925 to 2000 cohort narrowing".)
I have made a lot of changes, but I will upload later.

One important change is the removal (at least temporarily) of my logistic regression analysis. I know that MacCallum et al. (2002) have already examined the practice of dichotomizing a continuous variable. They say it is problematic because it lowers the reliability of the variable and can alter its interpretation. An illustration may help. Imagine you have 4 people with different levels of fear of spiders: A (100%), B (60%), C (40%), D (0%). You dichotomize the variable at the mean or median, so that A and B have a value of 1 (fear) while C and D have a value of 0 (no fear), and yet B and C are more alike than either A and B or C and D. This labeling is totally arbitrary and not justified. However, these authors applied this criticism to correlational, ANOVA and regression analyses. I was using logistic regression, which attempts to estimate the likelihood of having a value of 1 versus 0. But now that I think about it, I'm not so sure about its robustness. One can still argue that my labeling (0-7 vs 8-10) is arbitrary, but at the same time the categories of my dummy variables must also be arbitrary. So, I will email MacCallum and ask him what he thinks about it. Of course, I know a lot of people in recent papers have conducted such dichotomization for logistic regression, but none of them have cited MacCallum et al. In light of this, I have replaced this analysis with another: I computed the d gap of wordlow and wordhigh by dividing the black-white difference by the SD given in Table 4.
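
The spider illustration can be written out as a small Python sketch (not Stata; the names are mine). It uses the four scores from the example and a cut at 50, which is both their mean and their median.

```python
# Fear-of-spiders scores from the illustration, in percent.
fear = {"A": 100, "B": 60, "C": 40, "D": 0}
cut = 50  # mean and median of the four scores

# Dichotomize: 1 = "fear", 0 = "no fear".
binary = {person: int(score > cut) for person, score in fear.items()}

def distance(scores, p, q):
    """Absolute difference between two people on a given scale."""
    return abs(scores[p] - scores[q])

for p, q in [("A", "B"), ("B", "C"), ("C", "D")]:
    print(p, q, "original:", distance(fear, p, q), "dichotomized:", distance(binary, p, q))
```

On the original scale B and C are the closest pair, yet after the split they are the only adjacent pair that ends up in different categories.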

I have also added an explanation of the tobit coefficient, just in case someone would like to ask me to write it.

Emil :

Emil Wrote:Why do you mention the range of numerical variables, and all the possible values of nominal variables?

For the age variable, you don't need to guess what a range of 18-69 means. But for region, you don't know what values are assigned to each region.

Emil Wrote:It is unclear how the nominal variables are used in the regression models. Hopefully you have not used them as continuous variables, as that makes no sense at all. Reg16 (region lived) and family16 are clearly not even quasi-continuous variables. Regression on that as they were is clearly nonsense. Res16 is quasi-continuous, so regression with it is okay.

Generally, I have read that people accept the idea that a variable can be treated as continuous when it has at least 5 values. The variables you mentioned have more than 5 values.

As for the link to the thread, I couldn't add it (but now I can), because when I wrote the article, I had not yet created this thread. But of course I will link to the OSF later.

Chuck :

I have made all the modifications you indicated. However, concerning the number of digits after the decimal point, they differ because that is how they are presented in Huang & Hauser (0.024 for cohort and -0.0176 for race*cohort). In the case of an interaction variable, in my experience, even a small coefficient can often have a meaningful effect, so I find it justified to add one more digit for this variable. Another case where I think it's justified to add more digits (for unstandardized coefficients, but not standardized ones) is when the variable can take on a large number of values, such as age (16-69). Concerning the correlation of wordsum with age (0.1005), that is how Stata displayed the result. If you insist, I can round it to 0.10. Also, I have removed the word "dysgenic" and replaced it with "negative".

Chuck Wrote:I wouldn't, in this paper, discuss this. It's not directly relevant to the topic of the paper and it unnecessarily geneticizes the discussion (thus turning off potential readers).

This would be unfortunate, I think. When someone discusses the black-white gap (IQ or achievement), he or she always attempts to understand its causes. If I don't attempt to explain the meaning of the gap narrowing in verbal IQ, I don't see the point of such an analysis. For example, Huang & Hauser (2001) don't buy the hereditarian argument. And they show the gap narrowing is due to gains in SES over time (although I think their analyses don't really prove it). But most people do not attempt to weigh the hypotheses. It's really of no use to test the environmental hypothesis if you don't think about the predictions that the hereditarian hypothesis can make. I think most researchers should stop committing fallacies like "confirmation bias". It's easy to fall into this trap. I don't remember how many times I have read "environmental variables explain the gap, we have proved this hypothesis to be true". If the BW gap has narrowed (and even if it didn't), I need to discuss the consequences. During the period studied, blacks have certainly improved in social status, probably more than whites did, so I need to talk about the relevance of the environmental and genetic hypotheses, even if people don't like it. Of course, I can delete the word genetic and replace it with hereditarian. That is a less provocative term, but the idea is still the same.

Chuck Wrote:I get what's happening, as I've looked at the results before. Basically, in 1975 older (50-75) African Americans perform much worse than middle-aged and younger (18-50) ones. During later years, the older age gap narrows. Now, one might take this as indicating a (cross-age) cohort narrowing, yet another interpretation would be that it represents an older-age narrowing, i.e., there is less of a difference between older people in 2000 than in 1975.

Tell me if I'm right. You want me to conduct a tobit regression with cohort, race, age, cohort*race, and cohort*age variables? And you say that you suspect the cohort*age effect becomes stronger in later cohorts?

The syntax looks something like this:

gen ageC1 = age*cohortdummy1
gen ageC2 = age*cohortdummy2
gen ageC3 = age*cohortdummy3
gen ageC4 = age*cohortdummy4
gen ageC5 = age*cohortdummy5
gen ageC6 = age*cohortdummy6

tobit wordsum bw1 cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 bwC2 bwC3 bwC4 bwC5 bwC6 age sex ageC2 ageC3 ageC4 ageC5 ageC6 [pweight = weight], ll(0) ul(10)

Code:
```
Tobit regression                                  Number of obs   =      22156
                                                  F(  18,  22138) =      92.49
                                                  Prob > F        =     0.0000
Log pseudolikelihood =  -47838.71                 Pseudo R2       =     0.0176

------------------------------------------------------------------------------
             |               Robust
     wordsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bw1 |   2.023368   .1559898    12.97   0.000     1.717617     2.32912
cohortdummy2 |  -.5709338    .564835    -1.01   0.312    -1.678051     .536183
cohortdummy3 |  -.6702743   .5388143    -1.24   0.214    -1.726389    .3858401
cohortdummy4 |  -1.338139   .5381658    -2.49   0.013    -2.392982   -.2832961
cohortdummy5 |  -1.531758   .5444232    -2.81   0.005    -2.598866     -.46465
cohortdummy6 |  -1.503185   .5799227    -2.59   0.010    -2.639874   -.3664949
        bwC2 |  -.2301317   .1922563    -1.20   0.231    -.6069678    .1467043
        bwC3 |  -.5600697   .1797855    -3.12   0.002     -.912462   -.2076774
        bwC4 |  -.6036657   .1770871    -3.41   0.001    -.9507689   -.2565624
        bwC5 |  -1.006898   .1803751    -5.58   0.000    -1.360446   -.6533498
        bwC6 |  -1.004008   .1995938    -5.03   0.000    -1.395226   -.6127898
         age |  -.0131813   .0082283    -1.60   0.109    -.0293094    .0029467
         sex |   .1739637   .0317265     5.48   0.000     .1117776    .2361498
       ageC2 |   .0157154    .009103     1.73   0.084    -.0021271     .033558
       ageC3 |   .0311774   .0087378     3.57   0.000     .0140508    .0483041
       ageC4 |    .042542   .0088985     4.78   0.000     .0251004    .0599837
       ageC5 |   .0591789   .0094609     6.26   0.000     .0406349    .0777228
       ageC6 |   .0695841   .0124641     5.58   0.000     .0451535    .0940146
       _cons |    4.84196   .5199612     9.31   0.000     3.822799     5.86112
-------------+----------------------------------------------------------------
      /sigma |   2.107472   .0135601                      2.080893    2.134051
------------------------------------------------------------------------------
  Obs. summary:        140  left-censored observations at wordsum<=0
                     20698     uncensored observations
                      1318 right-censored observations at wordsum>=10
```

tobit wordsum bw1 cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 age sex ageC2 ageC3 ageC4 ageC5 ageC6 [pweight = weight], ll(0) ul(10)

Code:
```
Tobit regression                                  Number of obs   =      22156
                                                  F(  13,  22143) =     123.37
                                                  Prob > F        =     0.0000
Log pseudolikelihood = -47868.274                 Pseudo R2       =     0.0170

------------------------------------------------------------------------------
             |               Robust
     wordsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bw1 |   1.414415     .04254    33.25   0.000     1.331034    1.497797
cohortdummy2 |  -.7754692   .5424091    -1.43   0.153     -1.83863    .2876911
cohortdummy3 |  -1.182844    .519948    -2.27   0.023    -2.201979   -.1637094
cohortdummy4 |  -1.889955   .5191508    -3.64   0.000    -2.907528   -.8723828
cohortdummy5 |  -2.448969   .5250497    -4.66   0.000    -3.478104   -1.419835
cohortdummy6 |  -2.410204   .5602023    -4.30   0.000     -3.50824   -1.312167
         age |  -.0131711   .0082734    -1.59   0.111    -.0293876    .0030455
         sex |   .1779512    .031749     5.60   0.000      .115721    .2401814
       ageC2 |   .0153997   .0091492     1.68   0.092    -.0025334    .0333328
       ageC3 |   .0311452   .0087811     3.55   0.000     .0139337    .0483567
       ageC4 |   .0425241   .0089401     4.76   0.000     .0250008    .0600474
       ageC5 |   .0600193   .0095079     6.31   0.000     .0413832    .0786553
       ageC6 |   .0708083   .0125489     5.64   0.000     .0462115    .0954051
       _cons |   5.392481   .5070554    10.63   0.000     4.398616    6.386345
-------------+----------------------------------------------------------------
      /sigma |   2.110143   .0135794                      2.083527     2.13676
------------------------------------------------------------------------------
  Obs. summary:        140  left-censored observations at wordsum<=0
                     20698     uncensored observations
                      1318 right-censored observations at wordsum>=10
```

tobit wordsum cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 age sex ageC2 ageC3 ageC4 ageC5 ageC6 [pweight = weight], ll(0) ul(10)

Code:
```
Tobit regression                                  Number of obs   =      23817
                                                  F(  12,  23805) =      40.13
                                                  Prob > F        =     0.0000
Log pseudolikelihood =   -52876.4                 Pseudo R2       =     0.0054

------------------------------------------------------------------------------
             |               Robust
     wordsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cohortdummy2 |  -.6895328   .5584058    -1.23   0.217    -1.784044    .4049781
cohortdummy3 |  -1.159271   .5356257    -2.16   0.030    -2.209131   -.1094102
cohortdummy4 |   -1.81035    .534596    -3.39   0.001    -2.858192   -.7625077
cohortdummy5 |  -2.206941    .538878    -4.10   0.000    -3.263176   -1.150705
cohortdummy6 |  -2.340127   .5706149    -4.10   0.000    -3.458569   -1.221686
         age |  -.0126966    .008511    -1.49   0.136    -.0293787    .0039854
         sex |   .1219685    .031794     3.84   0.000     .0596502    .1842868
       ageC2 |   .0124503   .0094042     1.32   0.186    -.0059825    .0308831
       ageC3 |   .0280788   .0090236     3.11   0.002     .0103921    .0457656
       ageC4 |   .0368479   .0091736     4.02   0.000     .0188671    .0548287
       ageC5 |   .0455797   .0096493     4.72   0.000     .0266664    .0644929
       ageC6 |   .0533736   .0123475     4.32   0.000     .0291717    .0775754
       _cons |   6.736302   .5206905    12.94   0.000     5.715715    7.756888
-------------+----------------------------------------------------------------
      /sigma |   2.197996   .0134352                      2.171662    2.224329
------------------------------------------------------------------------------
  Obs. summary:        184  left-censored observations at wordsum<=0
                     22282     uncensored observations
                      1351 right-censored observations at wordsum>=10
```

So, how do we interpret this outcome? It seems to me that the age gap becomes larger in later cohorts, because the positive coefficients become stronger over time. When you control for the age*cohort interaction, the cohort effect is negative. That is, the wordsum score for the entire group diminishes over time. There is still a meaningful black-white narrowing.
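
One way to read the first output above is to add each interaction term to its main effect. The Python sketch below (not Stata) does this with the coefficients copied from that table. It assumes that bw1 is the race coefficient in the reference cohort and that bwC2 to bwC6 and ageC2 to ageC6 are the race and age interactions with the cohort dummies, which is what the variable names suggest. The values are on the latent tobit scale.

```python
# Coefficients copied from the first tobit output (the model with bwC* terms).
bw1 = 2.023368
bw_by_cohort = {2: -0.2301317, 3: -0.5600697, 4: -0.6036657, 5: -1.006898, 6: -1.004008}
age = -0.0131813
age_by_cohort = {2: 0.0157154, 3: 0.0311774, 4: 0.042542, 5: 0.0591789, 6: 0.0695841}

# Implied black-white coefficient in each cohort: main effect + interaction.
gap = {1: bw1}
gap.update({c: bw1 + b for c, b in bw_by_cohort.items()})

# Implied age slope in each cohort: main effect + interaction.
age_slope = {1: age}
age_slope.update({c: age + a for c, a in age_by_cohort.items()})

for c in sorted(gap):
    print(f"cohort {c}: BW coefficient {gap[c]:.3f}, age slope {age_slope[c]:.4f}")
```

Read this way, the race coefficient falls from about 2.02 in the first cohort to about 1.02 in the last, while the age slope moves from slightly negative to positive.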
Quote:One important change is the removal (at least temporarily) of my logistic regression analysis. I know that MacCallum et al. (2002) have already examined the practice of dichotomizing a continuous variable. They say it is problematic because it lowers the reliability of the variable and can alter its interpretation. An illustration may help. Imagine you have 4 people with different levels of fear of spiders: A (100%), B (60%), C (40%), D (0%). You dichotomize the variable at the mean or median, so that A and B have a value of 1 (fear) while C and D have a value of 0 (no fear), and yet B and C are more alike than either A and B or C and D. This labeling is totally arbitrary and not justified. However, these authors applied this criticism to correlational, ANOVA and regression analyses. I was using logistic regression, which attempts to estimate the likelihood of having a value of 1 versus 0. But now that I think about it, I'm not so sure about its robustness. One can still argue that my labeling (0-7 vs 8-10) is arbitrary, but at the same time the categories of my dummy variables must also be arbitrary. So, I will email MacCallum and ask him what he thinks about it. Of course, I know a lot of people in recent papers have conducted such dichotomization for logistic regression, but none of them have cited MacCallum et al. In light of this, I have replaced this analysis with another: I computed the d gap of wordlow and wordhigh by dividing the black-white difference by the SD given in Table 4.

You could explore the effect of dichotomizing it in different places. You used 0-7 vs. 8-10. You could try 0-6 vs. 7-10, 0-5 vs. 6-10 (the most 'natural' since it is split evenly along the scale), and 0-8 vs. 9-10.
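
A sensitivity check of this kind could be set up as follows. This is a minimal Python sketch rather than Stata; the function name is hypothetical and the ten scores are made up for illustration, not GSS data. It only recodes the score at each of the four cut points listed above and reports the share coded 1.

```python
def dichotomize(score: int, lowest_high_score: int) -> int:
    """Code a 0-10 score as 1 if it reaches the cut point, else 0."""
    return 1 if score >= lowest_high_score else 0

# Lowest score coded 1 under each split mentioned in the post.
splits = {"0-7 vs 8-10": 8, "0-6 vs 7-10": 7, "0-5 vs 6-10": 6, "0-8 vs 9-10": 9}

# Hypothetical wordsum scores, for illustration only.
sample_scores = [2, 4, 5, 6, 6, 7, 8, 9, 10, 10]

share_high = {
    label: sum(dichotomize(s, cut) for s in sample_scores) / len(sample_scores)
    for label, cut in splits.items()
}

for label, share in share_high.items():
    print(f"{label}: share coded 1 = {share:.2f}")
```

In the actual analysis, the logistic regression would be re-run once per recoded outcome and the race-by-cohort coefficients compared across splits.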

Quote: Generally, I have read that people accept the idea that a variable can be treated as continuous when it has at least 5 values. The variables you mentioned have more than 5 values.

Look at the variable for region. It is:
Quote:The variable reg16 has the following values; 0=Foreign, 1=New England, 2=Middle Atlantic, 3=East North Central, 4=West North Central, 5=South Atlantic, 6=East South Atlantic, 7=West South Atlantic, 8=Mountain, 9=Pacific.

What does it mean to be higher in this variable? What does it mean to be lower? Nothing. These are not places along a scale of something. It is a nominal variable. Using it as a continuous variable is nonsense.

Quote:The variable family16 has the following values; 0=other arrangement with relatives (e.g., aunt, uncle, grandparents), 1=mother & father, 2=father & stepmother, 3=mother & stepfather, 4=father, 5=mother, 6=male relative, 7=female relative, 8=male & female relatives.

Same for this. There is no answer to the question "What does it mean to be higher in family16?". It is because it is a nominal variable.

Finally,
Quote:The variable res16 has the following values; 1= in open country but not on a farm, 2=on a farm, 3=town lower than 50,000, 4=50,000 to 250,000, 5=in a suburb near a big city, 6=city greater than 250,000.

Is fine because: 1) there are 5 or more possible values, 2) there is a sensible answer to the question "What does it mean to be higher in res16?" The answer is that the higher one is in res16, the more people live around oneself. Or conversely, the lower, the fewer people live around oneself. Or one could answer it with density of people in the area, etc. There are sensible answers. Variables like this one are called "quasi-continuous" (or "quasi-interval") because they are not quite continuous (a continuous variable can take every real number between its min and max value), but they are sort of continuous because there are a number of possible values in between AND because interpreting them as a scale is sensible.

Scales of measurement are usually discussed in the beginning of introductory statistics books. There is one in this book, section 2.2: http://health.adelaide.edu.au/psychology...ching/lsr/
Emil,

Quote:What does it mean to be higher in this variable? What does it mean to be lower? Nothing. These are not places along a scale of something. It is a nominal variable. Using it as a continuous variable is nonsense.

When you control for a variable, two things happen. First, you adjust for differences between subgroups. For example, one subgroup (father) can have a lower mean IQ than another subgroup (mother). If you adjust for it, then whatever the order of the values, the interpretation of the other coefficients won't change. Second, in the variables you mentioned, e.g., res16, reg16, family16, a higher value obviously has no meaning. Thus, their coefficients have no meaning. But I'm not interested in these things. I don't care what their coefficients are. I care only about BW changes over time.

For the logistic regression, I can obviously use a different split. One problem with my split is that scores of 5-7 are not "low" but medium or high. So, perhaps, I could try 0-4 vs 8-10 or 7-10. If I get similar results, it would perhaps mean that the dichotomization has not distorted the construct of interest. But what if there is a difference? There are so many possible dichotomizations that I don't know if it's relevant anymore. I will see if I can reach some agreement with MacCallum.
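
To make the comparison of splits concrete, here is a minimal Python sketch (not the Stata code used for the paper). The function name `dichotomize`, the toy scores, and the `splits` dictionary are illustrative assumptions; only the two alternative splits (0-4 vs 8-10 and 0-4 vs 7-10) come from the post. It shows how each split codes the scores and how many cases each one drops.

```python
def dichotomize(score, low, high):
    """Return 0 if score falls in the low range, 1 if in the high range,
    and None (case dropped) if it falls in neither. Ranges are inclusive."""
    if low[0] <= score <= low[1]:
        return 0
    if high[0] <= score <= high[1]:
        return 1
    return None

# Candidate splits mentioned in the post: (low range, high range).
splits = {
    "0-4 vs 8-10": ((0, 4), (8, 10)),
    "0-4 vs 7-10": ((0, 4), (7, 10)),
}

# Made-up wordsum scores, for illustration only.
scores = [2, 4, 5, 6, 7, 8, 10, 3, 7, 9]

for label, (low, high) in splits.items():
    coded = [dichotomize(s, low, high) for s in scores]
    print(label, coded, "dropped:", coded.count(None))
```

Running the same model on each coded outcome and comparing the race coefficients would then show whether the choice of cut point matters.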
I don't think it even works as a control. I think you need to split them into dichotomous dummy variables to control for them in MR.
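
As a rough illustration of what splitting a nominal variable into dummy variables means, here is a small Python sketch. The helper `dummy_code`, the toy `family16` values, and the choice of category 1 (mother & father) as the reference are all assumptions made for the example, not part of the analysis in this thread.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 indicator per level except the reference.
    Missing values (None) stay missing."""
    levels = sorted(set(v for v in values if v is not None and v != reference))
    rows = []
    for v in values:
        if v is None:
            rows.append(None)
        else:
            rows.append([1 if v == lvl else 0 for lvl in levels])
    return levels, rows

# Toy family16 codes (made up); 1 = mother & father is used as the reference.
family16 = [1, 5, 1, 3, 0, 5, None, 1]
levels, rows = dummy_code(family16, reference=1)

print("indicator columns for codes:", levels)
for code, row in zip(family16, rows):
    print(code, row)
```

Each indicator then gets its own coefficient, so the arbitrary numeric labels no longer matter. In Stata, `tabulate ..., gen(...)` (used for cohort6 in the script in this thread) or factor-variable notation such as `i.family16` should serve the same purpose.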
When you control for a given variable, the other coefficients are expressed at the mean of the controlled variable. Say, you control for dichotomized race (1;2). In that case, the other coefficients are expressed as if race is equal to 1.5. If race is coded 0;1, then it's 0.5 (note it's also how it works in ANCOVA). In most cases, when you recode your variable, you'll likely get similar estimates for the other (non-recoded) coefficients. The thing that may be subject to large change is the intercept, especially if you reverse code the original variable.
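
The recoding point for a two-valued control can be checked on toy data. The sketch below is a minimal pure-Python least-squares fit; the function `ols` and all the data values are made up for illustration and are not the GSS data. It fits the same model with race coded 1;2, 0;1, and reverse-coded, so the slopes and intercepts can be compared directly.

```python
def ols(columns, y):
    """Least-squares fit with an intercept; returns [intercept, b1, b2, ...]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Made-up data: a two-valued control, one other predictor, and an outcome.
race12 = [1, 1, 1, 1, 1, 2, 2, 2]
educ = [12, 16, 10, 14, 18, 12, 16, 11]
wordsum = [6, 8, 5, 7, 9, 5, 6, 4]

race01 = [r - 1 for r in race12]      # same variable coded 0;1
race_rev = [3 - r for r in race12]    # reverse-coded (1 <-> 2)

fit12 = ols([race12, educ], wordsum)
fit01 = ols([race01, educ], wordsum)
fit_rev = ols([race_rev, educ], wordsum)

for name, fit in [("1;2", fit12), ("0;1", fit01), ("reversed", fit_rev)]:
    print(name, [round(b, 4) for b in fit])
```

In this toy case the educ slope is the same under all three codings, the race slope only flips sign when reverse-coded, and the intercept absorbs the change. This holds exactly for a two-valued variable; it does not by itself show what happens when a nominal variable with many categories is entered as a single numeric predictor.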

I have coded family16 differently (e.g., assigned 0 instead of 5, 1 instead of 2, etc.) but that didn't change the results.

Here's a try.

keep if age<70
gen weight = wtssall*oversamp
gen blackwhite2000after=1 if year>=2000 & race==1 & hispanic==1
replace blackwhite2000after=0 if year>=2000 & race==2 & hispanic==1
gen blackwhite2000before=1 if year<2000 & race==1
replace blackwhite2000before=0 if year<2000 & race==2
gen bw1 = max(blackwhite2000after,blackwhite2000before)
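gen income = realinc
replace income = . if income==0
* logincome is used in the regressions below but is not created in the posted code;
* assumed here to be the natural log of income
gen logincome = log(income)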
replace educ = . if educ>20
replace degree = . if degree>4
replace sibs = . if sibs==-1
replace sibs = . if sibs>37
replace res16 = . if res16==0
replace res16 = . if res16>=8
replace family16 = . if family16==-1
replace family16 = . if family16==9
replace wordsum = . if wordsum<0
replace wordsum = . if wordsum>10
replace cohort = . if cohort==0
replace cohort = . if cohort==9999
recode cohort (1905/1928=1) (1929/1943=2) (1944/1953=3) (1954/1962=4) (1963/1973=5) (1974/1994=6), generate(cohort6)
replace cohort6 = . if cohort6>6
tabulate cohort6, gen(cohortdummy)
gen bwC1 = bw1*cohortdummy1
gen bwC2 = bw1*cohortdummy2
gen bwC3 = bw1*cohortdummy3
gen bwC4 = bw1*cohortdummy4
gen bwC5 = bw1*cohortdummy5
gen bwC6 = bw1*cohortdummy6
gen familyrecode = .
replace familyrecode = 0 if family16==0
replace familyrecode = 1 if family16==8
replace familyrecode = 2 if family16==7
replace familyrecode = 3 if family16==6
replace familyrecode = 4 if family16==5
replace familyrecode = 5 if family16==4
replace familyrecode = 6 if family16==3
replace familyrecode = 7 if family16==2
replace familyrecode = 8 if family16==1
gen familyrecode1 = .
replace familyrecode1 = 0 if family16==5
replace familyrecode1 = 1 if family16==2
replace familyrecode1 = 2 if family16==7
replace familyrecode1 = 3 if family16==3
replace familyrecode1 = 4 if family16==1
replace familyrecode1 = 5 if family16==0
replace familyrecode1 = 6 if family16==8
replace familyrecode1 = 7 if family16==4
replace familyrecode1 = 8 if family16==6
gen familyrecode2 = .
replace familyrecode2 = 0 if family16==0
replace familyrecode2 = 1 if family16==4
replace familyrecode2 = 2 if family16==7
replace familyrecode2 = 3 if family16==3
replace familyrecode2 = 4 if family16==6
replace familyrecode2 = 5 if family16==5
replace familyrecode2 = 6 if family16==8
replace familyrecode2 = 7 if family16==1
replace familyrecode2 = 8 if family16==2
gen reg = .
replace reg = 0 if reg16==3
replace reg = 1 if reg16==5
replace reg = 2 if reg16==7
replace reg = 3 if reg16==6
replace reg = 4 if reg16==8
replace reg = 5 if reg16==4
replace reg = 6 if reg16==0
replace reg = 7 if reg16==2
replace reg = 8 if reg16==1
replace reg = 9 if reg16==9

regress wordsum bw1 cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 bwC2 bwC3 bwC4 bwC5 bwC6 age sex logincome degree educ reg16 res16 family16 sibs [pweight = weight], beta

Code:
```
Linear regression                                      Number of obs =   20226
                                                       F( 20, 20205) =  364.79
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3117
                                                       Root MSE      =   1.697
------------------------------------------------------------------------------
             |               Robust
     wordsum |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         bw1 |   1.095699   .1468845     7.46   0.000                 .1777428
cohortdummy2 |  -.2432754   .1667967    -1.46   0.145                -.0467582
cohortdummy3 |  -.1327666   .1610207    -0.82   0.410                -.0273688
cohortdummy4 |  -.4442166   .1630993    -2.72   0.006                  -.08921
cohortdummy5 |  -.3711671   .1662279    -2.23   0.026                -.0669138
cohortdummy6 |  -.1091146   .1879013    -0.58   0.561                -.0148669
        bwC2 |  -.0214332   .1742682    -0.12   0.902                 -.003947
        bwC3 |  -.1655899   .1658092    -1.00   0.318                -.0327039
        bwC4 |  -.1458145   .1663234    -0.88   0.381                -.0277045
        bwC5 |    -.40005   .1687139    -2.37   0.018                -.0673835
        bwC6 |  -.4935015   .1931291    -2.56   0.011                -.0609301
         age |   .0060567   .0014348     4.22   0.000                 .0408087
         sex |   .2701488   .0266859    10.12   0.000                 .0658305
   logincome |   .1710022   .0158248    10.81   0.000                 .0794625
      degree |    .175303   .0268235     6.54   0.000                   .09789
        educ |   .2571534   .0118184    21.76   0.000                 .3546749
       reg16 |  -.0073669   .0058151    -1.27   0.205                -.0088605
       res16 |    .095045   .0090247    10.53   0.000                 .0711172
    family16 |   .0025916   .0079647     0.33   0.745                 .0022032
        sibs |  -.0465609   .0046251   -10.07   0.000                -.0685357
       _cons |   -.581039   .2460116    -2.36   0.018                        .
------------------------------------------------------------------------------
```

regress wordsum bw1 cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 bwC2 bwC3 bwC4 bwC5 bwC6 age sex logincome degree educ reg16 res16 familyrecode sibs [pweight = weight], beta

Code:
```
Linear regression                                      Number of obs =   20226
                                                       F( 20, 20205) =  365.08
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3118
                                                       Root MSE      =   1.697
------------------------------------------------------------------------------
             |               Robust
     wordsum |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         bw1 |   1.098148   .1468342     7.48   0.000                   .17814
cohortdummy2 |  -.2428329   .1667948    -1.46   0.145                -.0466731
cohortdummy3 |  -.1326954   .1610526    -0.82   0.410                -.0273542
cohortdummy4 |  -.4447821   .1631311    -2.73   0.006                -.0893236
cohortdummy5 |  -.3724344   .1662453    -2.24   0.025                -.0671423
cohortdummy6 |  -.1116337   .1879246    -0.59   0.552                -.0152101
        bwC2 |  -.0224281    .174277    -0.13   0.898                -.0041302
        bwC3 |  -.1664026   .1658642    -1.00   0.316                -.0328644
        bwC4 |  -.1466234   .1663764    -0.88   0.378                -.0278582
        bwC5 |  -.4007139   .1687489    -2.37   0.018                -.0674953
        bwC6 |  -.4932983   .1931479    -2.55   0.011                 -.060905
         age |   .0060158   .0014359     4.19   0.000                 .0405335
         sex |   .2700135   .0266828    10.12   0.000                 .0657975
   logincome |   .1715282   .0158549    10.82   0.000                  .079707
      degree |   .1755253   .0268157     6.55   0.000                 .0980142
        educ |   .2573497   .0118336    21.75   0.000                 .3549456
       reg16 |   -.007438   .0058213    -1.28   0.201                -.0089461
       res16 |   .0948378   .0090363    10.50   0.000                 .0709622
familyrecode |  -.0051175   .0069678    -0.73   0.463                -.0050225
        sibs |  -.0466206   .0046247   -10.08   0.000                -.0686237
       _cons |  -.5469528   .2459031    -2.22   0.026                        .
------------------------------------------------------------------------------
```

regress wordsum bw1 cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 bwC2 bwC3 bwC4 bwC5 bwC6 age sex logincome degree educ reg16 res16 familyrecode1 sibs [pweight = weight], beta

Code:
```
Linear regression                                      Number of obs =   20226
                                                       F( 20, 20205) =  365.04
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3118
                                                       Root MSE      =  1.6969
-------------------------------------------------------------------------------
              |               Robust
      wordsum |      Coef.   Std. Err.      t    P>|t|                     Beta
--------------+----------------------------------------------------------------
          bw1 |   1.095838   .1465374     7.48   0.000                 .1777654
 cohortdummy2 |  -.2452279   .1667372    -1.47   0.141                -.0471335
 cohortdummy3 |   -.139467   .1609478    -0.87   0.386                -.0287501
 cohortdummy4 |  -.4513169   .1630004    -2.77   0.006                -.0906359
 cohortdummy5 |  -.3816927   .1661727    -2.30   0.022                -.0688114
 cohortdummy6 |  -.1194981   .1878523    -0.64   0.525                -.0162816
         bwC2 |  -.0207212   .1742131    -0.12   0.905                -.0038159
         bwC3 |  -.1597148   .1657395    -0.96   0.335                -.0315436
         bwC4 |  -.1399076   .1662173    -0.84   0.400                -.0265822
         bwC5 |  -.3924364    .168619    -2.33   0.020                 -.066101
         bwC6 |  -.4848992    .193071    -2.51   0.012                 -.059868
          age |   .0060752   .0014345     4.24   0.000                 .0409337
          sex |   .2694222   .0266878    10.10   0.000                 .0656535
    logincome |   .1712699   .0158322    10.82   0.000                 .0795869
       degree |   .1756636   .0268296     6.55   0.000                 .0980914
         educ |   .2570987   .0118139    21.76   0.000                 .3545995
        reg16 |  -.0075259   .0058152    -1.29   0.196                -.0090518
        res16 |   .0943371   .0090397    10.44   0.000                 .0705875
familyrecode1 |  -.0137886   .0090215    -1.53   0.126                -.0102153
         sibs |  -.0466329   .0046277   -10.08   0.000                -.0686417
        _cons |  -.5246691   .2463459    -2.13   0.033                        .
-------------------------------------------------------------------------------
```

regress wordsum bw1 cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 bwC2 bwC3 bwC4 bwC5 bwC6 age sex logincome degree educ reg16 res16 familyrecode2 sibs [pweight = weight], beta

Code:
```
Linear regression                                      Number of obs =   20226
                                                       F( 20, 20205) =  365.08
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3118
                                                       Root MSE      =  1.6969
-------------------------------------------------------------------------------
              |               Robust
      wordsum |      Coef.   Std. Err.      t    P>|t|                     Beta
--------------+----------------------------------------------------------------
          bw1 |   1.097943   .1464629     7.50   0.000                 .1781067
 cohortdummy2 |  -.2424547    .166609    -1.46   0.146                -.0466004
 cohortdummy3 |  -.1341314   .1608393    -0.83   0.404                -.0276502
 cohortdummy4 |  -.4472196   .1629636    -2.74   0.006                -.0898131
 cohortdummy5 |  -.3764029   .1660956    -2.27   0.023                -.0678577
 cohortdummy6 |  -.1174024   .1877553    -0.63   0.532                -.0159961
         bwC2 |  -.0236267   .1741071    -0.14   0.892                -.0043509
         bwC3 |  -.1661975   .1656591    -1.00   0.316                -.0328239
         bwC4 |  -.1462968   .1662056    -0.88   0.379                -.0277961
         bwC5 |  -.4005288   .1685777    -2.38   0.018                -.0674641
         bwC6 |  -.4930299   .1929912    -2.55   0.011                -.0608718
          age |   .0059868    .001435     4.17   0.000                 .0403379
          sex |   .2694145   .0266841    10.10   0.000                 .0656516
    logincome |   .1720981   .0158374    10.87   0.000                 .0799718
       degree |   .1757301   .0268169     6.55   0.000                 .0981285
         educ |   .2576631   .0118325    21.78   0.000                 .3553779
        reg16 |  -.0076389   .0058247    -1.31   0.190                -.0091877
        res16 |   .0944661   .0090305    10.46   0.000                  .070684
familyrecode2 |  -.0113374   .0079373    -1.43   0.153                -.0094531
         sibs |   -.046764   .0046238   -10.11   0.000                -.0688347
        _cons |  -.5154476    .246452    -2.09   0.036                        .
-------------------------------------------------------------------------------
```

regress wordsum bw1 cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 bwC2 bwC3 bwC4 bwC5 bwC6 age sex logincome degree educ reg res16 family16 sibs [pweight = weight], beta

Code:
```
Linear regression                                      Number of obs =   20226
                                                       F( 20, 20205) =  365.16
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3128
                                                       Root MSE      =  1.6957
------------------------------------------------------------------------------
             |               Robust
     wordsum |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         bw1 |   1.077054   .1463924     7.36   0.000                 .1747182
cohortdummy2 |  -.2437178   .1663023    -1.47   0.143                -.0468432
cohortdummy3 |  -.1299419   .1604554    -0.81   0.418                -.0267866
cohortdummy4 |  -.4435006   .1626352    -2.73   0.006                -.0890662
cohortdummy5 |  -.3727247   .1657568    -2.25   0.025                -.0671946
cohortdummy6 |  -.1158754   .1878084    -0.62   0.537                 -.015788
        bwC2 |   -.024472   .1738017    -0.14   0.888                -.0045066
        bwC3 |  -.1737067   .1652074    -1.05   0.293                 -.034307
        bwC4 |  -.1513894   .1658148    -0.91   0.361                -.0287637
        bwC5 |  -.4049843   .1681489    -2.41   0.016                -.0682146
        bwC6 |   -.491967   .1930388    -2.55   0.011                -.0607406
         age |   .0060645   .0014337     4.23   0.000                 .0408615
         sex |   .2691577   .0266842    10.09   0.000                  .065589
   logincome |   .1697286   .0157905    10.75   0.000                 .0788707
      degree |   .1759572   .0267701     6.57   0.000                 .0982553
        educ |   .2558024    .011793    21.69   0.000                 .3528116
         reg |   .0218362    .004184     5.22   0.000                 .0338935
       res16 |   .0910688   .0090408    10.07   0.000                  .068142
    family16 |   .0012812   .0079736     0.16   0.872                 .0010892
        sibs |  -.0462154   .0046184   -10.01   0.000                -.0680273
       _cons |  -.6298851   .2438412    -2.58   0.010                        .
------------------------------------------------------------------------------
```

regress wordsum bw1 cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 bwC2 bwC3 bwC4 bwC5 bwC6 age sex logincome degree educ reg res16 familyrecode1 sibs [pweight = weight], beta

Code:
```
Linear regression                                      Number of obs =   20226
                                                       F( 20, 20205) =  365.47
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3129
                                                       Root MSE      =  1.6956
-------------------------------------------------------------------------------
              |               Robust
      wordsum |      Coef.   Std. Err.      t    P>|t|                     Beta
--------------+----------------------------------------------------------------
          bw1 |   1.078355   .1460574     7.38   0.000                 .1749292
 cohortdummy2 |  -.2455091   .1662688    -1.48   0.140                -.0471875
 cohortdummy3 |  -.1358898   .1604158    -0.85   0.397                -.0280127
 cohortdummy4 |  -.4500099   .1625733    -2.77   0.006                -.0903734
 cohortdummy5 |  -.3824976    .165738    -2.31   0.021                -.0689565
 cohortdummy6 |  -.1258436   .1877791    -0.67   0.503                -.0171462
         bwC2 |  -.0238513   .1737711    -0.14   0.891                -.0043923
         bwC3 |  -.1685687   .1651688    -1.02   0.307                -.0332922
         bwC4 |  -.1461107   .1657415    -0.88   0.378                -.0277608
         bwC5 |  -.3982269   .1680839    -2.37   0.018                -.0670764
         bwC6 |  -.4841555   .1929902    -2.51   0.012                -.0597762
          age |   .0060783   .0014334     4.24   0.000                 .0409543
          sex |   .2684576   .0266865    10.06   0.000                 .0654184
    logincome |   .1700912   .0157964    10.77   0.000                 .0790392
       degree |   .1764089   .0267762     6.59   0.000                 .0985075
         educ |   .2557707   .0117892    21.70   0.000                 .3527678
          reg |   .0217097   .0041827     5.19   0.000                 .0336972
        res16 |   .0904257   .0090558     9.99   0.000                 .0676608
familyrecode1 |  -.0123615    .009031    -1.37   0.171                 -.009158
         sibs |  -.0462869    .004621   -10.02   0.000                -.0681324
        _cons |  -.5837063   .2442501    -2.39   0.017                        .
-------------------------------------------------------------------------------
```

regress wordsum bw1 cohortdummy2 cohortdummy3 cohortdummy4 cohortdummy5 cohortdummy6 bwC2 bwC3 bwC4 bwC5 bwC6 age sex logincome degree educ reg res16 familyrecode2 sibs [pweight = weight], beta

Code:
```
Linear regression                                      Number of obs =   20226
                                                       F( 20, 20205) =  365.46
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3128
                                                       Root MSE      =  1.6956
-------------------------------------------------------------------------------
              |               Robust
      wordsum |      Coef.   Std. Err.      t    P>|t|                     Beta
--------------+----------------------------------------------------------------
          bw1 |   1.080223   .1460067     7.40   0.000                 .1752322
 cohortdummy2 |  -.2430345   .1661583    -1.46   0.144                -.0467119
 cohortdummy3 |  -.1310357   .1603239    -0.82   0.414                 -.027012
 cohortdummy4 |  -.4462279   .1625364    -2.75   0.006                -.0896139
 cohortdummy5 |  -.3776067   .1656619    -2.28   0.023                -.0680748
 cohortdummy6 |  -.1237979   .1877031    -0.66   0.510                -.0168675
         bwC2 |  -.0264528   .1736817    -0.15   0.879                -.0048713
         bwC3 |  -.1744731   .1651051    -1.06   0.291                -.0344583
         bwC4 |    -.15195   .1657303    -0.92   0.359                -.0288702
         bwC5 |  -.4056042   .1680526    -2.41   0.016                 -.068319
         bwC6 |  -.4915503   .1929303    -2.55   0.011                -.0606892
          age |       .006    .001434     4.18   0.000                 .0404265
          sex |   .2684602   .0266825    10.06   0.000                  .065419
    logincome |   .1708242    .015807    10.81   0.000                 .0793798
       degree |   .1764796   .0267638     6.59   0.000                  .098547
         educ |    .256259   .0118057    21.71   0.000                 .3534413
          reg |   .0217382   .0041794     5.20   0.000                 .0337414
        res16 |   .0905556   .0090466    10.01   0.000                  .067758
familyrecode2 |  -.0099752   .0079317    -1.26   0.209                -.0083173
         sibs |  -.0464019   .0046177   -10.05   0.000                -.0683017
        _cons |  -.5769559   .2440902    -2.36   0.018                        .
-------------------------------------------------------------------------------
```

Quote:Say, you control for dichotomized race (1;2). In that case, the other coefficients are expressed as if race is equal to 1.5. If race is coded 0;1, then it's 0.5 (note it's also how it works in ANCOVA).

Only if the Ns of the two racial groups are equal, no?
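
A quick arithmetic check of this point, as a Python sketch with hypothetical group sizes (the function `coded_mean` and the Ns are made up for illustration): the mean of a two-valued variable sits at the midpoint of the codes only when the groups are the same size.

```python
def coded_mean(n_first, n_second, codes):
    """Mean of a two-valued variable given the group sizes and the two codes."""
    total = n_first * codes[0] + n_second * codes[1]
    return total / (n_first + n_second)

# Equal groups: the mean is the midpoint of the codes.
print(coded_mean(500, 500, (1, 2)))
# Unequal groups (hypothetical Ns): the mean moves toward the larger group.
print(coded_mean(800, 200, (1, 2)))
print(coded_mean(800, 200, (0, 1)))
```

With a 0;1 coding, the mean is simply the proportion of cases coded 1.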

If you look at your estimated betas, they are not the same:

reg16:
family16 .0022032 reg16 -.0088605
familyrecode -.0050225 reg16 -.0089461
familyrecode1 -.0102153 reg16 -.0090518
familyrecode2 -.0094531 reg16 -.0091877

reg:
family16 .0010892 reg .0338935
familyrecode1 reg .0336972
familyrecode2 reg -.0083173
(u forgot familyrecode + reg)

The results are all similar because these variables are probably not very important. However, they are not the same because MR is treating them as interval variables, and setting the 'mean' (also nonsense) to different values.
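
To illustrate why the numeric labels matter when a nominal variable is entered as a single interval predictor, here is a small Python sketch on made-up data (the function names, the three-category `region` variable, and the scores are all illustrative assumptions). It compares the R-squared obtained under two arbitrary labelings with the R-squared from a full set of dummies, which depends only on the group means.

```python
def r2_as_interval(x, y):
    """R-squared from regressing y on x treated as one numeric predictor."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def r2_as_dummies(x, y):
    """R-squared from a full set of dummies, computed from the group means."""
    n = len(y)
    my = sum(y) / n
    groups = {}
    for a, b in zip(x, y):
        groups.setdefault(a, []).append(b)
    ss_within = sum(sum((b - sum(g) / len(g)) ** 2 for b in g)
                    for g in groups.values())
    syy = sum((b - my) ** 2 for b in y)
    return 1 - ss_within / syy

# Made-up nominal variable with three categories and a made-up outcome.
region = [0, 0, 1, 1, 2, 2]
score = [4, 5, 7, 8, 5, 6]

# Same categories, different arbitrary labels (categories 1 and 2 swapped).
relabel = {0: 0, 1: 2, 2: 1}
region_b = [relabel[r] for r in region]

print("interval, labels A:", round(r2_as_interval(region, score), 4))
print("interval, labels B:", round(r2_as_interval(region_b, score), 4))
print("dummies,  labels A:", round(r2_as_dummies(region, score), 4))
print("dummies,  labels B:", round(r2_as_dummies(region_b, score), 4))
```

The toy data exaggerate the effect on purpose. In the regressions above the changes are small, which is consistent with these variables being only weakly related to wordsum.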

The overall results are not changed much: the R²s are all near .31. You could correlate the betas from each model to see how similar they are. I did it for model 1 x model 2.

Code:
```
model1.betas = c(.1777428,-.0467582,-.0273688,-.08921,-.0669138,-.0148669,-.003947,
                 -.0327039,-.0277045,-.0673835,-.0609301,.0408087,.0658305,.0794625,
                 .09789,.3546749,-.0088605,.0711172,.0022032,-.0685357)
model2.betas = c(.17814,-.0466731,-.0273542,-.0893236,-.0671423,-.0152101,-.0041302,
                 -.0328644,-.0278582,-.0674953,-.060905,.0405335,.0657975,.079707,
                 .0980142,.3549456,-.0089461,.0709622,-.0050225,-.0686237)
cor(model1.betas,model2.betas)
[1] 0.9998823
```

So, yes, it seems to be not worth bothering about.
(2014-Oct-09, 04:01:39)Emil Wrote: Only if the Ns of the two racial groups are equal, no?

I don't think the N is the problem. See here, for example:
http://menghublog.wordpress.com/2014/03/...egression/

(2014-Oct-09, 04:01:39)Emil Wrote: If you look at your estimated betas, they are not the same:

reg16:
family16 .0022032 reg16 -.0088605
familyrecode -.0050225 reg16 -.0089461
familyrecode1 -.0102153 reg16 -.0090518
familyrecode2 -.0094531 reg16 -.0091877

reg:
family16 .0010892 reg .0338935
familyrecode1 reg .0336972
familyrecode2 reg -.0083173
(u forgot familyrecode + reg)

The results are all similar because these variables are probably not very important. However, they are not the same because MR is treating them as interval variables, and setting the 'mean' (also nonsense) to different values.

I know that family16 and reg16 have very low correlations with wordsum. I chose these variables because you have problems with them. Fortunately, Stata reports 7 decimal places, so you can compare the betas with a lot of precision. They are not perfectly identical, of course, but they are similar. In the numbers you showed me, it's only familyrecode2 that is different from the other "family" variables. However, I'm not surprised. I did not say that the coefficients of family16 and reg16 would be (almost) the same. I said the coefficients of the other variables would remain almost the same, and I see that this is true. Of course, the fact that family16 and reg16 have low correlations with wordsum helped a little bit, as you noted.
What were the gaps for each age group in the first and last survey year?
