
[ODP] Crime, income and employment among immigrant groups in Norway and Finland

#21
I have good knowledge of imputation. If you want, I can guide you toward the best methods. Not all imputations are equivalent; some are adequate only for certain types of data. What data are you using? Is it the file named "dataset.csv" that you uploaded here? Because I'm not sure I can recommend the use of imputation. I have explained that here. Your variables must be highly correlated (approaching at least 0.40 or so). The ratio of subjects to variables entered into the imputation procedure should not fall below 10:1. Alternatively, Hardt et al. (2012) recommend a maximum ratio of 1:3 for variables (with or without auxiliaries) against complete cases, that is, for 60 people with complete data, up to 20 variables could be used. Auxiliary variables are those that can serve as substitutes because of their high correlation with the variables of interest, such as identical variables measured at different points in time (i.e., repeated measures). If the percentage of missing cases is too high, as in your variables ViolentCrimeNorway, LarcenyNorway, ViolentCrimeFinland, and LarcenyFinland, I can tell you that you will run into serious trouble.
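For example, the rules of thumb above can be checked in a few lines of R before committing to imputation (a rough sketch; `d` is a placeholder for whatever data frame actually goes into the imputation):

Code:
#rough pre-imputation checks for a numeric data frame d (placeholder name)
r = cor(d, use = "pairwise.complete.obs") #pairwise correlations
range(abs(r[upper.tri(r)])) #are the variables intercorrelated around 0.40 or more?
nrow(d) / ncol(d) #subjects-to-variables ratio; should not fall below 10
ncol(d) / sum(complete.cases(d)) #Hardt et al. (2012): keep this at or below 1/3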

Besides, the superiority of imputation over complete-case analysis is not restricted to FA but extends to all kinds of analyses. For example, with complete cases only, multiple regression is inefficient, and then nearly 100% of the analyses of this kind published in various other journals would be flawed, because they almost never apply imputation, either because they don't know a thing about it or because it's time consuming. A minimum of 5 imputations is usually recommended, but more may be needed depending on the features of your data. With 5 imputations, you need to run the analysis five times, once on each imputed dataset, and then average the results; as is recommended, you should also report the standard error, CI, or standard deviation, to let readers know how much the estimates vary over the imputations. If your estimates vary too much, that is a signal that they are not stable and that you may need more imputations: 10, 20, 30, etc. But repeating the analysis 30 times with your 30 datasets is something researchers don't want to do, and I understand that...
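To give an idea of what that workflow looks like, here is a minimal sketch of pooling one regression over m imputed datasets (the list `imputed.list` and the variable names are placeholders, not anything from your data):

Code:
#imputed.list: a list of m completed data frames (placeholder); fit the same model to each
fits = lapply(imputed.list, function(d) lm(ViolentCrime ~ IQ + Islam, data = d))
coefs = sapply(fits, coef) #one column of coefficients per imputed dataset
pooled = rowMeans(coefs) #average the estimates over the imputations
between.sd = apply(coefs, 1, sd) #how much the estimates vary across imputations
round(cbind(pooled, between.sd), 3)

(Proper pooled standard errors additionally combine the within- and between-imputation variances, i.e., Rubin's rules.)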

Personally, I prefer maximum likelihood (ML) estimation, because multiple imputation (MI) gives me a headache when it comes to choosing which kind of MI is appropriate for the data at hand. If you have AMOS or R, you can easily use ML.
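For instance (just an assumption about tooling, since lavaan is not used in this thread), a one-factor model can be fitted by full-information ML directly on the incomplete data:

Code:
#sketch: one-factor model estimated by full-information ML (FIML) on incomplete data
#the data frame d and the variable names are placeholders
library(lavaan)
model = "S =~ crime + larceny + unemployment + income + education"
fit = cfa(model, data = d, missing = "fiml") #ML estimation, missing values handled directly
summary(fit, standardized = TRUE)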

*****

Regarding the last sentence of your paper, "All datasets and source code is available in the supplementary materials.", it's optional, but I recommend adding "on OpenPsych Forum".
#22
Here is my current code. Updated dataset file is attached.

Code:
setwd("Z:/Code/database")
read = read.csv("Megadataset_v1.4.csv")
colnames(read)

library(Hmisc) # for rcorr
library(car) # for scatterplot
library(stats) #for automatic multiple regression
library(psych) #for r.test
library(XLConnect) #writing to xls
library(nFactors) #how many factors to extract
library(mi) #imputation


DF.work = cbind(read["Norway.OutOfWork.2010Q2.men"], #for work data
          read["Norway.OutOfWork.2011Q2.men"],
          read["Norway.OutOfWork.2012Q2.men"],
          read["Norway.OutOfWork.2013Q2.men"],
          read["Norway.OutOfWork.2014Q2.men"],
          read["Norway.OutOfWork.2010Q2.women"],
          read["Norway.OutOfWork.2011Q2.women"],
          read["Norway.OutOfWork.2012Q2.women"],
          read["Norway.OutOfWork.2013Q2.women"],
          read["Norway.OutOfWork.2014Q2.women"])

DF.income = cbind(read["Norway.Income.index.2009"], #for income data
                  read["Norway.Income.index.2010"],
                  read["Norway.Income.index.2011"],
                  read["Norway.Income.index.2012"])

#make DF
DF = cbind(read["LV2012estimatedIQ"],
           read["Altinok2013ACH"],
           read["IslamPewResearch2010"],
           log(read["GDPpercapitaWorldBank2013"]),
           read["S.scores"],
           read["NorwayViolentCrimeAdjustedOddsRatioSkardhamar2014"],
           read["FinlandViolentCrimeAdjustedOddsRatioSkardhamar2014"],
           read["NorwayLarcenyAdjustedOddsRatioSkardhamar2014"],
           read["FinlandLarcenyAdjustedOddsRatioSkardhamar2014"],
           read["Norway.tertiary.edu.att.2013"],
           read["Norway.tertiary.edu.att.bigsamples.2013"])

  
#get 5 year means
DF["OutOfWork.2010to2014.men"] = apply(DF.work[1:5],1,mean,na.rm=T) #get means, ignore missing
DF["OutOfWork.2010to2014.women"] = apply(DF.work[6:10],1,mean,na.rm=T) #get means, ignore missing

#get means for income and add to DF
DF["Income.index.2009to2012"] = apply(DF.income[1:4],1,mean,na.rm=T) #get means, ignore missing

#compare islam vars
DF.islam = as.data.frame(cbind(read["IslamPewResearch1990"],
                               read["IslamPewResearch2010"],
                               read["IslamPewResearch2030Projected"],
                               read["Islam"]))

#correlation matrix
DF.cor = rcorr(as.matrix(DF)) #create correlation matrix with pairwise miss data deleted
round(DF.cor$r,2)

#write results to xlsx file
writeWorksheetToFile(file = "correlations_Norway2014.xlsx", data = round(DF.cor$r,2), sheet = "cors")
writeWorksheetToFile(file = "correlations_Norway2014.xlsx", data = round(DF.cor$P,4), sheet = "p")
writeWorksheetToFile(file = "correlations_Norway2014.xlsx", data = DF.cor$n, sheet = "n")

#are the same vars hard/easy to predict across predictors?
IQ.cors = DF.cor$r[6:nrow(DF.cor$r),1]
Altinok.cors = DF.cor$r[6:nrow(DF.cor$r),2]
Islam.cors = DF.cor$r[6:nrow(DF.cor$r),3]
GDP.cors = DF.cor$r[6:nrow(DF.cor$r),4]
S.cors = DF.cor$r[6:nrow(DF.cor$r),5]

DF.predict = as.data.frame(cbind(IQ.cors,Altinok.cors,Islam.cors,GDP.cors,S.cors))
DF.predict.cor = rcorr(as.matrix(DF.predict)) #are there clear patterns in how well the vars are predictable? YES!

#factor analysis
#subset data
DF.norway = DF[c("NorwayViolentCrimeAdjustedOddsRatioSkardhamar2014", #only norwegian vars
                 "NorwayLarcenyAdjustedOddsRatioSkardhamar2014",
                 "Norway.tertiary.edu.att.bigsamples.2013", #skip the small ed. att var
                 "OutOfWork.2010to2014.men",
                 "OutOfWork.2010to2014.women",
                 "Income.index.2009to2012")]

#handle missing values
DF.norway.complete = DF.norway[complete.cases(DF.norway),] #complete cases only, reduces N to 15

#count NA's
DF.norway.missing = apply(DF.norway, 1, is.na) #matrix of T/F: one column per case, one row per variable
DF.norway.missing = apply(DF.norway.missing, 2, sum) #number of missing values per case
DF.norway.missing.table = table(DF.norway.missing) #tabulates cases by number of missing values

#initial info
mi.info = mi.info(DF.norway)
missing.pattern.plot(DF.norway) #visual depiction of missing values
mp.plot(DF.norway, y.order = TRUE, x.order = TRUE)

#subsets
DF.norway.miss.1 = DF.norway[DF.norway.missing <= 1,] #keep data with 1 or less missing values, N=18
DF.norway.miss.2 = DF.norway[DF.norway.missing <= 2,] #keep data with 2 or less missing values, N=26

#impute
DF.norway.miss.1.impute = mi(DF.norway.miss.1, n.iter=200) #imputes, needs more iterations
DF.norway.miss.1.imputed = mi.data.frame(DF.norway.miss.1.impute, m = 3)
DF.norway.miss.2.impute = mi(DF.norway.miss.2, n.iter=200) #imputes, needs more iterations
DF.norway.miss.2.imputed = mi.data.frame(DF.norway.miss.2.impute, m = 3)

#compare desc. stats
DF.desc.stats = as.data.frame(rbind(describe(DF.norway),
                                    describe(DF.norway.complete),
                                    describe(DF.norway.miss.1.imputed),
                                    describe(DF.norway.miss.2.imputed)))
DF.desc.stats.ordered = DF.desc.stats[with(DF.desc.stats, order(vars)), ] #reorder

write.csv(DF.desc.stats.ordered, "desc_stats.csv")

#sampling tests
#bartlett's test
cortest.bartlett(DF.norway.complete)
cortest.bartlett(DF.norway.miss.1.imputed)
cortest.bartlett(DF.norway.miss.2.imputed)

#KMO function
kmo <- function(x)
{
  x <- subset(x, complete.cases(x)) # Omit missing values
  r <- cor(x) # Correlation matrix
  r2 <- r^2 # Squared correlation coefficients
  i <- solve(r) # Inverse matrix of correlation matrix
  d <- diag(i) # Diagonal elements of inverse matrix
  p2 <- (-i/sqrt(outer(d, d)))^2 # Squared partial correlation coefficients
  diag(r2) <- diag(p2) <- 0 # Delete diagonal elements
  KMO <- sum(r2)/(sum(r2)+sum(p2))
  MSA <- colSums(r2)/(colSums(r2)+colSums(p2))
  return(list(KMO=KMO, MSA=MSA))
}

#get KMO
kmo(DF.norway.complete)$KMO
kmo(DF.norway.miss.1.imputed)$KMO
kmo(DF.norway.miss.2.imputed)$KMO

#nfactors
nScree(DF.norway.complete)
nScree(DF.norway.miss.1.imputed)
nScree(DF.norway.miss.2.imputed)

#PA and ML
DF.norway.complete.pa = fa(DF.norway.complete, nfactors=1,rotate="none",scores="regression",fm="pa")
DF.norway.complete.ml = fa(DF.norway.complete, nfactors=1,rotate="none",scores="regression",fm="ml")
factor.congruence(DF.norway.complete.pa,DF.norway.complete.ml) #identical

DF.norway.miss.1.imputed.pa = fa(DF.norway.miss.1.imputed, nfactors=1,rotate="none",scores="regression",fm="pa")
DF.norway.miss.1.imputed.ml = fa(DF.norway.miss.1.imputed, nfactors=1,rotate="none",scores="regression",fm="ml")
factor.congruence(DF.norway.miss.1.imputed.pa,DF.norway.miss.1.imputed.ml) #identical

DF.norway.miss.2.imputed.pa = fa(DF.norway.miss.2.imputed, nfactors=1,rotate="none",scores="regression",fm="pa")
DF.norway.miss.2.imputed.ml = fa(DF.norway.miss.2.imputed, nfactors=1,rotate="none",scores="regression",fm="ml")
factor.congruence(DF.norway.miss.2.imputed.pa,DF.norway.miss.2.imputed.ml) #identical

DF.norway.miss.2.imputed.pa2 = fa(DF.norway.miss.2.imputed, nfactors=2,rotate="none",scores="regression",fm="pa")
DF.norway.miss.2.imputed.ml2 = fa(DF.norway.miss.2.imputed, nfactors=2,rotate="none",scores="regression",fm="ml")


#Strength of general factor
omega(DF.norway.complete)
omega(DF.norway.miss.1.imputed)
omega(DF.norway.miss.2.imputed)

#put the factor scores back into the big dataset - for SPI
Sfactor.scores = rep(NA,nrow(DF))   #make an empty vector of the right size for the factor scores
Sfactor.scores1 = rep(NA,nrow(DF))
Sfactor.scores2 = rep(NA,nrow(DF))

dims = as.integer(dimnames(DF.norway.complete.pa$scores)[[1]])    #converts the dimnames to integers
dims1 = as.integer(dimnames(DF.norway.miss.1.imputed.pa$scores)[[1]])  #converts the dimnames to integers
dims2 = as.integer(dimnames(DF.norway.miss.2.imputed.pa$scores)[[1]])  #converts the dimnames to integers

#put the factor scores back into the big dataset at the row positions given by the dimnames
Sfactor.scores[dims] = DF.norway.complete.pa$scores
Sfactor.scores1[dims1] = DF.norway.miss.1.imputed.pa$scores
Sfactor.scores2[dims2] = DF.norway.miss.2.imputed.pa$scores

#reverse Sfactor scores
Sfactor.scores = Sfactor.scores*-1
Sfactor.scores1 = Sfactor.scores1*-1
Sfactor.scores2 = Sfactor.scores2*-1

#make DF
DF2 = cbind(read["LV2012estimatedIQ"],
           read["Altinok2013ACH"],
           read["IslamPewResearch2010"],
           log(read["GDPpercapitaWorldBank2013"]),
           read["S.scores"],
           Sfactor.scores,
           Sfactor.scores1,
           Sfactor.scores2)

DF2.cor = rcorr(as.matrix(DF2)) #create correlation matrix with pairwise miss data deleted
round(DF2.cor$r,2)


#visualize
scatterplot(OutOfWork.2010to2014.men ~ LV2012estimatedIQ, #predict employment from IQ
            data=DF,
            smoother=NULL, #no moving average
            labels=unlist(read["ID"]), #labels, but they don't work
            id.n = length(unlist(read["ID"])), #pointless, but needed
            xlab = "Lynn and Vanhanen national IQ",
            ylab="% of men unemployed, 2010-2014 average")


scatterplot(OutOfWork.2010to2014.women ~ IslamPewResearch2010, #predict employment from Islam
            data=DF,
            smoother=NULL, #no moving average
            labels=unlist(read["ID"]), #labels, but they don't work
            id.n = length(unlist(read["ID"])), #pointless, but needed
            xlab = "Prevalence of Islam",
            ylab="% of women unemployed, 2010-2014 average")

scatterplot(Norway.tertiary.edu.att.bigsamples.2013 ~ S.scores, #predict tertiary education from the S factor
            data=DF,
            smoother=NULL, #no moving average
            labels=unlist(read["ID"]), #labels, but they don't work
            id.n = length(unlist(read["ID"])), #pointless, but needed
            xlab = "S factor",
            ylab="fraction with long tertiary education")

scatterplot(Sfactor.scores2 ~ LV2012estimatedIQ, #predict S factor from IQ
            data=DF2,
            smoother=NULL, #no moving average
            labels=unlist(read["ID"]), #labels, but they don't work
            id.n = length(unlist(read["ID"])), #pointless, but needed
            xlab = "National IQ in home country",
            ylab="General socioeconomic factor (S) in Norway",
            main = "National IQ predicts immigrant group performance at the group level in Norway")

scatterplot(Sfactor.scores2 ~ IslamPewResearch2010, #predict S factor from Islam
            data=DF2,
            smoother=NULL, #no moving average
            labels=unlist(read["ID"]), #labels, but they don't work
            id.n = length(unlist(read["ID"])), #pointless, but needed
            xlab = "National prevalence of Islam in home country",
            ylab="General socioeconomic factor (S) in Norway",
            main = "National Islam prevalence predicts immigrant group performance at the group level in Norway")


As you can see, I am using multiple imputation. There seems to be a bug because MI mangles one of the variables, but only that one (Norway.tertiary.edu.att.bigsamples.2013). The mangling has no effect on the factor analysis (the S factors from the complete-cases dataset correlate .99 or 1.00 with those from the imputed datasets). However, it has altered the values of the cases that had actual data, which changed the mean and the standard deviation. Everything else seems fine. Odd?

This can be seen in the descriptive statistics for each dataset (all data, complete cases, imputed dataset 1, imputed dataset 2).

Code:
> round(DF.desc.stats.ordered,2)
                                                   vars   n    mean      sd  median trimmed     mad     min      max
NorwayViolentCrimeAdjustedOddsRatioSkardhamar2014     1  26    1.31    0.87    1.25    1.24    1.11    0.20     3.20
NorwayViolentCrimeAdjustedOddsRatioSkardhamar20141    1  15    1.41    0.99    1.50    1.36    1.19    0.20     3.20
NorwayViolentCrimeAdjustedOddsRatioSkardhamar20142    1  18    1.33    0.93    1.15    1.28    0.96    0.20     3.20
NorwayViolentCrimeAdjustedOddsRatioSkardhamar20143    1  26    1.16    0.86    0.85    1.08    0.89   -0.01     3.20
NorwayLarcenyAdjustedOddsRatioSkardhamar2014          2  26    0.77    0.56    0.60    0.74    0.59    0.10     2.00
NorwayLarcenyAdjustedOddsRatioSkardhamar20141         2  15    0.78    0.55    0.50    0.76    0.44    0.20     1.60
NorwayLarcenyAdjustedOddsRatioSkardhamar20142         2  18    0.72    0.53    0.55    0.71    0.44    0.10     1.60
NorwayLarcenyAdjustedOddsRatioSkardhamar20143         2  26    0.60    0.51    0.47    0.58    0.33   -0.26     1.60
Norway.tertiary.edu.att.bigsamples.2013               3  67    0.12    0.08    0.11    0.12    0.09    0.01     0.31
Norway.tertiary.edu.att.bigsamples.20131              3  15    0.10    0.07    0.09    0.10    0.07    0.01     0.23
Norway.tertiary.edu.att.bigsamples.20132              3  18    0.52    0.02    0.52    0.52    0.02    0.50     0.56
Norway.tertiary.edu.att.bigsamples.20133              3  26    0.52    0.02    0.52    0.52    0.02    0.50     0.56
OutOfWork.2010to2014.men                              4 120    7.05    4.18    5.96    6.51    3.62    1.38    22.08
OutOfWork.2010to2014.men1                             4  15    7.40    5.36    5.96    6.64    4.00    2.68    22.08
OutOfWork.2010to2014.men2                             4  18    7.31    4.89    6.53    6.68    4.26    2.68    22.08
OutOfWork.2010to2014.men3                             4  26    6.88    4.30    6.32    6.20    3.38    2.66    22.08
OutOfWork.2010to2014.women                            5 120    7.50    4.97    6.30    6.75    2.95    1.32    31.82
OutOfWork.2010to2014.women1                           5  15    8.90    6.20    7.30    8.40    4.69    1.98    22.42
OutOfWork.2010to2014.women2                           5  18    8.17    5.93    6.48    7.67    4.46    1.90    22.42
OutOfWork.2010to2014.women3                           5  26    7.40    5.25    6.48    6.69    3.53    1.56    22.42
Income.index.2009to2012                               6  23   79.86   14.58   80.25   79.87   13.71   53.25   108.25
Income.index.2009to20121                              6  15   78.78   14.78   78.25   78.48   10.75   53.25   108.25
Income.index.2009to20122                              6  18 6852.23 2415.20 6500.53 6799.16 2890.42 2835.56 11718.06
Income.index.2009to20123                              6  26 6689.35 2268.99 6500.53 6635.00 2031.16 2835.56 11718.06
                                                     range  skew kurtosis     se
NorwayViolentCrimeAdjustedOddsRatioSkardhamar2014     3.00  0.55    -0.83   0.17
NorwayViolentCrimeAdjustedOddsRatioSkardhamar20141    3.00  0.39    -1.25   0.26
NorwayViolentCrimeAdjustedOddsRatioSkardhamar20142    3.00  0.57    -0.94   0.22
NorwayViolentCrimeAdjustedOddsRatioSkardhamar20143    3.21  0.77    -0.28   0.17
NorwayLarcenyAdjustedOddsRatioSkardhamar2014          1.90  0.56    -1.09   0.11
NorwayLarcenyAdjustedOddsRatioSkardhamar20141         1.40  0.38    -1.74   0.14
NorwayLarcenyAdjustedOddsRatioSkardhamar20142         1.50  0.55    -1.42   0.12
NorwayLarcenyAdjustedOddsRatioSkardhamar20143         1.86  0.70    -0.58   0.10
Norway.tertiary.edu.att.bigsamples.2013               0.30  0.42    -0.91   0.01
Norway.tertiary.edu.att.bigsamples.20131              0.22  0.39    -1.32   0.02
Norway.tertiary.edu.att.bigsamples.20132              0.05  0.50    -1.12   0.00
Norway.tertiary.edu.att.bigsamples.20133              0.06  0.75    -0.64   0.00
OutOfWork.2010to2014.men                             20.70  1.26     1.69   0.38
OutOfWork.2010to2014.men1                            19.40  1.38     1.18   1.38
OutOfWork.2010to2014.men2                            19.40  1.57     2.16   1.15
OutOfWork.2010to2014.men3                            19.42  1.80     3.72   0.84
OutOfWork.2010to2014.women                           30.50  1.92     5.11   0.45
OutOfWork.2010to2014.women1                          20.44  0.83    -0.58   1.60
OutOfWork.2010to2014.women2                          20.52  1.03    -0.04   1.40
OutOfWork.2010to2014.women3                          20.86  1.30     1.14   1.03
Income.index.2009to2012                              55.00 -0.01    -0.92   3.04
Income.index.2009to20121                             55.00  0.19    -0.78   3.82
Income.index.2009to20122                           8882.50  0.20    -0.97 569.27
Income.index.2009to20123                           8882.50  0.24    -0.74 444.99
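One way to pin down the bug would be to check directly that the originally observed values of the affected variable are unchanged in the imputed dataset, since imputation should never alter non-missing values (a sketch using the object names from the code above):

Code:
#the observed (non-missing) values should be untouched by imputation
v = "Norway.tertiary.edu.att.bigsamples.2013"
obs = !is.na(DF.norway.miss.2[[v]])
all.equal(DF.norway.miss.2[[v]][obs], DF.norway.miss.2.imputed[[v]][obs]) #should be TRUE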



Attached Files
.csv   Megadataset_v1.4.csv (Size: 197.93 KB / Downloads: 394)
#23
If someone wants to use multiple imputation (MI), I have nothing against it, but he or she seriously needs to read a lot about MI. I'm serious. When I read about it, it really gave me a headache. I have also read some data analysts (Paul D. Allison, maybe?) saying the same thing: if you are not very sure about what you are doing, then don't do it. A lot of researchers don't know the proper way to use MI, and some have reported that others used sub-optimal imputation options.

That said, can you tell me the meaning of this syntax?

#impute
DF.norway.miss.1.impute = mi(DF.norway.miss.1, n.iter=200) #imputes, needs more iterations
DF.norway.miss.1.imputed = mi.data.frame(DF.norway.miss.1.impute, m = 3)
DF.norway.miss.2.impute = mi(DF.norway.miss.2, n.iter=200) #imputes, needs more iterations
DF.norway.miss.2.imputed = mi.data.frame(DF.norway.miss.2.impute, m = 3)

Why are there norway.miss.1 and norway.miss.2? Does that mean you use 2 data imputations? I ask because, in the literature, the number of imputed datasets is normally called "m", and in your code m=3. So does that mean you use 3*2=6 imputations? [edit: no, I understand now. You had 2 datasets, one with N=18 and the other with N=26, so in fact you use 3 imputations.]

Also, can you tell me the % of missing values per variable? Normally, the more missing values you have, the more imputations you need.
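In R that is a one-liner on the DF.norway subset from your script, for example:

Code:
round(colMeans(is.na(DF.norway)) * 100, 1) #% of missing values per variable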
#24
As you can see from the code, I reduced the full dataset to three subsets. One with complete cases

Code:
DF.norway.complete = DF.norway[complete.cases(DF.norway),] #complete cases only, reduces N to 15


And two with missing values

Code:
DF.norway.miss.1 = DF.norway[DF.norway.missing <= 1,] #keep data with 1 or less missing values, N=18
DF.norway.miss.2 = DF.norway[DF.norway.missing <= 2,] #keep data with 2 or less missing values, N=26


The comments explain what I did.

The lines
Code:
#impute
DF.norway.miss.1.impute = mi(DF.norway.miss.1, n.iter=200) #imputes, needs more iterations
DF.norway.miss.1.imputed = mi.data.frame(DF.norway.miss.1.impute, m = 3)
DF.norway.miss.2.impute = mi(DF.norway.miss.2, n.iter=200) #imputes, needs more iterations
DF.norway.miss.2.imputed = mi.data.frame(DF.norway.miss.2.impute, m = 3)


The first and third lines run the imputation, with 200 iterations and 3 imputations. The second and fourth lines extract the imputed datasets.

We can try ML imputation too.
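For example (a sketch I have not run; it assumes psych's corFiml handles this subset), the correlation matrix can be estimated by full-information ML and factor-analyzed directly, without generating any imputed datasets:

Code:
#FIML correlation matrix from the incomplete data, then FA on it (psych package, already loaded)
DF.norway.fiml.cor = corFiml(DF.norway) #full-information ML estimate of the correlations
fa(DF.norway.fiml.cor, nfactors = 1, fm = "pa", n.obs = nrow(DF.norway))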
#25
I switched to using the VIM package for imputing instead. This produces results without the bug described above.

Code:
#subsets
DF.norway.complete = DF.norway[complete.cases(DF.norway),] #complete cases only, reduces N to 15
DF.norway.miss.1 = DF.norway[DF.norway.missing <= 1,] #keep data with 1 or less missing values, N=18
DF.norway.miss.2 = DF.norway[DF.norway.missing <= 2,] #keep data with 2 or less missing values, N=26

library(VIM) #for the irmi imputation function
DF.norway.miss.1.imp.vim = irmi(DF.norway.miss.1) #impute using VIM
DF.norway.miss.2.imp.vim = irmi(DF.norway.miss.2) #impute using VIM


The current draft is 10 pages, but I still need to add more. Perhaps 12-15 pages total when done.
#26
Ok, I managed to keep it at 8 pages due to dropping some of the scatter-plots.

New version is attached as well as updated supplementary material.


Attached Files
.pdf   article.pdf (Size: 298.99 KB / Downloads: 458)
.7z   Crime, income and employment among immigrant groups in Norway and Finland.7z (Size: 584.17 KB / Downloads: 370)
#27
You should make explicit how many imputations you used (m=3, or more). And remember that the more missing values you have, the more imputations you need.
http://www.statisticalhorizons.com/more-imputations
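A common rule of thumb from that literature (e.g., White, Royston & Wood, 2011) is to use at least as many imputations as the percentage of incomplete cases, which is easy to compute (a sketch using the DF.norway subset from your script):

Code:
#rule of thumb: m of at least the percentage of incomplete cases
pct.incomplete = 100 * mean(!complete.cases(DF.norway))
ceiling(pct.incomplete) #suggested minimum number of imputations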
#28
(2014-Sep-12, 04:16:01)Emil Wrote: Ok, I managed to keep it at 8 pages due to dropping some of the scatter-plots.

New version is attached as well as updated supplementary material.


You didn't make the corrections to the intro that I noted previously.
#29
Here is a new draft. I have made most of the changes suggested by Chuck and Dalliard, and a native speaker has corrected the language in the paper. I have added a table in the appendix with S factor scores in NO and DK, as well as another dataset with up to 3 values imputed per case, which results in N=67 but more uncertain data. There are various other smaller changes.

Ideally, I'd like to switch to deterministic imputation so that my analyses are completely reproducible, but I haven't found an easy way to do this in R so far. It will very likely make little difference to the predictive results, which are already very strong (|.55|-|.78| for the fully imputed NO dataset, |.51|-|.71| for DK). They are also similar to those from the unimputed datasets for the variables with sufficient sample size and variance.
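One low-effort step toward reproducibility, short of a truly deterministic method, is simply to fix R's random seed before the imputation call, since both mi and VIM draw their random numbers from R's generator (a sketch; the seed value is arbitrary):

Code:
set.seed(42) #any fixed value; makes the stochastic imputation repeat exactly on re-runs
DF.norway.miss.2.imp.vim = irmi(DF.norway.miss.2)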


Attached Files
.pdf   article.pdf (Size: 301.49 KB / Downloads: 485)
#30
The nature of imputation is that you can never get exactly the same result twice. You only need to note this in the paper, as you did. But next time, I recommend using more imputations. Ideally, the number of imputations should be a function of the % of missing values.