(2015-Jan-21, 12:29:59)Emil Wrote: Again, you are using an idiosyncratic narrow definition of what a model is. It does not make sense to alter the wording to fit that usage.

That's a handwaving argument. In fact, it's not even an argument at all. For it to be one, such a claim should have been accompanied by citations, which are absent from your post. By calling my definition an "idiosyncratic narrow definition" you're showing that you obviously don't know what a model is. The definition I have given you is not actually "my" definition: it's a logical conclusion anyone can derive from what statisticians write. So, asserting that I'm distorting the definition of a statistical model proves that you don't understand the manner in which statisticians use the term "statistical model".

Anyone who reads enough papers on this matter will notice that statisticians (and even non-statisticians) often employ a sentence like this: "the models are fitted against the data". That's the perfect occasion to ask you this question: why do you think they say "models are fitted against the data"? The answer is obvious. They make a distinction between the statistical models (unobservables) and the observed data (observables).

In your paper, what you have is:

model0 = observed data

model1 ≠ observed data

model2 ≠ observed data

model3 ≠ observed data

Thus, while models 1-3 (IQ gains) can be statistically tested against each other with respect to the data, this is not the case for model0 (no IQ gains). Models 1-3 can be said to be approximations of the observed data, but not model0. Thus, model0 violates the definition of a statistical model. By definition, a statistical model can be "statistically tested" with respect to the data. That is the purpose of a statistical model: to know how well a given model approximates the data. And a model (e.g., model0) which is equivalent to the observed data cannot be "statistically tested", because model0 = data. No one can say, for example, that model0 has better model fit than models 1-3, even if it is the most accurate description of your data (which is not difficult, because model0 = data). Any model can fail the statistical test when it is inconsistent with the data; that possibility of failure applies to models 1-3 but not to model0, because, once again, model0 = data.
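To make the point concrete, here is a minimal sketch (in Python, with simulated data and hypothetical model labels of my own; this is an illustration, not the analysis from your paper) of why only genuine approximations can be compared by a fit statistic such as AIC, while a "model" that simply reproduces the data leaves no residual to test:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=x.size)  # simulated observed data

def gaussian_aic(y, y_hat, k):
    """AIC for a Gaussian model with k mean parameters (plus one for sigma)."""
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    # maximized Gaussian log-likelihood with sigma^2 estimated as rss/n
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * (k + 1) - 2 * loglik

# Model A: intercept only -- a real model, because it can misfit the data
aic_a = gaussian_aic(y, np.full_like(y, y.mean()), k=1)

# Model B: linear trend -- another approximation, comparable to Model A
slope, intercept = np.polyfit(x, y, 1)
aic_b = gaussian_aic(y, intercept + slope * x, k=2)

# "model0 = data": it reproduces y exactly, so its residual is always zero
rss0 = np.sum((y - y) ** 2)  # identically 0; the likelihood degenerates
# and no fit comparison with Models A and B is meaningful

print(aic_a, aic_b, rss0)
```

Models A and B can each fail against the data, so their AICs can be ranked; model0 "wins" by construction and therefore tests nothing.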

Have you ever heard of the following saying from the statistician George E. P. Box?

Quote:Essentially, all models are wrong, but some are useful.

And I have seen several economists quote him in order to make clear what a model is. What this sentence reveals is that a model necessarily incorporates a degree of inexactness. That's what I meant earlier by approximation. It is only when models are approximations that they can be compared and tested against each other.

As others have asked, how else could models be compared?

If you don't trust my words, perhaps you will trust the words of others. Models are expressed as equations, and understood as approximations with regard to the data. For instance:

Nachtigall et al. 2003 p. 4

(Why) Should We Use SEM? Pros and Cons of Structural Equation Modeling

Jeffrey M. Wooldridge 2012 pp. 3-5

Introductory Econometrics: A Modern Approach

Konishi & Kitagawa 2008 p. 4

Information Criteria and Statistical Modeling (Springer Series in Statistics)

Rex B. Kline 2011 pp. 8, 16

Principles and Practice of Structural Equation Modeling

Sheldon M. Ross 2010 p. 540

Introductory Statistics (3rd edition)

Marloes Maathuis 2012
1. Role of statistical models
Quote:Model is by definition a simplification of (a complex) reality.

Anu Maria 1997

Introduction to Modeling and Simulation
Quote:Modeling is the process of producing a model; a model is a representation of the construction and working of some system of interest. A model is similar to but simpler than the system it represents. One purpose of a model is to enable the analyst to predict the effect of changes to the system. On the one hand, a model should be a close approximation to the real system and incorporate most of its salient features. On the other hand, it should not be so complex that it is impossible to understand and experiment with it

Galit Shmueli 2010

To Explain or to Predict?
Quote:Exploratory data analysis (EDA) is a key initial step in both explanatory and predictive modeling. It consists of summarizing the data numerically and graphically, reducing their dimension, and “preparing” for the more formal modeling step.

...

2.6.1 Validation. In explanatory modeling, validation consists of two parts: model validation validates that f adequately represents F, and model fit validates that fˆ fits the data {X, Y}. In contrast, validation in predictive modeling is focused on generalization, which is the ability of fˆ to predict new data {Xnew,Ynew}.

...

The top priority in terms of model performance in explanatory modeling is assessing explanatory power ... In contrast, in predictive modeling, the focus is on predictive accuracy or predictive power, which refer to the performance of fˆ on new data.

Cosma Shalizi 2011

Evaluating Statistical Models
Quote:Using a model to summarize old data, or to predict new data, doesn't commit us to assuming that the model describes the process which generates the data. But we often want to do that, because we want to interpret parts of the model as aspects of the real world. We think that in neighborhoods where people have more money, they spend more on houses - perhaps each extra $1000 in income translates into an extra $4020 in house prices. Used this way, statistical models become stories about how the data were generated. If they are accurate, we should be able to use them to simulate that process, to step through it and produce something that looks, probabilistically, just like the actual data. This is often what people have in mind when they talk about scientific models, rather than just statistical ones.

An example: if you want to predict where in the night sky the planets will be, you can actually do very well with a model where the Earth is at the center of the universe, and the Sun and everything else revolve around it. You can even estimate, from data, how fast Mars (for example) goes around the Earth, or where, in this model, it should be tonight. But, since the Earth is not at the center of the solar system, those parameters don't actually refer to anything in reality. They are just mathematical fictions. On the other hand, we can also predict where the planets will appear in the sky using models where all the planets orbit the Sun, and the parameters of the orbit of Mars in that model do refer to reality.

SAS/STAT® 9.2 User's Guide, Second Edition
Quote:Obviously, the model must be "correct" to the extent that it sufficiently describes the data-generating mechanism

Topics in Statistical Data Analysis: Revealing Facts From Data
Quote:The following figure illustrates the statistical thinking process based on data in constructing statistical models for decision making under uncertainties.

Mueller & Hancock 2007

Best Practices in Structural Equation Modeling

Quote:A central issue addressed by SEM is how to assess the fit between observed data and the hypothesized model, ideally operationalized as an evaluation of the degree of discrepancy between the true population covariance matrix and that implied by the model's structural and nonstructural parameters. As the population parameter values are seldom known, the difference between an observed, sample-based covariance matrix and that implied by parameter estimates must serve to approximate the population discrepancy.

Kenneth A. Bollen 1989 pp. 68, 72

Structural Equations with Latent Variables

Quote:Model-reality consistency is a more "slippery" issue. Here the question is whether the model mirrors real-world processes. For instance, does an econometric model of the U.S. economy really correspond to the behavior of the economy? Fully assessing model-reality consistency is not possible since it presupposes perfect knowledge of the "real" world with which to evaluate the model. In practice, we imperfectly evaluate model-reality consistency in several ways. One is comparing the predictions implied by a model to those observed in a context different from the data that supply the model parameter estimates. For instance, we might check the realism of an econometric model by contrasting its predictions of inflation rates to those observed in the future. If we are fortunate enough to be able to manipulate variables in the model, we can do so and see if the model correctly predicts the consequences. Or, we can examine the assumptions and relations embedded in a model and debate their validity based on other experiences or insights.

It is tempting to use model-data consistency as proof of model-reality consistency, but we could be misled by so doing. The problem lies in the asymmetric link between these two consistency checks. If a model is consistent with reality, then the data should be consistent with the model. But, if the data are consistent with a model, this does not imply that the model corresponds to reality.

[...]

In sum, structural equation models face the same restrictions as other empirical methodologies. We can only reject a model - we can never prove a model to be valid. A good model-to-data fit does not mean that we have the true model.

The last paragraph helps us better understand why models are not actual data. Since all models are "wrong", so to speak, the best-fitting model is no proof that it is the true model, as all models are approximations.

And finally, the best one is this blog article:

The True Meaning Of Statistical Models
Briggs (2014) has nicely summarized the essence of a typical statistical model: "Why substitute perfectly good reality with a model?", "Because a statistical model is only interested in quantifying the uncertainty in some observable, given clearly stated evidence", "Every model (causal or statistical or combination) implies (logically implies) a prediction". Nothing could better illustrate all I have said earlier. A statistical model is an approximation, and thus is different from a descriptive statistic. Unfortunately, your so-called statistical model of no gain has no uncertainty in it.
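To see Briggs's point in miniature, here is a sketch (Python, simulated scores of my own invention) of the difference between a descriptive statistic, which is just a number, and a statistical model, which attaches uncertainty to its estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(100, 15, size=50)  # hypothetical IQ-like sample

# Descriptive statistic: a single number; no uncertainty is attached to it
mean_score = scores.mean()

# Statistical model: scores ~ Normal(mu, sigma). Estimating mu carries
# uncertainty, expressed here as a standard error and a 95% interval.
se = scores.std(ddof=1) / np.sqrt(scores.size)
ci = (mean_score - 1.96 * se, mean_score + 1.96 * se)

print(mean_score, ci)
```

The model makes a hedged, testable statement about mu; the bare mean, like a "model" equal to the data, asserts nothing that could fail.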

I repeat, the description in your figure 6 definitely needs to be rewritten.