# Factor Analysis for scaling a test with pre-post data

1. Sep 11, 2014

### thelema418

I need to create scales for a test before running t-tests / ANOVA.

Instrument: One attitude survey test with 37 questions. Each question is a Likert type question with 1 to 5 points.

Data Set: The set of data includes pre-test and post-test scores. There was a 6 month delay between the pre-test and post-test.

The question is this: my original tables have rows of participants and columns for pre-scores and post-scores. Some people have told me to run FA on the difference between the pre-scores and post-scores. This reduces the sample size, because some people skipped questions or did not take the post-test, and the KMO drops below .300 as a result.
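For readers who want to reproduce this kind of check, here is a minimal sketch of the difference-score approach with listwise deletion, followed by the overall KMO computed from the correlation matrix and its inverse. The data are synthetic, and the function names `kmo` and `difference_scores` are made up for illustration; this is not the routine any particular stats package uses.

```python
import numpy as np

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure for a (rows x items) array without missing values."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations are obtained by rescaling the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)

def difference_scores(pre, post):
    """Post minus pre, keeping only rows that are complete in both waves (listwise deletion)."""
    diff = post - pre
    complete = ~np.isnan(diff).any(axis=1)
    return diff[complete]

# Synthetic example (assumption): 200 participants, 6 items, one shared factor.
rng = np.random.default_rng(0)
n, k = 200, 6
factor = rng.normal(size=(n, 1))
pre = factor + rng.normal(size=(n, k))
post = factor + rng.normal(size=(n, k))
post[:40] = np.nan  # hypothetical: 40 participants did not take the post-test

diff = difference_scores(pre, post)
kmo_pre = kmo(pre)
kmo_diff = kmo(diff)
print(diff.shape, round(kmo_pre, 3), round(kmo_diff, 3))
```

Listwise deletion here drops every participant with any missing item, which is what shrinks the sample in the difference-score layout.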

Another idea is to treat each participant as two different cases in time, so that there is a Pre-Participant row and a Post-Participant row. The columns will only be questions. This way we can create factor scores and then merge these scores back as pre- and post-scores. When the data is handled this way, the KMO is greater than .700. But some have raised concerns about "replicating" individuals when doing this.
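To make the stacked layout concrete, here is a small sketch of how the rows could be built and the scores merged back. The participant IDs and responses are invented, and the row mean is only a placeholder for whatever factor-scoring method would actually be used.

```python
def stack_waves(pre, post):
    """pre/post map participant id -> list of item responses; returns one row per (id, wave)."""
    rows = []
    for wave, table in (("pre", pre), ("post", post)):
        for pid, items in table.items():
            rows.append({"pid": pid, "wave": wave, "items": items})
    return rows

def merge_scores(rows, score_fn):
    """Score every stacked row, then pivot back to one record per participant."""
    wide = {}
    for row in rows:
        wide.setdefault(row["pid"], {})[row["wave"]] = score_fn(row["items"])
    return wide

# Hypothetical data: three items, three participants, one of whom skipped the post-test.
pre = {"p1": [4, 5, 3], "p2": [2, 1, 2], "p3": [3, 3, 4]}
post = {"p1": [5, 5, 4], "p2": [2, 2, 3]}

def mean_score(items):
    # Placeholder for a factor score; a real analysis would apply factor weights here.
    return sum(items) / len(items)

rows = stack_waves(pre, post)
wide = merge_scores(rows, mean_score)
print(len(rows), wide)
```

Note that the stacked table keeps participants who only took one wave, which is why it has more rows than the difference-score table.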

In the statistical literature, I haven't seen anything written about either issue with data for FA in terms of making scales. Any insight would be appreciated. Thanks.

2. Sep 14, 2014

### Stephen Tashi

The mathematics section isn't a good place to find social scientists. To get advice, I think you need to explain the mathematical models that are involved to a "general audience" of mathematicians.

From my casual acquaintance with the subject:
1) A "Likert" question is a question that asks the respondent to give an answer that (intuitively) represents the value of some scalar variable. (E.g. it might be a multiple choice question with answers such as a) never b) sometimes c) often d) most of the time e) always.)

2) A "scale" is a real valued function of all the respondent's answers to a questionnaire (Often a weighted average of the ordinal numbers corresponding to the answers - e.g. 1 for a), 2 for b),...etc.). The scale attempts to measure some aspect of the respondent, which I will call a "state".
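Written as a formula, the weighted-average scale described in 2) might look like the following. The symbols are notation assumed here, not taken from any particular instrument: $x_j$ is the ordinal number of the answer to question $j$, $w_j$ is its weight, and $m$ is the number of questions.

$$
S = \frac{\sum_{j=1}^{m} w_j \, x_j}{\sum_{j=1}^{m} w_j}
$$

With all $w_j = 1$ this reduces to the plain mean of the answers.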

Intuitively, a "scale" is a more reliable measure of a state than a single question. For example, answering the question:

may involve both the state of "aggressiveness" and the state of "paranoia". So if we wish to measure the "aggressiveness" of respondents, it seems more reliable to use a function that depends on answers to several questions.

I suppose factor analysis is supposed to develop scales by detecting functions ("scales") that are uncorrelated.

I think you have to explain the states you are trying to measure and how the "pre-test" and "post-test" conditions are liable to affect them. You need to answer some (social) scientific questions.

3. Sep 14, 2014

### thelema418

I suppose this falls under "measurement theory." FA does a lot of things; it is a huge domain of study in itself with multiple branches.

I may be providing too much information by stating that the items are Likert scales: the responses are just ordinal measurements. I usually use factor analysis for dimension reduction on interval measurements, such as SAT-math scores, math exams, etc.

The type of factor analysis I am speaking of was developed by Charles Spearman (R-method statistics). There are other types of factor analysis, such as Stephenson's Q-method (which uses ipsative testing devices and transposes the tables) and Cattell's P-method (which is factor analysis with a time variable; I have no background at all with this method).

The central idea of Spearman's factor analysis is to find latent variables which correlate with the known variables. The best example I can give is the purpose for which he developed it: to create a measure of general intelligence. The idea is that intelligence must be somehow related to tests of English, Math, Music, Foreign Language, etc. So, if you find the latent variable that correlates with all of them, then you have a measure of "something" (which Spearman labeled "general intelligence"). You can find any number of latent variables, but in Spearman's case the first latent variable explained almost all of the variance, again satisfying his claims of general intelligence. Spearman used the correlations to calculate a measure for each participant.
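As a rough illustration of the "one dominant latent variable" idea, the sketch below builds synthetic test scores from a single hidden variable and then recovers it from the correlation matrix. It uses the first principal component as a stand-in, which is an assumption made for brevity and not Spearman's original procedure; the loadings and noise level are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
g = rng.normal(size=(n, 1))                      # hypothetical latent "general" variable
loadings = np.array([[0.9, 0.8, 0.85, 0.75]])    # invented loadings for four tests
tests = g * loadings + 0.4 * rng.normal(size=(n, 4))

# Standardize, then decompose the correlation matrix.
z = (tests - tests.mean(axis=0)) / tests.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order

first = eigvecs[:, -1]                           # direction with the largest eigenvalue
if first.sum() < 0:
    first = -first                               # fix the sign so higher means "more"

share = eigvals[-1] / eigvals.sum()              # proportion of variance explained
scores = z @ first                               # one score per participant
recovery = np.corrcoef(scores, g[:, 0])[0, 1]    # how well the scores track g
print(round(share, 3), round(recovery, 3))
```

When one latent variable drives all the tests, `share` is large and the per-participant `scores` track the hidden variable closely, which is the pattern described above.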

In the case of Zimbardo's development of the ZTPI, FA generated 5 latent variables, e.g. Present-Hedonistic. Zimbardo found the items that load together with FA, named the factors, and then scored each factor by averaging the scores of its loaded items after inverting any item that loaded negatively.
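The scoring step just described can be sketched as follows. The item names and loading signs are hypothetical, not actual ZTPI items, and the reversal rule (minimum plus maximum minus the response) is the usual convention for a 1 to 5 scale rather than something taken from Zimbardo's paper.

```python
def score_factor(responses, loaded_items, scale_min=1, scale_max=5):
    """Average the items that load on a factor, reverse-scoring negative loaders.

    responses: dict item name -> answer on the scale_min..scale_max scale
    loaded_items: dict item name -> +1 or -1 (sign of the loading)
    """
    values = []
    for item, sign in loaded_items.items():
        x = responses[item]
        if sign < 0:
            x = scale_min + scale_max - x  # e.g. 2 becomes 4 on a 1-5 scale
        values.append(x)
    return sum(values) / len(values)

# Hypothetical factor with two positively and one negatively loading item.
factor_items = {"q3": +1, "q8": +1, "q12": -1}
answers = {"q3": 4, "q8": 5, "q12": 2}

score = score_factor(answers, factor_items)
print(score)
```

Because the result is a plain average, it stays on the original 1 to 5 metric, which keeps pre- and post-scores directly comparable.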

My question though is really about statistical assumptions for doing the FA. Maybe I'll post the question to a social science group too.