
Factor Analysis for scaling a test with pre-post data

  Sep 11, 2014 #1
    I need to create scales for a test before running t-tests / ANOVA.

    Instrument: one attitude survey with 37 questions. Each question is a Likert-type item scored from 1 to 5.

    Data set: the data include pre-test and post-test scores, with a six-month delay between the two administrations.

    My question is this: my original table has rows of participants and columns for pre-scores and post-scores. Some people have told me to run FA on the differences between pre-scores and post-scores. This reduces the sample size, because some people skipped questions or did not take the post-test, and the KMO drops below .300 as a result.

    Another idea is to treat each participant's pre and post responses as separate cases, so the rows contain a "Pre-participant" and a "Post-participant" and the columns are only the 37 questions. We can then compute factor scores and merge them back as pre- and post-scores. When the data are handled this way, the KMO is greater than .700. But some have raised concerns about "replicating" individuals when doing this. (A rough sketch of both layouts is below.)
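
    For concreteness, here is a sketch of the two data layouts in Python, assuming the responses sit in a pandas DataFrame df with columns q1_pre ... q37_pre and q1_post ... q37_post, and using the factor_analyzer package for the KMO check. All of the names here are illustrative; the actual analysis may well have been run in SPSS or similar.

    Code:
    import pandas as pd
    from factor_analyzer.factor_analyzer import calculate_kmo

    items = [f"q{i}" for i in range(1, 38)]   # the 37 survey questions

    # Layout 1: FA on pre/post difference scores. dropna() shows why the
    # sample shrinks: anyone missing a pre or post answer is lost entirely.
    diff = pd.DataFrame({q: df[f"{q}_post"] - df[f"{q}_pre"] for q in items}).dropna()
    _, kmo_diff = calculate_kmo(diff)

    # Layout 2: stack pre and post responses as separate rows ("replicated"
    # participants), so the columns are only the 37 questions.
    pre = df[[f"{q}_pre" for q in items]].set_axis(items, axis=1)
    post = df[[f"{q}_post" for q in items]].set_axis(items, axis=1)
    stacked = pd.concat([pre, post], ignore_index=True).dropna()
    _, kmo_stacked = calculate_kmo(stacked)

    print(f"KMO (difference scores): {kmo_diff:.3f}")
    print(f"KMO (stacked pre/post):  {kmo_stacked:.3f}")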

    I haven't found anything in the statistical literature that addresses either way of arranging the data for FA when building scales. Any insight would be appreciated. Thanks.
  Sep 14, 2014 #2

    Stephen Tashi

    Science Advisor

    The mathematics section isn't a good place to find social scientists. To get advice, I think you need to explain the mathematical models that are involved to a "general audience" of mathematicians.

    From my casual acquaintance with the subject:

    1) A "Likert" question asks the respondent to give an answer that (intuitively) represents the value of some scalar variable. (E.g., it might be a multiple-choice question with answers such as a) never, b) sometimes, c) often, d) most of the time, e) always.)

    2) A "scale" is a real valued function of all the respondent's answers to a questionnaire (Often a weighted average of the ordinal numbers corresponding to the answers - e.g. 1 for a), 2 for b),...etc.). The scale attempts to measure some aspect of the respondent, which I will call a "state".

    Intuitively, a "scale" is a more reliable measure of a state than a single question. For example, a respondent's answer to a single question may involve both the state of "aggressiveness" and the state of "paranoia". So if we wish to measure the "aggressiveness" of respondents, it seems more reliable to use a function that depends on answers to several questions.

    I suppose factor analysis develops scales by detecting functions ("scales") that are uncorrelated with one another.

    I think you have to explain the states you are trying to measure and how the "pre-test" and "post-test" conditions are liable to affect them. You need to answer some (social) scientific questions.
  Sep 14, 2014 #3
    I suppose this falls under "measurement theory." FA does a lot of things; it is a huge domain of study in itself with multiple branches.

    I may be providing too much information by stating that the items are Likert scales: the responses are just ordinal measurements. I usually use factor analysis for dimension reduction on interval measurements, such as SAT-math scores, math exams, etc.

    The type of factor analysis I am speaking of was developed by Charles Spearman (R-method statistics). There are other types, such as Stephenson's Q-method (which uses ipsative testing devices and transposes the tables) and Cattell's P-method (factor analysis with a time variable; I have no background at all with this method).

    The central idea of Spearman's factor analysis is to find latent variables that correlate with the known variables. The best example I can give is the purpose for which he developed it: to create a measure of general intelligence. The idea is that intelligence must somehow be related to tests of English, Math, Music, Foreign Language, etc., so if you find a latent variable that correlates with all of them, you have a measure of "something" (which Spearman labeled "general intelligence"). You can extract any number of latent variables, but in Spearman's case the first latent variable explained almost all of the variance, again supporting his claim of general intelligence. Spearman then used the correlations to calculate a measure for each participant.
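
    As a rough sketch of that kind of analysis (this uses the Python factor_analyzer package on a hypothetical DataFrame scores with one column per test; none of this comes from Spearman's actual data):

    Code:
    from factor_analyzer import FactorAnalyzer

    # One general factor, no rotation -- the Spearman-style setup.
    fa = FactorAnalyzer(n_factors=1, rotation=None)
    fa.fit(scores)                       # rows = people, columns = tests
    print(fa.loadings_)                  # how strongly each test loads on "g"
    var, prop_var, cum_var = fa.get_factor_variance()
    print(prop_var)                      # share of variance the factor explains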

    In the case of Zimbardo's development of the ZTPI, the analysis produced five latent variables (e.g., Present-Hedonistic). Zimbardo found the items that load together with FA, named the factors, and then scored each factor by averaging the scores of its loaded items after inverting any item that loaded negatively (sketched below).
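
    A rough sketch of that scoring step (the item names and the 0.40 loading cutoff are illustrative, not Zimbardo's actual values):

    Code:
    def factor_score(row, loadings, threshold=0.40):
        """Average a participant's 1-5 answers over the items that load on
        one factor, reverse-scoring items with negative loadings."""
        vals = []
        for item, loading in loadings.items():
            if abs(loading) < threshold:
                continue                  # item does not load on this factor
            answer = row[item]
            if loading < 0:
                answer = 6 - answer       # invert a 1-5 Likert response
            vals.append(answer)
        return sum(vals) / len(vals)

    # e.g. factor_score({"q1": 4, "q2": 2, "q3": 5},
    #                   {"q1": 0.62, "q2": -0.55, "q3": 0.10})
    # uses q1 as-is (4) and q2 reversed (6 - 2 = 4); q3 is dropped -> 4.0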

    My question though is really about statistical assumptions for doing the FA. Maybe I'll post the question to a social science group too.