
Should you be doing Exploratory or Confirmatory Analysis?

While confirmatory factor analysis has been popular in recent years as a way to test the degree of fit between a proposed structural model and the emergent structure of the data, the pendulum has swung back in favor of exploratory analysis for two key reasons. Firstly, the results of confirmatory factor analysis are typically misinterpreted as supporting one structural solution over any other; this conclusion is particularly weak when only a few of the many possible structures were assessed. Secondly, replicating a structure through successive unconstrained exploratory procedures is considered much stronger evidence of structure than a single unreplicated, constrained confirmatory procedure. So unless you are absolutely sure that you should be doing Confirmatory factor analysis – Amos, stick with an exploratory procedure (explained here).

Factor analysis is a data reduction technique: it starts from a large set of statements or questions and looks at how they can be reduced and summarized as a smaller set of factors or components, by searching for clusters or groups of correlated variables. A number of different factor-analytic methods can be used to do this, and there are two broad approaches: exploratory factor analysis and confirmatory factor analysis. (Dr. Ehab El-Barawy)

Should you be doing Principal Components Analysis or Factor Analysis?

If your primary goal is to take scores from a large set of measured variables and reduce them to scores on a smaller set of composite variables that retain as much information from the original variables as possible (i.e. data reduction), then Principal Components Analysis (PCA) is the appropriate analysis. If, however, your purpose is to model the structure of correlations among your variables (or, to put it another way, to arrive at a parsimonious representation of the associations among measured variables), then you should do Factor Analysis (FA). The distinction is subtle, but important. For this example we will assume that you (like most researchers) have chosen to do a PCA. Another advantage of PCA over FA is that no one can accuse you of manipulating the results, since you are analysing total variance (as PCA does) rather than just common variance (as FA does). There is also a technical advantage to PCA: the mathematics do not fail when the matrix is very large and/or there are multicollinearity problems.

Do your data suit the analysis?

Assuming you do want to do an exploratory PCA, there are a number of assumptions about the data that can be checked as part of the analysis, but there are two critical issues you need to consider before you continue. The first is measurement level: all your variables need to be continuous, so variables like gender cannot be analysed using PCA. That is not to say that SPSS doesn’t give you other options (see CATPCA (categorical principal components analysis) - SPSS Categories), but categorical data is a problem for exploratory PCA and FA. Dichotomous data (e.g. yes/no answers) may also be a problem; if most or all of your data is dichotomous, you should consider substituting a tetrachoric correlation matrix for the ordinary correlation matrix. The second major issue is sample size. You need at least 50 cases or 4 cases per variable, whichever is greater, though even these lower limits may not assure a replicable outcome. Ideally you should have more like 300 cases or 7 cases per variable, whichever is greater. For example, with 25 variables the bare minimum is 100 cases (4 x 25) and the ideal is 300. If you want to replicate your results, you need twice this number, so that you can randomly divide the data into two separate data sets.

How many components/factors should you extract?

Once you’ve decided that an exploratory PCA suits your purpose, and your data suit the analysis, you face only one big question: how many components will you extract? There are many different methods for extraction. SPSS gives you seven extraction options, yet all but one relate to Factor Analysis, not PCA. In this example, that leaves us with what SPSS simply calls “Principal components” as our default option. Unfortunately, SPSS also defaults to the strongly criticized Kaiser rule (i.e. the retention of principal components with eigenvalues above 1) for deciding how many components to keep. SPSS also offers a Scree Plot as a way of determining the number of components to extract, but this technique too has drawn considerable criticism. In the case of both the Kaiser rule and the scree test, the nub of the criticism is that the results typically cannot be replicated. So what can you do? The answer is Parallel Analysis. It is similar to the simple scree test, but instead of looking for an ‘elbow’ in the plot of ranked eigenvalues, you compare the unrotated (initial) eigenvalues to eigenvalues from random data with the same number of cases and variables as yours. Any component whose eigenvalue is greater than could be expected from an equivalent random data set is retained. This technique has been shown to be superior in various simulation studies.

The problem is that SPSS is not pre-programmed to do Parallel Analysis, but fortunately you can use other computer programs. One easy-to-use program, developed by Marley Watkins, is MonteCarloPA.zip. This will download a zipped file that you can open on your computer. Unzip the file and click on the file MonteCarloPA.exe. Provide the following information: the number of variables you are analyzing, the number of subjects in your sample, and the number of replications (specify 100). Then click on calculate. Systematically compare each eigenvalue you obtained in SPSS with the corresponding value generated by the MonteCarloPA program, starting with the first: if your value is greater than the value from the parallel analysis, you retain the component; if it is smaller, you reject it.
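If you are curious about what these programs are doing, the following is a minimal sketch of parallel analysis written in SPSS MATRIX syntax. It is only an illustration, not one of the programs described above: the numbers of cases (300), variables (20) and replications (100) are placeholders for your own values, and it reports only the mean random-data eigenvalues, whereas the dedicated programs also give you the 95th percentile.

  set mxloops=10000 seed=random.
  matrix.
  * Placeholders: replace with your own numbers of cases,
  * variables and replications.
  compute ncases = 300.
  compute nvars = 20.
  compute ndatsets = 100.
  compute evals = make(nvars, ndatsets, 0).
  loop #i = 1 to ndatsets.
  * Random standard-normal data matrix (Box-Muller transformation).
  compute x = sqrt(-2 * ln(uniform(ncases, nvars))) &*
              cos(6.2831853 * uniform(ncases, nvars)).
  * Centre the columns, then form the correlation matrix.
  compute x = x - make(ncases, 1, 1) * (csum(x) / ncases).
  compute vcv = sscp(x) / (ncases - 1).
  compute d = inv(mdiag(sqrt(diag(vcv)))).
  compute evals(:, #i) = eval(d * vcv * d).
  end loop.
  * Average each ranked eigenvalue across the random data sets.
  compute meanvals = rsum(evals) / ndatsets.
  print meanvals /title = "Mean random-data eigenvalues".
  end matrix.

Each value printed by this sketch is the random-data benchmark for the component on the same line of your “Total Variance Explained” table.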

Another way is to use SPSS syntax generously provided by Dr Brian P. O'Connor. You can download the “raw scores” version of the syntax from his website. Open the downloaded syntax file and enter the name/location of the data file for the analyses after "FILE =". If you specify "FILE = *", the program will read the current, active SPSS data file; alternatively, you can enter the name/location of a previously saved SPSS system file instead of "*". You can use the "/ VAR =" subcommand, which follows the "/ missing=omit" subcommand, to select the variables for the analyses. The output of the Parallel Analysis lists all latent “roots” (principal components) in order and tells you their initial eigenvalues (“Raw data”). For each component, the output also gives the average (“Mean”) and the 95th percentile of the eigenvalues of 100 random data sets with the same number of cases and variables as yours. You are looking for “Raw data” eigenvalues greater than their corresponding (same line) 95th percentile values: these are the only components/factors that emerge above random chance. Once you have done the Parallel Analysis you will know how many components/factors to extract.
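As a purely hypothetical illustration (the layout of the downloaded file may differ between versions), the line you edit will typically be part of a GET statement near the top of the program, and after editing it might read something like this, where item1 to item20 stand in for your own variable names:

  GET raw / FILE = * / missing = omit / VAR = item1 to item20.

Here "raw" is simply the name the program gives to the matrix of raw scores it reads in.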

Note for those using the Student Version of SPSS

The Student Version of SPSS (as opposed to the Graduate Pack) won't let you work with syntax as suggested in this section. While it is not as accurate as running a parallel analysis on your own data, Dr Albert Cota has provided tables from which you can look up appropriate 'cut-off' values for parallel analysis.

Conducting the analysis

From the menus select ANALYZE > DATA REDUCTION > FACTOR

  • On the “Factor Analysis” window select the variables to be analysed and move them to the “Variables” box.
  • Select the “Descriptives” button. On the “Factor Analysis: Descriptives” window select at least “initial solution” and “KMO and Bartlett’s test of sphericity”. Click “Continue”.
  • Select the “Extraction” button. On the “Factor Analysis: Extraction” window the “Method” will default to “Principal components”. This is correct for conducting a PCA, but if you are doing FA you will need to change it to “Principal axis factoring”. This window also defaults to displaying the “Unrotated factor solution”. There is usually little need to ask for this, as most people don’t interpret it; the one exception is when you are dealing with a one-component/factor solution. The other important thing on this window is the “Extract” options: select “Number of factors” and make this the same number as the Parallel Analysis suggested earlier. Click “Continue”.
  • Select the “Rotation” button. On the “Factor Analysis: Rotation” window select “Promax” as the “Method”. Both “Promax” and “Direct Oblimin” are types of oblique rotation; the rest are forms of orthogonal rotation, with “Varimax” being the most common of these. Unless you have a clear theoretical reason for choosing an orthogonal rotation (i.e. forcing the components/factors to be uncorrelated), stick to an oblique rotation. The reason “Promax” is suggested here is that the intercorrelation between emergent components/factors is far less susceptible to manipulation of kappa (with Promax) than of delta (with Oblimin). The rest of the defaults are generally fine. Click “Continue”.
  • Select the “Scores” button if you want to save component/factor scores (otherwise skip this point). On the “Factor Analysis: Factor Scores” window select “Save as variables”, and change the “Method” to “Anderson-Rubin”, which is the most mathematically accurate way to calculate the scores within SPSS. Click “Continue”.
  • Select the “Options” button. On the “Factor Analysis: Options” window it will help your interpretation if you select both “Sorted by size” and “Suppress absolute values less than”, and change this latter value to “.3”. Click “Continue”.
  • On the “Factor Analysis” window, click “OK” (unless you want to save the syntax, in which case click “Paste”; a sketch of the pasted syntax is shown below).
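For reference, if you click “Paste” after following the steps above, the pasted syntax should look something like the sketch below. The variable names (item1 TO item20) and the number of components (3) are placeholders; FACTORS(3) is where the number suggested by your Parallel Analysis goes, PROMAX(4) shows the default kappa of 4, and the /SAVE line appears only if you asked for factor scores.

  FACTOR
    /VARIABLES item1 TO item20
    /MISSING LISTWISE
    /ANALYSIS item1 TO item20
    /PRINT INITIAL KMO ROTATION
    /FORMAT SORT BLANK(.3)
    /CRITERIA FACTORS(3) ITERATE(25)
    /EXTRACTION PC
    /ROTATION PROMAX(4)
    /SAVE AR(ALL)
    /METHOD=CORRELATION.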


Interpretation

  • If you have run a PCA, ignore the fact that SPSS prints “Factor Analysis” at the top of the results.
  • The KMO statistic assesses one of the assumptions of Principal Components and Factor Analysis, namely whether there appears to be some underlying (latent) structure in the data (technically referred to as the factorability of R). This is also referred to as Sampling Adequacy, or even as a lack of sphericity. The KMO should be .6 or greater; otherwise any results you get may be unreliable (mere mathematical illusions). You will also note Bartlett’s Test of Sphericity, which looks at the same issue in a different way. Although it is usually significant (p<.05, which is what you want), you should ignore it unless you have fewer than 5 cases per variable, because it is notoriously oversensitive and likely to be significant with any substantial data set.
  • SPSS tells you how much of the variance is explained by each principal component, and even cumulates these percentages for you (“Total Variance Explained” table).
  • The “Pattern Matrix” is the most interpretable of the matrices in the output. Remember that PCA and FA are not inferential statistics; they are sophisticated descriptive statistics, so there is no significance test. The adequacy of the solution is solely a function of the degree to which the “Pattern Matrix” makes sense (is interpretable). As a rule of thumb, the three highest-loading items on each component/factor offer the best clue as to what that component/factor represents.
  • Take note of any items that load on more than one component/factor (known as complex or split loadings).
  • Don’t forget to look at the “Component Correlation Matrix” near the bottom of the output. It will tell you how intercorrelated the emergent components/factors were.

