Multivariate Stats

1. According to the Frequencies table, the GAF variable has 9 missing values, Satisfaction has 12 and Agency has 1. Missing values could distort the statistics if they were treated as zeros (or blanks): a zero is counted in the sample size but adds nothing to the sum, so it pulls the mean down. However, when SPSS treats missing values as “system-missing”, those cases are excluded from the calculation and the statistics are computed on the valid cases only. This is equivalent to list-wise exclusion of cases, which is one solution to the problem of missing values. To preserve the sample size, however, the missing values may be replaced instead. Replacement by the series mean is acceptable when values are missing at random, i.e. when the distribution parameters of the variables are similar for the total sample and for the sub-sample with missing values. This is the case with this dataset for the GAF and Satisfaction variables. Such replacement leaves the mean unchanged and reduces the sample variance. The single case with a missing Agency value may be safely excluded.
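A minimal Python sketch (standard library only) of the three treatments discussed above. The scores are made up for illustration and are not taken from the GAF dataset; `None` stands in for a system-missing value.

```python
from statistics import mean, variance

# Hypothetical scores; None marks a missing value.
scores = [7.0, 5.0, None, 6.0, 8.0, None, 4.0]

# Wrong treatment: missing coded as 0 adds cases to n without adding to the sum.
as_zero = [s if s is not None else 0.0 for s in scores]

# System-missing treatment: the incomplete cases are simply dropped.
observed = [s for s in scores if s is not None]

# Series-mean replacement: fill each gap with the mean of the observed values.
series_mean = mean(observed)
filled = [s if s is not None else series_mean for s in scores]

print(f"zeros:    mean={mean(as_zero):.3f}")
print(f"excluded: mean={mean(observed):.3f} var={variance(observed):.3f}")
print(f"filled:   mean={mean(filled):.3f} var={variance(filled):.3f}")
```

With these made-up numbers, coding the gaps as zero drops the mean from 6.0 to about 4.29, while the filled series keeps the mean at 6.0 and its variance falls from 2.5 to about 1.67: the filled-in values add no squared deviation but increase n.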

2. The Descriptives table produced by the Explore routine shows a maximum value of 201 for the GAF variable. This is most likely a coding error, because a GAF score cannot exceed 100 points. The case therefore needs to be excluded. More generally, extreme values can be identified and inspected using box plots in the Explore routine. The outlying case number 82 is easily seen on the plot. Some other outlying cases are identified as well, but none of them was considered an error.
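The two checks described above, a range check against the scale limits and a box-plot style outlier rule, can be sketched in Python. This is an illustration under stated assumptions: the GAF values are hypothetical, the valid code range is taken as 0 to 100, and the fences use the common 1.5 * IQR rule with quartiles from `statistics.quantiles`, which may differ slightly from the hinges SPSS uses for its box plots.

```python
from statistics import quantiles

GAF_MIN, GAF_MAX = 0, 100  # assumed valid codes on the GAF scale


def out_of_range(values, lo=GAF_MIN, hi=GAF_MAX):
    """Return (index, value) pairs that fall outside the valid scale range."""
    return [(i, v) for i, v in enumerate(values) if not lo <= v <= hi]


def tukey_outliers(values, k=1.5):
    """Return (index, value) pairs lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lo or v > hi]


# Hypothetical GAF scores with one impossible value (201).
gaf = [55, 60, 48, 70, 201, 62, 58, 65, 52, 15]
print(out_of_range(gaf))
print(tukey_outliers(gaf))
```

Here only 201 fails the range check, while the fences flag both 201 and 15; as in the text, a flagged value is not automatically an error and still has to be judged on substantive grounds.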

3. The steps for data screening may be outlined as follows:

Screen descriptive statistics and distributions using Analyze-Descriptive Statistics-Frequencies. The Charts option (bar charts or histograms) may be used to assess the shape of each distribution and to spot possible outliers. Numerically, the minimum and maximum values, skewness and kurtosis may indicate potential problems with the data.
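For reference, skewness and kurtosis figures of this kind can be reproduced with the bias-adjusted sample formulas sketched below. These formulas are assumed here to match what SPSS reports; the example data are hypothetical.

```python
from statistics import mean, stdev


def skewness(xs):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(z^3), z = (x - mean) / s."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def excess_kurtosis(xs):
    """Adjusted sample excess kurtosis (0 for a normal distribution); needs n >= 4."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    z4 = sum(((x - m) / s) ** 4 for x in xs)
    a = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a * z4 - b


# Hypothetical sample with a long right tail (the value 14).
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]
print(round(skewness(data), 3), round(excess_kurtosis(data), 3))
```

For the symmetric sample `[1, 2, 3, 4, 5]` these functions give a skewness of (numerically) 0 and an excess kurtosis of about -1.2, i.e. a flatter shape than the normal curve.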

Explore the dataset using Analyze-Descriptive Statistics-Explore. Confidence intervals and boxplots may be used at this stage to identify outliers; in addition, the normality assumption may be formally tested and visually assessed. For the GAF dataset, Agency is selected as the factor, and GAF and Satisfaction as the dependents.
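What Explore does with a factor, splitting each dependent variable by factor level and describing each group separately, can be sketched as follows. The agency codes, GAF values and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical (agency, GAF) pairs; the agency codes are illustrative only.
records = [("A", 55), ("A", 62), ("A", 58), ("B", 70), ("B", 66), ("B", 74)]


def describe_by_factor(pairs):
    """Group a dependent variable by factor level and describe each group."""
    groups = defaultdict(list)
    for level, value in pairs:
        groups[level].append(value)
    return {
        level: {"n": len(v), "mean": mean(v), "median": median(v), "sd": stdev(v)}
        for level, v in sorted(groups.items())
    }


for level, stats in describe_by_factor(records).items():
    print(level, stats)
```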

Analyze the patterns of missing values, e.g. using Analyze-Descriptive Statistics-Frequencies. For each variable with missing values, select the cases in which that variable is missing and compare the distribution of the other variables in this subset with the full sample, using frequency tables, descriptive statistics, normality tests and/or other tools.
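The missing-pattern check can be sketched in Python as a comparison of one variable between the cases missing another variable and the whole sample. The data, column positions and function name are hypothetical, and only means are compared; with real data one would also look at spread and shape.

```python
from statistics import mean

# Hypothetical cases: (GAF, Satisfaction); None marks a missing value.
cases = [(55, 7), (60, None), (48, 5), (70, 8), (62, None), (58, 6), (65, 7)]


def compare_missing_subset(rows, missing_col, other_col):
    """Compare `other_col` in cases missing `missing_col` with all valid cases."""
    subset = [r[other_col] for r in rows
              if r[missing_col] is None and r[other_col] is not None]
    everyone = [r[other_col] for r in rows if r[other_col] is not None]
    return {"subset_n": len(subset), "subset_mean": mean(subset),
            "all_n": len(everyone), "all_mean": mean(everyone)}


# GAF (column 0) among the cases with missing Satisfaction (column 1).
print(compare_missing_subset(cases, missing_col=1, other_col=0))
```

Here the two cases with missing Satisfaction have a mean GAF of 61 against about 59.7 overall; whether such a gap matters is a judgment call (or a matter for a formal test) with real data.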

Eliminate the erroneous outliers found, e.g. with Data-Select Cases. Choosing “Delete unselected cases” is a suitable option if missing values are to be replaced by the series mean.

Test for homoscedasticity (homogeneity of variances) using Analyze-Compare Means-One-Way ANOVA. As before, Agency is selected as the factor, and missing values are excluded list-wise. For the Satisfaction variable, the homoscedasticity assumption is violated according to Levene’s test.
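To make Levene's test less of a black box, here is a sketch of the classic mean-based statistic: a one-way ANOVA on the absolute deviations of each case from its own group mean. Because the Python standard library has no F distribution, the sketch returns W and its degrees of freedom rather than a p-value; the function name and example groups are hypothetical.

```python
from statistics import mean


def levene_w(groups):
    """Mean-based Levene statistic W and its degrees of freedom (k - 1, N - k).

    Significance is judged by comparing W with an F(k - 1, N - k) critical value.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each case from its own group mean.
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(x - m) for x in g])
    z_means = [mean(zg) for zg in z]
    z_grand = mean([v for zg in z for v in zg])
    # ANOVA sums of squares computed on the absolute deviations.
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Hypothetical groups: same centre, very different spread.
print(levene_w([[9, 10, 11], [0, 10, 20]]))
```

Groups with identical spread give W = 0; the larger W is relative to the F(k - 1, N - k) critical value, the stronger the evidence against equal variances.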

Replace missing values if necessary using Transform-Replace Missing Values, which creates new variables containing the replaced values.

Transform the variables if necessary to achieve normality and homoscedasticity, e.g. using the Transform-Compute Variable option.
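One common transformation when the spread of a variable grows with its mean is the logarithm. The sketch below uses hypothetical groups whose standard deviation is proportional to their mean; `math.log10` plays the role of an SPSS `COMPUTE` with `LG10`. Which transformation (if any) suits Satisfaction is not stated here, and the log requires strictly positive values.

```python
import math
from statistics import variance

# Hypothetical groups: the second is the first scaled by 10,
# so its variance is 100 times larger.
group_a = [10, 20, 40]
group_b = [100, 200, 400]


def log10_all(xs):
    """Log-transform a variable; all values must be > 0."""
    return [math.log10(x) for x in xs]


ratio_before = variance(group_b) / variance(group_a)
ratio_after = variance(log10_all(group_b)) / variance(log10_all(group_a))
print(f"variance ratio before: {ratio_before:.1f}, after: {ratio_after:.3f}")
```

After the transform both groups have the same variance, because multiplying by 10 becomes adding 1 on the log scale.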

Normality and homoscedasticity assumptions may be double-checked after this, in case the percentage of missing values is …
Posted by: Min Brust
