SPSS Explained

A

ANOVA An acronym for the ANalysis Of VAriance. By analysing the variance in the data due to different sources (e.g. an independent variable or error) we can decide if our experimental manipulation is influencing the scores in the data.

Asymp. Sig. (asymptotic significance) An estimate, employed by computer statistical analysis programs, of the probability of a nonparametric test statistic. This is often used when the exact probability cannot be worked out quickly.

B

beta weight The average amount by which the dependent variable increases when the independent variable increases by one standard deviation (all other independent variables are held constant).
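In terms of the unstandardised regression coefficient b for a predictor X, and the standard deviations of the predictor and the dependent variable, the beta weight can be written as (a standard relationship, not specific to SPSS):

    \beta = b \times \frac{s_X}{s_Y}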
between subjects Also known as independent measures. In this design, the
samples we select for each condition of the independent variable are independent,
as a member of one sample is not a member of another sample.
bootstrapping A sample is used to estimate a population. New bootstrap samples
are randomly selected from the original sample with replacement (so an item
can be selected more than once). The bootstrap samples, often 1,000 or more,
are then used to estimate the population sampling distribution.
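Although SPSS offers bootstrapping through its dialog boxes, the resampling idea can be sketched in a few lines of Python (the sample values and the use of NumPy are illustrative assumptions, not the SPSS procedure itself):

    import numpy as np

    rng = np.random.default_rng(0)
    sample = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4])   # original sample

    # Draw 1,000 bootstrap samples with replacement and record the mean of each
    boot_means = [rng.choice(sample, size=len(sample), replace=True).mean()
                  for _ in range(1000)]

    # The spread of boot_means estimates the sampling distribution of the mean
    print(np.percentile(boot_means, [2.5, 97.5]))   # approximate 95% limits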

C

case A row in the Data Editor file; the data collected from a single participant.
Chart Editor The feature in SPSS that allows the editing of charts and graphs.
comparisons The results of a statistical test with more than two conditions will
often show a significant result but not where the difference lies. We need to
undertake a comparison of conditions to see which ones are causing the effect.
If we compare them two at a time this is known as pairwise comparison and if
we perform unplanned comparisons after discovering the significant finding these
are referred to as post hoc comparisons.
component The term used in the principal components method of factor analysis
for a potential underlying factor.
condition A researcher chooses levels or categories of the independent variable(s)
to observe the effect on the dependent variable(s). These are referred to as conditions, levels, treatments or groups. For example, ‘morning’ and ‘afternoon’
might be chosen as the conditions for the independent variable of time of day.
confidence interval In statistics we use samples to estimate population values, such as the mean or the difference in means. The confidence interval provides a range of values within which we predict the population value lies (to a certain level of confidence). The 95 per cent confidence interval of the mean worked out from a sample indicates that the population mean would fall between the upper and lower limits 95 per cent of the time.
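As a minimal sketch (assuming an illustrative sample and the SciPy library rather than SPSS output), a 95 per cent confidence interval for a mean can be computed as:

    import numpy as np
    from scipy import stats

    scores = np.array([12, 15, 11, 14, 13, 16, 12, 15])   # illustrative sample
    mean = scores.mean()
    sem = stats.sem(scores)                                # standard error of the mean

    # 95% confidence interval for the population mean, based on the t distribution
    lower, upper = stats.t.interval(0.95, df=len(scores) - 1, loc=mean, scale=sem)
    print(lower, upper)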
contrasts With a number of conditions in a study we may plan a set of comparisons, such as contrasting each condition with a control condition. These planned comparisons are referred to as contrasts. We can plan complex contrasts – for example, the effects of conditions 1 and 2 against condition 3.
correlation The degree to which the scores on two (or more) variables co-relate. That is, the extent to which a variation in the scores on one variable results in a corresponding variation in the scores on a second variable. Usually the relationship we are looking for is linear. A multiple correlation examines the relationship between a combination of predictor variables and a dependent variable.
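A minimal sketch of a linear (Pearson) correlation in Python, using SciPy and made-up scores:

    from scipy import stats

    hours_study = [2, 4, 6, 8, 10]      # illustrative predictor scores
    exam_score = [52, 58, 61, 70, 75]   # illustrative outcome scores

    r, p = stats.pearsonr(hours_study, exam_score)   # correlation coefficient and p value
    print(r, p)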
critical value  We reject the null hypothesis after a statistical test if the probability
of the calculated value of the test statistic (under the null hypothesis) is lower
than the significance level (e.g. .05). Computer programs print out the probability
of the calculated value (e.g. .023765) and we can examine this to see if it is
higher or lower than the significance level. Textbooks print tables
of the critical values of the test statistic, which are the values of the statistic
at a particular probability. For example, if the calculated value of a statistic (e.g. a t test) is 4.20 and the critical value is 2.31 (at the .05 level of significance),
then clearly the probability of the test statistic is less than .05.
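The relationship between critical values and probabilities can be sketched in Python with SciPy (assuming, purely for illustration, a two-tailed t test with 8 degrees of freedom, which gives a critical value close to 2.31):

    from scipy import stats

    df = 8                                      # degrees of freedom assumed for illustration
    critical = stats.t.ppf(1 - 0.05 / 2, df)    # two-tailed critical value at .05 (about 2.31)
    p = 2 * stats.t.sf(4.20, df)                # probability of a calculated value of 4.20
    print(critical, p)                          # p is well below .05, so the result is significant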
crosstabulation  Frequency data can be represented in a table with the rows as
the conditions of one variable and the columns as the conditions of a second
variable. This is a crosstabulation. We can include more variables by adding
‘layers’ to the crosstabulation in SPSS.
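A minimal sketch of a crosstabulation using the pandas library (the variables and frequencies are invented for illustration):

    import pandas as pd

    # One row per participant: two categorical variables
    data = pd.DataFrame({
        "gender": ["male", "female", "female", "male", "female", "male"],
        "drink": ["tea", "coffee", "tea", "coffee", "coffee", "tea"],
    })

    # Rows are the conditions of one variable, columns the conditions of the other
    print(pd.crosstab(data["gender"], data["drink"]))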

D

Data Editor  The feature in SPSS where data is entered. Saving the information
from the Data Editor will produce an SPSS .sav file. There are two windows
within the Data Editor: Data View and Variable View.
Data View  The Data View window within the Data Editor presents a spreadsheet
style format for entering all the data points.
degrees of freedom  When calculating a statistic we use information from the
data (such as the mean or total) in the calculation. The degrees of freedom is
the number of scores we need to know before we can work out the rest using
the information we already have. It is the number of scores that are free to vary
in the analysis.
dependent variable  The variable measured by the researcher and predicted to
be influenced by (that is, depend on) the independent variable.
descriptive statistics  Usually we wish to describe our data before conducting
further analysis or comparisons. Descriptive statistics such as the mean and
standard deviation enable us to summarise a dataset.
discriminant function A discriminant function is one derived from a set of
independent (or predictor) variables that can be used to discriminate between
the conditions of a dependent variable.
distribution The range of possible scores on a variable and their frequency of
occurrence. In statistical terms we refer to a distribution as a ‘probability density
function’. We use the mathematical formulae for known distributions to work
out the probability of finding a score as high as or as low as a particular score.

E

effect size The size of the difference between the means of two populations, in
terms of standard deviation units.
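One widely used measure of this kind is Cohen's d, which expresses the difference between two means in pooled standard deviation units:

    d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}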
eigenvalue In a factor analysis an eigenvalue provides a measure of the amount
of variance that can be explained by a proposed factor. If a factor has an
eigenvalue of 1, it can explain as much variance as one of the original
independent variables.
equality of variance See homogeneity of variance.

F

factor Another name for ‘variable’, used commonly in the analysis of variance to
refer to an independent variable. In factor analysis we analyse the variation in
the data to see if it can be explained by fewer factors (i.e. ‘new’ variables) than
the original number of independent variables.

G

general linear model The underlying mathematical model employed in
parametric statistics. When there are only two variables, X and Y, the relationship
between them is linear when they satisfy the formula Y = a + bX (where a and
b are constants). The general linear model is a general form of this equation
allowing as many X and Y variables as we wish in our analysis.
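With several predictor variables (and a single Y, for simplicity) the model can be written as follows, where e is the error term:

    Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e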
grouping variable In analysing data in SPSS we can employ an independent measures independent variable as a grouping variable. This separates our participants into groups (such as introverts versus extroverts). It is important when inputting data into a statistical analysis program that we include the grouping variable as a column, with each group defined (i.e. introvert as ‘1’ and extrovert as ‘2’). We can then analyse the scores on other variables in terms of these groups, such as comparing the introverts with the extroverts on, say, a monitoring task.
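The same layout can be sketched outside SPSS, for example with the pandas library (the codes and scores are invented for illustration):

    import pandas as pd

    # One column codes the group, another holds the scores to be compared
    data = pd.DataFrame({
        "personality": [1, 1, 1, 2, 2, 2],        # 1 = introvert, 2 = extrovert
        "monitoring": [14, 16, 15, 10, 11, 9],    # scores on the monitoring task
    })

    # Compare the two groups on the monitoring scores
    print(data.groupby("personality")["monitoring"].mean())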

H

homogeneity of variance Underlying parametric tests is the assumption that
the populations from which the samples are drawn have the same variance. We
can examine the variances of the samples in our data to see whether this
assumption is appropriate with our data or not.
homoscedasticity The scores in a scatterplot are evenly distributed along and
about a regression line. This is an assumption made in linear correlation. (This
is the correlation and regression equivalent of the homogeneity of variance
assumption.)
hypothesis A predicted relationship between variables. For example: ‘As sleep
loss increases so the number of errors on a specific monitoring task will increase.’

I

illustrative statistics Statistics that illustrate rather than analyse a set of data,
such as the total number of errors made on a reading task. Often we illustrate
a dataset by means of a graph or a table.
independent or independent measures A term used to indicate that there
are different subjects (participants) in each condition of an independent variable;
also known as ‘between subjects’.
independent variable A variable chosen by the researcher for testing, predicted
to influence the dependent variable.
inferential statistics Statistics that allow us to make inferences about the data –
for example, whether samples are drawn from different populations or whether
two variables correlate.
interaction When there are two or more factors in an analysis of variance, we
can examine the interactions between the factors. An interaction indicates
that the effect of one factor is not the same at each condition of another factor.
For example, if we find that more cold drinks are sold in summer and more hot
drinks sold in winter, we have an interaction of ‘drink temperature’ and ‘time
of year’.
intercept A linear regression finds the best fit linear relationship between two
variables. This is a straight line based on the formula Y = a + bX, where b is
the slope of the line and a is the intercept, or point where the line crosses the
Y-axis. (In the SPSS output for an ANOVA the term ‘intercept’ is used to refer
to the overall mean value and its difference from zero.)
item When we employ a test with a number of variables (such as questions in a
questionnaire) we refer to these variables as ‘items’, particularly in reliability
analysis where we are interested in the correlation between items in the test.

J

none

K

kurtosis The degree to which a distribution differs from the bell-shaped normal distribution in terms of its peakedness. A sharper peak with narrow ‘shoulders’ is called leptokurtic and a flatter peak with wider ‘shoulders’ is called platykurtic.

L

levels of data Not all data are produced by using numbers in the same way.
Sometimes we use numbers to name or allocate participants to categories
(i.e. labelling a person as a liberal, and allocating them the number 1, or a
conservative, and allocating them the number 2). In this case the data is termed
‘nominal’. Sometimes we employ numbers to rank order participants, in which
case the data is termed ‘ordinal’. Finally, when the data is produced on a
measuring scale with equal intervals the data is termed ‘interval’ (or ‘ratio’ if the
scale includes an absolute zero value). Parametric statistics require interval data
for their analyses.
Likert scale A measuring scale where participants are asked to indicate their level of agreement or disagreement with a particular statement on, typically, a 5- or 7-point scale (from strongly agree to strongly disagree).
linear correlation The extent to which variables correlate in a linear manner.
For two variables this is how close their scatterplot is to a straight line.
linear regression A regression that is assumed to follow a linear model. For two
variables this is a straight line of best fit, which minimises the ‘error’.
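A minimal sketch of fitting a line of best fit in Python with SciPy (the x and y values are illustrative):

    from scipy import stats

    x = [1, 2, 3, 4, 5]             # illustrative predictor scores
    y = [2.1, 4.3, 5.9, 8.2, 9.8]   # illustrative outcome scores

    result = stats.linregress(x, y)           # least squares line of best fit
    print(result.intercept, result.slope)     # a and b in Y = a + bX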

 

M

main effect The effect of a factor (independent variable) on the dependent variable
in an analysis of variance measured without regard to the other factors in the
analysis. In an ANOVA with more than one independent variable we can examine
the effects of each factor individually (termed the main effect) and the factors
in combination (the interactions).
MANOVA A Multivariate Analysis of Variance. An analysis of variance technique
where there can be more than one dependent variable in the analysis.
mean A measure of the ‘average’ score in a set of data. The mean is found by
adding up all the scores and dividing by the number of scores.
mean square A term used in the analysis of variance to refer to the variance in
the data due to a particular source of variation.
median If we order a set of data from lowest to highest, the median is the point
that divides the scores into two, with half the scores below and half above the
median.
mixed design A mixed design is one that includes both independent measures
factors and repeated measures factors. For example, a group of men and a group
of women are tested in the morning and the afternoon. In this test ‘gender’ is
an independent measures variable (also known as ‘between subjects’) and time
of day is a repeated measures factor (also known as ‘within subjects’), so we
have a mixed design.
mode The score that has occurred the highest number of times in a set of data.
multiple correlation The correlation of one variable with a combination of other
variables.
multivariate Literally, this means ‘many variables’ but is most commonly used to
refer to a test with more than one dependent variable (as in the MANOVA).

N

nonparametric test Statistical tests that do not use, or make assumptions about,
the characteristics (parameters) of populations.
normal distribution A bell-shaped frequency distribution that appears to
underlie many human variables. The normal distribution can be worked out
mathematically using the population mean and standard deviation.
null hypothesis A prediction that there is no relationship between the independent and dependent variables.

O

one-tailed test A prediction that two samples come from different populations,
specifying the direction of the difference – that is, which of the two populations
will have the larger mean value.
outlier An extreme value in a scatterplot, one that lies outside the main cluster of
scores. When calculating a linear correlation or regression, an outlier will have
a disproportionate influence on the statistical calculations.
Output Navigator An SPSS navigation and editing system in an outline view in
the left-hand column of the output window. This enables the user to hide or
show output or to move items within the output screen.

P

p value The probability of a test statistic (assuming the null hypothesis to be true).
If this value is very small (e.g. .02763), we reject the null hypothesis. We claim
a significant effect if the p value is smaller than a conventional significance level
(such as .05).
parameter A characteristic of a population, such as the population mean.
parametric tests Statistical tests that use the characteristics (parameters) of
populations or estimates of them (when assumptions are also made about the
populations under study).
partial correlation The correlation of two variables after having removed the
effects of a third variable from both.
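For two variables X and Y with the effect of a third variable Z removed, the partial correlation can be written in terms of the ordinary correlations:

    r_{XY.Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}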
participant A person taking part as a ‘subject’ in a study. The term ‘participant’
is preferred to ‘subject’ as it acknowledges the person’s agency – i.e. that they
have consented to take part in the study.
population A complete set of items or events. In statistics, this usually refers to
the complete set of subjects or scores we are interested in, from which we have
drawn a sample.
post hoc tests When we have more than two conditions of an independent
variable, a statistical test (such as an ANOVA) may show a significant result but
not the source of the effect. We can perform post hoc tests (literally, post hoc
means ‘after this’) to see which conditions are showing significant differences.
Post hoc tests should correct for the additional risk of Type I errors when
performing multiple tests on the same data.
power of a test The probability that, when there is a genuine effect to be found,
the test will find it (that is, correctly reject a false null hypothesis). As an
illustration, one test might be like a stopwatch that gives the same time for two
runners in a race but a more powerful test is like a sensitive electronic timer
that more accurately shows the times to differ by a fiftieth of a second.
probability The chance of a specific event occurring from a set of possible events,
expressed as a proportion. For example, if there were 4 women and 6 men in
a room, the probability of meeting a woman first on entering the room is 4/10
or .4 as there are 4 women out of 10 people in the room. A probability of 0
indicates an event will never occur and a probability of 1 that it will always
occur. In a room of only 10 men there is a probability of 0 (0/10) of meeting a
woman first and a probability of 1 (10/10) of meeting a man.

Q

none

R

range The difference between the lowest score and the highest score.
rank When a set of data is ordered from lowest to highest, the rank of a score is
its position in this order.
regression The prediction of scores on one variable by their scores on a second
variable. The larger the correlation between the variables, the more accurate the
prediction. We can undertake a multiple regression where the scores on one
variable are predicted from the scores on a number of predictor variables.
reliability A reliable test is one that will produce the same result when
repeated (in the same circumstances). We can investigate the reliability of the
items in a test (such as the questions in a questionnaire) by examining the
relationship between each item and the overall score on the test.
repeated measures A term used to indicate that the same subjects (participants)
are providing data for all the conditions of an independent variable; also known
as ‘within subjects’.
residual A residual is the difference between an actual score and a predicted score.
If scores are predicted by a model (such as the normal distribution curve) then
the residual will give a measure of how well the data fit the model.

S

Sig. (2-tailed) The exact probability of the test statistic for a two-tailed prediction. Sometimes an estimate (see Asymp. Sig. – asymptotic significance) is also included.
significance level The risk (probability) of erroneously claiming a relationship between an independent and a dependent variable when there is not one. Statistical tests are undertaken so that this probability is chosen to be small, usually set at .05, indicating that this error will occur no more than 5 times in 100.
simple main effects A significant interaction in a two factor analysis of variance
indicates that the effect of one variable is different at the various conditions of
the other variable. Calculating simple main effects tells us what these different
effects are. A simple main effect is the effect of one variable at a single condition
of the other variable.
skew The degree of asymmetry of a distribution. A symmetrical distribution, like the normal distribution, has a skew of zero. The skew is negative if the scores ‘pile’ to the right of the mean and positive if they pile to the left.
sphericity An assumption we make about the data in a repeated measures design. Not only must we assume homogeneity of variance but also homogeneity of covariance – that is, homogeneity of variance of the differences between pairs of conditions. Essentially, we must assume the effect of an independent variable to be consistent across both conditions and subjects in these designs for the analysis to be appropriate.
standard deviation A measure of the standard (‘average’) difference (deviation)
of a score from the mean in a set of scores. It is the square root of the variance.
(There is a different calculation for standard deviation when the set of scores
are a population as opposed to a sample.)
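The two calculations mentioned above can be sketched in Python with NumPy (the scores are illustrative); the difference is whether the divisor is N or N - 1:

    import numpy as np

    scores = np.array([4, 6, 7, 5, 8, 6])     # illustrative scores

    population_sd = np.std(scores)             # divides by N (population formula)
    sample_sd = np.std(scores, ddof=1)         # divides by N - 1 (sample estimate)
    print(population_sd, sample_sd)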
standard error of the estimate A measure of the ‘average’ distance (standard
error) of a score from the regression line.
standard error of the mean The standard deviation of the distribution of
sample means. It is a measure of the standard (‘average’) difference of a sample
mean from the mean of all sample means of samples of the same size from the
same population.
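When estimated from a single sample with standard deviation s and size n, it is usually calculated as:

    SE_{\bar{X}} = \frac{s}{\sqrt{n}}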
standard score The position of a score within a distribution of scores. It provides
a measure of how many standard deviation units a specific score falls above or
below the mean. It is also referred to as a z score.
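For a score X from a population with mean \mu and standard deviation \sigma:

    z = \frac{X - \mu}{\sigma}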
statistic Specifically, a characteristic of a sample, such as the sample mean. More
generally, statistic and statistics are used to describe techniques for summarising
and analysing numerical data.
statistics viewer The SPSS Statistics Viewer is the window that displays all of the output from the SPSS procedures. Often referred to (as in this book) as the Output Window.
subject The term used for the source of data in a sample. If people are the subjects
of the study it is viewed as more respectful to refer to them as participants,
which acknowledges their role as helpful contributors to the investigation.
sums of squares The sum of the squared deviations of scores from their mean
value.
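For a set of scores X_1, \dots, X_n with mean \bar{X}:

    SS = \sum_{i=1}^{n} (X_i - \bar{X})^2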

T

test statistic The calculated value of the statistical test that has been undertaken.
two-tailed test A prediction that two samples come from different populations,
but not stating which population has the higher mean value.
Type I error The error of rejecting the null hypothesis when it is true. The risk
of this occurring is set by the significance level.
Type II error The error of not rejecting the null hypothesis when it is false.

U

univariate A term used to refer to a statistical test where there is only one
dependent variable. ANOVA is a univariate analysis as there can be more than
one independent variable but only one dependent variable.

V

value labels Assigning value labels within the Variable View screen in SPSS
ensures that the output is labelled appropriately when grouping variables are
used – for example, 1 = males, 2 = females.
Variable View The screen within the SPSS Data Editor where the characteristics
of variables are assigned.
variance A measure of how much a set of scores vary from their mean value.
Variance is the square of the standard deviation.

W

within subjects Also known as repeated measures. We select the same subjects
(participants) for each condition of an independent variable for a within subjects
design.

X

none

Y

none

Z

z score See standard score.