Statistical Concepts – A Second Course, 5th Edition
About
Statistical Concepts—A Second Course presents the last 10 chapters from An Introduction to Statistical Concepts, Fourth Edition. Designed for second and upper-level statistics courses, this book highlights how statistics work and how best to utilize them to aid students in the analysis of their own data and the interpretation of research results.
In this new edition, Hahs-Vaughn and Lomax discuss sensitivity, specificity, and false positive and false negative errors. Coverage of effect sizes has been expanded, and more organizational features that summarize key concepts have been included. A final chapter on mediation and moderation has been added for a more complete presentation of regression models.
This book acts as a clear and accessible instructional tool to help readers fully understand statistical concepts and how to apply them to data. It is an invaluable resource for students undertaking a course in statistics in any number of social science and behavioral science disciplines.
Chapter Outline
One-Factor Analysis of Variance - Fixed Effects Model
11.1 What one-factor analysis of variance is and how it works
11.1.1 Characteristics
11.1.1.1 The Layout of the data
11.1.1.2 ANOVA theory
11.1.1.2.1 General theory and logic
11.1.1.2.2 General linear model
11.1.1.2.3 Partitioning the sums of squares
11.1.1.2.4 ANOVA summary table
11.1.1.3 The ANOVA model
11.1.1.3.1 The model
11.1.1.3.2 Estimation of the parameters of the model
11.1.1.3.3 Confidence intervals
11.1.1.3.4 An example
11.1.1.3.5 Expected mean squares
11.1.1.4 The unequal n's or unbalanced design
11.1.1.5 Alternative ANOVA procedures
11.1.1.5.1 Kruskal-Wallis test
11.1.1.5.2 Welch, Brown-Forsythe, and James procedures
11.1.2 Power
11.1.3 Effect Size
11.1.3.1 Eta squared
11.1.3.2 Omega squared and epsilon squared
11.1.3.3 Cohen's f
11.1.3.4 Interpretation of effect size values
11.1.3.5 An effect size example
11.1.3.6 Confidence intervals for effect size
11.1.3.7 Items to Consider
11.1.4 Assumptions
11.1.4.1 Independence
11.1.4.2 Homogeneity of variance
11.1.4.3 Normality
11.2 Computing parametric and nonparametric models using SPSS
11.2.1 One-factor analysis of variance
11.2.2 Nonparametric procedures
11.2.2.1 Kruskal-Wallis
11.2.2.1.1 Interpreting the output for Kruskal-Wallis
11.2.2.2 Welch and Brown-Forsythe
11.2.2.2.1 Interpreting the output for the Welch and Brown-Forsythe
11.3 Computing parametric and nonparametric models using R
11.3.1 Reading data into R
11.3.2 Generating the one-way ANOVA model
11.3.3 Generating the Welch test
11.3.4 Generating the Kruskal-Wallis test
11.4 Data screening
11.4.1 Normality
11.4.1.1 Interpreting normality evidence
11.4.2 Independence
11.4.2.1 Interpreting independence evidence
11.4.3 Homogeneity of variance
11.5 Power using G*Power
11.5.1 Post hoc power for the one-way ANOVA using G*Power
11.5.2 A priori power for the one-way ANOVA using G*Power
11.6 Research question template and example write-up
11.7 Additional resources
Multiple Comparison Procedures
12.1 What multiple comparison procedures are and how they work
12.1.1 Characteristics
12.1.1.1 Contrasts
12.1.1.2 Planned versus post hoc comparisons
12.1.1.3 The Type I error rate
12.1.1.4 Orthogonal contrasts
12.1.2 Selected multiple comparison procedures
12.1.2.1 Planned analysis of trend
12.1.2.2 Planned orthogonal contrasts
12.1.2.3 Planned contrasts with reference group: Dunnett method
12.1.2.4 Other planned contrasts: Dunn (or Bonferroni) and Dunn-Sidak methods
12.1.2.5 Complex post hoc contrasts: Scheffé and Kaiser-Bowden methods
12.1.2.6 Simple post hoc contrasts: Tukey HSD, Tukey-Kramer, Fisher LSD and Fisher-Hayter tests
12.1.2.7 Simple post hoc contrasts for unequal variances: Games-Howell, Dunnett T3, and C tests
12.1.2.8 Follow up tests to Kruskal-Wallis
12.1.3 Selecting the proper multiple comparison procedure
12.2 Computing multiple comparison procedures using SPSS
12.3 Computing multiple comparison procedures using R
12.3.1 Reading data into R
12.3.2 Generating the one-way ANOVA
12.3.3 Generating Tukey’s Multiple Comparison Procedure
12.3.4 Generating trend analysis
12.3.5 Generating other MCPs
12.4 Research question template and example write-up
Factorial Analysis of Variance - Fixed-Effects Model
13.1 What two-factor ANOVA is and how it works
13.1.1 Characteristics
13.1.1.1 The layout of the data
13.1.1.2 The ANOVA model
13.1.1.3 Main effects and interaction effects
13.1.1.4 Partitioning the sums of squares
13.1.1.5 The ANOVA summary table
13.1.1.6 Multiple comparison procedures
13.1.1.7 Expected mean squares
13.1.1.8 An example
13.1.2 Power
13.1.3 Effect size
13.1.3.1 Proportion of total variance effect size
13.1.3.2 Proportion of partial variance effect size
13.1.3.3 Interpreting effect size
13.1.3.4 Additional effect size considerations
13.1.3.5 Effect size example
13.1.3.6 Confidence intervals for effect size
13.1.4 Assumptions
13.2 What three-factor and higher-order ANOVA models are and how they work
13.2.1 Characteristics
13.2.2 The ANOVA model
13.2.3 The ANOVA summary table
13.2.4 The triple interaction
13.3 What the factorial ANOVA with unequal n's is and how it works
13.4 Computing factorial ANOVA using SPSS
13.4.1 Testing a statistically significant interaction
13.5 Computing factorial ANOVA using R
13.5.1 Reading data into R
13.5.2 Generating the factorial ANOVA
13.5.3 Generating tests for homogeneity of variance
13.5.4 Generating post hoc tests
13.5.5 Computing effect size
13.6 Data screening
13.6.1 Normality
13.6.1.1 Interpreting normality evidence
13.6.2 Independence
13.6.2.1 Interpreting independence evidence
13.6.3 Homogeneity of variance
13.7 Power using G*Power
13.7.1 Post hoc power for factorial ANOVA using G*Power
13.7.1.1 Power for interactions
13.7.2 A priori power for factorial ANOVA using G*Power
13.8 Research question template and example write-up
13.9 Additional resources
Introduction to Analysis of Covariance: The One-Factor Fixed-Effects Model with a Single Covariate
14.1 What ANCOVA is and how it works
14.1.1 Characteristics
14.1.1.1 The layout of the data
14.1.1.2 The ANCOVA model
14.1.1.3 The ANCOVA summary table
14.1.1.4 Partitioning the sums of squares
14.1.1.5 Adjusted means and related procedures
14.1.1.6 An example
14.1.1.7 ANCOVA without randomization
14.1.1.8 More complex ANCOVA models
14.1.1.9 Nonparametric ANCOVA procedures
14.1.2 Sample size
14.1.3 Power
14.1.4 Effect size
14.1.5 Assumptions
14.1.5.1 Independence
14.1.5.2 Homogeneity of variance
14.1.5.3 Normality
14.1.5.4 Linearity
14.1.5.5 Fixed independent variable
14.1.5.6 Independence of the covariate and the independent variable
14.1.5.7 Covariate measured without error
14.1.5.8 Homogeneity of regression slopes
14.2 Computing ANCOVA using SPSS
14.3 Computing ANCOVA using R
14.3.1 Reading data into R
14.3.2 Generating the ANCOVA model
14.4 Data screening
14.4.1 Independence
14.4.1.1 Interpreting independence evidence
14.4.2 Homogeneity of variance
14.4.3 Normality
14.4.3.1 Interpreting normality evidence
14.4.4 Linearity
14.4.4.1 Overall linearity evidence
14.4.4.1.1 Interpreting overall linearity evidence
14.4.4.2 Linearity evidence by group
14.4.4.2.1 Interpreting evidence of linearity by group
14.4.5 Independence of covariate and independent variable
14.4.5.1 Interpreting evidence of independence of covariate and independent variable
14.4.6 Homogeneity of regression slopes
14.4.6.1 Interpreting evidence of homogeneity of regression slopes
14.5 Power using G*Power
14.5.1 Post hoc power for ANCOVA using G*Power
14.5.2 A priori power for ANCOVA using G*Power
14.6 Research question template and example write-up
14.7 Additional resources
Random- and Mixed-Effects Analysis of Variance Models
15.1 The one-factor random-effects model
15.1.1 Characteristics of the model
15.1.2 The ANOVA model
15.1.3 ANOVA summary table and expected mean squares
15.1.4 Assumptions and violation of assumptions
15.1.5 Multiple comparison procedures
15.2 The two-factor random-effects model
15.2.1 Characteristics of the model
15.2.2 The ANOVA model
15.2.3 ANOVA summary table and expected mean squares
15.2.4 Assumptions and violation of assumptions
15.2.5 Multiple comparison procedures
15.3 The two-factor mixed-effects model
15.3.1 Characteristics of the model
15.3.2 The ANOVA model
15.3.3 ANOVA summary table and expected mean squares
15.3.4 Assumptions and violation of assumptions
15.3.5 Multiple comparison procedures
15.4 The one-factor repeated measures design
15.4.1 Characteristics of the model
15.4.2 The layout of the data
15.4.3 The ANOVA model
15.4.4 Assumptions and violation of assumptions
15.4.5 ANOVA summary table and expected mean squares
15.4.6 Multiple comparison procedures
15.4.7 Alternative ANOVA procedures
15.4.8 An example
15.5 The two-factor split plot or mixed design
15.5.1 Characteristics of the model
15.5.2 The layout of the data
15.5.3 The ANOVA model
15.5.4 Assumptions and violation of assumptions
15.5.5 ANOVA summary table and expected mean squares
15.5.6 Multiple comparison procedures
15.5.7 An example
15.6 Computing ANOVA Models using SPSS
15.6.1 One-factor random-effects ANOVA
15.6.2 Two-factor random-effects ANOVA
15.6.3 Two-factor mixed-effects ANOVA
15.6.4 One-factor repeated measures ANOVA
15.6.5 Friedman’s Test: Nonparametric One-factor repeated measures ANOVA
15.6.6 Two-factor split-plot ANOVA
15.7 Computing ANOVA Models using R
15.7.1 The one-factor repeated measures design
15.7.2 Restructuring data for the one-factor repeated measures ANOVA model
15.7.3 Generating the one-factor repeated measures ANOVA model
15.7.4 Computing Friedman’s Test in R: Nonparametric one-factor repeated measures ANOVA
15.7.5 Computing the two-factor split-plot or mixed design in R
15.7.5.1 Reading data into R
15.7.5.2 Generating the two-factor split-plot ANOVA
15.8 Data screening for the two-factor split-plot ANOVA
15.8.1 Normality
15.8.1.1 Generating normality evidence
15.8.1.2 Interpreting normality evidence
15.8.2 Independence
15.8.2.1 Generating the scatterplot
15.8.2.2 Interpreting independence evidence
15.9 Power using G*Power
15.9.1 Post hoc power for two-factor split-plot ANOVA
15.9.2 A priori power for two-factor split-plot ANOVA
15.10 Research question template and example write-up
15.11 Additional resources
Hierarchical and Randomized Block Analysis of Variance Models
16.1 What hierarchical and randomized block ANOVA models are and how they work
16.1.1 Characteristics of the two-factor hierarchical model
16.1.1.1 The layout of the data for the two-factor hierarchical model
16.1.1.2 The two-factor hierarchical ANOVA model
16.1.1.3 ANOVA summary table and expected mean squares for the two-factor hierarchical model
16.1.1.4 Multiple comparison procedures for the two-factor hierarchical model
16.1.1.5 An example of the two-factor hierarchical model
16.1.2 Characteristics of the two-factor randomized block design for n = 1
16.1.2.1 The layout of the data for the two-factor randomized block design for n = 1
16.1.2.2 The two-factor randomized block design for n = 1 ANOVA model
16.1.2.3 ANOVA summary table and expected mean squares
16.1.2.4 Multiple comparison procedures
16.1.2.5 Methods of block formation
16.1.2.6 An example
16.1.3 Characteristics of the two-factor randomized block design for n > 1
16.1.4 Characteristics of the Friedman test
16.1.5 Comparison of various ANOVA models
16.1.6 Sample size
16.1.6.1 Hierarchical ANOVA model sample size
16.1.6.2 Randomized block ANOVA sample size
16.1.7 Power
16.1.8 Effect Size
16.1.8.1 Hierarchical ANOVA effect size
16.1.8.2 Two-factor randomized block effect size
16.1.9 Assumptions
16.1.9.1 Assumptions of hierarchical models
16.1.9.2 Assumptions of the two-factor randomized block ANOVA
16.2 Mathematical introduction snapshot
16.3 Computing hierarchical and randomized block ANOVA Models using SPSS
16.3.1 Computing the two-factor hierarchical ANOVA Using SPSS
16.3.2 Computing the two-factor fixed-effects randomized block ANOVA for n = 1 using SPSS
16.3.2.1 Interpreting the output
16.3.3 Computing the two-factor fixed-effects randomized block ANOVA for n > 1 using SPSS
16.3.4 Computing the Friedman Test Using SPSS
16.4 Computing hierarchical and randomized block analysis of variance models using R
16.4.1 Two-factor hierarchical ANOVA in R
16.4.1.1 Reading data into R
16.4.1.2 Generating the two-factor nested ANOVA model
16.4.1.3 Generating a post hoc test
16.4.2 Two-factor fixed-effects randomized block ANOVA in R
16.4.2.1 Reading data into R
16.4.2.2 Generating the two-factor fixed-effects randomized block ANOVA
16.5 Data screening
16.5.1 Examining Assumptions for the Two-Factor Hierarchical ANOVA
16.5.1.1 Normality
16.5.1.1.1 Interpreting normality evidence
16.5.1.2 Independence
16.5.1.3 Homogeneity of variance
16.5.2 Examining assumptions for the two-factor fixed-effects randomized block ANOVA for n = 1
16.5.2.1 Normality
16.5.2.1.1 Interpreting normality evidence
16.5.2.2 Independence
16.5.2.2.1 Generating the scatterplot
16.5.2.2.2 Interpreting independence evidence
16.5.2.3 Homogeneity of variance
16.6 Power using G*Power
16.7 Research question template and example write-up
16.8 Additional resources
Simple Linear Regression
17.1 What simple linear regression is and how it works
17.1.1 Characteristics
17.1.1.1 The Population Simple Linear Regression Model
17.1.1.2 The Sample Simple Linear Regression Model
17.1.1.2.1 Unstandardized regression model
17.1.1.2.2 Standardized regression model
17.1.1.2.3 Prediction errors
17.1.1.2.4 Least squares criterion
17.1.1.2.5 Proportion of predictable variation (coefficient of determination)
17.1.1.2.6 Significance tests and confidence intervals
17.1.2 Sample size
17.1.3 Power
17.1.4 Effect Size
17.1.4.1 Coefficient of determination
17.1.4.2 f²
17.1.4.3 Confidence intervals for effect size
17.1.5 Assumptions
17.1.5.1 Independence
17.1.5.2 Homoscedasticity
17.1.5.3 Normality
17.1.5.4 Linearity
17.1.5.5 Fixed X
17.1.5.6 Summary
17.2 Mathematical introduction snapshot
17.3 Computing simple linear regression using SPSS
17.4 Computing simple linear regression using R
17.4.1 Reading data into R
17.4.2 Generating the simple linear regression model
17.4.3 Generating correlation coefficients
17.4.4 Generating confidence intervals of coefficient estimates
17.5 Data screening
17.5.1 Independence
17.5.2 Homoscedasticity
17.5.3 Linearity
17.5.3.1 Hypothesis tests to examine linearity using SPSS
17.5.3.1.1 Interpreting hypothesis tests to examine linearity
17.5.4 Normality
17.5.4.1 Generating normality evidence
17.5.4.2 Interpreting normality evidence
17.5.5 Screening data for influential points
17.5.5.1 Casewise diagnostics
17.5.5.2 Cook's distance
17.5.5.3 Mahalanobis distances
17.5.5.4 DfBeta
17.6 Power using G*Power
17.6.1 Post hoc power
17.6.2 A priori power
17.7 Research question template and example write-up
17.8 Additional resources
Multiple Linear Regression
18.1 What multiple linear regression is and how it works
18.1.1 Characteristics
18.1.1.1 Partial correlation
18.1.1.2 Semipartial (part) correlation
18.1.1.3 Unstandardized regression model
18.1.1.4 Standardized regression model
18.1.1.5 Coefficient of multiple determination and multiple correlation
18.1.1.6 Significance tests
18.1.1.6.1 Test of significance of the overall regression model
18.1.1.6.2 Test of significance of bk
18.1.1.6.3 Other tests
18.1.1.7 Methods of entering predictors
18.1.1.7.1 Simultaneous regression
18.1.1.7.2 Backward elimination
18.1.1.7.3 Forward selection
18.1.1.7.4 Stepwise selection
18.1.1.7.5 All possible subsets regression
18.1.1.7.6 Hierarchical regression
18.1.1.7.7 Commentary on sequential regression procedures
18.1.1.8 Nonlinear relationships
18.1.1.9 Interactions
18.1.1.10 Categorical predictors
18.1.2 Sample size
18.1.3 Power
18.1.4 Effect size
18.1.4.1 Coefficient of multiple determination, R²
18.1.4.2 Multiple partial R²
18.1.4.3 f²
18.1.4.4 Partial f²
18.1.4.5 Additional effect size considerations
18.1.5 Assumptions
18.1.5.1 Independence
18.1.5.2 Homoscedasticity
18.1.5.3 Normality
18.1.5.4 Linearity
18.1.5.5 Fixed X
18.1.5.6 Noncollinearity
18.1.5.7 Summary of assumptions
18.2 Mathematical introduction snapshot
18.3 Computing multiple linear regression using SPSS
18.4 Computing multiple linear regression using R
18.4.1 Reading data into R
18.4.2 Generating the multiple regression model and saving values
18.4.3 Generating correlation coefficients
18.4.4 Generating confidence intervals of coefficient estimates
18.5 Data screening
18.5.1 Independence
18.5.2 Homoscedasticity
18.5.3 Linearity
18.5.4 Normality
18.5.4.1 Interpreting normality evidence
18.5.5 Screening data for influential points
18.5.5.1 Casewise diagnostics
18.5.5.2 Cook's distance
18.5.5.3 Mahalanobis distance
18.5.5.4 Centered leverage values
18.5.5.5 DfBeta
18.5.5.6 Diagnostic plots
18.5.6 Noncollinearity
18.6 Power using G*Power
18.6.1 Post Hoc power
18.6.2 A priori power
18.7 Research question template and example write-up
18.8 Additional resources
Logistic Regression
19.1 What logistic regression is and how it works
19.1.1 Characteristics
19.1.1.1 Logistic regression equation
19.1.1.2 Probability
19.1.1.3 Odds and logit (or log odds)
19.1.1.4 Estimation and model fit
19.1.1.5 Significance tests
19.1.1.5.1 Test of significance of the overall regression model
19.1.1.5.1.1 Change in log likelihood
19.1.1.5.1.2 Hosmer-Lemeshow goodness of fit test
19.1.1.5.1.3 Pseudo-variance explained
19.1.1.5.1.4 Predicted group membership
19.1.1.5.1.5 Cross-validation
19.1.1.6 Test of significance of the logistic regression coefficients
19.1.1.7 Methods of predictor entry
19.1.1.7.1 Simultaneous logistic regression
19.1.1.7.2 Stepwise logistic regression
19.1.1.7.3 Hierarchical regression
19.1.2 Sample size
19.1.3 Power
19.1.4 Effect size
19.1.5 Assumptions
19.1.5.1 Noncollinearity
19.1.5.2 Linearity
19.1.5.3 Independence of errors
19.1.5.4 Fixed X
19.1.5.5 Conditions
19.1.5.5.1 Nonzero cell counts
19.1.5.5.2 Nonseparation of data
19.1.5.5.3 Lack of influential points
19.2 Mathematical introduction snapshot
19.3 Computing logistic regression using SPSS
19.4 Computing logistic regression using R
19.4.1 Reading data into R
19.4.2 Generating the logistic regression model and saving values
19.4.3 Generating confidence intervals of coefficient estimates
19.4.4 Exponentiating coefficients
19.4.5 Producing odds ratios and their confidence intervals
19.5 Data screening
19.5.1 Noncollinearity
19.5.2 Linearity
19.5.3 Independence
19.5.4 Absence of outliers
19.5.4.1 Cook's distance
19.5.4.2 Leverage values
19.5.4.3 DfBeta
19.5.5 Assessing classification accuracy
19.5.5.1 ROC curves and AUC
19.6 Power using G*Power
19.6.1 Post hoc power
19.6.2 A priori power
19.7 Research question template and example write-up
19.8 Additional resources
Mediation and Moderation
20.1 What mediation is and how it works
20.1.1 Characteristics
20.1.1.1 Additional mediation models
20.1.2 Sample size
20.1.3 Power
20.1.4 Effect size
20.1.4.1 Partially standardized effect
20.1.4.2 Completely standardized effect
20.1.4.3 Other effect size indices for mediation models
20.1.5 Assumptions
20.2 What moderation is and how it works
20.2.1 Characteristics
20.2.1.1 Probing an interaction
20.2.1.2 Centering
20.2.2 Sample size
20.2.3 Power
20.2.4 Effect size
20.2.5 Assumptions
20.3 Computing mediation and moderation using SPSS
20.3.1 Installing the PROCESS macro
20.3.2 Computing mediation analysis using SPSS
20.3.2.1 Interpreting mediation output
20.3.3 Computing moderation analysis using SPSS
20.3.3.1 Interpreting moderation output
20.4 Computing mediation and moderation using R
20.4.1 Reading data into R
20.4.2 Generating a mediation model using R
20.4.3 Generating a moderation model using R
20.5 Additional resources
Secondary Data Sources
The links below point to sources of secondary data. Many are publicly available; some require registration, and some require an application to access restricted-use data.
Bureau of Justice Statistics. Data is collected on a number of topics such as corrections, courts, crime type, law enforcement, and victims.
Census Bureau. The U.S. Census Bureau’s American FactFinder provides access to data about the United States, Puerto Rico, and the island areas. Data from surveys and censuses include: American Community Survey, Commodity Flow Survey, Economic Census, Population Estimates Program, and more.
https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml
Centers for Disease Control and Prevention. Data and statistics are available on a wide variety of health related areas, such as alcohol use, birth defects, cancer, deaths and mortality, environmental health, healthy aging, life expectancy, physical activity, smoking and tobacco, and more. Survey data accessible includes the National Health Interview Survey and National Survey of Family Growth, among others.
https://www.cdc.gov/DataStatistics/
Central Intelligence Agency (CIA) World Fact Book. For over 260 countries in the world, information is available on history, government, people, economy, geography, transportation, military, and more.
https://www.cia.gov/library/publications/resources/the-world-factbook/
Child Care and Early Education Research Connections. Research Connections promotes high quality research in child care and early education and the use of that research in policy making.
https://www.researchconnections.org/childcare/search/studies
Common Core of Data (CCD). The CCD is the Department of Education’s primary database on public elementary and secondary education in the U.S. The CCD is a comprehensive, annual, national database of all public elementary and secondary schools and school districts.
Equity in Athletics. Provided by the Office of Postsecondary Education of the U.S. Department of Education, the data are drawn from the OPE Equity in Athletics Disclosure website database, which consists of athletics data submitted annually by all co-educational postsecondary institutions that receive Title IV funding (i.e., those that participate in federal student aid programs) and that have an intercollegiate athletics program as required by the Equity in Athletics Disclosure Act.
https://ope.ed.gov/athletics/#/
European Union statistics. A wide variety of statistics on countries in Europe is available from this site. Data topics include, for example, economic trends, trade, transportation, environment and energy, science, technology, digital society, and more.
https://ec.europa.eu/eurostat/
Geospatial and Statistical Data Center. Hosted by the University of Virginia, this site provides access to Census data, maps, and more.
http://www.worldcat.org/identities/lccn-no99-40549/
Inter-University Consortium for Political & Social Research (ICPSR). Through the University of Michigan’s ICPSR, access is provided to a large number (over 10,000) and wide variety of datasets (e.g., National Longitudinal Study of Adolescent to Adult Health, Add Health, 1994-2008; National Health and Nutrition Examination Survey, NHANES, 2007–2008; India Human Development Survey-II, 2011–2012). Most ICPSR data holdings are public use with no access restrictions.
https://www.icpsr.umich.edu/icpsrweb/ICPSR/
Integrated Postsecondary Education Data System (IPEDS). IPEDS provides information on U.S. colleges, universities, and technical and vocational institutions.
National Center for Education Statistics (NCES). NCES provides access to a wide variety of data related to education such as the Early Childhood Longitudinal Studies (ECLS) program, IPEDS, Schools and Staffing, and more.
National Institutes of Health Supported Data Repositories. “This table lists NIH-supported data repositories that make data accessible for reuse. Most accept submissions of appropriate data from NIH-funded investigators (and others), but some restrict data submission to only those researchers involved in a specific research network. Also included are resources that serve as a portal for information about biomedical data and information sharing systems.”
https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
National Science Foundation Scientists and Engineers Statistical Data System (SESTAT). These public use data files are related to the science and engineering workforce and STEM graduates including the National Survey of College Graduates, Recent College Graduates, and Survey of Doctorate Recipients.
https://sestat.nsf.gov/datadownload/
Office of Population Research (OPR). The OPR’s data archive includes access to the Mexican Migration Project, Fragile Families and Child Wellbeing Study, New Immigrant Survey, The Game of Contacts (a behavioral surveillance study of heavy drug users in Curitiba, Brazil), National Longitudinal Survey of Freshmen, and much more.
https://opr.princeton.edu/archive/
Organization for Economic Cooperation and Development. Country level data is available on multiple indicators such as agriculture, development, population, education, GDP, tax, income equality, debt, unemployment, and more.
Pew Research Center. The Pew Center collects data on topics related to U.S. politics, media and news, social and demographic trends, religion and public life, Internet and technology, science, Hispanic trends, and more.
https://www.people-press.org/datasets/
Program for International Student Assessment (PISA). “The Program for International Student Assessment (PISA) is an international assessment that measures 15-year-old students' reading, mathematics, and science literacy every three years. First conducted in 2000, the major domain of study rotates between reading, mathematics, and science in each cycle. PISA also includes measures of general or cross-curricular competencies, such as collaborative problem solving. By design, PISA emphasizes functional skills that students have acquired as they near the end of compulsory schooling. PISA is coordinated by the Organization for Economic Cooperation and Development (OECD), an intergovernmental organization of industrialized countries, and is conducted in the United States by NCES.”
https://nces.ed.gov/surveys/pisa/datafiles.asp
State Profiles. Search for statewide information in elementary and secondary education, postsecondary education, and selected demographics for all states in the U.S. based on data collected and maintained by the National Center for Education Statistics. Data is also available for the U.S. average, and the resource can graph the results.
https://nces.ed.gov/pubs2000/stateprofiles/state_profiles/index.asp
Statistics Canada. This is the national statistical office for Canada. A variety of public use microdata is available such as the Canadian Community Health Survey, Employment Insurance Coverage Survey, Travel Survey of Residents of Canada, National Graduates Survey, Canadian Internet Use Survey, and more.
Study of Instructional Improvement. “The Study of Instructional Improvement (SII) was a large scale quasi-experiment that sought to understand the impact of three widely-disseminated comprehensive school reform (CSR) programs on instruction and student achievement in high-poverty elementary schools. Over a four-year period, researchers at the University of Michigan followed schools working with one of three CSR programs—Accelerated Schools Project, America's Choice, and Success for All. The study also followed a set of closely matched comparison schools. The purpose of the study was to track implementation of the CSR programs in elementary schools and to investigate the impact of participation in these programs on teachers, students, and schools.” This website provides readers with an online report that describes the SII research program and provides a narrative account highlighting selected findings. The website also allows readers to gain familiarity with and/or to download SII data (note that SPSS data files are made available).
Survey of Adult Skills (PIAAC). The Survey of Adult Skills is an international survey conducted in multiple countries as part of the Programme for the International Assessment of Adult Competencies (PIAAC). It measures the key cognitive and workplace skills needed for individuals to participate in society and for economies to prosper.
http://www.oecd.org/skills/piaac/publicdataandanalysis/
United Nations. Country-level data is available on population, education, labor market, international merchandise trade, energy, crime, nutrition and health, science and technology, finance, environment, tourism, and more.
World Bank. Through the World Bank, data is available on economy, health, education, and much more on countries throughout the world.
https://data.worldbank.org/
World Values Survey. “The World Values Survey (www.worldvaluessurvey.org) is a global network of social scientists studying changing values and their impact on social and political life, led by an international team of scholars, with the WVS association and secretariat headquartered in Stockholm, Sweden. The WVS seeks to help scientists and policy makers understand changes in the beliefs, values and motivations of people throughout the world. Thousands of political scientists, sociologists, social psychologists, anthropologists and economists have used these data to analyze such topics as economic development, democratization, religion, gender equality, social capital, and subjective well-being. These data have also been widely used by government officials, journalists and students, and groups at the World Bank have analyzed the linkages between cultural factors and economic development.”
http://www.worldvaluessurvey.org/wvs.jsp
OTHER COLLECTIONS
Data Repositories. A list of data repositories where datasets for articles published in Scientific Data may be hosted.
https://www.nature.com/sdata/policies/repositories
Economics-related. Many of the data published in articles by Dr. Joshua Angrist, MIT, can be accessed here.
https://economics.mit.edu/faculty/angrist/data1/data
Journal of Statistics Education data archive. Data archived for publications from the journal. Please note that reading the article from which the data were published will be important to understand from where the data come.
http://jse.amstat.org/jse_data_archive.htm
Politically-related. A collection of links to politically related datasets, composed by Professor Dale Story at the University of Texas-Arlington.
http://www.uta.edu/faculty/story/DataSets.htm
Social science related. Maintained by the University of Amsterdam, the site provides links to a number of worldwide and country-specific entities for data and statistics related to social science topics.
http://www.sociosite.net/databases.php
OTHER RESOURCES ON FINDING AND ACCESSING SECONDARY DATA
If your institution has access to lynda.com, you may want to access the video on “Learning Public Data Sets.” This video shows how to find free, public sources of data on a variety of business, education, and health issues and download the data for your own analysis. Author Curt Frye introduces resources from the US government (from Census to trademark data), international agencies such as the World Bank and United Nations, search engines, web services, and even language resources like the Ngram Viewer for Google Books. He also shows how to import the data into an Excel spreadsheet for visualization and analysis. Topics addressed in the video include:
- Working with US census data
- Using data from the Securities and Exchange Commission
- Accessing data from other US agencies
- Finding international sources of data
- Gathering data from web-based search engines and data portals
- Visualizing and analyzing public data sets in Excel