Statistical Concepts – A First Course
About
Statistical Concepts—A First Course presents the first 10 chapters from An Introduction to Statistical Concepts, Fourth Edition. Designed for first and lower-level statistics courses, this book communicates a conceptual, intuitive understanding of statistics that does not assume extensive or recent training in mathematics and only requires a rudimentary knowledge of algebra.
Covering the most basic statistical concepts, this book is designed to help readers understand those concepts, the situations in which they can be applied, and how to apply them to data. Specifically, the text covers basic descriptive statistics, including ways of representing data graphically, statistical measures that describe a set of data, the normal distribution and standard scores, and an introduction to probability and sampling. The remainder of the text covers various inferential tests, including tests of means (e.g., t tests), proportions, variances, and correlations.
Providing accessible and comprehensive coverage of topics suitable for an undergraduate or graduate course in statistics, this book is an invaluable resource for students undertaking an introductory course in statistics in any number of social science and behavioral science disciplines.
Chapter Outline
Introduction
1.1 What is the value of statistics?
1.2 Brief introduction to the history of statistics
1.3 General statistical definitions
1.3.1 Statistical notation
1.4 Types of variables
1.5 Scales of measurement
1.5.1 Nominal measurement scale
1.5.2 Ordinal measurement scale
1.5.3 Interval measurement scale
1.5.4 Ratio measurement scale
1.6 Additional resources
Data Representation
2.1 Tabular display of distributions
2.1.1 Frequency distributions
2.1.2 Cumulative frequency distributions
2.1.3 Relative frequency distributions
2.1.4 Cumulative relative frequency distributions
2.2 Graphical display of distributions
2.2.1 Bar graph
2.2.2 Histogram
2.2.3 Frequency polygon
2.2.4 Cumulative frequency polygon
2.2.5 Shapes of frequency distributions
2.2.6 Stem-and-leaf display
2.3 Percentiles
2.3.1 Percentiles
2.3.2 Quartiles
2.3.3 Percentile ranks
2.3.4 Box-and-whisker plot
2.4 Recommendations based on measurement scale
2.5 Computing tables, graphs, and more using SPSS
2.5.1 Introduction to SPSS
2.5.2 Frequencies
2.5.3 Graphs
2.5.3.1 Histograms
2.5.3.2 Boxplots
2.5.3.3 Bar graphs
2.5.3.4 Frequency polygons
2.6 Computing tables, graphs, and more using R
2.6.1 Introduction to R
2.6.1.1 R Basics
2.6.1.2 Downloading R and RStudio
2.6.1.3 Packages
2.6.1.4 Working in R
2.6.2 Frequencies
2.6.3 Graphs
2.6.3.1 Histograms
2.6.3.2 Boxplots
2.6.3.3 Bar graphs
2.6.3.4 Frequency polygons
2.7 Research question template and example write-up
2.8 Additional resources
Univariate Population Parameters and Sample Statistics
3.1 Summation notation
3.2 Measures of central tendency
3.2.1 Mode
3.2.2 Median
3.2.3 Mean
3.2.4 Summary of measures of central tendency
3.3 Measures of dispersion
3.3.1 Range (exclusive and inclusive)
3.3.2 H spread
3.3.3 Deviational measures
3.3.3.1 Deviation scores
3.3.3.2 Population variance and standard deviation
3.3.3.3 Sample variance and standard deviation
3.3.4 Summary of measures of dispersion
3.3.5 Recommendations based on measurement scale
3.4 Computing sample statistics using SPSS
3.4.1 Explore
3.4.2 Descriptives
3.4.3 Frequencies
3.5 Computing sample statistics using R
3.5.1 Reading data into R
3.5.2 Generating sample statistics
3.6 Research question template and example write-up
3.7 Additional resources
The Normal Distribution and Standard Scores
4.1 The normal distribution and how it works
4.1.1 History
4.1.2 Characteristics
4.1.2.1 Standard curve
4.1.2.2 Family of curves
4.1.2.3 Unit normal distribution
4.1.2.4 Area
4.1.2.5 Transformation to unit normal distribution
4.1.2.6 Constant relationship with the standard deviation
4.1.2.7 Points of inflection and asymptotic curve
4.1.2.8 Examples
4.2 Standard scores and how they work
4.2.1 z scores
4.2.2 Other types of standard scores
4.3 Skewness and kurtosis statistics
4.3.1 Symmetry
4.3.2 Skewness
4.3.3 Kurtosis
4.4 Computing graphs and standard scores using SPSS
4.4.1 Explore
4.4.2 Descriptives
4.4.3 Frequencies
4.4.4 Graphs
4.4.5 Transform
4.5 Computing graphs and standard scores using R
4.5.1 Reading data into R
4.5.2 Generating skewness and kurtosis
4.5.3 Generating a histogram
4.5.4 Creating a standardized variable
4.6 Research question template and example write-up
4.7 Additional resources
Introduction to Probability and Sample Statistics
5.1 Brief introduction to probability
5.1.1 Importance of probability
5.1.2 Definition of probability
5.1.3 Intuition vs. probability
5.2 Sampling and estimation
5.2.1 Simple random sampling
5.2.1.1 Simple random sampling with replacement
5.2.1.2 Simple random sampling without replacement
5.2.1.3 Other types of sampling
5.2.2 Estimation of population parameters and sampling distributions
5.2.2.1 Sampling distribution of the mean
5.2.2.2 Variance error of the mean
5.2.2.3 Standard error of the mean
5.2.2.4 Confidence intervals
5.2.2.5 Central limit theorem
5.3 Additional resources
Introduction to Hypothesis Testing: Inferences About a Single Mean
6.1 Inferences about a single mean and how it works
6.1.1 Characteristics
6.1.1.1 Types of hypotheses
6.1.1.2 Types of decision errors
6.1.1.2.1 Example decision-making situation
6.1.1.2.2 Decision-making table
6.1.1.2.3 A little history
6.1.1.3 Level of significance (α)
6.1.1.4 Overview of steps in the decision-making process
6.1.1.5 Inferences about μ when σ is known
6.1.1.5.1 The z test
6.1.1.5.2 An example
6.1.1.5.3 Constructing confidence intervals around the mean
6.1.1.6 Inferences about μ when σ is unknown
6.1.1.6.1 A new test statistic t
6.1.1.6.2 The t distribution
6.1.1.6.3 The t test
6.1.1.6.4 An example
6.1.2 Sample size
6.1.3 Power
6.1.3.1 The full decision-making context
6.1.3.2 Power determinants
6.1.4 Effect size
6.1.4.1 Cohen’s delta
6.1.4.2 Confidence intervals for Cohen’s delta
6.1.5 Assumptions
6.1.5.1 Independence
6.1.5.2 Normality
6.2 Computing inferences about a single mean using SPSS
6.3 Computing inferences about a single mean using R
6.3.1 Reading data into R
6.3.2 Generating the one-sample t test
6.4 Data screening
6.4.1 Generating normality evidence
6.4.2 Interpreting normality evidence
6.5 Power using G*Power
6.5.1 A priori power
6.5.2 Post hoc power
6.6 Research question template and example write-up
6.7 Additional resources
Inferences About the Difference Between Two Means
7.1 Inferences about two independent means and how it works
7.1.1 Independent versus dependent samples
7.1.2 Hypotheses
7.1.3 Characteristics of tests of difference between two independent means
7.1.3.1 The independent t test
7.1.3.1.1 Confidence interval
7.1.3.1.2 Example of the independent t test
7.1.3.2 The Welch t' test
7.1.3.3 Recommendations
7.1.4 Sample size of the independent t test
7.1.5 Power of the independent t test
7.1.6 Effect size of the independent t test
7.1.6.1 Standardized mean difference
7.1.6.2 Strength of association
7.1.6.3 An example
7.1.6.4 Confidence intervals for Cohen’s delta
7.1.6.5 Recommendations for effect size of the independent t test
7.1.7 Assumptions of the independent t test
7.1.7.1 Normality
7.1.7.2 Independence
7.1.7.3 Homogeneity of variance
7.1.7.4 Conditions of the independent t test
7.2 Inferences about two dependent means and how they work
7.2.1 Characteristics of the dependent t test
7.2.1.1 Confidence interval for the dependent t test
7.2.1.2 Example of the dependent t test
7.2.1.3 Recommendations
7.2.2 Sample size of the dependent t test
7.2.3 Power of the dependent t test
7.2.4 Effect size of the dependent t test
7.2.4.1 Confidence intervals for Cohen’s delta
7.2.5 Assumptions of the dependent t test
7.2.5.1 Normality
7.2.5.2 Independence
7.2.5.3 Homogeneity of variance
7.2.5.4 Conditions of the dependent t test
7.3 Computing inferences about two independent means using SPSS
7.3.1 Interpreting the output for inferences about two independent means
7.4 Computing inferences about two dependent means using SPSS
7.4.1 Interpreting the output for inferences about two dependent means
7.5 Computing inferences about two independent means using R
7.5.1 Reading data into R
7.5.2 Generating the independent t and Welch t′ tests
7.6 Computing inferences about two dependent means using R
7.6.1 Reading data into R
7.6.2 Generating the dependent t test
7.7 Data screening
7.7.1 Data screening for the independent t test
7.7.1.1 Normality for the independent t test
7.7.1.1.1 Interpreting normality evidence
7.7.1.2 Homogeneity of variance for the independent t test
7.7.2 Data screening for the dependent t test
7.7.2.1 Normality for the dependent t test
7.7.2.1.1 Interpreting normality evidence for the dependent t test
7.7.2.2 Homogeneity of variance for the dependent t test
7.8 G*Power
7.8.1 Post hoc power for the independent t test using G*Power
7.8.2 Post hoc power for the dependent t test using G*Power
7.9 Research question template and example write-up
7.9.1 Research question template and example write-up for the independent t test
7.9.2 Research question template and example write-up for the dependent t test
7.10 Additional resources
Inferences About Proportions
8.1 What inferences about proportions involving the normal distribution are and how they work
8.1.1 Characteristics
8.1.1.1 Inferences about a single proportion
8.1.1.1.1 An example
8.1.1.2 Inferences about two independent proportions
8.1.1.2.1 An example
8.1.1.3 Inferences about two dependent proportions
8.1.1.3.1 An example
8.1.2 Power
8.1.3 Effect size
8.1.4 Assumptions
8.2 What inferences about proportions involving the chi-square distribution are and how they work
8.2.1 Characteristics
8.2.1.1 The chi-square goodness-of-fit test
8.2.1.2 The chi-square test of association
8.2.1.2.1 An example
8.2.2 Power
8.2.3 Effect size
8.2.3.1 Chi-square goodness-of-fit effect size
8.2.3.2 Chi-square test of association effect size
8.2.4 Assumptions
8.2.4.1 Chi-square goodness-of-fit assumptions
8.2.4.2 Chi-square test of association assumptions
8.3 Computing inferences about proportions involving the chi-square distribution using SPSS
8.3.1 The chi-square goodness-of-fit test
8.3.2 The chi-square test of association
8.4 Computing inferences about proportions involving the chi-square distribution using R
8.4.1 The chi-square goodness-of-fit test
8.4.1.1 Reading data into R
8.4.1.2 Generating the chi-square goodness-of-fit test
8.4.2 The chi-square test of association
8.4.2.1 Reading data into R
8.4.2.2 Generating the chi-square test of association
8.5 Data screening
8.6 Power using G*Power
8.6.1 Post hoc power for the chi-square test of association
8.7 Recommendations
8.8 Research question template and example write-up
8.8.1 Chi-square goodness-of-fit test
8.8.2 Chi-square test of association
8.9 Additional resources
Inferences About Variances
9.1 Inferences about variances and how they work
9.1.1 Characteristics of the F distribution
9.1.1.1 Inferences about a single variance
9.1.1.1.1 An example
9.1.1.2 Inferences about two dependent variances
9.1.1.2.1 An example
9.1.1.3 Inferences about two or more independent variances (homogeneity of variance tests)
9.1.1.3.1 Traditional tests
9.1.1.3.2 The Brown-Forsythe procedure
9.1.1.3.3 The O’Brien procedure
9.2 Assumptions
9.2.1 Assumptions for inferences about a single variance
9.2.2 Assumptions for inferences about two dependent variances
9.3 Sample size, power, and effect size
9.4 Computing inferences about variances using SPSS
9.5 Computing inferences about variances using R
9.5.1 Reading data into R for the test of inference about a single variance
9.5.2 Computing test of inference about a single variance
9.5.3 Computing test of inference about two dependent variances
9.5.4 Generating the test of inference about two dependent variances
9.6 Research question template and example write-up
9.7 Additional resources
Bivariate Measures of Association
10.1 What bivariate measures of association are and how they work
10.1.1 Characteristics
10.1.1.1 Scatterplot
10.1.1.2 Covariance
10.1.1.3 Pearson product-moment correlation coefficient
10.1.1.4 Inferences about the Pearson product-moment correlation coefficient
10.1.1.4.1 Inferences for a single sample
10.1.1.4.2 Inferences for two independent samples
10.1.1.5 Issues regarding correlations
10.1.1.5.1 Correlation and causality
10.1.1.5.2 Restriction of range
10.1.1.5.3 Confidence intervals
10.1.1.6 Other measures of association
10.1.1.6.1 Spearman’s rho
10.1.1.6.2 Kendall’s tau
10.1.1.6.3 Phi
10.1.1.6.4 Cramer’s phi
10.1.1.6.5 Other correlations
10.1.2 Power
10.1.3 Effect size
10.1.3.1 Effect size for Pearson’s correlation coefficient
10.1.3.2 Effect size for two independent samples
10.1.3.3 Effect size for other correlations
10.1.4 Assumptions
10.2 Computing bivariate measures of association using SPSS
10.2.1 Bivariate correlations
10.2.1.1 Interpreting the output
10.2.1.2 Generating confidence intervals for the effect size (Pearson correlation coefficient)
10.2.2 Using crosstabs to compute correlations
10.2.2.1 Interpreting the output
10.2.2.2 Generating confidence intervals for the effect size (phi and Cramer’s phi)
10.3 Computing bivariate measures of association using R
10.3.1 Reading data into R
10.3.2 Generating correlation coefficients
10.4 Data screening
10.4.1 Scatterplots to examine linearity using SPSS
10.4.2 Hypothesis tests to examine linearity using SPSS
10.4.2.1 Interpreting hypothesis tests to examine linearity
10.4.3 Scatterplots to examine linearity using R
10.4.3.1 Interpreting linearity evidence
10.5 Power using G*Power
10.5.1 Post hoc power for the Pearson bivariate correlation using G*Power
10.6 Research question template and example write-up
10.7 Additional resources
Secondary Data Source
The links below lead to sources of secondary data. Many are publicly available, though some require registration and others grant access to restricted-use data only through an application process.
Bureau of Justice Statistics. Data is collected on a number of topics such as corrections, courts, crime type, law enforcement, and victims.
Census Bureau. The U.S. Census Bureau’s American FactFinder provides access to data about the United States, Puerto Rico, and the island areas. Data from surveys and censuses include: American Community Survey, Commodity Flow Survey, Economic Census, Population Estimates Program, and more.
https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml
Centers for Disease Control and Prevention. Data and statistics are available on a wide variety of health related areas, such as alcohol use, birth defects, cancer, deaths and mortality, environmental health, healthy aging, life expectancy, physical activity, smoking and tobacco, and more. Survey data accessible includes the National Health Interview Survey and National Survey of Family Growth, among others.
https://www.cdc.gov/DataStatistics/
Central Intelligence Agency (CIA) World Fact Book. For over 260 countries in the world, information is available on history, government, people, economy, geography, transportation, military, and more.
https://www.cia.gov/library/publications/resources/the-world-factbook/
Child Care and Early Education Research Connections. Research Connections promotes high quality research in child care and early education and the use of that research in policy making.
https://www.researchconnections.org/childcare/search/studies
Common Core of Data (CCD). The CCD is the Department of Education’s primary database on public elementary and secondary education in the U.S. The CCD is a comprehensive, annual, national database of all public elementary and secondary schools and school districts.
Equity in Athletics. Provided by the Office of Postsecondary Education of the U.S. Department of Education, the data are drawn from the OPE Equity in Athletics Disclosure website database, which consists of athletics data submitted annually by all co-educational postsecondary institutions that receive Title IV funding (i.e., those that participate in federal student aid programs) and that have an intercollegiate athletics program as required by the Equity in Athletics Disclosure Act.
https://ope.ed.gov/athletics/#/
European Union statistics. A wide variety of statistics on countries in Europe is available from this site. Data topics include, for example, economic trends, trade, transportation, environment and energy, science, technology, digital society, and more.
https://ec.europa.eu/eurostat/
Geospatial and Statistical Data Center. Hosted by the University of Virginia, this site provides access to Census data, maps, and more.
http://www.worldcat.org/identities/lccn-no99-40549/
Inter-University Consortium for Political & Social Research (ICPSR). Through the University of Michigan’s ICPSR, access is provided to a large number (over 10,000) and wide variety of datasets (e.g., National Longitudinal Study of Adolescent to Adult Health, Add Health, 1994–2008; National Health and Nutrition Examination Survey, NHANES, 2007–2008; India Human Development Survey-II, 2011–2012). Most ICPSR data holdings are public use with no access restrictions.
https://www.icpsr.umich.edu/icpsrweb/ICPSR/
Integrated Postsecondary Education Data System (IPEDS). IPEDS provides information on U.S. colleges, universities, and technical and vocational institutions.
National Center for Education Statistics (NCES). NCES provides access to a wide variety of data related to education such as the Early Childhood Longitudinal Studies (ECLS) program, IPEDS, Schools and Staffing, and more.
National Institutes of Health Supported Data Repositories. “This table lists NIH-supported data repositories that make data accessible for reuse. Most accept submissions of appropriate data from NIH-funded investigators (and others), but some restrict data submission to only those researchers involved in a specific research network. Also included are resources that serve as a portal for information about biomedical data and information sharing systems.”
https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
National Science Foundation Scientists and Engineers Statistical Data System (SESTAT). These public use data files are related to the science and engineering workforce and STEM graduates including the National Survey of College Graduates, Recent College Graduates, and Survey of Doctorate Recipients.
https://sestat.nsf.gov/datadownload/
Office of Population Research (OPR). The OPR’s data archive includes access to the Mexican Migration Project, Fragile Families and Child Wellbeing Study, New Immigrant Survey, The Game of Contacts (a behavioral surveillance study of heavy drug users in Curitiba, Brazil), National Longitudinal Survey of Freshmen, and much more.
https://opr.princeton.edu/archive/
Organization for Economic Cooperation and Development. Country level data is available on multiple indicators such as agriculture, development, population, education, GDP, tax, income equality, debt, unemployment, and more.
Pew Research Center. The Pew Center collects data on topics related to U.S. politics, media and news, social and demographic trends, religion and public life, Internet and technology, science, Hispanic trends, and more.
https://www.people-press.org/datasets/
Program for International Student Assessment (PISA). “The Program for International Student Assessment (PISA) is an international assessment that measures 15-year-old students' reading, mathematics, and science literacy every three years. First conducted in 2000, the major domain of study rotates between reading, mathematics, and science in each cycle. PISA also includes measures of general or cross-curricular competencies, such as collaborative problem solving. By design, PISA emphasizes functional skills that students have acquired as they near the end of compulsory schooling. PISA is coordinated by the Organization for Economic Cooperation and Development (OECD), an intergovernmental organization of industrialized countries, and is conducted in the United States by NCES.”
https://nces.ed.gov/surveys/pisa/datafiles.asp
State Profiles. Search for statewide information on elementary and secondary education, postsecondary education, and selected demographics for all states in the U.S., based on data collected and maintained by the National Center for Education Statistics. Data are also available for the U.S. average, and the results can be graphed.
https://nces.ed.gov/pubs2000/stateprofiles/state_profiles/index.asp
Statistics Canada. This is the national statistical office for Canada. A variety of public use microdata is available such as the Canadian Community Health Survey, Employment Insurance Coverage Survey, Travel Survey of Residents of Canada, National Graduates Survey, Canadian Internet Use Survey, and more.
Study of Instructional Improvement. “The Study of Instructional Improvement (SII) was a large scale quasi-experiment that sought to understand the impact of three widely-disseminated comprehensive school reform (CSR) programs on instruction and student achievement in high-poverty elementary schools. Over a four-year period, researchers at the University of Michigan followed schools working with one of three CSR programs—Accelerated Schools Project, America's Choice, and Success for All. The study also followed a set of closely matched comparison schools. The purpose of the study was to track implementation of the CSR programs in elementary schools and to investigate the impact of participation in these programs on teachers, students, and schools.” This website provides readers with an online report that describes the SII research program and provides a narrative account highlighting selected findings. The website also allows readers to gain familiarity with and/or to download SII data (note that SPSS data files are made available).
Survey of Adult Skills (PIAAC). The Survey of Adult Skills is an international survey conducted in multiple countries as part of the Programme for the International Assessment of Adult Competencies (PIAAC). It measures the key cognitive and workplace skills needed for individuals to participate in society and for economies to prosper.
http://www.oecd.org/skills/piaac/publicdataandanalysis/
United Nations. Country-level data is available on population, education, labor market, international merchandise trade, energy, crime, nutrition and health, science and technology, finance, environment, tourism, and more.
World Bank. Through the World Bank, data is available on economy, health, education, and much more on countries throughout the world.
https://data.worldbank.org/
World Values Survey. “The World Values Survey (www.worldvaluessurvey.org) is a global network of social scientists studying changing values and their impact on social and political life, led by an international team of scholars, with the WVS association and secretariat headquartered in Stockholm, Sweden. The WVS seeks to help scientists and policy makers understand changes in the beliefs, values and motivations of people throughout the world. Thousands of political scientists, sociologists, social psychologists, anthropologists and economists have used these data to analyze such topics as economic development, democratization, religion, gender equality, social capital, and subjective well-being. These data have also been widely used by government officials, journalists and students, and groups at the World Bank have analyzed the linkages between cultural factors and economic development.”
http://www.worldvaluessurvey.org/wvs.jsp
OTHER COLLECTIONS
Data Repositories. A list of data repositories where datasets for articles published in Scientific Data may be hosted.
https://www.nature.com/sdata/policies/repositories
Economics-related. Many of the data published in articles by Dr. Joshua Angrist, MIT, can be accessed here.
https://economics.mit.edu/faculty/angrist/data1/data
Journal of Statistics Education data archive. Data archived for publications from the journal. Note that reading the article in which the data were published is important for understanding where the data come from.
http://jse.amstat.org/jse_data_archive.htm
Politically related. A collection of links to politically related datasets, compiled by Professor Dale Story at the University of Texas-Arlington.
http://www.uta.edu/faculty/story/DataSets.htm
Social science related. Maintained by the University of Amsterdam, the site provides links to a number of worldwide and country-specific entities for data and statistics related to social science topics.
http://www.sociosite.net/databases.php
OTHER RESOURCES ON FINDING AND ACCESSING SECONDARY DATA
If your institution has access to lynda.com, you may want to watch the video “Learning Public Data Sets.” This video shows how to find free, public sources of data on a variety of business, education, and health issues and how to download the data for your own analysis. Author Curt Frye introduces resources from the US government (from Census to trademark data), international agencies such as the World Bank and United Nations, search engines, web services, and even language resources like the Ngram Viewer for Google Books. He also shows how to import the data into an Excel spreadsheet for visualization and analysis. Topics addressed in the video include:
- Working with US census data
- Using data from the Securities and Exchange Commission
- Accessing data from other US agencies
- Finding international sources of data
- Gathering data from web-based search engines and data portals
- Visualizing and analyzing public data sets in Excel