Computing and understanding averages
Average
is the one value that best represents an entire group of scores. Averages are also called measures of central tendency
Mean –
is the most common type of average that is computed. It is the sum of all the values in a group divided by the number of values in that group

For example, if a collection is [2150, 1534, 3564], then the mean is (2150 + 1534 + 3564) / 3 = 2416
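The same arithmetic as a minimal Python sketch. The variable names are only illustrative.

```python
scores = [2150, 1534, 3564]

# Mean = sum of all values divided by the number of values.
mean = sum(scores) / len(scores)
print(mean)  # 2416.0
```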
Median – is also an average, but of a very different kind
is defined as the midpoint of a set of scores
To compute the median, follow these steps:
- List the values in order, from either highest to lowest or lowest to highest
- Find the middle-most score; that’s the median
- If the number of scores is even, the median is the mean of the two middle scores
Using the median reduces the influence of outliers. It is less affected by them because it depends on the position of the values, not their magnitude
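The steps above can be sketched in Python as follows. The `median` helper and both data sets are hypothetical, made up only to show that one extreme value moves the mean a lot and the median not at all.

```python
def median(values):
    ordered = sorted(values)      # step 1: list the values in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: a single middle-most score
        return ordered[mid]
    # Even count: mean of the two middle scores (the usual convention).
    return (ordered[mid - 1] + ordered[mid]) / 2

incomes = [30, 32, 35, 38, 40]          # hypothetical data
with_outlier = [30, 32, 35, 38, 400]    # same data with one extreme value

print(median(incomes), sum(incomes) / len(incomes))                 # 35 35.0
print(median(with_outlier), sum(with_outlier) / len(with_outlier))  # 35 107.0
```

The median stays at 35 in both cases, while the mean jumps from 35 to 107.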
Mode – is the most general and least precise measure of central tendency, but it plays a very important part in understanding the characteristics of a sample of scores
is the value that occurs most frequently
To compute the mode, follow these steps:
- List all the values in the distribution, but list each value only once (unique)
- Tally the number of times that each value occurs
- The value that occurs most often is the mode
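A minimal sketch of the tallying steps above, using `collections.Counter` from the standard library. The `modes` helper and the sample data are hypothetical. It returns a list because two or more values can tie for the highest count.

```python
from collections import Counter

def modes(values):
    counts = Counter(values)          # tally how often each unique value occurs
    highest = max(counts.values())    # the largest tally
    # Keep every value whose tally equals the largest one.
    return [value for value, count in counts.items() if count == highest]

colors = ["red", "blue", "red", "green", "blue", "red"]  # hypothetical data
print(modes(colors))  # ['red']
```

Because the mode only counts occurrences, it also works for categories such as colors, where a mean or median cannot be computed.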
No matter how fancy-schmancy your statistical techniques are, you will almost always start by simply describing what’s there—hence the importance of understanding the simple notion of central tendency.
Scale of measurement
1. Nominal Level
is defined by characteristics of an outcome that fit into one and only one class or category
Example: Gender (Male/Female), Ethnicity (Caucasian/African American..)
2. Ordinal Level
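is defined by outcomes that can be ordered or ranked, although the distances between ranks are not necessarily equal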
Example: Rank 1 2 3
3. Interval Level
is a test or an assessment tool based on some underlying continuum, so that we can say how much more a higher performance is than a lesser one. Differences between points on the scale are meaningful
Example: 30°C is hotter than 20°C, and 20°C is hotter than 10°C; the 10-degree difference is the same in both cases
4. Ratio Level
has a true zero point: a value of zero represents the complete absence of the characteristic being measured
Example: A height of 0 cm means no height. An income of $0 means no earnings
Understanding Variability
also called spread or dispersion, can be thought of as a measure of how different scores are from one another
Instead of comparing each score to every other score in a distribution, the one score that could be used as a comparison is...the average.
Variability becomes a measure of how much each score in a group of scores differs from the average, usually the mean
Together, these two (average and variability) can be used to describe the characteristics of a distribution and show how distributions differ from one another.
Three measures of variability are commonly used:
1. The Range
is the simplest measure of variability. It is the distance of the biggest score from the smallest score.

For example, with a collection of 98, 86, 77, 56, 48, the range is 98 – 48 = 50
It shows how much spread there is from the lowest to the highest point in a distribution
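The range example above as a short Python sketch. The variable names are illustrative.

```python
scores = [98, 86, 77, 56, 48]

# Range = largest score minus smallest score.
score_range = max(scores) - min(scores)
print(score_range)  # 50
```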
2. Standard Deviation
represents the average amount of variability in a set of scores. In practical terms, it’s the average distance of each score from the mean

What it tells us:
- Low Standard Deviation: Indicates that the data points tend to be very close to the mean. The data is relatively consistent and tightly clustered
- High Standard Deviation: Indicates that data points are spread out over a wider range from the mean. The data is more variable and less consistent
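A minimal sketch contrasting a low and a high standard deviation. The notes do not say whether to divide by n or n - 1; this sketch assumes the sample version (n - 1), which is also what `statistics.stdev` uses. The `sample_sd` helper and both data sets are hypothetical.

```python
import math
import statistics

def sample_sd(values):
    mean = sum(values) / len(values)
    # Squared distance of each score from the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Assumption: sample standard deviation, so divide by n - 1.
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))

tight = [49, 50, 50, 51, 50]    # hypothetical scores clustered around 50
spread = [10, 30, 50, 70, 90]   # hypothetical scores with the same mean, far more spread

print(round(sample_sd(tight), 2))   # 0.71
print(round(sample_sd(spread), 2))  # 31.62

# The hand-written helper agrees with the standard library.
assert math.isclose(sample_sd(spread), statistics.stdev(spread))
```

Both groups have a mean of 50, so the mean alone cannot tell them apart; the standard deviation can.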
3. The Variance
measures how spread out or dispersed a set of data points is around its mean (average). It quantifies the average squared deviation of each data point from the mean
It’s simply the standard deviation squared

What it tells us:
- Magnitude of spread: A larger variance indicates greater spread in the data.
- Sensitivity to outliers: because the deviations are squared, it’s more sensitive to outliers than the standard deviation
- Useful in calculations but less so for direct interpretation
Standard deviation is like the “user-friendly” measure of spread. It’s in the same units as your data, making it easier to understand what a “typical” deviation from the mean looks like.
Variance is more of a “behind-the-scenes” measure. It’s essential for many statistical calculations but less intuitive to interpret on its own because of the squared units.
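A short sketch of the relationship between the two measures, using the standard library. The scores are hypothetical, and `statistics.variance` and `statistics.stdev` are the sample versions (n - 1 in the denominator), which is an assumption about which version the notes intend.

```python
import math
import statistics

scores = [10, 30, 50, 70, 90]  # hypothetical scores, e.g. in points

variance = statistics.variance(scores)  # in squared units (points squared)
sd = statistics.stdev(scores)           # in the original units (points)

print(variance, sd)

# The variance is the standard deviation squared.
assert math.isclose(sd ** 2, variance)
```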
Measures of variability help us even more fully understand what a distribution of data points looks like. Along with a measure of central tendency, we can use these values to distinguish distributions from one another and effectively describe what a collection of scores looks like and what those individual scores represent
Creating Graphs
The Classiest of Intervals
A class interval is a range of numbers, and a frequency distribution shows how many scores fall within each range
Example
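A minimal sketch of building a frequency distribution from class intervals. The scores, the interval width of 5, and the `frequency_distribution` helper are all hypothetical, and the helper assumes integer scores.

```python
from collections import Counter

def frequency_distribution(scores, width):
    # Map each score to the lower bound of its class interval, then tally.
    counts = Counter((score // width) * width for score in scores)
    # Label each interval as (low, high); assumes integer scores.
    return {(low, low + width - 1): counts[low] for low in sorted(counts)}

scores = [47, 10, 31, 25, 20, 2, 11, 31, 25, 21, 44, 14, 15, 26, 21]  # hypothetical data

for (low, high), frequency in frequency_distribution(scores, 5).items():
    print(f"{low}-{high}: {frequency}")
```

Intervals that contain no scores (here 5-9 and 35-39) are simply left out of the result; a fuller version might list them with a frequency of 0.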

Histogram
is a visual representation of a frequency distribution in which the frequencies are represented by bars


Frequency Polygon
is a continuous line that represents the frequencies of scores within a class interval

Cumulating Frequencies


Bar Charts
are a way to compare the frequencies of different categories with one another

Line Charts
show a trend in the data at equal intervals

Pie Charts
show the proportion or percentage of categories

Computing correlation coefficients
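A correlation coefficient is a numerical index that reflects the relationship or association between two variables. Its value is between -1.00 and +1.00
Types of correlations
- Direct correlation or positive correlation: the two variables change in the same direction
- Indirect correlation or negative correlation: the two variables change in opposite directions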

Computing a simple correlation coefficient
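One common way to write the Pearson product-moment correlation is the raw-score formula below. Whether this is the exact form these notes originally used is an assumption. Here n is the number of paired scores, and X and Y are the individual scores on the two variables.

```latex
r_{xy} = \frac{n\sum XY - \sum X \sum Y}
              {\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]
                     \left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}
```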

Example
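A minimal Python sketch of the raw-score form of Pearson’s r, offered as one possible worked example. The `pearson_r` helper and the data (hours of study and grades) are hypothetical.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)                                   # number of paired scores
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of the products X * Y
    sum_x2 = sum(x * x for x in xs)               # sum of squared X values
    sum_y2 = sum(y * y for y in ys)               # sum of squared Y values
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

hours = [1, 2, 3, 4, 5]    # hypothetical hours of study
grades = [2, 4, 5, 4, 5]   # hypothetical grades

print(round(pearson_r(hours, grades), 2))  # 0.77
```

The helper divides by zero if either variable has no variability (all values identical), so a real implementation would need to handle that case.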




The Scatterplot
is a very simple way to visually represent a correlation




The correlation matrix: Bunches of correlations
shows the correlations among multiple variables at once

For example, the correlation between income level and education is 0.574. The correlation between income level and how sure people are that they will vote in the next election is -0.291 (meaning that the higher the level of income, the less confident people were that they would vote)
The larger the absolute value of the correlation (regardless of its sign), the stronger the relationship
Understanding the meaning of the correlation coefficient

Coefficient of determination
is the percentage of variance in one variable that is accounted for by the variance in the other variable
For example, if the correlation between GPA and number of hours of study is .70, then the coefficient of determination is .70^2 = .49. It means 49% of the variance in GPA can be explained by the variance in studying time.
However, if 49% of the variance can be explained, then 51% cannot. That unexplained portion is the coefficient of alienation (also called the coefficient of nondetermination)
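The arithmetic from the GPA example as a short Python sketch. The variable names are illustrative.

```python
r = 0.70  # correlation between GPA and hours of study, from the example

coefficient_of_determination = r ** 2                          # variance explained
coefficient_of_alienation = 1 - coefficient_of_determination   # variance not explained

print(f"{coefficient_of_determination:.2f}")  # 0.49
print(f"{coefficient_of_alienation:.2f}")     # 0.51
```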

The idea of showing how things are related to one another and what they have in common is a very powerful one, and the correlation coefficient is a very useful descriptive statistic. Keep in mind that correlations express a relationship that is associative but not necessarily causal, and you’ll be able to understand how this statistic gives us valuable information about relationships between variables and how variables change or remain the same in concert with others.
Reliability and Validity
Reliability
is simply whether a test, or whatever is used as a measurement tool, measures something consistently

Different types of Reliability

- Reliability coefficients should be positive, not negative
- Reliability coefficients should be as large as possible (between .00 and +1.00); a higher value means a stronger level of agreement between two sets of observations
Validity
is the property of an assessment tool that indicates that the tool does what it says it does
Different types of Validity
