2. ΣIGMA and Descriptive Statistics

Computing and understanding averages

Average

is the one value that best represents an entire group of scores. It is also called a measure of central tendency

Mean –

is the most common type of average that is computed. It is the sum of all the scores divided by the number of scores

For example, for the collection [2150, 1534, 3564], the mean is (2150 + 1534 + 3564) / 3 = 2416
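The arithmetic above can be checked with a short Python sketch. The function name `mean` is illustrative, not something from these notes; Python's standard `statistics.mean` would give the same value.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by how many scores there are."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)


print(mean([2150, 1534, 3564]))  # 2416.0
```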

Median – is also an average, but of a very different kind

is defined as the midpoint of a set of scores: the point at which half of the scores fall above and half fall below

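To compute the median, follow these steps:

List the values in order, from either highest to lowest or lowest to highest
Find the middle-most score; that’s the median (if there is an even number of scores, the median is the mean of the two middle scores)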

Using the median helps avoid distortion by outliers. The median is less affected by extreme scores because it depends on the position of the values, not their magnitude
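A minimal sketch of the two steps, written in Python for illustration (the function name `median` and the extreme score 100000 are hypothetical). It also compares the median with the mean once an outlier is added, to show why the median is less affected by extreme values.

```python
def median(scores):
    """Midpoint of the scores: half fall above it and half fall below it."""
    if not scores:
        raise ValueError("median requires at least one score")
    ordered = sorted(scores)      # step 1: list the values in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # odd count: a single middle-most score
        return ordered[middle]
    # even count: the mean of the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2


scores = [2150, 1534, 3564]
with_outlier = scores + [100000]                 # hypothetical extreme score
print(median(scores))                            # 2150
print(median(with_outlier))                      # 2857.0
print(sum(with_outlier) / len(with_outlier))     # mean jumps to 26812.0
```

The median moves only from 2150 to 2857.0, while the mean jumps from 2416 to 26812.0.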

Mode – is the most general and least precise measure of central tendency, but it plays a very important part in understanding the characteristics of a sample of scores

is the value that occurs most frequently

To compute the mode, follow these steps:

List all the values in the distribution, but list each value only once (unique)
Tally the number of times that each value occurs
The value that occurs most often is the mode
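The three steps map directly onto a tally. This Python sketch uses `collections.Counter` for the tally; the function name `modes` and the sample scores are hypothetical. It returns every value that ties for the highest count, since a distribution can have more than one mode.

```python
from collections import Counter


def modes(scores):
    """Return every value that occurs most often (there may be more than one)."""
    if not scores:
        raise ValueError("mode requires at least one score")
    tally = Counter(scores)              # steps 1-2: each unique value and its count
    highest = max(tally.values())        # step 3: the largest count
    return sorted(value for value, count in tally.items() if count == highest)


print(modes([4, 7, 7, 2, 9, 7, 4]))  # [7]
print(modes([1, 1, 2, 2, 3]))        # [1, 2] (two modes)
```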

No matter how fancy-schmancy your statistical techniques are, you will almost always start by simply describing what’s there—hence the importance of understanding the simple notion of central tendency.

Scales of measurement

1. Nominal Level

is defined by characteristics of an outcome that fit into one and only one class or category

Example: Gender (Male/Female), Ethnicity (Caucasian/African American/...)

2. Ordinal Level

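describes outcomes that can be put in order (ranked), although the distances between ranks are not necessarily equal

Example: Rank 1, 2, 3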

3. Interval Level

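applies when a test or an assessment tool is based on some underlying continuum, so that we can talk about how much more a higher performance is than a lesser one. Differences are meaningful: equal distances between values represent equal amounts, but there is no true zero point

Example: the difference between 30°C and 20°C is the same as the difference between 20°C and 10°C, but 0°C does not mean an absence of temperature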

4. Ratio Level

has a true zero point: a value of zero represents the complete absence of the characteristic being measured

Example: A height of 0 cm means no height. An income of $0 means no earnings

Understanding Variability

also called spread or dispersion, can be thought of as a measure of how different scores are from one another

Instead of comparing each score to every other score in a distribution, the one score that could be used as a comparison is... the average.

Variability becomes a measure of how much each score in a group of scores differs from the average, usually the mean

Together, these two (average and variability) can be used to describe the characteristics of a distribution and show how distributions differ from one another.

Three measures of variability are commonly used:

1. The Range

is the simplest measure of variability. It is the difference between the largest score and the smallest score.

For example, with a collection of 98, 86, 77, 56, 48, the range is 98 – 48 = 50

It shows how far apart the lowest and highest scores in a distribution are
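The same calculation as a Python sketch. The name `value_range` is hypothetical; it is chosen so as not to shadow Python's built-in `range`.

```python
def value_range(scores):
    """Range: the largest score minus the smallest score."""
    if not scores:
        raise ValueError("range requires at least one score")
    return max(scores) - min(scores)


print(value_range([98, 86, 77, 56, 48]))  # 50
```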

2. Standard Deviation

represents the average amount of variability in a set of scores. In practical terms, it’s the average distance of each score from the mean

It tells us:

Low Standard Deviation: Indicates that the data points tend to be very close to the mean. The data is relatively consistent and tightly clustered
High Standard Deviation: Indicates that data points are spread out over a wider range from the mean. The data is more variable and less consistent
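To make "low" versus "high" concrete, here is a Python sketch. It assumes the sample formula, which divides the summed squared deviations by n − 1 (a population standard deviation divides by n instead). The two data sets are hypothetical, and the result is cross-checked against the standard library's `statistics.stdev`.

```python
import math
import statistics


def standard_deviation(scores):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return math.sqrt(sum(squared_deviations) / (n - 1))


tight = [49, 50, 50, 51]    # hypothetical scores clustered near the mean of 50
spread = [20, 40, 60, 80]   # hypothetical scores far from the same mean
print(round(standard_deviation(tight), 3))   # 0.816  (low standard deviation)
print(round(standard_deviation(spread), 3))  # 25.82  (high standard deviation)
print(math.isclose(standard_deviation(spread), statistics.stdev(spread)))  # True
```

Both sets have a mean of 50, so only the standard deviation tells them apart.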

3. The Variance

measures how spread out or dispersed a set of data points is around its mean (average). It quantifies the average squared deviation of each data point from the mean

It’s simply the standard deviation squared

It tells us:

Magnitude of spread: A larger variance indicates greater spread in the data.
Sensitivity to outliers: because each deviation is squared, extreme scores have a disproportionately large effect on the variance
Useful in calculations but less useful for direct interpretation
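A Python sketch of the variance, assuming the same sample (n − 1) formula as the standard deviation. It shows that taking the square root of the variance gives back the standard deviation, and how much one extreme score inflates the variance. The scores, including the added 500, are hypothetical.

```python
import math


def variance(scores):
    """Sample variance: sum of squared deviations from the mean divided by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


scores = [20, 40, 60, 80]                  # hypothetical scores
var = variance(scores)
sd = math.sqrt(var)                        # standard deviation = square root of variance
print(round(var, 2))                       # 666.67 (squared units)
print(round(sd, 2))                        # 25.82 (same units as the scores)

# One extreme score: its squared deviation (360 ** 2 = 129600) dominates the sum.
print(variance(scores + [500]))            # 41000.0
```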

Standard deviation is like the “user-friendly” measure of spread. It’s in the same units as your data, making it easier to understand what a “typical” deviation from the mean looks like.

Variance is more of a “behind-the-scenes” measure. It’s essential for many statistical calculations but less intuitive to interpret on its own because of the squared units.

Measures of variability help us understand even more fully what a distribution of data points looks like. Along with a measure of central tendency, we can use these values to distinguish distributions from one another and to describe effectively what a collection of scores, and the individual scores within it, represent

Creating Graphs

The Classiest of Intervals

A class interval is a range of numbers; a frequency distribution shows how many scores fall into each class interval
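A Python sketch of building a frequency distribution from class intervals. It assumes whole-number scores, equal-width intervals with inclusive limits (for example 45-54), and a starting value at or below the lowest score; the function name and the test scores are hypothetical.

```python
def frequency_distribution(scores, start, width):
    """Map each class interval (low, high) to the number of scores inside it.

    Assumes whole-number scores and that `start` is at or below the lowest score.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    if start > min(scores):
        raise ValueError("start must not exceed the lowest score")
    counts = {}
    low = start
    while low <= max(scores):
        high = low + width - 1                      # inclusive upper limit
        counts[(low, high)] = sum(1 for s in scores if low <= s <= high)
        low += width
    return counts


scores = [47, 52, 55, 58, 61, 63, 63, 67, 71, 74, 78, 82]   # hypothetical test scores
for (low, high), freq in frequency_distribution(scores, start=45, width=10).items():
    print(f"{low}-{high}: {freq}")
# 45-54: 2
# 55-64: 5
# 65-74: 3
# 75-84: 2
```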


Histogram

a visual representation of the frequency distribution where the frequencies are represented by bars
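A plotting library such as matplotlib would normally draw the histogram. This standard-library-only Python sketch instead draws one horizontal text bar per class interval, with the bar length equal to the frequency. The function name and the frequency table are hypothetical.

```python
def histogram_lines(freqs, symbol="*"):
    """One text bar per class interval; the bar length equals the interval's frequency."""
    width = max(len(label) for label in freqs)
    return [f"{label:>{width}} | {symbol * count} ({count})" for label, count in freqs.items()]


# hypothetical frequency distribution: class interval -> frequency
freqs = {"45-54": 2, "55-64": 5, "65-74": 3, "75-84": 2}
for line in histogram_lines(freqs):
    print(line)
# 45-54 | ** (2)
# 55-64 | ***** (5)
# 65-74 | *** (3)
# 75-84 | ** (2)
```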

Frequency Polygon

is a continuous line that represents the frequencies of scores within a class interval

Cumulating Frequencies
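A cumulative frequency is the frequency of a class interval plus the frequencies of all the intervals below it, so the last interval's cumulative frequency equals the total number of scores. A minimal Python sketch using `itertools.accumulate` for the running total, with hypothetical intervals and frequencies; dividing by the total also gives a cumulative percentage.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Running total of frequencies, from the lowest class interval upward."""
    return list(accumulate(frequencies))


intervals = ["45-54", "55-64", "65-74", "75-84"]   # hypothetical class intervals
frequencies = [2, 5, 3, 2]                         # hypothetical frequencies
total = sum(frequencies)
for interval, f, cf in zip(intervals, frequencies, cumulative_frequencies(frequencies)):
    print(f"{interval}: frequency {f}, cumulative {cf} ({cf / total:.0%})")
```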

Bar Charts

is a way to compare the frequencies of different categories with one another

Line Charts

show a trend in the data at equal intervals

Pie Charts

show the proportion or percentage of categories

Computing correlation coefficients

is a numerical index that reflects the relationship or association between two variables. Its value ranges from -1.00 to +1.00

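Types of correlation

Direct correlation or positive correlation: both variables change in the same direction (as one increases, so does the other)
Indirect correlation or negative correlation: the variables change in opposite directions (as one increases, the other decreases)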

Computing a simple correlation coefficient
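The notes do not spell out the formula, so this Python sketch assumes the Pearson product-moment correlation in its deviation-score form: the sum of the products of each pair's deviations from their means, divided by the square root of the product of the two sums of squared deviations. Textbooks often present an equivalent computational formula built from raw sums. The function name and the study-time data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]            # deviations from the mean of x
    dy = [b - mean_y for b in y]            # deviations from the mean of y
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return numerator / denominator


hours = [2, 4, 5, 7, 9]          # hypothetical hours of study
scores = [65, 70, 72, 80, 88]    # hypothetical test scores
print(round(pearson_r(hours, scores), 3))  # 0.991, a strong positive (direct) correlation
```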


The Scatterplot

is a very simple way to visually represent a correlation

The correlation matrix: Bunches of correlations

shows the correlations among multiple variables at once

For example, the correlation between income level and education is 0.574. The correlation between income level and how sure people are that they will vote in the next election is -0.291 (meaning that the higher the level of income, the less confident people were that they would vote)

The larger the absolute value of the correlation (regardless of its sign), the stronger the relationship
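A Python sketch of a correlation matrix: every variable is correlated with every other one, and the strongest pair is picked by absolute value, ignoring the sign. The survey data behind the figures above are not available, so the variables and values here are hypothetical, and the printed correlations will not reproduce 0.574 or -0.291.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient (deviation-score form)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """Correlation for every ordered pair of variables in a {name: values} dict."""
    names = list(data)
    return {(a, b): pearson_r(data[a], data[b]) for a in names for b in names}


# hypothetical survey data: one list of values per variable
data = {
    "income": [30, 45, 50, 62, 80],
    "education": [12, 14, 16, 16, 18],
    "vote_certainty": [9, 8, 8, 6, 5],
}
matrix = correlation_matrix(data)
names = list(data)
print(" " * 15 + "".join(f"{n:>15}" for n in names))
for a in names:
    print(f"{a:>15}" + "".join(f"{matrix[(a, b)]:>15.3f}" for b in names))

# strongest relationship between two different variables, judged by absolute value
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
strongest = max(pairs, key=lambda p: abs(matrix[p]))
print("strongest:", strongest, round(matrix[strongest], 3))
```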

Understanding the meaning of the correlation coefficient

Coefficient of determination

is the percentage of variance in one variable that is accounted for by the variance in the other variable

For example, if the correlation between GPA and number of hours of study is .70, then the coefficient of determination is .70^2 = .49. It means 49% of the variance in GPA can be explained by the variance in studying time.

Conversely, if 49% of the variance can be explained, 51% cannot. This unexplained portion is the coefficient of alienation (also called the coefficient of nondetermination)
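The GPA example as a Python sketch: square the correlation to get the explained share of variance, and subtract that from 1 to get the unexplained share, following the definitions in these notes. The function name `determination` is hypothetical. Because r is squared, a negative correlation of the same size explains the same share.

```python
def determination(r):
    """Return (coefficient of determination, coefficient of alienation) for a correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    r_squared = r ** 2
    return r_squared, 1 - r_squared


explained, unexplained = determination(0.70)
print(f"explained: {explained:.0%}, unexplained: {unexplained:.0%}")  # explained: 49%, unexplained: 51%
```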

The idea of showing how things are related to one another and what they have in common is a very powerful one, and the correlation coefficient is a very useful descriptive statistic. Keep in mind that correlations express a relationship that is associative but not necessarily causal, and you’ll be able to understand how this statistic gives us valuable information about relationships between variables and how variables change or remain the same in concert with others.

Reliability and Validity

Reliability

is simply whether a test, or whatever you use as a measurement tool, measures something consistently

Different types of Reliability

Reliability coefficients should be positive, not negative
Reliability coefficients should be as large as possible (between .00 and +1.00); the higher the coefficient, the stronger the level of agreement between two sets of observations
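As one illustration, a reliability coefficient can be computed as the correlation between two sets of observations of the same people. The sketch below assumes test-retest reliability (the same test given twice), which is only one of several types. The function name, the scores, and the two-week gap are hypothetical.

```python
import math


def reliability_coefficient(first, second):
    """Pearson correlation between two sets of observations of the same people."""
    m1, m2 = sum(first) / len(first), sum(second) / len(second)
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    return cov / math.sqrt(ss1 * ss2)


time1 = [78, 85, 62, 90, 71]   # hypothetical scores, first administration
time2 = [80, 83, 65, 92, 70]   # hypothetical scores, same people two weeks later
r = reliability_coefficient(time1, time2)
print(round(r, 2))             # positive and close to +1.00: strong agreement
```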

Validity

is the property of an assessment tool that indicates that the tool does what it says it does

Different types of Validity
