Lesson 5: Inferential Statistics for Categorical Variables
What are the characteristics of Inferential Statistics for categorical variables?
The defining tool of inferential statistics for categorical variables is the Chi-square (χ²) test.
What is the function of the Chi-square test?
The Chi-square test is used to test:
– Independence: Testing whether two categorical variables are independent of each other.
– Goodness of fit: Testing whether the collected data fits a theoretical distribution.
– Homogeneity: Testing whether two (or more) independent samples come from the same population.
Conditions for using the 2×2 contingency table:
A 2×2 contingency table is used when:
– Each of the two variables has only two subgroups.
– The hypothesis testing problem reduces to comparing two proportions.
How many steps are there in the Chi-square test?
The Chi-square test consists of 7 steps:
1. Description: Describe the data and the problem to be tested.
2. Assumptions: Assume that the data is a random sample.
3. Hypothesis:
– H0: The two variables are independent.
– H1: The two variables are dependent.
4. Test: Choose the appropriate Chi-square test (e.g., Chi-square for independence, Chi-square for goodness of fit).
5. Significance level: Choose the significance level α (usually 0.05). The corresponding critical values for 1 degree of freedom (a 2×2 table) are:
– α = 0.05 –> χ² = 3.84
– α = 0.01 –> χ² = 6.63
– α = 0.001 –> χ² = 10.83
6. Calculation: Calculate the Chi-square statistic using the formula χ² = Σ (O − E)² / E, where O is the observed count and E is the expected count in each cell.
7. Conclusion:
– Reject H0 if the calculated χ² > the χ² from the table, or sig (p-value) < α (0.05).
– Do not reject H0 if the calculated χ² ≤ the χ² from the table, or sig (p-value) ≥ α (0.05).
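The seven steps above can be sketched in Python for a 2×2 table. This is only an illustration: the counts are made up, the helper name chi_square_2x2 is hypothetical, and the p-value relies on the fact that for 1 degree of freedom the χ² tail probability equals erfc(sqrt(χ²/2)).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (d.f. = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under H0 (independence): row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Tail probability of the chi-square distribution, valid for d.f. = 1 only
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = exposed / not exposed, columns = disease / no disease
observed = [[30, 70],
            [15, 85]]

alpha = 0.05       # step 5: significance level
critical = 3.84    # table value for alpha = 0.05, d.f. = 1

chi2, p_value = chi_square_2x2(observed)   # step 6: calculation
reject_h0 = chi2 > critical                # step 7: conclusion

print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Comparing chi2 with the table value and comparing p_value with alpha give the same decision; the p-value route avoids looking up a table.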
True or false about the Chi-square test:
– The Chi-square test is used only for categorical (classification) variables, or for discrete quantitative variables after they have been converted into categories: True.
– The purpose of continuity correction is to make the result more accurate by subtracting 0.5 from each absolute difference between the observed and expected counts: True.
– Continuity correction can only be used for 2×2 contingency tables: True.
– The Z test is a standard test for comparing 2 proportions: True.
– The t-test is a test for comparing variances: False. The t-test compares means; variances are compared with the F-test.
– Z test: easy to calculate the confidence interval: True.
– χ² test: easy to perform and can be extended to many proportions: True.
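A small sketch may help connect these statements. It assumes a pooled two-proportion Z test and the usual shortcut formula for a 2×2 table; the function names and the counts are hypothetical. It shows the continuity correction shrinking χ², the squared Z statistic matching the uncorrected χ², and the confidence interval that the Z approach gives directly.

```python
import math

def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    With yates=True the continuity correction is applied: subtracting n/2
    from |ad - bc| in this shortcut formula is equivalent to subtracting
    0.5 from every |observed - expected|.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def difference_ci(x1, n1, x2, n2, z_crit=1.96):
    """Approximate 95% confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical data: 30 of 100 in group 1 versus 15 of 100 in group 2
z = two_proportion_z(30, 100, 15, 100)
chi2_plain = chi2_2x2(30, 70, 15, 85)
chi2_yates = chi2_2x2(30, 70, 15, 85, yates=True)
ci_low, ci_high = difference_ci(30, 100, 15, 100)

print(f"z^2 = {z * z:.4f}, chi2 = {chi2_plain:.4f}, chi2 (Yates) = {chi2_yates:.4f}")
print(f"95% CI for p1 - p2: ({ci_low:.4f}, {ci_high:.4f})")
```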
True or false about the contingency table:
– A contingency table is a 2-way frequency distribution table: True.
– A contingency table shows the relationship between two quantitative variables: False. A contingency table shows the relationship between two classification (categorical) variables.
– The number of rows and columns corresponds to the number of levels of the two variables: True. Level is similar to subgroup.
– Not independent (related): the distribution of one variable is the same across all levels of the other variable: False. The distribution of one variable differs across the levels of the other variable.
– Independent (unrelated): the distribution of one variable is not the same across all levels of the other variable: False. The distribution of one variable is the same across all levels of the other variable.
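To make the idea of "the same distribution across all levels" concrete, here is a minimal sketch that builds a 2-way frequency table from two categorical variables and prints the row-wise distributions. The variable names and the ten observations are invented for illustration.

```python
from collections import Counter

def contingency_table(xs, ys):
    """Build a 2-way frequency table; rows are levels of xs, columns are levels of ys."""
    counts = Counter(zip(xs, ys))
    row_levels = sorted(set(xs))
    col_levels = sorted(set(ys))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

def row_distributions(table):
    """Distribution of the column variable within each level of the row variable."""
    return [[cell / sum(row) for cell in row] for row in table]

# Hypothetical data: two categorical variables recorded for ten people
smoker = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"]
cough = ["yes", "yes", "no", "no", "yes", "yes", "yes", "no", "no", "no"]

rows, cols, table = contingency_table(smoker, cough)
dists = row_distributions(table)

for level, dist in zip(rows, dists):
    print(f"smoker={level}: {dict(zip(cols, dist))}")

# Identical row distributions are what independence looks like in the sample
looks_independent = all(d == dists[0] for d in dists)
print("same distribution in every row:", looks_independent)
```

In real samples the rows are rarely exactly identical; the Chi-square test judges whether the differences are larger than chance would explain.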
True or false about the significance level alpha:
– The significance level α is the threshold value of the test: True. It is compared with the p-value to decide whether the result is statistically significant.
– The significance level α determines which values are treated as too extreme to occur by chance (values that are too large or too small, also known as noise factors): True.
– The significance level α is also referred to as the rejection region of the sample (strictly, it is the probability of falling in that region when H0 is true): True.
– The significance level α is chosen after the survey to compare with the p-value: False. The significance level α is chosen before the survey.
What is the symbol for degrees of freedom and how is it calculated?
– Symbol: d.f.
– Calculation: d.f. = (r − 1)(c − 1), where r is the number of rows and c is the number of columns. It is used to look up the critical χ² value, which is compared with the calculated χ² value.
Which command compares one proportion with a population proportion?
– Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square
Comparing one proportion with a population proportion (true or false):
– p > 0.05: do not reject H0 –> the 2 proportions are similar (no statistically significant difference): True.
– p < 0.05: reject H1 –> the 2 proportions are different: False. When p < 0.05, reject H0 –> the 2 proportions are different.
– p > 0.05: reject H1 –> the 2 proportions are different: False. When p > 0.05, do not reject H0 –> the 2 proportions are similar.
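The decision rule above can be tried out with a goodness-of-fit χ² on a single proportion. This is a sketch under stated assumptions: the function name, the sample (40 of 100) and the population proportion (0.30) are all hypothetical, and the p-value formula holds only for 1 degree of freedom.

```python
import math

def one_proportion_chi2(successes, n, p0):
    """Chi-square goodness-of-fit test of a sample proportion against p0 (d.f. = 1)."""
    observed = [successes, n - successes]
    expected = [n * p0, n * (1 - p0)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Tail probability of the chi-square distribution, valid for d.f. = 1 only
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical sample: 40 of 100 have the characteristic; population proportion is 0.30
alpha = 0.05
chi2, p_value = one_proportion_chi2(successes=40, n=100, p0=0.30)

if p_value < alpha:
    decision = "reject H0: the sample proportion differs from the population proportion"
else:
    decision = "do not reject H0: no significant difference from the population proportion"

print(f"chi2 = {chi2:.4f}, p = {p_value:.4f} -> {decision}")
```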
Which command compares two proportions?
– Analyze > Descriptive Statistics > Crosstabs > Statistics > Chi-square
– Check both the Chi-square box and the Risk box.
What does Risk output?
– OR and RR (the RR values are the two "For cohort" rows underneath the OR).
What is RR? Causal relationship in ____ research?
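– RR is the relative risk associated with exposure –> causal relationship in cohort research.
– RR > 1: the risk factor increases the likelihood of disease (statistically significant only if the 95% confidence interval does not contain 1).
– RR < 1: the risk factor reduces the likelihood of disease, i.e. it is protective (again, statistically significant only if the 95% confidence interval does not contain 1).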
What is OR? Causal relationship in ____ research?
– OR is the odds ratio –> causal relationship in case-control research.
True or false:
– The 95% confidence interval of OR can have negative values: False. Confidence intervals of other quantities (for example, a difference between two means) can be negative, but OR is a ratio of odds and is never negative.
– If the 95% confidence interval of OR contains the value 1, the result is not statistically significant: True.
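As a rough illustration of RR, OR and the confidence interval of OR, the sketch below uses hypothetical counts and the common log-based (Woolf) approximation for the interval. That method is an assumption here; it is not taken from these notes and may differ in detail from what a statistics package reports.

```python
import math

def risk_estimates(a, b, c, d, z_crit=1.96):
    """RR, OR and an approximate 95% CI for OR from a 2x2 table.

    a, b = exposed with / without disease
    c, d = unexposed with / without disease
    """
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    # The interval is built on the log scale and then exponentiated,
    # so both limits are always positive.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log_or)
    return rr, odds_ratio, (lower, upper)

# Hypothetical counts: 30/100 exposed and 15/100 unexposed develop the disease
rr, odds_ratio, (ci_low, ci_high) = risk_estimates(30, 70, 15, 85)

# The result is statistically significant when the interval does not contain 1
significant = not (ci_low <= 1 <= ci_high)

print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print("statistically significant:", significant)
```

Because the limits come from exponentiating, a negative lower limit cannot occur, which matches the first statement above.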
When is Fisher's exact test used to get the p-value?
– When more than 20% of the cells have an expected count less than 5.
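The rule can be checked mechanically. The sketch below computes expected counts, applies the "more than 20% of cells below 5" rule, and computes a two-sided Fisher exact p-value for a 2×2 table by summing the probabilities of all tables that are no more likely than the observed one. The function names and both example tables are hypothetical, and the two-sided definition used is one common convention.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def needs_fisher(table, threshold=5, max_share=0.20):
    """True when more than 20% of cells have an expected count below 5."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in cells if e < threshold)
    return small / len(cells) > max_share

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table (fixed margins)."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-7))

small_table = [[3, 1], [1, 3]]       # hypothetical small sample
large_table = [[30, 70], [15, 85]]   # hypothetical larger sample

use_fisher_small = needs_fisher(small_table)
use_fisher_large = needs_fisher(large_table)
p_fisher = fisher_exact_2x2(small_table)

print("small table needs Fisher:", use_fisher_small)
print("large table needs Fisher:", use_fisher_large)
print(f"Fisher exact p (small table) = {p_fisher:.4f}")
```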
Which command compares more than 2 proportions?
– Analyze > Descriptive Statistics > Crosstabs > Statistics > Chi-square
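For more than two proportions the same χ² calculation applies to a larger table, with d.f. = (r − 1)(c − 1). The sketch below uses three hypothetical groups; the p-value shortcut exp(−χ²/2) is exact only when d.f. = 2, which is the case for a 3×2 table.

```python
import math

def chi_square_table(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, rt in enumerate(row_totals):
        for j, ct in enumerate(col_totals):
            expected = rt * ct / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical counts for three groups: columns = with / without the outcome
observed = [[20, 80],
            [30, 70],
            [40, 60]]

chi2, df = chi_square_table(observed)

# For d.f. = 2 only, the chi-square tail probability is exp(-chi2 / 2)
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.4f}, d.f. = {df}, p = {p_value:.4f}")
print("at least one proportion differs" if p_value < 0.05 else "no significant difference")
```

A significant result only says that the proportions are not all equal; it does not say which groups differ.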