Lesson 5: Inferential Statistics for Categorical Variables


What are the characteristics of Inferential Statistics for categorical variables?

Inferential statistics for categorical variables rely mainly on the Chi-square test.

What is the function of the Chi-square test?

The Chi-square test is used to test:

– Independence: Testing whether two categorical variables are independent of each other.

– Goodness of fit: Testing whether the collected data fits a theoretical distribution.

– Homogeneity: Testing whether two independent samples come from the same population.

Conditions for using the contingency table:

A contingency table is used when:

– Each of the two variables has only two subgroups (a 2×2 table).

– The hypothesis testing problem becomes comparing two proportions.

How many steps are there in the Chi-square test?

The Chi-square test consists of 7 steps:

1. Description: Describe the data and the problem to be tested.

2. Assumptions: Assume that the data is a random sample.

3. Hypothesis:

– H0: The two variables are independent.

– H1: The two variables are dependent.

4. Test: Choose the appropriate Chi-square test (e.g., Chi-square for independence, Chi-square for goodness of fit).

5. Significance level: Choose the significance level α (usually 0.05). The corresponding critical values of χ² (for 1 degree of freedom) are:

– α = 0.05 → χ² = 3.84

– α = 0.01 → χ² = 6.63

– α = 0.001 → χ² = 10.83

6. Calculation: Calculate the Chi-square statistic using the formula.

7. Conclusion:

– Reject H0 if the calculated χ² > the critical χ² from the table, or if sig (p-value) < α (0.05).

– Do not reject H0 if the calculated χ² ≤ the critical χ² from the table, or if sig (p-value) ≥ α (0.05).
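
The seven steps can be sketched in code for a 2×2 table. This is only a minimal sketch in plain Python: the counts are made up for illustration, and the function name chi_square_statistic is an assumption, not something from the lesson. The critical value 3.84 is the one listed for α = 0.05 with 1 degree of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the two variables were independent (H0).
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed_count - expected) ** 2 / expected
    return statistic


# Steps 1-3: hypothetical 2x2 data (rows: exposed / not exposed,
# columns: disease / no disease). H0: the two variables are independent.
observed = [[30, 20], [20, 30]]

# Step 5: alpha = 0.05 with 1 degree of freedom gives the critical value 3.84.
critical_value = 3.84

# Step 6: calculate the statistic.
statistic = chi_square_statistic(observed)

# Step 7: conclusion.
reject_h0 = statistic > critical_value
print(f"chi-square = {statistic:.2f}, reject H0: {reject_h0}")
```

With these made-up counts every expected count is 25, so the statistic is 4.00, which exceeds 3.84, and H0 is rejected.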

True or false about the Chi-square test:

– The Chi-square test is used only for categorical (classification) variables or discrete quantitative variables (a quantitative variable must first be converted into categories): True.

– The purpose of continuity correction is to make the result more accurate by adjusting each observed-minus-expected difference by 0.5: True.

– Continuity correction can only be used for contingency tables: True.

– The Z test is a standard test for comparing 2 proportions: True.

– The t-test is a test for comparing variances: False. The t-test compares means; variances are compared with the F-test.

– Z test: easy to calculate the confidence interval: True.

– χ² (Chi-square) test: easy to perform and can be extended to many proportions: True.
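
To make the continuity correction and the Z test concrete, here is a small sketch in plain Python. The counts and the function names are illustrative assumptions. The correction shown is the usual Yates form for a 2×2 table (each |observed − expected| is reduced by 0.5), and the confidence interval uses the normal approximation with 1.96 for 95%.

```python
import math


def chi_square_2x2(a, b, c, d, correction=False):
    """Chi-square for the 2x2 table [[a, b], [c, d]], optionally with Yates' correction."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if correction:
                # Continuity correction: shrink the difference by 0.5 (not below 0).
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    return statistic


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, plus an approximate 95% CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = 1.96 * se_unpooled
    return z, (p1 - p2 - margin, p1 - p2 + margin)


# Hypothetical data: 30 of 50 in group 1 and 20 of 50 in group 2 have the outcome.
uncorrected = chi_square_2x2(30, 20, 20, 30)
corrected = chi_square_2x2(30, 20, 20, 30, correction=True)
z, ci = two_proportion_z(30, 50, 20, 50)
print(f"chi-square = {uncorrected:.2f}, with correction = {corrected:.2f}")
print(f"z = {z:.2f}, z squared = {z * z:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this example the uncorrected statistic (4.00) is above 3.84 while the corrected one (3.24) is below it, so the correction can change the conclusion. The square of z equals the uncorrected χ², and the Z approach hands you the confidence interval directly.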

True or false about the contingency table:

– A contingency table is a 2-way frequency distribution table: True.

– A contingency table shows the relationship between two quantitative variables: False. A contingency table shows the relationship between two classification (categorical) variables.

– The number of rows and columns corresponds to the number of levels of the two variables: True. Level is similar to subgroup.

– Not independent: the distribution of one variable is the same across all levels of the other variable: False. The distribution of one variable is not the same across all levels of the other variable.

– Independent (related): the distribution of one variable is not the same across all levels of the other variable: False. The distribution of one variable is the same across all levels of the other variable.
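
A short sketch of how a contingency table is built and read. The records below are invented, and the variable names are assumptions for illustration only. Each row of the table is one level of the first variable, each column one level of the second, and comparing the row distributions shows whether the variables look independent.

```python
from collections import Counter

# Hypothetical raw data: one (smoking status, outcome) pair per person.
records = (
    [("smoker", "disease")] * 10
    + [("smoker", "healthy")] * 30
    + [("non-smoker", "disease")] * 20
    + [("non-smoker", "healthy")] * 60
)

counts = Counter(records)
row_levels = sorted({row for row, _ in records})  # levels of variable 1
col_levels = sorted({col for _, col in records})  # levels of variable 2

# Two-way frequency table: rows x columns = levels of the two variables.
table = [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Distribution of the column variable within each row level.
row_distributions = [[cell / sum(row) for cell in row] for row in table]

for level, row, dist in zip(row_levels, table, row_distributions):
    print(level, row, dist)
```

Here both rows have the same distribution (25% disease, 75% healthy), which is what independence looks like. If the percentages differed between rows, the variables would not be independent.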

True or false about the significance level alpha:

– The significance level α is the threshold value of the test: True. It is compared with the p-value to decide whether the result is statistically significant.

– The significance level α determines the impossible values (values that are too large or too small, also known as noise factors): True.

– The significance level α is also called the rejection region of the sample: True.

– The significance level α will be taken after the survey to compare with the p-value: False. The significance level α is chosen before the survey.
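
The comparison between the p-value and α can be sketched as follows. This is a plain-Python illustration that only covers 1 degree of freedom, where the upper-tail probability of χ² can be written with the complementary error function. The names ALPHA and p_value_df1 are assumptions.

```python
import math

# The significance level is fixed before the data are collected.
ALPHA = 0.05


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


# The critical values 3.84, 6.63 and 10.83 correspond to p of about 0.05, 0.01 and 0.001.
for critical in (3.84, 6.63, 10.83):
    print(f"chi-square = {critical}: p = {p_value_df1(critical):.4f}")

# Hypothetical calculated statistic compared against the pre-chosen alpha.
p = p_value_df1(4.0)
significant = p < ALPHA
```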

What is the degree of freedom symbol and how to calculate it?

– Symbol: d.f.

– Calculation: d.f. = (r − 1)(c − 1), where r is the number of rows and c is the number of columns. It is used to look up the critical χ² value that the calculated χ² is compared with.
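
As a quick illustration of the formula, the sketch below computes d.f. for a few table sizes. The function names are assumptions. The closed-form critical value is a special case that holds only for 2 degrees of freedom; other values of d.f. need a table or statistical software.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of an r x c contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value_df2(alpha):
    """Critical chi-square value for exactly 2 degrees of freedom.

    Valid only for d.f. = 2, where the upper-tail probability is exp(-x / 2).
    """
    return -2 * math.log(alpha)


print(degrees_of_freedom(2, 2))  # 2x2 table
print(degrees_of_freedom(3, 2))  # 3x2 table
print(round(critical_value_df2(0.05), 2))
```

A 2×2 table has 1 degree of freedom, so the critical values 3.84, 6.63 and 10.83 apply. A 3×2 table has 2 degrees of freedom, and its critical value at α = 0.05 is larger (about 5.99).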

What is the command for comparing one proportion with a population proportion?

– Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square

Comparing one proportion with a population proportion (true or false):

– p > 0.05: do not reject H0 → the 2 proportions are similar: True.

– p < 0.05: reject H1 → the 2 proportions are different: False. If p < 0.05, reject H0 → the 2 proportions are different.

– p > 0.05: reject H1 → the 2 proportions are different: False. If p > 0.05, do not reject H0 → the 2 proportions are similar.
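
The decision rule above can be sketched as a goodness-of-fit test with two categories (1 degree of freedom). The sample size, the count and the population proportion are invented, and the function name is an assumption. The p-value uses the large-sample χ² approximation.

```python
import math


def one_proportion_chi_square(successes, n, p0):
    """Compare a sample proportion with a population proportion p0 (d.f. = 1)."""
    observed = [successes, n - successes]
    expected = [n * p0, n * (1 - p0)]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical sample: 90 of 200 people have the trait; population proportion is 0.5.
statistic, p_value = one_proportion_chi_square(90, 200, 0.5)
reject_h0 = p_value < 0.05
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Here p is greater than 0.05, so H0 is not rejected and the sample proportion is treated as similar to the population proportion.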

What is the command for comparing two proportions?

– Analyze > Descriptive Statistics > Crosstabs > Statistics > Chi-square

– Tick both the Chi-square box and the Risk box.

What does Risk output?

– OR and RR (the RR values are in the two rows labelled "For cohort", below the odds ratio row).

What is RR? Causal relationship in ____ research?

– RR is the relative risk → causal relationship in cohort research.

– RR > 1: the risk factor increases the likelihood of disease.

– RR < 1: the factor reduces the likelihood of disease (protective factor).

– In both cases the result is statistically significant only if the 95% confidence interval of RR does not contain 1.
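
A minimal sketch of how RR is calculated in a cohort design, using invented counts. The function name relative_risk and its parameters are assumptions. RR is the risk of disease in the exposed group divided by the risk in the unexposed group.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Relative risk: risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed people develop the disease.
rr = relative_risk(30, 100, 10, 100)

# Swapping the groups gives an RR below 1, i.e. the factor looks protective.
rr_protective = relative_risk(10, 100, 30, 100)

print(f"RR = {rr:.2f}, RR (protective example) = {rr_protective:.2f}")
```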

What is OR? Causal relationship in ____ research?

– OR is the odds ratio → causal relationship in case-control research.

True or false:

– The 95% confidence interval of OR can have negative values: False. Confidence intervals of other statistics (for example, a difference between two proportions) can be negative, but OR is never negative.

– If the 95% confidence interval of OR contains the value 1, the result is not statistically significant: True.
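
Both statements can be checked with a small sketch. The counts are invented, the function name is an assumption, and the interval is the common log-based (Woolf) approximation rather than necessarily what any particular software reports. Because the interval is built on the log scale and then exponentiated, its limits are always positive.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a case-control 2x2 table.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def is_significant(lower, upper):
    """Significant at the 5% level only if the interval does not contain 1."""
    return not (lower <= 1 <= upper)


# Hypothetical study 1: interval lies entirely above 1.
or1, low1, up1 = odds_ratio_with_ci(40, 60, 20, 80)
# Hypothetical study 2: interval contains 1.
or2, low2, up2 = odds_ratio_with_ci(12, 38, 10, 40)

print(f"OR = {or1:.2f}, significant: {is_significant(low1, up1)}")
print(f"OR = {or2:.2f}, significant: {is_significant(low2, up2)}")
```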

When to use Fisher exact test to get p?

– When more than 20% of the cells have an expected count of less than 5.
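
The rule and the test itself can be sketched in plain Python for a 2×2 table. The counts are invented and the function names are assumptions. The two-sided p-value here sums the probabilities of all tables that are no more likely than the observed one, which is one common definition.

```python
from math import comb


def share_of_small_expected_cells(table, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    return sum(e < threshold for e in expected) / len(expected)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    k_values = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Small tolerance so that ties with the observed table are included.
    return sum(prob(k) for k in k_values if prob(k) <= p_observed * (1 + 1e-9))


# Hypothetical small sample.
table = [[3, 1], [1, 3]]
use_fisher = share_of_small_expected_cells(table) > 0.20
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"use Fisher: {use_fisher}, p = {p_value:.4f}")
```

Every expected count in this table is 2, so all cells fall below 5 and the exact test is the appropriate choice.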

What is the command for comparing more than 2 proportions?

– Analyze > Descriptive Statistics > Crosstabs > Statistics > Chi-square
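
For more than two proportions the same χ² calculation is applied to a larger table. The sketch below uses three invented groups, and the names are assumptions. With 3 rows and 2 columns there are 2 degrees of freedom, and for that case only the p-value has the simple form exp(−χ²/2).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed_count - expected) ** 2 / expected
    return statistic


# Hypothetical data: three groups of 100 people; columns are outcome yes / no.
observed = [[20, 80], [30, 70], [40, 60]]

statistic = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is valid only when d.f. = 2.
p_value = math.exp(-statistic / 2)
reject_h0 = p_value < 0.05
print(f"chi-square = {statistic:.2f}, d.f. = {df}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

A significant result only says that at least one proportion differs from the others. It does not say which one.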
