Chi-square Contingency Test

The chi-square contingency test is widely used for determining whether there is a statistically significant relationship between two categorical (nominal or ordinal) variables. 
(Parasuraman, Grewal, Krishnan, 2004)

- A two-way tabulation is useful for understanding the nature of the association between a pair of variables. (Parasuraman, Grewal, Krishnan, 2004)
- Two-Way Table
  - A two-way table is a simple cross-tabulation.
  - Categories
    - Data must be coded into fixed sets of categories.
    - The number of categories should not be large.
    - Interval and/or ratio data can also be analyzed with this technique if they are first grouped into a limited set of categories.
  - Constructing the Table
    - The responses in each category of one variable are broken down by the categories of the second variable.
    - For example ...

  - The hypotheses
    - The null hypothesis states that there is no relationship: the two variables are independent.
    - The alternative hypothesis asserts that there is a relationship: the variables are not independent.
    - H0:  There is no relationship between height and sex.
    - Ha:  There is a relationship between height and sex.

- Conducting the test
  - The observed (actual) cell frequencies are compared with the expected cell frequencies, which are computed under the assumption that the null hypothesis is true (i.e., there is no relationship).
  - $$E_{ij} = \frac{n_i \, n_j}{n}$$

where $n_i$ and $n_j$ are the marginal frequencies:
$n_i$ = the number of sample units in category i of the row variable
$n_j$ = the number of sample units in category j of the column variable
$n$ = the total sample size

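  - Expected values:

With n = 100, each expected count is the row total times the column total, divided by 100. For example, the expected count for men who are 5'11" and over is (60)(40)/100 = 24; the other cells are computed the same way.

                 5'11" and over   Under 5'11"   Total
Men                    24              36          60
Women                  16              24          40
Total                  40              60         100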
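A minimal Python sketch of the expected-frequency formula, assuming the marginal totals are stored as plain lists; the function name `expected_frequencies` is hypothetical. It reproduces the expected values in the table from the marginal totals alone, since under the null hypothesis the observed cell counts do not enter the calculation.

```python
# Expected cell frequencies under H0 (independence): E_ij = (n_i * n_j) / n
def expected_frequencies(row_totals, col_totals):
    """Return the table of expected counts implied by the marginal totals."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Marginals from the height-by-sex example:
# rows = Men, Women; columns = 5'11" and over, under 5'11".
expected = expected_frequencies([60, 40], [40, 60])
for label, row in zip(["Men", "Women"], expected):
    print(label, row)
```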
  - The value of χ²

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where
$O_{ij}$ = the observed frequency in cell (i, j)
$E_{ij}$ = the expected frequency in cell (i, j)
r = number of rows
c = number of columns

  - For our example:

$$\chi^2 = \frac{(30-24)^2}{24} + \frac{(30-36)^2}{36} + \frac{(10-16)^2}{16} + \frac{(30-24)^2}{24} = 1.5 + 1.0 + 2.25 + 1.5 = 6.25$$
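The same calculation as a short Python sketch (the function name `chi_square_statistic` is hypothetical). The observed table is not printed in this section, so the counts below are an assumption read off the worked sum: 30 and 30 for men, 10 and 30 for women. They agree with the marginal totals of 60, 40, 40, and 60.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            chi2 += (o - e) ** 2 / e               # cell's contribution
    return chi2


# Columns: 5'11" and over, under 5'11" (counts inferred from the worked sum).
observed = [[30, 30],   # Men
            [10, 30]]   # Women
stat = chi_square_statistic(observed)
print(round(stat, 2))  # 6.25
```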

  - Degrees of freedom

$$\text{d.f.} = (r-1)(c-1) = (2-1)(2-1) = (1)(1) = 1$$

  - The critical value is taken from a χ² table for the chosen significance level and degrees of freedom.
  - Our example:

With α = 0.05 and d.f. = 1, the table gives a critical value of 3.84.

Decision rule:  Reject H0 if χ² > 3.84.

Because 6.25 is greater than 3.84, we reject the null hypothesis in favor of the alternative.

Conclusion:  The data strongly suggest a relationship between height and sex.
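The decision rule can also be checked without a printed table. The sketch below uses Python (an assumption; the slides name no software) and relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(√(x/2)). The helper names `chi2_sf_df1` and `critical_value_df1` are hypothetical, and the approach only covers d.f. = 1; for other degrees of freedom, use a χ² table or a library routine such as SciPy's `scipy.stats.chi2.ppf` and `scipy.stats.chi2.sf`.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 d.f.

    Uses X = Z**2 for standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def critical_value_df1(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Find x with P(X > x) = alpha by bisection (valid for d.f. = 1 only)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_df1(mid) > alpha:
            lo = mid  # tail still too heavy: critical value lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


r, c = 2, 2
df = (r - 1) * (c - 1)     # 1 for a 2 x 2 table
alpha = 0.05
crit = critical_value_df1(alpha)
stat = 6.25                # chi-square value from the worked example
p_value = chi2_sf_df1(stat)

print(f"d.f. = {df}, critical value = {crit:.2f}")
print(f"chi-square = {stat}, p-value = {p_value:.4f}")
print("Reject H0" if stat > crit else "Do not reject H0")
```

If you cross-check with SciPy's `scipy.stats.chi2_contingency`, note that it applies Yates' continuity correction to tables with 1 degree of freedom by default; pass `correction=False` to reproduce the uncorrected value of 6.25.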

- Most statistical software can compute a χ² test.
- Precautions in Interpreting Two-Way Tables
  - Unless the data were collected in a carefully controlled experiment, this test establishes association, not a causal relationship.
  - As with small samples, cells with small counts can produce misleading results by giving too much weight to a few data points. The test requires adequate numbers in each cell; a common rule of thumb is an expected count of at least 5 in every cell.
  - This technique examines only two variables at a time, which can be misleading if the relationship between them depends on, or is influenced by, one or more other variables.
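A sketch for screening a table for small cells before running the test. The threshold of 5 expected observations per cell is a common rule of thumb rather than a hard requirement, and the function name `small_expected_cells` and the second example table are hypothetical.

```python
def small_expected_cells(observed, min_expected=5.0):
    """Return (row, col, expected) for cells whose expected count is below min_expected.

    min_expected=5 is a common rule of thumb, not a requirement of the test itself.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, r_tot in enumerate(row_totals):
        for j, c_tot in enumerate(col_totals):
            e = r_tot * c_tot / n  # expected count under H0
            if e < min_expected:
                flagged.append((i, j, e))
    return flagged


# Height-by-sex table: the smallest expected count is 16, so nothing is flagged.
ok = small_expected_cells([[30, 30], [10, 30]])
print(ok)  # []

# Hypothetical small sample in which three of the four expected counts fall below 5.
few = small_expected_cells([[12, 3], [4, 1]])
print(few)
```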

Copyright Dr. Nancy D. Albers-Miller, All Rights Reserved