Discrimination-aware data mining

In the context of civil rights law, discrimination refers to
unfair or unequal treatment of people based on their membership in a
category or minority, without regard to individual merit. Rules
extracted from databases by data mining techniques, such as
classification or association rules, when used for decision tasks
such as benefit or credit approval, can be discriminatory in the
above sense. This deficiency of classification and association
rules poses ethical and legal issues, as well as obstacles to
practical application. In this paper, we introduce and study the
notion of discriminatory classification rules. Examples of
potentially discriminatory attributes include gender, race, job,
and age. We define a measure, termed $\alpha$-protection, of the
discrimination power of a classification rule containing a
discriminatory item, and study its properties. We show
that the introduced notion is non-trivial, in the sense that
discriminatory rules can be derived from apparently safe ones
under natural assumptions about background knowledge. Finally, we
discuss how to check $\alpha$-protection and provide an empirical
assessment on the German credit dataset.
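
The abstract does not spell out how $\alpha$-protection is computed, so
the following is only a minimal sketch under a stated assumption: a
rule "A, B -> C", where A is the discriminatory item set and B the
remaining context, is treated as $\alpha$-protective when the ratio
conf(A, B -> C) / conf(B -> C) stays below $\alpha$. The function names,
the item encoding ("attribute=value" strings), and the toy records are
hypothetical and are not taken from the paper or from the German credit
dataset.

```python
from typing import List, Set

# Hypothetical toy data: each record is a set of "attribute=value" items.
TOY_RECORDS: List[Set[str]] = [
    {"gender=female", "city=NYC", "credit=deny"},
    {"gender=female", "city=NYC", "credit=deny"},
    {"gender=female", "city=NYC", "credit=grant"},
    {"gender=male", "city=NYC", "credit=deny"},
    {"gender=male", "city=NYC", "credit=grant"},
    {"gender=male", "city=NYC", "credit=grant"},
]


def confidence(records: List[Set[str]], premise: Set[str], consequent: str) -> float:
    """Fraction of records matching the premise that also contain the consequent."""
    covered = [r for r in records if premise <= r]
    if not covered:
        raise ValueError("no record matches the premise")
    return sum(1 for r in covered if consequent in r) / len(covered)


def extended_lift(
    records: List[Set[str]],
    discriminatory: Set[str],
    context: Set[str],
    consequent: str,
) -> float:
    """Assumed measure: conf(A, B -> C) / conf(B -> C)."""
    base = confidence(records, context, consequent)
    if base == 0.0:
        raise ValueError("conf(B -> C) is zero; the ratio is undefined")
    return confidence(records, discriminatory | context, consequent) / base


def is_alpha_protective(
    records: List[Set[str]],
    discriminatory: Set[str],
    context: Set[str],
    consequent: str,
    alpha: float,
) -> bool:
    """Assumed check: the rule is alpha-protective if the ratio is below alpha."""
    return extended_lift(records, discriminatory, context, consequent) < alpha


if __name__ == "__main__":
    ratio = extended_lift(TOY_RECORDS, {"gender=female"}, {"city=NYC"}, "credit=deny")
    print(f"ratio = {ratio:.3f}")
    print(is_alpha_protective(
        TOY_RECORDS, {"gender=female"}, {"city=NYC"}, "credit=deny", alpha=1.2
    ))
```

Under this assumed definition, a smaller threshold $\alpha$ is stricter:
the same rule may pass the check for a large $\alpha$ and fail it for a
smaller one.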