fairdo.metrics package#

The fairdo.metrics package provides a collection of metrics [1] to measure fairness and discrimination in datasets. The metrics are divided into the following submodules:

  • fairdo.metrics.group: This module provides metrics to measure group fairness.

  • fairdo.metrics.individual: This module provides metrics to measure individual fairness.

  • fairdo.metrics.dependence: This module provides metrics to measure the (in)dependence between two variables.

  • fairdo.metrics.penalty: This module contains specialized penalty functions that penalize fairness metrics to enforce constraints such as group coverage [2].

Submodules#

fairdo.metrics.dependence module#

This module contains functions to calculate the dependency, correlation, association or any other relationship between two variables.

In the fairness context, the dependency between the target variable and the protected attribute(s) is of interest. Let \(y\) be the target variable and \(z\) be the protected attribute(s), then some kind of relationship between these two variables is calculated using a function \(f\): \(f(y, z)\).

fairdo.metrics.dependence.dependency_multi(y: ~numpy.array, z: ~numpy.array, dependency_function=<function normalized_mutual_info_score>, agg=<function amax>, positive_label=1, **kwargs) float[source]#

Calculates the dependency between y and each z[:, i] using the specified dependency_function, then aggregates the dependency scores using the agg function.

Let \(f\) be the dependency function, \(y\) be the target variable, and \(z\) be the protected attributes, then the dependency score is calculated as (pythonic notation):

\[\text{dependency}(y, z) = \text{agg}(f(y, z[:,0]), f(y, z[:,1]), \ldots, f(y, z[:,-1]))\]
Parameters:
  • y (np.array, shape (n_samples,)) – Flattened binary array of shape (n_samples,), can be a prediction or the truth label.

  • z (np.array, shape (n_samples, n_protected_attributes)) – Array of shape (n_samples, n_protected_attributes), represents the protected attributes.

  • dependency_function (callable, optional) – Function to compute the dependency between y and each protected attribute. Default is normalized_mutual_info_score.

  • agg (callable, optional) – Aggregation function to combine the dependency scores. Default is np.max.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • **kwargs – Additional keyword arguments. These are not currently used.

Returns:

The aggregated dependency score.

Return type:

float

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import dependency_multi
>>> y = np.random.randint(0, 2, (10,))
>>> z = np.random.randint(0, 2, (10, 3))
>>> dependency_multi(y, z)
0.12634639359704877
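
The aggregation in the formula above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: aggregate_dependency and abs_pearson are hypothetical names, and the absolute Pearson correlation stands in for the default dependency function.

```python
import numpy as np

def abs_pearson(y, z_col):
    # Hypothetical dependency function f: absolute Pearson correlation.
    return float(abs(np.corrcoef(y, z_col)[0, 1]))

def aggregate_dependency(y, z, f=abs_pearson, agg=np.max):
    # Apply f to y and every column z[:, i], then aggregate the scores.
    scores = [f(y, z[:, i]) for i in range(z.shape[1])]
    return float(agg(scores))

y = np.array([0, 1, 1, 0, 1, 0, 0, 1])
# First attribute is identical to y, second is uncorrelated with y.
z = np.column_stack([y, np.array([0, 0, 1, 1, 0, 0, 1, 1])])
worst_case = aggregate_dependency(y, z)             # driven by the first attribute
average = aggregate_dependency(y, z, agg=np.mean)   # averages both attributes
```

Aggregating with the maximum reports the most dependent attribute, so one strongly dependent attribute cannot be hidden by several independent ones.
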
fairdo.metrics.dependence.dual_total_correlation(*arrays)[source]#

Calculate the dual total correlation [5] of multiple categorical variables. Given a set of \(m\) categorical variables \(X = (X_1, X_2, \ldots, X_m)\), it is given by:

\[DTC(X) = DTC(X_1, X_2, \ldots, X_m) = H(X_1, X_2, \ldots, X_m) - \sum_{i=1}^{m} H(X_i | X_1, X_2, \ldots, X_{i-1}, X_{i+1}, \ldots, X_m)\]

where \(H(X_1, X_2, \ldots, X_m)\) is the joint entropy and \(H(X_i | X_1, X_2, \ldots, X_{i-1}, X_{i+1}, \ldots, X_m)\) is the conditional entropy of \(X_i\) given all other variables.

Parameters:

*arrays (np.array) – Arrays of shape (n_samples,) containing the labels.

Returns:

The dual total correlation of the categorical variables.

Return type:

float

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import dual_total_correlation
>>> x = np.array([0, 1, 1, 0, 1, 0, 0, 1])
>>> y = 1 - x
>>> dual_total_correlation(x, y)
1.0
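
The definition can be checked with a short NumPy sketch. It assumes entropies in bits and at least two input arrays, and it uses the identity \(H(X_i | \text{rest}) = H(X_1, \ldots, X_m) - H(\text{rest})\); entropy_bits and dual_total_correlation_sketch are hypothetical names, not part of the package.

```python
import numpy as np

def entropy_bits(*arrays):
    # Joint Shannon entropy (in bits) of one or more label arrays.
    stacked = np.column_stack(arrays)
    _, counts = np.unique(stacked, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def dual_total_correlation_sketch(*arrays):
    # Assumes at least two arrays of equal length.
    joint = entropy_bits(*arrays)
    conditional_sum = 0.0
    for i in range(len(arrays)):
        rest = arrays[:i] + arrays[i + 1:]
        # H(X_i | rest) = H(all variables) - H(rest)
        conditional_sum += joint - entropy_bits(*rest)
    return joint - conditional_sum

x = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y = 1 - x
dtc = dual_total_correlation_sketch(x, y)  # x fully determines y
```
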
fairdo.metrics.dependence.mi(y: array, z: array, bins=2, **kwargs) float[source]#

Calculate the mutual information [12] between two arrays. The protected attribute z can be binary or non-binary.

Mutual information is a measure of the mutual dependence between two variables. It quantifies the “amount of information” (in units such as bits) obtained about one random variable, by observing the other random variable. Higher values indicate a higher dependency between the two variables. It is defined as:

\[I(Y, Z) = \sum_{Y, Z} p(Y, Z) \log \left(\frac{p(Y, Z)}{p(Y) \cdot p(Z)}\right)\]

where \(p(Y, Z)\) is the joint probability distribution of \(Y\) and \(Z\), and \(p(Y)\) and \(p(Z)\) are the respective marginal probability distributions.

Parameters:
  • y (np.array (n_samples,)) – Flattened array, can be a prediction or the truth label. Discrete values.

  • z (np.array (n_samples,)) – Flattened array of the same shape as y. Discrete values.

  • bins (int, optional) – Number of bins for discretization. Default is 2.

  • **kwargs – Additional keyword arguments. These are not currently used.

Returns:

The mutual information between y and z.

Return type:

float

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import mi
>>> y = np.random.randint(0, 2, (10,))
>>> z = np.random.randint(0, 2, (10,))
>>> mi(y, z)
0.013844293808390806
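
The formula above can be evaluated directly from a contingency table. The following sketch is illustrative only: mutual_information_sketch is a hypothetical name, and the natural logarithm (nats) is an assumption, since the log base is not stated here.

```python
import numpy as np

def mutual_information_sketch(y, z):
    # Map labels to integer indices and count joint occurrences.
    y_vals, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    z_vals, z_idx = np.unique(np.asarray(z).ravel(), return_inverse=True)
    joint = np.zeros((len(y_vals), len(z_vals)))
    np.add.at(joint, (y_idx, z_idx), 1)
    joint /= joint.sum()                    # joint distribution p(Y, Z)
    p_y = joint.sum(axis=1, keepdims=True)  # marginal p(Y)
    p_z = joint.sum(axis=0, keepdims=True)  # marginal p(Z)
    nonzero = joint > 0                     # skip empty cells (0 * log 0 = 0)
    ratio = joint[nonzero] / (p_y @ p_z)[nonzero]
    return float(np.sum(joint[nonzero] * np.log(ratio)))

y = np.array([0, 1, 1, 0, 1, 0, 0, 1])
mi_identical = mutual_information_sketch(y, y)  # equals the entropy of y, ln(2)
mi_independent = mutual_information_sketch(y, np.array([0, 0, 1, 1, 0, 0, 1, 1]))
```
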
fairdo.metrics.dependence.nmi(y: array, z: array, **kwargs) float[source]#

Calculate the normalized mutual information between two arrays. The protected attribute z can be binary or non-binary.

Normalized mutual information rescales the mutual information (MI) score to lie between 0 (no mutual information, independent variables) and 1 (perfect correlation). Any warnings raised during the computation are suppressed. The formula is given by:

\[\text{NMI}(Y, Z) = \frac{2 \cdot I(Y, Z)}{H(Y) + H(Z)}\]

where \(I(Y, Z)\) is the mutual information between Y and Z, and \(H(Y)\) and \(H(Z)\) are the entropies of \(Y\) and \(Z\), respectively.

Parameters:
  • y (np.array, shape (n_samples,)) – Flattened array, can be a prediction or the truth label. Discrete values.

  • z (np.array, shape (n_samples,)) – Flattened array of the same shape as y. Discrete values.

  • **kwargs – Additional keyword arguments. These are not currently used.

Returns:

The normalized mutual information between y and z.

Return type:

float

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import nmi
>>> y = np.array([0, 1, 1, 0, 1, 0, 0, 1])
>>> z = np.array([0, 1, 1, 0, 1, 0, 0, 1])
>>> nmi(y, z)
1.0
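
The normalization can be sketched from entropies alone, using \(I(Y, Z) = H(Y) + H(Z) - H(Y, Z)\). The names below are hypothetical, and returning 0.0 when both arrays are constant is a choice made for this sketch, not documented behavior.

```python
import numpy as np

def entropy(*arrays):
    # Joint Shannon entropy (in nats) of one or more label arrays.
    _, counts = np.unique(np.column_stack(arrays), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def nmi_sketch(y, z):
    h_y, h_z = entropy(y), entropy(z)
    if h_y + h_z == 0:
        # Both arrays are constant; 0.0 is an arbitrary convention of this sketch.
        return 0.0
    mutual_info = h_y + h_z - entropy(y, z)
    return 2.0 * mutual_info / (h_y + h_z)

y = np.array([0, 1, 1, 0, 1, 0, 0, 1])
nmi_identical = nmi_sketch(y, y)
nmi_independent = nmi_sketch(y, np.array([0, 0, 1, 1, 0, 0, 1, 1]))
```
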
fairdo.metrics.dependence.nmi_multi(y: ~numpy.array, z: ~numpy.array, agg=<function amax>, positive_label=1, **kwargs)[source]#

Compute the normalized mutual information [11] for multiple non-binary protected attributes.

This function calculates the normalized mutual information between y and each protected attribute in z, and then aggregates these scores using the specified agg function. Let \(y\) be the target variable and \(z\) be the protected attributes, then the normalized mutual information score for multiple protected attributes is calculated as (pythonic notation):

\[\text{NMI}(y, z) = \text{agg}(\text{NMI}(y, z[:,0]), \text{NMI}(y, z[:,1]), \ldots, \text{NMI}(y, z[:,-1]))\]
Parameters:
  • y (np.array, shape (n_samples,)) – Flattened binary array, can be a prediction or the truth label.

  • z (np.array, shape (n_samples, n_protected_attributes)) – Each z[:,i] represents a single protected attribute.

  • agg (callable, optional) – Aggregation function to combine the normalized mutual information scores. Default is np.max.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • **kwargs – Additional keyword arguments. These are not currently used.

Returns:

The aggregated normalized mutual information score.

Return type:

float

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import nmi_multi
>>> y = np.random.randint(0, 2, (10,))
>>> z = np.random.randint(0, 2, (10, 3))
>>> nmi_multi(y, z)
0.09855890449799566
fairdo.metrics.dependence.o_information(*arrays)[source]#

Calculate the O-information [6] of multiple categorical variables. The O-information is the difference between the total correlation and the dual total correlation:

\[O(X_1, X_2, \ldots, X_m) = TC(X_1, X_2, \ldots, X_m) - DTC(X_1, X_2, \ldots, X_m)\]

where \(TC\) is the total correlation and \(DTC\) is the dual total correlation.

Parameters:

*arrays (np.array) – Arrays of shape (n_samples,) containing the labels.

Returns:

The O-information of the categorical variables.

Return type:

float

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import o_information
>>> x = np.array([0, 1, 1, 0, 1, 0, 0, 1])
>>> y = 1 - x
>>> o_information(x, y)
0.0
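
The result of 0.0 for two variables follows from the definitions: for \(m = 2\), both the total correlation and the dual total correlation reduce to the mutual information. Using \(H(X_1 | X_2) = H(X_1, X_2) - H(X_2)\):

\[TC(X_1, X_2) = H(X_1) + H(X_2) - H(X_1, X_2) = I(X_1, X_2)\]

\[DTC(X_1, X_2) = H(X_1, X_2) - H(X_1 | X_2) - H(X_2 | X_1) = H(X_1) + H(X_2) - H(X_1, X_2) = I(X_1, X_2)\]

\[O(X_1, X_2) = TC(X_1, X_2) - DTC(X_1, X_2) = 0\]

The O-information can therefore only be non-zero for three or more variables.
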
fairdo.metrics.dependence.pearsonr(y: array, z: array, **kwargs) float[source]#

Calculate the Pearson correlation coefficient between two arrays. The protected attribute z can be binary or non-binary. It is given by:

\[\text{Pearson}(Y, Z) = \frac{\text{cov}(Y, Z)}{\sigma_Y \cdot \sigma_Z}\]

where \(\text{cov}(Y, Z)\) is the covariance between \(Y\) and \(Z\), and \(\sigma_Y\) and \(\sigma_Z\) are the respective standard deviations.

Parameters:
  • y (np.array) – Flattened array, can be a prediction or the truth label.

  • z (np.array) – Flattened array of the same shape as y.

  • **kwargs – Additional keyword arguments. These are not currently used.

Returns:

The Pearson correlation coefficient between y and z.

Return type:

float

Notes

The Pearson correlation coefficient measures the linear relationship between two variables. The calculation of the Pearson correlation coefficient is not affected by scaling, and it ranges from -1 to 1. A value of 1 implies a perfect positive correlation, while a value of -1 implies a perfect negative correlation.

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import pearsonr
>>> y = np.array([0, 1, 1, 0, 1, 0, 0, 1])
>>> z = 1 - y
>>> pearsonr(y, z)
-0.9999999999999998
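
The formula and the scaling property from the notes can be verified with a few lines of NumPy. This is a sketch with a hypothetical function name, using population covariance and standard deviations.

```python
import numpy as np

def pearson_sketch(y, z):
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    cov = np.mean((y - y.mean()) * (z - z.mean()))  # population covariance
    return float(cov / (y.std() * z.std()))         # population standard deviations

y = np.array([0, 1, 1, 0, 1, 0, 0, 1])
z = 1 - y
r = pearson_sketch(y, z)                      # perfect negative correlation
r_scaled = pearson_sketch(3.0 * y + 2.0, z)   # positive rescaling of y leaves r unchanged
```
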
fairdo.metrics.dependence.pearsonr_abs(y: array, z: array, **kwargs) float[source]#

Calculate the absolute value of the Pearson correlation coefficient between two arrays. The protected attribute z can be binary or non-binary. It is given by:

\[\text{Pearson}(Y, Z) = \left|\frac{\text{cov}(Y, Z)}{\sigma_Y \cdot \sigma_Z}\right|\]

where \(\text{cov}(Y, Z)\) is the covariance between \(Y\) and \(Z\), and \(\sigma_Y\) and \(\sigma_Z\) are the respective standard deviations.

Parameters:
  • y (np.array) – Flattened array, can be a prediction or the truth label.

  • z (np.array) – Flattened array of the same shape as y.

  • **kwargs – Additional keyword arguments. These are not currently used.

Returns:

The absolute value of the Pearson correlation coefficient between y and z.

Return type:

float

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import pearsonr_abs
>>> y = np.array([0, 1, 1, 0, 1, 0, 0, 1])
>>> z = 1 - y
>>> pearsonr_abs(y, z)
0.9999999999999998
fairdo.metrics.dependence.rdc(y: ~numpy.array, z: ~numpy.array, f=<ufunc 'sin'>, k=20, s=0.16666666666666666, n=1, **kwargs)[source]#

Calculate the Randomized Dependence Coefficient (RDC) by David Lopez-Paz, Philipp Hennig, and Bernhard Schoelkopf [7]. According to the paper, the coefficient should be relatively insensitive to the settings of the f, k, and s parameters.

Parameters:
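  • y (np.array (n_samples,) or (n_samples, n_variables)) – First variable(s), for example the target variable.

  • z (np.array (n_samples,) or (n_samples, n_variables)) – Second variable(s), for example the protected attribute(s).

  • f (callable) – Function to use for the random projection. Default is np.sin.

  • k (int) – Number of random projections to use. Default is 20.

  • s (numeric) – Scale parameter. Default is 1/6.

  • n (int) – Number of times to compute the RDC; the median is returned for stability. Default is 1.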

Returns:

RDC between y and z.

Return type:

float

Notes

Implementation by Gary Doran, taken from the garydoranjr/rdc repository.

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import rdc
>>> y = np.random.rand(100)
>>> z = np.random.rand(100)
>>> rdc(y, z)
0.287647809294975
fairdo.metrics.dependence.total_correlation(*arrays) float[source]#

Calculate the total correlation (multi-information) of multiple categorical variables [13] [4]. Given a set of \(m\) categorical variables \(X = (X_1, X_2, \ldots, X_m)\), the total correlation is:

\[TC(X) = TC(X_1, X_2, \ldots, X_m) = \left(\sum_{i=1}^{m} H(X_i)\right) - H(X_1, X_2, \ldots, X_m)\]

where \(H(X_i)\) is the entropy of the i-th variable and \(H(X_1, X_2, \ldots, X_m)\) is the joint entropy.

Parameters:

*arrays (np.array) – Arrays of shape (n_samples,) containing the labels.

Returns:

The total correlation of the categorical variables.

Return type:

float

Examples

>>> import numpy as np
>>> from fairdo.metrics.dependence import total_correlation
>>> x = np.array([0, 1, 1, 0, 1, 0, 0, 1])
>>> y = 1 - x
>>> total_correlation(x, y)
1.0
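
The total correlation formula translates directly into code. The sketch below assumes entropies in bits; entropy_bits and total_correlation_sketch are hypothetical names.

```python
import numpy as np

def entropy_bits(*arrays):
    # Joint Shannon entropy (in bits) of one or more label arrays.
    _, counts = np.unique(np.column_stack(arrays), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def total_correlation_sketch(*arrays):
    # Sum of marginal entropies minus the joint entropy.
    return sum(entropy_bits(a) for a in arrays) - entropy_bits(*arrays)

x = np.array([0, 1, 1, 0, 1, 0, 0, 1])
tc_two = total_correlation_sketch(x, 1 - x)       # one shared bit
tc_three = total_correlation_sketch(x, 1 - x, x)  # three 1-bit marginals, 1-bit joint
```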

fairdo.metrics.group module#

fairdo.metrics.group.average_odds_difference(y_true: array, y_pred: array, z: array, positive_label=1, privileged_group=1, **kwargs) float[source]#

Calculate the difference in Average Odds between privileged and unprivileged groups.

[1] Equality of Opportunity in Supervised Learning (Hardt, Price, Srebro, 2016) (https://arxiv.org/abs/1610.02413)

Parameters:
  • y_true (numpy.array) – Flattened array of true binary labels.

  • y_pred (numpy.array) – Flattened array of predicted binary labels. Must have same shape as y_true.

  • z (numpy.array) – Binary array indicating privileged (1) or unprivileged (0) group. Same shape as y_true.

  • positive_label (int, optional) – Label considered as positive, default is 1.

  • privileged_group (int, optional) – Label denoting the privileged group, default is 1.

Returns:

The difference in Average Odds between privileged and unprivileged groups.

Return type:

float

fairdo.metrics.group.average_odds_error(y_true: array, y_pred: array, z: array, positive_label=1, privileged_group=1, **kwargs) float[source]#

Compute the Average Odds Error. Can be used as an objective function to minimize.

Parameters:
  • y_true (numpy.array) – Flattened array of true binary labels.

  • y_pred (numpy.array) – Flattened array of predicted binary labels. Must have same shape as y_true.

  • z (numpy.array) – Binary array indicating privileged (1) or unprivileged (0) group. Same shape as y_true.

  • positive_label (int, optional) – Label considered as positive, default is 1.

  • privileged_group (int, optional) – Label denoting the privileged group, default is 1.

Returns:

The Average Odds Error between privileged and unprivileged groups.

Return type:

float
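
The two average odds metrics above can be illustrated with a common definition: the mean of the differences in false positive rate and true positive rate between the groups, signed for the difference and in absolute value for the error. The sketch below is an assumption about the definition, not the library's code; the privileged-minus-unprivileged sign and all function names are hypothetical.

```python
import numpy as np

def group_rates(y_true, y_pred, mask, positive_label=1):
    # True positive rate and false positive rate within the rows selected by mask.
    yt, yp = y_true[mask], y_pred[mask]
    tpr = np.mean(yp[yt == positive_label] == positive_label)
    fpr = np.mean(yp[yt != positive_label] == positive_label)
    return tpr, fpr

def average_odds_sketch(y_true, y_pred, z, positive_label=1, privileged_group=1):
    tpr_p, fpr_p = group_rates(y_true, y_pred, z == privileged_group, positive_label)
    tpr_u, fpr_u = group_rates(y_true, y_pred, z != privileged_group, positive_label)
    difference = 0.5 * ((fpr_p - fpr_u) + (tpr_p - tpr_u))   # signed, terms can cancel
    error = 0.5 * (abs(fpr_p - fpr_u) + abs(tpr_p - tpr_u))  # non-negative
    return float(difference), float(error)

y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
z = np.array([1, 1, 1, 1, 0, 0, 0, 0])
difference, error = average_odds_sketch(y_true, y_pred, z)
```

In this example the privileged group has a higher true positive rate and a lower false positive rate, so the signed terms cancel while the error stays positive. This is why the error is the safer quantity to minimize.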

fairdo.metrics.group.disparate_impact_ratio(y: array, z: array, positive_label=1, privileged_group=1, **kwargs) float[source]#

Calculate the Disparate Impact ratio. The protected attribute z must be binary.

This function computes the ratio of the probability of a positive outcome for the unprivileged group to that for the privileged group. A value of 1 indicates fairness, a value < 1 indicates discrimination towards the unprivileged group, and a value > 1 indicates discrimination towards the privileged group.

Parameters:
  • y (np.array) – Flattened binary array, can be the prediction or the truth label.

  • z (np.array) – Flattened binary array of shape y, represents the protected attribute.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • privileged_group (int, optional) – Label considered as privileged. Default is 1.

Returns:

The Disparate Impact ratio.

Return type:

float

fairdo.metrics.group.disparate_impact_ratio_deviation(y: array, z: array, positive_label=1, privileged_group=1, **kwargs) float[source]#

Calculate the deviation of the Disparate Impact ratio from 1. The protected attribute z must be binary.

This function computes the difference between 1 and the Disparate Impact ratio. A value of 0 indicates fairness. A positive value indicates discrimination towards the unprivileged group. A negative value indicates discrimination towards the privileged group.

Parameters:
  • y (np.array) – Flattened binary array, can be the prediction or the truth label.

  • z (np.array) – Flattened binary array of shape y, represents the protected attribute.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • privileged_group (int, optional) – Label considered as privileged. Default is 1.

Returns:

The difference between 1 and the Disparate Impact ratio.

Return type:

float

fairdo.metrics.group.disparate_impact_ratio_objective(y: array, z: array, positive_label=1, privileged_group=1, **kwargs) float[source]#

Calculate the objective Disparate Impact ratio. The protected attribute z must be binary.

This function computes the absolute difference between 1 and the Disparate Impact ratio. It can be used as an objective function to minimize discrimination towards the unprivileged group (and the privileged group). Lower values indicate less discrimination.

Parameters:
  • y (np.array) – Flattened binary array, can be the prediction or the truth label.

  • z (np.array) – Flattened binary array of shape y, represents the protected attribute.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • privileged_group (int, optional) – Label considered as privileged. Default is 1.

Returns:

The objective Disparate Impact ratio.

Return type:

float
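
The three Disparate Impact functions above are closely related, as the following sketch shows. It follows the descriptions given here (ratio of unprivileged to privileged positive rates, then 1 minus the ratio, then its absolute value); the function name is hypothetical and the example assumes the privileged group has a non-zero positive rate.

```python
import numpy as np

def disparate_impact_ratio_sketch(y, z, positive_label=1, privileged_group=1):
    p_unprivileged = np.mean(y[z != privileged_group] == positive_label)
    p_privileged = np.mean(y[z == privileged_group] == positive_label)
    return float(p_unprivileged / p_privileged)

y = np.array([1, 0, 0, 0, 1, 1, 1, 0])
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])

ratio = disparate_impact_ratio_sketch(y, z)  # 0.25 / 0.75
deviation = 1.0 - ratio                      # positive: unprivileged group disadvantaged
objective = abs(deviation)                   # direction-free quantity to minimize
```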

fairdo.metrics.group.equal_opportunity_abs_diff(*args, **kwargs)[source]#

Compute the absolute difference in Equality of Opportunity [1].

[1] Equality of Opportunity in Supervised Learning (Hardt, Price, Srebro, 2016) (https://arxiv.org/abs/1610.02413)

Parameters:
  • *args (arguments) – Variable length argument list to be passed to equal_opportunity_difference function.

  • **kwargs (keyword arguments) – Arbitrary keyword arguments to be passed to equal_opportunity_difference function.

Returns:

The absolute difference in Equality of Opportunity between privileged and unprivileged groups.

Return type:

float

fairdo.metrics.group.equal_opportunity_difference(y_true: array, y_pred: array, z: array, positive_label=1, privileged_group=1, **kwargs) float[source]#

Compute the difference in Equality of Opportunity [1] between the privileged group and the unprivileged group.

Equality of Opportunity [1] is a fairness metric that measures the difference in true positive rates between the privileged and unprivileged groups. This function returns a float representing that difference. A value of 0 indicates perfect fairness, positive values indicate bias against the unprivileged group, while negative values indicate bias against the privileged group.

[1] Equality of Opportunity in Supervised Learning (Hardt, Price, Srebro, 2016) (https://arxiv.org/abs/1610.02413)

Parameters:
  • y_true (numpy.array) – The true binary labels as a flattened array.

  • y_pred (numpy.array) – The predicted binary labels from the model. Should be of the same shape as y_true.

  • z (numpy.array) – The protected attribute as a binary array. This array indicates the group (privileged or unprivileged) for each instance in the data. Should be of the same shape as y_true.

  • positive_label (int, optional (default=1)) – The label considered as positive in the dataset.

  • privileged_group (int, optional (default=1)) – The label that denotes the privileged group. If 0, the group labeled 0 is treated as the privileged group and the group labeled 1 as the unprivileged group.

Returns:

The difference in Equality of Opportunity between the privileged and unprivileged groups.

Return type:

float
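
The description above amounts to a difference of true positive rates. The sketch below follows the stated sign convention (positive values indicate bias against the unprivileged group), which implies privileged minus unprivileged; the function name is hypothetical.

```python
import numpy as np

def equal_opportunity_difference_sketch(y_true, y_pred, z,
                                        positive_label=1, privileged_group=1):
    positives = y_true == positive_label  # only truly positive instances count
    privileged = z == privileged_group
    tpr_privileged = np.mean(y_pred[positives & privileged] == positive_label)
    tpr_unprivileged = np.mean(y_pred[positives & ~privileged] == positive_label)
    return float(tpr_privileged - tpr_unprivileged)

y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
z = np.array([1, 1, 1, 1, 0, 0, 0, 0])

eo_diff = equal_opportunity_difference_sketch(y_true, y_pred, z)
eo_abs_diff = abs(eo_diff)  # what an absolute-difference variant would report
```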

fairdo.metrics.group.mean_difference(*args, **kwargs) float[source]#

Alias for the statistical_parity_difference function.

Parameters:
  • y (np.array) – Flattened binary array, can be the prediction or the truth label.

  • z (np.array) – Flattened binary array of shape y, represents the protected attribute.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • privileged_group (int, optional) – Label considered as privileged. Default is 1.

Returns:

The difference in statistical parity between unprivileged and privileged groups.

Return type:

float

fairdo.metrics.group.predictive_equality_abs_diff(*args, **kwargs)[source]#

Compute the absolute difference in Predictive Equality.

Parameters:
  • *args (arguments) – Variable length argument list to be passed to predictive_equality_difference function.

  • **kwargs (keyword arguments) – Arbitrary keyword arguments to be passed to predictive_equality_difference function.

Returns:

The absolute difference in Predictive Equality between privileged and unprivileged groups.

Return type:

float

fairdo.metrics.group.predictive_equality_difference(y_true: array, y_pred: array, z: array, positive_label=1, privileged_group=1, **kwargs) float[source]#

Calculate the difference in Predictive Equality.

Parameters:
  • y_true (numpy.array) – True binary labels as a flattened array.

  • y_pred (numpy.array) – Predicted binary labels as a flattened array. Must have same shape as y_true.

  • z (numpy.array) – Binary array denoting privileged (1) or unprivileged (0) group. Same shape as y_true.

  • positive_label (int, optional) – Label considered as positive, default is 1.

  • privileged_group (int, optional) – Label representing the privileged group, default is 1.

Returns:

The difference in Predictive Equality between privileged and unprivileged groups.

Return type:

float

fairdo.metrics.group.statistical_parity_abs_diff(y: ~numpy.array, z: ~numpy.array, agg_group=<function sum>, **kwargs) float[source]#

Calculate the absolute value of the statistical parity difference between all groups inside a protected attribute. The protected attribute z can be binary or non-binary. Returned value is aggregated with agg_group.

Parameters:
  • y (np.array) – Flattened binary array, can be the prediction or the truth label.

  • z (np.array) – Flattened array of shape y, represents the protected attribute. Can represent non-binary protected attribute.

  • agg_group (callable, optional) – Aggregation function for the group. Default is np.sum.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • privileged_group (int, optional) – Label considered as privileged. Default is 1.

Returns:

The absolute value of the statistical parity difference.

Return type:

float
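
For a non-binary protected attribute, one way to read "between all groups" is to compare the positive rate of every pair of groups and aggregate the absolute differences. The sketch below assumes this pairwise reading; the function name is hypothetical and the library may enumerate pairs differently.

```python
from itertools import combinations

import numpy as np

def statistical_parity_abs_diff_sketch(y, z, agg_group=np.sum, positive_label=1):
    # Positive rate per group, then absolute difference for every unordered pair.
    rates = [np.mean(y[z == g] == positive_label) for g in np.unique(z)]
    diffs = [abs(a - b) for a, b in combinations(rates, 2)]
    return float(agg_group(diffs))

y = np.array([0, 0, 0, 1, 1, 1])
z = np.array([0, 0, 1, 1, 2, 2])  # three groups with positive rates 0, 0.5 and 1

total = statistical_parity_abs_diff_sketch(y, z)                    # sum over pairs
worst = statistical_parity_abs_diff_sketch(y, z, agg_group=np.max)  # largest gap
mean = statistical_parity_abs_diff_sketch(y, z, agg_group=np.mean)  # average gap
```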

fairdo.metrics.group.statistical_parity_abs_diff_intersectionality(y: ~numpy.array, z: ~numpy.array, agg_group=<function amax>, **kwargs) float[source]#

Calculate the absolute difference in statistical parity for multiple non-binary protected attributes. Intersections from all protected attributes are considered. Protected attributes z[:, i] can be binary or non-binary.

Parameters:
  • y (np.array) – Flattened binary array of shape (n_samples,), can be the prediction or the truth label.

  • z (np.array) – Array of shape (n_samples, n_protected_attributes) representing the protected attributes.

  • agg_group (callable, optional) – Aggregation function for the group. Default is np.max.

  • **kwargs (dict) – Additional keyword arguments.
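
Considering intersections means that each unique combination of attribute values forms its own subgroup. The sketch below illustrates this under the assumption that subgroups are compared pairwise; the function name is hypothetical.

```python
from itertools import combinations

import numpy as np

def intersectional_abs_diff_sketch(y, z, agg_group=np.max, positive_label=1):
    # Each unique row of z (one value per attribute) identifies a subgroup.
    _, subgroup = np.unique(z, axis=0, return_inverse=True)
    subgroup = subgroup.ravel()
    rates = [np.mean(y[subgroup == g] == positive_label) for g in np.unique(subgroup)]
    diffs = [abs(a - b) for a, b in combinations(rates, 2)]
    return float(agg_group(diffs))

z = np.array([[0, 0], [0, 0], [0, 1], [0, 1], [1, 0], [1, 0], [1, 1], [1, 1]])
y = np.array([1, 1, 0, 0, 0, 0, 1, 1])

gap = intersectional_abs_diff_sketch(y, z)
```

In this example each attribute on its own shows a positive rate of 0.5 in both of its groups, yet the four intersectional subgroups have rates of 1, 0, 0 and 1. Checking attributes one at a time would miss this disparity.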

fairdo.metrics.group.statistical_parity_abs_diff_max(y: array, z: array, **kwargs) float[source]#

Calculate the maximum of statistical parity absolute differences between all groups in a protected attribute. The protected attribute z can be binary or non-binary.

Parameters:
  • y (np.array) – Flattened binary array, can be the prediction or the truth label.

  • z (np.array) – Flattened array of shape y, represents the protected attribute. Can represent non-binary protected attribute.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • privileged_group (int, optional) – Label considered as privileged. Default is 1.

Returns:

Maximum of the absolute values of the statistical parity differences between all groups.

Return type:

float

fairdo.metrics.group.statistical_parity_abs_diff_mean(y: array, z: array, **kwargs) float[source]#

Calculate the mean of the absolute statistical parity differences between all groups in a protected attribute. The protected attribute z can be binary or non-binary.

Parameters:
  • y (np.array) – Flattened binary array, can be the prediction or the truth label.

  • z (np.array) – Flattened array of shape y, represents the protected attribute. Can represent non-binary protected attribute.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • privileged_group (int, optional) – Label considered as privileged. Default is 1.

Returns:

Average of the absolute value of the statistical parity differences between all groups.

Return type:

float

fairdo.metrics.group.statistical_parity_abs_diff_multi(y: ~numpy.array, z: ~numpy.array, agg_attribute=<function amax>, agg_group=<function amax>, positive_label=1, **kwargs) float[source]#

Calculate the absolute difference in statistical parity for multiple non-binary protected attributes. Protected attributes z[:, i] can be binary or non-binary.

Parameters:
  • y (np.array) – Flattened binary array of shape (n_samples,), can be the prediction or the truth label.

  • z (np.array) – Array of shape (n_samples, n_protected_attributes) representing the protected attributes.

  • agg_attribute (callable, optional) – Aggregation function for the attribute. Default is np.max.

  • agg_group (callable, optional) – Aggregation function for the group. Default is np.max.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

Returns:

Aggregated attribute disparity.

Return type:

float

fairdo.metrics.group.statistical_parity_abs_diff_sum(y: array, z: array, **kwargs) float[source]#

Calculate the sum of statistical parity absolute differences between all groups in a protected attribute. The protected attribute z can be binary or non-binary.

Parameters:
  • y (np.array) – Flattened binary array, can be the prediction or the truth label.

  • z (np.array) – Flattened array of shape y, represents the protected attribute. Can represent non-binary protected attribute.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • privileged_group (int, optional) – Label considered as privileged. Default is 1.

Returns:

Sum of the absolute values of the statistical parity differences between all groups.

Return type:

float

fairdo.metrics.group.statistical_parity_difference(y: array, z: array, positive_label=1, privileged_group=1, **kwargs) float[source]#

Calculate the difference in statistical parity according to [1]. The protected attribute z must be binary. Returned value can be negative.

[1] A Maximal Correlation Framework for Fair Machine Learning (Lee et al. 2022) (https://arxiv.org/abs/2106.00051)

Parameters:
  • y (np.array) – Flattened binary array, can be the prediction or the truth label.

  • z (np.array) – Flattened binary array of shape y, represents the protected attribute.

  • positive_label (int, optional) – Label considered as positive. Default is 1.

  • privileged_group (int, optional) – Label considered as privileged. Default is 1.

Returns:

The difference in statistical parity between unprivileged and privileged groups.

Return type:

float

fairdo.metrics.individual module#

fairdo.metrics.individual.consistency_score(x: array, y: array, n_neighbors=5, **kwargs) float[source]#

Compute the Consistency Score as defined in Learning Fair Representations (Zemel et al. 2013).

This score measures the consistency of the output y with respect to nearest neighbors in x. A higher score indicates more fairness.

Parameters:
  • x (np.array) – Array representing the input data.

  • y (np.array) – Array of the same length as x, representing the output data.

  • n_neighbors (int, optional) – Number of neighbors to consider. Default is 5.

  • **kwargs – Additional keyword arguments. These are not currently used.

Returns:

The consistency score. Higher values indicate more fairness.

Return type:

float
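
The consistency score of Zemel et al. compares each output with the mean output of its nearest neighbors: \(1 - \frac{1}{n} \sum_i |y_i - \frac{1}{k} \sum_{j \in kNN(x_i)} y_j|\). The sketch below uses brute-force Euclidean distances and excludes each point from its own neighborhood; both choices, and the function name, are assumptions of this example rather than documented behavior.

```python
import numpy as np

def consistency_score_sketch(x, y, n_neighbors=5):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise Euclidean distances between all rows of x.
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :n_neighbors]
    neighbor_mean = y[neighbors].mean(axis=1)
    return float(1.0 - np.mean(np.abs(y - neighbor_mean)))

# Two well-separated clusters of three points each.
x = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
consistent = consistency_score_sketch(x, np.array([0, 0, 0, 1, 1, 1]), n_neighbors=2)
inconsistent = consistency_score_sketch(x, np.array([0, 1, 0, 1, 1, 1]), n_neighbors=2)
```

The first labeling treats similar points identically and reaches the maximum score; flipping one label inside a cluster lowers it.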

fairdo.metrics.individual.consistency_score_objective(x: array, y: array, n_neighbors=5, **kwargs) float[source]#

Compute the inverse of the Consistency Score to use as an objective function.

This function is intended to be minimized. Lower values indicate more individual fairness.

Parameters:
  • x (np.array) – Array representing the input data.

  • y (np.array) – Array of the same length as x, representing the output data.

  • n_neighbors (int, optional) – Number of neighbors to consider. Default is 5.

  • **kwargs – Additional keyword arguments. These are not currently used.

Returns:

The inverse of the consistency score. Lower values indicate more fairness.

Return type:

float

fairdo.metrics.penalty module#

Penalty Functions for Constrained Optimization#

This module provides penalty functions specifically designed for fairness optimization with constraints. The constraint in this context is that the number of data points after pre-processing should match a specified value. A practical penalty function is relative_shortfall_penalty, which is designed to handle situations where the number of data points is less than this specified value, and in such cases, penalties are applied to the solutions.

fairdo.metrics.penalty.data_loss(y: array, dims: int, **kwargs) float[source]#

Calculate the relative amount of data lost after pre-processing.

Parameters:
  • y (np.array) – Labels of the data. The size of it depicts the current size of the data.

  • dims (int) – The size of the original data.
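
One plausible reading of "relative amount of data lost" is the fraction of the original samples that is no longer present. The sketch below assumes this reading; data_loss_sketch is a hypothetical name.

```python
import numpy as np

def data_loss_sketch(y, dims):
    # Fraction of the original dims samples missing from the current data.
    return 1.0 - len(y) / dims

y = np.ones(75)                      # 75 samples remain after pre-processing
loss = data_loss_sketch(y, dims=100)  # a quarter of the data was removed
```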

fairdo.metrics.penalty.group_missing_penalty(z: array, n_groups: array, agg_attribute='max', agg_group='max', eps=0.01, **kwargs) float[source]#

Calculate the penalty for missing groups in a protected attribute. The number of groups n_groups is used to calculate the penalty.

If agg_group is ‘max’, the penalty is 1 if any group is missing, otherwise 0. If agg_group is ‘sum’, the penalty is the sum of the penalties for each group. agg_attribute is used to aggregate the penalties for each protected attribute.

Parameters:
  • z (np.array) –

    Array of shape (n_samples, n_protected_attributes) representing multiple protected attributes, or of shape (n_samples,) representing a single protected attribute.

    Each protected attribute can consist of more than two groups.

  • n_groups (np.array or int) – Number of groups for each protected attribute.

  • agg_group (str, optional) – Aggregation function for the group. Default is ‘max’.

  • agg_attribute (str, optional) – Aggregation function for the attribute. Default is ‘max’.

  • eps (float, optional) – Small value to add to the penalty. Default is 0.01. It makes the penalty strictly larger than the maximum possible discrimination, so that missing a group is always worse than having a group with a large discrimination.

Returns:

The penalty for missing groups.

Return type:

float
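
The counting logic described above can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the function name is hypothetical, only ‘max’ and ‘sum’ are handled, and eps is left out because its exact role in the computation is not specified here.

```python
import numpy as np

AGGREGATORS = {"max": np.max, "sum": np.sum}

def group_missing_penalty_sketch(z, n_groups, agg_attribute="max", agg_group="max"):
    z = np.asarray(z)
    if z.ndim == 1:
        z = z.reshape(-1, 1)  # a single protected attribute becomes one column
    n_groups = np.atleast_1d(n_groups)
    penalties = []
    for i in range(z.shape[1]):
        missing = int(n_groups[i]) - len(np.unique(z[:, i]))
        # 'max': 1 if any group is missing; 'sum': one unit per missing group.
        penalties.append(min(missing, 1) if agg_group == "max" else missing)
    return float(AGGREGATORS[agg_attribute](penalties))

z = np.array([[0, 0], [1, 0], [2, 0]])  # second attribute shows only one group
any_missing = group_missing_penalty_sketch(z, n_groups=[3, 2])
count_missing = group_missing_penalty_sketch(z, n_groups=[3, 4], agg_group="sum")
```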