When three or more groups of samples are compared (e.g. Using ANOVA/Tukey HSD or Kruskal-Wallis/Dunn), you often

see results shown as a boxplot, with lines highlighting which groups are significantly different. In Python there is

no single package to do this quickly, though by combining scikit-posthocs with statannotations similar plots can

be generated with relative ease. Here wego over the code involved step by step.

Statistical tests comparing three or more groups are typically done in two steps. The first test will check if there

are *any* statistical difference between the groups, the second test will then tell you *which* groups are different.

The second test is referred to as a post hoc test. While there are many combinations of tests to choose from,

common combinations are Kruskal-Wallis followed by a post hoc Dunn test (non-parametric) and ANOVA

with Tukey Honest Significant Differences (parametric). Though note that the code below can easily be adapted for

other tests.

All code from this post can be found on GitHub and Binder.

## The data

The iris dataset, with measurements of petal and sepals of three species of flowers will be a nice way to

test some of these test. The code below will load all required libraries and load the iris data which is included in

scikit-learn.

```
%load_ext nb_black
from sklearn.datasets import load_iris
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
iris_obj = load_iris()
iris_df = pd.DataFrame(iris_obj.data, columns=iris_obj.feature_names)
iris_df["species"] = [iris_obj.target_names[s] for s in iris_obj.target]
iris_df.head()
```

聽 | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
---|---|---|---|---|---|

0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |

1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |

2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |

3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |

4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |

For the first test (Kruskal-Wallis and ANOVA) we鈥檒l use the implementations from SciPy and these require for each

group a list of values to be passed as a function parameter. The easiest way to do so is with the code below, a list of

lists is created with for each species the sepal lengths. This can then be unpacked to parameters using the asterisk (*).

```
species = np.unique(iris_df.species)
data = []
for s in species:
data.append(iris_df[iris_df.species == s]["sepal length (cm)"])
```

## Kruskal-Wallis with Dunn

The Kruskal-Wallis test is included in SciPy and can easily be applied to our data after it was properly structured

in the previous step.

```
from scipy import stats
stats.kruskal(*data)
```

This will return a statistic (96.04) and a p-value (8.9e-22) so there is a significant difference between these species.

However, this test will not show us between which species there are difference. To know this we need to apply a post hoc

Dunn鈥檚 test, often used in combination with Kruskal-Wallis. The `posthoc_dunn()`

function included in

scikit-posthocs can be used. (Note the difference in syntax with SciPy鈥檚 test)

```
from scikit_posthocs import posthoc_dunn
# posthoc dunn test, with correction for multiple testing
dunn_df = posthoc_dunn(
iris_df, val_col="sepal length (cm)", group_col="species", p_adjust="fdr_bh"
)
dunn_df
```

This will return a matrix with all pairwise combinations of species and the p-value of the test (with correction

applied if `p_adjust`

is set to a valid method).

setosa | versicolor | virginica | |
---|---|---|---|

setosa | 1.000000e+00 | 1.529257e-09 | 6.000296e-22 |

versicolor | 1.529257e-09 | 1.000000e+00 | 2.774866e-04 |

virginica | 6.000296e-22 | 2.774866e-04 | 1.000000e+00 |

So in this case each species is significantly different from the other two. We鈥檒l have a look at ANOVA and Tukey HSD

first before going into details how to visualize these results better.

## ANOVA with Tukey HSD

Similarly to the previous example, we can run an ANOVA. First we run the `f_oneway()`

, which is the function in

SciPy to do an ANOVA, and finish with `posthoc_tukey()`

from scikit-posthocs.

The first test gives us a significant p-value (1.67e-31), so it is good to continue with the Tukey test.

```
from scikit_posthocs import posthoc_tukey
# First we do a oneway ANOVA as implemented in SciPy
print(stats.f_oneway(*data))
tukey_df = posthoc_tukey(iris_df, val_col="sepal length (cm)", group_col="species")
tukey_df
```

This gives us the finale table with all comparisons and p-values from those tests.

setosa | versicolor | virginica | |
---|---|---|---|

setosa | 1.000 | 0.001 | 0.001 |

versicolor | 0.001 | 1.000 | 0.001 |

virginica | 0.001 | 0.001 | 1.000 |

## Visualizing the results

These matrices are hard to interpret, and most will prefer a simple visualization to highlight significant differences.

While showing the actual data using seaborn is easy, adding in annotated lines with the p-values isn鈥檛. This is where

statannotations comes in, this package allows you to add those in with a few lines of code. While the package comes

with its own suite of statistical tests, post hoc tests unfortunately aren鈥檛 currently included. So here is how to do

this.

First, the matrix needs to be converted to a non-redundant list of comparisons with the p-value. This is done by

removing the lower half and diagonal of the matrix and turning the matrix format into a long dataframe using

`melt()`

. The code and resulting dataframe are shown below.

```
remove = np.tril(np.ones(tukey_df.shape), k=0).astype("bool")
tukey_df[remove] = np.nan
molten_df = tukey_df.melt(ignore_index=False).reset_index().dropna()
molten_df
```

index | variable | value | |
---|---|---|---|

3 | setosa | versicolor | 0.001 |

6 | setosa | virginica | 0.001 |

7 | versicolor | virginica | 0.001 |

Next, we鈥檒l have to draw the main plot using seaborn `boxplot()`

function and convert our dataframe into

a list of pairs and list of matching p-values for statannotations. The code below is a little cryptic due to the use

of list comprehensions and `iterrows()`

, though in a nutshell it will go over each row and create a tuple with the

species that are being compared. Then p-values are converted to a list using the same functions.

The list of pairs is passed to an `Annotator`

object along with the data. By calling

`configure()`

the plot is set up as we would like. Finally p-values are added using `set_pvalues_and_annotate()`

which will also add the annotations to the plot.

```
import seaborn as sns
from statannotations.Annotator import Annotator
ax = sns.boxplot(data=iris_df, x="species", y="sepal length (cm)", order=species)
pairs = [(i[1]["index"], i[1]["variable"]) for i in molten_df.iterrows()]
p_values = [i[1]["value"] for i in molten_df.iterrows()]
annotator = Annotator(
ax, pairs, data=iris_df, x="species", y="sepal length (cm)", order=species
)
annotator.configure(text_format="star", loc="inside")
annotator.set_pvalues_and_annotate(p_values)
plt.tight_layout()
```

## Conclusion

While it is a shame there is no package out (yet) that makes these stats and visualization a one-liner (like the

R package ggpubr), with these bits of code it is easy enough to do this ourselves.

## 0 Comments