I get a strange error when running the Tukey HSD test. I hope somebody can help me with this, as I have already tried a lot. This is my dataframe:
Name Score
1 A 2.29
2 B 2.19
This is my Tukey Test code:
#TUKEY HSD TEST
tukey = pairwise_tukeyhsd(endog=df['Score'].astype('float'),
                          groups=df['Name'],
                          alpha=0.05)
tukey.plot_simultaneous()
plt.vlines(x=49.57,ymin=-0.5,ymax=4.5, color="red")
tukey.summary()
This is the error:
<ipython-input-12-3e12e78a002f> in <module>()
2 tukey = pairwise_tukeyhsd(endog=df['Score'].astype('float'),
3 groups=df['Name'],
----> 4 alpha=0.05)
5
6 tukey.plot_simultaneous()
/usr/local/lib/python3.6/dist-packages/statsmodels/stats/multicomp.py in pairwise_tukeyhsd(endog, groups, alpha)
36 '''
37
---> 38 return MultiComparison(endog, groups).tukeyhsd(alpha=alpha)
/usr/local/lib/python3.6/dist-packages/statsmodels/sandbox/stats/multicomp.py in __init__(self, data, groups, group_order)
794 if group_order is None:
795 self.groupsunique, self.groupintlab = np.unique(groups,
--> 796 return_inverse=True)
797 else:
798 #check if group_order has any names not in groups
/usr/local/lib/python3.6/dist-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
221 ar = np.asanyarray(ar)
222 if axis is None:
--> 223 return _unique1d(ar, return_index, return_inverse, return_counts)
224 if not (-ar.ndim <= axis < ar.ndim):
225 raise ValueError('Invalid axis kwarg specified for unique')
/usr/local/lib/python3.6/dist-packages/numpy/lib/arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
278
279 if optional_indices:
--> 280 perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
281 aux = ar[perm]
282 else:
**TypeError: '<' not supported between instances of 'float' and 'str'**
How can this error be resolved? Thanks in advance!
You get this error because df['Name'] contains both floats and strings and is a pandas.core.series.Series. NumPy turns such a Series into an object array, and numpy.unique() then has to sort it by comparing a float with a str, which fails, as the traceback shows. You can fix the problem in two ways.
tukey = pairwise_tukeyhsd(endog=df['Score'].astype('float'),
                          groups=list(df['Name']),  # list instead of a Series
                          alpha=0.05)
OR
Make sure df['Name'] contains only numbers or only strings.
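For the second option, here is a rough sketch of how you could check what is in the column and clean it before calling pairwise_tukeyhsd. The data below is made up, and it assumes the stray floats are missing values (NaN), which is a common cause but is not confirmed by the question. The names clean and filled, and the "unknown" label, are just placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing name, which pandas stores as a float NaN.
df = pd.DataFrame({
    "Name": ["A", "B", np.nan, "A", "B"],
    "Score": [2.29, 2.19, 2.05, 2.31, 2.24],
})

# Count the Python types found in the column. More than one type
# means the group labels are mixed.
type_counts = df["Name"].map(type).value_counts()
print(type_counts)

# Choice 1: drop the rows that have no group label.
clean = df.dropna(subset=["Name"])

# Choice 2: keep every row and give missing labels a placeholder group.
filled = df["Name"].fillna("unknown")

# Both cleaned columns now hold only strings, so np.unique can sort them.
print(np.unique(clean["Name"]))
print(np.unique(filled))
```

You would then pass clean['Score'] and clean['Name'] (or df['Score'] and filled) as endog and groups. Dropping the rows is usually the better choice if a missing name really means the group is unknown, since otherwise the placeholder is treated as a group of its own in the test.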