Slicing an entire pandas DataFrame instead of a Series changes the data type and assigns NaN to the first field. What is happening?

Your second solution raises an error if the DataFrame has both numeric and string columns:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2.0, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb'),
})

print (df[df > 5])

TypeError: '>' not supported between instances of 'str' and 'int'
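
To see where the error comes from, it can help to compare column by column. A small sketch, assuming a reduced frame with only the A and B data from above: the numeric column compares fine, while the string (object dtype) column raises the TypeError.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'), 'B': [4, 5, 4, 5, 5, 4]})

# The numeric column compares fine and returns a boolean Series.
numeric_result = df['B'] > 4
print(numeric_result.tolist())  # [False, True, False, True, True, False]

# The string column cannot be ordered against an int, so this raises.
raised = False
try:
    df['A'] > 4
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```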

If you compare only the numeric columns, values greater than 4 are kept and all other values are converted to missing values (NaN):

df1 = df.select_dtypes(np.number)
print (df1[df1 > 4])
     B    C    D    E
0  NaN  7.0  NaN  5.0
1  5.0  8.0  NaN  NaN
2  NaN  9.0  5.0  6.0
3  5.0  NaN  7.0  9.0
4  5.0  NaN  NaN  NaN
5  NaN  NaN  NaN  NaN
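
Indexing a DataFrame with a boolean DataFrame masks cells instead of selecting rows; pandas uses DataFrame.where under the hood for this, so cells where the condition is True are kept and all others become NaN, and the shape does not change. A minimal check on the sample data (the names masked and via_where are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2.0, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb'),
})
df1 = df.select_dtypes(np.number)

# Boolean DataFrame indexing: cell-wise masking, no rows are dropped.
masked = df1[df1 > 4]
# The explicit equivalent: keep values where True, NaN elsewhere.
via_where = df1.where(df1 > 4)

print(masked.equals(via_where))  # True
print(masked.shape)              # (6, 4), same as df1
```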

Here at least one value in each column is replaced, so the integer columns are converted to floats (because NaN is a float):

print (df1[df1 > 4].dtypes)
B    float64
C    float64
D    float64
E    float64
dtype: object
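
If keeping integer dtypes matters, one option (not part of the original answer) is pandas' nullable integer dtype 'Int64', which stores missing values as <NA> instead of forcing float64. A sketch, assuming df1 holds the same four numeric columns as above:

```python
import pandas as pd

df1 = pd.DataFrame({
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2.0, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
})

# NumPy int64 cannot hold NaN, so the masked columns become float64.
print(df1[df1 > 4].dtypes)

# Assumption: converting to the nullable 'Int64' dtype first lets the
# masked cells be stored as <NA> while the columns stay integer.
nullable = df1.astype('Int64')
out = nullable[df1 > 4]
print(out.dtypes)
```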

If you need to keep rows where at least one numeric column matches the condition, use DataFrame.any to test whether at least one value in each row is True:

# returns a boolean DataFrame
print ((df1 > 7))
       B      C      D      E
0  False  False  False  False
1  False   True  False  False
2  False   True  False  False
3  False  False  False   True
4  False  False  False  False
5  False  False  False  False

print ((df1 > 7).any(axis=1))
0    False
1     True
2     True
3     True
4    False
5    False
dtype: bool


print (df1[(df1 > 7).any(axis=1)])
   B    C  D  E
1  5  8.0  3  3
2  4  9.0  5  6
3  5  4.0  7  9
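
This is the key difference from the question in the title: a boolean Series passed with [] selects whole rows, so no cell is replaced by NaN and the integer columns keep their dtype (C was already float because of the 2.0). A quick check on the same numeric data (row_mask and rows are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2.0, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
})

# One boolean per row, so this is a Series, not a DataFrame.
row_mask = (df1 > 7).any(axis=1)
# Selecting rows drops non-matching rows instead of masking cells.
rows = df1[row_mask]

print(rows.index.tolist())  # [1, 2, 3]
print(rows.dtypes)          # B, D, E stay integer; C stays float
```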

Or, if you need to filter the original DataFrame with all its columns, build the condition from the numeric columns only with DataFrame.select_dtypes:

print (df[(df.select_dtypes(np.number) > 7).any(axis=1)])
   A  B    C  D  E  F
1  b  5  8.0  3  3  a
2  c  4  9.0  5  6  a
3  d  5  4.0  7  9  b
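
If this filter is needed repeatedly, it can be wrapped in a small function. The name rows_with_any_numeric_above is hypothetical; the sketch just combines select_dtypes and any(axis=1) as above and keeps all original columns in the result:

```python
import numpy as np
import pandas as pd


def rows_with_any_numeric_above(frame: pd.DataFrame, threshold) -> pd.DataFrame:
    """Hypothetical helper: keep rows where any numeric column exceeds threshold.

    String columns are ignored for the test but kept in the returned rows.
    """
    numeric = frame.select_dtypes(np.number)
    mask = (numeric > threshold).any(axis=1)
    return frame[mask]


df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2.0, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb'),
})

out = rows_with_any_numeric_above(df, 7)
print(out)
```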
