Your second solution raises an error if the DataFrame mixes numeric and string columns:
import numpy as np
import pandas as pd

df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2.0,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
print (df[df > 5])
TypeError: '>' not supported between instances of 'str' and 'int'
If you compare only the numeric columns, values greater than 4 are kept
and all other values are converted to missing values (NaN):
df1 = df.select_dtypes(np.number)
print (df1[df1 > 4])
     B    C    D    E
0  NaN  7.0  NaN  5.0
1  5.0  8.0  NaN  NaN
2  NaN  9.0  5.0  6.0
3  5.0  NaN  7.0  9.0
4  5.0  NaN  NaN  NaN
5  NaN  NaN  NaN  NaN
Here at least one value is replaced in each column, so the integer columns are converted to floats (because NaN is a float):
print (df1[df1 > 4].dtypes)
B float64
C float64
D float64
E float64
dtype: object
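The upcast to float can be reproduced on a small Series. The sketch below is only an illustration: the names `s`, `masked`, `nullable`, and `kept` are made up for it, and the second half assumes a pandas version that ships the nullable `Int64` extension dtype, which may be an option if you want to keep integers next to missing values.

```python
import pandas as pd

s = pd.Series([4, 5, 4], name="B")   # plain int64 column

# Masking fills the non-matching positions with NaN, which forces float64
masked = s.where(s > 4)
print(masked.dtype)

# Assumption: the nullable Int64 dtype is available in your pandas version;
# it marks missing entries as <NA> instead of NaN
nullable = s.astype("Int64")
kept = nullable.where(nullable > 4)
print(kept.dtype)
```

Whether the nullable dtype is appropriate depends on what the rest of your code expects, since `<NA>` does not behave exactly like `NaN` in every operation.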
If you need to keep the rows where at least one numeric column matches the condition, use DataFrame.any to test whether at least one value in each row is True:
# the comparison returns a boolean DataFrame
print ((df1 > 7))
       B      C      D      E
0  False  False  False  False
1  False   True  False  False
2  False   True  False  False
3  False  False  False   True
4  False  False  False  False
5  False  False  False  False
print ((df1 > 7).any(axis=1))
0 False
1 True
2 True
3 True
4 False
5 False
dtype: bool
print (df1[(df1 > 7).any(axis=1)])
   B    C  D  E
1  5  8.0  3  3
2  4  9.0  5  6
3  5  4.0  7  9
Or, if you need to filter the original DataFrame and keep all of its columns, build the mask from only the numeric columns selected by DataFrame.select_dtypes:
print (df[(df.select_dtypes(np.number) > 7).any(axis=1)])
   A  B    C  D  E  F
1  b  5  8.0  3  3  a
2  c  4  9.0  5  6  a
3  d  5  4.0  7  9  b
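The row filter above could be wrapped in a small reusable function. This is a minimal sketch, not part of pandas: `rows_with_any_above` and its `threshold` parameter are hypothetical names, and the sample frame simply repeats the one used in this answer.

```python
import numpy as np
import pandas as pd

def rows_with_any_above(frame, threshold):
    """Return rows where at least one numeric column exceeds threshold.

    Hypothetical helper for illustration only.
    """
    # Compare only the numeric columns so string columns cannot raise TypeError
    numeric = frame.select_dtypes(np.number)
    # One boolean per row: True if any numeric value is above the threshold
    mask = (numeric > threshold).any(axis=1)
    # Index the original frame, so the non-numeric columns are kept
    return frame[mask]

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2.0, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb'),
})

result = rows_with_any_above(df, 7)
print(result)
```

Because whole rows are selected rather than individual cells, no NaN is introduced and the integer columns keep their integer dtype.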