Try adding:
Encoded_Dataframe.columns = [''.join(col) for col in Encoded_Dataframe.columns]
Your columns in Encoded_Dataframe are a MultiIndex. MyEncoder.categories_ is a list holding a single array, and assigning a list of arrays to .columns makes pandas build a MultiIndex from it. If you run Encoded_Dataframe.columns you will get MultiIndex([('A',),('B',),('C',)],), where each column name is a one-element tuple rather than a string. Joining the levels of each tuple with an empty string, as in the line above, gives you one plain string per column name instead of a tuple.
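To make the point about tuples concrete, here is a minimal sketch that reproduces the MultiIndex without scikit-learn. The `categories` list and the `demo` frame are stand-ins assumed for illustration; `categories` mimics the shape of MyEncoder.categories_ (a list holding one array).

```python
import numpy as np
import pandas as pd

# Stand-in for MyEncoder.categories_: a list holding a single array of labels
# (an assumption, so the example runs without scikit-learn).
categories = [np.array(["A", "B", "C"], dtype=object)]

demo = pd.DataFrame(np.eye(3))
demo.columns = list(categories)   # a list of arrays becomes a MultiIndex

is_multi = isinstance(demo.columns, pd.MultiIndex)
tuple_names = list(demo.columns)  # each name is a one-element tuple

# Join the levels of each tuple into one plain string per column
demo.columns = ["".join(col) for col in demo.columns]
flat_names = list(demo.columns)

print(is_multi)
print(tuple_names)
print(flat_names)
```

After the join, the column names are ordinary strings, so they line up with the single-level columns of Strings.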
Also, you could use Strings = Strings.join(Encoded_Dataframe) as an alternative to Strings = pd.concat([Strings,Encoded_Dataframe],axis=1).
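As a rough check that the two spellings agree, the sketch below builds small frames by hand (the values are made up for illustration and are not the encoder's output) and compares the results. Both calls align rows on the index. Note that join raises an error if the two frames share a column name and no suffix is given, which is not the case here.

```python
import pandas as pd

# Small hand-made frames sharing the same default integer index
Strings = pd.DataFrame({"Alphabet": ["A", "B", "C"]})
Encoded_Dataframe = pd.DataFrame(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    columns=["A", "B", "C"],
)

joined = Strings.join(Encoded_Dataframe)                        # index-based join
concatenated = pd.concat([Strings, Encoded_Dataframe], axis=1)  # column-wise concat

print(joined.equals(concatenated))
```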
Full Code:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder

Names = pd.Series(["A","B","C","A","B","B","A","C","C","B","A","C"], name="Alphabet")
Strings = pd.DataFrame(Names)
print(Strings.head(12))

# On scikit-learn >= 1.2 this argument is named sparse_output; sparse was removed in 1.4
MyEncoder = OneHotEncoder(sparse=False)
encoded = MyEncoder.fit_transform(Strings[["Alphabet"]])
print(encoded)
print(MyEncoder.categories_)

Encoded_Dataframe = pd.DataFrame(encoded)
# categories_ is a list holding one array, so this assignment creates a MultiIndex
Encoded_Dataframe.columns = list(MyEncoder.categories_)
print(Encoded_Dataframe.head())

# Flatten each one-element tuple into a plain string column name
Encoded_Dataframe.columns = [''.join(col) for col in Encoded_Dataframe.columns]
Strings = pd.concat([Strings, Encoded_Dataframe], axis=1)
print(Strings.head())
Output:
   Alphabet
0         A
1         B
2         C
3         A
4         B
5         B
6         A
7         C
8         C
9         B
10        A
11        C
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]
[array(['A', 'B', 'C'], dtype=object)]
     A    B    C
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0
3  1.0  0.0  0.0
4  0.0  1.0  0.0
  Alphabet    A    B    C
0        A  1.0  0.0  0.0
1        B  0.0  1.0  0.0
2        C  0.0  0.0  1.0
3        A  1.0  0.0  0.0
4        B  0.0  1.0  0.0
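As a variation, the MultiIndex can be avoided in the first place by passing the single array MyEncoder.categories_[0] as the column labels instead of the whole list. This is a sketch, not the original approach, and it assumes scikit-learn is installed. It calls .toarray() on the default sparse result so it does not depend on whether your version spells the argument sparse or sparse_output. The shortened input and the name Result are chosen here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

Strings = pd.DataFrame({"Alphabet": ["A", "B", "C", "A", "B"]})

MyEncoder = OneHotEncoder()  # default output is a SciPy sparse matrix
encoded = MyEncoder.fit_transform(Strings[["Alphabet"]]).toarray()

# categories_[0] is the single 1-D array of labels, so the columns form a flat Index
Encoded_Dataframe = pd.DataFrame(encoded, columns=MyEncoder.categories_[0])

Result = Strings.join(Encoded_Dataframe)
print(Result.head())
```

With flat column names from the start, no renaming step is needed before the join.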