Concatenation of two dataframes after one-hot encoding

Try adding:

Encoded_Dataframe.columns = [''.join(col) for col in Encoded_Dataframe.columns]

The columns of Encoded_Dataframe are a MultiIndex. MyEncoder.categories_ is a list of arrays (one array per encoded input column), so assigning list(MyEncoder.categories_) to .columns makes pandas build a one-level MultiIndex. If you inspect Encoded_Dataframe.columns, you will see MultiIndex([('A',),('B',),('C',)],): each column name is a one-element tuple rather than a string. The line above joins the levels of each tuple with an empty string, so every column name becomes a single plain string.
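A minimal sketch of the behaviour described above, without scikit-learn: the list below stands in for what OneHotEncoder.categories_ holds for a single input column (an assumption made only to keep the example self-contained), and the frame's values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for OneHotEncoder.categories_ with one input column:
# a list containing a single NumPy array of category labels.
categories = [np.array(["A", "B", "C"], dtype=object)]

df = pd.DataFrame([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# Assigning a list of arrays makes pandas build a one-level MultiIndex.
df.columns = list(categories)
print(type(df.columns).__name__)  # MultiIndex
print(list(df.columns))           # [('A',), ('B',), ('C',)]

# Joining the parts of each tuple turns the labels back into plain strings.
df.columns = ["".join(col) for col in df.columns]
print(list(df.columns))           # ['A', 'B', 'C']
```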

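Once the columns are flattened, you could also use Strings = Strings.join(Encoded_Dataframe) as an alternative to Strings = pd.concat([Strings,Encoded_Dataframe],axis=1). Both align rows on the index, which here is the same default 0..n-1 index in both frames.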
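A small sketch comparing the two options on toy frames (names and values are illustrative only). DataFrame.join does a left join on the index by default, while pd.concat(..., axis=1) does an outer join on the index; when both frames share the same index, the results are identical.

```python
import pandas as pd

strings = pd.DataFrame({"Alphabet": ["A", "B", "C"]})
encoded = pd.DataFrame(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    columns=["A", "B", "C"],
)

# Side-by-side concatenation, aligning rows on the index (outer join).
via_concat = pd.concat([strings, encoded], axis=1)

# join: left join on the index by default.
via_join = strings.join(encoded)

print(via_join.equals(via_concat))  # True
```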

Full Code:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder
Names = pd.Series(["A","B","C","A","B","B","A","C","C","B","A","C"], name="Alphabet")
Strings = pd.DataFrame(Names)
print(Strings.head(12))
# scikit-learn >= 1.2 names this parameter sparse_output; on older versions use sparse=False
MyEncoder = OneHotEncoder(sparse_output=False)
encoded = MyEncoder.fit_transform(Strings[["Alphabet"]])
print(encoded)
print(MyEncoder.categories_)
Encoded_Dataframe = pd.DataFrame(encoded)
# categories_ is a list of arrays, so this assignment creates a one-level MultiIndex
Encoded_Dataframe.columns = list(MyEncoder.categories_)
print(Encoded_Dataframe.head())
# Flatten the tuple labels, e.g. ('A',) -> 'A', so the column names are plain strings
Encoded_Dataframe.columns = [''.join(col) for col in Encoded_Dataframe.columns]
Strings = pd.concat([Strings, Encoded_Dataframe], axis=1)
print(Strings.head())
Output:

   Alphabet
0         A
1         B
2         C
3         A
4         B
5         B
6         A
7         C
8         C
9         B
10        A
11        C
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]
[array(['A', 'B', 'C'], dtype=object)]
     A    B    C
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0
3  1.0  0.0  0.0
4  0.0  1.0  0.0
  Alphabet    A    B    C
0        A  1.0  0.0  0.0
1        B  0.0  1.0  0.0
2        C  0.0  0.0  1.0
3        A  1.0  0.0  0.0
4        B  0.0  1.0  0.0
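As a variation on the code above (not the approach shown in the answer), you could avoid the MultiIndex altogether by taking the first array from categories_ when building the DataFrame. This sketch calls .toarray() on the encoder's default sparse output so it does not depend on the sparse/sparse_output parameter name; passing index=strings.index is an extra precaution, assumed here, that keeps rows aligned if the original frame's index is not 0..n-1.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

strings = pd.DataFrame({"Alphabet": ["A", "B", "C", "A", "B", "B"]})

encoder = OneHotEncoder()  # default output is a SciPy sparse matrix
encoded = encoder.fit_transform(strings[["Alphabet"]]).toarray()

# categories_ holds one array per input column; taking the first (and only)
# array gives plain string column labels instead of a one-level MultiIndex.
encoded_df = pd.DataFrame(
    encoded,
    columns=encoder.categories_[0],
    index=strings.index,  # reuse the original index so rows line up in concat
)

result = pd.concat([strings, encoded_df], axis=1)
print(result)
```

If you prefer prefixed names such as Alphabet_A, scikit-learn 1.0 and later also provide encoder.get_feature_names_out(), which can be passed as columns instead.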
