I'm preprocessing my dataset with pd.get_dummies(), but the result is not what I need.
Is pd.get_dummies() the right tool here, or is there another approach I can try?
import pandas as pd

rawdataset = [['apple', 'banana', 'carrot', 'daikon', 'egg'],
              ['apple', 'banana'],
              ['apple', 'banana', 'carrot'],
              ['daikon', 'egg', 'fennel'],
              ['apple', 'banana', 'daikon']]
dataset = pd.DataFrame(data=rawdataset)
print(pd.get_dummies(dataset))
I expect the output to look like this:
   apple  banana  carrot  daikon  egg  fennel
0      1       1       1       1    1       0
1      1       1       0       0    0       0
…
not like this:
   0_apple  0_daikon  1_banana  1_egg  2_carrot  2_daikon  2_fennel
0        1         0         1      0         1         0         0
1        1         0         1      0         0         0         0
…
#python #pandas
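
One approach that might give the expected layout: pd.get_dummies() treats each column position as a separate feature, which is why the names come out prefixed with 0_, 1_, 2_. A minimal sketch, assuming each row is an unordered set of items and the position of an item in the row carries no meaning, is to reshape the frame to one item per line first and then merge the dummies back per original row. The names `stacked` and `encoded` are only illustrative.

```python
import pandas as pd

rawdataset = [
    ['apple', 'banana', 'carrot', 'daikon', 'egg'],
    ['apple', 'banana'],
    ['apple', 'banana', 'carrot'],
    ['daikon', 'egg', 'fennel'],
    ['apple', 'banana', 'daikon'],
]
dataset = pd.DataFrame(data=rawdataset)

# Reshape to a long Series indexed by (original row, column position).
# dropna() removes any None padding left over from the shorter rows.
stacked = dataset.stack().dropna()

# One dummy column per distinct item, then collapse back to one line per
# original row: max() yields 1 if the item appears anywhere in that row.
encoded = pd.get_dummies(stacked).groupby(level=0).max().astype(int)

print(encoded)
```

Because the dummies are built from a single Series, the column names are the item names themselves, with no positional prefix. If scikit-learn is available, its MultiLabelBinarizer applied directly to rawdataset is another option for the same kind of encoding.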