I am struggling to approach the bag-of-words / vocabulary method for representing my input data as one-hot vectors for my neural net model in Keras.
I would like to build a simple 3-layer network, but I need help understanding and developing an approach to transform my labelled data, which is in the form of (text, sentiment) pairs with 7 sentiment labels in the range 0 - 1 in steps of 0.2. A rough sketch of what I have in mind so far is below.
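This is roughly the kind of model and label encoding I am aiming for (the vocabulary size, layer widths and dummy data are just placeholders, not values from my real dataset):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

vocab_size = 5000    # placeholder - will be the size of my custom vocabulary
num_classes = 7      # one class per sentiment label

# one-hot bag-of-words input: each row has a 1 for every vocabulary entry present in the sentence
x_train = np.zeros((3, vocab_size))                            # dummy data, just to show the shapes
y_train = to_categorical([0, 3, 6], num_classes=num_classes)   # sentiment values mapped to class indices

model = Sequential([
    Dense(256, activation='relu', input_shape=(vocab_size,)),
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1)
```

What I can't work out is how to build the vocabulary and turn each sentence into that one-hot row in the first place.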
I have tried scikit-learn's vectorisers, but they are too rigid: they tokenise either words or characters, whereas I need each sentence to be compared against a vocabulary that includes words, characters, punctuation and emojis. When I run TF-IDF on a test sentence it only counts the words and ignores everything else. I also need guidance on how to take this one-hot approach and implement it in Keras.
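Here is a minimal example of what I mean, using made-up sentences rather than my actual data; the default tokeniser silently drops the punctuation and emoji tokens I care about:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["I love this!! 😍", "meh... not great :("]   # toy examples, not my real data

# The default token pattern only keeps runs of word characters,
# so '!!', '...', ':(' and the emoji never make it into the vocabulary.
vec = TfidfVectorizer()
X = vec.fit_transform(sentences)
print(sorted(vec.vocabulary_))   # ['great', 'love', 'meh', 'not', 'this'] - words only
```

Is there a standard way to define my own vocabulary (words + characters + punctuation + emojis), encode sentences against it, and feed the result into a Keras model like the one above?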