I'm running some quick tests for a naive Boolean information retrieval system, and I would like to use awk, grep, egrep, sed, or something similar, together with pipes, to split a text file into words and save them to another file with one word per line. For example, my file contains:
Hola mundo, hablo español y no sé si escribí bien la
pregunta, ojalá me puedan entender y ayudar
Adiós.
The output file should contain:
Hola
mundo
hablo
español
...
Thanks!
Using tr:
tr -s '[[:punct:][:space:]]' '\n' < file
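The command above prints to standard output. To save the words to another file, as the question asks, redirect the output. A minimal sketch, assuming the placeholder file names input.txt and words.txt (neither comes from the question):

```shell
# Recreate the sample text from the question (input.txt is a placeholder name).
printf '%s\n' \
  'Hola mundo, hablo español y no sé si escribí bien la' \
  'pregunta, ojalá me puedan entender y ayudar' \
  'Adiós.' > input.txt

# Translate every punctuation or whitespace character to a newline;
# -s squeezes runs of newlines into one, so no empty lines are produced.
# The result is written to words.txt (also a placeholder name).
tr -s '[[:punct:][:space:]]' '\n' < input.txt > words.txt

# Show the first few words.
head -n 4 words.txt
```

Because -s collapses consecutive separators, a sequence such as a comma followed by a space yields a single line break. Accented letters should pass through untouched as long as the locale does not classify their bytes as punctuation, which is the usual case for UTF-8 input but worth checking on your system.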