Word frequencies from strings in Postgres?

postgresql text nlp word-frequency

Marty · Mar 7, 2011 · Viewed 11.4k times · Source

Is it possible to identify distinct words and a count for each, from fields containing text strings in Postgres?

Answer

Something like this?

SELECT some_pk, 
       regexp_split_to_table(some_column, '\s') as word
FROM some_table

Getting the distinct words is easy then:

SELECT DISTINCT word
FROM ( 
  SELECT regexp_split_to_table(some_column, '\s') as word
  FROM some_table
) t

or getting the count for each word:

SELECT word, count(*)
FROM ( 
  SELECT regexp_split_to_table(some_column, '\s') as word
  FROM some_table
) t
GROUP BY word