I am using the Gensim HDP module on a set of documents.
>>> hdp = models.HdpModel(corpusB, id2word=dictionaryB)
>>> topics = hdp.print_topics(topics=-1, topn=20)
>>> len(topics)
150
>>> hdp = models.HdpModel(corpusA, id2word=dictionaryA)
>>> topics = hdp.print_topics(topics=-1, topn=20)
>>> len(topics)
150
>>> len(corpusA)
1113
>>> len(corpusB)
17
Why is the number of topics independent of corpus length?
@Aaron's code above is broken due to gensim API changes. I rewrote and simplified it as follows. Works as of June 2017 with gensim v2.1.0
import pandas as pd
def topic_prob_extractor(gensim_hdp):
shown_topics = gensim_hdp.show_topics(num_topics=-1, formatted=False)
topics_nos = [x[0] for x in shown_topics ]
weights = [ sum([item[1] for item in shown_topics[topicN][1]]) for topicN in topics_nos ]
return pd.DataFrame({'topic_id' : topics_nos, 'weight' : weights})