Get Plain text from a QLabel with Rich text

Anti Earth · Jan 17, 2012 · Viewed 14.2k times

I have a QLabel that contains rich text.
I want to extract just the actual (visible) 'text' from the QLabel, and none of the code for formatting.
I essentially need a function similar to the .toPlainText() method of other Qt widgets.

I cannot simply call .text() and strip the HTML tags with string manipulation, as suggested in the thread "Get plain text from QString with HTML tags", because the returned QString also contains all the <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd"> nonsense.

How do I extract the plain text?

(I'm open to any method, even if indirect, e.g. pre-existing functions that convert HTML to plain text.)

Thanks!

Specs:
python 2.7.2
PyQt4
Windows 7

Answer

ekhumoro · Jan 17, 2012

Use a QTextDocument to do the conversion:

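from PyQt4 import QtGui

# Let Qt parse the label's rich text, then read back only the visible text.
doc = QtGui.QTextDocument()
doc.setHtml(label.text())
text = doc.toPlainText()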
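
Since the question also asks about indirect methods, here is a rough fallback sketch that does not need Qt at all. It is not the QTextDocument approach above but a swapped-in technique built on the standard library's HTMLParser. The names PlainTextExtractor and html_to_plain_text are made up for illustration, and the sample string only imitates the kind of markup Qt produces. It assumes Python 3 (on Python 2.7 the class lives in the HTMLParser module instead) and makes no attempt to match Qt's handling of paragraphs, lists, or tables.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping head/style/script."""

    SKIPPED = {"head", "style", "script"}

    def __init__(self):
        HTMLParser.__init__(self, convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")  # keep explicit line breaks

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # The DOCTYPE declaration never reaches this method, so it is dropped.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_plain_text(html):
    """Return the visible text of an HTML string (illustrative, not Qt's rules)."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Made-up sample resembling the markup a rich-text QLabel might return.
sample = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" '
    '"http://www.w3.org/TR/REC-html40/strict.dtd">'
    '<html><head><meta name="qrichtext" content="1" />'
    '<style type="text/css">p, li { white-space: pre-wrap; }</style></head>'
    '<body><p>Hello <b>rich</b> &amp; plain</p></body></html>'
)
print(html_to_plain_text(sample))
```

When Qt is available, the QTextDocument route is likely the better choice, since it interprets the markup the same way the label does; the sketch above only helps where importing Qt is not an option.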