Capture output from pexpect

Tim · Oct 10, 2012

I am having trouble with pexpect. I'm trying to grab the output of tralics, which reads LaTeX equations and emits their MathML representation, like this:

1 ~/ % tralics --interactivemath
This is tralics 2.14.5, a LaTeX to XML translator, running on tlocal
Copyright INRIA/MIAOU/APICS/MARELLE 2002-2012, Jos\'e Grimm
Licensed under the CeCILL Free Software Licensing Agreement
Starting translation of file texput.tex.
No configuration file.
> $x+y=z$
<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'><mrow><mi>x</mi>   <mo>+</mo><mi>y</mi><mo>=</mo><mi>z</mi></mrow></math></formula>
> 

So I try to get the formula using pexpect:

import pexpect
c = pexpect.spawn('tralics --interactivemath')
c.expect('>')
c.sendline('$x+y=z$')
s = c.read_nonblocking(size=2000)
print s

The output has the formula, but with the echoed input at the beginning and the next prompt plus some control characters at the end:

"x+y=z$\r\n<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'><mrow><mi>x</mi><mo>+</mo><mi>y</mi><mo>=</mo><mi>z</mi></mrow></math></formula>\r\n\r> \x1b[K"

I can clean the output string, but I must be missing something basic. Is there a cleaner way to get the MathML?

Answer

Catalin Luta · Oct 10, 2012

From what I understand, you are trying to get this from pexpect:

<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'><mrow><mi>x</mi>   <mo>+</mo><mi>y</mi><mo>=</mo><mi>z</mi></mrow></math></formula>

You can match on a regexp instead of ">" to get the expected result. This is the simplest example:

c.expect("<formula.*formula>")

After that, the match attribute of the pexpect object holds the regular-expression match object, so you can get the matched string from it (the after attribute holds the same text):

print(c.match.group(0))
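Putting the pieces together, a minimal sketch might look like the following. It assumes Python 3 with pexpect 4 or later (for the encoding argument), and that tralics is on the PATH and prompts as shown in the question. The names mathml_for and FORMULA_RE are made up for this example.

```python
import re

# Same pattern as above, compiled with re.DOTALL so "." also matches newlines.
# FORMULA_RE is a hypothetical name used only in this sketch.
FORMULA_RE = re.compile(r"<formula.*formula>", re.DOTALL)


def mathml_for(latex, command="tralics --interactivemath", timeout=10):
    """Send one LaTeX snippet to tralics and return the <formula> element it prints.

    Assumes tralics is installed and shows a ">" prompt, as in the question.
    """
    import pexpect  # third-party; imported here so the module loads without it

    child = pexpect.spawn(command, encoding="utf-8", timeout=timeout)
    try:
        child.expect(">")            # wait for the first prompt
        child.sendline(latex)
        child.expect(FORMULA_RE)     # skips the echoed input, stops at the formula
        return child.match.group(0)  # the matched text; child.after holds the same
    finally:
        child.close()
```

Calling mathml_for('$x+y=z$') should then return only the formula element, without the echoed input or the trailing prompt and escape sequence.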

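You might also try other regexps: the one I posted is greedy, which can slow matching when the formulas are big and, if the buffer already holds more than one formula, will match everything from the first "<formula" to the last "formula>". A non-greedy pattern such as "<formula.*?</formula>" stops at the first closing tag.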
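To illustrate the difference between a greedy and a non-greedy pattern, here is a small sketch that uses only the re module. The buffer contents are invented for the example (two shortened formulas), and re.DOTALL is passed because, as far as I know, pexpect compiles string patterns with that flag.

```python
import re

# Hypothetical buffer holding two formulas, as could happen if two inputs
# were answered before expect() was called. The MathML is shortened.
buffer = (
    "<formula type='inline'><math><mi>x</mi></math></formula>\r\n"
    "> <formula type='inline'><math><mi>y</mi></math></formula>\r\n"
)

# Greedy: ".*" runs to the last "formula>" in the buffer.
greedy = re.search(r"<formula.*formula>", buffer, re.DOTALL).group(0)

# Non-greedy: ".*?" stops at the first closing tag.
lazy = re.search(r"<formula.*?</formula>", buffer, re.DOTALL).group(0)

print(greedy.count("</formula>"))  # 2: both formulas were swallowed
print(lazy.count("</formula>"))    # 1: only the first formula
```

With a single formula in the buffer both patterns return the same text, so the choice only matters once output accumulates between calls to expect.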