I have a stringio object created and it has some text in it. I'd like to clear its existing values and reuse it instead of recalling it. Is there anyway of doing this?
Don't bother clearing it, just create a new one—it’s faster.
Here's how I would find such things out:
>>> from StringIO import StringIO
>>> dir(StringIO)
['__doc__', '__init__', '__iter__', '__module__', 'close', 'flush', 'getvalue', 'isatty', 'next', 'read', 'readline', 'readlines', 'seek', 'tell', 'truncate', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method truncate in module StringIO:
truncate(self, size=None) unbound StringIO.StringIO method
Truncate the file's size.
If the optional size argument is present, the file is truncated to
(at most) that size. The size defaults to the current position.
The current file position is not changed unless the position
is beyond the new file size.
If the specified size exceeds the file's current size, the
file remains unchanged.
So, you want .truncate(0)
. But it's probably cheaper (and easier) to initialise a new StringIO. See below for benchmarks.
(Thanks to tstone2077 for pointing out the difference.)
>>> from io import StringIO
>>> dir(StringIO)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'getvalue', 'isatty', 'line_buffering', 'newlines', 'read', 'readable', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method_descriptor:
truncate(...)
Truncate size to pos.
The pos argument defaults to the current file position, as
returned by tell(). The current file position is unchanged.
Returns the new absolute position.
It is important to note with this that now the current file position is unchanged, whereas truncating to size zero would reset the position in the Python 2 variant.
Thus, for Python 2, you only need
>>> from cStringIO import StringIO
>>> s = StringIO()
>>> s.write('foo')
>>> s.getvalue()
'foo'
>>> s.truncate(0)
>>> s.getvalue()
''
>>> s.write('bar')
>>> s.getvalue()
'bar'
If you do this in Python 3, you won't get the result you expected:
>>> from io import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'\x00\x00\x00bar'
So in Python 3 you also need to reset the position:
>>> from cStringIO import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.seek(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'bar'
If using the truncate
method in Python 2 code, it's safer to call seek(0)
at the same time (before or after, it doesn't matter) so that the code won't break when you inevitably port it to Python 3. And there's another reason why you should just create a new StringIO
object!
>>> from timeit import timeit
>>> def truncate(sio):
... sio.truncate(0)
... return sio
...
>>> def new(sio):
... return StringIO()
...
When empty, with StringIO:
>>> from StringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
3.5194039344787598
>>> timeit(lambda: new(StringIO()))
3.6533868312835693
With 3KB of data in, with StringIO:
>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
4.3437709808349609
>>> timeit(lambda: new(StringIO('abc' * 1000)))
4.7179079055786133
And the same with cStringIO:
>>> from cStringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.55461597442626953
>>> timeit(lambda: new(StringIO()))
0.51241087913513184
>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
1.0958449840545654
>>> timeit(lambda: new(StringIO('abc' * 1000)))
0.98760509490966797
So, ignoring potential memory concerns (del oldstringio
), it's faster to truncate a StringIO.StringIO
(3% faster for empty, 8% faster for 3KB of data), but it's faster ("fasterer" too) to create a new cStringIO.StringIO
(8% faster for empty, 10% faster for 3KB of data). So I'd recommend just using the easiest one—so presuming you're working with CPython, use cStringIO
and create new ones.
The same code, just with seek(0)
put in.
>>> def truncate(sio):
... sio.truncate(0)
... sio.seek(0)
... return sio
...
>>> def new(sio):
... return StringIO()
...
When empty:
>>> from io import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.9706327870007954
>>> timeit(lambda: new(StringIO()))
0.8734330690022034
With 3KB of data in:
>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
3.5271066290006274
>>> timeit(lambda: new(StringIO('abc' * 1000)))
3.3496507499985455
So for Python 3 creating a new one instead of reusing a blank one is 11% faster and creating a new one instead of reusing a 3K one is 5% faster. Again, create a new StringIO
rather than truncating and seeking.