I'm having a hell of a time moving a Unicode-named file between Unicode-named folders in a Python script under Windows...
What syntax would you use to find all files of type *.ext in a folder and move them to a relative location?
Assume file and folder names are Unicode.
The basic problem is an unconverted mix of Unicode and byte strings. The solutions are either to convert everything to a single format or to sidestep the problem with a trick. All of my solutions use the glob and shutil modules from the standard library.
For the sake of example, I have some files with Unicode names ending in ods, and I want to move them to a subdirectory called א (Hebrew Aleph, a Unicode character).
Using byte strings for both the file names and the directory name:
>>> import glob
>>> import shutil
>>> files=glob.glob('*.ods') # List of Byte string file names
>>> for file in files:
...     shutil.copy2(file, 'א') # Byte string directory name
...
Using Unicode strings for both the file names and the directory name:
>>> import glob
>>> import shutil
>>> files=glob.glob(u'*.ods') # List of Unicode file names
>>> for file in files:
...     shutil.copy2(file, u'א') # Unicode directory name
...
Credit to Ezio Melotti, Python bug list.
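Since the question asks about moving rather than copying, here is a minimal sketch of the all-Unicode approach wrapped in a function. The name move_matching and its parameters are my own, not part of any library, and it assumes that every argument is passed as a Unicode string so that glob returns Unicode names too:

```python
# -*- coding: utf-8 -*-
import glob
import os
import shutil


def move_matching(src_dir, pattern, dest_dir):
    """Move every file in src_dir that matches pattern into dest_dir.

    Hypothetical helper: all three arguments are assumed to be Unicode
    strings, so no byte string is ever joined with a Unicode one.
    """
    # Create the destination directory if it does not exist yet.
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    moved = []
    for path in glob.glob(os.path.join(src_dir, pattern)):
        target = os.path.join(dest_dir, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return moved
```

For the example above, the call would look like move_matching(u'.', u'*.ods', u'א'). The function returns the list of new paths, which is handy for logging what was moved.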
Although this isn't the best solution in my opinion, there is a nice trick here that's worth mentioning.
Change your working directory to the destination directory using os.chdir(), and then copy the files to it by referring to it as . (the current directory):
# -*- coding: utf-8 -*-
import os
import shutil
import glob
os.chdir('א') # CD to the destination Unicode directory
print os.getcwd() # DEBUG: Make sure you're in the right place
files=glob.glob('../*.ods') # List of Byte string file names
for file in files:
    shutil.copy2(file, '.') # Copy each file
# Don't forget to go back to the original directory here, if it matters
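Going back to the original directory is easy to forget, especially if the copy raises an exception halfway through. One way to make it automatic is a small context manager. This is only a sketch; working_directory is a name I made up, not something from the standard library:

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current directory (hypothetical helper)."""
    previous = os.getcwd()  # Remember where we started
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # Always go back, even after an error


# Usage with the trick above (commented out so nothing runs on import):
# with working_directory('א'):
#     for file in glob.glob('../*.ods'):
#         shutil.copy2(file, '.')
```

The try/finally guarantees that the script ends up in the directory it started from, whether or not the copy succeeded.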
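The straightforward approach, shutil.copy2(src, dest), fails when the file names are byte strings and the directory name is unicode, because shutil joins the two without any explicit conversion and Python 2's implicit ASCII decoding chokes on the non-ASCII bytes: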
>>> files=glob.glob('*.ods')
>>> for file in files:
...     shutil.copy2(file, u'א')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.6/shutil.py", line 98, in copy2
    dst = os.path.join(dst, os.path.basename(src))
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 1: ordinal not in range(128)
As seen before, this can be avoided by using the byte string 'א' instead of the Unicode u'א' for the directory name, so that both arguments are of the same type.
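If you can't control which type you receive, another option is to convert everything to Unicode before handing it to shutil. A rough sketch follows; to_unicode is a hypothetical helper, and it assumes that byte-string names are encoded in the filesystem encoding unless you pass a different one, which may not hold for literals typed into a source file or terminal:

```python
import sys


def to_unicode(path, encoding=None):
    """Return path as a Unicode string (hypothetical helper).

    Byte strings are decoded explicitly; the default encoding is an
    assumption (the filesystem encoding) and may need overriding.
    """
    if isinstance(path, bytes):  # bytes is an alias of str on Python 2.6+
        return path.decode(encoding or sys.getfilesystemencoding())
    return path  # Already Unicode, nothing to do
```

With this, shutil.copy2(to_unicode(file), to_unicode(dest)) never mixes the two types, so the implicit ASCII decoding in os.path.join is not triggered.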
In my opinion, this is a bug, because Python cannot expect basedir names to always be str rather than unicode. I have reported this as an issue on the Python bug list and am waiting for responses.