A month ago I came across this GitHub project: https://github.com/taraslayshchuk/es2csv
I installed the package via pip3 on Ubuntu Linux. When I tried to use it, I found that it was written for Python 2. I dug into the code and soon found the problem.
    for line in open(self.tmp_file, 'r'):
        timer += 1
        bar.update(timer)
        line_as_dict = json.loads(line)
        line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
        csv_writer.writerow(line_dict_utf8)
    output_file.close()
    bar.finish()
else:
    print('There is no docs with selected field(s): %s.' % ','.join(self.opts.fields))
The code checks for the unicode type, which no longer exists in Python 3, so the check is unnecessary there. I therefore changed the code to the version below. As a result, the package worked properly under Ubuntu 16.
    for line in open(self.tmp_file, 'r'):
        timer += 1
        bar.update(timer)
        line_as_dict = json.loads(line)
        # line_dict_utf8 = {k: v.encode('utf8') if isinstance(v, unicode) else v for k, v in line_as_dict.items()}
        csv_writer.writerow(line_as_dict)
    output_file.close()
    bar.finish()
else:
    print('There is no docs with selected field(s): %s.' % ','.join(self.opts.fields))
A month later, however, I needed to get the es2csv package working on Windows 10. After making the exact same adjustments to es2csv under Windows 10, I received the following error message when I tried to run es2csv:
PS C:\> es2csv -u 192.168.230.151:9200 -i scrapy -o database.csv -q '*'
Found 218 results
Run query [#######################################################################################################################] [218/218] [100%] [0:00:00] [Time: 0:00:00] [ 2.3 Kidocs/s]
Write to csv [# ] [2/218] [ 0%] [0:00:00] [ETA: 0:00:00] [ 3.9 Kilines/s]
Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python36\Scripts\es2csv-script.py", line 11, in <module>
    load_entry_point('es2csv==5.2.1', 'console_scripts', 'es2csv')()
  File "c:\users\admin\appdata\local\programs\python\python36\lib\site-packages\es2csv.py", line 284, in main
    es.write_to_csv()
  File "c:\users\admin\appdata\local\programs\python\python36\lib\site-packages\es2csv.py", line 238, in write_to_csv
    csv_writer.writerow(line_as_dict)
  File "c:\users\admin\appdata\local\programs\python\python36\lib\csv.py", line 155, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "c:\users\admin\appdata\local\programs\python\python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 95-98: character maps to <undefined>
Does anyone have an idea how to fix this error?
It's due to the default behaviour of open in Python 3. By default, Python 3 opens files in text mode, which means it has to apply a text encoding, such as UTF-8 or ASCII, to every character it reads or writes.
Python will use your locale to determine the most suitable encoding. On OS X and Linux, this is usually UTF-8. On Windows, it'll use an 8-bit character set, such as Windows-1252, to match the behaviour of Notepad.
As an 8-bit character set only has a limited number of characters, it's very easy to end up trying to write a character not supported by the character set. For example, if you tried to write a Hebrew character with Windows-1252, the Western European character set, the write would fail with a UnicodeEncodeError like the one in your traceback, because Windows-1252 contains no Hebrew characters.
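The limitation described above can be checked directly. This is a minimal sketch; the helper can_encode and the sample Hebrew string are made up for illustration and are not part of es2csv.

```python
import locale

# The encoding open() falls back to when no encoding argument is given.
# Typically UTF-8 on Linux and OS X; often cp1252 on Western European Windows.
print(locale.getpreferredencoding(False))

# Sample Hebrew text, chosen only for illustration.
hebrew = "\u05e9\u05dc\u05d5\u05dd"


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(hebrew, "cp1252"))  # Windows-1252 has no Hebrew characters
print(can_encode(hebrew, "utf-8"))   # UTF-8 covers all of Unicode
```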
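To resolve your problem, you simply need to override the automatic encoding selection in open and hardcode it to use UTF-8. Note that your traceback fails inside csv_writer.writerow, so the open call that creates output_file needs the same encoding='utf-8' argument. For the loop that reads the temporary file, the change looks like this: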
for line in open(self.tmp_file, 'r', encoding='utf-8'):
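The same argument matters on the write side, which is where the traceback fails (csv_writer.writerow). The question does not show how output_file is opened, so the following is only a sketch under that assumption: write_rows, the field names, and the sample record are hypothetical and not es2csv's actual code.

```python
import csv
import json
import os
import tempfile


def write_rows(json_lines, csv_path, fieldnames):
    """Write one CSV row per JSON line, always encoding the file as UTF-8.

    Hypothetical helper for illustration; not taken from es2csv.
    """
    # encoding='utf-8' overrides the locale default (cp1252 in the traceback).
    # newline='' is what the csv module documentation recommends for csv files.
    with open(csv_path, "w", encoding="utf-8", newline="") as output_file:
        csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
        csv_writer.writeheader()
        for line in json_lines:
            csv_writer.writerow(json.loads(line))


# Made-up record containing characters that cp1252 cannot represent.
sample = json.dumps({"id": 1, "title": "\u05e9\u05dc\u05d5\u05dd"})
csv_path = os.path.join(tempfile.mkdtemp(), "database.csv")
write_rows([sample], csv_path, ["id", "title"])

# Read the file back with the same encoding to confirm the round trip.
with open(csv_path, "r", encoding="utf-8", newline="") as f:
    rows = list(csv.DictReader(f))
```

Anything that later reads the CSV has to be told the file is UTF-8 as well, just as the read-back step does here.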