UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte

wannabhappy · Jul 22, 2016 · Viewed 65.3k times

I am trying to read Twitter data from a JSON file using Python 2.7.12.

This is the code I used:

    import json
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

    def get_tweets_from_file(file_name):
        tweets = []
        with open(file_name, 'rw') as twitter_file:
            for line in twitter_file:
                if line != '\r\n':
                    line = line.encode('ascii', 'ignore')
                    tweet = json.loads(line)
                    if u'info' not in tweet.keys():
                        tweets.append(tweet)
        return tweets

This is the result I got:

    Traceback (most recent call last):
      File "twitter_project.py", line 100, in <module>
        main()                  
      File "twitter_project.py", line 95, in main
        tweets = get_tweets_from_dir(src_dir, dest_dir)
      File "twitter_project.py", line 59, in get_tweets_from_dir
        new_tweets = get_tweets_from_file(file_name)
      File "twitter_project.py", line 71, in get_tweets_from_file
        line = line.encode('ascii', 'ignore')
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte

I went through the answers to similar questions and came up with this code, and it worked the last time I ran it. I have no clue why it isn't working now. I would appreciate any help!

Answer

Sung-Ho_Ahn · Oct 17, 2017

In my case (macOS), there was a .DS_Store file in my data folder. It is a hidden, auto-generated file, and it caused the issue. I was able to fix the problem after removing it.
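Some background on why a stray file produces this particular error: .DS_Store is binary, so it can contain bytes such as 0x80 that are not valid UTF-8 start bytes. In Python 2, calling `.encode()` on a byte `str` first decodes it implicitly with the default encoding (here utf-8, because of `sys.setdefaultencoding`), which is why an encode call raises a UnicodeDecodeError. Instead of deleting the file by hand each time, the directory loop could skip hidden files. Below is a minimal sketch of that idea, not the asker's actual code: `iter_json_files` is a hypothetical helper, and the assumption that the tweet files end in `.json` is mine. Only `get_tweets_from_file` comes from the question, rewritten here to decode explicitly with `io.open`.

```python
import io
import json
import os


def iter_json_files(src_dir):
    """Yield paths of regular, non-hidden .json files in src_dir.

    Hypothetical helper; the '.json' extension is an assumption.
    """
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        # Skip hidden entries such as .DS_Store, and anything that is not a file.
        if name.startswith('.') or not os.path.isfile(path):
            continue
        if name.endswith('.json'):
            yield path


def get_tweets_from_file(file_name):
    """Parse one JSON object per line, skipping blank lines and 'info' records."""
    tweets = []
    # io.open decodes bytes to text explicitly and works on Python 2 and 3.
    with io.open(file_name, 'r', encoding='utf-8') as twitter_file:
        for line in twitter_file:
            if not line.strip():
                continue
            tweet = json.loads(line)
            if u'info' not in tweet:
                tweets.append(tweet)
    return tweets
```

With this in place, the calling code would loop over `iter_json_files(src_dir)` and pass each path to `get_tweets_from_file`. A file that is genuinely not UTF-8 will still raise a UnicodeDecodeError, but now at the point where the file is read, which makes the offending file easier to spot.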