How do I work with the results of pytrends?

Lienne · Jan 14, 2018 · Viewed 8.9k times

I'm new to Python and ran into a problem using pytrends. I'm trying to compare 5 search terms and store the sum in a CSV.

The problem I'm having right now is I can't seem to isolate an individual element returned. I have the data, I can see it, but I can't seem to isolate an element to be able to do anything meaningful with it.

I found elsewhere a suggestion to use iloc, but that doesn't return anything for what's shown, and if I pass only one parameter it seems to display everything.

It feels really dumb, but I just can't figure this out, nor can I find anything online.

from pytrends.request import TrendReq
import csv
import pandas
import numpy
import time

# Login to Google. Only need to run this once, the rest of requests will use the same session.
pytrend = TrendReq(hl='en-US', tz=360)

with open('database.csv',"r") as f:
    reader = csv.reader(f,delimiter = ",")
    data = list(reader)
    row_count = len(data)
    comparator_string = data[1][0] + " opening"
print("comparator: ",comparator_string,"\n")

#Initialize search term list including comparator_string as the first item, plus 4 search terms
kw_list=[]
kw_list.append(comparator_string)

for x in range(1, 5):
    search_string = data[x][0] + " opening"
    kw_list.append(search_string)

# Create payload and capture API tokens. Only needed for interest_over_time(), interest_by_region() & related_queries()
pytrend.build_payload(kw_list, cat=0, timeframe='today 3-m',geo='',gprop='')

# Interest Over Time
interest_over_time_df = pytrend.interest_over_time()
#time.sleep(randint(5, 10))

#printer = interest_over_time_df.sum()
printer = interest_over_time_df.iloc[1,1]
print("printer: \n",printer)

Answer

x1084 · Jan 23, 2018

pytrends returns pandas.DataFrame objects, and there are a number of ways to go about indexing and selecting data.

Let's take the following bit of code, for example:

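kw_list = ['apples', 'oranges', 'bananas']
# Register the keywords first (pytrend is the TrendReq instance from the question)
pytrend.build_payload(kw_list, timeframe='today 3-m')
interest_over_time_df = pytrend.interest_over_time()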

If you run print(interest_over_time_df) you will see something like this:

            apples  oranges  bananas  isPartial
date
2017-10-23      77       15       43      False
2017-10-24      77       15       46      False
2017-10-25      78       14       41      False
2017-10-26      78       14       43      False
2017-10-27      81       17       42      False
2017-10-28      91       17       39      False
...

You'll see the index, date, on the left, followed by four data columns: apples, oranges, bananas, and isPartial. You can ignore isPartial for now: it flags whether the data point for that date is still incomplete.
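If isPartial gets in the way (for example when summing the columns), one option is to drop it. Here is a minimal sketch on a hand-built copy of the sample output above. It is not real pytrends data, and the variable names df and keywords_only are just for illustration:

```python
import pandas as pd

# Hand-built stand-in for interest_over_time_df, using the sample values shown above.
df = pd.DataFrame(
    {
        "apples": [77, 77, 78, 78, 81, 91],
        "oranges": [15, 15, 14, 14, 17, 17],
        "bananas": [43, 46, 41, 43, 42, 39],
        "isPartial": [False] * 6,
    },
    index=pd.date_range("2017-10-23", periods=6, name="date"),
)

# Drop the flag column so only the keyword columns remain.
keywords_only = df.drop(columns="isPartial")
print(keywords_only.columns.tolist())
```

The original frame is left untouched; drop returns a new DataFrame.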

At this point you can select data by column, by column and index label, or by position:

>>> interest_over_time_df['apples']
date
2017-10-23    77
2017-10-24    77
2017-10-25    78
2017-10-26    78
2017-10-27    81

>>> interest_over_time_df['apples']['2017-10-26']
78

>>> interest_over_time_df.iloc[4]  # Row at position 4 (zero-based, so the fifth row)
apples          81
oranges         17
bananas         42
isPartial    False
Name: 2017-10-27 00:00:00, dtype: object

>>> interest_over_time_df.iloc[4, 0] # Row at position 4, column at position 0
81
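Since the question used iloc[1, 1] and iloc with a single argument, here is a small sketch of what those two forms select. It runs on a hand-built copy of the sample output above (not real pytrends data), so the values are only valid for that sample:

```python
import pandas as pd

# Hand-built stand-in for interest_over_time_df, using the sample values shown above.
df = pd.DataFrame(
    {
        "apples": [77, 77, 78, 78, 81, 91],
        "oranges": [15, 15, 14, 14, 17, 17],
        "bananas": [43, 46, 41, 43, 42, 39],
        "isPartial": [False] * 6,
    },
    index=pd.date_range("2017-10-23", periods=6, name="date"),
)

# Two arguments: row position, then column position (both zero-based).
# Position (1, 1) is the second row (2017-10-24) and the second column (oranges).
single_value = df.iloc[1, 1]
print(single_value)  # 15

# One argument: the whole row at that position, returned as a Series.
whole_row = df.iloc[1]
print(whole_row)
```

So passing one argument is expected to show every column for that row, while two arguments narrow it down to a single cell.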

You may be interested in pandas.DataFrame.loc, which selects rows and columns by label, as opposed to pandas.DataFrame.iloc, which selects them by integer position:

>>> interest_over_time_df.loc['2017-10-26']
apples          78
oranges         14
bananas         43
isPartial    False
Name: 2017-10-26 00:00:00, dtype: object

>>> interest_over_time_df.loc['2017-10-26', 'apples']
78
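To tie this back to the goal in the question (summing each search term and storing the result in a CSV), something along these lines might work. This is only a sketch on a hand-built copy of the six sample rows above, not real pytrends data. The column names keyword and total are my own choice, and it writes to an in-memory buffer; passing a filename to to_csv instead should write a file:

```python
import io

import pandas as pd

# Hand-built stand-in for interest_over_time_df, using the sample values shown above.
df = pd.DataFrame(
    {
        "apples": [77, 77, 78, 78, 81, 91],
        "oranges": [15, 15, 14, 14, 17, 17],
        "bananas": [43, 46, 41, 43, 42, 39],
        "isPartial": [False] * 6,
    },
    index=pd.date_range("2017-10-23", periods=6, name="date"),
)

# Sum each keyword column over all dates; isPartial is dropped first
# so the flag column is not included in the totals.
totals = df.drop(columns="isPartial").sum()

# Write one line per keyword. "keyword" and "total" are illustrative header names.
buffer = io.StringIO()
totals.rename("total").to_frame().to_csv(buffer, index_label="keyword")
csv_text = buffer.getvalue()
print(csv_text)
```

Individual totals can then be picked out by label, for example totals['apples'].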

Hope that helps.