Seaborn load_dataset

Arsibalt · May 19, 2015 · Viewed 75.4k times

I am trying to get a grouped boxplot working using Seaborn, as per the example.

I can get the above example working, however the line:

tips = sns.load_dataset("tips")

is not explained at all. I have located the tips.csv file, but I can't seem to find adequate documentation on what load_dataset specifically does. I tried to create my own csv and load this, but to no avail. I also renamed the tips file and it still worked...

My question is thus:

Where is load_dataset actually looking for files? Can I actually use this for my own boxplots?

EDIT: I managed to get my own boxplots working using my own DataFrame, but I am still wondering whether load_dataset is used for anything more than mysterious tutorial examples.

Answer

selwyth · May 20, 2015

load_dataset looks for online csv files on https://github.com/mwaskom/seaborn-data. Here's the docstring:

    Load a dataset from the online repository (requires internet).

    Parameters
    ----------
    name : str
        Name of the dataset (name.csv on
        https://github.com/mwaskom/seaborn-data). You can obtain list of
        available datasets using :func:`get_dataset_names`
    kws : dict, optional
        Passed to pandas.read_csv

If you want to modify that online dataset or bring in your own data, you likely have to use pandas. load_dataset actually returns a pandas DataFrame object, which you can confirm with type(tips).
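To make the "online csv" part concrete, here is a rough sketch of what such a loader boils down to. It is not seaborn's actual implementation: the helper names dataset_url and load_remote_dataset are made up for illustration, and the raw-file URL pattern for the repository's master branch is an assumption. Current seaborn releases also cache the downloaded file locally, which this sketch leaves out.

```python
import pandas as pd

# Assumed URL pattern for raw files in the seaborn-data repository.
BASE_URL = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master"


def dataset_url(name):
    """Build the (assumed) download URL for a dataset called `name`."""
    return f"{BASE_URL}/{name}.csv"


def load_remote_dataset(name, **kws):
    """Hypothetical stand-in for load_dataset: fetch the CSV, return a DataFrame.

    Extra keyword arguments are passed straight to pandas.read_csv,
    mirroring the `kws` parameter described in the docstring.
    """
    return pd.read_csv(dataset_url(name), **kws)
```

With an internet connection, tips = load_remote_dataset("tips") should give you a DataFrame much like the one sns.load_dataset("tips") returns. Because the data comes from the network, a tips.csv on your own disk is never consulted, which would explain why renaming it changed nothing.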

If you have already created your own data in a csv file called, say, tips2.csv, and saved it in the directory you run your script from, use this to load it in (pandas is installed along with seaborn, since seaborn depends on it):

import pandas as pd

tips2 = pd.read_csv('tips2.csv')
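For a fuller picture, the sketch below goes from a CSV file to a grouped boxplot without load_dataset. The sample rows are invented for illustration (only the column names mirror the tips dataset), the file is written to a temporary directory so the snippet is self-contained, and draw_grouped_boxplot is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample rows standing in for your own data; the column names
# mirror the tips dataset, the values do not come from it.
csv_text = """total_bill,day,smoker
16.99,Thur,No
10.34,Thur,Yes
21.01,Thur,No
23.68,Fri,Yes
24.59,Fri,No
25.29,Fri,Yes
8.77,Sat,No
26.88,Sat,Yes
"""

# Write the sample to a temporary tips2.csv, then load it the same way
# you would load a file sitting next to your script.
csv_path = Path(tempfile.mkdtemp()) / "tips2.csv"
csv_path.write_text(csv_text)

tips2 = pd.read_csv(csv_path)


def draw_grouped_boxplot(df):
    """Draw total_bill per day, split by smoker status, and return the Axes."""
    import seaborn as sns  # imported here so loading the CSV does not need seaborn

    return sns.boxplot(x="day", y="total_bill", hue="smoker", data=df)
```

Calling draw_grouped_boxplot(tips2) followed by matplotlib's plt.show() should display the plot. Nothing here is specific to the tips data: any DataFrame with one numeric column and two categorical columns can be passed the same way.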