How best to convert from azure blob csv format to pandas dataframe while running notebook in azure ml

Question 1

How best to convert from azure blob csv format to pandas dataframe while running notebook in azure ml

python azure pandas azure-storage-blobs azure-machine-learning-studio

random.me · Oct 13, 2015 · Viewed 13.9k times · Source

Answer

Answer

I think you want to use get_blob_to_bytes, or get_blob_to_text; these should output a string which you can use to create a dataframe as

from io import StringIO
blobstring = blob_service.get_blob_to_text(CONTAINERNAME,BLOBNAME)
df = pd.read_csv(StringIO(blobstring))

Question 2

I have a number of large csv (tab delimited) data stored as azure blobs, and I want to create a pandas dataframe from these. I can do this locally as follows:

from azure.storage.blob import BlobService
import pandas as pd
import os.path

STORAGEACCOUNTNAME= 'account_name'
STORAGEACCOUNTKEY= "key"
LOCALFILENAME= 'path/to.csv'        
CONTAINERNAME= 'container_name'
BLOBNAME= 'bloby_data/000000_0'

blob_service = BlobService(account_name=STORAGEACCOUNTNAME, account_key=STORAGEACCOUNTKEY)

# Only get a local copy if haven't already got it
if not os.path.isfile(LOCALFILENAME):
    blob_service.get_blob_to_path(CONTAINERNAME,BLOBNAME,LOCALFILENAME)

df_customer = pd.read_csv(LOCALFILENAME, sep='\t')

However, when running the notebook on azure ML notebooks, I can't 'save a local copy' and then read from csv, and so I'd like to do the conversion directly (something like pd.read_azure_blob(blob_csv) or just pd.read_csv(blob_csv) would be ideal).

I can get to the desired end result (pandas dataframe for blob csv data), if I first create an azure ML workspace, and then read the datasets into that, and finally using https://github.com/Azure/Azure-MachineLearning-ClientLibrary-Python to access the dataset as a pandas dataframe, but I'd prefer to just read straight from the blob storage location.

How best to convert from azure blob csv format to pandas dataframe while running notebook in azure ml

Answer

Related questions