I am new to MS Azure. I am trying to download the Microsoft Academic Graph (MAG) for various analyses, but there is no bulk download of the structured dataset. External sources such as openacademicgraph weren't really useful, so I thought I could try getting the data through Azure.
Luckily, there is a manual for just that - "Get Microsoft Academic Graph on Azure storage - docs.microsoft.com/en-us/academic-services/graph/get-started-setup-provisioning".
I followed the steps in the manual to create an Azure account for MAG, and received the following email from Academic Knowledge API -
Welcome to the Microsoft Academic Graph (MAG) Azure Storage (AS) Distribution preview. Please be advised that this distribution is in free preview stage. Pricing structure is subject to change.
Your Azure Storage is successfully setup to receive MAG update through Azure Data Factory. Each MAG dataset is provisioned to a separate container named "mag-yyyy-mm-dd". The 2020-02-14 dataset was pushed to your Azure Storage.
As MAG comes with ODC-BY license, you are granted the rights to add values and redistribute the derivatives based on the terms of the open data license, e.g., the attribution to MAG in your products, services or community events.
Each snapshot of MAG will show up in your Azure Storage as a distinct container. In Microsoft Academic Graph documentation, you could find a sample to extract knowledge from MAG for your application using Azure Databricks. Also there is a sample using U-SQL, a member of Azure Data Lake Analytic Framework.
We also put together great Analytics and visualization samples that we used for our WWW Conference Analytics blog post. We hope this can help accelerate your development process and spark imagination!
The next step was "Set up Azure Databricks for Microsoft Academic Graph - docs.microsoft.com/en-us/academic-services/graph/get-started-setup-databricks", which I followed. I was able to create an Azure Databricks resource for MAG (I have no idea what that is, as I'm new to this), but now I cannot get the cluster to run.
This is the error message I get:
Message
Cluster terminated. Reason: Cloud Provider Launch Failure
A cloud provider error was encountered while launching worker nodes. See the Databricks guide for more information.
Azure error code: OperationNotAllowed
Azure error message: Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details - Deployment Model: Resource Manager, Location: centralus, Current Limit: 4, Current Usage: 4, Additional Required: 4, (Minimum) New Limit Required: 8. Submit a request for Quota increase at https://aka.ms/ProdportalCRP/?#create/Microsoft.Support/Parameters/~~~ by specifying parameters listed in the ‘Details’ section for deployment to succeed. Please read more about quota limits at https://docs.microsoft.com/en-us/azure/azure-supportability/regional-quota-requests.
I'm not sure what I'm supposed to do.
The "Total Regional Cores quota" is exceeded, not my personal subscription quota. How would I ask to increase the quota for the whole region? They say I need to apply for a larger quota, which cannot be done with the free trial account I created as per the manual. Does this mean that the manual is wrong, and I have to switch to Pay-As-You-Go? The message says "Current Usage: 4", but I am not using anything at the moment. All I have is an Azure storage account and a Databricks cluster, neither of which is running. I retried starting the cluster, and the second time it started successfully, only to be terminated a couple of minutes later with the same error message.
I'm not going to do any complex querying, since that would be pretty expensive. Being a poor researcher, all I want is the dataset in the MAG schema; I will run whatever analysis I need on my desktop, which is free, if slower. Any help would be really appreciated.
To try Azure Databricks, you need a "Pay-As-You-Go" subscription.
The Azure Free Trial has a limit of 4 cores, so you cannot create an Azure Databricks cluster with a Free Trial subscription: a Spark cluster requires more than 4 cores.
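To make the numbers in the error message concrete, here is a minimal sketch that pulls the quota fields out of the message text and recomputes the limit to request. The function name `parse_quota_error` and the regular expressions are my own illustration, not part of any Azure SDK, and they assume the message keeps the "Current Limit / Current Usage / Additional Required" wording shown in the question. One plausible reading of "Current Usage: 4" is that part of the cluster had already claimed the 4 available cores when the launch of the remaining nodes failed, but that is an assumption on my part.

```python
import re

# Hypothetical helper (not an Azure API): extracts the numeric quota fields
# from an "OperationNotAllowed" error message like the one in the question.
_FIELDS = {
    "current_limit": r"Current Limit:\s*(\d+)",
    "current_usage": r"Current Usage:\s*(\d+)",
    "additional_required": r"Additional Required:\s*(\d+)",
}


def parse_quota_error(message: str) -> dict:
    """Return the quota numbers plus the minimum new limit to request."""
    result = {}
    for name, pattern in _FIELDS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field not found in message: {name}")
        result[name] = int(match.group(1))

    # The deployment needs everything already in use plus the extra cores.
    result["new_limit_required"] = (
        result["current_usage"] + result["additional_required"]
    )
    # How many cores the current limit falls short by (never negative).
    result["shortfall"] = max(
        0, result["new_limit_required"] - result["current_limit"]
    )
    return result


if __name__ == "__main__":
    msg = (
        "Location: centralus, Current Limit: 4, Current Usage: 4, "
        "Additional Required: 4, (Minimum) New Limit Required: 8."
    )
    info = parse_quota_error(msg)
    print(info["new_limit_required"], info["shortfall"])
```

With the values from the question, usage (4) plus additional required (4) gives 8, which matches the "(Minimum) New Limit Required: 8" in the message, so the quota request should ask for at least 8 vCPUs in centralus.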
If you have a free account, go to your profile and change your subscription to pay-as-you-go. Then, remove the spending limit, and request a quota increase for vCPUs in your region. When you create your Azure Databricks workspace, you can select the Trial (Premium - 14-Days Free DBUs) pricing tier to give the workspace access to free Premium Azure Databricks DBUs for 14 days.
For more details, refer to "Sign up for a Free Azure Databricks Trial".