After several years of almost exclusive attention to Python as the primary language for Microsoft’s data science toolbox, the R community got back into Microsoft’s good graces in October 2019, when Microsoft released the Azure Machine Learning R SDK on GitHub (not surprisingly, it is built on top of the Azure ML Python SDK). The purpose of these tools is to simplify working with Azure cloud storage, compute, and other assets (such as experiments, models, etc.) from code, as opposed to the Azure Machine Learning user interface (UI), a.k.a. “studio”. This is a great capability for anyone who wants to build fully automated data science pipelines, or who simply prefers a code-first approach.

User interface of Azure Machine Learning, a.k.a. “studio”.

There are a few simple steps you need to perform to set yourself up for using the Azure ML R SDK, and Luca has written an excellent guide on getting started with it. For the purposes of this tutorial, I will assume you already have an Azure ML workspace and have managed to access it — at least via studio. Otherwise, follow Luca’s guide.

Elements of Azure Machine Learning Workspace — Brief Recap

An Azure Machine Learning **workspace** is a top-level resource, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning.
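As a minimal sketch, connecting to an existing workspace with the `azuremlsdk` package looks something like the following; the workspace name, subscription ID, and resource group are placeholders for your own values.

```r
library(azuremlsdk)

# Connect to an existing Azure ML workspace by its coordinates
# (all three values below are hypothetical placeholders):
ws <- get_workspace(
  name            = "my-workspace",
  subscription_id = "<subscription-id>",
  resource_group  = "my-resource-group"
)

# Alternatively, if you downloaded config.json from studio into the
# working directory, the same details can be read from that file:
# ws <- load_workspace_from_config()
```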

To us data people, two artifacts stand out among the rest: _datastores_ and _datasets_.

**Datastores** are simply abstract objects that store connection information so that you can securely access your Azure storage services such as Azure Blob storage, Azure file shares, Azure SQL Database, etc. For the purposes of this tutorial, I will assume that you already have a registered datastore in your Azure ML workspace.
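Assuming a workspace object `ws` is already in hand, a registered datastore can be retrieved like this (the datastore name below is a placeholder):

```r
library(azuremlsdk)

# Every workspace comes with a default datastore (an Azure blob container):
ds <- get_default_datastore(ws)

# Or fetch a specific registered datastore by its name:
# ds <- get_datastore(ws, datastore_name = "my_blob_store")
```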

**Datasets** are a different animal. Azure Machine Learning datasets are not copies of your data. When you create a dataset, you simply create a reference to the data in a storage service, along with a copy of its metadata (see Secure Data Access in Azure Machine Learning by Microsoft). Typically, you create a dataset for a specific data-science project you have in mind, and use it to point to a particular, narrow set of data in your available datastores.
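A sketch of creating such a reference, assuming a datastore object `ds` and a CSV file that already exists in it (the path and dataset name are hypothetical):

```r
library(azuremlsdk)

# Point a tabular dataset at a delimited file sitting in datastore `ds`;
# no data is copied — only a reference plus metadata is created:
dset <- create_tabular_dataset_from_delimited_files(
  path = data_path(ds, "projects/iris/iris.csv")
)

# Registering gives the dataset a name, so it shows up in studio:
# register_dataset(ws, dset, name = "iris")
```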

What does this mean for us? It means that when uploading local data to be used for machine learning in Azure, we are not actually uploading a dataset. We are uploading a file to one of our storage services; then we create a **datastore** abstraction to point to that storage service, and a **dataset** abstraction to point to the file within that datastore. Seems like a lot of redundant layers, but there is a good reason for them, as you will see once you start working with Azure ML.
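The three layers described above can be sketched end to end as follows, assuming an existing workspace object `ws`; the local file path, target path, and dataset name are placeholders for your own values.

```r
library(azuremlsdk)

ds <- get_default_datastore(ws)

# 1. Upload the local file into the storage service behind the datastore:
upload_files_to_datastore(
  ds,
  files       = list("./data/sales.csv"),   # hypothetical local file
  target_path = "uploads/sales",
  overwrite   = TRUE
)

# 2. Create a dataset abstraction pointing at the uploaded file:
dset <- create_tabular_dataset_from_delimited_files(
  path = data_path(ds, "uploads/sales/sales.csv")
)

# 3. Register it so the rest of the workspace can find it by name:
register_dataset(ws, dset, name = "sales")
```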


How to upload a local file as an Azure ML dataset using R code