Alex  Voloshyn



Azure Databricks Tutorial | Data Transformations at Scale

Azure Databricks is a fast, easy-to-use, and scalable big data collaboration platform. Built on Apache Spark, it brings the high performance and benefits of Spark without requiring deep technical knowledge. You just write Python/Scala scripts and you are ready to go.

In this video I will cover the basics of Databricks and show a common transformation scenario: reading JSON from Blob Storage and writing it back to Blob Storage as CSV.
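To make the JSON to CSV scenario concrete, here is a minimal local sketch in plain Python. It is not the code from the video: on Databricks this step would normally be done with Spark reading from and writing to Blob Storage, whereas this sketch only uses the standard library on an in-memory string. The function name `json_lines_to_csv` and the sample records are made up for illustration.

```python
import csv
import io
import json


def json_lines_to_csv(json_text: str) -> str:
    """Convert newline-delimited JSON records into CSV text (hypothetical helper)."""
    # Parse one JSON object per non-empty line.
    records = [json.loads(line) for line in json_text.splitlines() if line.strip()]

    # Build the header from the union of keys, in first-seen order,
    # so records with extra fields do not break the output.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()


# Made-up sample data: the second record has an extra "city" field.
sample = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b", "city": "Kyiv"}\n'
csv_text = json_lines_to_csv(sample)
print(csv_text)
```

The point of the sketch is the shape of the transformation: semi-structured records go in, a fixed set of columns comes out, and missing fields become empty cells.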

Samples from video…



 iOS App Dev



Your Data Architecture: Simple Best Practices for Your Data Strategy


If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get accurate outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.

#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition

Gerhard  Brink



Getting Started With Data Lakes

Frameworks for Efficient Enterprise Analytics

The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.

This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.


As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).

This is a preview of the Getting Started With Data Lakes Refcard. To read the entire Refcard, please download the PDF from the link above.

#big data #data analytics #data analysis #business analytics #data warehouse #data storage #data lake #data lake architecture #data lake governance #data lake management

Getting Started With Azure Data Explorer Using the Go SDK

With the help of an example, this blog post will walk you through how to use the Azure Data Explorer Go SDK to ingest data from an Azure Blob storage container and query it programmatically using the SDK. After a quick overview of how to set up an Azure Data Explorer cluster (and a database), we will explore the code to understand what’s going on (and how) and finally test the application using a simple CLI interface.

The sample data is a CSV file that can be downloaded from here.

What Is Azure Data Explorer?

Azure Data Explorer (also known as Kusto) is a fast and scalable data exploration service for analyzing large volumes of diverse data from any data source, such as websites, applications, IoT devices, and more. This data can then be used for diagnostics, monitoring, reporting, machine learning, and additional analytics capabilities.

It supports several ingestion methods, including connectors to common services like Event Hub, programmatic ingestion using SDKs, such as .NET and Python, and direct access to the engine for exploration purposes. It also integrates with analytics and modeling services for additional analysis and visualization of data using tools such as Power BI.

Go SDK for Azure Data Explorer

The Go client SDK allows you to query, control, and ingest into Azure Data Explorer clusters using Go. Please note that this is for interacting with the Azure Data Explorer cluster (and related components such as tables). To create Azure Data Explorer clusters, databases, and so on, you should use the admin component (control plane) SDK, which is part of the larger Azure SDK for Go.

API docs -

Before getting started, here is what you would need to try out the sample application

#tutorial #big data #azure #analytics #go #azure data #azure data explorer

Trevor  Russel



Microsoft Azure Data Lake

2020 has been different in every way, but one thing has remained constant over the past many years: data and its role in shaping our current technology. Recently, I was part of a team creating a centrally controlled data repository containing clear, consistent, and clean data. While exploring the technologies, we landed on the MS Azure ecosystem.

The MS Azure ecosystem for developing data lakes and data warehouses is maturing and provides good support for enterprise-level solutions. Starting with Azure Data Factory, it offers good ELT/ETL processing with code-free services. This is very helpful for creating pipelines for data ingestion, control flow, and moving data from source to destination. These pipelines can run 24/7 and ingest petabytes of data. Without the support of a data factory, data movement between different enterprise systems requires a lot of effort and can be very expensive to develop and maintain. Additionally, Azure Data Factory has more than 90 built-in connectors that help connect to most sources, such as S3, Redshift, BigQuery, HDFS, Salesforce, and enterprise data warehouses, to name a few.

#big data #data + integration #data streaming #big data adoption #data transformation #microsft azure

Ian  Robinson



Data Platform: The New Generation Data Lakes



This article is a follow-up to Data Platform as a Service. It describes the high-level architecture and goes into detail on the Data Lake. We will detail the rest of the blocks and components shown in the next articles.

The important thing about the architecture is not the vendor or specific product but the capabilities of the components we used. In the end, the product choice depends on many factors:

  • Knowledge of the team.
  • Whether the product is available in the cloud we are using.
  • Ability to integrate with our existing products.
  • The cost.

Custom Data Ingestion Engine Diagram

In this case, we have designed a solution mainly based on Azure services. At the same time, we have designed an architecture that would allow us to integrate with or migrate to other cloud services in an agile way.

Solution Architecture Based on Azure Services

Having an agile cloud data platform and avoiding vendor lock-in depends on:

  • Use open-source technology for the core of the platform. This allows us to move our platform to another cloud provider.
  • Provide a data hub service for streaming and batch data.
  • Automated data pipelines allow us to move our data to different data repositories easily.
  • A data services layer **decoupled** from the data persistence engine.

Of course, we can use a specific product from a vendor that provides added value (BigQuery, Redshift, Snowflake, …), but we should always have a plan to be able to replace it with another technology in an agile way.
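One way to picture a data services layer that is decoupled from the persistence engine is the minimal Python sketch below. It is an illustration under assumptions, not the architecture from the article: the names `Repository`, `InMemoryRepository`, and `CustomerService` are hypothetical, and the in-memory store simply stands in for whatever vendor product sits behind the interface.

```python
from typing import Protocol


class Repository(Protocol):
    """Hypothetical storage contract the service layer depends on."""

    def save(self, key: str, record: dict) -> None: ...

    def load(self, key: str) -> dict: ...


class InMemoryRepository:
    """Stand-in persistence engine; a vendor-backed one would expose the same methods."""

    def __init__(self) -> None:
        self._rows = {}  # maps key -> record

    def save(self, key: str, record: dict) -> None:
        self._rows[key] = dict(record)

    def load(self, key: str) -> dict:
        return dict(self._rows[key])


class CustomerService:
    """Data service that only knows the Repository contract, not the engine behind it."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository

    def register(self, customer_id: str, name: str) -> None:
        self._repository.save(customer_id, {"id": customer_id, "name": name})

    def get_name(self, customer_id: str) -> str:
        return self._repository.load(customer_id)["name"]


# Swapping the persistence engine means passing a different Repository here;
# CustomerService itself does not change.
service = CustomerService(InMemoryRepository())
service.register("c1", "Ada")
print(service.get_name("c1"))
```

Because the service depends only on the contract, replacing one storage product with another is limited to writing a new class with the same two methods.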

#cloud #big data #bigdata #azure #data #spark #data lake #data platform #databricks