Getting started with Kafka Connector for Azure Cosmos DB using Docker

A local dev environment for learning Kafka and Cosmos DB — no cost involved!

Having a local development environment is quite handy when trying out a new service or technology. Docker has emerged as the de-facto choice in such cases. It is especially useful in scenarios where you’re trying to integrate multiple services and gives you the ability to start fresh before each run.

This blog post is a getting started guide for the Kafka Connector for Azure Cosmos DB. All the components (including Azure Cosmos DB) will run on your local machine, thanks to:

  • The Azure Cosmos DB Linux Emulator that can be used for local development and testing purposes without creating an Azure subscription or incurring any costs.
  • And, Docker Compose that is a tool for defining and running multi-container Docker applications. It will orchestrate all the components required by our setup including Azure Cosmos DB emulator, Kafka, Zookeeper, Kafka connectors etc.

End to end workflow

To make things easier, we will pick single-focused scenarios and go step by step:

  • Step 0 — A simple scenario to check if our setup if functional.
  • How to handle streaming JSON data
  • How to handle streaming JSON data which is not compatible with Azure Cosmos DB
  • How to handle Avro data using Schema Registry

It is assumed that you’re comfortable with Kafka and have an understanding of Kafka Connect

#kafka #docker #azure cosmos db

What is GEEK

Buddha Community

Getting started with Kafka Connector for Azure Cosmos DB using Docker

Getting started with Kafka Connector for Azure Cosmos DB using Docker

A local dev environment for learning Kafka and Cosmos DB — no cost involved!

Having a local development environment is quite handy when trying out a new service or technology. Docker has emerged as the de-facto choice in such cases. It is especially useful in scenarios where you’re trying to integrate multiple services and gives you the ability to start fresh before each run.

This blog post is a getting started guide for the Kafka Connector for Azure Cosmos DB. All the components (including Azure Cosmos DB) will run on your local machine, thanks to:

  • The Azure Cosmos DB Linux Emulator that can be used for local development and testing purposes without creating an Azure subscription or incurring any costs.
  • And, Docker Compose that is a tool for defining and running multi-container Docker applications. It will orchestrate all the components required by our setup including Azure Cosmos DB emulator, Kafka, Zookeeper, Kafka connectors etc.

End to end workflow

To make things easier, we will pick single-focused scenarios and go step by step:

  • Step 0 — A simple scenario to check if our setup if functional.
  • How to handle streaming JSON data
  • How to handle streaming JSON data which is not compatible with Azure Cosmos DB
  • How to handle Avro data using Schema Registry

It is assumed that you’re comfortable with Kafka and have an understanding of Kafka Connect

#kafka #docker #azure cosmos db

Haylie  Conn

Haylie Conn

1626665820

Getting started with Kafka Connector for Azure Cosmos DB using Docker

Having a local development environment is quite handy when trying out a new service or technology. Docker has emerged as the de-facto choice in such cases. It is specially useful in scenarios where you’re trying to integrate multiple services and gives you the ability to to start fresh before each run.

This blog post is a getting started guide for the Kafka Connector for Azure Cosmos DB. All the components (including Azure Cosmos DB) will run on your local machine, thanks to:

  • The Azure Cosmos DB Linux Emulator which can be used for local development and testing purposes without creating an Azure subscription or incurring any costs.
  • And, Docker Compose which is a tool for defining and running multi-container Docker applications. It will orchestrate all the components required by our setup including Azure Cosmos DB emulator, Kafka, Zookeeper, Kafka connectors etc.

Image diagram

To make things easier, we will pick single-focused scenarios and go step by step:

  • Step 0 – A simple scenario to check if our setup if functional.
  • How to handle streaming JSON data
  • How to handle streaming JSON data which is not compatible with Azure Cosmos DB
  • How to handle Avro data using Schema Registry

It is assumed that you’re comfortable with Kafka and have an understanding of Kafka Connect

#docker #kafka #azure cosmos db

Shubham Ankit

Shubham Ankit

1657081614

How to Automate Excel with Python | Python Excel Tutorial (OpenPyXL)

How to Automate Excel with Python

In this article, We will show how we can use python to automate Excel . A useful Python library is Openpyxl which we will learn to do Excel Automation

What is OPENPYXL

Openpyxl is a Python library that is used to read from an Excel file or write to an Excel file. Data scientists use Openpyxl for data analysis, data copying, data mining, drawing charts, styling sheets, adding formulas, and more.

Workbook: A spreadsheet is represented as a workbook in openpyxl. A workbook consists of one or more sheets.

Sheet: A sheet is a single page composed of cells for organizing data.

Cell: The intersection of a row and a column is called a cell. Usually represented by A1, B5, etc.

Row: A row is a horizontal line represented by a number (1,2, etc.).

Column: A column is a vertical line represented by a capital letter (A, B, etc.).

Openpyxl can be installed using the pip command and it is recommended to install it in a virtual environment.

pip install openpyxl

CREATE A NEW WORKBOOK

We start by creating a new spreadsheet, which is called a workbook in Openpyxl. We import the workbook module from Openpyxl and use the function Workbook() which creates a new workbook.

from openpyxl
import Workbook
#creates a new workbook
wb = Workbook()
#Gets the first active worksheet
ws = wb.active
#creating new worksheets by using the create_sheet method

ws1 = wb.create_sheet("sheet1", 0) #inserts at first position
ws2 = wb.create_sheet("sheet2") #inserts at last position
ws3 = wb.create_sheet("sheet3", -1) #inserts at penultimate position

#Renaming the sheet
ws.title = "Example"

#save the workbook
wb.save(filename = "example.xlsx")

READING DATA FROM WORKBOOK

We load the file using the function load_Workbook() which takes the filename as an argument. The file must be saved in the same working directory.

#loading a workbook
wb = openpyxl.load_workbook("example.xlsx")

 

GETTING SHEETS FROM THE LOADED WORKBOOK

 

#getting sheet names
wb.sheetnames
result = ['sheet1', 'Sheet', 'sheet3', 'sheet2']

#getting a particular sheet
sheet1 = wb["sheet2"]

#getting sheet title
sheet1.title
result = 'sheet2'

#Getting the active sheet
sheetactive = wb.active
result = 'sheet1'

 

ACCESSING CELLS AND CELL VALUES

 

#get a cell from the sheet
sheet1["A1"] <
  Cell 'Sheet1'.A1 >

  #get the cell value
ws["A1"].value 'Segment'

#accessing cell using row and column and assigning a value
d = ws.cell(row = 4, column = 2, value = 10)
d.value
10

 

ITERATING THROUGH ROWS AND COLUMNS

 

#looping through each row and column
for x in range(1, 5):
  for y in range(1, 5):
  print(x, y, ws.cell(row = x, column = y)
    .value)

#getting the highest row number
ws.max_row
701

#getting the highest column number
ws.max_column
19

There are two functions for iterating through rows and columns.

Iter_rows() => returns the rows
Iter_cols() => returns the columns {
  min_row = 4, max_row = 5, min_col = 2, max_col = 5
} => This can be used to set the boundaries
for any iteration.

Example:

#iterating rows
for row in ws.iter_rows(min_row = 2, max_col = 3, max_row = 3):
  for cell in row:
  print(cell) <
  Cell 'Sheet1'.A2 >
  <
  Cell 'Sheet1'.B2 >
  <
  Cell 'Sheet1'.C2 >
  <
  Cell 'Sheet1'.A3 >
  <
  Cell 'Sheet1'.B3 >
  <
  Cell 'Sheet1'.C3 >

  #iterating columns
for col in ws.iter_cols(min_row = 2, max_col = 3, max_row = 3):
  for cell in col:
  print(cell) <
  Cell 'Sheet1'.A2 >
  <
  Cell 'Sheet1'.A3 >
  <
  Cell 'Sheet1'.B2 >
  <
  Cell 'Sheet1'.B3 >
  <
  Cell 'Sheet1'.C2 >
  <
  Cell 'Sheet1'.C3 >

To get all the rows of the worksheet we use the method worksheet.rows and to get all the columns of the worksheet we use the method worksheet.columns. Similarly, to iterate only through the values we use the method worksheet.values.


Example:

for row in ws.values:
  for value in row:
  print(value)

 

WRITING DATA TO AN EXCEL FILE

Writing to a workbook can be done in many ways such as adding a formula, adding charts, images, updating cell values, inserting rows and columns, etc… We will discuss each of these with an example.

 

CREATING AND SAVING A NEW WORKBOOK

 

#creates a new workbook
wb = openpyxl.Workbook()

#saving the workbook
wb.save("new.xlsx")

 

ADDING AND REMOVING SHEETS

 

#creating a new sheet
ws1 = wb.create_sheet(title = "sheet 2")

#creating a new sheet at index 0
ws2 = wb.create_sheet(index = 0, title = "sheet 0")

#checking the sheet names
wb.sheetnames['sheet 0', 'Sheet', 'sheet 2']

#deleting a sheet
del wb['sheet 0']

#checking sheetnames
wb.sheetnames['Sheet', 'sheet 2']

 

ADDING CELL VALUES

 

#checking the sheet value
ws['B2'].value
null

#adding value to cell
ws['B2'] = 367

#checking value
ws['B2'].value
367

 

ADDING FORMULAS

 

We often require formulas to be included in our Excel datasheet. We can easily add formulas using the Openpyxl module just like you add values to a cell.
 

For example:

import openpyxl
from openpyxl
import Workbook

wb = openpyxl.load_workbook("new1.xlsx")
ws = wb['Sheet']

ws['A9'] = '=SUM(A2:A8)'

wb.save("new2.xlsx")

The above program will add the formula (=SUM(A2:A8)) in cell A9. The result will be as below.

image

 

MERGE/UNMERGE CELLS

Two or more cells can be merged to a rectangular area using the method merge_cells(), and similarly, they can be unmerged using the method unmerge_cells().

For example:
Merge cells

#merge cells B2 to C9
ws.merge_cells('B2:C9')
ws['B2'] = "Merged cells"

Adding the above code to the previous example will merge cells as below.

image

UNMERGE CELLS

 

#unmerge cells B2 to C9
ws.unmerge_cells('B2:C9')

The above code will unmerge cells from B2 to C9.

INSERTING AN IMAGE

To insert an image we import the image function from the module openpyxl.drawing.image. We then load our image and add it to the cell as shown in the below example.

Example:

import openpyxl
from openpyxl
import Workbook
from openpyxl.drawing.image
import Image

wb = openpyxl.load_workbook("new1.xlsx")
ws = wb['Sheet']
#loading the image(should be in same folder)
img = Image('logo.png')
ws['A1'] = "Adding image"
#adjusting size
img.height = 130
img.width = 200
#adding img to cell A3

ws.add_image(img, 'A3')

wb.save("new2.xlsx")

Result:

image

CREATING CHARTS

Charts are essential to show a visualization of data. We can create charts from Excel data using the Openpyxl module chart. Different forms of charts such as line charts, bar charts, 3D line charts, etc., can be created. We need to create a reference that contains the data to be used for the chart, which is nothing but a selection of cells (rows and columns). I am using sample data to create a 3D bar chart in the below example:

Example

import openpyxl
from openpyxl
import Workbook
from openpyxl.chart
import BarChart3D, Reference, series

wb = openpyxl.load_workbook("example.xlsx")
ws = wb.active

values = Reference(ws, min_col = 3, min_row = 2, max_col = 3, max_row = 40)
chart = BarChart3D()
chart.add_data(values)
ws.add_chart(chart, "E3")
wb.save("MyChart.xlsx")

Result
image


How to Automate Excel with Python with Video Tutorial

Welcome to another video! In this video, We will cover how we can use python to automate Excel. I'll be going over everything from creating workbooks to accessing individual cells and stylizing cells. There is a ton of things that you can do with Excel but I'll just be covering the core/base things in OpenPyXl.

⭐️ Timestamps ⭐️
00:00 | Introduction
02:14 | Installing openpyxl
03:19 | Testing Installation
04:25 | Loading an Existing Workbook
06:46 | Accessing Worksheets
07:37 | Accessing Cell Values
08:58 | Saving Workbooks
09:52 | Creating, Listing and Changing Sheets
11:50 | Creating a New Workbook
12:39 | Adding/Appending Rows
14:26 | Accessing Multiple Cells
20:46 | Merging Cells
22:27 | Inserting and Deleting Rows
23:35 | Inserting and Deleting Columns
24:48 | Copying and Moving Cells
26:06 | Practical Example, Formulas & Cell Styling

📄 Resources 📄
OpenPyXL Docs: https://openpyxl.readthedocs.io/en/stable/ 
Code Written in This Tutorial: https://github.com/techwithtim/ExcelPythonTutorial 
Subscribe: https://www.youtube.com/c/TechWithTim/featured 

#python 

Eric  Bukenya

Eric Bukenya

1624713540

Learn NoSQL in Azure: Diving Deeper into Azure Cosmos DB

This article is a part of the series – Learn NoSQL in Azure where we explore Azure Cosmos DB as a part of the non-relational database system used widely for a variety of applications. Azure Cosmos DB is a part of Microsoft’s serverless databases on Azure which is highly scalable and distributed across all locations that run on Azure. It is offered as a platform as a service (PAAS) from Azure and you can develop databases that have a very high throughput and very low latency. Using Azure Cosmos DB, customers can replicate their data across multiple locations across the globe and also across multiple locations within the same region. This makes Cosmos DB a highly available database service with almost 99.999% availability for reads and writes for multi-region modes and almost 99.99% availability for single-region modes.

In this article, we will focus more on how Azure Cosmos DB works behind the scenes and how can you get started with it using the Azure Portal. We will also explore how Cosmos DB is priced and understand the pricing model in detail.

How Azure Cosmos DB works

As already mentioned, Azure Cosmos DB is a multi-modal NoSQL database service that is geographically distributed across multiple Azure locations. This helps customers to deploy the databases across multiple locations around the globe. This is beneficial as it helps to reduce the read latency when the users use the application.

As you can see in the figure above, Azure Cosmos DB is distributed across the globe. Let’s suppose you have a web application that is hosted in India. In that case, the NoSQL database in India will be considered as the master database for writes and all the other databases can be considered as a read replicas. Whenever new data is generated, it is written to the database in India first and then it is synchronized with the other databases.

Consistency Levels

While maintaining data over multiple regions, the most common challenge is the latency as when the data is made available to the other databases. For example, when data is written to the database in India, users from India will be able to see that data sooner than users from the US. This is due to the latency in synchronization between the two regions. In order to overcome this, there are a few modes that customers can choose from and define how often or how soon they want their data to be made available in the other regions. Azure Cosmos DB offers five levels of consistency which are as follows:

  • Strong
  • Bounded staleness
  • Session
  • Consistent prefix
  • Eventual

In most common NoSQL databases, there are only two levels – Strong and EventualStrong being the most consistent level while Eventual is the least. However, as we move from Strong to Eventual, consistency decreases but availability and throughput increase. This is a trade-off that customers need to decide based on the criticality of their applications. If you want to read in more detail about the consistency levels, the official guide from Microsoft is the easiest to understand. You can refer to it here.

Azure Cosmos DB Pricing Model

Now that we have some idea about working with the NoSQL database – Azure Cosmos DB on Azure, let us try to understand how the database is priced. In order to work with any cloud-based services, it is essential that you have a sound knowledge of how the services are charged, otherwise, you might end up paying something much higher than your expectations.

If you browse to the pricing page of Azure Cosmos DB, you can see that there are two modes in which the database services are billed.

  • Database Operations – Whenever you execute or run queries against your NoSQL database, there are some resources being used. Azure terms these usages in terms of Request Units or RU. The amount of RU consumed per second is aggregated and billed
  • Consumed Storage – As you start storing data in your database, it will take up some space in order to store that data. This storage is billed per the standard SSD-based storage across any Azure locations globally

Let’s learn about this in more detail.

#azure #azure cosmos db #nosql #azure #nosql in azure #azure cosmos db

Ruthie  Bugala

Ruthie Bugala

1626494129

Using the new C# Azure.Data.Tables SDK with Azure Cosmos DB

Last month, the Azure SDK team released a new library for Azure Tables for .NET, Java, JS/TS and Python. This release brings the Table SDK in line with other Azure SDKs and they use the specific Azure Core packages for handling requests, errors and credentials.

Azure Cosmos DB provides a Table API offering that is essentially Azure Table Storage on steroids! If you need a globally distributed table storage service, Azure Cosmos DB should be your go to choice.

If you’re making a choice between Azure Cosmos DB Table API and regular Azure Table Storage, I’d recommend reading the following article.

In this article, I’ll show you how we can perform simple operations against a Azure Cosmos DB Table API account using the new Azure.Data.Table C## SDK. Specifically, we’ll go over:

  • Installing the SDK 💻
  • Connecting to our Table Client and Creating a table 🔨
  • Defining our entity 🧾
  • Adding an entity ➕
  • Performing Transactional Batch Operations 💰
  • Querying our Table ❓
  • Deleting an entity ❌

Let’s dive into it!

Installing the SDK 💻

Installing the SDK is pretty simple. We can do so by running the following dotnet command:

dotnet add package Azure.Data.Tables

If you prefer using a UI to install the NuGet packages, we can do so by right-clicking our C## Project in Visual Studio, click on Manage NuGet packages and search for the Azure.Data.Tables package:

Connecting to our Table Client and Creating a table 🔨

The SDK provides us with two clients to interact with the service. A TableServiceClient is used for interacting with our table at the account lelvel.

We do this for creating tables, setting access policies etc.

We can also use a TableClient. This is used for performing operations on our entities. We can also use the TableClient to create tables like so:

TableClient tableClient = new TableClient(config["StorageConnection"], "Customers");
            await tableClient.CreateIfNotExistsAsync();

To create our Table Client, I’m passing in my storage connection string from Azure and the name of the table I want to interact with. On the following line, we create the table if it doesn’t exist.

To get out Storage Connection string, we can do so from our Cosmos DB account under Connection String:

When we run this code for the first time, we can see that the table has been created in our Data Explorer:

Defining our entity 🧾

In Table Storage, we create entities in our table that require a Partition Key and a Row Key. The combination of these need to be unique within our table.

Entities have a set of properties and strongly-typed entities need to extend from the ITableEntity interface, which expose Partition Key, Row Key, ETag and Timestamp properties. ETag and Timestamp will be generated by Cosmos DB, so we don’t need to set these.

For this tutorial, I’m going to use the above mentioned properties along with two string properties (Email and PhoneNumber) to make up a CustomerEntity type.

#csharp #programming #azure #data #azure cosmos db #azure