Wasswa  Meagan

Wasswa Meagan

1622638980

Ingest data from Apache Kafka into Azure Cosmos DB Cassandra API using Kafka Connect

Azure Cosmos DB Cassandra API is a fully managed cloud service that is compatible with Cassandra Query Language (CQL) v3.11 API. It has no operational overhead and you can benefit from all the underlying Azure Cosmos DB capabilities such as global distribution, automatic scale out partitioning, availability and latency guarantees, encryption at rest, backups etc.

At the time of writing this blog, the _Azure Cosmos DB Cassandra API serverless is available in preview mode _!

Your existing Cassandra applications can work with the Azure Cosmos DB Cassandra API since it works with CQLv4 compliant drivers (see examples for Java.Net CoreNode.jsPython etc.) But, you also need to think about integrating with other systems with existing data and bringing that into Azure Cosmos DB. One such system is Apache Kafka, which is a distributed streaming platform. It is used in industries and organizations to solve a wide variety of problems ranging from traditional asynchronous messaging, website activity tracking, log aggregation, real-time fraud detection and much more! It has a rich ecosystem of technologies such as Kafka Streams for stream processing and Kafka Connect for real-time data integration.

Thanks to its scalable design, Apache Kafka often serves as a central component in the overall data architecture with other systems pumping data into it. These could be click stream events, logs, sensor data, orders, database change-events etc. You name it! So, as you can imagine, there is a lot of data in Apache Kafka (topics) but it’s only useful when consumed or ingested into other systems. You could achieve this by writing good old plumbing code using the Kafka Producer/Consumer APIs using a language and client SDK of your choice. But you can do better!

This blog post demonstrates how you can use an open source solution (connector based) to ingest data from Kafka into Azure Cosmos DB Cassandra API. It uses a simple yet practical scenario along with a re-usable setup using Docker Compose to help with iterative development and testing. You will learn about:

  • Overview of Kafka Connect along with the details of the integration
  • How to configure and use the connector to work with Azure Cosmos DB
  • Use the connector to write data to multiple tables from a single Kafka topic

By the end of this blog, you should have a working end to end integration and be able to validate it.

The code and configuration associated with this blog post is available in this GitHub repository

#azure #database #nosql

What is GEEK

Buddha Community

Ingest data from Apache Kafka into Azure Cosmos DB Cassandra API using Kafka Connect
Chloe  Butler

Chloe Butler

1667425440

Pdf2gerb: Perl Script Converts PDF Files to Gerber format

pdf2gerb

Perl script converts PDF files to Gerber format

Pdf2Gerb generates Gerber 274X photoplotting and Excellon drill files from PDFs of a PCB. Up to three PDFs are used: the top copper layer, the bottom copper layer (for 2-sided PCBs), and an optional silk screen layer. The PDFs can be created directly from any PDF drawing software, or a PDF print driver can be used to capture the Print output if the drawing software does not directly support output to PDF.

The general workflow is as follows:

  1. Design the PCB using your favorite CAD or drawing software.
  2. Print the top and bottom copper and top silk screen layers to a PDF file.
  3. Run Pdf2Gerb on the PDFs to create Gerber and Excellon files.
  4. Use a Gerber viewer to double-check the output against the original PCB design.
  5. Make adjustments as needed.
  6. Submit the files to a PCB manufacturer.

Please note that Pdf2Gerb does NOT perform DRC (Design Rule Checks), as these will vary according to individual PCB manufacturer conventions and capabilities. Also note that Pdf2Gerb is not perfect, so the output files must always be checked before submitting them. As of version 1.6, Pdf2Gerb supports most PCB elements, such as round and square pads, round holes, traces, SMD pads, ground planes, no-fill areas, and panelization. However, because it interprets the graphical output of a Print function, there are limitations in what it can recognize (or there may be bugs).

See docs/Pdf2Gerb.pdf for install/setup, config, usage, and other info.


pdf2gerb_cfg.pm

#Pdf2Gerb config settings:
#Put this file in same folder/directory as pdf2gerb.pl itself (global settings),
#or copy to another folder/directory with PDFs if you want PCB-specific settings.
#There is only one user of this file, so we don't need a custom package or namespace.
#NOTE: all constants defined in here will be added to main namespace.
#package pdf2gerb_cfg;

use strict; #trap undef vars (easier debug)
use warnings; #other useful info (easier debug)


##############################################################################################
#configurable settings:
#change values here instead of in main pfg2gerb.pl file

use constant WANT_COLORS => ($^O !~ m/Win/); #ANSI colors no worky on Windows? this must be set < first DebugPrint() call

#just a little warning; set realistic expectations:
#DebugPrint("${\(CYAN)}Pdf2Gerb.pl ${\(VERSION)}, $^O O/S\n${\(YELLOW)}${\(BOLD)}${\(ITALIC)}This is EXPERIMENTAL software.  \nGerber files MAY CONTAIN ERRORS.  Please CHECK them before fabrication!${\(RESET)}", 0); #if WANT_DEBUG

use constant METRIC => FALSE; #set to TRUE for metric units (only affect final numbers in output files, not internal arithmetic)
use constant APERTURE_LIMIT => 0; #34; #max #apertures to use; generate warnings if too many apertures are used (0 to not check)
use constant DRILL_FMT => '2.4'; #'2.3'; #'2.4' is the default for PCB fab; change to '2.3' for CNC

use constant WANT_DEBUG => 0; #10; #level of debug wanted; higher == more, lower == less, 0 == none
use constant GERBER_DEBUG => 0; #level of debug to include in Gerber file; DON'T USE FOR FABRICATION
use constant WANT_STREAMS => FALSE; #TRUE; #save decompressed streams to files (for debug)
use constant WANT_ALLINPUT => FALSE; #TRUE; #save entire input stream (for debug ONLY)

#DebugPrint(sprintf("${\(CYAN)}DEBUG: stdout %d, gerber %d, want streams? %d, all input? %d, O/S: $^O, Perl: $]${\(RESET)}\n", WANT_DEBUG, GERBER_DEBUG, WANT_STREAMS, WANT_ALLINPUT), 1);
#DebugPrint(sprintf("max int = %d, min int = %d\n", MAXINT, MININT), 1); 

#define standard trace and pad sizes to reduce scaling or PDF rendering errors:
#This avoids weird aperture settings and replaces them with more standardized values.
#(I'm not sure how photoplotters handle strange sizes).
#Fewer choices here gives more accurate mapping in the final Gerber files.
#units are in inches
use constant TOOL_SIZES => #add more as desired
(
#round or square pads (> 0) and drills (< 0):
    .010, -.001,  #tiny pads for SMD; dummy drill size (too small for practical use, but needed so StandardTool will use this entry)
    .031, -.014,  #used for vias
    .041, -.020,  #smallest non-filled plated hole
    .051, -.025,
    .056, -.029,  #useful for IC pins
    .070, -.033,
    .075, -.040,  #heavier leads
#    .090, -.043,  #NOTE: 600 dpi is not high enough resolution to reliably distinguish between .043" and .046", so choose 1 of the 2 here
    .100, -.046,
    .115, -.052,
    .130, -.061,
    .140, -.067,
    .150, -.079,
    .175, -.088,
    .190, -.093,
    .200, -.100,
    .220, -.110,
    .160, -.125,  #useful for mounting holes
#some additional pad sizes without holes (repeat a previous hole size if you just want the pad size):
    .090, -.040,  #want a .090 pad option, but use dummy hole size
    .065, -.040, #.065 x .065 rect pad
    .035, -.040, #.035 x .065 rect pad
#traces:
    .001,  #too thin for real traces; use only for board outlines
    .006,  #minimum real trace width; mainly used for text
    .008,  #mainly used for mid-sized text, not traces
    .010,  #minimum recommended trace width for low-current signals
    .012,
    .015,  #moderate low-voltage current
    .020,  #heavier trace for power, ground (even if a lighter one is adequate)
    .025,
    .030,  #heavy-current traces; be careful with these ones!
    .040,
    .050,
    .060,
    .080,
    .100,
    .120,
);
#Areas larger than the values below will be filled with parallel lines:
#This cuts down on the number of aperture sizes used.
#Set to 0 to always use an aperture or drill, regardless of size.
use constant { MAX_APERTURE => max((TOOL_SIZES)) + .004, MAX_DRILL => -min((TOOL_SIZES)) + .004 }; #max aperture and drill sizes (plus a little tolerance)
#DebugPrint(sprintf("using %d standard tool sizes: %s, max aper %.3f, max drill %.3f\n", scalar((TOOL_SIZES)), join(", ", (TOOL_SIZES)), MAX_APERTURE, MAX_DRILL), 1);

#NOTE: Compare the PDF to the original CAD file to check the accuracy of the PDF rendering and parsing!
#for example, the CAD software I used generated the following circles for holes:
#CAD hole size:   parsed PDF diameter:      error:
#  .014                .016                +.002
#  .020                .02267              +.00267
#  .025                .026                +.001
#  .029                .03167              +.00267
#  .033                .036                +.003
#  .040                .04267              +.00267
#This was usually ~ .002" - .003" too big compared to the hole as displayed in the CAD software.
#To compensate for PDF rendering errors (either during CAD Print function or PDF parsing logic), adjust the values below as needed.
#units are pixels; for example, a value of 2.4 at 600 dpi = .0004 inch, 2 at 600 dpi = .0033"
use constant
{
    HOLE_ADJUST => -0.004 * 600, #-2.6, #holes seemed to be slightly oversized (by .002" - .004"), so shrink them a little
    RNDPAD_ADJUST => -0.003 * 600, #-2, #-2.4, #round pads seemed to be slightly oversized, so shrink them a little
    SQRPAD_ADJUST => +0.001 * 600, #+.5, #square pads are sometimes too small by .00067, so bump them up a little
    RECTPAD_ADJUST => 0, #(pixels) rectangular pads seem to be okay? (not tested much)
    TRACE_ADJUST => 0, #(pixels) traces seemed to be okay?
    REDUCE_TOLERANCE => .001, #(inches) allow this much variation when reducing circles and rects
};

#Also, my CAD's Print function or the PDF print driver I used was a little off for circles, so define some additional adjustment values here:
#Values are added to X/Y coordinates; units are pixels; for example, a value of 1 at 600 dpi would be ~= .002 inch
use constant
{
    CIRCLE_ADJUST_MINX => 0,
    CIRCLE_ADJUST_MINY => -0.001 * 600, #-1, #circles were a little too high, so nudge them a little lower
    CIRCLE_ADJUST_MAXX => +0.001 * 600, #+1, #circles were a little too far to the left, so nudge them a little to the right
    CIRCLE_ADJUST_MAXY => 0,
    SUBST_CIRCLE_CLIPRECT => FALSE, #generate circle and substitute for clip rects (to compensate for the way some CAD software draws circles)
    WANT_CLIPRECT => TRUE, #FALSE, #AI doesn't need clip rect at all? should be on normally?
    RECT_COMPLETION => FALSE, #TRUE, #fill in 4th side of rect when 3 sides found
};

#allow .012 clearance around pads for solder mask:
#This value effectively adjusts pad sizes in the TOOL_SIZES list above (only for solder mask layers).
use constant SOLDER_MARGIN => +.012; #units are inches

#line join/cap styles:
use constant
{
    CAP_NONE => 0, #butt (none); line is exact length
    CAP_ROUND => 1, #round cap/join; line overhangs by a semi-circle at either end
    CAP_SQUARE => 2, #square cap/join; line overhangs by a half square on either end
    CAP_OVERRIDE => FALSE, #cap style overrides drawing logic
};
    
#number of elements in each shape type:
use constant
{
    RECT_SHAPELEN => 6, #x0, y0, x1, y1, count, "rect" (start, end corners)
    LINE_SHAPELEN => 6, #x0, y0, x1, y1, count, "line" (line seg)
    CURVE_SHAPELEN => 10, #xstart, ystart, x0, y0, x1, y1, xend, yend, count, "curve" (bezier 2 points)
    CIRCLE_SHAPELEN => 5, #x, y, 5, count, "circle" (center + radius)
};
#const my %SHAPELEN =
#Readonly my %SHAPELEN =>
our %SHAPELEN =
(
    rect => RECT_SHAPELEN,
    line => LINE_SHAPELEN,
    curve => CURVE_SHAPELEN,
    circle => CIRCLE_SHAPELEN,
);

#panelization:
#This will repeat the entire body the number of times indicated along the X or Y axes (files grow accordingly).
#Display elements that overhang PCB boundary can be squashed or left as-is (typically text or other silk screen markings).
#Set "overhangs" TRUE to allow overhangs, FALSE to truncate them.
#xpad and ypad allow margins to be added around outer edge of panelized PCB.
use constant PANELIZE => {'x' => 1, 'y' => 1, 'xpad' => 0, 'ypad' => 0, 'overhangs' => TRUE}; #number of times to repeat in X and Y directions

# Set this to 1 if you need TurboCAD support.
#$turboCAD = FALSE; #is this still needed as an option?

#CIRCAD pad generation uses an appropriate aperture, then moves it (stroke) "a little" - we use this to find pads and distinguish them from PCB holes. 
use constant PAD_STROKE => 0.3; #0.0005 * 600; #units are pixels
#convert very short traces to pads or holes:
use constant TRACE_MINLEN => .001; #units are inches
#use constant ALWAYS_XY => TRUE; #FALSE; #force XY even if X or Y doesn't change; NOTE: needs to be TRUE for all pads to show in FlatCAM and ViewPlot
use constant REMOVE_POLARITY => FALSE; #TRUE; #set to remove subtractive (negative) polarity; NOTE: must be FALSE for ground planes

#PDF uses "points", each point = 1/72 inch
#combined with a PDF scale factor of .12, this gives 600 dpi resolution (1/72 * .12 = 600 dpi)
use constant INCHES_PER_POINT => 1/72; #0.0138888889; #multiply point-size by this to get inches

# The precision used when computing a bezier curve. Higher numbers are more precise but slower (and generate larger files).
#$bezierPrecision = 100;
use constant BEZIER_PRECISION => 36; #100; #use const; reduced for faster rendering (mainly used for silk screen and thermal pads)

# Ground planes and silk screen or larger copper rectangles or circles are filled line-by-line using this resolution.
use constant FILL_WIDTH => .01; #fill at most 0.01 inch at a time

# The max number of characters to read into memory
use constant MAX_BYTES => 10 * M; #bumped up to 10 MB, use const

use constant DUP_DRILL1 => TRUE; #FALSE; #kludge: ViewPlot doesn't load drill files that are too small so duplicate first tool

my $runtime = time(); #Time::HiRes::gettimeofday(); #measure my execution time

print STDERR "Loaded config settings from '${\(__FILE__)}'.\n";
1; #last value must be truthful to indicate successful load


#############################################################################################
#junk/experiment:

#use Package::Constants;
#use Exporter qw(import); #https://perldoc.perl.org/Exporter.html

#my $caller = "pdf2gerb::";

#sub cfg
#{
#    my $proto = shift;
#    my $class = ref($proto) || $proto;
#    my $settings =
#    {
#        $WANT_DEBUG => 990, #10; #level of debug wanted; higher == more, lower == less, 0 == none
#    };
#    bless($settings, $class);
#    return $settings;
#}

#use constant HELLO => "hi there2"; #"main::HELLO" => "hi there";
#use constant GOODBYE => 14; #"main::GOODBYE" => 12;

#print STDERR "read cfg file\n";

#our @EXPORT_OK = Package::Constants->list(__PACKAGE__); #https://www.perlmonks.org/?node_id=1072691; NOTE: "_OK" skips short/common names

#print STDERR scalar(@EXPORT_OK) . " consts exported:\n";
#foreach(@EXPORT_OK) { print STDERR "$_\n"; }
#my $val = main::thing("xyz");
#print STDERR "caller gave me $val\n";
#foreach my $arg (@ARGV) { print STDERR "arg $arg\n"; }

Download Details:

Author: swannman
Source Code: https://github.com/swannman/pdf2gerb

License: GPL-3.0 license

#perl 

Eric  Bukenya

Eric Bukenya

1624713540

Learn NoSQL in Azure: Diving Deeper into Azure Cosmos DB

This article is a part of the series – Learn NoSQL in Azure where we explore Azure Cosmos DB as a part of the non-relational database system used widely for a variety of applications. Azure Cosmos DB is a part of Microsoft’s serverless databases on Azure which is highly scalable and distributed across all locations that run on Azure. It is offered as a platform as a service (PAAS) from Azure and you can develop databases that have a very high throughput and very low latency. Using Azure Cosmos DB, customers can replicate their data across multiple locations across the globe and also across multiple locations within the same region. This makes Cosmos DB a highly available database service with almost 99.999% availability for reads and writes for multi-region modes and almost 99.99% availability for single-region modes.

In this article, we will focus more on how Azure Cosmos DB works behind the scenes and how can you get started with it using the Azure Portal. We will also explore how Cosmos DB is priced and understand the pricing model in detail.

How Azure Cosmos DB works

As already mentioned, Azure Cosmos DB is a multi-modal NoSQL database service that is geographically distributed across multiple Azure locations. This helps customers to deploy the databases across multiple locations around the globe. This is beneficial as it helps to reduce the read latency when the users use the application.

As you can see in the figure above, Azure Cosmos DB is distributed across the globe. Let’s suppose you have a web application that is hosted in India. In that case, the NoSQL database in India will be considered as the master database for writes and all the other databases can be considered as a read replicas. Whenever new data is generated, it is written to the database in India first and then it is synchronized with the other databases.

Consistency Levels

While maintaining data over multiple regions, the most common challenge is the latency as when the data is made available to the other databases. For example, when data is written to the database in India, users from India will be able to see that data sooner than users from the US. This is due to the latency in synchronization between the two regions. In order to overcome this, there are a few modes that customers can choose from and define how often or how soon they want their data to be made available in the other regions. Azure Cosmos DB offers five levels of consistency which are as follows:

  • Strong
  • Bounded staleness
  • Session
  • Consistent prefix
  • Eventual

In most common NoSQL databases, there are only two levels – Strong and EventualStrong being the most consistent level while Eventual is the least. However, as we move from Strong to Eventual, consistency decreases but availability and throughput increase. This is a trade-off that customers need to decide based on the criticality of their applications. If you want to read in more detail about the consistency levels, the official guide from Microsoft is the easiest to understand. You can refer to it here.

Azure Cosmos DB Pricing Model

Now that we have some idea about working with the NoSQL database – Azure Cosmos DB on Azure, let us try to understand how the database is priced. In order to work with any cloud-based services, it is essential that you have a sound knowledge of how the services are charged, otherwise, you might end up paying something much higher than your expectations.

If you browse to the pricing page of Azure Cosmos DB, you can see that there are two modes in which the database services are billed.

  • Database Operations – Whenever you execute or run queries against your NoSQL database, there are some resources being used. Azure terms these usages in terms of Request Units or RU. The amount of RU consumed per second is aggregated and billed
  • Consumed Storage – As you start storing data in your database, it will take up some space in order to store that data. This storage is billed per the standard SSD-based storage across any Azure locations globally

Let’s learn about this in more detail.

#azure #azure cosmos db #nosql #azure #nosql in azure #azure cosmos db

 iOS App Dev

iOS App Dev

1620466520

Your Data Architecture: Simple Best Practices for Your Data Strategy

If you accumulate data on which you base your decision-making as an organization, you should probably think about your data architecture and possible best practices.

If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points from our checklist for what we perceive to be an anticipatory analytics ecosystem.

#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition

Ruthie  Bugala

Ruthie Bugala

1626494129

Using the new C# Azure.Data.Tables SDK with Azure Cosmos DB

Last month, the Azure SDK team released a new library for Azure Tables for .NET, Java, JS/TS and Python. This release brings the Table SDK in line with other Azure SDKs and they use the specific Azure Core packages for handling requests, errors and credentials.

Azure Cosmos DB provides a Table API offering that is essentially Azure Table Storage on steroids! If you need a globally distributed table storage service, Azure Cosmos DB should be your go to choice.

If you’re making a choice between Azure Cosmos DB Table API and regular Azure Table Storage, I’d recommend reading the following article.

In this article, I’ll show you how we can perform simple operations against a Azure Cosmos DB Table API account using the new Azure.Data.Table C## SDK. Specifically, we’ll go over:

  • Installing the SDK 💻
  • Connecting to our Table Client and Creating a table 🔨
  • Defining our entity 🧾
  • Adding an entity ➕
  • Performing Transactional Batch Operations 💰
  • Querying our Table ❓
  • Deleting an entity ❌

Let’s dive into it!

Installing the SDK 💻

Installing the SDK is pretty simple. We can do so by running the following dotnet command:

dotnet add package Azure.Data.Tables

If you prefer using a UI to install the NuGet packages, we can do so by right-clicking our C## Project in Visual Studio, click on Manage NuGet packages and search for the Azure.Data.Tables package:

Connecting to our Table Client and Creating a table 🔨

The SDK provides us with two clients to interact with the service. A TableServiceClient is used for interacting with our table at the account lelvel.

We do this for creating tables, setting access policies etc.

We can also use a TableClient. This is used for performing operations on our entities. We can also use the TableClient to create tables like so:

TableClient tableClient = new TableClient(config["StorageConnection"], "Customers");
            await tableClient.CreateIfNotExistsAsync();

To create our Table Client, I’m passing in my storage connection string from Azure and the name of the table I want to interact with. On the following line, we create the table if it doesn’t exist.

To get out Storage Connection string, we can do so from our Cosmos DB account under Connection String:

When we run this code for the first time, we can see that the table has been created in our Data Explorer:

Defining our entity 🧾

In Table Storage, we create entities in our table that require a Partition Key and a Row Key. The combination of these need to be unique within our table.

Entities have a set of properties and strongly-typed entities need to extend from the ITableEntity interface, which expose Partition Key, Row Key, ETag and Timestamp properties. ETag and Timestamp will be generated by Cosmos DB, so we don’t need to set these.

For this tutorial, I’m going to use the above mentioned properties along with two string properties (Email and PhoneNumber) to make up a CustomerEntity type.

#csharp #programming #azure #data #azure cosmos db #azure

Micheal  Block

Micheal Block

1601344800

Data Ingestion into Azure Data Explorer using Kafka Connect

In this blog, we will go over how to ingest data into Azure Data Explorer  using the open source Kafka Connect Sink connector for Azure Data Explorer  running on Kubernetes using Strimzi. Kafka Connect  is a tool for scalably and reliably streaming data between Apache Kafka and other systems using source and sink connectors and Strimzi provides a “Kubernetes-native” way of running Kafka clusters as well as Kafka Connect workers.

#kafka-connect #big-data #azure #kubernetes #kafka