Elvis Miranda

Using Python and MySQL in the ETL Process

Python is very popular these days. Since Python is a general-purpose programming language, it can also be used to perform the Extract, Transform, Load (ETL) process. Different ETL modules are available, but today we’ll stick with the combination of Python and MySQL.

We’ll use Python to invoke stored procedures and prepare and execute SQL statements.

We’ll use two similar-but-different approaches. First, we’ll invoke stored procedures that will do the whole job, and after that we’ll analyze how we could do the same process without stored procedures by using MySQL code in Python.

Ready? Before we dig in, let’s look at the data model – or rather data models, as there are two of them in this article.

The Data Models

We’ll need two data models, one to store our operational data and the other to store our reporting data.

[Figure: the operational (live) subscription business data model]

The first model is shown in the picture above. This model is used to store operational (live) data for a subscription-based business. For more insight into this model, please take a look at our previous article, CREATING A DWH, PART ONE: A SUBSCRIPTION BUSINESS DATA MODEL.

https://my.vertabelo.com/model/IvcYd9W06dorp9zZxp9OfUGFyMlD7Z7s

Separating operational and reporting data is usually a very wise decision. To achieve that separation, we’ll need to create a data warehouse (DWH). We already did that; you can see the model in the picture above. This model is also described in detail in the post CREATING A DWH, PART TWO: A SUBSCRIPTION BUSINESS DATA MODEL.

Finally, we need to extract data from the live database, transform it, and load it into our DWH. We’ve already done this using SQL stored procedures. You can find a description of what we want to achieve along with some code examples in CREATING A DATA WAREHOUSE, PART 3: A SUBSCRIPTION BUSINESS DATA MODEL.

If you need additional information regarding DWHs, we recommend reading the three articles mentioned above.

Our task today is to replace the SQL stored procedures with Python code. We’re ready to make some Python magic. Let’s start with using only stored procedures in Python.

Method 1: ETL Using Stored Procedures

Before we start describing the process, it’s important to mention that we have two databases on our server.

The subscription_live database is used to store transactional/live data, while the subscription_dwh is our reporting database (DWH).

We’ve already described the stored procedures used to update dimension and fact tables. They will read data from the subscription_live database, combine it with data in the subscription_dwh database, and insert new data into the subscription_dwh database. These two procedures are p_update_dimensions and p_update_facts.

If you want to see the complete code for these procedures, read CREATING A DATA WAREHOUSE, PART 3: A SUBSCRIPTION BUSINESS DATA MODEL.
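For orientation, here’s the general shape of such a procedure (a minimal sketch only – the body shown here is an assumption; the full working code is in the Part 3 article):

DELIMITER //
CREATE PROCEDURE subscription_dwh.p_update_dimensions()
BEGIN
  -- insert yesterday's date into dim_time, if it isn't there yet
  -- insert any new cities from the live database into dim_city
END //
DELIMITER ;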

Now we’re ready to write a simple Python script that will connect to the server and perform the ETL process. Let’s first take a look at the whole script (etl_procedures.py). Then we’ll explain the most important parts.

# import MySQL connector
import mysql.connector

# connect to server
connection = mysql.connector.connect(user='', password='', host='127.0.0.1')
print('Connected to database.')
cursor = connection.cursor()

# 1. update dimensions
cursor.callproc('subscription_dwh.p_update_dimensions')
print('Dimension tables updated.')

# 2. update facts
cursor.callproc('subscription_dwh.p_update_facts')
print('Fact tables updated.')

# commit & close connection
cursor.close()
connection.commit()
connection.close()
print('Disconnected from database.')


etl_procedures.py

Importing Modules and Connecting to the Database

Python uses modules to store definitions and statements. You could use an existing module or write your own. Using existing modules will simplify your life because you’re using pre-written code, but writing your own module is also very useful. When you quit the Python interpreter and run it again, you’ll lose functions and variables you’ve previously defined. Of course, you don’t want to type the same code over and over again. To avoid that, you could store your definitions in a module and import it into Python.
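For example (a toy sketch; the file name etl_helpers.py is made up): save a function in a module once, and you can reuse it from any script or interpreter session:

# etl_helpers.py - a reusable helper module
from datetime import date, timedelta

def yesterday_string():
    # return yesterday's date as 'YYYY-MM-DD'
    return str(date.today() - timedelta(days=1))

# any other script can now do:
#   import etl_helpers
#   print(etl_helpers.yesterday_string())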

Back to etl_procedures.py. In our program, we start with importing MySQL Connector:

# import MySQL connector
import mysql.connector

MySQL Connector for Python is used as a standardized driver that connects to a MySQL server/database. You’ll need to download and install it if you haven’t already done so. Besides connecting to the database, it offers a number of methods and properties for working with a database. We’ll use some of them, but you can check the complete documentation on the official MySQL Connector/Python documentation pages.

Next, we’ll need to connect to our database:

# connect to server
connection = mysql.connector.connect(user='', password='', host='127.0.0.1')
print('Connected to database.')
cursor = connection.cursor()

The first line will connect to a server (in this case, I’m connecting to my local machine) using your credentials (replace the empty user and password values with actual credentials). While establishing a connection, you could also specify the database you want to connect to, as shown below:

connection = mysql.connector.connect(user='', password='', host='127.0.0.1', database='')

I’ve intentionally connected only to a server and not to a specific database because I’ll be using two databases located on the same server.

The next command – print – is just a notification that we’ve successfully connected. While it has no programming significance, it can be used to debug the script if something goes wrong.

The last line in this part is:

cursor = connection.cursor()

A cursor is the handler structure used to work with data. We’ll use it to retrieve data from the database (SELECT), but also to modify data (INSERT, UPDATE, DELETE). Before using a cursor, we need to create it, and that’s what this line does.
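As a quick illustration of the pattern we’ll use later (a sketch only, using one of the tables from our live database):

# sketch: run a SELECT and read the single-row result
cursor.execute("SELECT COUNT(*) FROM subscription_live.customer")
row_count = cursor.fetchone()[0]
print(row_count)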

Calling Procedures

The previous part was general and could be used for other database-related tasks. The following part of the code is specifically for ETL: calling our stored procedures with the cursor.callproc command. It looks like this:

# 1. update dimensions
cursor.callproc('subscription_dwh.p_update_dimensions')
print('Dimension tables updated.')

# 2. update facts
cursor.callproc('subscription_dwh.p_update_facts')
print('Fact tables updated.')

Calling procedures is pretty much self-explanatory. After each call, a print command was added. Again, this just gives us a notification that everything went okay.
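If a procedure returns result sets, the connector exposes them through cursor.stored_results(). A minimal sketch (our procedures don’t necessarily return anything, so treat this as optional):

# sketch: read any result sets the procedure produced
cursor.callproc('subscription_dwh.p_update_dimensions')
for result in cursor.stored_results():
    print(result.fetchall())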

Commit and Close

The final part of the script commits the database changes and closes all used objects:

# commit & close connection
cursor.close()
connection.commit()
connection.close()
print('Disconnected from database.')


Committing is essential here; without it, there will be no changes to the database, even if you called a procedure or executed an SQL statement.
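If you’d rather not call commit() yourself, the connector also supports autocommit mode (an alternative pattern, not what the script above uses):

# alternative: commit each statement immediately
connection = mysql.connector.connect(user='', password='', host='127.0.0.1')
connection.autocommit = True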

Running the Script

The last thing we need to do is to run our script. We’ll use the following commands in the Python Shell to achieve that:

import os
file_path = 'D://python_scripts'
os.chdir(file_path)
exec(open("etl_procedures.py").read())

The script is executed and all changes are made in the database accordingly.
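You can, of course, also run the script directly from the command line instead of the Python Shell:

python etl_procedures.py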

Method 2: ETL Using Python and MySQL

The approach presented above doesn’t differ a lot from the approach of calling stored procedures directly in MySQL. The only difference is that now we have a script that will do the whole job for us.

We could use another approach: putting everything inside the Python script. We’ll include Python statements, but we’ll also prepare SQL queries and execute them on the database. The source database (live) and the destination database (DWH) are the same as in the example with stored procedures.

Before we delve into this, let’s take a look at the complete script (etl_queries.py):

from datetime import date

# import MySQL connector
import mysql.connector

# connect to server
connection = mysql.connector.connect(user='', password='', host='127.0.0.1')
print('Connected to database.')

# 1. update dimensions

# 1.1 update dim_time
# date - yesterday
yesterday = date.fromordinal(date.today().toordinal()-1)
yesterday_str = '"' + str(yesterday) + '"'
# test if date is already in the table
cursor = connection.cursor()
query = (
  "SELECT COUNT(*) "
  "FROM subscription_dwh.dim_time " 
  "WHERE time_date = " + yesterday_str)
cursor.execute(query)
result = cursor.fetchall()
yesterday_subscription_count = int(result[0][0])
if yesterday_subscription_count == 0:
  yesterday_year = 'YEAR("' + str(yesterday) + '")'
  yesterday_month = 'MONTH("' + str(yesterday) + '")'
  yesterday_week = 'WEEK("' + str(yesterday) + '")'
  yesterday_weekday = 'WEEKDAY("' + str(yesterday) + '")'
  query = (
    "INSERT INTO subscription_dwh.`dim_time`(`time_date`, `time_year`, `time_month`, `time_week`, `time_weekday`, `ts`) "
    "VALUES (" + yesterday_str + ", " + yesterday_year + ", " + yesterday_month + ", " + yesterday_week + ", " + yesterday_weekday + ", Now())")
  cursor.execute(query)

# 1.2 update dim_city
query = (
  "INSERT INTO subscription_dwh.`dim_city`(`city_name`, `postal_code`, `country_name`, `ts`) "
  "SELECT city_live.city_name, city_live.postal_code, country_live.country_name, Now() "
  "FROM subscription_live.city city_live "
  "INNER JOIN subscription_live.country country_live ON city_live.country_id = country_live.id "
  "LEFT JOIN subscription_dwh.dim_city city_dwh ON city_live.city_name = city_dwh.city_name AND city_live.postal_code = city_dwh.postal_code AND country_live.country_name = city_dwh.country_name "
  "WHERE city_dwh.id IS NULL")
cursor.execute(query)

print('Dimension tables updated.')


# 2. update facts

# 2.1 update customers subscribed
# delete old data for the same date
query = (
  "DELETE subscription_dwh.`fact_customer_subscribed`.* "
  "FROM subscription_dwh.`fact_customer_subscribed` "
  "INNER JOIN subscription_dwh.`dim_time` ON subscription_dwh.`fact_customer_subscribed`.`dim_time_id` = subscription_dwh.`dim_time`.`id` "
  "WHERE subscription_dwh.`dim_time`.`time_date` = " + yesterday_str)
cursor.execute(query)
# insert new data
query = (
  "INSERT INTO subscription_dwh.`fact_customer_subscribed`(`dim_city_id`, `dim_time_id`, `total_active`, `total_inactive`, `daily_new`, `daily_canceled`, `ts`) "
  "SELECT city_dwh.id AS dim_city_id, time_dwh.id AS dim_time_id, SUM(CASE WHEN customer_live.active = 1 THEN 1 ELSE 0 END) AS total_active, SUM(CASE WHEN customer_live.active = 0 THEN 1 ELSE 0 END) AS total_inactive, SUM(CASE WHEN customer_live.active = 1 AND DATE(customer_live.time_updated) = " + yesterday_str + " THEN 1 ELSE 0 END) AS daily_new, SUM(CASE WHEN customer_live.active = 0 AND DATE(customer_live.time_updated) = " + yesterday_str + " THEN 1 ELSE 0 END) AS daily_canceled, MIN(NOW()) AS ts "
  "FROM subscription_live.`customer` customer_live "
  "INNER JOIN subscription_live.`city` city_live ON customer_live.city_id = city_live.id "
  "INNER JOIN subscription_live.`country` country_live ON city_live.country_id = country_live.id "
  "INNER JOIN subscription_dwh.dim_city city_dwh ON city_live.city_name = city_dwh.city_name AND city_live.postal_code = city_dwh.postal_code AND country_live.country_name = city_dwh.country_name "
  "INNER JOIN subscription_dwh.dim_time time_dwh ON time_dwh.time_date = " + yesterday_str + " " 
  "GROUP BY city_dwh.id, time_dwh.id")
cursor.execute(query)

# 2.2 update subscription statuses
# delete old data for the same date
query = (
  "DELETE subscription_dwh.`fact_subscription_status`.* "
  "FROM subscription_dwh.`fact_subscription_status` "
  "INNER JOIN subscription_dwh.`dim_time` ON subscription_dwh.`fact_subscription_status`.`dim_time_id` = subscription_dwh.`dim_time`.`id` "
  "WHERE subscription_dwh.`dim_time`.`time_date` = " + yesterday_str)
cursor.execute(query)
# insert new data
query = (
  "INSERT INTO subscription_dwh.`fact_subscription_status`(`dim_city_id`, `dim_time_id`, `total_active`, `total_inactive`, `daily_new`, `daily_canceled`, `ts`) "
  "SELECT city_dwh.id AS dim_city_id, time_dwh.id AS dim_time_id, SUM(CASE WHEN subscription_live.active = 1 THEN 1 ELSE 0 END) AS total_active, SUM(CASE WHEN subscription_live.active = 0 THEN 1 ELSE 0 END) AS total_inactive, SUM(CASE WHEN subscription_live.active = 1 AND DATE(subscription_live.time_updated) = " + yesterday_str + " THEN 1 ELSE 0 END) AS daily_new, SUM(CASE WHEN subscription_live.active = 0 AND DATE(subscription_live.time_updated) = " + yesterday_str + " THEN 1 ELSE 0 END) AS daily_canceled, MIN(NOW()) AS ts "
  "FROM subscription_live.`customer` customer_live "
  "INNER JOIN subscription_live.`subscription` subscription_live ON subscription_live.customer_id = customer_live.id "
  "INNER JOIN subscription_live.`city` city_live ON customer_live.city_id = city_live.id "
  "INNER JOIN subscription_live.`country` country_live ON city_live.country_id = country_live.id "
  "INNER JOIN subscription_dwh.dim_city city_dwh ON city_live.city_name = city_dwh.city_name AND city_live.postal_code = city_dwh.postal_code AND country_live.country_name = city_dwh.country_name "
  "INNER JOIN subscription_dwh.dim_time time_dwh ON time_dwh.time_date = " + yesterday_str + " "
  "GROUP BY city_dwh.id, time_dwh.id")
cursor.execute(query)

print('Fact tables updated.')

# commit & close connection
cursor.close()
connection.commit()
connection.close()
print('Disconnected from database.')

etl_queries.py

Importing Modules and Connecting to the Database

Once more, we’ll need to import the MySQL Connector module using the following code:

import mysql.connector

We’ll also import the datetime module, as shown below. We need this for date-related operations in Python:

from datetime import date

The process for connecting to the database is the same as in the previous example.

Updating the dim_time Dimension

To update the dim_time table, we’ll need to check if the value (for yesterday) is already in the table. We’ll have to use Python’s date functions (instead of SQL’s) to do this:

# date - yesterday
yesterday = date.fromordinal(date.today().toordinal()-1)
yesterday_str = '"' + str(yesterday) + '"'


The first line of code will return yesterday’s date and store it in the yesterday variable, while the second line will store this value as a string. We’ll need the string version because we’ll concatenate it with other strings when we build the SQL queries.
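Two side notes, sketched below on the assumption that you’re free to restructure the script: the standard datetime.timedelta is a slightly more idiomatic way to get yesterday’s date, and passing the date as a query parameter (instead of concatenating strings) sidesteps quoting issues and SQL injection:

from datetime import date, timedelta

yesterday = date.today() - timedelta(days=1)

# parameterized version of the existence check (assumes a cursor already exists);
# the connector fills in %s safely
query = ("SELECT COUNT(*) "
         "FROM subscription_dwh.dim_time "
         "WHERE time_date = %s")
cursor.execute(query, (yesterday,))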

Next, we’ll need to test if this date is already in the dim_time table. After declaring a cursor, we’ll prepare the SQL query. To execute the query, we’ll use the cursor.execute command:

# test if date is already in the table
cursor = connection.cursor()
query = (
  "SELECT COUNT(*) "
  "FROM subscription_dwh.dim_time " 
  "WHERE time_date = " + yesterday_str)
cursor.execute(query)

We’ll store the query result in the result variable. The query returns a single row, and the first column of that row will contain either a 0 or a 1. (Remember, a date can appear only once in a dimension table.)

If the date is not already in the table, we’ll prepare the strings that will be part of the SQL query:

result = cursor.fetchall()
yesterday_subscription_count = int(result[0][0])
if yesterday_subscription_count == 0:
  yesterday_year = 'YEAR("' + str(yesterday) + '")'
  yesterday_month = 'MONTH("' + str(yesterday) + '")'
  yesterday_week = 'WEEK("' + str(yesterday) + '")'
  yesterday_weekday = 'WEEKDAY("' + str(yesterday) + '")'

Finally, we’ll build a query and execute it. This will update the dim_time table after it is committed. Please notice that I’ve used the complete path to the table, including the database name (subscription_dwh).

  query = (
    "INSERT INTO subscription_dwh.`dim_time`(`time_date`, `time_year`, `time_month`, `time_week`, `time_weekday`, `ts`) "
    "VALUES (" + yesterday_str + ", " + yesterday_year + ", " + yesterday_month + ", " + yesterday_week + ", " + yesterday_weekday + ", Now())")
  cursor.execute(query)

Updating the dim_city Dimension

Updating the dim_city table is even simpler because we don’t need to test anything before the insert. We’ll actually include that test in the SQL query.

# 1.2 update dim_city
query = (
  "INSERT INTO subscription_dwh.`dim_city`(`city_name`, `postal_code`, `country_name`, `ts`) "
  "SELECT city_live.city_name, city_live.postal_code, country_live.country_name, Now() "
  "FROM subscription_live.city city_live "
  "INNER JOIN subscription_live.country country_live ON city_live.country_id = country_live.id "
  "LEFT JOIN subscription_dwh.dim_city city_dwh ON city_live.city_name = city_dwh.city_name AND city_live.postal_code = city_dwh.postal_code AND country_live.country_name = city_dwh.country_name "
  "WHERE city_dwh.id IS NULL")
cursor.execute(query)

Here we prepare and execute the SQL query. Notice that I’ve again used the full paths to the tables, including the names of both databases (subscription_live and subscription_dwh).

Updating the Fact Tables

The last thing we need to do is to update our fact tables. The process is almost the same as updating dimension tables: we prepare queries and execute them. These queries are much more complex, but they are the same as the ones used in the stored procedures.

We’ve added one improvement compared to the stored procedures: deleting the existing data for the same date in the fact table. This will allow us to run a script multiple times for the same date. At the end, we’ll need to commit the transaction and close all objects and the connection.
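One defensive pattern worth considering (a sketch, not part of the original script): wrap the statements in a try block and roll back on failure, so a failed run never leaves half-updated fact tables:

try:
    # ... all the DELETE/INSERT statements go here ...
    connection.commit()
except mysql.connector.Error:
    connection.rollback()  # undo everything from this run
    raise
finally:
    cursor.close()
    connection.close()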

Running the Script

We have a minor change in this part: we’re calling a different script:

import os
file_path = 'D://python_scripts'
os.chdir(file_path)
exec(open("etl_queries.py").read())

Because we’ve used the same messages and the script completed successfully, the result is the same as before.

How Would You Use Python in ETL?

Today we saw one example of performing the ETL process with a Python script. There are other ways to do this, e.g. a number of open-source solutions that utilize Python libraries to work with databases and perform the ETL process. In the next article, we’ll play with one of them. In the meantime, feel free to share your experience with Python and ETL.


#python #mysql

Chloe Butler

Pdf2gerb: Perl Script Converts PDF Files to Gerber format


Pdf2Gerb generates Gerber 274X photoplotting and Excellon drill files from PDFs of a PCB. Up to three PDFs are used: the top copper layer, the bottom copper layer (for 2-sided PCBs), and an optional silk screen layer. The PDFs can be created directly from any PDF drawing software, or a PDF print driver can be used to capture the Print output if the drawing software does not directly support output to PDF.

The general workflow is as follows:

  1. Design the PCB using your favorite CAD or drawing software.
  2. Print the top and bottom copper and top silk screen layers to a PDF file.
  3. Run Pdf2Gerb on the PDFs to create Gerber and Excellon files.
  4. Use a Gerber viewer to double-check the output against the original PCB design.
  5. Make adjustments as needed.
  6. Submit the files to a PCB manufacturer.

Please note that Pdf2Gerb does NOT perform DRC (Design Rule Checks), as these will vary according to individual PCB manufacturer conventions and capabilities. Also note that Pdf2Gerb is not perfect, so the output files must always be checked before submitting them. As of version 1.6, Pdf2Gerb supports most PCB elements, such as round and square pads, round holes, traces, SMD pads, ground planes, no-fill areas, and panelization. However, because it interprets the graphical output of a Print function, there are limitations in what it can recognize (or there may be bugs).

See docs/Pdf2Gerb.pdf for install/setup, config, usage, and other info.


pdf2gerb_cfg.pm

#Pdf2Gerb config settings:
#Put this file in same folder/directory as pdf2gerb.pl itself (global settings),
#or copy to another folder/directory with PDFs if you want PCB-specific settings.
#There is only one user of this file, so we don't need a custom package or namespace.
#NOTE: all constants defined in here will be added to main namespace.
#package pdf2gerb_cfg;

use strict; #trap undef vars (easier debug)
use warnings; #other useful info (easier debug)


##############################################################################################
#configurable settings:
#change values here instead of in main pdf2gerb.pl file

use constant WANT_COLORS => ($^O !~ m/Win/); #ANSI colors no worky on Windows? this must be set < first DebugPrint() call

#just a little warning; set realistic expectations:
#DebugPrint("${\(CYAN)}Pdf2Gerb.pl ${\(VERSION)}, $^O O/S\n${\(YELLOW)}${\(BOLD)}${\(ITALIC)}This is EXPERIMENTAL software.  \nGerber files MAY CONTAIN ERRORS.  Please CHECK them before fabrication!${\(RESET)}", 0); #if WANT_DEBUG

use constant METRIC => FALSE; #set to TRUE for metric units (only affect final numbers in output files, not internal arithmetic)
use constant APERTURE_LIMIT => 0; #34; #max #apertures to use; generate warnings if too many apertures are used (0 to not check)
use constant DRILL_FMT => '2.4'; #'2.3'; #'2.4' is the default for PCB fab; change to '2.3' for CNC

use constant WANT_DEBUG => 0; #10; #level of debug wanted; higher == more, lower == less, 0 == none
use constant GERBER_DEBUG => 0; #level of debug to include in Gerber file; DON'T USE FOR FABRICATION
use constant WANT_STREAMS => FALSE; #TRUE; #save decompressed streams to files (for debug)
use constant WANT_ALLINPUT => FALSE; #TRUE; #save entire input stream (for debug ONLY)

#DebugPrint(sprintf("${\(CYAN)}DEBUG: stdout %d, gerber %d, want streams? %d, all input? %d, O/S: $^O, Perl: $]${\(RESET)}\n", WANT_DEBUG, GERBER_DEBUG, WANT_STREAMS, WANT_ALLINPUT), 1);
#DebugPrint(sprintf("max int = %d, min int = %d\n", MAXINT, MININT), 1); 

#define standard trace and pad sizes to reduce scaling or PDF rendering errors:
#This avoids weird aperture settings and replaces them with more standardized values.
#(I'm not sure how photoplotters handle strange sizes).
#Fewer choices here gives more accurate mapping in the final Gerber files.
#units are in inches
use constant TOOL_SIZES => #add more as desired
(
#round or square pads (> 0) and drills (< 0):
    .010, -.001,  #tiny pads for SMD; dummy drill size (too small for practical use, but needed so StandardTool will use this entry)
    .031, -.014,  #used for vias
    .041, -.020,  #smallest non-filled plated hole
    .051, -.025,
    .056, -.029,  #useful for IC pins
    .070, -.033,
    .075, -.040,  #heavier leads
#    .090, -.043,  #NOTE: 600 dpi is not high enough resolution to reliably distinguish between .043" and .046", so choose 1 of the 2 here
    .100, -.046,
    .115, -.052,
    .130, -.061,
    .140, -.067,
    .150, -.079,
    .175, -.088,
    .190, -.093,
    .200, -.100,
    .220, -.110,
    .160, -.125,  #useful for mounting holes
#some additional pad sizes without holes (repeat a previous hole size if you just want the pad size):
    .090, -.040,  #want a .090 pad option, but use dummy hole size
    .065, -.040, #.065 x .065 rect pad
    .035, -.040, #.035 x .065 rect pad
#traces:
    .001,  #too thin for real traces; use only for board outlines
    .006,  #minimum real trace width; mainly used for text
    .008,  #mainly used for mid-sized text, not traces
    .010,  #minimum recommended trace width for low-current signals
    .012,
    .015,  #moderate low-voltage current
    .020,  #heavier trace for power, ground (even if a lighter one is adequate)
    .025,
    .030,  #heavy-current traces; be careful with these ones!
    .040,
    .050,
    .060,
    .080,
    .100,
    .120,
);
#Areas larger than the values below will be filled with parallel lines:
#This cuts down on the number of aperture sizes used.
#Set to 0 to always use an aperture or drill, regardless of size.
use constant { MAX_APERTURE => max((TOOL_SIZES)) + .004, MAX_DRILL => -min((TOOL_SIZES)) + .004 }; #max aperture and drill sizes (plus a little tolerance)
#DebugPrint(sprintf("using %d standard tool sizes: %s, max aper %.3f, max drill %.3f\n", scalar((TOOL_SIZES)), join(", ", (TOOL_SIZES)), MAX_APERTURE, MAX_DRILL), 1);

#NOTE: Compare the PDF to the original CAD file to check the accuracy of the PDF rendering and parsing!
#for example, the CAD software I used generated the following circles for holes:
#CAD hole size:   parsed PDF diameter:      error:
#  .014                .016                +.002
#  .020                .02267              +.00267
#  .025                .026                +.001
#  .029                .03167              +.00267
#  .033                .036                +.003
#  .040                .04267              +.00267
#This was usually ~ .002" - .003" too big compared to the hole as displayed in the CAD software.
#To compensate for PDF rendering errors (either during CAD Print function or PDF parsing logic), adjust the values below as needed.
#units are pixels; for example, a value of 2.4 at 600 dpi = .004 inch, 2 at 600 dpi = .0033"
use constant
{
    HOLE_ADJUST => -0.004 * 600, #-2.6, #holes seemed to be slightly oversized (by .002" - .004"), so shrink them a little
    RNDPAD_ADJUST => -0.003 * 600, #-2, #-2.4, #round pads seemed to be slightly oversized, so shrink them a little
    SQRPAD_ADJUST => +0.001 * 600, #+.5, #square pads are sometimes too small by .00067, so bump them up a little
    RECTPAD_ADJUST => 0, #(pixels) rectangular pads seem to be okay? (not tested much)
    TRACE_ADJUST => 0, #(pixels) traces seemed to be okay?
    REDUCE_TOLERANCE => .001, #(inches) allow this much variation when reducing circles and rects
};

#Also, my CAD's Print function or the PDF print driver I used was a little off for circles, so define some additional adjustment values here:
#Values are added to X/Y coordinates; units are pixels; for example, a value of 1 at 600 dpi would be ~= .002 inch
use constant
{
    CIRCLE_ADJUST_MINX => 0,
    CIRCLE_ADJUST_MINY => -0.001 * 600, #-1, #circles were a little too high, so nudge them a little lower
    CIRCLE_ADJUST_MAXX => +0.001 * 600, #+1, #circles were a little too far to the left, so nudge them a little to the right
    CIRCLE_ADJUST_MAXY => 0,
    SUBST_CIRCLE_CLIPRECT => FALSE, #generate circle and substitute for clip rects (to compensate for the way some CAD software draws circles)
    WANT_CLIPRECT => TRUE, #FALSE, #AI doesn't need clip rect at all? should be on normally?
    RECT_COMPLETION => FALSE, #TRUE, #fill in 4th side of rect when 3 sides found
};

#allow .012 clearance around pads for solder mask:
#This value effectively adjusts pad sizes in the TOOL_SIZES list above (only for solder mask layers).
use constant SOLDER_MARGIN => +.012; #units are inches

#line join/cap styles:
use constant
{
    CAP_NONE => 0, #butt (none); line is exact length
    CAP_ROUND => 1, #round cap/join; line overhangs by a semi-circle at either end
    CAP_SQUARE => 2, #square cap/join; line overhangs by a half square on either end
    CAP_OVERRIDE => FALSE, #cap style overrides drawing logic
};
    
#number of elements in each shape type:
use constant
{
    RECT_SHAPELEN => 6, #x0, y0, x1, y1, count, "rect" (start, end corners)
    LINE_SHAPELEN => 6, #x0, y0, x1, y1, count, "line" (line seg)
    CURVE_SHAPELEN => 10, #xstart, ystart, x0, y0, x1, y1, xend, yend, count, "curve" (bezier 2 points)
    CIRCLE_SHAPELEN => 5, #x, y, 5, count, "circle" (center + radius)
};
#const my %SHAPELEN =
#Readonly my %SHAPELEN =>
our %SHAPELEN =
(
    rect => RECT_SHAPELEN,
    line => LINE_SHAPELEN,
    curve => CURVE_SHAPELEN,
    circle => CIRCLE_SHAPELEN,
);

#panelization:
#This will repeat the entire body the number of times indicated along the X or Y axes (files grow accordingly).
#Display elements that overhang PCB boundary can be squashed or left as-is (typically text or other silk screen markings).
#Set "overhangs" TRUE to allow overhangs, FALSE to truncate them.
#xpad and ypad allow margins to be added around outer edge of panelized PCB.
use constant PANELIZE => {'x' => 1, 'y' => 1, 'xpad' => 0, 'ypad' => 0, 'overhangs' => TRUE}; #number of times to repeat in X and Y directions

# Set this to 1 if you need TurboCAD support.
#$turboCAD = FALSE; #is this still needed as an option?

#CIRCAD pad generation uses an appropriate aperture, then moves it (stroke) "a little" - we use this to find pads and distinguish them from PCB holes. 
use constant PAD_STROKE => 0.3; #0.0005 * 600; #units are pixels
#convert very short traces to pads or holes:
use constant TRACE_MINLEN => .001; #units are inches
#use constant ALWAYS_XY => TRUE; #FALSE; #force XY even if X or Y doesn't change; NOTE: needs to be TRUE for all pads to show in FlatCAM and ViewPlot
use constant REMOVE_POLARITY => FALSE; #TRUE; #set to remove subtractive (negative) polarity; NOTE: must be FALSE for ground planes

#PDF uses "points", each point = 1/72 inch
#combined with a PDF scale factor of .12, this gives 600 dpi resolution (1/72 * .12 = 1/600 inch per pixel, i.e. 600 dpi)
use constant INCHES_PER_POINT => 1/72; #0.0138888889; #multiply point-size by this to get inches

# The precision used when computing a bezier curve. Higher numbers are more precise but slower (and generate larger files).
#$bezierPrecision = 100;
use constant BEZIER_PRECISION => 36; #100; #use const; reduced for faster rendering (mainly used for silk screen and thermal pads)

# Ground planes and silk screen or larger copper rectangles or circles are filled line-by-line using this resolution.
use constant FILL_WIDTH => .01; #fill at most 0.01 inch at a time

# The max number of characters to read into memory
use constant MAX_BYTES => 10 * M; #bumped up to 10 MB, use const

use constant DUP_DRILL1 => TRUE; #FALSE; #kludge: ViewPlot doesn't load drill files that are too small so duplicate first tool

my $runtime = time(); #Time::HiRes::gettimeofday(); #measure my execution time

print STDERR "Loaded config settings from '${\(__FILE__)}'.\n";
1; #last value must be truthful to indicate successful load


#############################################################################################
#junk/experiment:

#use Package::Constants;
#use Exporter qw(import); #https://perldoc.perl.org/Exporter.html

#my $caller = "pdf2gerb::";

#sub cfg
#{
#    my $proto = shift;
#    my $class = ref($proto) || $proto;
#    my $settings =
#    {
#        $WANT_DEBUG => 990, #10; #level of debug wanted; higher == more, lower == less, 0 == none
#    };
#    bless($settings, $class);
#    return $settings;
#}

#use constant HELLO => "hi there2"; #"main::HELLO" => "hi there";
#use constant GOODBYE => 14; #"main::GOODBYE" => 12;

#print STDERR "read cfg file\n";

#our @EXPORT_OK = Package::Constants->list(__PACKAGE__); #https://www.perlmonks.org/?node_id=1072691; NOTE: "_OK" skips short/common names

#print STDERR scalar(@EXPORT_OK) . " consts exported:\n";
#foreach(@EXPORT_OK) { print STDERR "$_\n"; }
#my $val = main::thing("xyz");
#print STDERR "caller gave me $val\n";
#foreach my $arg (@ARGV) { print STDERR "arg $arg\n"; }

Download Details:

Author: swannman
Source Code: https://github.com/swannman/pdf2gerb

License: GPL-3.0 license

#perl 

Shardul Bhatt

Why use Python for Software Development

Few programming languages are as versatile as Python. It makes building cutting-edge applications straightforward, and developers are still exploring the full potential of end-to-end Python development services across industries.

By industries, we mean FinTech, HealthTech, InsureTech, Cybersecurity, and more. These are New Economy sectors, and Python can serve every one of them. Most of them demand massive computational power, and Python code is dynamic and robust – capable of handling heavy traffic and substantial algorithmic workloads.

Software development is multidimensional today. Enterprise software calls for intelligent applications with AI and ML capabilities, while consumer applications rely on data analysis to deliver a better user experience. Netflix, Trello, and Amazon are real examples of such applications, and Python helps build them with ease.

5 Reasons to Use Python for Programming Web Apps

Python can do so many things that developers can’t find enough reasons to admire it. Python application development isn’t restricted to web and enterprise applications; it is exceptionally adaptable and superb for a wide range of uses.

Robust frameworks 

Python is known for its tools and frameworks – there’s a framework for everything. Django is helpful for building web applications, enterprise applications, scientific applications, and mathematical computing. Flask is another web development framework with no dependencies.

Web2Py, CherryPy, and Falcon offer excellent capabilities for customizing Python development services. Most of them are open-source frameworks that allow rapid development.

Simple to read and write

Python has a streamlined syntax – one that resembles the English language. New Python developers can easily understand where they stand in the development process, and the simplicity of writing allows for quick application building.

The motivation behind building Python, as stated by its creator Guido van Rossum, was to enable even beginner developers to understand the programming language. That simplicity also lets developers make quick changes without getting lost in unnecessary details.

Used by the best

Python isn’t just another programming language – it must have something special, which is why the industry giants use it, and for varied purposes at that. Developers at Google use Python to build system administration tools, a parallel data pusher, code review tooling, testing and QA, and much more. Netflix uses Python web development services for its recommendation algorithm and media player.

Massive community support 

Python has a steadily growing community that offers enormous support – from beginners to experts, everyone is there. There are plenty of tutorials, documentation, and guides available for Python web development solutions.

Today, many universities start with Python, adding to the number of people in the community. Python developers frequently collaborate on projects and help each other with algorithmic, functional, and application problem-solving.

Progressive applications 

Python is the greatest contributor to data science, Machine Learning, and Artificial Intelligence at any enterprise software development company. Its use cases in cutting-edge applications are the most compelling reason for its success. Python is the second most popular tool after R for data analytics.

The simplicity of organizing, managing, and visualizing data through dedicated libraries makes it ideal for data-based applications. TensorFlow for neural networks and OpenCV for computer vision are two of Python’s most popular use cases in Machine Learning applications.

Summary

Considering the advances in programming and technology, Python is a yes for a diverse range of applications. Game development, web application development services, GUI development, ML and AI development, enterprise and consumer applications – all of them use Python to its full potential.

The disadvantages of Python web development solutions are often overlooked by developers and organizations because of the advantages it provides. They prioritize quality over speed and performance over errors. That’s why it makes sense to use Python for building the applications of the future.

#python development services #python development company #python app development #python development #python in web development #python software development

Joe Hoppe

Best MySQL DigitalOcean Performance – ScaleGrid vs. DigitalOcean Managed Databases


MySQL is the all-time number one open source database in the world, and a staple in the RDBMS space. DigitalOcean is quickly building its reputation as the developers’ cloud by providing an affordable, flexible, and easy-to-use cloud platform for developers to work with. MySQL on DigitalOcean is a natural fit, but what’s the best way to deploy your cloud database? In this post, we compare the top two providers: DigitalOcean Managed Databases for MySQL vs. ScaleGrid MySQL hosting on DigitalOcean.

At a glance – TLDR

Compare Throughput
ScaleGrid averages almost 40% higher throughput over DigitalOcean for MySQL, with up to 46% higher throughput in write-intensive workloads.

Compare Latency
On average, ScaleGrid achieves almost 30% lower latency over DigitalOcean for the same deployment configurations.

Compare Pricing
ScaleGrid provides 30% more storage on average vs. DigitalOcean for MySQL at the same affordable price.

MySQL DigitalOcean Performance Benchmark
In this benchmark, we compare equivalent plan sizes between ScaleGrid MySQL on DigitalOcean and DigitalOcean Managed Databases for MySQL. We are going to use a common, popular plan size using the below configurations for this performance benchmark:

Comparison Overview

                   ScaleGrid            DigitalOcean
Instance Type      Medium: 4 vCPUs      Medium: 4 vCPUs
MySQL Version      8.0.20               8.0.20
RAM                8GB                  8GB
SSD                140GB                115GB
Deployment Type    Standalone           Standalone
Region             SF03                 SF03
Support            Included             Business-level support included with account sizes over $500/month
Monthly Price      $120                 $120

As you can see above, ScaleGrid and DigitalOcean offer the same plan configurations across this plan size, apart from SSD where ScaleGrid provides over 20% more storage for the same price.

To ensure the most accurate results in our performance tests, we run the benchmark four times for each comparison to find the average performance across throughput and latency over read-intensive workloads, balanced workloads, and write-intensive workloads.

Throughput
In this benchmark, we measure MySQL throughput in terms of queries per second (QPS) to measure our query efficiency. To quickly summarize the results, we display read-intensive, write-intensive and balanced workload averages below for 150 threads for ScaleGrid vs. DigitalOcean MySQL:

[Figure: ScaleGrid MySQL vs. DigitalOcean Managed Databases – throughput performance graph]

For the common 150 thread comparison, ScaleGrid averages almost 40% higher throughput over DigitalOcean for MySQL, with up to 46% higher throughput in write-intensive workloads.

#cloud #database #developer #digitalocean #mysql #performance #scalegrid

How To Compare Tesla and Ford Company By Using Magic Methods in Python

Magic methods are special methods that give us the ability to hook into built-in syntactic features such as ‘<’, ‘>’, ‘==’, ‘+’, and so on.

You must have worked with such methods without knowing them to be magic methods. Magic methods can be identified by their names, which start and end with a double underscore, like __init__, __call__, and __str__. These methods are also called dunder methods, because their names start and end with a Double UNDERscore (dunder).

Now there are a number of such special methods in Python, which you might have come across too. We will take a few of them as examples to understand how they work and how we can use them.

1. __init__

class AnyClass:
    def __init__(self):
        print("Init called on its own")
obj = AnyClass()

The first example is __init__, and as the name suggests, it is used for initializing objects. The init method is called on its own, i.e., whenever an object is created for the class, the init method is invoked automatically.

The output of the above code is given below. Note how we did not call the init method, yet it was invoked when we created an object of class AnyClass.

Init called on its own

2. __add__

Let’s move on to another example: __add__ gives us the ability to hook into the built-in + operator. Let’s see how:

class AnyClass:
    def __init__(self, var):
        self.some_var = var
    def __add__(self, other_obj):
        print("Calling the add method")
        return self.some_var + other_obj.some_var
obj1 = AnyClass(5)
obj2 = AnyClass(6)
obj1 + obj2  # prints "Calling the add method" and evaluates to 11
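Since the title promises a Tesla vs. Ford comparison, here is a minimal sketch of how comparison operators could be wired up the same way (the Company class and its revenue figures are illustrative assumptions, not real data):

class Company:
    def __init__(self, name, revenue):
        self.name = name
        self.revenue = revenue
    def __eq__(self, other):
        # invoked by '=='
        return self.revenue == other.revenue
    def __lt__(self, other):
        # invoked by '<'
        return self.revenue < other.revenue

tesla = Company("Tesla", 81)   # hypothetical revenue figures
ford = Company("Ford", 158)
print(tesla < ford)    # True  -> __lt__ was called
print(tesla == ford)   # False -> __eq__ was called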

#python3 #python #python-programming #python-web-development #python-tutorials #python-top-story #python-tips #learn-python

Using Singular Value Decomposition in Python and Numpy (linalg.svd)

In this Python – NumPy tutorial, we will learn about numpy linalg.svd: Singular Value Decomposition in Python. In mathematics, a singular value decomposition (SVD) of a matrix refers to the factorization of a matrix into three separate matrices. It is a more generalized version of the eigenvalue decomposition of matrices. It is further related to the polar decompositions.

In Python, it is easy to calculate the singular decomposition of a complex or a real matrix using the numerical python or the numpy library. The numpy library consists of various linear algebraic functions including one for calculating the singular value decomposition of a matrix.

In machine learning models, singular value decomposition is widely used to train models and in neural networks. It helps in improving accuracy and in reducing the noise in data. Singular value decomposition transforms one vector into another without them necessarily having the same dimension. Hence, it makes matrix manipulation in vector spaces easier and more efficient. It is also used in regression analysis.

Syntax of Numpy linalg.svd() function

The function that calculates the singular value decomposition of a matrix in Python belongs to the numpy module and is named linalg.svd().

The syntax of numpy.linalg.svd() is as follows:

numpy.linalg.svd(A, full_matrices=True, compute_uv=True, hermitian=False)

You can customize the true and false boolean values based on your requirements.

The parameters of the function are given below:

  • A->array_like: This is the required matrix whose singular value decomposition is being calculated. It can be real or complex as required. Its dimension should be >= 2.
  • full_matrices->boolean value (optional): If set to True, the returned u and v are full-sized square matrices; if False, they are reduced to the shapes actually needed for the singular values.
  • compute_uv->boolean value (optional): It determines whether u and v are to be computed in addition to the singular values.
  • hermitian->boolean value (optional): The given matrix is assumed to be Hermitian (symmetric, if real-valued), which can allow a more efficient computation method.

The function returns three types of matrices based on the parameters mentioned above:

  • S->array_like: The vector containing the singular values in descending order.
  • u->array_like: An optional result returned when compute_uv is set to True; it holds the left singular vectors.
  • v->array_like: A unitary array returned when compute_uv is set to True; it holds the right singular vectors as rows.

It raises a LinAlgError when the SVD computation does not converge.

Prerequisites for setup

Before we dive into the examples, make sure you have the numpy module installed in your local system. This is required for using linear algebraic functions like the one discussed in this article. Run the following command in your terminal.

pip install numpy

That’s all you need right now, let’s look at how we will implement the code in the next section.

To calculate Singular Value Decomposition (SVD) in Python, use the NumPy library’s linalg.svd() function. Its syntax is numpy.linalg.svd(A, full_matrices=True, compute_uv=True, hermitian=False), where A is the matrix for which SVD is being calculated. It returns three matrices: S, U, and V.

Example 1: Calculating the Singular Value Decomposition of a 3×3 Matrix

In this first example we will take a 3X3 matrix and compute its singular value decomposition in the following way:

#importing the numpy module
import numpy as np
#using the numpy.array() function to create an array
A = np.array([[2, 4, 6],
              [8, 10, 12],
              [14, 16, 18]])
#calculating all three matrices for the output
#using the numpy linalg.svd function
u,s,v=np.linalg.svd(A, compute_uv=True)
#displaying the result
print("the output is=")
print('s(the singular value) = ',s)
print('u = ',u)
print('v = ',v)

The output will be:

the output is=
s(the singular value) =  [3.36962067e+01 2.13673903e+00 8.83684950e-16]
u =  [[-0.21483724  0.88723069  0.40824829]
 [-0.52058739  0.24964395 -0.81649658]
 [-0.82633754 -0.38794278  0.40824829]]
v =  [[-0.47967118 -0.57236779 -0.66506441]
 [-0.77669099 -0.07568647  0.62531805]
 [-0.40824829  0.81649658 -0.40824829]]

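As a quick sanity check (our addition, not part of the original example): since linalg.svd() returns v already transposed, multiplying the three factors back together should reproduce A:

#rebuild A from the three factors: A = u * diag(s) * v
A_rebuilt = u @ np.diag(s) @ v
print(np.allclose(A, A_rebuilt))  #prints True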

Example 2: Calculating the Singular Value Decomposition of a Random Matrix

In this example, we will be using the numpy.random.randint() function to create a random matrix. Let’s get into it!

#importing the numpy module
import numpy as np
#using the numpy.random.randint() function to create a random matrix
A=np.random.randint(5, 200, size=(3,3))
#display the created matrix
print("The input matrix is=",A)
#calculating all three matrices for the output
#using the numpy linalg.svd function
u,s,v=np.linalg.svd(A, compute_uv=True)
#displaying the result
print("the output is=")
print('s(the singular value) = ',s)
print('u = ',u)
print('v = ',v)

The output will be as follows:

The input matrix is= [[ 36  74 101]
 [104 129 185]
 [139 121 112]]
the output is=
s(the singular value) =  [348.32979681  61.03199722  10.12165841]
u =  [[-0.3635535  -0.48363012 -0.79619769]
 [-0.70916514 -0.41054007  0.57318554]
 [-0.60408084  0.77301925 -0.19372034]]
v =  [[-0.49036384 -0.54970618 -0.67628871]
 [ 0.77570499  0.0784348  -0.62620264]
 [ 0.39727203 -0.83166766  0.38794824]]



Wrapping Up

In this article, we explored the concept of singular value decomposition in mathematics and how to calculate it using Python’s numpy module. We used the linalg.svd() function to compute the singular value decomposition of both given and random matrices. Numpy provides an efficient and easy-to-use method for performing linear algebra operations, making it highly valuable in machine learning, neural networks, and regression analysis. Keep exploring other linear algebraic functions in numpy to enhance your mathematical toolset in Python.

Article source at: https://www.askpython.com

#python #numpy