Easy SQL is built to ease the data ETL development process. With Easy SQL, you can develop your ETL in SQL in an imperative way. It defines a few simple syntax elements on top of standard SQL that let SQL statements be executed one by one. Easy SQL also provides a processor to handle the new syntax. Since the syntax is agnostic to the underlying SQL dialect, any SQL engine can be plugged in as a backend. There is built-in support for several popular SQL engines, including Spark SQL, PostgreSQL, ClickHouse, Flink SQL, Aliyun MaxCompute, and Google BigQuery. More will be added in the near future.
Install Easy SQL using pip: python3 -m pip install easy_sql-easy_sql[extra,extra]
Currently several extras are provided (such as spark, pg, clickhouse, and cli, as used in the commands below); choose according to your needs.
We also provide a Flink backend, but because of a dependency conflict between pyspark and apache-flink, you need to install the Flink backend dependencies manually with the command python3 -m pip install apache-flink.
Usually we read data from a data source and write data to some other system using Flink with different connectors, so we also need to download the jars for the connectors used. Refer here for more information and here to download the required connectors.
Internally we use poetry to manage the dependencies, so make sure you have it installed. The package can be built with the make command make package-pip or just poetry build.
After the above command, a file named easy_sql*.whl will be generated in the dist folder. You can install it with the command python3 -m pip install dist/easy_sql*.whl[extra] or just poetry install -E 'extra extra'.
Install Easy SQL with Spark as the backend: python3 -m pip install easy_sql-easy_sql[spark,cli].
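To sanity-check the installation, you can try importing the processor from the command line (a minimal check; the import path matches the library example later in this article):
python3 -c "from easy_sql.sql_processor import SqlProcessor; print('easy_sql is installed')"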
Create a file named sample_etl.spark.sql with the content below:
-- prepare-sql: drop database if exists sample cascade
-- prepare-sql: create database sample
-- prepare-sql: create table sample.test as select 1 as id, '1' as val
-- target=variables
select true as __create_output_table__
-- target=variables
select 1 as a
-- target=log.a
select '${a}' as a
-- target=log.test_log
select 1 as some_log
-- target=check.should_equal
select 1 as actual, 1 as expected
-- target=temp.result
select
${a} as id, ${a} + 1 as val
union all
select id, val from sample.test
-- target=output.sample.result
select * from result
-- target=log.sample_result
select * from sample.result
Run it with the command:
bash -c "$(python3 -m easy_sql.data_process -f sample_etl.spark.sql -p)"
You need to start a postgres instance first.
If you have docker, run the command below:
docker run -d --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=123456 postgres
Create a file named sample_etl.postgres.sql with the content of the test file here.
Make sure that you have installed the corresponding backend with python3 -m pip install easy-sql-easy-sql[cli,pg].
Run it with the command:
PG_URL=postgresql://postgres:123456@localhost:5432/postgres python3 -m easy_sql.data_process -f sample_etl.postgres.sql
You need to start a clickhouse instance first.
If you have docker, run the command below:
docker run -d --name clickhouse -p 9000:9000 yandex/clickhouse-server:20.12.5.18
Create a file named sample_etl.clickhouse.sql with the content of the test file here.
Make sure that you have installed the corresponding backend with python3 -m pip install easy-sql-easy-sql[cli,clickhouse].
Run it with the command:
CLICKHOUSE_URL=clickhouse+native://default@localhost:9000 python3 -m easy_sql.data_process -f sample_etl.clickhouse.sql
Because of dependency conflicts between pyspark and apache-flink, you need to install Flink manually with the command python3 -m pip install apache-flink.
After the installation, you need to add the Flink commands directory to the PATH environment variable to make the Flink commands discoverable by bash. To do so, execute the commands below:
export FLINK_HOME=$(python3 -m pyflink.find_flink_home)
export PATH=$FLINK_HOME/bin:$PATH
export PYFLINK_CLIENT_EXECUTABLE=python3 # Set Python interpreter for flink client.
You can add these commands to your .bashrc or .zshrc file for convenience.
Since there are many connectors for Flink, you need to choose which connector to use before starting.
As an example, if you want to read data from or write data to Postgres, you need to start a Postgres instance first.
If you have docker, run the command below:
docker run -d --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=123456 postgres
Download the required jars as below:
mkdir -pv test/flink/jars
wget -P test/flink/jars https://repo1.maven.org/maven2/org/apache/flink/flink-connector-jdbc/1.15.1/flink-connector-jdbc-1.15.1.jar
wget -P test/flink/jars https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.14/postgresql-42.2.14.jar
Create a file named sample_etl.flink.postgres.sql with the content of the test file here.
Create a connector configuration file named sample_etl.flink_tables_file.json with the content of the test configuration file here.
Run it with the command:
bash -c "$(python3 -m easy_sql.data_process -f sample_etl.flink.postgres.sql -p)"
There are a few other things to know about Flink; click here to get more information.
The usage is similar; please refer to the API doc here.
Easy SQL can be used as a very lightweight library. If you'd like to run an ETL programmatically in your code, please refer to the code snippet below:
from pyspark.sql import SparkSession

from easy_sql.sql_processor import SqlProcessor
from easy_sql.sql_processor.backend import SparkBackend

if __name__ == '__main__':
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    backend = SparkBackend(spark)
    sql = '''
-- target=log.some_log
select 1 as a
'''
    sql_processor = SqlProcessor(backend, sql)
    sql_processor.run()
More sample code for other backends can be found here.
We recommend debugging ETLs from Jupyter. You can follow the steps below to start debugging your ETL.
Install Jupyter first with the command python3 -m pip install jupyterlab.
Create a file named debugger.py with contents like those below. A more detailed sample can be found here.
from typing import Dict, Any

def create_debugger(sql_file_path: str, vars: Dict[str, Any] = None, funcs: Dict[str, Any] = None):
    from pyspark.sql import SparkSession

    from easy_sql.sql_processor.backend import SparkBackend
    from easy_sql.sql_processor_debugger import SqlProcessorDebugger

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    backend = SparkBackend(spark)
    debugger = SqlProcessorDebugger(sql_file_path, backend, vars, funcs)
    return debugger
Create a file named test.sql with contents as in the sample here.
Then start Jupyter Lab with the command jupyter lab.
Start debugging as shown below:
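For example, in a notebook cell started from the same directory as the files above, you could create the debugger from the helper defined earlier (a minimal sketch using only the create_debugger function shown above; the step-by-step debugging calls are demonstrated in the linked sample):
from debugger import create_debugger

# Create a debugger for the ETL file created above; variables and functions
# could also be passed via the optional vars/funcs arguments of create_debugger.
debugger = create_debugger('test.sql')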
Please submit a PR.
Author: easysql
Source Code: https://github.com/easysql/easy_sql
License: Apache-2.0 license
At the end of 2019, Python was one of the fastest-growing programming languages, with more than 10% of developers having opted for Python development.
In the programming world, data types play an important role: every variable has a data type that determines what it can store and which operations it supports. Python has two kinds of objects: mutable and immutable.
Mutable objects are those whose size, value, or sequence of elements can be modified after creation.
Mutable data types: list, dict, set, and bytearray.
Immutable objects are those whose size, value, or sequence of elements cannot be modified after creation.
Immutable data types: int, float, complex, str, tuple, bytes, and frozenset.
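As a short illustration (standard Python behaviour), a list can be changed in place while a tuple cannot:
nums = [1, 2, 3]        # list: mutable
nums.append(4)          # modifies the same object in place
print(nums)             # [1, 2, 3, 4]

point = (1, 2, 3)       # tuple: immutable
try:
    point[0] = 10       # item assignment is not allowed
except TypeError as e:
    print(e)            # 'tuple' object does not support item assignment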
id() and type() are used to find the identity and the data type of an object:
a = 25 + 85j
type(a)
# Output: <class 'complex'>

b = {1: 10, 2: "Pinky"}
id(b)
# Output: 238989244168
a = str("Hello python world")                             # str
b = int(18)                                               # int
c = float(20482.5)                                        # float
d = complex(5 + 85j)                                      # complex
e = list(("python", "fast", "growing", "in", 2018))       # list
f = tuple(("python", "easy", "learning"))                 # tuple
g = range(10)                                             # range
h = dict(name="Vidu", age=36)                             # dict
i = set(("python", "fast", "growing", "in", 2018))        # set
j = frozenset(("python", "fast", "growing", "in", 2018))  # frozenset
k = bool(18)                                              # bool
l = bytes(8)                                              # bytes
m = bytearray(8)                                          # bytearray
n = memoryview(bytes(18))                                 # memoryview
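The constructor calls above are shown for illustration; in everyday Python the same values are usually written with literal syntax, for example:
a = "Hello python world"                        # str
b = 18                                          # int
c = 20482.5                                     # float
d = 5 + 85j                                     # complex
e = ["python", "fast", "growing", "in", 2018]   # list
f = ("python", "easy", "learning")              # tuple
h = {"name": "Vidu", "age": 36}                 # dict
i = {"python", "fast", "growing", "in", 2018}   # set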
Numbers are stored in numeric types. When a number is assigned to a variable, Python creates a number object.
# signed integer
age = 18
print(age)
# Output: 18
Python supports 3 types of numeric data.
int (signed integers like 20, 2, 225, etc.)
float (float is used to store floating-point numbers like 9.8, 3.1444, 89.52, etc.)
complex (complex numbers like 8.94j, 4.0 + 7.3j, etc.)
A complex number contains an ordered pair, i.e., a + ib, where a and b denote the real and imaginary parts respectively.
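For example, the real and imaginary parts of a complex object can be read back through its real and imag attributes:
z = 4.0 + 7.3j
print(z.real)   # 4.0
print(z.imag)   # 7.3
print(type(z))  # <class 'complex'>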
A string can be represented as a sequence of characters enclosed in quotation marks. In Python, strings can be defined using single, double, or triple quotes.
# String handling
'Hello Python'        # single (') quoted string
"Hello Python"        # double (") quoted string
"""Hello Python"""    # triple (""") quoted string
'''Hello Python'''    # triple (''') quoted string
In Python, string handling is a straightforward task, and Python provides various built-in functions and operators for working with strings.
The operator “+” is used to concatenate strings and “*” is used to repeat the string.
"Hello " + "python"
# Output: 'Hello python'

"python " * 2
# Output: 'python python '