Like any other kind of app, JavaScript apps have to be written well. Otherwise, we run into all kinds of issues later on.
In this article, we’ll look at some best practices we should follow when writing JavaScript code.
We shouldn’t have multiple spaces inside regex literals.
For instance, we shouldn’t write:
const regex = /foo   bar/;
Instead, we write:
const regex = /foo {3}bar/;
If we do return an assignment, we should surround it with parentheses so it's clear that the assignment is intentional and that the value is assigned before being returned.
For instance, instead of writing:
function add (a, b) {
return result = a + b;
}
We write:
function add (a, b) {
return (result = a + b);
}
Assigning a variable to itself is a useless statement, so we shouldn't write it.
For instance, we shouldn’t write things like:
num = num;
Comparing a variable to itself is always true, so such a comparison is useless.
For instance, we shouldn’t write:
if (score === score) {}
The comma operator evaluates each of its operands and returns only the value of the last one, so using it usually just obscures the code.
Therefore, we shouldn’t have code like:
const foo = (doSomething(), !!test);
We shouldn’t create variables with restricted names and assign values to them.
It doesn’t work and it just causes confusion.
For example, we shouldn’t have code like:
let undefined = 'value';
#software-development #programming #technology #javascript #web-development
This PySpark cheat sheet with code samples covers the basics like initializing Spark in Python, loading data, sorting, and repartitioning.
Apache Spark is generally known as a fast, general and open-source engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. It allows you to speed up analytic applications by up to 100 times compared to other technologies on the market today. You can interface Spark with Python through "PySpark", the Spark Python API that exposes the Spark programming model to Python.
Even though working with Spark will remind you in many ways of working with Pandas DataFrames, you'll also see that it can be tough getting familiar with all the functions that you can use to query, transform, inspect, ... your data. What's more, if you've never worked with any other programming language or if you're new to the field, it might be hard to distinguish between the various RDD operations.
Let's face it, map() and flatMap() are different enough, but it might still come as a challenge to decide which one you really need when you're faced with them in your analysis. Or what about other functions, like reduce() and reduceByKey()?
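To make the distinction concrete, here is a small sketch (not part of the original cheat sheet) that contrasts these functions on toy data; it assumes sc is the SparkContext created as shown further below.
>>> pairs = sc.parallelize([('a',7),('a',2),('b',2)])
>>> pairs.map(lambda kv: kv[1] + 1).collect() #map() returns exactly one output element per input element
[8, 3, 3]
>>> pairs.flatMap(lambda kv: [kv[0], kv[1]]).collect() #flatMap() flattens the returned iterables into single elements
['a', 7, 'a', 2, 'b', 2]
>>> sc.parallelize([1,2,3,4]).reduce(lambda a,b: a + b) #reduce() merges all elements into one value
10
>>> pairs.reduceByKey(lambda a,b: a + b).collect() #reduceByKey() merges the values per key and keeps a pair RDD
[('a', 9), ('b', 2)]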
Even though the documentation is very elaborate, it never hurts to have a cheat sheet by your side, especially when you're just getting into it.
This PySpark cheat sheet covers the basics, from initializing Spark and loading your data, to retrieving RDD information, sorting, filtering and sampling your data. But that's not all. You'll also see that topics such as repartitioning, iterating, merging, saving your data and stopping the SparkContext are included in the cheat sheet.
Note that the examples in the document take small data sets to illustrate the effect of specific functions on your data. In real life data analysis, you'll be using Spark to analyze big data.
PySpark is the Spark Python API that exposes the Spark programming model to Python.
>>> from pyspark import SparkContext
>>> sc = SparkContext(master = 'local[2]')
>>> sc.version #Retrieve SparkContext version
>>> sc.pythonVer #Retrieve Python version
>>> sc.master #Master URL to connect to
>>> str(sc.sparkHome) #Path where Spark is installed on worker nodes
>>> str(sc.sparkUser()) #Retrieve name of the Spark User running SparkContext
>>> sc.appName #Return application name
>>> sc.applicationId #Retrieve application ID
>>> sc.defaultParallelism #Return default level of parallelism
>>> sc.defaultMinPartitions #Default minimum number of partitions for RDDs
>>> from pyspark import SparkConf, SparkContext
>>> conf = (SparkConf()
.setMaster("local")
.setAppName("My app")
. set ("spark. executor.memory", "lg"))
>>> sc = SparkContext(conf = conf)
In the PySpark shell, a special interpreter-aware SparkContext is already created in the variable called sc.
$ ./bin/spark-shell --master local[2]
$ ./bin/pyspark --master local[4] --py-files code.py
Set which master the context connects to with the --master argument, and add Python .zip, .egg or .py files to the runtime path by passing a comma-separated list to --py-files.
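The same dependency shipping can be done from Python itself when you construct the SparkContext; the snippet below is only a sketch, with code.py standing in for the example file from the command above.
>>> from pyspark import SparkContext
>>> sc = SparkContext(master='local[4]', appName='My app', pyFiles=['code.py']) #Ship code.py to the workers' Python path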
>>> rdd = sc.parallelize([('a',7),('a',2),('b',2)])
>>> rdd2 = sc.parallelize([('a',2),('d',1),('b',1)])
>>> rdd3 = sc.parallelize(range(100))
>>> rdd = sc.parallelize([("a",["x","y","z"]),
("b" ["p","r,"])])
Read either one text file from HDFS, a local file system or any Hadoop-supported file system URI with textFile(), or read in a directory of text files with wholeTextFiles().
>>> textFile = sc.textFile("/my/directory/*.txt")
>>> textFile2 = sc.wholeTextFiles("/my/directory/")
>>> rdd.getNumPartitions() #List the number of partitions
>>> rdd.count() #Count RDD instances
3
>>> rdd.countByKey() #Count RDD instances by key
defaultdict(<type 'int'>,{'a':2,'b':1})
>>> rdd.countByValue() #Count RDD instances by value
defaultdict(<type 'int'>,{('b',2):1,('a',2):1,('a',7):1})
>>> rdd.collectAsMap() #Return (key,value) pairs as a dictionary
{'a': 2, 'b': 2}
>>> rdd3.sum() #Sum of RDD elements
4950
>>> sc.parallelize([]).isEmpty() #Check whether RDD is empty
True
>>> rdd3.max() #Maximum value of RDD elements
99
>>> rdd3.min() #Minimum value of RDD elements
0
>>> rdd3.mean() #Mean value of RDD elements
49.5
>>> rdd3.stdev() #Standard deviation of RDD elements
28.866070047722118
>>> rdd3.variance() #Compute variance of RDD elements
833.25
>>> rdd3.histogram(3) #Compute histogram by bins
([0,33,66,99],[33,33,34])
>>> rdd3.stats() #Summary statistics (count, mean, stdev, max & min)
#Apply a function to each RDD element
>>> rdd.map(lambda x: x+(x[1],x[0])).collect()
[('a', 7, 7, 'a'), ('a', 2, 2, 'a'), ('b', 2, 2, 'b')]
#Apply a function to each RDD element and flatten the result
>>> rdd5 = rdd.flatMap(lambda x: x+(x[1],x[0]))
>>> rdd5.collect()
['a', 7, 7, 'a', 'a', 2, 2, 'a', 'b', 2, 2, 'b']
#Apply a flatMap function to each (key,value) pair of rdd4 without changing the keys
>>> rdd4.flatMapValues(lambda x: x).collect()
[('a', 'x'), ('a', 'y'), ('a', 'z'),('b', 'p'),('b', 'r')]
Getting
>>> rdd.collect() #Return a list with all RDD elements
[('a', 7), ('a', 2), ('b', 2)]
>>> rdd.take(2) #Take first 2 RDD elements
[('a', 7), ('a', 2)]
>>> rdd.first() #Take first RDD element
('a', 7)
>>> rdd.top(2) #Take top 2 RDD elements
[('b', 2), ('a', 7)]
Sampling
>>> rdd3.sample(False, 0.15, 81).collect() #Return sampled subset of rdd3
[3,4,27,31,40,41,42,43,60,76,79,80,86,97]
Filtering
>>> rdd.filter(lambda x: "a" in x).collect() #Filter the RDD
[('a',7),('a',2)]
>>> rdd5.distinct().collect() #Return distinct RDD values
['a', 2, 'b', 7]
>>> rdd.keys().collect() #Return (key,value) RDD's keys
['a', 'a', 'b']
>>> def g (x): print(x)
>>> rdd.foreach(g) #Apply a function to all RDD elements
('a', 7)
('b', 2)
('a', 2)
Reducing
>>> rdd.reduceByKey(lambda x,y : x+y).collect() #Merge the rdd values for each key
[('a',9),('b',2)]
>>> rdd.reduce(lambda a, b: a+ b) #Merge the rdd values
('a', 7, 'a', 2, 'b', 2)
Grouping by
#Return RDD of grouped values
>>> rdd3.groupBy(lambda x: x % 2) \
        .mapValues(list) \
        .collect()
#Group rdd by key
>>> rdd.groupByKey() \
       .mapValues(list) \
       .collect()
[('a',[7,2]),('b',[2])]
Aggregating
>>> seqOp = (lambda x,y: (x[0]+y,x[1]+1))
>>> combOp = (lambda x,y:(x[0]+y[0],x[1]+y[1]))
#Aggregate RDD elements of each partition and then the results
>>> rdd3.aggregate((0,0),seqOp,combOp)
(4950,100)
#Aggregate values of each RDD key
>>> rdd.aggregateByKey((0,0),seqOp,combOp).collect()
[('a',(9,2)), ('b',(2,1))]
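Because seqOp and combOp accumulate (sum, count) pairs, the aggregated results can be turned into means. The follow-up lines below are a sketch built on the outputs shown above, not part of the original cheat sheet.
>>> total, count = rdd3.aggregate((0,0), seqOp, combOp) #Unpack the (4950, 100) result
>>> total / float(count) #Mean of the rdd3 elements
49.5
#Per-key means from the (sum, count) pairs
>>> rdd.aggregateByKey((0,0), seqOp, combOp) \
       .mapValues(lambda p: p[0] / float(p[1])) \
       .collect()
[('a', 4.5), ('b', 2.0)]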
#Aggregate the elements of each partition, and then the results
>>> from operator import add
>>> rdd3.fold(0,add)
4950
#Merge the values for each key
>>> rdd.foldByKey(0, add).collect()
[('a' ,9), ('b' ,2)]
#Create tuples of RDD elements by applying a function
>>> rdd3.keyBy(lambda x: x+x).collect()
>>> rdd.subtract(rdd2).collect() #Return each rdd value not contained in rdd2
[('b' ,2), ('a' ,7)]
#Return each (key,value) pair of rdd2 with no matching key in rdd
>>> rdd2.subtractByKey(rdd).collect()
[('d', 1)]
>>> rdd.cartesian(rdd2).collect() #Return the Cartesian product of rdd and rdd2
>>> rdd2.sortBy(lambda x: x[1]).collect() #Sort RDD by given function
[('d',1),('b',1),('a',2)]
>>> rdd2.sortByKey().collect() #Sort (key, value) RDD by key
[('a' ,2), ('b' ,1), ('d' ,1)]
>>> rdd.repartition(4) #New RDD with 4 partitions
>>> rdd.coalesce(1) #Decrease the number of partitions in the RDD to 1
>>> rdd.saveAsTextFile("rdd.txt")
>>> rdd.saveAsHadoopFile("hdfs://namenodehost/parent/child",
'org.apache.hadoop.mapred.TextOutputFormat')
>>> sc.stop()
$ ./bin/spark-submit examples/src/main/python/pi.py
Original article source at https://www.datacamp.com
#pyspark #cheatsheet #spark #python
Minimum education required – 10+2 passed in any stream from a recognized board.
The age limit is 18 to 25 years. It may differ from one airline to another!
Physical and Medical standards –
You can become an air hostess if you meet certain criteria, such as a minimum educational level, an age limit, language ability, and physical characteristics.
As can be seen from the preceding information, a 10+2 pass is the minimum educational requirement for becoming an air hostess in India. So, if you have a 10+2 certificate from a recognized board, you are eligible to apply for an interview for air hostess positions!
You can still apply for this job if you have a higher qualification (such as a Bachelor's or Master's Degree).
So I would recommend joining the special personality development and aviation industry courses offered by AEROFLY INTERNATIONAL AVIATION ACADEMY in CHANDIGARH. They provide extra sessions included in the course, conduct the entire course in 6 months covering all topics, and keep the pricing structure affordable. They pay particular attention to each and every aspirant and prepare them according to airline criteria. So be a part of it and give your aspirations wings.
Read More: Safety and Emergency Procedures of Aviation || Operations of Travel and Hospitality Management || Intellectual Language and Interview Training || Premiere Coaching For Retail and Mass Communication || Introductory Cosmetology and Tress Styling || Aircraft Ground Personnel Competent Course
For more information:
Visit us at: https://aerofly.co.in
Phone: wa.me/+919988887551
Address: Aerofly International Aviation Academy, SCO 68, 4th Floor, Sector 17-D, Chandigarh, Pin 160017
Email: info@aerofly.co.in
#air hostess institute in Delhi,
#air hostess institute in Chandigarh,
#air hostess institute near me,
#best air hostess institute in India,
#air hostess institute,
#best air hostess institute in Delhi,
#air hostess institute in India,
#air hostess training institute fees,
#top 10 air hostess training institute in India,
#government air hostess training institute in India,
#best air hostess training institute in the world,
#cabin crew course fees,
#cabin crew course duration and fees,
#best cabin crew training institute in Delhi,
#cabin crew courses after 12th,
#cabin crew training institute in Delhi,
#cabin crew training institute in India,
#cabin crew training institute near me,
#best cabin crew training institute in India,
#best cabin crew training institute in the world,
#government cabin crew training institute
This PySpark SQL cheat sheet is your handy companion to Apache Spark DataFrames in Python and includes code samples.
You'll probably already know about Apache Spark, the fast, general and open-source engine for big data processing; it has built-in modules for streaming, SQL, machine learning and graph processing. Spark allows you to speed up analytic applications by up to 100 times compared to other technologies on the market today. Interfacing Spark with Python is easy with PySpark: this Spark Python API exposes the Spark programming model to Python.
Now, it's time to tackle the Spark SQL module, which is meant for structured data processing, and the DataFrame API, which is not only available in Python, but also in Scala, Java, and R.
Without further ado, here's the cheat sheet:
This PySpark SQL cheat sheet covers the basics of working with Apache Spark DataFrames in Python: from initializing the SparkSession to creating DataFrames, inspecting the data, handling duplicate values, querying, adding, updating or removing columns, grouping, filtering or sorting data. You'll also see how to run SQL queries programmatically, how to save your data to parquet and JSON files, and how to stop your SparkSession.
Spark SQL is Apache Spark's module for working with structured data.
A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read parquet files.
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession \
.builder\
.appName("Python Spark SQL basic example") \
.config("spark.some.config.option", "some-value") \
.getOrCreate()
>>> from pyspark.sql.types import *
Infer Schema
>>> sc = spark.sparkContext
>>> lines = sc.textFile("people.txt")
>>> parts = lines.map(lambda l: l.split(","))
>>> people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
>>> peopledf = spark.createDataFrame(people)
Specify Schema
>>> people = parts.map(lambda p: Row(name=p[0],
age=int(p[1].strip())))
>>> schemaString = "name age"
>>> fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()]
>>> schema = StructType(fields)
>>> spark.createDataFrame(people, schema).show()
From Spark Data Sources
JSON
>>> df = spark.read.json("customer.json")
>>> df.show()
>>> df2 = spark.read.load("people.json", format="json")
Parquet files
>>> df3 = spark.read.load("users.parquet")
TXT files
>>> df4 = spark.read.text("people.txt")
#Filter entries of age, only keep those records of which the values are >24
>>> df.filter(df["age"]>24).show()
>>> df = df.dropDuplicates()
>>> from pyspark.sql import functions as F
>>> from pyspark.sql.functions import explode
Select
>>> df.select("firstName").show() #Show all entries in firstName column
>>> df.select("firstName","lastName") \
.show()
>>> df.select("firstName", #Show all entries in firstName, age and type
"age",
explode("phoneNumber") \
.alias("contactInfo")) \
.select("contactInfo.type",
"firstName",
"age") \
.show()
>>> df.select(df["firstName"],df["age"]+ 1) #Show all entries in firstName and age, .show() add 1 to the entries of age
>>> df.select(df['age'] > 24).show() #Show all entries where age >24
When
>>> df.select("firstName", #Show firstName and 0 or 1 depending on age >30
F.when(df.age > 30, 1) \
.otherwise(0)) \
.show()
Isin
>>> df[df.firstName.isin("Jane","Boris")].collect() #Show firstName if in the given options
Like
>>> df.select("firstName", #Show firstName, and lastName is TRUE if lastName is like Smith
df.lastName.like("Smith")) \
.show()
Startswith - Endswith
>>> df.select("firstName", #Show firstName, and TRUE if lastName starts with Sm
df.lastName \
.startswith("Sm")) \
.show()
>>> df.select(df.lastName.endswith("th"))\ #Show last names ending in th
.show()
Substring
>>> df.select(df.firstName.substr(1, 3) \ #Return substrings of firstName
.alias("name")) \
.collect()
Between
>>> df.select(df.age.between(22, 24)) \ #Show age: values are TRUE if between 22 and 24
.show()
Adding Columns
>>> df = df.withColumn('city',df.address.city) \
.withColumn('postalCode',df.address.postalCode) \
.withColumn('state',df.address.state) \
.withColumn('streetAddress',df.address.streetAddress) \
.withColumn('telePhoneNumber', explode(df.phoneNumber.number)) \
.withColumn('telePhoneType', explode(df.phoneNumber.type))
Updating Columns
>>> df = df.withColumnRenamed('telePhoneNumber', 'phoneNumber')
Removing Columns
>>> df = df.drop("address", "phoneNumber")
>>> df = df.drop(df.address).drop(df.phoneNumber)
>>> df.na.fill(50).show() #Replace null values
>>> df.na.drop().show() #Return new df omitting rows with null values
>>> df.na \ #Return new df replacing one value with another
.replace(10, 20) \
.show()
>>> df.groupBy("age")\ #Group by age, count the members in the groups
.count() \
.show()
>>> peopledf.sort(peopledf.age.desc()).collect()
>>> df.sort("age", ascending=False).collect()
>>> df.orderBy(["age","city"],ascending=[0,1])\
.collect()
>>> df.repartition(10)\ #df with 10 partitions
.rdd \
.getNumPartitions()
>>> df.coalesce(1).rdd.getNumPartitions() #df with 1 partition
Registering DataFrames as Views
>>> peopledf.createGlobalTempView("people")
>>> df.createTempView("customer")
>>> df.createOrReplaceTempView("customer")
Query Views
>>> df5 = spark.sql("SELECT * FROM customer")
>>> df5.show()
>>> peopledf2 = spark.sql("SELECT * FROM global_temp.people")
>>> peopledf2.show()
>>> df.dtypes #Return df column names and data types
>>> df.show() #Display the content of df
>>> df.head() #Return first n rows
>>> df.first() #Return first row
>>> df.take(2) #Return the first n rows
>>> df.schema #Return the schema of df
>>> df.describe().show() #Compute summary statistics
>>> df.columns #Return the columns of df
>>> df.count() #Count the number of rows in df
>>> df.distinct().count() #Count the number of distinct rows in df
>>> df.printSchema() #Print the schema of df
>>> df.explain() #Print the (logical and physical) plans
Data Structures
>>> rdd1 = df.rdd #Convert df into an RDD
>>> df.toJSON().first() #Convert df into a RDD of string
>>> df.toPandas() #Return the contents of df as Pandas DataFrame
Write & Save to Files
>>> df.select("firstName", "city")\
.write \
.save("nameAndCity.parquet")
>>> df.select("firstName", "age") \
.write \
.save("namesAndAges.json",format="json")
>>> spark.stop()
Original article source at https://www.datacamp.com
#pyspark #cheatsheet #spark #dataframes #python #bigdata
The damerau-levenshtein gem lets you find the edit distance between two UTF-8 or ASCII encoded strings with O(N*M) efficiency.
This gem implements the pure Levenshtein algorithm and the Damerau modification of it (where a transposition of 2 characters counts as 1 edit distance). It also includes the Boehmer & Rees 2008 modification of the Damerau algorithm, where transposition of blocks bigger than 1 character is taken into account as well (Rees 2014).
require "damerau-levenshtein"
DamerauLevenshtein.distance("Something", "Smoething") #returns 1
It also returns a diff between two strings according to the Levenshtein algorithm. The diff is expressed with the tags <ins>, <del>, and <subst>. Such tags make it possible to highlight the difference between strings in a flexible way.
require "damerau-levenshtein"
differ = DamerauLevenshtein::Differ.new
differ.run("corn", "cron")
# output: ["c<subst>or</subst>n", "c<subst>ro</subst>n"]
sudo apt-get install build-essential libgmp3-dev
gem install damerau-levenshtein
require "damerau-levenshtein"
dl = DamerauLevenshtein
dl.distance("Something", "Smoething") #returns 1
dl.distance("Something", "Smoething", 0) #returns 2
dl.distance("Something", "meSothing", 2) #returns 2 instead of 4
dl.distance("Sjöstedt", "Sjostedt") #returns 1
dl.array_distance([1,2,3,5], [1,2,3,4]) #returns 1
differ = DamerauLevenshtein::Differ.new
differ.run("Something", "smthg")
differ = DamerauLevenshtein::Differ.new
differ.format = :raw
differ.run("Something", "smthg")
DamerauLevenshtein.version
#returns version number of the gem
DamerauLevenshtein.distance(string1, string2, block_size, max_distance)
#returns edit distance between 2 strings
DamerauLevenshtein.string_distance(string1, string2, block_size, max_distance)
# an alias for .distance
DamerauLevenshtein.array_distance(array1, array2, block_size, max_distance)
# returns edit distance between 2 arrays of integers
DamerauLevenshtein.distance and .array_distance take 4 arguments:
string1 (array1 for .array_distance)
string2 (array2 for .array_distance)
block_size (default is 1)
max_distance (default is 10)
block_size determines the maximum number of characters in a transposition block:
block_size = 0 (transposition does not count -- it is a pure Levenshtein algorithm)
block_size = 1 (transposition between 2 adjacent characters -- it is the pure Damerau-Levenshtein algorithm)
block_size = 2 (transposition between blocks as big as 2 characters -- so abcd and cdab count as edit distance 2, not 4)
block_size = 3 (transposition between blocks as big as 3 characters -- so abcdef and defabc count as edit distance 3, not 6)
etc.
max_distance is a threshold after which the algorithm gives up and returns max_distance instead of the real edit distance.
The Levenshtein algorithm is expensive, so it makes sense to give up when the edit distance is becoming too big. The max_distance argument does just that.
DamerauLevenshtein.distance("abcdefg", "1234567", 0, 3)
# output: 4 -- it gave up when edit distance exceeded 3
differ = DamerauLevenshtein::Differ.new
creates an instance of a new Differ class to return the difference between two strings.
differ.format
shows the current format for diffs. The default is the :tag format.
differ.format = :raw
changes the current format for diffs. Possible values are :tag and :raw.
differ.run("String1", "String2")
returns the difference between two strings.
For example:
differ = DamerauLevenshtein::Differ.new
differ.run("Something", "smthng")
# output: ["<ins>S</ins><subst>o</subst>m<ins>e</ins>th<ins>i</ins>ng",
# "<del>S</del><subst>s</subst>m<del>e</del>th<del>i</del>ng"]
Or with parsing:
require "damerau-levenshtein"
require "nokogiri"
differ = DamerauLevenshtein::Differ.new
res = differ.run("Something", "Smothing!")
nodes = Nokogiri::XML("<root>#{res.first}</root>")
markup = nodes.root.children.map do |n|
case n.name
when "text"
n.text
when "del"
"~~#{n.children.first.text}~~"
when "ins"
"*#{n.children.first.text}*"
when "subst"
"**#{n.children.first.text}**"
end
end.join("")
puts markup
Output
S*o*m**e**thing~~!~~
This gem follows the practices of Semantic Versioning.
Author: GlobalNamesArchitecture
Source Code: https://github.com/GlobalNamesArchitecture/damerau-levenshtein
License: MIT license
JavaScript loops made simple.
📺 The video in this post was made by Programming with Mosh
The origin of the article: https://www.youtube.com/watch?v=s9wW2PpJsmQ&list=PLTjRvDozrdlxEIuOBZkMAK5uiqp8rHUax&index=8
Thanks for visiting and watching! Please don’t forget to leave a like, comment and share!
#javascript #loops #javascript loops #javascript loops tutorial