Sheldon Grant

A Node.js module that returns the OS/Distribution name

getos

Get the OS/Distribution name of the environment you are working on

Problem

os.platform() returns linux. If you want the distribution name, you're out of luck.

Solution

This. Simply call:

var getos = require('getos')

getos(function(e,os) {
  if(e) return console.log(e)
  console.log("Your OS is: " + JSON.stringify(os))
})

The os object conforms to:

{
  os: [OS NAME],
  dist:[DIST NAME],
  codename:[CODENAME],
  release:[VERSION]
}

For example:

{
  os: "linux",
  dist: "Ubuntu",
  codename: "precise",
  release: "12.04"
}

Disclaimer

Check os.json in this repo. Any distribution that shares a common resource file with another distribution is currently untested. These are the arrays of distributions with more than one member. If you are using one of these distributions, please submit an issue letting me know whether it works. If it fails, please post the content of the file.

If you have a distribution not in os.json, please identify your resource file and submit its name and content, along with your distribution/version, in an issue.

Thanks for helping make this tool great.

Unit Tests

Unit tests stub out the behaviour of the OS files and libraries we depend on to ensure the behaviour of the application is sound. You can run them with npm test.

Authors and Contributors

getos has been made possible by these fantastic contributors:

Benjamin E. Coe (GitHub/bcoe, Twitter/@benjamincoe)
Eugene Sharygin (GitHub/eush77, Twitter/@eush77)
David Routhieau (GitHub/root-io, unknown)
Lawrence (GitHub/mindmelting, Twitter/@mindmelting)
Roman Jurkov (GitHub/winfinit, Twitter/@winfinit)
Rod Vagg (GitHub/rvagg, Twitter/@rvagg)
Zeke Sikelianos (GitHub/zeke, Twitter/@zeke)
Alexander (GitHub/alex4Zero, Twitter/@alex4Zero)

Author: Retrohacker
Source Code: https://github.com/retrohacker/getos 
License: MIT License

#node #linux 

Edward Jackson

PySpark Cheat Sheet: Spark in Python

This PySpark cheat sheet with code samples covers the basics like initializing Spark in Python, loading data, sorting, and repartitioning.

Apache Spark is generally known as a fast, general and open-source engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. It allows you to speed up analytic applications by up to 100 times compared to other technologies on the market today. You can interface Spark with Python through "PySpark", the Spark Python API that exposes the Spark programming model to Python.

Even though working with Spark will remind you in many ways of working with Pandas DataFrames, you'll also see that it can be tough getting familiar with all the functions that you can use to query, transform, inspect, ... your data. What's more, if you've never worked with any other programming language or if you're new to the field, it might be hard to distinguish between RDD operations.

Let's face it: map() and flatMap() are different enough (map() transforms each element one-to-one, while flatMap() can emit zero or more output elements per input and flattens the result), but it can still be a challenge to decide which one you really need in your analysis. The same goes for other functions, like reduce(), which collapses the whole RDD into a single value, and reduceByKey(), which aggregates the values for each key.

PySpark cheat sheet

Even though the documentation is very elaborate, it never hurts to have a cheat sheet by your side, especially when you're just getting into it.

This PySpark cheat sheet covers the basics, from initializing Spark and loading your data, to retrieving RDD information, sorting, filtering and sampling your data. But that's not all. You'll also see that topics such as repartitioning, iterating, merging, saving your data and stopping the SparkContext are included in the cheat sheet. 

Note that the examples in the document take small data sets to illustrate the effect of specific functions on your data. In real life data analysis, you'll be using Spark to analyze big data.

PySpark is the Spark Python API that exposes the Spark programming model to Python.

Initializing Spark 

SparkContext 

>>> from pyspark import SparkContext
>>> sc = SparkContext(master = 'local[2]')

Inspect SparkContext 

>>> sc.version #Retrieve SparkContext version
>>> sc.pythonVer #Retrieve Python version
>>> sc.master #Master URL to connect to
>>> str(sc.sparkHome) #Path where Spark is installed on worker nodes
>>> str(sc.sparkUser()) #Retrieve name of the Spark User running SparkContext
>>> sc.appName #Return application name
>>> sc.applicationId #Retrieve application ID
>>> sc.defaultParallelism #Return default level of parallelism
>>> sc.defaultMinPartitions #Default minimum number of partitions for RDDs

Configuration 

>>> from pyspark import SparkConf, SparkContext
>>> conf = (SparkConf()
     .setMaster("local")
     .setAppName("My app")
     .set("spark.executor.memory", "1g"))
>>> sc = SparkContext(conf = conf)

Using the Shell 

In the PySpark shell, a special interpreter-aware SparkContext is already created in the variable called sc.

$ ./bin/spark-shell --master local[2]
$ ./bin/pyspark --master local[4] --py-files code.py

Set which master the context connects to with the --master argument, and add Python .zip, .egg, or .py files to the runtime path by passing a comma-separated list to --py-files.

Loading Data 

Parallelized Collections 

>>> rdd = sc.parallelize([('a',7),('a',2),('b',2)])
>>> rdd2 = sc.parallelize([('a',2),('d',1),('b',1)])
>>> rdd3 = sc.parallelize(range(100))
>>> rdd4 = sc.parallelize([("a",["x","y","z"]),
               ("b",["p","r"])])

External Data 

Read either one text file from HDFS, a local file system or any Hadoop-supported file system URI with textFile(), or read in a directory of text files with wholeTextFiles(). 

>>> textFile = sc.textFile("/my/directory/*.txt")
>>> textFile2 = sc.wholeTextFiles("/my/directory/")

Retrieving RDD Information 

Basic Information 

>>> rdd.getNumPartitions() #List the number of partitions
>>> rdd.count() #Count RDD instances
3
>>> rdd.countByKey() #Count RDD instances by key
defaultdict(<type 'int'>,{'a':2,'b':1})
>>> rdd.countByValue() #Count RDD instances by value
defaultdict(<type 'int'>,{('b',2):1,('a',2):1,('a',7):1})
>>> rdd.collectAsMap() #Return (key,value) pairs as a dictionary
   {'a': 2, 'b': 2}
>>> rdd3.sum() #Sum of RDD elements
4950
>>> sc.parallelize([]).isEmpty() #Check whether RDD is empty
True

Summary 

>>> rdd3.max() #Maximum value of RDD elements 
99
>>> rdd3.min() #Minimum value of RDD elements
0
>>> rdd3.mean() #Mean value of RDD elements 
49.5
>>> rdd3.stdev() #Standard deviation of RDD elements 
28.866070047722118
>>> rdd3.variance() #Compute variance of RDD elements 
833.25
>>> rdd3.histogram(3) #Compute histogram by bins
([0,33,66,99],[33,33,34])
>>> rdd3.stats() #Summary statistics (count, mean, stdev, max & min)

Applying Functions 

#Apply a function to each RDD element
>>> rdd.map(lambda x: x+(x[1],x[0])).collect()
[('a',7,7,'a'),('a',2,2,'a'),('b',2,2,'b')]
#Apply a function to each RDD element and flatten the result
>>> rdd5 = rdd.flatMap(lambda x: x+(x[1],x[0]))
>>> rdd5.collect()
['a',7,7,'a','a',2,2,'a','b',2,2,'b']
#Apply a flatMap function to each (key,value) pair of rdd4 without changing the keys
>>> rdd4.flatMapValues(lambda x: x).collect()
[('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]

Selecting Data

Getting

>>> rdd.collect() #Return a list with all RDD elements 
[('a', 7), ('a', 2), ('b', 2)]
>>> rdd.take(2) #Take first 2 RDD elements 
[('a', 7),  ('a', 2)]
>>> rdd.first() #Take first RDD element
('a', 7)
>>> rdd.top(2) #Take top 2 RDD elements 
[('b', 2), ('a', 7)]

Sampling

>>> rdd3.sample(False, 0.15, 81).collect() #Return sampled subset of rdd3
     [3,4,27,31,40,41,42,43,60,76,79,80,86,97]

Filtering

>>> rdd.filter(lambda x: "a" in x).collect() #Filter the RDD
[('a',7),('a',2)]
>>> rdd5.distinct().collect() #Return distinct RDD values
['a' ,2, 'b',7]
>>> rdd.keys().collect() #Return (key,value) RDD's keys
['a',  'a',  'b']

Iterating 

>>> def g (x): print(x)
>>> rdd.foreach(g) #Apply a function to all RDD elements
('a', 7)
('b', 2)
('a', 2)

Reshaping Data 

Reducing

>>> rdd.reduceByKey(lambda x,y : x+y).collect() #Merge the rdd values for each key
[('a',9),('b',2)]
>>> rdd.reduce(lambda a, b: a+ b) #Merge the rdd values
('a', 7, 'a' , 2 , 'b' , 2)

 

Grouping by

#Return RDD of grouped values
>>> rdd3.groupBy(lambda x: x % 2) \
        .mapValues(list) \
        .collect()
#Group rdd by key
>>> rdd.groupByKey() \
       .mapValues(list) \
       .collect()
[('a',[7,2]),('b',[2])]

Aggregating

>>> seqOp = (lambda x,y: (x[0]+y,x[1]+1))
>>> combOp = (lambda x,y:(x[0]+y[0],x[1]+y[1]))
#Aggregate RDD elements of each partition and then the results
>>> rdd3.aggregate((0,0),seqOp,combOp) 
(4950,100)
#Aggregate values of each RDD key
>>> rdd.aggregateByKey((0,0),seqOp,combOp).collect()
     [('a',(9,2)), ('b',(2,1))]
#Aggregate the elements of each partition, and then the results
>>> from operator import add
>>> rdd3.fold(0,add)
     4950
#Merge the values for each key
>>> rdd.foldByKey(0, add).collect()
[('a' ,9), ('b' ,2)]
#Create tuples of RDD elements by applying a function
>>> rdd3.keyBy(lambda x: x+x).collect()

Mathematical Operations 

>>> rdd.subtract(rdd2).collect() #Return each rdd value not contained in rdd2
[('b' ,2), ('a' ,7)]
#Return each (key,value) pair of rdd2 with no matching key in rdd
>>> rdd2.subtractByKey(rdd).collect()
[('d', 1)]
>>> rdd.cartesian(rdd2).collect() #Return the Cartesian product of rdd and rdd2

Sort 

>>> rdd2.sortBy(lambda x: x[1]).collect() #Sort RDD by given function
[('d',1),('b',1),('a',2)]
>>> rdd2.sortByKey().collect() #Sort (key,value) RDD by key
[('a' ,2), ('b' ,1), ('d' ,1)]

Repartitioning 

>>> rdd.repartition(4) #New RDD with 4 partitions
>>> rdd.coalesce(1) #Decrease the number of partitions in the RDD to 1

Saving 

>>> rdd.saveAsTextFile("rdd.txt")
>>> rdd.saveAsHadoopFile("hdfs://namenodehost/parent/child",
               'org.apache.hadoop.mapred.TextOutputFormat')

Stopping SparkContext 

>>> sc.stop()

Execution 

$ ./bin/spark-submit examples/src/main/python/pi.py

Have this Cheat Sheet at your fingertips

Original article source at https://www.datacamp.com

#pyspark #cheatsheet #spark #python

NBB: Ad-hoc CLJS Scripting on Node.js

Nbb

Not babashka. Node.js babashka!?

Ad-hoc CLJS scripting on Node.js.

Status

Experimental. Please report issues here.

Goals and features

Nbb's main goal is to make it easy to get started with ad hoc CLJS scripting on Node.js.

Additional goals and features are:

  • Fast startup without relying on a custom version of Node.js.
  • Small artifact (current size is around 1.2MB).
  • First class macros.
  • Support building small TUI apps using Reagent.
  • Complement babashka with libraries from the Node.js ecosystem.

Requirements

Nbb requires Node.js v12 or newer.

How does this tool work?

CLJS code is evaluated through SCI, the same interpreter that powers babashka. Because SCI works with advanced compilation, the bundle size, especially when combined with other dependencies, is smaller than what you get with self-hosted CLJS. That makes startup faster. The trade-off is that execution is less performant and that only a subset of CLJS is available (e.g. no deftype, yet).

Usage

Install nbb from NPM:

$ npm install nbb -g

Omit -g for a local install.

Try out an expression:

$ nbb -e '(+ 1 2 3)'
6

And then install some other NPM libraries to use in the script. E.g.:

$ npm install csv-parse shelljs zx

Create a script which uses the NPM libraries:

(ns script
  (:require ["csv-parse/lib/sync$default" :as csv-parse]
            ["fs" :as fs]
            ["path" :as path]
            ["shelljs$default" :as sh]
            ["term-size$default" :as term-size]
            ["zx$default" :as zx]
            ["zx$fs" :as zxfs]
            [nbb.core :refer [*file*]]))

(prn (path/resolve "."))

(prn (term-size))

(println (count (str (fs/readFileSync *file*))))

(prn (sh/ls "."))

(prn (csv-parse "foo,bar"))

(prn (zxfs/existsSync *file*))

(zx/$ #js ["ls"])

Call the script:

$ nbb script.cljs
"/private/tmp/test-script"
#js {:columns 216, :rows 47}
510
#js ["node_modules" "package-lock.json" "package.json" "script.cljs"]
#js [#js ["foo" "bar"]]
true
$ ls
node_modules
package-lock.json
package.json
script.cljs

Macros

Nbb has first class support for macros: you can define them right inside your .cljs file, like you are used to from JVM Clojure. Consider the plet macro to make working with promises more palatable:

(defmacro plet
  [bindings & body]
  (let [binding-pairs (reverse (partition 2 bindings))
        body (cons 'do body)]
    (reduce (fn [body [sym expr]]
              (let [expr (list '.resolve 'js/Promise expr)]
                (list '.then expr (list 'clojure.core/fn (vector sym)
                                        body))))
            body
            binding-pairs)))

Using this macro we can make async code look more like sync code. Consider this puppeteer example:

(-> (.launch puppeteer)
      (.then (fn [browser]
               (-> (.newPage browser)
                   (.then (fn [page]
                            (-> (.goto page "https://clojure.org")
                                (.then #(.screenshot page #js{:path "screenshot.png"}))
                                (.catch #(js/console.log %))
                                (.then #(.close browser)))))))))

Using plet this becomes:

(plet [browser (.launch puppeteer)
       page (.newPage browser)
       _ (.goto page "https://clojure.org")
       _ (-> (.screenshot page #js{:path "screenshot.png"})
             (.catch #(js/console.log %)))]
      (.close browser))

See the puppeteer example for the full code.

Since v0.0.36, nbb includes promesa which is a library to deal with promises. The above plet macro is similar to promesa.core/let.

Startup time

$ time nbb -e '(+ 1 2 3)'
6
nbb -e '(+ 1 2 3)'   0.17s  user 0.02s system 109% cpu 0.168 total

The baseline startup time for a script is about 170ms on my laptop. When invoked via npx this adds another 300ms or so, so for faster startup, either use a globally installed nbb or use $(npm bin)/nbb script.cljs to bypass npx.

Dependencies

NPM dependencies

Nbb does not depend on any NPM dependencies. All NPM libraries loaded by a script are resolved relative to that script. When using the Reagent module, React is resolved in the same way as any other NPM library.

Classpath

To load .cljs files from local paths or dependencies, you can use the --classpath argument. The current dir is added to the classpath automatically. So if there is a file foo/bar.cljs relative to your current dir, then you can load it via (:require [foo.bar :as fb]). Note that nbb uses the same naming conventions for namespaces and directories as other Clojure tools: foo-bar in the namespace name becomes foo_bar in the directory name.

To load dependencies from the Clojure ecosystem, you can use the Clojure CLI or babashka to download them and produce a classpath:

$ classpath="$(clojure -A:nbb -Spath -Sdeps '{:aliases {:nbb {:replace-deps {com.github.seancorfield/honeysql {:git/tag "v2.0.0-rc5" :git/sha "01c3a55"}}}}}')"

and then feed it to the --classpath argument:

$ nbb --classpath "$classpath" -e "(require '[honey.sql :as sql]) (sql/format {:select :foo :from :bar :where [:= :baz 2]})"
["SELECT foo FROM bar WHERE baz = ?" 2]

Currently nbb only reads from directories, not jar files, so you are encouraged to use git libs. Support for .jar files will be added later.

Current file

The name of the file that is currently being executed is available via nbb.core/*file* or on the metadata of vars:

(ns foo
  (:require [nbb.core :refer [*file*]]))

(prn *file*) ;; "/private/tmp/foo.cljs"

(defn f [])
(prn (:file (meta #'f))) ;; "/private/tmp/foo.cljs"

Reagent

Nbb includes reagent.core which will be lazily loaded when required. You can use this together with ink to create a TUI application:

$ npm install ink

ink-demo.cljs:

(ns ink-demo
  (:require ["ink" :refer [render Text]]
            [reagent.core :as r]))

(defonce state (r/atom 0))

(doseq [n (range 1 11)]
  (js/setTimeout #(swap! state inc) (* n 500)))

(defn hello []
  [:> Text {:color "green"} "Hello, world! " @state])

(render (r/as-element [hello]))

Promesa

Working with callbacks and promises can become tedious. Since nbb v0.0.36 the promesa.core namespace is included with the let and do! macros. An example:

(ns prom
  (:require [promesa.core :as p]))

(defn sleep [ms]
  (js/Promise.
   (fn [resolve _]
     (js/setTimeout resolve ms))))

(defn do-stuff
  []
  (p/do!
   (println "Doing stuff which takes a while")
   (sleep 1000)
   1))

(p/let [a (do-stuff)
        b (inc a)
        c (do-stuff)
        d (+ b c)]
  (prn d))
$ nbb prom.cljs
Doing stuff which takes a while
Doing stuff which takes a while
3

Also see API docs.

Js-interop

Since nbb v0.0.75 applied-science/js-interop is available:

(ns example
  (:require [applied-science.js-interop :as j]))

(def o (j/lit {:a 1 :b 2 :c {:d 1}}))

(prn (j/select-keys o [:a :b])) ;; #js {:a 1, :b 2}
(prn (j/get-in o [:c :d])) ;; 1

Most of this library is supported in nbb, except the following:

  • destructuring using :syms
  • property access using .-x notation. In nbb, you must use keywords.

See the example of what is currently supported.

Examples

See the examples directory for small examples.

Also check out these projects built with nbb:

API

See API documentation.

Migrating to shadow-cljs

See this gist on how to convert an nbb script or project to shadow-cljs.

Build

Prerequisites:

  • babashka >= 0.4.0
  • Clojure CLI >= 1.10.3.933
  • Node.js 16.5.0 (lower version may work, but this is the one I used to build)

To build:

  • Clone and cd into this repo
  • bb release

Run bb tasks for more project-related tasks.

Download Details:
Author: borkdude
Official Website: https://github.com/borkdude/nbb 
License: EPL-1.0

#node #javascript

Hire Dedicated Node.js Developers - Hire Node.js Developers

If you look at the backend technology used by today's most popular apps, there is one thing you will find common among them: the use of the NodeJS framework. Yes, the NodeJS framework is that effective and successful.

If you wish to have a strong backend for efficient app performance then have NodeJS at the backend.

WebClues Infotech offers different levels of experienced and expert professionals for your app development needs. So hire a dedicated NodeJS developer from WebClues Infotech with your experience requirement and expertise.

So what are you waiting for? Get your app developed with strong performance parameters from WebClues Infotech

For inquiry click here: https://www.webcluesinfotech.com/hire-nodejs-developer/

Book Free Interview: https://bit.ly/3dDShFg

#hire dedicated node.js developers #hire node.js developers #hire top dedicated node.js developers #hire node.js developers in usa & india #hire node js development company #hire the best node.js developers & programmers

Callum Slater

PySpark Cheat Sheet: Spark DataFrames in Python

This PySpark SQL cheat sheet is your handy companion to Apache Spark DataFrames in Python and includes code samples.

You'll probably already know about Apache Spark, the fast, general and open-source engine for big data processing; it has built-in modules for streaming, SQL, machine learning and graph processing. Spark allows you to speed up analytic applications by up to 100 times compared to other technologies on the market today. Interfacing Spark with Python is easy with PySpark: this Spark Python API exposes the Spark programming model to Python.

Now, it's time to tackle the Spark SQL module, which is meant for structured data processing, and the DataFrame API, which is not only available in Python, but also in Scala, Java, and R.

Without further ado, here's the cheat sheet:

PySpark SQL cheat sheet

This PySpark SQL cheat sheet covers the basics of working with Apache Spark DataFrames in Python: from initializing the SparkSession to creating DataFrames, inspecting the data, handling duplicate values, querying, adding, updating or removing columns, grouping, filtering or sorting data. You'll also see how to run SQL queries programmatically, how to save your data to Parquet and JSON files, and how to stop your SparkSession.

Spark SQL is Apache Spark's module for working with structured data.

Initializing SparkSession 
 

A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read Parquet files.

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession \
     .builder\
     .appName("Python Spark SQL basic example") \
     .config("spark.some.config.option", "some-value") \
     .getOrCreate()

Creating DataFrames
 

From RDDs

>>> from pyspark.sql.types import *

Infer Schema

>>> sc = spark.sparkContext
>>> lines = sc.textFile("people.txt")
>>> parts = lines.map(lambda l: l.split(","))
>>> people = parts.map(lambda p: Row(name=p[0],age=int(p[1])))
>>> peopledf = spark.createDataFrame(people)

Specify Schema

>>> people = parts.map(lambda p: Row(name=p[0],
               age=int(p[1].strip())))
>>>  schemaString = "name age"
>>> fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()]
>>> schema = StructType(fields)
>>> spark.createDataFrame(people, schema).show()

 

From Spark Data Sources
JSON

>>>  df = spark.read.json("customer.json")
>>> df.show()

>>>  df2 = spark.read.load("people.json", format="json")

Parquet files

>>> df3 = spark.read.load("users.parquet")

TXT files

>>> df4 = spark.read.text("people.txt")

Filter 

#Filter entries of age, only keep those records of which the values are >24
>>> df.filter(df["age"]>24).show()

Duplicate Values 

>>> df = df.dropDuplicates()

Queries 
 

>>> from pyspark.sql import functions as F
>>> from pyspark.sql.functions import explode

Select

>>> df.select("firstName").show() #Show all entries in firstName column
>>> df.select("firstName","lastName") \
      .show()
>>> df.select("firstName", #Show all entries in firstName, age and type
              "age",
              explode("phoneNumber") \
              .alias("contactInfo")) \
      .select("contactInfo.type",
              "firstName",
              "age") \
      .show()
>>> df.select(df["firstName"],df["age"]+ 1) #Show all entries in firstName and age, .show() add 1 to the entries of age
>>> df.select(df['age'] > 24).show() #Show all entries where age >24

When

>>> df.select("firstName", #Show firstName and 0 or 1 depending on age >30
               F.when(df.age > 30, 1) \
              .otherwise(0)) \
      .show()
>>> df[df.firstName.isin("Jane","Boris")].collect() #Show firstName if in the given options

Like 

>>> df.select("firstName", #Show firstName, and lastName is TRUE if lastName is like Smith
              df.lastName.like("Smith")) \
     .show()

Startswith - Endswith 

>>> df.select("firstName", #Show firstName, and TRUE if lastName starts with Sm
              df.lastName \
                .startswith("Sm")) \
      .show()
>>> df.select(df.lastName.endswith("th")).show() #Show last names ending in th

Substring 

#Return substrings of firstName
>>> df.select(df.firstName.substr(1, 3).alias("name")).collect()

Between 

>>> df.select(df.age.between(22, 24)).show() #Show age: values are TRUE if between 22 and 24

Add, Update & Remove Columns 

Adding Columns

 >>> df = df.withColumn('city',df.address.city) \
            .withColumn('postalCode',df.address.postalCode) \
            .withColumn('state',df.address.state) \
            .withColumn('streetAddress',df.address.streetAddress) \
            .withColumn('telePhoneNumber', explode(df.phoneNumber.number)) \
            .withColumn('telePhoneType', explode(df.phoneNumber.type)) 

Updating Columns

>>> df = df.withColumnRenamed('telePhoneNumber', 'phoneNumber')

Removing Columns

  >>> df = df.drop("address", "phoneNumber")
 >>> df = df.drop(df.address).drop(df.phoneNumber)
 

Missing & Replacing Values 
 

>>> df.na.fill(50).show() #Replace null values
>>> df.na.drop().show() #Return new df omitting rows with null values
#Return new df replacing one value with another
>>> df.na.replace(10, 20).show()

GroupBy 

#Group by age, count the members in the groups
>>> df.groupBy("age") \
      .count() \
      .show()

Sort 
 

>>> peopledf.sort(peopledf.age.desc()).collect()
>>> df.sort("age", ascending=False).collect()
>>> df.orderBy(["age","city"],ascending=[0,1])\
     .collect()

Repartitioning 

#df with 10 partitions
>>> df.repartition(10).rdd.getNumPartitions()
>>> df.coalesce(1).rdd.getNumPartitions() #df with 1 partition

Running Queries Programmatically 
 

Registering DataFrames as Views

>>> peopledf.createGlobalTempView("people")
>>> df.createTempView("customer")
>>> df.createOrReplaceTempView("customer")

Query Views

>>> df5 = spark.sql("SELECT * FROM customer").show()
>>> peopledf2 = spark.sql("SELECT * FROM global_temp.people")\
               .show()

Inspect Data 
 

>>> df.dtypes #Return df column names and data types
>>> df.show() #Display the content of df
>>> df.head() #Return first n rows
>>> df.first() #Return first row
>>> df.take(2) #Return the first n rows
>>> df.schema #Return the schema of df
>>> df.describe().show() #Compute summary statistics
>>> df.columns #Return the columns of df
>>> df.count() #Count the number of rows in df
>>> df.distinct().count() #Count the number of distinct rows in df
>>> df.printSchema() #Print the schema of df
>>> df.explain() #Print the (logical and physical) plans

Output

Data Structures 
 

 >>> rdd1 = df.rdd #Convert df into an RDD
 >>> df.toJSON().first() #Convert df into a RDD of string
 >>> df.toPandas() #Return the contents of df as Pandas DataFrame

Write & Save to Files 

>>> df.select("firstName", "city")\
       .write \
       .save("nameAndCity.parquet")
 >>> df.select("firstName", "age") \
       .write \
       .save("namesAndAges.json",format="json")

Stopping SparkSession 

>>> spark.stop()

Have this Cheat Sheet at your fingertips

Original article source at https://www.datacamp.com

#pyspark #cheatsheet #spark #dataframes #python #bigdata

Aria Barnes

Why use Node.js for Web Development? Benefits and Examples of Apps

Front-end web development has been dominated by JavaScript for many years. Google, Facebook, Wikipedia, and most other websites use JS for client-side functionality. More recently, it has also moved into cross-platform mobile development as the main technology behind React Native, NativeScript, Apache Cordova, and other hybrid tools.

Over the last few years, Node.js has moved into backend development as well. Developers want to use the same tech stack for the whole web project without learning another language for server-side development. Node.js is a tool that adapts JavaScript functionality and syntax to the backend.

What is Node.js? 

Node.js isn't a language, a library, or a framework. It's a runtime environment: normally JavaScript needs a browser to work, but Node.js provides the right setting for JS to run outside of the browser. It's built on the V8 JavaScript engine, which can run in Chrome, in other browsers, or standalone.

The purpose of V8 is to turn browser-oriented JS code into machine code, so JS becomes a general-purpose language that servers can understand. This is one of the advantages of using Node.js in web application development: it extends the functionality of JavaScript, allowing developers to integrate the language with APIs, other languages, and external libraries.

What Are the Advantages of Node.js Web Application Development? 

Lately, organizations have been actively switching their backend tech stacks to Node.js. LinkedIn picked Node.js over Ruby on Rails because it handled growing load better and cut the number of servers several times over. PayPal and Netflix did something similar, only their goal was to move their architecture to microservices. Let's look into the reasons to pick Node.js for web application development, and when to plan to hire Node.js developers.

Amazing Tech Stack for Web Development 

The first thing that makes Node.js a go-to environment for web development is its JavaScript legacy. JavaScript is the most popular language right now, with thousands of free tools and an active community. Node.js, thanks to its connection with JS, quickly rose in popularity: it now has more than 368 million downloads and thousands of free tools in the package registry.

Along with its popularity, Node.js also inherited the fundamental JS benefits:

  • fast execution and data processing;
  • highly reusable code;
  • code that is easy to learn, write, read, and maintain;
  • a huge resource library, thousands of free guides, and an active community.

In addition, it's part of the popular MEAN tech stack (the combination of MongoDB, Express.js, Angular, and Node.js: four tools that handle all the vital parts of web application development).

Developers Can Use JavaScript for the Whole Project

This is perhaps the clearest advantage of Node.js web application development. JavaScript is a must for web development. Regardless of whether you build a multi-page or single-page application, you need to know JS well. If you are already comfortable with JavaScript, learning Node.js won't be a problem: the syntax, basic functionality, and main principles are all similar.

If you have JS developers on your team, it will be easier for them to learn JS-based Node than a completely new language. What's more, the front-end and back-end codebases will be essentially the same, easy to read and maintain, because they are both JS-based.

A Fast Environment for Microservice Development

There's another reason why Node.js became popular so quickly: the environment suits the concept of microservice development well (splitting monolith functionality into dozens or hundreds of smaller services).

Microservices need to communicate with each other quickly, and Node.js is one of the fastest tools for data processing. Among the fundamental Node.js benefits for software development is its non-blocking I/O model.

Node.js processes several requests at once without waiting for the first one to finish. Many microservices can send messages to each other, and they will be received and answered simultaneously.
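
As a minimal sketch of that non-blocking model (using only Node's built-in http module; the slowLookup helper and its 200 ms delay are invented here to stand in for a slow downstream service):

const http = require('http');

// Stand-in for a slow downstream call, e.g. another microservice.
function slowLookup() {
  return new Promise((resolve) => setTimeout(() => resolve({ ok: true }), 200));
}

const server = http.createServer(async (req, res) => {
  // While this request awaits the slow call, the event loop keeps accepting
  // and serving other requests instead of blocking on this one.
  const data = await slowLookup();
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify(data));
});

server.listen(3000, () => console.log('listening on :3000'));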

Scalable Web Application Development

Node.js was built with scalability in mind; its name really says it. The environment allows multiple nodes to run simultaneously and communicate with each other. Here's why Node.js scalability is better than other web backend development solutions.

Node.js has a module that is responsible for load balancing across each running CPU core. This is one of many benefits of Node.js modules: you can run multiple nodes at once, and the environment will automatically balance the workload between them.
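
The module being described is presumably Node's built-in cluster module; a rough sketch (not taken from the article) of forking one worker per CPU core looks like this:

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) { // cluster.isMaster on Node versions before 16
  // Fork one worker per CPU core; the primary distributes incoming connections.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // Replace any worker that dies so the pool stays at full size.
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} exited, starting a replacement`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(3000);
}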

Node.js allows even partitioning: you can split your application into multiple instances and show different versions of the application to different users based on their age, interests, location, language, and so on. This increases personalization and reduces the workload. Node achieves this with child processes: operations that communicate quickly with each other and share the same origin.

What's more, Node's non-blocking request-handling system contributes to its speed, letting applications process thousands of requests.

Control Flow Features

Many developers consider asynchrony to be both a drawback and a benefit of Node.js web application development. In Node, whenever a function is executed, the code automatically sends a callback. As the number of functions grows, so does the number of callbacks, and you end up in a situation known as callback hell.

However, Node.js offers a way out. You can use frameworks that will schedule functions and sort through callbacks. Frameworks will link related functions automatically, so you can find a needed component via search or in a folder. Then there's no need to dig through callbacks.
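
As a small illustration of that difference (the file names a.txt and b.txt are placeholders), compare nested callbacks with the same flow written with promises and async/await:

const fs = require('fs');
const fsp = require('fs/promises');

// Callback style: each step nests inside the previous one ("callback hell").
fs.readFile('a.txt', 'utf8', (err, a) => {
  if (err) return console.error(err);
  fs.readFile('b.txt', 'utf8', (err, b) => {
    if (err) return console.error(err);
    console.log(a.length + b.length);
  });
});

// Promise / async-await style: the same flow reads top to bottom.
async function main() {
  const a = await fsp.readFile('a.txt', 'utf8');
  const b = await fsp.readFile('b.txt', 'utf8');
  console.log(a.length + b.length);
}

main().catch(console.error);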

 

Final Words

So, these are some of the top benefits of Node.js in web application development, and how Node.js is contributing to the field.

I hope you are now aware of why Node.js is really important for your web project. If you are looking to hire a Node.js development company in India, then I would suggest that you also get a short consultation whenever you call.

Good Luck!

Original Source

#node.js development company in india #node js development company #hire node js developers #hire node.js developers in india #node.js development services #node.js development