Lindsey  Koepp

Lindsey Koepp

1599106380

How to Show off Your Data Science or Software Engineering Skills Effectively

What is the most important thing to do after you have gained the skills to be a data scientist? It has to be showing off those skills. Otherwise, there is little use for them. Whether you want to get a job, freelance, or start a start-up, you have to show your skills to people effectively.

Here are some effective ways to show off your skills and get involved in the data science community.

You should not start everything at once. It will be overwhelming.

Start with the one or two channels you are most comfortable with. Slowly, you will feel like doing more.

GitHub

This is free and the easiest way to build a portfolio. Whatever small or big projects you have done for practice, organize them into a clean, well-structured GitHub profile. Employers often ask for a GitHub profile link in job applications, so it is almost mandatory for coders to have one. It took me some time, in the beginning, to learn that GitHub profiles can rank in Google: if you search for a certain project or topic, you will see some GitHub profiles show up in the search results.

Try searching for this: “linear regression from scratch in python, GitHub”.

You can have a nice portfolio for free to show off your skills. If you keep posting regularly, your profile will also become popular, though it will take some time, perhaps seven or eight months or a year. But even if your GitHub profile is not popular, you will still be able to use the link in your resume.

Social Media

I mostly use Facebook, Twitter, and LinkedIn. On Facebook, there are a lot of groups that you can join very easily. You will find many people sharing their ideas, asking for help, and having conversations. I suggest joining some groups that seem suitable for you and getting involved in the conversations. Share your ideas, good articles, courses, videos, or other resources that you know are helpful. When you help others, you will get help back. Here are some Facebook groups that I joined:

Data Science World

Deep Learning and Machine Learning

Python Programming

Beginning Data Science, Analytics, Machine Learning, Data Mining, R, Python

Artificial Intelligence & Deep Learning

There are many groups like these; choose the ones you like. In the same way, follow people in the profession on Twitter and connect with them on LinkedIn.

I got my first internship through a connection on LinkedIn.

I can only speak about these three social media platforms because they are the ones I use. If you use other platforms as well, use them for your professional networking too.

#towards-data-science #data-science #data-analytics #machine-learning #software-engineering

Callum Slater

Callum Slater

1653465344

PySpark Cheat Sheet: Spark DataFrames in Python

This PySpark SQL cheat sheet is your handy companion to Apache Spark DataFrames in Python and includes code samples.

You'll probably already know about Apache Spark, the fast, general, open-source engine for big data processing; it has built-in modules for streaming, SQL, machine learning, and graph processing. Spark can run analytic applications up to 100 times faster than other technologies on the market today. Interfacing Spark with Python is easy with PySpark: this Spark Python API exposes the Spark programming model to Python.

Now, it's time to tackle the Spark SQL module, which is meant for structured data processing, and the DataFrame API, which is not only available in Python, but also in Scala, Java, and R.

Without further ado, here's the cheat sheet:

PySpark SQL cheat sheet

This PySpark SQL cheat sheet covers the basics of working with Apache Spark DataFrames in Python: from initializing the SparkSession to creating DataFrames, inspecting the data, handling duplicate values, querying, adding, updating or removing columns, grouping, filtering or sorting data. You'll also see how to run SQL queries programmatically, how to save your data to Parquet and JSON files, and how to stop your SparkSession.
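Before walking through the individual sections, here is a minimal end-to-end sketch of how these pieces fit together in a single session. It uses a couple of inline Row objects instead of the people.txt and customer.json files referenced below, and the app name and the customer view name are only illustrative placeholders.

>>> from pyspark.sql import SparkSession, Row
>>> spark = SparkSession.builder \
        .appName("cheat-sheet-walkthrough") \
        .getOrCreate()
>>> df = spark.createDataFrame([Row(firstName="Jane", age=25),
                                Row(firstName="Boris", age=32)]) #Build a DataFrame from two inline rows
>>> df.printSchema() #Inspect the inferred schema
>>> df.createOrReplaceTempView("customer") #Register the DataFrame as a temporary view
>>> spark.sql("SELECT firstName FROM customer WHERE age > 30").show() #Query the view with SQL
>>> spark.stop() #Stop the session when you are done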

Spark SQL is Apache Spark's module for working with structured data.

Initializing SparkSession 
 

A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read Parquet files.

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession \
     .builder \
     .appName("Python Spark SQL basic example") \
     .config("spark.some.config.option", "some-value") \
     .getOrCreate()

Creating DataFrames
 

From RDDs

>>> from pyspark.sql.types import *
>>> from pyspark.sql import Row

Infer Schema

>>> sc = spark.sparkContext
>>> lines = sc.textFile("people.txt")
>>> parts = lines.map(lambda l: l.split(","))
>>> people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
>>> peopledf = spark.createDataFrame(people)

Specify Schema

>>> people = parts.map(lambda p: Row(name=p[0],
                                     age=int(p[1].strip())))
>>> schemaString = "name age"
>>> fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()]
>>> schema = StructType(fields)
>>> spark.createDataFrame(people, schema).show()

 

From Spark Data Sources
JSON

>>> df = spark.read.json("customer.json")
>>> df.show()

>>> df2 = spark.read.load("people.json", format="json")

Parquet files

>>> df3 = spark.read.load("users.parquet")

TXT files

>>> df4 = spark.read.text("people.txt")

Filter 

#Filter entries of age, only keep those records of which the values are >24
>>> df.filter(df["age"]>24).show()

Duplicate Values 

>>> df = df.dropDuplicates()

Queries 
 

>>> from pyspark.sql import functions as F

Select

>>> df.select("firstName").show() #Show all entries in firstName column
>>> df.select("firstName","lastName") \
      .show()
>>> df.select("firstName", #Show all entries in firstName, age and type
              "age",
              explode("phoneNumber") \
              .alias("contactInfo")) \
      .select("contactInfo.type",
              "firstName",
              "age") \
      .show()
>>> df.select(df["firstName"],df["age"]+ 1) #Show all entries in firstName and age, .show() add 1 to the entries of age
>>> df.select(df['age'] > 24).show() #Show all entries where age >24

When

>>> df.select("firstName", #Show firstName and 0 or 1 depending on age >30
               F.when(df.age > 30, 1) \
              .otherwise(0)) \
      .show()
>>> df[df.firstName.isin("Jane","Boris")].collect() #Show firstName if in the given options

Like 

>>> df.select("firstName", #Show firstName, and lastName is TRUE if lastName is like Smith
              df.lastName.like("Smith")) \
     .show()

Startswith - Endswith 

>>> df.select("firstName", #Show firstName, and TRUE if lastName starts with Sm
              df.lastName \
                .startswith("Sm")) \
      .show()
>>> df.select(df.lastName.endswith("th")) \
      .show() #Show last names ending in th

Substring 

>>> df.select(df.firstName.substr(1, 3) #Return substrings of firstName
              .alias("name")) \
      .collect()

Between 

>>> df.select(df.age.between(22, 24)) \
      .show() #Show age: values are TRUE if between 22 and 24

Add, Update & Remove Columns 

Adding Columns

 >>> df = df.withColumn('city',df.address.city) \
            .withColumn('postalCode',df.address.postalCode) \
            .withColumn('state',df.address.state) \
            .withColumn('streetAddress',df.address.streetAddress) \
            .withColumn('telePhoneNumber', explode(df.phoneNumber.number)) \
            .withColumn('telePhoneType', explode(df.phoneNumber.type)) 

Updating Columns

>>> df = df.withColumnRenamed('telePhoneNumber', 'phoneNumber')

Removing Columns

  >>> df = df.drop("address", "phoneNumber")
 >>> df = df.drop(df.address).drop(df.phoneNumber)
 

Missing & Replacing Values 
 

>>> df.na.fill(50).show() #Replace null values
>>> df.na.drop().show() #Return new df omitting rows with null values
>>> df.na.replace(10, 20).show() #Return new df replacing one value with another

GroupBy 

>>> df.groupBy("age")\ #Group by age, count the members in the groups
      .count() \
      .show()

Sort 
 

>>> peopledf.sort(peopledf.age.desc()).collect()
>>> df.sort("age", ascending=False).collect()
>>> df.orderBy(["age","city"],ascending=[0,1])\
     .collect()

Repartitioning 

>>> df.repartition(10).rdd.getNumPartitions() #df with 10 partitions
>>> df.coalesce(1).rdd.getNumPartitions() #df with 1 partition

Running Queries Programmatically 
 

Registering DataFrames as Views

>>> peopledf.createGlobalTempView("people")
>>> df.createTempView("customer")
>>> df.createOrReplaceTempView("customer")

Query Views

>>> df5 = spark.sql("SELECT * FROM customer")
>>> df5.show()
>>> peopledf2 = spark.sql("SELECT * FROM global_temp.people")
>>> peopledf2.show()

Inspect Data 
 

>>> df.dtypes #Return df column names and data types
>>> df.show() #Display the content of df
>>> df.head() #Return first n rows
>>> df.first() #Return first row
>>> df.take(2) #Return the first n rows
>>> df.schema #Return the schema of df
>>> df.describe().show() #Compute summary statistics
>>> df.columns #Return the columns of df
>>> df.count() #Count the number of rows in df
>>> df.distinct().count() #Count the number of distinct rows in df
>>> df.printSchema() #Print the schema of df
>>> df.explain() #Print the (logical and physical) plans

Output

Data Structures 
 

 >>> rdd1 = df.rdd #Convert df into an RDD
 >>> df.toJSON().first() #Convert df into a RDD of string
 >>> df.toPandas() #Return the contents of df as Pandas DataFrame

Write & Save to Files 

>>> df.select("firstName", "city")\
       .write \
       .save("nameAndCity.parquet")
 >>> df.select("firstName", "age") \
       .write \
       .save("namesAndAges.json",format="json")

Stopping SparkSession 

>>> spark.stop()

Have this Cheat Sheet at your fingertips

Original article source at https://www.datacamp.com

#pyspark #cheatsheet #spark #dataframes #python #bigdata

Uriah  Dietrich

Uriah Dietrich

1618449987

How To Build A Data Science Career In 2021

For this week’s data science career interview, we got in touch with Dr Suman Sanyal, Associate Professor of Computer Science and Engineering at NIIT University. In this interview, Dr Sanyal shares his insights on how universities can contribute to this highly promising sector and what aspirants can do to build a successful data science career.

With industry linkage and technology- and research-driven seamless education, NIIT University has been recognised for addressing the growing demand for data science experts worldwide with its industry-ready courses. The university has recently introduced a B.Tech in Data Science course, which aims to use data sets and models to solve real-world problems. The programme provides industry-academic synergy for students to establish careers in data science, artificial intelligence and machine learning.

“Students with skills that are aligned to new-age technology will be of huge value. The industry today wants young, ambitious students who have the know-how on how to get things done,” Sanyal said.

#careers #data-science-aspirant #data-science-career #data-science-career-interview #data-science-education #data-science-education-market #data-science-jobs #niit-university-data-science

Sid  Schuppe

Sid Schuppe

1618370462

Level Up 7 Data Science Skills Through YouTube

We are all familiar with modern game design, in which champions or heroes are always equipped with certain attributes and specialties. For example, Dota heroes are scored on agility, intelligence, and strength. To excel on the battlefield, a hero needs above-average scores across all attributes while additionally being specialized in at least one.
So what if we think of learning data science as playing a game in which all of us possess multi-dimensional abilities? Playing video games demands constantly sharpening our skills with weapons, training, or magic potions. This resembles how we learn through reading, tutorials, and, of course, the YouTube learning resources mentioned later in this article.
First of all, let’s walk through seven essential skills that guarantee your wins in the data science game.

#data-science-training #learn-data-science #data-science-skills #data-science-resources #data-science


 iOS App Dev

iOS App Dev

1620466520

Your Data Architecture: Simple Best Practices for Your Data Strategy

If you accumulate data on which you base your decision-making as an organization, you should probably think about your data architecture and possible best practices.

If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.

#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition