Clemmie  Will

1604649205

R for Data Analytics: A Beginner’s Guide

If you are thinking of getting into R, this article will give you a starting point: a basic introduction to data analytics using R.

Installing R and RStudio

You can download the setup file for R from “here”. Once R is installed, you will need an IDE to start programming in R. RStudio will do just fine, and you can download a free desktop version from “here”. After downloading and installing both, you are all set to begin your programming journey with R. Now open RStudio and click File, then New File, and lastly R Script.

Let’s begin data analytics with R.

Installing Packages and Importing Libraries

You need not reinvent the wheel every time you build a car, and the same is true for programming. A library is a collection of functions developed to perform certain tasks. So instead of writing tens or hundreds of lines just to perform a simple operation such as finding a square root, a programmer can directly use the function readily available in R’s default library. In R, such collections are distributed as packages, and an installed package is loaded with the library() function.

Since the purpose of this article is just to familiarize you with the basics of R, we will focus on the data wrangling and data visualization aspects of data analytics. I will cover modeling and other high-level concepts in the follow-up articles.

For data wrangling we will be using the following libraries:

  • dplyr: The go-to package for data wrangling. Used for manipulating rows and columns of data sets and even joining separate data sets together.
  • tidyr: Pretty much as the name suggests, used for tidying the data.
  • lubridate: Used to manipulate and re-format dates and times.
  • forcats: Used to handle categorical variables. More on this in the follow-ups.

For data visualization, ggplot2 has pretty much everything you will need.
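To give a feel for what data wrangling with dplyr looks like, here is a minimal sketch. It assumes dplyr is already installed (installation is covered in the next paragraphs), and the movies data frame is made-up toy data, not the IMDB data set used later.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
movies <- data.frame(
  title = c("A", "B", "C", "D"),
  genre = c("Drama", "Comedy", "Drama", "Comedy"),
  rating = c(8.1, 6.5, 7.4, 7.0),
  stringsAsFactors = FALSE
)

# Keep movies rated 7 or higher, then summarise per genre
genre_summary <- movies %>%
  filter(rating >= 7) %>%
  group_by(genre) %>%
  summarise(mean_rating = mean(rating), n = n())

genre_summary
```

Each verb (filter, group_by, summarise) takes a data frame and returns a new one, which is why they chain naturally with the %>% pipe.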

The list of useful libraries is long, but don’t fret too much about memorizing the name of every library you need to work effectively. For this purpose, we have packages like “tidyverse”, which bundles all the aforementioned libraries and many more. So let’s install this package and load it into our program with the lines below:

# Install the package (only needed once per machine)
install.packages("tidyverse")

# Load the core tidyverse packages
library(tidyverse)
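Because install.packages() downloads the package every time it runs, you may prefer to guard the call when a script is run repeatedly. The sketch below is one possible way to do that; ensure_package is a hypothetical helper name, not a function from R or the tidyverse.

```r
# Hypothetical helper, not part of base R or the tidyverse:
# installs a package only when it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE when the package can be loaded, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call never triggers a download
ensure_package("stats")

# For this article you would call: ensure_package("tidyverse")
```

You still need library(tidyverse) afterwards, since the helper only checks that the package is installed and does not attach it.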

Importing the Data set

For analysis, we will be using a data set of the 1000 most popular movies based on IMDB reviews. The data was collected from “Kaggle” and compiled by the data creator promptcloud. The data set was last updated in June 2017, so its accuracy today cannot be ascertained. However, we are using it for demonstration purposes only, so it will serve just fine. After downloading the data set from the above link and placing it in a directory of our choice, we are ready to import the data into our R script.

mydata = read.csv("<path>") #Replace <path> with the path of file

If the file is stored in your working directory, you can refer to it by file name alone. To do that, first set the folder where you saved the file as the working directory:

setwd("<path>")

#Data set will be stored in mydata data frame
mydata = read.csv("IMDB_moviedata.csv") 

In my case, the file is stored as “IMDB_moviedata.csv”. CSV means comma-separated values: in this format, each element of data is separated by a comma, although other characters can also be used as separators. For more detail, you can go through the R documentation using either of the following commands:

help(read.csv) #or
?read.csv
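As a small illustration of a non-comma separator, the sketch below writes a made-up semicolon-separated file to a temporary location and reads it back with the sep argument of read.csv. The file contents are invented for the example.

```r
# Write a small semicolon-separated file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Title;Rating", "Movie A;8.1", "Movie B;7.4"), tmp)

# sep tells read.csv which character separates the fields
demo <- read.csv(tmp, sep = ";")
str(demo)

# Clean up the temporary file
unlink(tmp)
```

Without sep = ";", read.csv would look for commas and read each line as a single column.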

Tidying up the data

Now we have our data set imported, but in most cases we cannot use it directly, as it may not be correctly ordered or it might contain features that are not required for our analysis. Before tidying it, let’s first have a look at our data. You can use any or all of the commands listed below to do so.

head(mydata) #or
summary(mydata) #or
glimpse(mydata)

Below is a snippet of results from the glimpse function.

[Image: output of glimpse(mydata)]

So let’s begin tidying up the data.

# Remove the Description column
mydata <- select(mydata, -Description)

# Assign NA to blank entries in character columns
# (recent dplyr releases reject na_if(x, "") on numeric columns)
mydata <- mydata %>% mutate(across(where(is.character), ~ na_if(.x, "")))
# Remove rows with NAs and store the result as mydata_cleaned
mydata_cleaned <- na.omit(mydata)
# Rename the columns
mydata_cleaned <- mydata_cleaned %>% rename(Runtime_mins = Runtime..Minutes., Revenue_mills = Revenue..Millions.)
# Get the summary of the clean data
summary(mydata_cleaned)
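To see what these cleaning steps do, it can help to run them on a tiny data frame where the result is easy to check by eye. The sketch below uses made-up toy data (only the column name Revenue..Millions. mirrors the real data set) and assumes dplyr 1.0.0 or later for across().

```r
library(dplyr)

# Hypothetical toy data: one blank title and one missing revenue
toy <- data.frame(
  Title = c("Movie A", "", "Movie C", "Movie D"),
  Revenue..Millions. = c(100.5, 20.0, NA, 55.2),
  stringsAsFactors = FALSE
)

cleaned <- toy %>%
  # Turn blank strings into NA (character columns only)
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # Drop every row that contains at least one NA
  na.omit() %>%
  # Give the revenue column a friendlier name
  rename(Revenue_mills = Revenue..Millions.)

cleaned
```

Only the rows for Movie A and Movie D survive, because the other two rows each contain a missing value after the blank title is converted to NA.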

Data Visualization and Analysis

In this section, we will perform some basic analysis of the imported data set. First, let us compare the earnings of the top 10 rated movies. In our data set, the movies are already arranged by rank, so we will start by picking the first 10 movies from the data frame and comparing the revenue they earned.

#selecting top 10 movies
top_ten <- head(mydata_cleaned, 10)

#subsetting revenue variable
var1 <- c("Rank", "Title", "Revenue_mills")
revenue_data <- top_ten[var1]
#viewing subsetted data frame
head(revenue_data)

Below is the data frame that we will be using to carry out this piece of analysis. Note that Rank 8 is missing. This is because rows with missing values were removed during the data cleaning process.
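If you would rather not rely on the file already being sorted by rank, one alternative is to sort explicitly before taking the top rows. The sketch below shows this with dplyr on made-up toy data; the column names match the cleaned data frame, but the values are invented.

```r
library(dplyr)

# Hypothetical toy data, deliberately out of rank order
ranked <- data.frame(
  Rank = c(3, 1, 4, 2),
  Title = c("C", "A", "D", "B"),
  Revenue_mills = c(30, 10, 40, 20),
  stringsAsFactors = FALSE
)

# Sort by rank, keep the columns of interest, take the first two rows
top_two <- ranked %>%
  arrange(Rank) %>%
  select(Rank, Title, Revenue_mills) %>%
  head(2)

top_two
```

Replacing head(2) with head(10) and ranked with mydata_cleaned would give the same kind of subset used in this section.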

[Image: the revenue_data data frame]

#Data visualization

#Layer 1: column plot
a <- ggplot(data = revenue_data, mapping = aes(x = reorder(Title, Rank), y = Revenue_mills)) +
  geom_col(mapping = aes(fill = Title, color = Title), alpha = .7, size = 1.1, show.legend = FALSE) +
  labs(x = "Title", y = "Revenue in millions", title = "Earnings of top 10 movies") +
  theme(
    axis.text.x = element_text(angle = 90, size = 5, vjust = 0.4, hjust = 1),
    plot.title = element_text(size = 15, vjust = 2),
    axis.title.x = element_text(size = 12, vjust = -0.35)
  )
#Layer 2: label to show rank
b <- geom_label(mapping = aes(label = Rank), fill = "red", size = 4, color = "white", hjust = 0.6)
#adding Layer 1 and Layer 2
p1 <- a + b
#printing the graph
p1
#Note: For more help on plots check help(geom_col) and help(geom_label)
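
The layering idea used above can be tried on a much smaller example first. The following is a minimal sketch on made-up toy data, assuming ggplot2 is installed; it builds the column layer and the label layer separately and then combines them with +.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  Title = c("A", "B", "C"),
  Rank = 1:3,
  Revenue_mills = c(300, 150, 220),
  stringsAsFactors = FALSE
)

# Layer 1: columns ordered by rank
base_layer <- ggplot(toy, aes(x = reorder(Title, Rank), y = Revenue_mills)) +
  geom_col()

# Layer 2: a label showing each movie's rank
label_layer <- geom_label(aes(label = Rank))

# Combine the layers; printing p draws the plot
p <- base_layer + label_layer
p
```

Keeping layers in separate variables makes it easy to reuse the same label layer on several plots.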

#data-science #data-analysis #r #developer

iOS App Dev

1620466520

Your Data Architecture: Simple Best Practices for Your Data Strategy

If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.

#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition

CSharp REPL: A Command Line C# REPL with Syntax Highlighting

C# REPL

A cross-platform command line REPL for the rapid experimentation and exploration of C#. It supports intellisense, installing NuGet packages, and referencing local .NET projects and assemblies.

[Image: C# REPL screenshot]

C# REPL provides the following features:

  • Syntax highlighting via ANSI escape sequences
  • Intellisense with fly-out documentation
  • Nuget package installation
  • Reference local assemblies, solutions, and projects
  • Navigate to source via Source Link
  • IL disassembly (both Debug and Release mode)
  • Fast and flicker-free rendering. A "diff" algorithm is used to only render what's changed.

Installation

C# REPL is a .NET 6 global tool, and runs on Windows 10, Mac OS, and Linux. It can be installed via:

dotnet tool install -g csharprepl

If you're running on Mac OS Catalina (10.15) or later, make sure you follow any additional directions printed to the screen. You may need to update your PATH variable in order to use .NET global tools.

After installation is complete, run csharprepl to begin. C# REPL can be updated via dotnet tool update -g csharprepl.

Usage:

Run csharprepl from the command line to begin an interactive session. The default colorscheme uses the color palette defined by your terminal, but these colors can be changed using a theme.json file provided as a command line argument.

Evaluating Code

Type some C# into the prompt and press Enter to run it. The result, if any, will be printed:

> Console.WriteLine("Hello World")
Hello World

> DateTime.Now.AddDays(8)
[6/7/2021 5:13:00 PM]

To evaluate multiple lines of code, use Shift+Enter to insert a newline:

> var x = 5;
  var y = 8;
  x * y
40

Additionally, if the statement is not a "complete statement" a newline will automatically be inserted when Enter is pressed. For example, in the below code, the first line is not a syntactically complete statement, so when we press enter we'll go down to a new line:

> if (x == 5)
  | // caret position, after we press Enter on Line 1

Finally, pressing Ctrl+Enter will show a "detailed view" of the result. For example, for the DateTime.Now expression below, on the first line we pressed Enter, and on the second line we pressed Ctrl+Enter to view more detailed output:

> DateTime.Now // Pressing Enter shows a reasonable representation
[5/30/2021 5:13:00 PM]

> DateTime.Now // Pressing Ctrl+Enter shows a detailed representation
[5/30/2021 5:13:00 PM] {
  Date: [5/30/2021 12:00:00 AM],
  Day: 30,
  DayOfWeek: Sunday,
  DayOfYear: 150,
  Hour: 17,
  InternalKind: 9223372036854775808,
  InternalTicks: 637579915804530992,
  Kind: Local,
  Millisecond: 453,
  Minute: 13,
  Month: 5,
  Second: 0,
  Ticks: 637579915804530992,
  TimeOfDay: [17:13:00.4530992],
  Year: 2021,
  _dateData: 9860951952659306800
}

A note on semicolons: C# expressions do not require semicolons, but statements do. If a statement is missing a required semicolon, a newline will be added instead of trying to run the syntactically incomplete statement; simply type the semicolon to complete the statement.

> var now = DateTime.Now; // assignment statement, semicolon required

> DateTime.Now.AddDays(8) // expression, we don't need a semicolon
[6/7/2021 5:03:05 PM]

Keyboard Shortcuts

  • Basic Usage
    • Ctrl+C - Cancel current line
    • Ctrl+L - Clear screen
    • Enter - Evaluate the current line if it's a syntactically complete statement; otherwise add a newline
    • Ctrl+Enter - Evaluate the current line, and return a more detailed representation of the result
    • Shift+Enter - Insert a new line (this does not currently work on Linux or Mac OS; hopefully this will work in .NET 7)
    • Ctrl+Shift+C - Copy current line to clipboard
    • Ctrl+V, Shift+Insert, and Ctrl+Shift+V - Paste text to prompt. Automatically trims leading indent
  • Code Actions
    • F1 - Opens the MSDN documentation for the class/method under the caret
    • F9 - Shows the IL (intermediate language) for the current statement in Debug mode.
    • Ctrl+F9 - Shows the IL for the current statement with Release mode optimizations.
    • F12 - Opens the source code in the browser for the class/method under the caret, if the assembly supports Source Link.
  • Autocompletion
    • Ctrl+Space - Open autocomplete menu. If there's a single option, pressing Ctrl+Space again will select the option
    • Enter, Right Arrow, Tab - Select active autocompletion option
    • Escape - Close autocomplete menu
  • Text Navigation
    • Home and End - Navigate to beginning of a single line and end of a single line, respectively
    • Ctrl+Home and Ctrl+End - Navigate to beginning of line and end across multiple lines in a multiline prompt, respectively
    • Arrows - Navigate characters within text
    • Ctrl+Arrows - Navigate words within text
    • Ctrl+Backspace - Delete previous word
    • Ctrl+Delete - Delete next word

Adding References

Use the #r command to add assembly or nuget references.

  • For assembly references, run #r "AssemblyName" or #r "path/to/assembly.dll"
  • For project references, run #r "path/to/project.csproj". Solution files (.sln) can also be referenced.
  • For nuget references, run #r "nuget: PackageName" to install the latest version of a package, or #r "nuget: PackageName, 13.0.5" to install a specific version (13.0.5 in this case).

[Image: Installing nuget packages]

To run ASP.NET applications inside the REPL, start the csharprepl application with the --framework parameter, specifying the Microsoft.AspNetCore.App shared framework. Then, use the above #r command to reference the application DLL. See the Command Line Configuration section below for more details.

csharprepl --framework Microsoft.AspNetCore.App

Command Line Configuration

The C# REPL supports multiple configuration flags to control startup, behavior, and appearance:

csharprepl [OPTIONS] [response-file.rsp] [script-file.csx] [-- <additional-arguments>]

Supported options are:

  • OPTIONS:
    • -r <dll> or --reference <dll>: Reference an assembly, project file, or nuget package. Can be specified multiple times. Uses the same syntax as #r statements inside the REPL. For example, csharprepl -r "nuget:Newtonsoft.Json" "path/to/myproj.csproj"
      • When an assembly or project is referenced, assemblies in the containing directory will be added to the assembly search path. This means that you don't need to manually add references to all of your assembly's dependencies (e.g. other references and nuget packages). Referencing the main entry assembly is enough.
    • -u <namespace> or --using <namespace>: Add a using statement. Can be specified multiple times.
    • -f <framework> or --framework <framework>: Reference a shared framework. The available shared frameworks depend on the local .NET installation, and can be useful when running an ASP.NET application from the REPL. Example frameworks are:
      • Microsoft.NETCore.App (default)
      • Microsoft.AspNetCore.All
      • Microsoft.AspNetCore.App
      • Microsoft.WindowsDesktop.App
    • -t <theme.json> or --theme <theme.json>: Read a theme file for syntax highlighting. This theme file associates C# syntax classifications with colors. The color values can be full RGB, or ANSI color names (defined in your terminal's theme). The NO_COLOR standard is supported.
    • --trace: Produce a trace file in the current directory that logs CSharpRepl internals. Useful for CSharpRepl bug reports.
    • -v or --version: Show version number and exit.
    • -h or --help: Show help and exit.
  • response-file.rsp: A filepath of an .rsp file, containing any of the above command line options.
  • script-file.csx: A filepath of a .csx file, containing lines of C# to evaluate before starting the REPL. Arguments to this script can be passed as <additional-arguments>, after a double hyphen (--), and will be available in a global args variable.

If you have dotnet-suggest enabled, all options can be tab-completed, including values provided to --framework and .NET namespaces provided to --using.

Integrating with other software

C# REPL is a standalone software application, but it can be useful to integrate it with other developer tools:

Windows Terminal

To add C# REPL as a menu entry in Windows Terminal, add the following profile to Windows Terminal's settings.json configuration file (under the JSON property profiles.list):

{
    "name": "C# REPL",
    "commandline": "csharprepl"
},

To get the exact colors shown in the screenshots in this README, install the Windows Terminal Dracula theme.

Visual Studio Code

To use the C# REPL with Visual Studio Code, simply run the csharprepl command in the Visual Studio Code terminal. To send commands to the REPL, use the built-in Terminal: Run Selected Text In Active Terminal command from the Command Palette (workbench.action.terminal.runSelectedText).

[Image: Visual Studio Code screenshot]

Windows OS

To add the C# REPL to the Windows Start Menu for quick access, you can run the following PowerShell command, which will start C# REPL in Windows Terminal:

$shell = New-Object -ComObject WScript.Shell
$shortcut = $shell.CreateShortcut("$env:appdata\Microsoft\Windows\Start Menu\Programs\csharprepl.lnk")
$shortcut.TargetPath = "wt.exe"
$shortcut.Arguments = "-w 0 nt csharprepl.exe"
$shortcut.Save()

You may also wish to add a shorter alias for C# REPL, which can be done by creating a .cmd file somewhere on your path. For example, put the following contents in C:\Users\username\.dotnet\tools\csr.cmd:

wt -w 0 nt csharprepl

This will allow you to launch C# REPL by running csr from anywhere that accepts Windows commands, like the Windows Run dialog.

Comparison with other REPLs

This project is far from being the first REPL for C#. Here are some other projects; if this project doesn't suit you, another one might!

Visual Studio's C# Interactive pane is full-featured (it has syntax highlighting and intellisense) and is part of Visual Studio. This deep integration with Visual Studio is both a benefit from a workflow perspective, and a drawback as it's not cross-platform. As far as I know, the C# Interactive pane does not support NuGet packages or navigating to documentation/source code. Subjectively, it does not follow typical command line keybindings, so can feel a bit foreign.

csi.exe ships with C# and is a command line REPL. It's great because it's a cross platform REPL that comes out of the box, but it doesn't support syntax highlighting or autocompletion.

dotnet script allows you to run C# scripts from the command line. It has a REPL built-in, but the predominant focus seems to be as a script runner. It's a great tool, though, and has a strong community following.

dotnet interactive is a tool from Microsoft that creates a Jupyter notebook for C#, runnable through Visual Studio Code. It also provides a general framework useful for running REPLs.

Download Details:
Author: waf
Source Code: https://github.com/waf/CSharpRepl
License: MPL-2.0 License

#dotnet  #aspdotnet  #csharp 

Gerhard  Brink

1620629020

Getting Started With Data Lakes

Frameworks for Efficient Enterprise Analytics

The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.

This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.

Introduction

As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).


This is a preview of the Getting Started With Data Lakes Refcard. To read the entire Refcard, please download the PDF from the link above.

#big data #data analytics #data analysis #business analytics #data warehouse #data storage #data lake #data lake architecture #data lake governance #data lake management

Ian  Robinson

1624399200

Top 10 Big Data Tools for Data Management and Analytics

Introduction to Big Data

What exactly is Big Data? Big Data refers to large and complex data sets, which can be both structured and unstructured. The concept also encompasses the infrastructure, technologies, and Big Data tools created to manage this large amount of information.

Big Data analytics tools play a vital role in achieving high performance. Further, various Big Data tools and frameworks are responsible for retrieving meaningful information from huge sets of data.

List of Big Data Tools & Frameworks

The most important and popular open source Big Data analytics tools used in 2020 fall into the following categories:

  1. Big Data Framework
  2. Data Storage Tools
  3. Data Visualization Tools
  4. Big Data Processing Tools
  5. Data Preprocessing Tools
  6. Data Wrangling Tools
  7. Big Data Testing Tools
  8. Data Governance Tools
  9. Security Management Tools
  10. Real-Time Data Streaming Tools

#big data engineering #top 10 big data tools for data management and analytics #big data tools for data management and analytics #tools for data management #analytics #top big data tools for data management and analytics

akshay L

1571812278

Data Analytics For Beginners

In this data analytics for beginners video, you will see an introduction to data analytics: what data analytics is, who a data analyst is, and the role and responsibilities of a data analyst. There is also a use case in data analytics to give you hands-on knowledge.

Why is Data Analytics important?

Data analysis is an internal organisational function performed by Data Analysts that is more than merely presenting numbers and figures to management. It requires a much more in-depth approach to recording, analyzing and dissecting data, and presenting the findings in an easily-digestible format.

Why should you opt for a Data Analytics career?

If you want to fast-track your career, you should strongly consider Data Analytics. It is one of the fastest-growing technologies, there is a huge demand for Data Analysts, and the salaries in Data Analytics are fantastic. There is a huge growth opportunity in this domain as well. Hence this Intellipaat Data Analytics tutorial is your stepping stone to a successful career!

#Data Analytics For Beginners #Introduction To Data Analytics #Data Analytics Training #Intellipaat #Data Analytics