Jamison  Fisher

1646998260

Pyreadr: A Python Package to Read and Write R RData & Pandas

pyreadr

A Python package to read and write R RData and Rds files into/from pandas data frames. It does not need R or any other external dependency to be installed.

It can read mainly R data frames and tibbles. Also supports vectors, matrices, arrays and tables. R lists and R S4 objects (such as those from Bioconductor) are not supported. Please read the Known limitations section and the section on what objects can be read for more information. 

This package is based on the librdata C library by Evan Miller and a modified version of the cython wrapper around librdata jamovi-readstat by the Jamovi team.

Detailed documentation on all available methods is in the Module documentation

If you would like to read SPSS, SAS or STATA files into Python in an easy way, take a look at pyreadstat, a wrapper around the C library ReadStat.

Moving from R to Python and fighting against indentation issues? Missing curly braces? Missing the <- operator for assignment? Then try PytwisteR! Python with a twist of R! (note: it works, but it's only a joke)

Dependencies

The package depends on pandas, which you normally have installed if you got Anaconda (highly recommended.) If creating a new conda or virtual environment or if you don't have it in your base installation, pandas should get installed automatically.

If you are reading 3D arrays, you will need to install xarray manually. This is not installed automatically as most users won't need it.

In order to compile from source, you will need a C compiler (see installation) and cython (version >= 0.28).

librdata also depends on zlib, bzip2 and lzma. These libraries were reported to be missing on Lubuntu and on Docker base Ubuntu images. If you face this problem, installing the libraries solves it.

Installation

Using pip

Probably the easiest way: from your conda, virtualenv or just base installation do:

pip install pyreadr

If you are running on a machine without admin rights, and you want to install against your base installation you can do:

pip install pyreadr --user

We offer pre-compiled wheels for Windows, Linux and macOS.

Using conda

The package is also available in conda-forge for Windows, macOS and Linux (64 bit).

In order to install:

conda install -c conda-forge pyreadr 

From the latest sources

Download or clone the repo, open a command window and type:

python3 setup.py install

If you don't have admin privileges to the machine do:

python3 setup.py install --user

You can also install from the github repo directly (without cloning). Use the flag --user if necessary.

pip install git+https://github.com/ofajardo/pyreadr.git

You need a working C compiler and cython. You may also need to install bzlib (on ubuntu install libbz2-dev).

In order to run the tests:

python tests/test_basic.py

You can also install and test in place with:

python setup.py build_ext --inplace
python tests/test_basic.py --inplace

Usage

Basic Usage: reading files

Pass the path to a RData or Rds file to the function read_r. It will return a dictionary with object names as keys and pandas data frames as values.

For example, in order to read a RData file:

import pyreadr

result = pyreadr.read_r('test_data/basic/two.RData')

# done! let's see what we got
print(result.keys()) # let's check what objects we got
df1 = result["df1"] # extract the pandas data frame for object df1

Reading an Rds file is equally simple. Rds files contain a single object, which you can access with the key None:

import pyreadr

result = pyreadr.read_r('test_data/basic/one.Rds')

# done! let's see what we got
print(result.keys()) # let's check what objects we got: there is only None
df1 = result[None] # extract the pandas data frame for the only object available
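
Because the key differs between the two formats (an object name for RData, None for Rds), a small helper can hide the difference when a file is expected to hold exactly one object. The sketch below is illustrative only: single_object is a hypothetical name, not part of pyreadr, and it works on any dictionary shaped like the one read_r returns. Plain strings stand in for the data frames.

```python
def single_object(result):
    """Return the only value in a read_r-style dictionary.

    Works for Rds files (key None) and for RData files holding one object.
    """
    if len(result) != 1:
        raise ValueError(f"expected exactly one object, found {len(result)}")
    # next(iter(...)) picks the single value without needing to know its key
    return next(iter(result.values()))


# stand-ins for what read_r would return; real values would be data frames
rds_like = {None: "frame from one.Rds"}
rdata_like = {"df1": "frame from an RData file"}
print(single_object(rds_like))
print(single_object(rdata_like))
```

With pyreadr installed, the same helper could be applied directly to the return value of read_r.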

Below is a list of all available functions. You can also check the Module documentation.

Function in this package | Purpose
read_r | reads RData and Rds files
list_objects | list objects and column names contained in RData or Rds file
download_file | download file from internet
write_rdata | writes RData files
write_rds | writes Rds files

Basic Usage: writing files

Pyreadr allows you to write one single pandas data frame into a single R dataframe and store it into a RData or Rds file. Other python or R object types are not supported. Writing more than one object is not supported.

import pyreadr
import pandas as pd

# prepare a pandas dataframe
df = pd.DataFrame([["a",1],["b",2]], columns=["A", "B"])

# let's write into RData
# df_name is the name for the dataframe in R, by default dataset
pyreadr.write_rdata("test.RData", df, df_name="dataset")

# now let's write a Rds
pyreadr.write_rds("test.Rds", df)

# done!

Now you can check the result in R:

load("test.RData")
print(dataset)

dataset2 <- readRDS("test.Rds")
print(dataset2)

By default the resulting files are uncompressed. You can activate gzip compression by passing the option compress="gzip", which is useful for big files.

import pyreadr
import pandas as pd

# prepare a pandas dataframe
df = pd.DataFrame([["a",1],["b",2]], columns=["A", "B"])

# write a compressed RData file
pyreadr.write_rdata("test.RData", df, df_name="dataset", compress="gzip")

# write a compressed Rds file
pyreadr.write_rds("test.Rds", df, compress="gzip")
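
To confirm whether a file ended up compressed, one option is to look at its first two bytes: gzip streams always start with 0x1f 0x8b. The helper below is a sketch using only the standard library. is_gzip_file is a hypothetical name, and the demo files are stand-ins created with gzip.open rather than real pyreadr output.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


# demo with stand-in files (not real RData content)
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "plain.bin")
gz_path = os.path.join(tmp_dir, "compressed.bin")
with open(plain_path, "wb") as handle:
    handle.write(b"uncompressed payload")
with gzip.open(gz_path, "wb") as handle:
    handle.write(b"compressed payload")

print(is_gzip_file(plain_path))  # False
print(is_gzip_file(gz_path))     # True
```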

Reading files from internet

Librdata, the C backend of pyreadr, needs a file on disk, and only a string with the path can be passed as an argument. You therefore cannot pass a URL to pyreadr.read_r.

To work around this limitation, pyreadr provides a function, download_file, which as its name suggests downloads a file from a URL to disk:

import pyreadr

url = "https://github.com/hadley/nycflights13/blob/master/data/airlines.rda?raw=true"
dst_path = "/some/path/on/disk/airlines.rda"
dst_path_again = pyreadr.download_file(url, dst_path)
res = pyreadr.read_r(dst_path)

As you see download_file returns the path where the file was written, therefore you can pass it to pyreadr.read_r directly:

import pyreadr

url = "https://github.com/hadley/nycflights13/blob/master/data/airlines.rda?raw=true"
dst_path = "/some/path/on/disk/airlines.rda"
# download_file returns dst_path, so it is the only argument read_r needs
res = pyreadr.read_r(pyreadr.download_file(url, dst_path))
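
The destination path in these examples is a placeholder. One way to get a real, writable location is to derive the file name from the URL and place it in a temporary directory. The sketch below uses only the standard library. temp_destination is a hypothetical helper, not part of pyreadr, and the final download call is left as a comment because it needs pyreadr and network access.

```python
import os
import tempfile
from urllib.parse import urlparse


def temp_destination(url):
    """Build a path inside a fresh temporary directory, named after the URL's file."""
    # urlparse(...).path drops the query string such as "?raw=true"
    file_name = os.path.basename(urlparse(url).path)
    return os.path.join(tempfile.mkdtemp(), file_name)


url = "https://github.com/hadley/nycflights13/blob/master/data/airlines.rda?raw=true"
dst_path = temp_destination(url)
print(dst_path)

# With pyreadr installed and network access, the path can then be used like this:
# res = pyreadr.read_r(pyreadr.download_file(url, dst_path))
```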

Reading selected objects

You can use the argument use_objects of the function read_r to specify which objects should be read.

import pyreadr

result = pyreadr.read_r('test_data/basic/two.RData', use_objects=["df1"])

# done! let's see what we got
print(result.keys()) # let's check what objects we got, now only df1 is listed
df1 = result["df1"] # extract the pandas data frame for object df1

List objects and column names

The function list_objects gives a dictionary with object names contained in the RData or Rds file as keys and a list of column names as values. It is not always possible to retrieve column names without reading the whole file; in those cases you would get None instead of a column name.


import pyreadr

object_list = pyreadr.list_objects('test_data/basic/two.RData')

# done! let's see what we got
print(object_list) # let's check what objects we got and what columns those have

Reading timestamps and timezones

R Date objects are read as datetime.date objects.

R datetime objects (POSIXct and POSIXlt) are internally stored as UTC timestamps, and may have additional timezone information if the user set it explicitly. If no timezone information was set by the user R uses the local timezone for display.

librdata cannot retrieve that timezone information, therefore pyreadr displays UTC time by default, which will not match the display in R. You can explicitly set a timezone (your local timezone, for example) with the timezone argument of read_r:

import pyreadr

result = pyreadr.read_r('test_data/basic/two.RData', timezone='CET')

If you would like to just use your local timezone as R does, you can get it with tzlocal (you need to install it first with pip) and pass the information to read_r:


import tzlocal
import pyreadr

# str() yields the zone name on both older (pytz-based) and newer tzlocal versions
my_timezone = str(tzlocal.get_localzone())
result = pyreadr.read_r('test_data/basic/two.RData', timezone=my_timezone)

If you have control over the data in R, a good option to avoid all of this is to transform the POSIX object to character, then transform it to a datetime in python.
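
A sketch of the Python side of that approach, assuming the R column was exported as text in the form "YYYY-MM-DD HH:MM:SS" and that the strings represent local time in a known zone. The column name, the sample values and the zone "Europe/Zurich" are made up for illustration.

```python
import pandas as pd

# stand-in for a data frame read with read_r, where the timestamps
# arrived as plain character strings
df = pd.DataFrame({"when": ["2021-03-01 08:30:00", "2021-07-15 17:45:10"]})

# parse the strings into naive datetimes using an explicit format
df["when"] = pd.to_datetime(df["when"], format="%Y-%m-%d %H:%M:%S")

# attach the zone the strings were written in (assumed here)
df["when_local"] = df["when"].dt.tz_localize("Europe/Zurich")

# the same instants expressed in UTC
df["when_utc"] = df["when_local"].dt.tz_convert("UTC")
print(df)
```

Keeping the zone explicit in Python avoids depending on what librdata can or cannot recover from the file.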

When writing this kind of object, pyreadr transforms it to character. The result can easily be transformed back to POSIX with as.POSIXct/lt (see later).

What objects can be read and written

Data frames composed of character, numeric (double), integer, timestamp (POSIXct and POSIXlt), date, logical atomic vectors. Factors are also supported.

Tibbles are also supported.

Atomic vectors as described before can also be directly read and are translated to a pandas data frame with one column.

Matrices, arrays and tables are also read and translated to pandas data frames (because those objects in R can be named, and plain numpy arrays do not support dimension names). The only exception is 3D arrays, which are translated to a xarray DataArray (as pandas does not support more than 2 dimensions). This is also the only time that an object different from a pandas dataframe is returned by read_r.

For 3D arrays, consider that python prints these in a different way as R does, but still you are looking at the same array (see for example here for an explanation.)
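
To make the difference concrete, here is a small NumPy-only sketch (no R or xarray needed). It builds the equivalent of R's array(1:24, dim = c(2, 3, 4)), which R fills in column-major order, and shows that the block R prints under ", , 1" is the slice along the last axis, whereas Python's default printout groups by the first axis.

```python
import numpy as np

# R fills arrays in column-major order; order="F" reproduces that layout
arr = np.arange(1, 25).reshape((2, 3, 4), order="F")

# what R prints as ", , 1": fix the last index, show the 2 x 3 slice
print(arr[:, :, 0])

# what Python's print(arr) shows first: fix the first index, show a 3 x 4 slice
print(arr[0])
```

Both printouts come from the same array; only the axis used for grouping differs.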

Only single pandas data frames can be written into R data frames.

Lists and S4 objects (such as those coming from Bioconductor) are not supported. Please read the Known limitations section for more information.

More on writing files

For converting python/numpy types to R types the following rules are followed:

Python Type | R Type
np.int32 or lower | integer
np.int64, np.float | numeric
str | character
bool | logical
datetime, date | character
category | depends on the original dtype
any other object | character
column all missing | logical
column with mixed types | character

  • datetime and date objects are translated to character to avoid problems with timezones. These characters can be easily translated back to POSIXct/lt in R using as.POSIXct/lt. The format of the datetimes/dates is prepared for this but can be controlled with the arguments dateformat and datetimeformat for write_rdata and write_rds. Those arguments take python standard formatting strings.
  • Pandas categories are NOT translated to R factors. Instead the original data type of the category is preserved and transformed according to the rules. This is because R factors are integers and their levels are always strings, whereas in pandas both the values and the levels of a category can be of any type, so it is not always adequate to coerce everything to the integer/character system. On the other hand, pandas category level information is lost in the process.
  • Any other object is transformed to a character using the str representation of the object.
  • Columns with mixed types are translated to character. This does not apply to columns containing np.nan, where the missing values are correctly translated.
  • R integers are 32 bit. Therefore Python 64-bit integers have to be promoted to numeric in order to fit.
  • A pandas column containing only missing values is transformed to logical, following R's behavior.
  • librdata writes numeric missing values as NaN instead of NA. pandas only has np.nan to represent both NaN and missing values, so these will always be written as NaN in R.
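
The rules above can be sketched as a small lookup function. This is not pyreadr's implementation, only a hypothetical helper (expected_r_type) that mirrors the table so you can preview which R type each column is likely to get. It assumes signed integer dtypes and does not inspect individual values of object columns beyond checking for booleans.

```python
import numpy as np
import pandas as pd
from pandas.api import types as pdt


def expected_r_type(col):
    """Guess the R type of a pandas column, following the table above."""
    if col.isna().all():
        return "logical"  # column with only missing values
    dtype = col.dtype
    if isinstance(dtype, pd.CategoricalDtype):
        # categories are not written as factors: the dtype of the levels decides
        return expected_r_type(pd.Series(col.cat.categories))
    if pdt.is_bool_dtype(dtype):
        return "logical"
    if pdt.is_integer_dtype(dtype):
        # R integers are 32 bit, so wider integers become numeric
        return "integer" if dtype.itemsize <= 4 else "numeric"
    if pdt.is_float_dtype(dtype):
        return "numeric"
    if dtype == object and pdt.infer_dtype(col, skipna=True) == "boolean":
        return "logical"
    # str, datetime, date, mixed types and any other object
    return "character"


df = pd.DataFrame({
    "small": np.array([1, 2], dtype="int32"),
    "big": np.array([1, 2], dtype="int64"),
    "text": ["a", "b"],
    "flag": [True, False],
    "empty": [np.nan, np.nan],
    "mixed": ["a", 1],
})
print({name: expected_r_type(df[name]) for name in df.columns})
```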

Known limitations

  • POSIXct and POSIXlt objects in R are stored internally as UTC timestamps and may additionally carry time zone information. librdata does not return time zone information and therefore the display of the timestamps in R and in pandas may differ.
  • Librdata reads arrays with a maximum of 3 dimensions. If more dimensions are present you will get an error. Please submit an issue if this is the case.
  • Lists are not read.
  • S4 Objects and probably other kind of objects, including those that depend on non base R packages (Bioconductor for example) cannot be read. The error code in this case is as follows:
"pyreadr.custom_errors.LibrdataError: The file contains an unrecognized object"
  • Data frames with special values like arrays, matrices and other data frames are not supported.
  • librdata first decompresses the file in memory and then extracts the data. That means you need more free RAM than the decompressed file occupies in memory. RData and Rds files are highly compressed: they can easily occupy 40 or more times as much space in memory as on disk. Take this into account in case you get an "Unable to allocate memory" error.
  • When writing, numeric missing values are translated to NaN instead of NA.
  • Writing rownames is currently not supported.
  • Writing is supported only for a single pandas data frame to a single R data frame. Other data types are not supported. Multiple data frames for rdata files are not supported.
  • RData and Rds files produced by R are (by default) compressed. Files produced by pyreadr are not compressed by default and are therefore pretty bulky in comparison. You can pass the option compress="gzip" to write_rds or write_rdata in order to activate gzip compression.
  • Writing with pyreadr is a relatively slow operation compared to doing it in R.
  • Cannot read RData or rds files in encodings other than utf-8.

Solutions to some of these limitations have been proposed in the upstream librdata issues (points 1-4 are addressed by issue 12, point 5 by issue 16 and point 7 by issue 17). However there is no guarantee that these changes will be made and there are no timelines either. If you think it would be nice if these issues are solved, please express your support in the librdata issues.
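
For the memory limitation listed above, a rough pre-check can be derived from the file size on disk. The sketch below simply multiplies the size by the rule-of-thumb factor of 40 mentioned in the list. estimated_memory_bytes is a hypothetical helper, the factor is only an order of magnitude, and the demo file is a stand-in rather than a real RData file.

```python
import os
import tempfile


def estimated_memory_bytes(path, expansion_ratio=40):
    """Rough guess of the RAM needed to read a compressed R data file.

    expansion_ratio=40 follows the rule of thumb in the list above; real files vary.
    """
    return os.path.getsize(path) * expansion_ratio


# demo with a stand-in file of 1000 bytes
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"\0" * 1000)
    demo_path = handle.name

needed = estimated_memory_bytes(demo_path)
print(f"about {needed / 1024:.1f} KiB may be needed")
```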

Contributing

Contributions are welcome! Please check the document CONTRIBUTING.md for more details.

Change Log

A log with the changes for each version can be found here

Download Details:
Author: ofajardo
Source Code: https://github.com/ofajardo/pyreadr
License: View license

#pandas #python 

Ray  Patel

1619518440

Top 30 Python Tips and Tricks for Beginners

Welcome to my blog. In this article, you are going to learn the top 30 Python tips and tricks.

1) swap two numbers.

2) Reversing a string in Python.

3) Create a single string from all the elements in list.

4) Chaining Of Comparison Operators.

5) Print The File Path Of Imported Modules.

6) Return Multiple Values From Functions.

7) Find The Most Frequent Value In A List.

8) Check The Memory Usage Of An Object.
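
The article's own solutions are not shown in this excerpt, so the following is only one common way to do each of the eight tips listed above, using the standard library. All variable names and sample values are made up for illustration.

```python
import os
import sys
from collections import Counter

# 1) swap two numbers with tuple unpacking
a, b = 1, 2
a, b = b, a

# 2) reverse a string with a slice
reversed_text = "python"[::-1]

# 3) join list elements into one string
sentence = " ".join(["tips", "and", "tricks"])

# 4) chain comparison operators
n = 5
in_range = 1 < n < 10

# 5) print the file path of an imported module
print(os.__file__)


# 6) return multiple values from a function (packed as a tuple)
def min_and_max(values):
    return min(values), max(values)


lowest, highest = min_and_max([4, 9, 2])

# 7) most frequent value in a list
most_common = Counter([1, 2, 2, 3, 2]).most_common(1)[0][0]

# 8) memory usage of an object in bytes (the value depends on the interpreter)
size = sys.getsizeof([1, 2, 3])

print(a, b, reversed_text, sentence, in_range, lowest, highest, most_common, size)
```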

#python #python hacks tricks #python learning tips #python programming tricks #python tips #python tips and tricks #python tips and tricks advanced #python tips and tricks for beginners #python tips tricks and techniques #python tutorial #tips and tricks in python #tips to learn python #top 30 python tips and tricks for beginners

CSharp REPL: A Command Line C# REPL with Syntax Highlighting

C# REPL

A cross-platform command line REPL for the rapid experimentation and exploration of C#. It supports intellisense, installing NuGet packages, and referencing local .NET projects and assemblies.

C# REPL provides the following features:

  • Syntax highlighting via ANSI escape sequences
  • Intellisense with fly-out documentation
  • Nuget package installation
  • Reference local assemblies, solutions, and projects
  • Navigate to source via Source Link
  • IL disassembly (both Debug and Release mode)
  • Fast and flicker-free rendering. A "diff" algorithm is used to only render what's changed.

Installation

C# REPL is a .NET 6 global tool, and runs on Windows 10, Mac OS, and Linux. It can be installed via:

dotnet tool install -g csharprepl

If you're running on Mac OS Catalina (10.15) or later, make sure you follow any additional directions printed to the screen. You may need to update your PATH variable in order to use .NET global tools.

After installation is complete, run csharprepl to begin. C# REPL can be updated via dotnet tool update -g csharprepl.

Usage:

Run csharprepl from the command line to begin an interactive session. The default colorscheme uses the color palette defined by your terminal, but these colors can be changed using a theme.json file provided as a command line argument.

Evaluating Code

Type some C# into the prompt and press Enter to run it. The result, if any, will be printed:

> Console.WriteLine("Hello World")
Hello World

> DateTime.Now.AddDays(8)
[6/7/2021 5:13:00 PM]

To evaluate multiple lines of code, use Shift+Enter to insert a newline:

> var x = 5;
  var y = 8;
  x * y
40

Additionally, if the statement is not a "complete statement" a newline will automatically be inserted when Enter is pressed. For example, in the below code, the first line is not a syntactically complete statement, so when we press enter we'll go down to a new line:

> if (x == 5)
  | // caret position, after we press Enter on Line 1

Finally, pressing Ctrl+Enter will show a "detailed view" of the result. For example, for the DateTime.Now expression below, on the first line we pressed Enter, and on the second line we pressed Ctrl+Enter to view more detailed output:

> DateTime.Now // Pressing Enter shows a reasonable representation
[5/30/2021 5:13:00 PM]

> DateTime.Now // Pressing Ctrl+Enter shows a detailed representation
[5/30/2021 5:13:00 PM] {
  Date: [5/30/2021 12:00:00 AM],
  Day: 30,
  DayOfWeek: Sunday,
  DayOfYear: 150,
  Hour: 17,
  InternalKind: 9223372036854775808,
  InternalTicks: 637579915804530992,
  Kind: Local,
  Millisecond: 453,
  Minute: 13,
  Month: 5,
  Second: 0,
  Ticks: 637579915804530992,
  TimeOfDay: [17:13:00.4530992],
  Year: 2021,
  _dateData: 9860951952659306800
}

A note on semicolons: C# expressions do not require semicolons, but statements do. If a statement is missing a required semicolon, a newline will be added instead of trying to run the syntactically incomplete statement; simply type the semicolon to complete the statement.

> var now = DateTime.Now; // assignment statement, semicolon required

> DateTime.Now.AddDays(8) // expression, we don't need a semicolon
[6/7/2021 5:03:05 PM]

Keyboard Shortcuts

  • Basic Usage
    • Ctrl+C - Cancel current line
    • Ctrl+L - Clear screen
    • Enter - Evaluate the current line if it's a syntactically complete statement; otherwise add a newline
    • Control+Enter - Evaluate the current line, and return a more detailed representation of the result
    • Shift+Enter - Insert a new line (this does not currently work on Linux or Mac OS; hopefully this will work in .NET 7)
    • Ctrl+Shift+C - Copy current line to clipboard
    • Ctrl+V, Shift+Insert, and Ctrl+Shift+V - Paste text to prompt. Automatically trims leading indent
  • Code Actions
    • F1 - Opens the MSDN documentation for the class/method under the caret (example)
    • F9 - Shows the IL (intermediate language) for the current statement in Debug mode.
    • Ctrl+F9 - Shows the IL for the current statement with Release mode optimizations.
    • F12 - Opens the source code in the browser for the class/method under the caret, if the assembly supports Source Link.
  • Autocompletion
    • Ctrl+Space - Open autocomplete menu. If there's a single option, pressing Ctrl+Space again will select the option
    • Enter, Right Arrow, Tab - Select active autocompletion option
    • Escape - closes autocomplete menu
  • Text Navigation
    • Home and End - Navigate to beginning of a single line and end of a single line, respectively
    • Ctrl+Home and Ctrl+End - Navigate to beginning of line and end across multiple lines in a multiline prompt, respectively
    • Arrows - Navigate characters within text
    • Ctrl+Arrows - Navigate words within text
    • Ctrl+Backspace - Delete previous word
    • Ctrl+Delete - Delete next word

Adding References

Use the #r command to add assembly or nuget references.

  • For assembly references, run #r "AssemblyName" or #r "path/to/assembly.dll"
  • For project references, run #r "path/to/project.csproj". Solution files (.sln) can also be referenced.
  • For nuget references, run #r "nuget: PackageName" to install the latest version of a package, or #r "nuget: PackageName, 13.0.5" to install a specific version (13.0.5 in this case).

Installing nuget packages

To run ASP.NET applications inside the REPL, start the csharprepl application with the --framework parameter, specifying the Microsoft.AspNetCore.App shared framework. Then, use the above #r command to reference the application DLL. See the Command Line Configuration section below for more details.

csharprepl --framework  Microsoft.AspNetCore.App

Command Line Configuration

The C# REPL supports multiple configuration flags to control startup, behavior, and appearance:

csharprepl [OPTIONS] [response-file.rsp] [script-file.csx] [-- <additional-arguments>]

Supported options are:

  • OPTIONS:
    • -r <dll> or --reference <dll>: Reference an assembly, project file, or nuget package. Can be specified multiple times. Uses the same syntax as #r statements inside the REPL. For example, csharprepl -r "nuget:Newtonsoft.Json" "path/to/myproj.csproj"
      • When an assembly or project is referenced, assemblies in the containing directory will be added to the assembly search path. This means that you don't need to manually add references to all of your assembly's dependencies (e.g. other references and nuget packages). Referencing the main entry assembly is enough.
    • -u <namespace> or --using <namespace>: Add a using statement. Can be specified multiple times.
    • -f <framework> or --framework <framework>: Reference a shared framework. The available shared frameworks depend on the local .NET installation, and can be useful when running an ASP.NET application from the REPL. Example frameworks are:
      • Microsoft.NETCore.App (default)
      • Microsoft.AspNetCore.All
      • Microsoft.AspNetCore.App
      • Microsoft.WindowsDesktop.App
    • -t <theme.json> or --theme <theme.json>: Read a theme file for syntax highlighting. This theme file associates C# syntax classifications with colors. The color values can be full RGB, or ANSI color names (defined in your terminal's theme). The NO_COLOR standard is supported.
    • --trace: Produce a trace file in the current directory that logs CSharpRepl internals. Useful for CSharpRepl bug reports.
    • -v or --version: Show version number and exit.
    • -h or --help: Show help and exit.
  • response-file.rsp: A filepath of an .rsp file, containing any of the above command line options.
  • script-file.csx: A filepath of a .csx file, containing lines of C# to evaluate before starting the REPL. Arguments to this script can be passed as <additional-arguments>, after a double hyphen (--), and will be available in a global args variable.

If you have dotnet-suggest enabled, all options can be tab-completed, including values provided to --framework and .NET namespaces provided to --using.

Integrating with other software

C# REPL is a standalone software application, but it can be useful to integrate it with other developer tools:

Windows Terminal

To add C# REPL as a menu entry in Windows Terminal, add the following profile to Windows Terminal's settings.json configuration file (under the JSON property profiles.list):

{
    "name": "C# REPL",
    "commandline": "csharprepl"
},

To get the exact colors shown in the screenshots in this README, install the Windows Terminal Dracula theme.

Visual Studio Code

To use the C# REPL with Visual Studio Code, simply run the csharprepl command in the Visual Studio Code terminal. To send commands to the REPL, use the built-in Terminal: Run Selected Text In Active Terminal command from the Command Palette (workbench.action.terminal.runSelectedText).

Windows OS

To add the C# REPL to the Windows Start Menu for quick access, you can run the following PowerShell command, which will start C# REPL in Windows Terminal:

$shell = New-Object -ComObject WScript.Shell
$shortcut = $shell.CreateShortcut("$env:appdata\Microsoft\Windows\Start Menu\Programs\csharprepl.lnk")
$shortcut.TargetPath = "wt.exe"
$shortcut.Arguments = "-w 0 nt csharprepl.exe"
$shortcut.Save()

You may also wish to add a shorter alias for C# REPL, which can be done by creating a .cmd file somewhere on your path. For example, put the following contents in C:\Users\username\.dotnet\tools\csr.cmd:

wt -w 0 nt csharprepl

This will allow you to launch C# REPL by running csr from anywhere that accepts Windows commands, like the Windows Run dialog.

Comparison with other REPLs

This project is far from being the first REPL for C#. Here are some other projects; if this project doesn't suit you, another one might!

Visual Studio's C# Interactive pane is full-featured (it has syntax highlighting and intellisense) and is part of Visual Studio. This deep integration with Visual Studio is both a benefit from a workflow perspective, and a drawback as it's not cross-platform. As far as I know, the C# Interactive pane does not support NuGet packages or navigating to documentation/source code. Subjectively, it does not follow typical command line keybindings, so can feel a bit foreign.

csi.exe ships with C# and is a command line REPL. It's great because it's a cross platform REPL that comes out of the box, but it doesn't support syntax highlighting or autocompletion.

dotnet script allows you to run C# scripts from the command line. It has a REPL built-in, but the predominant focus seems to be as a script runner. It's a great tool, though, and has a strong community following.

dotnet interactive is a tool from Microsoft that creates a Jupyter notebook for C#, runnable through Visual Studio Code. It also provides a general framework useful for running REPLs.

Download Details:
Author: waf
Source Code: https://github.com/waf/CSharpRepl
License: MPL-2.0 License

#dotnet  #aspdotnet  #csharp 

Ray  Patel

1619510796

Lambda, Map, Filter functions in python

Welcome to my blog. In this article, we will learn about the Python lambda function, the map function, and the filter function.

Lambda function in Python: a lambda is a one-line anonymous function. It can take any number of arguments but can only have one expression. The syntax is:

Syntax: x = lambda arguments : expression

Now I will show you some Python lambda function examples:
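
The article's own examples are not included in this excerpt. As a minimal sketch of the three functions it names, with made-up sample data:

```python
# a lambda bound to a name, following the syntax above
square = lambda x: x * x

numbers = [1, 2, 3, 4, 5, 6]

# map applies the function to every element
squares = list(map(square, numbers))

# filter keeps the elements for which the function returns True
evens = list(filter(lambda x: x % 2 == 0, numbers))

print(square(4))
print(squares)
print(evens)
```

Both map and filter return lazy iterators in Python 3, which is why the results are wrapped in list() before printing.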

#python #anonymous function python #filter function in python #lambda #lambda python 3 #map python #python filter #python filter lambda #python lambda #python lambda examples #python map

Kennith  Blick

1625768100

Reading and Writing to Files in Python - Intermediate Python Tutorial #2

In this Python tutorial, we will learn how to deal with text files in Python using the built-in open function. You will understand how to use the most important modes: read, write and append.
That’s not all! We will discuss file parsing and touch on important string methods used for it, such as strip( ) and split( ). Finally, we wrap up with a file parsing exercise to practice the new concepts. After the video you will be confident dealing with text files, which is a very important skill to have as a programmer.
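
As a compact preview of those ideas (not the code from the video), the sketch below writes, appends to and then parses a small text file. The file name, its contents and the "name, score" layout are invented for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "scores.txt")

# "w" creates the file (or truncates an existing one)
with open(path, "w") as handle:
    handle.write("alice, 90\n")
    handle.write("bob, 85\n")

# "a" appends to the end without touching existing content
with open(path, "a") as handle:
    handle.write("carol, 77\n")

# "r" reads; iterating over the file yields one line at a time
scores = {}
with open(path, "r") as handle:
    for line in handle:
        # strip() removes the trailing newline, split(",") separates the fields
        name, value = line.strip().split(",")
        # int() tolerates the leading space left after the comma
        scores[name] = int(value)

print(scores)
```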

Playlist: Intermediate Python Tutorials | Video #2
Access the codes here: https://github.com/rscorrea1/youtube.git

Timestamp:
00:00 - Summary of the video
00:17 - Types of files
00:43 - How to open a file
01:17 - File modes
02:15 - How to read data from a file
03:00 - with statement
04:10 - readlines( ) method
05:05 - String: strip( ) method
06:22 - How to iterate over a file line by line
08:47 - How to write data to a file
11:43 - How to append data to a file
12:37 - Exercise: Parsing a text file
16:14 - Converting data types
17:00 - Next video announcement

Thumbnail:
Photo by Mario Ho on Unsplash

#reading #writing #python #intermediate python tutorial #reading and writing to files in python

Ray  Patel

1619571780

Top 20 Most Useful Python Modules or Packages

March 25, 2021  Deepak@321

Welcome to my blog. In this article, we will learn the top 20 most useful Python modules or packages, which every Python developer should know.

Hello everybody and welcome back. In this article I’m going to share with you 20 Python modules you need to know. I’ve split these modules into four categories to make things a little easier:

  1. Web Development
  2. Data Science
  3. Machine Learning
  4. AI and graphical user interfaces.

Near the end of the article, I also share my personal favorite Python module, so stay tuned to see what it is. Also, make sure to share your favorite Python module with me in the comments below.

#python #packages or libraries #python 20 modules #python 20 most usefull modules #python intersting modules #top 20 python libraries #top 20 python modules #top 20 python packages