11 Tips on Creating a Great User Onboarding Experience

User onboarding is an important and often ignored part of product strategy. Done right, it can help your product succeed and increase user activation, retention, and engagement. Done wrong, it can cause a lot of damage. Use these 11 tips to create a user onboarding experience that will improve your product and delight your users.

Why is user onboarding important?

First things first. Why is user onboarding important? Why should you care about it? The main reason is that a good user onboarding process can help you activate your new users. It doesn't matter whether your product is simple, with only a few actions to take, or a complex application where a new user can quickly get lost.

User onboarding helps with user activation

Think about it from your users' point of view. They just started using your product. They do so because they believe it can help them solve a problem they have, provide them with some value, or fulfill one of their desires. This is why they chose your product over the alternatives.

Now, it is your primary goal to quickly prove to them that they made the right decision. Meaning, you have to show them the shortest path to find what they are looking for. You could leave this to them, hoping they will somehow figure out everything by themselves. Unfortunately, this often doesn't work.

Take yourself as an example. How many times have you registered for an app and then used it only once or twice, or not at all? This is the problem we are talking about. Registration is not the same as activation. It is not enough to get the user to register. The only thing that matters is activation. The user has to start using the product.

User onboarding is one tool that can help you with this. It helps you nudge the user to start using your product. This is especially true if user onboarding follows right after registration. Why wait until she activates her account? You need to nudge her to use your product as soon as possible. Otherwise, you may lose her forever.

SDK for Connecting to AWS IoT From A Device using Embedded C

AWS IoT Device SDK for Embedded C

Overview

The AWS IoT Device SDK for Embedded C (C-SDK) is a collection of C source files under the MIT open source license that can be used in embedded applications to securely connect IoT devices to AWS IoT Core. It contains MQTT client, HTTP client, JSON Parser, AWS IoT Device Shadow, AWS IoT Jobs, and AWS IoT Device Defender libraries. This SDK is distributed in source form, and can be built into customer firmware along with application code, other libraries and an operating system (OS) of your choice. These libraries depend only on standard C libraries, so they can be ported to various OSs, from embedded Real-Time Operating Systems (RTOS) to Linux/macOS/Windows. You can find sample usage of C-SDK libraries on POSIX systems using OpenSSL (e.g. Linux demos in this repository), and on FreeRTOS using mbedTLS (e.g. FreeRTOS demos in the FreeRTOS repository).

For the latest release of C-SDK, please see the section for Releases and Documentation.

C-SDK includes libraries that are part of the FreeRTOS 202012.01 LTS release. Learn more about the FreeRTOS 202012.01 LTS libraries by clicking here.

License

The C-SDK libraries are licensed under the MIT open source license.

Features

C-SDK simplifies access to various AWS IoT services. C-SDK has been tested to work with AWS IoT Core and an open source MQTT broker to ensure interoperability. The AWS IoT Device Shadow, AWS IoT Jobs, and AWS IoT Device Defender libraries are flexible to work with any MQTT client and JSON parser. The MQTT client and JSON parser libraries are offered as choices without being tightly coupled with the rest of the SDK. C-SDK contains the following libraries:

coreMQTT

The coreMQTT library provides the ability to establish an MQTT connection with a broker over a customer-implemented transport layer, which can either be a secure channel like a TLS session (mutually authenticated or server-only authentication) or a non-secure channel like a plaintext TCP connection. This MQTT connection can be used for performing publish operations to MQTT topics and subscribing to MQTT topics. The library provides a mechanism to register customer-defined callbacks for receiving incoming PUBLISH, acknowledgement and keep-alive response events from the broker. The library has been refactored for memory optimization and is compliant with the MQTT 3.1.1 standard. It has no dependencies on any additional libraries other than the standard C library, a customer-implemented network transport interface, and optionally a customer-implemented platform time function. The refactored design embraces different use-cases, ranging from resource-constrained platforms using only QoS 0 MQTT PUBLISH messages to resource-rich platforms using QoS 2 MQTT PUBLISH over TLS connections.
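
As a concrete illustration, here is a minimal connect-and-publish sketch against the coreMQTT API. The transport send/receive functions, the time function, and the client/topic names are application-supplied stand-ins (assumptions), not part of the SDK.

/* Minimal coreMQTT sketch: the transport functions, time source, and names
 * below are hypothetical, application-provided values. */
#include "core_mqtt.h"

static uint8_t networkBuffer[ 1024 ];

extern int32_t transportSend( NetworkContext_t * pCtx, const void * pBuf, size_t len );
extern int32_t transportRecv( NetworkContext_t * pCtx, void * pBuf, size_t len );
extern uint32_t getTimeMs( void );

static void eventCallback( MQTTContext_t * pCtx,
                           MQTTPacketInfo_t * pPacketInfo,
                           MQTTDeserializedInfo_t * pDeserializedInfo )
{
    /* Incoming PUBLISH, acknowledgement and keep-alive response events from
     * the broker arrive here. */
    ( void ) pCtx; ( void ) pPacketInfo; ( void ) pDeserializedInfo;
}

MQTTStatus_t connectAndPublish( NetworkContext_t * pNetworkContext )
{
    MQTTContext_t mqttContext;
    TransportInterface_t transport = { 0 };
    MQTTFixedBuffer_t fixedBuffer = { networkBuffer, sizeof( networkBuffer ) };
    MQTTConnectInfo_t connectInfo = { 0 };
    MQTTPublishInfo_t publishInfo = { 0 };
    bool sessionPresent = false;

    /* The customer-implemented transport layer: TLS or plaintext TCP. */
    transport.pNetworkContext = pNetworkContext;
    transport.send = transportSend;
    transport.recv = transportRecv;

    MQTT_Init( &mqttContext, &transport, getTimeMs, eventCallback, &fixedBuffer );

    connectInfo.cleanSession = true;
    connectInfo.pClientIdentifier = "testClient";
    connectInfo.clientIdentifierLength = sizeof( "testClient" ) - 1;
    connectInfo.keepAliveSeconds = 60;
    MQTT_Connect( &mqttContext, &connectInfo, NULL, 5000U, &sessionPresent );

    /* QoS 0 publish; resource-rich platforms can use QoS 1 or 2 instead. */
    publishInfo.qos = MQTTQoS0;
    publishInfo.pTopicName = "test/topic";
    publishInfo.topicNameLength = sizeof( "test/topic" ) - 1;
    publishInfo.pPayload = "hello";
    publishInfo.payloadLength = sizeof( "hello" ) - 1;

    return MQTT_Publish( &mqttContext, &publishInfo, MQTT_GetPacketId( &mqttContext ) );
}

Error handling of the MQTT_Connect return value is omitted for brevity; production code should check every MQTTStatus_t result.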

See memory requirements for the latest release here.

coreHTTP

The coreHTTP library provides the ability to establish an HTTP connection with a server over a customer-implemented transport layer, which can either be a secure channel like a TLS session (mutually authenticated or server-only authentication) or a non-secure channel like a plaintext TCP connection. The HTTP connection can be used to make "GET" (including range requests), "PUT", "POST" and "HEAD" requests. The library provides a mechanism to register a customer-defined callback for receiving parsed header fields in an HTTP response. The library has been refactored for memory optimization, and is a client implementation of a subset of the HTTP/1.1 standard.
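
For illustration, a minimal GET-request sketch against the coreHTTP API follows; the transport interface, host, and path are application-supplied stand-ins (assumptions).

/* Minimal coreHTTP sketch: host and path are hypothetical values. */
#include "core_http_client.h"

#define USER_BUFFER_LENGTH    ( 2048 )
static uint8_t userBuffer[ USER_BUFFER_LENGTH ];

HTTPStatus_t sendGetRequest( TransportInterface_t * pTransport )
{
    HTTPRequestInfo_t requestInfo = { 0 };
    HTTPRequestHeaders_t requestHeaders = { 0 };
    HTTPResponse_t response = { 0 };
    HTTPStatus_t status;

    requestInfo.pMethod = HTTP_METHOD_GET;
    requestInfo.methodLen = sizeof( HTTP_METHOD_GET ) - 1;
    requestInfo.pHost = "example.com";            /* hypothetical server */
    requestInfo.hostLen = sizeof( "example.com" ) - 1;
    requestInfo.pPath = "/index.html";
    requestInfo.pathLen = sizeof( "/index.html" ) - 1;
    requestInfo.reqFlags = HTTP_REQUEST_KEEP_ALIVE_FLAG;

    requestHeaders.pBuffer = userBuffer;
    requestHeaders.bufferLen = USER_BUFFER_LENGTH;

    /* The same user buffer is reused to store the response. */
    response.pBuffer = userBuffer;
    response.bufferLen = USER_BUFFER_LENGTH;

    status = HTTPClient_InitializeRequestHeaders( &requestHeaders, &requestInfo );

    if( status == HTTPSuccess )
    {
        /* NULL/0: this GET request carries no body. */
        status = HTTPClient_Send( pTransport, &requestHeaders, NULL, 0, &response, 0 );
    }

    return status;
}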

See memory requirements for the latest release here.

coreJSON

The coreJSON library is a JSON parser that strictly enforces the ECMA-404 JSON standard. It provides a function to validate a JSON document, and a function to search for a key and return its value. A search can descend into nested structures using a compound query key. JSON document validation also checks for illegal UTF-8 encodings and illegal Unicode escape sequences.
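
A minimal validate-and-search sketch with coreJSON follows; the document and query key are made up for illustration.

/* Minimal coreJSON sketch: validate a document, then search with a
 * compound query key that descends into a nested structure. */
#include <stdio.h>
#include "core_json.h"

int main( void )
{
    char buffer[] = "{\"device\":{\"id\":\"thing-42\",\"temp\":21}}";
    size_t bufferLength = sizeof( buffer ) - 1;
    char * value;
    size_t valueLength;
    JSONStatus_t result;

    /* Validate the whole document first (also checks UTF-8 and escapes). */
    result = JSON_Validate( buffer, bufferLength );

    if( result == JSONSuccess )
    {
        /* "device.id" is a compound query key. */
        result = JSON_Search( buffer, bufferLength, "device.id",
                              sizeof( "device.id" ) - 1, &value, &valueLength );
    }

    if( result == JSONSuccess )
    {
        printf( "device.id = %.*s\n", ( int ) valueLength, value ); /* thing-42 */
    }

    return 0;
}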

See memory requirements for the latest release here.

corePKCS11

The corePKCS11 library is an implementation of the PKCS #11 interface (API) that makes it easier to develop applications that rely on cryptographic operations. Only a subset of the PKCS #11 v2.4 standard has been implemented, with a focus on operations involving asymmetric keys, random number generation, and hashing.

The Cryptoki or PKCS #11 standard defines a platform-independent API to manage and use cryptographic tokens. The name, "PKCS #11", is used interchangeably to refer to the API itself and the standard which defines it.

The PKCS #11 API is useful for writing software without taking a dependency on any particular implementation or hardware. By writing against the PKCS #11 standard interface, code can be used interchangeably with multiple algorithms, implementations and hardware.

Generally, vendors of secure cryptoprocessors such as Trusted Platform Modules (TPM), Hardware Security Modules (HSM), Secure Elements, or any other type of secure hardware enclave, distribute a PKCS #11 implementation with the hardware. The purpose of the corePKCS11 mock is therefore to provide a PKCS #11 implementation that allows for rapid prototyping and development before switching to a cryptoprocessor-specific PKCS #11 implementation in production devices.

Since the PKCS #11 interface is defined as part of the PKCS #11 specification, replacing corePKCS11 with another implementation should require little porting effort, as the interface will not change. The system tests distributed in the corePKCS11 repository can be leveraged to verify that the behavior of a different implementation is similar to corePKCS11's.
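
The sketch below uses only calls defined by the PKCS #11 (Cryptoki) specification to generate random bytes, so the same code should run against corePKCS11 or a vendor implementation. The include line is an assumption about how the headers are pulled in.

/* Minimal PKCS #11 sketch: open a session and generate random bytes. */
#include "core_pkcs11.h"   /* assumption: brings in the PKCS #11 headers */

CK_RV generateRandomBytes( CK_BYTE * pBuffer, CK_ULONG length )
{
    CK_FUNCTION_LIST_PTR pFunctionList = NULL;
    CK_SESSION_HANDLE session = CK_INVALID_HANDLE;
    CK_SLOT_ID slotId;
    CK_ULONG slotCount = 1;
    CK_RV rv;

    rv = C_GetFunctionList( &pFunctionList );

    if( rv == CKR_OK )
    {
        rv = pFunctionList->C_Initialize( NULL );
    }

    if( rv == CKR_OK )
    {
        /* Fetch the first slot with a token present. */
        rv = pFunctionList->C_GetSlotList( CK_TRUE, &slotId, &slotCount );
    }

    if( rv == CKR_OK )
    {
        rv = pFunctionList->C_OpenSession( slotId, CKF_SERIAL_SESSION,
                                           NULL, NULL, &session );
    }

    if( rv == CKR_OK )
    {
        rv = pFunctionList->C_GenerateRandom( session, pBuffer, length );
    }

    if( session != CK_INVALID_HANDLE )
    {
        ( void ) pFunctionList->C_CloseSession( session );
    }

    return rv;
}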

See memory requirements for the latest release here.

AWS IoT Device Shadow

The AWS IoT Device Shadow library enables you to store and retrieve the current state of one or more shadows of every registered device. A device's shadow is a persistent, virtual representation of your device that you can interact with from AWS IoT Core even if the device is offline. The device state captured in its "shadow" is represented as a JSON document. The device can send commands over MQTT to get, update and delete its latest state, as well as receive notifications over MQTT about changes in its state. A device's shadows are uniquely identified by the name of the corresponding "thing", a representation of a specific device or logical entity on the AWS Cloud. See Managing Devices with AWS IoT for more information on IoT "things". This library supports named shadows, a feature of the AWS IoT Device Shadow service that allows you to create multiple shadows for a single IoT device. More details about AWS IoT Device Shadow can be found in the AWS IoT documentation.

The AWS IoT Device Shadow library has no dependencies on additional libraries other than the standard C library. It also doesn’t have any platform dependencies, such as threading or synchronization. It can be used with any MQTT library and any JSON library (see demos with coreMQTT and coreJSON).
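
As a sketch of the typical flow, the snippet below assumes an already-connected coreMQTT context and subscribes to the classic (unnamed) shadow delta topic of a hypothetical thing. The topic string is written out by hand here purely for illustration; the Shadow library itself provides helpers to assemble and match these topic strings.

/* Minimal sketch: subscribe to a shadow delta topic with coreMQTT.
 * THING_NAME is a hypothetical thing name. */
#include "core_mqtt.h"

#define THING_NAME          "testThing"
#define SHADOW_DELTA_TOPIC  "$aws/things/" THING_NAME "/shadow/update/delta"

MQTTStatus_t subscribeToShadowDelta( MQTTContext_t * pMqttContext )
{
    MQTTSubscribeInfo_t subscription = { 0 };

    subscription.qos = MQTTQoS1;
    subscription.pTopicFilter = SHADOW_DELTA_TOPIC;
    subscription.topicFilterLength = sizeof( SHADOW_DELTA_TOPIC ) - 1;

    /* Delta documents arrive in the MQTT event callback as JSON; they can be
     * parsed with any JSON library, e.g. coreJSON. */
    return MQTT_Subscribe( pMqttContext, &subscription, 1,
                           MQTT_GetPacketId( pMqttContext ) );
}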

See memory requirements for the latest release here.

AWS IoT Jobs

The AWS IoT Jobs library enables you to interact with the AWS IoT Jobs service which notifies one or more connected devices of a pending “Job”. A Job can be used to manage your fleet of devices, update firmware and security certificates on your devices, or perform administrative tasks such as restarting devices and performing diagnostics. For documentation of the service, please see the AWS IoT Developer Guide. Interactions with the Jobs service use the MQTT protocol. This library provides an API to compose and recognize the MQTT topic strings used by the Jobs service.

The AWS IoT Jobs library has no dependencies on additional libraries other than the standard C library. It also doesn’t have any platform dependencies, such as threading or synchronization. It can be used with any MQTT library and any JSON library (see demos with libmosquitto and coreJSON).
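
To illustrate the topic format only, the sketch below composes a Jobs topic by hand with snprintf; the Jobs library itself provides the API for composing and recognizing these strings, so treat this as a stand-in, not the library's method.

/* Minimal sketch of the Jobs "start-next" topic format.
 * THING_NAME is a hypothetical thing name. */
#include <stdio.h>

#define THING_NAME    "testThing"

int buildStartNextTopic( char * pBuffer, size_t bufferLength )
{
    /* Publishing an empty JSON document to this topic asks the Jobs service
     * to start the next pending job for the thing. */
    return snprintf( pBuffer, bufferLength,
                     "$aws/things/%s/jobs/start-next", THING_NAME );
}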

See memory requirements for the latest release here.

AWS IoT Device Defender

The AWS IoT Device Defender library enables you to interact with the AWS IoT Device Defender service to continuously monitor security metrics from devices for deviations from what you have defined as appropriate behavior for each device. If something doesn’t look right, AWS IoT Device Defender sends out an alert so you can take action to remediate the issue. More details about Device Defender can be found in AWS IoT Device Defender documentation. This library supports custom metrics, a feature that helps you monitor operational health metrics that are unique to your fleet or use case. For example, you can define a new metric to monitor the memory usage or CPU usage on your devices.

The AWS IoT Device Defender library has no dependencies on additional libraries other than the standard C library. It also doesn’t have any platform dependencies, such as threading or synchronization. It can be used with any MQTT library and any JSON library (see demos with coreMQTT and coreJSON).

See memory requirements for the latest release here.

AWS IoT Over-the-air Update

The AWS IoT Over-the-air Update (OTA) library enables you to manage the notification of a newly available update, download the update, and perform cryptographic verification of the firmware update. Using the OTA library, you can logically separate firmware updates from the application running on your devices. You can also use the library to send other files (e.g. images, certificates) to one or more devices registered with AWS IoT. More details about OTA library can be found in AWS IoT Over-the-air Update documentation.

The AWS IoT Over-the-air Update library depends on coreJSON for parsing JSON job documents and on tinyCBOR for decoding encoded data streams, in addition to the standard C library. It can be used with any MQTT library, HTTP library, and operating system (e.g. Linux, FreeRTOS) (see demos with coreMQTT and coreHTTP on Linux).

See memory requirements for the latest release here.

AWS IoT Fleet Provisioning

The AWS IoT Fleet Provisioning library enables you to interact with the AWS IoT Fleet Provisioning MQTT APIs in order to provision IoT devices without preexisting device certificates. With AWS IoT Fleet Provisioning, devices can securely receive unique device certificates from AWS IoT when they connect for the first time. For an overview of all provisioning options offered by AWS IoT, see the device provisioning documentation. For details about Fleet Provisioning, refer to the AWS IoT Fleet Provisioning documentation.

See memory requirements for the latest release here.

AWS SigV4

The AWS SigV4 library enables you to sign HTTP requests with Signature Version 4 Signing Process. Signature Version 4 (SigV4) is the process to add authentication information to HTTP requests to AWS services. For security, most requests to AWS must be signed with an access key. The access key consists of an access key ID and secret access key.

See memory requirements for the latest release here.

backoffAlgorithm

The backoffAlgorithm library is a utility library for calculating the backoff period, using an exponential backoff with jitter algorithm, when retrying network operations (such as a failed network connection with a server). This library uses the "Full Jitter" strategy for the exponential backoff with jitter algorithm. More information about the algorithm can be found in the Exponential Backoff and Jitter AWS blog.

Exponential backoff with jitter is typically used when retrying a failed connection or network request to a server. It helps mitigate failed network operations caused by network congestion or high load on the server, by spreading out retry requests across multiple devices attempting the operations. Moreover, in an environment with poor connectivity, a client can get disconnected at any time; a backoff strategy also helps the client conserve battery by not repeatedly attempting reconnections when they are unlikely to succeed.

The backoffAlgorithm library has no dependencies on libraries other than the standard C library.
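
A minimal retry-loop sketch with the backoffAlgorithm library follows; the connectToServer() function, the retry parameters, and the use of rand()/usleep() are application-level stand-ins (assumptions).

/* Minimal backoffAlgorithm sketch: retry a network operation with
 * exponential backoff and jitter. */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include "backoff_algorithm.h"

#define RETRY_BACKOFF_BASE_MS    ( 500U )
#define RETRY_MAX_BACKOFF_MS     ( 5000U )
#define RETRY_MAX_ATTEMPTS       ( 5U )

extern bool connectToServer( void );   /* hypothetical network operation */

bool connectWithRetries( void )
{
    BackoffAlgorithmContext_t retryContext;
    BackoffAlgorithmStatus_t retryStatus = BackoffAlgorithmSuccess;
    uint16_t nextBackoffMs = 0;
    bool connected = false;

    BackoffAlgorithm_InitializeParams( &retryContext, RETRY_BACKOFF_BASE_MS,
                                       RETRY_MAX_BACKOFF_MS, RETRY_MAX_ATTEMPTS );

    do
    {
        connected = connectToServer();

        if( !connected )
        {
            /* rand() stands in for a real entropy source; the random value
             * provides the jitter that spreads retries across devices. */
            retryStatus = BackoffAlgorithm_GetNextBackoff( &retryContext,
                                                           ( uint32_t ) rand(),
                                                           &nextBackoffMs );

            if( retryStatus == BackoffAlgorithmSuccess )
            {
                usleep( ( useconds_t ) nextBackoffMs * 1000U );
            }
        }
    } while( !connected && ( retryStatus == BackoffAlgorithmSuccess ) );

    return connected;
}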

See memory requirements for the latest release here.

Sending metrics to AWS IoT

When establishing a connection with AWS IoT, users can optionally report the Operating System, Hardware Platform and MQTT client version information of their device to AWS. This information can help AWS IoT provide faster issue resolution and technical support. If users want to report this information, they can send a specially formatted string (see below) in the username field of the MQTT CONNECT packet.

Format

The format of the username string with metrics is:

<Actual_Username>?SDK=<OS_Name>&Version=<OS_Version>&Platform=<Hardware_Platform>&MQTTLib=<MQTT_Library_name>@<MQTT_Library_version>

Where

  • <Actual_Username> is the actual username used for authentication, if username and password are used for authentication. When username and password based authentication is not used, this is an empty value.
  • <OS_Name> is the Operating System the application is running on (e.g. Ubuntu)
  • <OS_Version> is the version number of the Operating System (e.g. 20.10)
  • <Hardware_Platform> is the Hardware Platform the application is running on (e.g. RaspberryPi)
  • <MQTT_Library_name> is the MQTT Client library being used (e.g. coreMQTT)
  • <MQTT_Library_version> is the version of the MQTT Client library being used (e.g. 1.1.0)

Example

  • Actual_Username = “iotuser”, OS_Name = Ubuntu, OS_Version = 20.10, Hardware_Platform_Name = RaspberryPi, MQTT_Library_Name = coremqtt, MQTT_Library_version = 1.1.0. If username is not used, then “iotuser” can be removed.
/* Username string:
 * iotuser?SDK=Ubuntu&Version=20.10&Platform=RaspberryPi&MQTTLib=coremqtt@1.1.0
 */

#define OS_NAME                   "Ubuntu"
#define OS_VERSION                "20.10"
#define HARDWARE_PLATFORM_NAME    "RaspberryPi"
#define MQTT_LIB                  "coremqtt@1.1.0"

#define USERNAME_STRING           "iotuser?SDK=" OS_NAME "&Version=" OS_VERSION "&Platform=" HARDWARE_PLATFORM_NAME "&MQTTLib=" MQTT_LIB
#define USERNAME_STRING_LENGTH    ( ( uint16_t ) ( sizeof( USERNAME_STRING ) - 1 ) )

MQTTConnectInfo_t connectInfo;
connectInfo.pUserName = USERNAME_STRING;
connectInfo.userNameLength = USERNAME_STRING_LENGTH;
mqttStatus = MQTT_Connect( pMqttContext, &connectInfo, NULL, CONNACK_RECV_TIMEOUT_MS, pSessionPresent );

Versioning

C-SDK releases will now follow a date based versioning scheme with the format YYYYMM.NN, where:

  • Y represents the year.
  • M represents the month.
  • N represents the release order within the designated month (00 being the first release).

For example, a second release in June 2021 would be 202106.01. Although the SDK releases have moved to date-based versioning, each library within the SDK will still retain semantic versioning. In semantic versioning, the version number itself (X.Y.Z) indicates whether the release is a major, minor, or point release. You can use the semantic version of a library to assess the scope and impact of a new release on your application.

Releases and Documentation

All of the released versions of the C-SDK libraries are available as git tags. For example, the last release of the v3 SDK version is available at tag 3.1.2.

202108.00

API documentation of 202108.00 release

This release introduces the refactored AWS IoT Fleet Provisioning library and the new AWS SigV4 library.

Additionally, this release brings minor version updates in the AWS IoT Over-the-Air Update and corePKCS11 libraries.

202103.00

API documentation of 202103.00 release

This release includes a major update to the APIs of the AWS IoT Over-the-air Update library.

Additionally, the AWS IoT Device Shadow library introduces a minor update by adding support for named shadows, a feature of the AWS IoT Device Shadow service that allows you to create multiple shadows for a single IoT device. The AWS IoT Jobs library introduces a minor update by introducing macros for the $next job ID and compile-time generation of topic strings. The AWS IoT Device Defender library introduces a minor update that adds macros to the API for the custom metrics feature of the AWS IoT Device Defender service.

corePKCS11 also introduces a patch update by removing the pkcs11configPAL_DESTROY_SUPPORTED config and the mbedTLS platform abstraction layer for DestroyObject. Lastly, no code changes are introduced for backoffAlgorithm, coreHTTP, coreMQTT, and coreJSON; however, patch updates are made to improve documentation and CI.

202012.01

API documentation of 202012.01 release

This release includes the AWS IoT Over-the-air Update (Release Candidate), backoffAlgorithm, and PKCS #11 libraries. Additionally, there is a major update to the coreJSON and coreHTTP APIs. All libraries continue to undergo code quality checks (e.g. MISRA-C compliance) and Coverity static analysis. In addition, all libraries except AWS IoT Over-the-air Update and backoffAlgorithm undergo validation of memory safety with the C Bounded Model Checker (CBMC) automated reasoning tool.

202011.00

API documentation of 202011.00 release

This release includes refactored HTTP client, AWS IoT Device Defender, and AWS IoT Jobs libraries. Additionally, there is a major update to the coreJSON API. All libraries continue to undergo code quality checks (e.g. MISRA-C compliance), Coverity static analysis, and validation of memory safety with the C Bounded Model Checker (CBMC) automated reasoning tool.

202009.00

API documentation of 202009.00 release

This release includes refactored MQTT, JSON Parser, and AWS IoT Device Shadow libraries for optimized memory usage and modularity. These libraries are included in the SDK via Git submoduling. These libraries have gone through code quality checks including verification that no function has a GNU Complexity score over 8, and checks against deviations from mandatory rules in the MISRA coding standard. Deviations from the MISRA C:2012 guidelines are documented under MISRA Deviations. These libraries have also undergone both static code analysis from Coverity static analysis, and validation of memory safety and data structure invariance through the CBMC automated reasoning tool.

If you are upgrading from v3.x API of the C-SDK to the 202009.00 release, please refer to Migration guide from v3.1.2 to 202009.00 and newer releases. If you are using the C-SDK v4_beta_deprecated branch, note that we will continue to maintain this branch for critical bug fixes and security patches but will not add new features to it. See the C-SDK v4_beta_deprecated branch README for additional details.

v3.1.2

Details available here.

Porting Guide for 202009.00 and newer releases

All libraries depend on the ISO C90 standard library and additionally on the stdint.h header for fixed-width integers, including uint8_t, int8_t, uint16_t, uint32_t and int32_t, and constant macros like UINT16_MAX. If your platform does not provide stdint.h, definitions of the mentioned fixed-width integer types will be required for porting any C-SDK library to your platform.
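
For illustration only, such a port-supplied header might look like the following sketch; the underlying types are assumptions that must be checked against your compiler's actual type widths.

/* Hypothetical stdint.h fallback for a platform without the header.
 * The underlying types below assume an 8-bit char, 16-bit short, and
 * 32-bit int; verify these widths for your toolchain. */
#ifndef STDINT_FALLBACK_H
#define STDINT_FALLBACK_H

typedef signed char      int8_t;
typedef unsigned char    uint8_t;
typedef signed short     int16_t;
typedef unsigned short   uint16_t;
typedef signed int       int32_t;
typedef unsigned int     uint32_t;

#define UINT16_MAX    ( 65535U )

#endif /* STDINT_FALLBACK_H */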

Porting coreMQTT

Guide for porting coreMQTT library to your platform is available here.

Porting coreHTTP

Guide for porting coreHTTP library is available here.

Porting AWS IoT Device Shadow

Guide for porting AWS IoT Device Shadow library is available here.

Porting AWS IoT Device Defender

Guide for porting AWS IoT Device Defender library is available here.

Porting AWS IoT Over-the-air Update

Guide for porting OTA library to your platform is available here.

Migration guide from v3.1.2 to 202009.00 and newer releases

MQTT Migration

Migration guide for MQTT library is available here.

Shadow Migration

Migration guide for Shadow library is available here.

Jobs Migration

Migration guide for Jobs library is available here.

Branches

main branch

The main branch hosts the continuous development of the AWS IoT Embedded C SDK (C-SDK) libraries. Please be aware that the development at the tip of the main branch is continuously in progress, and may have bugs. Consider using the tagged releases of the C-SDK for production ready software.

v4_beta_deprecated branch (formerly named v4_beta)

The v4_beta_deprecated branch contains a beta version of the C-SDK libraries, which is now deprecated. This branch was earlier named as v4_beta, and was renamed to v4_beta_deprecated. The libraries in this branch will not be released. However, critical bugs will be fixed and tested. No new features will be added to this branch.

Getting Started

Cloning

This repository uses Git Submodules to bring in the C-SDK libraries (e.g., MQTT) and third-party dependencies (e.g., mbedtls for the POSIX platform transport layer). Note: If you download the ZIP file provided by the GitHub UI, you will not get the contents of the submodules (the ZIP file is also not a valid git repository). If you download from the 202012.00 Release Page, you will get the entire repository (including the submodules) in the ZIP file, aws-iot-device-sdk-embedded-c-202012.00.zip. To clone the latest commit to the main branch using HTTPS:

git clone --recurse-submodules https://github.com/aws/aws-iot-device-sdk-embedded-C.git

Using SSH:

git clone --recurse-submodules git@github.com:aws/aws-iot-device-sdk-embedded-C.git

If you have downloaded the repo without using the --recurse-submodules argument, you need to run:

git submodule update --init --recursive

When building with CMake, submodules are also recursively cloned automatically. However, -DBUILD_CLONE_SUBMODULES=0 can be passed as a CMake flag to disable this functionality. This is useful when you'd like to build with CMake while using a different commit of a submodule.

Configuring Demos

The libraries in this SDK are not dependent on any operating system. However, the demos for the libraries in this SDK are built and tested on a Linux platform. The demos build with CMake, a cross-platform build tool.

Prerequisites

  • CMake 3.2.0 or any newer version for utilizing the build system of the repository.
  • C90 compiler such as gcc
    • Due to the use of mbedtls in corePKCS11, a C99 compiler is required if building the PKCS11 demos or the CMake install target.
  • Although not a part of the ISO C90 standard, stdint.h is required for fixed-width integer types that include uint8_t, int8_t, uint16_t, uint32_t and int32_t, and constant macros like UINT16_MAX, while stdbool.h is required for boolean parameters in coreMQTT. For compilers that do not provide these header files, coreMQTT provides the files stdint.readme and stdbool.readme, which can be renamed to stdint.h and stdbool.h, respectively, to provide the required type definitions.
  • A supported operating system. The ports provided with this repo are expected to work with all recent versions of the following operating systems, although we cannot guarantee the behavior on all systems.
    • Linux system with POSIX sockets, threads, RT, and timer APIs. (We have tested on Ubuntu 18.04).

Build Dependencies

The following table shows libraries that need to be installed on your system to run certain demos. If a dependency is not installed and cannot be built from source, demos that require that dependency will be excluded from the default all target.

Dependency         Version           Usage
OpenSSL            1.1.0 or later    All TLS demos and tests with the exception of PKCS11
Mosquitto Client   1.4.10 or later   AWS IoT Jobs Mosquitto demo

AWS IoT Account Setup

You need to set up an AWS account and access the AWS IoT console to run the AWS IoT Device Shadow library, AWS IoT Device Defender library, AWS IoT Jobs library, AWS IoT OTA library, and coreHTTP S3 download demos. The AWS account can also be used to run the MQTT mutual auth demo against the AWS IoT broker. Note that running the AWS IoT Device Defender, AWS IoT Jobs, and AWS IoT Device Shadow library demos requires the setup of a Thing resource for the device running the demo.

The MQTT Mutual Authentication and AWS IoT Shadow demos include example AWS IoT policy documents to run each respective demo with AWS IoT. You may use the MQTT Mutual auth and Shadow example policies by replacing [AWS_REGION] and [AWS_ACCOUNT_ID] with your region and account identifier. While the IoT Thing name and MQTT client identifier do not need to match for the demos to run, the example policies make the Thing name and client identifier identical, as per AWS IoT best practices.

It can be very helpful to also have the AWS Command Line Interface tooling installed.

Configuring mutual authentication demos of MQTT and HTTP

You can pass the following configuration settings as command line options in order to run the mutual auth demos. Make sure to run the following command in the root directory of the C-SDK:

## optionally find your-aws-iot-endpoint from the command line
aws iot describe-endpoint --endpoint-type iot:Data-ATS
cmake -S . -Bbuild \
-DAWS_IOT_ENDPOINT="<your-aws-iot-endpoint>" -DCLIENT_CERT_PATH="<your-client-certificate-path>" -DCLIENT_PRIVATE_KEY_PATH="<your-client-private-key-path>"

In order to set these configurations manually, edit demo_config.h in demos/mqtt/mqtt_demo_mutual_auth/ and demos/http/http_demo_mutual_auth/ to #define the following:

  • Set AWS_IOT_ENDPOINT to your custom endpoint. This is found on the Settings page of the AWS IoT Console and has a format of ABCDEFG1234567.iot.<aws-region>.amazonaws.com where <aws-region> can be an AWS region like us-east-2.
    • Optionally, it can also be found with the AWS CLI command aws iot describe-endpoint --endpoint-type iot:Data-ATS.
  • Set CLIENT_CERT_PATH to the path of the client certificate downloaded when setting up the device certificate in AWS IoT Account Setup.
  • Set CLIENT_PRIVATE_KEY_PATH to the path of the private key downloaded when setting up the device certificate in AWS IoT Account Setup.

It is possible to configure ROOT_CA_CERT_PATH to any PEM-encoded Root CA Certificate. However, this is optional because CMake will download and set it to AmazonRootCA1.pem when unspecified.

Configuring AWS IoT Device Defender and AWS IoT Device Shadow demos

To build the AWS IoT Device Defender and AWS IoT Device Shadow demos, you can pass the following configuration settings as command line options. Make sure to run the following command in the root directory of the C-SDK:

cmake -S . -Bbuild -DAWS_IOT_ENDPOINT="<your-aws-iot-endpoint>" -DROOT_CA_CERT_PATH="<your-path-to-amazon-root-ca>" -DCLIENT_CERT_PATH="<your-client-certificate-path>" -DCLIENT_PRIVATE_KEY_PATH="<your-client-private-key-path>" -DTHING_NAME="<your-registered-thing-name>"

An Amazon Root CA certificate can be downloaded from here.

In order to set these configurations manually, edit demo_config.h in the demo folder to #define the following:

  • Set AWS_IOT_ENDPOINT to your custom endpoint. This is found on the Settings page of the AWS IoT Console and has a format of ABCDEFG1234567.iot.us-east-2.amazonaws.com.
  • Set ROOT_CA_CERT_PATH to the path of the root CA certificate downloaded when setting up the device certificate in AWS IoT Account Setup.
  • Set CLIENT_CERT_PATH to the path of the client certificate downloaded when setting up the device certificate in AWS IoT Account Setup.
  • Set CLIENT_PRIVATE_KEY_PATH to the path of the private key downloaded when setting up the device certificate in AWS IoT Account Setup.
  • Set THING_NAME to the name of the Thing created in AWS IoT Account Setup.

Configuring the AWS IoT Fleet Provisioning demo

To build the AWS IoT Fleet Provisioning Demo, you can pass the following configuration settings as command line options. Make sure to run the following command in the root directory of the C-SDK:

cmake -S . -Bbuild -DAWS_IOT_ENDPOINT="<your-aws-iot-endpoint>" -DROOT_CA_CERT_PATH="<your-path-to-amazon-root-ca>" -DCLAIM_CERT_PATH="<your-claim-certificate-path>" -DCLAIM_PRIVATE_KEY_PATH="<your-claim-private-key-path>" -DPROVISIONING_TEMPLATE_NAME="<your-template-name>" -DDEVICE_SERIAL_NUMBER="<your-serial-number>"

An Amazon Root CA certificate can be downloaded from here.

To create a provisioning template and claim credentials, sign into your AWS account and visit here. Make sure to enable the "Use the AWS IoT registry to manage your device fleet" option. Once you have created the template and credentials, modify the claim certificate's policy to match the sample policy.

In order to set these configurations manually, edit demo_config.h in the demo folder to #define the following:

  • Set AWS_IOT_ENDPOINT to your custom endpoint. This is found on the Settings page of the AWS IoT Console and has a format of ABCDEFG1234567.iot.us-east-2.amazonaws.com.
  • Set ROOT_CA_CERT_PATH to the path of the root CA certificate downloaded when setting up the device certificate in AWS IoT Account Setup.
  • Set CLAIM_CERT_PATH to the path of the claim certificate downloaded when setting up the template and claim credentials.
  • Set CLAIM_PRIVATE_KEY_PATH to the path of the private key downloaded when setting up the template and claim credentials.
  • Set PROVISIONING_TEMPLATE_NAME to the name of the provisioning template created.
  • Set DEVICE_SERIAL_NUMBER to an arbitrary string representing a device identifier.

Configuring the S3 demos

You can pass the following configuration settings as command line options in order to run the S3 demos. Make sure to run the following command in the root directory of the C-SDK:

cmake -S . -Bbuild -DS3_PRESIGNED_GET_URL="s3-get-url" -DS3_PRESIGNED_PUT_URL="s3-put-url"

S3_PRESIGNED_PUT_URL is only needed for the S3 upload demo.

In order to set these configurations manually, edit demo_config.h in demos/http/http_demo_s3_download_multithreaded, and demos/http/http_demo_s3_upload to #define the following:

  • Set S3_PRESIGNED_GET_URL to a S3 presigned URL with GET access.
  • Set S3_PRESIGNED_PUT_URL to a S3 presigned URL with PUT access.

You can generate the presigned urls using demos/http/common/src/presigned_urls_gen.py. More info can be found here.

Configure S3 Download HTTP Demo using SigV4 Library:

Refer to demos/http/http_demo_s3_download/README.md for the steps needed to configure and run the S3 Download HTTP demo using the SigV4 library, which generates the authorization HTTP header needed to authenticate the HTTP requests sent to S3.

Setup for AWS IoT Jobs demo

  1. The demo requires the Linux platform to contain curl and libmosquitto. On a Debian platform, these dependencies can be installed with:
    apt install curl libmosquitto-dev

If the platform does not contain the libmosquitto library, the demo will build the library from source.

libmosquitto 1.4.10 or any later version of the first major release is required to run this demo.

  2. A job that specifies the URL to download for the demo needs to be created on the AWS account for the Thing resource that will be used by the demo.
    The job can be created directly from the AWS IoT console or using the aws cli tool.

The following creates a job that specifies a Linux Kernel link for downloading.

 aws iot create-job \
        --job-id 'job_1' \
        --targets arn:aws:iot:us-west-2:<account-id>:thing/<thing-name> \
        --document '{"url":"https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.8.5.tar.xz"}'

Prerequisites for the AWS Over-The-Air Update (OTA) demos

  1. To perform a successful OTA update, you need to complete the prerequisites mentioned here.
  2. A code signing certificate is required to authenticate the update. A code signing certificate based on the SHA-256 ECDSA algorithm will work with the current demos. An example of how to generate this kind of certificate can be found here.

Scheduling an OTA Update Job

After you build and run the initial executable, you will have to create a second executable and schedule an OTA update job with this image.

  1. Increase the version of the application by setting macro APP_VERSION_BUILD in demos/ota/ota_demo_core_[mqtt/http]/demo_config.h to a different version than what is running.
  2. Rebuild the application using the build steps below into a different directory, say build-dir-2.
  3. Rename the demo executable to reflect the change, e.g. mv ota_demo_core_mqtt ota_demo_core_mqtt2
  4. Create an OTA job:
    1. Go to the AWS IoT Core console.
    2. Manage → Jobs → Create → Create a FreeRTOS OTA update job → Select the corresponding name for your device from the thing list.
    3. Sign a new firmware → Create a new profile → Select any SHA-ECDSA signing platform → Upload the code signing certificate(from prerequisites) and provide its path on the device.
    4. Select the image → Select the bucket you created during the prerequisite steps → Upload the binary build-dir-2/bin/ota_demo2.
    5. The path on device should be the absolute path to place the executable and the binary name: e.g. /home/ubuntu/aws-iot-device-sdk-embedded-C-staging/build-dir/bin/ota_demo_core_mqtt2.
    6. Select the IAM role created during the prerequisite steps.
    7. Create the Job.
  5. Run the initial executable again with the following command: sudo ./ota_demo_core_mqtt or sudo ./ota_demo_core_http.
  6. After the initial executable has finished running, go to the directory where the downloaded firmware image resides, which is the path name used when creating the OTA job.
  7. Change the permissions of the downloaded firmware to make it executable, as it may be downloaded with read (user default) permissions only: chmod 775 ota_demo_core_mqtt2
  8. Run the downloaded firmware image with the following command: sudo ./ota_demo_core_mqtt2

Building and Running Demos

Before building the demos, ensure you have installed the prerequisite software. On Ubuntu 18.04 and 20.04, gcc, cmake, and OpenSSL can be installed with:

sudo apt install build-essential cmake libssl-dev

Build a single demo

  • Go to the root directory of the C-SDK.
  • Run cmake to generate the Makefiles: cmake -S . -Bbuild && cd build
  • Choose a demo from the list below or alternatively, run make help | grep demo:
defender_demo
http_demo_basic_tls
http_demo_mutual_auth
http_demo_plaintext
http_demo_s3_download
http_demo_s3_download_multithreaded
http_demo_s3_upload
jobs_demo_mosquitto
mqtt_demo_basic_tls
mqtt_demo_mutual_auth
mqtt_demo_plaintext
mqtt_demo_serializer
mqtt_demo_subscription_manager
ota_demo_core_http
ota_demo_core_mqtt
pkcs11_demo_management_and_rng
pkcs11_demo_mechanisms_and_digests
pkcs11_demo_objects
pkcs11_demo_sign_and_verify
shadow_demo_main
  • Replace demo_name with your desired demo, then build it: make demo_name
  • Go to the build/bin directory and run any demo executables from there.

Build all configured demos

  • Go to the root directory of the C-SDK.
  • Run cmake to generate the Makefiles: cmake -S . -Bbuild && cd build
  • Run this command to build all configured demos: make
  • Go to the build/bin directory and run any demo executables from there.

Running corePKCS11 demos

The corePKCS11 demos do not require any AWS IoT resources to be set up, and are standalone. The demos build upon each other to introduce concepts in PKCS #11 sequentially. Below is the recommended order.

  1. pkcs11_demo_management_and_rng
  2. pkcs11_demo_mechanisms_and_digests
  3. pkcs11_demo_objects
  4. pkcs11_demo_sign_and_verify
    1. Please note that this demo requires the private and public key generated from pkcs11_demo_objects to be in the directory the demo is executed from.

Alternative option: Docker containers for running demos locally

Install Docker:

curl -fsSL https://get.docker.com -o get-docker.sh

sh get-docker.sh

Installing Mosquitto to run MQTT demos locally

The following instructions have been tested on an Ubuntu 18.04 environment with Docker and OpenSSL installed.

Download the official Docker image for Mosquitto 1.6.14. This version is deliberately chosen so that the Docker container can load certificates from the host system. Any version after 1.6.14 will drop privileges as soon as the configuration file has been read (before TLS certificates are loaded).

docker pull eclipse-mosquitto:1.6.14

If a Mosquitto broker with TLS communication needs to be run, ignore this step and proceed to the next step. A Mosquitto broker with plain text communication can be run by executing the command below.

docker run -it -p 1883:1883 --name mosquitto-plain-text eclipse-mosquitto:1.6.14

Set BROKER_ENDPOINT defined in demos/mqtt/mqtt_demo_plaintext/demo_config.h to localhost.

Ignore the remaining steps unless a Mosquitto broker with TLS communication also needs to be run.

For TLS communication with Mosquitto broker, server and CA credentials need to be created. Use OpenSSL commands to generate the credentials for the Mosquitto server.

# Generate CA key and certificate. Provide the Subject field information as appropriate for the CA certificate.
openssl req -x509 -nodes -sha256 -days 365 -newkey rsa:2048 -keyout ca.key -out ca.crt
# Generate server key and certificate. Provide the Subject field information as appropriate for the Server certificate. Make sure the Common Name (CN) field is different from the root CA certificate.
openssl req -nodes -sha256 -new -keyout server.key -out server.csr
# Sign with the CA cert.
openssl x509 -req -sha256 -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 365

Note: Make sure to use different Common Name (CN) details for the CA and server certificates; otherwise, the SSL handshake fails when both certificates carry exactly the same Common Name (CN).

Create a mosquitto.conf file that uses port 8883 (for TLS communication) and provides the paths to the generated credentials:

port 8883

cafile /mosquitto/config/ca.crt
certfile /mosquitto/config/server.crt
keyfile /mosquitto/config/server.key

# Use this option for TLS mutual authentication (where the client provides a CA-signed certificate)
#require_certificate true
tls_version tlsv1.2
#use_identity_as_username true

Run the Docker container from the local directory containing the generated credentials and the mosquitto.conf file.

docker run -it -p 8883:8883 -v $(pwd):/mosquitto/config/ --name mosquitto-basic-tls eclipse-mosquitto:1.6.14

Update demos/mqtt/mqtt_demo_basic_tls/demo_config.h as follows:
  • Set BROKER_ENDPOINT to localhost.
  • Set ROOT_CA_CERT_PATH to the absolute path of the CA certificate created above for the local Mosquitto server.

Installing httpbin to run HTTP demos locally

Run httpbin through port 80:

docker pull kennethreitz/httpbin
docker run -p 80:80 kennethreitz/httpbin

SERVER_HOST defined in demos/http/http_demo_plaintext/demo_config.h can now be set to localhost.

To run http_demo_basic_tls, download ngrok in order to create an HTTPS tunnel to the httpbin server currently hosted on port 80:

./ngrok http 80 # May have to use ./ngrok.exe depending on OS or filename of the executable

ngrok will provide an https link that can be substituted in demos/http/http_demo_basic_tls/demo_config.h and has a format of https://ABCDEFG12345.ngrok.io.

Set SERVER_HOST in demos/http/http_demo_basic_tls/demo_config.h to the https link provided by ngrok, without https:// preceding it.

You must also download the Root CA certificate provided by the ngrok https link and set ROOT_CA_CERT_PATH in demos/http/http_demo_basic_tls/demo_config.h to the file path of the downloaded certificate.

Installation

The C-SDK libraries and platform abstractions can be installed to a file system through CMake. To do so, run the following command in the root directory of the C-SDK. Note that installation is not required to run any of the demos.

cmake -S . -Bbuild -DBUILD_DEMOS=0 -DBUILD_TESTS=0
cd build
sudo make install

Note that because make install will automatically build the all target, it may be useful to disable building demos and tests with -DBUILD_DEMOS=0 -DBUILD_TESTS=0 unless they have already been configured. Super-user permissions may be needed if installing to a system include or system library path.

To install only a subset of all libraries, pass -DINSTALL_LIBS to install only the libraries you need. By default, all libraries will be installed, but you may exclude any library that you don't need from this list:

-DINSTALL_LIBS="DEFENDER;SHADOW;JOBS;OTA;OTA_HTTP;OTA_MQTT;BACKOFF_ALGORITHM;HTTP;JSON;MQTT;PKCS"

By default, the install path will be in the project directory of the SDK. You can also set -DINSTALL_TO_SYSTEM=1 to install to the system path for headers and libraries in your OS (e.g. /usr/local/include & /usr/local/lib for Linux).

When you run make install, the location of each library is printed first, followed by the location of all installed headers:

-- Installing: /usr/local/lib/libaws_iot_defender.so
-- Installing: /usr/local/lib/libaws_iot_shadow.so
...
-- Installing: /usr/local/include/aws/defender.h
-- Installing: /usr/local/include/aws/defender_config_defaults.h
-- Installing: /usr/local/include/aws/shadow.h
-- Installing: /usr/local/include/aws/shadow_config_defaults.h

You may also set an installation path of your choice by passing the following flags through CMake. Make sure to run the following command in the root directory of the C-SDK:

cmake -S . -Bbuild -DBUILD_DEMOS=0 -DBUILD_TESTS=0 \
-DCSDK_HEADER_INSTALL_PATH="/header/path" -DCSDK_LIB_INSTALL_PATH="/lib/path"
cd build
sudo make install

POSIX platform abstractions are used together with the C-SDK libraries in the demos. By default, these abstractions are also installed but can be excluded by passing the flag: -DINSTALL_PLATFORM_ABSTRACTIONS=0.

Lastly, a custom config path for any specific library can also be specified through the following CMake flags, allowing libraries to be compiled with a config of your choice:

-DDEFENDER_CUSTOM_CONFIG_DIR="defender-config-directory"
-DSHADOW_CUSTOM_CONFIG_DIR="shadow-config-directory"
-DJOBS_CUSTOM_CONFIG_DIR="jobs-config-directory"
-DOTA_CUSTOM_CONFIG_DIR="ota-config-directory"
-DHTTP_CUSTOM_CONFIG_DIR="http-config-directory"
-DJSON_CUSTOM_CONFIG_DIR="json-config-directory"
-DMQTT_CUSTOM_CONFIG_DIR="mqtt-config-directory"
-DPKCS_CUSTOM_CONFIG_DIR="pkcs-config-directory"

Note that the file name of the header should not be included in the directory.

Generating Documentation

Note: For pre-generated documentation, please visit the Releases and Documentation section.

The Doxygen references were created using Doxygen version 1.9.2. To generate the Doxygen pages, use the provided Python script at tools/doxygen/generate_docs.py. Please ensure that each of the library submodules under libraries/standard/ and libraries/aws/ are cloned before using this script.

cd <CSDK_ROOT>
git submodule update --init --recursive --checkout
python3 tools/doxygen/generate_docs.py

The generated documentation landing page is located at docs/doxygen/output/html/index.html.


Author: aws
Source code: https://github.com/aws/aws-iot-device-sdk-embedded-C
License: MIT license



PyTumblr: A Python Tumblr API v2 Client

PyTumblr

Installation

Install via pip:

$ pip install pytumblr

Install from source:

$ git clone https://github.com/tumblr/pytumblr.git
$ cd pytumblr
$ python setup.py install

Usage

Create a client

A pytumblr.TumblrRestClient is the object you'll make all of your calls to the Tumblr API through. Creating one is this easy:

client = pytumblr.TumblrRestClient(
    '<consumer_key>',
    '<consumer_secret>',
    '<oauth_token>',
    '<oauth_secret>',
)

client.info() # Grabs the current user information

Three easy ways to get your credentials are:

  1. The built-in interactive_console.py tool (if you already have a consumer key & secret)
  2. The Tumblr API console at https://api.tumblr.com/console
  3. Get sample login code at https://api.tumblr.com/console/calls/user/info

Supported Methods

User Methods

client.info() # get information about the authenticating user
client.dashboard() # get the dashboard for the authenticating user
client.likes() # get the likes for the authenticating user
client.following() # get the blogs followed by the authenticating user

client.follow('codingjester.tumblr.com') # follow a blog
client.unfollow('codingjester.tumblr.com') # unfollow a blog

client.like(id, reblogkey) # like a post
client.unlike(id, reblogkey) # unlike a post

Blog Methods

client.blog_info(blogName) # get information about a blog
client.posts(blogName, **params) # get posts for a blog
client.avatar(blogName) # get the avatar for a blog
client.blog_likes(blogName) # get the likes on a blog
client.followers(blogName) # get the followers of a blog
client.blog_following(blogName) # get the publicly exposed blogs that [blogName] follows
client.queue(blogName) # get the queue for a given blog
client.submission(blogName) # get the submissions for a given blog

Post Methods

Creating posts

PyTumblr lets you create all of the post types that Tumblr supports. When using these types, there are a few default options that can be used with any post type.

The supported default options are described below.

  • state - a string, the state of the post. Supported types are published, draft, queue, private
  • tags - a list, a list of strings that you want tagged on the post. e.g.: ["testing", "magic", "1"]
  • tweet - a string, the string of the customized tweet you want. e.g.: "Man I love my mega awesome post!"
  • date - a string, the customized GMT date that you want
  • format - a string, the format that your post is in. Supported types are html or markdown
  • slug - a string, the slug for the url of the post you want

We'll show examples of these default options throughout, while showcasing all the specific post types.

Creating a photo post

Creating a photo post supports a bunch of different options plus the default options described above:

  • caption - a string, the user-supplied caption
  • link - a string, the "click-through" url for the photo
  • source - a string, the url of the photo you want to use (use this or the data parameter)
  • data - a list or string, a list of filepaths or a single filepath for multipart file upload

#Creates a photo post using a source URL
client.create_photo(blogName, state="published", tags=["testing", "ok"],
                    source="https://68.media.tumblr.com/b965fbb2e501610a29d80ffb6fb3e1ad/tumblr_n55vdeTse11rn1906o1_500.jpg")

#Creates a photo post using a local filepath
client.create_photo(blogName, state="queue", tags=["testing", "ok"],
                    tweet="Woah this is an incredible sweet post [URL]",
                    data="/Users/johnb/path/to/my/image.jpg")

#Creates a photoset post using several local filepaths
client.create_photo(blogName, state="draft", tags=["jb is cool"], format="markdown",
                    data=["/Users/johnb/path/to/my/image.jpg", "/Users/johnb/Pictures/kittens.jpg"],
                    caption="## Mega sweet kittens")

Creating a text post

Creating a text post supports the same options as the defaults plus two other parameters:

  • title - a string, the optional title for the post. Supports markdown or html
  • body - a string, the body of the post. Supports markdown or html

#Creating a text post
client.create_text(blogName, state="published", slug="testing-text-posts", title="Testing", body="testing1 2 3 4")

Creating a quote post

Creating a quote post supports the same options as the defaults plus two other parameters:

  • quote - a string, the full text of the quote. Supports markdown or html
  • source - a string, the cited source. HTML supported

#Creating a quote post
client.create_quote(blogName, state="queue", quote="I am the Walrus", source="Ringo")

Creating a link post

  • title - a string, the title of post that you want. Supports HTML entities.
  • url - a string, the url that you want to create a link post for.
  • description - a string, the description of the link that you have
#Create a link post
client.create_link(blogName, title="I like to search things, you should too.", url="https://duckduckgo.com",
                   description="Search is pretty cool when a duck does it.")

Creating a chat post

Creating a chat post supports the same options as the defaults plus two other parameters:

  • title - a string, the title of the chat post
  • conversation - a string, the text of the conversation/chat, with dialogue labels (no html)

#Create a chat post
chat = """John: Testing can be fun!
Renee: Testing is tedious and so are you.
John: Aw.
"""
client.create_chat(blogName, title="Renee just doesn't understand.", conversation=chat, tags=["renee", "testing"])

Creating an audio post

Creating an audio post allows for all default options and has three other parameters. The only thing to keep in mind while dealing with audio posts is to make sure that you use either the external_url parameter or data. You cannot use both at the same time.

  • caption - a string, the caption for your post
  • external_url - a string, the url of the site that hosts the audio file
  • data - a string, the filepath of the audio file you want to upload to Tumblr

#Creating an audio file
client.create_audio(blogName, caption="Rock out.", data="/Users/johnb/Music/my/new/sweet/album.mp3")

#lets use soundcloud!
client.create_audio(blogName, caption="Mega rock out.", external_url="https://soundcloud.com/skrillex/sets/recess")

Creating a video post

Creating a video post allows for all default options and has three other options. Like the other post types, it has some restrictions: you cannot use the embed and data parameters at the same time.

  • caption - a string, the caption for your post
  • embed - a string, the HTML embed code for the video
  • data - a string, the path of the file you want to upload

#Creating an upload from YouTube
client.create_video(blogName, caption="Jon Snow. Mega ridiculous sword.",
                    embed="http://www.youtube.com/watch?v=40pUYLacrj4")

#Creating a video post from local file
client.create_video(blogName, caption="testing", data="/Users/johnb/testing/ok/blah.mov")

Editing a post

Updating a post requires knowing what type of post you're updating. You can supply any of the options given above when updating a post.

client.edit_post(blogName, id=post_id, type="text", title="Updated")
client.edit_post(blogName, id=post_id, type="photo", data="/Users/johnb/mega/awesome.jpg")

Reblogging a Post

Reblogging a post just requires knowing the post id and the reblog key, which is supplied in the JSON of any post object.

client.reblog(blogName, id=125356, reblog_key="reblog_key")

Deleting a post

Deleting a post just requires that you own the post and have the post id.

client.delete_post(blogName, 123456) # Deletes your post :(

A note on tags: When passing tags, as params, please pass them as a list (not a comma-separated string):

client.create_text(blogName, tags=['hello', 'world'], ...)

Getting notes for a post

In order to get the notes for a post, you need to have the post id and the blog that it is on.

data = client.notes(blogName, id='123456')

The results include a timestamp you can use to make future calls.

data = client.notes(blogName, id='123456', before_timestamp=data["_links"]["next"]["query_params"]["before_timestamp"])

Tagged Methods

# get posts with a given tag
client.tagged(tag, **params)

Using the interactive console

This client comes with a nice interactive console to run you through the OAuth process and grab your tokens (storing them for future use).

You'll need pyyaml installed to run it, but then it's just:

$ python interactive_console.py

and away you go! Tokens are stored in ~/.tumblr and are also shared by other Tumblr API clients like the Ruby client.

Running tests

The tests (and coverage reports) are run with nose, like this:

python setup.py test

Author: tumblr
Source Code: https://github.com/tumblr/pytumblr
License: Apache-2.0 license



Exploring Mutable and Immutable in Python

In this Python article, let's learn about Mutable and Immutable in Python. 

Mutable and Immutable in Python

Mutable is a fancy way of saying that the internal state of the object is changed/mutated. So, the simplest definition is: An object whose internal state can be changed is mutable. On the other hand, immutable doesn’t allow any change in the object once it has been created.

Both of these states are integral to Python data structures. If you want to become more knowledgeable in the entire Python data structure landscape, take this free course, which covers multiple data structures in Python, including the tuple data structure, which is immutable. You will also receive a certificate on completion, which is sure to add value to your portfolio.

Mutable Definition

Mutable is when something is changeable or has the ability to change. In Python, ‘mutable’ is the ability of objects to change their values. These are often the objects that store a collection of data.

Immutable Definition

Immutable is when no change is possible over time. In Python, if the value of an object cannot be changed over time, then it is known as immutable. Once created, the value of these objects is permanent.

List of Mutable and Immutable objects

Objects of built-in type that are mutable are:

  • Lists
  • Sets
  • Dictionaries
  • User-Defined Classes (It purely depends upon the user to define the characteristics) 

Objects of built-in type that are immutable are:

  • Numbers (Integer, Rational, Float, Decimal, Complex & Booleans)
  • Strings
  • Tuples
  • Frozen Sets
  • User-Defined Classes (It purely depends upon the user to define the characteristics)

Object mutability is one of the characteristics that makes Python a dynamically typed language. Though Mutable and Immutable in Python is a very basic concept, it can at times be a little confusing due to the intransitive nature of immutability.

Objects in Python

In Python, everything is treated as an object. Every object has these three attributes:

  • Identity – This refers to the address that the object refers to in the computer’s memory.
  • Type – This refers to the kind of object that is created. For example- integer, list, string etc. 
  • Value – This refers to the value stored by the object. For example – List=[1,2,3] would hold the numbers 1,2 and 3

While identity and type cannot be changed once an object is created, values can be changed for mutable objects.

Check out this free python certificate course to get started with Python.

Mutable Objects in Python

Rather than diving deep into the theory of mutable and immutable in Python, a simple piece of code would be the best way to depict what they mean. Hence, let us discuss the below code step by step:

#Creating a list which contains names of Indian cities

cities = ['Delhi', 'Mumbai', 'Kolkata']

# Printing the elements from the list cities, separated by a comma & space

for city in cities:
    print(city, end=', ')

Output [1]: Delhi, Mumbai, Kolkata

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(cities)))

Output [2]: 0x1691d7de8c8

#Adding a new city to the list cities

cities.append('Chennai')

#Printing the elements from the list cities, separated by a comma & space

for city in cities:
    print(city, end=', ')

Output [3]: Delhi, Mumbai, Kolkata, Chennai

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(cities)))

Output [4]: 0x1691d7de8c8

The above example shows us that we were able to change the internal state of the object ‘cities’ by adding one more city ‘Chennai’ to it, yet, the memory address of the object did not change. This confirms that we did not create a new object, rather, the same object was changed or mutated. Hence, we can say that the object which is a type of list with reference variable name ‘cities’ is a MUTABLE OBJECT.

Let us now discuss the term IMMUTABLE. Considering that we understood what mutable stands for, it is obvious that the definition of immutable will have ‘NOT’ included in it. Here is the simplest definition of immutable– An object whose internal state can NOT be changed is IMMUTABLE.

Again, if you pay attention to the different error messages thrown by the interpreter or IDE you use, you will be able to identify the immutable objects in Python. For instance, consider the below code & the associated error message while trying to change the value of a Tuple at index 0.

#Creating a Tuple with variable name 'foo'

foo = (1, 2)

#Changing the index[0] value from 1 to 3

foo[0] = 3

TypeError: 'tuple' object does not support item assignment

Immutable Objects in Python

Once again, a simple code would be the best way to depict what immutable stands for. Hence, let us discuss the below code step-by-step:

#Creating a Tuple which contains English names of weekdays

weekdays = 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'

# Printing the elements of tuple weekdays

print(weekdays)

Output [1]: ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(weekdays)))

Output [2]: 0x1691cc35090

#tuples are immutable, so you cannot add new elements; hence, we merge tuples with the + operator to add a new imaginary day to the tuple 'weekdays'

weekdays += 'Pythonday',

#Printing the elements of tuple weekdays

print(weekdays)

Output [3]: ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Pythonday')

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(weekdays)))

Output [4]: 0x1691cc8ad68

The above example shows that we were able to use the same variable name to reference an object which is a type of tuple with seven elements in it. However, the ID or the memory location of the old & new tuple is not the same. We were not able to change the internal state of the object ‘weekdays’. The Python memory manager created a new object at a new memory address, and the variable name ‘weekdays’ started referencing the new object with eight elements in it. Hence, we can say that the object which is a type of tuple with reference variable name ‘weekdays’ is an IMMUTABLE OBJECT.


Where can you use mutable and immutable objects:

Mutable objects can be used where you want to allow for updates. For example, you have a list of employee names in your organization that needs to be updated every time a new member is hired. You can create a mutable list, and it can be updated easily.
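A minimal sketch of that scenario (the names here are made up for illustration):

#a mutable list of employee names that grows as people are hired
employees = ['Asha', 'Ravi']
employees.append('Meera')  #update the same list object in place
print(employees)           #['Asha', 'Ravi', 'Meera']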

Immutability offers a lot of useful applications to different sensitive tasks we do in a network centred environment where we allow for parallel processing. By creating immutable objects, you seal the values and ensure that no threads can invoke overwrite/update to your data. This is also useful in situations where you would like to write a piece of code that cannot be modified. For example, a debug code that attempts to find the value of an immutable object.

Watch out: Non-transitive nature of immutability:

OK! Now we understand what mutable & immutable objects in Python are. Let’s go ahead and discuss the combination of these two and explore the possibilities. How will it behave if you have an immutable object which contains mutable object(s)? Or vice versa? Let us again use code to understand this behaviour–

#creating a tuple (immutable object) which contains 2 lists (mutable) as its elements

#The elements (lists) contain the name, age & gender

person = (['Ayaan', 5, 'Male'], ['Aaradhya', 8, 'Female'])

#printing the tuple

print(person)

Output [1]: (['Ayaan', 5, 'Male'], ['Aaradhya', 8, 'Female'])

#printing the location of the object created in the memory address in hexadecimal format

print(hex(id(person)))

Output [2]: 0x1691ef47f88

#Changing the age for the 1st element. Selecting 1st element of tuple by using indexing [0] then 2nd element of the list by using indexing [1] and assigning a new value for age as 4

person[0][1] = 4

#printing the updated tuple

print(person)

Output [3]: (['Ayaan', 4, 'Male'], ['Aaradhya', 8, 'Female'])

#printing the location of the object created in the memory address in hexadecimal format

print(hex(id(person)))

Output [4]: 0x1691ef47f88

In the above code, you can see that the object ‘person’ is immutable since it is a type of tuple. However, it has two lists as its elements, and we can change the state of the lists (lists being mutable). So, here we did not change the object references inside the tuple, but the referenced object was mutated.


In the same way, let’s explore how it will behave if you have a mutable object which contains immutable objects. Let us again use code to understand the behaviour–

#creating a list (mutable object) which contains tuples (immutable) as its elements

list1 = [(1, 2, 3), (4, 5, 6)]

#printing the list

print(list1)

Output [1]: [(1, 2, 3), (4, 5, 6)]

#printing the location of the object created in the memory address in hexadecimal format

print(hex(id(list1)))

Output [2]: 0x1691d5b13c8	

#changing object reference at index 0

list1[0] = (7, 8, 9)

#printing the list

print(list1)

Output [3]: [(7, 8, 9), (4, 5, 6)]

#printing the location of the object created in the memory address in hexadecimal format

print(hex(id(list1)))

Output [4]: 0x1691d5b13c8

As an individual, it completely depends upon you and your requirements as to what kind of data structure you would like to create with a combination of mutable & immutable objects. I hope that this information will help you while deciding the type of object you would like to select going forward.

Before I end our discussion on IMMUTABILITY, allow me to use the word ‘CAVEAT’ when we discuss Strings and Integers. There is an exception, and you may see some surprising results while checking the truthiness for immutability. For instance:

#creating an object of integer type with value 10 and reference variable name 'x'

x = 10
 

#printing the value of 'x'

print(x)

Output [1]: 10

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(x)))

Output [2]: 0x538fb560

#creating an object of integer type with value 10 and reference variable name 'y'

y = 10

#printing the value of 'y'

print(y)

Output [3]: 10

#Printing the location of the object created in the memory address in hexadecimal format

print(hex(id(y)))

Output [4]: 0x538fb560

As per our discussion and understanding so far, the memory addresses for x & y should have been different, since 10 is an instance of the Integer class, which is immutable. However, as shown in the above code, they have the same memory address. This is not something that we expected, and it shows that what we have discussed has an exception. The reason is that CPython caches (interns) the small integers in the range -5 to 256, so every variable bound to 10 refers to the same cached object.
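You can verify this caching behaviour yourself (a minimal sketch; small-integer caching is a CPython implementation detail, so do not rely on it in real code):

x = 10
y = 10
print(x is y)        #True: both names refer to the same cached small-integer object

a = int("1000")
b = int("1000")
print(a is b)        #False: 1000 lies outside the cached range, so two distinct objects are created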


Immutability of Tuple

Tuples are immutable and hence cannot have any changes in them once they are created in Python. Tuples support the same sequence operations as strings: the index operator will select an element from a tuple just like in a string. However, just as with strings (which we all know are immutable), item assignment is not supported, so the contents of a tuple cannot be replaced after creation. Hence, they are immutable.

Exceptions in immutability

Like everywhere, there are exceptions to immutability in Python too. Not all immutable objects are really immutable. This may lead to some doubts in your mind. Let us take an example to understand this.

Consider a tuple ‘tup’.

Now, if we consider the tuple tup = ('GreatLearning', [4, 3, 1, 2]);

We see that the tuple has elements of different data types. The first element is a string, which as we all know is immutable. The second element is a list, which we all know is mutable. Now, we all know that the tuple itself is an immutable data type: it cannot change its contents. But the list inside it can change its contents. So, the value of the immutable object cannot be changed, but the values of its constituent mutable objects can.
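To see this in code (a minimal sketch of the example above):

tup = ('GreatLearning', [4, 3, 1, 2])
tup[1].append(5)   #mutating the list inside the tuple works
print(tup)         #('GreatLearning', [4, 3, 1, 2, 5])
#tup[0] = 'GL'     #would raise TypeError: 'tuple' object does not support item assignment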

FAQs

1. What is the difference between mutable and immutable in Python?

Mutable Object | Immutable Object
State of the object can be modified after it is created. | State of the object can’t be modified once it is created.
They are not thread safe. | They are thread safe.
Mutable classes are not final. | It is important to make the class final before creating an immutable object.

2. What are the mutable and immutable data types in Python?

  • Some mutable data types in Python are:

list, dictionary, set, user-defined classes.

  • Some immutable data types are: 

int, float, decimal, bool, string, tuple, range.

3. Are lists mutable in Python?

Lists in Python are mutable data types as the elements of the list can be modified, individual elements can be replaced, and the order of elements can be changed even after the list has been created.
(Examples related to lists have been discussed earlier in this blog.)

4. Why are tuples called immutable types?

Tuple and list data structures are very similar, but one big difference between the data types is that lists are mutable, whereas tuples are immutable. The reason for the tuple’s immutability is that once the elements are added to the tuple and the tuple has been created; it remains unchanged.

A programmer would always prefer building a code that can be reused instead of making the whole data object again. Still, even though tuples are immutable, like lists, they can contain any Python object, including mutable objects.

5. Are sets mutable in Python?

A set is an iterable unordered collection of data type which can be used to perform mathematical operations (like union, intersection, difference etc.). Every element in a set is unique and immutable, i.e. no duplicate values should be there, and the values can’t be changed. However, we can add or remove items from the set as the set itself is mutable.
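A minimal sketch contrasting a set with its immutable counterpart, frozenset:

s = {1, 2, 3}
s.add(4)            #the set itself is mutable
print(s)            #{1, 2, 3, 4}

fs = frozenset(s)   #frozenset is the immutable counterpart
#fs.add(5)          #would raise AttributeError: 'frozenset' object has no attribute 'add'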

6. Are strings mutable in Python?

Strings are not mutable in Python. Strings are an immutable data type, which means that their value cannot be updated.
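For example (a minimal sketch):

s = 'python'
#s[0] = 'P'        #would raise TypeError: 'str' object does not support item assignment
s = 'P' + s[1:]    #instead, build a new string and rebind the name
print(s)           #'Python'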



Original article source at: https://www.mygreatlearning.com

#python 

What Is R Programming Language? introduction & Basics

In this R article, we will learn what the R programming language is, along with an introduction and the basics. R is a programming language developed by Ross Ihaka and Robert Gentleman in 1993. R possesses an extensive catalog of statistical and graphical methods. It includes machine learning algorithms, linear regression, time series, and statistical inference, to name a few. Most of the R libraries are written in R, but for heavy computational tasks, C, C++, and Fortran code is preferred.

Data analysis with R is done in a series of steps: programming, transforming, discovering, modeling and communicating the results

  • Program: R is a clear and accessible programming tool
  • Transform: R is made up of a collection of libraries designed specifically for data science
  • Discover: Investigate the data, refine your hypothesis and analyze them
  • Model: R provides a wide array of tools to capture the right model for your data
  • Communicate: Integrate codes, graphs, and outputs to a report with R Markdown or build Shiny apps to share with the world.

What is R used for?

  • Statistical inference
  • Data analysis
  • Machine learning algorithm

In conclusion, R is the world’s most widely used statistics programming language. It’s the first choice of data scientists and is supported by a vibrant and talented community of contributors. R is taught in universities and deployed in mission-critical business applications.

R-environment setup

Windows Installation – We can download the Windows installer version of R from CRAN, e.g. R-3.2.2 for Windows (32/64 bit)
 

It is a Windows installer (.exe) with a name of the form “R-version-win.exe”. You can just double-click and run the installer, accepting the default settings. If your Windows is a 32-bit version, it installs the 32-bit version. But if your Windows is 64-bit, then it installs both the 32-bit and 64-bit versions.

After installation, you can locate the icon to run the program in a directory structure “R\R3.2.2\bin\i386\Rgui.exe” under the Windows Program Files. Clicking this icon brings up the R-GUI which is the R console to do R Programming. 
 

R basic Syntax

R Programming is a very popular programming language that is broadly used in data analysis. The way in which we define its code is quite simple. The “Hello World!” is the basic program for all the languages, and now we will understand the syntax of R programming with the “Hello world” program. We can write our code either in the command prompt, or we can use an R script file.

R command prompt

Once you have the R environment set up, it’s easy to start the R command prompt by just typing the following command at your shell prompt −
$ R
This will launch the R interpreter and you will get a prompt > where you can start typing your program as follows −

> myString <- "Hello, World!"
> print(myString)
[1] "Hello, World!"

Here the first statement defines a string variable myString, where we assign the string “Hello, World!”, and then the next statement, print(), is used to print the value stored in the myString variable.

R data-types

While doing programming in any programming language, you need to use various variables to store various information. Variables are nothing but reserved memory locations to store values. This means that when you create a variable you reserve some space in memory.

In contrast to other programming languages like C and Java, in R the variables are not declared as some data type. The variables are assigned with R-Objects and the data type of the R-object becomes the data type of the variable. There are many types of R-objects. The frequently used ones are −

  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Factors
  • Data Frames

Vectors

#create a vector and find the elements which are >5
v<-c(1,2,3,4,5,6,5,8)
v[v>5]

#subset
subset(v,v>5)

#positions in the vector at which the square of the values of v is >10
which(v*v>10)

#to know the values 
v[v*v>10]

Output: [1] 6 8
Output: [1] 6 8
Output: [1] 4 5 6 7 8
Output: [1] 4 5 6 5 8

Matrices

A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.

#matrices: a vector with two dimensional attributes
mat<-matrix(c(1,2,3,4))
 
mat1<-matrix(c(1,2,3,4),nrow=2)
mat1

Output:
     [,1] [,2]
[1,]    1    3
[2,]    2    4

mat2<-matrix(c(1,2,3,4),ncol=2,byrow=T)
mat2

Output:
     [,1] [,2]
[1,]    1    2
[2,]    3    4

mat3<-matrix(c(1,2,3,4),byrow=T)
mat3

#transpose of matrix
mattrans<-t(mat)
mattrans

#create a character matrix called fruits with elements apple, orange, pear, grapes
fruits<-matrix(c("apple","orange","pear","grapes"),2)
#create 3×4 matrix of marks obtained in each quarterly exams for 4 different subjects 
X<-matrix(c(50,70,40,90,60, 80,50, 90,100, 50,30, 70),nrow=3)
X

#give row names and column names
rownames(X)<-paste("Test.",1:3)
subs<-c("Maths", "English", "Science", "History")
colnames(X)<-subs
X

Output:
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4

Output:
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4

Output:
     [,1] [,2] [,3] [,4]
[1,]   50   90   50   50
[2,]   70   60   90   30
[3,]   40   80  100   70

Output:
        Maths English Science History
Test. 1    50      90      50      50
Test. 2    70      60      90      30
Test. 3    40      80     100      70

Arrays

While matrices are confined to two dimensions, arrays can be of any number of dimensions. The array function takes a dim attribute which creates the required number of dimensions. In the below example we create an array with two elements which are 3×4 matrices each.

#Arrays
arr<-array(1:24,dim=c(3,4,2))
arr

#create an array using alphabets with dimensions 3 rows, 2 columns and 3 arrays
arr1<-array(letters[1:18],dim=c(3,2,3))

#select only 1st two matrix of an array
arr1[,,c(1:2)]

#LIST
X<-list(u=2, n='abc')
X
X$u

Dataframes

Data frames are tabular data objects. Unlike a matrix, in a data frame each column can contain different modes of data. The first column can be numeric while the second column can be character and the third column can be logical. It is a list of vectors of equal length.

#Dataframes
students<-c("J","L","M","K","I","F","R","S")
Subjects<-rep(c("science","maths"),each=2)
marks<-c(55,70,66,85,88,90,56,78)
data<-data.frame(students,Subjects,marks)
#Accessing dataframes
data[[1]]

data$Subjects
data[,1]

Output:
[1] J L M K I F R S
Levels: F I J K L M R S

Output:
data$Subjects
[1] science science maths   maths   science science maths   maths
Levels: maths science

Factors

Factors are the R-objects which are created using a vector. They store the vector along with the distinct values of the elements in the vector as labels. The labels are always character, irrespective of whether the input vector is numeric, character, Boolean, etc. They are useful in statistical modeling.

Factors are created using the factor() function. The nlevels function gives the count of levels.

#Factors
x<-c(1,2,3)
factor(x)

#apply function
data1<-data.frame(age=c(55,34,42,66,77),bmi=c(26,25,21,30,22))
d<-apply(data1,2,mean)
d

#create two vectors age and gender and find mean age with respect to gender
age<-c(33,34,55,54)
gender<-factor(c("m","f","m","f"))
tapply(age,gender,mean)

Output:
[1] 1 2 3
Levels: 1 2 3

Output:
 age  bmi
54.8 24.8

Output:
 f  m
44 44

R Variables

A variable provides us with named storage that our programs can manipulate. A variable in R can store an atomic vector, a group of atomic vectors, or a combination of many R objects. A valid variable name consists of letters, numbers, and the dot or underscore characters.

Rules for writing Identifiers in R

  1. Identifiers can be a combination of letters, digits, period (.), and underscore (_).
  2. It must start with a letter or a period. If it starts with a period, it cannot be followed by a digit.
  3. Reserved words in R cannot be used as identifiers.

Valid identifiers in R

total, sum, .fine.with.dot, this_is_acceptable, Number5

Invalid identifiers in R

tot@l, 5um, _fine, TRUE, .0ne

Best Practices

Earlier versions of R used underscore (_) as an assignment operator. So, the period (.) was used extensively in variable names having multiple words. Current versions of R support underscore as a valid identifier but it is good practice to use a period as word separators.
For example, a.variable.name is preferred over a_variable_name or alternatively we could use camel case as aVariableName.

Constants in R

Constants, as the name suggests, are entities whose value cannot be altered. Basic types of constant are numeric constants and character constants.

Numeric Constants

All numbers fall under this category. They can be of type integer, double or complex. It can be checked with the typeof() function.
Numeric Constants followed by L are regarded as integers and those followed by i are regarded as complex.

> typeof(5)
[1] "double"
> typeof(5L)
[1] "integer"
> typeof(5i)
[1] "complex"

Character Constants

Character constants can be represented using either single quotes (‘) or double quotes (“) as delimiters.

> 'example'
> typeof("5")

[1] "example" [1] "character"

R Operators

Operators – Arithmetic, Relational, Logical, Assignment, and some of the Miscellaneous Operators that R programming language provides. 

There are five main categories of Operators in the R programming language.

  1. Arithmetic Operators
  2. Relational Operators
  3. Logical Operators
  4. Assignment Operators
  5. Miscellaneous Operators

x <- 35
y <- 10

> x+y
[1] 45
> x-y
[1] 25
> x*y
[1] 350
> x/y
[1] 3.5
> x%/%y
[1] 3
> x%%y
[1] 5
> x^y
[1] 2.75e+15

Logical Operators

The operators below are the logical operators in R. The operators & and | perform element-wise operations, producing a result having the length of the longer operand. But && and || examine only the first element of the operands, resulting in a single-length logical vector.

a <- c(TRUE,TRUE,FALSE,0,6,7)
b <- c(FALSE,TRUE,FALSE,TRUE,TRUE,TRUE)
a&b 
[1] FALSE TRUE FALSE FALSE TRUE TRUE
a&&b
[1] FALSE
> a|b
[1] TRUE TRUE FALSE TRUE TRUE TRUE
> a||b
[1] TRUE
> !a
[1] FALSE FALSE TRUE TRUE FALSE FALSE
> !b
[1] TRUE FALSE TRUE FALSE FALSE FALSE

R functions

Functions are defined using the function() directive and are stored as R objects just like anything else. In particular, they are R objects of class “function”. Here’s a simple function that takes no arguments and simply prints ‘Hi statistics!!!’.

#define the function
f <- function() {
print("Hi statistics!!!")
}
#Call the function
f()

Output: [1] "Hi statistics!!!"

Now let’s define a function called standardize, and the function has a single argument x which is used in the body of a function.

#Define the function that will calculate the standardized score.
standardize = function(x) {
  m = mean(x)
  sd = sd(x)
  result = (x - m) / sd
  result
}
input<- c(40:50) #Take input for what we want to calculate a standardized score.
standardize(input) #Call the function

Output:
standardize(input) #Call the function
 [1] -1.5075567 -1.2060454 -0.9045340 -0.6030227 -0.3015113  0.0000000  0.3015113  0.6030227  0.9045340  1.2060454  1.5075567

Loop Functions

R has some very useful functions which implement looping in a compact form to make life easier. This very rich and powerful family of apply functions is made of intrinsically vectorized functions. These functions in R allow you to apply a function to a series of objects (e.g. vectors, matrices, data frames, or files). They include:

  1. lapply(): Loop over a list and evaluate a function on each element
  2. sapply(): Same as lapply but try to simplify the result
  3. apply(): Apply a function over the margins of an array
  4. tapply(): Apply a function over subsets of a vector
  5. mapply(): Multivariate version of lapply

There is another function called split() which is also useful, particularly in conjunction with lapply.

R Vectors

A vector is a sequence of data elements of the same basic type. Members in a vector are officially called components. Vectors are the most basic R data objects and there are six types of atomic vectors. They are logical, integer, double, complex, character, and raw.

The c() function can be used to create vectors of objects by concatenating things together. 
x <- c(1,2,3,4,5) #double
x #If you use only x auto-printing occurs
l <- c(TRUE, FALSE) #logical
l <- c(T, F) ## logical
c <- c("a", "b", "c", "d") ## character
i <- 1:20 ## integer
cm <- c(2+2i, 3+3i) ## complex
print(l)
print(c)
print(i)
print(cm)

You can see the type of each vector using typeof() function in R.
typeof(x)
typeof(l)
typeof(c)
typeof(i)
typeof(cm)

Output:
print(l)
[1]  TRUE FALSE
print(c)
[1] "a" "b" "c" "d"
print(i)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
print(cm)
[1] 2+2i 3+3i

Output:
typeof(x)
[1] "double"
typeof(l)
[1] "logical"
typeof(c)
[1] "character"
typeof(i)
[1] "integer"
typeof(cm)
[1] "complex"

Creating a vector using seq() function:

We can use the seq() function to create a vector within an interval by specifying step size or specifying the length of the vector. 

seq(1:10) #By default it will be incremented by 1
seq(1, 20, length.out=5) # specify length of the vector
seq(1, 20, by=2) # specify step size

Output:
> seq(1:10) #By default it will be incremented by 1
 [1]  1  2  3  4  5  6  7  8  9 10
> seq(1, 20, length.out=5) # specify length of the vector
[1]  1.00  5.75 10.50 15.25 20.00
> seq(1, 20, by=2) # specify step size
 [1]  1  3  5  7  9 11 13 15 17 19

Extract Elements from a Vector:

Elements of a vector can be accessed using indexing. The vector indexing can be logical, integer, or character. The [ ] brackets are used for indexing. Indexing starts with position 1, unlike most programming languages where indexing starts from 0.

Extract Using Integer as Index:

We can use integers as an index to access specific elements. We can also use negative integers to return all elements except that specific element.

x<- 101:110
x[1]   #access the first element
x[c(2,3,4,5)] #Extract 2nd, 3rd, 4th, and 5th elements
x[5:10]        #Extract all elements from 5th to 10th
x[c(-5,-10)] #Extract all elements except 5th and 10th
x[-c(5:10)] #Extract all elements except from 5th to 10th 

Output:
x[1] #Extract the first element
[1] 101
x[c(2,3,4,5)] #Extract 2nd, 3rd, 4th, and 5th elements
[1] 102 103 104 105
x[5:10] #Extract all elements from 5th to 10th
[1] 105 106 107 108 109 110
x[c(-5,-10)] #Extract all elements except 5th and 10th
[1] 101 102 103 104 106 107 108 109
x[-c(5:10)] #Extract all elements except from 5th to 10th
[1] 101 102 103 104

Extract Using Logical Vector as Index:

If you use a logical vector for indexing, the position where the logical vector is TRUE will be returned.

x[x < 105]
x[x>=104]

Output:
x[x < 105]
[1] 101 102 103 104
x[x>=104]
[1] 104 105 106 107 108 109 110

Modify a Vector in R:

We can modify a vector and assign a new value to it. You can truncate a vector by using reassignments. Check the below example. 

x<- 10:12
x[1]<- 101 #Modify the first element
x
x[2]<-102 #Modify the 2nd element
x
x<- x[1:2] #Truncate the last element
x 

Output:
x
[1] 101  11  12
x[2]<-102 #Modify the 2nd element
x
[1] 101 102  12
x<- x[1:2] #Truncate the last element
x
[1] 101 102

Arithmetic Operations on Vectors:

We can use arithmetic operations on two vectors of the same length. They can be added, subtracted, multiplied, or divided. Check the output of the below code.

# Create two vectors.
v1 <- c(1:10)
v2 <- c(101:110)

# Vector addition.
add.result <- v1+v2
print(add.result)
# Vector subtraction.
sub.result <- v2-v1
print(sub.result)
# Vector multiplication.
multi.result <- v1*v2
print(multi.result)
# Vector division.
divi.result <- v2/v1
print(divi.result)

Output:
print(add.result)
 [1] 102 104 106 108 110 112 114 116 118 120
print(sub.result)
 [1] 100 100 100 100 100 100 100 100 100 100
print(multi.result)
 [1] 101 204 309 416 525 636 749 864 981 1100
print(divi.result)
 [1] 101.00000  51.00000  34.33333  26.00000  21.00000  17.66667  15.28571  13.50000  12.11111  11.00000

Find Minimum and Maximum in a Vector:

The minimum and the maximum of a vector can be found using the min() or the max() function. range() is also available which returns the minimum and maximum in a vector.

x<- 1001:1010
max(x) # Find the maximum
min(x) # Find the minimum
range(x) #Find the range

Output:
max(x) # Find the maximum
[1] 1010
min(x) # Find the minimum
[1] 1001
range(x) #Find the range
[1] 1001 1010

R Lists

The list is a data structure having elements of mixed data types. A vector having all elements of the same type is called an atomic vector, but a vector having elements of different types is called a list.
We can check the type with the typeof() or class() function and find the length using the length() function.

x <- list("stat",5.1, TRUE, 1 + 4i)
x
class(x)
typeof(x)
length(x)

Output:
x
[[1]]
[1] "stat"

[[2]]
[1] 5.1

[[3]]
[1] TRUE

[[4]]
[1] 1+4i

class(x)
[1] "list"
typeof(x)
[1] "list"
length(x)
[1] 4

You can create an empty list of a prespecified length with the vector() function.

x <- vector("list", length = 10)
x

Output:
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
[[7]]
NULL
[[8]]
NULL
[[9]]
NULL
[[10]]
NULL

How to extract elements from a list?

Lists can be subset using two syntaxes, the $ operator, and square brackets []. The $ operator returns a named element of a list. The [] syntax returns a list, while the [[]] returns an element of a list.

# subsetting
# (the outputs below assume a list l created earlier; reconstructed from them, it would be something like:)
l <- list(1:4, FALSE, "Hello Statistics!", d = function(arg = 42) print("Hello World!"), e = diag(10))

l$e
l["e"]
l[1:2]
l[c(1:2)] #index using integer vector
l[-c(3:length(l))] #negative index to exclude elements from 3rd up to last.
l[c(T,F,F,F,F)] # logical index to access elements

Output:
> l$e
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    0    0    0    0    0    0    0     0
 [2,]    0    1    0    0    0    0    0    0    0     0
 [3,]    0    0    1    0    0    0    0    0    0     0
 [4,]    0    0    0    1    0    0    0    0    0     0
 [5,]    0    0    0    0    1    0    0    0    0     0
 [6,]    0    0    0    0    0    1    0    0    0     0
 [7,]    0    0    0    0    0    0    1    0    0     0
 [8,]    0    0    0    0    0    0    0    1    0     0
 [9,]    0    0    0    0    0    0    0    0    1     0
[10,]    0    0    0    0    0    0    0    0    0     1

> l["e"]
$e
(the same 10×10 identity matrix as above)

> l[1:2]
[[1]]
[1] 1 2 3 4

[[2]]
[1] FALSE

> l[c(1:2)] #index using integer vector
[[1]]
[1] 1 2 3 4

[[2]]
[1] FALSE

> l[-c(3:length(l))] #negative index to exclude elements from 3rd up to last
[[1]]
[1] 1 2 3 4

[[2]]
[1] FALSE

> l[c(T,F,F,F,F)] # logical index to access elements
[[1]]
[1] 1 2 3 4

Modifying a List in R:

We can change components of a list through reassignment.

l[["name"]] <- "Kalyan Nandi"
l

Output:
[[1]]
[1] 1 2 3 4

[[2]]
[1] FALSE

[[3]]
[1] "Hello Statistics!"

$d
function (arg = 42) {
print("Hello World!")
}

$name
[1] "Kalyan Nandi"

R Matrices

In R Programming Matrix is a two-dimensional data structure. They contain elements of the same atomic types. A Matrix can be created using the matrix() function. R can also be used for matrix calculations. Matrices have rows and columns containing a single data type. In a matrix, the order of rows and columns is important. Dimension can be checked directly with the dim() function and all attributes of an object can be checked with the attributes() function. Check the below example.

Creating a matrix in R

m <- matrix(nrow = 2, ncol = 3)
dim(m)
attributes(m)
m <- matrix(1:20, nrow = 4, ncol = 5)
m

Output:
dim(m)
[1] 2 3
attributes(m)
$dim
[1] 2 3
m <- matrix(1:20, nrow = 4, ncol = 5)
m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions.

x<-1:3
y<-10:12
z<-30:32
cbind(x,y,z)
rbind(x,y,z)

Output:
cbind(x,y,z)
     x  y  z
[1,] 1 10 30
[2,] 2 11 31
[3,] 3 12 32
rbind(x,y,z)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
z   30   31   32

By default, the matrix function reorders a vector into columns, but we can also tell R to use rows instead.

x <-1:9
matrix(x, nrow = 3, ncol = 3)
matrix(x, nrow = 3, ncol = 3, byrow = TRUE)

Output:
matrix(x, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
matrix(x, nrow = 3, ncol = 3, byrow = TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

R Arrays

In R, Arrays are the data types that can store data in more than two dimensions. An array can be created using the array() function. It takes vectors as input and uses the values in the dim parameter to create an array. If you create an array of dimensions (2, 3, 4) then it creates 4 rectangular matrices each with 2 rows and 3 columns. Arrays can store only one data type.

Give a Name to Columns and Rows:

We can give names to the rows, columns, and matrices in the array by setting the dimnames parameter.

v1 <- c(1,2,3)
v2 <- 100:110
col.names <- c("Col1","Col2","Col3","Col4","Col5","Col6","Col7")
row.names <- c("Row1","Row2")
matrix.names <- c("Matrix1","Matrix2")
arr4 <- array(c(v1,v2), dim=c(2,7,2), dimnames = list(row.names,col.names, matrix.names))
arr4

Output:
, , Matrix1
     Col1 Col2 Col3 Col4 Col5 Col6 Col7
Row1    1    3  101  103  105  107  109
Row2    2  100  102  104  106  108  110

, , Matrix2
     Col1 Col2 Col3 Col4 Col5 Col6 Col7
Row1    1    3  101  103  105  107  109
Row2    2  100  102  104  106  108  110

Accessing/Extracting Array Elements:

# Print the 2nd row of the 1st matrix of the array.
print(arr4[2,,1])
# Print the element in the 2nd row and 4th column of the 2nd matrix.
print(arr4[2,4,2])
# Print the 2nd Matrix.
print(arr4[,,2])

Output:
> print(arr4[2,,1])
Col1 Col2 Col3 Col4 Col5 Col6 Col7
   2  100  102  104  106  108  110

> # Print the element in the 2nd row and 4th column of the 2nd matrix.
> print(arr4[2,4,2])
[1] 104

> # Print the 2nd Matrix.
> print(arr4[,,2])
     Col1 Col2 Col3 Col4 Col5 Col6 Col7
Row1    1    3  101  103  105  107  109
Row2    2  100  102  104  106  108  110

R Factors

Factors are used to represent categorical data and can be unordered or ordered. An example might be “Male” and “Female” if we consider gender. Factor objects can be created with the factor() function.

x <- factor(c("male", "female", "male", "male", "female"))
x
table(x)

Output:
x
[1] male   female male   male   female
Levels: female male
table(x)
x
female   male
     2      3

By default, Levels are put in alphabetical order. If you print the above code you will get levels as female and male. But if you want to get your levels in a particular order then set levels parameter like this.

x <- factor(c("male", "female", "male", "male", "female"), levels=c("male", "female"))
x
table(x)

Output:
x
[1] male   female male   male   female
Levels: male female
table(x)
x
  male female
     3      2

R Dataframes

Data frames are used to store tabular data in R. They are an important type of object in R and are used in a variety of statistical modeling applications. Data frames are represented as a special type of list where every element of the list has to have the same length. Each element of the list can be thought of as a column and the length of each element of the list is the number of rows. Unlike matrices, data frames can store different classes of objects in each column. Matrices must have every element be the same class (e.g. all integers or all numeric).

Creating a Data Frame:

Data frames can be created explicitly with the data.frame() function.

employee <- c('Ram','Sham','Jadu')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2016-11-1','2015-3-25','2017-3-14'))
employ_data <- data.frame(employee, salary, startdate)
employ_data
View(employ_data)

Output:
  employee salary  startdate
1      Ram  21000 2016-11-01
2     Sham  23400 2015-03-25
3     Jadu  26800 2017-03-14

Get the Structure of the Data Frame:

If you look at the structure of the data frame now, you see that the variable employee is a character vector, as shown in the following output:

str(employ_data)

Output:
> str(employ_data)
'data.frame': 3 obs. of 3 variables:
 $ employee : Factor w/ 3 levels "Jadu","Ram","Sham": 2 3 1
 $ salary   : num 21000 23400 26800
 $ startdate: Date, format: "2016-11-01" "2015-03-25" "2017-03-14"

Note that the first column, employee, is of type factor, instead of a character vector. By default, data.frame() function converts character vector into factor. To suppress this behavior, we can pass the argument stringsAsFactors=FALSE.

employ_data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
str(employ_data)

Output:
'data.frame': 3 obs. of 3 variables:
 $ employee : chr "Ram" "Sham" "Jadu"
 $ salary   : num 21000 23400 26800
 $ startdate: Date, format: "2016-11-01" "2015-03-25" "2017-03-14"

R Packages

The primary location for obtaining R packages is CRAN.

You can obtain information about the available packages on CRAN with the available.packages() function.
a <- available.packages()

head(rownames(a), 30) # Show the names of the first 30 packages
Packages can be installed with the install.packages() function in R. To install a single package, pass the name of the package to install.packages() as the first argument.
The following code installs the ggplot2 package from CRAN.
install.packages("ggplot2")
You can install multiple R packages at once with a single call to install.packages(). Place the names of the R packages in a character vector.
install.packages(c("caret", "ggplot2", "dplyr"))
 

Loading packages
Installing a package does not make it immediately available to you in R; you must load the package. The library() function is used to load packages into R. The following code is used to load the ggplot2 package into R. Do not put the package name in quotes.
library(ggplot2)
If you have installed your packages without root access using the command install.packages("ggplot2", lib="/data/Rpackages/"), then to load them use the command below.
library(ggplot2, lib.loc="/data/Rpackages/")
After loading a package, the functions exported by that package will be attached to the top of the search() list (after the workspace).
library(ggplot2)

search()

R – CSV Files

In R, we can read data from files stored outside the R environment. We can also write data into files that will be stored and accessed by the operating system. R can read and write into various file formats like CSV, Excel, XML, etc.

Getting and Setting the Working Directory

We can check which directory the R workspace is pointing to using the getwd() function. You can also set a new working directory using the setwd() function.

# Get and print current working directory.
print(getwd())

# Set current working directory.
setwd("/web/com")

# Get and print current working directory.
print(getwd())

Output: [1] "/web/com/1441086124_2016" [1] "/web/com"

Input as CSV File

The CSV file is a text file in which the values in the columns are separated by a comma. Let’s consider the following data present in the file named input.csv.

You can create this file using Windows Notepad by copying and pasting this data. Save the file as input.csv using the Save As All Files (*.*) option in Notepad.

Reading a CSV File

Following is a simple example of read.csv() function to read a CSV file available in your current working directory −

data <- read.csv("input.csv")
print(data)

The file has the columns: id, name, salary, start_date, dept.

R- Charts and Graphs

R- Pie Charts

Pie charts are created with the function pie(x, labels=) where x is a non-negative numeric vector indicating the area of each slice and labels= notes a character vector of names for the slices.

Syntax

The basic syntax for creating a pie-chart using the R is −

pie(x, labels, radius, main, col, clockwise)

Following is the description of the parameters used −

  • x is a vector containing the numeric values used in the pie chart.
  • labels are used to give a description of the slices.
  • radius indicates the radius of the circle of the pie chart. (value between −1 and +1).
  • main indicates the title of the chart.
  • col indicates the color palette.
  • clockwise is a logical value indicating if the slices are drawn clockwise or anti-clockwise.

Simple Pie chart

# Simple Pie Chart
slices <- c(10, 12,4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie(slices, labels = lbls, main="Pie Chart of Countries")

 

3-D pie chart

The pie3D( ) function in the plotrix package provides 3D exploded pie charts.

# 3D Exploded Pie Chart
library(plotrix)
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie3D(slices,labels=lbls,explode=0.1,
   main="Pie Chart of Countries ")

R -Bar Charts

A bar chart represents data in rectangular bars with a length of the bar proportional to the value of the variable. R uses the function barplot() to create bar charts. R can draw both vertical and Horizontal bars in the bar chart. In the bar chart, each of the bars can be given different colors.

Let us suppose, we have a vector of maximum temperatures (in degree Celsius) for seven days as follows.

max.temp <- c(22, 27, 26, 24, 23, 26, 28)
barplot(max.temp)

The barplot() function can take several more arguments to customize the chart. Some of the frequently used ones are “main” to give the title, “xlab” and “ylab” to provide labels for the axes, names.arg for naming each bar, “col” to define color, etc.

We can also plot bars horizontally by providing the argument horiz=TRUE.

# barchart with added parameters
barplot(max.temp,
main = "Maximum Temperatures in a Week",
xlab = "Degree Celsius",
ylab = "Day",
names.arg = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
col = "darkred",
horiz = TRUE)

Suppose we have the ages of 10 students in a vector, say age <- c(16,17,17,18,18,18,18,18,18,19). Simply doing barplot(age) will not give us the required plot. It will plot 10 bars with height equal to each student’s age. But we want to know the number of students in each age category.

This count can be quickly found using the table() function, as shown below.

> table(age)
age
16 17 18 19 
1  2  6  1

Now plotting this data will give our required bar plot. Note below, that we define the argument “density” to shade the bars.

barplot(table(age),
main="Age Count of 10 Students",
xlab="Age",
ylab="Count",
border="red",
col="blue",
density=10
)

 

R – Histograms

A histogram represents the frequencies of values of a variable bucketed into ranges. A histogram is similar to a bar chart, but the difference is that it groups the values into continuous ranges. Each bar in a histogram represents the number of values present in that range.

R creates histogram using hist() function. This function takes a vector as an input and uses some more parameters to plot histograms.

Syntax

The basic syntax for creating a histogram using R is −

hist(v,main,xlab,xlim,ylim,breaks,col,border)

Following is the description of the parameters used −

  • v is a vector containing numeric values used in the histogram.
  • main indicates the title of the chart.
  • col is used to set the color of the bars.
  • border is used to set the border color of each bar.
  • xlab is used to give a description of the x-axis.
  • xlim is used to specify the range of values on the x-axis.
  • ylim is used to specify the range of values on the y-axis.
  • breaks are used to mention the width of each bar.

Example

A simple histogram is created using input vector, label, col, and border parameters.

The script given below will create and save the histogram in the current R working directory.

# Create data for the graph.
v <-  c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram.png")

# Create the histogram.
hist(v,xlab = "Weight",col = "yellow",border = "blue")

# Save the file.
dev.off()

 

Range of X and Y values

To specify the range of values allowed in X axis and Y axis, we can use the xlim and ylim parameters.

The width of each bar can be decided by using breaks.

# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram_lim_breaks.png")

# Create the histogram.
hist(v,xlab = "Weight",col = "green",border = "red", xlim = c(0,40), ylim = c(0,5),
   breaks = 5)

# Save the file.
dev.off()

R vs SAS – Which Tool is Better?

The debate around data analytics tools has been going on forever. Each time a new one comes out, comparisons transpire. Although many aspects of the tool remain subjective, beginners want to know which tool is better to start with.
The most popular and widely used tools for data analytics are R and SAS. Both of them have been around for a long time and are often pitted against each other. So, let’s compare them based on the most relevant factors.

  1. Availability and Cost: SAS is widely used in most private organizations as it is a commercial software. It is more expensive than any other data analytics tool available. It might thus be a bit difficult buying the software if you are an individual professional or a student starting out. On the other hand, R is an open source software and is completely free to use. Anyone can begin using it right away without having to spend a penny. So, regarding availability and cost, R is hands down the better tool.
  2. Ease of learning: Since SAS is a commercial software, it has a whole lot of online resources available. Also, those who already know SQL might find it easier to adapt to SAS as it comes with PROC SQL option. The tool has a user-friendly GUI. It comes with an extensive documentation and tutorial base which can help early learners get started seamlessly. Whereas, the learning curve for R is quite steep. You need to learn to code at the root level and carrying out simple tasks demand a lot of time and effort with R. However, several forums and online communities post religiously about its usage.
  3. Data Handling Capabilities: When it comes to data handling, both SAS and R perform well, but there are some caveats for the latter. While SAS can even churn through terabytes of data with ease, R might be constrained as it makes use of the available RAM in the machine. This can be a hassle for 32-bit systems with low RAM capacity. Due to this, R can at times become unresponsive or give an ‘out of memory’ error. Both of them can run parallel computations, support integrations for Hadoop, Spark, Cloudera and Apache Pig among others. Also, the availability of devices with better RAM capacity might negate the disadvantages of R.
  4. Graphical Capabilities: Graphical capabilities or data visualization is the strongest forte of R. This is where SAS lags behind in a major way. R has access to packages like GGPlot, RGIS, Lattice, and GGVIS among others which provide superior graphical competency. In comparison, Base SAS is struggling hard to catch up with the advancements in graphics and visualization in data analytics. Even the graphics packages available in SAS are poorly documented which makes them difficult to use.
  5. Advancements in Tool: Advancements in the industry give way to advancements in tools, and both SAS and R hold up pretty well in this regard. SAS, being a corporate software, rolls out new features and technologies frequently with new versions of its software. However, the updates are not as fast as R since it is open source software and has many contributors throughout the world. Alternatively, the latest updates in SAS are pushed out after thorough testing, making them much more stable, and reliable than R. Both the tools come with a fair share of pros & cons.
  6. Job Scenario: Currently, large corporations insist on using SAS, but SMEs and start-ups are increasingly opting for R, given that it’s free. The current job trend seems to show that while SAS is losing its momentum, R is gaining potential. The job scenario is on the cusp of change, and both the tools seem strong, but since R is on an uphill path, it can probably witness more jobs in the future, albeit not in huge corporates.
  7. Deep Learning Support: While SAS has just begun work on adding deep learning support, R has added support for a few packages which enable deep learning capabilities in the tool. You can use KerasR and keras package in R which are mere interfaces for the original Keras package built on Python. Although none of the tools are excellent facilitators of deep learning, R has seen some recent active developments on this front.
  8. Customer Service Support and Community: As one would expect from full-fledged commercial software, SAS offers excellent customer service support as well as the backing of a helpful community. Since R is free open-source software, expecting customer support will be hard to justify. However, it has a vast online community that can help you with almost everything. On the other hand, no matter what problem you face with SAS, you can immediately reach out to their customer support and get it solved without any hassles.

Final Verdict
As per estimations by the Economic Times, the analytics industry in India will grow to $16 billion by 2025. If you wish to venture into this domain, there can’t be a better time. Just start learning the tool you think is better based on the comparison points above.


Original article source at: https://www.mygreatlearning.com

#r #programming 

Shubham Ankit

How to Automate Excel with Python | Python Excel Tutorial (OpenPyXL)

How to Automate Excel with Python

In this article, we will show how we can use Python to automate Excel. A useful Python library for this is Openpyxl, which we will use to learn Excel automation.

What is OPENPYXL

Openpyxl is a Python library that is used to read from an Excel file or write to an Excel file. Data scientists use Openpyxl for data analysis, data copying, data mining, drawing charts, styling sheets, adding formulas, and more.

Workbook: A spreadsheet is represented as a workbook in openpyxl. A workbook consists of one or more sheets.

Sheet: A sheet is a single page composed of cells for organizing data.

Cell: The intersection of a row and a column is called a cell. Usually represented by A1, B5, etc.

Row: A row is a horizontal line represented by a number (1,2, etc.).

Column: A column is a vertical line represented by a capital letter (A, B, etc.).

Openpyxl can be installed using the pip command and it is recommended to install it in a virtual environment.

pip install openpyxl
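To confirm the installation worked, a minimal check:

import openpyxl
print(openpyxl.__version__)  #prints the installed version string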

CREATE A NEW WORKBOOK

We start by creating a new spreadsheet, which is called a workbook in Openpyxl. We import the workbook module from Openpyxl and use the function Workbook() which creates a new workbook.

from openpyxl import Workbook

#creates a new workbook
wb = Workbook()

#Gets the first active worksheet
ws = wb.active

#creating new worksheets by using the create_sheet method
ws1 = wb.create_sheet("sheet1", 0) #inserts at first position
ws2 = wb.create_sheet("sheet2") #inserts at last position
ws3 = wb.create_sheet("sheet3", -1) #inserts at penultimate position

#Renaming the sheet
ws.title = "Example"

#save the workbook
wb.save(filename = "example.xlsx")

READING DATA FROM WORKBOOK

We load the file using the function load_Workbook() which takes the filename as an argument. The file must be saved in the same working directory.

#loading a workbook
import openpyxl
wb = openpyxl.load_workbook("example.xlsx")

 

GETTING SHEETS FROM THE LOADED WORKBOOK

 

#getting sheet names
wb.sheetnames
result = ['sheet1', 'Sheet', 'sheet3', 'sheet2']

#getting a particular sheet
sheet1 = wb["sheet2"]

#getting sheet title
sheet1.title
result = 'sheet2'

#Getting the active sheet
sheetactive = wb.active
result = 'sheet1'

 

ACCESSING CELLS AND CELL VALUES

 

#get a cell from the sheet
sheet1["A1"]
<Cell 'Sheet1'.A1>

#get the cell value
ws["A1"].value
'Segment'

#accessing cell using row and column and assigning a value
d = ws.cell(row = 4, column = 2, value = 10)
d.value
10

 

ITERATING THROUGH ROWS AND COLUMNS

 

#looping through each row and column
for x in range(1, 5):
    for y in range(1, 5):
        print(x, y, ws.cell(row = x, column = y).value)

#getting the highest row number
ws.max_row
701

#getting the highest column number
ws.max_column
19

There are two functions for iterating through rows and columns.

iter_rows() => returns the rows
iter_cols() => returns the columns

The arguments (min_row = 4, max_row = 5, min_col = 2, max_col = 5) can be used to set the boundaries for any iteration.

Example:

#iterating rows
for row in ws.iter_rows(min_row = 2, max_col = 3, max_row = 3):
    for cell in row:
        print(cell)

<Cell 'Sheet1'.A2>
<Cell 'Sheet1'.B2>
<Cell 'Sheet1'.C2>
<Cell 'Sheet1'.A3>
<Cell 'Sheet1'.B3>
<Cell 'Sheet1'.C3>

#iterating columns
for col in ws.iter_cols(min_row = 2, max_col = 3, max_row = 3):
    for cell in col:
        print(cell)

<Cell 'Sheet1'.A2>
<Cell 'Sheet1'.A3>
<Cell 'Sheet1'.B2>
<Cell 'Sheet1'.B3>
<Cell 'Sheet1'.C2>
<Cell 'Sheet1'.C3>

To get all the rows of the worksheet we use the method worksheet.rows and to get all the columns of the worksheet we use the method worksheet.columns. Similarly, to iterate only through the values we use the method worksheet.values.


Example:

for row in ws.values:
    for value in row:
        print(value)
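Similarly, a minimal sketch iterating with worksheet.rows (assuming ws is a loaded worksheet, as in the earlier examples):

#print each row as a list of cell values
for row in ws.rows:
    print([cell.value for cell in row])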

 

WRITING DATA TO AN EXCEL FILE

Writing to a workbook can be done in many ways such as adding a formula, adding charts, images, updating cell values, inserting rows and columns, etc… We will discuss each of these with an example.

 

CREATING AND SAVING A NEW WORKBOOK

 

#creates a new workbook
wb = openpyxl.Workbook()

#saving the workbook
wb.save("new.xlsx")

 

ADDING AND REMOVING SHEETS

 

#creating a new sheet
ws1 = wb.create_sheet(title = "sheet 2")

#creating a new sheet at index 0
ws2 = wb.create_sheet(index = 0, title = "sheet 0")

#checking the sheet names
wb.sheetnames['sheet 0', 'Sheet', 'sheet 2']

#deleting a sheet
del wb['sheet 0']

#checking sheetnames
wb.sheetnames['Sheet', 'sheet 2']

 

ADDING CELL VALUES

 

#checking the sheet value
ws['B2'].value
None

#adding value to cell
ws['B2'] = 367

#checking value
ws['B2'].value
367

 

ADDING FORMULAS

 

We often require formulas to be included in our Excel datasheet. We can easily add formulas using the Openpyxl module just like you add values to a cell.
 

For example:

import openpyxl
from openpyxl import Workbook

wb = openpyxl.load_workbook("new1.xlsx")
ws = wb['Sheet']

ws['A9'] = '=SUM(A2:A8)'

wb.save("new2.xlsx")

The above program will add the formula (=SUM(A2:A8)) in cell A9 of new2.xlsx.

 

MERGE/UNMERGE CELLS

Two or more cells can be merged to a rectangular area using the method merge_cells(), and similarly, they can be unmerged using the method unmerge_cells().

For example:
Merge cells

#merge cells B2 to C9
ws.merge_cells('B2:C9')
ws['B2'] = "Merged cells"

Adding the above code to the previous example will merge cells B2 through C9 into one cell containing the text “Merged cells”.

UNMERGE CELLS

 

#unmerge cells B2 to C9
ws.unmerge_cells('B2:C9')

The above code will unmerge cells from B2 to C9.

INSERTING AN IMAGE

To insert an image we import the image function from the module openpyxl.drawing.image. We then load our image and add it to the cell as shown in the below example.

Example:

import openpyxl
from openpyxl import Workbook
from openpyxl.drawing.image import Image

wb = openpyxl.load_workbook("new1.xlsx")
ws = wb['Sheet']
#loading the image(should be in same folder)
img = Image('logo.png')
ws['A1'] = "Adding image"
#adjusting size
img.height = 130
img.width = 200
#adding img to cell A3

ws.add_image(img, 'A3')

wb.save("new2.xlsx")


CREATING CHARTS

Charts are essential to show a visualization of data. We can create charts from Excel data using the Openpyxl module chart. Different forms of charts such as line charts, bar charts, 3D line charts, etc., can be created. We need to create a reference that contains the data to be used for the chart, which is nothing but a selection of cells (rows and columns). I am using sample data to create a 3D bar chart in the below example:

Example

import openpyxl
from openpyxl import Workbook
from openpyxl.chart import BarChart3D, Reference

wb = openpyxl.load_workbook("example.xlsx")
ws = wb.active

values = Reference(ws, min_col = 3, min_row = 2, max_col = 3, max_row = 40)
chart = BarChart3D()
chart.add_data(values)
ws.add_chart(chart, "E3")
wb.save("MyChart.xlsx")



How to Automate Excel with Python with Video Tutorial

Welcome to another video! In this video, We will cover how we can use python to automate Excel. I'll be going over everything from creating workbooks to accessing individual cells and stylizing cells. There is a ton of things that you can do with Excel but I'll just be covering the core/base things in OpenPyXl.

⭐️ Timestamps ⭐️
00:00 | Introduction
02:14 | Installing openpyxl
03:19 | Testing Installation
04:25 | Loading an Existing Workbook
06:46 | Accessing Worksheets
07:37 | Accessing Cell Values
08:58 | Saving Workbooks
09:52 | Creating, Listing and Changing Sheets
11:50 | Creating a New Workbook
12:39 | Adding/Appending Rows
14:26 | Accessing Multiple Cells
20:46 | Merging Cells
22:27 | Inserting and Deleting Rows
23:35 | Inserting and Deleting Columns
24:48 | Copying and Moving Cells
26:06 | Practical Example, Formulas & Cell Styling

📄 Resources 📄
OpenPyXL Docs: https://openpyxl.readthedocs.io/en/stable/ 
Code Written in This Tutorial: https://github.com/techwithtim/ExcelPythonTutorial 
Subscribe: https://www.youtube.com/c/TechWithTim/featured 

#python