Rick and Morty story generation with GPT2 using Transformers and Streamlit in 57 lines of code

Introduction

With the rapid progress in Machine Learning (ML) and Natural Language Processing (NLP), new algorithms are able to generate texts that seem more and more human-produced. One such algorithm, GPT2¹, has been used in many open-source applications². GPT2 was trained on WebText, which contains 45 million outbound links from Reddit (i.e., websites that comments reference). The top 10 outbound domains³ include Google, Archive, Blogspot, GitHub, NYTimes, WordPress, Washington Post, Wikia, BBC, and The Guardian. The pre-trained GPT2 model can be fine-tuned on specific datasets, for example, to “acquire” the style of a dataset or learn to classify documents. This is done via Transfer Learning, which can be defined as “a means to extract knowledge from a source setting and apply it to a different target setting”⁴. For a detailed explanation of GPT2 and its architecture, see the original paper⁵, OpenAI’s blog post⁶, or Jay Alammar’s illustrated guide⁷.
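
As a quick taste of GPT2 in code, here is a minimal sketch of generating text from the pre-trained model with Transformers (the prompt and the sampling settings are illustrative choices, not taken from this project):

	from transformers import GPT2LMHeadModel, GPT2Tokenizer

	tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
	model = GPT2LMHeadModel.from_pretrained("gpt2")

	# Encode a prompt and sample a continuation. Top-k/top-p sampling
	# tends to read more naturally than greedy decoding.
	input_ids = tokenizer.encode("Rick: Morty, listen to me,", return_tensors="pt")
	output = model.generate(
	    input_ids,
	    max_length=60,
	    do_sample=True,
	    top_k=50,
	    top_p=0.95,
	    pad_token_id=tokenizer.eos_token_id,
	)
	print(tokenizer.decode(output[0], skip_special_tokens=True))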

Dataset

The dataset used to fine-tune GPT2 consists of transcripts of the first three seasons of Rick and Morty. I filtered out all dialog not spoken by Rick, Morty, Summer, Beth, or Jerry. The data was downloaded and stored in raw text format. Each line represents a speaker and their utterance, or an action/scene description. The dataset was split into training and test data, which contain 6905 and 1454 lines, respectively. The raw files can be found here. The training data is used to fine-tune the model, while the test data is used for evaluation.
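
For illustration, a rough sketch of this filtering and splitting might look like the following (the file names and the 80/20 split ratio are placeholders, and the prefix check is a simplification that drops the action/scene lines):

	# Keep only lines spoken by the five main characters
	# (hypothetical input file name).
	SPEAKERS = ("Rick:", "Morty:", "Summer:", "Beth:", "Jerry:")

	with open("rick_and_morty_transcripts.txt", encoding="utf-8") as f:
	    lines = [line for line in f if line.startswith(SPEAKERS)]

	# Hold out the tail of the corpus as test data.
	split = int(len(lines) * 0.8)
	with open("train.txt", "w", encoding="utf-8") as f:
	    f.writelines(lines[:split])
	with open("test.txt", "w", encoding="utf-8") as f:
	    f.writelines(lines[split:])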

Training the model

Hugging Face’s Transformers library provides a simple script to fine-tune a custom GPT2 model. You can fine-tune your own model using this Google Colab notebook. Once your model has finished training, make sure you download the output folder containing all relevant model files (this is essential for loading the model later). You can upload your custom model to Hugging Face’s Model Hub⁸ to make it accessible to the public. The model achieves a perplexity score of around 17 when evaluated on the test data.
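
To give a flavor of the Streamlit app from the title, here is a minimal sketch that loads a fine-tuned model from a local folder and generates text from a user prompt (the folder name “output” is a placeholder for wherever you saved the model, and st.cache_resource assumes a recent Streamlit version):

	import streamlit as st
	from transformers import GPT2LMHeadModel, GPT2Tokenizer

	@st.cache_resource  # cache the model across reruns (recent Streamlit versions)
	def load_model(path="output"):
	    # Assumes the folder contains both the model and tokenizer files.
	    return GPT2Tokenizer.from_pretrained(path), GPT2LMHeadModel.from_pretrained(path)

	tokenizer, model = load_model()

	st.title("Rick and Morty story generator")
	prompt = st.text_input("Start your story:", "Morty: Aw geez, Rick,")

	if st.button("Generate"):
	    input_ids = tokenizer.encode(prompt, return_tensors="pt")
	    output = model.generate(
	        input_ids,
	        max_length=120,
	        do_sample=True,
	        top_p=0.95,
	        pad_token_id=tokenizer.eos_token_id,
	    )
	    st.write(tokenizer.decode(output[0], skip_special_tokens=True))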

#text-generation #editors-pick #language-modeling #nlp #machine-learning

Tyrique Littel

Static Code Analysis: What It Is and How to Use It

Static code analysis refers to the technique of approximating the runtime behavior of a program. In other words, it is the process of predicting the output of a program without actually executing it.

Lately, however, the term “Static Code Analysis” is more commonly used to refer to one of the applications of this technique rather than the technique itself — program comprehension: understanding the program and detecting issues in it (anything from syntax errors to type mismatches, performance hogs, likely bugs, security loopholes, etc.). This is the sense in which we use the term throughout this post.

“The refinement of techniques for the prompt discovery of error serves as well as any other as a hallmark of what we mean by science.”

  • J. Robert Oppenheimer

Outline

We cover a lot of ground in this post. The aim is to build an understanding of static code analysis and to equip you with the basic theory and the right tools so that you can write analyzers of your own.

We start our journey by laying down the essential parts of the pipeline that a compiler follows to understand what a piece of code does. We learn where to tap into this pipeline to plug in our analyzers and extract meaningful information. In the latter half, we get our feet wet and write four such static analyzers, completely from scratch, in Python.

Note that although the ideas here are discussed in light of Python, static code analyzers across all programming languages are carved out along similar lines. We chose Python because of the availability of an easy-to-use ast module, and the wide adoption of the language itself.
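
To preview what such an analyzer can look like, here is a toy checker built on the ast module that flags functions without a docstring (the rule is an illustrative choice, not one of the four analyzers written later):

	import ast

	# A small hard-coded snippet to analyze.
	source = (
	    "def documented():\n"
	    "    '''I have a docstring.'''\n"
	    "\n"
	    "def undocumented():\n"
	    "    pass\n"
	)

	# Walk the syntax tree and report functions missing a docstring.
	tree = ast.parse(source)
	for node in ast.walk(tree):
	    if isinstance(node, ast.FunctionDef) and ast.get_docstring(node) is None:
	        print(f"line {node.lineno}: function '{node.name}' has no docstring")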

How does it all work?

Before a computer can finally “understand” and execute a piece of code, it goes through a series of complicated transformations:

[Figure: the static analysis workflow]

As the diagram shows, static analyzers feed on the output of these stages. To better understand the static analysis techniques, let’s look at each of these steps in some more detail:

Scanning

The first thing that a compiler does when trying to understand a piece of code is to break it down into smaller chunks, also known as tokens. Tokens are akin to what words are in a language.

A token might consist of a single character, like (; a literal (such as an integer or a string, e.g., 7 or 'Bob'); or a reserved keyword of the language (e.g., def in Python). Characters which do not contribute towards the semantics of a program, like trailing whitespace and comments, are often discarded by the scanner.

Python provides the tokenize module in its standard library to let you play around with tokens:

	import io
	import tokenize

	code = b"color = input('Enter your favourite color: ')"

	for token in tokenize.tokenize(io.BytesIO(code).readline):
	    print(token)

Running this prints:

	TokenInfo(type=62 (ENCODING),  string='utf-8')
	TokenInfo(type=1  (NAME),      string='color')
	TokenInfo(type=54 (OP),        string='=')
	TokenInfo(type=1  (NAME),      string='input')
	TokenInfo(type=54 (OP),        string='(')
	TokenInfo(type=3  (STRING),    string="'Enter your favourite color: '")
	TokenInfo(type=54 (OP),        string=')')
	TokenInfo(type=4  (NEWLINE),   string='')
	TokenInfo(type=0  (ENDMARKER), string='')

(Note that for the sake of readability, I’ve omitted a few columns from the result above — metadata like starting index, ending index, a copy of the line on which a token occurs, etc.)
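
As a small taste of what the token stream enables (a side example, not one of the four analyzers this post builds), here is a sketch that uses the scanner to strip comments from a piece of source code:

	import io
	import tokenize

	source = b"x = 1  # the answer\nprint(x)  # show it\n"

	# Drop COMMENT tokens and stitch the rest back together.
	# untokenize() may pad with spaces where tokens were removed.
	tokens = [
	    tok for tok in tokenize.tokenize(io.BytesIO(source).readline)
	    if tok.type != tokenize.COMMENT
	]
	print(tokenize.untokenize(tokens).decode("utf-8"))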

#code quality #code review #static analysis #static code analysis #code analysis #static analysis tools #code review tips #static code analyzer #static code analysis tool #static analyzer

dev karmanr

Xcode 12 deployment target warnings when using CocoaPods

The Installer is responsible for taking a Podfile and transforming it into the Pods libraries. It also integrates the user project so the Pods libraries can be used out of the box.

The Installer is capable of doing incremental updates to an existing Pod installation.

The Installer gets the information that it needs mainly from 3 files:

- Podfile: The specification written by the user that contains
 information about targets and Pods.
- Podfile.lock: Contains information about the pods that were previously
 installed and, in concert with the Podfile, provides information about
 which specific version of a Pod should be installed. This file is
 ignored in update mode.
- Manifest.lock: A file contained in the Pods folder that keeps track of
 the pods installed on the local machine. This file is used once the
 exact versions of the Pods have been computed, to detect whether that version
 is already installed. This file is not intended to be kept under source
 control and is a copy of the Podfile.lock.

The Installer is designed to work in environments where the Podfile folder is under source control and environments where it is not. The rest of the files, like the user project and the workspace, are assumed to be under source control.

Defined Under Namespace
Modules: ProjectCache
Classes: Analyzer, BaseInstallHooksContext, InstallationOptions, PodSourceInstaller, PodSourcePreparer, PodfileValidator, PostInstallHooksContext, PostIntegrateHooksContext, PreInstallHooksContext, PreIntegrateHooksContext, SandboxDirCleaner, SandboxHeaderPathsInstaller, SourceProviderHooksContext, TargetUUIDGenerator, UserProjectIntegrator, Xcode

Constant Summary

MASTER_SPECS_REPO_GIT_URL = 'https://github.com/CocoaPods/Specs.git'.freeze

Installation results

#aggregate_targets ⇒ Array<AggregateTarget> readonly
The model representations of an aggregation of pod targets generated for a target definition in the Podfile, as a result of the analyzer.
#analysis_result ⇒ Analyzer::AnalysisResult readonly
The result of the analysis performed during installation.
#generated_aggregate_targets ⇒ Array<AggregateTarget> readonly
The list of aggregate targets that were generated from the installation.
#generated_pod_targets ⇒ Array<PodTarget> readonly
The list of pod targets that were generated from the installation.
#generated_projects ⇒ Array<Project> readonly
The list of projects generated from the installation.
#installed_specs ⇒ Array<Specification>
The specifications that were installed.
#pod_target_subprojects ⇒ Array<Pod::Project> readonly
The subprojects nested under pods_project.
#pod_targets ⇒ Array<PodTarget> readonly
The model representations of pod targets generated as a result of the analyzer.
#pods_project ⇒ Pod::Project readonly
The `Pods/Pods.xcodeproj` project.
#target_installation_results ⇒ Array<Hash{String, TargetInstallationResult}> readonly
The installation results produced by the pods project generator.
Instance Attribute Summary
#clean_install ⇒ Boolean (also: #clean_install?)
Whether installation should ignore the contents of the project cache when incremental installation is enabled.
#deployment ⇒ Boolean (also: #deployment?)
Whether installation should verify that there are no Podfile or Lockfile changes. Defaults to false.
#has_dependencies ⇒ Boolean (also: #has_dependencies?)
Whether it has dependencies. Defaults to true.
#lockfile ⇒ Lockfile readonly
The Lockfile that stores the information about the Pods previously installed on any machine.
#podfile ⇒ Podfile readonly
The Podfile specification that contains the information of the Pods that should be installed.
#repo_update ⇒ Boolean (also: #repo_update?)
Whether the spec repos should be updated.
#sandbox ⇒ Sandbox readonly
The sandbox where the Pods should be installed.
#update ⇒ Hash, ...
Pods that have been requested to be updated, or true if all Pods should be updated. If all Pods should be updated, the contents of the Lockfile are not taken into account when deciding what Pods to install.
#use_default_plugins ⇒ Boolean (also: #use_default_plugins?)
Whether default plugins should be used during installation. Defaults to true.
Hooks
#development_pod_targets(targets = pod_targets) ⇒ Array<PodTarget>
The targets of the development pods generated by the installation process.
Convenience Methods
.targets_from_sandbox(sandbox, podfile, lockfile) ⇒ Object
Instance Method Summary
#analyze_project_cache ⇒ Object
#download_dependencies ⇒ Object
#initialize(sandbox, podfile, lockfile = nil) ⇒ Installer constructor
Initialize a new instance.
#install! ⇒ void
Installs the Pods.
#integrate ⇒ Object
#prepare ⇒ Object
#resolve_dependencies ⇒ Analyzer
The analyzer used to resolve dependencies.
#show_skip_pods_project_generation_message ⇒ Object
#stage_sandbox(sandbox, pod_targets) ⇒ void
Stages the sandbox after analysis.
Methods included from Config::Mixin
#config

Constructor Details

#initialize(sandbox, podfile, lockfile = nil) ⇒ Installer

Initializes a new instance.

Parameters:

sandbox (Sandbox) — @see #sandbox
podfile (Podfile) — @see #podfile
lockfile (Lockfile) (defaults to: nil) — @see #lockfile

Class Method Details

.targets_from_sandbox(sandbox, podfile, lockfile) ⇒ Object

Raises: (Informative)

Instance Method Details

#install! ⇒ void

Installs the Pods. This method returns an undefined value.

The installation process is mostly linear, with a few minor complications to keep in mind:

- The stored podspecs need to be cleaned before the resolution step; otherwise the sandbox might return an old podspec and not download the new one from an external source.
- The resolver might trigger the download of Pods from external sources necessary to retrieve their podspecs (unless it is instructed not to do so).

#development_pod_targets(targets = pod_targets) ⇒ Array<PodTarget>

Returns the targets of the development pods generated by the installation process. This can be used as a convenience method for external scripts.

Parameters:

targets (Array<PodTarget>) (defaults to: pod_targets)

#resolve_dependencies ⇒ Analyzer

Returns the analyzer used to resolve dependencies.

#stage_sandbox(sandbox, pod_targets) ⇒ void

Stages the sandbox after analysis. This method returns an undefined value.

Parameters:

sandbox (Sandbox) — The sandbox to stage.
pod_targets (Array<PodTarget>) — The list of all pod targets.

Samanta Moore

Guidelines for Java Code Reviews

Get a jump-start on your next code review session with this list.

Having another pair of eyes scan your code is always useful and helps you spot mistakes before you break production. You need not be an expert to review someone’s code. Some experience with the programming language and a review checklist should help you get started. We’ve put together a list of things you should keep in mind when you’re reviewing Java code. Read on!

1. Follow Java Code Conventions

2. Replace Imperative Code With Lambdas and Streams

3. Beware of the NullPointerException

4. Directly Assigning References From Client Code to a Field

5. Handle Exceptions With Care

#java #code quality #java tutorial #code analysis #code reviews #code review tips #code analysis tools #java tutorial for beginners #java code review

Wiley Mayer

How to Find the Stinky Parts of Your Code (Part I)

Code smells are hints of something that might be wrong. They are not rigid rules.

Code Smell 01 — Anemic Models

Your objects are a bunch of public attributes without behavior.

The object's protocol is empty (apart from setters and getters).

If we asked a domain expert to describe an entity, they would hardly describe it as ‘a bunch of attributes’.

Problems

  • No encapsulation.
  • No mapping to real-world entities.
  • Duplicate code.
  • Coupling.

Solutions

    1. Find responsibilities.
    2. Protect your attributes.
    3. Hide implementations.
    4. Delegate.

Examples

  • DTOs

Sample Code

	<?php

	class Window {
	  public $height;
	  public $width;

	  function getHeight() {
	    return $this->height;
	  }

	  function setHeight($height) {
	    $this->height = $height;
	  }

	  function getWidth() {
	    return $this->width;
	  }

	  function setWidth($width) {
	    $this->width = $width;
	  }
	}

Wrong

	<?php

	final class Window {

	  function area() {
	    //...
	  }

	  function open() {
	    //...
	  }

	  function isOpen() {
	    //...
	  }
	}

Right

#code-smells #clean-code #refactoring #refactor-legacy-code #stinky-code #stinky-code-parts #pixel-face #hackernoon-top-story