A hands-on introduction to static code analysis - DeepSource

Static code analysis refers to the technique of approximating the runtime behaviour of a program. In other words, it is the process of predicting the output of a program without actually executing it.

Lately, however, the term “Static Code Analysis” is more commonly used to refer to one of the applications of this technique rather than the technique itself — program comprehension — understanding the program and detecting issues in it (anything from syntax errors to type mismatches, performance hogs likely bugs, security loopholes, etc.). This is the usage we’d be referring to throughout this post.

“The refinement of techniques for the prompt discovery of error serves as well as any other as a hallmark of what we mean by science.”

  • J. Robert Oppenheimer

Outline

We cover a lot of ground in this post. The aim is to build an understanding of static code analysis and to equip you with the basic theory, and the right tools so that you can write analyzers on your own.

We start our journey with laying down the essential parts of the pipeline which a compiler follows to understand what a piece of code does. We learn where to tap points in this pipeline to plug in our analyzers and extract meaningful information. In the latter half, we get our feet wet, and write four such static analyzers, completely from scratch, in Python.

Note that although the ideas here are discussed in light of Python, static code analyzers across all programming languages are carved out along similar lines. We chose Python because of the availability of an easy to use ast module, and wide adoption of the language itself.

#visual studio code

What is GEEK

Buddha Community

A hands-on introduction to static code analysis - DeepSource
Tyrique  Littel

Tyrique Littel

1604008800

Static Code Analysis: What It Is? How to Use It?

Static code analysis refers to the technique of approximating the runtime behavior of a program. In other words, it is the process of predicting the output of a program without actually executing it.

Lately, however, the term “Static Code Analysis” is more commonly used to refer to one of the applications of this technique rather than the technique itself — program comprehension — understanding the program and detecting issues in it (anything from syntax errors to type mismatches, performance hogs likely bugs, security loopholes, etc.). This is the usage we’d be referring to throughout this post.

“The refinement of techniques for the prompt discovery of error serves as well as any other as a hallmark of what we mean by science.”

  • J. Robert Oppenheimer

Outline

We cover a lot of ground in this post. The aim is to build an understanding of static code analysis and to equip you with the basic theory, and the right tools so that you can write analyzers on your own.

We start our journey with laying down the essential parts of the pipeline which a compiler follows to understand what a piece of code does. We learn where to tap points in this pipeline to plug in our analyzers and extract meaningful information. In the latter half, we get our feet wet, and write four such static analyzers, completely from scratch, in Python.

Note that although the ideas here are discussed in light of Python, static code analyzers across all programming languages are carved out along similar lines. We chose Python because of the availability of an easy to use ast module, and wide adoption of the language itself.

How does it all work?

Before a computer can finally “understand” and execute a piece of code, it goes through a series of complicated transformations:

static analysis workflow

As you can see in the diagram (go ahead, zoom it!), the static analyzers feed on the output of these stages. To be able to better understand the static analysis techniques, let’s look at each of these steps in some more detail:

Scanning

The first thing that a compiler does when trying to understand a piece of code is to break it down into smaller chunks, also known as tokens. Tokens are akin to what words are in a language.

A token might consist of either a single character, like (, or literals (like integers, strings, e.g., 7Bob, etc.), or reserved keywords of that language (e.g, def in Python). Characters which do not contribute towards the semantics of a program, like trailing whitespace, comments, etc. are often discarded by the scanner.

Python provides the tokenize module in its standard library to let you play around with tokens:

Python

1

import io

2

import tokenize

3

4

code = b"color = input('Enter your favourite color: ')"

5

6

for token in tokenize.tokenize(io.BytesIO(code).readline):

7

    print(token)

Python

1

TokenInfo(type=62 (ENCODING),  string='utf-8')

2

TokenInfo(type=1  (NAME),      string='color')

3

TokenInfo(type=54 (OP),        string='=')

4

TokenInfo(type=1  (NAME),      string='input')

5

TokenInfo(type=54 (OP),        string='(')

6

TokenInfo(type=3  (STRING),    string="'Enter your favourite color: '")

7

TokenInfo(type=54 (OP),        string=')')

8

TokenInfo(type=4  (NEWLINE),   string='')

9

TokenInfo(type=0  (ENDMARKER), string='')

(Note that for the sake of readability, I’ve omitted a few columns from the result above — metadata like starting index, ending index, a copy of the line on which a token occurs, etc.)

#code quality #code review #static analysis #static code analysis #code analysis #static analysis tools #code review tips #static code analyzer #static code analysis tool #static analyzer

Tyrique  Littel

Tyrique Littel

1604023200

Effective Code Reviews: A Primer

Peer code reviews as a process have increasingly been adopted by engineering teams around the world. And for good reason — code reviews have been proven to improve software quality and save developers’ time in the long run. A lot has been written about how code reviews help engineering teams by leading software engineering practitioners. My favorite is this quote by Karl Wiegers, author of the seminal paper on this topic, Humanizing Peer Reviews:

Peer review – an activity in which people other than the author of a software deliverable examine it for defects and improvement opportunities – is one of the most powerful software quality tools available. Peer review methods include inspections, walkthroughs, peer deskchecks, and other similar activities. After experiencing the benefits of peer reviews for nearly fifteen years, I would never work in a team that did not perform them.

It is worth the time and effort to put together a code review strategy and consistently follow it in the team. In essence, this has a two-pronged benefit: more pair of eyes looking at the code decreases the chances of bugs and bad design patterns entering your codebase, and embracing the process fosters knowledge sharing and positive collaboration culture in the team.

Here are 6 tips to ensure effective peer reviews in your team.

1. Keep the Changes Small and Focused

Code reviews require developers to look at someone else’s code, most of which is completely new most of the times. Too many lines of code to review at once requires a huge amount of cognitive effort, and the quality of review diminishes as the size of changes increases. While there’s no golden number of LOCs, it is recommended to create small pull-requests which can be managed easily. If there are a lot of changes going in a release, it is better to chunk it down into a number of small pull-requests.

2. Ensure Logical Coherence of Changes

Code reviews are the most effective when the changes are focused and have logical coherence. When doing refactoring, refrain from making behavioral changes. Similarly, behavioral changes should not include refactoring and style violation fixes. Following this convention prevents unintended changes creeping in unnoticed in the code base.

3. Have Automated Tests, and Track Coverage

Automated tests of your preferred flavor — units, integration tests, end-to-end tests, etc. help automatically ensure correctness. Consistently ensuring that changes proposed are covered by some kind of automated frees up time for more qualitative review; allowing for a more insightful and in-depth conversation on deeper issues.

4. Self-Review Changes Before Submitting for Peer Review

A change can implement a new feature or fix an existing issue. It is recommended that the requester submits only those changes that are complete, and tested for correctness manually. Before creating the pull-request, a quick glance on what changes are being proposed helps ensure that no extraneous files are added in the changeset. This saves tons of time for the reviewers.

5. Automate What Can Be Automated

Human review time is expensive, and the best use of a developer’s time is reviewing qualitative aspects of code — logic, design patterns, software architecture, and so on. Linting tools can help automatically take care of style and formatting conventions. Continuous Quality tools can help catch potential bugs, anti-patterns and security issues which can be fixed by the developer before they make a change request. Most of these tools integrate well with code hosting platforms as well.

6. Be Positive, Polite, and Respectful

Finally, be cognizant of the fact that people on both sides of the review are but human. Offer positive feedback, and accept criticism humbly. Instead of beating oneself upon the literal meaning of words, it really pays off to look at reviews as people trying to achieve what’s best for the team, albeit in possibly different ways. Being cognizant of this aspect can save a lot of resentment and unmitigated negativity.

#agile #code quality #code review #static analysis #code analysis #code reviews #static analysis tools #code review tips #continuous quality #static analyzer

Samanta  Moore

Samanta Moore

1621137960

Guidelines for Java Code Reviews

Get a jump-start on your next code review session with this list.

Having another pair of eyes scan your code is always useful and helps you spot mistakes before you break production. You need not be an expert to review someone’s code. Some experience with the programming language and a review checklist should help you get started. We’ve put together a list of things you should keep in mind when you’re reviewing Java code. Read on!

1. Follow Java Code Conventions

2. Replace Imperative Code With Lambdas and Streams

3. Beware of the NullPointerException

4. Directly Assigning References From Client Code to a Field

5. Handle Exceptions With Care

#java #code quality #java tutorial #code analysis #code reviews #code review tips #code analysis tools #java tutorial for beginners #java code review

Ilene  Jerde

Ilene Jerde

1598552640

Limitations of Linters—Is it Time to Level-Up?

Static code analyzers, seen by many as the evolution of classic linters, are code analysis suites that offer a broader scope and more in-depth capabilities than linters. In this piece, I attempt a high-level comparison of the two, while highlighting certain shortcomings of linters that can be solved through static code analyzers.

As developers, we have all used, or at the very least heard of the following two terms– ‘linter’ and ‘static code analyzer’. Both linter and static code analyzers are often used to describe similar tools, and even used interchangeably. The service these tools offer is broadly similar – examining source code for potential issues. However, they are not the same thing. In fact, based on the program you go for, the code-checking services they provide can greatly differ. In this article, we take a high-level comparative look at these two tools, with examples of some Java-specific options.

#java #testing #code quality #static analysis #source code analysis #source code analysis tools #automated analysis

Obie  Rowe

Obie Rowe

1599318960

Looking Under the Hood of Apache Pulsar: How Good is the Code?

Apache Pulsar (incubating) is an enterprise-grade publish-subscribe (aka pub-sub) messaging system that was originally developed at Yahoo. Pulsar was first open-sourced in late 2016, and is now undergoing incubation under the auspices of the Apache Software Foundation. At Yahoo, Pulsar has been in production for over three years, powering major applications like Yahoo! Mail, Yahoo! Finance, Yahoo! Sports, Flickr, the Gemini Ads platform, and Sherpa, Yahoo’s distributed key-value store.

One of the primary goals of the open-source software movement is to allow all users to freely modify software, use it in new ways, integrate it into a larger project or derive something new based on the original. The larger idea being that through open collaboration, we can make a software the best version of itself. So, in the spirit of innovation and improvement, we peeked behind the curtain and ran Apache Pulsar through the free static code analysis tool Embold. The results are very interesting, both from an observer’s and a learner’s point of view.

1. Quality Gate: Embold’s Quality gate is “closed” for Pulsar mainly due to the Code Issues and Code Duplication. The overall rating score is certainly not the best, but sits at an innocuous 1.76 (5 being the best on the scale).

Overall rating

2. Code Issues: And so let’s move on to some of the code issues found. The critical and high Code issues count in the modules pulsar-broker and pulsar-functions are close to 150. That’s a high number worth exploring. Time to do a deep-dive.

TheadLeak: This issue is fairly common across the entire codebase. To quote an occurrence, we see that in the Compactor class, shutdown() is missing for the executor service that manages the threadpool, indicating a thread resource leak. The tool recommends call the shutdown() once done to ensure there is no thread resource leaks.

Threadleak

_Resource Leak: _In the RuntimeUtils class, we see that close() on the resource rd is not guarded by a finally block. This means, that if there’s an exception while reading the data, the resource will never get closed, indicating a leak.

#java #open source #software testing #code quality #anti-patterns #static analysis tools #source code analysis #apache pulsar #code duplication #code issues