How to Search a Codebase with Python

In this post, I will show you how to search an entire directory tree for keywords in files with Python. Once we find the files with offending code, we will build a list of them so we can identify the worst offenders and focus on those.

Iterating the File Tree

First, we will collect the paths of the files we are interested in. The directory to search is defined at the top of the script, followed by an exclusion list. The exclusion list keeps out files that aren't relevant to our platform. The node_modules folder is a prime example of one we don't want to include.

We initialize file_paths outside the loop because we will iterate over it later.

os.walk does the hard work of iterating over each of the directories. For every directory it visits, it yields the current root directory (which updates as we descend), a list of subdirectory names, and a list of file names.
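To make the shape of what os.walk yields concrete, here is a minimal sketch that builds a small throwaway tree and collects each tuple. The helper name list_walk and the demo file names are assumptions made up for illustration; they are not part of the original script.

```python
import os
import tempfile

def list_walk(top):
    """Return sorted (relative root, sub_dirs, file_names) tuples from os.walk."""
    rows = []
    for root, sub_dirs, file_names in os.walk(top):
        # Show the root relative to the starting directory for readability
        rel_root = os.path.relpath(root, top)
        rows.append((rel_root, sorted(sub_dirs), sorted(file_names)))
    return sorted(rows)

# Build a small throwaway tree so the example is self-contained
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "src", "components"))
    for rel_path in ("index.js", os.path.join("src", "app.ts")):
        with open(os.path.join(demo_dir, rel_path), "w") as f:
            f.write("// placeholder\n")
    walk_rows = list_walk(demo_dir)

# One tuple per directory visited, including the empty components folder
for row in walk_rows:
    print(row)
```

Each directory in the tree produces exactly one tuple, so a folder with no files still shows up with an empty file list.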

We ignore the subdirectory names because we only want the paths to files. Collecting them means iterating over the list of file names in each directory, checking that each has an appropriate extension, and making sure its path doesn't contain anything from our exclusion list.

import os

walk_dir = 'C:\\Directory\\ForWalking\\'

# If the file path contains any of these, we don't want the file
#    e.g. C:\\Directory\\ForWalking\\node_modules will be ignored
exclusions = ["node_modules", "SolutionFiles", ".bin", "Test"]

# TypeScript and JavaScript extensions (endswith accepts a tuple)
extensions = (".ts", ".tsx", ".js")

# List to store all our file paths
file_paths = []

# Iterate file tree
for root, sub_dirs, file_names in os.walk(walk_dir):

    # Iterate the file names in the directory
    for file_name in file_names:

        # We are only interested in TypeScript and JS files
        if file_name.endswith(extensions):

            # Keep the file only if its directory path doesn't contain
            #   anything from the exclusion list
            if not any(exclusion in root for exclusion in exclusions):
                file_paths.append(os.path.join(root, file_name))
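As an optional variation, os.walk lets you edit the subdirectory list in place so that it never descends into excluded folders at all, which can matter for something as large as node_modules. The sketch below is an alternative, not the script's method: the function name collect_paths and the demo tree are hypothetical, and it matches directory names exactly, whereas the script above does a substring check on the whole path.

```python
import os
import tempfile

def collect_paths(top, extensions, excluded_dirs):
    """Collect files under top with the given extensions, skipping excluded folders."""
    found = []
    for root, sub_dirs, file_names in os.walk(top):
        # Assigning to sub_dirs[:] edits the list os.walk is using,
        # so excluded folders are never visited (requires the default topdown=True)
        sub_dirs[:] = [d for d in sub_dirs if d not in excluded_dirs]
        for file_name in sorted(file_names):
            if file_name.endswith(extensions):
                found.append(os.path.join(root, file_name))
    return found

# Throwaway tree: one file inside node_modules, two inside src
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "node_modules", "lib"))
    os.makedirs(os.path.join(demo_dir, "src"))
    for parts in (("node_modules", "lib", "a.js"), ("src", "b.ts"), ("src", "notes.txt")):
        with open(os.path.join(demo_dir, *parts), "w") as f:
            f.write("")
    found_paths = collect_paths(demo_dir, (".ts", ".tsx", ".js"), {"node_modules"})
    relative_paths = [os.path.relpath(p, demo_dir) for p in found_paths]

print(relative_paths)
```

Only the TypeScript file under src survives: the file in node_modules is skipped because its folder is pruned, and notes.txt fails the extension check.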

Sort Through the File

Next, we go through all the files we found and check whether they contain any of the offending code. We define a list of the method names we want to search for. Any occurrence of these will need to be updated later.

We visit each file collected in the previous step, open it, and count the occurrences of each search string. When a count is above zero, we append a row to the occurrences list containing the file path, the matched string, and the number of times it appeared.
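It helps to know exactly what a plain substring count picks up. The sketch below uses a made-up snippet of source text (sample_source is an assumption, not real project code) to show that str.count also matches longer method names and commented-out code. The regular expression at the end is one possible way to tighten the match, not something the original script does.

```python
import re

# Hypothetical source text for illustration only
sample_source = """
grid.Clear();
grid.ClearAll();
form.SetFocus(nameField);
// form.SetFocus(oldField);
"""

# str.count counts non-overlapping substring matches, so ".Clear" also
# matches the start of ".ClearAll", and commented-out code is counted too
clear_count = sample_source.count(".Clear")
focus_count = sample_source.count(".SetFocus(")

# A word boundary after the name excludes longer names such as ".ClearAll"
exact_clear_count = len(re.findall(r"\.Clear\b", sample_source))

print(clear_count, focus_count, exact_clear_count)
```

For a quick scoping exercise the looser substring count is usually good enough, since over-counting only means opening a few extra files.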

# occurrences tracks each time a searched method is found
# Its format will be:
#   File Path, Function, Num Occurrences
occurrences = []

# Methods that need to be updated
#   (each entry listed once so no row is written twice)
nogos = [
    ".SetFocus(",
    ".IsValid(",
    ".Clear",
    ".IsDirty",
    ".RemoveItem",
    ".SetTime",
]

# Iterate previously collected file paths
for file_path in file_paths:

    # Open the file as read only, ignoring undecodable characters
    with open(file_path, 'r', encoding='utf8', errors='ignore') as f:
        contents = f.read()

        # Check each of the search strings
        for string in nogos:
            count_nogo = contents.count(string)

            # If the string appears in the file, append a row to
            #   the occurrences list
            if count_nogo > 0:
                occurrences.append([file_path, string, str(count_nogo)])

# Create an output CSV string
out_csv = "\n".join([",".join(line) for line in occurrences])

# Write to file
with open("Outfile.csv", 'w') as f:
    f.write(out_csv)
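To get from the raw rows to a ranking of the worst files, one option is to total the counts per file and sort them. This is a rough sketch under assumptions: sample_rows is invented data in the same three-column shape the script collects, and rank_files is a hypothetical helper. It also uses the standard csv module, which quotes fields that contain commas, something a plain join does not do.

```python
import csv
import io
from collections import Counter

# Invented rows in the script's shape: [file path, search string, count as text]
sample_rows = [
    ["src/grid.ts", ".Clear", "3"],
    ["src/grid.ts", ".IsDirty", "2"],
    ["src/form, legacy.js", ".SetFocus(", "4"],
]

def rank_files(rows):
    """Total the counts per file and return (path, total) pairs, highest first."""
    totals = Counter()
    for file_path, _search_string, count in rows:
        totals[file_path] += int(count)
    return totals.most_common()

ranked = rank_files(sample_rows)

# Write to an in-memory buffer here; a real run would open a file
# with newline="" and pass it to csv.writer instead
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["file_path", "total_occurrences"])
writer.writerows(ranked)
csv_text = buffer.getvalue()

print(csv_text)
```

The file name containing a comma comes out wrapped in quotes, so a spreadsheet will still read it as a single column.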

Results

Using this short bit of scripting saved me from having to open 150+ files of code. Instead, it found the 26 files with offending code so I could focus on those. I was also able to give my manager a better idea of the scope of the project in just a few minutes rather than several days.

For those who may look over the code and point out that this could have been done in fewer steps using map, filter, and reduce: you are right! Not every situation needs polished code, though. This is a great example of how, with relatively little Python experience, anyone can save time at their job.
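For the curious, the counting step might look something like this in the map, filter, and reduce style. It is only a sketch: sample_files stands in for file contents already read into memory, and every name here is made up for illustration.

```python
from functools import reduce

# Hypothetical file contents keyed by path, standing in for files read from disk
sample_files = {
    "src/grid.ts": "grid.Clear(); grid.Clear(); row.IsDirty;",
    "src/util.ts": "export const add = (a, b) => a + b;",
}
sample_nogos = [".Clear", ".IsDirty", ".SetTime"]

def count_rows(files, nogos):
    """Return (path, search string, count) for every non-zero count."""
    pairs = [(path, text, s) for path, text in files.items() for s in nogos]
    # map: turn each (path, text, search string) into a counted row
    counted = map(lambda p: (p[0], p[2], p[1].count(p[2])), pairs)
    # filter: keep only rows where the search string was found
    return list(filter(lambda row: row[2] > 0, counted))

rows = count_rows(sample_files, sample_nogos)

# reduce: add up every count into one grand total
total = reduce(lambda acc, row: acc + row[2], rows, 0)

print(rows, total)
```

Whether this reads better than the explicit loops is a matter of taste. The loop version is easier to step through when you are new to Python.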

Thanks for reading.
