Python Functions - Top 5 Mistakes

Functions are a critical component of any programming project. Done correctly, they are a practical way to write readable and maintainable code. When functions are declared poorly, however, your code becomes hard to read and its long-term maintainability suffers, even if you remain the sole maintainer of the project, because any programmer can forget what they did.

Given the importance of functions in programming, Python certainly included, I would like to identify the most common mistakes Python programmers make when declaring functions. Knowing these pitfalls lets us adopt the corresponding best practices, which will not only improve your code’s readability but also make it more maintainable.

1. Improper Function Names

Giving names isn’t something that bothers you only when you have a new baby or pet. You may disagree, but as a programmer it will be a challenge throughout your career. The challenge comes from the standard we hold function names to: they should be unique, informative, and consistent.

Unique

This is a straightforward requirement. As with any Python object, we use names to identify functions. If we declare two functions with the same name, either your Python IDE (Integrated Development Environment, such as PyCharm or Visual Studio Code) will warn you, or the later declaration silently wins. Consider the following example. We declare two functions named say_hello, and as you can see, when we call the function, the second declaration is the one that runs:

>>> # define a function
>>> def say_hello():
...     print('say hello, first function')
...
>>> # define a function having the same name
>>> def say_hello():
...     print('say hello, second function')
... 
>>> say_hello()
say hello, second function

Function names should be unique.

Informative

Functions are written to perform certain defined operations, and thus their names should reflect their duties. If the names can’t explicitly tell us these duties, we will struggle to understand other people’s programs, or even our own code from last month. Being informative means being specific and accurate about the intended purpose of the function. Consider the following examples:

>>> # too generic, unspecific 
>>> def do_something():
...     print("do something")
... 
>>> # vs. a more specific name
>>> def say_hi():
...     print("say hi")
... 
>>> # does not accurately describe its operation
>>> def process_numbers(number1, number2):
...     result = number1 * number2
...     return result
... 
>>> # vs. a more accurate description
>>> def multiply_numbers(number1, number2):
...     result = number1 * number2
...     return result
...

Function names should be informative.

Consistent

Python programming encourages modularity, which means grouping related classes and functions into modules. Within and between modules, you want to name your functions consistently. By consistency, we mean using the same conventions for particular kinds of objects and operations. Consider the following trivial examples. The first three functions all perform similar operations on two numbers, so I use the same format: verb + underscore + numbers. In the custom class Mask, the two methods promotion_price and sales_price share the same name structure: the first part identifies the kind of price and the second part indicates the nature of the returned value (i.e., a price expressed as a floating-point number).

>>> # functions performing similar operations
>>> def multiply_numbers(number1, number2):
...     return number1 * number2
... 
>>> def add_numbers(number1, number2):
...     return number1 + number2
... 
>>> def divide_numbers(number1, number2):
...     return number1 / number2
... 
>>> # define a custom class
>>> class Mask:
...     def __init__(self, price):
...         self.price = price
...
...     # two functions returning two kinds of prices
...     def promotion_price(self):
...         return self.price * 0.9
...
...     def sales_price(self):
...         return self.price * 0.75
...

Function names should be consistent.

2. Mixed Duties and Excessive Length

Another common mistake is that a particular function takes on too many mixed duties, a mistake that even senior programmers can make when they don’t refactor their programs continuously. In terms of duties, the best practice is for each function to have only one well-defined duty, one that its sensible name can easily reflect.

A common accompanying symptom of functions with mixed duties is excessive length, which makes the functions harder to understand and harder to debug should any bugs arise. Let’s consider the following hypothetical example. We use the popular [pandas](https://pandas.pydata.org/) library to process physiological data collected in an experiment. For each subject, we have four sessions of data in CSV format. We could write a single function called process_physio_data that includes all three data-processing steps. However, because of the complexity of the data, the function would be over 100 lines of code.

>>> import pandas as pd
>>> 
>>> def process_physio_data(subject_id):
...     # first step, read the related files
...     df0 = pd.read_csv(f'{subject_id}_v1.csv')
...     df1 = pd.read_csv(f'{subject_id}_v2.csv')
...     df2 = pd.read_csv(f'{subject_id}_v3.csv')
...     df3 = pd.read_csv(f'{subject_id}_v4.csv')
...     # the end of first step
...
...     # second, some clean up procedures
...     # 
...     # process these four DataFrames
...     # 50 lines of code here
...     # generate a big DataFrame
...     #
...     # the end of the second step
...     big_df = pd.DataFrame()
...
...     # third, some complex calculations
...     #
...     # process the big DataFrames
...     # 50 lines of code here
...     # generate a small DataFrame
...     #
...     # the end of the third step
...     small_df = pd.DataFrame()
...
...     return small_df
...

Long functions with mixed duties.

Instead of writing this long function, we could write the following functions, which better show the steps involved in processing the data. As shown in the code snippet below, we create three helper functions, each responsible for one of the three steps. Notably, each of these functions has exactly one duty. The updated process_physio_data function is thinner and clearer, and its only duty is to provide the pipeline for processing the physiological data. With this refactoring, the overall readability of the code is much improved.

>>> import pandas as pd
>>> 
>>> # the helper function that reads the data
>>> def read_physio_data(subject_id):
...     df0 = pd.read_csv(f'{subject_id}_v1.csv')
...     df1 = pd.read_csv(f'{subject_id}_v2.csv')
...     df2 = pd.read_csv(f'{subject_id}_v3.csv')
...     df3 = pd.read_csv(f'{subject_id}_v4.csv')
...     return [df0, df1, df2, df3]
... 
>>> # the helper function that cleans up the data
>>> def clean_physio_data(dfs):
...     # all the 50 lines of code for data clean up
...     big_df = pd.DataFrame()
...     return big_df
... 
>>> # the helper function that calculates the data
>>> def calculate_physio_data(df):
...     # all the 50 lines of code for data calculation
...     small_df = pd.DataFrame()
...     return small_df
...
>>> # updated function
>>> def process_physio_data(subject_id):
...     # first step, reading
...     dfs = read_physio_data(subject_id)
...     # second step, cleaning
...     big_df = clean_physio_data(dfs)
...     # third step, calculation
...     small_df = calculate_physio_data(big_df)
...     
...     return small_df
...

Short functions with dedicated single duties.

3. No Documentation

This is a common mistake that programmers learn about the hard way in the long term. On the surface, it seems perfectly fine: your code still runs as it’s supposed to, even without any documentation. For example, when you work on a single project continuously for a few weeks, you know exactly what each function does. However, when you need to revisit your code to update some features, how much time will you spend figuring out what you did? I learned this lesson the hard way, and you have probably had similar experiences.

A lack of documentation is an even bigger problem in a team environment where people share APIs, or when you publish open-source libraries. When we use others’ functions, especially complicated ones, we don’t know the specific operations inside them; we simply read the documentation to learn how to call the function and what the expected return value is. Can you imagine if none of the libraries or frameworks your work relies on had any documentation?

I’m not saying you should write extensive docstrings for your functions. If you have followed the naming standards to keep your function names unique, informative, and consistent, and each of your functions performs just one duty at a proper length, you don’t need much documentation. However, if you work as part of a big team, whether in a company or an open-source community, you should adopt standard documentation conventions, for your own benefit and for others’ too.
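As a minimal sketch of such a convention (the function below is just an illustration, reusing the earlier example), a short docstring stating the purpose, the parameters, and the return value is often all that’s needed:

```python
def multiply_numbers(number1, number2):
    """Multiply two numbers.

    Args:
        number1: The first factor.
        number2: The second factor.

    Returns:
        The product of the two numbers.
    """
    return number1 * number2

# The docstring becomes available to tools and readers, e.g. help(multiply_numbers)
print(multiply_numbers.__doc__.splitlines()[0])
```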

4. Incorrect Use of Default Values

When we write functions, Python allows us to set default values for certain arguments. Many built-in functions use this feature too. Consider the example below. We can build a list from the range() function, which has the general syntax range(start, stop, step). When omitted, the step argument defaults to 1. However, we can explicitly set the step (say, to 2), as in the code below:

>>> # range function using the default step 1
>>> list(range(5, 15))
[5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
>>> # explicitly set the step to be 2
>>> list(range(5, 15, 2))
[5, 7, 9, 11, 13]

Range function with default arguments.
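User-defined functions can offer the same convenience. As a quick sketch (count_up is a hypothetical name), here is a function that mimics range()’s default step:

```python
# A user-defined function with a default argument, mirroring range()'s step.
def count_up(start, stop, step=1):
    """Collect numbers from start (inclusive) to stop (exclusive)."""
    numbers = []
    current = start
    while current < stop:
        numbers.append(current)
        current += step
    return numbers

print(count_up(5, 15))      # step defaults to 1
print(count_up(5, 15, 2))   # step explicitly set to 2
```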

However, things can become tricky when we write functions whose default values are of mutable data types. By mutable data, we mean Python objects that can be changed after their creation, such as lists, dictionaries, and sets. Consider the following trivial example of a default value for a mutable argument:

>>> # define a function involving a default value for a list
>>> def append_score(score, scores=[]):
...     scores.append(score)
...     print(scores)
... 
>>> append_score(98)
[98]
>>> append_score(92, [100, 95])
[100, 95, 92]
>>> append_score(94)
[98, 94]

Functions with default mutable data arguments.

When we append the score 98, the expected outcome is printed because we omit the scores argument and expect the empty list to be used. When we append the score 92 to the list [100, 95], the outcome [100, 95, 92] is also as expected. However, when we append the score 94, some of us may expect the outcome to be [94], but that’s not the case. Why does this happen?

It’s because functions in Python are first-class citizens and are regular objects (see my previous article about functions being objects in Python). The implication is that when a function is defined, a function object is created, and its default argument values are created with it, exactly once. Let’s see a code snippet illustrating these concepts:

>>> # updated function to show the id for the scores
>>> def append_score(score, scores=[]):
...     scores.append(score)
...     print(f'scores: {scores} & id: {id(scores)}')
... 
>>> append_score.__defaults__
([],)
>>> id(append_score.__defaults__[0])
4650019968
>>> append_score(95)
scores: [95] & id: 4650019968
>>> append_score(98)
scores: [95, 98] & id: 4650019968

Track default mutable data arguments.

We modify the previous function so that it outputs the memory address of the scores list. As you can see, before calling the function we can find the default values of the function’s arguments and their memory addresses by accessing the __defaults__ attribute. After calling the function twice, the same list object at the same memory address has been updated both times.

What’s the best practice, then? We should use None as the default value for a mutable data type, so that the function doesn’t instantiate the mutable object when it is declared. Inside the function, we check for None explicitly (with is None, so that an intentionally passed empty list isn’t replaced) and create the mutable object only when needed. See the code below. Now everything works as expected:

>>> # use None as the default value
>>> def append_score(score, scores=None):
...     if scores is None:
...         scores = []
...     scores.append(score)
...     print(scores)
... 
>>> append_score(98)
[98]
>>> append_score(92, [100, 95])
[100, 95, 92]
>>> append_score(94)
[94]

Use None for default mutable data arguments.
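One subtlety worth noting: checking the argument with a plain truthiness test (if not scores) treats an explicitly passed empty list the same as a missing argument, so the caller’s own list would be silently replaced. A minimal sketch of the pitfall (the caller scenario here is hypothetical):

```python
def append_score(score, scores=None):
    # Truthiness test: an empty list supplied by the caller also fails it
    if not scores:
        scores = []
    scores.append(score)
    return scores

my_scores = []                       # the caller's own list
result = append_score(98, my_scores)
print(result)                        # [98]
print(my_scores)                     # [] -- the caller's list was never touched
```

Comparing with scores is None avoids this edge case, because only a truly missing argument is replaced.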

5. Abuse of *args & **kwargs

Python allows us to write flexible functions by supporting variable numbers of arguments. You have probably seen *args and **kwargs in the documentation of various libraries. In essence, *args collects an undetermined number of positional arguments, while **kwargs collects an undetermined number of keyword arguments.

In Python, positional arguments are arguments that are passed based on their positions, while keyword arguments are arguments that are passed based on their specified keywords. Let’s see a trivial example below:

>>> # a function involving both positional and keyword arguments
>>> def add_numbers(num0, num1, num2=2, num3=3):
...     outcome = num0 + num1 + num2 + num3
...     print(f"num0={num0}, num1={num1}, num2={num2}, num3={num3}")
...     return outcome
... 
>>> add_numbers(0, 1)
num0=0, num1=1, num2=2, num3=3
6
>>> add_numbers(0, 1, num3=4, num2=5)
num0=0, num1=1, num2=5, num3=4
10

Positional and keyword arguments.

In the function add_numbers, num0 and num1 are required positional arguments, while num2 and num3 have default values and are typically passed as keyword arguments. One thing to note is that you can change the order among keyword arguments, but keyword arguments must come after the positional ones. Let’s take it a step further by looking at how *args and **kwargs work. It’s best to learn them with an example. Two things to note:

  1. A variable number of positional arguments is packed into a tuple, declared with a single asterisk (*args)

  2. A variable number of keyword arguments is packed into a dictionary, declared with a double asterisk (**kwargs)

>>> # function with *args
>>> def show_numbers(*numbers):
...     print(f'type: {type(numbers)}')
...     print(f'tuple from *args: {numbers}')
... 
>>> show_numbers(1, 2, 3)
type: <class 'tuple'>
tuple from *args: (1, 2, 3)
>>> 
>>> # function with **kwargs
>>> def show_scores(**scores):
...     print(f'type: {type(scores)}')
...     print(f'dict from **kwargs: {scores}')
... 
>>> show_scores(a=1, b=2, c=3)
type: <class 'dict'>
dict from **kwargs: {'a': 1, 'b': 2, 'c': 3}

*args and **kwargs in Python functions.

Although the availability of *args and **kwargs allows us to write more flexible Python functions, abusing them can make your functions confusing. Earlier, I mentioned the pandas library for data manipulation and briefly mentioned the read_csv function, which reads a CSV file. Do you know how many arguments this function can take? Let’s see its official documentation:

pandas.read_csv(filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]], sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal: str = '.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)

If you count them, you’ll find that the total number of parameters is 49: one positional argument and 48 keyword arguments. Theoretically, the list could be made much shorter like this:

pandas.read_csv(filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]], **kwargs)

However, in the actual implementation of the function, we would still have to unpack **kwargs and figure out how to read the CSV file correctly. Why were these seasoned pandas developers willing to list all these keyword arguments? Because they understand the following principle:

Although using **kwargs could save us some typing in the first line of a function declaration, the cost is that our code becomes less explicit. The same idea applies to *args. As mentioned above, if we work in a code-sharing environment, we always want our code to be explicit and thus easier to understand. Therefore, whenever possible, avoid *args and **kwargs and spell out the parameters instead.
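To see the trade-off concretely, compare two sketches of a hypothetical file-reading helper (the names and options here are made up for illustration, not pandas APIs). The explicit version documents its options in the signature and rejects misspelled keywords immediately, while the **kwargs version accepts typos silently:

```python
# Explicit: the accepted options are visible in the signature.
def read_table(path, sep=',', skip_rows=0):
    return f'reading {path}: sep={sep!r}, skip_rows={skip_rows}'

# Implicit: callers must read the body (or the docs) to learn the options.
def read_table_kwargs(path, **kwargs):
    sep = kwargs.get('sep', ',')
    skip_rows = kwargs.get('skip_rows', 0)
    return f'reading {path}: sep={sep!r}, skip_rows={skip_rows}'

print(read_table('data.csv', sep=';'))
# A misspelled keyword slips through **kwargs unnoticed...
print(read_table_kwargs('data.csv', seperator=';'))
# ...whereas the explicit version raises a TypeError right away:
try:
    read_table('data.csv', seperator=';')
except TypeError as error:
    print(f'caught: {error}')
```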

In this article, we reviewed five common mistakes Python programmers can make in their code. You can certainly keep your own coding style and overlook these issues in your projects, but your code may become hard to understand and suffer in long-term maintainability. Therefore, whenever possible, we should avoid these mistakes to improve code readability and thereby shareability.

Thank you!
