Designing Intelligent Python Dictionaries

Last week while working on a hobby project, I encountered a very interesting design problem:
How do you deal with wrong user input?

Let me explain.

Dictionaries in Python represent pairs of keys and values. For example:

student_grades = {'John': 'A', 'Mary': 'C', 'Rob': 'B'}
# To check grade of John, we call
print(student_grades['John'])
# Output: A

What happens when you try to access a key which is not present?

print(student_grades['Maple'])
# Output: 
KeyError                         Traceback (most recent call last)
<ipython-input-6-51fec14f477a> in <module>
----> print(student_grades['Maple'])

KeyError: 'Maple'

You receive a KeyError.

A KeyError is raised whenever a dict object is asked for the value of a key that is not present in the dictionary.

This error becomes extremely common when you take user input. For example:

student_name = input("Please enter student name: ")
print(student_grades[student_name])

This tutorial covers several ways to deal with key errors in Python dictionaries.

We will work our way towards building an intelligent Python dictionary that can deal with a variety of typos in user input.

Setting a Default Value

A very lazy method would be to return a default value whenever the requested key is not present. This can be done using the get() method:

default_grade = 'Not Available'
print(student_grades.get('Maple',default_grade))
# Output:
# Not Available

You can read more about the get() method in the official Python documentation for dict.
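
For comparison, here is a minimal sketch of two ways to handle a missing key: calling get() without a default, and catching the KeyError explicitly. The helper name grade_or_message is hypothetical and only used for illustration.

```python
student_grades = {'John': 'A', 'Mary': 'C', 'Rob': 'B'}

# Without a second argument, get() returns None for a missing key
missing = student_grades.get('Maple')

def grade_or_message(name, grades):
    """Hypothetical helper: return the grade, or a message if name is absent."""
    try:
        return grades[name]
    except KeyError:
        # Reached only when name is not a key of grades
        return 'No grade recorded for ' + name

print(missing)
print(grade_or_message('John', student_grades))
print(grade_or_message('Maple', student_grades))
```

get() is the shorter option when a fallback value is enough. The try/except form may be preferable when a missing key should trigger different logic.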

Dealing with Letter Case

Let’s suppose you have a dictionary containing country-specific population data. The code asks the user for a country name and prints its population.

# population in millions. (Source: https://www.worldometers.info/world-population/population-by-country/)
population_dict = {'China':1439, 'India':1380, 'USA':331, 'France':65,'Germany':83, 'Spain':46}

# getting user input
Country_Name = input('Please enter Country Name: ')

# access population using country name from dict
print(population_dict[Country_Name])

# Output
Please enter Country Name: France
65

But let’s say the user types ‘france’ instead. Currently, every key in our dictionary starts with a capital letter. What will the output be?

Please enter Country Name: france
-----------------------------------------------------------------
KeyError                         Traceback (most recent call last)
<ipython-input-6-51fec14f477a> in <module>
      2 Country_Name = input('Please enter Country Name: ')
      3 
----> 4 print(population_dict[Country_Name])

KeyError: 'france'

As ‘france’ is not a key in the dictionary, we receive an error.

A simple workaround: store all country names in lower-case letters.

Also, convert whatever input the user types to lower-case.

# keys (Country Names) are now all lowercase
population_dict = {'china':1439, 'india':1380, 'usa':331, 'france':65,'germany':83, 'spain':46}
Country_Name = input('Please enter Country Name: ').lower() # lowercase input

print(population_dict[Country_Name])

Please enter Country Name: france
65
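
If the dictionary already exists with mixed-case keys, the same idea can be applied without retyping it. The sketch below assumes a hypothetical helper called normalize. It uses strip() to drop surrounding spaces and casefold(), which behaves like a more aggressive lower().

```python
# Original data with mixed-case keys (population in millions)
raw_population = {'China': 1439, 'India': 1380, 'USA': 331}

def normalize(name):
    """Hypothetical helper: trim surrounding whitespace and fold case."""
    return name.strip().casefold()

# Rebuild the dictionary once, with normalized keys
population_dict = {normalize(country): count
                   for country, count in raw_population.items()}

# Apply the same normalization to whatever the user types
print(population_dict[normalize('  INDIA ')])
```

Because keys and input pass through the same function, the lookup no longer depends on how the user capitalizes the name.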

Dealing with Typos

But, now let’s say the user enters ‘Frrance’ instead of ‘France’. How can we deal with this?

One way would be to use conditional statements.

We check whether the user's input is available as a key. If it is not, we print a message.

It’s best to put this in a loop and break on a special flag input like exit.

population_dict = {'china': 1439, 'india': 1380, 'usa': 331, 'france': 65, 'germany': 83, 'spain': 46}

while True:
    Country_Name = input('Please enter Country Name (type exit to close): ').lower()

    # stop the loop if the user enters exit
    if Country_Name == 'exit':
        break

    # a membership test on a dict checks its keys
    if Country_Name in population_dict:
        print(population_dict[Country_Name])
    else:
        print("Please check for any typos. Data not available for", Country_Name)

The loop keeps running until the user enters exit.

A Better Approach

While the above method ‘works’, it’s not the ‘intelligent method’ that we promised in the intro.

We want our program to be robust and to detect simple typos like frrance and chhina (much like Google search does).

After some research, I was able to find a couple of libraries that could suit our purpose. My favorite is difflib, a module in the Python standard library.
difflib can be used to compare files, strings, lists, etc. and produce difference information in various formats.

The module provides a variety of classes and functions for comparing sequences.

We will use two features from difflib: SequenceMatcher and get_close_matches.

Let’s take a brief look at both of them. You can skip to the next section if you are only curious about the application.

# SequenceMatcher

The SequenceMatcher class is used to compare two sequences. Its constructor has the following signature:

difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

isjunk: an optional one-argument function that returns True for elements that should be treated as junk (white-space, newlines, etc.) and ignored while comparing two blocks of text. We pass None here, so no element is treated as junk.
a and b: the sequences (here, strings) that we wish to compare.
autojunk: a heuristic that automatically treats certain frequently occurring sequence items as junk.

Let’s use SequenceMatcher to compare two strings chinna and china:

from difflib import SequenceMatcher # import

# creating a SequenceMatcher object comparing two strings
check = SequenceMatcher(None, 'chinna', 'china')

# printing a similarity ratio on a scale of 0(lowest) to 1(highest)
print(check.ratio()) 
# Output
# 0.9090909090909091

In the code above, we used the ratio() method.

ratio returns a measure of the sequences’ similarity as a float in the range [0, 1].
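
To make the number less of a black box: the difflib documentation defines the ratio as 2.0*M / T, where M is the number of matched elements and T is the total number of elements in both sequences. The sketch below recomputes it by hand from get_matching_blocks(). The variable names are only illustrative.

```python
from difflib import SequenceMatcher

a, b = 'chinna', 'china'
matcher = SequenceMatcher(None, a, b)

# M: total number of matched characters across all matching blocks
# (the final block returned is a zero-size sentinel, so it adds nothing)
matches = sum(block.size for block in matcher.get_matching_blocks())

# T: total number of characters in both strings
total = len(a) + len(b)

manual_ratio = 2.0 * matches / total
print(matches, total, manual_ratio)
```

Here 5 of the 11 characters are matched, which gives 10/11, the same value that ratio() printed earlier.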

# get_close_matches

Now, we have a way of comparing two strings based on similarity.

But what if we wish to find all the strings (stored in a database, say) that are similar to a particular string?

get_close_matches() returns a list containing the best matches from a list of possibilities.

difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)

word: String for which matches are required.
possibilities: List of strings against which to match word.
Optional n: Maximum number of close matches to return. By default, 3; must be greater than 0.
Optional cutoff: A float in the range [0, 1]. Possibilities whose similarity ratio to word is below this value are ignored. By default, 0.6.

The best n matches among the possibilities are returned in a list, sorted by similarity score, most similar first.

Let’s take a look at an example:

from difflib import get_close_matches

print(get_close_matches("chinna", ['china','france','india','usa']))
# Output
# ['china']

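
To see how n and cutoff change the result, here is a small sketch using the same word and list of possibilities. The variable names are illustrative, and the cutoff values are picked only to show the effect.

```python
from difflib import get_close_matches

countries = ['china', 'france', 'india', 'usa']

# A lower cutoff lets weaker matches through
loose_matches = get_close_matches('chinna', countries, cutoff=0.5)

# n=1 keeps only the single best match
best_only = get_close_matches('chinna', countries, n=1, cutoff=0.5)

# A very strict cutoff can leave nothing at all
strict_matches = get_close_matches('chinna', countries, cutoff=0.95)

print(loose_matches)
print(best_only)
print(strict_matches)
```

With the cutoff lowered to 0.5, india also qualifies, but china still comes first because results are sorted by similarity. An empty list is a normal return value, so the caller should check for it before indexing.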

Putting it all together

Now that we have difflib at our disposal, let’s bring everything together and build a typo-proof Python dictionary.

We have to focus on the case where the Country_Name given by the user is not present in population_dict.keys(). In this case, we try to find a country whose name is similar to the user input and output its population.

# pass country_name in word and dict keys in possibilities
maybe_country = get_close_matches(Country_Name, population_dict.keys())
# Then we pick the first(most similar) string from the returned list
print(population_dict[maybe_country[0]])

The final code needs to account for a few other cases: there may be no similar string at all, and when there is one, we should confirm with the user that it is the string they meant. Take a look:

from difflib import get_close_matches

population_dict = {'china': 1439, 'india': 1380, 'usa': 331, 'france': 65, 'germany': 83, 'spain': 46}

while True:
    Country_Name = input('Please enter Country Name (type exit to close): ').lower()

    # stop the loop if the user enters exit
    if Country_Name == 'exit':
        break

    if Country_Name in population_dict:
        print(population_dict[Country_Name])
    else:
        # look for similar keys; the result is sorted, most similar first
        maybe_country = get_close_matches(Country_Name, population_dict.keys())
        if not maybe_country:  # no similar string
            print("Please check for any typos. Data not available for", Country_Name)
        else:
            # ask the user to confirm the best match
            ans = input("Do you mean %s? Type y or n. " % maybe_country[0]).strip().lower()
            if ans == 'y':
                # if y, print the population
                print(population_dict[maybe_country[0]])
            else:
                # anything else: start again
                print("Bad input. Try again.")


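
The same lookup logic could also live inside the dictionary itself. The sketch below is one possible design, not part of the original code: FuzzyDict is a hypothetical dict subclass that uses the __missing__ hook, which Python calls when a subscript lookup on a dict subclass fails. Unlike the interactive loop, it skips the user confirmation and silently returns the closest match.

```python
from difflib import get_close_matches

class FuzzyDict(dict):
    """Hypothetical dict subclass that falls back to the closest string key."""

    cutoff = 0.6  # same default as get_close_matches

    def __missing__(self, key):
        # Called by d[key] only when key is not present
        if isinstance(key, str):
            string_keys = [k for k in self if isinstance(k, str)]
            candidates = get_close_matches(key.lower(), string_keys,
                                           n=1, cutoff=self.cutoff)
            if candidates:
                return self[candidates[0]]
        # No close match: behave like a normal dict
        raise KeyError(key)

population = FuzzyDict({'china': 1439, 'india': 1380, 'usa': 331,
                        'france': 65, 'germany': 83, 'spain': 46})

print(population['france'])   # exact key
print(population['Frrance'])  # typo, resolved to the closest key

try:
    population['xyz']
except KeyError:
    print('No close match for xyz')
```

Note that __missing__ only affects subscript access. Calls to get() and membership tests with in still require an exact key, which may or may not be what you want.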

Conclusion

The goal of this tutorial was to guide you towards building dictionaries that are robust to user input.

We looked at ways to deal with a variety of errors, such as letter-case mismatches and small typos.

We can build further on this and look at a variety of other applications. For example, natural language processing (NLP) can be used to better understand user input and bring nearby results in search engines.
