Upgrading from Python 2 to Python 3 seamlessly and simply

Introduction

There is a good chance that you are still working on a Python 2 product or testing Python 2 code. If so, you have probably kept seeing the Python 2 deprecation message as a reminder whenever you work with python 2 or pip.

This document provides tips and tricks for upgrading to Python 3, along with the common problems encountered during the Couchbase test infra migration process.

Porting Process

At a high level, porting is a three-step process: 1) auto conversion, 2) manual changes, and 3) runtime validation and fixes.

First, clone the original repository and apply the basic automatic conversion changes. Check in the changes to a new repository until the full conversion is done. This way, the current regression cycles can continue without interruption.

1. Auto conversion

There is an automated tool called 2to3, provided by the Python team, that takes care of a few common patterns such as print, exceptions, list wrapping, and relative imports.

You can start with a single directory in the locally cloned workspace to double-check the results. Later, the conversion can be run on the entire code base so that the basic porting is taken care of.

Below are some sample 2to3 conversion commands on macOS. In the later commands, note that all fixers, including idioms, were applied. This way, the first-time conversion can take care of the key changes.

(myenv) jmunta-mac:myscripts jagadeshmunta$ 2to3 . -o new -n -w .
hq-mac:testrunner jagadeshmunta$ cd lib; mv lib/couchbase_helper ../couchbase_helper
hq-mac:testrunner jagadeshmunta$ 2to3 -f all -f buffer -f idioms -f set_literal -f ws_comma  -n -o ~/p3testrunner_3 -w . |tee ~/2to3_3.txt
hq-mac:testrunner jagadeshmunta$ time 2to3 -f all -f buffer -f idioms -f set_literal -f ws_comma  -n -w . |tee ~/2to3_4.txt
$ 2to3 -f all -f buffer -f idioms -f set_literal -f ws_comma  -n -o ~/p3testrunner_helper -w ../couchbase_helper |tee ~/2to3_helper.txt
cp -R ~/p3testrunner_helper/* .

2. Manual changes

The auto conversion doesn’t do the complete porting. Beyond the common syntax changes made by the 2to3 tool, you are likely to run into the common problems described below during the porting process.

Run the test class, look for errors, and fix each one appropriately: switch from bytes to str or from str to bytes, or, for a sort/comparison issue, fix the key passed to the sorted function. This is an iterative process that continues until all of the code has been validated at runtime.

Once a common pattern is clear, you can use grep and sed to replace it across many class files. If you are not sure about some code until it runs, defer the change until that test class is executed.

Third-party libraries/modules might also have changed; search the web for those changes and adapt your usage accordingly.

Make sure all code paths are covered by running across all supported platforms and parameters.

3. Runtime Validation and Fix

Once the conversion is done, run as much of the code as possible, because Python is a dynamic language. Changes made only through visual static code inspection can otherwise break things. You can start with basic sanity tests, then acceptance tests, and then select the full tests from a single module.

Once you are comfortable, go through all the other modules one by one. Keep checking the changes in to the new repository. In addition, make sure the ported changes introduce no regressions by running sanity tests from this new repository on the newer builds. The validation should also include all the supported platforms with Python 3.

Python 3 Ported Code and Status

Below is the new repository for the Python 3 ported code until it is merged into the main repository. The plan is to do one cycle of porting, or to take the changes from the main repo intermittently and merge them manually into this one.

https://github.com/couchbaselabs/testrunner-py3/

Many common changes have already been made, but the work is not complete as there might be other runtime issues. Fixes in common code can also regress earlier fixes because of assumptions about input value types. Some ported code still has to be validated with Python 3, and the effort is still in progress.

Now, let me show you the common issues that came up during the runtime validation. You can use this as a reference when you hit an issue to see whether yours is similar. You can apply the same solution and see if it works for you. If you have any new ideas, you can put them in the comments.

Common Runtime Problems

1. Problem(s):

You might get some of the TypeErrors below during runtime, such as str instead of bytes or bytes instead of str.
Error#1. TypeError: can’t concat str to bytes
Error#2. TypeError: must be str, not bytes

File "lib/mc_bin_client.py", line 53, in __init__
if msg: supermsg += ":  " + str(msg)
TypeError: must be str, not bytes

File "lib/mc_bin_client.py", line 141, in _recvMsg    
response += data
TypeError: must be str, not bytes

Error#3. TypeError: a bytes-like object is required, not ‘str’

File "lib/remote/remote_util.py", line 3038, in log_command_output
if "Warning" in line and "hugepages" in line:
TypeError: a bytes-like object is required, not 'str'

File "lib/tasks/task.py", line 1167, in run_high_throughput_mode
raise Exception(rv["err"])
Exception: a bytes-like object is required, not 'str'

File "lib/mc_bin_client.py", line 936, in _set_vbucket
self.vbucketId = (((zlib.crc32(key)) >> 16) & 0x7fff) & (self.vbucket_count - 1)
TypeError: a bytes-like object is required, not 'str'

File "lib/mc_bin_client.py", line 148, in _recvMsg    
magic = struct.unpack(">B", response[0:1])[0]
TypeError: a bytes-like object is required, not 'str'

File "lib/remote/remote_util.py", line 4560, in check_cmd
if out and command_output in out[0]:
TypeError: a bytes-like object is required, not 'str'

Error#4. TypeError: Cannot mix str and non-str arguments

File "lib/mc_bin_client.py", line 126, in _sendMsg
self.s.send(msg + extraHeader + key + val + extended_meta_data)
TypeError: can't concat str to bytes

File "/usr/lib64/python3.6/urllib/parse.py", line 120, in _coerce_args    
raise TypeError("Cannot mix str and non-str arguments")
TypeError: Cannot mix str and non-str arguments

Solution(s):

Check the types of the variables in the statement, then use xxx.encode() to get bytes, xxx.decode() to get a string, the b prefix on a literal, or str(). Sometimes the input type is not known in advance; in that case, use try: x.encode() except AttributeError: pass
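The try/except idea above can be wrapped in two small helpers. This is only a sketch: the names to_str and to_bytes are hypothetical (they are not part of testrunner), and UTF-8 is assumed as the encoding.

```python
def to_str(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    try:
        return value.decode(encoding)
    except AttributeError:
        # value has no decode(), so it is assumed to be a str already
        return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is str."""
    try:
        return value.encode(encoding)
    except AttributeError:
        # value has no encode(), so it is assumed to be bytes already
        return value


# Normalize both sides before concatenating or comparing.
response = to_bytes("header:") + to_bytes(b"payload")
line = to_str(b"Warning: hugepages enabled")
print(response, "Warning" in line)
```

Normalizing at the boundary (right after a socket read or a remote command) tends to be less error-prone than sprinkling encode()/decode() calls through the comparison code.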

2. Problem(s):

TypeError: root - ERROR - ------->installation failed: a bytes-like object is required, not ‘str’

Solution(s):

In this case, add b as a prefix to the string under comparison, or change the bytes value to a string. Example: lib/remote/remote_util.py.

if o[0] != b"":                    
   o = o[0].split(b" ")

Surround the code with try-except to find the exact line causing the error (say, the TypeError above).

import os
import sys
import traceback

try:
    ...  # the code under investigation goes here
except Exception as e:
    # log is the logger already available in the module
    log.info("{}".format(e))
    traceback.print_exc()
    exc_type, exc_obj, exc_tb = sys.exc_info()
    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
    print(exc_type, fname, exc_tb.tb_lineno)

Calling traceback.print_exc() prints the full stack trace, similar to what you would see in Java.

Fix with changes to lib/remote/remote_util.py as below.

for line in output:
    try:
        line = line.decode()
    except AttributeError:
        pass

3. Problem(s):

File "lib/membase/api/rest_client.py", line 4178, in multiscan_count_for_gsi_index_with_rest
content = content.split("[]")[0]
TypeError: a bytes-like object is required, not 'str'

Solution(s):

content = content.split(b'[]')[0].decode()

4. Problem(s):

AttributeError: suite_setUp() or suite_tearDown() is missing for some test suites.

AttributeError: type object 'XDCRAdvFilterTests' has no attribute 'suite_setUp'

Solution(s):

Add the dummy suite_setUp() and suite_tearDown() methods.

11a12,18
> 
>     def suite_setUp(self):
>         print("*** XDCRAdvFilterTests : suite_Setup() ***")
> 
>     def suite_tearDown(self):
>         print("*** XDCRAdvFilterTests : suite_tearDown() ***")
>

5. Problem(s):

File "./testrunner.py", line 416, in main    
result.errors = [(name, e.message)]
AttributeError: 'AttributeError' object has no attribute 'message'

Solution(s):

result.errors = [(name, str(e))]
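A small self-contained illustration of the same change, assuming a hypothetical helper named describe_error and a made-up test name:

```python
def describe_error(name, error):
    """Build the (name, message) pair without relying on error.message."""
    # Python 3 exceptions have no .message attribute; str(error) gives the text.
    return (name, str(error))


try:
    getattr(object(), "suite_setUp")  # raises AttributeError
except AttributeError as e:
    errors = [describe_error("sample_test", e)]

print(errors)
```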

6. Problem(s):

AttributeError: ‘Transport’ object has no attribute ‘_Thread__stop’

File "./testrunner.py", line 529, in main
t._Thread__stop()
AttributeError: 'Transport' object has no attribute '_Thread__stop'

File "pytests/view/viewquerytests.py", line 45, in stop    
self._Thread__stop()
AttributeError: 'StoppableThread' object has no attribute '_Thread__stop'    

self._stop()
TypeError: 'Event' object is not callable

Solution(s):

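There is no direct way to stop a non-daemon thread. The Python 2 name-mangled _Thread__stop() no longer exists. Syntax-wise, the closest equivalent is t._stop(), but it is a private internal that does not terminate a running thread, and a subclass attribute named _stop (such as an Event) shadows it, which gives the "'Event' object is not callable" error above. The recommendation is a graceful shutdown: set a flag and check it in the thread’s run() to break out.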
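A minimal sketch of the graceful shutdown, using threading.Event as the flag. The class below is a simplified stand-in, not the StoppableThread from viewquerytests.py, and the iterations counter exists only for the demonstration.

```python
import threading
import time


class StoppableThread(threading.Thread):
    """Thread that leaves its loop when stop() is requested."""

    def __init__(self):
        super().__init__()
        # Named _stop_event on purpose: an attribute called _stop could
        # clash with a private name inside threading.Thread.
        self._stop_event = threading.Event()
        self.iterations = 0

    def run(self):
        while not self._stop_event.is_set():
            self.iterations += 1
            # wait() acts as a sleep that returns early once stop() is called
            self._stop_event.wait(0.01)

    def stop(self):
        self._stop_event.set()


worker = StoppableThread()
worker.start()
time.sleep(0.05)
worker.stop()
worker.join(timeout=2)
print(worker.is_alive(), worker.iterations)
```

The thread decides for itself when it is safe to exit, so it can release connections or locks before returning from run().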

7. Problem(s):

Test expirytests.ExpiryTests.test_expired_keys was not found: module ‘string’ has no attribute ‘translate’

Solution(s):

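Rewrite using the str methods (str.maketrans and str.translate). Python 3 no longer offers the old way of getting the full character table, so the set of all 256 characters that the earlier code relied on has to be provided explicitly.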

vi lib/membase/api/tap.py

def buildGoodSet(goodChars=string.printable, badChar='?'):
    """Build a translation table that turns all characters not in goodChars
    to badChar"""
    # All 256 characters from '\x00' to '\xff' (same set as a hardcoded literal)
    allChars = ''.join(chr(i) for i in range(256))
    # Deleting goodChars from allChars leaves only the bad characters
    badchars = str.maketrans(allChars, allChars, goodChars)
    badchars1 = str.translate(allChars, badchars)
    # Map every bad character to badChar
    rv = str.maketrans(badchars1, badChar * len(badchars1))
    return rv

8. Problem(s):

TabError: inconsistent use of tabs and spaces in indentation

File "pytests/security/audittest.py", line 396    
shell = RemoteMachineShellConnection(self.master)                                                    
^
TabError: inconsistent use of tabs and spaces in indentation

Solution(s):

Search for tab characters and replace them with spaces.

For the above issue, replace each tab character with spaces (the command below uses GNU sed syntax).

sed -i 's/\t/        /g' pytests/security/audittest.py

9. Problem(s):

File "lib/couchbase_helper/documentgenerator.py", line 83, in __next__    
value = arg[seed % len(arg)]
TypeError: list indices must be integers or slices, not float

File "lib/membase/helper/bucket_helper.py", line 517, in load_some_data    
keys = ["key_%s_%d" % (testuuid, i) for i in range(number_of_buckets)]
TypeError: 'float' object cannot be interpreted as an integer 

File "lib/membase/helper/bucket_helper.py", line 372, in verify_data    
test.assertEqual(value, key, msg='values dont match')
AssertionError: b'key_d918f450-5858-4430-a016-230e1f45bcf9_0' != 'key_d918f450-5858-4430-a016-230e1f45bcf9_0' : values dont match

File "pytests/setgettests.py", line 90, in set_get_test    
self.test.fail("value mismatch for key {0}".format(key))
AssertionError: value mismatch for key 9fcbd36f-e34d-477b-9fc5-0a5d067dff4b

File "pytests/security/auditmain.py", line 320, in returnFieldsDef    
if (isinstance((particulars['mandatory_fields'][items.encode('utf-8')]), dict)):
KeyError: b'bucket_name'

File "lib/tasks/task.py", line 2370, in _check_ddoc_revision    
new_rev_id = self._parse_revision(meta['rev'])
KeyError: 'rev'

Solution(s):

This was a case-sensitivity issue. It was fixed by changing the key from x_couchbase_meta to X_Couchbase_Meta.
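The first two tracebacks in this problem (a float used as a list index and in range()) usually point to the / operator, which always returns a float in Python 3. The sketch below assumes that is the cause; the actual fix in those files is not shown here, and the variable names are made up.

```python
items = ["a", "b", "c", "d"]
total = 10

# In Python 3, / returns a float even when both operands are ints.
half_float = total / 2
# // keeps integer (floor) division, which is what Python 2 did for ints.
half_int = total // 2

value = items[half_int % len(items)]            # valid list index
keys = ["key_%d" % i for i in range(half_int)]  # valid range() argument
print(type(half_float).__name__, value, len(keys))
```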

10. Problem(s):

Error#1. TypeError: ‘<’ not supported between instances of ‘dict’ and ‘dict’
Error#2. TypeError: ‘cmp’ is an invalid keyword argument for this function

File "pytests/tuqquery/tuq_dml.py", line 455, in test_insert_with_select    
expected_result = sorted([{bucket.name: {'name': doc['name']}} for doc in values[:num_docs]])
TypeError: '<' not supported between instances of 'dict' and 'dict'

Solution(s):

expected_result = sorted(expected_result,key=(lambda x: x[bucket.name]['name']))
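The same fix on made-up data, as a sketch. The bucket name and documents are hypothetical.

```python
bucket_name = "default"  # hypothetical bucket name
docs = [{"name": "employee-9"}, {"name": "employee-1"}, {"name": "employee-4"}]

expected_result = [{bucket_name: {"name": doc["name"]}} for doc in docs]
# dicts do not define < in Python 3, so tell sorted() which value to compare.
expected_result = sorted(expected_result, key=lambda x: x[bucket_name]["name"])
names = [row[bucket_name]["name"] for row in expected_result]
print(names)
```

The actual and expected results need the same key function, otherwise the two lists can end up in different orders even when their contents match.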

11. Problem(s):

File "pytests/tuqquery/tuq_2i_index.py", line 1057, in test_simple_array_index    
self.assertTrue(sorted(actual_result_within['results']) == sorted(expected_result['results']))
TypeError: '<' not supported between instances of 'dict' and 'dict'

Solution(s):

-                self.assertTrue(sorted(actual_result_within['results']) == sorted(expected_result['results']))
+                self.assertTrue(sorted(actual_result_within['results'], key=(lambda x: x['name'])) == \
+                                sorted(expected_result['results'], key=(lambda x: x['name'])))

12. Problem(s):

File "pytests/tuqquery/tuq.py", line 1221, in _verify_results
self.fail("Results are incorrect.Actual num %s. Expected num: %s.\n" % (len(actual_result), len(expected_result)))
AssertionError: Results are incorrect.Actual num 0. Expected num: 72.

File "lib/tasks/task.py", line 3638, in filter_emitted_rows    
reverse=descending_set)
TypeError: 'cmp' is an invalid keyword argument for this function

Solution(s):

expected_rows = sorted(self.emitted_rows, key=(lambda x: (x['key'],x['id'])),reverse=descending_set)
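When a Python 2 comparison function is too involved to rewrite as a key, functools.cmp_to_key can wrap it instead. The sketch below shows both options on made-up rows; compare_rows is a hypothetical comparison function, not the one from task.py.

```python
from functools import cmp_to_key

emitted_rows = [
    {"key": 2, "id": "b"},
    {"key": 1, "id": "z"},
    {"key": 2, "id": "a"},
]

# Option 1: a key function returning a tuple, compared element by element.
by_key = sorted(emitted_rows, key=lambda x: (x["key"], x["id"]), reverse=True)


# Option 2: keep a Python 2 style comparison function and wrap it.
def compare_rows(a, b):
    left, right = (a["key"], a["id"]), (b["key"], b["id"])
    return (left > right) - (left < right)  # replacement for cmp(left, right)


by_cmp = sorted(emitted_rows, key=cmp_to_key(compare_rows), reverse=True)
ids = [row["id"] for row in by_key]
print(ids)
```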

13. Problem(s):

 File "lib/tasks/task.py", line 3675, in <listcomp>    
expected_rows = [row for row in expected_rows if row['key'] >= start_key and row['key'] <= end_key]
TypeError: '>=' not supported between instances of 'int' and 'NoneType'

Solution(s):

Here, the value should be returned as an int, because Python 3 doesn’t compare an int with None automatically the way Python 2 did.
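If a bound can legitimately still be None, another option is to guard the comparison. This sketch assumes that None means "no bound", which may not match the intent in task.py; in_range is a hypothetical helper.

```python
def in_range(value, start_key=None, end_key=None):
    """Return True if value lies within the bounds; None means no bound."""
    if start_key is not None and value < start_key:
        return False
    if end_key is not None and value > end_key:
        return False
    return True


rows = [{"key": 1}, {"key": 5}, {"key": 9}]
filtered = [row for row in rows if in_range(row["key"], start_key=3, end_key=None)]
print(filtered)
```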

14. Problem(s):

hasattr(items, 'iteritems') no longer returns True, because Python 3 dicts have no iteritems() method.

Solution(s):

@@ -754,7 +755,7 @@ class MemcachedClient(object):         
# If this is a dict, convert it to a pair generator         
collection = self.collection_name(collection) 
-        if hasattr(items, 'iteritems'):
+        if hasattr(items, 'items'):             
items = iter(items.items())

if hasattr(items, 'items'):
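A short sketch of the same check in isolation. The function name as_pairs is hypothetical.

```python
def as_pairs(items):
    """Return an iterator of (key, value) pairs for a dict or a pair sequence."""
    # Python 3 dicts have items() but no iteritems().
    if hasattr(items, "items"):
        return iter(items.items())
    return iter(items)


from_dict = list(as_pairs({"k1": "v1"}))
from_list = list(as_pairs([("k2", "v2")]))
print(from_dict, from_list)
```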

15. Problem(s):

 File "lib/crc32.py", line 78, in crc32_hash    
crc = (crc >> 8) ^ crc32tab[int((crc ^ ord(ch)) & 0xff)]
TypeError: ord() expected string of length 1, but int found

Solution(s):

Convert the key to a string so that ch is a one-character string rather than the int you get when iterating over a binary (bytes) key. See the file.

try:
    key = key.decode()
except AttributeError:
    pass
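The reason behind the error is that iterating over bytes yields ints in Python 3. A minimal sketch with a made-up ASCII key:

```python
key = b"key_1"

# Iterating over bytes gives ints, so ord(ch) would raise TypeError here.
as_ints = list(key)

# After decoding, iteration yields one-character strings again.
try:
    key = key.decode()
except AttributeError:
    pass  # key was already a str

codes = [ord(ch) for ch in key]
print(as_ints == codes)
```

For ASCII keys the two lists are identical, so code that only needs the numeric values could also skip ord() when the key is bytes.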

16. Problem(s):

TypeError: ‘FileNotFoundError’ object is not subscriptable

Solution(s):

This changed in Python 3: FileNotFoundError is not subscriptable, so use the errno attribute, e.errno, instead.

File "lib/remote/remote_util.py", line 1714, in create_directory    
if e[0] == 2:
TypeError: 'FileNotFoundError' object is not subscriptable
-            if e[0] == 2:
+            if e.errno == 2:
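A self-contained sketch of the same check, using the errno module constant instead of the literal 2. The paths are temporary and made up for the demonstration.

```python
import errno
import os
import tempfile

base = tempfile.mkdtemp()
# The parent "missing_parent" is never created, so mkdir must fail.
target = os.path.join(base, "missing_parent", "child")

try:
    os.mkdir(target)
    result = "created"
except OSError as e:
    # e[0] no longer works in Python 3; the error number is in e.errno.
    if e.errno == errno.ENOENT:
        result = "missing parent"
    else:
        raise
finally:
    os.rmdir(base)  # clean up the temporary directory

print(result)
```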

17. Problem(s):

Traceback (most recent call last):  
File "lib/couchbase_helper/tuq_helper.py", line 521, in run_query_and_verify_result    
self._verify_results(sorted_actual_result, sorted_expected_result)  
File "lib/couchbase_helper/tuq_helper.py", line 114, in _verify_results    
raise Exception(msg)
Exception: The number of rows match but the results mismatch, please check

Solution(s):

The nested dictionary/list comparison stopped working because the earlier behavior of sorted(), which could order such structures completely, is no longer available. Use the deepdiff module and its DeepDiff class to do the comparison.

18. Problem(s):

AttributeError: module ‘string’ has no attribute ‘replace’

File "scripts/populateIni.py", line 52, in main    
data[i] = string.replace(data[i], 'dynamic', servers[0])
AttributeError: module 'string' has no attribute 'replace'

Solution(s):

Call replace directly on the str variable, as below, and assign the result back, since strings are immutable.

data[i] = data[i].replace('dynamic', servers[0])

19. Problem(s):

TypeError: '>' not supported between instances of 'int' and 'str'

Solution(s):

Use str or int function appropriately.

  if where_clause:
+            where_clause = where_clause.replace('if  t > "', 'if str(t) > "') # to fix the type error between int, str comparison

20. Problem(s):

NameError: name ‘cmp’ is not defined

Solution(s):

Use deepdiff module and DeepDiff class to do object comparison.

21. Problem(s):

  File "lib/couchbase_helper/tuq_helper.py", line 782, in verify_indexes_redistributed    
if cmp(items_count_before_rebalance, items_count_after_rebalance) != 0:
NameError: name 'cmp' is not defined

Solution(s):

Use deepdiff module and DeepDiff class to do object comparison.
-        if cmp(index_state_before_rebalance, index_state_after_rebalance) != 0:
+        if DeepDiff(index_state_before_rebalance, index_s

22. Problem(s):

File "lib/couchbase_helper/documentgenerator.py", line 19, in has_next
return self.itr < self.end
TypeError: '<' not supported between instances of 'int' and 'str'

Solution(s):

Convert str to int as below for the above type error issue.

return int(self.itr) < int(self.end)
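An alternative sketch is to convert once, when the values come in, so every later comparison works on ints. RangeGenerator is a hypothetical stand-in, not the class from documentgenerator.py.

```python
class RangeGenerator:
    """Hypothetical stand-in for a document generator with start/end bounds."""

    def __init__(self, start, end):
        # Test parameters often arrive as strings, so convert them up front.
        self.itr = int(start)
        self.end = int(end)

    def has_next(self):
        return self.itr < self.end

    def __next__(self):
        if not self.has_next():
            raise StopIteration
        self.itr += 1
        return self.itr - 1


gen = RangeGenerator("0", "3")
values = []
while gen.has_next():
    values.append(next(gen))
print(values)
```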

Resources

I hope you enjoyed reading this! Please view it as a quick reference for your Python 3 upgrade rather than a complete resolution of every porting issue. Our intent is to help you at some level and give you a jump start on the porting process. Please feel free to share anything new you learn that can help us. Your feedback is appreciated!

Thank you so much!
