The original script assumed that only two file types (CSV/JSON) were involved: whichever one was imported would be converted to the other.
But what happens when we add a third file type?
Each additional file type would require an import function, plus a new conversion function for every other supported file type; the number of converters grows quadratically. Settling on a standard intermediate format for the converter was necessary for scalability.
Now, whenever a new file type is added to the converter, only two functions need to be made — converting to the standard format and converting from the standard format.
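To make the scaling argument concrete, the converter can be pictured as a registry: each format contributes one reader and one writer that speak the standard in-memory form, and any pair of formats composes through it. This is a hypothetical sketch, not the repository's actual structure; the registry layout and function names are mine.

```python
# Hypothetical sketch: each format registers one reader and one writer
# against the standard form (a list of dicts). N formats need 2N
# functions instead of N*(N-1) pairwise converters.
import csv
import io
import json

def _write_csv(data, f):
    # header from the first record; assumes uniform keys for brevity
    writer = csv.DictWriter(f, fieldnames=list(data[0]))
    writer.writeheader()
    writer.writerows(data)

READERS = {
    "json": lambda f: json.load(f),
    "csv": lambda f: [dict(row) for row in csv.DictReader(f)],
}
WRITERS = {
    "json": lambda data, f: json.dump(data, f, indent=2),
    "csv": _write_csv,
}

def convert(src_type, dst_type, src, dst):
    # every pair of formats composes through the standard representation
    data = READERS[src_type](src)
    WRITERS[dst_type](data, dst)

# demo with in-memory files
src = io.StringIO('[{"a": "1", "b": "2"}]')
dst = io.StringIO()
convert("json", "csv", src, dst)
```

Adding a third format to this sketch means registering one more reader and one more writer, never touching the existing pairs.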
Given our design decisions from above, we create a skeleton with the necessary import/export functions and comments to outline the execution.
import os
import csv
import json
import xlsxwriter
from collections import OrderedDict
import math

def main():
    # supported file types
    # prompt user for file name
    # attempt to import based on file name
    # prompt user for export file name
    # export data
    pass

################################
# IMPORT FUNCTIONS
################################
def importJSON(f):
    pass

def importCSV(f):
    pass

################################
# EXPORT FUNCTIONS
################################
def exportJSON(data, filename):
    pass

def exportCSV(data, filename):
    pass

#----------
if __name__ == "__main__":
    main()
Here’s the main() function fleshed out:
def main():
    # supported file types
    supported_file_types = (
        "json",
        "csv"
    )

    # prompt user for file name
    while True:
        filename = input(f"Enter a file to load ({','.join(supported_file_types)}): ").strip()
        if not os.path.isfile(filename):
            print(">> Error, file does not exist")
        elif os.path.splitext(filename)[1].lstrip(".").lower() not in supported_file_types:
            print(">> Error, file type is not supported.")
        else:
            break

    # attempt to import based on file name
    print("Attempting to load file...")
    file_type = os.path.splitext(filename)[1].lstrip(".").lower()
    try:
        with open(filename) as f:
            if file_type == "json":
                imported_data = importJSON(f)
            elif file_type == "csv":
                imported_data = importCSV(f)
        if imported_data is False:
            raise Exception(">> Error, could not load file, exiting...")
    except Exception as e:
        print(e)
        return
    else:
        print(f">> File loaded, {len(imported_data)} records imported...")

    # prompt user for export file name
    while True:
        export_filename = input(f"Enter the output file name ({','.join(supported_file_types)}): ").strip().lower()
        if os.path.splitext(export_filename)[1].lstrip(".").lower() not in supported_file_types:
            print(">> Error, file type is not supported.")
        elif os.path.isfile(export_filename):
            print(">> Error, file already exists")
        else:
            converted_file_type = os.path.splitext(export_filename)[1].lstrip(".").lower()
            break

    # export data
    if converted_file_type == "json":
        data_size = exportJSON(imported_data, export_filename)
    elif converted_file_type == "csv":
        data_size = exportCSV(imported_data, export_filename)
    if data_size > 1000000:
        print(f">> Records exported to: {export_filename} <{math.floor(data_size / 100000) / 10} MB>")
    elif data_size > 0:
        print(f">> Records exported to: {export_filename} <{math.floor(data_size / 100) / 10} KB>")
    else:
        print(">> Error, no file exported...")
Planning for the future, I added a tuple to hold the supported file types for easy validation. I chose f-string literals for interpolating variables because there were not many insertions and the strings remain easy to read when scanning the code.
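That validation pattern boils down to a membership check against the tuple. Here is a minimal standalone sketch; the `extension` helper and the file names are mine, not the repository's.

```python
import os

supported_file_types = ("json", "csv")

def extension(filename):
    """Return the lowercased file extension without the dot."""
    return os.path.splitext(filename)[1].lstrip(".").lower()

# membership in the tuple is the whole validation step
assert extension("report.CSV") in supported_file_types
assert extension("notes.txt") not in supported_file_types

# the same tuple feeds the f-string prompt
print(f"Enter a file to load ({','.join(supported_file_types)}): ")
```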
The import functions are relatively straightforward and unchanged from the original converter, with the exception of some small performance tweaks.
def importJSON(f):
    try:
        data = json.load(f, object_pairs_hook=OrderedDict)
        if isinstance(data, OrderedDict):
            data = [data]
        return data
    except Exception as e:
        print(e)
        return False
def importCSV(f):
    try:
        data = list(csv.reader(f))
        keys = data[0]
        converted = []
        for line in data[1:]:
            obj = OrderedDict()
            # pair each header key with the cell at the same position
            for i, key in enumerate(keys):
                if len(line[i]) > 0:
                    obj[key] = line[i]
                else:
                    obj[key] = None
            converted.append(obj)
        return converted
    except Exception as e:
        print(e)
        return False
When importing CSV, I opted to iterate directly over the list instead of using for i in range(len(data)), and used slice notation to skip the header row. When importing JSON, OrderedDicts are used to preserve the arrangement of the key-value pairs in the event we want to import and export in the same file format, effectively creating an identical copy.
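The order-preserving behavior is easy to see in isolation; this is a standalone sketch with made-up data, not part of the converter itself.

```python
import json
from collections import OrderedDict

src = '{"z": 1, "a": 2, "m": 3}'

# object_pairs_hook keeps keys in the order they appear in the file
data = json.loads(src, object_pairs_hook=OrderedDict)
assert list(data) == ["z", "a", "m"]

# dumping again reproduces the original key order exactly
assert json.dumps(data) == '{"z": 1, "a": 2, "m": 3}'
```

Since Python 3.7 plain dicts also preserve insertion order, but spelling out OrderedDict makes the intent explicit.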
As the standard format for this converter is JSON, exporting JSON is very direct. Per the design of this converter, exporting CSV first requires converting from JSON.
def exportJSON(data, filename):
    try:
        with open(filename, 'w') as outfile:
            json.dump(data, outfile, indent=2)
        return os.path.getsize(filename)
    except Exception as e:
        print(e)
        return False
def exportCSV(data, filename):
    try:
        keys = list(set([
            cell
            for line in data
            for cell in line
        ]))
        # map data in each row to key index
        converted = []
        converted.append(keys)
        for line in data:
            row = []
            for key in keys:
                if key in line:
                    if isinstance(line[key], (list, OrderedDict, dict)):
                        row.append(None)  # skip nested data structures
                    else:
                        row.append(line[key])
                else:
                    row.append(None)
            converted.append(row)
        with open(filename, 'w', newline='') as outfile:
            writer = csv.writer(outfile)
            writer.writerows(converted)
        return os.path.getsize(filename)
    except Exception as e:
        print(e)
        return False
I like to set an indent value when using json.dump() to avoid a long one-line dump, which can slow down some editors, like Atom. The CSV export uses a comprehension instead of nested for loops for a nice performance and pythonic improvement.
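The effect of the indent parameter is easy to demonstrate; this is a small standalone example with made-up data.

```python
import json

data = [{"name": "Ada", "id": 1}]

one_line = json.dumps(data)          # everything on a single line
pretty = json.dumps(data, indent=2)  # one key per line

assert "\n" not in one_line
assert pretty.count("\n") > 1
print(pretty)
```

With indent=2, each key lands on its own line, so even a large export stays easy to scroll through in an editor.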
I was happy to share this with the community. It’s a great example of continual improvement and beginner optimization. Visit the repository to see the final result and keep up to date with new additions to the tool. Feel free to use and contribute to it, or fork it and make it your own.
Repository: https://github.com/jhsu98/data-converter
Thank you!