ZCode is a custom compression algorithm I originally developed for a competition held for the Spring 2019 Data Structures and Algorithms course of Dr. Mahdi Safarnejad-Boroujeni at Sharif University of Technology, in which I placed first. The code is fairly slow and leaves a lot of room for optimization, but it is quite readable, so it can serve as an educational resource for anyone starting out with compression algorithms.
The algorithm is a cocktail of classical compression algorithms mixed and served for Unicode documents. It hinges on the LZW algorithm to create a finite-size symbol dictionary; the results are then byte-coded into variable-length custom symbols, which I call zee codes! Finally, the symbol table is truncated accordingly, and the compressed document is encoded into a byte stream.
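To make the dictionary-building step concrete, here is a minimal sketch of textbook LZW with a capped dictionary size. It is not ZCode's actual implementation: the function names, the `max_dict_size` parameter, and the choice to seed the dictionary from the characters of the input are all assumptions made for illustration.

```python
def lzw_compress(text, max_dict_size=4096):
    """Encode text as integer codes using a size-capped LZW dictionary."""
    # Seed the dictionary with every distinct character of the input.
    alphabet = sorted(set(text))
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the current match
        else:
            codes.append(dictionary[current])
            if len(dictionary) < max_dict_size:
                dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes.append(dictionary[current])
    return alphabet, codes


def lzw_decompress(alphabet, codes, max_dict_size=4096):
    """Rebuild the text; max_dict_size must match the value used to compress."""
    if not codes:
        return ""
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[0]
        pieces.append(entry)
        if len(dictionary) < max_dict_size:
            dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


sample = "سلام سلام سلام سلام"
alphabet, codes = lzw_compress(sample)
restored = lzw_decompress(alphabet, codes)
```

Repeated phrases collapse into single codes, so `codes` is shorter than `sample` for repetitive input; capping the dictionary keeps every code within a fixed range.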
Zee codes are heavily inspired by Huffman trees. However, symbols in ordinary texts are usually distributed much more uniformly than the geometric (or exponential) distribution that Huffman coding assumes in order to be effective, so the gains of using variable-sized byte codes, from both an implementation and a performance perspective, outweighed those of bit-level Huffman encoding. Results may vary, but my tests showed a steady ~4-5x compression ratio on Farsi texts, which is pretty nice!
ZCode is available via pip and only requires Python 3.6 or higher.
pip install -U zcode
You can run the algorithm on any utf-8 encoded file using the zcode command. It will automatically decompress files ending with a .zee extension and compress others into .zee files, but you can always override the default behavior by providing optional arguments like:
zcode INPUTFILE [--output OUTPUT_FILE --action compress/decompress --symbol-size SYMBOL_SIZE --code-size CODE_SIZE]
The symbol-size argument controls the algorithm's buffer size for processing symbols (in bytes). It is set automatically depending on your input file size, but you can change it as you wish. code-size controls the maximum length of coded bytes while encoding symbols (it equals 2 by default and must be provided to the algorithm again upon decompression).
MIT LICENSE, see vahidzee/zcode/LICENSE
Author: vahidzee
Source Code: https://github.com/vahidzee/zcode
License: MIT License
Python has a set of built-in methods that you can use on strings.
Note: All string methods return new values. They do not change the original string.
Method | Description |
---|---|
capitalize() | Converts the first character to upper case |
casefold() | Converts string into lower case |
center() | Returns a centered string |
count() | Returns the number of times a specified value occurs in a string |
encode() | Returns an encoded version of the string |
endswith() | Returns True if the string ends with the specified value |
expandtabs() | Sets the tab size of the string |
find() | Searches the string for a specified value and returns the position of where it was found |
format() | Formats specified values in a string |
format_map() | Formats specified values in a string, taking them from a mapping such as a dictionary |
index() | Searches the string for a specified value and returns the position of where it was found |
isalnum() | Returns True if all characters in the string are alphanumeric |
isalpha() | Returns True if all characters in the string are in the alphabet |
isascii() | Returns True if all characters in the string are ascii characters |
isdecimal() | Returns True if all characters in the string are decimals |
isdigit() | Returns True if all characters in the string are digits |
isidentifier() | Returns True if the string is an identifier |
islower() | Returns True if all characters in the string are lower case |
isnumeric() | Returns True if all characters in the string are numeric |
isprintable() | Returns True if all characters in the string are printable |
isspace() | Returns True if all characters in the string are whitespaces |
istitle() | Returns True if the string follows the rules of a title |
isupper() | Returns True if all characters in the string are upper case |
join() | Converts the elements of an iterable into a string |
ljust() | Returns a left justified version of the string |
lower() | Converts a string into lower case |
lstrip() | Returns a left trim version of the string |
maketrans() | Returns a translation table to be used in translations |
partition() | Returns a tuple where the string is parted into three parts |
replace() | Returns a string where a specified value is replaced with a specified value |
rfind() | Searches the string for a specified value and returns the last position of where it was found |
rindex() | Searches the string for a specified value and returns the last position of where it was found |
rjust() | Returns a right justified version of the string |
rpartition() | Returns a tuple where the string is parted into three parts |
rsplit() | Splits the string at the specified separator, and returns a list |
rstrip() | Returns a right trim version of the string |
split() | Splits the string at the specified separator, and returns a list |
splitlines() | Splits the string at line breaks and returns a list |
startswith() | Returns True if the string starts with the specified value |
strip() | Returns a trimmed version of the string |
swapcase() | Swaps cases, lower case becomes upper case and vice versa |
title() | Converts the first character of each word to upper case |
translate() | Returns a translated string |
upper() | Converts a string into upper case |
zfill() | Fills the string with a specified number of 0 values at the beginning |
Upper case the first letter in this sentence:
txt = "hello, and welcome to my world."
x = txt.capitalize()
print(x)
The capitalize() method returns a string where the first character is upper case, and the rest is lower case.
string.capitalize()
No parameters
The first character is converted to upper case, and the rest are converted to lower case:
txt = "python is FUN!"
x = txt.capitalize()
print(x)
See what happens if the first character is a number:
txt = "36 is my age."
x = txt.capitalize()
print(x)
Make the string lower case:
txt = "Hello, And Welcome To My World!"
x = txt.casefold()
print(x)
The casefold() method returns a string where all the characters are lower case.
This method is similar to the lower() method, but casefold() is more aggressive: it converts more characters into lower case, so more strings match when two strings are compared after both have been converted with casefold().
string.casefold()
No parameters
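The difference between lower() and casefold() only shows up for characters outside the basic a-z range. A small sketch using the German letter "ß" (the sample words are chosen purely for illustration):

```python
word = "Straße"

lowered = word.lower()      # "ß" is already lower case, so it is kept
folded = word.casefold()    # "ß" is folded to "ss"

# Caseless comparison: fold both sides before comparing.
same_word = "STRASSE".casefold() == word.casefold()
same_with_lower = "STRASSE".lower() == word.lower()

print(lowered, folded, same_word, same_with_lower)
```

For caseless matching of user input, folding both strings is usually the safer choice.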
Print the word "banana", taking up the space of 20 characters, with "banana" in the middle:
txt = "banana"
x = txt.center(20)
print(x)
The center() method will center align the string, using a specified character (space is default) as the fill character.
string.center(length, character)
Parameter | Description |
---|---|
length | Required. The length of the returned string |
character | Optional. The character to fill the missing space on each side. Default is " " (space) |
Using the letter "O" as the padding character:
txt = "banana"
x = txt.center(20, "O")
print(x)
Return the number of times the value "apple" appears in the string:
txt = "I love apples, apple are my favorite fruit"
x = txt.count("apple")
print(x)
The count() method returns the number of times a specified value appears in the string.
string.count(value, start, end)
Parameter | Description |
---|---|
value | Required. A String. The string to search for |
start | Optional. An Integer. The position to start the search. Default is 0 |
end | Optional. An Integer. The position to end the search. Default is the end of the string |
Search from position 10 to 24:
txt = "I love apples, apple are my favorite fruit"
x = txt.count("apple", 10, 24)
print(x)
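One detail worth knowing is that count() only counts non-overlapping occurrences. The helper below, `count_overlapping`, is a hypothetical function written for this illustration (it is not a built-in) and shows one way to count overlapping matches with find():

```python
text = "aaaa"

# count() moves past each match, so "aa" is found twice, not three times.
non_overlapping = text.count("aa")


def count_overlapping(haystack, needle):
    """Count matches of needle, allowing them to overlap."""
    total = 0
    start = haystack.find(needle)
    while start != -1:
        total += 1
        # Resume one character after the previous match start.
        start = haystack.find(needle, start + 1)
    return total


overlapping = count_overlapping(text, "aa")
print(non_overlapping, overlapping)
```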
UTF-8 encode the string:
txt = "My name is Ståle"
x = txt.encode()
print(x)
The encode() method encodes the string, using the specified encoding. If no encoding is specified, UTF-8 will be used.
string.encode(encoding=encoding, errors=errors)
Parameter | Description |
---|---|
encoding | Optional. A String specifying the encoding to use. Default is UTF-8 |
errors | Optional. A String specifying the error method. Legal values are 'backslashreplace', 'ignore', 'namereplace', 'strict' (the default), 'replace' and 'xmlcharrefreplace' |
These examples use ascii encoding, and a character that cannot be encoded, showing the result with different errors:
txt = "My name is Ståle"
print(txt.encode(encoding="ascii",errors="backslashreplace"))
print(txt.encode(encoding="ascii",errors="ignore"))
print(txt.encode(encoding="ascii",errors="namereplace"))
print(txt.encode(encoding="ascii",errors="replace"))
print(txt.encode(encoding="ascii",errors="xmlcharrefreplace"))
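The examples above all pass an explicit errors value. As a complement, this sketch shows the default behaviour: UTF-8 encoding round-trips through decode(), while ascii encoding with the default 'strict' error handling raises UnicodeEncodeError. The variable names are illustrative.

```python
txt = "My name is Ståle"

# Default encoding is UTF-8; "å" takes two bytes.
utf8_bytes = txt.encode()
round_trip = utf8_bytes.decode("utf-8")

# With errors left at its default ('strict'), unencodable characters raise.
try:
    txt.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(utf8_bytes, round_trip == txt, strict_failed)
```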
Check if the string ends with a punctuation sign (.):
txt = "Hello, welcome to my world."
x = txt.endswith(".")
print(x)
The endswith() method returns True if the string ends with the specified value, otherwise False.
string.endswith(value, start, end)
Parameter | Description |
---|---|
value | Required. The value to check if the string ends with |
start | Optional. An Integer specifying at which position to start the search |
end | Optional. An Integer specifying at which position to end the search |
Check if the string ends with the phrase "my world.":
txt = "Hello, welcome to my world."
x = txt.endswith("my world.")
print(x)
Check if position 5 to 11 ends with the phrase "my world.":
txt = "Hello, welcome to my world."
x = txt.endswith("my world.", 5, 11)
print(x)
Set the tab size to 2 whitespaces:
txt = "H\te\tl\tl\to"
x = txt.expandtabs(2)
print(x)
The expandtabs() method replaces each tab character with whitespaces, using tab stops at every multiple of the specified tab size.
string.expandtabs(tabsize)
Parameter | Description |
---|---|
tabsize | Optional. A number specifying the tabsize. Default tabsize is 8 |
See the result using different tab sizes:
txt = "H\te\tl\tl\to"
print(txt)
print(txt.expandtabs())
print(txt.expandtabs(2))
print(txt.expandtabs(4))
print(txt.expandtabs(10))
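Because tabs advance to the next tab stop, the number of spaces inserted depends on where the tab sits in the line rather than being fixed. A short sketch (the sample strings are illustrative):

```python
row = "a\tbc\td"

# With a tab size of 4, tab stops sit at columns 4, 8, 12, ...
expanded = row.expandtabs(4)

# "a" ends at column 1, so 3 spaces are needed to reach column 4;
# "bc" ends at column 6, so only 2 spaces are needed to reach column 8.
print(repr(expanded))
```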
Where in the text is the word "welcome"?:
txt = "Hello, welcome to my world."
x = txt.find("welcome")
print(x)
The find() method finds the first occurrence of the specified value.
The find() method returns -1 if the value is not found.
The find() method is almost the same as the index() method; the only difference is that the index() method raises an exception if the value is not found. (See example below)
string.find(value, start, end)
Parameter | Description |
---|---|
value | Required. The value to search for |
start | Optional. Where to start the search. Default is 0 |
end | Optional. Where to end the search. Default is to the end of the string |
Where in the text is the first occurrence of the letter "e"?:
txt = "Hello, welcome to my world."
x = txt.find("e")
print(x)
Where in the text is the first occurrence of the letter "e" when you only search between position 5 and 10?:
txt = "Hello, welcome to my world."
x = txt.find("e", 5, 10)
print(x)
If the value is not found, the find() method returns -1, but the index() method will raise an exception:
txt = "Hello, welcome to my world."
print(txt.find("q"))
print(txt.index("q"))
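The last line of the example above stops the program with a ValueError. A minimal sketch of handling that case follows; `position_or_none` is a hypothetical helper name used only for this illustration.

```python
def position_or_none(text, value):
    """Return the index of value in text, or None when it is absent."""
    try:
        return text.index(value)
    except ValueError:
        # index() signals "not found" with an exception instead of -1.
        return None


txt = "Hello, welcome to my world."
found = position_or_none(txt, "welcome")
missing = position_or_none(txt, "q")
print(found, missing)
```

When only a yes/no answer is needed, the `in` operator (for example `"q" in txt`) avoids both the -1 sentinel and the exception.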
Insert the price inside the placeholder, the price should be in fixed point, two-decimal format:
txt = "For only {price:.2f} dollars!"
print(txt.format(price = 49))
The format() method formats the specified value(s) and inserts them inside the string's placeholder.
The placeholder is defined using curly brackets: {}. Read more about the placeholders in the Placeholder section below.
The format() method returns the formatted string.
string.format(value1, value2...)
Parameter | Description |
---|---|
value1, value2... | Required. One or more values that should be formatted and inserted in the string. The values are either a list of values separated by commas, a key=value list, or a combination of both. The values can be of any data type. |
The placeholders can be identified using named indexes {price}, numbered indexes {0}, or even empty placeholders {}.
Using different placeholder values:
txt1 = "My name is {fname}, I'm {age}".format(fname = "John", age = 36)
txt2 = "My name is {0}, I'm {1}".format("John",36)
txt3 = "My name is {}, I'm {}".format("John",36)
Inside the placeholders you can add a formatting type to format the result:
Formatting type | Description |
---|---|
:< | Left aligns the result (within the available space) |
:> | Right aligns the result (within the available space) |
:^ | Center aligns the result (within the available space) |
:= | Places the sign to the left most position |
:+ | Use a plus sign to indicate if the result is positive or negative |
:- | Use a minus sign for negative values only |
:(space) | Use a space to insert an extra space before positive numbers (and a minus sign before negative numbers) |
:, | Use a comma as a thousand separator |
:_ | Use an underscore as a thousand separator |
:b | Binary format |
:c | Converts the value into the corresponding unicode character |
:d | Decimal format |
:e | Scientific format, with a lower case e |
:E | Scientific format, with an upper case E |
:f | Fixed point number format |
:F | Fixed point number format, in uppercase format (show inf and nan as INF and NAN) |
:g | General format |
:G | General format (using an upper case E for scientific notations) |
:o | Octal format |
:x | Hex format, lower case |
:X | Hex format, upper case |
:n | Number format |
:% | Percentage format |
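A few of the formatting types listed above, applied to concrete values. The widths and sample values are arbitrary choices for illustration:

```python
examples = {
    "left": "{:<8}|".format("left"),        # pad on the right to width 8
    "right": "{:>8}".format("right"),       # pad on the left to width 8
    "comma": "{:,}".format(1234567),        # thousand separator
    "underscore": "{:_}".format(1234567),
    "binary": "{:b}".format(10),
    "hex_lower": "{:x}".format(255),
    "hex_upper": "{:X}".format(255),
    "octal": "{:o}".format(8),
    "fixed": "{:.2f}".format(49),           # two decimals
    "percent": "{:.1%}".format(0.256),      # multiplied by 100, one decimal
    "sign": "{:+d}".format(7),
    "scientific": "{:e}".format(1500),
    "char": "{:c}".format(65),
}

for name, value in examples.items():
    print(name, repr(value))
```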
Where in the text is the word "welcome"?:
txt = "Hello, welcome to my world."
x = txt.index("welcome")
print(x)
The index() method finds the first occurrence of the specified value.
The index() method raises an exception if the value is not found.
The index() method is almost the same as the find() method; the only difference is that the find() method returns -1 if the value is not found. (See example below)
string.index(value, start, end)
Parameter | Description |
---|---|
value | Required. The value to search for |
start | Optional. Where to start the search. Default is 0 |
end | Optional. Where to end the search. Default is to the end of the string |
Where in the text is the first occurrence of the letter "e"?:
txt = "Hello, welcome to my world."
x = txt.index("e")
print(x)
Where in the text is the first occurrence of the letter "e" when you only search between position 5 and 10?:
txt = "Hello, welcome to my world."
x = txt.index("e", 5, 10)
print(x)
If the value is not found, the find() method returns -1, but the index() method will raise an exception:
txt = "Hello, welcome to my world."
print(txt.find("q"))
print(txt.index("q"))
Check if all the characters in the text are alphanumeric:
txt = "Company12"
x = txt.isalnum()
print(x)
The isalnum() method returns True if all the characters are alphanumeric, meaning alphabet letters (a-z) and numbers (0-9).
Examples of characters that are not alphanumeric: (space)!#%&? etc.
string.isalnum()
No parameters.
Check if all the characters in the text are alphanumeric:
txt = "Company 12"
x = txt.isalnum()
print(x)
Check if all the characters in the text are letters:
txt = "CompanyX"
x = txt.isalpha()
print(x)
The isalpha() method returns True if all the characters are alphabet letters (a-z).
Examples of characters that are not alphabet letters: (space)!#%&? etc.
string.isalpha()
No parameters.
Check if all the characters in the text are alphabetic:
txt = "Company10"
x = txt.isalpha()
print(x)
Check if all the characters in the text are ascii characters:
txt = "Company123"
x = txt.isascii()
print(x)
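The isascii() method returns True if all the characters are ascii characters, meaning characters with code points from 0 to 127 (letters a-z and A-Z, digits, and common punctuation and control characters).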
string.isascii()
No parameters.
Check if all the characters in the unicode object are decimals:
txt = "\u0033" #unicode for 3
x = txt.isdecimal()
print(x)
The isdecimal() method returns True if all the characters are decimals (0-9).
This method is used on unicode objects.
string.isdecimal()
No parameters.
Check if all the characters in the unicode are decimals:
a = "\u0030" #unicode for 0
b = "\u0047" #unicode for G
print(a.isdecimal())
print(b.isdecimal())
Check if all the characters in the text are digits:
txt = "50800"
x = txt.isdigit()
print(x)
The isdigit() method returns True if all the characters are digits, otherwise False.
Superscript digits, like ², are also considered to be digits.
string.isdigit()
No parameters.
Check if all the characters in the text are digits:
a = "\u0030" #unicode for 0
b = "\u00B2" #unicode for ²
print(a.isdigit())
print(b.isdigit())
Check if the string is a valid identifier:
txt = "Demo"
x = txt.isidentifier()
print(x)
The isidentifier() method returns True if the string is a valid identifier, otherwise False.
A string is considered a valid identifier if it only contains letters (a-z), digits (0-9), or underscores (_). A valid identifier cannot start with a number, or contain any spaces.
string.isidentifier()
No parameters.
Check if the strings are valid identifiers:
a = "MyFolder"
b = "Demo002"
c = "2bring"
d = "my demo"
print(a.isidentifier())
print(b.isidentifier())
print(c.isidentifier())
print(d.isidentifier())
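Note that isidentifier() also returns True for reserved words such as "class", which cannot actually be used as variable names. A small sketch combining it with the standard keyword module; `is_usable_name` is a hypothetical helper name for this illustration.

```python
import keyword


def is_usable_name(name):
    """True if name is a valid identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["Demo002", "2bring", "my demo", "class", "_private"]
results = {name: is_usable_name(name) for name in candidates}
print(results)
```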
Check if all the characters in the text are in lower case:
txt = "hello world!"
x = txt.islower()
print(x)
The islower() method returns True if all the characters are in lower case, otherwise False.
Numbers, symbols and spaces are not checked, only alphabet characters.
string.islower()
No parameters.
Check if all the characters in the texts are in lower case:
a = "Hello world!"
b = "hello 123"
c = "mynameisPeter"
print(a.islower())
print(b.islower())
print(c.islower())
Check if all the characters in the text are numeric:
txt = "565543"
x = txt.isnumeric()
print(x)
The isnumeric() method returns True if all the characters are numeric (0-9), otherwise False.
Superscripts like ² and fractions like ¾ are also considered to be numeric values.
"-1" and "1.5" are NOT considered numeric values, because all the characters in the string must be numeric, and the - and the . are not.
string.isnumeric()
No parameters.
Check if the characters are numeric:
a = "\u0030" #unicode for 0
b = "\u00B2" #unicode for ²
c = "10km2"
d = "-1"
e = "1.5"
print(a.isnumeric())
print(b.isnumeric())
print(c.isnumeric())
print(d.isnumeric())
print(e.isnumeric())
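The three related checks isdecimal(), isdigit() and isnumeric() differ only for characters outside 0-9. The sketch below puts them side by side for a handful of sample strings chosen for illustration:

```python
samples = ["5", "\u00B2", "\u00BE", "-1", "1.5"]  # 5, ², ¾, -1, 1.5

# Each value is (isdecimal, isdigit, isnumeric) for that sample.
classification = {
    s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples
}

for sample, flags in classification.items():
    print(repr(sample), flags)
```

Each method accepts everything the previous one accepts: decimal characters are a subset of digits, which are a subset of numeric characters.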
Check if all the characters in the text are printable:
txt = "Hello! Are you #1?"
x = txt.isprintable()
print(x)
The isprintable() method returns True if all the characters are printable, otherwise False.
Examples of non-printable characters are carriage return and line feed.
string.isprintable()
No parameters.
Check if all the characters in the text are printable:
txt = "Hello!\nAre you #1?"
x = txt.isprintable()
print(x)
Check if all the characters in the text are whitespaces:
txt = " "
x = txt.isspace()
print(x)
The isspace() method returns True if all the characters in a string are whitespaces, otherwise False.
string.isspace()
No parameters.
Check if all the characters in the text are whitespaces:
txt = " s "
x = txt.isspace()
print(x)
Check if each word starts with an upper case letter:
txt = "Hello, And Welcome To My World!"
x = txt.istitle()
print(x)
The istitle() method returns True if all words in a text start with an upper case letter AND the rest of each word is lower case letters, otherwise False.
Symbols and numbers are ignored.
string.istitle()
No parameters.
Check if each word starts with an upper case letter:
a = "HELLO, AND WELCOME TO MY WORLD"
b = "Hello"
c = "22 Names"
d = "This Is %'!?"
print(a.istitle())
print(b.istitle())
print(c.istitle())
print(d.istitle())
Check if all the characters in the text are in upper case:
txt = "THIS IS NOW!"
x = txt.isupper()
print(x)
The isupper() method returns True if all the characters are in upper case, otherwise False.
Numbers, symbols and spaces are not checked, only alphabet characters.
string.isupper()
No parameters.
Check if all the characters in the texts are in upper case:
a = "Hello World!"
b = "hello 123"
c = "MY NAME IS PETER"
print(a.isupper())
print(b.isupper())
print(c.isupper())
Join all items in a tuple into a string, using a hash character as separator:
myTuple = ("John", "Peter", "Vicky")
x = "#".join(myTuple)
print(x)
The join() method takes all items in an iterable and joins them into one string.
A string must be specified as the separator.
string.join(iterable)
Parameter | Description |
---|---|
iterable | Required. Any iterable object where all the returned values are strings |
Join all items in a dictionary into a string, using the word "TEST" as separator:
myDict = {"name": "John", "country": "Norway"}
mySeparator = "TEST"
x = mySeparator.join(myDict)
print(x)
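As the dictionary example shows, iterating over a dictionary yields its keys, so join() joins the keys rather than the values. The sketch below also shows that non-string items must be converted first, since join() raises TypeError otherwise. Variable names are illustrative.

```python
my_dict = {"name": "John", "country": "Norway"}

keys_joined = ", ".join(my_dict)             # keys only
values_joined = ", ".join(my_dict.values())  # ask for the values explicitly

numbers = [1, 2, 3]
try:
    "-".join(numbers)
    join_failed = False
except TypeError:
    # join() only accepts strings, so integers are rejected.
    join_failed = True

numbers_joined = "-".join(str(n) for n in numbers)
print(keys_joined, values_joined, join_failed, numbers_joined)
```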
Return a 20 characters long, left justified version of the word "banana":
txt = "banana"
x = txt.ljust(20)
print(x, "is my favorite fruit.")
Note: In the result, there are actually 14 whitespaces to the right of the word banana.
The ljust() method will left align the string, using a specified character (space is default) as the fill character.
string.ljust(length, character)
Parameter | Description |
---|---|
length | Required. The length of the returned string |
character | Optional. A character to fill the missing space (to the right of the string). Default is " " (space). |
Using the letter "O" as the padding character:
txt = "banana"
x = txt.ljust(20, "O")
print(x)
Lower case the string:
txt = "Hello my FRIENDS"
x = txt.lower()
print(x)
The lower() method returns a string where all characters are lower case.
Symbols and Numbers are ignored.
string.lower()
No parameters
Remove spaces to the left of the string:
txt = " banana "
x = txt.lstrip()
print("of all fruits", x, "is my favorite")
The lstrip() method removes any leading characters (whitespace is removed by default).
string.lstrip(characters)
Parameter | Description |
---|---|
characters | Optional. A set of characters to remove as leading characters |
Remove the leading characters:
txt = ",,,,,ssaaww.....banana"
x = txt.lstrip(",.asw")
print(x)
Create a mapping table, and use it in the translate() method to replace any "S" characters with a "P" character:
txt = "Hello Sam!"
mytable = txt.maketrans("S", "P")
print(txt.translate(mytable))
The maketrans() method returns a mapping table that can be used with the translate() method to replace specified characters.
string.maketrans(x, y, z)
Parameter | Description |
---|---|
x | Required. If only one parameter is specified, this has to be a dictionary describing how to perform the replace. If two or more parameters are specified, this parameter has to be a string specifying the characters you want to replace. |
y | Optional. A string with the same length as parameter x. Each character in the first parameter will be replaced with the corresponding character in this string. |
z | Optional. A string describing which characters to remove from the original string. |
Use a mapping table to replace many characters:
txt = "Hi Sam!"
x = "mSa"
y = "eJo"
mytable = txt.maketrans(x, y)
print(txt.translate(mytable))
The third parameter in the mapping table describes characters that you want to remove from the string:
txt = "Good night Sam!"
x = "mSa"
y = "eJo"
z = "odnght"
mytable = txt.maketrans(x, y, z)
print(txt.translate(mytable))
The maketrans() method itself returns a dictionary describing each replacement, in unicode:
txt = "Good night Sam!"
x = "mSa"
y = "eJo"
z = "odnght"
print(txt.maketrans(x, y, z))
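The parameter table mentions that a single dictionary argument is also accepted. A short sketch of that form, together with a look at what the returned table contains; the sample mapping is made up for illustration.

```python
# One-argument form: a dictionary from characters to replacements.
table = str.maketrans({"a": "4", "e": "3"})
leet = "leet speak".translate(table)

# The returned table maps Unicode code points, not characters.
two_arg_table = str.maketrans("S", "P")    # ord("S") == 83, ord("P") == 80

# Characters listed in the third argument map to None, meaning "delete".
delete_table = str.maketrans("", "", "!")  # ord("!") == 33

print(leet, two_arg_table, delete_table)
```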
Search for the word "bananas", and return a tuple with three elements:
1 - everything before the "match"
2 - the "match"
3 - everything after the "match"
txt = "I could eat bananas all day"
x = txt.partition("bananas")
print(x)
The partition() method searches for a specified string, and splits the string into a tuple containing three elements.
The first element contains the part before the specified string.
The second element contains the specified string.
The third element contains the part after the string.
Note: This method searches for the first occurrence of the specified string.
string.partition(value)
Parameter | Description |
---|---|
value | Required. The string to search for |
If the specified value is not found, the partition() method returns a tuple containing: 1 - the whole string, 2 - an empty string, 3 - an empty string:
txt = "I could eat bananas all day"
x = txt.partition("apples")
print(x)
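Because partition() always returns three elements, it is convenient for splitting "key=value" style text without checking lengths first. A minimal sketch; `parse_setting` is a hypothetical helper name for this illustration.

```python
def parse_setting(line):
    """Split 'key=value' at the first '='; return None if there is no '='."""
    key, separator, value = line.partition("=")
    if not separator:
        # An empty second element means the separator was not found.
        return None
    return key.strip(), value.strip()


print(parse_setting("host = localhost"))
print(parse_setting("query=a=b"))
print(parse_setting("no separator here"))
```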
Replace the word "bananas":
txt = "I like bananas"
x = txt.replace("bananas", "apples")
print(x)
The replace() method replaces a specified phrase with another specified phrase.
Note: All occurrences of the specified phrase will be replaced, if nothing else is specified.
string.replace(oldvalue, newvalue, count)
Parameter | Description |
---|---|
oldvalue | Required. The string to search for |
newvalue | Required. The string to replace the old value with |
count | Optional. A number specifying how many occurrences of the old value you want to replace. Default is all occurrences |
Replace all occurrences of the word "one":
txt = "one one was a race horse, two two was one too."
x = txt.replace("one", "three")
print(x)
Replace the first two occurrences of the word "one":
txt = "one one was a race horse, two two was one too."
x = txt.replace("one", "three", 2)
print(x)
Where in the text is the last occurrence of the string "casa"?:
txt = "Mi casa, su casa."
x = txt.rfind("casa")
print(x)
The rfind() method finds the last occurrence of the specified value.
The rfind() method returns -1 if the value is not found.
The rfind() method is almost the same as the rindex() method. See example below.
string.rfind(value, start, end)
Parameter | Description |
---|---|
value | Required. The value to search for |
start | Optional. Where to start the search. Default is 0 |
end | Optional. Where to end the search. Default is to the end of the string |
Where in the text is the last occurrence of the letter "e"?:
txt = "Hello, welcome to my world."
x = txt.rfind("e")
print(x)
Where in the text is the last occurrence of the letter "e" when you only search between position 5 and 10?:
txt = "Hello, welcome to my world."
x = txt.rfind("e", 5, 10)
print(x)
If the value is not found, the rfind() method returns -1, but the rindex() method will raise an exception:
txt = "Hello, welcome to my world."
print(txt.rfind("q"))
print(txt.rindex("q"))
Where in the text is the last occurrence of the string "casa"?:
txt = "Mi casa, su casa."
x = txt.rindex("casa")
print(x)
The rindex() method finds the last occurrence of the specified value.
The rindex() method raises an exception if the value is not found.
The rindex() method is almost the same as the rfind() method. See example below.
string.rindex(value, start, end)
Parameter | Description |
---|---|
value | Required. The value to search for |
start | Optional. Where to start the search. Default is 0 |
end | Optional. Where to end the search. Default is to the end of the string |
Where in the text is the last occurrence of the letter "e"?:
txt = "Hello, welcome to my world."
x = txt.rindex("e")
print(x)
Where in the text is the last occurrence of the letter "e" when you only search between position 5 and 10?:
txt = "Hello, welcome to my world."
x = txt.rindex("e", 5, 10)
print(x)
If the value is not found, the rfind() method returns -1, but the rindex() method will raise an exception:
txt = "Hello, welcome to my world."
print(txt.rfind("q"))
print(txt.rindex("q"))
Return a 20 characters long, right justified version of the word "banana":
txt = "banana"
x = txt.rjust(20)
print(x, "is my favorite fruit.")
Note: In the result, there are actually 14 whitespaces to the left of the word banana.
The rjust() method will right align the string, using a specified character (space is default) as the fill character.
string.rjust(length, character)
Parameter | Description |
---|---|
length | Required. The length of the returned string |
character | Optional. A character to fill the missing space (to the left of the string). Default is " " (space). |
Using the letter "O" as the padding character:
txt = "banana"
x = txt.rjust(20, "O")
print(x)
Search for the last occurrence of the word "bananas", and return a tuple with three elements:
1 - everything before the "match"
2 - the "match"
3 - everything after the "match"
txt = "I could eat bananas all day, bananas are my favorite fruit"
x = txt.rpartition("bananas")
print(x)
The rpartition() method searches for the last occurrence of a specified string, and splits the string into a tuple containing three elements.
The first element contains the part before the specified string.
The second element contains the specified string.
The third element contains the part after the string.
string.rpartition(value)
Parameter | Description |
---|---|
value | Required. The string to search for |
If the specified value is not found, the rpartition() method returns a tuple containing: 1 - an empty string, 2 - an empty string, 3 - the whole string:
txt = "I could eat bananas all day, bananas are my favorite fruit"
x = txt.rpartition("apples")
print(x)
Split a string into a list, using comma, followed by a space (, ) as the separator:
txt = "apple, banana, cherry"
x = txt.rsplit(", ")
print(x)
The rsplit() method splits a string into a list, starting from the right.
If maxsplit is not specified, this method will return the same as the split() method.
Note: When maxsplit is specified, the list will contain at most the specified number of elements plus one.
string.rsplit(separator, maxsplit)
Parameter | Description |
---|---|
separator | Optional. Specifies the separator to use when splitting the string. By default any whitespace is a separator |
maxsplit | Optional. Specifies how many splits to do. Default value is -1, which is "all occurrences" |
Split the string into a list with a maximum of 2 items:
txt = "apple, banana, cherry"
# setting the maxsplit parameter to 1, will return a list with 2 elements!
x = txt.rsplit(", ", 1)
print(x)
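The difference between split() and rsplit() only becomes visible once maxsplit limits the number of splits. A small sketch with an illustrative path-like string:

```python
path = "usr/local/lib/python"

# One split from the left: the first component is separated.
from_left = path.split("/", 1)

# One split from the right: the last component is separated.
from_right = path.rsplit("/", 1)

# Without maxsplit both methods give the same list.
same_without_limit = path.split("/") == path.rsplit("/")

print(from_left, from_right, same_without_limit)
```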
Remove any white spaces at the end of the string:
txt = " banana "
x = txt.rstrip()
print("of all fruits", x, "is my favorite")
The rstrip() method removes any trailing characters (characters at the end of a string); whitespace is removed by default.
string.rstrip(characters)
Parameter | Description |
---|---|
characters | Optional. A set of characters to remove as trailing characters |
Remove the trailing characters if they are commas, s, q, or w:
txt = "banana,,,,,ssqqqww....."
x = txt.rstrip(",.qsw")
print(x)
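The characters argument is treated as a set of individual characters, not as a suffix, which can remove more than intended. The sketch below shows the pitfall and one way around it; `strip_suffix` is a hypothetical helper written for this illustration.

```python
filename = "test.txt"

# Every trailing ".", "t" or "x" is removed, including the final "t" of "test".
over_stripped = filename.rstrip(".txt")


def strip_suffix(text, suffix):
    """Remove suffix once, and only if text really ends with it."""
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


base_name = strip_suffix(filename, ".txt")
print(over_stripped, base_name)
```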
Split a string into a list where each word is a list item:
txt = "welcome to the jungle"
x = txt.split()
print(x)
The split() method splits a string into a list.
You can specify the separator; the default separator is any whitespace.
Note: When maxsplit is specified, the list will contain at most the specified number of elements plus one.
string.split(separator, maxsplit)
Parameter | Description |
---|---|
separator | Optional. Specifies the separator to use when splitting the string. By default any whitespace is a separator |
maxsplit | Optional. Specifies how many splits to do. Default value is -1, which is "all occurrences" |
Split the string, using comma, followed by a space, as a separator:
txt = "hello, my name is Peter, I am 26 years old"
x = txt.split(", ")
print(x)
Use a hash character as a separator:
txt = "apple#banana#cherry#orange"
x = txt.split("#")
print(x)
Split the string into a list with max 2 items:
txt = "apple#banana#cherry#orange"
# setting the maxsplit parameter to 1, will return a list with 2 elements!
x = txt.split("#", 1)
print(x)
Split a string into a list where each line is a list item:
txt = "Thank you for the music\nWelcome to the jungle"
x = txt.splitlines()
print(x)
The splitlines() method splits a string into a list. The splitting is done at line breaks.
string.splitlines(keeplinebreaks)
Parameter | Description |
---|---|
keeplinebreaks | Optional. Specifies if the line breaks should be included (True), or not (False). Default value is False |
Split the string, but keep the line breaks:
txt = "Thank you for the music\nWelcome to the jungle"
x = txt.splitlines(True)
print(x)
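splitlines() is not quite the same as split("\n"): it does not produce a trailing empty item, and it also recognises Windows-style "\r\n" line breaks. A short comparison with made-up sample text:

```python
unix_text = "first\nsecond\n"
windows_text = "first\r\nsecond"

lines = unix_text.splitlines()           # no empty item for the final "\n"
naive = unix_text.split("\n")            # keeps an empty string at the end

windows_lines = windows_text.splitlines()  # "\r\n" counts as one line break
windows_naive = windows_text.split("\n")   # leaves a stray "\r" behind

print(lines, naive, windows_lines, windows_naive)
```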
Check if the string starts with "Hello":
txt = "Hello, welcome to my world."
x = txt.startswith("Hello")
print(x)
The startswith()
method returns True if the string starts with the specified value, otherwise False.
string.startswith(value, start, end)
Parameter | Description |
---|---|
value | Required. The value to check if the string starts with |
start | Optional. An Integer specifying at which position to start the search |
end | Optional. An Integer specifying at which position to end the search |
Check if position 7 to 20 starts with the characters "wel":
txt = "Hello, welcome to my world."
x = txt.startswith("wel", 7, 20)
print(x)
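Beyond a single string, Python's str.startswith() also accepts a tuple of values and returns True if the string starts with any of them. The sketch below uses made-up file names to filter a list that way.

```python
# Hypothetical file names, used only for illustration.
filenames = ["report.csv", "notes.txt", "report.json", "image.png"]

# A tuple of prefixes: startswith() is True if any one of them matches.
prefixes = ("report", "notes")

matches = [name for name in filenames if name.startswith(prefixes)]
print(matches)  # ['report.csv', 'notes.txt', 'report.json']
```

Note that the argument must be a tuple; passing a list raises a TypeError.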
Remove spaces at the beginning and at the end of the string:
txt = " banana "
x = txt.strip()
print("of all fruits", x, "is my favorite")
The strip()
method removes any leading (at the beginning) and trailing (at the end) characters. Whitespace is the default set of characters to remove.
string.strip(characters)
Parameter | Description |
---|---|
characters | Optional. A set of characters to remove as leading/trailing characters |
Remove the leading and trailing characters:
txt = ",,,,,rrttgg.....banana....rrr"
x = txt.strip(",.grt")
print(x)
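The characters argument is treated as a set of individual characters, not as a prefix or suffix to match. The short sketch below (illustrative values) shows what that means in practice, and that the default also strips tabs and newlines.

```python
# With no argument, all surrounding whitespace is removed, not only spaces.
padded = "\t banana \n"
clean = padded.strip()
print(repr(clean))  # 'banana'

# The argument is a set of characters: order does not matter.
txt = "abcabc-data-cba"
same_a = txt.strip("abc")
same_b = txt.strip("cba")
print(same_a, same_b)  # -data- -data-

# Stripping continues until a character outside the set is reached,
# which can remove more than a literal "ab" prefix would suggest.
surprise = "banana".strip("ab")
print(surprise)  # nan
```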
Make the lower case letters upper case and the upper case letters lower case:
txt = "Hello My Name Is PETER"
x = txt.swapcase()
print(x)
The swapcase()
method returns a string where all the upper case letters are lower case and vice versa.
string.swapcase()
No parameters.
Make the first letter in each word upper case:
txt = "Welcome to my world"
x = txt.title()
print(x)
The title()
method returns a string where the first character in every word is upper case. Like a header, or a title.
If the word contains a number or a symbol, the first letter after that will be converted to upper case.
string.title()
No parameters.
Make the first letter in each word upper case:
txt = "Welcome to my 2nd world"
x = txt.title()
print(x)
Note that the first letter after a non-alphabetic character is converted into an upper case letter:
txt = "hello b2b2b2 and 3g3g3g"
x = txt.title()
print(x)
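The same rule applies to apostrophes, which can give surprising results for contractions and possessives. The sketch below shows the effect and one possible workaround, string.capwords() from the standard library, which only splits on whitespace. Whether it fits depends on the text, since capwords() also collapses runs of whitespace.

```python
import string

txt = "they're bill's friends"

# title() treats the apostrophe as a word boundary.
titled = txt.title()
print(titled)  # They'Re Bill'S Friends

# capwords() splits on whitespace only, so the apostrophes are left alone.
capped = string.capwords(txt)
print(capped)  # They're Bill's Friends
```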
Replace any "S" characters with a "P" character:
#use a dictionary with ascii codes to replace 83 (S) with 80 (P):
mydict = {83: 80}
txt = "Hello Sam!"
print(txt.translate(mydict))
The translate()
method returns a string where some specified characters are replaced with the character described in a dictionary, or in a mapping table.
Use the maketrans()
method to create a mapping table.
If a character is not specified in the dictionary/table, the character will not be replaced.
If you use a dictionary, you must use Unicode code points (the integers returned by ord(), which match the ASCII codes for ASCII characters) as keys instead of characters.
string.translate(table)
Parameter | Description |
---|---|
table | Required. Either a dictionary, or a mapping table describing how to perform the replace |
Use a mapping table to replace "S" with "P":
txt = "Hello Sam!"
mytable = txt.maketrans("S", "P")
print(txt.translate(mytable))
Use a mapping table to replace many characters:
txt = "Hi Sam!"
x = "mSa"
y = "eJo"
mytable = txt.maketrans(x, y)
print(txt.translate(mytable))
The third parameter in the mapping table describes characters that you want to remove from the string:
txt = "Good night Sam!"
x = "mSa"
y = "eJo"
z = "odnght"
mytable = txt.maketrans(x, y, z)
print(txt.translate(mytable))
The same example as above, but using a dictionary instead of a mapping table:
txt = "Good night Sam!"
mydict = {109: 101, 83: 74, 97: 111, 111: None, 100: None, 110: None, 103: None, 104: None, 116: None}
print(txt.translate(mydict))
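Hard-coded character codes are difficult to read. As a sketch of one alternative, the dictionary can be built with ord() from readable characters. The names replacements and removals are illustrative only.

```python
txt = "Good night Sam!"

# Readable description of the changes (illustrative names).
replacements = {"m": "e", "S": "J", "a": "o"}
removals = "odnght"

# Build the translation dictionary: keys are code points,
# values are code points (replace) or None (remove).
table = {ord(src): ord(dst) for src, dst in replacements.items()}
table.update({ord(ch): None for ch in removals})

result = txt.translate(table)
print(result)  # G i Joe!
```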
Upper case the string:
txt = "Hello my friends"
x = txt.upper()
print(x)
The upper()
method returns a string where all characters are in upper case.
Symbols and Numbers are ignored.
string.upper()
No parameters
Fill the string with zeros until it is 10 characters long:
txt = "50"
x = txt.zfill(10)
print(x)
The zfill()
method adds zeros (0) at the beginning of the string, until it reaches the specified length.
If the value of the len parameter is less than the length of the string, no filling is done.
string.zfill(len)
Parameter | Description |
---|---|
len | Required. A number specifying the desired length of the string |
Fill the strings with zeros until they are 10 characters long:
a = "hello"
b = "welcome to the jungle"
c = "10.000"
print(a.zfill(10))
print(b.zfill(10))
print(c.zfill(10))
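One behaviour worth knowing is how zfill() treats a leading sign: in Python, the zeros are inserted after a leading "+" or "-" rather than before it. A small sketch with illustrative values:

```python
values = ["42", "-42", "+3.5", "abc"]

# The sign stays in front; zeros are inserted right after it.
padded = [v.zfill(6) for v in values]
print(padded)  # ['000042', '-00042', '+003.5', '000abc']
```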
#python #programming #developer
1598201280
A diff algorithm outputs the set of differences between two inputs. These algorithms are the basis of a number of commonly used developer tools. Yet understanding the inner workings of diff algorithms is rarely necessary to use said tools.
Git is one example where a developer can read, commit, pull, and merge diffs without ever understanding the underlying diff algorithm. As a consequence, there is very limited knowledge on the subject across the developer community.
The purpose of this article is not to detail how Ably programmatically implemented a diff algorithm across its distributed pub/sub messaging platform, but rather to share our research and provide systematic knowledge on the subject of diff algorithms that could be useful to implementers of diff/delta/patch functionality.
For Ably customers like Tennis Australia or HubSpot, Message Delta Compression reduces the bandwidth required to transmit realtime messages by sending only the diff of a message.
This means subscribers receive only the changes since the last update instead of the entire stream. Sending fewer bits is more bandwidth-efficient and reduces overall costs and latencies for our customers. To develop this feature we needed to implement a diff algorithm that supported binary encoding and didn’t sacrifice latency when generating deltas.
Purpose and usage
The output of a diff algorithm is called a patch or delta. The delta format might be human-readable (text) or only machine-readable (binary). A human-readable format is usually employed for tracking and reconciling changes to human-readable text like source code. A binary format is usually space-optimized and used to save bandwidth: it transfers only the set of changes to an old version of the data already available to a recipient, as opposed to transferring all the new data. The formal term for this is delta encoding.
Binary VS Text?
There seems to be a common misconception that diff algorithms are specialized based on the type of input. The truth is, diff algorithms are omnivorous and can handle any input, as long as the input can simply be treated as a string of bytes. That string might consist of the English alphabet or opaque binary data. Any diff algorithm will generate a correct delta given two input strings in the same alphabet.
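To illustrate the point, here is a minimal sketch of delta encoding built on Python's standard difflib.SequenceMatcher. It is not Ably's implementation or any production delta format: the functions make_delta and apply_delta and the ("copy", ...) / ("insert", ...) operations are invented for this example. The same two functions work unchanged on text and on raw bytes.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Return a list of operations that rebuild `new` from `old`."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The recipient already has this range: send only the indices.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New data has to be sent literally.
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no operation: the range is simply never copied.
    return delta


def apply_delta(old, delta):
    """Rebuild the new version from `old` and a delta from make_delta()."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    # old[:0] is an empty str or bytes, so join() works for both types.
    return old[:0].join(parts)


old_text, new_text = "the quick brown fox", "the quick red fox jumps"
old_bytes, new_bytes = b"\x00\x01\x02\x03\x04", b"\x00\x01\xff\x03\x04\x05"

text_roundtrip = apply_delta(old_text, make_delta(old_text, new_text))
bytes_roundtrip = apply_delta(old_bytes, make_delta(old_bytes, new_bytes))
print(text_roundtrip == new_text, bytes_roundtrip == new_bytes)
```

The delta only carries literal data for the ranges that changed. Unchanged ranges are referenced by position in the old version, which is where the bandwidth saving comes from.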
The misconception that a different algorithm is required to handle binary data arises from commonly used diff/merge tools treating text and binary as if they were actually different. These tools generally aim to provide a human-readable delta, and as such focus on human-readable input to the exclusion of binary data.
The assumption is that binary data is not human-readable so the delta between two binary data inputs will also not be human readable, and thus rendering it human-readable is deemed to be too much effort.
Equality is the only relevant output in the case of binary diffs, and as such, a simple bit-by-bit comparison is considered to be the fastest and most appropriate solution. This categorization of algorithms by the efficiency of solution causes a partitioning of inputs into different types.
Another aspect that adds to the confusion is the line-based, word-based, and character-based classification of textual diff outputs produced by diff/merge tools. A diff algorithm that is described as “line-based” gives the impression that it produces “text-only” output, and that this means that it accepts only text input and never binary data inputs.
#data #compression #distributed-systems #aws #git #diff-algorithms #delta-compression #good-company
1593368933
Whether stored or sent over some network, every bit counts and costs money. There are tens, probably hundreds of compression algorithms available, but the most popular one is probably zip. gzip, even though it has a similar name, is a different algorithm. It is one of the three standard formats used in HTTP compression, making it also a broadly used algorithm. These algorithms…
#compression #algorithms #algorithms
1593347004
The Greedy Method is an approach for solving certain types of optimization problems. The greedy algorithm chooses the optimum result at each stage. While this works most of the time, there are numerous examples where the greedy approach is not the correct approach. For example, let’s say that you’re taking the greedy algorithm approach to earning money at a certain point in your life. You graduate high school and have two options:
#computer-science #algorithms #developer #programming #greedy-algorithms #algorithms
1596427800
Finding a certain piece of text inside a document is an important feature nowadays. It is widely used in many practical things that we regularly do in our everyday lives, such as searching for something on Google or even detecting plagiarism. In small texts, the pattern matching algorithm does not need to be particularly efficient to behave well. However, a larger task, like searching for the word ‘cake’ in a 300-page book, can take a lot of time if a naive algorithm is used.
Before talking about KMP, we should analyze the inefficient approach to finding a sequence of characters in a text. This algorithm slides the pattern over the text one position at a time and checks for a match at each position. The complexity of this solution is O(m * (n - m + 1)), where m is the length of the pattern and n is the length of the text.
Find all the occurrences of string pat in string txt (naive algorithm).
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;

string pat = "ABA"; // the pattern
string txt = "CABBCABABAB"; // the text in which we are searching

bool checkForPattern(int index, int patLength) {
    int i;
    // checks if characters from pat are different from those in txt
    for(i = 0; i < patLength; i++) {
        if(txt[index + i] != pat[i]) {
            return false;
        }
    }
    return true;
}

void findPattern() {
    int patternLength = pat.size();
    int textLength = txt.size();
    for(int i = 0; i <= textLength - patternLength; i++) {
        // check for every index if there is a match
        if(checkForPattern(i, patternLength)) {
            cout << "Pattern at index " << i << "\n";
        }
    }
}

int main()
{
    findPattern();
    return 0;
}
The KMP algorithm is based on a degenerating property: it uses the fact that our pattern has some sub-patterns appearing more than once. This approach improves the complexity to linear time. The idea is that when we find a mismatch, we already know some of the characters in the next search window, so we save time by skipping the comparisons that we already know will succeed. To know how far to skip, we pre-process the pattern into an auxiliary array, prePos. prePos holds integer values that tell us how many characters can be skipped. Each entry of this supporting array can be described as the length of the longest proper prefix that is also a suffix.
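The description above can be sketched in code. The following is a minimal Python version of the standard KMP technique, not the article's own implementation: the function names build_pre_pos and kmp_search are assumptions made for this example, and pre_pos plays the role of the prePos array described above.

```python
def build_pre_pos(pat):
    """pre_pos[i] = length of the longest proper prefix of pat[:i + 1]
    that is also a suffix of pat[:i + 1]."""
    pre_pos = [0] * len(pat)
    length = 0  # length of the currently matched prefix
    for i in range(1, len(pat)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while length > 0 and pat[i] != pat[length]:
            length = pre_pos[length - 1]
        if pat[i] == pat[length]:
            length += 1
        pre_pos[i] = length
    return pre_pos


def kmp_search(pat, txt):
    """Return the start indices of all occurrences of pat in txt."""
    if not pat:
        return []
    pre_pos = build_pre_pos(pat)
    matches = []
    matched = 0  # number of pattern characters matched so far
    for i, ch in enumerate(txt):
        # Reuse what is already known instead of restarting from zero.
        while matched > 0 and ch != pat[matched]:
            matched = pre_pos[matched - 1]
        if ch == pat[matched]:
            matched += 1
        if matched == len(pat):
            matches.append(i - len(pat) + 1)
            # Keep the overlapping part to find overlapping matches.
            matched = pre_pos[matched - 1]
    return matches


print(build_pre_pos("ABA"))              # [0, 0, 1]
print(kmp_search("ABA", "CABBCABABAB"))  # [5, 7]
```

The text index i never moves backwards; only the count of matched pattern characters falls back through pre_pos, which is what keeps the search linear in the length of the text plus the pattern.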
#programming #data-science #coding #kmp-algorithm #algorithms #algorithms