Lina Biyinzika


1640235600

My Own Unicode Compression Algorithm

Zee Code

ZCode is a custom compression algorithm I originally developed for a competition held in the Spring 2019 Data Structures and Algorithms course of Dr. Mahdi Safarnejad-Boroujeni at Sharif University of Technology, in which I won first place. The code is rather slow and leaves a lot of room for optimization, but it is quite readable, which makes it an excellent educational resource for anyone getting started with compression algorithms.

The algorithm is a cocktail of classical compression algorithms, mixed and served for Unicode documents. It is built around the LZW algorithm, which creates a finite-size symbol dictionary; the resulting symbols are then byte-coded into variable-length custom codes, which I call zee codes! Finally, the symbol table is truncated accordingly, and the compressed document is encoded into a byte stream.
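To make the LZW step concrete, here is a minimal sketch of classic LZW over a Unicode string, with a cap on dictionary size standing in for the "finite size symbol dictionary". It is not ZCode's actual implementation: the function names `lzw_compress`/`lzw_decompress` and the `max_size` parameter are hypothetical, and the initial dictionary is seeded from the distinct characters of the input rather than a fixed alphabet.

```python
def lzw_compress(text, max_size=4096):
    """Classic LZW over characters; returns (codes, initial alphabet).

    Hypothetical sketch: the dictionary is seeded with the distinct
    characters of the input and stops growing at max_size entries.
    """
    alphabet = sorted(set(text))
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the longest phrase already in the dictionary.
            current = candidate
        else:
            codes.append(dictionary[current])
            if len(dictionary) < max_size:
                # Learn the new phrase so later repeats cost a single code.
                dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes, alphabet


def lzw_decompress(codes, alphabet, max_size=4096):
    """Inverse of lzw_compress: rebuilds the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # The code was created by the encoder in the step just before.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        if len(dictionary) < max_size:
            dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(output)
```

Repeated phrases are replaced by single dictionary codes, which is where the savings come from; the decoder never needs the dictionary itself, only the initial alphabet and the same size cap.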

Zee codes are heavily inspired by Huffman trees. However, in typical texts, symbols are usually distributed much more uniformly than the geometric (or exponential) distribution assumed for effective Huffman coding, so variable-length byte codes outweighed bit-level Huffman codes, both in ease of implementation and in performance. Results may vary, but my tests showed a steady ~4-5x compression ratio on Farsi texts, which is pretty nice!
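The trade of bit-level Huffman codes for variable-length byte codes can be illustrated with a simple two-tier scheme. This is a hypothetical sketch, not ZCode's actual format: the most frequent symbols get one-byte codes and the rest get two-byte codes (echoing the default code-size of 2), and any first byte at or above an assumed cutoff `k` marks the start of a two-byte code, which keeps the code prefix-free. The names `assign_byte_codes`, `encode_symbols`, `decode_bytes` and the parameter `k` are all illustrative.

```python
from collections import Counter


def assign_byte_codes(symbols, k=192):
    """Map each symbol to a 1- or 2-byte code, most frequent symbols first.

    Hypothetical two-tier scheme: first bytes 0..k-1 are complete codes,
    first bytes k..255 start a two-byte code.
    """
    ranked = [sym for sym, _ in Counter(symbols).most_common()]
    capacity = k + (256 - k) * 256
    if len(ranked) > capacity:
        raise ValueError("too many distinct symbols for 2-byte codes")
    table = {}
    for rank, sym in enumerate(ranked):
        if rank < k:
            table[sym] = bytes([rank])
        else:
            offset = rank - k
            table[sym] = bytes([k + offset // 256, offset % 256])
    return table


def encode_symbols(symbols, table):
    return b"".join(table[sym] for sym in symbols)


def decode_bytes(data, table, k=192):
    reverse = {code: sym for sym, code in table.items()}
    out, i = [], 0
    while i < len(data):
        # The first byte alone tells whether the code is 1 or 2 bytes long.
        width = 1 if data[i] < k else 2
        out.append(reverse[data[i:i + width]])
        i += width
    return out
```

Frequent symbols cost one byte and rare ones two, and no bit packing is needed at all, which is the kind of implementation advantage over bit-level Huffman codes described above.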

Installation

ZCode is available on pip and requires only Python 3.6 or higher.

pip install -U zcode

Usage

You can run the algorithm on any UTF-8 encoded file using the zcode command. It automatically decompresses files ending with the .zee extension and compresses all other files into .zee files, but you can always override this default behavior with optional arguments:

zcode INPUTFILE [--output OUTPUT_FILE --action compress/decompress --symbol-size SYMBOL_SIZE --code-size CODE_SIZE]

The symbol-size argument controls the algorithm's buffer size (in bytes) for processing symbols. It is set automatically based on your input file size, but you can change it as you wish. The code-size argument controls the maximum length, in bytes, of the codes used to encode symbols. It defaults to 2, and the same value needs to be provided to the algorithm upon decompression.

LICENSE

MIT LICENSE, see vahidzee/zcode/LICENSE

Author: vahidzee
Source Code: https://github.com/vahidzee/zcode
License: MIT License


Python String Methods Explained with Examples

Python has a set of built-in methods that you can use on strings.

Note: All string methods return new values. They do not change the original string.

Method          Description
capitalize()    Converts the first character to upper case
casefold()      Converts the string into lower case (more aggressively than lower())
center()        Returns a centered string
count()         Returns the number of times a specified value occurs in a string
encode()        Returns an encoded version of the string
endswith()      Returns True if the string ends with the specified value
expandtabs()    Replaces tab characters with spaces, using the specified tab size
find()          Searches the string for a specified value and returns the position of where it was found (-1 if not found)
format()        Formats specified values in a string
format_map()    Formats specified values in a string, taking them from a mapping
index()         Searches the string for a specified value and returns the position of where it was found (raises an exception if not found)
isalnum()       Returns True if all characters in the string are alphanumeric
isalpha()       Returns True if all characters in the string are letters
isascii()       Returns True if all characters in the string are ASCII characters
isdecimal()     Returns True if all characters in the string are decimals
isdigit()       Returns True if all characters in the string are digits
isidentifier()  Returns True if the string is an identifier
islower()       Returns True if all characters in the string are lower case
isnumeric()     Returns True if all characters in the string are numeric
isprintable()   Returns True if all characters in the string are printable
isspace()       Returns True if all characters in the string are whitespaces
istitle()       Returns True if the string follows the rules of a title
isupper()       Returns True if all characters in the string are upper case
join()          Joins the elements of an iterable into a single string, using the string as separator
ljust()         Returns a left justified version of the string
lower()         Converts a string into lower case
lstrip()        Returns a left trim version of the string
maketrans()     Returns a translation table to be used in translations
partition()     Returns a tuple where the string is split into three parts, at the first occurrence of the specified value
replace()       Returns a string where a specified value is replaced with another specified value
rfind()         Searches the string for a specified value and returns the last position of where it was found (-1 if not found)
rindex()        Searches the string for a specified value and returns the last position of where it was found (raises an exception if not found)
rjust()         Returns a right justified version of the string
rpartition()    Returns a tuple where the string is split into three parts, at the last occurrence of the specified value
rsplit()        Splits the string at the specified separator, starting from the right, and returns a list
rstrip()        Returns a right trim version of the string
split()         Splits the string at the specified separator, and returns a list
splitlines()    Splits the string at line breaks and returns a list
startswith()    Returns True if the string starts with the specified value
strip()         Returns a trimmed version of the string
swapcase()      Swaps cases, lower case becomes upper case and vice versa
title()         Converts the first character of each word to upper case
translate()     Returns a translated string
upper()         Converts a string into upper case
zfill()         Fills the string with a specified number of 0 values at the beginning

 


Python String capitalize() Method

Example

Upper case the first letter in this sentence:

txt = "hello, and welcome to my world."

x = txt.capitalize()

print(x)

Definition and Usage

The capitalize() method returns a string where the first character is upper case, and the rest is lower case.

Syntax

string.capitalize()

Parameter Values

No parameters

More Examples

Example

The first character is converted to upper case, and the rest are converted to lower case:

txt = "python is FUN!"

x = txt.capitalize()

print(x)

Example

See what happens if the first character is a number:

txt = "36 is my age."

x = txt.capitalize()

print(x)

Python String casefold() Method

Example

Make the string lower case:

txt = "Hello, And Welcome To My World!"

x = txt.casefold()

print(x)

Definition and Usage

The casefold() method returns a string where all the characters are lower case.

This method is similar to the lower() method, but casefold() is more aggressive: it converts more characters and is intended for caseless matching, so more strings will compare equal when both are converted using the casefold() method.

Syntax

string.casefold()

Parameter Values

No parameters
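The difference from lower() shows up with characters whose case folding is more than simple lower-casing. For example, the German sharp s ("ß") is already lower case, so lower() leaves it alone, while casefold() turns it into "ss":

```python
word_a = "Straße"
word_b = "STRASSE"

# lower() keeps "ß", so the two spellings still differ.
print(word_a.lower(), word_b.lower())
print(word_a.lower() == word_b.lower())

# casefold() maps "ß" to "ss", so the caseless comparison succeeds.
print(word_a.casefold(), word_b.casefold())
print(word_a.casefold() == word_b.casefold())
```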


Python String center() Method

Example

Print the word "banana", taking up the space of 20 characters, with "banana" in the middle:

txt = "banana"

x = txt.center(20)

print(x)

Definition and Usage

The center() method centers the string within the specified length, using a specified character (space is default) as the fill character.

Syntax

string.center(length, character)

Parameter Values

Parameter   Description
length      Required. The length of the returned string
character   Optional. The character to fill the missing space on each side. Default is " " (space)

More Examples

Example

Using the letter "O" as the padding character:

txt = "banana"

x = txt.center(20, "O")

print(x)

Python String count() Method

Example

Return the number of times the value "apple" appears in the string:

txt = "I love apples, apple are my favorite fruit"

x = txt.count("apple")

print(x)

Definition and Usage

The count() method returns the number of times a specified value appears in the string.

Syntax

string.count(value, start, end)

Parameter Values

Parameter   Description
value       Required. A string. The string value to search for
start       Optional. An integer. The position to start the search. Default is 0
end         Optional. An integer. The position to end the search. Default is the end of the string

More Examples

Example

Search from position 10 to 24:

txt = "I love apples, apple are my favorite fruit"

x = txt.count("apple", 10, 24)

print(x)
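Note that count() counts non-overlapping occurrences, scanning from left to right. Counting overlapping matches needs a manual scan, sketched here with startswith() at every start position:

```python
txt = "aaaa"

# Non-overlapping: matches at index 0 and 2; the match at index 1 is skipped.
non_overlapping = txt.count("aa")

# Overlapping: test every possible start position instead.
overlapping = sum(txt.startswith("aa", i) for i in range(len(txt)))

print(non_overlapping, overlapping)
```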

Python String encode() Method

Example

UTF-8 encode the string:

txt = "My name is Ståle"

x = txt.encode()

print(x)

Definition and Usage

The encode() method encodes the string, using the specified encoding. If no encoding is specified, UTF-8 will be used.

Syntax

string.encode(encoding=encoding, errors=errors)

Parameter Values

Parameter   Description
encoding    Optional. A string specifying the encoding to use. Default is UTF-8
errors      Optional. A string specifying the error handling scheme. Legal values are:
            'backslashreplace' - uses a backslash escape sequence instead of the character that could not be encoded
            'ignore' - ignores the characters that cannot be encoded
            'namereplace' - replaces the character with an escape containing the character's Unicode name
            'strict' - default, raises an error on failure
            'replace' - replaces the character with a question mark
            'xmlcharrefreplace' - replaces the character with an XML character reference

More Examples

Example

These examples use ASCII encoding with a character that cannot be encoded, showing the result of each errors value:

txt = "My name is Ståle"

print(txt.encode(encoding="ascii",errors="backslashreplace"))
print(txt.encode(encoding="ascii",errors="ignore"))
print(txt.encode(encoding="ascii",errors="namereplace"))
print(txt.encode(encoding="ascii",errors="replace"))
print(txt.encode(encoding="ascii",errors="xmlcharrefreplace"))
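For reference, here are the byte strings each error handler produces for "å" (U+00E5, decimal 229), together with the default 'strict' behavior, which raises a UnicodeEncodeError:

```python
txt = "My name is Ståle"

# Encode once per handler and keep the results for inspection.
results = {
    handler: txt.encode(encoding="ascii", errors=handler)
    for handler in ("backslashreplace", "ignore", "namereplace",
                    "replace", "xmlcharrefreplace")
}

for handler, encoded in results.items():
    print(f"{handler:>17}: {encoded}")

try:
    txt.encode(encoding="ascii")  # errors="strict" is the default
except UnicodeEncodeError as exc:
    print("           strict:", exc.reason)
```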

Python String endswith() Method

Example

Check if the string ends with a punctuation sign (.):

txt = "Hello, welcome to my world."

x = txt.endswith(".")

print(x)

Definition and Usage

The endswith() method returns True if the string ends with the specified value, otherwise False.

Syntax

string.endswith(value, start, end)

Parameter Values

Parameter   Description
value       Required. The value to check if the string ends with
start       Optional. An integer specifying at which position to start the search
end         Optional. An integer specifying at which position to end the search

More Examples

Example

Check if the string ends with the phrase "my world.":

txt = "Hello, welcome to my world."

x = txt.endswith("my world.")

print(x)

Example

Check if position 5 to 11 ends with the phrase "my world.":

txt = "Hello, welcome to my world."

x = txt.endswith("my world.", 5, 11)

print(x)

Python String expandtabs() Method

Example

Set the tab size to 2 spaces:

txt = "H\te\tl\tl\to"

x = txt.expandtabs(2)

print(x)

Definition and Usage

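The expandtabs() method returns a copy of the string in which each tab character is replaced with spaces, up to the next tab stop (the next column that is a multiple of the specified tab size).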

Syntax

string.expandtabs(tabsize)

Parameter Values

Parameter   Description
tabsize     Optional. A number specifying the tab size. Default tab size is 8

More Examples

Example

See the result using different tab sizes:

txt = "H\te\tl\tl\to"

print(txt)
print(txt.expandtabs())
print(txt.expandtabs(2))
print(txt.expandtabs(4))
print(txt.expandtabs(10))
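Because tabs expand to the next tab stop rather than to a fixed number of spaces, text after a tab stays column-aligned:

```python
rows = ["a\tx", "ab\tx", "abc\tx"]

# Each tab advances to the next multiple of 4, so "x" lands in column 4 every time.
expanded = [row.expandtabs(4) for row in rows]
for line in expanded:
    print(repr(line))

# A tab that starts exactly on a tab stop advances a full tab width.
print(repr("abcd\tx".expandtabs(4)))
```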

Python String find() Method

Example

Where in the text is the word "welcome"?:

txt = "Hello, welcome to my world."

x = txt.find("welcome")

print(x)

Definition and Usage

The find() method finds the first occurrence of the specified value.

The find() method returns -1 if the value is not found.

The find() method is almost the same as the index() method; the only difference is that the index() method raises an exception if the value is not found. (See example below.)

Syntax

string.find(value, start, end)

Parameter Values

Parameter   Description
value       Required. The value to search for
start       Optional. Where to start the search. Default is 0
end         Optional. Where to end the search. Default is to the end of the string

More Examples

Example

Where in the text is the first occurrence of the letter "e"?:

txt = "Hello, welcome to my world."

x = txt.find("e")

print(x)

Example

Where in the text is the first occurrence of the letter "e" when you only search between position 5 and 10?:

txt = "Hello, welcome to my world."

x = txt.find("e", 5, 10)

print(x)

Example

If the value is not found, the find() method returns -1, but the index() method will raise an exception:

txt = "Hello, welcome to my world."

print(txt.find("q"))
print(txt.index("q"))

Python String format() Method

Example

Insert the price inside the placeholder; the price should be in fixed-point, two-decimal format:

txt = "For only {price:.2f} dollars!"
print(txt.format(price = 49))

Definition and Usage

The format() method formats the specified value(s) and inserts them inside the string's placeholders.

The placeholders are defined using curly brackets: {}. Read more about the placeholders in the Placeholders section below.

The format() method returns the formatted string.

Syntax

string.format(value1, value2...)

Parameter Values

Parameter          Description
value1, value2...  Required. One or more values that should be formatted and inserted in the string

The values are either a list of values separated by commas, a key=value list, or a combination of both.

The values can be of any data type.

The Placeholders

The placeholders can be identified using named indexes {price}, numbered indexes {0}, or even empty placeholders {}.

Example

Using different placeholder values:

txt1 = "My name is {fname}, I'm {age}".format(fname = "John", age = 36)
txt2 = "My name is {0}, I'm {1}".format("John",36)
txt3 = "My name is {}, I'm {}".format("John",36)

Formatting Types

Inside the placeholders you can add a formatting type to format the result:

:<   Left aligns the result (within the available space)
:>   Right aligns the result (within the available space)
:^   Center aligns the result (within the available space)
:=   Places the sign in the leftmost position, with the padding between the sign and the number
:+   Use a plus sign to indicate if the result is positive or negative
:-   Use a minus sign for negative values only
:    (a space after the colon) Use a space to insert an extra space before positive numbers (and a minus sign before negative numbers)
:,   Use a comma as a thousands separator
:_   Use an underscore as a thousands separator
:b   Binary format
:c   Converts the value into the corresponding Unicode character
:d   Decimal format
:e   Scientific format, with a lower case e
:E   Scientific format, with an upper case E
:f   Fixed-point number format
:F   Fixed-point number format, in upper case (shows inf and nan as INF and NAN)
:g   General format
:G   General format (using an upper case E for scientific notation)
:o   Octal format
:x   Hex format, lower case
:X   Hex format, upper case
:n   Number format (like d or g, but using the current locale's separators)
:%   Percentage format
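A few of the formatting types listed above in action. Inside one placeholder they combine in a fixed order, roughly [[fill]align][sign][width][grouping][.precision][type] (some optional flags are omitted here), so several can be used at once:

```python
# Alignment inside a width of 10, optionally with a fill character.
print("[{:<10}]".format("left"))
print("[{:*^10}]".format("mid"))
print("[{:>10}]".format("right"))

# Sign handling, including "=" which pads between the sign and the digits.
print("{:+d} {:+d}".format(42, -42))
print("{: d}|{: d}".format(42, -42))
print("{:=+8d}".format(42))

# Thousands separators, precision, and percentages.
print("{:,}".format(1234567))
print("{:_}".format(1234567))
print("{:,.2f}".format(1234567.891))
print("{:.1%}".format(0.256))

# Integer and scientific presentation types.
print("{:b} {:o} {:x} {:X} {:c}".format(10, 10, 255, 255, 65))
print("{:e} {:E}".format(12345.678, 12345.678))
```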

Python String index() Method

Example

Where in the text is the word "welcome"?:

txt = "Hello, welcome to my world."

x = txt.index("welcome")

print(x)

Definition and Usage

The index() method finds the first occurrence of the specified value.

The index() method raises an exception if the value is not found.

The index() method is almost the same as the find() method; the only difference is that the find() method returns -1 if the value is not found. (See example below.)

Syntax

string.index(value, start, end)

Parameter Values

Parameter   Description
value       Required. The value to search for
start       Optional. Where to start the search. Default is 0
end         Optional. Where to end the search. Default is to the end of the string

More Examples

Example

Where in the text is the first occurrence of the letter "e"?:

txt = "Hello, welcome to my world."

x = txt.index("e")

print(x)

Example

Where in the text is the first occurrence of the letter "e" when you only search between position 5 and 10?:

txt = "Hello, welcome to my world."

x = txt.index("e", 5, 10)

print(x)

Example

If the value is not found, the find() method returns -1, but the index() method will raise an exception:

txt = "Hello, welcome to my world."

print(txt.find("q"))
print(txt.index("q"))
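Because index() signals a missing value with a ValueError rather than -1, code that cannot be sure the value exists usually wraps it in try/except. The helper `position_of` below is hypothetical, shown only to illustrate the pattern:

```python
txt = "Hello, welcome to my world."


def position_of(text, value):
    """Return the index of value in text, or None if it is absent."""
    try:
        return text.index(value)
    except ValueError:
        # index() raises instead of returning -1 like find().
        return None


print(position_of(txt, "welcome"))
print(position_of(txt, "q"))
```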

Python String isalnum() Method

Example

Check if all the characters in the text are alphanumeric:

txt = "Company12"

x = txt.isalnum()

print(x)

Definition and Usage

The isalnum() method returns True if all the characters are alphanumeric, meaning letters (such as a-z) and numbers (such as 0-9).

Examples of characters that are not alphanumeric: (space)!#%&? etc.

Syntax

string.isalnum()

Parameter Values

No parameters.

More Examples

Example

Check if all the characters in the text are alphanumeric:

txt = "Company 12"

x = txt.isalnum()

print(x)

Python String isalpha() Method

Example

Check if all the characters in the text are letters:

txt = "CompanyX"

x = txt.isalpha()

print(x)

Definition and Usage

The isalpha() method returns True if all the characters are letters (such as a-z), otherwise False.

Examples of characters that are not letters: (space)!#%&? etc.

Syntax

string.isalpha()

Parameter Values

No parameters.

More Examples

Example

Check if all the characters in the text are alphabetic:

txt = "Company10"

x = txt.isalpha()

print(x)

Python String isascii() Method

Example

Check if all the characters in the text are ASCII characters:

txt = "Company123"

x = txt.isascii()

print(x)

Definition and Usage

The isascii() method returns True if the string is empty or all the characters are ASCII characters (code points 0 to 127), otherwise False. This method was added in Python 3.7.


Syntax

string.isascii()

Parameter Values

No parameters.


Python String isdecimal() Method

Example

Check if all the characters in the string are decimal characters:

txt = "\u0033" #unicode for 3

x = txt.isdecimal()

print(x)

Definition and Usage

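The isdecimal() method returns True if all the characters are decimal characters (0-9, as well as decimal digits from other scripts), otherwise False.

In Python 3 every string is a Unicode string, so this method works on any str object.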

Syntax

string.isdecimal()

Parameter Values

No parameters.

More Examples

Example

Check if each of these strings consists of decimal characters:

a = "\u0030" #unicode for 0
b = "\u0047" #unicode for G

print(a.isdecimal())
print(b.isdecimal())

Python String isdigit() Method

Example

Check if all the characters in the text are digits:

txt = "50800"

x = txt.isdigit()

print(x)

Definition and Usage

The isdigit() method returns True if all the characters are digits, otherwise False.

Superscript digits, like ², are also considered digits.

Syntax

string.isdigit()

Parameter Values

No parameters.

More Examples

Example

Check if all the characters in the text are digits:

a = "\u0030" #unicode for 0
b = "\u00B2" #unicode for ²

print(a.isdigit())
print(b.isdigit())

Python String isidentifier() Method

Example

Check if the string is a valid identifier:

txt = "Demo"

x = txt.isidentifier()

print(x)

Definition and Usage

The isidentifier() method returns True if the string is a valid identifier, otherwise False.

A string is considered a valid identifier if it contains only alphanumeric characters (a-z, 0-9) and underscores (_), and does not start with a digit. Spaces are not allowed. Python 3 also accepts many non-ASCII letters in identifiers.

Syntax

string.isidentifier()

Parameter Values

No parameters.

More Examples

Example

Check if the strings are valid identifiers:

a = "MyFolder"
b = "Demo002"
c = "2bring"
d = "my demo"

print(a.isidentifier())
print(b.isidentifier())
print(c.isidentifier())
print(d.isidentifier())
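isidentifier() only checks the syntax of the name; it does not reject reserved words such as "class". A check for a usable variable name therefore usually combines it with keyword.iskeyword() from the standard library. The helper `is_usable_name` is hypothetical:

```python
import keyword


def is_usable_name(name):
    """True if name could be used as a variable name."""
    # isidentifier() accepts keywords, so exclude them separately.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["my_var", "class", "2bring", "_private"]:
    print(candidate, candidate.isidentifier(), is_usable_name(candidate))
```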

Python String islower() Method

Example

Check if all the characters in the text are in lower case:

txt = "hello world!"

x = txt.islower()

print(x)

Definition and Usage

The islower() method returns True if all the characters are in lower case, otherwise False.

Numbers, symbols and spaces are not checked, only cased (alphabet) characters. At least one cased character must be present, so a string such as "123" returns False.

Syntax

string.islower()

Parameter Values

No parameters.

More Examples

Example

Check if all the characters in the texts are in lower case:

a = "Hello world!"
b = "hello 123"
c = "mynameisPeter"

print(a.islower())
print(b.islower())
print(c.islower())

Python String isnumeric() Method

Example

Check if all the characters in the text are numeric:

txt = "565543"

x = txt.isnumeric()

print(x)

Definition and Usage

The isnumeric() method returns True if all the characters are numeric (such as 0-9), otherwise False.

Superscripts like ² and vulgar fractions like ¾ are also considered numeric values.

"-1" and "1.5" are NOT considered numeric values, because all the characters in the string must be numeric, and the - and the . are not.

Syntax

string.isnumeric()

Parameter Values

No parameters.

More Examples

Example

Check if the characters are numeric:

a = "\u0030" #unicode for 0
b = "\u00B2" #unicode for ²
c = "10km2"
d = "-1"
e = "1.5"

print(a.isnumeric())
print(b.isnumeric())
print(c.isnumeric())
print(d.isnumeric())
print(e.isnumeric())
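The three numeric checks form a hierarchy: every decimal character is a digit, and every digit is numeric, but not the other way around. Comparing them on the same inputs makes the difference visible (the Persian digit three, U+06F3, is a decimal character even though it is not 0-9):

```python
samples = {
    "3": "ASCII digit three",
    "\u06F3": "EXTENDED ARABIC-INDIC DIGIT THREE",
    "\u00B2": "SUPERSCRIPT TWO",
    "\u00BE": "VULGAR FRACTION THREE QUARTERS",
    "-1": "minus sign and digit",
}

for text, label in samples.items():
    print(f"{label:34} decimal={text.isdecimal()!s:5} "
          f"digit={text.isdigit()!s:5} numeric={text.isnumeric()}")
```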

Python String isprintable() Method

Example

Check if all the characters in the text are printable:

txt = "Hello! Are you #1?"

x = txt.isprintable()

print(x)

Definition and Usage

The isprintable() method returns True if all the characters are printable, otherwise False.

Examples of non-printable characters are carriage return and line feed.

Syntax

string.isprintable()

Parameter Values

No parameters.

More Examples

Example

Check if all the characters in the text are printable:

txt = "Hello!\nAre you #1?"

x = txt.isprintable()

print(x)

Python String isspace() Method

Example

Check if all the characters in the text are whitespaces:

txt = "   "

x = txt.isspace()

print(x)

Definition and Usage

The isspace() method returns True if all the characters in a string are whitespaces, otherwise False.

Syntax

string.isspace()

Parameter Values

No parameters.

More Examples

Example

Check if all the characters in the text are whitespaces:

txt = "   s   "

x = txt.isspace()

print(x)

Python String istitle() Method

Example

Check if each word starts with an upper case letter:

txt = "Hello, And Welcome To My World!"

x = txt.istitle()

print(x)

Definition and Usage

The istitle() method returns True if every word in the text starts with an upper case letter and the rest of each word is lower case, otherwise False.

Symbols and numbers are ignored.

Syntax

string.istitle()

Parameter Values

No parameters.

More Examples

Example

Check if each word starts with an upper case letter:

a = "HELLO, AND WELCOME TO MY WORLD"
b = "Hello"
c = "22 Names"
d = "This Is %'!?"

print(a.istitle())
print(b.istitle())
print(c.istitle())
print(d.istitle())

Python String isupper() Method

Example

Check if all the characters in the text are in upper case:

txt = "THIS IS NOW!"

x = txt.isupper()

print(x)

Definition and Usage

The isupper() method returns True if all the characters are in upper case, otherwise False.

Numbers, symbols and spaces are not checked, only cased (alphabet) characters. At least one cased character must be present, so a string such as "123" returns False.

Syntax

string.isupper()

Parameter Values

No parameters.

More Examples

Example

Check if all the characters in the texts are in upper case:

a = "Hello World!"
b = "hello 123"
c = "MY NAME IS PETER"

print(a.isupper())
print(b.isupper())
print(c.isupper())

Python String join() Method

Example

Join all items in a tuple into a string, using a hash character as separator:

myTuple = ("John", "Peter", "Vicky")

x = "#".join(myTuple)

print(x)

Definition and Usage

The join() method takes all items in an iterable and joins them into one string.

The string on which join() is called is used as the separator.

Syntax

string.join(iterable)

Parameter Values

Parameter   Description
iterable    Required. Any iterable object where all the returned values are strings

More Examples

Example

Join all keys in a dictionary into a string, using the word "TEST" as separator (iterating over a dictionary yields its keys):

myDict = {"name": "John", "country": "Norway"}
mySeparator = "TEST"

x = mySeparator.join(myDict)

print(x)
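Every item must already be a string: join() raises a TypeError otherwise, so non-string items are usually converted first, for example with map(str, ...):

```python
numbers = [1, 2, 3]

try:
    "-".join(numbers)
except TypeError as exc:
    # join() does not convert items to strings by itself.
    print("TypeError:", exc)

# Convert each item to a string before joining.
x = "-".join(map(str, numbers))
print(x)
```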

Python String ljust() Method

Example

Return a 20-character-long, left-justified version of the word "banana":

txt = "banana"

x = txt.ljust(20)

print(x, "is my favorite fruit.")

Note: In the result, there are actually 14 whitespaces to the right of the word banana.

Definition and Usage

The ljust() method will left align the string, using a specified character (space is default) as the fill character.

Syntax

string.ljust(length, character)

Parameter Values

Parameter   Description
length      Required. The length of the returned string
character   Optional. A character to fill the missing space (to the right of the string). Default is " " (space).

More Examples

Example

Using the letter "O" as the padding character:

txt = "banana"

x = txt.ljust(20, "O")

print(x)

Python String lower() Method

Example

Lower case the string:

txt = "Hello my FRIENDS"

x = txt.lower()

print(x)

Definition and Usage

The lower() method returns a string where all characters are lower case.

Symbols and numbers are ignored.

Syntax

string.lower()

Parameter Values

No parameters


Python String lstrip() Method

Example

Remove spaces to the left of the string:

txt = "     banana     "

x = txt.lstrip()

print("of all fruits", x, "is my favorite")

Definition and Usage

The lstrip() method removes any leading characters (whitespace is removed by default).

Syntax

string.lstrip(characters)

Parameter Values

Parameter   Description
characters  Optional. A set of characters to remove as leading characters

More Examples

Example

Remove the leading characters:

txt = ",,,,,ssaaww.....banana"

x = txt.lstrip(",.asw")

print(x)

Python String maketrans() Method

Example

Create a mapping table, and use it in the translate() method to replace any "S" characters with a "P" character:

txt = "Hello Sam!"
mytable = txt.maketrans("S", "P")
print(txt.translate(mytable))

Definition and Usage

The maketrans() method returns a mapping table that can be used with the translate() method to replace specified characters.

Syntax

string.maketrans(x, y, z)

Parameter Values

Parameter   Description
x           Required. If only one parameter is specified, this has to be a dictionary describing how to perform the replace. If two or more parameters are specified, this parameter has to be a string specifying the characters you want to replace.
y           Optional. A string with the same length as parameter x. Each character in the first parameter will be replaced with the corresponding character in this string.
z           Optional. A string describing which characters to remove from the original string.

More Examples

Example

Use a mapping table to replace many characters:

txt = "Hi Sam!"
x = "mSa"
y = "eJo"
mytable = txt.maketrans(x, y)
print(txt.translate(mytable))

Example

The third parameter in the mapping table describes characters that you want to remove from the string:

txt = "Good night Sam!"
x = "mSa"
y = "eJo"
z = "odnght"
mytable = txt.maketrans(x, y, z)
print(txt.translate(mytable))

Example

The maketrans() method itself returns a dictionary that maps the Unicode code point (ordinal) of each character to its replacement:

txt = "Good night Sam!"
x = "mSa"
y = "eJo"
z = "odnght"
print(txt.maketrans(x, y, z))
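When maketrans() is called with a single argument, that argument is a dictionary. Its keys are single characters (or their ordinals) and its values can be replacement strings of any length, or None to delete the character, which the two- and three-string forms cannot express. maketrans() is a static method, so it can be called on str directly:

```python
txt = "Hello Sam!"

# Map one character to a multi-character string, and delete "!" via None.
mytable = str.maketrans({"S": "P", "a": "aa", "!": None})
print(mytable)
print(txt.translate(mytable))
```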

Python String partition() Method

Example

Search for the word "bananas", and return a tuple with three elements:

1 - everything before the "match"
2 - the "match"
3 - everything after the "match"

txt = "I could eat bananas all day"

x = txt.partition("bananas")

print(x)

Definition and Usage

The partition() method searches for a specified string, and splits the string into a tuple containing three elements.

The first element contains the part before the specified string.

The second element contains the specified string.

The third element contains the part after the string.

Note: This method searches for the first occurrence of the specified string.

Syntax

string.partition(value)

Parameter Values

Parameter   Description
value       Required. The string to search for

More Examples

Example

If the specified value is not found, the partition() method returns a tuple containing: 1 - the whole string, 2 - an empty string, 3 - an empty string:

txt = "I could eat bananas all day"

x = txt.partition("apples")

print(x)

Python String replace() Method

Example

Replace the word "bananas":

txt = "I like bananas"

x = txt.replace("bananas", "apples")

print(x)

Definition and Usage

The replace() method replaces a specified phrase with another specified phrase.

Note: All occurrences of the specified phrase will be replaced, unless a count is specified.

Syntax

string.replace(oldvalue, newvalue, count)

Parameter Values

Parameter   Description
oldvalue    Required. The string to search for
newvalue    Required. The string to replace the old value with
count       Optional. A number specifying how many occurrences of the old value you want to replace. Default is all occurrences

More Examples

Example

Replace all occurrences of the word "one":

txt = "one one was a race horse, two two was one too."

x = txt.replace("one", "three")

print(x)

Example

Replace the first two occurrences of the word "one":

txt = "one one was a race horse, two two was one too."

x = txt.replace("one", "three", 2)

print(x)

Python String rfind() Method

Example

Where in the text is the last occurrence of the string "casa"?:

txt = "Mi casa, su casa."

x = txt.rfind("casa")

print(x)

Definition and Usage

The rfind() method finds the last occurrence of the specified value.

The rfind() method returns -1 if the value is not found.

The rfind() method is almost the same as the rindex() method; the only difference is that rindex() raises an exception if the value is not found. See example below.

Syntax

string.rfind(value, start, end)

Parameter Values

Parameter   Description
value       Required. The value to search for
start       Optional. Where to start the search. Default is 0
end         Optional. Where to end the search. Default is to the end of the string

More Examples

Example

Where in the text is the last occurrence of the letter "e"?:

txt = "Hello, welcome to my world."

x = txt.rfind("e")

print(x)

Example

Where in the text is the last occurrence of the letter "e" when you only search between position 5 and 10?:

txt = "Hello, welcome to my world."

x = txt.rfind("e", 5, 10)

print(x)

Example

If the value is not found, the rfind() method returns -1, but the rindex() method will raise an exception:

txt = "Hello, welcome to my world."

print(txt.rfind("q"))
print(txt.rindex("q"))

Python String rindex() Method

Example

Where in the text is the last occurrence of the string "casa"?:

txt = "Mi casa, su casa."

x = txt.rindex("casa")

print(x)

Definition and Usage

The rindex() method finds the last occurrence of the specified value.

The rindex() method raises an exception if the value is not found.

The rindex() method is almost the same as the rfind() method; the only difference is that rfind() returns -1 if the value is not found. See example below.

Syntax

string.rindex(value, start, end)

Parameter Values

Parameter   Description
value       Required. The value to search for
start       Optional. Where to start the search. Default is 0
end         Optional. Where to end the search. Default is to the end of the string

More Examples

Example

Where in the text is the last occurrence of the letter "e"?:

txt = "Hello, welcome to my world."

x = txt.rindex("e")

print(x)

Example

Where in the text is the last occurrence of the letter "e" when you only search between position 5 and 10?:

txt = "Hello, welcome to my world."

x = txt.rindex("e", 5, 10)

print(x)

Example

If the value is not found, the rfind() method returns -1, but the rindex() method will raise an exception:

txt = "Hello, welcome to my world."

print(txt.rfind("q"))
print(txt.rindex("q"))

Python String rjust() Method

Example

Return a 20-character-long, right-justified version of the word "banana":

txt = "banana"

x = txt.rjust(20)

print(x, "is my favorite fruit.")

Note: In the result, there are actually 14 whitespaces to the left of the word banana.

Definition and Usage

The rjust() method will right align the string, using a specified character (space is default) as the fill character.

Syntax

string.rjust(length, character)

Parameter Values

Parameter   Description
length      Required. The length of the returned string
character   Optional. A character to fill the missing space (to the left of the string). Default is " " (space).

More Examples

Example

Using the letter "O" as the padding character:

txt = "banana"

x = txt.rjust(20, "O")

print(x)

Python String rpartition() Method

Example

Search for the last occurrence of the word "bananas", and return a tuple with three elements:

1 - everything before the "match"
2 - the "match"
3 - everything after the "match"

txt = "I could eat bananas all day, bananas are my favorite fruit"

x = txt.rpartition("bananas")

print(x)

Definition and Usage

The rpartition() method searches for the last occurrence of a specified string, and splits the string into a tuple containing three elements.

The first element contains the part before the specified string.

The second element contains the specified string.

The third element contains the part after the string.
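rpartition() has a left-to-right counterpart, partition(). The sketch below, using only built-in str methods, compares the two on the same sentence and shows that, when the value is missing, they put the whole string on opposite ends of the tuple:

```python
txt = "I could eat bananas all day, bananas are my favorite fruit"

# partition() splits at the first occurrence, rpartition() at the last one.
first = txt.partition("bananas")
last = txt.rpartition("bananas")
print(first)
print(last)

# When the value is missing:
# partition() returns (txt, "", ""), rpartition() returns ("", "", txt).
print(txt.partition("apples"))
print(txt.rpartition("apples"))
```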

Syntax

string.rpartition(value)

Parameter Values

Parameter | Description
value | Required. The string to search for

More Examples

Example

If the specified value is not found, the rpartition() method returns a tuple containing: 1 - an empty string, 2 - an empty string, 3 - the whole string:

txt = "I could eat bananas all day, bananas are my favorite fruit"

x = txt.rpartition("apples")

print(x)

Python String rsplit() Method

Example

Split a string into a list, using comma, followed by a space (, ) as the separator:

txt = "apple, banana, cherry"

x = txt.rsplit(", ")

print(x)

Definition and Usage

The rsplit() method splits a string into a list, starting from the right.

If maxsplit is not specified, this method returns the same result as the split() method.

Note: When maxsplit is specified, the list will contain at most maxsplit + 1 elements.
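The difference between rsplit() and split() only becomes visible once maxsplit limits the number of splits. A minimal comparison with the built-in methods:

```python
txt = "apple, banana, cherry"

# Without maxsplit, split() and rsplit() give the same result.
print(txt.split(", ") == txt.rsplit(", "))   # True

# With maxsplit, split() splits from the left and rsplit() from the right.
print(txt.split(", ", 1))    # ['apple', 'banana, cherry']
print(txt.rsplit(", ", 1))   # ['apple, banana', 'cherry']

# "At most" maxsplit + 1 elements: fewer if the separator runs out.
print("apple".rsplit(", ", 1))   # ['apple']
```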

Syntax

string.rsplit(separator, maxsplit)

Parameter Values

Parameter | Description
separator | Optional. Specifies the separator to use when splitting the string. By default, any whitespace is a separator
maxsplit | Optional. Specifies how many splits to do. Default value is -1, which means "all occurrences"

More Examples

Example

Split the string into a list with maximum 2 items:

txt = "apple, banana, cherry"

# Setting the maxsplit parameter to 1 returns a list with 2 elements
x = txt.rsplit(", ", 1)

print(x)

Python String rstrip() Method

Example

Remove any whitespace at the end of the string:

txt = "     banana     "

x = txt.rstrip()

print("of all fruits", x, "is my favorite")

Definition and Usage

The rstrip() method removes any trailing characters (characters at the end of a string). By default, it removes whitespace.
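A common pitfall is that the characters argument is treated as a set of characters, not as a suffix. This sketch, with an illustrative file name, shows what happens and uses str.removesuffix() (available from Python 3.9) to remove an exact suffix instead:

```python
filename = "report.txt"

# The argument is a set of characters: ".", "t" and "x" are stripped from
# the right until a character outside the set is reached, so the final "t"
# of "report" is removed too.
print(filename.rstrip(".txt"))         # repor

# To remove an exact suffix, use removesuffix() (Python 3.9+).
print(filename.removesuffix(".txt"))   # report
```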

Syntax

string.rstrip(characters)

Parameter Values

Parameter | Description
characters | Optional. A set of characters to remove as trailing characters

More Examples

Example

Remove the trailing characters if they are commas, periods, s, q, or w:

txt = "banana,,,,,ssqqqww....."

x = txt.rstrip(",.qsw")

print(x)

Python String split() Method

Example

Split a string into a list where each word is a list item:

txt = "welcome to the jungle"

x = txt.split()

print(x)

Definition and Usage

The split() method splits a string into a list.

You can specify the separator; the default separator is any whitespace.

Note: When maxsplit is specified, the list will contain at most maxsplit + 1 elements.
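Calling split() with no separator is not the same as calling split(" "). The sketch below, on an illustrative string with a double space and a tab, shows the difference:

```python
txt = "welcome  to the\tjungle"

# With no separator, runs of whitespace (including tabs) count as one
# separator and no empty strings are produced.
print(txt.split())      # ['welcome', 'to', 'the', 'jungle']

# With an explicit " " separator, every single space splits, so two spaces
# in a row produce an empty string, and the tab is not a separator.
print(txt.split(" "))   # ['welcome', '', 'to', 'the\tjungle']
```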

Syntax

string.split(separator, maxsplit)

Parameter Values

Parameter | Description
separator | Optional. Specifies the separator to use when splitting the string. By default, any whitespace is a separator
maxsplit | Optional. Specifies how many splits to do. Default value is -1, which means "all occurrences"

More Examples

Example

Split the string, using comma, followed by a space, as a separator:

txt = "hello, my name is Peter, I am 26 years old"

x = txt.split(", ")

print(x)

Example

Use a hash character as a separator:

txt = "apple#banana#cherry#orange"

x = txt.split("#")

print(x)

Example

Split the string into a list with max 2 items:

txt = "apple#banana#cherry#orange"

# Setting the maxsplit parameter to 1 returns a list with 2 elements
x = txt.split("#", 1)

print(x)

Python String splitlines() Method

Example

Split a string into a list where each line is a list item:

txt = "Thank you for the music\nWelcome to the jungle"

x = txt.splitlines()

print(x)

Definition and Usage

The splitlines() method splits a string into a list. The splitting is done at line breaks.
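splitlines() is usually preferable to split("\n") for text whose line endings you don't control. This sketch, using Windows-style "\r\n" endings and a trailing newline, shows why:

```python
txt = "Thank you for the music\r\nWelcome to the jungle\n"

# splitlines() treats "\r\n" as one line break and does not add
# an empty item for the trailing newline.
print(txt.splitlines())   # ['Thank you for the music', 'Welcome to the jungle']

# split("\n") leaves the "\r" behind and adds an empty last item.
print(txt.split("\n"))    # ['Thank you for the music\r', 'Welcome to the jungle', '']
```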

Syntax

string.splitlines(keeplinebreaks)

Parameter Values

Parameter | Description
keeplinebreaks | Optional. Specifies whether the line breaks should be included (True) or not (False). Default value is False

More Examples

Example

Split the string, but keep the line breaks:

txt = "Thank you for the music\nWelcome to the jungle"

x = txt.splitlines(True)

print(x)

Python String startswith() Method

Example

Check if the string starts with "Hello":

txt = "Hello, welcome to my world."

x = txt.startswith("Hello")

print(x)

Definition and Usage

The startswith() method returns True if the string starts with the specified value, otherwise False.
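The value argument can also be a tuple of strings, in which case startswith() returns True if the string starts with any of them. A short sketch with illustrative prefixes:

```python
txt = "Hello, welcome to my world."

# True if the string starts with any prefix in the tuple.
print(txt.startswith(("Hi", "Hello")))   # True
print(txt.startswith(("Hi", "Hey")))     # False
```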

Syntax

string.startswith(value, start, end)

Parameter Values

Parameter | Description
value | Required. The value to check if the string starts with
start | Optional. An integer specifying at which position to start the search
end | Optional. An integer specifying at which position to end the search

More Examples

Example

Check if the part of the string from position 7 to 20 starts with the characters "wel":

txt = "Hello, welcome to my world."

x = txt.startswith("wel", 7, 20)

print(x)

Python String strip() Method

Example

Remove spaces at the beginning and at the end of the string:

txt = "     banana     "

x = txt.strip()

print("of all fruits", x, "is my favorite")

Definition and Usage

The strip() method removes any leading (at the beginning) and trailing (at the end) characters. By default, it removes whitespace.

Syntax

string.strip(characters)

Parameter Values

Parameter | Description
characters | Optional. A set of characters to remove as leading/trailing characters

More Examples

Example

Remove the leading and trailing characters if they are commas, periods, g, r, or t:

txt = ",,,,,rrttgg.....banana....rrr"

x = txt.strip(",.grt")

print(x)

Python String swapcase() Method

Example

Make the lower case letters upper case and the upper case letters lower case:

txt = "Hello My Name Is PETER"

x = txt.swapcase()

print(x)

Definition and Usage

The swapcase() method returns a string where all the upper case letters are lower case and vice versa.

Syntax

string.swapcase()

Parameter Values

No parameters.


Python String title() Method

Example

Make the first letter in each word upper case:

txt = "Welcome to my world"

x = txt.title()

print(x)

Definition and Usage

The title() method returns a string where the first character in every word is upper case and the remaining characters are lower case, like a header or a title.

If the word contains a number or a symbol, the first letter after that will be converted to upper case.
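Because any non-letter starts a new "word" for title(), apostrophes produce surprising results. The sketch below shows the problem and the standard-library alternative string.capwords(), which splits only on whitespace (and collapses runs of whitespace into single spaces):

```python
import string

txt = "they're bill's friends"

# title() treats the apostrophe as a word boundary.
print(txt.title())            # They'Re Bill'S Friends

# capwords() capitalizes whitespace-separated words only.
print(string.capwords(txt))   # They're Bill's Friends
```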

Syntax

string.title()

Parameter Values

No parameters.

More Examples

Example

Make the first letter in each word upper case:

txt = "Welcome to my 2nd world"

x = txt.title()

print(x)

Example

Note that the first letter after a non-alphabetic character is converted into an upper case letter:

txt = "hello b2b2b2 and 3g3g3g"

x = txt.title()

print(x)

Python String translate() Method

Example

Replace any "S" characters with a "P" character:

# Use a dictionary with character codes to replace 83 (S) with 80 (P):
mydict = {83: 80}
txt = "Hello Sam!"
print(txt.translate(mydict))

Definition and Usage

The translate() method returns a string where some specified characters are replaced with the character described in a dictionary, or in a mapping table.

Use the maketrans() method to create a mapping table.

If a character is not specified in the dictionary/table, the character will not be replaced.

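If you use a dictionary, its keys must be character codes (Unicode code points, such as 83 for "S"), not characters. The values can be character codes, strings, or None (to remove the character).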
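If you would rather write characters than numeric codes, str.maketrans() also accepts a single dictionary keyed by one-character strings and converts the keys to code points for you. A sketch reusing the "Good night Sam!" sentence from the examples below:

```python
txt = "Good night Sam!"

# maketrans() turns each character key into its code point;
# a value of None deletes that character.
table = str.maketrans({"S": "J", "m": "e", "a": "o", "o": None})
print(table[ord("S")])       # J
print(txt.translate(table))  # Gd night Joe!
```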

Syntax

string.translate(table)

Parameter Values

Parameter | Description
table | Required. Either a dictionary or a mapping table describing how to perform the replacement

More Examples

Example

Use a mapping table to replace "S" with "P":

txt = "Hello Sam!"
mytable = txt.maketrans("S", "P")
print(txt.translate(mytable))

Example

Use a mapping table to replace many characters:

txt = "Hi Sam!"
x = "mSa"
y = "eJo"
mytable = txt.maketrans(x, y)
print(txt.translate(mytable))

Example

The third parameter of maketrans() lists the characters that you want to remove from the string:

txt = "Good night Sam!"
x = "mSa"
y = "eJo"
z = "odnght"
mytable = txt.maketrans(x, y, z)
print(txt.translate(mytable))

Example

The same example as above, but using a dictionary instead of a mapping table:

txt = "Good night Sam!"
mydict = {109: 101, 83: 74, 97: 111, 111: None, 100: None, 110: None, 103: None, 104: None, 116: None}
print(txt.translate(mydict))

Python String upper() Method

Example

Upper case the string:

txt = "Hello my friends"

x = txt.upper()

print(x)

Definition and Usage

The upper() method returns a string where all characters are in upper case.

Symbols and numbers are left unchanged.

Syntax

string.upper()

Parameter Values

No parameters


Python String zfill() Method

Example

Fill the string with zeros until it is 10 characters long:

txt = "50"

x = txt.zfill(10)

print(x)

Definition and Usage

The zfill() method adds zeros (0) at the beginning of the string, until it reaches the specified length.

If the value of the len parameter is less than the length of the string, no filling is done.
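zfill() is aware of a leading sign: a "+" or "-" at the start of the string stays in front of the zeros. It also pads non-numeric strings. A short sketch with illustrative values:

```python
# A leading "+" or "-" stays in front of the padding.
print("-42".zfill(6))   # -00042
print("+7".zfill(4))    # +007

# zfill() pads any string, not only numeric ones.
print("ab".zfill(4))    # 00ab
```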

Syntax

string.zfill(len)

Parameter Values

Parameter | Description
len | Required. A number specifying the desired length of the string

More Examples

Example

Fill the strings with zeros until they are 10 characters long:

a = "hello"
b = "welcome to the jungle"
c = "10.000"

print(a.zfill(10))
print(b.zfill(10))
print(c.zfill(10))


Bette Shanahan

1598201280

Delta Compression: Diff Algorithms And Delta File Formats [Practical Guide]

A diff algorithm outputs the set of differences between two inputs. These algorithms are the basis of a number of commonly used developer tools. Yet understanding the inner workings of diff algorithms is rarely necessary to use said tools.

Git is one example where a developer can read, commit, pull, and merge diffs without ever understanding the underlying diff algorithm. That said, knowledge of the subject is quite limited across the developer community.

The purpose of this article is not to detail how Ably programmatically implemented a diff algorithm across its distributed pub/sub messaging platform, but rather to share our research and provide systematic knowledge on the subject of diff algorithms that could be useful to implementers of diff/delta/patch functionality.

A quick bit of context

For Ably customers like Tennis Australia or HubSpot, Message Delta Compression reduces the bandwidth required to transmit realtime messages by sending only the diff of a message.

This means subscribers receive only the changes since the last update instead of the entire stream. Sending fewer bits is more bandwidth-efficient and reduces overall costs and latencies for our customers. To develop this feature we needed to implement a diff algorithm that supported binary encoding and didn’t sacrifice latency when generating deltas.

Diff algorithms

Purpose and usage

The output of a diff algorithm is called a patch or delta. The delta format might be human-readable (text) or only machine-readable (binary). A human-readable format is usually employed for tracking and reconciling changes to human-readable text like source code. A binary format is usually space-optimized and used to save bandwidth: it transfers only the set of changes to an old version of the data already available to the recipient, as opposed to transferring all the new data. The formal term for this is delta encoding.
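To make the human-readable case concrete, here is a sketch using Python's standard-library difflib, chosen purely for illustration (the article does not say which tool or format Ably uses). It produces a line-based unified diff of the kind source-control tools display; the file names are placeholders:

```python
import difflib

old = ["Hello world\n", "This line stays\n", "Goodbye\n"]
new = ["Hello world\n", "This line stays\n", "See you later\n"]

# unified_diff() yields the lines of a human-readable, line-based patch:
# "-" marks removed lines, "+" marks added lines, " " marks context.
patch = list(difflib.unified_diff(old, new, fromfile="v1.txt", tofile="v2.txt"))
print("".join(patch))
```

A binary delta format would describe the same change far more compactly, at the cost of no longer being readable by humans.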

Binary vs. Text?

There seems to be a common misconception that diff algorithms are specialized based on the type of input. The truth is, diff algorithms are omnivorous and can handle any input, as long as the input can simply be treated as a string of bytes. That string might consist of the English alphabet or opaque binary data. Any diff algorithm will generate a correct delta given two input strings in the same alphabet.
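To illustrate that a diff algorithm does not care what the bytes mean, here is a minimal sketch that builds a copy/insert delta over arbitrary bytes with Python's difflib.SequenceMatcher. The delta format (a list of "copy" and "insert" tuples) and the helper names make_delta and apply_delta are hypothetical, chosen for illustration; they are not Ably's format or a standard such as VCDIFF:

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copy/insert operations against `old` (hypothetical format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))   # reuse bytes the recipient already has
        elif j2 > j1:
            ops.append(("insert", new[j1:j2]))  # send only the new bytes
        # "delete" opcodes need no instruction: those bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new version from the old version plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)


# Opaque binary input: the algorithm treats it exactly like text.
old = bytes([0x00, 0xFF, 0x10, 0x20, 0x30, 0x40])
new = bytes([0x00, 0xFF, 0x99, 0x20, 0x30, 0x40, 0x50])
delta = make_delta(old, new)
print(delta)
assert apply_delta(old, delta) == new
```

Text works the same way once it is encoded to bytes, for example with "some text".encode("utf-8").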

The misconception that a different algorithm is required to handle binary data arises from commonly used diff/merge tools treating text and binary as if they were actually different. These tools generally aim to provide a human-readable delta, and as such focus on human-readable input to the exclusion of binary data.

The assumption is that binary data is not human-readable so the delta between two binary data inputs will also not be human readable, and thus rendering it human-readable is deemed to be too much effort.

Equality is the only relevant output in the case of binary diffs, and as such, a simple bit-by-bit comparison is considered to be the fastest and most appropriate solution. This categorization of algorithms by the efficiency of solution causes a partitioning of inputs into different types.

Another aspect that adds to the confusion is the line-based, word-based, and character-based classification of textual diff outputs produced by diff/merge tools. A diff algorithm that is described as “line-based” gives the impression that it produces “text-only” output, and that this means that it accepts only text input and never binary data inputs.


Understand the DEFLATE Compression behind the zip and gzip Formats

Whether stored or sent over some network, every bit counts and costs money. There are tens, probably hundreds, of compression algorithms available, but the most popular format is probably zip. gzip, despite its similar name, is a different format, although both typically compress data with the same DEFLATE algorithm. gzip is one of the three standard formats used in HTTP compression, which makes it broadly used as well. These algorithms…
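The relationship between DEFLATE and its container formats can be seen with Python's standard zlib and gzip modules. In this sketch, the same input is compressed as a raw DEFLATE stream, as a zlib stream, and as gzip data; the sample text is arbitrary:

```python
import gzip
import zlib

data = ("Whether stored or sent over some network, every bit counts. " * 20).encode("utf-8")

# Raw DEFLATE stream: negative wbits means no header and no checksum.
compressor = zlib.compressobj(level=9, wbits=-15)
raw = compressor.compress(data) + compressor.flush()

# DEFLATE data wrapped in a zlib header and Adler-32 trailer,
# and in a gzip header and CRC-32 trailer.
zlib_wrapped = zlib.compress(data, 9)
gzip_wrapped = gzip.compress(data, compresslevel=9)

for name, blob in [("raw deflate", raw), ("zlib", zlib_wrapped), ("gzip", gzip_wrapped)]:
    print(f"{name:12} {len(blob):5} bytes (original {len(data)})")

# Each variant round-trips back to the original bytes.
assert zlib.decompress(raw, wbits=-15) == data
assert zlib.decompress(zlib_wrapped) == data
assert gzip.decompress(gzip_wrapped) == data
```

The wrapped variants differ from the raw stream mainly by their small headers and trailers; the bulk of each output is DEFLATE-compressed data.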


A greedy algorithm is a simple

The Greedy Method is an approach for solving certain types of optimization problems. A greedy algorithm makes the locally optimal choice at each stage. While this works in many cases, there are numerous examples where the greedy approach does not produce the correct answer. For example, let's say that you're taking the greedy algorithm approach to earning money at a certain point in your life. You graduate high school and have two options:
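A standard, concrete case where the locally optimal choice goes wrong is making change with coins. The sketch below compares a greedy picker against a dynamic-programming minimum; the function names and coin sets are illustrative assumptions, not taken from the article:

```python
def greedy_coins(amount: int, coins: list[int]) -> list[int]:
    """Repeatedly take the largest coin that still fits (the greedy choice).

    Returns [] if the exact amount cannot be reached this way.
    """
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            picked.append(coin)
    return picked if amount == 0 else []


def optimal_coin_count(amount: int, coins: list[int]) -> int:
    """Minimum number of coins via dynamic programming, or -1 if impossible."""
    INF = float("inf")
    best = [0] + [INF] * amount
    for value in range(1, amount + 1):
        for coin in coins:
            if coin <= value and best[value - coin] + 1 < best[value]:
                best[value] = best[value - coin] + 1
    return best[amount] if best[amount] != INF else -1


# Greedy is optimal for these coins: 5 + 1.
print(greedy_coins(6, [1, 5, 10, 25]))   # [5, 1]

# Greedy fails here: it picks 4 + 1 + 1, but 3 + 3 needs only 2 coins.
print(greedy_coins(6, [1, 3, 4]))        # [4, 1, 1]
print(optimal_coin_count(6, [1, 3, 4]))  # 2
```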


Tia Gottlieb

1596427800

KMP — Pattern Matching Algorithm

Finding a certain piece of text inside a document is an important feature nowadays. It is used in many things we do every day, such as searching on Google or detecting plagiarism. For small texts, the pattern matching algorithm doesn't need to be sophisticated to perform well. However, larger tasks, such as searching for the word 'cake' in a 300-page book, can take a lot of time if a naive algorithm is used.

The naive algorithm

Before talking about KMP, we should analyze the inefficient approach to finding a sequence of characters in a text. This algorithm slides the pattern over the text one position at a time and checks for a match at each position. Its worst-case complexity is O(m * (n - m + 1)), where m is the length of the pattern and n is the length of the text.

Find all the occurrences of string pat in string txt (naive algorithm).

#include <iostream>
#include <string>
using namespace std;

string pat = "ABA";          // the pattern
string txt = "CABBCABABAB";  // the text in which we are searching

// Returns true if pat matches txt starting at position index.
bool checkForPattern(int index, int patLength) {
    for (int i = 0; i < patLength; i++) {
        // stop at the first character that differs
        if (txt[index + i] != pat[i]) {
            return false;
        }
    }
    return true;
}

// Slides the pattern over the text one position at a time.
void findPattern() {
    int patternLength = pat.size();
    int textLength = txt.size();

    for (int i = 0; i <= textLength - patternLength; i++) {
        // check for every index if there is a match
        if (checkForPattern(i, patternLength)) {
            cout << "Pattern at index " << i << "\n";
        }
    }
}

int main() {
    findPattern();
    return 0;
}

KMP approach

This algorithm relies on the fact that a pattern may contain sub-patterns that appear more than once, for example a prefix that also appears as a suffix. This reduces the complexity to linear time, O(n + m). The idea is that when we find a mismatch, we already know some of the characters in the next search window, so we save time by skipping the characters that we know will match. To know how far to skip, we preprocess the pattern into an auxiliary array, prePos. For each position i, prePos[i] holds the length of the longest proper prefix of pat[0..i] that is also a suffix of it, which tells us how many characters can be skipped after a mismatch.
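Here is a minimal sketch of the KMP approach in Python (the article's own listing is in C++). It keeps the article's name prePos for the auxiliary array (pre_pos in Python style); the function names build_pre_pos and kmp_search are hypothetical. It searches the same pattern and text as the naive example above:

```python
def build_pre_pos(pat: str) -> list[int]:
    """pre_pos[i] = length of the longest proper prefix of pat[:i+1] that is also its suffix."""
    pre_pos = [0] * len(pat)
    length = 0  # length of the prefix matched so far
    for i in range(1, len(pat)):
        while length > 0 and pat[i] != pat[length]:
            length = pre_pos[length - 1]  # fall back to a shorter prefix-suffix
        if pat[i] == pat[length]:
            length += 1
        pre_pos[i] = length
    return pre_pos


def kmp_search(txt: str, pat: str) -> list[int]:
    """Return every index where pat occurs in txt, in O(n + m) time."""
    if not pat:
        return []
    pre_pos = build_pre_pos(pat)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(txt):
        while j > 0 and ch != pat[j]:
            j = pre_pos[j - 1]  # reuse the matched prefix instead of restarting
        if ch == pat[j]:
            j += 1
        if j == len(pat):
            matches.append(i - j + 1)
            j = pre_pos[j - 1]  # continue, allowing overlapping matches
    return matches


print(build_pre_pos("ABA"))               # [0, 0, 1]
print(kmp_search("CABBCABABAB", "ABA"))   # [5, 7]
```

Unlike the naive version, the text index i never moves backwards; after a mismatch only j shrinks, guided by pre_pos.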
