How does the ZIP format work?

In this post we are going to explore LZ77, a lossless data-compression algorithm published by Lempel and Ziv in 1977. The algorithm is still widely used today: ZIP and gzip, for instance, rely on DEFLATE, which combines LZ77 with Huffman coding.

Compression

LZ77 moves a cursor sequentially through the input string and keeps the most recently processed characters in a search buffer (a sliding window over the input). Compression repeats the following three steps until the whole input has been consumed:

  1. Find the longest string starting at the current position that also occurs in the search buffer.
  2. Output a triple (o, l, c), where:
     • o: offset, the number of positions we need to move backwards from the current position to reach the start of the match.
     • l: length, the number of characters in the match.
     • c: character, the first character in the input after the match. If there is no match, o and l are both 0 and c is simply the character at the current position.
  3. Move the cursor l+1 positions to the right.
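The three steps above can be sketched in Python as follows. This is a minimal, illustrative sketch, not the encoder used by ZIP or gzip: the `window_size` parameter and its default of 4096 are assumptions chosen for the example, the longest-match search is brute force (real compressors typically use hashing to find candidate matches quickly), and the match length is capped one character short of the end of the input so that a following character `c` always exists. A small decoder is included only to show how the (o, l, c) triples are read back.

```python
def lz77_compress(data: str, window_size: int = 4096) -> list[tuple[int, int, str]]:
    """Encode `data` as (offset, length, next_char) triples.

    `window_size` (how far back the search buffer reaches) is an
    illustrative assumption, not a value taken from any real format.
    """
    triples = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Step 1: try every start position in the search buffer.
        for candidate in range(max(0, pos - window_size), pos):
            length = 0
            # Extend the match while characters agree. Stop one short of the
            # end so a next character always exists for the `c` field. The
            # match may run past the cursor (overlap), which LZ77 permits.
            while (pos + length < len(data) - 1
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        # Step 2: emit the triple (o, l, c).
        triples.append((best_offset, best_length, data[pos + best_length]))
        # Step 3: move the cursor past the match and the literal character.
        pos += best_length + 1
    return triples


def lz77_decompress(triples: list[tuple[int, int, str]]) -> str:
    """Rebuild the input by copying `length` characters from `offset` back."""
    out: list[str] = []
    for offset, length, next_char in triples:
        start = len(out) - offset
        # Copy one character at a time so overlapping matches work.
        for i in range(length):
            out.append(out[start + i])
        out.append(next_char)
    return "".join(out)
```

For example, `lz77_compress("abababc")` returns `[(0, 0, 'a'), (0, 0, 'b'), (2, 4, 'c')]`: the third triple says "go back 2 positions and copy 4 characters", a match longer than its offset. That overlap is why the decoder copies character by character instead of slicing.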


How LZ77 Data Compression Works
10.10 GEEK