For the past several years, I have been using all kinds of data formats in Big Data projects. During this time I have strongly favored one format over the others, and my failures have taught me a few lessons. In my lectures I keep stressing the importance of using the correct **Data Format** for the correct purpose: it makes a world of difference.

All this time I have wondered whether I am delivering the right knowledge to my customers and students. Can I support my claims with data? To find out, I decided to run this performance comparison.

Before I start the comparison, let me briefly describe the various data formats that were considered.

Data Lake - Comparing Performance of Known Big Data Formats
