In this article, we look at suitable tools for calculating large files.

What is a large file? A large file is one that is too large to be read into memory at once. In this case, desktop data tools (such as Excel) cannot handle it directly, and a program usually has to be written instead. Even then, a large file must be read and processed in batches, and the batch results must be properly merged according to the type of calculation, which makes the work much more complicated than processing a small file. Large files come in many types, such as text files, Excel files, XML files, JSON files, and files retrieved over HTTP. Among them, text (txt or CSV) is the most common.
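The batch-then-merge idea can be sketched in a few lines. The snippet below is an illustrative Python example, not code from this article: it writes a tiny tab-separated stand-in file (the path and data are hypothetical), reads it back in fixed-size batches, and keeps only running per-state totals, so memory use stays constant regardless of file size.

```python
import csv
import os
import tempfile
from collections import defaultdict

# A tiny tab-separated sample standing in for a real large file.
sample = ("orderkey\torderdate\tstate\tquantity\tamount\n"
          "1\t2023-01-01\tCA\t2\t100.0\n"
          "2\t2023-01-02\tNY\t1\t50.0\n"
          "3\t2023-01-03\tCA\t3\t75.5\n")
path = os.path.join(tempfile.gettempdir(), "orders_sample.txt")
with open(path, "w") as f:
    f.write(sample)

# Read in fixed-size batches; only the aggregates are kept in memory.
BATCH = 2  # lines per batch; a real job would use something like 100_000
totals = defaultdict(float)

def flush(batch, totals):
    # Merge one batch into the running per-state totals.
    for row in batch:
        totals[row["state"]] += float(row["amount"])

with open(path) as f:
    reader = csv.DictReader(f, delimiter="\t")
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= BATCH:
            flush(batch, totals)
            batch = []
    flush(batch, totals)  # leftover rows smaller than one batch

print(totals["CA"])  # 175.5
print(totals["NY"])  # 50.0
```

A simple sum merges trivially across batches; calculations such as grouping with distinct counts or medians need more careful merge logic, which is exactly the extra complexity the paragraph above refers to.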

The programming languages and tools that can be used to process large files are as follows:

1. Conventional high-level programming languages, such as Java, C/C++, C#, Basic, etc.

2. The file data is imported into the database and processed by SQL

3. Python

4. esProc SPL

Taking text files as an example, this paper introduces in turn the characteristics of the above methods for large file calculation. For other types of files, the way of reading the data differs, but the processing idea after reading is similar to that of text files.

The file used in this paper, orders.txt, has five columns: orderkey, orderdate, state, quantity and amount. The columns are separated by tabs, the first line holds the column names, and the file contains a total of 10 million lines of data.
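To experiment with the methods discussed here without the original data, a test file in the same layout can be generated. The script below is a hypothetical Python generator, not part of the article: it produces a tab-separated file with the five columns described above, using made-up values and only 1,000 rows (the real file has 10 million).

```python
import os
import random
import tempfile

# Generate a small orders.txt in the described layout:
# tab-separated, header line, columns orderkey/orderdate/state/quantity/amount.
random.seed(42)  # reproducible fake data
states = ["CA", "NY", "TX", "FL"]
path = os.path.join(tempfile.gettempdir(), "orders.txt")

with open(path, "w") as f:
    f.write("orderkey\torderdate\tstate\tquantity\tamount\n")
    for key in range(1, 1001):  # 1,000 illustrative rows
        date = f"2023-{random.randint(1, 12):02d}-{random.randint(1, 28):02d}"
        state = random.choice(states)
        qty = random.randint(1, 10)
        amount = round(qty * random.uniform(5, 50), 2)
        f.write(f"{key}\t{date}\t{state}\t{qty}\t{amount}\n")

# Quick sanity check of the generated file.
with open(path) as f:
    header = f.readline().rstrip("\n").split("\t")
    n_rows = sum(1 for _ in f)
print(header)  # ['orderkey', 'orderdate', 'state', 'quantity', 'amount']
print(n_rows)  # 1000
```

Raising the row count in the loop yields a file large enough to exercise genuine out-of-memory batch processing.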


Looking for Suitable Tools for Calculating Large Files