Any computer system in today’s world generates a very large volume of logs or data daily. As the system grows, it is not feasible to store this debugging data in a database, since it is immutable and is only used for analytics and fault-resolution purposes. So organisations tend to store it in files, which reside on local disk storage.

We are going to extract logs from a .txt or .log file of size 16 GB, with millions of lines, using Golang.


Let’s Code…!

Let’s open the file first. We will be using the standard Go os.File for all file I/O.

f, err := os.Open(fileName)
if err != nil {
    fmt.Println("cannot read the file", err)
    return
}
// close the file only after checking the error, otherwise f may be nil
defer f.Close() // do not forget to close the file

Once the file is opened, we have the following two options to proceed with:

  1. Read the file line by line; this helps reduce the strain on memory but takes more time in I/O.
  2. Read the entire file into memory at once and process it; this consumes more memory but significantly reduces the time.

As our file size is too large, i.e. 16 GB, we can’t load the entire file into memory. But the first option is also not feasible for us, as we want to process the file within seconds.
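For comparison, here is a minimal sketch of option 1, reading line by line with bufio.Scanner. This is only an illustration of that approach, not what we will settle on; process is a hypothetical placeholder for whatever per-line handling you need.

scanner := bufio.NewScanner(f)
// the default token limit is 64 KB per line; raise it if log lines can be longer
scanner.Buffer(make([]byte, 64*1024), 1024*1024)
for scanner.Scan() {
    line := scanner.Text()
    process(line) // hypothetical placeholder for per-line handling
}
if err := scanner.Err(); err != nil {
    fmt.Println("scan error:", err)
}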

But guess what, there is a third option. Voila…! Instead of loading the entire file into memory, we will load the file in chunks, using **bufio.NewReader()**, available in Go.

r := bufio.NewReader(f)

for {
    buf := make([]byte, 4*1024) // the chunk size
    n, err := r.Read(buf)       // load a chunk into the buffer
    buf = buf[:n]

    if n == 0 {
        if err == io.EOF {
            break // reached the end of the file
        }
        if err != nil {
            fmt.Println(err)
            break
        }
        continue
    }
    // buf now holds the next chunk, ready to be processed
}
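To actually get through 16 GB in seconds, each chunk can be handed to its own goroutine, so parsing runs concurrently while the reader keeps fetching the next chunk. The sketch below is one possible way to wire that up, assuming the bufio, io, fmt, sync and runtime imports; ProcessChunk is a hypothetical function standing in for the log-extraction logic, and the semaphore channel is just one way to cap the number of workers.

var wg sync.WaitGroup
sem := make(chan struct{}, runtime.NumCPU()) // cap the number of concurrent workers

r := bufio.NewReader(f)
for {
    buf := make([]byte, 4*1024) // each chunk gets its own buffer, so a goroutine can safely own it
    n, err := r.Read(buf)
    if n > 0 {
        chunk := buf[:n]
        wg.Add(1)
        sem <- struct{}{} // block if all workers are busy
        go func(c []byte) {
            defer wg.Done()
            defer func() { <-sem }()
            ProcessChunk(c) // hypothetical: split the chunk into lines and extract logs
        }(chunk)
    }
    if err == io.EOF {
        break // reached the end of the file
    }
    if err != nil {
        fmt.Println(err)
        break
    }
}
wg.Wait() // wait for all in-flight chunks to finish

Note that a fixed-size chunk can cut a log line in half at its boundary, so a real implementation has to stitch the partial line at the end of one chunk onto the start of the next (or extend each read up to the following newline) before parsing.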

