If you have JPEG or PNG images, you can read them directly into TensorFlow using tf.io.decode_image. What if your data is in some industry-specific binary format?
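tf.io.decode_image only understands image formats (BMP, GIF, JPEG and PNG), so anything else needs its own decoder before TensorFlow can use it. As a minimal sketch, the code below parses a hypothetical grid format invented purely for illustration: a 4-byte magic string, two little-endian uint32 dimensions, then row-major float32 values. It is not GRIB (real GRIB decoding is far more involved and is normally delegated to a dedicated library), and the names GRD1, encode_grid and decode_grid are all hypothetical.

```python
import struct

# Hypothetical layout (NOT GRIB): 4-byte magic b"GRD1", then two little-endian
# uint32 values (rows, cols), then rows*cols little-endian float32 values.
MAGIC = b"GRD1"
HEADER = struct.Struct("<4sII")  # 12 bytes, no padding because of "<"


def encode_grid(rows, cols, values):
    """Pack a grid into the hypothetical format (used here to make test data)."""
    if len(values) != rows * cols:
        raise ValueError("values must have rows*cols entries")
    body = struct.pack(f"<{rows * cols}f", *values)
    return HEADER.pack(MAGIC, rows, cols) + body


def decode_grid(blob):
    """Parse the hypothetical format into (rows, cols, list of row lists)."""
    magic, rows, cols = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError(f"unexpected magic {magic!r}")
    # Read the float32 payload that follows the fixed-size header.
    flat = struct.unpack_from(f"<{rows * cols}f", blob, HEADER.size)
    grid = [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]
    return rows, cols, grid


blob = encode_grid(2, 2, [1.0, 2.0, 3.0, 4.0])
print(decode_grid(blob))  # (2, 2, [[1.0, 2.0], [3.0, 4.0]])
```

Once a file is decoded into plain numbers like this, converting it into whatever TensorFlow consumes is straightforward; the hard, format-specific part is the decoder itself.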

Why HRRR to TensorFlow records?

The High Resolution Rapid Refresh (HRRR) model is a numerical weather model. Because weather models work best when countries all over the world pool their observations, the format for weather data is decided by the World Meteorological Organization and it is super-hard to change. So, the HRRR data is disseminated in a #@!$@&= binary format called GRIB.

Regardless of the industry you are in (manufacturing, electricity generation, pharmaceutical research, genomics, astronomy), you probably have a format like this: one that no modern software framework supports. Even though this article is about HRRR, the techniques here apply to any binary files you have.

The most efficient format for TensorFlow training is TensorFlow records (TFRecord files). A TFRecord file is a sequence of length-delimited binary records, each typically a serialized tf.train.Example protocol buffer, which makes it possible for the training program to buffer, prefetch, and parallelize the reading of records. So, a good first step for machine learning is to convert your industry-specific binary files into TensorFlow records.
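To make "sequence of length-delimited records" concrete, here is a pure-Python sketch of the on-disk framing as described in TensorFlow's TFRecord documentation: a little-endian uint64 length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. In practice tf.io.TFRecordWriter writes this for you and tf.data.TFRecordDataset reads it; this sketch only illustrates the layout. The payloads below are placeholder bytes rather than real tf.train.Example messages, and the bitwise CRC is deliberately simple and slow.

```python
import io
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """TFRecord masks the CRC: rotate right by 15 bits, then add a constant."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(stream, payload: bytes) -> None:
    """Append one record: length, CRC of length, payload, CRC of payload."""
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc(payload)))


def read_records(stream):
    """Yield payloads one at a time, verifying both checksums."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of file
        if len(header) != 8:
            raise ValueError("truncated length field")
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", stream.read(4))
        if length_crc != masked_crc(header):
            raise ValueError("corrupt length field")
        payload = stream.read(length)
        (payload_crc,) = struct.unpack("<I", stream.read(4))
        if len(payload) != length or payload_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload


buf = io.BytesIO()
for item in [b"first grid", b"second grid"]:
    write_record(buf, item)
buf.seek(0)
print(list(read_records(buf)))  # [b'first grid', b'second grid']
```

Because every record carries its own length and checksums, a reader can stream records one after another and validate each without knowing anything about the payload; interpreting the payload (for example, as a tf.train.Example) is a separate step.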


How to Convert binary files into TensorFlow records