Let’s get this clear: existing datasets aren’t enough
Most machine learning programmers start with common open-source datasets like MNIST or CIFAR-10, and that’s all well and good. But to expand your horizons and solve problems of your own, you need to go beyond these and collect your own data. Collecting the data may or may not be hard, but most people struggle to get it ready for training. This is mostly because of the large number of intermediate steps, such as format conversion (usually for computer vision), tokenizing (for NLP), and general steps like data augmentation and shuffling.
To make this easier, let’s first understand the process of getting a dataset ready.
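
As a rough illustration of the steps named above (format conversion, augmentation, shuffling), here is a minimal sketch using `tf.data`. The random `images` and `labels` arrays are hypothetical stand-ins for data you collected yourself, and the `preprocess` function and the buffer and batch sizes are assumptions chosen for the example, not recommendations.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for self-collected data:
# 8 random 28x28 grayscale images with binary labels.
images = np.random.randint(0, 256, size=(8, 28, 28, 1), dtype=np.uint8)
labels = np.arange(8, dtype=np.int64) % 2


def preprocess(image, label):
    # Format conversion: uint8 pixels -> float32 values in [0, 1].
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Data augmentation: random horizontal flip.
    image = tf.image.random_flip_left_right(image)
    return image, label


dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(buffer_size=8, seed=0)                         # shuffling
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)   # conversion + augmentation
    .batch(4)                                               # group into batches
    .prefetch(tf.data.AUTOTUNE)                             # overlap prep with training
)

# Pull one batch to inspect what the pipeline produces.
batch_images, batch_labels = next(iter(dataset))
print(batch_images.shape, batch_images.dtype)
```

A dataset built this way can be passed directly to `model.fit` in Keras. For text data, the tokenizing step would take the place of the image conversion inside `preprocess`.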

#data #keras #data-pipeline #tensorflow #tensorflow-dataset

Create Efficient Data Pipelines with TFDS and tf.Data