In February last year, the TensorFlow team introduced TensorFlow Datasets (TFDS). It gives the machine learning community access to public research datasets as **tf.data.Dataset**s and as NumPy arrays. TFDS does all the tedious work of fetching the source data and preparing it into a common format on disk. It uses the **tf.data** API to build high-performance input pipelines, which are TensorFlow 2.0-ready and can be used with tf.keras models.
TensorFlow Datasets provides many public datasets as **tf.data.Dataset**s.
Installation:

```
pip install tensorflow-datasets
```
## Snippet:
```python
import tensorflow as tf  # needed for the isinstance check below
import tensorflow_datasets as tfds

mnist_data = tfds.load("mnist")
mnist_train, mnist_test = mnist_data["train"], mnist_data["test"]
assert isinstance(mnist_train, tf.data.Dataset)
```
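Once loaded, the returned `tf.data.Dataset` plugs into the usual input-pipeline transformations. The sketch below stands in small random tensors for a downloaded dataset (the shapes and sizes are illustrative assumptions, not TFDS output), just to show the shuffle/batch/prefetch pattern such a pipeline typically uses:

```python
import tensorflow as tf

# Toy stand-in for a downloaded dataset: 100 fake 28x28 grayscale
# images with integer labels in [0, 10). Shapes are assumptions
# chosen to mirror MNIST, not data fetched by TFDS.
images = tf.random.uniform((100, 28, 28, 1))
labels = tf.random.uniform((100,), maxval=10, dtype=tf.int32)

# Build the pipeline: pair up examples, shuffle, batch, and let
# tf.data pick the prefetch buffer size automatically.
ds = tf.data.Dataset.from_tensor_slices((images, labels))
ds = ds.shuffle(buffer_size=100).batch(32).prefetch(tf.data.AUTOTUNE)

for x, y in ds.take(1):
    print(x.shape)  # first batch of 32 images
```

A dataset built this way can be passed directly to `model.fit(ds)` on a tf.keras model, which is the workflow TFDS is designed to feed.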
In the next section, we take a look at a few important datasets (h/t Lionbridge) that TensorFlow lets you access with a single line of code:
[LSUN](https://www.tensorflow.org/datasets/catalog/lsun) contains around one million labeled images for each of 10 scene categories and 20 object categories. Its authors report that popular convolutional networks achieve substantial performance gains when trained on this dataset.
#developers corner #datasets #tensorflow datasets #tfds #tensorflow