Speakers: Alex Wu, Clark Zinzow

ML tasks such as distributed training and batch inference stretch the abstractions of modern data processing systems, forcing tradeoffs in performance or learning efficiency. In this talk, we introduce Ray Datasets, a universal compatibility layer built on Arrow and Python that lets data processing be combined with ML pipelines without such tradeoffs.

#data #ml #machine-learning #pipelining #pydata

Unified Data Preprocessing & ML Pipelines with Ray Datasets