Join us for this weekly Office Hours session for Oracle Machine Learning on Autonomous Database, where Jie Liu, Data Scientist for Oracle Machine Learning, will cover the different methods of encoding categorical attributes, such as One-Hot Encoding, Mean Encoding, and Weight of Evidence (WoE), and review when each is best used. He will also present a demo running in a notebook with OML4Py.

The Oracle Machine Learning product family helps data scientists, analysts, developers, and IT achieve data science project goals faster while taking full advantage of the Oracle platform.

Oracle Machine Learning Notebooks offers an easy-to-use, interactive, multi-user, collaborative interface based on Apache Zeppelin notebook technology, and supports SQL, PL/SQL, Python, and Markdown interpreters. It is available on all Autonomous Database versions and tiers, including the always-free editions.

OML includes AutoML, which automates algorithm selection, feature selection, and model tuning, in addition to a specialized AutoML UI exclusive to Autonomous Database.

OML Services is also included in Autonomous Database. With it you can deploy and manage native in-database OML models as well as ONNX ML models (for classification and regression) built using third-party engines, and invoke cognitive text analytics.

Video highlights:

  • 00:49 Outline of the presentation
  • 01:21 Categorical Variable Encoding
  • 03:23 Popular Techniques for encoding categorical variables
  • 05:40 Mean Encoding
  • 07:44 Mean Encoding
  • 09:08 Weight of Evidence - Definition and benefit
  • 11:48 Weight of Evidence - Dive into the formula
  • 14:05 Weight of Evidence - Limitations
  • 14:57 Information Value - byproduct of Weight of Evidence
  • 16:33 WoE implementation on OML4Py
  • 17:16 OML4Py Weight of Evidence ML 101 demo
  • 26:15 Q&A
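
For readers who want a feel for the three encodings before watching, here is a minimal sketch in plain pandas. It is not the OML4Py code shown in the demo, and the toy data, column names, and variable names are all made up for illustration. It assumes a binary target and the convention WoE = ln(share of events / share of non-events) per category level; some references flip the ratio, which only changes the sign. Information Value is computed as the sum over levels of (share of events - share of non-events) * WoE.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only (not from the session).
df = pd.DataFrame({
    "color":  ["red", "red", "red", "blue", "blue", "blue", "green", "green"],
    "target": [1,     1,     0,     1,      0,      0,      1,       0],
})

# One-Hot Encoding: one indicator column per category level.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Mean Encoding: replace each level with the mean of the target for that level.
mean_map = df.groupby("color")["target"].mean()
df["color_mean"] = df["color"].map(mean_map)

# Weight of Evidence (assumed convention: ln(% of events / % of non-events)).
stats = df.groupby("color")["target"].agg(events="sum", total="count")
stats["non_events"] = stats["total"] - stats["events"]
stats["pct_events"] = stats["events"] / stats["events"].sum()
stats["pct_non_events"] = stats["non_events"] / stats["non_events"].sum()
stats["woe"] = np.log(stats["pct_events"] / stats["pct_non_events"])
df["color_woe"] = df["color"].map(stats["woe"])

# Information Value: sum over levels of (% events - % non-events) * WoE.
iv = ((stats["pct_events"] - stats["pct_non_events"]) * stats["woe"]).sum()

print(one_hot.head())
print(df)
print(stats[["events", "non_events", "woe"]])
print("Information Value:", iv)
```

One-hot encoding adds a column per level, while mean encoding and WoE each produce a single numeric column. In this sketch a level with zero events or zero non-events would give an infinite WoE, so real data usually needs some smoothing or grouping of rare levels first. Because mean encoding and WoE use the target, the mappings should be computed on training data only and then applied to new data.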

#machine-learning

ML Concepts - Encoding of Categorical Attributes: OneHot vs Mean vs WoE and when to use them