For this month's machine learning practitioners series, Analytics India ... From the popularity of support vector machines to the explosion of deep learning algorithms ... In this interview, he talks about ways to master Kaggle competitions ... Learning: Interview With Astrophysicist And Kaggle GM Martin Henze.
For this week’s ML practitioner’s series, we got in touch with Kaggle Grandmaster Martin Henze. Martin is an astrophysicist by training who ventured into machine learning out of a fascination with data. His notebooks on Kaggle are a must-read, drawing on his decade-long expertise in handling vast amounts of data. In this interview, Martin shares his perspective on making it big in the machine learning industry as an outsider.
Martin Henze is an astrophysicist by training and holds a doctorate in astrophysics. He spent the better part of his academic career observing exploding stars in nearby galaxies. As an observational astronomer, his job was to work with different types of telescope data and extract insights about distant stars. The data generated in deep-space experiments is literally astronomical. For example, the black hole that was imaged last year generated data that filled half a tonne of hard drives, and it took more than a year and many flights to move the data and stitch it together. Martin, too, is no stranger to this kind of data.
As part of his master’s thesis, he had to skim through a large archival dataset containing images of hundreds of thousands of stars taken over a time range of 35 years to discover the signatures of distant stellar explosions.
Back then, data science as a domain had not yet gained traction, and Martin was using MIDAS to churn through time-series data. “At the time, I knew very little about coding in general, and I was working with an astro-specific, Fortran-based language called MIDAS, and it was terribly slow,” explained Martin. “One of my main tasks was to create a time series of the luminosities of all the detectable stars. I estimated that my first working prototype would take one and a half years to run on my local machine – significantly more time than I had left in my 1-year project. Coming up with different optimisation tricks and reducing the runtime to 3 weeks (on the same machine) was a great puzzle to solve, and it taught me a great deal about programming structures. I also learned something valuable about incremental backups after the first of these 3-week runs was crashed by a power outage,” he added.
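Martin's point about incremental backups can be illustrated with a minimal sketch of a long-running job that saves its progress as it goes. This is not Martin's MIDAS code; the function names `run_with_checkpoints` and `_save`, the JSON checkpoint format, and the `every` parameter are all assumptions made for illustration.

```python
import json
import os
import tempfile


def run_with_checkpoints(items, process, checkpoint_path, every=100):
    """Process items in order, saving progress so a crashed run can resume.

    Hypothetical helper: `process` is any function whose results can be
    stored as JSON.
    """
    # Resume from an earlier run if a checkpoint file already exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)
    else:
        state = {"next_index": 0, "results": []}

    for i in range(state["next_index"], len(items)):
        state["results"].append(process(items[i]))
        state["next_index"] = i + 1
        # Save every `every` items rather than only at the very end.
        if state["next_index"] % every == 0:
            _save(state, checkpoint_path)

    _save(state, checkpoint_path)
    return state["results"]


def _save(state, path):
    # Write to a temporary file first, then rename it over the old
    # checkpoint, so a crash mid-write cannot leave a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)
```

If the job is interrupted, calling `run_with_checkpoints` again with the same `checkpoint_path` picks up from the last saved index, so at most `every` items of work are repeated.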
“Studying Physics gave me a solid foundation in mathematics beyond the key Algebra and Vector Calculus concepts needed for ML.”
Though the ML aspects of the project were mostly confined to regression fits, this was Martin's first step into the world of machine learning.
His zeal for deciphering data helped him take the leap from academia to industry. Currently, Martin works as a Data Scientist at Edison Software, a consumer technology and market research company based in Silicon Valley. He is part of a team that developed a market intelligence platform that helps enterprise customers understand consumer purchase behaviour.
For most of his academic career, Martin worked with tools like decision trees, PCA, or clustering. It was not until he joined Kaggle that he learned about state-of-the-art methods. “Kaggle opened my eyes not only to the full spectrum of exciting ML algorithms, but also to all the different ways to use data to understand our world – not just the distant universe,” said Martin.
_“I remember feeling a little overwhelmed and having difficulties to decide where and how to get started.”_
Martin joined Kaggle to learn more about ML and to use these tools for his astrophysics projects. Though he had working experience with techniques like regression and decision trees, he was intimidated by the sophisticated tools he saw on Kaggle, such as XGBoost and neural networks, and by the large model stacks some people were building. To fill the gaps, Martin started reading other people’s Kernels, code, and discussions. He also advises newcomers to go through the scikit-learn documentation, which he thinks is underrated.