For this week’s ML practitioner’s series, we got in touch with Kaggle Grandmaster Martin Henze. Martin is an astrophysicist by training whose fascination with data drew him into machine learning. His notebooks on Kaggle are a must-read, drawing on his decade-long experience in handling vast datasets. In this interview, Martin shares his perspective on making it big in the machine learning industry as an outsider.

About His Early Days In ML

Martin Henze is an astrophysicist by training and holds a doctorate in Astrophysics. He spent the better part of his academic career observing exploding stars in nearby galaxies. As an observational astronomer, his job was to work with different types of telescope data and to extract insights about distant stars. The data generated in deep-space experiments is literally astronomical. For example, the imaging of a black hole last year produced enough data to fill half a tonne of hard drives, and moving that data and stitching it together took more than a year and many flights. Martin, too, is no stranger to this kind of data.

As part of his master’s thesis, he had to comb through a large archival dataset containing images of hundreds of thousands of stars, taken over a span of 35 years, to discover the signatures of distant stellar explosions.


Back then, data science as a domain had not yet gained traction, and Martin was using MIDAS to crunch time-series data. “At the time, I knew very little about coding in general and I was working with an astro-specific, Fortran-based language called MIDAS and it was terribly slow,” explained Martin. “One of my main tasks was to create a time series of the luminosities of all the detectable stars. I estimated that my first working prototype would take one and a half years to run on my local machine – significantly more time than I had left in my 1-year project. Coming up with different optimisation tricks and reducing the runtime to 3 weeks (on the same machine) was a great puzzle to solve, and it taught me a great deal about programming structures. I also learned something valuable about incremental backups after the first of these 3-week runs was crashed by a power outage,” he added.

“Studying Physics gave me a solid foundation in mathematics beyond the key Algebra and Vector Calculus concepts needed for ML.”

Though the ML aspects of the project were mostly confined to regression fits, this was Martin’s first step into the world of machine learning.

His zeal for deciphering data helped him take the leap from academia to industry. Currently, Martin works as a Data Scientist at Edison Software, a consumer technology and market research company based in Silicon Valley. He is part of a team that developed a market intelligence platform that helps enterprise customers understand consumer purchase behaviour.

For most of his academic career, Martin worked with tools like decision trees, PCA, or clustering. It was not until he joined Kaggle that he learned about state-of-the-art methods. “Kaggle opened my eyes not only to the full spectrum of exciting ML algorithms, but also to all the different ways to use data to understand our world – not just the distant universe,” said Martin.

On His Kaggle Journey

_“I remember feeling a little overwhelmed and having difficulties to decide where and how to get started.”_

Martin joined Kaggle to learn more about ML and to use these tools for his astrophysics projects. Though he had hands-on experience with techniques like regression and decision trees, seeing sophisticated tools like XGBoost and neural networks on Kaggle, alongside the large model stacks some people were building, intimidated him. So, to fill the gaps, Martin started reading other people’s Kernels, code, and discussions. He also advises newcomers to go through the scikit-learn documentation, which he thinks is underrated.



Deep Space To Deep Learning: Interview With Astrophysicist And Kaggle GM Martin Henze