The random forest algorithm is one of my favorites. It can be used for both classification and regression. Put simply, a random forest collects predictions from many decision trees and combines them: it averages the predictions for regression and takes a majority vote for classification. Because the errors of individual trees tend to cancel out, the combined prediction is usually closer to the true value than that of any single tree.

Each decision tree is trained on only a sample of the data. The algorithm builds this sample by picking an observation at random, placing it back into the dataset, and then picking again, until the sample is complete. This sampling with replacement is commonly known as bootstrapping. Because every observation is put back before the next draw, a single observation can appear several times in the sample used for one tree. The process is repeated to build a separate sample for each decision tree. Together, these decision trees are known as a random forest, and now you know exactly why the words random and forest are used.
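To make bootstrapping concrete, here is a minimal sketch using only Python's standard library. The helper name bootstrap_sample, the ten-row toy dataset, and the choice of three trees are assumptions made for illustration; they are not taken from any particular library.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) observations from data, with replacement."""
    return rng.choices(data, k=len(data))

rng = random.Random(0)   # fixed seed so the run is repeatable
data = list(range(10))   # ten observations, identified by their index

# One bootstrap sample per tree (three hypothetical trees here).
samples = [bootstrap_sample(data, rng) for _ in range(3)]

for i, sample in enumerate(samples):
    # Observations that were never drawn for this tree.
    left_out = sorted(set(data) - set(sample))
    print(f"tree {i}: sample={sorted(sample)} left out={left_out}")
```

Each sample has the same size as the original dataset, but some observations typically show up more than once while others are left out entirely.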

The basic idea is to train each tree on a different sample of the data and use the average of their predictions as the final output. Averaging over many trees smooths out the quirks of any individual tree, so this output has lower variance than a single tree's prediction.
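The idea can be sketched in a few lines of plain Python. To keep the example short, each "tree" below is a one-split regression stump rather than a full decision tree, and the noisy step-function data is made up for the demonstration. The names fit_stump, fit_forest and predict_forest are hypothetical helpers, not a library API.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold, one mean per side."""
    best = None
    # Candidate thresholds; skipping the smallest x keeps the left side non-empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x is identical: fall back to the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def fit_forest(xs, ys, n_trees, rng):
    """Train each stump on its own bootstrap sample of the rows."""
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), k=n)  # row indices drawn with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict_forest(trees, x):
    """Final output: the average of the individual tree predictions."""
    return mean(tree(x) for tree in trees)

# Toy data (an assumption for this sketch): y is about 1 below x = 2 and about 3 above.
rng = random.Random(1)
xs = [i / 10 for i in range(40)]
ys = [(1.0 if x < 2.0 else 3.0) + rng.gauss(0.0, 0.3) for x in xs]

trees = fit_forest(xs, ys, n_trees=25, rng=rng)
low = predict_forest(trees, 0.5)
high = predict_forest(trees, 3.5)
print(f"prediction at x=0.5: {low:.2f}")
print(f"prediction at x=3.5: {high:.2f}")
```

Every stump sees a slightly different bootstrap sample, so their estimates differ a little; the forest prediction is simply their mean.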

I can say with confidence that a random forest is generally better than a single decision tree. Why? Because its results are more robust. Every decision tree sees a slightly different sample of the data and predicts accordingly. When we combine all such trees, the result is expected to be more accurate and, on average, closer to the true value.
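A small simulation illustrates why the combined result is more robust. It rests on a simplifying assumption: each tree's prediction is modelled as the true value plus independent random noise. Real trees trained on overlapping samples make correlated errors, so the gain in practice is smaller than shown here. All constants below are hypothetical.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)
TRUE_VALUE = 10.0   # hypothetical quantity every tree tries to predict
NOISE_SD = 2.0      # assumed spread of a single tree's error
N_TREES = 50        # trees per forest
N_TRIALS = 2000     # how many times the experiment is repeated

def tree_prediction():
    """One tree's prediction under the independent-noise assumption."""
    return TRUE_VALUE + rng.gauss(0.0, NOISE_SD)

# Spread of predictions when relying on one tree per trial.
single = [tree_prediction() for _ in range(N_TRIALS)]

# Spread of predictions when averaging N_TREES trees per trial.
forest = [mean(tree_prediction() for _ in range(N_TREES)) for _ in range(N_TRIALS)]

print(f"single tree:         spread {pstdev(single):.2f}")
print(f"forest of {N_TREES} trees: spread {pstdev(forest):.2f}")
```

Under this independence assumption, the spread of the averaged prediction shrinks to roughly NOISE_SD divided by the square root of N_TREES, while the average still centres on the true value.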

For this week’s data science career interview, we got in touch with Dr Suman Sanyal, Associate Professor of Computer Science and Engineering at NIIT University. In this interview, Dr Sanyal shares his insights on how universities can contribute to this highly promising sector and what aspirants can do to build a successful data science career.

With its industry linkages and its technology- and research-driven seamless education, NIIT University has been recognised for addressing the growing worldwide demand for data science experts through its industry-ready courses. The university has recently introduced a B.Tech in Data Science, which aims to train students to deploy data models to solve real-world problems. The programme provides industry-academic synergy to help students establish careers in data science, artificial intelligence and machine learning.

“Students with skills that are aligned to new-age technology will be of huge value. The industry today wants young, ambitious students who have the know-how on how to get things done,” Sanyal said.

