At Apollo Agriculture, a Kenya-based agro-tech startup, one of the challenging problems we face is predicting the yields of Kenyan maize farmers. Like almost all datasets, this one has a hierarchical structure: farmers within the same region aren’t independent. If we ignore this, a model could predict yields entirely from a farmer’s region while failing to find any other meaningful insight, and we might not even realize it. However, if we “overcorrected” and treated each region as completely separate, each individual analysis could be underpowered. Enter the hero of our story: Bayesian hierarchical modeling. Using a practical example in PyMC3, we’ll follow this hero as they identify and handle clustered datasets.
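To make the trade-off between the two extremes concrete, here is a minimal sketch of partial pooling in plain Python. It is not the PyMC3 example itself. It swaps in the closed-form posterior mean of a normal-normal model, and it assumes the within-region and between-region standard deviations are known and that the overall mean can be plugged in from the data. The yield figures, the region names, `SIGMA_WITHIN`, `TAU_BETWEEN`, and `partial_pooling` are all made up for illustration.

```python
from statistics import mean

# Hypothetical maize yields (tonnes per hectare) by region.
# These numbers are invented for illustration, not real farm data.
yields_by_region = {
    "region_a": [2.1, 2.4, 1.9, 2.2, 2.0, 2.3],
    "region_b": [3.0, 3.4],
    "region_c": [1.2],
}

# Assumed known for this sketch; a full Bayesian model would infer them.
SIGMA_WITHIN = 0.5  # farmer-to-farmer standard deviation inside a region
TAU_BETWEEN = 0.6   # standard deviation of the true region means


def partial_pooling(groups, sigma, tau):
    """Shrink each region's mean toward the overall mean.

    Uses the posterior mean of a normal-normal model with known sigma
    and tau, with the overall mean plugged in as the prior mean.
    """
    overall = mean(y for ys in groups.values() for y in ys)
    estimates = {}
    for name, ys in groups.items():
        n = len(ys)
        data_precision = n / sigma**2    # grows with the number of farmers
        prior_precision = 1 / tau**2     # fixed by the between-region spread
        weight = data_precision / (data_precision + prior_precision)
        estimates[name] = weight * mean(ys) + (1 - weight) * overall
    return overall, estimates


overall_mean, pooled = partial_pooling(yields_by_region, SIGMA_WITHIN, TAU_BETWEEN)
for region, ys in yields_by_region.items():
    print(f"{region}: n={len(ys)}, unpooled={mean(ys):.2f}, "
          f"partially pooled={pooled[region]:.2f}, overall={overall_mean:.2f}")
```

The region with a single farmer is pulled strongly toward the overall mean, while the region with six farmers barely moves. That is the behavior a hierarchical model provides: regions borrow strength from each other in proportion to how little data they have.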




Bayesian Modeling for Data Clustering