Some thoughts from Google Cloud's "Let's Talk AI" virtual conference
Yesterday I was fortunate to attend Google Cloud's virtual "Let's Talk AI" conference, where we heard from Google Cloud's leadership, product managers, and customers on the future of Analytics, Machine Learning, and AI. I chose to attend the breakout sessions in the "Data and Analytics" track, and found the "Zen Guide to Prepping Data for ML" to be especially practical and insightful!
Three tenets of machine learning best practices really hit home for me. Let us know what you think of them in the comments! Are there any you would add to this list?
- Empower all your analysts with data - Here, Google Cloud featured their platform for storing data from a myriad of sources. More generally, however, giving data professionals access to the datasets they need fosters the development of new and exciting models!
- Look at your outliers - I cannot agree with this more. Outliers are the most interesting points in your data. As my statistics professor would say, "Once upon a time, a person would do <normal behavior>, and then one day they <outlier>." Outliers are where the exciting stories are. It can be tempting to discard an outlier if it appears to be skewing the results of your model. However, the key here is to flag these outliers and understand them further. If most customers spend $100 on your platform and one wants to spend $10,000, your business would be very interested in knowing how to provide recommendations to that customer!
- Monitor your model and iterate - Every model is built on assumptions, and that isn't a bad thing. In fact, it is practically impossible to know everything about a system and include every factor in a model. That is why we need to keep iterating on the model. As new data comes in, we need to both build it into the model and check that our assumptions still hold. If they don't, that signals an opportunity to reassess the model!
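To make "flag, don't discard" concrete, here is a minimal sketch in Python. It assumes Tukey's IQR fences as the outlier rule (one common choice, not something the session prescribed), and the spend figures and the `flag_outliers` helper are hypothetical, chosen to mirror the $100 vs. $10,000 customer example. Every point is kept and simply labeled, so the outlier stays available for further investigation.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Hypothetical helper: label each value using Tukey's IQR fences.

    Nothing is removed; each value is returned with a boolean flag so
    outliers can be studied rather than silently dropped.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr  # k=1.5 is a conventional default
    return [(v, v < low or v > high) for v in values]

# Illustrative (made-up) customer spend: most near $100, one at $10,000
spend = [95, 100, 102, 98, 105, 99, 101, 10_000]
flagged = flag_outliers(spend)
outliers = [v for v, is_outlier in flagged if is_outlier]
print(outliers)  # [10000]
```

Because the function returns flags instead of a filtered list, you can feed the flagged customers into a separate analysis (or a dedicated recommendation path) while the rest of the pipeline proceeds as usual.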