This post describes how we built an end-to-end framework to push machine learning models into production.
Working in a large organisation makes it genuinely hard to run your code in production and create meaningful business value. Machine learning models and analytics often sit in source control (e.g. Git) for a long, long time before they actually run and help customers. This flow can be represented by the figure below: Data Scientists / Analysts build something (e.g. an R model) but have neither the means nor the authority to test it in the line of fire, where the customers actually are.
We built a machine learning pipeline using Spark and H2O Sparkling Water, which give us nice modular APIs for everything from data munging to training and scoring, all in a single code base. A minimal sketch of such a job is shown below.
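The sketch below is illustrative only, not our actual production code: it munges data with plain Spark, hands the DataFrame to H2O to train a GBM, and scores the results back into Spark. All names and paths are placeholders, and the exact `H2OContext` conversion-helper signatures vary between Sparkling Water versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o.H2OContext
import hex.tree.gbm.GBM
import hex.tree.gbm.GBMModel.GBMParameters

object ChurnModelJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("churn-model").getOrCreate()
    import spark.implicits._

    // 1. Data munging with plain Spark (hypothetical input path and columns)
    val features = spark.read.parquet("hdfs:///data/churn/raw")
      .filter($"tenure_months" > 0)
      .select("tenure_months", "monthly_charges", "churned")

    // 2. Hand the DataFrame to H2O and train a GBM
    val hc = H2OContext.getOrCreate(spark)
    val train = hc.asH2OFrame(features, "train")
    // (a real job would first convert the response column to a categorical for classification)
    val params = new GBMParameters()
    params._train = train._key
    params._response_column = "churned"
    params._ntrees = 50
    val model = new GBM(params).trainModel().get()

    // 3. Score and bring the predictions back into Spark for downstream use
    //    (conversion helper name/signature differs across Sparkling Water versions)
    val predictions = hc.asDataFrame(model.score(train))(spark.sqlContext)
    predictions.show(5)

    spark.stop()
  }
}
```

Because munging, training and scoring live in one Spark application, the same artifact that a data scientist runs on a laptop sample is what gets built and deployed to the cluster.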
Spark has fundamentally changed the big data world. Work we used to do across different sets of tools is now unified into one Swiss Army knife.
Our new machine learning pipeline looks as shown below:
- Data scientists / analysts working on a specific use case use Spark + Sparkling Water to create machine learning models.
- They commit their model code to Git.
- The code is built in Jenkins to create jar/rpm artifacts, which are stored in Nexus (see the build sketch after this list).
- Deployment is automated via Chef.
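To make the Jenkins → Nexus step concrete, here is a hypothetical sbt build definition for such a model project. The project name, Nexus URL, library versions and the choice of sbt (rather than, say, Maven) are all illustrative assumptions, not our actual setup; Jenkins would simply run `sbt publish` on every merged commit and Chef would later pull the artifact from Nexus.

```scala
// build.sbt -- illustrative sketch only; names, versions and URLs are placeholders
name := "churn-model"
organization := "com.example.analytics"
// Tie the artifact version to the Jenkins build number when available
version := "0.1." + sys.env.getOrElse("BUILD_NUMBER", "0-SNAPSHOT")
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "2.4.8"  % Provided,
  "ai.h2o"           %% "sparkling-water-core" % "2.4.13" % Provided
)

// Publish the jar to the in-house Nexus so Chef can fetch it at deploy time
publishTo := Some("nexus-releases" at "https://nexus.example.com/repository/releases")
credentials += Credentials(Path.userHome / ".sbt" / ".credentials")
```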
The total time to get a model running in production is now short-circuited to roughly the time it takes to train the new model, plus about 5 minutes.
Data scientists have the power to push any new model to production without going through heavy bureaucracy, and to A/B test their new ideas. All they have to do is push new code and follow the standard peer code review process.
An organisation that can try out new things quickly can fail quickly, learn quickly and innovate quickly. This is the kind of culture we are trying to create: a data-driven culture of experimentation.
If you liked reading this, you will also like reading my upcoming book Apache Oozie Essentials, a use-case-driven Oozie implementation guide. The book is sprinkled with examples and exercises to help you take your big data learning to the next level, and you will also get to read my fond memories, in the form of bedtime stories, from my awesome Big Data implementation at Commonwealth Bank Australia.