Product Overview

Groot is a tool that lets data scientists solve data problems through a conversational interface. Its purpose is to shorten the path from data to insights.

Demo

Hosted at: http://groot.imaginea.com/

Pre-recorded demo of a sample Groot session

Early story

We built Groot to solve our own problems as consulting data scientists. While working on data problems in an exploratory environment, we found it hard to communicate our findings and inferences. The turnaround time to check whether the data was ready for model development was considerable, and we felt we could do with better tooling. Groot was built to close these gaps. We first presented Groot as a prototype at Infinity 2019.

The Groot engine is the core of the product. To expose its capabilities, we connected it to a conversation engine and wrapped the most popular data science packages, such as Sklearn, Scipy and Snorkel.

Over time the plan changed, and we came to see that Groot could grow into an end-to-end platform for data scientists.

Groot Today

Groot can perform exploratory data analysis (EDA) and basic feature engineering, build and deploy models, and serve them to customers. Once users are familiar with the Groot environment, modeling a dataset takes a few minutes.

Features

Visualization: This feature is a good starting point when there is no obvious entry point into the data analysis. It presents the most important charts for the dataset. Chart importance is estimated with probabilistic approaches based on entropy, cardinality, etc.
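
To make the idea of entropy- and cardinality-based importance concrete, here is a minimal sketch. It is not Groot's implementation: the function names, the filtering rule (skip constant and ID-like columns) and the sample table are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_columns(table):
    """Rank columns as chart candidates (hypothetical heuristic).

    Returns (name, entropy, cardinality) tuples, highest entropy first.
    """
    ranked = []
    for name, values in table.items():
        cardinality = len(set(values))
        # Assumption: constant columns and ID-like columns (every value
        # unique) rarely produce a useful chart, so they are skipped.
        if cardinality <= 1 or cardinality == len(values):
            continue
        ranked.append((name, shannon_entropy(values), cardinality))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Illustrative data only.
table = {
    "constant": ["a", "a", "a", "a"],
    "binary": ["x", "y", "x", "y"],
    "skewed": ["p", "p", "p", "q"],
    "unique_id": [1, 2, 3, 4],
}

ranking = rank_columns(table)
for name, entropy, cardinality in ranking:
    print(f"{name}: entropy={entropy:.3f} bits, cardinality={cardinality}")
```

In this sketch the evenly split column ranks above the skewed one, and the constant and ID-like columns are dropped before ranking.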

Data labeling: Groot can be used to prepare labeled datasets. The labels are generated by labeling functions that the user describes in plain-text language.
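
The sketch below illustrates what labeling functions do in general. It is plain Python written in the style of Snorkel, not Snorkel's API and not Groot's plain-text syntax. The rules, label values and majority-vote combiner are assumptions made for the example.

```python
from collections import Counter

# Hypothetical label values for a sentiment-style task.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_contains_refund(text):
    """Vote NEGATIVE when the text mentions a refund, otherwise abstain."""
    return NEGATIVE if "refund" in text.lower() else ABSTAIN


def lf_contains_thanks(text):
    """Vote POSITIVE when the text contains a thank-you, otherwise abstain."""
    return POSITIVE if "thank" in text.lower() else ABSTAIN


def lf_contains_great(text):
    """Vote POSITIVE when the text contains 'great', otherwise abstain."""
    return POSITIVE if "great" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_refund, lf_contains_thanks, lf_contains_great]


def majority_label(text):
    """Combine the votes by simple majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [vote for vote in votes if vote != ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # conflicting votes with no clear winner
    return top


texts = [
    "Thanks, great service!",
    "I want a refund",
    "Where is my order?",
    "Thanks, but I want a refund",
]
labels = [majority_label(text) for text in texts]
```

Each function encodes one noisy rule and may abstain, so a row is labeled only when the rules that fire agree on a majority.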

Machine learning: Groot currently supports all supervised ML algorithms in sklearn. The user selects the ML algorithm, the training features and the class variable to predict. Groot trains the model on the given data and reports the test results on completion. Training several models shows which ML algorithm fits the data best.
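
Expressed directly in sklearn, the train-and-compare workflow described above might look like the sketch below. It is an assumption about the kind of code Groot wraps, not Groot's own code. The dataset, the two candidate algorithms and the accuracy metric were picked only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset: X holds the training features, y the class variable.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate algorithms to compare (an arbitrary choice for this sketch).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Train each candidate and record its accuracy on the held-out test split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Accuracy on a single split is a coarse comparison. Cross-validation or a task-specific metric would usually give a more reliable comparison.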

Deployment: Groot uses MLflow, a model serving platform, to serve models. After finding the best-fit model, the user can deploy it to MLflow with the “deploy” conversational command. The model is served as a REST API that users outside Groot can consume.

Inference: The served model can be used for prediction. This feature is not yet a smooth conversation, so requests currently have to follow a fixed syntax. The help feature can guide you through it.
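
For external consumers of the served model, a prediction request could be built along the lines of the sketch below. It assumes the model is exposed through MLflow's standard scoring server, which in MLflow 2.x accepts JSON on the `/invocations` endpoint in the `dataframe_split` format. Groot's deployment may expose a different interface. The host, port and column names are hypothetical.

```python
import json
import urllib.request


def build_prediction_request(base_url, columns, rows):
    """Build a POST request for an MLflow-style /invocations endpoint.

    Assumes the MLflow 2.x "dataframe_split" input format; older MLflow
    versions and custom gateways may expect a different payload.
    """
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and feature names, for illustration only.
request = build_prediction_request(
    "http://localhost:5000",
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    rows=[[5.1, 3.5, 1.4, 0.2]],
)

# Sending the request needs a running model server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     predictions = json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect before any server is involved.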

Containerization: Groot can run as an on-premises platform because its modules are dockerized.

Data isolation: Each user's data is isolated from other users. Groot also lets users share their work with others and collaborate on development.