Introduction

StreamSQL is a feature store for machine learning.

StreamSQL accelerates machine learning development by:

1. Generating model features for serving using declarative definitions
2. Generating training sets using the same feature definitions as serving
3. Versioning, monitoring, and managing features
4. Allowing features to be shared, re-used, and discovered across teams and models

                            

How it works

                                  

The general workflow for getting the feature store up and running is:

1. Connect your data sources or upload data directly to StreamSQL
2. Optionally transform and join your data using SQL
3. Register your feature definitions
4. Serve features in production or generate training datasets from your labels
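
The four steps above can be sketched end to end with nothing but the Python standard library. This is an illustration of the workflow only, not StreamSQL's API: the in-memory SQLite database stands in for a connected data source, and the table names, the `FEATURES` dictionary format, and the `serve` and `training_set` helpers are all hypothetical.

```python
import sqlite3

# Step 1: "connect" a data source. An in-memory SQLite database stands in for
# uploaded data; the tables and columns are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE purchases (user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'US'), (2, 'DE');
    INSERT INTO purchases VALUES (1, 10.0), (1, 30.0), (2, 5.0);
""")

# Step 2: transform and join the raw tables using SQL.
conn.execute("""
    CREATE VIEW user_spend AS
    SELECT u.user_id, u.country, SUM(p.amount) AS total_spend
    FROM users u JOIN purchases p ON p.user_id = u.user_id
    GROUP BY u.user_id, u.country
""")

# Step 3: register feature definitions declaratively (hypothetical format).
FEATURES = {
    "total_spend": {"source": "user_spend", "column": "total_spend", "key": "user_id"},
    "country": {"source": "user_spend", "column": "country", "key": "user_id"},
}

def get_feature(name, entity_id):
    """Look up one feature value for one entity from its registered definition."""
    d = FEATURES[name]
    row = conn.execute(
        f"SELECT {d['column']} FROM {d['source']} WHERE {d['key']} = ?",
        (entity_id,),
    ).fetchone()
    return row[0] if row else None

def serve(entity_id, names):
    """Step 4a: serve the requested features for a single entity."""
    return {n: get_feature(n, entity_id) for n in names}

def training_set(labels, names):
    """Step 4b: attach features to (entity_id, label) pairs."""
    return [(serve(entity_id, names), label) for entity_id, label in labels]

online = serve(1, ["total_spend", "country"])
train = training_set([(1, True), (2, False)], ["total_spend", "country"])
```

Because `training_set` goes through the same `serve` lookup, both paths read from the same registered definitions, which is the property the workflow is designed around.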
                                        

                            

At any point, you can also:

- Add new data sources or transformations
- Create or evolve features
- Analyze and discover features in the feature registry

                            

Why to use StreamSQL

                                  

Guarantee consistent features between training and serving

                                

StreamSQL allows new model features to be deployed confidently and with ease. It uses the feature definitions that you declare both to generate training datasets and to serve the same features in production. This removes the time spent re-engineering model pipelines to generate the serving features, and it removes a class of bugs stemming from inconsistent features between training and serving.
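
As a rough illustration of why a single declared definition prevents this class of bugs, the sketch below routes both the training path and the serving path through one `featurize` function. Everything here (`FeatureDef`, `AVG_ORDER_VALUE`, `build_training_rows`, `serve_features`) is a hypothetical stand-in and not part of StreamSQL.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class FeatureDef:
    """A declarative feature: a name plus a pure function of a raw record."""
    name: str
    compute: Callable[[Dict], float]

# Hypothetical feature, declared exactly once.
AVG_ORDER_VALUE = FeatureDef(
    name="avg_order_value",
    compute=lambda r: r["total_spend"] / r["order_count"] if r["order_count"] else 0.0,
)

def featurize(record: Dict, features: List[FeatureDef]) -> Dict[str, float]:
    """The only place where feature logic is evaluated."""
    return {f.name: f.compute(record) for f in features}

def build_training_rows(
    records: List[Dict], labels: List[int], features: List[FeatureDef]
) -> List[Tuple[Dict[str, float], int]]:
    """Training path: featurize historical records and pair them with labels."""
    return [(featurize(r, features), y) for r, y in zip(records, labels)]

def serve_features(record: Dict, features: List[FeatureDef]) -> Dict[str, float]:
    """Serving path: featurize one live record with the same definitions."""
    return featurize(record, features)

records = [
    {"total_spend": 40.0, "order_count": 2},
    {"total_spend": 0.0, "order_count": 0},
]
training_rows = build_training_rows(records, [1, 0], [AVG_ORDER_VALUE])
served = serve_features(records[0], [AVG_ORDER_VALUE])
```

Since neither path contains feature logic of its own, the value a model sees in production for a record is by construction the value it was trained on.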

Maintain a single source of truth for features

                                

StreamSQL allows organizations to keep a repository of versioned features. It's common for multiple models to require essentially the same features. Without a central feature repository, teams have to build and maintain their own feature generation pipelines. This can lead to a large number of inconsistent features that all try to model the same thing, along with wasted time and repeated effort.

Share and re-use features across teams and models

                                

StreamSQL allows feature engineering advancements made by one team to be shared by others. Feature engineering is a creative and time-consuming effort. By treating features as building blocks for your models, teams can share and re-use features to increase model performance across the organization.

Unify stream and batch processing for feature generation

                                

StreamSQL allows machine learning teams to think at a higher level of abstraction than is possible with Flink and Spark. Files, tables, and streams can be connected to StreamSQL and then transformed and joined using SQL before being turned into features. Once the data is prepared, features may be defined declaratively and StreamSQL will handle generating them for training and serving.
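
One way to picture unified stream and batch processing is a feature defined once as an incremental aggregation and then evaluated either over a full batch or one event at a time. The sketch below assumes a hypothetical per-user mean of an `amount` field; the `init`, `update`, and `result` functions and the `StreamFeature` class are illustrative names, not StreamSQL, Flink, or Spark APIs.

```python
from typing import Dict, List, Tuple

State = Tuple[int, float]  # (event count, running total)

# The feature definition: a per-user mean of "amount", written once as an
# incremental aggregation.
def init() -> State:
    return (0, 0.0)

def update(state: State, event: Dict) -> State:
    count, total = state
    return (count + 1, total + event["amount"])

def result(state: State) -> float:
    count, total = state
    return total / count if count else 0.0

def batch_feature(events: List[Dict]) -> Dict[int, float]:
    """Batch mode: fold the definition over a complete set of events."""
    states: Dict[int, State] = {}
    for e in events:
        states[e["user_id"]] = update(states.get(e["user_id"], init()), e)
    return {uid: result(s) for uid, s in states.items()}

class StreamFeature:
    """Stream mode: apply the same definition one event at a time."""

    def __init__(self) -> None:
        self.states: Dict[int, State] = {}

    def on_event(self, e: Dict) -> float:
        uid = e["user_id"]
        self.states[uid] = update(self.states.get(uid, init()), e)
        return result(self.states[uid])

events = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": 2, "amount": 4.0},
    {"user_id": 1, "amount": 30.0},
]
batch_values = batch_feature(events)

stream = StreamFeature()
stream_values = {}
for e in events:
    stream_values[e["user_id"]] = stream.on_event(e)
```

After the last event, the streamed values match the batch values because both modes share the same `update` and `result` logic.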

Manage your feature development with built-in versioning

                                

Good feature management simplifies and accelerates the machine learning process. Features are defined with a consistent interface in a central repository. Anyone can dig into how a feature is being generated and depend on a specific version without breaking changes. Using the feature registry UI, you can quickly understand a feature's data type and statistical properties.
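
To make version pinning concrete, here is a minimal in-memory sketch of a versioned registry that records a data type and a few summary statistics with each version. `FeatureRegistry`, its `register` and `get` methods, and the stored fields are assumptions made for illustration; they do not describe StreamSQL's registry.

```python
from statistics import mean
from typing import Dict, List, Optional

class FeatureRegistry:
    """Hypothetical registry: each name maps to an append-only list of versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[dict]] = {}

    def register(self, name: str, definition: str, sample: List[float]) -> int:
        """Store a new version with its definition, data type, and basic stats."""
        versions = self._versions.setdefault(name, [])
        versions.append({
            "version": len(versions) + 1,
            "definition": definition,
            "dtype": type(sample[0]).__name__,
            "stats": {"min": min(sample), "max": max(sample), "mean": mean(sample)},
        })
        return len(versions)

    def get(self, name: str, version: Optional[int] = None) -> dict:
        """Return a pinned version, or the latest one when no version is given."""
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]

registry = FeatureRegistry()
v1 = registry.register("total_spend", "SUM(amount)", [10.0, 30.0, 5.0])
v2 = registry.register("total_spend", "SUM(amount) over last 30 days", [10.0, 5.0])

pinned = registry.get("total_spend", version=1)  # unaffected by version 2
latest = registry.get("total_spend")
```

A model that depends on `version=1` keeps getting the original definition after version 2 is registered, which is what lets a feature evolve without breaking its consumers.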
Last modified 1yr ago