From google tutorial we know how to train a model in TensorFlow. But what is the best way to save a trained model, then serve the prediction using a basic minimal python api in production server.
My question is basically for TensorFlow best practices to save the model and serve prediction on live server without compromising speed and memory issue. Since the API server will be running on the background for forever.
A small snippet of python code will be appreciated.
TensorFlow Serving is a high performance, open source serving system for machine learning models, designed for production environments and optimized for TensorFlow. The initial release contains C++ server and Python client examples based on gRPC. The basic architecture is shown in the diagram below.
To get started quickly, check out the tutorial.