[Avg. reading time: 6 minutes]

Model Serving

mlflow server

Instantly turn a registered model into a REST API endpoint.

Make sure the MLflow tracking server from the earlier example is still running:

mlflow server --host 127.0.0.1 --port 8080 \
--backend-store-uri sqlite:///mlflow.db
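
If the server is up, the MLflow tracking UI is available at http://127.0.0.1:8080. A quick smoke test from the shell (the /health endpoint is assumed to be available on this MLflow version):

curl http://127.0.0.1:8080/health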

Windows

SET MLFLOW_TRACKING_URI=http://127.0.0.1:8080

MAC/Linux

export MLFLOW_TRACKING_URI=http://127.0.0.1:8080

Serve the Model

mlflow models serve \
  -m "models:/Linear_Regression_Model/1" \
  --host 127.0.0.1 \
  --port 5001 \
  --env-manager local
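
The -m flag accepts any MLflow model URI, not only registry paths. For example, a model can be served straight from a run (the run ID below is a placeholder):

mlflow models serve \
  -m "runs:/<RUN_ID>/model" \
  --host 127.0.0.1 \
  --port 5001 \
  --env-manager local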

Use the Model

curl -X POST "http://127.0.0.1:5001/invocations" \
  -H "Content-Type: application/json" \
  --data '{"inputs": [{"ENGINESIZE": 2.0}, {"ENGINESIZE": 3.0}, {"ENGINESIZE": 4.0}]}'

Pros

  • Zero-code serving: Just one CLI command — no need to build an API yourself.
  • Auto-handles environment: Loads dependencies automatically.
  • Ideal for testing and demos.
  • Supports MLflow model URIs (e.g. models:/, runs:/).

Cons

  • Single-threaded process.
  • Limited customization.
  • Minimal built-in monitoring.
  • Not suited for blue-green or CI/CD promotion pipelines.

FastAPI

  • Modern, high-performance Python web framework for building REST APIs.

  • FastAPI turns Python functions into fully documented, high-performance REST APIs with minimal code.

  • Built on ASGI (Asynchronous Server Gateway Interface).

  • Designed for speed, type safety, and developer productivity.

Key Features

  • Fast execution: Comparable to Node.js & Go — async by design.
  • Automatic validation: Uses Pydantic models to validate and parse JSON inputs (illustrated in the fast_app sketch below).
  • Auto-generated API docs: Swagger UI available at /docs, ReDoc at /redoc.
  • Type hints = API schema: Python typing directly defines request/response schema.
  • Easy to test & extend: Works great with Docker, CI/CD, and modern MLOps stacks.
  • Supports both sync & async: You can mix blocking ML inference and async endpoints.

Before starting the FastAPI app, point it at the MLflow tracking server so it can load the registered model:

export MLFLOW_TRACKING_URI=http://127.0.0.1:8080

Open uni_multi_model in VS Code, then start the app:

cd uni_multi_model
uvicorn fast_app:app --host 127.0.0.1 --port 5002
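
fast_app.py lives in the course repo; a minimal sketch of what such an app might look like, assuming it serves the registered Linear_Regression_Model with a single ENGINESIZE feature (the /predict route and field names here are illustrative, not taken from the repo):

import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

# Load the registered model once at startup (uses MLFLOW_TRACKING_URI)
model = mlflow.pyfunc.load_model("models:/Linear_Regression_Model/1")

app = FastAPI()

# Pydantic model: the type hint doubles as the request schema
class CarFeatures(BaseModel):
    ENGINESIZE: float

@app.post("/predict")
def predict(features: CarFeatures):
    # FastAPI has already validated and parsed the JSON body
    df = pd.DataFrame([features.model_dump()])  # Pydantic v2
    prediction = model.predict(df)
    return {"prediction": float(prediction[0])}

With the app running, it can be called the same way as the MLflow endpoint, and the auto-generated Swagger UI is available at http://127.0.0.1:5002/docs:

curl -X POST "http://127.0.0.1:5002/predict" \
  -H "Content-Type: application/json" \
  --data '{"ENGINESIZE": 2.0}'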

Uvicorn

  • An application server used to run Python web application code.
  • A lightweight, lightning-fast ASGI server (ASGI = Asynchronous Server Gateway Interface).
  • Built on uvloop (fast event loop) and httptools (HTTP parser), with native WebSocket support.
  • Works great with FastAPI and Pydantic.
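
Because Uvicorn is a standalone server, it also addresses the single-process limitation noted for mlflow models serve; the flags below are standard Uvicorn options (the worker count is just an example):

# Production-style: multiple worker processes
uvicorn fast_app:app --host 127.0.0.1 --port 5002 --workers 4

# Development: auto-reload on code changes (single worker)
uvicorn fast_app:app --host 127.0.0.1 --port 5002 --reload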

#modelserving #mlflow #fastapi

Ver 0.3.6

Last change: 2025-12-02