ML Engineering

Building Real-Time ML Pipelines: From Batch to Streaming

Transitioning from batch processing to real-time ML inference using Apache Kafka and FastAPI. Architecture patterns and production lessons learned.

Muhammad Usama

Senior AI/ML Engineer

Aug 10, 2024 · 13 min read
#ml-ops #kafka #streaming #fastapi

Building Real-Time ML Pipelines

Batch ML pipelines process data at scheduled intervals. Real-time systems respond to new data within milliseconds to seconds, enabling use cases like fraud detection, recommendation systems, and predictive maintenance.

Batch vs. Streaming

Batch Processing

  • Latency: Minutes to hours
  • Use case: Daily reports, model retraining
  • Tools: Airflow, Spark, dbt

Stream Processing

  • Latency: Milliseconds to seconds
  • Use case: Fraud detection, live recommendations
  • Tools: Kafka, Flink, FastAPI

Architecture Overview

Data Sources -> Kafka -> ML Service -> Predictions -> Downstream Systems
                  |
              Feature Store

Implementation with FastAPI + Kafka

1. Kafka Producer

python
from kafka import KafkaProducer
import json

# Serialize feature payloads as JSON bytes
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Stream feature events to the 'ml-features' topic
producer.send('ml-features', {'user_id': 123, 'features': [...]})

2. ML Inference Service

python
from fastapi import FastAPI
from pydantic import BaseModel

class Features(BaseModel):
    values: list[float]

app = FastAPI()
model = load_model()  # placeholder: load your trained model once at startup

@app.post("/predict")
async def predict(features: Features):
    prediction = model.predict(features.values)
    return {"prediction": prediction}

3. Stream Consumer

python
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'ml-features',
    bootstrap_servers=['localhost:9092'],
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

# Score each incoming event with the loaded model and publish the result
# downstream; model and producer come from the snippets above
for message in consumer:
    features = message.value
    prediction = model.predict(features['features'])
    producer.send('predictions', {'user_id': features['user_id'],
                                  'prediction': prediction})

Feature Store Integration

A feature store provides:

  • Low-latency access to precomputed features
  • Feature versioning and lineage tracking
  • Online/offline consistency
  • Feature sharing across teams

Example: Feast Feature Store

python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=["user:age", "user:country"],
    entity_rows=[{"user_id": 123}]
).to_dict()

Challenges & Solutions

1. Model Latency

Problem: Model inference too slow for real-time requirements

Solutions:

  • Model quantization with an optimized runtime (TensorRT, ONNX Runtime); see the sketch after this list
  • Model distillation
  • Micro-batching of incoming requests
  • Model caching
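
As a concrete example of the first option, here is a minimal sketch of post-training dynamic quantization with ONNX Runtime. It assumes the model has already been exported to ONNX; the file names are illustrative.

python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Post-training dynamic quantization: weights are stored as int8 and
# activations are quantized on the fly at inference time
quantize_dynamic(
    model_input="model.onnx",         # exported model (illustrative path)
    model_output="model.quant.onnx",  # quantized output, smaller and faster on CPU
    weight_type=QuantType.QInt8,
)

The quantized file can then be served through onnxruntime.InferenceSession in place of the original model.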

2. Data Quality

Problem: Missing or invalid features in production

Solutions:

  • Feature validation with Pydantic (see the sketch after this list)
  • Default values for missing features
  • Monitoring and alerting
  • Circuit breakers
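
A minimal sketch of the first two ideas combined: Pydantic validates incoming features and substitutes bounded defaults when fields are missing. The field names and bounds are illustrative.

python
from pydantic import BaseModel, Field, ValidationError

class UserFeatures(BaseModel):
    user_id: int
    age: float = Field(default=30.0, ge=0, le=120)  # default if missing, range-checked
    country: str = "unknown"                        # safe fallback value

try:
    # 'age' is missing here, so the default kicks in; an out-of-range age would raise
    features = UserFeatures(**{"user_id": 123})
except ValidationError as err:
    # In a consumer, log and route the event to a dead-letter topic instead of crashing
    print(err)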

3. Scalability

Problem: High traffic volumes

Solutions:

  • Horizontal scaling with Kubernetes
  • Load balancing
  • Asynchronous processing (see the sketch after this list)
  • Feature precomputation
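
For the asynchronous-processing option, here is a sketch using the aiokafka package rather than the kafka-python client used above, so that waiting on the broker doesn't block the event loop:

python
import asyncio
import json
from aiokafka import AIOKafkaConsumer

async def consume():
    consumer = AIOKafkaConsumer(
        'ml-features',
        bootstrap_servers='localhost:9092',
        value_deserializer=lambda m: json.loads(m.decode('utf-8')),
    )
    await consumer.start()
    try:
        # Awaiting messages lets other coroutines run while
        # this one waits on the broker
        async for message in consumer:
            handle(message.value)  # illustrative handler
    finally:
        await consumer.stop()

asyncio.run(consume())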

Monitoring

Key metrics to track (instrumented in the sketch after this list):

  • Inference latency: p50, p95, p99
  • Throughput: Requests per second
  • Error rate: Failed predictions
  • Model drift: Prediction distribution changes
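
Since Prometheus is already in the stack (see the technologies list below), here is a minimal sketch of instrumenting these metrics in the inference service with the prometheus_client package; the metric names are illustrative.

python
import time
from prometheus_client import Counter, Histogram, start_http_server

LATENCY = Histogram('inference_latency_seconds', 'Prediction latency')  # buckets feed p50/p95/p99
REQUESTS = Counter('inference_requests_total', 'Prediction requests')
ERRORS = Counter('inference_errors_total', 'Failed predictions')

def timed_predict(features):
    REQUESTS.inc()
    start = time.perf_counter()
    try:
        return model.predict(features)  # model loaded as in the service above
    except Exception:
        ERRORS.inc()
        raise
    finally:
        LATENCY.observe(time.perf_counter() - start)

start_http_server(8000)  # exposes /metrics for Prometheus to scrape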

Production Lessons

  1. Start simple: Begin with synchronous API, add streaming later
  2. Monitor everything: Latency, accuracy, data quality
  3. Plan for failures: Implement retries, fallbacks, circuit breakers
  4. Test thoroughly: Load testing, chaos engineering
  5. Version models: Track which model version produced each prediction (sketched below)
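
One lightweight way to do the last of these is to stamp every response with the version of the model that produced it, reusing app, model, and Features from the inference service above. MODEL_VERSION is an illustrative constant, typically injected by the deployment pipeline.

python
import os

# Set at deploy time, e.g. a model-registry tag or git SHA
MODEL_VERSION = os.environ.get("MODEL_VERSION", "dev")

@app.post("/predict")
async def predict(features: Features):
    prediction = model.predict(features.values)
    # Tagging responses makes drift analysis and rollbacks traceable per request
    return {"prediction": prediction, "model_version": MODEL_VERSION}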

Performance Results

Our production system handles:

  • 10,000 predictions/second
  • <50ms p95 latency
  • 99.9% uptime
  • Real-time feature updates

Technologies: Python, FastAPI, Apache Kafka, Redis, Docker, Kubernetes, Prometheus, Grafana
