MLOps Pipeline: Game Recommender System

1. Data Ingestion & Versioning
Raw Data (CSV Files)

User interaction data and game metadata are stored as raw CSV files.
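The preprocessing stage later merges these two sources. The Python below is a minimal sketch of how that merge could work. It reads both files with the standard `csv` module, drops incomplete interactions, and inner-joins the rest with the game metadata. The column names (`user_id`, `game_id`, `hours`, `title`) are hypothetical, because the real schema is not described here.

```python
import csv
from typing import Dict, Iterable, List

Row = Dict[str, str]


def load_rows(lines: Iterable[str]) -> List[Row]:
    """Parse CSV lines (header first) into dicts; accepts an open file or any iterable of lines."""
    return list(csv.DictReader(lines))


def join_on_game_id(interactions: List[Row], games: List[Row]) -> List[Row]:
    """Drop incomplete rows, then inner-join interactions with game metadata on `game_id`."""
    games_by_id = {g["game_id"]: g for g in games if g.get("game_id")}
    joined: List[Row] = []
    for row in interactions:
        if not row.get("user_id") or not row.get("game_id"):
            continue  # incomplete interaction: skip it
        game = games_by_id.get(row["game_id"])
        if game is None:
            continue  # no metadata for this game: drop the interaction
        joined.append({**row, **game})
    return joined
```

To use this against real files, pass open file handles such as `open("data/interactions.csv", newline="")`. That path is also hypothetical. An inner join silently discards interactions for games missing from the metadata file, so it may be worth logging how many rows were dropped.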

DVC (Data Version Control)

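DVC tracks large data files without committing them to Git. Only a small `.dvc` metadata file, which records each file's content hash, goes into the repository. As a result, every commit pins an exact version of the data, and past runs can be reproduced.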
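The following simplified sketch illustrates content-addressed tracking. It is not DVC's own code. It hashes a file in chunks and derives a cache key by using the first two hex characters of the MD5 digest as a directory name, which is roughly how DVC lays out its cache. Exact hashing details and cache paths vary between DVC versions.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file in 1 MiB chunks so large CSVs never need to fit in memory."""
    with path.open("rb") as f:
        yield from iter(lambda: f.read(chunk_size), b"")


def md5_hex(chunks: Iterable[bytes]) -> str:
    """MD5 digest of the concatenated chunks, as a hex string."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def cache_key(md5: str) -> str:
    """Content-addressed key: two-character directory prefix, remaining hex digits as file name."""
    return f"{md5[:2]}/{md5[2:]}"
```

For a tracked file, `cache_key(md5_hex(read_chunks(Path("data/interactions.csv"))))` would give its key (the path is hypothetical). The key depends only on the file's content, so an unchanged file is never stored twice. The hash recorded in the `.dvc` file is also enough to fetch exactly the right version later.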

Google Cloud Storage (Remote)

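A GCS bucket is the central DVC remote. `dvc push` uploads the tracked data to it, and `dvc pull` downloads that data wherever it is needed, such as the Jenkins workspace.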
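Conceptually, a pull only has to fetch objects that the committed `.dvc` files reference but the local cache lacks. The hypothetical helper below models that reasoning with plain sets of hashes. It is a sketch, not an API of DVC or of the GCS client.

```python
from typing import Iterable, List, Set


def objects_to_pull(tracked: Iterable[str], local_cache: Set[str], remote: Set[str]) -> List[str]:
    """Hashes referenced by .dvc files that are missing locally and must be downloaded."""
    missing = set(tracked) - local_cache
    not_on_remote = missing - remote
    if not_on_remote:
        # Typically means the .dvc file was committed but `dvc push` was never run.
        raise FileNotFoundError(f"missing from remote: {sorted(not_on_remote)}")
    return sorted(missing)
```

A fresh Jenkins workspace starts with an empty local cache, so every tracked object is downloaded. The error branch covers a common failure: a `.dvc` file was committed, but its data was never pushed to the remote.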

2. Jenkins Automation Pipeline
Trigger (Git Push to `main`)

The entire pipeline runs automatically whenever a change is pushed to the `main` branch.

DVC Pull

Downloads the versioned data from GCS into the Jenkins workspace.

Data Preprocessing

Runs the preprocessing script to clean, merge, and transform the raw data.

Model Training

Trains the TensorFlow/Keras model on the processed data and saves the model artifact.

Docker Build

Builds a Docker image containing the Flask app and the trained model.

Push to GCR

Pushes the newly built Docker image to Google Container Registry.

Deploy to GKE

Applies the Kubernetes configuration to deploy the new image to the GKE cluster.

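Jenkins itself would express these stages in a Jenkinsfile. The Python sketch below shows the same control flow without Jenkins syntax:

- Only a push to `main` triggers a run.
- The six stages execute strictly in order.
- The first failure stops the pipeline, so a failed training run, for example, is never built into an image or deployed.

`should_trigger`, `run_pipeline`, and the no-op stage actions are hypothetical stand-ins for the real steps.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action raises an exception on failure.
Stage = Tuple[str, Callable[[], None]]

# Hypothetical: the branch filter Jenkins would apply to incoming pushes.
TRIGGER_REF = "refs/heads/main"

STAGE_NAMES = [
    "DVC Pull",
    "Data Preprocessing",
    "Model Training",
    "Docker Build",
    "Push to GCR",
    "Deploy to GKE",
]


def should_trigger(pushed_ref: str) -> bool:
    """Return True only for pushes to the main branch."""
    return pushed_ref == TRIGGER_REF


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and return their names; stop at the first failure."""
    completed: List[str] = []
    for name, action in stages:
        try:
            action()
        except Exception as exc:
            # Fail fast: later stages never run on the output of a failed one.
            raise RuntimeError(f"stage {name!r} failed; completed: {completed}") from exc
        completed.append(name)
    return completed
```

`run_pipeline([(name, lambda: None) for name in STAGE_NAMES])` returns all six names in order. If one action is replaced with a function that raises, the run stops at that stage.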
3. Application Deployment & Serving
Google Container Registry (GCR)

Stores and manages our application's Docker images.
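GCR image references follow the form `HOSTNAME/PROJECT_ID/IMAGE:TAG`. The sketch below builds one. It assumes that each image is tagged with the short Git commit SHA of its build, which the pipeline description does not state. `my-project` and `game-recommender` are placeholder names.

```python
import re


def image_ref(project_id: str, image: str, git_sha: str, host: str = "gcr.io") -> str:
    """Return HOSTNAME/PROJECT_ID/IMAGE:TAG, tagged with the 7-character short SHA."""
    # Accept full or abbreviated lowercase hex SHAs only.
    if not re.fullmatch(r"[0-9a-f]{7,40}", git_sha):
        raise ValueError(f"not a hex commit SHA: {git_sha!r}")
    return f"{host}/{project_id}/{image}:{git_sha[:7]}"
```

For example, `image_ref("my-project", "game-recommender", "3f9c2a1d0b7e")` gives `gcr.io/my-project/game-recommender:3f9c2a1`. Tagging by commit, instead of reusing a fixed tag such as `latest`, has two benefits. Every deployed image can be traced to the commit that produced it. The image field in the Kubernetes manifest also changes on every release, so applying the updated manifest actually triggers a rollout.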

Google Kubernetes Engine (GKE)

Orchestrates and runs our application containers, handling scaling and availability.
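How scaling works depends on the cluster configuration, which is not described here. Suppose, as an assumption, that a Horizontal Pod Autoscaler is attached to the deployment. Its core rule is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, and it skips scaling when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below implements only that rule plus min/max clamping. The real controller also applies stabilization windows and handles pod readiness. The default bounds of 1 and 10 replicas are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,  # hypothetical lower bound
    max_replicas: int = 10,  # hypothetical upper bound
    tolerance: float = 0.1,
) -> int:
    """Core HPA scaling rule: scale proportionally to metric/target, clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: leave replica count alone
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 replicas averaging 90% CPU against a 60% target scale to 5. By contrast, 4 replicas at 63% stay at 4, because the ratio of 1.05 is within tolerance.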

Flask Web App (UI)

The final user-facing application that serves game recommendations.
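The document does not specify the model architecture. Suppose, hypothetically, that the Keras model learns an embedding vector for each user and each game. A request handler in the Flask app could then rank games by dot product and return the top k games the user has not played yet. The function below sketches only that ranking step, using the standard library. The route and model loading are omitted, and the names `recommend`, `user_vec`, and `game_vecs` are illustrative.

```python
import heapq
from typing import Dict, List, Sequence, Set


def recommend(
    user_vec: Sequence[float],
    game_vecs: Dict[str, Sequence[float]],
    already_played: Set[str],
    k: int = 5,
) -> List[str]:
    """Return up to k unplayed game IDs with the highest dot-product score."""

    def score(game_id: str) -> float:
        return sum(u * g for u, g in zip(user_vec, game_vecs[game_id]))

    candidates = (gid for gid in game_vecs if gid not in already_played)
    return heapq.nlargest(k, candidates, key=score)
```

Consider a user vector `[1.0, 0.0]` and games `g1`, `g2`, and `g3` with vectors `[0.9, 0.1]`, `[0.2, 0.8]`, and `[0.5, 0.5]`. Excluding the already played `g1` with `k=2` yields `["g3", "g2"]`. `heapq.nlargest` avoids sorting the full catalogue when only a few results are needed.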