Start Containerizing Your LLM Apps

A hobby project to learn containerization, cloud deployment, and automation with Ollama, MCP, and CI/CD


If you want to gain real-world experience with containerizing applications and get on the LLM train, start a hobby project: package an open-source LLM with a simple MCP orchestrator, then deploy it so your friends can test it.

This guide walks through the process and shares tips for cloud deployment and automation.

✏️ Prerequisites

  • Ollama (to run the LLM locally)
  • Node.js (for the MCP orchestrator)
  • Docker
  • (Optional) A free-tier AWS or GCP account for cloud deployment
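
A quick sanity check that the core tools are installed and on your PATH:

ollama --version
node --version
docker --version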

💬 Step 1: Get the Model (GGUF)

Download the GGUF file for your chosen model (e.g., from Hugging Face):

mkdir -p models
wget https://huggingface.co/TheBloke/gpt-oss-GGUF/resolve/main/gpt-oss.Q4_K_M.gguf -O ./models/gpt-oss.gguf

Model size and hardware

Many GGUF models are large (hundreds of MBs to multiple GBs). Before you download:

  • Check the model page for the exact file size and quantization options (Q4, Q8, etc.).
  • CPU-only machines can run quantized models, but slowly; for comfortable performance with mid-sized models, use a machine with a GPU or a high-core-count CPU and at least 16 GB of RAM. Smaller quantized models run on modern laptops, but expect higher latency.

🐳 Step 2: Dockerize Ollama with the Model

There are many models to choose from; in this guide we're using gpt-oss.
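
Ollama registers a model from a Modelfile rather than directly from a raw GGUF file, so first add a minimal Modelfile next to your Dockerfile that points at the downloaded weights (the path matches where the Dockerfile below copies them):

# Modelfile
FROM /models/gpt-oss.gguf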

Create a Dockerfile to build a container with Ollama and your GGUF model:

FROM ollama/ollama:latest
COPY models/gpt-oss.gguf /models/gpt-oss.gguf
COPY Modelfile /models/Modelfile
# "ollama create" talks to a running Ollama server, so start one briefly during the build
# (increase the sleep if model registration fails on slower machines)
RUN ollama serve & sleep 5 && ollama create gpt-oss -f /models/Modelfile
EXPOSE 11434
# Run "ollama serve" when the container starts
ENTRYPOINT ["ollama"]
CMD ["serve"]

Build and run the container locally:

docker build -t my-ollama-llm .
docker run -p 11434:11434 my-ollama-llm
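
Once the container is up, a quick call to Ollama's generate endpoint confirms the model is registered and responding ("stream": false asks for a single JSON object rather than a stream of chunks):

curl http://localhost:11434/api/generate \
  -d '{"model": "gpt-oss", "prompt": "Say hello in one short sentence.", "stream": false}'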

🦾 Step 3: Minimal MCP Orchestrator

Create a simple MCP server that acts as an orchestrator, forwarding prompts to your Ollama container:

// server.js
const express = require('express');
// node-fetch v2 (v3 is ESM-only); on Node 18+ you can drop this and use the built-in fetch
const fetch = require('node-fetch');
const app = express();

app.use(express.json());

// Where the Ollama container is reachable; override this when the orchestrator runs in its own container
const OLLAMA_URL = process.env.OLLAMA_URL || 'http://localhost:11434';

// Simple API key middleware (set the real key via the API_KEY env var and keep it secret)
const REQUIRED_API_KEY = process.env.API_KEY || 'CHANGE_ME';
function checkApiKey(req, res, next) {
  const key = req.headers['x-api-key'] || req.query.api_key;
  if (!key || key !== REQUIRED_API_KEY) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  next();
}

app.post('/api/llm', checkApiKey, async (req, res) => {
  const { prompt } = req.body || {};
  if (!prompt) {
    return res.status(400).json({ error: 'Missing "prompt" in request body' });
  }
  try {
    // Refer to Ollama docs: https://github.com/ollama/ollama/blob/main/docs/api.md
    // "stream: false" makes Ollama return one JSON object instead of a stream of chunks
    const ollamaRes = await fetch(`${OLLAMA_URL}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'gpt-oss', prompt, stream: false })
    });
    const data = await ollamaRes.json();
    res.json({ response: data.response });
  } catch (err) {
    res.status(502).json({ error: 'Failed to reach the Ollama backend', details: err.message });
  }
});

app.listen(3000, () => {
  console.log('MCP orchestrator running on http://localhost:3000');
});
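
To try the orchestrator locally, install the dependencies (node-fetch v2, as used above), start it with an API key, and send a test request:

npm install express node-fetch@2
API_KEY=my-secret-key node server.js

# In another terminal
curl -X POST http://localhost:3000/api/llm \
  -H "Content-Type: application/json" \
  -H "x-api-key: my-secret-key" \
  -d '{"prompt": "Write a haiku about containers."}'

If you later containerize the orchestrator too, localhost no longer reaches Ollama from inside its container. One sketch using a user-defined Docker network (the my-mcp-orchestrator image name is a placeholder for your own build):

docker network create llm-net
docker run -d --name ollama --network llm-net -p 11434:11434 my-ollama-llm
docker run -d --name mcp --network llm-net -p 3000:3000 \
  -e OLLAMA_URL=http://ollama:11434 -e API_KEY=my-secret-key my-mcp-orchestrator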

🗄️ Step 4: Deploying to the Cloud (AWS/GCP)

To make your model accessible to others, push your Docker image to a cloud registry and deploy it:

  • AWS:
    • Push your image to Amazon ECR
    • Deploy with ECS, EC2, or Lambda (for serverless)
  • GCP:
    • Push your image to Artifact Registry (the successor to Google Container Registry)
    • Deploy with Cloud Run, GKE, or Compute Engine

Example (AWS ECR):

# Authenticate Docker to ECR
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
# Tag and push
docker tag my-ollama-llm:latest <account>.dkr.ecr.<region>.amazonaws.com/my-ollama-llm:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/my-ollama-llm:latest
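
And a rough GCP equivalent, assuming an Artifact Registry repository already exists (the <region>, <project-id>, and <repo> placeholders are yours to fill in):

# Authenticate Docker to Artifact Registry
gcloud auth configure-docker <region>-docker.pkg.dev
# Tag and push
docker tag my-ollama-llm:latest <region>-docker.pkg.dev/<project-id>/<repo>/my-ollama-llm:latest
docker push <region>-docker.pkg.dev/<project-id>/<repo>/my-ollama-llm:latest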

🛟 Usage & Safety Notes

  • Cloud: Secure your endpoints before sharing them; keep the orchestrator's API key out of source control, serve traffic over HTTPS, and avoid exposing the raw Ollama port (11434) to the public internet.
  • For Learning: This project is meant for hands-on experience with containerization and cloud deployment, not for production use.

➡️ Conclusion & Next Steps

Containerizing LLM apps is a great way to gain real-world experience with Docker, cloud deployment, and automation. Once your app is running, consider setting up CI/CD pipelines (e.g., GitHub Actions) to automate your build and deploy steps, making it even easier to share and iterate.
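
As a starting point, here is a minimal GitHub Actions workflow sketch that builds the image and pushes it to ECR on every push to main. The file name, region, and secret names are assumptions to adapt, and large GGUF files should not be committed to git (download them in a build step instead):

# .github/workflows/build-and-push.yml
name: build-and-push
on:
  push:
    branches: [main]
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: <region>
      - uses: aws-actions/amazon-ecr-login@v2
        id: ecr
      - name: Build and push the image
        run: |
          # NOTE: make sure models/gpt-oss.gguf is available here (e.g., downloaded in a previous step)
          docker build -t ${{ steps.ecr.outputs.registry }}/my-ollama-llm:latest .
          docker push ${{ steps.ecr.outputs.registry }}/my-ollama-llm:latest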