Start Containerizing Your LLM Apps
A hobby project to learn containerization, cloud deployment, and automation with Ollama, MCP, and CI/CD
If you want to gain real-world experience in containerizing applications and get on the LLM train, start a hobby project: package an open-source LLM and a simple MCP orchestrator, then deploy it so you can share it with your friends to test.
This guide walks through the process and shares tips for cloud deployment and automation.
✏️ Prerequisites
- Ollama (for LLM)
- Node.js (for MCP server)
- Docker
- (Optional) Free-tier AWS or GCP accounts for cloud deployment
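A quick way to confirm the tooling is in place before you start (version numbers will vary on your machine):
ollama --version
node --version
docker --version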
💬 Step 1: Get the Model (GGUF)
Create a models directory, then download the GGUF file for your chosen model (e.g., from Hugging Face):
mkdir -p models
wget https://huggingface.co/TheBloke/gpt-oss-GGUF/resolve/main/gpt-oss.Q4_K_M.gguf -O ./models/gpt-oss.gguf
Model size and hardware
Many GGUF models are large (hundreds of MBs to multiple GBs). Before you download:
- Check the model page for the exact file size and quantization options (Q4, Q8, etc.).
- CPU-only machines can run quantized models, but slowly; for comfortable performance, consider a machine with a GPU or a high-core-count CPU and at least 16 GB of RAM for mid-sized models. Smaller quantized models run on modern laptops, just expect higher latency.
🐳 Step 2: Dockerize Ollama with the Model
There are many open-source models to choose from; in this guide, we're using gpt-oss.
First, create a Modelfile that tells Ollama where the GGUF weights live:
# Modelfile
FROM /models/gpt-oss.gguf
Then create a Dockerfile that bakes the model into an Ollama image. Note that ollama create talks to a running server, so the build starts one briefly:
FROM ollama/ollama:latest
COPY models/gpt-oss.gguf /models/gpt-oss.gguf
COPY Modelfile /models/Modelfile
# ollama create needs a running server, so start one temporarily during the build
RUN ollama serve & sleep 5 && ollama create gpt-oss -f /models/Modelfile
EXPOSE 11434
ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]
Build and run the container locally:
docker build -t my-ollama-llm .
docker run -p 11434:11434 my-ollama-llm
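To check that the container is actually serving the model, you can hit Ollama's generate endpoint directly (a quick smoke test, assuming the default port and the model name created above):
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'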
🦾 Step 3: Minimal MCP Orchestrator
Create a simple MCP server that acts as an orchestrator, forwarding prompts to your Ollama container:
// server.js
// Dependencies: npm install express node-fetch@2  (node-fetch v3 is ESM-only, so use v2 with require)
const express = require('express');
const fetch = require('node-fetch');

const app = express();
app.use(express.json());

// Ollama endpoint: localhost when running on the same host,
// or the container/service name when both run in Docker
const OLLAMA_URL = process.env.OLLAMA_URL || 'http://localhost:11434';

// Simple API key middleware (keep this secret!)
const REQUIRED_API_KEY = process.env.API_KEY || 'CHANGE_ME';
function checkApiKey(req, res, next) {
  const key = req.headers['x-api-key'] || req.query.api_key;
  if (!key || key !== REQUIRED_API_KEY) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  next();
}

app.post('/api/llm', checkApiKey, async (req, res) => {
  const { prompt } = req.body;
  if (!prompt) {
    return res.status(400).json({ error: 'Missing prompt' });
  }
  try {
    // Refer to the Ollama docs: https://github.com/ollama/ollama/blob/main/docs/api.md
    // stream: false returns a single JSON object instead of a stream of chunks
    const ollamaRes = await fetch(`${OLLAMA_URL}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'gpt-oss', prompt, stream: false })
    });
    const data = await ollamaRes.json();
    res.json({ response: data.response });
  } catch (err) {
    res.status(502).json({ error: 'Failed to reach the model server' });
  }
});

app.listen(3000, () => {
  console.log('MCP orchestrator running on http://localhost:3000');
});
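With the Ollama container and the orchestrator both running, a quick request verifies the whole path (the key and port below are the placeholders from the sketch above):
curl -X POST http://localhost:3000/api/llm \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: CHANGE_ME' \
  -d '{"prompt": "Write a haiku about containers"}'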
🗄️ Step 4: Deploying to the Cloud (AWS/GCP)
To make your model accessible to others, push your Docker image to a cloud registry and deploy it:
- AWS:
  - Push your image to Amazon ECR
  - Deploy with ECS, EC2, or Lambda (for serverless)
- GCP:
  - Push your image to Artifact Registry (the successor to Google Container Registry)
  - Deploy with Cloud Run, GKE, or Compute Engine
Example (AWS ECR):
# Authenticate Docker to ECR
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
# Tag and push
docker tag my-ollama-llm:latest <account>.dkr.ecr.<region>.amazonaws.com/my-ollama-llm:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/my-ollama-llm:latest
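The GCP flow is similar; here is a sketch using Artifact Registry (replace <region>, <project>, and <repo> with your own values):
# Authenticate Docker to Artifact Registry
gcloud auth configure-docker <region>-docker.pkg.dev
# Tag and push
docker tag my-ollama-llm:latest <region>-docker.pkg.dev/<project>/<repo>/my-ollama-llm:latest
docker push <region>-docker.pkg.dev/<project>/<repo>/my-ollama-llm:latest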
🛟 Usage & Safety Notes
- Cloud: Secure your endpoints and restrict access as needed (see the example after this list).
- For Learning: This project is for hands-on experience with containerization and cloud deployment only.
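For example, if the orchestrator runs on an EC2 instance, you can limit access to your own IP with a security group rule (a sketch; <sg-id> and <your-ip> are placeholders):
aws ec2 authorize-security-group-ingress \
  --group-id <sg-id> \
  --protocol tcp \
  --port 3000 \
  --cidr <your-ip>/32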
➡️ Conclusion & Next Steps
Containerizing LLM apps is a great way to gain real-world experience with Docker, cloud deployment, and automation. Once your app is running, consider setting up CI/CD pipelines (e.g., GitHub Actions) to automate your build and deploy steps, making it even easier to share and iterate.
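As a starting point, a minimal GitHub Actions workflow might just build the image on every push to main; the registry login and push steps depend on where you deploy, so they are left as placeholders in this sketch:
# .github/workflows/build.yml
name: build-and-push
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the image
        run: docker build -t my-ollama-llm .
      # Add your registry login and docker push steps here (ECR, Artifact Registry, etc.)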