diff --git a/PrivateAI/gpu-cuda-install/images/img01.png b/PrivateAI/gpu-cuda-install/images/img01.png
new file mode 100644
index 0000000..916ebb8
Binary files /dev/null and b/PrivateAI/gpu-cuda-install/images/img01.png differ
diff --git a/PrivateAI/gpu-cuda-install/images/img02.png b/PrivateAI/gpu-cuda-install/images/img02.png
new file mode 100644
index 0000000..30343a7
Binary files /dev/null and b/PrivateAI/gpu-cuda-install/images/img02.png differ
diff --git a/PrivateAI/gpu-cuda-install/readme.md b/PrivateAI/gpu-cuda-install/readme.md
new file mode 100644
index 0000000..dae4976
--- /dev/null
+++ b/PrivateAI/gpu-cuda-install/readme.md
@@ -0,0 +1,618 @@
+# GPU Drivers, CUDA Toolkit & LocalAGI Setup
+
+> Converted from `GPU Drivers CUDA tool kit LocalAGI.docx`
+
+## GPU Drivers and CUDA Toolkit
+
+Repaired the drivers and installed CUDA 13 using the process outlined below.
+
+![Image](images/img01.png)
+
+Did the following so that the CUDA tools are available to users when they log in.
+
+![Image](images/img02.png)
+
+### Additional software installed
+
+`nvtop` shows GPU usage graphically and is a useful tool to have:
+
+```bash
+apt-get install nvtop
+```
+
+## Docker Install
+
+Followed the official guide.
+
+If you get the following error once Docker is installed, your user needs to be added to the `docker` group in `/etc/group`. After being added, log out and log back in.
+
+```bash
+richard@capstone-gpu1:~$ docker ps
+permission denied while trying to connect to the docker API at unix:///var/run/docker.sock
+```
+
+Added the Docker proxy config per the document "docker with proxy" (link to the document still to be added in the markup).
+
+Then installed the NVIDIA Container Toolkit per the instructions from NVIDIA.
+
+```bash
+# Add the NVIDIA container toolkit repo
+distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
+
+curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
+  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+
+curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list \
+  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' \
+  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+
+sudo apt-get update
+sudo apt-get install -y nvidia-container-toolkit
+```
+
+Then configure Docker to use it:
+
+```bash
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+```
+
+## LocalAGI Install
+
+```bash
+git clone
+```
+
+Added a `docker-compose.local.yaml`:
+
+```yaml
+services:
+  localagi:
+    image: localai/localagi:latest
+    container_name: localagi
+    restart: unless-stopped
+    # Internal only - not exposed to host
+    expose:
+      - "8080"
+    environment:
+      - TZ=UTC
+    # optional volumes:
+    # volumes:
+    #   - ./models:/app/models
+    networks:
+      - internal
+    build:
+      context: .
+      dockerfile: Dockerfile.webui
+      args:
+        HTTP_PROXY: ${HTTP_PROXY}
+        HTTPS_PROXY: ${HTTPS_PROXY}
+        NO_PROXY: ${NO_PROXY}
+
+  nginx:
+    image: nginx:latest
+    container_name: localagi-proxy
+    restart: unless-stopped
+    depends_on:
+      - localagi
+    ports:
+      - "80:80"
+      - "443:443"
+    volumes:
+      - ./nginx/conf.d:/etc/nginx/conf.d:ro
+      - ./nginx/.htpasswd:/etc/nginx/.htpasswd:ro
+      # - ./certs:/etc/letsencrypt:ro # optional if using HTTPS
+    networks:
+      - internal
+      - public
+
+networks:
+  internal:
+    internal: true
+  public:
+    driver: bridge
+```
+
+## nginx Config
+
+In the LocalAGI folder that was created with the git clone, create the following hierarchy:
+
+- `./nginx`
+- `./nginx/conf.d`
+
+Put the following file, `localagi.conf`, in `./nginx/conf.d/`:
+
+```nginx
+##############################
+#    Upstream Definitions    #
+##############################
+
+upstream localagi_upstream {
+    server localagi:3000;
+}
+
+upstream localai_upstream {
+    server localai:8080;
+}
+
+upstream localrecall_upstream {
+    server localrecall:8080;
+}
+
+###############################################
+# 9000 - LocalAGI WEB UI (REQUIRE BASIC AUTH) #
+###############################################
+server {
+    listen 9000;
+    server_name _;
+
+    auth_basic "LocalAGI Web UI";
+    auth_basic_user_file /etc/nginx/.htpasswd;
+
+    location / {
+        proxy_pass http://localagi_upstream;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+        proxy_buffering off;
+    }
+}
+
+###############################################
+# 9080 - Localrecall (REQUIRE BASIC AUTH)     #
+###############################################
+server {
+    listen 9080;
+    server_name _;
+
+    auth_basic "Localrecall Web UI";
+    auth_basic_user_file /etc/nginx/.htpasswd;
+
+    location / {
+        proxy_pass http://localrecall_upstream;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+        proxy_buffering off;
+    }
+}
+
+###############################################################
+# 80 - LocalAI API + UI                                       #
+# BASIC AUTH for WEB UI ONLY                                  #
+# NO BASIC AUTH for /v1/* (OpenAI API - token required)       #
+###############################################################
+server {
+    listen 80;
+    server_name _;
+
+    ###############################################################
+    # SECTION 1 - OpenAI API (/v1/*) - TOKEN ONLY, NO BASIC AUTH  #
+    ###############################################################
+    location /v1/ {
+        # NO basic auth here - scripts authenticate via Bearer token
+        proxy_pass http://localai_upstream;
+
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+        proxy_buffering off;
+    }
+
+    #####################################################
+    # SECTION 2 - LocalAI WEB CONSOLE - BASIC AUTH REQ  #
+    #####################################################
+    location / {
+        auth_basic "LocalAI Web Console";
+        auth_basic_user_file /etc/nginx/.htpasswd;
+
+        proxy_pass http://localai_upstream;
+
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+        proxy_buffering off;
+    }
+}
+```
+
+## htpasswd File
+
+The `.htpasswd` file secures the administrative portals of the following applications:
+
+- LocalAI
+- LocalRecall
+- LocalAGI
+
+In `./nginx`, create admin users in the `.htpasswd` file. These are the users who need permission to create or modify the configuration of the services. Use the following command to add a user:
+
+```bash
+htpasswd .htpasswd username
+```
+
+Make sure that the file has permissions `644`.
+
+## LocalAGI Configuration
+
+Due to the nature of the Deakin environment, a number of configuration steps are required so that LocalAGI works properly and can reach the external sites it needs for Docker images, models and Docker builds.
+
+### Dockerfile.webui
+
+This file needs to be updated to support the Deakin proxy configuration.
+
+```dockerfile
+# Use Bun container for building the React UI
+FROM oven/bun:1 AS ui-builder
+ARG HTTP_PROXY
+ARG HTTPS_PROXY
+ARG NO_PROXY
+ENV http_proxy=$HTTP_PROXY
+ENV https_proxy=$HTTPS_PROXY
+ENV no_proxy=$NO_PROXY
+
+# Set the working directory for the React UI
+WORKDIR /app
+
+# Copy package.json and bun.lockb (if exists)
+COPY webui/react-ui/package.json webui/react-ui/bun.lockb* ./
+
+# Install dependencies
+RUN bun install --frozen-lockfile
+
+# Copy the rest of the React UI source code
+COPY webui/react-ui/ ./
+
+# Build the React UI
+RUN bun run build
+
+# Use a temporary build image based on Golang 1.24-alpine
+FROM golang:1.24-alpine AS builder
+ARG HTTP_PROXY
+ARG HTTPS_PROXY
+ARG NO_PROXY
+ENV http_proxy=$HTTP_PROXY
+ENV https_proxy=$HTTPS_PROXY
+ENV no_proxy=$NO_PROXY
+
+# Define argument for linker flags
+ARG LDFLAGS="-s -w"
+
+# Install git
+RUN apk add --no-cache git
+RUN rm -rf /tmp/* /var/cache/apk/*
+
+# Set the working directory
+WORKDIR /work
+
+# Copy go.mod and go.sum files first to leverage Docker cache
+COPY go.mod go.sum ./
+
+# Download dependencies - this layer will be cached as long as go.mod and go.sum don't change
+RUN go mod download
+
+# Now copy the rest of the source code
+COPY . .
+
+# Copy the built React UI from the ui-builder stage
+COPY --from=ui-builder /app/dist /work/webui/react-ui/dist
+
+# Build the application
+RUN CGO_ENABLED=0 go build -ldflags="$LDFLAGS" -o localagi ./
+
+FROM ubuntu:24.04
+ARG HTTP_PROXY
+ARG HTTPS_PROXY
+ARG NO_PROXY
+ENV http_proxy=$HTTP_PROXY
+ENV https_proxy=$HTTPS_PROXY
+ENV no_proxy=$NO_PROXY
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install runtime dependencies
+RUN apt-get update && apt-get install -y \
+    ca-certificates \
+    tzdata \
+    docker.io \
+    bash \
+    wget \
+    curl
+
+# Copy the webui binary from the builder stage to the final image
+COPY --from=builder /work/localagi /localagi
+
+# Define the command that will be run when the container is started
+ENTRYPOINT ["/localagi"]
+```
+
+### docker-compose.nvidia.yaml
+
+### .env
+
+```text
+HTTP_PROXY=http://proxy1.it.deakin.edu.au:3128
+HTTPS_PROXY=http://proxy1.it.deakin.edu.au:3128
+NO_PROXY=localhost,127.0.0.1
+```
+
+Added `/etc/profile.d/proxy.sh`:
+
+```bash
+export HTTP_PROXY=http://proxy1.it.deakin.edu.au:3128
+export HTTPS_PROXY=http://proxy1.it.deakin.edu.au:3128
+export NO_PROXY="localhost,127.0.0.1,::1"
+```
+
+Setting up so that you can connect to the OpenAI API via token auth but still have basic auth for the admin panels.
+
+Generate an API_TOKEN:
+
+```bash
+openssl rand -hex 32
+```
+
+Example token, with an `sk-` prefix added to the generated value:
+
+sk-a07de3d7880e0b602068b3eb58fb784b808b724278aa775bac66953c11b3c4ff
+
diff --git a/PrivateAI/localai/docker-compose.yaml b/PrivateAI/localai/docker-compose.yaml
new file mode 100644
index 0000000..0cba78c
--- /dev/null
+++ b/PrivateAI/localai/docker-compose.yaml
@@ -0,0 +1,86 @@
+services:
+  localai:
+    container_name: local-ai
+    hostname: localai
+    image: localai/localai:latest-gpu-nvidia-cuda-12
+    restart: unless-stopped
+    ports:
+      - 4000:8080
+    #network_mode: host
+    runtime: nvidia
+    deploy: {}
+
+    # Compose v2:
+    gpus: all
+
+ environment: + # Keep core behaviour + #- LOCALAI_SINGLE_ACTIVE_BACKEND=true + # Outbound proxy for model/gallery downloads + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + # Don't proxy internal Docker traffic + - NO_PROXY=localhost,127.0.0.1,::1,localai,postgres,mcphub,mcp-hub-mcphub-1 + + #- BACKENDS=llama-cpp + #- DISABLE_BACKEND_AUTODETECT=true + - AUTO_LOAD_MODELS=false + #- AUTO_UPDATE_MODELS=true + - DISABLE_TELEMETRY=true + - HEALTHCHECKS=false + - DISABLE_GRAMMAR=true + - DISABLE_TOKENIZER_CHECKS=true + #- DEFAULT_MODEL=llama-3.3-70b-instruct + - MCP_HEADERS={"Accept":"application/json, text/event-stream"} + - NVIDIA_VISIBLE_DEVICES=all + - NVIDIA_DRIVER_CAPABILITIES=compute,utility + - DEBUG=true + - LOCALAI_LOG_LEVEL=debug + #- LOGLEVEL=trace + #- LOCALAI_AUTOLOAD_GALLERIES=false + # - LOCALAI_GALLERIES=[] + #- LOCALAI_DATA_PATH=/data + # PostgreSQL-backed knowledge base + - LOCALAI_AGENT_POOL_VECTOR_ENGINE=postgres + - LOCALAI_AGENT_POOL_DATABASE_URL=postgresql://localrecall:localrecall@postgres:5432/localrecall?sslmode=disable + #- LOCALAI_AGENT_POOL_DEFAULT_MODEL=hermes-3-llama3.1-8b-lorablated + # disabled this and nominated gemma4 so that we don't double up on models that are running, save some GPU memory + - LOCALAI_AGENT_POOL_DEFAULT_MODEL=gemma-4-e4b-it + - LOCALAI_AGENT_POOL_EMBEDDING_MODEL=granite-embedding-107m-multilingual + - LOCALAI_AGENT_POOL_ENABLE_SKILLS=true + - LOCALAI_AGENT_POOL_ENABLE_LOGS=true + logging: + driver: "json-file" + options: + max-size: "20m" + max-file: "5" + volumes: + - /opt/redback/privateai/volumes/models:/models:cached + - /opt/redback/privateai/volumes/images/:/tmp/generated/images/ + - /opt/redback/privateai/volumes/backends:/usr/share/localai/backends + - /opt/redback/privateai/volumes/localai_data:/data + - /opt/redback/privateai/volumes/localai_config:/etc/localai + + + # Make libcuda visible to backends that overwrite LD_LIBRARY_PATH: + #- 
/usr/lib/x86_64-linux-gnu/libcuda.so.1:/backends/cuda12-stablediffusion-ggml/lib/libcuda.so.1:ro + #- /usr/lib/x86_64-linux-gnu/libcuda.so.1:/backends/cuda12-llama-cpp/lib/libcuda.so.1:ro + # + # + + postgres: + image: quay.io/mudler/localrecall:v0.5.2-postgresql + environment: + - POSTGRES_DB=localrecall + - POSTGRES_USER=localrecall + - POSTGRES_PASSWORD=localrecall + + # Runtime: don't force HTTP(S)_PROXY, just no-proxy for internal services + - NO_PROXY=localhost,127.0.0.1,::1,localai,postgres,mcphub,mcp-hub-mcphub-1 + volumes: + - /opt/redback/privateai/volumes/localai_data:/var/lib/postgresql/data + healthcheck: + test: ["CMD-SHELL", "pg_isready -U localrecall"] + interval: 10s + timeout: 5s + retries: 5 diff --git a/PrivateAI/localai/env b/PrivateAI/localai/env new file mode 100644 index 0000000..2ccbaae --- /dev/null +++ b/PrivateAI/localai/env @@ -0,0 +1,7 @@ +HTTP_PROXY=http://proxy1.it.deakin.edu.au:3128 +HTTPS_PROXY=http://proxy1.it.deakin.edu.au:3128 + +LOCALAI_API_KEY=sk-changethistobethelocalaiapikeyforclients +MODEL_NAME=gemma-3-4b-it-qat +MULTIMODAL_MODEL=moondream2-20250414 +IMAGE_MODEL=sd-1.5-ggml diff --git a/PrivateAI/localai/readme.md b/PrivateAI/localai/readme.md new file mode 100644 index 0000000..bc08646 --- /dev/null +++ b/PrivateAI/localai/readme.md @@ -0,0 +1,730 @@ +# LocalAI GPU Service with PostgreSQL Knowledge Base + +This compose service runs LocalAI with NVIDIA CUDA 12 GPU support and a PostgreSQL-backed LocalRecall knowledge base. + +It is designed for a private AI stack where LocalAI provides the OpenAI-compatible API, model hosting, agent memory, skills, image outputs, and backend management. + +To setup you will need to copy the env file to .env and then generate a new API key for clients to connect. 
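The setup step above (copy `env` to `.env`, then generate a new client API key) could be scripted roughly as follows. This is a minimal sketch under assumptions: `gen_key` and `setup_env` are hypothetical helper names, the `sk-` prefix simply follows the placeholder key shipped in the `env` file, and the `/dev/urandom` fallback is only there for hosts without `openssl`.

```bash
#!/bin/sh
# Sketch: create .env from the env template with a fresh client API key.
# gen_key and setup_env are hypothetical helpers, not part of LocalAI.
set -eu

gen_key() {
    # 32 random bytes as 64 hex characters, prefixed like the placeholder key.
    if command -v openssl >/dev/null 2>&1; then
        hex=$(openssl rand -hex 32)
    else
        hex=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    fi
    printf 'sk-%s\n' "$hex"
}

setup_env() {
    dir="$1"
    key=$(gen_key)
    # Copy every line of env, replacing only the LOCALAI_API_KEY line.
    sed "s|^LOCALAI_API_KEY=.*|LOCALAI_API_KEY=$key|" "$dir/env" > "$dir/.env"
    # The file now holds a secret, so restrict it to the owner.
    chmod 600 "$dir/.env"
}
```

Calling `setup_env .` from the `PrivateAI/localai` directory would leave the proxy and model settings untouched and only swap the key; clients then use the value written to `.env`.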
+ +## Service Overview + +```yaml +services: + localai: + container_name: local-ai + hostname: localai + image: localai/localai:latest-gpu-nvidia-cuda-12 + restart: unless-stopped + ports: + - 4000:8080 + #network_mode: host + runtime: nvidia + deploy: {} + + # Compose v2: + gpus: all + + environment: + # Keep core behaviour + #- LOCALAI_SINGLE_ACTIVE_BACKEND=true + # Outbound proxy for model/gallery downloads + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + # Don't proxy internal Docker traffic + - NO_PROXY=localhost,127.0.0.1,::1,localai,postgres,mcphub,mcp-hub-mcphub-1 + + #- BACKENDS=llama-cpp + #- DISABLE_BACKEND_AUTODETECT=true + - AUTO_LOAD_MODELS=false + #- AUTO_UPDATE_MODELS=true + - DISABLE_TELEMETRY=true + - HEALTHCHECKS=false + - DISABLE_GRAMMAR=true + - DISABLE_TOKENIZER_CHECKS=true + #- DEFAULT_MODEL=llama-3.3-70b-instruct + - MCP_HEADERS={"Accept":"application/json, text/event-stream"} + - NVIDIA_VISIBLE_DEVICES=all + - NVIDIA_DRIVER_CAPABILITIES=compute,utility + - DEBUG=true + - LOCALAI_LOG_LEVEL=debug + #- LOGLEVEL=trace + #- LOCALAI_AUTOLOAD_GALLERIES=false + # - LOCALAI_GALLERIES=[] + #- LOCALAI_DATA_PATH=/data + # PostgreSQL-backed knowledge base + - LOCALAI_AGENT_POOL_VECTOR_ENGINE=postgres + - LOCALAI_AGENT_POOL_DATABASE_URL=postgresql://localrecall:localrecall@postgres:5432/localrecall?sslmode=disable + #- LOCALAI_AGENT_POOL_DEFAULT_MODEL=hermes-3-llama3.1-8b-lorablated + # disabled this and nominated gemma4 so that we don't double up on models that are running, save some GPU memory + - LOCALAI_AGENT_POOL_DEFAULT_MODEL=gemma-4-e4b-it + - LOCALAI_AGENT_POOL_EMBEDDING_MODEL=granite-embedding-107m-multilingual + - LOCALAI_AGENT_POOL_ENABLE_SKILLS=true + - LOCALAI_AGENT_POOL_ENABLE_LOGS=true + logging: + driver: "json-file" + options: + max-size: "20m" + max-file: "5" + volumes: + - /opt/redback/privateai/volumes/models:/models:cached + - /opt/redback/privateai/volumes/images/:/tmp/generated/images/ + - 
/opt/redback/privateai/volumes/backends:/usr/share/localai/backends + - /opt/redback/privateai/volumes/localai_data:/data + - /opt/redback/privateai/volumes/localai_config:/etc/localai + + + # Make libcuda visible to backends that overwrite LD_LIBRARY_PATH: + #- /usr/lib/x86_64-linux-gnu/libcuda.so.1:/backends/cuda12-stablediffusion-ggml/lib/libcuda.so.1:ro + #- /usr/lib/x86_64-linux-gnu/libcuda.so.1:/backends/cuda12-llama-cpp/lib/libcuda.so.1:ro + # + # + + postgres: + image: quay.io/mudler/localrecall:v0.5.2-postgresql + environment: + - POSTGRES_DB=localrecall + - POSTGRES_USER=localrecall + - POSTGRES_PASSWORD=localrecall + + # Runtime: don't force HTTP(S)_PROXY, just no-proxy for internal services + - NO_PROXY=localhost,127.0.0.1,::1,localai,postgres,mcphub,mcp-hub-mcphub-1 + volumes: + - /opt/redback/privateai/volumes/localai_data:/var/lib/postgresql/data + healthcheck: + test: ["CMD-SHELL", "pg_isready -U localrecall"] + interval: 10s + timeout: 5s + retries: 5 +``` + +## What This Stack Provides + +This compose file starts two main services: + +- `localai` — the LocalAI inference/API service using the CUDA 12 NVIDIA GPU image. +- `postgres` — a PostgreSQL-backed LocalRecall database used by LocalAI agent memory and knowledge base features. + +LocalAI is exposed on host port `4000`, mapped to container port `8080`. + +The API endpoint is therefore: + +```text +http://localhost:4000 +``` + +From another Docker container on the same network, the service should be reachable as: + +```text +http://localai:8080 +``` + +## LocalAI Service + +### Image + +```yaml +image: localai/localai:latest-gpu-nvidia-cuda-12 +``` + +This uses the LocalAI GPU image built for NVIDIA CUDA 12. + +This is suitable for hosts with NVIDIA GPUs, the NVIDIA driver installed, Docker installed, and NVIDIA Container Toolkit configured. 
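The host prerequisites listed above can be checked before pulling the image. The following is a rough sketch, not an official check: `missing_tools` is a hypothetical helper, and it only confirms that the command-line tools are on `PATH` (`nvidia-smi` for the driver, `docker`, and `nvidia-ctk` for the NVIDIA Container Toolkit), not that they are configured correctly.

```bash
#!/bin/sh
# Sketch: report which required commands are not on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    # Non-zero when at least one tool was not found.
    return "$status"
}

# Example for this stack (uncomment to run on the GPU host):
# missing_tools nvidia-smi docker nvidia-ctk
```

A non-zero return means at least one prerequisite still needs installing; the GPU validation commands later in this README cover whether the installed pieces actually work together.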
+
+### Container Name and Hostname
+
+```yaml
+container_name: local-ai
+hostname: localai
+```
+
+The explicit hostname `localai` is useful for internal Docker service resolution and for other services that need to call the LocalAI API.
+
+### Port Mapping
+
+```yaml
+ports:
+  - 4000:8080
+```
+
+This maps the LocalAI API to the host on port `4000`.
+
+Use:
+
+```bash
+curl http://localhost:4000/v1/models
+```
+
+Or from a remote machine, replacing `<host>` with the server's hostname or IP address:
+
+```bash
+curl http://<host>:4000/v1/models
+```
+
+## GPU Configuration
+
+The service enables NVIDIA GPU access using both the legacy runtime setting and Compose v2 GPU syntax:
+
+```yaml
+runtime: nvidia
+gpus: all
+```
+
+The NVIDIA environment variables are:
+
+```yaml
+- NVIDIA_VISIBLE_DEVICES=all
+- NVIDIA_DRIVER_CAPABILITIES=compute,utility
+```
+
+These allow the container to access all visible GPUs for compute workloads and NVIDIA utility functions such as `nvidia-smi`.
+
+### GPU Validation
+
+After startup, test GPU visibility:
+
+```bash
+docker exec -it local-ai nvidia-smi
+```
+
+If this fails, check the host first:
+
+```bash
+nvidia-smi
+```
+
+Then verify Docker GPU access:
+
+```bash
+docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
+```
+
+## Proxy Configuration
+
+The LocalAI service passes through outbound proxy settings:
+
+```yaml
+- HTTP_PROXY=${HTTP_PROXY}
+- HTTPS_PROXY=${HTTPS_PROXY}
+```
+
+These are used for model downloads, gallery downloads, and other outbound network access.
+
+Internal Docker traffic is excluded from the proxy using:
+
+```yaml
+- NO_PROXY=localhost,127.0.0.1,::1,localai,postgres,mcphub,mcp-hub-mcphub-1
+```
+
+This prevents calls to other containers on the same Docker network from being routed through the external proxy.
+
+If additional internal services are added, append them to `NO_PROXY`.
+ +Example: + +```yaml +- NO_PROXY=localhost,127.0.0.1,::1,localai,postgres,mcphub,mcp-hub-mcphub-1,litellm,semantic-router +``` + +## Core LocalAI Behaviour + +### Model Autoloading + +```yaml +- AUTO_LOAD_MODELS=false +``` + +Automatic model loading is disabled. + +This helps avoid loading every available model at startup and gives more direct control over GPU memory usage. + +### Telemetry Disabled + +```yaml +- DISABLE_TELEMETRY=true +``` + +Disables telemetry. + +### Healthchecks Disabled + +```yaml +- HEALTHCHECKS=false +``` + +Disables LocalAI healthchecks. + +This can reduce noisy healthcheck behaviour during debugging or when backends take a long time to initialise. + +### Grammar and Tokenizer Checks Disabled + +```yaml +- DISABLE_GRAMMAR=true +- DISABLE_TOKENIZER_CHECKS=true +``` + +These settings reduce startup and runtime issues with some model/backend combinations. + +### Debug Logging + +```yaml +- DEBUG=true +- LOCALAI_LOG_LEVEL=debug +``` + +Enables verbose debug output from LocalAI. + +This is useful when troubleshooting backend loading, model startup, memory issues, MCP connectivity, or knowledge base behaviour. + +## MCP Headers + +```yaml +- MCP_HEADERS={"Accept":"application/json, text/event-stream"} +``` + +This configures request headers for MCP interactions. + +The `text/event-stream` accept value is important for streamable HTTP MCP servers. 
+ +## PostgreSQL-Backed Agent Pool and Knowledge Base + +LocalAI is configured to use PostgreSQL as the vector engine: + +```yaml +- LOCALAI_AGENT_POOL_VECTOR_ENGINE=postgres +``` + +The connection string points to the `postgres` service: + +```yaml +- LOCALAI_AGENT_POOL_DATABASE_URL=postgresql://localrecall:localrecall@postgres:5432/localrecall?sslmode=disable +``` + +The database credentials are defined in the `postgres` service: + +```yaml +- POSTGRES_DB=localrecall +- POSTGRES_USER=localrecall +- POSTGRES_PASSWORD=localrecall +``` + +## Agent Pool Models + +### Default Agent Pool Model + +```yaml +- LOCALAI_AGENT_POOL_DEFAULT_MODEL=gemma-4-e4b-it +``` + +This selects `gemma-4-e4b-it` as the default agent pool model. + +The intent is to avoid doubling up on GPU-heavy models and reduce unnecessary GPU memory use. + +### Embedding Model + +```yaml +- LOCALAI_AGENT_POOL_EMBEDDING_MODEL=granite-embedding-107m-multilingual +``` + +This model is used for embeddings in the LocalAI agent pool and knowledge base workflows. + +## Skills and Logs + +```yaml +- LOCALAI_AGENT_POOL_ENABLE_SKILLS=true +- LOCALAI_AGENT_POOL_ENABLE_LOGS=true +``` + +These enable LocalAI agent skills and logs. + +This is useful when using LocalAI as part of an agent-oriented private AI stack. + +## Volumes + +The service uses host-mounted volumes under: + +```text +/opt/redback/privateai/volumes +``` + +### Model Storage + +```yaml +- /opt/redback/privateai/volumes/models:/models:cached +``` + +Stores LocalAI models. + +Inside the container, models are available at: + +```text +/models +``` + +### Generated Images + +```yaml +- /opt/redback/privateai/volumes/images/:/tmp/generated/images/ +``` + +Stores generated image outputs. + +### Backend Storage + +```yaml +- /opt/redback/privateai/volumes/backends:/usr/share/localai/backends +``` + +Stores LocalAI backend binaries and backend-related files. + +This allows backends to persist across container restarts. 
+ +### LocalAI Data + +```yaml +- /opt/redback/privateai/volumes/localai_data:/data +``` + +Stores LocalAI data. + +Note that the same host path is also used by the PostgreSQL service as its database directory: + +```yaml +- /opt/redback/privateai/volumes/localai_data:/var/lib/postgresql/data +``` + +If you want stricter separation between LocalAI application data and PostgreSQL database files, consider using separate paths, for example: + +```yaml +- /opt/redback/privateai/volumes/localai_data:/data +- /opt/redback/privateai/volumes/postgres_data:/var/lib/postgresql/data +``` + +### LocalAI Config + +```yaml +- /opt/redback/privateai/volumes/localai_config:/etc/localai +``` + +Stores LocalAI configuration files. + +## Logging + +```yaml +logging: + driver: "json-file" + options: + max-size: "20m" + max-file: "5" +``` + +This limits Docker JSON logs to five files of 20 MB each. + +This prevents LocalAI debug logs from filling the host disk. + +## PostgreSQL Service + +The PostgreSQL service uses the LocalRecall PostgreSQL image: + +```yaml +image: quay.io/mudler/localrecall:v0.5.2-postgresql +``` + +It creates a database called: + +```text +localrecall +``` + +With username: + +```text +localrecall +``` + +And password: + +```text +localrecall +``` + +## PostgreSQL Healthcheck + +```yaml +healthcheck: + test: ["CMD-SHELL", "pg_isready -U localrecall"] + interval: 10s + timeout: 5s + retries: 5 +``` + +This checks that PostgreSQL is accepting connections. 
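When scripting startup from the host, the same readiness probe can be polled by hand. The sketch below loosely mirrors the healthcheck's `interval` and `retries` settings; `retry` is a hypothetical helper, and Docker's own healthcheck counts consecutive failures slightly differently, so treat this as an approximation rather than the same mechanism.

```bash
#!/bin/sh
# Sketch: run a command until it succeeds, up to a fixed number of attempts.
# Usage: retry RETRIES INTERVAL_SECONDS COMMAND [ARGS...]
# retry is a hypothetical helper name.
retry() {
    retries="$1"
    interval="$2"
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$retries" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
}

# Example using the healthcheck values above (uncomment on the Docker host):
# retry 5 10 docker compose exec -T postgres pg_isready -U localrecall
```

With the values from the healthcheck, this gives PostgreSQL roughly 40 seconds to start accepting connections before the script gives up.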
+ +## Suggested Directory Layout + +```text +/opt/redback/privateai/ +└── volumes/ + ├── models/ + ├── images/ + ├── backends/ + ├── localai_data/ + └── localai_config/ +``` + +Create the directories before starting the stack: + +```bash +sudo mkdir -p /opt/redback/privateai/volumes/models +sudo mkdir -p /opt/redback/privateai/volumes/images +sudo mkdir -p /opt/redback/privateai/volumes/backends +sudo mkdir -p /opt/redback/privateai/volumes/localai_data +sudo mkdir -p /opt/redback/privateai/volumes/localai_config +``` + +Set ownership if running Docker as your user: + +```bash +sudo chown -R "$USER:$USER" /opt/redback/privateai/volumes +``` + +## Starting the Stack + +Start both services: + +```bash +docker compose up -d +``` + +Start only PostgreSQL: + +```bash +docker compose up -d postgres +``` + +Start LocalAI: + +```bash +docker compose up -d localai +``` + +Follow logs: + +```bash +docker compose logs -f localai +``` + +Check PostgreSQL logs: + +```bash +docker compose logs -f postgres +``` + +## Basic API Tests + +List models: + +```bash +curl http://localhost:4000/v1/models +``` + +Check LocalAI root endpoint: + +```bash +curl http://localhost:4000 +``` + +Test from another container on the same Docker network: + +```bash +curl http://localai:8080/v1/models +``` + +## Useful Operational Commands + +Pull the latest LocalAI image: + +```bash +docker compose pull localai +``` + +Recreate the service after pulling: + +```bash +docker compose up -d localai +``` + +Restart LocalAI: + +```bash +docker compose restart localai +``` + +View running containers: + +```bash +docker compose ps +``` + +Stop the stack: + +```bash +docker compose down +``` + +Stop the stack and remove anonymous volumes: + +```bash +docker compose down -v +``` + +## Troubleshooting + +### LocalAI cannot see the GPU + +Check the host: + +```bash +nvidia-smi +``` + +Check Docker GPU support: + +```bash +docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi +``` + +Check 
inside the LocalAI container:
+
+```bash
+docker exec -it local-ai nvidia-smi
+```
+
+### LocalAI cannot reach PostgreSQL
+
+Check the PostgreSQL container:
+
+```bash
+docker compose ps postgres
+docker compose logs postgres
+```
+
+Check name resolution from LocalAI:
+
+```bash
+docker exec -it local-ai getent hosts postgres
+```
+
+Check the PostgreSQL port from inside LocalAI:
+
+```bash
+docker exec -it local-ai nc -vz postgres 5432
+```
+
+If `nc` is not installed in the container, use a temporary debug container on the same Docker network.
+
+### Internal traffic is going through the proxy
+
+Make sure all internal service names are included in `NO_PROXY`.
+
+Current value:
+
+```text
+localhost,127.0.0.1,::1,localai,postgres,mcphub,mcp-hub-mcphub-1
+```
+
+Add any additional internal services, such as:
+
+```text
+litellm,semantic-router,openwebui,semgrep-mcp
+```
+
+### Model does not load automatically
+
+This is expected because:
+
+```yaml
+- AUTO_LOAD_MODELS=false
+```
+
+Load or select models explicitly through LocalAI configuration, API calls, model galleries, or mounted model files.
+
+### Debug logs are very noisy
+
+Debugging is enabled:
+
+```yaml
+- DEBUG=true
+- LOCALAI_LOG_LEVEL=debug
+```
+
+Once the service is stable, reduce log verbosity by setting:
+
+```yaml
+- DEBUG=false
+- LOCALAI_LOG_LEVEL=info
+```
+
+Or remove those environment variables.
+
+## Notes on Shared `localai_data`
+
+This compose file maps the same host path to both:
+
+```text
+/data
+```
+
+For LocalAI, and:
+
+```text
+/var/lib/postgresql/data
+```
+
+For PostgreSQL.
+
+That may work depending on the intended LocalAI layout, but PostgreSQL expects to own its data directory, so it is cleaner and safer to separate application data from database data.
+ +Recommended alternative: + +```yaml +localai: + volumes: + - /opt/redback/privateai/volumes/localai_data:/data + +postgres: + volumes: + - /opt/redback/privateai/volumes/postgres_data:/var/lib/postgresql/data +``` + +This makes backups, restores, and troubleshooting easier. + +## Security Notes + +The database credentials in this compose file are simple defaults: + +```text +localrecall / localrecall +``` + +For production or shared environments, change the database password and update: + +```yaml +LOCALAI_AGENT_POOL_DATABASE_URL +``` + +Do not expose the LocalAI API port directly to untrusted networks without authentication, firewalling, or a reverse proxy. + +## Summary + +This compose stack runs LocalAI with: + +- NVIDIA CUDA 12 GPU support +- OpenAI-compatible API access on host port `4000` +- Persistent model, backend, image, data, and config volumes +- Debug logging enabled +- Proxy-aware outbound access +- PostgreSQL-backed LocalAI agent pool and knowledge base support +- Skills and agent logs enabled +- Docker log rotation to prevent runaway logs diff --git a/PrivateAI/mcphub/Dockerfile b/PrivateAI/mcphub/Dockerfile new file mode 100644 index 0000000..a6c690b --- /dev/null +++ b/PrivateAI/mcphub/Dockerfile @@ -0,0 +1,24 @@ +FROM samanhappy/mcphub:latest + +ARG HTTP_PROXY +ARG HTTPS_PROXY +ARG NO_PROXY + +ENV HTTP_PROXY=${HTTP_PROXY} +ENV HTTPS_PROXY=${HTTPS_PROXY} +ENV NO_PROXY=${NO_PROXY} +ENV http_proxy=${HTTP_PROXY} +ENV https_proxy=${HTTPS_PROXY} +ENV no_proxy=${NO_PROXY} + +WORKDIR /app + +# Make npm aware of the proxy during build, if provided +RUN if [ -n "$HTTP_PROXY" ]; then npm config set proxy "$HTTP_PROXY"; fi \ + && if [ -n "$HTTPS_PROXY" ]; then npm config set https-proxy "$HTTPS_PROXY"; fi \ + && npx playwright install chromium + +# Copy docker CLI from official image +COPY --from=docker:cli /usr/local/bin/docker /usr/local/bin/docker + +CMD ["node", "dist/index.js"] diff --git a/PrivateAI/mcphub/docker-compose.yaml 
b/PrivateAI/mcphub/docker-compose.yaml new file mode 100644 index 0000000..b4d55f9 --- /dev/null +++ b/PrivateAI/mcphub/docker-compose.yaml @@ -0,0 +1,31 @@ +services: + mcphub: + build: + context: . + dockerfile: Dockerfile + args: + HTTP_PROXY: ${MCPHUB_HTTP_PROXY} + HTTPS_PROXY: ${MCPHUB_HTTPS_PROXY} + NO_PROXY: ${MCPHUB_NO_PROXY} + http_proxy: ${MCPHUB_HTTP_PROXY} + https_proxy: ${MCPHUB_HTTPS_PROXY} + no_proxy: ${MCPHUB_NO_PROXY} + image: samanhappy/mcphub + ports: + - "3003:3000" + volumes: + - ./mcp_settings.json:/app/mcp_settings.json + - ./entrypoint-proxy.sh:/app/entrypoint-proxy.sh:ro + - ./proxychains.conf:/etc/proxychains.conf:ro + - ./repo_scan_job_mcp.js:/app/repo_scan_job_mcp.js:ro + - /var/run/docker.sock:/var/run/docker.sock + - /opt/redback/repos:/repos + environment: + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=${NO_PROXY} + restart: unless-stopped + extra_hosts: + - "proxy1.it.deakin.edu.au:10.137.0.162" + entrypoint: ["/app/entrypoint-proxy.sh"] + command: ["pnpm","start"] diff --git a/PrivateAI/mcphub/entrypoint-proxy.sh b/PrivateAI/mcphub/entrypoint-proxy.sh new file mode 100755 index 0000000..5fa3b47 --- /dev/null +++ b/PrivateAI/mcphub/entrypoint-proxy.sh @@ -0,0 +1,62 @@ +#!/bin/sh +set -eu + +# Ensure apt and other tools see proxy envs (many tools expect lowercase) +if [ -n "${HTTP_PROXY:-}" ] && [ -z "${http_proxy:-}" ]; then export http_proxy="$HTTP_PROXY"; fi +if [ -n "${HTTPS_PROXY:-}" ] && [ -z "${https_proxy:-}" ]; then export https_proxy="$HTTPS_PROXY"; fi +if [ -n "${NO_PROXY:-}" ] && [ -z "${no_proxy:-}" ]; then export no_proxy="$NO_PROXY"; fi + +# If we're on Debian/Ubuntu, also configure apt to use the proxy explicitly +if command -v apt-get >/dev/null 2>&1; then + mkdir -p /etc/apt/apt.conf.d + # Prefer HTTPS proxy if set, otherwise HTTP proxy + APT_PROXY="${https_proxy:-${http_proxy:-}}" + if [ -n "$APT_PROXY" ]; then + cat > /etc/apt/apt.conf.d/99proxy </dev/null 2>&1; then + if command -v apk 
>/dev/null 2>&1; then + apk add --no-cache proxychains-ng + elif command -v apt-get >/dev/null 2>&1; then + apt-get update + # Try common package names + apt-get install -y proxychains4 || apt-get install -y proxychains-ng + else + echo "No supported package manager found to install proxychains." >&2 + exit 1 + fi +fi + +# Generate a proxychains config if none provided +CONF="/etc/proxychains.conf" +if [ ! -f "$CONF" ]; then + PROXY_URL="${https_proxy:-${http_proxy:-}}" + if [ -z "$PROXY_URL" ]; then + echo "HTTP_PROXY/HTTPS_PROXY not set and no $CONF provided." >&2 + exit 1 + fi + + HOSTPORT="$(echo "$PROXY_URL" | sed -E 's#^[a-zA-Z]+://##' | sed -E 's#/.*$##' | sed -E 's#^[^@]*@##')" + HOST="$(echo "$HOSTPORT" | cut -d: -f1)" + PORT="$(echo "$HOSTPORT" | cut -d: -f2)" + + cat > "$CONF" <:3003 +``` + +## Enabled MCP Integrations + +The MCP Hub configuration currently includes the following server integrations: + +```json +{ + "servers": [ + { + "name": "amap", + "tools": "all" + }, + { + "name": "playwright", + "tools": "all" + }, + { + "name": "fetch", + "tools": "all" + }, + { + "name": "sequential-thinking", + "tools": "all" + }, + { + "name": "time", + "tools": "all" + }, + { + "name": "mindmap", + "tools": "all" + }, + { + "name": "playwright-mcp", + "tools": "all" + }, + { + "name": "fetch-mcp", + "tools": "all" + }, + { + "name": "time-mcp", + "tools": "all" + }, + { + "name": "mongodb", + "tools": "all" + }, + { + "name": "git-mcp-server", + "tools": "all" + }, + { + "name": "repo-security-scan", + "tools": "all" + }, + { + "name": "arxiv-mcp", + "tools": "all" + }, + { + "name": "wazuh", + "tools": "all" + }, + { + "name": "postgresql", + "tools": "all" + }, + { + "name": "supabase-postgres", + "tools": "all" + } + ] +} +``` + +## Integration Summary + +| Integration | Purpose | +|---|---| +| `amap` | Map/location-related MCP integration | +| `playwright` | Browser automation | +| `fetch` | HTTP fetching | +| `sequential-thinking` | Structured multi-step 
reasoning tool | +| `time` | Time/date tools | +| `mindmap` | Mind map generation or manipulation | +| `playwright-mcp` | Additional Playwright MCP server | +| `fetch-mcp` | Additional fetch MCP server | +| `time-mcp` | Additional time MCP server | +| `mongodb` | MongoDB access | +| `git-mcp-server` | Git repository interaction | +| `repo-security-scan` | Job-based Semgrep repository scanning | +| `arxiv-mcp` | arXiv search/research integration | +| `wazuh` | Wazuh security platform integration | +| `postgresql` | PostgreSQL access | +| `supabase-postgres` | Supabase PostgreSQL access | + +## Docker Compose + +```yaml +services: + mcphub: + build: + context: . + dockerfile: Dockerfile + args: + HTTP_PROXY: ${MCPHUB_HTTP_PROXY} + HTTPS_PROXY: ${MCPHUB_HTTPS_PROXY} + NO_PROXY: ${MCPHUB_NO_PROXY} + http_proxy: ${MCPHUB_HTTP_PROXY} + https_proxy: ${MCPHUB_HTTPS_PROXY} + no_proxy: ${MCPHUB_NO_PROXY} + image: samanhappy/mcphub + ports: + - "3003:3000" + volumes: + - ./mcp_settings.json:/app/mcp_settings.json + - ./entrypoint-proxy.sh:/app/entrypoint-proxy.sh:ro + - ./proxychains.conf:/etc/proxychains.conf:ro + - ./repo_scan_job_mcp.js:/app/repo_scan_job_mcp.js:ro + - /var/run/docker.sock:/var/run/docker.sock + - /opt/redback/repos:/repos + environment: + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=${NO_PROXY} + restart: unless-stopped + extra_hosts: + - "proxy1.it.deakin.edu.au:10.137.0.162" + entrypoint: ["/app/entrypoint-proxy.sh"] + command: ["pnpm","start"] +``` + +## What This Compose File Does + +The `mcphub` service: + +- Builds from a local custom `Dockerfile` +- Uses `samanhappy/mcphub` as the base image +- Publishes MCP Hub on host port `3003` +- Mounts MCP settings from the host +- Mounts a proxy-aware entrypoint script +- Mounts a `proxychains.conf` +- Mounts the repository security scan MCP wrapper +- Mounts the Docker socket so repo scan jobs can launch Semgrep containers +- Mounts `/opt/redback/repos` into the hub at `/repos` +- 
Runs the hub through `entrypoint-proxy.sh` +- Starts MCP Hub using `pnpm start` + +## Port Mapping + +```yaml +ports: + - "3003:3000" +``` + +This maps container port `3000` to host port `3003`. + +Access MCP Hub at: + +```text +http://localhost:3003 +``` + +Or from another machine: + +```text +http://:3003 +``` + +## Mounted Files and Directories + +### MCP Settings + +```yaml +- ./mcp_settings.json:/app/mcp_settings.json +``` + +This is the main MCP Hub configuration file. + +It defines the enabled MCP servers and how they are launched or connected. + +### Proxy Entrypoint + +```yaml +- ./entrypoint-proxy.sh:/app/entrypoint-proxy.sh:ro +``` + +This script configures proxy environment variables, installs `proxychains4` if needed, and launches MCP Hub through proxychains. + +### Proxychains Configuration + +```yaml +- ./proxychains.conf:/etc/proxychains.conf:ro +``` + +This defines which traffic should go through the proxy and which networks should remain local. + +### Repo Scan MCP Wrapper + +```yaml +- ./repo_scan_job_mcp.js:/app/repo_scan_job_mcp.js:ro +``` + +This mounts the custom job-oriented Semgrep scan MCP server into the hub container. + +The MCP settings can then run this file as a stdio MCP server. + +### Docker Socket + +```yaml +- /var/run/docker.sock:/var/run/docker.sock +``` + +This gives the MCP Hub container access to the host Docker daemon. + +This is required by `repo_scan_job_mcp.js`, because that tool launches Semgrep scan jobs using `docker run`. + +Important: mounting the Docker socket is powerful and should only be used in trusted environments. + +### Repository Mount + +```yaml +- /opt/redback/repos:/repos +``` + +This exposes host repositories to MCP Hub at: + +```text +/repos +``` + +The repo scan MCP uses this path to list repositories and validate scan targets. 
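
The target validation mentioned above can be sketched in shell. This is an illustrative rendering of the rules visible in `repo_scan_job_mcp.js` (a single directory name under `/repos`, with no slashes, backslashes, or `..`); `normalize_repo` is a hypothetical helper name and is not shipped with MCP Hub, and the real script additionally trims surrounding whitespace.

```bash
#!/usr/bin/env bash
# Illustrative only: normalize_repo is a hypothetical helper, not part of MCP Hub.

normalize_repo() {
  local value="$1"

  # The mount point itself is not a repository.
  if [ "$value" = "/repos" ]; then
    return 1
  fi

  # Accept either "name" or "/repos/name".
  value="${value#/repos/}"

  # Trim leading and trailing slashes.
  while [ "${value#/}" != "$value" ]; do value="${value#/}"; done
  while [ "${value%/}" != "$value" ]; do value="${value%/}"; done

  # Reject empty names, nested paths, backslashes, and ".." sequences.
  case "$value" in
    ""|*/*|*\\*|*..*) return 1 ;;
  esac

  printf '%s\n' "$value"
}
```

Accepting only a bare directory name keeps every scan target directly under the mounted `/repos` tree.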
+ +## Custom Dockerfile + +```dockerfile +FROM samanhappy/mcphub:latest + +ARG HTTP_PROXY +ARG HTTPS_PROXY +ARG NO_PROXY + +ENV HTTP_PROXY=${HTTP_PROXY} +ENV HTTPS_PROXY=${HTTPS_PROXY} +ENV NO_PROXY=${NO_PROXY} +ENV http_proxy=${HTTP_PROXY} +ENV https_proxy=${HTTPS_PROXY} +ENV no_proxy=${NO_PROXY} + +WORKDIR /app + +# Make npm aware of the proxy during build, if provided +RUN if [ -n "$HTTP_PROXY" ]; then npm config set proxy "$HTTP_PROXY"; fi \ + && if [ -n "$HTTPS_PROXY" ]; then npm config set https-proxy "$HTTPS_PROXY"; fi \ + && npx playwright install chromium + +# Copy docker CLI from official image +COPY --from=docker:cli /usr/local/bin/docker /usr/local/bin/docker + +CMD ["node", "dist/index.js"] +``` + +## What the Dockerfile Adds + +The custom image extends `samanhappy/mcphub:latest` and adds: + +1. Proxy environment variables during image build +2. npm proxy configuration +3. Playwright Chromium browser installation +4. Docker CLI copied from the official Docker CLI image + +### Why Playwright Chromium Is Installed + +Some MCP tools use Playwright for browser automation. + +The build step installs Chromium: + +```dockerfile +npx playwright install chromium +``` + +This avoids runtime failures when a Playwright MCP server needs a browser binary. + +### Why Docker CLI Is Added + +The repo scan MCP launches Semgrep jobs using: + +```bash +docker run +``` + +Mounting `/var/run/docker.sock` is not enough by itself. The container also needs the `docker` command available. 
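
Both prerequisites can be checked together from a shell inside the container. The following is a minimal sketch; `check_docker_prereqs` is a hypothetical helper name, and the default socket path is the one mounted by the compose file.

```bash
#!/usr/bin/env bash
# Illustrative only: check_docker_prereqs is a hypothetical helper.

check_docker_prereqs() {
  local sock="${1:-/var/run/docker.sock}"
  local status=0

  # The docker CLI must be on PATH inside the container.
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found in PATH" >&2
    status=1
  fi

  # The host daemon socket must be mounted and be a socket file.
  if [ ! -S "$sock" ]; then
    echo "no Docker socket at $sock" >&2
    status=1
  fi

  return "$status"
}
```

A non-zero return means at least one of the two pieces is missing; the messages on stderr say which.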
+ +The Dockerfile copies the CLI from the official Docker image: + +```dockerfile +COPY --from=docker:cli /usr/local/bin/docker /usr/local/bin/docker +``` + +## Entrypoint Proxy Script + +```sh +#!/bin/sh +set -eu + +# Ensure apt and other tools see proxy envs (many tools expect lowercase) +if [ -n "${HTTP_PROXY:-}" ] && [ -z "${http_proxy:-}" ]; then export http_proxy="$HTTP_PROXY"; fi +if [ -n "${HTTPS_PROXY:-}" ] && [ -z "${https_proxy:-}" ]; then export https_proxy="$HTTPS_PROXY"; fi +if [ -n "${NO_PROXY:-}" ] && [ -z "${no_proxy:-}" ]; then export no_proxy="$NO_PROXY"; fi + +# If we're on Debian/Ubuntu, also configure apt to use the proxy explicitly +if command -v apt-get >/dev/null 2>&1; then + mkdir -p /etc/apt/apt.conf.d + # Prefer HTTPS proxy if set, otherwise HTTP proxy + APT_PROXY="${https_proxy:-${http_proxy:-}}" + if [ -n "$APT_PROXY" ]; then + cat > /etc/apt/apt.conf.d/99proxy </dev/null 2>&1; then + if command -v apk >/dev/null 2>&1; then + apk add --no-cache proxychains-ng + elif command -v apt-get >/dev/null 2>&1; then + apt-get update + # Try common package names + apt-get install -y proxychains4 || apt-get install -y proxychains-ng + else + echo "No supported package manager found to install proxychains." >&2 + exit 1 + fi +fi + +# Generate a proxychains config if none provided +CONF="/etc/proxychains.conf" +if [ ! -f "$CONF" ]; then + PROXY_URL="${https_proxy:-${http_proxy:-}}" + if [ -z "$PROXY_URL" ]; then + echo "HTTP_PROXY/HTTPS_PROXY not set and no $CONF provided." 
>&2 + exit 1 + fi + + HOSTPORT="$(echo "$PROXY_URL" | sed -E 's#^[a-zA-Z]+://##' | sed -E 's#/.*$##' | sed -E 's#^[^@]*@##')" + HOST="$(echo "$HOSTPORT" | cut -d: -f1)" + PORT="$(echo "$HOSTPORT" | cut -d: -f2)" + + cat > "$CONF" < { + try { + const r = await fetch(\"http://10.137.17.254:4045/health\", { + headers: { Authorization: \"redacted\"} + }); + console.log(\"status\", r.status); + console.log(await r.text()); + } catch (e) { + console.error(\"FETCH_ERR\", e); + } +})();"' +``` + +## What the Node Fetch Test Does + +This command runs a Node.js `fetch()` call from inside the MCP Hub container. + +It tests whether the container can reach: + +```text +http://10.137.17.254:4045/health +``` + +with an `Authorization` header. + +This is useful for debugging MCP servers such as Wazuh or other internal HTTP services. + +## Expected Fetch Test Result + +A healthy result should show an HTTP status and response body, for example: + +```text +status 200 +{"status":"ok"} +``` + +A failure may show: + +```text +FETCH_ERR TypeError: fetch failed +``` + +Common causes include: + +- Target service is down +- Wrong port +- Authentication failure +- Proxy recursion +- Internal IP missing from `NO_PROXY` +- proxychains intercepting traffic that should be local +- Firewall or routing issue +- MCP service bound only to `127.0.0.1` instead of `0.0.0.0` + +## Repo Security Scan Integration + +The file: + +```text +./repo_scan_job_mcp.js +``` + +is mounted into the MCP Hub container as: + +```text +/app/repo_scan_job_mcp.js +``` + +It provides the `repo-security-scan` MCP integration. + +This integration launches Semgrep scans as background Docker jobs and exposes tools such as: + +```text +repo_list +repo_security_scan_start +repo_security_scan_status +repo_security_scan_result +repo_security_scan_list_jobs +``` + +## Docker Requirements for Repo Security Scan + +The repo scan MCP requires: + +1. Docker CLI inside the MCP Hub container +2. Docker socket mounted from the host +3. 
Repositories mounted at `/repos` +4. Correct `HOST_REPOS_BASE_DIR` +5. Semgrep image available or pullable + +The Dockerfile provides the Docker CLI: + +```dockerfile +COPY --from=docker:cli /usr/local/bin/docker /usr/local/bin/docker +``` + +The compose file provides the Docker socket: + +```yaml +- /var/run/docker.sock:/var/run/docker.sock +``` + +The compose file provides repository access: + +```yaml +- /opt/redback/repos:/repos +``` + +## Recommended Repo Security Scan Environment + +In `mcp_settings.json`, the `repo-security-scan` server should set environment variables similar to: + +```json +{ + "REPOS_BASE_DIR": "/repos", + "HOST_REPOS_BASE_DIR": "/opt/redback/repos", + "JOBS_BASE_DIR": "/repos/logs/security-jobs", + "SEMGREP_IMAGE": "semgrep/semgrep:1.159.0", + "SEMGREP_JOBS": "2", + "SEMGREP_PER_FILE_TIMEOUT": "2", + "SEMGREP_TIMEOUT_THRESHOLD": "1", + "SEMGREP_MAX_TARGET_BYTES": "500000", + "JOB_RETENTION_LIMIT": "100" +} +``` + +If you want `/repos` to be read-only, use a separate job directory: + +```json +{ + "JOBS_BASE_DIR": "/jobs" +} +``` + +And mount: + +```yaml +- /opt/redback/repos:/repos:ro +- /opt/redback/security-jobs:/jobs +``` + +## Suggested Directory Layout + +```text +mcp-hub/ +├── docker-compose.yaml +├── Dockerfile +├── .env +├── mcp_settings.json +├── entrypoint-proxy.sh +├── proxychains.conf +├── nodefetch.sh +└── repo_scan_job_mcp.js +``` + +Host repositories are stored at: + +```text +/opt/redback/repos +``` + +Inside the MCP Hub container they appear as: + +```text +/repos +``` + +## Build and Start + +Build the custom image: + +```bash +docker compose build mcphub +``` + +Start MCP Hub: + +```bash +docker compose up -d +``` + +Follow logs: + +```bash +docker compose logs -f mcphub +``` + +Check status: + +```bash +docker compose ps +``` + +## Rebuild After Dockerfile Changes + +If you change the Dockerfile, rebuild without cache: + +```bash +docker compose build --no-cache mcphub +docker compose up -d mcphub +``` + +## Restart 
After Settings Changes + +If you change `mcp_settings.json`: + +```bash +docker compose restart mcphub +``` + +Then check logs: + +```bash +docker compose logs -f mcphub +``` + +## Validate Container Basics + +Enter the container: + +```bash +docker exec -it mcp-hub-mcphub-1 sh +``` + +Check proxy environment: + +```bash +env | grep -i proxy +``` + +Check Docker CLI: + +```bash +docker version +docker ps +``` + +Check repository mount: + +```bash +ls -la /repos +``` + +Check MCP settings: + +```bash +ls -la /app/mcp_settings.json +``` + +Check repo scan script: + +```bash +ls -la /app/repo_scan_job_mcp.js +``` + +Check proxychains: + +```bash +which proxychains4 +cat /etc/proxychains.conf +``` + +## Validate Playwright + +Inside the container: + +```bash +npx playwright --version +``` + +Check installed Chromium browser files: + +```bash +ls -la /root/.cache/ms-playwright || true +``` + +If Playwright complains that Chromium is missing, rebuild the image: + +```bash +docker compose build --no-cache mcphub +docker compose up -d mcphub +``` + +## Validate Docker Socket Access + +Inside the MCP Hub container: + +```bash +docker ps +``` + +If this fails, check: + +```bash +ls -la /var/run/docker.sock +``` + +If permission is denied, the container user may not have access to the Docker socket. + +Possible fixes include: + +- Run the container as a user with access to the Docker socket +- Adjust Docker socket group permissions +- Use a Docker socket proxy +- Run the MCP Hub container as root if appropriate for the environment + +## Validate Repo Scan Docker Launch + +Inside the MCP Hub container: + +```bash +docker run --rm -v /opt/redback/repos:/repos:ro semgrep/semgrep:1.159.0 semgrep --version +``` + +If the Semgrep image cannot be pulled, check proxy access. + +If the mount path is wrong, check that `/opt/redback/repos` exists on the Docker host. 
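
The `-v` source in the command above is the host path rather than `/repos` because the request travels over the mounted socket to the host's Docker daemon, which resolves bind-mount sources on the host filesystem. A minimal sketch of that translation follows, assuming the `REPOS_BASE_DIR` and `HOST_REPOS_BASE_DIR` values shown earlier; `to_host_path` is a hypothetical helper and the actual logic in `repo_scan_job_mcp.js` may differ.

```bash
#!/usr/bin/env bash
# Illustrative only: to_host_path is a hypothetical helper.

to_host_path() {
  local container_path="$1"
  local repos_base="${REPOS_BASE_DIR:-/repos}"
  local host_base="${HOST_REPOS_BASE_DIR:-/opt/redback/repos}"

  case "$container_path" in
    # The base directory maps to the host base directory.
    "$repos_base") printf '%s\n' "$host_base" ;;
    # Anything below the base keeps its relative part.
    "$repos_base"/*) printf '%s\n' "$host_base/${container_path#"$repos_base"/}" ;;
    # Paths outside the mounted tree have no host equivalent here.
    *) return 1 ;;
  esac
}
```

With the defaults, `to_host_path /repos/demo` prints `/opt/redback/repos/demo` (`demo` is a placeholder repository name).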
+ +## Validate Internal HTTP Access + +Run the provided Node fetch test: + +```bash +./nodefetch.sh +``` + +Or directly: + +```bash +docker exec -it mcp-hub-mcphub-1 sh -lc 'node -e " +(async () => { + try { + const r = await fetch(\"http://10.137.17.254:4045/health\", { + headers: { Authorization: \"redacted\"} + }); + console.log(\"status\", r.status); + console.log(await r.text()); + } catch (e) { + console.error(\"FETCH_ERR\", e); + } +})();"' +``` + +## Troubleshooting + +### MCP Hub Does Not Start + +Check logs: + +```bash +docker compose logs mcphub +``` + +Common causes: + +- Invalid `mcp_settings.json` +- Entrypoint script not executable +- Proxychains installation failure +- Missing proxy environment variables +- `pnpm start` failing inside the base image + +Check script permissions: + +```bash +chmod +x entrypoint-proxy.sh +``` + +### Proxychains Cannot Connect + +Inspect proxychains config: + +```bash +docker exec -it mcp-hub-mcphub-1 cat /etc/proxychains.conf +``` + +Check the proxy host: + +```bash +docker exec -it mcp-hub-mcphub-1 getent hosts proxy1.it.deakin.edu.au +``` + +Check connection to proxy: + +```bash +docker exec -it mcp-hub-mcphub-1 sh -lc 'nc -vz proxy1.it.deakin.edu.au 3128' +``` + +If `nc` is not installed, use another debug container or install netcat temporarily. + +### Proxy Recursion + +Proxy recursion can happen if the container tries to reach the proxy through the proxy. 
+ +This is why the proxy IP is excluded: + +```conf +localnet 10.137.0.162/32 +``` + +Also ensure it appears in `NO_PROXY`: + +```text +10.137.0.162,proxy1.it.deakin.edu.au +``` + +### Internal MCP Server Connection Fails + +For internal IPs such as: + +```text +10.137.17.254 +``` + +make sure they are in: + +```env +NO_PROXY +MCPHUB_NO_PROXY +``` + +And that proxychains excludes the relevant network: + +```conf +localnet 10.0.0.0/8 +``` + +### Node Fetch Shows `bad port` + +This can happen when proxy environment variables or Node/undici proxy handling are malformed. + +Check: + +```bash +docker exec -it mcp-hub-mcphub-1 env | grep -i proxy +``` + +Make sure proxy values look like: + +```text +http://proxy1.it.deakin.edu.au:3128 +``` + +and not like: + +```text +proxy1.it.deakin.edu.au:3128 +``` + +or values with hidden characters. + +### MCP Server Bound to Localhost Only + +If an HTTP MCP server runs in another container and binds only to `127.0.0.1`, MCP Hub will not be able to reach it. 
+ +The target MCP server should bind to: + +```text +0.0.0.0 +``` + +For FastMCP-based servers, this is often: + +```env +FASTMCP_HOST=0.0.0.0 +``` + +### Docker CLI Missing + +Check: + +```bash +docker exec -it mcp-hub-mcphub-1 which docker +``` + +If missing, rebuild the image: + +```bash +docker compose build --no-cache mcphub +docker compose up -d mcphub +``` + +### Docker Socket Missing + +Check: + +```bash +docker exec -it mcp-hub-mcphub-1 ls -la /var/run/docker.sock +``` + +If missing, confirm the compose volume: + +```yaml +- /var/run/docker.sock:/var/run/docker.sock +``` + +### Repositories Missing + +Check on host: + +```bash +ls -la /opt/redback/repos +``` + +Check inside container: + +```bash +docker exec -it mcp-hub-mcphub-1 ls -la /repos +``` + +### Playwright Browser Missing + +Rebuild the image: + +```bash +docker compose build --no-cache mcphub +docker compose up -d mcphub +``` + +Then check logs for: + +```text +npx playwright install chromium +``` + +## Security Notes + +This deployment is powerful because MCP Hub has: + +- Access to many MCP integrations +- Access to repositories under `/repos` +- Access to the Docker socket +- Access to internal network services +- Proxy-enabled outbound network access + +Important safeguards: + +- Keep MCP Hub on a trusted network. +- Do not expose port `3003` to untrusted clients. +- Protect `mcp_settings.json`, `.env`, and any credentials. +- Treat Docker socket access as equivalent to privileged host access. +- Keep repository mounts read-only where possible. +- Prefer separate writable job storage for scan outputs. +- Avoid logging secrets in test scripts or MCP server output. + +## Recommended Hardening + +Consider the following improvements: + +- Put MCP Hub behind authentication. +- Use a reverse proxy with TLS. +- Restrict Docker socket access using a Docker socket proxy. +- Make `/opt/redback/repos` read-only in MCP Hub if possible. +- Use a separate `/jobs` volume for security scan outputs. 
+- Split high-risk MCP tools into a separate hub instance. +- Limit exposed tools per integration instead of using `"tools": "all"` everywhere. +- Store secrets in a secret manager or Docker secrets where practical. +- Add healthchecks for MCP Hub and critical MCP servers. +- Pin container image versions instead of using `latest`. +- Keep a known-good backup of `mcp_settings.json`. + +## Useful Commands + +### Start + +```bash +docker compose up -d +``` + +### Stop + +```bash +docker compose down +``` + +### Restart + +```bash +docker compose restart mcphub +``` + +### Logs + +```bash +docker compose logs -f mcphub +``` + +### Rebuild + +```bash +docker compose build --no-cache mcphub +docker compose up -d mcphub +``` + +### Shell + +```bash +docker exec -it mcp-hub-mcphub-1 sh +``` + +### Check Enabled Servers in Settings + +```bash +docker exec -it mcp-hub-mcphub-1 sh -lc 'cat /app/mcp_settings.json' +``` + +### Check Proxy + +```bash +docker exec -it mcp-hub-mcphub-1 env | grep -i proxy +``` + +### Check Docker Access + +```bash +docker exec -it mcp-hub-mcphub-1 docker ps +``` + +### Check Repositories + +```bash +docker exec -it mcp-hub-mcphub-1 ls -la /repos +``` + +## Summary + +This MCP Hub configuration provides a proxy-aware MCP integration layer for a private AI stack. + +It includes: + +- MCP Hub exposed on host port `3003` +- Multiple MCP integrations enabled +- Playwright Chromium support +- Docker CLI support +- Docker socket access for repo scan jobs +- Repository access through `/repos` +- Custom proxychains entrypoint +- Corporate proxy support +- Internal network bypass rules +- A Node fetch test for validating internal HTTP connectivity + +This setup is suitable for an internal lab or trusted private AI environment where MCP tools need access to repositories, browsers, databases, security systems, and internal HTTP services. 
diff --git a/PrivateAI/mcphub/nodefetch.sh b/PrivateAI/mcphub/nodefetch.sh new file mode 100644 index 0000000..af2195b --- /dev/null +++ b/PrivateAI/mcphub/nodefetch.sh @@ -0,0 +1,12 @@ +docker exec -it mcp-hub-mcphub-1 sh -lc 'node -e " +(async () => { + try { + const r = await fetch(\"http://10.137.17.254:4045/health\", { + headers: { Authorization: \"98bb99ab7db3f86cdaba0276c9b913890cacfa92fa7514a01a52c3792d1753e3\"} + }); + console.log(\"status\", r.status); + console.log(await r.text()); + } catch (e) { + console.error(\"FETCH_ERR\", e); + } +})();"' diff --git a/PrivateAI/mcphub/proxychains.conf b/PrivateAI/mcphub/proxychains.conf new file mode 100644 index 0000000..f5f3649 --- /dev/null +++ b/PrivateAI/mcphub/proxychains.conf @@ -0,0 +1,17 @@ +strict_chain +proxy_dns +tcp_read_time_out 15000 +tcp_connect_time_out 8000 + +# IMPORTANT: do NOT proxy connections to the proxy itself (avoid recursion) +localnet 10.137.0.162/32 + +# Also keep local traffic unproxied +localnet 127.0.0.0/8 +localnet 10.0.0.0/8 +localnet 172.16.0.0/12 +localnet 192.168.0.0/16 + +[ProxyList] +http 10.137.0.162 3128 + diff --git a/PrivateAI/mcphub/readme.md b/PrivateAI/mcphub/readme.md new file mode 100644 index 0000000..413d078 --- /dev/null +++ b/PrivateAI/mcphub/readme.md @@ -0,0 +1,288 @@ +# MCPHub — Install & Configuration Guide + +This folder contains a small **Docker Compose** deployment for **MCPHub** plus supporting configuration files for running behind a corporate proxy (e.g. Deakin). +It is designed so you can unpack the tarball, adjust a few values, and run the service reliably. + +Before starting, copy the env file to `.env`. + +> **Security note:** The provided configs include placeholders and/or example secrets (API keys, bearer tokens, hashed admin password). +> Treat them as **defaults** and rotate/replace before exposing the service beyond a trusted network. 
+ +--- + +## Contents + +After extracting `mcphub.tar`, you should have: + +```text +mcphub/ +├── docker-compose.yaml +├── entrypoint-proxy.sh +├── proxychains.conf +└── mcp_settings.json +``` + +--- + +## Quick install + +### 1) Extract + +```bash +tar -xf mcphub.tar +cd mcphub +``` + +### 2) Create an `.env` file + +Create `mcphub/.env` (recommended) with your proxy settings: + +```env +# Outbound proxy (optional but required in restricted networks) +HTTP_PROXY=http://proxy1.it.deakin.edu.au:3128 +HTTPS_PROXY=http://proxy1.it.deakin.edu.au:3128 + +# Internal destinations that must NOT be proxied +NO_PROXY=localhost,127.0.0.1,::1,mcphub,proxy1.it.deakin.edu.au,10.137.0.162,api.mcprouter.to +``` + +If you’re not behind a proxy, you can omit `HTTP_PROXY` / `HTTPS_PROXY` and rely on direct outbound access. + +### 3) Start MCPHub + +```bash +docker compose up -d +``` + +### 4) Verify + +The Compose file maps container port **3000** to host port **3003**: + +- MCPHub UI/API: `http://:3003` + +Check logs: + +```bash +docker compose logs -f --tail=200 +``` + +--- + +## Configuration files + +## `docker-compose.yaml` + +**Purpose:** Runs the `samanhappy/mcphub` image with local configuration injected via bind mounts, plus proxy environment variables. + +Key sections: + +- **`image: samanhappy/mcphub`** + Uses a prebuilt MCPHub container image. + +- **`ports: "3003:3000"`** + Exposes MCPHub on **host port 3003**. + +- **`volumes:`** + Mounts your configuration into the container: + - `./mcp_settings.json` → `/app/mcp_settings.json` *(read-write)* + Main MCPHub configuration (servers, users, routing, providers). + - `./entrypoint-proxy.sh` → `/app/entrypoint-proxy.sh` *(read-only)* + Wrapper entrypoint to ensure proxy support works inside container. + - `./proxychains.conf` → `/etc/proxychains.conf` *(read-only)* + Proxychains config (forces outbound TCP via your proxy). 
+ +- **`environment:`** + - `HTTP_PROXY`, `HTTPS_PROXY` are passed through from `.env` + - `NO_PROXY` ensures internal calls (including Docker DNS names) are not proxied + +- **`extra_hosts:`** + - `"proxy1.it.deakin.edu.au:10.137.0.162"` + Forces resolution of the proxy hostname inside the container (useful if DNS can’t resolve it). + +- **`entrypoint:` / `command:`** + - Entry is overridden to run `/app/entrypoint-proxy.sh` + - Then runs MCPHub via `pnpm start` + +--- + +## `entrypoint-proxy.sh` + +**Purpose:** Makes proxy behaviour reliable inside the container by: + +1. Normalising proxy env vars (`HTTP_PROXY` → `http_proxy`, etc.) +2. Configuring `apt` to use the proxy (if `apt-get` exists) +3. Installing `proxychains4` (or `proxychains-ng`) if missing +4. Ensuring a proxychains config exists (uses `/etc/proxychains.conf` or generates one from proxy env vars) +5. Running the container’s original entrypoint under proxychains: + +```sh +exec proxychains4 -q /usr/local/bin/entrypoint.sh "$@" +``` + +**Why this matters:** Some tools ignore `HTTP_PROXY`/`HTTPS_PROXY` for certain network calls. +Proxychains forces TCP connections through the proxy when needed. + +--- + +## `proxychains.conf` + +**Purpose:** Defines how proxychains routes traffic. + +Notable directives: + +- `strict_chain` + Use the proxies in the listed order. +- `proxy_dns` + Resolve DNS through proxychains (helps in locked-down DNS scenarios). +- Timeouts: + - `tcp_read_time_out 15000` + - `tcp_connect_time_out 8000` + +Local network bypasses (important): + +- `localnet 10.137.0.162/32` + Prevents proxying traffic *to the proxy itself* (avoids recursion). +- Also bypasses: + - `127.0.0.0/8` + - `10.0.0.0/8` + - `172.16.0.0/12` + - `192.168.0.0/16` + +Proxy list: + +- `http 10.137.0.162 3128` + +If your proxy changes, update this file (or rely on auto-generation from env vars by removing the mounted file). + +--- + +## `mcp_settings.json` + +**Purpose:** The main MCPHub configuration file. 
+ +It contains several top-level sections: + +### `mcpServers` +Defines MCP servers MCPHub can launch and route to. This file includes examples such as: + +- `playwright` / `playwright-mcp` (Playwright MCP server) +- `fetch` / `fetch-mcp` (fetch MCP server) +- `time` / `time-mcp` (time MCP server) +- `slack` (Slack MCP server; requires tokens) +- `sequential-thinking` (reasoning helper server) +- `mindmap` (mindmap server) +- `amap` (Amap maps server; requires API key) + +Each server entry generally looks like: + +```json +{ + "command": "npx", + "args": ["-y", "@some/package"], + "env": { "SOME_KEY": "your-value" } +} +``` + +**What to edit:** +- Replace placeholder API keys (e.g. `SLACK_BOT_TOKEN`, `AMAP_MAPS_API_KEY`) +- Remove servers you don’t want MCPHub to expose +- Pin versions if you need reproducible deployments + +### `users` +Defines MCPHub users. The sample includes an `admin` user with a **bcrypt hashed password**. + +**What to edit:** +- Replace the default password hash with your own +- Consider disabling password auth if you are using bearer/OAuth only (depends on your deployment model) + +### `systemConfig` +Controls platform-wide behaviour. Notable subsections include: + +- **`routing`** + - `enableGlobalRoute`: global routing on/off + - `enableGroupNameRoute`: group-based routing on/off + - `enableBearerAuth`: bearer auth on/off + - `bearerAuthKey`: bearer token key (rotate before public exposure) + +- **`install`** + - `baseUrl`: base URL used by MCPHub (ensure it matches your deployment) + - `pythonIndexUrl` / `npmRegistry`: optional private registries + +- **`oauthServer`** + Enables an embedded OAuth server and controls lifetimes and registration behaviour. + If you don’t need OAuth, set `enabled` to `false`. + +- **`mcpRouter`** + Contains settings for the upstream routing API (including API key, base URL, referer/title). + Treat API keys here as secrets. + +### `providers` +Defines LLM providers MCPHub can talk to. 
The sample includes a LocalAI provider entry (OpenAI-compatible): +- `base_url`: your LocalAI endpoint +- `models`: list of model IDs you want available through this provider + +Update this to match your LocalAI host and model list. + +### `groups` +Defines groups and which servers/tools are available to each group. + +### `bearerKeys` +Defines bearer tokens and access scoping: +- `accessType: all` allows everything (tighten if needed) +- `allowedGroups` / `allowedServers` can restrict access + +--- + +## Recommended hardening (if exposing beyond LAN) + +- Put MCPHub behind a reverse proxy (Nginx/Caddy/Traefik) with TLS +- Rotate/replace: + - bearer auth token(s) + - any `mcpRouter.apiKey` + - admin password hash + - any third-party API tokens (Slack, Amap, etc.) +- Restrict allowed servers/tools to the minimum needed +- Consider network policies/firewall rules to limit who can reach port 3003 + +--- + +## Common changes + +### Change the host port +Edit `docker-compose.yaml`: + +```yaml +ports: + - "3003:3000" +``` + +For example, to run on 8088: + +```yaml +ports: + - "8088:3000" +``` + +### Change the proxy target +Update `proxychains.conf` and/or `.env`, plus `extra_hosts` if required. + +--- + +## Troubleshooting + +### Proxy loops / “connection refused” to proxy +Make sure `proxychains.conf` includes a `localnet` rule for the proxy IP itself (it already does for `10.137.0.162/32`). + +### NPM/Python installs fail +- Confirm `HTTP_PROXY`/`HTTPS_PROXY` are correct +- If you use private registries, set `systemConfig.install.npmRegistry` / `pythonIndexUrl` + +### MCP server packages change unexpectedly +Pin versions in `mcpServers` (avoid `@latest` where reproducibility matters). + +--- + +## License / Attribution + +This repository contains **deployment configuration and documentation**. +MCPHub and MCP servers remain under their respective upstream licenses. 
diff --git a/PrivateAI/mcphub/repo_scan_job_mcp.js b/PrivateAI/mcphub/repo_scan_job_mcp.js new file mode 100644 index 0000000..675fef3 --- /dev/null +++ b/PrivateAI/mcphub/repo_scan_job_mcp.js @@ -0,0 +1,669 @@ +#!/usr/bin/env node + +// +// This code was generated by OpenAI and Claude under the instruction and guidance of ejb (Richard Edwards) +// It should be used as a proof of concept and may require additional hardening/validation/testing for large-scale production use +// + +import { spawn } from "node:child_process"; +import { promises as fs } from "node:fs"; +import path from "node:path"; +import process from "node:process"; +import crypto from "node:crypto"; + +import { Server } from "/app/node_modules/@modelcontextprotocol/sdk/dist/esm/server/index.js"; +import { StdioServerTransport } from "/app/node_modules/@modelcontextprotocol/sdk/dist/esm/server/stdio.js"; +import { + CallToolRequestSchema, + ListToolsRequestSchema +} from "/app/node_modules/@modelcontextprotocol/sdk/dist/esm/types.js"; + +const REPOS_BASE_DIR = process.env.REPOS_BASE_DIR || "/repos"; +const HOST_REPOS_BASE_DIR = process.env.HOST_REPOS_BASE_DIR || "/opt/redback/repos"; +const JOBS_BASE_DIR = process.env.JOBS_BASE_DIR || path.join(REPOS_BASE_DIR, "logs", "security-jobs"); +const SEMGREP_IMAGE = process.env.SEMGREP_IMAGE || "semgrep/semgrep:1.159.0"; +const SEMGREP_JOBS = String(process.env.SEMGREP_JOBS || "2"); +const SEMGREP_PER_FILE_TIMEOUT = String(process.env.SEMGREP_PER_FILE_TIMEOUT || "2"); +const SEMGREP_TIMEOUT_THRESHOLD = String(process.env.SEMGREP_TIMEOUT_THRESHOLD || "1"); +const SEMGREP_MAX_TARGET_BYTES = String(process.env.SEMGREP_MAX_TARGET_BYTES || "500000"); +const JOB_RETENTION_LIMIT = Number(process.env.JOB_RETENTION_LIMIT || "100"); + +function normalizeRepoInput(repo) { + if (!repo || typeof repo !== "string") { + throw new Error("Missing required field: repo"); + } + + let value = repo.trim(); + + if (value === "/repos") { + throw new Error("repo must identify a 
repository, not /repos itself"); + } + + if (value.startsWith("/repos/")) { + value = value.slice("/repos/".length); + } + + value = value.replace(/^\/+/, "").replace(/\/+$/, ""); + + if (!value) { + throw new Error("repo must not be empty"); + } + + if (value.includes("\\") || value.includes("..") || value.includes("/")) { + throw new Error( + "repo must be a repository directory name under /repos, for example 'redback-smartbike-iot'" + ); + } + + return value; +} + +function normalizeScope(scope) { + if (scope === undefined || scope === null || scope === "") { + return "."; + } + if (typeof scope !== "string") { + throw new Error("scope must be a string"); + } + return scope.trim() || "."; +} + +function profileToConfigs(profile) { + const p = String(profile || "security").toLowerCase(); + + if (p === "security") return ["--config=p/security-audit"]; + if (p === "secrets") return ["--config=p/secrets"]; + if (p === "full") { + return [ + "--config=p/security-audit", + "--config=p/secrets", + "--config=p/owasp-top-ten" + ]; + } + + throw new Error("Unsupported profile. Use one of: security, secrets, full"); +} + +function safeResolve(repo, scope = ".") { + const safeRepo = normalizeRepoInput(repo); + const safeScope = normalizeScope(scope); + + const repoPath = path.resolve(REPOS_BASE_DIR, safeRepo); + const targetPath = path.resolve(repoPath, safeScope); + + if (targetPath !== repoPath && !targetPath.startsWith(repoPath + path.sep)) { + throw new Error("Scope escapes repository root"); + } + + const relativeTarget = path.relative(repoPath, targetPath); + const containerTargetPath = + relativeTarget && relativeTarget !== "." + ? 
path.posix.join("/repos", safeRepo, relativeTarget.split(path.sep).join("/")) + : path.posix.join("/repos", safeRepo); + + return { + safeRepo, + safeScope, + repoPath, + targetPath, + containerTargetPath + }; +} + +async function pathExists(p) { + try { + await fs.access(p); + return true; + } catch { + return false; + } +} + +async function ensureDir(p) { + await fs.mkdir(p, { recursive: true }); +} + +async function readJson(filePath, fallback = null) { + try { + const text = await fs.readFile(filePath, "utf8"); + return JSON.parse(text); + } catch { + return fallback; + } +} + +async function writeJson(filePath, data) { + await fs.writeFile(filePath, JSON.stringify(data, null, 2) + "\n", "utf8"); +} + +function summarizeFindings(findings) { + const summary = { + findings_total: findings.length, + critical: 0, + high: 0, + medium: 0, + low: 0, + info: 0, + unknown: 0 + }; + + for (const finding of findings) { + const severity = String(finding.severity || "").toUpperCase(); + if (severity === "CRITICAL") summary.critical += 1; + else if (severity === "HIGH" || severity === "ERROR") summary.high += 1; + else if (severity === "MEDIUM" || severity === "WARNING") summary.medium += 1; + else if (severity === "LOW") summary.low += 1; + else if (severity === "INFO" || severity === "NOTICE") summary.info += 1; + else summary.unknown += 1; + } + + return summary; +} + +function makeJobId(repo) { + const ts = new Date().toISOString().replace(/[:.]/g, "-"); + const rand = crypto.randomBytes(4).toString("hex"); + return `scan-${ts}-${repo}-${rand}`; +} + +function getJobPaths(jobId) { + const dir = path.join(JOBS_BASE_DIR, jobId); + return { + dir, + meta: path.join(dir, "job.json"), + stdout: path.join(dir, "stdout.json"), + stderr: path.join(dir, "stderr.log") + }; +} + +async function listRepos() { + const entries = await fs.readdir(REPOS_BASE_DIR, { withFileTypes: true }); + const repos = []; + + for (const entry of entries) { + if (!entry.isDirectory()) continue; + if 
(entry.name.startsWith(".")) continue; + if (entry.name === "logs") continue; + + repos.push({ + repo: entry.name, + path: path.join(REPOS_BASE_DIR, entry.name) + }); + } + + repos.sort((a, b) => a.repo.localeCompare(b.repo)); + + return { + success: true, + repos_base_dir: REPOS_BASE_DIR, + repos + }; +} + +async function cleanupOldJobs() { + await ensureDir(JOBS_BASE_DIR); + const entries = await fs.readdir(JOBS_BASE_DIR, { withFileTypes: true }); + const dirs = []; + + for (const entry of entries) { + if (!entry.isDirectory()) continue; + const full = path.join(JOBS_BASE_DIR, entry.name); + const stat = await fs.stat(full); + dirs.push({ name: entry.name, full, mtimeMs: stat.mtimeMs }); + } + + dirs.sort((a, b) => b.mtimeMs - a.mtimeMs); + + for (const old of dirs.slice(JOB_RETENTION_LIMIT)) { + await fs.rm(old.full, { recursive: true, force: true }); + } +} + +async function startScanJob({ repo, scope = ".", profile = "security" }) { + const { + safeRepo, + safeScope, + repoPath, + targetPath, + containerTargetPath + } = safeResolve(repo, scope); + + if (!(await pathExists(repoPath))) { + throw new Error( + `Repository not found in container view: ${repoPath}. 
Call repo_list to see available repositories.` + ); + } + + if (!(await pathExists(targetPath))) { + throw new Error(`Target path does not exist in container view: ${targetPath}`); + } + + await ensureDir(JOBS_BASE_DIR); + await cleanupOldJobs(); + + const jobId = makeJobId(safeRepo); + const paths = getJobPaths(jobId); + await ensureDir(paths.dir); + + const configs = profileToConfigs(profile); + + const initialMeta = { + success: true, + job_id: jobId, + status: "queued", + repo: safeRepo, + scope: safeScope, + profile, + created_at: new Date().toISOString(), + started_at: null, + finished_at: null, + repo_path: repoPath, + target_path: targetPath, + container_target_path: containerTargetPath, + jobs_base_dir: JOBS_BASE_DIR, + host_repos_base_dir: HOST_REPOS_BASE_DIR, + image: SEMGREP_IMAGE, + pid: null, + exit_code: null, + summary: null, + error: null + }; + + await writeJson(paths.meta, initialMeta); + await fs.writeFile(paths.stderr, "", "utf8"); + + const dockerArgs = [ + "run", + "--rm", + "-v", + `${HOST_REPOS_BASE_DIR}:/repos:ro`, + "-w", + "/repos", + "-e", + "SEMGREP_ENABLE_VERSION_CHECK=0", + "-e", + "SEMGREP_SEND_METRICS=off", + "-e", + `HTTP_PROXY=${process.env.HTTP_PROXY || ""}`, + "-e", + `HTTPS_PROXY=${process.env.HTTPS_PROXY || ""}`, + "-e", + `NO_PROXY=${process.env.NO_PROXY || ""}`, + SEMGREP_IMAGE, + "semgrep", + "scan", + "--json", + "--disable-version-check", + "--metrics=off", + "--optimizations=all", + "--jobs", + SEMGREP_JOBS, + "--timeout", + SEMGREP_PER_FILE_TIMEOUT, + "--timeout-threshold", + SEMGREP_TIMEOUT_THRESHOLD, + "--max-target-bytes", + SEMGREP_MAX_TARGET_BYTES, + "--exclude=node_modules", + "--exclude=.git", + "--exclude=dist", + "--exclude=build", + "--exclude=.venv", + "--exclude=vendor", + "--exclude=coverage", + "--exclude=.next", + "--exclude=.cache", + "--exclude=target", + ...configs, + containerTargetPath + ]; + + const child = spawn("docker", dockerArgs, { + detached: true, + stdio: ["ignore", "pipe", "pipe"], + env: 
{ ...process.env } + }); + + const stdoutChunks = []; + const stderrChunks = []; + + child.stdout.on("data", (chunk) => { + stdoutChunks.push(chunk); + }); + + child.stderr.on("data", (chunk) => { + stderrChunks.push(chunk); + }); + + const runningMeta = { + ...initialMeta, + status: "running", + started_at: new Date().toISOString(), + pid: child.pid + }; + + await writeJson(paths.meta, runningMeta); + + child.on("close", async (code) => { + const stdout = Buffer.concat(stdoutChunks).toString("utf8"); + const stderr = Buffer.concat(stderrChunks).toString("utf8"); + + await fs.writeFile(paths.stderr, stderr, "utf8"); + + let finalMeta = await readJson(paths.meta, runningMeta); + finalMeta.exit_code = code; + finalMeta.finished_at = new Date().toISOString(); + + if (![0, 1].includes(code)) { + finalMeta.status = "failed"; + finalMeta.error = `Semgrep failed with exit code ${code}`; + await writeJson(paths.meta, finalMeta); + return; + } + + try { + const parsed = stdout.trim() ? JSON.parse(stdout) : {}; + await writeJson(paths.stdout, parsed); + + const findings = (parsed.results || []).map((item) => ({ + rule_id: item.check_id || null, + severity: item.extra?.severity || null, + message: item.extra?.message || null, + path: item.path || null, + start_line: item.start?.line || null, + end_line: item.end?.line || null + })); + + finalMeta.status = "completed"; + finalMeta.summary = { + ...summarizeFindings(findings), + errors_total: (parsed.errors || []).length + }; + finalMeta.error = null; + await writeJson(paths.meta, finalMeta); + } catch (err) { + finalMeta.status = "failed"; + finalMeta.error = `Failed to parse Semgrep JSON output: ${err.message}`; + await writeJson(paths.meta, finalMeta); + } + }); + + child.unref(); + + return { + success: true, + job_id: jobId, + status: "queued", + repo: safeRepo, + scope: safeScope, + profile, + created_at: initialMeta.created_at + }; +} + +async function getJobStatus({ job_id }) { + if (!job_id || typeof job_id !== 
"string") { + throw new Error("Missing required field: job_id"); + } + + const paths = getJobPaths(job_id); + const meta = await readJson(paths.meta); + + if (!meta) { + throw new Error(`Job not found: ${job_id}`); + } + + return { + success: true, + job_id, + status: meta.status, + created_at: meta.created_at, + started_at: meta.started_at, + finished_at: meta.finished_at, + repo: meta.repo, + scope: meta.scope, + profile: meta.profile, + summary: meta.summary, + error: meta.error, + exit_code: meta.exit_code, + pid: meta.pid + }; +} + +async function getJobResult({ job_id }) { + if (!job_id || typeof job_id !== "string") { + throw new Error("Missing required field: job_id"); + } + + const paths = getJobPaths(job_id); + const meta = await readJson(paths.meta); + + if (!meta) { + throw new Error(`Job not found: ${job_id}`); + } + + if (meta.status !== "completed") { + return { + success: false, + job_id, + status: meta.status, + error: meta.error || "Job is not completed yet" + }; + } + + const parsed = await readJson(paths.stdout, {}); + const findings = (parsed.results || []).map((item) => ({ + rule_id: item.check_id || null, + severity: item.extra?.severity || null, + message: item.extra?.message || null, + path: item.path || null, + start_line: item.start?.line || null, + end_line: item.end?.line || null + })); + + const stderr = await fs.readFile(paths.stderr, "utf8").catch(() => ""); + + return { + success: true, + job_id, + status: meta.status, + tool: "semgrep", + image: meta.image, + repo: meta.repo, + scope: meta.scope, + profile: meta.profile, + created_at: meta.created_at, + started_at: meta.started_at, + finished_at: meta.finished_at, + repo_path: meta.repo_path, + target_path: meta.target_path, + container_target_path: meta.container_target_path, + summary: meta.summary, + findings, + errors: parsed.errors || [], + stderr + }; +} + +async function listJobs() { + await ensureDir(JOBS_BASE_DIR); + const entries = await fs.readdir(JOBS_BASE_DIR, { 
withFileTypes: true }); + const jobs = []; + + for (const entry of entries) { + if (!entry.isDirectory()) continue; + const meta = await readJson(path.join(JOBS_BASE_DIR, entry.name, "job.json")); + if (!meta) continue; + + jobs.push({ + job_id: entry.name, + status: meta.status, + repo: meta.repo, + scope: meta.scope, + profile: meta.profile, + created_at: meta.created_at, + started_at: meta.started_at, + finished_at: meta.finished_at, + summary: meta.summary, + error: meta.error + }); + } + + jobs.sort((a, b) => String(b.created_at).localeCompare(String(a.created_at))); + + return { + success: true, + jobs + }; +} + +const server = new Server( + { + name: "repo-scan-job-mcp", + version: "1.0.0" + }, + { + capabilities: { + tools: {} + } + } +); + +server.setRequestHandler(ListToolsRequestSchema, async () => { + return { + tools: [ + { + name: "repo_list", + description: "List available repository folders under /repos.", + inputSchema: { + type: "object", + properties: {} + } + }, + { + name: "repo_security_scan_start", + description: + "Start a background Semgrep scan job for a repository under /repos. Returns a job_id immediately.", + inputSchema: { + type: "object", + properties: { + repo: { + type: "string", + description: + "Repository name under /repos, for example 'redback-smartbike-iot'. '/repos/redback-smartbike-iot' is also accepted." + }, + scope: { + type: "string", + description: + "Optional file or subdirectory inside the repository. Use '.' for the whole repository.", + default: "." 
+ }, + profile: { + type: "string", + description: "Scan profile", + enum: ["security", "secrets", "full"], + default: "security" + } + }, + required: ["repo"] + } + }, + { + name: "repo_security_scan_status", + description: "Check the status of a previously started scan job.", + inputSchema: { + type: "object", + properties: { + job_id: { + type: "string", + description: "Job ID returned by repo_security_scan_start" + } + }, + required: ["job_id"] + } + }, + { + name: "repo_security_scan_result", + description: "Fetch the result of a completed scan job.", + inputSchema: { + type: "object", + properties: { + job_id: { + type: "string", + description: "Job ID returned by repo_security_scan_start" + } + }, + required: ["job_id"] + } + }, + { + name: "repo_security_scan_list_jobs", + description: "List recent scan jobs and their statuses.", + inputSchema: { + type: "object", + properties: {} + } + } + ] + }; +}); + +server.setRequestHandler(CallToolRequestSchema, async (request) => { + const { name, arguments: args } = request.params; + + try { + let result; + + if (name === "repo_list") { + result = await listRepos(); + } else if (name === "repo_security_scan_start") { + result = await startScanJob(args || {}); + } else if (name === "repo_security_scan_status") { + result = await getJobStatus(args || {}); + } else if (name === "repo_security_scan_result") { + result = await getJobResult(args || {}); + } else if (name === "repo_security_scan_list_jobs") { + result = await listJobs(); + } else { + return { + content: [ + { + type: "text", + text: JSON.stringify( + { success: false, error: `Unknown tool: ${name}` }, + null, + 2 + ) + } + ], + isError: true + }; + } + + return { + content: [ + { + type: "text", + text: JSON.stringify(result, null, 2) + } + ] + }; + } catch (err) { + return { + content: [ + { + type: "text", + text: JSON.stringify( + { success: false, error: err.message }, + null, + 2 + ) + } + ], + isError: true + }; + } +}); + +const transport = new 
StdioServerTransport(); +await server.connect(transport); diff --git a/PrivateAI/mcphub/repo_scan_job_mcp.md b/PrivateAI/mcphub/repo_scan_job_mcp.md new file mode 100644 index 0000000..4ed3882 --- /dev/null +++ b/PrivateAI/mcphub/repo_scan_job_mcp.md @@ -0,0 +1,1052 @@ +# Repo Scan Job MCP + +This service is a proof-of-concept Model Context Protocol (MCP) server for launching and managing Semgrep repository scan jobs. + +It exposes MCP tools over stdio and lets an MCP client: + +- List repositories mounted under `/repos` +- Start a Semgrep scan job against a repository +- Check scan job status +- Retrieve completed scan results +- List recent scan jobs + +The implementation is intended for use inside an MCP hub or similar orchestration container where repositories are mounted read-only and Semgrep is executed in a separate Docker container. + +## Important Notice + +This code was generated by OpenAI and Claude with instruction and guidance from ejb / Richard Edwards. + +It should be treated as a proof of concept. + +Additional hardening, validation, testing, access control, and production readiness work may be required before using it at large scale or in security-sensitive environments. + +## What This MCP Server Does + +The server provides an MCP interface around Semgrep scanning. + +Rather than running a long Semgrep scan directly inside the MCP tool call, it starts a background job and immediately returns a `job_id`. + +The user or client can then poll the job status and fetch the result once the scan completes. + +This avoids blocking the MCP client while Semgrep scans larger repositories. + +## Runtime Model + +The MCP server itself runs as a Node.js process. + +When a scan starts, it launches Semgrep using Docker: + +```bash +docker run --rm \ + -v "${HOST_REPOS_BASE_DIR}:/repos:ro" \ + -w /repos \ + semgrep/semgrep:1.159.0 \ + semgrep scan ... 
+``` + +This means the MCP server container needs access to the Docker CLI and Docker socket if it is running inside Docker. + +## Main Paths + +### Repository Base Directory Inside the MCP Container + +```text +/repos +``` + +Controlled by: + +```env +REPOS_BASE_DIR=/repos +``` + +This is where the MCP server looks for available repositories. + +### Repository Base Directory on the Docker Host + +```text +/opt/redback/repos +``` + +Controlled by: + +```env +HOST_REPOS_BASE_DIR=/opt/redback/repos +``` + +This path is passed into the Semgrep scan container using Docker volume mounting. + +The MCP server sees repositories through `REPOS_BASE_DIR`, while the spawned Semgrep container sees the host path through `HOST_REPOS_BASE_DIR`. + +These must point to the same repository tree from their respective viewpoints. + +### Job Storage Directory + +Default: + +```text +/repos/logs/security-jobs +``` + +Controlled by: + +```env +JOBS_BASE_DIR=/repos/logs/security-jobs +``` + +Each scan job gets its own directory containing: + +```text +job.json +stdout.json +stderr.log +``` + +## Environment Variables + +### `REPOS_BASE_DIR` + +Default: + +```env +REPOS_BASE_DIR=/repos +``` + +The directory inside the MCP server container where repositories are visible. + +### `HOST_REPOS_BASE_DIR` + +Default: + +```env +HOST_REPOS_BASE_DIR=/opt/redback/repos +``` + +The host-side directory passed to the spawned Semgrep Docker container. + +### `JOBS_BASE_DIR` + +Default: + +```env +JOBS_BASE_DIR=/repos/logs/security-jobs +``` + +Where job metadata and results are stored. + +### `SEMGREP_IMAGE` + +Default: + +```env +SEMGREP_IMAGE=semgrep/semgrep:1.159.0 +``` + +The Semgrep container image used when running scan jobs. + +### `SEMGREP_JOBS` + +Default: + +```env +SEMGREP_JOBS=2 +``` + +Controls Semgrep parallelism. 
+ +Passed to Semgrep as: + +```bash +--jobs 2 +``` + +### `SEMGREP_PER_FILE_TIMEOUT` + +Default: + +```env +SEMGREP_PER_FILE_TIMEOUT=2 +``` + +Controls the Semgrep per-file timeout. + +Passed to Semgrep as: + +```bash +--timeout 2 +``` + +### `SEMGREP_TIMEOUT_THRESHOLD` + +Default: + +```env +SEMGREP_TIMEOUT_THRESHOLD=1 +``` + +Controls Semgrep timeout threshold behaviour. + +Passed to Semgrep as: + +```bash +--timeout-threshold 1 +``` + +### `SEMGREP_MAX_TARGET_BYTES` + +Default: + +```env +SEMGREP_MAX_TARGET_BYTES=500000 +``` + +Controls the maximum file size scanned by Semgrep. + +Passed to Semgrep as: + +```bash +--max-target-bytes 500000 +``` + +### `JOB_RETENTION_LIMIT` + +Default: + +```env +JOB_RETENTION_LIMIT=100 +``` + +Limits the number of retained job directories. + +Older job directories beyond this limit are removed when a new job starts. + +### Proxy Variables + +The spawned Semgrep container receives: + +```env +HTTP_PROXY +HTTPS_PROXY +NO_PROXY +``` + +This allows Semgrep to operate in proxy-controlled environments. + +## MCP Server Identity + +The MCP server registers as: + +```json +{ + "name": "repo-scan-job-mcp", + "version": "1.0.0" +} +``` + +It exposes tool capabilities only. + +## MCP Tools + +## `repo_list` + +Lists available repository folders under `/repos`. + +### Input + +```json +{} +``` + +### Output + +```json +{ + "success": true, + "repos_base_dir": "/repos", + "repos": [ + { + "repo": "example-repo", + "path": "/repos/example-repo" + } + ] +} +``` + +### Notes + +The repository list excludes: + +- Hidden directories +- The `logs` directory + +This prevents job logs from appearing as repositories. + +## `repo_security_scan_start` + +Starts a background Semgrep scan job for a repository under `/repos`. + +The tool returns immediately with a `job_id`. 
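Because the start tool only returns a `job_id`, a client normally polls `repo_security_scan_status` until the job reaches a terminal state. The following is a minimal sketch of that loop, not part of the server. It assumes a hypothetical `callTool(name, args)` function, supplied by whatever MCP client is in use, that resolves to the parsed tool result. The names `waitForScan`, `isTerminal`, `intervalMs`, and `maxAttempts` are illustrative only.

```javascript
// Terminal job states, taken from the job lifecycle used by the server.
const TERMINAL_STATES = new Set(["completed", "failed"]);

function isTerminal(status) {
  return TERMINAL_STATES.has(status);
}

// Polls repo_security_scan_status until the job completes or fails.
// callTool is a hypothetical client function: (toolName, args) => Promise<object>.
async function waitForScan(callTool, jobId, { intervalMs = 2000, maxAttempts = 150 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const status = await callTool("repo_security_scan_status", { job_id: jobId });
    if (isTerminal(status.status)) {
      return status;
    }
    // Wait before the next status check to avoid hammering the server.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} status checks`);
}

// Stub standing in for a real MCP client: reports "running" twice, then "completed".
let stubCalls = 0;
const stubCallTool = async () => ({ status: ++stubCalls < 3 ? "running" : "completed" });

waitForScan(stubCallTool, "scan-example", { intervalMs: 10 }).then((finalStatus) => {
  console.log(finalStatus.status);
});
```

Capping the number of attempts keeps a client from waiting forever if a job never leaves the `running` state, for example when the server was restarted mid-scan.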
+ +### Input + +```json +{ + "repo": "example-repo", + "scope": ".", + "profile": "security" +} +``` + +### Required Fields + +```json +{ + "repo": "example-repo" +} +``` + +### Optional Fields + +```json +{ + "scope": ".", + "profile": "security" +} +``` + +### Supported Profiles + +```text +security +secrets +full +``` + +### Profile Behaviour + +#### `security` + +Uses: + +```bash +--config=p/security-audit +``` + +#### `secrets` + +Uses: + +```bash +--config=p/secrets +``` + +#### `full` + +Uses: + +```bash +--config=p/security-audit +--config=p/secrets +--config=p/owasp-top-ten +``` + +### Output + +```json +{ + "success": true, + "job_id": "scan-2026-05-18T06-00-00-000Z-example-repo-a1b2c3d4", + "status": "queued", + "repo": "example-repo", + "scope": ".", + "profile": "security", + "created_at": "2026-05-18T06:00:00.000Z" +} +``` + +## `repo_security_scan_status` + +Checks the status of a previously started scan job. + +### Input + +```json +{ + "job_id": "scan-2026-05-18T06-00-00-000Z-example-repo-a1b2c3d4" +} +``` + +### Output + +```json +{ + "success": true, + "job_id": "scan-2026-05-18T06-00-00-000Z-example-repo-a1b2c3d4", + "status": "completed", + "created_at": "2026-05-18T06:00:00.000Z", + "started_at": "2026-05-18T06:00:01.000Z", + "finished_at": "2026-05-18T06:00:15.000Z", + "repo": "example-repo", + "scope": ".", + "profile": "security", + "summary": { + "findings_total": 3, + "critical": 0, + "high": 1, + "medium": 2, + "low": 0, + "info": 0, + "unknown": 0, + "errors_total": 0 + }, + "error": null, + "exit_code": 1, + "pid": 12345 +} +``` + +## `repo_security_scan_result` + +Fetches the result of a completed scan job. 
+ +### Input + +```json +{ + "job_id": "scan-2026-05-18T06-00-00-000Z-example-repo-a1b2c3d4" +} +``` + +### Output + +```json +{ + "success": true, + "job_id": "scan-2026-05-18T06-00-00-000Z-example-repo-a1b2c3d4", + "status": "completed", + "tool": "semgrep", + "image": "semgrep/semgrep:1.159.0", + "repo": "example-repo", + "scope": ".", + "profile": "security", + "summary": { + "findings_total": 3, + "critical": 0, + "high": 1, + "medium": 2, + "low": 0, + "info": 0, + "unknown": 0, + "errors_total": 0 + }, + "findings": [ + { + "rule_id": "rule.id", + "severity": "HIGH", + "message": "Finding message", + "path": "example.js", + "start_line": 10, + "end_line": 12 + } + ], + "errors": [], + "stderr": "" +} +``` + +### Behaviour When Job Is Not Complete + +If the job is still queued or running, the result tool returns: + +```json +{ + "success": false, + "job_id": "scan-...", + "status": "running", + "error": "Job is not completed yet" +} +``` + +## `repo_security_scan_list_jobs` + +Lists recent scan jobs and their statuses. + +### Input + +```json +{} +``` + +### Output + +```json +{ + "success": true, + "jobs": [ + { + "job_id": "scan-...", + "status": "completed", + "repo": "example-repo", + "scope": ".", + "profile": "security", + "created_at": "2026-05-18T06:00:00.000Z", + "started_at": "2026-05-18T06:00:01.000Z", + "finished_at": "2026-05-18T06:00:15.000Z", + "summary": { + "findings_total": 3, + "critical": 0, + "high": 1, + "medium": 2, + "low": 0, + "info": 0, + "unknown": 0, + "errors_total": 0 + }, + "error": null + } + ] +} +``` + +## Repository Input Validation + +The `repo` parameter is deliberately restricted. + +Accepted examples: + +```text +redback-smartbike-iot +/repos/redback-smartbike-iot +``` + +Rejected examples: + +```text +/repos +../somewhere +repo/subdir +repo\subdir +``` + +The `repo` value must identify a direct repository directory under `/repos`. + +This prevents a client from escaping the repository base directory. 
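The accepted and rejected examples above can be checked with a condensed copy of the `normalizeRepoInput` function from `repo_scan_job_mcp.js`. This is a sketch for illustration: the error message for the separator check is shortened, and `isAcceptedRepo` is a hypothetical helper that does not exist in the server.

```javascript
// Condensed from normalizeRepoInput in repo_scan_job_mcp.js.
function normalizeRepoInput(repo) {
  if (!repo || typeof repo !== "string") {
    throw new Error("Missing required field: repo");
  }

  let value = repo.trim();

  if (value === "/repos") {
    throw new Error("repo must identify a repository, not /repos itself");
  }

  // Accept the "/repos/<name>" form by stripping the prefix.
  if (value.startsWith("/repos/")) {
    value = value.slice("/repos/".length);
  }

  // Drop leading and trailing slashes before the final checks.
  value = value.replace(/^\/+/, "").replace(/\/+$/, "");

  if (!value) {
    throw new Error("repo must not be empty");
  }

  // Reject anything that could name a nested or parent path.
  if (value.includes("\\") || value.includes("..") || value.includes("/")) {
    throw new Error("repo must be a repository directory name under /repos");
  }

  return value;
}

// Hypothetical helper for illustration: true when the input passes validation.
function isAcceptedRepo(input) {
  try {
    normalizeRepoInput(input);
    return true;
  } catch {
    return false;
  }
}

const samples = [
  "redback-smartbike-iot",
  "/repos/redback-smartbike-iot",
  "/repos",
  "../somewhere",
  "repo/subdir",
  "repo\\subdir"
];

for (const input of samples) {
  console.log(JSON.stringify(input), isAcceptedRepo(input) ? "accepted" : "rejected");
}
```

The first two samples are accepted and normalise to the same directory name. The remaining four are rejected.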
+ +## Scope Handling + +The `scope` parameter can be used to scan a subdirectory or file inside the repository. + +Examples: + +```json +{ + "repo": "example-repo", + "scope": "." +} +``` + +```json +{ + "repo": "example-repo", + "scope": "src" +} +``` + +```json +{ + "repo": "example-repo", + "scope": "src/server.js" +} +``` + +The server resolves the requested scope and checks that it remains inside the selected repository. + +If the scope escapes the repository root, the request is rejected. + +## Semgrep Scan Command + +The spawned Semgrep command uses: + +```bash +semgrep scan \ + --json \ + --disable-version-check \ + --metrics=off \ + --optimizations=all \ + --jobs "${SEMGREP_JOBS}" \ + --timeout "${SEMGREP_PER_FILE_TIMEOUT}" \ + --timeout-threshold "${SEMGREP_TIMEOUT_THRESHOLD}" \ + --max-target-bytes "${SEMGREP_MAX_TARGET_BYTES}" \ + --exclude=node_modules \ + --exclude=.git \ + --exclude=dist \ + --exclude=build \ + --exclude=.venv \ + --exclude=vendor \ + --exclude=coverage \ + --exclude=.next \ + --exclude=.cache \ + --exclude=target \ + --config=p/security-audit \ + /repos/example-repo +``` + +Additional configs are added depending on the selected profile. + +## Excluded Paths + +The scan excludes common generated or dependency directories: + +```text +node_modules +.git +dist +build +.venv +vendor +coverage +.next +.cache +target +``` + +This reduces noise and improves scan performance. + +## Job Lifecycle + +A job moves through these states: + +```text +queued +running +completed +failed +``` + +### `queued` + +The job metadata has been created. + +### `running` + +The Semgrep Docker process has been spawned. + +### `completed` + +Semgrep exited successfully with code `0` or with findings using code `1`. + +### `failed` + +The Semgrep process failed with an unexpected exit code, or the JSON output could not be parsed. 
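The transition into `completed` or `failed` can be summarised as a pure function of the Semgrep exit code and the captured stdout. This is a minimal sketch that mirrors the logic of the `close` handler in `repo_scan_job_mcp.js`. The name `resolveFinalStatus` and the shape of its return value are assumptions made for illustration, not part of the server.

```javascript
// Hypothetical helper: derives a job's terminal state from the exit code
// and raw stdout, following the same rules as the server's close handler.
function resolveFinalStatus(exitCode, stdout) {
  // Exit codes 0 (no findings) and 1 (findings present) are both non-fatal.
  if (![0, 1].includes(exitCode)) {
    return {
      status: "failed",
      error: `Semgrep failed with exit code ${exitCode}`
    };
  }

  try {
    // Empty output is treated as an empty result object.
    const parsed = stdout.trim() ? JSON.parse(stdout) : {};
    return {
      status: "completed",
      error: null,
      findings_total: (parsed.results || []).length
    };
  } catch (err) {
    // A non-fatal exit code with unparseable output still fails the job.
    return {
      status: "failed",
      error: `Failed to parse Semgrep JSON output: ${err.message}`
    };
  }
}

console.log(resolveFinalStatus(1, '{"results": [{"check_id": "rule.id"}]}').status);
console.log(resolveFinalStatus(2, "").status);
```

Keeping this decision in one place makes it easier to see that a job with findings is still `completed`, and that only unexpected exit codes or unparseable output lead to `failed`.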
+ +## Semgrep Exit Codes + +The wrapper treats exit codes `0` and `1` as non-fatal: + +```javascript +if (![0, 1].includes(code)) { + finalMeta.status = "failed"; +} +``` + +This is important because Semgrep may return exit code `1` when findings are present. + +## Finding Summary + +The wrapper normalises Semgrep results into a compact finding format: + +```json +{ + "rule_id": "check_id", + "severity": "HIGH", + "message": "message", + "path": "file.js", + "start_line": 10, + "end_line": 12 +} +``` + +It also generates a severity summary: + +```json +{ + "findings_total": 3, + "critical": 0, + "high": 1, + "medium": 2, + "low": 0, + "info": 0, + "unknown": 0, + "errors_total": 0 +} +``` + +Severity mapping: + +```text +CRITICAL -> critical +HIGH or ERROR -> high +MEDIUM or WARNING -> medium +LOW -> low +INFO or NOTICE -> info +Anything else -> unknown +``` + +## Suggested File Name + +```text +repo_scan_job_mcp.js +``` + +## Example Docker Compose Service + +This MCP server needs the Node.js runtime, the MCP SDK, access to repositories, and access to the Docker socket. + +Example service: + +```yaml +services: + repo-scan-job-mcp: + image: node:22 + container_name: repo-scan-job-mcp + working_dir: /app + command: ["node", "/app/repo_scan_job_mcp.js"] + environment: + - REPOS_BASE_DIR=/repos + - HOST_REPOS_BASE_DIR=/opt/redback/repos + - JOBS_BASE_DIR=/repos/logs/security-jobs + - SEMGREP_IMAGE=semgrep/semgrep:1.159.0 + - SEMGREP_JOBS=2 + - SEMGREP_PER_FILE_TIMEOUT=2 + - SEMGREP_TIMEOUT_THRESHOLD=1 + - SEMGREP_MAX_TARGET_BYTES=500000 + - JOB_RETENTION_LIMIT=100 + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=${NO_PROXY} + volumes: + - ./repo_scan_job_mcp.js:/app/repo_scan_job_mcp.js:ro + - /opt/redback/repos:/repos + - /var/run/docker.sock:/var/run/docker.sock + restart: unless-stopped +``` + +Important: the `/repos` mount is not read-only in this example because the job log directory defaults to `/repos/logs/security-jobs`. 
+ +If you want repositories to remain read-only, use a separate writable job directory: + +```yaml +environment: + - JOBS_BASE_DIR=/jobs + +volumes: + - /opt/redback/repos:/repos:ro + - /opt/redback/security-jobs:/jobs +``` + +## Recommended Safer Compose Pattern + +```yaml +services: + repo-scan-job-mcp: + image: node:22 + container_name: repo-scan-job-mcp + working_dir: /app + command: ["node", "/app/repo_scan_job_mcp.js"] + environment: + - REPOS_BASE_DIR=/repos + - HOST_REPOS_BASE_DIR=/opt/redback/repos + - JOBS_BASE_DIR=/jobs + - SEMGREP_IMAGE=semgrep/semgrep:1.159.0 + - SEMGREP_JOBS=2 + - SEMGREP_PER_FILE_TIMEOUT=2 + - SEMGREP_TIMEOUT_THRESHOLD=1 + - SEMGREP_MAX_TARGET_BYTES=500000 + - JOB_RETENTION_LIMIT=100 + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=${NO_PROXY} + volumes: + - ./repo_scan_job_mcp.js:/app/repo_scan_job_mcp.js:ro + - /opt/redback/repos:/repos:ro + - /opt/redback/security-jobs:/jobs + - /var/run/docker.sock:/var/run/docker.sock + restart: unless-stopped +``` + +## MCP Hub Configuration + +Because this MCP server uses stdio transport, it should be registered as a command-based MCP server. 
+ +Example MCP hub entry: + +```json +{ + "repo-scan-job-mcp": { + "command": "node", + "args": ["/app/repo_scan_job_mcp.js"], + "env": { + "REPOS_BASE_DIR": "/repos", + "HOST_REPOS_BASE_DIR": "/opt/redback/repos", + "JOBS_BASE_DIR": "/jobs", + "SEMGREP_IMAGE": "semgrep/semgrep:1.159.0", + "SEMGREP_JOBS": "2", + "SEMGREP_PER_FILE_TIMEOUT": "2", + "SEMGREP_TIMEOUT_THRESHOLD": "1", + "SEMGREP_MAX_TARGET_BYTES": "500000", + "JOB_RETENTION_LIMIT": "100" + } + } +} +``` + +If running inside an MCP hub container, ensure the hub container also has: + +```yaml +volumes: + - /opt/redback/repos:/repos:ro + - /opt/redback/security-jobs:/jobs + - /var/run/docker.sock:/var/run/docker.sock +``` + +## Testing the MCP Server Manually + +Start the server: + +```bash +node repo_scan_job_mcp.js +``` + +Because it uses stdio transport, it expects MCP JSON-RPC messages on stdin and writes responses to stdout. + +For practical testing, use it through an MCP client or MCP hub. + +## Operational Flow + +A typical workflow is: + +1. Call `repo_list` +2. Pick a repository name +3. Call `repo_security_scan_start` +4. Store the returned `job_id` +5. Call `repo_security_scan_status` until the job is completed +6. Call `repo_security_scan_result` +7. Review findings + +Example request sequence: + +```text +repo_list +repo_security_scan_start(repo="example-repo", profile="security") +repo_security_scan_status(job_id="scan-...") +repo_security_scan_result(job_id="scan-...") +``` + +## Security Considerations + +This service can trigger Docker containers. + +That is powerful and should be treated as sensitive. + +### Docker Socket Risk + +Mounting: + +```text +/var/run/docker.sock +``` + +into a container effectively gives that container significant control over the Docker host. + +Use this only in a trusted environment. 
+ +### Repository Access + +The scanner should normally receive repositories read-only: + +```text +/opt/redback/repos:/repos:ro +``` + +The MCP server itself needs a writable job directory, but that does not need to be inside `/repos`. + +Recommended: + +```text +/opt/redback/security-jobs:/jobs +``` + +### Input Restrictions + +The code includes basic safeguards: + +- Repository must be a direct child of `/repos` +- Repository cannot contain `..` +- Repository cannot include path separators +- Scope must resolve inside the selected repository + +These checks are useful, but production deployments should still apply authentication and authorisation controls at the MCP client or hub layer. + +### Network Access + +The spawned Semgrep container receives proxy variables. + +If outbound access is not required, restrict or remove proxy configuration. + +### Secrets Scanning + +The `secrets` and `full` profiles may detect sensitive material in repositories. + +Ensure scan results are stored securely and not exposed to untrusted users. + +## Limitations + +Current limitations include: + +- No authentication built into this MCP server. +- No per-user authorisation. +- No cancellation tool for running jobs. +- No streaming of scan progress. +- Results are stored as local files. +- Docker socket access is required for scan execution. +- Job retention is count-based, not age-based. +- No central database or multi-node job coordination. +- No separate severity thresholding or policy gate output. + +## Possible Future Improvements + +Useful future enhancements: + +- Add a `repo_security_scan_cancel` tool. +- Add age-based cleanup for old jobs. +- Add result filtering by severity. +- Add SARIF output support. +- Add Git commit metadata to job results. +- Add branch/tag awareness. +- Add allowlist/denylist repository controls. +- Add authentication/authorisation at the MCP wrapper layer. +- Add structured progress updates. +- Add optional direct Semgrep execution without Docker. 
+- Add support for custom Semgrep rules mounted from a rules directory. +- Add policy modes such as `audit`, `warn`, and `fail`. + +## Troubleshooting + +### `Repository not found` + +Run: + +```text +repo_list +``` + +Confirm the requested repository exists under `/repos`. + +Also check the container mount: + +```bash +ls -la /repos +``` + +### `Scope escapes repository root` + +The requested `scope` resolved outside the selected repository. + +Use a relative path inside the repository, such as: + +```text +. +src +app/server.js +``` + +### Semgrep Job Fails + +Check the job status: + +```text +repo_security_scan_status +``` + +Then inspect the job directory: + +```bash +ls -la /repos/logs/security-jobs/ +cat /repos/logs/security-jobs//stderr.log +cat /repos/logs/security-jobs//job.json +``` + +If using the safer `/jobs` path: + +```bash +ls -la /jobs/ +cat /jobs//stderr.log +cat /jobs//job.json +``` + +### Docker Is Not Available + +The MCP server launches scans using: + +```bash +docker run +``` + +Check that the container has Docker access: + +```bash +docker ps +``` + +If running inside Docker, mount the Docker socket: + +```yaml +- /var/run/docker.sock:/var/run/docker.sock +``` + +The container image must also include the Docker CLI. + +The plain `node:22` image may not include Docker CLI by default, so a custom image may be required. + +### Semgrep Cannot See Repositories + +Check that `HOST_REPOS_BASE_DIR` points to the correct host path. + +The spawned Semgrep container uses: + +```text +${HOST_REPOS_BASE_DIR}:/repos:ro +``` + +If `HOST_REPOS_BASE_DIR` is wrong, the MCP server may see the repo but the Semgrep container will not. + +### Proxy Problems + +Check: + +```env +HTTP_PROXY +HTTPS_PROXY +NO_PROXY +``` + +If the Semgrep image cannot reach rule sources, proxy configuration may be required. + +If local services are incorrectly routed through the proxy, add them to `NO_PROXY`. 
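The `Repository not found` and `Scope escapes repository root` errors above both come from path checks on the request. The following is a minimal sketch of what such checks could look like, not the server's actual code: `REPOS_ROOT`, `resolveRepo`, `resolveScope`, and the `Invalid repository name` message are hypothetical names chosen for illustration. It works on path strings only and does not follow symlinks.

```javascript
const path = require("node:path");

// Assumed constant: the documentation states repositories are mounted at /repos.
const REPOS_ROOT = "/repos";

// Returns the absolute repository path, or throws if the name is not a
// plain direct child of REPOS_ROOT (no "..", no path separators).
function resolveRepo(repo) {
  if (typeof repo !== "string" || repo.length === 0 || repo === ".") {
    throw new Error("Invalid repository name");
  }
  if (repo.includes("..") || repo.includes("/") || repo.includes("\\")) {
    throw new Error("Invalid repository name");
  }
  return path.posix.join(REPOS_ROOT, repo);
}

// Returns the absolute scope path, or throws if the scope resolves
// outside the selected repository.
function resolveScope(repoPath, scope = ".") {
  const resolved = path.posix.resolve(repoPath, scope);
  if (resolved !== repoPath && !resolved.startsWith(repoPath + "/")) {
    throw new Error("Scope escapes repository root");
  }
  return resolved;
}
```

With these helpers, a scope of `src` inside repository `app` resolves to `/repos/app/src`, while `../other` or an absolute path such as `/etc` is rejected.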
+ +## Summary + +This MCP server provides a job-based Semgrep scanning interface for repositories mounted under `/repos`. + +It is designed to fit into a private AI / MCP hub workflow where an assistant can request repository scans without blocking while the scan runs. + +The key design points are: + +- MCP stdio server +- Background Semgrep scan jobs +- Docker-based Semgrep execution +- Repository input validation +- Scoped scanning inside repositories +- Job metadata and result files +- Summary and finding extraction +- Configurable Semgrep profile, image, timeout, and retention behaviour diff --git a/PrivateAI/openwebui/caddy/Caddyfile b/PrivateAI/openwebui/caddy/Caddyfile new file mode 100644 index 0000000..6dea1e9 --- /dev/null +++ b/PrivateAI/openwebui/caddy/Caddyfile @@ -0,0 +1,16 @@ +{ + # Turn off auto HTTPS completely + auto_https off +} + +# HTTP → redirect to HTTPS +:80 { + redir https://10.137.17.254{uri} +} + +# HTTPS with our self-signed cert +:443 { + tls /etc/caddy/certs/local.crt /etc/caddy/certs/local.key + + reverse_proxy http://openwebui:8080 +} diff --git a/PrivateAI/openwebui/caddy/Caddyfile.orig b/PrivateAI/openwebui/caddy/Caddyfile.orig new file mode 100644 index 0000000..10003b6 --- /dev/null +++ b/PrivateAI/openwebui/caddy/Caddyfile.orig @@ -0,0 +1,3 @@ +10.137.17.254 { + reverse_proxy openwebui:3000 +} diff --git a/PrivateAI/openwebui/caddy/caddy_config/caddy/autosave.json b/PrivateAI/openwebui/caddy/caddy_config/caddy/autosave.json new file mode 100644 index 0000000..bd155e9 --- /dev/null +++ b/PrivateAI/openwebui/caddy/caddy_config/caddy/autosave.json @@ -0,0 +1 @@ 
+{"apps":{"http":{"servers":{"srv0":{"automatic_https":{"disable":true},"listen":[":443"],"routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"openwebui:8080"}]}]}],"tls_connection_policies":[{"certificate_selection":{"any_tag":["cert0"]}}]},"srv1":{"automatic_https":{"disable":true},"listen":[":80"],"routes":[{"handle":[{"handler":"static_response","headers":{"Location":["https://10.137.17.254{http.request.uri}"]},"status_code":302}]}]}}},"tls":{"certificates":{"load_files":[{"certificate":"/etc/caddy/certs/local.crt","key":"/etc/caddy/certs/local.key","tags":["cert0"]}]}}}} \ No newline at end of file diff --git a/PrivateAI/openwebui/caddy/caddy_data/caddy/instance.uuid b/PrivateAI/openwebui/caddy/caddy_data/caddy/instance.uuid new file mode 100644 index 0000000..ef9eade --- /dev/null +++ b/PrivateAI/openwebui/caddy/caddy_data/caddy/instance.uuid @@ -0,0 +1 @@ +2a8c158e-bd37-4c58-a81b-8dee66267377 \ No newline at end of file diff --git a/PrivateAI/openwebui/caddy/caddy_data/caddy/last_clean.json b/PrivateAI/openwebui/caddy/caddy_data/caddy/last_clean.json new file mode 100644 index 0000000..8fb76d8 --- /dev/null +++ b/PrivateAI/openwebui/caddy/caddy_data/caddy/last_clean.json @@ -0,0 +1 @@ +{"tls":{"timestamp":"2026-01-29T23:59:16.675342759Z","instance_id":"2a8c158e-bd37-4c58-a81b-8dee66267377"}} \ No newline at end of file diff --git a/PrivateAI/openwebui/caddy/certs/local.crt b/PrivateAI/openwebui/caddy/certs/local.crt new file mode 100644 index 0000000..8ac896b --- /dev/null +++ b/PrivateAI/openwebui/caddy/certs/local.crt @@ -0,0 +1,19 @@ +-----BEGIN CERTIFICATE----- +MIIDETCCAfmgAwIBAgIUcmLthbydWk0jTdqODawaCTUIOSswDQYJKoZIhvcNAQEL +BQAwGDEWMBQGA1UEAwwNMTAuMTM3LjE3LjI1NDAeFw0yNTEyMDEwNjA5MzBaFw0z +NTExMjkwNjA5MzBaMBgxFjAUBgNVBAMMDTEwLjEzNy4xNy4yNTQwggEiMA0GCSqG +SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDpHqfhlt/KExEDjKyphPuX+Zrpxg4BSdJ+ +h8cBsBqUyRHtpzYjWOIGogCG9lGhKTJCuadahHzDMGgBSpDYVSX7Gj0Mpmz/yPqd 
+em79qhAt+gJUQ307xLjMkgOckCu9rhSyFwcefGybT/0wecvkxQmDILhweVmqhqc5 +CUo9JJ6AENsaEPP4Yv01YyW3CKcU/aW3CyJl4ILB879qnV1+6BCvNS+lLjJnuu1c +cB4ODOuEHmkEA4l6kugQpNX0dCT3DZzLFQ+4PxXa9qdllRT3+vXEiZfgn9SC+HMz +OPhPfUL3vCMpNjpwy8EXAAIrj4cKUrmOQDpZxA4G5QUEvbYA+qhdAgMBAAGjUzBR +MB0GA1UdDgQWBBTy6B3+IS68wVeXrtbtmDALg8nBWTAfBgNVHSMEGDAWgBTy6B3+ +IS68wVeXrtbtmDALg8nBWTAPBgNVHRMBAf8EBTADAQH/MA0GCSqGSIb3DQEBCwUA +A4IBAQAcx9d4t8NmXiIeF02IXNhubfhQPMpsEtdbfqooGlyR3zfS+7R/JlMwVP1w +C2rZfXw3zx+eXK0DzQazUEHpVDLqdrxW7YlzIbOR539V8hOayGgFzQkKG3BC3F/F +4Ygldl8ZZQWMMyc/+Nb/iN+rgQul7VYvW4KK5PORUGuvVWFg3RYhpusgf+8Fk8NG +kGup0miEXKGTCCh86fMElV9GjGdD8ZuQ0McV18dwUIsyC9MDrsDVPcxcGcCAHMb4 +nsyBQB/CUurPf5yFRUJL4T3G3y5FKGM4hku44fKdFHcH9AyEhFYG2JPp/ElVTIl0 +cbN3TkJRWtWIgMaBYe4qUkKmsKRj +-----END CERTIFICATE----- diff --git a/PrivateAI/openwebui/caddy/certs/local.key b/PrivateAI/openwebui/caddy/certs/local.key new file mode 100644 index 0000000..90f6f0c --- /dev/null +++ b/PrivateAI/openwebui/caddy/certs/local.key @@ -0,0 +1,28 @@ +-----BEGIN PRIVATE KEY----- +MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQDpHqfhlt/KExED +jKyphPuX+Zrpxg4BSdJ+h8cBsBqUyRHtpzYjWOIGogCG9lGhKTJCuadahHzDMGgB +SpDYVSX7Gj0Mpmz/yPqdem79qhAt+gJUQ307xLjMkgOckCu9rhSyFwcefGybT/0w +ecvkxQmDILhweVmqhqc5CUo9JJ6AENsaEPP4Yv01YyW3CKcU/aW3CyJl4ILB879q +nV1+6BCvNS+lLjJnuu1ccB4ODOuEHmkEA4l6kugQpNX0dCT3DZzLFQ+4PxXa9qdl +lRT3+vXEiZfgn9SC+HMzOPhPfUL3vCMpNjpwy8EXAAIrj4cKUrmOQDpZxA4G5QUE +vbYA+qhdAgMBAAECggEAARc7aknfXFSeFEMZ4koHEnrlSeCWklSLBjwbsDH59dz4 ++7d4ErxozS/bk8YVamiOJtzF8lRn8C4mTUmH0K694a5lGt/VVN8Ny2yxlhxmzx8e +s9AWdLSi3G9SLbgkZomWn6FEqubrtSvB/7uRA+BeI6tJKshqAH148YuXFR7e0Gnc +6xhsNRV6+Be+rYV9BL4Sq0KNKDE2AE7/E2VOyfMEAPUdhq53jrdbJvu8TgoXYYd/ +yQJarj5GG3dtnR897zw2m+nnBbgpP9Okf8dDLEOWFzwrMQXgwIJcW77Z7NBCjGgv +Y6fBHDWpQkr4Qs4Zxv+s9kos07o+DSy9qhy/g/V++wKBgQDtRGVEtfbzHImgWebV +7LKjUXCOgpjkmPeBrb2rvhoRFUOiFzS2dWTBIYF+/wXv6w0EtF1AZcAg4CgK5a5/ +zEjgrLaT4EZ0re8mL0QEMyqFzUAXrk3IRh3cfIce5Ml/viklYzZ1xidbMO9fuQsy 
+mn6c1WABDIgwtatLY+hRqDH+vwKBgQD7hm7nyPlSqvLCBpYva/VkZHc5wmndLL5/ +1yZDhIBHEBgmBjLFFNEBT1w5CZF0fY7z5d2Ga76/S9Q0zYkw2B1Mj3s4W5rLhlQ3 +sT0CnS5cFCu/AgUw8ij++xiw99Db5HaZAmfXx2WSe25UwYaG6/ZBIGbVvO0YqR4S +IBn7OMB74wKBgQCjK1wxaqpP+pozKmBzUfpwEnvDpdCbtQ7RobhEudGXWfZPLIJV +0Fnf77jsq1lb61vill9jABanBUDEbbwZq1WbHWvaOmx5pXxH2E2ATee6aLLhFj/r +sTyr+v+5oUFplk8ZpSc4y3MZZYfZXppyzIiyNpN1ZTbruKP6jtSgA3mOZQKBgAsL +pkcrfjdxJmP64hGHDimwd8Pjk76QvnTiv91rLi7wt/7DeutItLz3/TbMAsU41lRD +nezPQnsoG1OOSx4H/5FjI6gf7bZOWdhwQhuhR23nvNwQfKXfnIlGAZmT6GofqE2j +22eQbBd4sCmsrfmy1weZIqr0Nv1EP/vPyRRNM7a9AoGALuJALLb1MMTf2g/OHBLV +69uGu6Lywx6q0P65/jxvyBYteqcFN92/GJ59Qo7I29gUC+43AUHDssE7goewxN0s +blZOZWDGhatKz901GxM8zuYem4IcelRO98k8xCfAQRFsWyv9uhrWrDs63gcdsL/h +7mrruuqPhQCn9qPE6NzL98M= +-----END PRIVATE KEY----- diff --git a/PrivateAI/openwebui/docker-compose-entra.yaml b/PrivateAI/openwebui/docker-compose-entra.yaml new file mode 100644 index 0000000..511a5a6 --- /dev/null +++ b/PrivateAI/openwebui/docker-compose-entra.yaml @@ -0,0 +1,88 @@ +version: "3.9" + +services: + openwebui: + image: ghcr.io/open-webui/open-webui:main + container_name: openwebui + expose: + - "8080" # expose instead of ports (Caddy will proxy) + volumes: + - ./data:/app/backend/data + #- ./caddy/certs/local.crt:/etc/ssl/certs/ca-certificates.crt:ro + environment: + - WEBUI_AUTH=true + - SAFE_MODE=true + - ENABLE_COMMUNITY_SHARING=false + - OPENAI_API_BASE_URL=http://10.137.17.254:9443/v1 + - OPENAI_API_KEY=$LOCALAI_API_KEY + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=localhost,127.0.0.1,::1,10.137.17.254 + - ENABLE_LOGIN_FORM=true + - ENABLE_SIGNPUT=true + - ENABLE_OIDC=true + - ENABLE_LDAP=false + #- REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt + #- SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt + + # Proxy + session behaviour + - TRUST_PROXY_HEADERS=true + - COOKIE_SECURE=true # HTTPS is now enabled via Caddy + - SESSION_COOKIE_SAMESITE=Lax + + # Public URL of OpenWebUI + - 
WEBUI_URL=https://10.137.17.254/ + + # OAuth / Microsoft Entra + - ENABLE_OAUTH_SIGNUP=true + - ENABLE_OAUTH_PERSISTENT_CONFIG=false + - OAUTH_MERGE_ACCOUNTS_BY_EMAIL=true + - OAUTH_UPDATE_PICTURE_ON_LOGIN=true + - OAUTH_MICROSOFT_ENABLED=true + - OAUTH_SCOPES=["openid", "profile","email"] + - MICROSOFT_CLIENT_ID=${MICROSOFT_CLIENT_ID} + - MICROSOFT_CLIENT_SECRET=${MICROSOFT_CLIENT_SECRET} + - MICROSOFT_CLIENT_TENANT_ID=${MICROSOFT_CLIENT_TENANT_ID} + - MICROSOFT_REDIRECT_URI=https://10.137.17.254/oauth/microsoft/callback + - OPENID_PROVIDER_URL=https://login.microsoftonline.com/secret/v2.0 # required for logout + #- OPENID_PROVIDER_URL=https://login.microsoftonline.com/${MICROSOFT_CLIENT_TENANT_ID}/v2.0 + - ENABLE_OAUTH_WITHOUT_EMAIL=true #this is a work around and breaks things like password reset and trusts that microsoft sub is stable which it normally isn't + - OAUTH_EMAIL_CLAIM=preferred_username + - OAUTH_PICTURE_CLAIM=picture + + extra_hosts: + - "host.docker.internal:host-gateway" + - "proxy1.it.deakin.edu.au:10.137.0.162" + networks: + - web + restart: unless-stopped + + caddy: + image: caddy:latest + container_name: caddy + ports: + - "3000:443" + - "80:80" + - "443:443" + volumes: + - ./caddy/Caddyfile:/etc/caddy/Caddyfile + - ./caddy/caddy_data:/data + - ./caddy/caddy_config:/config + - ./caddy/certs:/etc/caddy/certs:ro # <— add this + environment: + #- HTTP_PROXY=${HTTP_PROXY} + #- HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=localhost,127.0.0.1,::1,10.137.17.254,openwebui + networks: + - web + restart: unless-stopped + extra_hosts: + - "host.docker.internal:host-gateway" + - "proxy1.it.deakin.edu.au:10.137.0.162" +networks: + web: + +volumes: + caddy_data: + caddy_config: + data: diff --git a/PrivateAI/openwebui/env b/PrivateAI/openwebui/env new file mode 100644 index 0000000..923b99b --- /dev/null +++ b/PrivateAI/openwebui/env @@ -0,0 +1,7 @@ +HTTP_PROXY=http://proxy1.it.deakin.edu.au:3128 +HTTPS_PROXY=http://proxy1.it.deakin.edu.au:3128 
+MICROSOFT_CLIENT_ID=insertoauthclientid # Microsoft OAuth client ID +MICROSOFT_CLIENT_TENANT_ID=insertoauthclientsecret # Microsoft OAuth client secret +MICROSOFT_CLIENT_SECRET=nserttenantid # Microsoft tenant ID - use 9188040d-6c67-4c5b-b112-36a304b66dad for personal accounts + +LOCALAI_API_KEY=sk-fromthelocalaienvfile diff --git a/PrivateAI/openwebui/readme.md b/PrivateAI/openwebui/readme.md new file mode 100644 index 0000000..990391e --- /dev/null +++ b/PrivateAI/openwebui/readme.md @@ -0,0 +1,247 @@ +# OpenWebUI (Docker Compose + Caddy + Microsoft Entra ID) + +This bundle runs **OpenWebUI** behind a **Caddy** reverse proxy with **HTTPS** (self‑signed cert) and **Microsoft Entra ID (OAuth/OIDC)** login enabled. +It is also pre-wired to talk to an **OpenAI-compatible API** (e.g. LocalAI) via `OPENAI_API_BASE_URL`. + +> **Security note:** The included `dot.env.example` contains placeholder/example values. +> Treat any secrets as **compromised** if they were ever committed or shared—rotate them in Microsoft Entra and your model backend. + +--- + +## What’s in this tarball + +``` +openwebui/ + docker-compose-entra.yaml + dot.env.example + caddy/ + Caddyfile + Caddyfile.orig + certs/ + local.crt + local.key + caddy_data/... + caddy_config/... +``` + +### File-by-file: what each configuration does + +#### `docker-compose-entra.yaml` +Defines **two services** on the same Docker network: + +- **`openwebui`** + - Image: `ghcr.io/open-webui/open-webui:main` + - Exposes port `8080` **only to the internal Docker network** (`expose:`) — not published to the host. + - Persists OpenWebUI state in `./data` (mounted to `/app/backend/data`). 
+ - Enables: + - local login form (`ENABLE_LOGIN_FORM=true`) + - OAuth/OIDC (`ENABLE_OIDC=true`, `OAUTH_MICROSOFT_ENABLED=true`) + - auth (`WEBUI_AUTH=true`) + - safe mode (`SAFE_MODE=true`) + - Routes model calls to an OpenAI-compatible endpoint: + - `OPENAI_API_BASE_URL=http://10.137.17.254:9443/v1` + - `OPENAI_API_KEY=$LOCALAI_API_KEY` (read from `.env`) + - Supports proxies via `HTTP_PROXY` / `HTTPS_PROXY` from `.env` and a `NO_PROXY` list. + +- **`caddy`** + - Image: `caddy:latest` + - Publishes ports: + - `80:80` (HTTP redirect to HTTPS) + - `443:443` (HTTPS) + - `3000:443` (optional alternate access to the same HTTPS listener) + - Mounts: + - `./caddy/Caddyfile` to `/etc/caddy/Caddyfile` + - `./caddy/certs` to `/etc/caddy/certs` (read-only) for TLS cert/key + - `./caddy/caddy_data` to `/data` and `./caddy/caddy_config` to `/config` for Caddy runtime state + +> ⚠️ **Potential typo:** the compose includes `ENABLE_SIGNPUT=true`. +> OpenWebUI uses `ENABLE_SIGNUP`. If you have issues with signup, change it to `ENABLE_SIGNUP=true`. + +--- + +#### `dot.env.example` +An example environment file you copy to `.env` and edit. It provides: + +- `HTTP_PROXY`, `HTTPS_PROXY` – outbound proxy settings (optional) +- `MICROSOFT_CLIENT_ID` – Entra app registration client ID +- `MICROSOFT_CLIENT_SECRET` – Entra app registration client secret +- `MICROSOFT_CLIENT_TENANT_ID` – Entra tenant ID (or `common`/`organizations` depending on your setup) +- `LOCALAI_API_KEY` – API key for your OpenAI-compatible backend (LocalAI, etc.) + +> ⚠️ The comments in this example file are slightly mismatched (tenant vs secret). +> Use the variable names as the source of truth. + +--- + +#### `caddy/Caddyfile` +Caddy reverse proxy + TLS configuration: + +- Disables Caddy auto-HTTPS (`auto_https off`) so it **only** uses the provided cert. +- HTTP listener `:80` **redirects** to `https://10.137.17.254{uri}`. 
+- HTTPS listener `:443`: + - Uses the self-signed TLS cert: + - cert: `/etc/caddy/certs/local.crt` + - key: `/etc/caddy/certs/local.key` + - Proxies traffic to OpenWebUI at `http://openwebui:8080`. + +> ⚠️ The redirect + public URL are hard-coded to `10.137.17.254`. +> If your host IP/domain differs, update: +> - `caddy/Caddyfile` (redirect target) +> - `WEBUI_URL` and `MICROSOFT_REDIRECT_URI` in the compose + +--- + +#### `caddy/Caddyfile.orig` +A prior/original version of the Caddyfile kept for reference. + +--- + +#### `caddy/certs/local.crt` and `caddy/certs/local.key` +A **self-signed** TLS certificate and private key used by Caddy for HTTPS. + +- Browsers will show a certificate warning unless you **trust** the certificate on your machine. +- For a production deployment, replace these with a proper certificate (e.g., Let’s Encrypt with a real domain). + +--- + +#### `caddy/caddy_data/*` and `caddy/caddy_config/*` +Caddy’s persisted runtime state: + +- `caddy_data` – instance UUID, lock files, last-clean metadata, etc. +- `caddy_config/caddy/autosave.json` – Caddy’s autosaved config snapshot (generated/maintained by Caddy) + +You normally **do not edit** these manually. + +--- + +## Prerequisites + +- Docker + Docker Compose plugin (`docker compose version`) +- Ports **80** and **443** available on the host (and optionally **3000**) +- If you’ll use Entra login: + - A Microsoft Entra App Registration with a redirect URI matching your deployment URL. + +--- + +## Quick start + +### 1) Extract the tarball +From the directory containing the tar: + +```bash +tar -xf openwebui.tar +cd openwebui +``` + +### 2) Create your `.env` +Copy the example and edit values: + +```bash +cp dot.env.example .env +nano .env +``` + +At minimum, set: + +- `LOCALAI_API_KEY=...` +- `MICROSOFT_CLIENT_ID=...` +- `MICROSOFT_CLIENT_SECRET=...` +- `MICROSOFT_CLIENT_TENANT_ID=...` + +Optionally set proxy variables (or delete them if not needed). 
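Before starting the stack, it may be worth confirming that the `.env` file actually sets the keys listed above. The sketch below is one way to do that in Node.js. `parseEnv` and `missingKeys` are hypothetical helper names, and the parser handles only simple `KEY=VALUE` lines with optional trailing `# comments`; Docker Compose applies its own, richer parsing rules (quoting, interpolation), which this does not reproduce.

```javascript
// Keys the quick start says to set at minimum.
const REQUIRED_KEYS = [
  "LOCALAI_API_KEY",
  "MICROSOFT_CLIENT_ID",
  "MICROSOFT_CLIENT_SECRET",
  "MICROSOFT_CLIENT_TENANT_ID",
];

// Parses simple KEY=VALUE lines; skips blank lines and full-line # comments.
function parseEnv(text) {
  const values = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=" or no "=" at all
    const key = line.slice(0, eq).trim();
    // Drop a trailing " # comment", as used in the example env file.
    const value = line.slice(eq + 1).replace(/\s+#.*$/, "").trim();
    values[key] = value;
  }
  return values;
}

// Returns the required keys that are absent or empty in the given text.
function missingKeys(text, required = REQUIRED_KEYS) {
  const values = parseEnv(text);
  return required.filter((key) => !values[key]);
}
```

A typical call would be `missingKeys(require("node:fs").readFileSync(".env", "utf8"))`, which returns an empty array when all four keys have values.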
+ +### 3) Update host/IP references (recommended) +This bundle is hardcoded to `10.137.17.254`. + +Search & replace in: + +- `caddy/Caddyfile` (redirect line) +- `docker-compose-entra.yaml`: + - `WEBUI_URL` + - `MICROSOFT_REDIRECT_URI` + - (optionally) `OPENAI_API_BASE_URL` if your model endpoint differs + +### 4) Start the stack +Run: + +```bash +docker compose --env-file .env -f docker-compose-entra.yaml up -d +``` + +Check logs: + +```bash +docker compose -f docker-compose-entra.yaml logs -f --tail=200 +``` + +Stop: + +```bash +docker compose -f docker-compose-entra.yaml down +``` + +--- + +## Accessing OpenWebUI + +- Primary (standard HTTPS): `https:///` +- Optional alternate mapping: `https://:3000/` + +Because the certificate is self-signed, your browser will warn unless you trust `caddy/certs/local.crt`. + +### Trusting the self-signed cert (quick guidance) + +- **macOS:** Keychain Access → System (or Login) → Certificates → Import `local.crt` → set to “Always Trust”. +- **Windows:** `certmgr.msc` → Trusted Root Certification Authorities → Certificates → Import `local.crt`. +- **Linux:** depends on distro; typically copy to `/usr/local/share/ca-certificates/` and run `update-ca-certificates`. + +--- + +## Microsoft Entra ID (OAuth/OIDC) notes + +In Entra App Registration: + +- Add a **Redirect URI** matching: + - `https:///oauth/microsoft/callback` +- Ensure the app is configured for the correct tenant type: + - Single-tenant: use your tenant ID + - Multi-tenant/personal: you may need `common` and adjust scopes/claims + +This compose sets: +- `OAUTH_SCOPES=["openid","profile","email"]` +- `OAUTH_EMAIL_CLAIM=preferred_username` +- `OAUTH_MERGE_ACCOUNTS_BY_EMAIL=true` + +> The compose includes `ENABLE_OAUTH_WITHOUT_EMAIL=true` (marked as a workaround). +> If you don’t need it, consider disabling it for cleaner account semantics. + +--- + +## Troubleshooting + +### Port 80/443 already in use +- Another service (nginx, Traefik, etc.) may be listening. 
+- Either stop the conflicting service or change the published ports in the `caddy` service. + +### Redirect goes to the wrong IP +Update the `redir` target in `caddy/Caddyfile`. + +### OAuth login loops or fails +- Confirm the redirect URI matches **exactly** (scheme/host/path). +- Confirm the tenant setting and the `MICROSOFT_CLIENT_*` values in `.env`. +- Check OpenWebUI logs for OAuth errors. + +### OpenWebUI can’t reach the model backend +- Confirm `OPENAI_API_BASE_URL` is reachable **from inside Docker**. +- If the backend runs on the Docker host, `host.docker.internal` is available due to `extra_hosts`. + +--- + +## Data persistence + +- OpenWebUI data is stored in `./data` (in the `openwebui/` folder). +- Caddy state is stored in `./caddy/caddy_data` and `./caddy/caddy_config`. + +Back up those directories if you want to preserve state. + +--- diff --git a/PrivateAI/private-ai-readme.md b/PrivateAI/private-ai-readme.md new file mode 100644 index 0000000..58a1c54 --- /dev/null +++ b/PrivateAI/private-ai-readme.md @@ -0,0 +1,1103 @@ +# PrivateAI Stack Architecture and Capabilities + +This document provides a top-level architecture overview of the PrivateAI stack. + +The stack combines a local/private AI inference layer, a browser-based user interface, an MCP tool integration layer, repository security scanning, knowledge/memory services, and reverse proxy access. + +It is designed as a modular private AI platform that can run local models, expose an OpenAI-compatible API, integrate external tools through MCP, scan repositories, and provide a secure web interface for users. + +## Documented Components + +The current stack documentation covers: + +1. **Semgrep MCP Service** +2. **LocalAI with PostgreSQL / LocalRecall** +3. **OpenWebUI with Caddy HTTPS reverse proxy** +4. **Repo Security Scan Job MCP** +5. **MCP Hub with proxychains and integrations** + +Together, these form the current PrivateAI platform. 
+ +## High-Level Architecture + +```text + Users / Browser + | + v + https://10.137.17.254/ + | + v + Caddy + HTTPS reverse proxy layer + | + v + OpenWebUI + Browser UI / chat interface + | + v + OpenAI-compatible API endpoint / gateway + http://10.137.17.254:9443/v1 + | + v + LocalAI + Local inference, models, agents, memory + | + +--------------+---------------+ + | | + v v + Local models PostgreSQL / + GPU-backed inference LocalRecall + knowledge base + + | + v + MCP Hub + Tool and integration orchestration + | + +------------+-------------+--------------+-------------+ + | | | | | + v v v v v + Playwright Fetch Git tools Wazuh Databases + Browser HTTP tools Repo tools Security PostgreSQL / + automation platform Supabase + | + v + Repo Security Scan MCP + | + v + Semgrep scans + | + v + /opt/redback/repos +``` + +## Core Design Goals + +The stack is intended to provide: + +- A private AI interface for users +- Local model inference using GPU-backed LocalAI +- OpenAI-compatible API access for tools and frontends +- Persistent model, backend, config, image, and data storage +- PostgreSQL-backed agent memory and knowledge base capability +- MCP tool access for browsing, fetching, databases, security tools, and code repositories +- Repository security scanning using Semgrep +- HTTPS access via Caddy +- Corporate proxy compatibility +- Internal service routing without proxy interference +- Docker-based operational simplicity + +## Main User Entry Point + +The main user-facing entry point is OpenWebUI behind Caddy. + +Users access: + +```text +https://10.137.17.254/ +``` + +Caddy handles HTTPS and forwards traffic to OpenWebUI internally: + +```text +openwebui:8080 +``` + +OpenWebUI provides the browser chat interface, authentication, OAuth login, and connection to the OpenAI-compatible backend. 
+ +## Reverse Proxy Layer + +### Component + +```text +Caddy +``` + +### Purpose + +Caddy provides: + +- HTTPS termination +- Reverse proxying to OpenWebUI +- Certificate handling +- Public-facing access on ports `443` and optionally `3000` + +### Published Ports + +```text +80 -> HTTP +443 -> HTTPS +3000 -> alternate HTTPS mapping +``` + +### Why It Matters + +OpenWebUI is not exposed directly to the host. It is only exposed inside Docker using: + +```yaml +expose: + - "8080" +``` + +This keeps the UI behind the reverse proxy and allows secure cookie/session behaviour to work properly. + +## Web UI Layer + +### Component + +```text +OpenWebUI +``` + +### Purpose + +OpenWebUI provides the user-facing AI chat interface. + +It is configured with: + +- Authentication enabled +- Microsoft Entra / Azure AD OAuth +- Login form support +- Community sharing disabled +- Safe mode enabled +- HTTPS-aware cookie/session settings +- OpenAI-compatible backend API integration + +### Backend API + +OpenWebUI is configured to use: + +```text +http://10.137.17.254:9443/v1 +``` + +This means OpenWebUI talks to an OpenAI-compatible API endpoint rather than directly embedding model logic. + +Depending on routing, that endpoint may point to LocalAI directly or to an API gateway such as LiteLLM or another routing layer. + +## Inference Layer + +### Component + +```text +LocalAI +``` + +### Purpose + +LocalAI provides the local model inference backend. + +It is configured using the NVIDIA CUDA 12 GPU image: + +```text +localai/localai:latest-gpu-nvidia-cuda-12 +``` + +LocalAI exposes an OpenAI-compatible API and supports local models, backends, image outputs, agent features, memory integration, and skills. 
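Because the backend is OpenAI-compatible, a client request can be sketched with the standard chat completions route. The example below only builds the request and does not send it. `buildChatRequest` is a hypothetical helper, the model name passed in is whatever the deployment actually serves, and the base URL is the one OpenWebUI is configured with above (which may be a gateway rather than LocalAI itself).

```javascript
// Base URL taken from the OpenWebUI configuration described above.
const BASE_URL = "http://10.137.17.254:9443/v1";

// Builds (but does not send) a chat completion request in the
// OpenAI-compatible format.
function buildChatRequest(model, userMessage, apiKey) {
  return {
    url: `${BASE_URL}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: userMessage }],
      }),
    },
  };
}
```

On Node 18 or later the result can be passed straight to the built-in `fetch`, for example `fetch(req.url, req.options)`, with the API key read from `LOCALAI_API_KEY`.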
+ +### LocalAI API Port + +LocalAI is mapped as: + +```text +host port 4000 -> container port 8080 +``` + +Direct LocalAI endpoint: + +```text +http://localhost:4000 +``` + +Container-internal endpoint: + +```text +http://localai:8080 +``` + +### GPU Capability + +LocalAI is configured for NVIDIA GPUs: + +```yaml +runtime: nvidia +gpus: all +NVIDIA_VISIBLE_DEVICES=all +NVIDIA_DRIVER_CAPABILITIES=compute,utility +``` + +This allows GPU-backed model inference. + +### Model Storage + +Models are persisted at: + +```text +/opt/redback/privateai/volumes/models +``` + +Mounted inside LocalAI as: + +```text +/models +``` + +### Backend Storage + +LocalAI backends are stored at: + +```text +/opt/redback/privateai/volumes/backends +``` + +Mounted inside LocalAI as: + +```text +/usr/share/localai/backends +``` + +### Image Output Storage + +Generated images are stored at: + +```text +/opt/redback/privateai/volumes/images +``` + +Mounted inside LocalAI as: + +```text +/tmp/generated/images +``` + +## Knowledge and Memory Layer + +### Components + +```text +PostgreSQL +LocalRecall +LocalAI Agent Pool +``` + +### Purpose + +The knowledge/memory layer provides persistent storage for LocalAI agent memory and knowledge base workflows. 
+ +LocalAI is configured to use PostgreSQL as the vector engine: + +```env +LOCALAI_AGENT_POOL_VECTOR_ENGINE=postgres +``` + +The database connection is: + +```text +postgresql://localrecall:localrecall@postgres:5432/localrecall?sslmode=disable +``` + +### PostgreSQL Service + +The stack uses: + +```text +quay.io/mudler/localrecall:v0.5.2-postgresql +``` + +Database: + +```text +localrecall +``` + +User: + +```text +localrecall +``` + +### Agent Pool Defaults + +Default agent model: + +```text +gemma-4-e4b-it +``` + +Embedding model: + +```text +granite-embedding-107m-multilingual +``` + +### Capability + +This gives the stack the foundation for: + +- Knowledge bases +- Agent memory +- Embedding-backed retrieval +- Local RAG-style workflows +- Skills-enabled agent behaviour +- Persistent logs for agent operations + +## MCP Integration Layer + +### Component + +```text +MCP Hub +``` + +### Purpose + +MCP Hub acts as the tool orchestration layer. + +It connects AI clients and agents to external tools through the Model Context Protocol. + +It is exposed on: + +```text +host port 3003 -> container port 3000 +``` + +Access: + +```text +http://localhost:3003 +``` + +### Current MCP Integrations + +The hub is currently configured with: + +```text +amap +playwright +fetch +sequential-thinking +time +mindmap +playwright-mcp +fetch-mcp +time-mcp +mongodb +git-mcp-server +repo-security-scan +arxiv-mcp +wazuh +postgresql +supabase-postgres +``` + +### Capability Categories + +#### Browser Automation + +```text +playwright +playwright-mcp +``` + +Provides browser-driven workflows, page inspection, and automation. + +#### HTTP Fetching + +```text +fetch +fetch-mcp +``` + +Provides tool-based HTTP retrieval. + +#### Reasoning Support + +```text +sequential-thinking +mindmap +``` + +Provides structured reasoning and planning style tools. + +#### Time Tools + +```text +time +time-mcp +``` + +Provides time/date utilities. 
+ +#### Repository and Git Tools + +```text +git-mcp-server +repo-security-scan +``` + +Provides repository interaction and security scanning. + +#### Security Platform Integration + +```text +wazuh +``` + +Connects the AI tool layer to Wazuh security data. + +#### Database Integrations + +```text +mongodb +postgresql +supabase-postgres +``` + +Allows MCP-enabled access to database systems. + +#### Research Integration + +```text +arxiv-mcp +``` + +Provides arXiv research search capability. + +## Repository Security Scanning + +Repository scanning is implemented in two related ways. + +## Semgrep MCP Service + +### Component + +```text +semgrep-mcp +``` + +### Purpose + +Runs Semgrep as a streamable HTTP MCP server. + +It exposes Semgrep functionality over MCP and mounts repositories read-only: + +```text +/opt/redback/repos:/repos:ro +``` + +It listens on: + +```text +4004 +``` + +This service is useful when an MCP client wants direct Semgrep MCP access over HTTP. + +## Repo Security Scan Job MCP + +### Component + +```text +repo_scan_job_mcp.js +``` + +### Purpose + +This is a custom job-oriented MCP wrapper around Semgrep. + +Instead of blocking while a scan runs, it starts a background Docker job and returns a `job_id`. + +The client can then: + +1. List repositories +2. Start a scan +3. Check scan status +4. Fetch completed results +5. List recent jobs + +### Main Tools + +```text +repo_list +repo_security_scan_start +repo_security_scan_status +repo_security_scan_result +repo_security_scan_list_jobs +``` + +### Why This Exists + +Large Semgrep scans can take time. + +The job wrapper allows the assistant or MCP client to start a scan and come back for results later, without blocking the MCP tool call. 
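The start, poll, fetch flow described above can be sketched as a small client loop. This is an illustration under stated assumptions rather than the wrapper's real interface: `callTool` is a hypothetical async function standing in for whichever MCP client is used, the argument names `repo` and `profile` and the status values `running` and `completed` are assumed, and only the tool names and `job_id` come from the documentation.

```javascript
// callTool is a hypothetical async function: (toolName, args) => result.
async function runScan(callTool, repo, profile = "security", options = {}) {
  const { intervalMs = 5000, maxPolls = 120 } = options;

  // Start the background job; the wrapper returns a job_id immediately.
  const started = await callTool("repo_security_scan_start", { repo, profile });
  const jobId = started.job_id;

  for (let attempt = 0; attempt < maxPolls; attempt += 1) {
    const status = await callTool("repo_security_scan_status", { job_id: jobId });

    // "running" and "completed" are assumed status values.
    if (status.status !== "running") {
      if (status.status !== "completed") {
        throw new Error(`Scan ${jobId} ended with status ${status.status}`);
      }
      return callTool("repo_security_scan_result", { job_id: jobId });
    }

    // Wait before polling again so the tool call never blocks on the scan.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  throw new Error(`Scan ${jobId} did not finish after ${maxPolls} polls`);
}
```

The polling interval and the poll limit are arbitrary defaults; a long `full` profile scan may need a larger limit.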
+ +### Repository Root + +Host path: + +```text +/opt/redback/repos +``` + +Container path: + +```text +/repos +``` + +### Scan Profiles + +Supported profiles: + +```text +security +secrets +full +``` + +Profile mapping: + +```text +security -> p/security-audit +secrets -> p/secrets +full -> p/security-audit + p/secrets + p/owasp-top-ten +``` + +### Docker Requirement + +The job wrapper launches Semgrep using Docker: + +```bash +docker run --rm ... +``` + +Therefore MCP Hub is configured with: + +```yaml +- /var/run/docker.sock:/var/run/docker.sock +``` + +and the custom MCP Hub image includes the Docker CLI. + +## Proxy and Network Design + +The stack is designed to work in an environment that requires a corporate proxy. + +Proxy host: + +```text +proxy1.it.deakin.edu.au +``` + +Proxy IP: + +```text +10.137.0.162 +``` + +Proxy port: + +```text +3128 +``` + +### HTTP Proxy Variables + +Common proxy variables: + +```env +HTTP_PROXY=http://proxy1.it.deakin.edu.au:3128 +HTTPS_PROXY=http://proxy1.it.deakin.edu.au:3128 +``` + +### NO_PROXY + +Internal traffic is excluded using `NO_PROXY`. + +Typical entries include: + +```text +localhost +127.0.0.1 +::1 +localai +postgres +mcphub +openwebui +semgrep-mcp +mcp-hub-mcphub-1 +proxy1.it.deakin.edu.au +10.137.0.162 +10.137.17.254 +api.mcprouter.to +``` + +### Why NO_PROXY Matters + +Without correct `NO_PROXY`, internal Docker and internal network traffic may be sent through the external proxy. + +That can cause: + +- `fetch failed` +- connection refused +- bad port errors +- proxy recursion +- internal services becoming unreachable +- MCP servers failing to connect +- OAuth/backend API calls behaving unexpectedly + +## Proxychains in MCP Hub + +MCP Hub uses a custom entrypoint that runs the hub under `proxychains4`. + +This helps with tools or dependencies that do not respect normal proxy environment variables. 
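To make the `NO_PROXY` behaviour above more concrete, here is a rough sketch of how a client might decide whether a host bypasses the proxy. `bypassesProxy` is a hypothetical function, and the matching rules (exact name or domain suffix, `*` as a wildcard) are a common convention rather than a guarantee: real clients differ, and many do not handle CIDR ranges or ports in `NO_PROXY` at all.

```javascript
// Returns true when the hostname matches an entry in a comma-separated
// NO_PROXY value and should therefore bypass the proxy.
function bypassesProxy(hostname, noProxy) {
  const host = hostname.toLowerCase();
  return noProxy
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry !== "")
    .some((entry) => {
      if (entry === "*") return true;
      // Treat ".example.com" and "example.com" the same way.
      const suffix = entry.startsWith(".") ? entry.slice(1) : entry;
      return host === suffix || host.endsWith("." + suffix);
    });
}
```

With the entries listed earlier, `localai` and `10.137.17.254` would bypass the proxy, while an external host not in the list would still be routed through it.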
+ +### Proxychains Local Network Exclusions + +The configuration excludes: + +```text +10.137.0.162/32 +127.0.0.0/8 +10.0.0.0/8 +172.16.0.0/12 +192.168.0.0/16 +``` + +This prevents internal traffic and the proxy itself from being proxied. + +### Why This Matters + +Some MCP tools make outbound network calls through Node.js, Python, browser tooling, or subprocesses. + +Proxychains gives a broad fallback mechanism for forcing outbound traffic through the proxy when native proxy handling is unreliable. + +## Data and Storage Layout + +The stack uses a host-based storage layout under `/opt/redback`. + +### AI Stack Data + +```text +/opt/redback/privateai/volumes/ +├── models/ +├── images/ +├── backends/ +├── localai_data/ +└── localai_config/ +``` + +### Repository Data + +```text +/opt/redback/repos/ +``` + +This directory is used by: + +- Semgrep MCP +- Repo security scan MCP +- Git MCP tooling +- Code analysis workflows + +### MCP Hub Files + +A typical MCP Hub working directory contains: + +```text +mcp-hub/ +├── docker-compose.yaml +├── Dockerfile +├── .env +├── mcp_settings.json +├── entrypoint-proxy.sh +├── proxychains.conf +├── nodefetch.sh +└── repo_scan_job_mcp.js +``` + +### OpenWebUI Stack Files + +A typical OpenWebUI/Caddy directory contains: + +```text +openwebui-stack/ +├── docker-compose.yml +├── .env +├── data/ +└── caddy/ + ├── Caddyfile + ├── caddy_data/ + ├── caddy_config/ + └── certs/ +``` + +## Authentication and Access Control + +### OpenWebUI + +OpenWebUI authentication is enabled: + +```env +WEBUI_AUTH=true +``` + +Microsoft Entra OAuth is enabled: + +```env +ENABLE_OIDC=true +OAUTH_MICROSOFT_ENABLED=true +ENABLE_OAUTH_SIGNUP=true +``` + +This allows users to authenticate with Microsoft Entra / Azure AD. + +### Caddy + +Caddy provides HTTPS access and certificate handling. + +### MCP Hub + +MCP Hub is powerful and should be treated as sensitive. 
+ +It has access to: + +- MCP tools +- Internal services +- Repositories +- Docker socket +- Databases +- Security systems +- Browser automation +- Outbound network access + +MCP Hub should only be exposed to trusted clients or placed behind authentication and network controls. + +## Capability Overview + +The current PrivateAI stack can provide the following capabilities. + +## 1. Private Chat Interface + +Users can access OpenWebUI through a browser and chat with local or routed models. + +Capability: + +```text +User -> OpenWebUI -> OpenAI-compatible API -> LocalAI / gateway +``` + +## 2. Local GPU Inference + +LocalAI can run local models using NVIDIA GPUs. + +Capability: + +```text +Prompt -> LocalAI -> GPU-backed model -> Response +``` + +## 3. OpenAI-Compatible API + +LocalAI exposes an OpenAI-compatible interface. + +This allows tools like OpenWebUI, LiteLLM, agents, or custom applications to call local models using familiar API patterns. + +## 4. Knowledge Base and Memory + +LocalAI is configured with PostgreSQL-backed vector/memory support. + +Capability: + +```text +Documents / memory -> embeddings -> PostgreSQL / LocalRecall -> retrieval -> model context +``` + +## 5. MCP Tool Use + +MCP Hub exposes external tools to AI clients. + +Capability examples: + +- Fetch webpages +- Use browser automation +- Query databases +- Search arXiv +- Interact with Git repositories +- Check time/date +- Use structured reasoning tools +- Query security systems +- Run repository security scans + +## 6. Repository Security Scanning + +Semgrep can scan repositories under: + +```text +/opt/redback/repos +``` + +Capabilities: + +- Security audit scans +- Secret scans +- OWASP Top Ten scans +- Background scan jobs +- Scan status polling +- JSON result retrieval +- Severity summaries + +## 7. Security Operations Integration + +The Wazuh MCP integration gives the stack a pathway into security monitoring data. 
+ +Potential capabilities: + +- Query alerts +- Investigate endpoints +- Summarise security events +- Connect findings to repository or infrastructure context + +## 8. Database-Aware Assistance + +MCP integrations include: + +```text +mongodb +postgresql +supabase-postgres +``` + +Potential capabilities: + +- Query operational data +- Inspect schemas +- Summarise records +- Support application debugging +- Assist with reporting and analysis + +## 9. Browser and Web Automation + +Playwright tools provide browser automation. + +Potential capabilities: + +- Page testing +- UI validation +- Screenshot-style inspection +- Web workflow automation +- Login/session testing where appropriately configured + +## 10. Research Support + +The arXiv MCP integration provides research discovery capability. + +Potential capabilities: + +- Search papers +- Summarise technical topics +- Support research workflows +- Combine local reasoning with external paper discovery + +## Deployment Boundaries + +The stack has several major trust boundaries. + +## User Boundary + +```text +Users -> Caddy -> OpenWebUI +``` + +Users should interact through HTTPS and authenticated OpenWebUI sessions. + +## API Boundary + +```text +OpenWebUI -> OpenAI-compatible backend +``` + +OpenWebUI sends prompts and receives model responses through the configured backend API. + +## Tool Boundary + +```text +AI client / hub -> MCP tools +``` + +MCP tools can access sensitive systems. This boundary requires careful trust and configuration. + +## Host Boundary + +```text +MCP Hub -> Docker socket -> host Docker daemon +``` + +Docker socket access is effectively privileged host access. + +This is the highest-risk boundary in the current architecture. + +## Data Boundary + +```text +Repositories, models, generated files, databases, configs +``` + +Persistent data is stored on the host and mounted into containers. + +Permissions, backups, and separation of writable/read-only paths matter. 
+ +## Security Considerations + +Important security considerations: + +- Do not expose MCP Hub to untrusted networks. +- Treat Docker socket access as privileged. +- Keep repository mounts read-only where possible. +- Use separate writable job storage for scan outputs. +- Store secrets in `.env`, secret stores, or Docker secrets. +- Do not commit `.env` files. +- Avoid using simple default database passwords in production. +- Keep OAuth redirect URIs exact. +- Use HTTPS for OpenWebUI. +- Limit MCP tools where possible instead of exposing `"tools": "all"` everywhere. +- Pin container image versions for reproducibility. +- Monitor logs for proxy and authentication failures. +- Consider splitting risky tools into separate MCP Hub instances. + +## Operational Validation + +## Validate OpenWebUI + +```bash +curl -k https://10.137.17.254/ +``` + +## Validate LocalAI + +```bash +curl http://localhost:4000/v1/models +``` + +## Validate LocalAI GPU Access + +```bash +docker exec -it local-ai nvidia-smi +``` + +## Validate PostgreSQL + +```bash +docker compose logs postgres +``` + +or: + +```bash +docker exec -it <postgres-container> pg_isready -U localrecall +``` + +## Validate MCP Hub + +```bash +curl http://localhost:3003 +``` + +Check logs: + +```bash +docker compose logs -f mcphub +``` + +## Validate MCP Hub Docker Access + +```bash +docker exec -it mcp-hub-mcphub-1 docker ps +``` + +## Validate Repository Mount + +```bash +docker exec -it mcp-hub-mcphub-1 ls -la /repos +``` + +## Validate Internal HTTP Fetch + +```bash +./nodefetch.sh +``` + +## Validate Semgrep Scan Image + +```bash +docker exec -it mcp-hub-mcphub-1 sh -lc \ +'docker run --rm -v /opt/redback/repos:/repos:ro semgrep/semgrep:1.159.0 semgrep --version' +``` + +## Current Strengths + +The current architecture has several strengths: + +- Modular Docker-based design +- Local model support +- GPU-backed inference +- OpenAI-compatible API pattern +- Browser UI with OAuth support +- Caddy-based HTTPS +- MCP tool integration layer 
+- Repository scanning and security tooling +- Proxy-aware networking +- Persistent host-mounted storage +- Extensible integration approach + +## Current Risks and Gaps + +Areas that may need future hardening: + +- MCP Hub has broad tool access +- Docker socket mount is high risk +- Some images use `latest` +- Some credentials are simple defaults +- Repo scan jobs may write logs under repository mount unless separated +- Tool exposure currently uses `"tools": "all"` for many integrations +- Runtime proxy variables should be checked carefully +- OAuth without email is a workaround and may affect account stability +- More explicit network segmentation may be useful +- Centralised backup/restore documentation is still needed + +## Recommended Next Improvements + +Recommended next steps: + +1. Pin all container image versions. +2. Split high-risk MCP tools into separate MCP Hub instances. +3. Put MCP Hub behind authentication and TLS. +4. Use a Docker socket proxy instead of mounting the Docker socket directly. +5. Move repo scan job logs to a dedicated `/jobs` volume. +6. Make repository mounts read-only wherever possible. +7. Replace default PostgreSQL credentials. +8. Add healthchecks for key services. +9. Add backup and restore procedures. +10. Add a network diagram with actual hostnames/IPs. +11. Add a model inventory document. +12. Add a runbook for proxy troubleshooting. +13. Add a runbook for OAuth troubleshooting. +14. Add a security model for MCP tool exposure. +15. Add a standard onboarding guide for new tools. 
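Step 1 of the list above can be checked mechanically. The sketch below assumes image references in the usual `[registry[:port]/]name[:tag][@sha256:digest]` form; `is_pinned_image` is a hypothetical helper, not something shipped with the stack. It treats a sha256 digest or any tag other than `latest` as pinned, and a missing tag as unpinned because Docker resolves it to `latest`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the stack): decide whether an image
# reference is pinned to a specific version.
# Returns 0 for a digest or a non-"latest" tag, 1 otherwise.
is_pinned_image() {
  local ref="$1"
  # A digest reference is always pinned
  [[ "$ref" == *@sha256:* ]] && return 0
  # Only inspect the last path component, so a registry port
  # (for example host:5000/name) is not mistaken for a tag
  local last="${ref##*/}"
  # No tag at all means the implicit "latest"
  [[ "$last" == *:* ]] || return 1
  local tag="${last##*:}"
  [[ -n "$tag" && "$tag" != "latest" ]]
}
```

Run against the images named in this repository, `semgrep/semgrep:1.159.0` counts as pinned while `returntocorp/semgrep:latest` and `nginx:latest` do not.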
+ +## Architecture Summary + +The PrivateAI stack is a locally controlled AI platform made up of: + +- **Caddy** for HTTPS ingress +- **OpenWebUI** for the user interface +- **LocalAI** for local GPU-backed inference +- **PostgreSQL / LocalRecall** for knowledge base and memory support +- **MCP Hub** for tool orchestration +- **Semgrep and repo scan MCP** for repository security scanning +- **Proxychains and proxy configuration** for reliable operation behind a corporate proxy +- **Host-mounted storage** for models, repositories, configs, generated outputs, and persistent data + +The platform is capable of private chat, local inference, tool use, browser automation, repository analysis, security scanning, database interaction, research assistance, and knowledge-backed workflows. + +The overall design is flexible and powerful, but MCP Hub and Docker socket access should be treated as sensitive infrastructure and hardened before broader production use. diff --git a/PrivateAI/semgrep/docker-compose.yaml b/PrivateAI/semgrep/docker-compose.yaml new file mode 100644 index 0000000..162989b --- /dev/null +++ b/PrivateAI/semgrep/docker-compose.yaml @@ -0,0 +1,16 @@ +services: + semgrep-mcp: + image: returntocorp/semgrep:latest + command: ["semgrep", "mcp", "-t", "streamable-http", "-p", "4004"] + environment: + - FASTMCP_HOST=0.0.0.0 + - SEMGREP_ENABLE_VERSION_CHECK=0 + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=localhost,127.0.0.1,::1,proxy1.it.deakin.edu.au,10.137.0.162,api.mcprouter.to + volumes: + - /opt/redback/repos:/repos:ro + working_dir: /repos + ports: + - "4004:4004" + restart: unless-stopped diff --git a/PrivateAI/semgrep/readme.md b/PrivateAI/semgrep/readme.md new file mode 100644 index 0000000..1b11da1 --- /dev/null +++ b/PrivateAI/semgrep/readme.md @@ -0,0 +1,426 @@ +# Semgrep MCP Service + +This service runs Semgrep as a Model Context Protocol (MCP) server using the `streamable-http` transport. 
It is intended to expose Semgrep scanning capability to an MCP hub or MCP-compatible client while mounting local repositories read-only. + +## Service Overview + +```yaml +semgrep-mcp: + image: returntocorp/semgrep:latest + command: ["semgrep", "mcp", "-t", "streamable-http", "-p", "4004"] + environment: + - FASTMCP_HOST=0.0.0.0 + - SEMGREP_ENABLE_VERSION_CHECK=0 + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=localhost,127.0.0.1,::1,proxy1.it.deakin.edu.au,10.137.0.162,api.mcprouter.to + volumes: + - /opt/redback/repos:/repos:ro + working_dir: /repos + ports: + - "4004:4004" + restart: unless-stopped +``` + +## What This Service Does + +The `semgrep-mcp` container starts Semgrep in MCP server mode and exposes it over HTTP on port `4004`. + +It can be used by an MCP hub, AI assistant, or other MCP-aware client to run Semgrep-based code analysis against repositories mounted into the container. + +The mounted repository path is: + +```text +/opt/redback/repos +``` + +Inside the container, this is available as: + +```text +/repos +``` + +The volume is mounted read-only, so Semgrep can inspect code but cannot modify repository files. + +## Key Settings + +### Image + +```yaml +image: returntocorp/semgrep:latest +``` + +Uses the official Semgrep container image. + +### Command + +```yaml +command: ["semgrep", "mcp", "-t", "streamable-http", "-p", "4004"] +``` + +Starts Semgrep as an MCP server using: + +- `mcp` — run Semgrep in MCP server mode +- `-t streamable-http` — use streamable HTTP transport +- `-p 4004` — listen on port `4004` + +### Host Binding + +```yaml +FASTMCP_HOST=0.0.0.0 +``` + +This makes the MCP server listen on all container interfaces. + +This is important because some MCP servers default to `127.0.0.1`, which would make them reachable only inside the container itself. + +### Version Check Disabled + +```yaml +SEMGREP_ENABLE_VERSION_CHECK=0 +``` + +Disables Semgrep version checking. 
+ +This is useful for repeatable container startup and avoids unnecessary outbound checks during service launch. + +### Proxy Configuration + +```yaml +HTTP_PROXY=${HTTP_PROXY} +HTTPS_PROXY=${HTTPS_PROXY} +NO_PROXY=localhost,127.0.0.1,::1,proxy1.it.deakin.edu.au,10.137.0.162,api.mcprouter.to +``` + +The service supports outbound network access via the host proxy environment. + +The `NO_PROXY` list excludes local services and known internal hosts from being routed through the proxy. + +This is especially important when the Semgrep MCP server is being accessed by nearby containers or internal MCP routing services. + +### Repository Mount + +```yaml +volumes: + - /opt/redback/repos:/repos:ro +``` + +Mounts local repositories into the container at `/repos`. + +The `:ro` suffix makes the mount read-only. + +This is recommended for code scanning services because Semgrep only needs to inspect files, not change them. + +### Working Directory + +```yaml +working_dir: /repos +``` + +Sets `/repos` as the default working directory inside the container. + +This allows Semgrep to operate relative to the mounted repository directory. + +### Port Mapping + +```yaml +ports: + - "4004:4004" +``` + +Maps container port `4004` to host port `4004`. + +The service should be reachable from the host at: + +```text +http://localhost:4004 +``` + +Or from another machine/container using the host IP: + +```text +http://<host-ip>:4004 +``` + +### Restart Policy + +```yaml +restart: unless-stopped +``` + +Docker will restart the service automatically unless it has been manually stopped. 
+ +## Example MCP Hub Configuration + +An MCP hub entry may look similar to this: + +```json +{ + "semgrep": { + "type": "http", + "url": "http://semgrep-mcp:4004/mcp" + } +} +``` + +If the MCP hub is not on the same Docker network, use the host IP instead: + +```json +{ + "semgrep": { + "type": "http", + "url": "http://10.137.0.162:4004/mcp" + } +} +``` + +## Docker Network Notes + +If another container needs to reach this service by name, both containers must be on the same Docker network. + +For example, if your MCP hub is running on a network called `mcp-hub_default`, attach this service to that network: + +```yaml +networks: + default: + external: true + name: mcp-hub_default +``` + +Or define the service like this in a compose file that joins the existing network: + +```yaml +services: + semgrep-mcp: + image: returntocorp/semgrep:latest + command: ["semgrep", "mcp", "-t", "streamable-http", "-p", "4004"] + environment: + - FASTMCP_HOST=0.0.0.0 + - SEMGREP_ENABLE_VERSION_CHECK=0 + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=localhost,127.0.0.1,::1,proxy1.it.deakin.edu.au,10.137.0.162,api.mcprouter.to + volumes: + - /opt/redback/repos:/repos:ro + working_dir: /repos + ports: + - "4004:4004" + restart: unless-stopped + networks: + - mcp-hub_default + +networks: + mcp-hub_default: + external: true +``` + +## Testing the Service + +Start the service: + +```bash +docker compose up -d semgrep-mcp +``` + +Check logs: + +```bash +docker compose logs -f semgrep-mcp +``` + +Check that the container is running: + +```bash +docker ps | grep semgrep-mcp +``` + +Test from the host: + +```bash +curl -v http://localhost:4004/mcp +``` + +Test from another container on the same Docker network: + +```bash +docker exec -it <other-container> sh +curl -v http://semgrep-mcp:4004/mcp +``` + +If DNS resolution fails, check that both containers are on the same Docker network: + +```bash +docker network inspect mcp-hub_default +``` + +## Troubleshooting + +### MCP hub cannot 
connect + +Check that Semgrep is listening on all interfaces: + +```yaml +FASTMCP_HOST=0.0.0.0 +``` + +Without this, the service may only listen on `127.0.0.1` inside the container. + +### Connection refused + +Confirm the container is running: + +```bash +docker compose ps +``` + +Check logs: + +```bash +docker compose logs semgrep-mcp +``` + +Check port binding: + +```bash +docker port semgrep-mcp +``` + +### Host can connect, but another container cannot + +Make sure both containers are on the same Docker network. + +Check networks: + +```bash +docker inspect semgrep-mcp | grep -A20 Networks +docker inspect <other-container> | grep -A20 Networks +``` + +### Proxy issues + +If the MCP hub or Semgrep service is trying to reach local services through the proxy, expand the `NO_PROXY` list. + +Useful entries usually include: + +```text +localhost,127.0.0.1,::1 +``` + +Docker service names may also need to be added, for example: + +```text +semgrep-mcp,mcp-hub +``` + +Example: + +```yaml +NO_PROXY=localhost,127.0.0.1,::1,semgrep-mcp,mcp-hub,proxy1.it.deakin.edu.au,10.137.0.162,api.mcprouter.to +``` + +### Repository path is empty + +Check that the host path exists: + +```bash +ls -la /opt/redback/repos +``` + +Check that files are visible inside the container: + +```bash +docker exec -it semgrep-mcp sh +ls -la /repos +``` + +## Security Notes + +The repository mount is read-only: + +```yaml +/opt/redback/repos:/repos:ro +``` + +This reduces the risk of accidental file modification by the container. + +If exposing this service beyond the local Docker network, place it behind appropriate authentication, firewalling, or reverse proxy controls. + +Avoid exposing the MCP service directly to untrusted networks. 
+ +## Recommended Directory Layout + +```text +/opt/redback/ +└── repos/ + ├── repo-one/ + ├── repo-two/ + └── repo-three/ +``` + +The Semgrep MCP server will see these as: + +```text +/repos/repo-one +/repos/repo-two +/repos/repo-three +``` + +## Minimal Compose File + +```yaml +services: + semgrep-mcp: + image: returntocorp/semgrep:latest + command: ["semgrep", "mcp", "-t", "streamable-http", "-p", "4004"] + environment: + - FASTMCP_HOST=0.0.0.0 + - SEMGREP_ENABLE_VERSION_CHECK=0 + - HTTP_PROXY=${HTTP_PROXY} + - HTTPS_PROXY=${HTTPS_PROXY} + - NO_PROXY=localhost,127.0.0.1,::1,proxy1.it.deakin.edu.au,10.137.0.162,api.mcprouter.to + volumes: + - /opt/redback/repos:/repos:ro + working_dir: /repos + ports: + - "4004:4004" + restart: unless-stopped +``` + +## Operational Notes + +Useful commands: + +```bash +docker compose pull semgrep-mcp +docker compose up -d semgrep-mcp +docker compose logs -f semgrep-mcp +docker compose restart semgrep-mcp +docker compose down +``` + +To update the image: + +```bash +docker compose pull semgrep-mcp +docker compose up -d semgrep-mcp +``` + +To confirm the image currently in use: + +```bash +docker inspect semgrep-mcp --format '{{.Config.Image}}' +``` + +## Summary + +This service provides a containerised Semgrep MCP endpoint for scanning mounted repositories. + +It is designed to: + +- Run as a long-lived Docker service +- Expose MCP over streamable HTTP +- Listen on port `4004` +- Use `/opt/redback/repos` as the host repository root +- Mount repositories read-only +- Support proxy-aware environments +- Integrate with an MCP hub or AI-assisted code analysis stack diff --git a/admintools/createuser b/admintools/createuser new file mode 100755 index 0000000..323f807 --- /dev/null +++ b/admintools/createuser @@ -0,0 +1,62 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Usage: +# sudo ./create_user.sh username +# +# Example: +# sudo ./create_user.sh richard + +if [[ "${EUID}" -ne 0 ]]; then + echo "Please run as root or with sudo." 
+ exit 1 +fi + +if [[ $# -ne 1 ]]; then + echo "Usage: $0 <username>" + exit 1 +fi + +USERNAME="$1" + +# Basic username validation +if [[ ! "$USERNAME" =~ ^[a-z_][a-z0-9_-]*$ ]]; then + echo "Invalid username: $USERNAME" + echo "Use lowercase letters, numbers, underscores, and hyphens only." + exit 1 +fi + +# Check if user already exists +if id "$USERNAME" >/dev/null 2>&1; then + echo "User '$USERNAME' already exists." + exit 1 +fi + +# Ensure docker group exists +if ! getent group docker >/dev/null 2>&1; then + groupadd docker +fi + +# Generate a random password +PASSWORD="$(openssl rand -base64 18 | tr -d '\n' | cut -c1-20)" + +# Create the user with home directory and bash shell +adduser --disabled-password --gecos "" --shell /bin/bash "$USERNAME" + +# Set password +echo "${USERNAME}:${PASSWORD}" | chpasswd + +# Add to docker group +usermod -aG docker "$USERNAME" + +# Optional: force password change at first login +chage -d 0 "$USERNAME" + +echo "----------------------------------------" +echo "User created successfully" +echo "Username : $USERNAME" +echo "Password : $PASSWORD" +echo "Home dir : /home/$USERNAME" +echo "Groups : $(id -nG "$USERNAME")" +echo "----------------------------------------" +echo "They will be asked to change their password at first login." 
diff --git a/admintools/migration/allapps.csv b/admintools/migration/allapps.csv new file mode 100644 index 0000000..fdded2b --- /dev/null +++ b/admintools/migration/allapps.csv @@ -0,0 +1,33 @@ +container,include_writable_layer,stop_container +airflow,false,false +bugbox_api,false,false +bugbox_compiler,false,false +bugbox_rdb,false,false +bugbox-streamlit,false,false +competent_shtern,false,false +epic_nobel,false,false +goofy_meitner,false,false +grafana,false,false +great_yonath,false,false +happy_swartz,true,false +kafka,false,false +kafka-ui,false,false +mongodb,false,false +nginx.modsecurity,false,false +nice_goodall,false,false +nifty_blackwell,false,false +peaceful_jepsen,false,false +postgres-todd,false,false +quizzical_blackburn,false,false +recursing_booth,false,false +serverpage,false,false +silly_chandrasekhar,false,false +single-node_wazuh.dashboard_1,false,false +single-node_wazuh.indexer_1,false,false +single-node_wazuh.manager_1,false,false +suspicious_kepler,false,false +test-container,false,false +zabbix-server,false,false +zabbix-server-todd,false,false +zabbix-web,false,false +zabbix-web-todd,false,false diff --git a/admintools/migration/containermig.sh b/admintools/migration/containermig.sh new file mode 100755 index 0000000..aa545c1 --- /dev/null +++ b/admintools/migration/containermig.sh @@ -0,0 +1,1974 @@ +#!/usr/bin/env bash +set -Euo pipefail + +SCRIPT_NAME="$(basename "$0")" +OUTPUT_DIR="${PWD}/docker-migration-output" +RUN_ID="$(date +%Y-%m-%dT%H-%M-%S)" + +DEST_HOST="" +DEST_USER="" +DEST_BASE="" +TRANSFER_METHOD="rsync" + +SYNC_DATA=0 +INCLUDE_WRITABLE=0 +STOP_CONTAINERS=0 +VERBOSE=0 +CSV_FILE="" + +SYNC_COMPOSE_FILES=0 +SYNC_BUILD_CONTEXT=0 +GENERATE_MIGRATED_COMPOSE=0 +FINAL_SYNC=0 +FINAL_SYNC_STOP=0 +ENABLE_LOG_CAP=1 + +SSH_KEY="" +SSH_CONTROL_PERSIST="10m" +REPLACE_FILE="" + +WRITABLE_EXCLUDE_REGEX='^/(dev|proc|sys|run|tmp|var/run|var/tmp|etc/hosts|etc/hostname|etc/resolv\.conf|\.dockerenv)($|/)' 
+CERT_FILE_REGEX='.*\.(crt|cer|pem|key|p12|pfx|jks|keystore|csr|ca-bundle|der)$' +CERT_PATH_HINT_REGEX='(cert|certs|certificate|certificates|ssl|tls|pki|letsencrypt|truststore|keystore)' +CERT_ENV_HINT_REGEX='(CERT|CERTIFICATE|TLS|SSL|KEY|KEYSTORE|TRUSTSTORE|CA_BUNDLE|CA_CERT|CLIENT_CERT|CLIENT_KEY)' + +BUILD_CONTEXT_EXCLUDES=( + ".git" + ".svn" + ".hg" + "node_modules" + "__pycache__" + ".venv" + "venv" + ".mypy_cache" + ".pytest_cache" + ".cache" + "dist" + "build" + ".idea" + ".vscode" +) + +TEXT_FILE_EXTENSIONS_REGEX='(\.ya?ml|\.json|\.conf|\.cfg|\.ini|\.env|\.properties|\.txt|\.xml|\.sh|\.py|\.js|\.ts|\.md|Dockerfile)$' + +declare -A CSV_CONTAINER_INCLUDE_WRITABLE=() +declare -A CSV_CONTAINER_STOP=() +declare -A CSV_SELECTED_CONTAINERS=() +declare -a REPLACEMENTS=() + +usage() { + cat <// + docker-compose.orig.yml + docker-compose.yml + .env + working_dir/ + migration/ + report.txt + certificates-report.txt + docker-compose.migration.override.yml + final-sync-report.txt + data/ + / + volumes/ + binds/ + writable/ + +EOF +} + +log() { echo "[INFO] $*"; } +warn() { echo "[WARN] $*" >&2; } +err() { echo "[ERROR] $*" >&2; } +vlog() { [[ "$VERBOSE" -eq 1 ]] && echo "[DEBUG] $*"; } + +require_cmd() { + command -v "$1" >/dev/null 2>&1 || { + err "Required command not found: $1" + exit 1 + } +} + +trim() { + local s="$1" + s="${s#"${s%%[![:space:]]*}"}" + s="${s%"${s##*[![:space:]]}"}" + echo "$s" +} + +lower() { + echo "$1" | tr '[:upper:]' '[:lower:]' +} + +bool_from_string() { + local v + v="$(lower "$(trim "${1:-}")")" + case "$v" in + 1|true|yes|y|on) echo "1" ;; + 0|false|no|n|off|"") echo "0" ;; + *) warn "Unrecognised boolean value '$1', treating as false"; echo "0" ;; + esac +} + +safe_name() { + local s="$1" + s="${s#/}" + s="${s//\//_}" + s="${s//:/_}" + s="${s// /_}" + s="${s//[^A-Za-z0-9._-]/_}" + echo "$s" +} + +ssh_target() { + if [[ -n "$DEST_USER" ]]; then + echo "${DEST_USER}@${DEST_HOST}" + else + echo "${DEST_HOST}" + fi +} + 
+build_ssh_opts_array() { + local -n _out="$1" + _out=( + -o ControlMaster=auto + -o "ControlPersist=${SSH_CONTROL_PERSIST}" + -o "ControlPath=${HOME}/.ssh/cm-%r@%h:%p" + ) + if [[ -n "$SSH_KEY" ]]; then + _out+=( + -i "$SSH_KEY" + -o IdentitiesOnly=yes + ) + fi +} + +ssh_cmd() { + local target="$1" + shift + local opts=() + build_ssh_opts_array opts + ssh -n "${opts[@]}" "$target" "$@" +} + +scp_cmd() { + local src="$1" + local dst="$2" + local opts=() + build_ssh_opts_array opts + scp "${opts[@]}" "$src" "$dst" < /dev/null +} + +rsync_rsh() { + local opts=() + build_ssh_opts_array opts + local cmd="ssh" + local o + for o in "${opts[@]}"; do + cmd+=" $(printf '%q' "$o")" + done + echo "$cmd" +} + +container_is_running() { + local c="$1" + docker inspect -f '{{.State.Running}}' "$c" 2>/dev/null | grep -qi '^true$' +} + +container_name() { + local c="$1" + docker inspect -f '{{.Name}}' "$c" | sed 's#^/##' +} + +container_image() { + local c="$1" + docker inspect -f '{{.Config.Image}}' "$c" +} + +container_restart_policy() { + local c="$1" + docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' "$c" +} + +container_network_mode() { + local c="$1" + docker inspect -f '{{.HostConfig.NetworkMode}}' "$c" +} + +container_ports_json() { + local c="$1" + docker inspect "$c" | jq '.[0].HostConfig.PortBindings // {}' +} + +container_env_json() { + local c="$1" + docker inspect "$c" | jq '.[0].Config.Env // []' +} + +container_mounts_json() { + local c="$1" + docker inspect "$c" | jq '.[0].Mounts // []' +} + +container_compose_project() { + local c="$1" + docker inspect "$c" | jq -r '.[0].Config.Labels["com.docker.compose.project"] // empty' +} + +container_compose_service() { + local c="$1" + docker inspect "$c" | jq -r '.[0].Config.Labels["com.docker.compose.service"] // empty' +} + +container_compose_workdir() { + local c="$1" + docker inspect "$c" | jq -r '.[0].Config.Labels["com.docker.compose.project.working_dir"] // empty' +} + +container_compose_files() { + local 
c="$1" + docker inspect "$c" | jq -r '.[0].Config.Labels["com.docker.compose.project.config_files"] // empty' +} + +resolve_container_identifier() { + local ident="$1" + + if docker inspect "$ident" >/dev/null 2>&1; then + echo "$ident" + return 0 + fi + + local found="" + found="$(docker ps -a --format '{{.ID}} {{.Names}}' | awk -v q="$ident" '$2 == q {print $1; exit}')" + [[ -n "$found" ]] && { echo "$found"; return 0; } + + return 1 +} + +project_root_path() { + local project="$1" + echo "${DEST_BASE}/${project}" +} + +project_data_root() { + local project="$1" + echo "${DEST_BASE}/${project}/data" +} + +project_migration_root() { + local project="$1" + echo "${DEST_BASE}/${project}/migration" +} + +container_output_root() { + local c="$1" + local cname + cname="$(container_name "$c")" + echo "${OUTPUT_DIR}/containers/${cname}/${RUN_ID}" +} + +remote_mkdir() { + local path="$1" + if [[ -n "$DEST_HOST" ]]; then + ssh_cmd "$(ssh_target)" "mkdir -p '$path'" + else + mkdir -p "$path" + fi +} + +remote_test_exists() { + local path="$1" + if [[ -n "$DEST_HOST" ]]; then + ssh_cmd "$(ssh_target)" "test -e '$path'" + else + test -e "$path" + fi +} + +remote_du_bytes() { + local path="$1" + if [[ -n "$DEST_HOST" ]]; then + ssh_cmd "$(ssh_target)" "du -sb '$path' 2>/dev/null | awk '{print \$1}'" + else + du -sb "$path" 2>/dev/null | awk '{print $1}' + fi +} + +copy_dir() { + local src="$1" + local dst="$2" + + [[ "$SYNC_DATA" -eq 1 ]] || { vlog "Skipping copy (no --sync-data): $src -> $dst"; return 0; } + [[ -e "$src" ]] || { warn "Source does not exist, skipping copy: $src"; return 0; } + + if [[ -n "$DEST_HOST" ]]; then + remote_mkdir "$dst" + log "Copying directory: $src -> $(ssh_target):$dst" + case "$TRANSFER_METHOD" in + rsync) + rsync -aHAX --numeric-ids --info=progress2 -e "$(rsync_rsh)" "$src"/ "$(ssh_target):$dst"/ < /dev/null + ;; + scp) + tar -C "$src" -cf - . 
| ssh_cmd "$(ssh_target)" "tar -C '$dst' -xf -" + ;; + *) + err "Unsupported transfer method: $TRANSFER_METHOD" + return 1 + ;; + esac + else + mkdir -p "$dst" + log "Copying directory locally: $src -> $dst" + case "$TRANSFER_METHOD" in + rsync) rsync -aHAX --numeric-ids "$src"/ "$dst"/ ;; + scp) cp -a "$src"/. "$dst"/ ;; + *) err "Unsupported transfer method: $TRANSFER_METHOD"; return 1 ;; + esac + fi +} + +copy_file() { + local src="$1" + local dst="$2" + + [[ "$SYNC_DATA" -eq 1 ]] || { vlog "Skipping file copy (no --sync-data): $src -> $dst"; return 0; } + [[ -e "$src" ]] || { warn "Source file does not exist, skipping copy: $src"; return 0; } + + if [[ -n "$DEST_HOST" ]]; then + remote_mkdir "$(dirname "$dst")" + log "Copying file: $src -> $(ssh_target):$dst" + case "$TRANSFER_METHOD" in + rsync) + rsync -aHAX --numeric-ids -e "$(rsync_rsh)" "$src" "$(ssh_target):$dst" < /dev/null + ;; + scp) + scp_cmd "$src" "$(ssh_target):$dst" + ;; + *) + err "Unsupported transfer method: $TRANSFER_METHOD" + return 1 + ;; + esac + else + mkdir -p "$(dirname "$dst")" + log "Copying file locally: $src -> $dst" + cp -a "$src" "$dst" + fi +} + +copy_dir_filtered() { + local src="$1" + local dst="$2" + + [[ "$SYNC_DATA" -eq 1 ]] || { vlog "Skipping filtered copy (no --sync-data): $src -> $dst"; return 0; } + [[ -d "$src" ]] || { warn "Build context directory does not exist, skipping: $src"; return 0; } + + if [[ -n "$DEST_HOST" ]]; then + remote_mkdir "$dst" + log "Copying build context: $src -> $(ssh_target):$dst" + case "$TRANSFER_METHOD" in + rsync) + local args=( -aHAX --numeric-ids --info=progress2 ) + local ex + for ex in "${BUILD_CONTEXT_EXCLUDES[@]}"; do args+=( --exclude "$ex" ); done + rsync "${args[@]}" -e "$(rsync_rsh)" "$src"/ "$(ssh_target):$dst"/ < /dev/null + ;; + scp) + warn "scp mode does not support excludes; copying whole working directory" + tar -C "$src" -cf - . 
| ssh_cmd "$(ssh_target)" "tar -C '$dst' -xf -" + ;; + *) + err "Unsupported transfer method: $TRANSFER_METHOD" + return 1 + ;; + esac + else + mkdir -p "$dst" + log "Copying build context locally: $src -> $dst" + case "$TRANSFER_METHOD" in + rsync) + local args=( -aHAX --numeric-ids ) + local ex + for ex in "${BUILD_CONTEXT_EXCLUDES[@]}"; do args+=( --exclude "$ex" ); done + rsync "${args[@]}" "$src"/ "$dst"/ + ;; + scp) cp -a "$src"/. "$dst"/ ;; + *) err "Unsupported transfer method: $TRANSFER_METHOD"; return 1 ;; + esac + fi +} + +record_inventory_header() { + local f="$1" + cat > "$f" < "$f" </dev/null || true)" + case "$mime" in + text/*|application/json|application/xml|application/x-yaml|application/javascript|application/x-sh) + return 0 + ;; + esac + + local base + base="$(basename "$f")" + if [[ "$base" =~ $TEXT_FILE_EXTENSIONS_REGEX ]]; then + return 0 + fi + + return 1 +} + +escape_sed_replacement() { + printf '%s' "$1" | sed -e 's/[\/&]/\\&/g' +} + +apply_replacements_to_file() { + local f="$1" + local report_file="${2:-}" + + [[ "${#REPLACEMENTS[@]}" -gt 0 ]] || return 0 + [[ -f "$f" ]] || return 0 + file_is_probably_text "$f" || return 0 + + local before_hash after_hash + before_hash="$(sha256sum "$f" | awk '{print $1}')" + + local pair old new old_esc new_esc + for pair in "${REPLACEMENTS[@]}"; do + old="${pair%%=*}" + new="${pair#*=}" + old_esc="$(escape_sed_replacement "$old")" + new_esc="$(escape_sed_replacement "$new")" + sed -i "s/${old_esc}/${new_esc}/g" "$f" + done + + after_hash="$(sha256sum "$f" | awk '{print $1}')" + if [[ "$before_hash" != "$after_hash" && -n "$report_file" ]]; then + { + echo "Text replacements applied:" + echo " file: $f" + local p + for p in "${REPLACEMENTS[@]}"; do + echo " replace: $p" + done + echo + } >> "$report_file" + fi +} + +apply_replacements_to_tree() { + local dir="$1" + local report_file="${2:-}" + [[ "${#REPLACEMENTS[@]}" -gt 0 ]] || return 0 + [[ -d "$dir" ]] || return 0 + + while IFS= read -r -d '' f; do 
+ apply_replacements_to_file "$f" "$report_file" + done < <(find "$dir" -type f -print0) +} + +apply_replacements_remote_file() { + local remote_path="$1" + local report_file="${2:-}" + + [[ "${#REPLACEMENTS[@]}" -gt 0 ]] || return 0 + [[ -n "$DEST_HOST" ]] || return 0 + + local before after + before="$(ssh_cmd "$(ssh_target)" "test -f '$remote_path' && sha256sum '$remote_path' | awk '{print \$1}'" 2>/dev/null || true)" + [[ -n "$before" ]] || return 0 + + local script="" + local pair old new old_esc new_esc + for pair in "${REPLACEMENTS[@]}"; do + old="${pair%%=*}" + new="${pair#*=}" + old_esc="$(printf '%s' "$old" | sed "s/'/'\\\\''/g")" + new_esc="$(printf '%s' "$new" | sed "s/'/'\\\\''/g")" + script+="perl -0pi -e 's/\\Q${old_esc}\\E/${new_esc}/g' '$remote_path'; " + done + + ssh_cmd "$(ssh_target)" "$script" + + after="$(ssh_cmd "$(ssh_target)" "test -f '$remote_path' && sha256sum '$remote_path' | awk '{print \$1}'" 2>/dev/null || true)" + + if [[ -n "$report_file" && "$before" != "$after" ]]; then + { + echo "Remote text replacements applied:" + echo " file: $remote_path" + local p + for p in "${REPLACEMENTS[@]}"; do + echo " replace: $p" + done + echo + } >> "$report_file" + fi +} + +apply_replacements_any_file() { + local path="$1" + local report_file="${2:-}" + + if [[ -n "$DEST_HOST" ]]; then + apply_replacements_remote_file "$path" "$report_file" + else + apply_replacements_to_file "$path" "$report_file" + fi +} + +apply_replacements_remote_tree() { + local remote_dir="$1" + local report_file="${2:-}" + + [[ "${#REPLACEMENTS[@]}" -gt 0 ]] || return 0 + [[ -n "$DEST_HOST" ]] || return 0 + + local files + files="$(ssh_cmd "$(ssh_target)" "find '$remote_dir' -type f 2>/dev/null" || true)" + [[ -n "$files" ]] || return 0 + + local f + while IFS= read -r f; do + [[ -n "$f" ]] || continue + case "$f" in + *.yml|*.yaml|*.json|*.conf|*.cfg|*.ini|*.env|*.properties|*.txt|*.xml|*.sh|*.py|*.js|*.ts|*.md|*/Dockerfile|*Dockerfile) + apply_replacements_remote_file 
"$f" "$report_file" + ;; + esac + done <<< "$files" +} + +apply_replacements_any_tree() { + local path="$1" + local report_file="${2:-}" + + if [[ -n "$DEST_HOST" ]]; then + apply_replacements_remote_tree "$path" "$report_file" + else + apply_replacements_to_tree "$path" "$report_file" + fi +} + +load_replacements_file() { + [[ -n "$REPLACE_FILE" ]] || return 0 + [[ -f "$REPLACE_FILE" ]] || { err "Replacement file not found: $REPLACE_FILE"; exit 1; } + + while IFS= read -r raw_line || [[ -n "$raw_line" ]]; do + local line + line="$(echo "$raw_line" | tr -d '\r')" + line="$(trim "$line")" + [[ -z "$line" ]] && continue + [[ "$line" =~ ^# ]] && continue + [[ "$line" == *"="* ]] || { warn "Skipping invalid replacement line (missing '='): $line"; continue; } + REPLACEMENTS+=("$line") + done < "$REPLACE_FILE" +} + +generate_bind_mount_line_rw() { + local host_path="$1" + local container_path="$2" + local rw="$3" + + if [[ "$rw" == "false" ]]; then + echo " - ${host_path}:${container_path}:ro" + else + echo " - ${host_path}:${container_path}" + fi +} + +generate_logging_override_block() { + cat <<'EOF' + logging: + driver: json-file + options: + max-size: "10m" + max-file: "3" +EOF +} + +render_ports_yaml() { + local c="$1" + local ports + ports="$(container_ports_json "$c")" + [[ "$ports" == "{}" ]] && return 0 + + echo " ports:" + echo "$ports" | jq -r ' + to_entries[] + | .key as $container_port + | (.value // []) + | .[] + | " - \"" + ((.HostIp // "") | if . == "" then "" else . + ":" end) + (.HostPort // "") + ":" + $container_port + "\"" + ' +} + +render_env_yaml() { + local c="$1" + local envj + envj="$(container_env_json "$c")" + [[ "$envj" == "[]" ]] && return 0 + + echo " environment:" + echo "$envj" | jq -r '.[] | " - " + .' 
+} + +render_restart_yaml() { + local c="$1" + local rp + rp="$(container_restart_policy "$c")" + [[ -n "$rp" && "$rp" != "no" ]] && echo " restart: $rp" +} + +render_network_mode_yaml() { + local c="$1" + local nm + nm="$(container_network_mode "$c")" + [[ -n "$nm" && "$nm" != "default" ]] && echo " network_mode: $nm" +} + +inspect_env_for_cert_hints() { + local c="$1" + local cert_report="$2" + + while IFS= read -r envline; do + [[ -z "$envline" ]] && continue + local k="${envline%%=*}" + local v="${envline#*=}" + + if [[ "$k" =~ $CERT_ENV_HINT_REGEX ]] || [[ "$v" =~ $CERT_PATH_HINT_REGEX ]] || [[ "$v" =~ $CERT_FILE_REGEX ]]; then + { + echo "Environment hint:" + echo " container: $(container_name "$c")" + echo " variable: $k" + echo " value: $v" + echo + } >> "$cert_report" + fi + done < <(container_env_json "$c" | jq -r '.[]') +} + +inspect_mounts_for_cert_hints() { + local c="$1" + local cert_report="$2" + + while IFS= read -r m; do + local type source dest + type="$(jq -r '.Type // ""' <<<"$m")" + source="$(jq -r '.Source // ""' <<<"$m")" + dest="$(jq -r '.Destination // ""' <<<"$m")" + + if [[ "$source" =~ $CERT_PATH_HINT_REGEX ]] || [[ "$dest" =~ $CERT_PATH_HINT_REGEX ]] || [[ "$source" =~ $CERT_FILE_REGEX ]] || [[ "$dest" =~ $CERT_FILE_REGEX ]]; then + { + echo "Mount hint:" + echo " container: $(container_name "$c")" + echo " type: $type" + echo " source: $source" + echo " dest: $dest" + echo + } >> "$cert_report" + fi + done < <(container_mounts_json "$c" | jq -c '.[]') +} + +scan_directory_for_cert_files() { + local dir="$1" + local cert_report="$2" + local heading="$3" + + [[ -d "$dir" ]] || return 0 + + local found=0 + while IFS= read -r f; do + if [[ "$found" -eq 0 ]]; then + echo "$heading" >> "$cert_report" + found=1 + fi + echo " $f" >> "$cert_report" + done < <(find "$dir" -type f \( \ + -iname '*.crt' -o -iname '*.cer' -o -iname '*.pem' -o -iname '*.key' -o \ + -iname '*.p12' -o -iname '*.pfx' -o -iname '*.jks' -o -iname '*.keystore' -o \ + 
-iname '*.csr' -o -iname '*.der' \ + \) 2>/dev/null) + + [[ "$found" -eq 1 ]] && echo >> "$cert_report" +} + +scan_text_file_for_cert_hints() { + local file="$1" + local cert_report="$2" + local heading="$3" + + [[ -f "$file" ]] || return 0 + + local matches + matches="$(grep -Ein 'cert|certificate|ssl|tls|key|keystore|truststore|letsencrypt|ca_bundle|ca_cert|client_cert|client_key' "$file" 2>/dev/null || true)" + if [[ -n "$matches" ]]; then + { + echo "$heading" + echo "$matches" + echo + } >> "$cert_report" + fi +} + +load_csv_selection() { + [[ -n "$CSV_FILE" ]] || return 0 + [[ -f "$CSV_FILE" ]] || { err "CSV file not found: $CSV_FILE"; exit 1; } + + log "Loading container selection CSV: $CSV_FILE" + + local requested_count=0 + local matched_count=0 + local line_no=1 + + while IFS= read -r raw_line || [[ -n "$raw_line" ]]; do + line_no=$((line_no + 1)) + local line + line="$(echo "$raw_line" | tr -d '\r')" + + [[ -z "$(trim "$line")" ]] && continue + [[ "$(trim "$line")" =~ ^# ]] && continue + + requested_count=$((requested_count + 1)) + + IFS=',' read -r col1 col2 col3 extra <<< "$line" + + local container include_writable stop_container resolved cname + container="$(trim "${col1:-}")" + include_writable="$(trim "${col2:-}")" + stop_container="$(trim "${col3:-}")" + + [[ -n "$container" ]] || { warn "Skipping CSV line $line_no with empty container field"; continue; } + + if ! 
resolved="$(resolve_container_identifier "$container")"; then + warn "Container from CSV not found, skipping: $container" + continue + fi + + cname="$(container_name "$resolved")" + CSV_SELECTED_CONTAINERS["$resolved"]=1 + CSV_SELECTED_CONTAINERS["$cname"]=1 + matched_count=$((matched_count + 1)) + + [[ -n "$include_writable" ]] && { + CSV_CONTAINER_INCLUDE_WRITABLE["$resolved"]="$(bool_from_string "$include_writable")" + CSV_CONTAINER_INCLUDE_WRITABLE["$cname"]="$(bool_from_string "$include_writable")" + } + + [[ -n "$stop_container" ]] && { + CSV_CONTAINER_STOP["$resolved"]="$(bool_from_string "$stop_container")" + CSV_CONTAINER_STOP["$cname"]="$(bool_from_string "$stop_container")" + } + done < <(tail -n +2 "$CSV_FILE") + + log "CSV requested containers: $requested_count" + log "CSV matched containers: $matched_count" + + [[ "$matched_count" -gt 0 ]] || { err "CSV matched zero containers. Nothing will be processed."; exit 1; } +} + +is_container_selected() { + local c="$1" + [[ -z "$CSV_FILE" ]] && return 0 + local cname + cname="$(container_name "$c")" + [[ -n "${CSV_SELECTED_CONTAINERS[$c]:-}" || -n "${CSV_SELECTED_CONTAINERS[$cname]:-}" ]] +} + +container_include_writable() { + local c="$1" + local cname + cname="$(container_name "$c")" + + [[ -n "${CSV_CONTAINER_INCLUDE_WRITABLE[$c]:-}" ]] && { echo "${CSV_CONTAINER_INCLUDE_WRITABLE[$c]}"; return; } + [[ -n "${CSV_CONTAINER_INCLUDE_WRITABLE[$cname]:-}" ]] && { echo "${CSV_CONTAINER_INCLUDE_WRITABLE[$cname]}"; return; } + echo "$INCLUDE_WRITABLE" +} + +container_should_stop() { + local c="$1" + local cname + cname="$(container_name "$c")" + + [[ -n "${CSV_CONTAINER_STOP[$c]:-}" ]] && { echo "${CSV_CONTAINER_STOP[$c]}"; return; } + [[ -n "${CSV_CONTAINER_STOP[$cname]:-}" ]] && { echo "${CSV_CONTAINER_STOP[$cname]}"; return; } + echo "$STOP_CONTAINERS" +} + +stop_container_if_requested() { + local c="$1" + local stop_flag + stop_flag="$(container_should_stop "$c")" + + if [[ "$FINAL_SYNC" -eq 1 && 
"$FINAL_SYNC_STOP" -eq 1 ]]; then + stop_flag=1 + fi + + if [[ "$stop_flag" -eq 1 ]] && container_is_running "$c"; then + log "Stopping container for consistent capture: $(container_name "$c")" + docker stop "$c" >/dev/null + fi +} + +is_subpath_of() { + local child="$1" + local parent="$2" + + [[ -n "$child" && -n "$parent" ]] || return 1 + [[ -e "$child" && -e "$parent" ]] || return 1 + + local child_real parent_real + child_real="$(readlink -f -- "$child" 2>/dev/null || true)" + parent_real="$(readlink -f -- "$parent" 2>/dev/null || true)" + + [[ -n "$child_real" && -n "$parent_real" ]] || return 1 + + case "$child_real" in + "$parent_real"|"$parent_real"/*) return 0 ;; + *) return 1 ;; + esac +} + +validate_project_destination() { + local project="$1" + local report_file="$2" + + local project_root migrated_compose original_compose override_file + project_root="$(project_root_path "$project")" + migrated_compose="${project_root}/docker-compose.yml" + original_compose="${project_root}/docker-compose.orig.yml" + override_file="${project_root}/migration/docker-compose.migration.override.yml" + + { + echo "Validation for project: $project" + echo " project_root: $project_root" + } >> "$report_file" + + if remote_test_exists "$project_root"; then + echo " project_root_exists: yes" >> "$report_file" + else + echo " project_root_exists: no" >> "$report_file" + fi + + if remote_test_exists "$original_compose"; then + echo " original_compose_exists: yes" >> "$report_file" + else + echo " original_compose_exists: no" >> "$report_file" + fi + + if [[ "$GENERATE_MIGRATED_COMPOSE" -eq 1 ]]; then + if remote_test_exists "$migrated_compose"; then + echo " migrated_compose_exists: yes" >> "$report_file" + else + echo " migrated_compose_exists: no" >> "$report_file" + fi + else + if remote_test_exists "$override_file"; then + echo " override_exists: yes" >> "$report_file" + else + echo " override_exists: no" >> "$report_file" + fi + fi + + echo >> "$report_file" +} + 
+validate_mount_copy() { + local source_path="$1" + local dest_path="$2" + local label="$3" + local report_file="$4" + + local src_bytes dst_bytes + src_bytes="" + dst_bytes="" + + if [[ -e "$source_path" ]]; then + src_bytes="$(du -sb "$source_path" 2>/dev/null | awk '{print $1}')" + fi + if remote_test_exists "$dest_path"; then + dst_bytes="$(remote_du_bytes "$dest_path" || true)" + fi + + { + echo "Mount validation:" + echo " label: $label" + echo " source: $source_path" + echo " destination: $dest_path" + echo " source_bytes: ${src_bytes:-unknown}" + echo " destination_bytes: ${dst_bytes:-missing}" + } >> "$report_file" + + if [[ -n "$src_bytes" && -n "$dst_bytes" ]]; then + if [[ "$src_bytes" == "$dst_bytes" ]]; then + echo " size_match: yes" >> "$report_file" + else + echo " size_match: no" >> "$report_file" + fi + else + echo " size_match: unknown" >> "$report_file" + fi + + echo >> "$report_file" +} + +copy_stack_files_for_project() { + local project="$1" + local example_container="$2" + local report_file="$3" + local cert_report="$4" + + [[ "$SYNC_COMPOSE_FILES" -eq 1 || "$SYNC_BUILD_CONTEXT" -eq 1 ]] || return 0 + + local workdir config_files + workdir="$(container_compose_workdir "$example_container")" + config_files="$(container_compose_files "$example_container")" + + local project_root + project_root="$(project_root_path "$project")" + + { + echo "Stack file sync:" + echo " project: $project" + echo " working_dir: ${workdir:-unknown}" + echo " compose_files: ${config_files:-unknown}" + echo + } >> "$report_file" + + if [[ "$SYNC_COMPOSE_FILES" -eq 1 ]]; then + if [[ -n "$config_files" ]]; then + IFS=',' read -ra cfarr <<< "$config_files" + local cf + for cf in "${cfarr[@]}"; do + cf="$(trim "$cf")" + [[ -z "$cf" ]] && continue + + local src_cf="$cf" + if [[ ! 
-f "$src_cf" && -n "$workdir" && -f "$workdir/$cf" ]]; then + src_cf="$workdir/$cf" + fi + + if [[ -f "$src_cf" ]]; then + local dst_cf="${project_root}/docker-compose.orig.yml" + copy_file "$src_cf" "$dst_cf" || warn "Failed copying compose file: $src_cf" + echo "Copied compose file: $src_cf -> $dst_cf" >> "$report_file" + scan_text_file_for_cert_hints "$src_cf" "$cert_report" "Compose file security hints: $src_cf" + apply_replacements_any_file "$dst_cf" "$report_file" + break + else + echo "Compose file not found: $cf" >> "$report_file" + fi + done + fi + + if [[ -n "$workdir" && -f "$workdir/.env" ]]; then + local dst_env="${project_root}/.env" + copy_file "$workdir/.env" "$dst_env" || warn "Failed copying env file: $workdir/.env" + echo "Copied env file: $workdir/.env -> $dst_env" >> "$report_file" + scan_text_file_for_cert_hints "$workdir/.env" "$cert_report" "Environment file security hints: $workdir/.env" + apply_replacements_any_file "$dst_env" "$report_file" + fi + fi + + if [[ -n "$workdir" && -d "$workdir" ]]; then + scan_directory_for_cert_files "$workdir" "$cert_report" "Certificate-like files under working dir: $workdir" + fi + + if [[ "$SYNC_BUILD_CONTEXT" -eq 1 ]]; then + if [[ -n "$workdir" && -d "$workdir" ]]; then + copy_dir_filtered "$workdir" "${project_root}/working_dir" || warn "Failed copying working dir: $workdir" + echo "Copied working directory: $workdir -> ${project_root}/working_dir" >> "$report_file" + apply_replacements_any_tree "${project_root}/working_dir" "$report_file" + else + echo "Working directory unavailable for build-context copy" >> "$report_file" + fi + fi +} + +capture_mounts_for_container() { + local c="$1" + local project_slug="$2" + local service_slug="$3" + local report_file="$4" + local override_file="$5" + local cert_report="$6" + local project_workdir="$7" + local final_sync_report="${8:-}" + + local mounts_json + mounts_json="$(container_mounts_json "$c")" + + inspect_mounts_for_cert_hints "$c" "$cert_report" + + 
{ + echo " - raw mounts json:" + echo "$mounts_json" | jq . + } >> "$report_file" + + if [[ "$mounts_json" == "[]" ]]; then + echo " - no mounts detected" >> "$report_file" + return 1 + fi + + local tmp_vols + tmp_vols="$(mktemp)" + echo " volumes:" > "$tmp_vols" + + local data_root + data_root="$(project_data_root "$project_slug")" + + while IFS= read -r m; do + local type source dest name rw + type="$(jq -r '.Type // ""' <<<"$m")" + source="$(jq -r '.Source // ""' <<<"$m")" + dest="$(jq -r '.Destination // ""' <<<"$m")" + name="$(jq -r '.Name // ""' <<<"$m")" + rw="$(jq -r '.RW' <<<"$m")" + + [[ -n "$dest" ]] || continue + + local mount_slug rel_dir final_host_path + mount_slug="$(safe_name "$dest")" + + log "Inspecting mount for $(container_name "$c"): type=$type source=$source dest=$dest" + + case "$type" in + bind) + if [[ -n "$project_workdir" && "$SYNC_BUILD_CONTEXT" -eq 1 ]] && is_subpath_of "$source" "$project_workdir"; then + local rel_from_workdir + if [[ "$source" == "$project_workdir" ]]; then + rel_from_workdir="." 
+ else + rel_from_workdir="${source#"$project_workdir"/}" + fi + + final_host_path="$(project_root_path "$project_slug")/working_dir/${rel_from_workdir}" + + { + echo " - bind mount already covered by working_dir copy:" + echo " container: $(container_name "$c")" + echo " dest: $dest" + echo " source: $source" + echo " rw: $rw" + echo " migrated_to: $final_host_path" + } >> "$report_file" + + generate_bind_mount_line_rw "$final_host_path" "$dest" "$rw" >> "$tmp_vols" + + if [[ "$FINAL_SYNC" -eq 1 && -n "$final_sync_report" ]]; then + validate_mount_copy "$source" "$final_host_path" "bind-covered:${dest}" "$final_sync_report" + fi + else + rel_dir="${data_root}/${service_slug}/binds/${mount_slug}" + + if [[ -d "$source" ]]; then + final_host_path="$rel_dir" + { + echo " - bind mount (directory):" + echo " container: $(container_name "$c")" + echo " dest: $dest" + echo " source: $source" + echo " rw: $rw" + echo " migrated_to: $final_host_path" + } >> "$report_file" + + copy_dir "$source" "$final_host_path" || warn "Failed copying bind directory: $source" + generate_bind_mount_line_rw "$final_host_path" "$dest" "$rw" >> "$tmp_vols" + + if [[ "$FINAL_SYNC" -eq 1 && -n "$final_sync_report" ]]; then + validate_mount_copy "$source" "$final_host_path" "bind:${dest}" "$final_sync_report" + fi + elif [[ -f "$source" ]]; then + final_host_path="${rel_dir}/$(basename "$source")" + { + echo " - bind mount (file):" + echo " container: $(container_name "$c")" + echo " dest: $dest" + echo " source: $source" + echo " rw: $rw" + echo " migrated_to: $final_host_path" + } >> "$report_file" + + copy_file "$source" "$final_host_path" || warn "Failed copying bind file: $source" + generate_bind_mount_line_rw "$final_host_path" "$dest" "$rw" >> "$tmp_vols" + + if [[ "$FINAL_SYNC" -eq 1 && -n "$final_sync_report" ]]; then + validate_mount_copy "$source" "$final_host_path" "bind-file:${dest}" "$final_sync_report" + fi + else + { + echo " - bind mount (missing source):" + echo " container: 
$(container_name "$c")" + echo " dest: $dest" + echo " source: $source" + echo " rw: $rw" + echo " action: left unchanged because source path was not found" + } >> "$report_file" + + generate_bind_mount_line_rw "$source" "$dest" "$rw" >> "$tmp_vols" + fi + fi + ;; + + volume) + rel_dir="${data_root}/${service_slug}/volumes/${mount_slug}" + + if [[ -d "$source" ]]; then + final_host_path="$rel_dir" + { + echo " - docker volume:" + echo " container: $(container_name "$c")" + echo " dest: $dest" + echo " volume_name: $name" + echo " source: $source" + echo " rw: $rw" + echo " migrated_to: $final_host_path" + } >> "$report_file" + + copy_dir "$source" "$final_host_path" || warn "Failed copying volume directory: $source" + generate_bind_mount_line_rw "$final_host_path" "$dest" "$rw" >> "$tmp_vols" + + if [[ "$FINAL_SYNC" -eq 1 && -n "$final_sync_report" ]]; then + validate_mount_copy "$source" "$final_host_path" "volume:${name:-$dest}" "$final_sync_report" + fi + elif [[ -f "$source" ]]; then + final_host_path="${rel_dir}/$(basename "$source")" + { + echo " - docker volume (file):" + echo " container: $(container_name "$c")" + echo " dest: $dest" + echo " volume_name: $name" + echo " source: $source" + echo " rw: $rw" + echo " migrated_to: $final_host_path" + } >> "$report_file" + + copy_file "$source" "$final_host_path" || warn "Failed copying volume file: $source" + generate_bind_mount_line_rw "$final_host_path" "$dest" "$rw" >> "$tmp_vols" + + if [[ "$FINAL_SYNC" -eq 1 && -n "$final_sync_report" ]]; then + validate_mount_copy "$source" "$final_host_path" "volume-file:${name:-$dest}" "$final_sync_report" + fi + else + { + echo " - volume source missing:" + echo " container: $(container_name "$c")" + echo " dest: $dest" + echo " volume_name: $name" + echo " source: $source" + echo " action: not copied because source path was not found" + } >> "$report_file" + fi + ;; + + tmpfs) + { + echo " - tmpfs mount:" + echo " container: $(container_name "$c")" + echo " dest: 
$dest" + echo " action: skipped" + } >> "$report_file" + ;; + + *) + { + echo " - other mount:" + echo " container: $(container_name "$c")" + echo " type: $type" + echo " dest: $dest" + echo " source: $source" + echo " action: reported only" + } >> "$report_file" + ;; + esac + done < <(echo "$mounts_json" | jq -c '.[]') + + if grep -qE '^[[:space:]]+- ' "$tmp_vols"; then + cat "$tmp_vols" >> "$override_file" + rm -f "$tmp_vols" + return 0 + else + rm -f "$tmp_vols" + return 1 + fi +} + +capture_writable_layer() { + local c="$1" + local project_slug="$2" + local service_slug="$3" + local report_file="$4" + local override_file="$5" + local final_sync_report="${6:-}" + + local include_writable + include_writable="$(container_include_writable "$c")" + [[ "$include_writable" -eq 1 ]] || return 1 + + local cname + cname="$(container_name "$c")" + + local diff_lines + diff_lines="$(docker diff "$c" || true)" + + if [[ -z "$diff_lines" ]]; then + echo " - writable-layer: no changed files detected" >> "$report_file" + return 1 + fi + + local data_root + data_root="$(project_data_root "$project_slug")" + + local tmp_writable + tmp_writable="$(mktemp)" + echo " volumes:" > "$tmp_writable" + + while IFS= read -r line; do + [[ -n "$line" ]] || continue + + local action path + action="$(awk '{print $1}' <<< "$line")" + path="$(cut -d' ' -f2- <<< "$line")" + + [[ -n "${path:-}" ]] || continue + [[ "$action" == "D" ]] && continue + + if [[ "$path" =~ $WRITABLE_EXCLUDE_REGEX ]]; then + continue + fi + + case "$path" in + /var/log/*|/var/cache/*|/root/.cache/*|/tmp/*|/run/*) continue ;; + esac + + local item_slug rel_dir local_stage_parent final_host_path staged_item + item_slug="$(safe_name "$path")" + rel_dir="${data_root}/${service_slug}/writable/${item_slug}" + # Stage each changed path in its own directory so that two paths sharing a basename cannot overwrite or merge into each other. + local_stage_parent="${OUTPUT_DIR}/staging/${project_slug}/${service_slug}/writable/${item_slug}" + final_host_path="$rel_dir" + staged_item="${local_stage_parent}/$(basename "$path")" + + mkdir -p "$local_stage_parent" + + if docker 
cp "${c}:${path}" "$local_stage_parent/" >/dev/null 2>&1; then + { + echo " - writable-layer:" + echo " container: $cname" + echo " path: $path" + echo " recovered_to: $final_host_path" + } >> "$report_file" + + if [[ -d "$staged_item" ]]; then + copy_dir "$staged_item" "$final_host_path" || warn "Failed copying writable dir: $staged_item" + elif [[ -f "$staged_item" ]]; then + copy_file "$staged_item" "$final_host_path" || warn "Failed copying writable file: $staged_item" + fi + + echo " - ${final_host_path}:${path}" >> "$tmp_writable" + + if [[ "$FINAL_SYNC" -eq 1 && -n "$final_sync_report" ]]; then + validate_mount_copy "$staged_item" "$final_host_path" "writable:${path}" "$final_sync_report" + fi + else + { + echo " - writable-layer-skip:" + echo " container: $cname" + echo " path: $path" + echo " reason: docker cp failed" + } >> "$report_file" + fi + done < <(printf '%s\n' "$diff_lines") + + if grep -qE '^[[:space:]]+- ' "$tmp_writable"; then + cat "$tmp_writable" >> "$override_file" + rm -f "$tmp_writable" + return 0 + else + rm -f "$tmp_writable" + return 1 + fi +} + +service_has_logging_in_compose() { + local compose_file="$1" + local service="$2" + + [[ -f "$compose_file" ]] || return 1 + + python3 - "$compose_file" "$service" <<'PY' >/dev/null 2>&1 +import sys, yaml +compose_file = sys.argv[1] +service = sys.argv[2] +with open(compose_file, "r", encoding="utf-8") as f: + data = yaml.safe_load(f) or {} +svc = ((data.get("services") or {}).get(service) or {}) +sys.exit(0 if "logging" in svc else 1) +PY +} + +generate_compose_override_for_service() { + local c="$1" + local project="$2" + local service="$3" + local report_file="$4" + local override_file="$5" + local cert_report="$6" + local project_workdir="$7" + local final_sync_report="${8:-}" + local source_compose_for_logging="${9:-}" + + local tmp_service_file + tmp_service_file="$(mktemp)" + + cat > "$tmp_service_file" <> "$tmp_service_file" + { + echo "Docker log cap applied:" + echo " service: 
$service" + echo " max-size: 10m" + echo " max-file: 3" + echo + } >> "$report_file" + else + { + echo "Docker log cap skipped because logging already defined:" + echo " service: $service" + echo + } >> "$report_file" + fi + fi + + if grep -qE '^[[:space:]]+(volumes:|logging:)' "$tmp_service_file"; then + cat "$tmp_service_file" >> "$override_file" + else + vlog "No override content for service '$service'; skipping empty block" + fi + + rm -f "$tmp_service_file" +} + +copy_project_migration_artifacts() { + local project="$1" + local report_file="$2" + local cert_report="$3" + local override_file="$4" + + local mig_root + mig_root="$(project_migration_root "$project")" + + copy_file "$report_file" "${mig_root}/report.txt" || true + copy_file "$cert_report" "${mig_root}/certificates-report.txt" || true + copy_file "$override_file" "${mig_root}/docker-compose.migration.override.yml" || true +} + +copy_final_sync_report() { + local project="$1" + local local_report="$2" + local dest_report + dest_report="$(project_migration_root "$project")/final-sync-report.txt" + copy_file "$local_report" "$dest_report" || true +} + +generate_migrated_compose_file() { + local project="$1" + local example_container="$2" + local override_file="$3" + local report_file="$4" + + [[ "$GENERATE_MIGRATED_COMPOSE" -eq 1 ]] || return 0 + + local config_files workdir project_root migrated_compose src_compose + config_files="$(container_compose_files "$example_container")" + workdir="$(container_compose_workdir "$example_container")" + project_root="$(project_root_path "$project")" + migrated_compose="${project_root}/docker-compose.yml" + + src_compose="" + if [[ -f "${project_root}/docker-compose.orig.yml" ]]; then + src_compose="${project_root}/docker-compose.orig.yml" + elif [[ -n "$config_files" ]]; then + IFS=',' read -ra cfarr <<< "$config_files" + local cf + for cf in "${cfarr[@]}"; do + cf="$(trim "$cf")" + [[ -z "$cf" ]] && continue + + if [[ -f "$cf" ]]; then + src_compose="$cf" + 
break + elif [[ -n "$workdir" && -f "$workdir/$cf" ]]; then + src_compose="$workdir/$cf" + break + fi + done + fi + + if [[ -z "$src_compose" || ! -f "$src_compose" ]]; then + err "Could not locate source compose file for project '$project'" + echo "Migrated compose generation failed: source compose file not found" >> "$report_file" + return 1 + fi + + if [[ ! -f "$override_file" ]]; then + err "Override file missing for project '$project'" + echo "Migrated compose generation failed: override file not found" >> "$report_file" + return 1 + fi + + python3 - <<'PY' >/dev/null 2>&1 +import yaml +PY + if [[ $? -ne 0 ]]; then + err "PyYAML is required for --generate-migrated-compose. Install with: apt-get install -y python3-yaml" + echo "Migrated compose generation failed: PyYAML missing" >> "$report_file" + return 1 + fi + + remote_mkdir "$project_root" + + local tmp_py local_out + tmp_py="$(mktemp)" + local_out="${OUTPUT_DIR}/projects/${project}/docker-compose.yml" + mkdir -p "$(dirname "$local_out")" + + cat > "$tmp_py" <<'PY' +import sys +from pathlib import Path +import yaml + +src_compose = Path(sys.argv[1]) +override_file = Path(sys.argv[2]) +out_file = Path(sys.argv[3]) + +with src_compose.open("r", encoding="utf-8") as f: + base = yaml.safe_load(f) or {} + +with override_file.open("r", encoding="utf-8") as f: + override = yaml.safe_load(f) or {} + +base_services = base.get("services", {}) or {} +override_services = override.get("services", {}) or {} +named_volumes_to_prune = set() + +def is_named_volume_ref(vol_entry): + if not isinstance(vol_entry, str): + return None + parts = vol_entry.split(":") + if not parts: + return None + lhs = parts[0] + if lhs.startswith("/") or lhs.startswith("./") or lhs.startswith("../") or lhs.startswith("~"): + return None + return lhs + +for svc_name, ov_svc in override_services.items(): + if svc_name not in base_services: + base_services[svc_name] = {} + base_svc = base_services[svc_name] + + if "volumes" in ov_svc: + for 
old_vol in base_svc.get("volumes", []) or []: + named = is_named_volume_ref(old_vol) + if named: + named_volumes_to_prune.add(named) + base_svc["volumes"] = ov_svc["volumes"] + + for key, value in ov_svc.items(): + if key == "volumes": + continue + base_svc[key] = value + +base["services"] = base_services +top_vols = base.get("volumes", {}) or {} + +still_referenced = set() +for svc in base_services.values(): + for vol in svc.get("volumes", []) or []: + named = is_named_volume_ref(vol) + if named: + still_referenced.add(named) + +for v in list(named_volumes_to_prune): + if v not in still_referenced and v in top_vols: + del top_vols[v] + +if top_vols: + base["volumes"] = top_vols +elif "volumes" in base: + del base["volumes"] + +with out_file.open("w", encoding="utf-8") as f: + yaml.safe_dump(base, f, sort_keys=False, default_flow_style=False) +PY + + python3 "$tmp_py" "$src_compose" "$override_file" "$local_out" + local rc=$? + rm -f "$tmp_py" + + if [[ $rc -ne 0 || ! -f "$local_out" ]]; then + err "Failed generating merged docker-compose.yml for project '$project'" + echo "Migrated compose generation failed for project '$project'" >> "$report_file" + return 1 + fi + + copy_file "$local_out" "$migrated_compose" + if [[ $? -ne 0 ]]; then + err "Failed copying merged docker-compose.yml to destination for project '$project'" + echo "Migrated compose copy failed for project '$project'" >> "$report_file" + return 1 + fi + + if ! 
remote_test_exists "$migrated_compose"; then + err "Merged docker-compose.yml not found on destination for project '$project'" + echo "Migrated compose missing on destination for project '$project'" >> "$report_file" + return 1 + fi + + echo "Generated migrated compose: $local_out -> $migrated_compose" >> "$report_file" + log "Generated merged docker-compose.yml: $migrated_compose" + return 0 +} + +write_container_metadata() { + local c="$1" + local project="$2" + local service="$3" + local image="$4" + local workdir="$5" + local config_files="$6" + local out_file="$7" + + cat > "$out_file" <> "$inventory_file" + fi + + { + echo "Docker log cap enabled: $ENABLE_LOG_CAP" + if [[ "$ENABLE_LOG_CAP" -eq 1 ]]; then + echo "Docker log cap policy: max-size=10m, max-file=3" + fi + echo + } >> "$inventory_file" + + local all_containers + mapfile -t all_containers < <(docker ps -a --format '{{.ID}}') + [[ "${#all_containers[@]}" -gt 0 ]] || { warn "No containers found."; exit 0; } + + declare -A PROJECT_CONTAINERS=() + declare -A PROJECT_FIRST_CONTAINER=() + declare -A PROJECT_WORKDIR=() + declare -A PROJECT_SOURCE_COMPOSE=() + declare -a STANDALONE_CONTAINERS=() + local selected_count=0 + + local c + for c in "${all_containers[@]}"; do + if ! 
is_container_selected "$c"; then + continue + fi + + selected_count=$((selected_count + 1)) + + local project + project="$(container_compose_project "$c")" + if [[ -n "$project" ]]; then + PROJECT_CONTAINERS["$project"]+="${c} " + [[ -z "${PROJECT_FIRST_CONTAINER[$project]:-}" ]] && PROJECT_FIRST_CONTAINER["$project"]="$c" + [[ -z "${PROJECT_WORKDIR[$project]:-}" ]] && PROJECT_WORKDIR["$project"]="$(container_compose_workdir "$c")" + else + STANDALONE_CONTAINERS+=("$c") + fi + done + + { + echo "Run ID: $RUN_ID" + echo "Destination base: $DEST_BASE" + echo "Destination host: ${DEST_HOST:-local only}" + echo "Destination user: ${DEST_USER:-default ssh user}" + echo "Transfer method: $TRANSFER_METHOD" + echo "Sync enabled: $SYNC_DATA" + echo "Global include writable layer: $INCLUDE_WRITABLE" + echo "Global stop containers: $STOP_CONTAINERS" + echo "Sync compose files: $SYNC_COMPOSE_FILES" + echo "Sync build context: $SYNC_BUILD_CONTEXT" + echo "Generate migrated compose: $GENERATE_MIGRATED_COMPOSE" + echo "Final sync: $FINAL_SYNC" + echo "Final sync stop: $FINAL_SYNC_STOP" + echo "Docker log cap enabled: $ENABLE_LOG_CAP" + echo "SSH key: ${SSH_KEY:-default ssh identity}" + echo "CSV filter: ${CSV_FILE:-none}" + echo "Selected containers: $selected_count" + echo + } >> "$inventory_file" + + [[ "$selected_count" -gt 0 ]] || { err "Zero containers selected for processing."; exit 1; } + + log "Containers selected for processing: $selected_count" + log "Processing compose projects..." 
+ + local project + for project in "${!PROJECT_CONTAINERS[@]}"; do + local project_dir report_file override_file cert_report final_sync_report local_merged_compose project_root + project_dir="${OUTPUT_DIR}/projects/${project}" + report_file="${project_dir}/report.txt" + override_file="${project_dir}/docker-compose.migration.override.yml" + cert_report="${project_dir}/certificates-report.txt" + final_sync_report="${project_dir}/final-sync-report.txt" + local_merged_compose="${project_dir}/docker-compose.yml" + project_root="$(project_root_path "$project")" + + mkdir -p "$project_dir" + record_inventory_header "$report_file" + : > "$cert_report" + cat > "$override_file" <<'EOF' +services: +EOF + + PROJECT_SOURCE_COMPOSE["$project"]="${project_root}/docker-compose.orig.yml" + + if [[ "$FINAL_SYNC" -eq 1 ]]; then + write_final_sync_report_header "$final_sync_report" + validate_project_destination "$project" "$final_sync_report" + fi + + { + echo "Compose project: $project" + echo + } >> "$report_file" + + copy_stack_files_for_project "$project" "${PROJECT_FIRST_CONTAINER[$project]}" "$report_file" "$cert_report" || warn "Stack file copy encountered issues for project: $project" + + declare -A PROJECT_SEEN_SERVICES=() + + local pc + for pc in ${PROJECT_CONTAINERS["$project"]}; do + local cname service workdir config_files image + cname="$(container_name "$pc")" + service="$(container_compose_service "$pc")" + workdir="$(container_compose_workdir "$pc")" + config_files="$(container_compose_files "$pc")" + image="$(container_image "$pc")" + + local service_key="${service:-$cname}" + if [[ -n "${PROJECT_SEEN_SERVICES[$service_key]:-}" ]]; then + warn "Skipping duplicate service entry in override for project '$project': $service_key" + continue + fi + PROJECT_SEEN_SERVICES["$service_key"]=1 + + local container_dir container_report container_cert_report container_final_sync_report container_meta + container_dir="$(container_output_root "$pc")" + mkdir -p "$container_dir" + + 
container_report="${container_dir}/report.txt" + container_cert_report="${container_dir}/certificates-report.txt" + container_final_sync_report="${container_dir}/final-sync-report.txt" + container_meta="${container_dir}/metadata.txt" + + record_inventory_header "$container_report" + : > "$container_cert_report" + if [[ "$FINAL_SYNC" -eq 1 ]]; then + write_final_sync_report_header "$container_final_sync_report" + fi + + { + echo "Container: $cname" + echo " project: $project" + echo " service: $service_key" + echo " image: $image" + echo " workdir: ${workdir:-unknown}" + echo " compose_files: ${config_files:-unknown}" + echo " include_writable_layer: $(container_include_writable "$pc")" + echo " stop_container: $(container_should_stop "$pc")" + echo " timestamp: $(date -Is)" + } >> "$container_report" + + write_container_metadata "$pc" "$project" "$service_key" "$image" "${workdir:-unknown}" "${config_files:-unknown}" "$container_meta" + + inspect_env_for_cert_hints "$pc" "$container_cert_report" + inspect_mounts_for_cert_hints "$pc" "$container_cert_report" + + stop_container_if_requested "$pc" + generate_compose_override_for_service \ + "$pc" \ + "$project" \ + "$service_key" \ + "$container_report" \ + "$override_file" \ + "$container_cert_report" \ + "${PROJECT_WORKDIR[$project]:-}" \ + "$container_final_sync_report" \ + "${PROJECT_SOURCE_COMPOSE[$project]}" + + echo >> "$container_report" + done + + copy_project_migration_artifacts "$project" "$report_file" "$cert_report" "$override_file" + + if [[ "$GENERATE_MIGRATED_COMPOSE" -eq 1 ]]; then + generate_migrated_compose_file "$project" "${PROJECT_FIRST_CONTAINER[$project]}" "$override_file" "$report_file" + fi + + if [[ "$FINAL_SYNC" -eq 1 ]]; then + validate_project_destination "$project" "$final_sync_report" + copy_final_sync_report "$project" "$final_sync_report" + fi + + for pc in ${PROJECT_CONTAINERS["$project"]}; do + local cdir + cdir="$(container_output_root "$pc")" + copy_file "$override_file" 
"${cdir}/docker-compose.migration.override.yml" || true + [[ -f "$local_merged_compose" ]] && copy_file "$local_merged_compose" "${cdir}/docker-compose.yml" || true + if [[ "$FINAL_SYNC" -eq 1 && -f "$final_sync_report" ]]; then + copy_file "$final_sync_report" "${cdir}/project-final-sync-report.txt" || true + fi + done + + log "Wrote project report: $report_file" + log "Wrote certificates report: $cert_report" + log "Wrote compose override: $override_file" + [[ "$GENERATE_MIGRATED_COMPOSE" -eq 1 ]] && log "Generated merged docker-compose.yml for project: $project" + [[ "$FINAL_SYNC" -eq 1 ]] && log "Generated final sync report for project: $project" + done + + log "Processing standalone containers..." + for c in "${STANDALONE_CONTAINERS[@]}"; do + local cname standalone_report cert_report compose_file container_dir + cname="$(container_name "$c")" + standalone_report="${OUTPUT_DIR}/standalone/${cname}/report.txt" + cert_report="${OUTPUT_DIR}/standalone/${cname}/certificates-report.txt" + compose_file="${OUTPUT_DIR}/standalone/${cname}/compose.generated.yml" + container_dir="$(container_output_root "$c")" + + mkdir -p "${OUTPUT_DIR}/standalone/${cname}" "$container_dir" + record_inventory_header "$standalone_report" + : > "$cert_report" + + { + echo "Standalone container: $cname" + echo "Image: $(container_image "$c")" + echo "include_writable_layer: $(container_include_writable "$c")" + echo "stop_container: $(container_should_stop "$c")" + echo "timestamp: $(date -Is)" + } >> "$standalone_report" + + write_container_metadata "$c" "standalone" "standalone" "$(container_image "$c")" "n/a" "n/a" "${container_dir}/metadata.txt" + + stop_container_if_requested "$c" + + cat > "$compose_file" <> "$compose_file" || true + render_network_mode_yaml "$c" >> "$compose_file" || true + render_env_yaml "$c" >> "$compose_file" || true + render_ports_yaml "$c" >> "$compose_file" || true + if [[ "$ENABLE_LOG_CAP" -eq 1 ]]; then + generate_logging_override_block >> "$compose_file" 
+ fi + + inspect_env_for_cert_hints "$c" "$cert_report" + inspect_mounts_for_cert_hints "$c" "$cert_report" + + copy_file "$compose_file" "${DEST_BASE}/standalone/${cname}/compose.generated.yml" || true + copy_file "$standalone_report" "${DEST_BASE}/standalone/${cname}/report.txt" || true + copy_file "$cert_report" "${DEST_BASE}/standalone/${cname}/certificates-report.txt" || true + + copy_file "$standalone_report" "${container_dir}/report.txt" || true + copy_file "$cert_report" "${container_dir}/certificates-report.txt" || true + copy_file "$compose_file" "${container_dir}/compose.generated.yml" || true + + log "Wrote standalone output for: $cname" + done + + cat </ + docker-compose.orig.yml # preserved original, with requested text replacements applied + docker-compose.yml # merged migrated compose + .env + working_dir/ + migration/ + data/ + +Per-container review output: + ${OUTPUT_DIR}/containers//${RUN_ID}/ + +Project compose artifacts: + ${OUTPUT_DIR}/projects// + +EOF +} + +main "$@" diff --git a/admintools/migration/key.txt b/admintools/migration/key.txt new file mode 100644 index 0000000..d19e69e --- /dev/null +++ b/admintools/migration/key.txt @@ -0,0 +1,5 @@ +no named volumes +no hidden container data +everything bind-mounted +everything visible on disk +everything reproducible diff --git a/admintools/migration/mig.csv b/admintools/migration/mig.csv new file mode 100644 index 0000000..afbb6db --- /dev/null +++ b/admintools/migration/mig.csv @@ -0,0 +1,6 @@ +container,include_writable_layer,stop_container +airflow,false,false +kafka,false,false +kafka-ui,false,false +mongodb,false,false +nginx.modsecurity,false,false diff --git a/admintools/migration/readme.md b/admintools/migration/readme.md new file mode 100644 index 0000000..e0f701e --- /dev/null +++ b/admintools/migration/readme.md @@ -0,0 +1,781 @@ +# Docker Migration Script + +This repository contains a Bash-based Docker container migration helper. 
It inventories Docker containers on a source host, copies container data to a destination layout, and generates Docker Compose migration artifacts that make it easier to recreate the workloads on another host. + +The script is designed for two common cases: + +1. **Docker Compose projects** — containers that were started by Compose and have Compose labels. +2. **Standalone containers** — containers started directly with `docker run` or another non-Compose method. + +It can copy Docker volumes, bind mounts, Compose files, `.env` files, build contexts, and optionally changed files from a container writable layer. + +--- + +## What the script does + +At a high level, the script: + +- Scans all Docker containers on the current machine. +- Groups Compose-managed containers by Compose project. +- Detects standalone containers separately. +- Inspects container metadata, including: + - image + - restart policy + - network mode + - environment variables + - port bindings + - mounts + - Docker Compose labels +- Copies Docker volumes and bind mounts into a new destination directory tree. +- Converts volume mounts into explicit host bind mounts in generated Compose output. +- Optionally copies: + - original Compose files + - `.env` files + - Compose working directories / build contexts + - writable-layer changes from `docker diff` +- Generates migration reports for each project and container. +- Generates a certificate/security hints report by looking for certificate-like paths, filenames, and environment variables. +- Adds a default Docker JSON log cap unless disabled: + - `max-size: 10m` + - `max-file: 3` +- Can perform text replacements across copied text files, useful for changing hostnames, IPs, paths, or domains during migration. +- Can copy data either locally or to a remote destination over SSH using `rsync` or `scp`. 
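
The grouping step described above can be sketched as follows. This is a minimal illustration and is not taken from `containermig.sh`: the helper name `classify_container` is hypothetical, and it assumes Compose-managed containers carry the standard `com.docker.compose.project` label. The loop only runs when the `docker` CLI is available.

```bash
# Hypothetical helper (not from containermig.sh): classify a container by
# the value of its com.docker.compose.project label.
classify_container() {
  project_label="$1"
  if [ -n "$project_label" ]; then
    printf 'compose:%s\n' "$project_label"
  else
    printf 'standalone\n'
  fi
}

# List every container with its classification, if docker is installed.
if command -v docker >/dev/null 2>&1; then
  for id in $(docker ps -aq 2>/dev/null); do
    # A missing label renders as an empty string.
    label="$(docker inspect --format '{{ index .Config.Labels "com.docker.compose.project" }}' "$id" 2>/dev/null)" || label=""
    printf '%s %s\n' "$id" "$(classify_container "$label")"
  done
fi
```

A container with an empty project label is treated as standalone here. The real script may apply additional checks.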
+ +--- + +## Important safety notes + +This script helps prepare a migration, but it does **not** guarantee that the migrated containers will run without review. + +Before starting the migrated stack, review: + +- generated `docker-compose.yml` +- generated `docker-compose.migration.override.yml` +- copied `.env` files +- `report.txt` +- `certificates-report.txt` +- file ownership and permissions on the destination +- application-specific secrets and certificates +- external networks that may need to be recreated manually + +If `--stop-containers` or `--final-sync-stop` is used, containers are stopped for consistency. The script does **not** restart them automatically afterwards. + +Use `--include-writable-layer` carefully. Writable layers can contain runtime cache, temporary files, logs, secrets, or state that should really live in volumes. + +--- + +## Requirements + +Run the script on the source Docker host. + +Required commands: + +```bash +docker +jq +find +grep +awk +readlink +python3 +file +sed +sha256sum +``` + +For remote transfers: + +```bash +ssh +rsync # when using --transfer rsync +scp # when using --transfer scp +``` + +For generated migrated Compose merging: + +```bash +python3-yaml +``` + +On Debian/Ubuntu: + +```bash +sudo apt-get update +sudo apt-get install -y docker.io jq rsync openssh-client python3 python3-yaml file coreutils sed grep gawk +``` + +You must have permission to inspect Docker containers and read Docker volume data. In practice, run as root or a user in the `docker` group. 
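
A quick preflight check for the required commands might look like the sketch below. The helper name `missing_commands` is an assumption made for illustration and is not part of `containermig.sh`. The command list is copied from the requirements above.

```bash
# Hypothetical preflight helper (not part of containermig.sh).
# Prints the name of each command that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Command list taken from the "Required commands" block.
missing="$(missing_commands docker jq find grep awk readlink python3 file sed sha256sum)"
if [ -n "$missing" ]; then
  printf 'Missing required commands:\n%s\n' "$missing" >&2
else
  echo "All required commands found."
fi
```

For remote transfers, the same function can be called with `ssh rsync` or `ssh scp`, depending on the `--transfer` mode in use.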
+ +--- + +## Basic usage + +```bash +chmod +x containermig.sh + +./containermig.sh \ + --dest-base /opt/redback/migrated \ + --sync-data +``` + +This performs a local migration into: + +```text +/opt/redback/migrated/ +``` + +and writes local review artifacts under: + +```text +./docker-migration-output/ +``` + +--- + +## Remote migration example + +Copy all detected containers to another host: + +```bash +./containermig.sh \ + --dest-host new-docker-host.example.com \ + --dest-user root \ + --dest-base /opt/redback/stacks \ + --transfer rsync \ + --sync-data \ + --sync-compose-files \ + --sync-build-context +``` + +This copies data over SSH and writes project directories under: + +```text +/opt/redback/stacks// +``` + +on the destination host. + +--- + +## Recommended full migration command + +For a Compose-heavy Docker host, this is the most useful starting point: + +```bash +./containermig.sh \ + --dest-host new-docker-host.example.com \ + --dest-user root \ + --dest-base /opt/redback/stacks \ + --transfer rsync \ + --sync-data \ + --sync-compose-files \ + --sync-build-context \ + --final-sync \ + --final-sync-stop \ + --verbose +``` + +This will: + +- copy data +- copy Compose files +- copy build context directories +- stop containers during final sync +- generate final validation reports +- generate merged Compose files where possible + +--- + +## Options + +### Core options + +| Option | Description | +|---|---| +| `--dest-host HOST` | Remote destination host. If omitted, files are copied locally. | +| `--dest-user USER` | SSH username for the destination host. | +| `--dest-base PATH` | Required. Base destination path for migrated projects and standalone containers. | +| `--transfer rsync\|scp` | Transfer method. Default: `rsync`. | +| `--sync-data` | Actually copy files and directories. Without this, copy operations are skipped and the script mainly produces reports/artifacts. | +| `--csv FILE` | Process only containers listed in a CSV file. 
| +| `--verbose` | Enable debug output. | + +### Migration behavior + +| Option | Description | +|---|---| +| `--include-writable-layer` | Capture changed files from container writable layers using `docker diff` and `docker cp`. | +| `--stop-containers` | Stop selected containers before capturing their data. | +| `--sync-compose-files` | Copy original Compose files and `.env` files. Also automatically enables migrated Compose generation. | +| `--sync-build-context` | Copy the Compose working directory/build context, excluding common heavy/generated directories. | +| `--generate-migrated-compose` | Generate a merged `docker-compose.yml` using the original Compose file plus generated migration override. | +| `--final-sync` | Produce final sync validation reports. | +| `--final-sync-stop` | Stop containers during the final sync phase. | +| `--no-log-cap` | Disable automatic Docker JSON log capping in generated Compose output. | + +### Text replacement options + +| Option | Description | +|---|---| +| `--replace-text OLD=NEW` | Replace text in copied text files. Can be repeated. | +| `--replace-file FILE` | Load replacement pairs from a file, one `OLD=NEW` pair per line. Lines beginning with `#` are ignored. | + +Example: + +```bash +./containermig.sh \ + --dest-base /opt/redback/stacks \ + --sync-data \ + --sync-compose-files \ + --replace-text old.example.com=new.example.com \ + --replace-text 192.168.1.10=10.10.20.10 +``` + +Replacement file example: + +```text +# hostname changes +old.example.com=new.example.com + +# IP changes +192.168.1.10=10.10.20.10 +``` + +Use it with: + +```bash +./containermig.sh \ + --dest-base /opt/redback/stacks \ + --sync-data \ + --sync-compose-files \ + --replace-file replacements.txt +``` + +### SSH options + +| Option | Description | +|---|---| +| `--ssh-key PATH` | SSH private key to use. | +| `--ssh-control-persist DURATION` | SSH ControlPersist value. Default: `10m`. 
| + +Example: + +```bash +./containermig.sh \ + --dest-host new-docker-host.example.com \ + --dest-user root \ + --ssh-key ~/.ssh/id_ed25519 \ + --dest-base /opt/redback/stacks \ + --sync-data +``` + +### Output options + +| Option | Description | +|---|---| +| `--output-dir PATH` | Local output directory. Default: `./docker-migration-output`. | + +--- + +## CSV container selection + +Use `--csv FILE` to migrate only selected containers. + +The script skips the first line, so include a header row. + +Expected columns: + +```csv +container,include_writable_layer,stop_container +``` + +Example: + +```csv +container,include_writable_layer,stop_container +nextcloud,1,1 +postgres,0,1 +redis,0,0 +``` + +The `container` field can be a container name or ID. + +Boolean values accepted: + +```text +1, true, yes, y, on +0, false, no, n, off +``` + +CSV values override the global `--include-writable-layer` and `--stop-containers` options for the listed containers. + +Run with: + +```bash +./containermig.sh \ + --dest-host new-docker-host.example.com \ + --dest-user root \ + --dest-base /opt/redback/stacks \ + --sync-data \ + --csv containers.csv +``` + +--- + +## Destination layout + +For Compose projects, the destination layout is: + +```text +// + docker-compose.orig.yml + docker-compose.yml + .env + working_dir/ + migration/ + report.txt + certificates-report.txt + docker-compose.migration.override.yml + final-sync-report.txt + data/ + / + volumes/ + binds/ + writable/ +``` + +For standalone containers, the destination layout is: + +```text +/standalone// + compose.generated.yml + report.txt + certificates-report.txt +``` + +Local review output is also written to: + +```text +docker-migration-output/ + migration-inventory.txt + projects/ + / + standalone/ + / + containers/ + / + / + staging/ +``` + +--- + +## Generated Compose behavior + +For Compose projects, the script creates a migration override file: + +```text +docker-compose.migration.override.yml +``` + +This 
override replaces Docker named volumes and bind mounts with explicit host paths under the destination data directory. + +When `--generate-migrated-compose` is enabled, the script merges: + +```text +docker-compose.orig.yml +docker-compose.migration.override.yml +``` + +into: + +```text +docker-compose.yml +``` + +The merged file: + +- updates services with migrated bind mount paths +- adds log capping unless disabled +- removes unused top-level named volumes where possible + +For standalone containers, the script generates: + +```text +compose.generated.yml +``` + +This includes: + +- image +- restart policy +- network mode +- environment variables +- port bindings +- logging cap, unless disabled + +Standalone generated Compose should be reviewed carefully because not every `docker run` option can be reconstructed from inspection data. + +--- + +## Build context copying + +When `--sync-build-context` is enabled, the script copies the Compose project working directory to: + +```text +//working_dir/ +``` + +The following directories are excluded when using `rsync`: + +```text +.git +.svn +.hg +node_modules +__pycache__ +.venv +venv +.mypy_cache +.pytest_cache +.cache +dist +build +.idea +.vscode +``` + +In `scp` mode, these excludes are not applied; the whole working directory is copied. + +--- + +## Certificate and secret hints + +The script writes a certificate/security hints report: + +```text +certificates-report.txt +``` + +It looks for hints in: + +- environment variable names +- environment variable values +- mount paths +- Compose files +- `.env` files +- certificate-like filenames under the working directory + +Detected hints include terms such as: + +```text +CERT +CERTIFICATE +TLS +SSL +KEY +KEYSTORE +TRUSTSTORE +CA_BUNDLE +CLIENT_CERT +CLIENT_KEY +``` + +and file extensions such as: + +```text +.crt +.cer +.pem +.key +.p12 +.pfx +.jks +.keystore +.csr +.der +``` + +This report is informational only. 
You still need to review whether secrets should be copied, rotated, or handled through a secrets manager. + +--- + +## Final sync mode + +`--final-sync` writes extra validation information, including source and destination size comparisons for copied mounts. + +Use it with `--final-sync-stop` for a more consistent final copy: + +```bash +./containermig.sh \ + --dest-host new-docker-host.example.com \ + --dest-user root \ + --dest-base /opt/redback/stacks \ + --sync-data \ + --sync-compose-files \ + --sync-build-context \ + --final-sync \ + --final-sync-stop +``` + +The final sync report is written to: + +```text +//migration/final-sync-report.txt +``` + +and local review copies are placed under: + +```text +docker-migration-output/ +``` + +--- + +## Typical migration workflow + +### 1. Run an inventory/report-only pass + +```bash +./containermig.sh \ + --dest-base /tmp/docker-migration-review \ + --verbose +``` + +Review: + +```text +docker-migration-output/migration-inventory.txt +docker-migration-output/projects/ +docker-migration-output/standalone/ +docker-migration-output/containers/ +``` + +### 2. Run the real data sync + +```bash +./containermig.sh \ + --dest-host new-docker-host.example.com \ + --dest-user root \ + --dest-base /opt/redback/stacks \ + --transfer rsync \ + --sync-data \ + --sync-compose-files \ + --sync-build-context +``` + +### 3. Review generated files on the destination + +On the destination host: + +```bash +cd /opt/redback/stacks/ +ls -la +cat migration/report.txt +cat migration/certificates-report.txt +docker compose config +``` + +### 4. Run a final stopped sync + +```bash +./containermig.sh \ + --dest-host new-docker-host.example.com \ + --dest-user root \ + --dest-base /opt/redback/stacks \ + --transfer rsync \ + --sync-data \ + --sync-compose-files \ + --sync-build-context \ + --final-sync \ + --final-sync-stop +``` + +### 5. 
Start the migrated stack + +On the destination host: + +```bash +cd /opt/redback/stacks/ +docker compose up -d +docker compose ps +docker compose logs --tail=100 +``` + +--- + +## Troubleshooting + +### `--dest-base is required` + +Always provide a destination base path: + +```bash +--dest-base /opt/redback/stacks +``` + +### `Required command not found: jq` + +Install missing dependencies: + +```bash +sudo apt-get install -y jq +``` + +### `PyYAML is required for --generate-migrated-compose` + +Install Python YAML support: + +```bash +sudo apt-get install -y python3-yaml +``` + +### Remote SSH copy fails + +Test SSH manually: + +```bash +ssh user@host 'hostname && mkdir -p /tmp/migration-test' +``` + +If using a key: + +```bash +ssh -i ~/.ssh/id_ed25519 user@host 'hostname' +``` + +Then retry with: + +```bash +--ssh-key ~/.ssh/id_ed25519 +``` + +### Generated Compose references missing paths + +Check whether `--sync-data` was used. Without `--sync-data`, the script does not actually copy directories or files. + +### Containers were stopped and not restarted + +This is expected. 
Restart source containers manually if needed: + +```bash +docker start +``` + +or for Compose projects: + +```bash +docker compose up -d +``` + +### `scp` copied too much build context + +Use the default `rsync` mode if you want the build context excludes to apply: + +```bash +--transfer rsync +``` + +--- + +## Example: migrate one Compose project by selecting containers + +Create `containers.csv`: + +```csv +container,include_writable_layer,stop_container +nextcloud-app,0,1 +nextcloud-db,0,1 +nextcloud-redis,0,1 +``` + +Run: + +```bash +./containermig.sh \ + --dest-host new-docker-host.example.com \ + --dest-user root \ + --dest-base /opt/redback/stacks \ + --transfer rsync \ + --sync-data \ + --sync-compose-files \ + --sync-build-context \ + --final-sync \ + --final-sync-stop \ + --csv containers.csv +``` + +--- + +## Example: local migration with hostname replacement + +```bash +./containermig.sh \ + --dest-base /opt/redback/stacks \ + --sync-data \ + --sync-compose-files \ + --sync-build-context \ + --replace-text old-hostname.local=new-hostname.local \ + --replace-text 192.168.1.50=10.0.20.50 +``` + +--- + +## Notes and limitations + +- The script relies heavily on Docker inspection metadata. +- It cannot perfectly reconstruct every option used in an original `docker run` command. +- Compose-managed containers produce better migration output than standalone containers. +- The script does not recreate external Docker networks automatically. +- The script does not recreate Docker secrets or Swarm-specific configuration. +- The script does not restart containers it stops. +- Some applications require clean shutdowns or application-level backup/restore instead of filesystem copying. +- Database containers should ideally be migrated using database-native dump/restore or stopped during final sync. +- File ownership, SELinux labels, AppArmor profiles, and custom capabilities may need manual review. 
+- Writable-layer capture is a recovery aid, not a replacement for proper volume design. + +--- + +## Quick command reference + +```bash +# Help +./containermig.sh --help + +# Local migration +./containermig.sh --dest-base /opt/redback/stacks --sync-data + +# Remote migration with rsync +./containermig.sh \ + --dest-host new-host \ + --dest-user root \ + --dest-base /opt/redback/stacks \ + --sync-data + +# Remote migration with Compose files and build context +./containermig.sh \ + --dest-host new-host \ + --dest-user root \ + --dest-base /opt/redback/stacks \ + --sync-data \ + --sync-compose-files \ + --sync-build-context + +# Select containers by CSV +./containermig.sh \ + --dest-base /opt/redback/stacks \ + --sync-data \ + --csv containers.csv + +# Capture writable layer changes +./containermig.sh \ + --dest-base /opt/redback/stacks \ + --sync-data \ + --include-writable-layer + +# Disable generated Docker log caps +./containermig.sh \ + --dest-base /opt/redback/stacks \ + --sync-data \ + --no-log-cap +``` diff --git a/admintools/migration/replacements.txt b/admintools/migration/replacements.txt new file mode 100644 index 0000000..ce440ff --- /dev/null +++ b/admintools/migration/replacements.txt @@ -0,0 +1 @@ +redback.it.deakin.edu.au=10.137.17.254 diff --git a/admintools/migration/trialmigrate.sh b/admintools/migration/trialmigrate.sh new file mode 100644 index 0000000..96269a9 --- /dev/null +++ b/admintools/migration/trialmigrate.sh @@ -0,0 +1 @@ + ./containermig.sh --dest-host 10.137.17.254 --dest-user ejb --dest-base /opt/redback/ejb --sync-data --csv mig.csv --include-writable-layer --sync-compose-files --sync-build-context --verbose --replace-file replacements.txt