Setup DocumentDB with AKS Edge + Azure Arc #281
hossain-rayhan wants to merge 1 commit into documentdb:main
Pull request overview
This PR adds documentation for deploying DocumentDB on AKS Edge Essentials (K3s on Windows) with Azure Arc integration, enabling Azure Portal visibility for on-premises clusters.
Changes:
- Adds a comprehensive step-by-step README for end users covering installation, configuration, and troubleshooting
- Adds an AGENT-INSTRUCTIONS.md guide for AI-assisted setup workflows
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| documentdb-playground/aks-edge-setup/README.md | Full user-facing guide covering all phases from AKS Edge installation to Azure Arc connection and cleanup |
| documentdb-playground/aks-edge-setup/AGENT-INSTRUCTIONS.md | Copilot agent guide mirroring the README phases with agent-specific behavior instructions |
| # Expected: demo-documentdb-1 2/2 Running 0 3m |
| ``` |
| ### Phase 7: Connect to Azure Arc |
There are two sections labeled 'Phase 7' — the first at line 389 ('Deploy DocumentDB Instance') and the second at line 442 ('Connect to Azure Arc'). The Azure Arc phase should be renumbered to 'Phase 8', and all subsequent phases (Verification, etc.) should be renumbered accordingly to maintain a consistent, sequential numbering scheme.
Suggested change:
| - ### Phase 7: Connect to Azure Arc |
| + ### Phase 8: Connect to Azure Arc |
| ```powershell |
| # Variables |
| $RESOURCE_GROUP = "aks-edge-rg" |
| $LOCATION = "eastus" |
In Phase 2.5, the $LOCATION variable is set to "westus2", but in the 'Connect Cluster to Azure Arc' section (Phase 7), it is hardcoded as "eastus". If a user follows both sections, the resource group and Arc cluster could be created in different locations, leading to confusion. This value should reference the same $LOCATION variable defined in Phase 2.5, or at minimum include a note to keep it consistent.
| $LOCATION = "eastus" |
| kubectl create secret generic documentdb-credentials ` |
| --namespace app-namespace ` |
| --from-literal=username=docdbuser ` |
| --from-literal=password=YourSecurePassword123! |
The example password YourSecurePassword123! is used both when creating the secret and later in the mongosh connection string (line 534). Embedding a plaintext example password in a connection string that users may copy and run is a security anti-pattern. Consider replacing the connection string's password field with a placeholder like <your-password> to discourage copying credentials verbatim.
Suggested change:
| - --from-literal=password=YourSecurePassword123! |
| + --from-literal=password=<your-password> |
| kubectl create clusterrolebinding arc-portal-viewer-binding ` |
| --clusterrole=cluster-admin ` |
| --serviceaccount=default:arc-portal-viewer |
The portal viewer service account is bound to cluster-admin, which grants full cluster access. For read-only portal visibility, a least-privilege role (e.g., view or a custom read-only ClusterRole) should be used instead. If cluster-admin is intentional for demonstration purposes, a clear warning about the security implications should be added.
| # Verify connection |
| ### Phase 8: Create Portal Access Token |
Lines 399–400 contain a duplicate heading and a stray comment. The comment # Verify connection on line 399 appears to be an editing artifact, and ### Phase 8: Create Portal Access Token is repeated on both line 399 (as a fragment) and line 400 (as the actual heading). The stray line 399 should be removed.
| kubectl create clusterrolebinding arc-portal-viewer-binding ` |
| --clusterrole=cluster-admin ` |
| --serviceaccount=default:arc-portal-viewer |
Same as in README.md: the portal viewer service account is granted cluster-admin. This is overly permissive for a read-only portal access token. A least-privilege role should be used, or a clear security warning should be added noting the risks of this binding.
| ### Why This Setup? |
| - **On-prem Kubernetes**: Run K8s on your Windows workstation without cloud costs |
I think this is a little misleading, since it does cost money to run this.
Our main scenario is edge situations in combination with change stream sync.
| # Control Panel → Programs → Uninstall AKS Edge Essentials |
| ``` |
| ## Success Criteria |
This section seems more agent-oriented, can we move it to the other file?
Review of PR #281: DocumentDB on AKS Edge + Azure Arc
PR: +1328/-0 in 2 files ([...]
Source Code Verification
Score: 9/12 claims verified correct
🔴 Critical (2)
1. Resource limits/requests fields do not exist in CRD
The Appendix "DocumentDB Resource Sizing" section shows:
spec:
  resource:
    limits:
      cpu: "2"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "2Gi"
The [...]
Fix: Remove the resource sizing YAML example, or note that resource limits are managed at the CNPG level (not via the DocumentDB CRD).
2. Duplicate Phase 7 (numbering collision)
The README has two Phase 7s: [...]
Then Phase 8 is Verification. The AGENT-INSTRUCTIONS.md has a different numbering (Deploy=Phase 7, Token=Phase 8, Portal=Phase 9). The README and agent instructions are inconsistent. Also, the AGENT-INSTRUCTIONS.md includes Arc in Phase 3 (auto-configured during [...]
Fix: Renumber README phases (Arc should be Phase 8, Verification Phase 9). Clarify that Phase 3 auto-configures Arc if the config JSON includes the [...]
🟠 Major (4)
3. Port-forward uses wrong service name
The verification section uses:
kubectl port-forward svc/demo-documentdb-rw -n app-namespace 10260:10260
The DocumentDB operator creates services with prefix [...]
Fix:
kubectl port-forward svc/documentdb-service-demo-documentdb -n app-namespace 10260:10260
Or port-forward directly to the pod:
kubectl port-forward pod/demo-documentdb-1 -n app-namespace 10260:10260
4. [...]
| Severity | Count | Items |
|---|---|---|
| 🔴 Critical | 2 | Fabricated resource limits/requests fields, duplicate Phase 7 |
| 🟠 Major | 4 | Wrong service name for port-forward, same image for both containers, cluster-admin token, local Helm path |
| 🟡 Minor | 5 | Broken link, hardcoded MSI version, plaintext password, TLS note, no front matter |
| ✅ Correct | 9/12 | Source code claims verified |
Verdict: Valuable guide for a unique deployment scenario. Fix the fabricated resource limits/requests CRD fields (Critical #1), renumber the duplicate phases, and correct the port-forward service name. Verify whether documentdb-local:16 is intended as a combined image for both roles.
| **Root cause:** The gateway binary tries to create an IPv6 listening socket which fails without kernel IPv6 support. |
| **Fix:** Pending one-line change in the gateway binary (`pg_documentdb_gw` in [microsoft/documentdb](https://github.com/microsoft/documentdb)) to fallback to IPv4 when IPv6 fails. |
Please update this once the change is available.
dd5a3a2 to 0ac44ec
WentingWu666666 left a comment
Critical: Shared CI workflow and Dockerfile should not be modified for playground purposes
This PR changes two files that build the official production images:
- `Dockerfile_gateway_public_image` changes the base `SOURCE_IMAGE` from the official `ghcr.io/documentdb/documentdb/documentdb-local:pg17-0.109.0` to a personal fork `ghcr.io/hossain-rayhan/documentdb/documentdb-local:latest`
- `build_documentdb_images.yml` changes `GATEWAY_SOURCE_IMAGE_REPO` from `documentdb/documentdb` to `hossain-rayhan/documentdb` and switches from versioned tags (`pg17-*`) to `:latest`
Why this is a problem
- Official images would be built from a personal fork: any CI run would produce gateway images based on a personal repo instead of the official source
- `:latest` tag is non-deterministic: production builds lose reproducibility; the same workflow could produce different images depending on when it runs
- Supply chain risk: official artifacts should never depend on a personal fork that could be modified or deleted at any time
Suggestion
The IPv4 fix for AKS Edge is a legitimate change, but it should be handled separately:
- Upstream the IPv4 fix to `documentdb/documentdb` first, then consume it here via a versioned tag
- If the playground needs a custom image before the fix is upstreamed, add a separate Dockerfile and workflow (e.g., `build_aks_edge_images.yml`) specifically for the playground; don't modify the official pipeline
The AKS Edge documentation (README + AGENT-INSTRUCTIONS) looks good and can be merged independently once the workflow/Dockerfile changes are reverted.
0ac44ec to 416914d
Signed-off-by: Rayhan Hossain <rhossain@microsoft.com>
416914d to 144dcab
🤖 Auto-triaged by documentdb-triage-tool. Applied: [...]
Reasoning: component from path globs (playground, docs, ci); effort from diff stats (1594+3 LOC, 4 files); LLM: Adds new deployment guide documentation (README + agent instructions) for running DocumentDB on AKS Edge Essentials with Azure Arc; no code changes, purely docs content.
If a label is wrong, remove it manually and ping [...]
xgerman left a comment
Summary
PR #281 adds a valuable AKS Edge + Azure Arc playground guide, but I would request changes before merge. The documentation itself is useful, but the PR still changes official image build inputs to use a personal fork and :latest, and the AKS Edge README contains copy-paste paths that likely fail or duplicate Arc setup.
/request-changes
/needs-docs
/needs-discussion
Critical Issues
🔴 [Required] Revert official image build changes that use a personal fork and :latest
The PR still changes production/shared build files:
- .github/dockerfiles/Dockerfile_gateway_public_image
- .github/workflows/build_documentdb_images.yml
The gateway source image changes from the official, versioned source:
ghcr.io/documentdb/documentdb/documentdb-local:pg17-0.109.0
to:
ghcr.io/hossain-rayhan/documentdb/documentdb-local:latest
This is a release/supply-chain blocker:
- Official images would be built from a personal fork.
- `:latest` is mutable and non-reproducible.
- The image can change or disappear outside this repo’s release process.
- A playground-specific AKS Edge workaround shouldn’t modify official image publishing.
Fix: revert these two files. If AKS Edge needs an IPv4-specific gateway before the upstream fix is released, document a temporary user-provided image path or add a separate playground-only build path. Don’t change the official build pipeline.
🔴 [Required] The README points users at a DocumentDB image tag that doesn’t appear to exist
The DocumentDB CR example uses:
documentDBImage: ghcr.io/documentdb/documentdb/documentdb-local:17
I checked the public GHCR package tags for documentdb/documentdb-local; I found versioned tags like pg17-0.110.0 and latest, but not 17. Users following the guide are likely to hit ImagePullBackOff.
The doc also says CNPG rejects tags like pg17-0.111.0, which creates a dead end: the documented official simple tag doesn’t exist, while the available official tags are the ones the doc says CNPG rejects.
Fix: provide a verified, pullable image reference that works with the operator/chart version used in the guide. If retagging is required for AKS Edge, make that an explicit step with a user-owned repository rather than hard-coding a nonexistent official tag.
🔴 [Required] Arc is configured twice in the README flow
The AKS Edge deployment config in Phase 3 includes an Arc block:
"Arc": {
  "ClusterName": "$CLUSTER_NAME",
  "Location": "$LOCATION",
  "ResourceGroupName": "$RESOURCE_GROUP",
  "SubscriptionId": "$SUBSCRIPTION_ID",
  "TenantId": "$TENANT_ID"
}
Then the README later has another “Phase 7: Connect to Azure Arc” that runs:
az connectedk8s connect ...
This is inconsistent and likely fails or confuses users because the AKS Edge deployment already connects the Kubernetes cluster to Arc. The agent instructions correctly treat Arc as part of deployment and only create a portal token later; the README should match that flow.
Fix: remove the manual az connectedk8s connect phase, or clearly split the guide into “Arc during AKS Edge deployment” vs. “manual Arc connection if you omitted the Arc section.” The default happy path should not do both.
Suggestions
🟠 [Required] Fix duplicate phase numbering
The README has two Phase 7 headings:
- Phase 7: Deploy DocumentDB Instance
- Phase 7: Connect to Azure Arc
Then Phase 8 is verification. This also makes later references like “Paste the token from Phase 7” ambiguous.
Fix: renumber after deciding whether the manual Arc phase stays. If Arc setup is removed, portal token can become Phase 8 and verification Phase 9, or verification can include portal access.
🟠 [Required] Remove invalid CRD resource sizing fields
The appendix still shows:
spec:
  resource:
    storage:
      pvcSize: 20Gi
    limits:
      cpu: "2"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "2Gi"
The current DocumentDB CRD resource block supports storage; it doesn’t expose resource.limits or resource.requests for the DocumentDB CR. This example will mislead users into applying invalid or ignored fields.
Fix: remove limits / requests, or explain that compute sizing isn’t configured through these DocumentDB CR fields.
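As an illustration of the first option, a trimmed version of the appendix example might look like the sketch below. It keeps only the storage field that this review says the CRD supports; the surrounding CR fields (apiVersion, kind, metadata) are omitted here and would come from the README's existing example.

```yaml
# Sketch only: the appendix example with the unsupported
# limits/requests fields removed. Only storage sizing remains.
spec:
  resource:
    storage:
      pvcSize: 20Gi
```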
🟠 [Required] Add a production warning for the Arc portal token binding
The guide creates:
kubectl create clusterrolebinding arc-portal-viewer-binding `
  --clusterrole=cluster-admin `
  --serviceaccount=default:arc-portal-viewer
For a playground that may be acceptable, but the guide should explicitly warn that this grants full Kubernetes cluster-admin access for one year.
Fix: add a warning and, ideally, provide a read-only ClusterRole alternative for portal viewing.
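One possible shape for the read-only alternative is sketched below. It reuses the binding and service account names from the guide and swaps in Kubernetes' built-in `view` ClusterRole. Whether the Azure portal resource views work fully under `view` is an assumption not verified here: `view` excludes Secrets and does not cover cluster-scoped resources such as nodes, so a custom read-only ClusterRole may be needed.

```yaml
# Sketch only: binds the built-in read-only "view" ClusterRole
# to the portal service account instead of cluster-admin.
# Assumption: the portal views needed by the guide work with "view".
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: arc-portal-viewer-binding
subjects:
  - kind: ServiceAccount
    name: arc-portal-viewer
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
```

Because `roleRef` on an existing binding cannot be changed in place, a binding already created with cluster-admin would need to be deleted before applying this one.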
🟡 [Suggestion] Fix broken related-documentation link
This link is broken:
[Arc Hybrid Setup with Fleet](../arc-hybrid-setup-with-fleet/)
The repository has documentdb-playground/aks-fleet-deployment/, not arc-hybrid-setup-with-fleet.
🟡 [Suggestion] Use sentence case and qualify “cluster”
The new docs use title-case headings and many unqualified instances of “cluster.” Project documentation standards ask for sentence-case headings and avoiding “cluster” alone. Use “Kubernetes cluster,” “AKS Edge Kubernetes cluster,” “DocumentDB cluster,” or “Azure Arc-enabled Kubernetes cluster” depending on the meaning.
🟡 [Suggestion] Avoid duplicating the full user guide in AGENT-INSTRUCTIONS.md
AGENT-INSTRUCTIONS.md repeats much of the README. That increases drift risk; this PR already has drift between README and agent instructions around Arc phases. Prefer a short agent guide that links to the README and only documents agent-specific execution notes, PowerShell 5.1 constraints, and validation checkpoints.
Questions
- Is ghcr.io/documentdb/documentdb/documentdb-local:17 expected to be published before this guide merges? If yes, the PR should reference the release/build that creates it. If not, the guide needs a different image strategy.
- Is the AKS Edge target intentionally pinned to operator chart 0.1.3 because current operator releases require Kubernetes 1.35+? If so, the guide should state that this is a compatibility workaround and link to the current Kubernetes version requirement docs.
- Should the IPv4 fallback change be tracked as an upstream documentdb/documentdb release item instead of changing this repo’s official build workflow?
Positive Feedback
- The AKS Edge scenario is useful and distinct from existing AKS/EKS/multi-cloud playgrounds.
- The guide captures real AKS Edge operational gotchas: Windows PowerShell 5.1, Hyper-V, local-path provisioner, /opt being read-only, and network subnet conflicts.
- The troubleshooting table is practical and likely helpful for users working on Windows machines.
- The portal visibility workflow through Azure Arc is a good fit for hybrid/dev-test scenarios.
Summary
Adds documentation for deploying DocumentDB on AKS Edge Essentials - a lightweight K3s-based Kubernetes distribution for Windows machines with Azure Arc integration for portal visibility.
What's Included
README.md: Step-by-step guide covering:
Key Features
Run DocumentDB on any Windows workstation (no cloud costs)
View on-prem cluster + workloads in Azure Portal via Arc
Ideal for dev/test and hybrid scenarios