Summary
In v3, ModelBuilder(mode=Mode.LOCAL_CONTAINER).deploy_local() calls docker.client.images.pull(image) directly with no auth handshake. When the target image lives in an AWS Deep Learning Containers ECR account (763104351884.dkr.ecr.<region>.amazonaws.com/...) — i.e. every SageMaker-provided container — the Docker daemon has no credentials, the pull fails, and the subsequent inspect_image returns 404, surfacing as:
ValueError: Could not find image '763104351884.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:24.09-py3' in repository
This is a regression from v2's LocalSession / sagemaker.local.image, which logged Docker in to ECR (via its _ecr_login_if_needed helper) before pulling.
Reproduction
from sagemaker.serve import ModelBuilder, Mode, ModelServer
builder = ModelBuilder(
    image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:24.09-py3",
    s3_model_data_url="s3://.../model.tar.gz",
    role_arn=role,
    model_server=ModelServer.TRITON,
    mode=Mode.LOCAL_CONTAINER,
)
builder.build(model_name="x")
builder.deploy_local(endpoint_name="x", wait=True)
On a host where Docker has not previously authenticated to the DLC ECR account (e.g. a fresh SageMaker Notebook instance with Docker installed but unused), the pull fails.
Observed
HTTPError: 404 Client Error: Not Found for url:
http+docker://localhost/v1.44/images/763104351884.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:24.09-py3/json
...
ImageNotFound: 404 Client Error ... ("No such image: ...")
...
ValueError: Could not find image '...' in repository
The images.pull() call does not raise on the pull-time auth error. The failure only becomes visible when the image is inspected afterwards and the daemon answers 404. As a result, the user-facing error blames "image not found" when the real cause is "Docker daemon was never told how to authenticate to this ECR registry."
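A minimal sketch of how the real failure could be surfaced. It assumes the daemon reports pull failures as an "error" field in the JSON progress stream that docker-py's low-level APIClient.pull(..., stream=True, decode=True) yields. The names first_pull_error and pull_or_raise are hypothetical and not part of the SDK.

```python
from typing import Iterable, Optional


def first_pull_error(events: Iterable[dict]) -> Optional[str]:
    """Return the first error message in a decoded pull progress stream, if any."""
    for event in events:
        if "error" in event:
            return str(event["error"])
    # Stream fully consumed without an error event: the pull completed.
    return None


def pull_or_raise(client, image: str) -> None:
    """Pull `image` and raise if the daemon reported an error during the pull.

    Assumption: `client` is a docker.DockerClient, so `client.api` is the
    low-level APIClient whose pull() can stream decoded JSON events.
    """
    events = client.api.pull(image, stream=True, decode=True)
    message = first_pull_error(events)
    if message is not None:
        raise RuntimeError(f"Failed to pull image '{image}': {message}")
```

With something like this in place, an auth failure would be reported with the daemon's own message instead of a later "No such image" from the inspect call.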
Expected
Either:
- deploy_local() performs an aws ecr get-login-password | docker login for any ECR-flavored image_uri before calling images.pull(), mirroring v2 behavior; or
- deploy_local() raises a clear, actionable error explaining the user must pre-authenticate Docker to ECR, with the exact command shown.
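For the second option, a rough sketch of what an actionable message could look like. The helper names ecr_login_hint and not_found_message are hypothetical, and the hostname check assumes the standard <account>.dkr.ecr.<region>.amazonaws.com shape only (other partitions, such as China regions, are not handled).

```python
from typing import Optional


def ecr_login_hint(image_uri: str) -> Optional[str]:
    """Return the docker login command for an ECR image URI, or None if not ECR."""
    registry = image_uri.split("/", 1)[0]
    parts = registry.split(".")
    # Assumed shape: <account>.dkr.ecr.<region>.amazonaws.com
    if len(parts) != 6 or parts[1:3] != ["dkr", "ecr"] or parts[4:] != ["amazonaws", "com"]:
        return None
    region = parts[3]
    return (
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}"
    )


def not_found_message(image_uri: str) -> str:
    """Build the error text, adding the login command when the image is on ECR."""
    message = f"Could not find image '{image_uri}' in repository"
    hint = ecr_login_hint(image_uri)
    if hint is not None:
        message += (
            ". If Docker has not authenticated to this ECR registry, run:\n  " + hint
        )
    return message
```

The existing ValueError could then carry not_found_message(image) so that the first thing the user sees is the command that fixes the problem.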
Code reference
sagemaker/serve/mode/local_container_mode.py:270-278:
# Pull the image
try:
    logger.info("Pulling image %s from repository...", image)
    self.client.images.pull(image)
    logger.info("Successfully pulled image %s", image)
except docker.errors.NotFound as e:
    raise ValueError(f"Could not find image '{image}' in repository") from e
except docker.errors.APIError as e:
    raise RuntimeError(f"Failed to pull image '{image}': {e}") from e
No auth_config= is passed to images.pull(), and there is no ECR token retrieval and no detection of *.dkr.ecr.*.amazonaws.com hostnames.
Workaround
Pre-authenticate Docker and pre-pull the image before calling deploy_local():
aws ecr get-login-password --region <region> \
| docker login --username AWS --password-stdin 763104351884.dkr.ecr.<region>.amazonaws.com
docker pull 763104351884.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:24.09-py3
Once the image is in the local Docker cache, the broken images.pull() call becomes effectively a no-op and deploy_local() proceeds.
Severity
Medium. Functionally blocks v3 local mode for any AWS-published container image out of the box, which is the most common image source for users. Workaround is mechanical but undocumented in the v3 inference docs.
Suggestion
Port the _ecr_login_if_needed helper from v2's sagemaker.local.image (which detects ECR hostnames and runs the login automatically) and call it from local_container_mode.py:_pull_image() before images.pull().
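As an alternative to shelling out to docker login, here is a sketch of the same idea using the auth_config argument that docker-py's images.pull() accepts. This is not the v2 helper itself. The function names are hypothetical, the hostname regex covers only the standard partition, and passing registryIds for the DLC account is an assumption about what the caller's IAM permissions allow.

```python
import base64
import re
from typing import Optional, Tuple

# Assumed hostname shape for ECR registries in the standard AWS partition.
ECR_HOST = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$"
)


def parse_ecr_registry(image_uri: str) -> Optional[Tuple[str, str]]:
    """Return (account_id, region) if image_uri points at ECR, else None."""
    match = ECR_HOST.match(image_uri.split("/", 1)[0])
    return (match["account"], match["region"]) if match else None


def ecr_auth_config(ecr_client, account_id: str) -> dict:
    """Build a docker-py auth_config dict from an ECR authorization token."""
    response = ecr_client.get_authorization_token(registryIds=[account_id])
    token = response["authorizationData"][0]["authorizationToken"]
    # The token is base64("AWS:<password>").
    username, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return {"username": username, "password": password}


def pull_with_ecr_auth(docker_client, image_uri: str, boto_session=None) -> None:
    """Pull image_uri, fetching ECR credentials first when it lives in ECR."""
    parsed = parse_ecr_registry(image_uri)
    if parsed is None:
        docker_client.images.pull(image_uri)
        return
    import boto3  # imported lazily so non-ECR pulls do not require it

    account_id, region = parsed
    session = boto_session or boto3.Session()
    ecr = session.client("ecr", region_name=region)
    docker_client.images.pull(image_uri, auth_config=ecr_auth_config(ecr, account_id))
```

Passing credentials per request avoids writing to the user's Docker config, whereas the v2-style login leaves the host authenticated for later manual pulls. Either behavior would unblock deploy_local().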
Environment
- OS: Linux 6.1.170-210.320.amzn2023.x86_64 (Amazon Linux 2023)
- Host: SageMaker Notebook instance (BaseNotebookInstanceEc2InstanceRole)
- Python: 3.10 (/home/ec2-user/anaconda3/envs/python3/bin/python)
- Kernel: conda_python3
- sagemaker: 3.12.0
- sagemaker-core: 2.12.0
- sagemaker-serve: 1.12.0
- sagemaker-train: 1.12.0
- sagemaker-mlops: 1.12.0
- docker (client): 25.0.14
- Region: us-east-1