fix(ds-identify): parse multi-line YAML datasource_list by cjp256 · Pull Request #6883 · canonical/cloud-init

cjp256 · 2026-05-14T13:14:17Z

Multi-line YAML lists are valid YAML. The prior code assumed that multi-line datasource_list values could not be parsed, but both multi-line flow sequences and block sequences are standard YAML:

    # multi-line flow sequence
    datasource_list: [
      Azure
    ]

    # block sequence
    datasource_list:
      - Azure
      - None

The existing read_datasource_list() only handled single-line flow sequences (e.g. 'datasource_list: [Azure, None]'). When a config file contains a multi-line flow sequence, the parser saw only '[' without the closing ']', produced an empty dslist, and fell back to the full default datasource list. On Azure VMs this meant ds-identify checked every datasource unnecessarily.

Add read_datasource_list_multiline() to handle both multi-line flow sequences and block sequences as a fallback when single-line parsing produces no result. Also tighten get_single_line_flow_sequence() to reject values that lack a closing ']', so incomplete flow sequences fall through to the multi-line parser correctly.
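
For readers who want a concrete picture of the fallback described above, here is a minimal sketch in POSIX shell. It is not the code from this PR: the function name `parse_dslist_multiline` is hypothetical, and the sketch assumes the key starts at column 0, entries are unquoted, and nothing is nested. It returns failure when a flow sequence never closes or when no entries are found.

```shell
#!/bin/sh
# Hypothetical sketch, not the PR's implementation.
# Prints the datasource names found in file $1, space-separated.
# Exits non-zero if the list is missing, empty, or an unclosed flow sequence.
parse_dslist_multiline() (
    file="$1" mode="" flow="" out=""
    while IFS= read -r line || [ -n "$line" ]; do
        line="${line%%#*}"                        # drop trailing comments
        case "$mode" in
            flow)
                flow="$flow $line"                # keep collecting until ']'
                case "$line" in
                    *"]"*) mode="done"; break ;;
                esac
                ;;
            block)
                # strip leading whitespace from the candidate item line
                item="${line#"${line%%[![:space:]]*}"}"
                case "$item" in
                    "") continue ;;               # blank or comment-only line
                    "- "*) out="$out ${item#- }" ;;
                    *) break ;;                   # next key ends the block
                esac
                ;;
            *)
                case "$line" in
                    datasource_list:*)
                        val="${line#datasource_list:}"
                        val="${val#"${val%%[![:space:]]*}"}"
                        case "$val" in
                            "["*"]"*) flow="$val"; mode="done"; break ;;
                            "["*) flow="$val"; mode="flow" ;;
                            "") mode="block" ;;
                        esac
                        ;;
                esac
                ;;
        esac
    done < "$file"

    [ "$mode" = "flow" ] && exit 1                # '[' without closing ']'
    flow="${flow#*\[}"                            # text after the first '['
    flow="${flow%%\]*}"                           # text before the first ']'
    set -f                                        # no globbing while splitting
    IFS=' ,'                                      # split on spaces and commas
    # shellcheck disable=SC2086
    set -- $flow $out
    [ $# -gt 0 ] || exit 1
    echo "$*"
)
```

The function body runs in a subshell, so the `IFS` and `set -f` changes do not leak into the caller. A caller would try single-line parsing first and only invoke something like this when that yields nothing.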

An alternative approach would be to shell out to python3 for YAML parsing (e.g. python3 -c 'import yaml; ...'), since python3 is a hard dependency of cloud-init and is guaranteed to be present. This would handle every valid YAML construct correctly, with no edge cases and much less code. However, ds-identify is designed as a pure POSIX shell script for speed: it runs as a systemd generator early in boot, where startup latency matters, and on systems where cloud-init should be disabled it avoids the cost of starting a python3 interpreter entirely. The shell-based approach preserves that property while covering the formats seen in practice. Could we try one approach and fall back to the other?

When datasource_list is present in config but cannot be parsed by any method, ds-identify now emits a distinct error indicating that the key was found but could not be parsed. I think this error probably needs to go to the console, or otherwise be detected by cloud-init later and escalated as part of its status reporting.
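
As a rough illustration of the distinction described above (key absent versus key present but unparsable), the following sketch separates the three outcomes by exit status. The function name `classify_dslist`, the exit codes, and the message wording are all hypothetical and not taken from the PR.

```shell
#!/bin/sh
# Hypothetical sketch: names, exit codes, and messages are illustrative only.
# $1: config file; $2: result of an earlier parse attempt (empty if it failed).
# Exit status: 0 = parsed, 1 = key absent, 2 = key present but unparsable.
classify_dslist() (
    file="$1" parsed="$2" found=0
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            datasource_list:*) found=1; break ;;
        esac
    done < "$file"
    if [ -n "$parsed" ]; then
        echo "parsed: $parsed"
    elif [ "$found" -eq 1 ]; then
        # Distinct error: the key exists, so falling back silently would hide a typo.
        echo "error: datasource_list found in $file but could not be parsed" >&2
        exit 2
    else
        echo "datasource_list not set in $file"
        exit 1
    fi
)
```

A separate exit code for the unparsable case is one way a later stage could detect the problem and surface it, rather than treating it the same as an unset key.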


bpryan99 · 2026-05-22T16:56:15Z

+    local files="" f="" line="" val="" mode="" dslist="" found_file=""
+    local flow_content=""
+    files="${PATH_ETC_CI_CFG} ${PATH_ETC_CI_CFG_D}/*.cfg"
+    # shellcheck disable=2086


Could you explain the need to disable SC2086? I see that the files variable holds a single string built from two constants separated by a space. Then, on line 669, $files is used unquoted, so it splits into two words at that space. Is this the intended behavior, and if so, does $PATH_ETC_CI_CFG include a file extension? It looks intentional, since it avoids having to iterate over a separate list of file names, but I want to be sure.
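
To make the behavior in question concrete, here is a small sketch of what an unquoted expansion does to a string shaped like the one in the diff. It assumes line 669 expands `$files` unquoted, as the comment above describes. The function names and the paths passed in are stand-ins, not cloud-init's real constants.

```shell
#!/bin/sh
# Illustration only: function names and arguments are hypothetical.
# $1: path to a main config file; $2: directory holding *.cfg drop-ins.

# Unquoted: the string is split on whitespace, then *.cfg is glob-expanded.
count_words_unquoted() (
    files="$1 $2/*.cfg"
    # shellcheck disable=SC2086
    set -- $files
    echo "$#"
)

# Quoted: the whole string stays one argument; no splitting, no globbing.
count_words_quoted() (
    files="$1 $2/*.cfg"
    set -- "$files"
    echo "$#"
)
```

With one main file and two drop-ins, the unquoted form yields three words and the quoted form yields one. SC2086 warns about exactly this splitting and globbing, so disabling it is a common way to mark the expansion as deliberate. The trade-off is that any path containing whitespace would also be split, and a pattern with no matches is passed through literally.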

bpryan99 · 2026-05-22T17:09:06Z

+        mode=""
+        while IFS= read -r line || [ -n "$line" ]; do
+            line="${line%%#*}"
+            if [ "$mode" = "flow" ]; then


Much of the script uses conditionals implicitly, through bare test ([ condition ]) commands, but here we use an explicit if/then. Is there a reason for mixing the two styles? My preference is the explicit form for human readability, but ultimately this should probably stay consistent with whatever the other files favor, unless there is a reason to mix the two.
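
One behavioral difference between the two styles may be relevant here, although nothing in the PR says it motivated the choice. Assuming "implicit" means the `[ condition ] && command` form, the two are not fully interchangeable: when the test is false, the short-circuit form leaves a non-zero exit status, while an `if` without an `else` leaves zero. The function names below are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers showing the exit-status difference between the styles.

# Implicit: if the test fails, the function's status is that of the test (1).
in_flow_implicit() {
    [ "$1" = "flow" ] && echo "in flow mode"
}

# Explicit: if the test fails, no branch runs and the status is 0.
in_flow_explicit() {
    if [ "$1" = "flow" ]; then
        echo "in flow mode"
    fi
}
```

This matters when the conditional is the last command in a function or when the script runs under `set -e`, so choosing a style is partly about readability and partly about which exit status the caller should see.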

cjp256 mentioned this pull request May 14, 2026

ds-identify: false positive EC2 detection on non-EC2 platforms due to UUID collision in ec2_identify_platform() #6880

Open

cjp256 force-pushed the ds-identify-multiline branch from 663ee0f to 5800d1d Compare May 14, 2026 13:30

disable some string lints for readability

7fd93a5

bpryan99 reviewed May 22, 2026

cjp256 wants to merge 2 commits into canonical:main from cjp256:ds-identify-multiline
