4 changes: 2 additions & 2 deletions plugins/life-science-research/skills/encode-skill/SKILL.md
@@ -23,7 +23,7 @@ description: Submit compact ENCODE REST API requests for object lookups, portal-
- Optional fields: `method`, `params`, `headers`, `json_body`, `form_body`, `record_path`, `response_format`, `max_items`, `max_depth`, `timeout_sec`, `save_raw`, `raw_output_path`
- Common ENCODE patterns:
- `{"base_url":"https://www.encodeproject.org","path":"biosamples/ENCBS000AAA/","params":{"frame":"object","format":"json"},"headers":{"Accept":"application/json"}}`
- `{"base_url":"https://www.encodeproject.org","path":"search/","params":{"type":"Experiment","assay_title":"RNA-seq","limit":10,"format":"json"},"record_path":"@graph","headers":{"Accept":"application/json"},"max_items":10}`
- `{"base_url":"https://www.encodeproject.org","path":"search/","params":{"type":"Experiment","assay_term_name":"RNA-seq","limit":10,"format":"json"},"record_path":"@graph","headers":{"Accept":"application/json"},"max_items":10}`

## Output
- Success returns `ok`, `source`, `path`, `method`, `status_code`, `warnings`, and either compact `records` or a compact `summary`.
@@ -32,7 +32,7 @@ description: Submit compact ENCODE REST API requests for object lookups, portal-

## Execution
```bash
echo '{"base_url":"https://www.encodeproject.org","path":"search/","params":{"type":"Experiment","assay_title":"RNA-seq","limit":10,"format":"json"},"record_path":"@graph","headers":{"Accept":"application/json"},"max_items":10}' | python scripts/rest_request.py
echo '{"base_url":"https://www.encodeproject.org","path":"search/","params":{"type":"Experiment","assay_term_name":"RNA-seq","limit":10,"format":"json"},"record_path":"@graph","headers":{"Accept":"application/json"},"max_items":10}' | python scripts/rest_request.py
```
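
The same search payload can also be assembled programmatically. The following is a minimal sketch that uses only the field names listed under Input; `build_encode_search_request` is a hypothetical helper for illustration and is not part of `scripts/rest_request.py`.

```python
import json


def build_encode_search_request(assay_term_name: str, limit: int = 10) -> dict:
    """Hypothetical helper: assemble the stdin payload for an Experiment search."""
    return {
        "base_url": "https://www.encodeproject.org",
        "path": "search/",
        "params": {
            "type": "Experiment",
            # The updated examples filter on assay_term_name, not assay_title.
            "assay_term_name": assay_term_name,
            "limit": limit,
            "format": "json",
        },
        "record_path": "@graph",
        "headers": {"Accept": "application/json"},
        "max_items": limit,
    }


if __name__ == "__main__":
    # Emit one JSON object, ready to pipe into scripts/rest_request.py.
    print(json.dumps(build_encode_search_request("RNA-seq")))
```

Keeping `limit` and `max_items` tied to one argument avoids requesting more records from the portal than the compact output will keep.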

## References
@@ -15,16 +15,16 @@ description: Submit compact eQTL Catalogue API requests for association retrieva
## Execution behavior
- Return concise markdown summaries from the script JSON by default.
- Return raw JSON only if the user explicitly asks for machine-readable output.
- Prefer documented association paths such as `genes/<gene_id>/associations`, `studies/<study>/associations`, or tissue/study-scoped association routes with explicit filters, and surface upstream `400`/`500` errors verbatim when they occur.
- Prefer documented versioned paths such as `v3/studies`, `v3/associations`, `v3/studies/<study_id>/associations`, or legacy `v1/.../associations` routes with explicit filters, and surface upstream `400`/`500` errors verbatim when they occur.

## Input
- Read one JSON object from stdin.
- Required fields: `base_url`, `path`
- Optional fields: `method`, `params`, `headers`, `json_body`, `form_body`, `record_path`, `response_format`, `max_items`, `max_depth`, `timeout_sec`, `save_raw`, `raw_output_path`
- Common eQTL Catalogue patterns:
- `{"base_url":"https://www.ebi.ac.uk/eqtl/api","path":"genes/ENSG00000141510/associations","params":{"study":"<study>","tissue":"<tissue_ontology_id>","variant_id":"rs7903146","size":10},"max_items":10}`
- `{"base_url":"https://www.ebi.ac.uk/eqtl/api","path":"studies/<study>/associations","params":{"tissue":"<tissue_ontology_id>","variant_id":"rs7903146","size":10},"max_items":10}`
- `{"base_url":"https://www.ebi.ac.uk/eqtl/api","path":"associations/rs7903146","params":{"size":10},"max_items":10}`
- `{"base_url":"https://www.ebi.ac.uk/eqtl/api","path":"v3/studies","max_items":10}`
- `{"base_url":"https://www.ebi.ac.uk/eqtl/api","path":"v3/associations","params":{"gene_id":"ENSG00000141510","rsid":"rs7903146","size":10},"max_items":10}`
- `{"base_url":"https://www.ebi.ac.uk/eqtl/api","path":"v1/genes/ENSG00000141510/associations","params":{"variant_id":"rs7903146","size":10},"max_items":10}`

## Output
- Success returns `ok`, `source`, `path`, `method`, `status_code`, `warnings`, and either compact `records` or a compact `summary`.
@@ -33,7 +33,7 @@ description: Submit compact eQTL Catalogue API requests for association retrieva

## Execution
```bash
echo '{"base_url":"https://www.ebi.ac.uk/eqtl/api","path":"genes/ENSG00000141510/associations","params":{"study":"<study>","tissue":"<tissue_ontology_id>","variant_id":"rs7903146","size":10},"max_items":10}' | python scripts/rest_request.py
echo '{"base_url":"https://www.ebi.ac.uk/eqtl/api","path":"v3/studies","max_items":10}' | python scripts/rest_request.py
```
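
As a hedged illustration of the "prefer versioned paths" guidance, the sketch below builds a request payload and rejects paths that lack a version prefix. The helper name `build_eqtl_request` and the `VERSION_PREFIXES` tuple are assumptions made for this example; the tuple covers only the `v1/` and `v3/` prefixes that appear in the patterns above and says nothing about other API versions.

```python
from __future__ import annotations

import json

# Assumption: only the prefixes that appear in the documented patterns.
VERSION_PREFIXES = ("v1/", "v3/")


def build_eqtl_request(path: str, params: dict | None = None, max_items: int = 10) -> dict:
    """Hypothetical helper: build a stdin payload for a versioned eQTL Catalogue path."""
    if not path.startswith(VERSION_PREFIXES):
        raise ValueError(f"Expected a versioned path (v1/... or v3/...), got: {path!r}")
    request: dict = {
        "base_url": "https://www.ebi.ac.uk/eqtl/api",
        "path": path,
        "max_items": max_items,
    }
    if params:
        request["params"] = params
    return request


if __name__ == "__main__":
    payload = build_eqtl_request(
        "v3/associations",
        {"gene_id": "ENSG00000141510", "rsid": "rs7903146", "size": 10},
    )
    print(json.dumps(payload))
```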

## References
@@ -48,6 +48,7 @@ Provide at least one anchor source:
- Primary entrypoint: `scripts/map_locus_to_gene.py`
- This script:
- resolves trait/EFO and anchor variants,
- resolves seed and anchor rsID coordinates directly through NCBI RefSNP/dbSNP placements,
- gathers locus-to-gene evidence through the chained skills,
- writes mapping JSON and summary markdown,
- optionally renders figures when plotting deps are available.
@@ -117,8 +118,8 @@ Use these skills in order. Skip only when an earlier step is not needed by provi
2. `gwas-catalog-skill`
- Discover anchor variants for the trait/EFO scope.
- Pull association/study metadata for locus context.
3. `variant-coordinate-finder-skill`
- Normalize each anchor to rsID plus GRCh37/GRCh38 coordinates.
3. Built-in NCBI RefSNP coordinate resolution
- Normalize each anchor rsID to GRCh37/GRCh38 top-level chromosome placements.
4. `opentargets-skill`
- Retrieve credible set context, L2G predictions, and colocalisation evidence per locus.
5. `gtex-eqtl-skill`
@@ -312,6 +313,7 @@ Confidence label:
Fail the run when any of the following occurs:

- No anchors after normalization.
- Exception: unresolved GRCh38 coordinates do not fail the run; surface them as `status=degraded` rather than reporting an analytically clean pass.
- Any locus has candidate genes without score fields.
- `overall_score` outside `0..1`.
- Summary section order mismatch.
@@ -332,7 +334,8 @@ Return:
"mapping_output_path": "./output/locus_to_gene_mapping.json",
"summary_output_path": "./output/locus_to_gene_summary.md",
"figure_paths": [],
"warnings": []
"warnings": [],
"limitations": []
}
```
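
The relationship between `limitations` and the returned `status` can be sketched as follows. This mirrors the prefix check used in the script change for this PR; `derive_status` is a hypothetical standalone name used only for illustration.

```python
# Prefix the script uses to flag anchors that lack GRCh38 coordinates.
CRITICAL_PREFIX = "Unresolved GRCh38 coordinates"


def derive_status(limitations: list[str]) -> str:
    """Return "degraded" when any limitation reports unresolved GRCh38 coordinates."""
    critical = [item for item in limitations if item.startswith(CRITICAL_PREFIX)]
    return "degraded" if critical else "ok"


if __name__ == "__main__":
    # A lookup failure alone is recorded as a limitation but is not critical here.
    print(derive_status(["RefSNP lookup failed for rs123: timeout"]))
    print(derive_status(["Unresolved GRCh38 coordinates for anchors: rs123"]))
```

Under this rule, other limitations are still returned to the caller but do not change `status` by themselves.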

@@ -21,11 +21,10 @@
REFSNP_BASE = "https://api.ncbi.nlm.nih.gov/variation/v0/beta/refsnp"

DEFAULT_LOCUS_PADDING_BP = 1_000_000
REFSEQ_CHROMOSOMES = {f"NC_{i:06d}": str(i) for i in range(1, 23)}
REFSEQ_CHROMOSOMES.update({"NC_000023": "X", "NC_000024": "Y", "NC_012920": "MT"})

REPO_ROOT = Path(__file__).resolve().parents[2]
VARIANT_COORDINATE_FINDER_SCRIPT = (
REPO_ROOT / "variant-coordinate-finder-skill" / "scripts" / "variant_coordinate_finder.py"
)
GTEX_EQTL_SCRIPT = REPO_ROOT / "gtex-eqtl-skill" / "scripts" / "gtex_eqtl.py"
GENEBASS_GENE_BURDEN_SCRIPT = (
REPO_ROOT / "genebass-gene-burden-skill" / "scripts" / "genebass_gene_burden.py"
@@ -552,33 +551,133 @@ def fetch_gwas_study_metadata(
return out


def chromosome_from_refseq(seq_id: str) -> str | None:
accession = seq_id.split(".", 1)[0]
return REFSEQ_CHROMOSOMES.get(accession)


def assembly_key_from_traits(traits: list[dict[str, Any]]) -> str | None:
for trait in traits:
assembly_name = str(trait.get("assembly_name") or "")
if assembly_name.startswith("GRCh38"):
return "grch38"
if assembly_name.startswith("GRCh37"):
return "grch37"
return None


def coordinate_from_placement(placement: dict[str, Any]) -> dict[str, Any] | None:
seq_id = str(placement.get("seq_id") or "")
chrom = chromosome_from_refseq(seq_id)
if not chrom:
return None

placement_annot = coerce_dict(placement.get("placement_annot"))
traits = coerce_list_of_dicts(placement_annot.get("seq_id_traits_by_assembly"))
if not traits:
return None

# Prefer primary top-level chromosome placements over alt loci or patches.
if not any(
trait.get("is_top_level")
and trait.get("is_chromosome")
and not trait.get("is_alt")
and not trait.get("is_patch")
for trait in traits
):
return None

spdis: list[dict[str, Any]] = []
for allele in coerce_list_of_dicts(placement.get("alleles")):
spdi = coerce_dict(coerce_dict(allele.get("allele")).get("spdi"))
if spdi:
spdis.append(spdi)
if not spdis:
return None

positions = {spdi.get("position") for spdi in spdis if spdi.get("position") is not None}
if not positions:
return None
try:
pos = int(sorted(positions)[0]) + 1
except Exception:
return None

deleted_sequences = [
str(spdi.get("deleted_sequence") or "")
for spdi in spdis
if str(spdi.get("deleted_sequence") or "")
]
if not deleted_sequences:
return None
ref = deleted_sequences[0]

alternate_alleles = sorted(
{
str(spdi.get("inserted_sequence") or "")
for spdi in spdis
if str(spdi.get("inserted_sequence") or "")
and str(spdi.get("inserted_sequence") or "") != str(spdi.get("deleted_sequence") or "")
}
)
alt = alternate_alleles[-1] if alternate_alleles else ref

assembly_name = str(traits[0].get("assembly_name") or "")
return {
"chr": chrom,
"pos": pos,
"ref": ref,
"alt": alt,
"alternate_alleles": alternate_alleles,
"seq_id": seq_id,
"assembly": assembly_name,
}


def fetch_refsnp_payload(rsid: str, limitations: list[str]) -> dict[str, Any] | None:
digits = "".join(ch for ch in rsid if ch.isdigit())
if not digits:
return None
try:
return safe_get_json(f"{REFSNP_BASE}/{digits}", timeout=35)
except Exception as exc:
limitations.append(f"RefSNP lookup failed for {rsid}: {exc}")
return None


def resolve_refsnp_coordinates(
rsid: str, warnings: list[str], limitations: list[str]
) -> dict[str, dict[str, Any]]:
payload = fetch_refsnp_payload(rsid, limitations)
if not payload:
return {}

coords: dict[str, dict[str, Any]] = {}
snapshot = coerce_dict(payload.get("primary_snapshot_data"))
for placement in coerce_list_of_dicts(snapshot.get("placements_with_allele")):
traits = coerce_list_of_dicts(
coerce_dict(placement.get("placement_annot")).get("seq_id_traits_by_assembly")
)
assembly_key = assembly_key_from_traits(traits)
if not assembly_key or assembly_key in coords:
continue
coord = coordinate_from_placement(placement)
if coord:
coords[assembly_key] = coord

if "grch38" not in coords:
warnings.append(f"Coordinate lookup did not find a GRCh38 top-level placement for {rsid}.")
return coords


def resolve_anchor_coordinates(
anchors: list[dict[str, Any]], warnings: list[str], limitations: list[str]
) -> None:
for anchor in anchors:
rsid = str(anchor.get("rsid") or "")
if not rsid:
continue
coord_result = run_json_skill_script(
VARIANT_COORDINATE_FINDER_SCRIPT,
{"rsid": rsid},
limitations,
timeout_s=25,
)
if not coord_result:
anchor["grch38"] = None
anchor["grch37"] = None
anchor["locus_id"] = f"rsid:{rsid}"
continue

if not coord_result.get("ok"):
error = coerce_dict(coord_result.get("error")).get("message")
warnings.append(f"Coordinate lookup failed for {rsid}: {error}")
anchor["grch38"] = None
anchor["grch37"] = None
anchor["locus_id"] = f"rsid:{rsid}"
continue

coord_result = resolve_refsnp_coordinates(rsid, warnings, limitations)
g38 = coerce_dict(coord_result.get("grch38"))
g37 = coerce_dict(coord_result.get("grch37"))
anchor["grch38"] = g38 if g38 else None
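
Two details of the coordinate resolution in this hunk are easy to miss: the RefSeq accession is stripped of its version suffix before the chromosome lookup, and SPDI positions are 0-based, so the code adds 1 to report a 1-based coordinate. The sketch below isolates both steps. The SPDI record is a hypothetical fragment, not real dbSNP data, and `one_based_position` is a helper name introduced here for illustration.

```python
from __future__ import annotations

# Same mapping as in the script: NC_000001..NC_000022, then X, Y, and MT.
REFSEQ_CHROMOSOMES = {f"NC_{i:06d}": str(i) for i in range(1, 23)}
REFSEQ_CHROMOSOMES.update({"NC_000023": "X", "NC_000024": "Y", "NC_012920": "MT"})


def chromosome_from_refseq(seq_id: str) -> str | None:
    # Drop the version suffix, e.g. "NC_000010.11" -> "NC_000010".
    accession = seq_id.split(".", 1)[0]
    return REFSEQ_CHROMOSOMES.get(accession)


def one_based_position(spdi: dict) -> int:
    # SPDI positions are 0-based; VCF-style coordinates are 1-based.
    return int(spdi["position"]) + 1


if __name__ == "__main__":
    # Hypothetical SPDI fragment for illustration only.
    spdi = {
        "seq_id": "NC_000010.11",
        "position": 999,
        "deleted_sequence": "C",
        "inserted_sequence": "T",
    }
    print(chromosome_from_refseq(spdi["seq_id"]), one_based_position(spdi))
```

Accessions outside the mapping (for example unplaced scaffolds) return `None`, which is why such placements are skipped.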
@@ -957,31 +1056,35 @@ def fetch_refsnp_annotations(rsids: list[str], limitations: list[str]) -> dict[s
out: dict[str, dict[str, Any]] = {}

for rsid in rsids:
digits = "".join(ch for ch in rsid if ch.isdigit())
if not digits:
continue
url = f"{REFSNP_BASE}/{digits}"
try:
payload = safe_get_json(url, timeout=35)
except Exception as exc:
limitations.append(f"RefSNP lookup failed for {rsid}: {exc}")
payload = fetch_refsnp_payload(rsid, limitations)
if not payload:
continue

snapshot = coerce_dict(payload.get("primary_snapshot_data"))
genes = {
str(item.get("name")).strip()
str(item.get("locus") or item.get("name")).strip()
for item in coerce_list_of_dicts(snapshot.get("genes"))
if item.get("name")
if item.get("locus") or item.get("name")
}
coding_genes: set[str] = set()
consequence_terms: set[str] = set()

for allele_ann in coerce_list_of_dicts(snapshot.get("allele_annotations")):
for asm_ann in coerce_list_of_dicts(allele_ann.get("assembly_annotation")):
for gene in coerce_list_of_dicts(asm_ann.get("genes")):
gene_symbol = str(gene.get("name") or "").strip()
gene_symbol = str(gene.get("locus") or gene.get("name") or "").strip()
if gene_symbol:
genes.add(gene_symbol)
is_coding = False
for so in coerce_list_of_dicts(gene.get("sequence_ontology")):
term = str(so.get("name") or "").strip()
if term:
consequence_terms.add(term)
for rna in coerce_list_of_dicts(gene.get("rnas")):
for so in coerce_list_of_dicts(rna.get("sequence_ontology")):
term = str(so.get("name") or "").strip()
if term:
consequence_terms.add(term)
protein = rna.get("protein")
protein_items = [protein] if isinstance(protein, dict) else protein
if not isinstance(protein_items, list):
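
The switch from `name` to `locus` with a fallback to `name` can be illustrated in isolation. This is a sketch only: the sample gene items are assumed shapes for illustration, and `gene_symbol` is a hypothetical helper rather than a function in the script.

```python
def gene_symbol(item: dict) -> str:
    """Prefer the `locus` field, falling back to `name`, as in the updated hunk."""
    return str(item.get("locus") or item.get("name") or "").strip()


if __name__ == "__main__":
    # Assumed item shapes for illustration; real payloads may differ.
    items = [
        {"locus": "TCF7L2", "name": "transcription factor 7 like 2"},
        {"name": "TCF7L2"},
        {},
    ]
    symbols = {gene_symbol(item) for item in items if gene_symbol(item)}
    print(sorted(symbols))
```

Collecting into a set keeps one entry per symbol even when both fields resolve to the same value across items.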
@@ -1635,6 +1738,17 @@ def map_locus_to_gene(input_json: dict[str, Any]) -> dict[str, Any]:
if not anchors:
raise ValueError("No anchors remained after normalization.")

unresolved_coord_rsids = [
str(anchor.get("rsid"))
for anchor in anchors
if anchor.get("rsid") and not coerce_dict(anchor.get("grch38"))
]
if unresolved_coord_rsids:
limitations.append(
"Unresolved GRCh38 coordinates for anchors: "
+ ", ".join(dedupe_keep_order(unresolved_coord_rsids))
)

anchor_rsids = dedupe_keep_order([str(a.get("rsid")) for a in anchors if a.get("rsid")])
trait_terms = dedupe_keep_order(
[
@@ -1672,6 +1786,9 @@ def map_locus_to_gene(input_json: dict[str, Any]) -> dict[str, Any]:
for anchor in coerce_list_of_dicts(locus.get("anchors")):
locus_symbols.extend(as_string_list(anchor.get("mapped_genes")))
rsid = str(anchor.get("rsid") or "")
annot = coerce_dict(refsnp_annotations.get(rsid))
locus_symbols.extend(as_string_list(annot.get("coding_genes")))
locus_symbols.extend(as_string_list(annot.get("genes")))
l2g_rows = coerce_list_of_dicts(
coerce_dict(ot_support.get("per_anchor", {})).get(rsid, {}).get("l2g")
)
@@ -1935,7 +2052,7 @@ def map_locus_to_gene(input_json: dict[str, Any]) -> dict[str, Any]:
"sources_queried": [
"efo-ontology-skill",
"gwas-catalog-skill",
"variant-coordinate-finder-skill",
"ncbi-refsnp-coordinate-resolution",
"opentargets-skill",
"gtex-eqtl-skill",
"genebass-gene-burden-skill",
@@ -1987,8 +2104,12 @@ def map_locus_to_gene(input_json: dict[str, Any]) -> dict[str, Any]:
mapping_output_path.write_text(json.dumps(mapping_payload, indent=2), encoding="utf-8")
summary_output_path.write_text(summary, encoding="utf-8")

critical_limitations = [
item for item in limitations if item.startswith("Unresolved GRCh38 coordinates")
]

return {
"status": "ok",
"status": "degraded" if critical_limitations else "ok",
"mapping_output_path": str(mapping_output_path),
"summary_output_path": str(summary_output_path),
"figure_paths": [str(fig.get("path")) for fig in figure_entries],
@@ -1998,6 +2119,7 @@ def map_locus_to_gene(input_json: dict[str, Any]) -> dict[str, Any]:
"Do not wrap them in code fences."
),
"warnings": dedupe_keep_order(warnings),
"limitations": dedupe_keep_order(limitations),
}

