54 changes: 54 additions & 0 deletions statvar_imports/tuberculosis_rifampicin_resistant/README.md
@@ -0,0 +1,54 @@
# WHO Tuberculosis: Treatment outcomes of people with RR/MDR-TB

## Overview
This dataset provides the percentage of TB patients who started rifampicin-resistant TB treatment and whose treatment outcome was recorded as treatment success (cured or treatment completed), treatment failed, died, lost to follow-up, or not evaluated, within the reporting period.

## Data Source

**Source URL:**
https://data.who.int/indicators/i/39E4281/F1912F6

The data comes from the official WHO reporting database, which publishes country-level tuberculosis indicators. This import uses the indicator for treatment outcomes of people with RR/MDR-TB.

## How To Download Input Data
To download the data, run the download script `who_data_download_tuberculosis_rifampicin_resistant.py`. The script queries the WHO API for the indicator, joins the result with the WHO geographical master list on country name to append standard `iso3` country codes, and saves the result as `Tuberculosis_rr_mdr_tb_outcomes.csv` in the `statvar_imports/tuberculosis_rifampicin_resistant/input_files` folder.
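
The join on country name can be sketched in plain Python. This is a minimal illustration rather than the download script itself: the sample rows and the `append_iso3` helper are hypothetical, and it uses plain dict rows instead of pandas. It shows why a name-based left join is worth checking: a country spelled differently in the two sources ends up with an empty `iso3`.

```python
def append_iso3(api_rows, master_rows):
    """Left-join API rows to the master list on country name (hypothetical helper)."""
    # Build a name -> iso3 lookup from the master list.
    iso3_by_name = {row["country"]: row["iso3"] for row in master_rows}
    merged, unmatched = [], set()
    for row in api_rows:
        iso3 = iso3_by_name.get(row["COUNTRY"], "")
        if not iso3:
            # Keep the row (left join) but remember the name that failed to match.
            unmatched.add(row["COUNTRY"])
        merged.append({**row, "iso3": iso3})
    return merged, sorted(unmatched)


# Made-up sample data, for illustration only.
api_rows = [
    {"COUNTRY": "India", "YEAR": "2020", "DISAGGR_1": "Died", "VALUE": "12"},
    {"COUNTRY": "Atlantis", "YEAR": "2020", "DISAGGR_1": "Died", "VALUE": "5"},
]
master_rows = [{"country": "India", "iso3": "IND"}]

merged, unmatched = append_iso3(api_rows, master_rows)
print(merged[0]["iso3"])
print(unmatched)
```

Rows listed in `unmatched` would need a name fix or a manual mapping before they can resolve to a `country/...` place.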

type of place: Country.

statvars: Health / Tuberculosis.

years: 2010 to 2022

place_resolution: manually.

release_frequency: P1Y

## Processing Instructions
To process the WHO Tuberculosis data and generate statistical variables, use the following commands from your root `data` directory:

**Download input file**
```bash
python3 statvar_imports/tuberculosis_rifampicin_resistant/who_data_download_tuberculosis_rifampicin_resistant.py
```

**Test data run**
```bash
python3 tools/statvar_importer/stat_var_processor.py \
--input_data="statvar_imports/tuberculosis_rifampicin_resistant/testdata/Tuberculosis_rr_mdr_tb_outcomes.csv" \
--pv_map="statvar_imports/tuberculosis_rifampicin_resistant/tuberculosis_rifampicin_resistant_pvmap.csv" \
--output_path="statvar_imports/tuberculosis_rifampicin_resistant/output_files/tuberculosis_rifampicin_resistant" \
--config_file="statvar_imports/tuberculosis_rifampicin_resistant/tuberculosis_rifampicin_resistant_metadata.csv" \
--existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

**Main data run**
```bash
python3 tools/statvar_importer/stat_var_processor.py \
--input_data="statvar_imports/tuberculosis_rifampicin_resistant/input_files/Tuberculosis_rr_mdr_tb_outcomes.csv" \
--pv_map="statvar_imports/tuberculosis_rifampicin_resistant/tuberculosis_rifampicin_resistant_pvmap.csv" \
--output_path="statvar_imports/tuberculosis_rifampicin_resistant/output_files/tuberculosis_rifampicin_resistant" \
--config_file="statvar_imports/tuberculosis_rifampicin_resistant/tuberculosis_rifampicin_resistant_metadata.csv" \
--existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

#### Refresh type: Fully Autorefresh
26 changes: 26 additions & 0 deletions statvar_imports/tuberculosis_rifampicin_resistant/manifest.json
@@ -0,0 +1,26 @@
{
  "import_specifications": [
    {
      "import_name": "WHO_TuberculosisRifampicinResistant",
      "curator_emails": [
        "support@datacommons.org"
      ],
      "provenance_url": "https://data.who.int/indicators/i/39E4281/F1912F6",
      "provenance_description": "Treatment outcomes among those who started rifampicin-resistant TB treatment during a specified reporting period.",
      "scripts": [
        "who_data_download_tuberculosis_rifampicin_resistant.py",
        "../../../tools/statvar_importer/stat_var_processor.py --input_data=source_files/tuberculosis_rifampicin_resistant_input.csv --pv_map=tuberculosis_rifampicin_resistant_pvmap.csv --config_file=tuberculosis_rifampicin_resistant_metadata.csv --output_path=output/tuberculosis_rifampicin_resistant_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
      ],
      "import_inputs": [
        {
          "template_mcf": "output/tuberculosis_rifampicin_resistant_output.tmcf",
          "cleaned_csv": "output/tuberculosis_rifampicin_resistant_output.csv"
        }
      ],
      "source_files": [
        "test_files/tuberculosis_rifampicin_resistant_input.csv"
      ],
      "cron_schedule": "0 10 10,21 * *"
    }
  ]
}
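
The `cron_schedule` value can be read field by field. The sketch below assumes the standard five-field cron layout (minute, hour, day of month, month, day of week); the `describe_cron` helper is hypothetical, and the manifest excerpt is trimmed to two keys. The time zone is whatever the scheduler applies, which the manifest does not state.

```python
import json

# Trimmed excerpt of the manifest, keeping only the keys used here.
manifest_text = """
{
  "import_specifications": [
    {
      "import_name": "WHO_TuberculosisRifampicinResistant",
      "cron_schedule": "0 10 10,21 * *"
    }
  ]
}
"""

CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expr):
    """Split a five-field cron expression into named fields (hypothetical helper)."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} cron fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


spec = json.loads(manifest_text)["import_specifications"][0]
schedule = describe_cron(spec["cron_schedule"])
print(schedule)
```

Read this way, the schedule fires at minute 0 of hour 10 on the 10th and 21st of every month.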

Large diffs are not rendered by default.

Large diffs are not rendered by default.

@@ -0,0 +1,8 @@
Node: E:TB_output->E0
observationDate: C:TB_output->observationDate
observationAbout: C:TB_output->observationAbout
value: C:TB_output->value
variableMeasured: C:TB_output->variableMeasured
scalingFactor: 100
typeOf: dcs:StatVarObservation
unit: dcs:Percent
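
To make the template concrete, here is a rough sketch of how one row of the cleaned CSV could fill it in. It is an illustration only, not the Data Commons importer: the `render_node` helper, the node ID format, the sample row, and the `dcid:ExampleStatVar` value are all hypothetical. Each `C:TB_output->column` reference is replaced with that column's value from the row, while constant lines such as `scalingFactor: 100` pass through unchanged.

```python
import re

TEMPLATE = """Node: E:TB_output->E0
observationDate: C:TB_output->observationDate
observationAbout: C:TB_output->observationAbout
value: C:TB_output->value
variableMeasured: C:TB_output->variableMeasured
scalingFactor: 100
typeOf: dcs:StatVarObservation
unit: dcs:Percent"""

# Matches a column reference of the form C:TB_output->columnName.
COLUMN_REF = re.compile(r"C:TB_output->(\w+)")


def render_node(template, row, row_index):
    """Fill the template from one CSV row (hypothetical helper)."""
    lines = []
    for line in template.splitlines():
        if line.startswith("Node:"):
            # Made-up local ID; a real importer assigns its own.
            lines.append(f"Node: TB_output/E0/{row_index}")
        else:
            lines.append(COLUMN_REF.sub(lambda m: row[m.group(1)], line))
    return "\n".join(lines)


# Made-up row, for illustration only.
row = {
    "observationDate": "2020",
    "observationAbout": "country/IND",
    "value": "56",
    "variableMeasured": "dcid:ExampleStatVar",
}
node = render_node(TEMPLATE, row, 0)
print(node)
```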
@@ -0,0 +1,14 @@
config,value
provenance_url,https://data.who.int/indicators/i/39E4281/F1912F6
output_columns,"observationDate,observationAbout,variableMeasured,value,unit,scalingFactor"
#places_within,country/POL
#place_types,"AdministrativeArea,AdministrativeArea1,AdministrativeArea2,State"
#debug,1
#input_rows,100
#word_delimiter,''
#skip_rows,1
populationType,Person
measuredProperty,count
header_rows,1
mapped_columns,6
dc_api_root,https://api.datacommons.org
@@ -0,0 +1,9 @@
key,p1,v1,p2,v2,p3,v3,p4,v4,p5,v5,p6,v6,p7,v7
iso3,observationAbout,country/{Data},,,,,,
YEAR,observationDate,{Data},,,,,,
DISAGGR_1:Died,treatmentOutcome,dcs:DiedDuringTreatment,,,,,,
DISAGGR_1:Lost to follow-up,treatmentOutcome,dcs:LostToFollowUp,,,,,,
DISAGGR_1:Not evaluated,treatmentOutcome,dcs:TreatmentNotEvaluated,,,,,,
DISAGGR_1:Successfully treated,treatmentOutcome,dcs:SuccessfullyTreated,,,,,,
DISAGGR_1:Treatment failed,treatmentOutcome,dcs:TreatmentFailed,,,,,,
VALUE,value,{Number},populationType,dcs:Person,measuredProperty,dcs:count,medicalCondition,dcs:MultidrugOrRifampicinResistantTuberculosis,unit,dcs:Percent,scalingFactor,100,,
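
The mapping above can be read as a lookup from column names (or `column:value` pairs) to property-value pairs. The sketch below is a simplified illustration of that idea, not the behaviour of `stat_var_processor.py`, which does considerably more. The `load_pv_map` and `apply_pv_map` helpers, the lookup order, the trimmed map, and the sample row are all assumptions.

```python
import csv
import io

# Trimmed copy of the map, keeping one row of each kind.
PV_MAP = """key,p1,v1,p2,v2
iso3,observationAbout,country/{Data}
YEAR,observationDate,{Data}
DISAGGR_1:Died,treatmentOutcome,dcs:DiedDuringTreatment
VALUE,value,{Number},populationType,dcs:Person
"""


def load_pv_map(text):
    """Parse the map into {key: {property: value}} (hypothetical helper)."""
    pv_map = {}
    for fields in csv.reader(io.StringIO(text)):
        if not fields or fields[0] == "key":
            continue  # skip blank lines and the header
        key, rest = fields[0], fields[1:]
        # Remaining fields alternate property, value; drop empty padding.
        pv_map[key] = {p: v for p, v in zip(rest[0::2], rest[1::2]) if p}
    return pv_map


def apply_pv_map(row, pv_map):
    """Collect property-values for one input row (hypothetical helper)."""
    pvs = {}
    for column, cell in row.items():
        # Assumed order: try the "column:value" key first, then the column alone.
        mapping = pv_map.get(f"{column}:{cell}") or pv_map.get(column, {})
        for prop, value in mapping.items():
            pvs[prop] = value.replace("{Data}", cell).replace("{Number}", cell)
    return pvs


# Made-up input row, for illustration only.
row = {"iso3": "IND", "YEAR": "2020", "DISAGGR_1": "Died", "VALUE": "12"}
pvs = apply_pv_map(row, load_pv_map(PV_MAP))
print(pvs)
```

In this reading, the `DISAGGR_1:Died` key only fires when the `DISAGGR_1` cell equals `Died`, which is how a single input column fans out into distinct `treatmentOutcome` values.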
@@ -0,0 +1,66 @@
import io
import logging
import os

import pandas as pd
import requests

# Configure logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')


def download_tb_rr_mdr_data():
    # 1. Get the clean data from the API for this indicator ID
    api_url = "https://xmart-api-public.who.int/DATA_/RELAY_TB_DATA"
    params = {
        "$filter": "IND_ID eq '39E4281F1912F6'",
        "$select": "IND_ID,INDICATOR_NAME,YEAR,COUNTRY,DISAGGR_1,VALUE",
        "$format": "csv"
    }

    logging.info("1. Fetching clean percentage data from WHO API...")
    api_response = requests.get(api_url, params=params, timeout=30)

    if api_response.status_code != 200:
        logging.error(f"Failed to fetch API data. HTTP {api_response.status_code}")
        return

    # Load the clean API data into a pandas DataFrame
    api_df = pd.read_csv(io.StringIO(api_response.text))

    # 2. Get ONLY the iso3 code from the master database
    logging.info("2. Fetching country iso3 codes from WHO master database...")
    master_url = "https://extranet.who.int/tme/generateCSV.asp?ds=notifications"
    master_response = requests.get(master_url, timeout=60)
    if master_response.status_code != 200:
        logging.error(f"Failed to fetch master data. HTTP {master_response.status_code}")
        return

    # We only pull the 'country' (for matching) and 'iso3' columns
    geo_columns = ['country', 'iso3']
    master_df = pd.read_csv(io.StringIO(master_response.text),
                            usecols=geo_columns).drop_duplicates()

    # 3. Merge the two datasets together based on the country name
    logging.info("3. Merging data and formatting...")
    # The API uses uppercase 'COUNTRY', the master uses lowercase 'country'
    merged_df = pd.merge(api_df, master_df,
                         left_on='COUNTRY', right_on='country', how='left')

    # A left join leaves iso3 empty when a name has no match; log those names
    unmatched = merged_df.loc[merged_df['iso3'].isna(), 'COUNTRY'].unique()
    if len(unmatched):
        logging.warning(f"No iso3 code found for: {list(unmatched)}")

    # Drop the duplicate lowercase 'country' column used for joining
    merged_df = merged_df.drop(columns=['country'])

    # Reorder columns so the iso3 code sits right next to the country name
    final_columns = [
        'IND_ID', 'INDICATOR_NAME', 'YEAR', 'COUNTRY', 'iso3', 'DISAGGR_1', 'VALUE'
    ]
    merged_df = merged_df[final_columns]

    # 4. Save to CSV, creating the output folder if needed
    output_dir = "statvar_imports/tuberculosis_rifampicin_resistant/input_files"
    filename = os.path.join(output_dir, "Tuberculosis_rr_mdr_tb_outcomes.csv")

    os.makedirs(output_dir, exist_ok=True)

    # Save without the pandas index column
    merged_df.to_csv(filename, index=False)
    logging.info(f"Success! Data saved locally as '{filename}'")


if __name__ == "__main__":
    download_tb_rr_mdr_data()
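
To inspect the query the script sends without touching the network, the query string can be built with the standard library. This is a sketch that assumes `urllib.parse.urlencode` is a reasonable stand-in for the encoding `requests` performs; the exact escaping `requests` applies may differ in detail.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_URL = "https://xmart-api-public.who.int/DATA_/RELAY_TB_DATA"

# Same OData-style parameters the download script passes to requests.get().
params = {
    "$filter": "IND_ID eq '39E4281F1912F6'",
    "$select": "IND_ID,INDICATOR_NAME,YEAR,COUNTRY,DISAGGR_1,VALUE",
    "$format": "csv",
}

# Percent-encode the parameters and append them to the endpoint.
url = f"{API_URL}?{urlencode(params)}"

# Decode the query again to confirm the round trip preserves each value.
decoded = parse_qs(urlsplit(url).query)

print(url)
print(decoded["$filter"][0])
```

Pasting the printed URL into a browser or `curl` is a quick way to check the filter before running the full download.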