This is a template repository for the "Projects in Data Science" course. You should use this repository for your project.
If you are using github.itu.dk, download the repository and create your own repository from it.
If you are using the public GitHub, you can clone or fork the repository directly.
Your repository MUST be named 2026-PDS-XX where XX is your group name (e.g. 2026-PDS-Pandas).
Follow TA instructions when setting up the Python environment before running any code. Remember to export your Python library requirements with `pip freeze > requirements.txt` and add the file to the repo so we can evaluate your scripts.
The file hierarchy of your hand-in repo should be as follows:
ProjectInDataScience2026_ExamTemplate/
├── data/
│   ├── features.csv              # generated by src/extract_features.py: image file names, chosen features, and ground-truth labels
│   ├── annotations_combined.csv  # annotations of hair and pen marks
│   │
│   ├── imgs/                     # skin images (do not add on GitHub)
│   │   ├── img_XX1.png
│   │   ├── img_XX2.png
│   │   ......
│   │   └── img_XXX.png
│   │
│   └── masks/                    # mask images (do not add on GitHub)
│       ├── mask_XX1.png
│       ├── mask_XX2.png
│       ......
│       └── mask_XXX.png
│
├── src/
│   ├── __init__.py
│   ├── feature_A.py              # code for feature A extraction
│   ......
│   ├── feature_X.py              # code for feature X extraction
│   ├── extract_features.py       # calls feature extraction functions and generates data/features.csv
│   ......
│   └── (optional)                # any additional code e.g. EDA, feature distribution analysis, helper functions
│
├── results/
│   ├── figures/                  # Figures used in your report
│   ├── models/                   # Trained model for each model type reported (e.g. .pkl)
│   ├── predictions/              # Predictions output by each reported model
│   └── reports/                  # Files related to the Mandatory assignment
│       ├── report_GROUPEID.pdf
│       └── features_GROUPEID.csv
│
├── main.py                       # full model pipeline: cross-validation for model selection, saving models and predictions, and entry point for TA evaluation
└── README.md
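
As a rough illustration of how `src/extract_features.py` could tie the per-feature modules to `data/features.csv`, here is a minimal sketch. Everything in it is an assumption made for illustration: the function names `feature_a` and `feature_b`, the `img_id` column name, and the `img_*`/`mask_*` file naming are hypothetical, and the placeholder features use file sizes only so the sketch runs without an image library. Joining the ground-truth labels is left out.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List


# Hypothetical placeholders. In a real project these would be imported from
# src/feature_A.py ... src/feature_X.py and would work on pixel data.
def feature_a(img_path: Path, mask_path: Path) -> float:
    """Placeholder feature: size of the image file in bytes."""
    return float(img_path.stat().st_size)


def feature_b(img_path: Path, mask_path: Path) -> float:
    """Placeholder feature: size of the mask file in bytes."""
    return float(mask_path.stat().st_size)


# One registry of features keeps the extraction loop free of copy-pasted blocks.
FEATURES: Dict[str, Callable[[Path, Path], float]] = {
    "feature_A": feature_a,
    "feature_B": feature_b,
}


def extract_features(img_dir: Path, mask_dir: Path, out_csv: Path) -> List[dict]:
    """Compute every registered feature for each image that has a matching mask."""
    rows = []
    for img_path in sorted(img_dir.glob("img_*.png")):
        # Assumed naming convention: img_XX1.png pairs with mask_XX1.png.
        mask_path = mask_dir / img_path.name.replace("img_", "mask_", 1)
        if not mask_path.exists():
            continue  # skip images without a mask
        row = {"img_id": img_path.name}
        for name, func in FEATURES.items():
            row[name] = func(img_path, mask_path)
        rows.append(row)

    out_csv.parent.mkdir(parents=True, exist_ok=True)
    with out_csv.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["img_id", *FEATURES])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With this shape, pointing the script at a new dataset only means changing the three paths, for example `extract_features(Path("data/imgs"), Path("data/masks"), Path("data/features.csv"))`.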
Notes:
- DO NOT upload your data (images) to GitHub.
- All feature extraction functions and modules should be placed under the "src" subfolder. You must have a script `src/extract_features.py` that calls these functions and generates `data/features.csv`. Any additional code such as EDA or feature distribution analysis can optionally be placed here as well. Do not put everything in a single Python file or copy-paste the same code block across the script.
- `main.py` is the entry point for model training and evaluation. It should contain your cross-validation loop to compare models, and save the trained model and predictions for each model type you report on. Do not put model training code in `src/`.
- `data/features.csv` must be generated by running `src/extract_features.py` before running `main.py`. Ensure your feature extraction code is runnable with minimal modification so that it can be applied to a new set of images and masks.
- Think of this repository as something a TA should be able to pick up, change the relevant paths, and immediately run to load your model and evaluate it on a new dataset with no issues. Structure and document your code with that in mind.
- When TAs evaluate your submission, the process will be as follows: first, paths in `src/extract_features.py` are updated to point to the test images and masks, and the script is run to generate a new `features.csv`. Then `load_model` is set to `True` in `main.py` and the script is run; this should load your saved model and produce predictions without any further modification. Make sure your code supports this workflow.
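
The evaluation workflow described in the notes above could be supported by a `main.py` along the following lines. This is only a sketch under stated assumptions: the `run` function, the `img_id` and `label` column names, and the majority-label "model" are hypothetical stand-ins chosen so the example needs nothing beyond the standard library. The cross-validation loop and real classifiers are omitted; the point is only the `load_model` switch.

```python
import csv
import pickle
from collections import Counter
from pathlib import Path
from typing import List, Tuple


def fit(X: List[List[float]], y: List[str]) -> dict:
    """Hypothetical stand-in for training: remember the most common label."""
    return {"majority_label": Counter(y).most_common(1)[0][0]}


def predict(state: dict, X: List[List[float]]) -> List[str]:
    """Predict the remembered label for every row."""
    return [state["majority_label"] for _ in X]


def read_features(path: Path) -> Tuple[List[str], List[List[float]], List[str]]:
    """Read image ids, feature vectors, and labels ('' when no label column exists)."""
    ids, X, y = [], [], []
    with path.open(newline="") as fh:
        for row in csv.DictReader(fh):
            ids.append(row.pop("img_id"))
            y.append(row.pop("label", ""))  # test data may have no labels
            X.append([float(v) for v in row.values()])
    return ids, X, y


def run(features_csv: Path, model_path: Path, predictions_csv: Path,
        load_model: bool) -> List[str]:
    """Train and save a model, or load a saved one, then write predictions."""
    ids, X, _y = read_features(features_csv)

    if load_model:
        # Evaluation mode: reuse the saved model, no retraining.
        with model_path.open("rb") as fh:
            state = pickle.load(fh)
    else:
        # Training mode: fit on the labelled features and save the result.
        state = fit(X, _y)
        model_path.parent.mkdir(parents=True, exist_ok=True)
        with model_path.open("wb") as fh:
            pickle.dump(state, fh)

    predictions = predict(state, X)
    predictions_csv.parent.mkdir(parents=True, exist_ok=True)
    with predictions_csv.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["img_id", "prediction"])
        writer.writerows(zip(ids, predictions))
    return predictions
```

Because the labels are only used when `load_model` is `False`, the same code path can produce predictions for a regenerated `features.csv` by flipping that one flag.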