Projects in Data Science (2026)

Overview

This is a template repository for the "Projects in Data Science" course. You should use this repository for your project.

If you are using github.itu.dk, download the repository and create your own repository from it.

If you are using general Github, you can clone or fork the repository directly.

Your repository MUST be named 2026-PDS-XX where XX is your group name (e.g. 2026-PDS-Pandas).

Python environment

Follow the TA instructions for setting up the Python environment before running any code. Remember to export your Python library requirements with pip freeze > requirements.txt and commit the resulting requirements.txt to the repo so that we can evaluate your scripts.

File Hierarchy

The file hierarchy of your hand-in repo should be as follows:

ProjectInDataScience2026_ExamTemplate/
├── data/
│   ├── features.csv                    # generated by src/extract_features.py: image file names, chosen features, and ground-truth labels
│   ├── annotations_combined.csv        # annotations of hair and pen marks
│   │
│   ├── imgs/                           # skin images (do not add on GitHub)
│   │    ├── img_XX1.png
│   │    ├── img_XX2.png
│   │     ......
│   │    └── img_XXX.png
│   │
│   └── masks/                          # mask images (do not add on GitHub)
│        ├── mask_XX1.png
│        ├── mask_XX2.png
│         ......
│        └── mask_XXX.png
│
├── src/
│   ├── __init__.py
│   ├── feature_A.py                    # code for feature A extraction
│   ......
│   ├── feature_X.py                    # code for feature X extraction
│   ├── extract_features.py             # calls feature extraction functions and generates data/features.csv
│   ......
│   └── (optional)                      # any additional code e.g. EDA, feature distribution analysis, helper functions
│
├── results/
│   ├── figures/                        # Figures used in your report
│   ├── models/                         # Trained model for each model type reported (e.g. .pkl)
│   ├── predictions/                    # Predictions output by each reported model
│   └── reports/                        # Files related to the Mandatory assignment
│        ├── report_GROUPEID.pdf
│        └── features_GROUPEID.csv
│
├── main.py                             # full model pipeline: cross-validation for model selection, saving models and predictions, and entry point for TA evaluation
└── README.md
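To check the layout before handing in, a small script along the following lines could help. It is a hypothetical helper, not part of the template (the names REQUIRED_PATHS and missing_paths are made up for illustration); it only lists which of the expected paths are absent. The image and mask folders are deliberately left out, since they must stay off GitHub, and requirements.txt is included because the Python environment section asks for it.

```python
from pathlib import Path

# Paths taken from the file hierarchy above, plus requirements.txt.
# data/imgs and data/masks are omitted on purpose: they must not be pushed.
REQUIRED_PATHS = [
    "README.md",
    "main.py",
    "requirements.txt",
    "data/features.csv",
    "data/annotations_combined.csv",
    "src/__init__.py",
    "src/extract_features.py",
    "results/figures",
    "results/models",
    "results/predictions",
    "results/reports",
]


def missing_paths(repo_root="."):
    """Return the required paths (relative to repo_root) that do not exist yet."""
    root = Path(repo_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    for p in missing_paths():
        print(f"missing: {p}")
```

Note that git does not track empty folders, so a results subfolder with nothing in it will show up as missing after a fresh clone.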

Notes:

DO NOT upload your data (images and masks) to GitHub.
All feature extraction functions and modules should be placed under the "src" subfolder. You must have a script src/extract_features.py that calls these functions and generates data/features.csv. Any additional code such as EDA or feature distribution analysis can optionally be placed here as well. Do not put everything in a single Python file or copy-paste the same code block across scripts.
main.py is the entry point for model training and evaluation. It should contain your cross-validation loop to compare models, and save the trained model and predictions for each model type you report on. Do not put model training code in src/.
data/features.csv must be generated by running src/extract_features.py before running main.py. Ensure your feature extraction code is runnable with minimal modification so that it can be applied to a new set of images and masks.
Think of this repository as something a TA should be able to pick up, change the relevant paths in, and immediately run to load your model and evaluate it on a new dataset without any issues. Structure and document your code with that in mind.
When TAs evaluate your submission, the process will be as follows: first, the paths in src/extract_features.py are updated to point to the test images and masks, and the script is run to generate a new features.csv. Then load_model is set to True in main.py and main.py is run; this should load your saved model and produce predictions without any further modification. Make sure your code supports this workflow.
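The notes above describe what src/extract_features.py has to do. The following is a minimal sketch under several assumptions: images are .png files paired with masks by the naming scheme in the tree (img_XX1.png with mask_XX1.png), each feature function takes (image, mask) and returns a number, ground-truth labels are passed in as a dict from image file name to label (where they come from is up to you), and the output columns are img_id, one column per feature, and label. The names extract_features, mask_path_for, default_loader and mask_area are illustrative, not required by the course. Pillow and NumPy are only needed by the default loader.

```python
"""Sketch of src/extract_features.py: one row per image in data/features.csv."""
import csv
from pathlib import Path

# Paths a TA edits to point at a new set of images and masks.
IMG_DIR = Path("data/imgs")
MASK_DIR = Path("data/masks")
OUT_CSV = Path("data/features.csv")


def mask_path_for(img_path, mask_dir):
    """Assumes img_XX1.png pairs with mask_XX1.png, as in the file hierarchy."""
    return Path(mask_dir) / Path(img_path).name.replace("img_", "mask_", 1)


def default_loader(img_path, mask_path):
    """Load an image and its mask as NumPy arrays (assumes Pillow and NumPy are installed)."""
    import numpy as np
    from PIL import Image

    return np.asarray(Image.open(img_path)), np.asarray(Image.open(mask_path))


def mask_area(image, mask):
    """Placeholder feature standing in for src/feature_A.py ... feature_X.py: lesion pixel count."""
    return int((mask > 0).sum())


def extract_features(img_dir, mask_dir, out_csv, feature_fns, labels, loader=default_loader):
    """Apply every feature function to every image/mask pair and write one CSV row per image."""
    rows = []
    for img_path in sorted(Path(img_dir).glob("*.png")):
        image, mask = loader(img_path, mask_path_for(img_path, mask_dir))
        row = {"img_id": img_path.name}
        for name, fn in feature_fns.items():
            row[name] = fn(image, mask)
        row["label"] = labels.get(img_path.name, "")  # left empty when no ground truth is known
        rows.append(row)

    out_csv = Path(out_csv)
    out_csv.parent.mkdir(parents=True, exist_ok=True)
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["img_id", *feature_fns, "label"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Hypothetical wiring: swap in your real feature functions and real labels.
    n = extract_features(IMG_DIR, MASK_DIR, OUT_CSV, {"mask_area": mask_area}, labels={})
    print(f"wrote {n} rows to {OUT_CSV}")
```

Keeping the paths as module-level constants means a TA only has to edit IMG_DIR, MASK_DIR and OUT_CSV to run the script on a new test set. The loader is a parameter so that the row-building logic can be tested without real images.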
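The evaluation workflow in these notes suggests a main.py with two modes controlled by load_model. Below is one possible shape, not the required one, assuming scikit-learn and pandas, pickled models in results/models, one CSV of predictions per model in results/predictions, and a features.csv with an img_id column, feature columns, and a label column. The column names, the two candidate models, accuracy as the metric, and the results/predictions/evaluation folder for predictions on new data are all assumptions for illustration. In training mode it compares the models with stratified k-fold cross-validation, saves out-of-fold predictions, then refits each model on all labelled data and pickles it; in evaluation mode it loads every saved model and predicts on the new features.

```python
"""Sketch of main.py: CV model selection, saving models/predictions, and a load_model mode."""
import pickle
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

FEATURES_CSV = Path("data/features.csv")
MODEL_DIR = Path("results/models")
PRED_DIR = Path("results/predictions")
EVAL_DIR = PRED_DIR / "evaluation"  # assumed location for predictions on new data
load_model = False  # TAs set this to True to evaluate the saved models

# Illustrative candidates; replace with the model types you report on.
MODELS = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}


def feature_matrix(df):
    """Drop the non-feature columns (assumed to be img_id and label)."""
    return df.drop(columns=["img_id", "label"], errors="ignore")


def write_predictions(ids, preds, name, out_dir):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pd.DataFrame({"img_id": ids, "prediction": preds}).to_csv(out_dir / f"{name}.csv", index=False)


def train_and_save(df, models=MODELS, model_dir=MODEL_DIR, pred_dir=PRED_DIR, n_splits=5):
    """Cross-validate each model, save its out-of-fold predictions, then refit and pickle it."""
    X, y = feature_matrix(df), df["label"]
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    Path(model_dir).mkdir(parents=True, exist_ok=True)
    scores = {}
    for name, model in models.items():
        # Each out-of-fold prediction comes from a model that never saw that row.
        oof = cross_val_predict(model, X, y, cv=cv)
        scores[name] = accuracy_score(y, oof)
        write_predictions(df["img_id"], oof, name, pred_dir)
        model.fit(X, y)  # refit on all labelled data before saving
        with open(Path(model_dir) / f"{name}.pkl", "wb") as f:
            pickle.dump(model, f)
    return scores


def evaluate_saved(df, model_dir=MODEL_DIR, out_dir=EVAL_DIR):
    """Load every pickled model and write its predictions for the rows in df."""
    X = feature_matrix(df)
    for path in sorted(Path(model_dir).glob("*.pkl")):
        with open(path, "rb") as f:
            model = pickle.load(f)
        write_predictions(df["img_id"], model.predict(X), path.stem, out_dir)


def main():
    df = pd.read_csv(FEATURES_CSV)
    if load_model:
        evaluate_saved(df)
    else:
        for name, acc in train_and_save(df).items():
            print(f"{name}: cross-validated accuracy {acc:.3f}")


if __name__ == "__main__":
    main()
```

Saving out-of-fold predictions rather than predictions on the training data keeps the saved files consistent with the cross-validation score. Because feature_matrix drops label with errors="ignore", the evaluation mode works whether or not the new features.csv contains labels.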
