This project is part of an ongoing research study. The full research paper will be added soon.
This project presents an end-to-end symbolic music generation and benchmarking framework built using the MIT Irish Folk Music Dataset (ABC notation).
It goes beyond generation by introducing a comprehensive and standardized evaluation pipeline to fairly compare multiple deep learning architectures across model-level, structural, and audio-based metrics.
This study focuses on exploring different model architectures for music generation rather than maximizing their performance.
All models were trained with the same hyperparameters to ensure a fair, unbiased comparison. The purpose of this setup is to identify which architectures are most promising for building more complex music-generation systems in the future.
Due to limited GPU resources and the exploratory nature of the work, the reported results may not represent the models’ optimal performance.
With further tuning and compute, these architectures can be extended to tasks like polyphonic generation and chord prediction.
- End-to-end pipeline: Data → Training → Generation → Evaluation
- Implementation of 7 deep learning architectures
- Unified benchmarking framework across all models
- Multi-dimensional evaluation:
- Model-level metrics
- Symbolic & structural metrics
- Musical coherence analysis
- Audio similarity (DTW)
- Research-focused design for fair model comparison
| Model | Type |
|---|---|
| RNN | Sequence Model |
| LSTM | Advanced RNN |
| GRU | Gated RNN |
| CNN | Convolutional |
| Transformer | Attention-based |
| VAE | Generative |
| GAN | Generative |
- MIT Irish Folk Music Dataset
- Format: ABC Notation (Symbolic Music)
- Character-level tokenization
- Vocabulary mapping
- Sequence batching
- Training sample generation
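The preprocessing steps above can be sketched in plain Python. This is a minimal illustration of character-level tokenization, vocabulary mapping, and next-character training-sample generation; the function and variable names are illustrative, not the project's actual API:

```python
# Character-level preprocessing for ABC notation (illustrative sketch,
# not the repository's exact implementation).

def build_vocab(corpus: str):
    """Map each unique character to an integer id and back."""
    chars = sorted(set(corpus))
    char2idx = {c: i for i, c in enumerate(chars)}
    idx2char = {i: c for c, i in char2idx.items()}
    return char2idx, idx2char

def make_training_samples(corpus: str, char2idx: dict, seq_len: int):
    """Slide a window over the encoded corpus: each input sequence is
    paired with the same sequence shifted by one character (the targets)."""
    encoded = [char2idx[c] for c in corpus]
    inputs, targets = [], []
    for i in range(len(encoded) - seq_len):
        inputs.append(encoded[i:i + seq_len])
        targets.append(encoded[i + 1:i + seq_len + 1])
    return inputs, targets

# Tiny ABC fragment (hypothetical example tune header + one bar).
abc_snippet = "X:1\nT:Tune\nK:D\n|:DFA d2:|"
char2idx, idx2char = build_vocab(abc_snippet)
X, y = make_training_samples(abc_snippet, char2idx, seq_len=8)
```

Batching then simply groups these (input, target) pairs into fixed-size minibatches before they are fed to any of the seven models.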
```
ABC Dataset
    ↓
Preprocessing & Tokenization
    ↓
Model Training
(RNN / LSTM / GRU / CNN / Transformer / VAE / GAN)
    ↓
Music Generation (ABC → Audio)
    ↓
Evaluation Framework (Multi-level Metrics)
```
This project introduces a comprehensive benchmarking system:
- Loss
- Note Accuracy
- Perplexity
- GAN-specific:
- Transition Accuracy
- Distribution Loss & Similarity
- Jaccard Similarity (Note Overlap)
- Bigram / Trigram Overlap
- BLEU Score
- Sequence Similarity
- Melodic Contour Similarity
- Pitch-Class Histogram Divergence
- Musical Coherence (Original vs Generated)
- MFCC-based similarity
- Dynamic Time Warping (DTW)
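A few of the symbolic and model-level metrics above can be sketched in plain Python. These are illustrative helpers, not the project's exact implementation (set-based Jaccard and n-gram overlap are assumptions about how the metrics are computed):

```python
import math

def jaccard_similarity(notes_a, notes_b):
    """Note overlap: |A ∩ B| / |A ∪ B| over the sets of tokens used."""
    a, b = set(notes_a), set(notes_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def ngram_overlap(seq_a, seq_b, n=2):
    """Fraction of n-grams shared between two sequences (bigram by default)."""
    grams = lambda s: {tuple(s[i:i + n]) for i in range(len(s) - n + 1)}
    ga, gb = grams(seq_a), grams(seq_b)
    return len(ga & gb) / len(ga | gb) if (ga | gb) else 0.0

def perplexity(cross_entropy_loss):
    """Model-level metric: perplexity is the exponential of the mean
    per-token cross-entropy loss."""
    return math.exp(cross_entropy_loss)

# Hypothetical original vs. generated note streams.
original  = list("DFA d2 AFD")
generated = list("DFA d2 GFE")
print(jaccard_similarity(original, generated))   # shared note vocabulary
print(ngram_overlap(original, generated, n=2))   # shared bigrams
```

The audio-level metrics (MFCC similarity, DTW) operate on rendered audio rather than token sequences and would typically rely on a library such as librosa.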
- LSTM and CNN emerged as the most reliable models
  - LSTM → best balance of structure, coherence, and accuracy
  - CNN → extremely high accuracy but weak musical structure
- RNN performs strongly in sequence modeling
  - High similarity and coherent melody generation
- GRU and Transformer show inconsistent results
  - Require further tuning or larger datasets
- GAN and VAE struggle with structure
  - Capture the overall distribution but fail at sequence coherence
High predictive accuracy does not guarantee musical quality. Models like CNN and GRU achieve high accuracy but fail to preserve:
- melodic flow
- structure
- long-term coherence
LSTM achieves the best trade-off across:
- accuracy
- structural similarity
- musical coherence
This repository supports a research study titled:
"Comparative Analysis of Deep Learning Models for Symbolic Music Generation"
The paper will include:
- Methodology
- Detailed experiments
- Full evaluation results
- Comparative analysis
- Future research directions
Music generation is not just about accuracy — it's about structure, coherence, and musicality.