8. Machine learning

We introduced machine learning while fitting Gaussian processes in section 5, Walks. Djalgo’s module djai includes tools for modelling music from MIDI data, relying on PyTorch (a package for deep learning) and MidiTok (a package that transforms MIDI files into a deep-learning-readable format). djai is not loaded by default when importing Djalgo: otherwise PyTorch and MidiTok, which are complex packages, would have had to be added to Djalgo’s dependencies. To use djai, you must `pip install torch <https://pytorch.org/get-started/locally/>`__ and `pip install miditok <https://miditok.readthedocs.io/>`__ in your environment.
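Before going further, you can verify that the optional dependencies are available. This is a minimal sketch, not part of djai itself; it assumes only the package names given above.

# Sketch: djai only imports if its optional dependencies are installed
try:
    import torch    # deep learning backend
    import miditok  # MIDI tokenization
    from djalgo import djai
except ImportError as e:
    print(f"Missing dependency '{e.name}': install torch and miditok to use djai.")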

Ethics: art as the witness of experience

Even though djai was the module that took me the most time to develop, it is, these days and in my opinion, the least interesting. Who needs to DIY their own AI when interesting results can already be generated with a command prompt to a large language model (LLM)? My ethos will fluctuate and evolve, as anything should in the precious, short time we exist. There is nothing inherently wrong with AI, but if your piece was generated with a banal command prompt, your creative process is banal and uninteresting, no matter the result. In times when any artistic piece needed years of work, the result was more important than the process. Now that anyone can ask an LLM to generate an image of a cat riding a dinosaur in space, in the style of a mix of Dalí and cyberpunk, results appear within seconds, and the process becomes more relevant. The process can, of course, be interesting and involve AI. Indeed, if like me you have spent months designing your own AI (which is still not working so well…), the process (not the result) behind the musical piece has an artistic value as legitimate as that of any composer who has spent those months studying music theory. Let’s also keep in mind that the process includes both the originality of the approach and the enjoyment of the artist.

Artists are people who spend the precious time they own thinking about the narrative of the objects they create. When the process becomes applying a recipe, the result quits art and belongs to the same category as home-sweet-home printed carpets sold on Amazon.

That’s why the djai module doesn’t come with pre-trained models. That would have been too easy, right? I prefer seeing you tweak it and train it with your own compositions rather than just using it on Leonard Cohen’s songs to generate new ones. You are worth more than this, and the world deserves more than command-prompt artists.

In the quiet moments between the shadow and the light, we find the songs that our hearts forgot to sing. — “Write an original quote in the style of Leonard Cohen”, sent to ChatGPT-4.

Djai

At the core of Djai, you’ll find the ModelManager, which does almost everything for you: it scans your MIDI files, tokenises them (prepares them for modelling), models them (defines and fits the model), and predicts (generates a MIDI file). Let’s create an instance of the model, then I’ll explain the arguments.

[1]:
from djalgo import djai
model_manager = djai.ModelManager(
    sequence_length_input=24, sequence_length_output=8,
    model_type='gru', nn_units=(64, 64, 64), dropout=0.25,
    batch_size=32, learning_rate=0.001
)
  1. sequence_length_input: This defines the length of the input sequences fed into the model. In this case, it is set to 24, meaning each input sequence will consist of 24 tokens.

  2. sequence_length_output: This specifies the length of the output sequences generated by the model. Here, it is set to 8, so the model will generate sequences of 8 tokens as output. With sequence_length_input=24 and sequence_length_output=8, each sequence of 24 tokens (notes) generates 8 new tokens.

  3. model_type: This argument indicates the type of neural network model to be used. Possible values include ‘gru’, ‘lstm’, and ‘transformer’. In this example, ‘gru’ specifies that a GRU (Gated Recurrent Unit) model will be used. In short:

  • LSTMs (Long Short-Term Memory networks) are more traditional and capable but tend to be complex.

  • GRUs (Gated Recurrent Units) aim to simplify the architecture of LSTMs with fewer parameters while maintaining performance.

  • Transformers are at the forefront of current large language model (LLM) technology, offering potentially superior learning capabilities due to their attention mechanisms, albeit at the cost of increased complexity and computational demands.

  4. nn_units: This tuple defines the number of units in each layer of the neural network. For the GRU model, (64, 64, 64) means there are three layers, each with 64 units. The more units and layers you add, the longer your model will take to fit. Too few units and layers, and your model will not perform well (underfitting). Too many, and your model will mistake noise for a trend (overfitting).

  5. dropout: This is the dropout rate applied during training to prevent overfitting. A value of 0.25 means that 25% of the units will be randomly dropped during training.

  6. batch_size: This determines the number of samples per batch of input fed into the model during training. A batch_size of 32 indicates that 32 sequences will be processed together in each training step.

  7. learning_rate: This is the learning rate for the optimizer, which controls how much to adjust the model’s weights with respect to the loss gradient. A lower learning rate of 0.001 is used to make finer updates to the weights, potentially leading to better convergence.

  8. n_heads: This argument is specific to the transformer model and defines the number of attention heads in each multi-head attention layer. It is not applicable to the GRU model (see the sketch after this list).
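For illustration, a transformer variant might be configured as follows. This is only a sketch: it assumes the ModelManager constructor accepts n_heads alongside the arguments shown above, with a hypothetical value of 4; check the djai source for the exact signature.

# Hypothetical transformer configuration (sketch); n_heads only applies
# when model_type='transformer'
transformer_manager = djai.ModelManager(
    sequence_length_input=24, sequence_length_output=8,
    model_type='transformer', nn_units=(64, 64, 64), dropout=0.25,
    batch_size=32, learning_rate=0.001, n_heads=4
)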

Let’s take some random MIDI files, just for testing.

[3]:
from pathlib import Path
midi_files = list(Path('_midi-djai').glob('*.mid'))
midi_files
[3]:
[PosixPath('_midi-djai/adams.mid'),
 PosixPath('_midi-djai/mario.mid'),
 PosixPath('_midi-djai/pinkpanther.mid'),
 PosixPath('_midi-djai/rocky.mid'),
 PosixPath('_midi-djai/tetris.mid')]

All we have to do is fit our model, save it for eventual future use (large models can take a long time to converge), and generate a new MIDI file from any MIDI file used as a primer.

[4]:
model_manager.fit('_midi-djai', epochs=500, verbose=25)
model_manager.save('_midi-djai/gru.model')
model_manager.generate(length=10, primer_file='_midi-output/polyloop.mid', output_file='_midi-output/djai.mid')
Epoch 1/500, Step 0, Loss: 5.883777141571045
Epoch 26/500, Step 25, Loss: 3.827465057373047
Epoch 51/500, Step 50, Loss: 3.193833589553833
Epoch 76/500, Step 75, Loss: 3.0585410594940186
Epoch 101/500, Step 100, Loss: 2.9217000007629395
Epoch 126/500, Step 125, Loss: 2.771575450897217
Epoch 151/500, Step 150, Loss: 2.6586806774139404
Epoch 176/500, Step 175, Loss: 2.5649948120117188
Epoch 201/500, Step 200, Loss: 2.4781956672668457
Epoch 226/500, Step 225, Loss: 2.4030539989471436
Epoch 251/500, Step 250, Loss: 2.3383901119232178
Epoch 276/500, Step 275, Loss: 2.2718141078948975
Epoch 301/500, Step 300, Loss: 2.2172982692718506
Epoch 326/500, Step 325, Loss: 1.9796744585037231
Epoch 351/500, Step 350, Loss: 1.8016573190689087
Epoch 376/500, Step 375, Loss: 1.6232600212097168
Epoch 401/500, Step 400, Loss: 1.4377361536026
Epoch 426/500, Step 425, Loss: 1.246799111366272
Epoch 451/500, Step 450, Loss: 1.0514758825302124
Epoch 476/500, Step 475, Loss: 0.866170346736908
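Since the model was saved, a later session could reload it instead of refitting from scratch. The sketch below assumes ModelManager exposes a load counterpart to save; if the actual method name differs in djai, adapt accordingly.

# Sketch, assuming a load() counterpart to save(); verify against the djai API
model_manager = djai.ModelManager(
    sequence_length_input=24, sequence_length_output=8,
    model_type='gru', nn_units=(64, 64, 64), dropout=0.25,
    batch_size=32, learning_rate=0.001
)
model_manager.load('_midi-djai/gru.model')  # hypothetical: restore trained weights
model_manager.generate(length=10, primer_file='_midi-output/polyloop.mid', output_file='_midi-output/djai.mid')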

Result

[5]:
import music21 as m21
m21.converter.parse('_midi-output/djai.mid').show('midi')

🤔…
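If the playback leaves you as puzzled as I am, music21 can also dump the parsed stream as text for closer inspection; a minimal sketch using the same generated file:

# Inspect the generated MIDI as a text tree of offsets, notes and durations
score = m21.converter.parse('_midi-output/djai.mid')
score.show('text')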