Initial commit

Kacper Donat 2020-05-23 16:18:57 +02:00
commit 6a6e99f8d1
3 changed files with 199 additions and 0 deletions

.gitignore

@@ -0,0 +1,141 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# Generated docs
*.pdf

Makefile

@@ -0,0 +1,2 @@
raport.pdf: README.md
pandoc -f markdown-implicit_figures -V geometry:margin=1in $^ -o $@

README.md

@@ -0,0 +1,56 @@
# 1. Introduction and research problem analysis
We aim to transcribe raw audio into a more structured representation such as MIDI. In the technical literature this task is called _Automatic Music Transcription_ (AMT). Converting audio recordings of music into a symbolic form makes many tasks in music information retrieval easier to accomplish. It is a hard task even for humans.
There are several factors that make AMT difficult:
1. Polyphonic music contains a mixture of multiple simultaneous sources.
2. Overlapping sound events often exhibit harmonic relations with each other.
3. The timing of musical voices is governed by the regular metrical structure of the music.
4. The annotation of ground-truth transcriptions for polyphonic music is very time-consuming and requires high expertise.
Especially the last factor limits the number of datasets available for training models.
To limit the scope of the research, we are going to concern ourselves mainly with transcribing polyphonic piano recordings. A standard piano can play 88 pitches in many combinations, and that complexity makes it a prime research subject for polyphonic AMT.
State-of-the-art machine learning architectures for this task take one of two approaches. The first is similar to speech recognition and consists of two parts: an _acoustic model_ and a _music language model_. The acoustic model estimates the probability of a pitch being present in a frame of audio. The music language model is analogous to the language models used in natural language processing and predicts the probability of a pitch being played given the context of previous pitches and common composition conventions.
The second approach also combines two networks: one network detects note onsets, and its output is used to inform a second network, which focuses on detecting note lengths. The task has also been tackled with other techniques, such as Non-Negative Matrix Factorization, but we are going to focus on neural networks.
Acoustic models usually use RNNs because they can capture long-term and short-term temporal patterns in music. CNNs provide benefits as well, because music is shaped not only by which pitches are played but also by the temporal "distance" between them, and CNNs are very good at preserving and recognising such spatial features. In the second approach, both networks consist of convolutional layers that process the input and RNNs that do the inference, connected so that the output of the first network can inform the RNN part of the second one.
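To make the first approach more concrete, below is a minimal sketch of such a frame-wise acoustic model in Keras (the framework we lean towards, see below); the input shape, layer sizes and number of layers are our own assumptions for illustration, not values taken from the cited papers.

```python
# Minimal sketch of a CNN + RNN acoustic model for frame-wise pitch detection.
# All shapes and layer sizes are illustrative assumptions, not from the cited papers.
from tensorflow.keras import layers, models

N_FRAMES = 625    # spectrogram frames per training excerpt (assumed)
N_BINS = 229      # frequency bins of the input spectrogram (assumed)
N_PITCHES = 88    # piano keys

inputs = layers.Input(shape=(N_FRAMES, N_BINS, 1))
# Convolutional layers capture local time-frequency ("spatial") patterns.
x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=(1, 2))(x)
x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(1, 2))(x)
# Collapse the frequency axis so every frame becomes one feature vector.
x = layers.Reshape((N_FRAMES, -1))(x)
# A bidirectional RNN models temporal context across frames.
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
# One sigmoid per pitch: probability that the pitch sounds in a given frame.
outputs = layers.TimeDistributed(layers.Dense(N_PITCHES, activation="sigmoid"))(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```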
### The input
A music file that has to be preprocessed in some way (usually through frequency-domain transformations) into a form processable by a CNN.
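For example, a common frequency-domain preprocessing step is a log-scaled mel spectrogram. A minimal sketch using librosa follows; the library choice and all parameter values are our assumptions, not something fixed by the cited works.

```python
# Sketch: turn an audio file into a log-mel spectrogram usable as CNN input.
# librosa and all parameter values below are illustrative assumptions.
import librosa
import numpy as np

def audio_to_logmel(path, sr=16000, n_fft=2048, hop_length=512, n_mels=229):
    audio, _ = librosa.load(path, sr=sr)           # load and resample to a fixed rate
    mel = librosa.feature.melspectrogram(
        y=audio, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    logmel = librosa.power_to_db(mel, ref=np.max)  # log compression tames the dynamic range
    return logmel.T                                # shape: (frames, mel bins)

# Usage (hypothetical file name):
# features = audio_to_logmel("recording.wav")
```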
### The output
A series of probability vectors representing the sounds present in each frame, ready to be transformed into a MIDI file.
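A hedged sketch of this last step, turning frame-wise pitch probabilities into MIDI notes with pretty_midi; the threshold, the frame rate (matching the hop length assumed above) and the library choice are our assumptions.

```python
# Sketch: convert a (frames x 88) matrix of pitch probabilities into a MIDI file.
# Threshold, frame rate and velocity are illustrative assumptions.
import pretty_midi

def probabilities_to_midi(probs, frame_rate=31.25, threshold=0.5, out_path="output.mid"):
    midi = pretty_midi.PrettyMIDI()
    piano = pretty_midi.Instrument(program=0)   # acoustic grand piano
    active = probs >= threshold                 # boolean piano roll
    n_frames, n_pitches = active.shape
    for pitch in range(n_pitches):
        start = None
        for frame in range(n_frames):
            if active[frame, pitch] and start is None:
                start = frame / frame_rate      # note onset
            elif not active[frame, pitch] and start is not None:
                piano.notes.append(pretty_midi.Note(
                    velocity=80, pitch=pitch + 21,   # MIDI 21 = A0, lowest piano key
                    start=start, end=frame / frame_rate))
                start = None
        if start is not None:                   # note still sounding at the very end
            piano.notes.append(pretty_midi.Note(
                velocity=80, pitch=pitch + 21,
                start=start, end=n_frames / frame_rate))
    midi.instruments.append(piano)
    midi.write(out_path)
```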
## List of publications/articles/blog posts used for problem analysis
[:)] https://web.stanford.edu/class/ee384m/Handouts/HowtoReadPaper.pdf
[1] [An End-to-End Neural Network for Polyphonic Piano Music Transcription](https://arxiv.org/pdf/1508.01774.pdf)
[2] [Music Transcription by Deep Learning with Data and “Artificial Semantic” Augmentation](https://arxiv.org/pdf/1712.03228.pdf)
[3] [Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription](https://arxiv.org/ftp/arxiv/papers/1206/1206.6392.pdf)
[4] [Onsets and Frames: Dual-Objective Piano Transcription](https://storage.googleapis.com/magentadata/papers/onsets-frames/index.html) <- page of a newer approach that seems simpler and more effective
[5] [https://github.com/ybayle/awesome-deep-learning-music](https://github.com/ybayle/awesome-deep-learning-music)
[6] https://drive.google.com/file/d/0B1OooSxEtl0FcTBiOGdvSTBmWnc/view
## List of repositories/code examples that will be used for implementation
[1] https://github.com/IraKorshunova/folk-rnn
[2] https://github.com/BShakhovsky/PolyphonicPianoTranscription
[3] https://github.com/wgxli/piano-transcription
## Framework selected for implementation
TensorFlow with Keras - probably? Hard to say, but we have the most experience with it.
Keras above all.
PyTorch - if it looks like something can be done with it with low effort.
## Dataset to be used for experiments
[MAPS](http://www.tsi.telecom-paristech.fr/aao/en/2010/07/08/maps-database-a-piano-database-for-multipitch-estimation-and-automatic-transcription-of-music/): used in [1], "31 GB of CD-quality recordings in .wav format", "The ground truth is provided for all sounds, in MIDI and text formats. The audio was generated from the ground truth in order to ensure the accuracy of the annotation."
[MAESTRO](https://magenta.tensorflow.org/maestro-wave2midi2wave): "172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms.", "based on recordings from the International Piano-e-Competition"
[MusicNet](https://homes.cs.washington.edu/~thickstn/musicnet.html): "a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording, the instrument that plays each note, and the note's position in the metrical structure of the composition"; we will probably have to filter this dataset to obtain recordings with piano labels only (see the sketch below).
[A nice description of four datasets](https://arxiv.org/ftp/arxiv/papers/1206/1206.6392.pdf).
[2] describes lossless & lossy augmentation of spectrograms.
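A minimal sketch of the MusicNet filtering step mentioned above, assuming the published metadata CSV has an `ensemble` column and that solo piano pieces are labelled "Solo Piano" (both are assumptions about the dataset layout); pandas is our choice here.

```python
# Sketch: keep only solo-piano recordings from the MusicNet metadata.
# The CSV name, the "ensemble" column and the "Solo Piano" label are assumptions.
import pandas as pd

metadata = pd.read_csv("musicnet_metadata.csv")
piano_only = metadata[metadata["ensemble"] == "Solo Piano"]
print(f"{len(piano_only)} of {len(metadata)} recordings are solo piano pieces")
# The remaining ids can then be used to pick the matching .wav and label files.
```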
# 2. Methodology: dataset, tools, experiments
# 3. Experiment results and discussion