
Commit

update documentation for version 0.2.0
Kin Wai Cheuk authored and Kin Wai Cheuk committed Nov 8, 2020
1 parent 2159e00 commit 455b72d
Showing 7 changed files with 120 additions and 27 deletions.
33 changes: 33 additions & 0 deletions Sphinx/source/citing.rst
@@ -0,0 +1,33 @@
Citing nnAudio
===============

If you use nnAudio in your research, please feel free to cite our work.

Plain Text
-----------
K. W. Cheuk, H. Anderson, K. Agres and D. Herremans,
"nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks,"
in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.

BibTex
-------

.. code-block:: tex

@ARTICLE{9174990,
author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
journal={IEEE Access},
title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks},
year={2020},
volume={8},
number={},
pages={161981-162003},
doi={10.1109/ACCESS.2020.3019084}}

Link to the paper
-----------------

The paper for nnAudio is available on `IEEE Access <https://ieeexplore.ieee.org/document/9174990>`__.



2 changes: 2 additions & 0 deletions Sphinx/source/conf.py
@@ -41,6 +41,7 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autosectionlabel',
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.napoleon',
@@ -84,6 +85,7 @@
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
html_logo = "logo.png"
html_title = f'{version}'

# Theme options are theme-specific and customize the look and feel of a theme
20 changes: 19 additions & 1 deletion Sphinx/source/examples.rst
@@ -4,6 +4,24 @@ Tutorials
Call for Contribution:
**********************

We are now looking for contributions. People who are interested in contributing to nnAudio can visit the `github page <https://github.com/KinWaiCheuk/nnAudio>`_ or contact me via kinwai<underscore>cheuk<at>mymail.sutd.edu.sg.

nnAudio is a fast-growing package. With the increasing number of feature requests, we welcome anyone who is familiar with digital signal processing and neural networks to contribute to nnAudio. The current list of pending features includes:

1. Invertible Constant Q Transform (CQT)
2. CQT with filter scale factor (see issue `#54 <https://github.com/KinWaiCheuk/nnAudio/issues/54>`__)
3. Variable Q Transform (see `VQT <https://www.researchgate.net/publication/274009051_A_Matlab_Toolbox_for_Efficient_Perfect_Reconstruction_Time-Frequency_Transforms_with_Log-Frequency_Resolution>`__)
4. Speed and Performance improvements for Griffin-Lim (see issue `#41 <https://github.com/KinWaiCheuk/nnAudio/issues/41>`__)
5. Data Augmentation (see issue `#49 <https://github.com/KinWaiCheuk/nnAudio/issues/49>`__)

(Quick tip for the unit tests: ``cd`` into the ``Installation`` folder, then run ``pytest``. You need at least 1931 MiB of GPU memory to pass all the unit tests.)

Alternatively, you may also contribute by:

1. Refactoring the code structure (currently all functions are in the same file, but with the increasing number of features, I think we need to break it down into smaller modules)
2. Writing better demonstration code or tutorials



28 changes: 17 additions & 11 deletions Sphinx/source/index.rst
@@ -5,7 +5,7 @@
nnAudio 0.2.0
===================================
Welcome to nnAudio 0.2.0. It changes the syntax of the spectrogram layers creation,
Welcome to nnAudio 0.2.0. This new version changes the syntax for creating spectrogram layers,
such that ``stft_layer.to(device)`` can be used. This new version is more stable
than the previous version since it is more compatible with other torch modules.

@@ -24,37 +24,43 @@ But they are not using the neural network approach, and hence the
Fourier basis can not be trained. As of PyTorch 1.6.0, torchaudio is
still very difficult to install under the Windows environment due to
``sox``. nnAudio is a more compatible audio processing tool across
different operation systems since it relies mostly on PyTorch
different operating systems since it relies mostly on PyTorch
convolutional neural networks. The name nnAudio comes from
``torch.nn``.

The implmentation details for **nnAudio** has also been published in IEEE Access, people who are interested can read the `paper <https://ieeexplore.ieee.org/document/9174990>`__.
The implementation details for **nnAudio** have also been published in IEEE Access; interested readers can refer to the `paper <https://ieeexplore.ieee.org/document/9174990>`__.

The source code for **nnAudio** can be found in `GitHub <https://github.com/KinWaiCheuk/nnAudio>`__.


Getting started
---------------
.. toctree::
:maxdepth: 2

:maxdepth: 1
:caption: Getting Started

intro

API documentation
-----------------

.. toctree::
:maxdepth: 1
:caption: API Documentation

nnAudio

Tutorial
-----------------

.. toctree::
:maxdepth: 1
:caption: Tutorials

examples


.. toctree::
:maxdepth: 1
:caption: Citation

citing


Indices and tables
------------------

62 changes: 49 additions & 13 deletions Sphinx/source/intro.rst
@@ -1,5 +1,3 @@
Getting Started
===============

Introduction
************
@@ -20,15 +18,17 @@ Installation

Via PyPI
~~~~~~~~
To install stable release from pypi: ``pip install nnAudio==x.x.x``, where ``x.x.x`` is the version number. The lastest version is now ``0.1.15``.
To install stable release from pypi: ``pip install nnAudio==x.x.x``, where ``x.x.x`` is the version number.
The latest version is now ``0.2.0``.

When a pre-release is available, you can install it with ``pip install nnAudio --pre -U``.
This lets users try the latest features, but they might not be stable.
Please use it with care and report any problems that you find.

Via GitHub
~~~~~~~~~~
Alternatively, you can install from GitHub by first cloning the repository with ``git clone https://github.com/KinWaiCheuk/nnAudio.git <any path you want to save to>``. Then ``cd`` into the ``Installation`` folder, where ``setup.py`` is located, and run ``python setup.py install``.

..
To install dev version: ``pip install nnAudio --pre -U``
It allows the users to use the latest features, but the new features might not be stable. Please use with care and report any problems that you found.

Requirement
~~~~~~~~~~~
@@ -48,7 +48,9 @@ Usage

Standalone Usage
~~~~~~~~~~~~~~~~
To use nnAudio, you need to define the neural network layer. After that, you can pass a batch of waveform to that layer to obtain the spectrograms. The input shape should be `(batch, len_audio)`.
To use nnAudio, you need to define the spectrogram layer in the same way as a neural network layer.
After that, you can pass a batch of waveforms to that layer to obtain the spectrograms.
The input shape should be `(batch, len_audio)`.

.. code-block:: python
@@ -61,24 +63,30 @@ To use nnAudio, you need to define the neural network layer. After that, you can
spec_layer = Spectrogram.STFT(n_fft=2048, freq_bins=None, hop_length=512,
window='hann', freq_scale='linear', center=True, pad_mode='reflect',
fmin=50,fmax=11025, sr=sr, device='cuda:0') # Initializing the model
fmin=50,fmax=11025, sr=sr) # Initializing the model
spec = spec_layer(x) # Feed-forward your waveform to get the spectrogram
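As a rough sanity check, the expected output shape can be worked out from the STFT parameters. The sketch below assumes the common one-sided STFT convention used by librosa and ``torch.stft`` (``freq_bins = n_fft // 2 + 1``, centered frames); the exact shapes returned by nnAudio should be confirmed against the API documentation.

```python
def stft_output_shape(len_audio, n_fft=2048, hop_length=512, center=True):
    # One-sided spectrum: only non-negative frequencies are kept.
    freq_bins = n_fft // 2 + 1
    if center:
        # The waveform is padded by n_fft // 2 on both sides before framing.
        n_frames = 1 + len_audio // hop_length
    else:
        n_frames = 1 + (len_audio - n_fft) // hop_length
    return freq_bins, n_frames

# A 2-second clip at 22050 Hz with the default parameters above:
print(stft_output_shape(44100))  # (1025, 87)
```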
.. _on-the-fly:

On-the-fly audio processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~
One application for nnAudio is on-the-fly spectrogram generation when integrating it inside your neural network:

.. code-block:: python
:emphasize-lines: 5,22
:emphasize-lines: 5-10,27
class Model(torch.nn.Module):
def __init__(self):
super(Model, self).__init__()
# Getting Mel Spectrogram on the fly
self.spec_layer = Spectrogram.STFT(n_fft=2048, freq_bins=None, hop_length=512, window='hann', freq_scale='no', center=True, pad_mode='reflect', fmin=50,fmax=6000, sr=22050, trainable=False, output_format='Magnitude', device='cuda:0')
self.spec_layer = Spectrogram.STFT(n_fft=2048, freq_bins=None,
hop_length=512, window='hann',
freq_scale='no', center=True,
pad_mode='reflect', fmin=50,
fmax=6000, sr=22050, trainable=False,
output_format='Magnitude')
self.n_bins = freq_bins
# Creating CNN Layers
@@ -106,13 +114,18 @@ One application for nnAudio is on-the-fly spectrogram generation when integratin
Using GPU
~~~~~~~~~

If GPU is avaliable in your computer, you can initialize nnAudio by choosing either CPU or GPU with the ``device`` argument. The default setting for nnAudio is ``device='cpu'``
If a GPU is available on your computer, you can use the ``.to(device)`` method, like any other PyTorch ``nn.Module``,
to transfer the spectrogram layer to any device you like.


.. code-block:: python
spec_layer = Spectrogram.STFT(device=device)
spec_layer = Spectrogram.STFT().to(device)
Alternatively, if your ``Spectrogram`` module is used inside your PyTorch model
as in the :ref:`on-the-fly processing section<on-the-fly>`, then you simply need
to call ``net.to(device)``, where ``net = Model()``.

Speed
*****

@@ -151,4 +164,27 @@ The figure below shows how the STFT output is affected by the changes in STFT ba

.. image:: ../../figures/STFT_training.png
:align: center
:alt: STFT_training
:alt: STFT_training


Different CQT versions
**********************

The result for ``CQT1992`` is smoother than that of ``CQT2010`` and librosa.
Since librosa and ``CQT2010`` use the same algorithm (the downsampling approach mentioned in this paper),
you can see similar artifacts as a result of downsampling.

For ``CQT1992v2`` and ``CQT2010v2``, the CQT is computed directly in the time domain,
without the need to transform both the input waveforms and the CQT kernels to the frequency domain,
making it faster than the original CQT proposed in 1992.
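The time-domain idea can be sketched in plain Python: each CQT bin is the inner product of the waveform with a windowed complex sinusoid spanning roughly ``Q`` cycles of the bin's center frequency. This is only an illustration of the principle, not nnAudio's actual implementation.

```python
import math

def cqt_bin_magnitude(x, f_k, sr, Q=17.0):
    # Kernel length: about Q periods of f_k (capped at the signal length).
    N = min(int(Q * sr / f_k), len(x))
    re = im = 0.0
    for n in range(N):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / N)  # Hann window
        re += x[n] * w * math.cos(2 * math.pi * f_k * n / sr)
        im -= x[n] * w * math.sin(2 * math.pi * f_k * n / sr)
    return math.hypot(re, im) / N
```

A pure 440 Hz tone produces a strong response at the 440 Hz bin and almost none an octave below, which is the selectivity the CQT kernels provide.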

The default CQT in nnAudio is the ``CQT1992v2`` version.
For more detail, please refer to our `paper <https://ieeexplore.ieee.org/document/9174990>`__.

All versions of CQT are available for users to choose from.
To explicitly choose which CQT to use, you can refer to the :ref:`CQT API section<nnAudio.Spectrogram.CQT>`.
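Whatever version is chosen, the bin center frequencies follow the same geometric spacing; this is standard CQT math rather than an nnAudio API call:

```python
# Center frequencies of CQT bins are geometrically spaced:
# f_k = fmin * 2 ** (k / bins_per_octave), so every octave receives
# the same number of bins regardless of the CQT version used.
def cqt_frequencies(n_bins, fmin=32.70, bins_per_octave=12):
    return [fmin * 2 ** (k / bins_per_octave) for k in range(n_bins)]

# Two octaves above A1 (55 Hz) at 12 bins per octave:
freqs = cqt_frequencies(25, fmin=55.0)
```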


.. image:: ../../figures/CQT_compare.png
:align: center
:alt: Comparing different versions of CQTs
Binary file added Sphinx/source/logo.png
2 changes: 0 additions & 2 deletions Sphinx/source/nnAudio.rst
@@ -1,5 +1,3 @@
nnAudio
=======

.. automodule:: nnAudio

