
Commit

update documentation for version 0.2.0
Kin Wai Cheuk authored and Kin Wai Cheuk committed Nov 8, 2020
1 parent 2159e00 commit 455b72d
Showing 7 changed files with 120 additions and 27 deletions.
33 changes: 33 additions & 0 deletions Sphinx/source/citing.rst
@@ -0,0 +1,33 @@
Citing nnAudio
===============

If you use nnAudio in your research, please feel free to cite our work.

Plain Text
-----------
K. W. Cheuk, H. Anderson, K. Agres and D. Herremans,
"nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks,"
in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.

BibTex
-------

.. code-block:: tex

@ARTICLE{9174990,
author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
journal={IEEE Access},
title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks},
year={2020},
volume={8},
number={},
pages={161981-162003},
doi={10.1109/ACCESS.2020.3019084}}

Link to the paper
-----------------

The paper for nnAudio is available on `IEEE Access <https://ieeexplore.ieee.org/document/9174990>`__.



2 changes: 2 additions & 0 deletions Sphinx/source/conf.py
@@ -41,6 +41,7 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autosectionlabel',
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.napoleon',
@@ -84,6 +85,7 @@
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
html_logo = "logo.png"
html_title = f'{version}'

# Theme options are theme-specific and customize the look and feel of a theme
20 changes: 19 additions & 1 deletion Sphinx/source/examples.rst
@@ -4,6 +4,24 @@ Tutorials
Call for Contribution:
**********************

We are now looking for contributions. People who are interested in contributing to nnAudio can visit the `github page <https://github.com/KinWaiCheuk/nnAudio>`_ or contact me via kinwai<underscore>cheuk<at>mymail.sutd.edu.sg.

nnAudio is a fast-growing package. With the increasing number of feature requests, we welcome anyone who is familiar with digital signal processing and neural networks to contribute to nnAudio. The current list of pending features includes:

1. Invertible Constant Q Transform (CQT)
2. CQT with filter scale factor (see issue `#54 <https://github.com/KinWaiCheuk/nnAudio/issues/54>`__)
3. Variable Q Transform (see `VQT <https://www.researchgate.net/publication/274009051_A_Matlab_Toolbox_for_Efficient_Perfect_Reconstruction_Time-Frequency_Transforms_with_Log-Frequency_Resolution>`__)
4. Speed and Performance improvements for Griffin-Lim (see issue `#41 <https://github.com/KinWaiCheuk/nnAudio/issues/41>`__)
5. Data Augmentation (see issue `#49 <https://github.com/KinWaiCheuk/nnAudio/issues/49>`__)

(Quick tip for the unit tests: ``cd`` into the ``Installation`` folder, then run ``pytest``. You need at least 1931 MiB of GPU memory to pass all the unit tests.)

Alternatively, you may also contribute by:

1. Refactoring the code structure (currently all functions are in the same file, but with the increasing number of features, I think we need to break it down into smaller modules)
2. Writing better demonstration code or tutorials



28 changes: 17 additions & 11 deletions Sphinx/source/index.rst
@@ -5,7 +5,7 @@
nnAudio 0.2.0
===================================
Welcome to nnAudio 0.2.0. It changes the syntax of the spectrogram layers creation,
Welcome to nnAudio 0.2.0. This new version changes the syntax for creating spectrogram layers,
such that ``stft_layer.to(device)`` can be used. This new version is more stable
than the previous version since it is more compatible with other torch modules.

@@ -24,37 +24,43 @@ But they are not using the neural network approach, and hence the
Fourier basis can not be trained. As of PyTorch 1.6.0, torchaudio is
still very difficult to install under the Windows environment due to
``sox``. nnAudio is a more compatible audio processing tool across
different operation systems since it relies mostly on PyTorch
different operating systems since it relies mostly on PyTorch
convolutional neural networks. The name nnAudio comes from
``torch.nn``.

The implmentation details for **nnAudio** has also been published in IEEE Access, people who are interested can read the `paper <https://ieeexplore.ieee.org/document/9174990>`__.
The implementation details for **nnAudio** have also been published in IEEE Access; interested readers can refer to the `paper <https://ieeexplore.ieee.org/document/9174990>`__.

The source code for **nnAudio** can be found in `GitHub <https://github.com/KinWaiCheuk/nnAudio>`__.


Getting started
---------------
.. toctree::
:maxdepth: 2

:maxdepth: 1
:caption: Getting Started

intro

API documentation
-----------------

.. toctree::
:maxdepth: 1
:caption: API Documentation

nnAudio

Tutorial
-----------------

.. toctree::
:maxdepth: 1
:caption: Tutorials

examples


.. toctree::
:maxdepth: 1
:caption: Citation

citing


Indices and tables
------------------

62 changes: 49 additions & 13 deletions Sphinx/source/intro.rst
@@ -1,5 +1,3 @@
Getting Started
===============

Introduction
************
@@ -20,15 +18,17 @@ Installation

Via PyPI
~~~~~~~~
To install stable release from pypi: ``pip install nnAudio==x.x.x``, where ``x.x.x`` is the version number. The lastest version is now ``0.1.15``.
To install stable release from pypi: ``pip install nnAudio==x.x.x``, where ``x.x.x`` is the version number.
The latest version is now ``0.2.0``.

When a pre-release is available, you can install it with ``pip install nnAudio --pre -U``.
This lets users try the latest features, but they might not be stable.
Please use it with care and report any problems that you find.

Via GitHub
~~~~~~~~~~
Alternatively, you can install from GitHub by first cloning the repository with ``git clone https://github.com/KinWaiCheuk/nnAudio.git <any path you want to save to>``. Then ``cd`` into the ``Installation`` folder, where ``setup.py`` is located, and run ``python setup.py install``.

..
To install dev version: ``pip install nnAudio --pre -U``
It allows the users to use the latest features, but the new features might not be stable. Please use with care and report any problems that you found.

Requirement
~~~~~~~~~~~
@@ -48,7 +48,9 @@ Usage

Standalone Usage
~~~~~~~~~~~~~~~~
To use nnAudio, you need to define the neural network layer. After that, you can pass a batch of waveform to that layer to obtain the spectrograms. The input shape should be `(batch, len_audio)`.
To use nnAudio, you need to define the spectrogram layer in the same way as a neural network layer.
After that, you can pass a batch of waveforms to that layer to obtain the spectrograms.
The input shape should be `(batch, len_audio)`.

.. code-block:: python
@@ -61,24 +63,30 @@ To use nnAudio, you need to define the neural network layer. After that, you can
spec_layer = Spectrogram.STFT(n_fft=2048, freq_bins=None, hop_length=512,
window='hann', freq_scale='linear', center=True, pad_mode='reflect',
fmin=50,fmax=11025, sr=sr, device='cuda:0') # Initializing the model
fmin=50,fmax=11025, sr=sr) # Initializing the model
spec = spec_layer(x) # Feed-forward your waveform to get the spectrogram
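As a rough sanity check, the expected output shape can be worked out from the STFT parameters. The sketch below assumes the common one-sided STFT convention used by librosa and ``torch.stft`` (``freq_bins = n_fft // 2 + 1``, centered frames); the exact shapes returned by nnAudio should be confirmed against the API documentation.

```python
def stft_output_shape(len_audio, n_fft=2048, hop_length=512, center=True):
    # One-sided spectrum: only non-negative frequencies are kept.
    freq_bins = n_fft // 2 + 1
    if center:
        # The waveform is padded by n_fft // 2 on both sides before framing.
        n_frames = 1 + len_audio // hop_length
    else:
        n_frames = 1 + (len_audio - n_fft) // hop_length
    return freq_bins, n_frames

# A 2-second clip at 22050 Hz with the default parameters above:
print(stft_output_shape(44100))  # (1025, 87)
```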
.. _on-the-fly:

On-the-fly audio processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~
One application for nnAudio is on-the-fly spectrogram generation when integrating it inside your neural network:

.. code-block:: python
:emphasize-lines: 5,22
:emphasize-lines: 5-10,27
class Model(torch.nn.Module):
def __init__(self):
super(Model, self).__init__()
# Getting Mel Spectrogram on the fly
self.spec_layer = Spectrogram.STFT(n_fft=2048, freq_bins=None, hop_length=512, window='hann', freq_scale='no', center=True, pad_mode='reflect', fmin=50,fmax=6000, sr=22050, trainable=False, output_format='Magnitude', device='cuda:0')
self.spec_layer = Spectrogram.STFT(n_fft=2048, freq_bins=None,
hop_length=512, window='hann',
freq_scale='no', center=True,
pad_mode='reflect', fmin=50,
fmax=6000, sr=22050, trainable=False,
output_format='Magnitude')
self.n_bins = freq_bins
# Creating CNN Layers
@@ -106,13 +114,18 @@ One application for nnAudio is on-the-fly spectrogram generation when integratin
Using GPU
~~~~~~~~~

If GPU is avaliable in your computer, you can initialize nnAudio by choosing either CPU or GPU with the ``device`` argument. The default setting for nnAudio is ``device='cpu'``
If a GPU is available on your computer, you can use the ``.to(device)`` method, like any other PyTorch ``nn.Module``,
to transfer the spectrogram layer to any device you like.


.. code-block:: python
spec_layer = Spectrogram.STFT(device=device)
spec_layer = Spectrogram.STFT().to(device)
Alternatively, if your ``Spectrogram`` module is used inside your PyTorch model
as in the :ref:`on-the-fly processing section<on-the-fly>`, then you simply need
to call ``net.to(device)``, where ``net = Model()``.

Speed
*****

@@ -151,4 +164,27 @@ The figure below shows how the STFT output is affected by the changes in STFT ba

.. image:: ../../figures/STFT_training.png
:align: center
:alt: STFT_training
:alt: STFT_training


Different CQT versions
**********************

The result for ``CQT1992`` is smoother than that of ``CQT2010`` and librosa.
Since librosa and ``CQT2010`` use the same algorithm (the downsampling approach mentioned in this paper),
you can see similar artifacts as a result of downsampling.

For ``CQT1992v2`` and ``CQT2010v2``, the CQT is computed directly in the time domain,
without the need to transform both the input waveforms and the CQT kernels to the frequency domain,
making it faster than the original CQT proposed in 1992.
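The time-domain idea can be sketched in plain Python: each CQT bin is the inner product of the waveform with a windowed complex sinusoid spanning roughly ``Q`` cycles of the bin's center frequency. This is only an illustration of the principle, not nnAudio's actual implementation.

```python
import math

def cqt_bin_magnitude(x, f_k, sr, Q=17.0):
    # Kernel length: about Q periods of f_k (capped at the signal length).
    N = min(int(Q * sr / f_k), len(x))
    re = im = 0.0
    for n in range(N):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / N)  # Hann window
        re += x[n] * w * math.cos(2 * math.pi * f_k * n / sr)
        im -= x[n] * w * math.sin(2 * math.pi * f_k * n / sr)
    return math.hypot(re, im) / N
```

A pure 440 Hz tone produces a strong response at the 440 Hz bin and almost none an octave below, which is the selectivity the CQT kernels provide.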

The default CQT in nnAudio is the ``CQT1992v2`` version.
For more detail, please refer to our `paper <https://ieeexplore.ieee.org/document/9174990>`__.

All versions of CQT are available for users to choose from.
To explicitly choose which CQT to use, you can refer to the :ref:`CQT API section<nnAudio.Spectrogram.CQT>`.
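Whatever version is chosen, the bin center frequencies follow the same geometric spacing; this is standard CQT math rather than an nnAudio API call:

```python
# Center frequencies of CQT bins are geometrically spaced:
# f_k = fmin * 2 ** (k / bins_per_octave), so every octave receives
# the same number of bins regardless of the CQT version used.
def cqt_frequencies(n_bins, fmin=32.70, bins_per_octave=12):
    return [fmin * 2 ** (k / bins_per_octave) for k in range(n_bins)]

# Two octaves above A1 (55 Hz) at 12 bins per octave:
freqs = cqt_frequencies(25, fmin=55.0)
```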


.. image:: ../../figures/CQT_compare.png
:align: center
:alt: Comparing different versions of CQTs
Binary file added Sphinx/source/logo.png
2 changes: 0 additions & 2 deletions Sphinx/source/nnAudio.rst
@@ -1,5 +1,3 @@
nnAudio
=======

.. automodule:: nnAudio

