<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Asymptotic Labs (Posts about neural networks)</title><link>http://asymptoticlabs.com/</link><description></description><atom:link href="http://asymptoticlabs.com/categories/neural-networks.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2023 &lt;a href="mailto:quidditymaster@gmail.com"&gt;Tim Anderton&lt;/a&gt; </copyright><lastBuildDate>Thu, 02 Nov 2023 07:34:06 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Deep Learning 101</title><link>http://asymptoticlabs.com/posts/deep-learning-101.html</link><dc:creator>Tim Anderton</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;h2 id="Deep-Learning-101"&gt;Deep Learning 101&lt;a class="anchor-link" href="http://asymptoticlabs.com/posts/deep-learning-101.html#Deep-Learning-101"&gt;¶&lt;/a&gt;&lt;/h2&gt;&lt;h3 id="SLC-Python-Meetup:-June-2,-2021"&gt;SLC Python Meetup: June 2, 2021&lt;a class="anchor-link" href="http://asymptoticlabs.com/posts/deep-learning-101.html#SLC-Python-Meetup:-June-2,-2021"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;&lt;img width="800" height="600" src="http://asymptoticlabs.com/images/broken-clock.jpg"&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://asymptoticlabs.com/posts/deep-learning-101.html"&gt;Read more…&lt;/a&gt; (20 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;
</description><category>meetup talks</category><category>neural networks</category><guid>http://asymptoticlabs.com/posts/deep-learning-101.html</guid><pubDate>Wed, 02 Jun 2021 06:00:00 GMT</pubDate></item><item><title>Visualizing Convolution Kernels</title><link>http://asymptoticlabs.com/posts/visualizing-convolution-kernels.html</link><dc:creator>Tim Anderton</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;I very rarely see any sort of inspection being done on the convolutional kernels of a CNN. In part this is because the parameters themselves are far more difficult to interpret than the outputs of a network (even intermediate outputs, a.k.a. the network activations). This difficulty of interpretation is worst for kernels with a small spatial footprint, and unfortunately 3x3 kernels are the most performant and popular choice. Trying to understand the structure of a 3x3 convolution kernel by looking at all of the possible 3x3 spatial slices is somewhat like trying to guess what a full image looks like from being shown all the 3x3 chunks of it in random order.&lt;/p&gt;
&lt;p&gt;Despite the difficulties, I think good kernel visualizations are a worthwhile pursuit. Good visualization techniques can be powerful diagnostics, and the better the visualizations of our models, the more powerful and robust we can make them. As a motivational carrot, here is a teaser plot of a visualization of a simple network which we generate in this post.&lt;/p&gt;
&lt;p&gt;&lt;img src="http://asymptoticlabs.com/images/visualizing_convolution_kernels_teaser.png" alt="teaser_plot"&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://asymptoticlabs.com/posts/visualizing-convolution-kernels.html"&gt;Read more…&lt;/a&gt; (19 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;
</description><category>kernels</category><category>neural networks</category><category>visualization</category><guid>http://asymptoticlabs.com/posts/visualizing-convolution-kernels.html</guid><pubDate>Fri, 14 Sep 2018 06:00:00 GMT</pubDate></item><item><title>Parameter Diffusion</title><link>http://asymptoticlabs.com/posts/parameter-diffusion.html</link><dc:creator>Tim Anderton</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;&lt;img src="http://asymptoticlabs.com/images/parameter-diffusion-teaser.png" alt="circdiff_img"&gt;&lt;/p&gt;
&lt;p&gt;I love using k-fold cross validation for my machine learning projects. But especially when I am dealing with neural network models that take hours or even days to train, doing a full k-folds style analysis becomes an uncomfortably heavy computational burden. Unfortunately, for models with such long training times I usually abandon training an ensemble of models and just train one model with a single train/validation split.&lt;/p&gt;
&lt;p&gt;I really wanted a way to get at least some of the diagnostic benefits you get from having an ensemble of semi-independently trained models the way you do in k-folds, but without needing to wait days or weeks for my neural nets to train. I started experimenting with weakly coupled mixtures of models. Instead of feeding most of the data to K otherwise independent models as in k-folds, why not try feeding just a fraction 1/K of the data to each model and letting the models communicate about their parameters with each other in a controlled way? I thought that perhaps by cleverly controlling what information is passed between which models, how often messages are passed, and how the information in them may be used, I could effectively isolate the information in some data folds from the values of the parameters of some of the models. In this way I could hopefully save some computation time over a k-folds cross validation without sacrificing all of its benefits.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://asymptoticlabs.com/posts/parameter-diffusion.html"&gt;Read more…&lt;/a&gt; (34 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;
</description><category>cross validation</category><category>neural networks</category><guid>http://asymptoticlabs.com/posts/parameter-diffusion.html</guid><pubDate>Fri, 20 Apr 2018 16:59:50 GMT</pubDate></item><item><title>Extracting Phonemes</title><link>http://asymptoticlabs.com/posts/extracting-phonemes.html</link><dc:creator>Tim Anderton</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;h2 id="Extracting-Phonemes-From-Speech-Samples"&gt;Extracting Phonemes From Speech Samples&lt;a class="anchor-link" href="http://asymptoticlabs.com/posts/extracting-phonemes.html#Extracting-Phonemes-From-Speech-Samples"&gt;¶&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;My best single model for the recent &lt;a href="https://www.kaggle.com/c/tensorflow-speech-recognition-challenge"&gt;speech recognition&lt;/a&gt; Kaggle competition was based on the idea of extracting a probabilistic map of the phonemes present in a particular speech sample and then using that phoneme map as a feature set to predict the word.&lt;/p&gt;
&lt;p&gt;The dataset provided consists of examples of 30 different words, with one word appearing in each 1-second sample. Since there is no phonetic information provided other than which word is which, the first step was to turn each word into a phonetic spelling.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://asymptoticlabs.com/posts/extracting-phonemes.html"&gt;Read more…&lt;/a&gt; (24 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;
</description><category>audio</category><category>neural networks</category><category>phonemes</category><category>speech</category><guid>http://asymptoticlabs.com/posts/extracting-phonemes.html</guid><pubDate>Wed, 24 Jan 2018 17:37:19 GMT</pubDate></item><item><title>3D CNN for audio data</title><link>http://asymptoticlabs.com/posts/audio-3DCNN.html</link><dc:creator>Tim Anderton</dc:creator><description>&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;h2 id="3D-Time/Frequency/Phase-Representation-of-Audio-for-Speech-Recognition."&gt;3D Time/Frequency/Phase Representation of Audio for Speech Recognition.&lt;a class="anchor-link" href="http://asymptoticlabs.com/posts/audio-3DCNN.html#3D-Time/Frequency/Phase-Representation-of-Audio-for-Speech-Recognition."&gt;¶&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;I recently participated in a &lt;a href="https://www.kaggle.com/c/tensorflow-speech-recognition-challenge"&gt;speech recognition&lt;/a&gt; Kaggle competition. Although I didn't come close to the top of the leaderboard (238th place with 87% accuracy vs 91% accuracy for the winners), I learned quite a bit about handling audio data and had a lot of fun. One of the more novel things I tried during the competition was to spatially encode the phase information in the audio and pass the results into a 3D CNN.&lt;/p&gt;
&lt;p&gt;A common pre-processing step in speech recognition is to turn the 1D audio into a 2D &lt;a href="https://en.wikipedia.org/wiki/Spectrogram"&gt;spectrogram&lt;/a&gt;. The spectrogram shows the volume of the audio as a function of time and frequency. Spectrograms are a great way of summarizing the important information in an audio clip in a way that makes it accessible visually. Here is a spectrogram of an utterance of the word "marvin".&lt;/p&gt;
&lt;p&gt;&lt;img src="http://asymptoticlabs.com/images/marvin_specgram.jpg" alt="marvin_specgram"&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://asymptoticlabs.com/posts/audio-3DCNN.html"&gt;Read more…&lt;/a&gt; (22 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;
</description><category>audio</category><category>neural networks</category><category>speech recognition</category><guid>http://asymptoticlabs.com/posts/audio-3DCNN.html</guid><pubDate>Mon, 22 Jan 2018 07:00:00 GMT</pubDate></item><item><title>Wavelet Spectrograms for Speech Recognition</title><link>http://asymptoticlabs.com/posts/waveletSpectrogramsTFSR.html</link><dc:creator>Tim Anderton</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;h2 id="Wavelet-Features-For-Speech-Recognition."&gt;Wavelet Features For Speech Recognition.&lt;a class="anchor-link" href="http://asymptoticlabs.com/posts/waveletSpectrogramsTFSR.html#Wavelet-Features-For-Speech-Recognition."&gt;¶&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;I've been participating in the TensorFlow speech recognition challenge.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.kaggle.com/c/tensorflow-speech-recognition-challenge"&gt;https://www.kaggle.com/c/tensorflow-speech-recognition-challenge&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It seems like the most common approach is to begin by turning the audio into a spectrogram and then feeding that into a 2D CNN. One trouble with spectrograms is that you have to trade off resolution in frequency for resolution in time and vice versa. In principle you can get higher resolution in time for higher frequencies than you can for lower frequencies, but once you pick an input length for your short-time Fourier transform, you lose temporal resolution much below the window length.&lt;/p&gt;
&lt;p&gt;Wavelets are one possible way around this limitation.&lt;/p&gt;&lt;p&gt;&lt;a href="http://asymptoticlabs.com/posts/waveletSpectrogramsTFSR.html"&gt;Read more…&lt;/a&gt; (17 min remaining to read)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;
</description><category>audio</category><category>kaggle</category><category>machine learning</category><category>neural networks</category><category>speech recognition</category><category>wavelets</category><guid>http://asymptoticlabs.com/posts/waveletSpectrogramsTFSR.html</guid><pubDate>Mon, 08 Jan 2018 07:00:00 GMT</pubDate></item></channel></rss>