April 19, 2019 Posted by Raji AyinlaProgramming 0 thoughts on “10 Deep Learning Resources for Audio Processing”
We’ve written before about the rise of voice assistants in the IoT market. As these devices become more and more sophisticated, we can expect the number of voice assistants to skyrocket to 8 billion by 2023. This emerging market creates an excellent opportunity for those interested in audio processing or audio anomaly detection. In the future, there may be more emphasis placed on creating valuable resources that help up-and-coming audio processing data scientists practice their craft. For now, there are plenty of resources available. Below are 10 resources for audio processing.
These are signal processing notes from a computer science class, but they cover the mathematical background you need before tackling other audio tasks. The topics include: DSP Primer, Perception and Features, Principal Component Analysis, ICA and NMF, KPCA and Manifold Methods, Detection and Matched Filters, Decision Theory & Classifiers, Nonlinear Classifiers, Classification Bits and Pieces, Clustering, DTW and HMMs, Missing Data & Dynamical Models, Arrays & Source Separation, Underconstrained Separation, and Deep Learning.
“aubio is a library to extract annotations from audio signals: it provides a set of functions that take an input audio signal, and output pitch estimates, attack times (onset), beat location estimates, and other annotation tasks.”
“This post discusses techniques of feature extraction from sound in Python using the open source library Librosa and implements a neural network in TensorFlow to categorize urban sounds, including car horns, children playing, dogs barking, and more.”
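The post above relies on Librosa; as a library-free illustration of the same idea, here is a minimal numpy-only sketch of one common audio feature, the short-time Fourier transform magnitude. All of the parameter values and the synthetic test tone here are illustrative assumptions, not taken from the post.

```python
import numpy as np

def stft_magnitude(signal, frame_len=512, hop=256):
    """Naive short-time Fourier transform magnitude spectrogram."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # rfft yields frame_len // 2 + 1 frequency bins per frame
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# Synthetic one-second "audio": a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

spec = stft_magnitude(tone)
print(spec.shape)  # (61, 257): 61 frames, 257 frequency bins
```

Real-world pipelines would go further (mel scaling, log compression, MFCCs), but each of those steps starts from a spectrogram like this one.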
March 7, 2019 Posted by Raji AyinlaNews 0 thoughts on “GIPHY’s Open Sourced Celebrity Detector Thinks Shaq Is Terry Crews”
GIPHY recently released its machine learning model, the GIPHY Celebrity Detector, under the Mozilla Public License 2.0 (MPL). While there are numerous face recognition models like OpenFace out there, they don’t have the quirk of being specifically trained to accurately analyze a celebrity’s face. GIPHY boasts a 98% accuracy rate. Of course, Redditors tested this claim by conducting an experiment of their own. One Redditor achieved a great outcome when submitting Will Smith.
[singham]I submitted Will Smith gif and got the following match.
Will Smith 96.47% Match
Dj Jazzy Jeff 0.4% Match
Jimmy Butler 0.28% Match
Nick Young 0.16% Match
What GIPHY considers a celebrity, though, mostly pertains to movie and TV stars, because when we put Shaq’s face to the test, the result came out to…
That was a .jpg file found on Google. What about a .gif of Shaq on GIPHY?
Well, you can sort of see how Shaq might be Terry Crews’ long-lost brother if you squint really hard. Jokes aside, GIPHY only claims to have accurate results for 2,307 celebrities. The question is, how did GIPHY define a celebrity?
That task was handled by GIPHY’s R&D team, which included Nick Hasty, Ihor Kroosh, Dmitry Voitekh, and Dmytro Korduban. The researchers already had access to search queries and trained the model with images from the most popular celebrity searches, which would’ve included powerhouses like Cardi B and the Kardashians. On their blog, they mentioned:
“To generate our training data, we extracted all the celebrity names from the top 50,000 searches across all our platforms, including our website, mobile apps and integrations like Facebook, Twitter, and Slack. This yielded a data set comprised of over 2,300 celebrity names.”
The data set, which GIPHY has provided, doesn’t list Shaq as one of its celebrities. A finding like that is important because it shows how much the availability of data shapes what face recognition models learn. An inaccuracy of the magnitude of Shaq’s example is what you get when the model has no available data or is not adequately trained. It will be interesting to see what developers do with this model in the future.
GIPHY provides full installation, training, and various other setup instructions on its GitHub page.
When I say “the year 2250,” what thought pops into your head? It’s probably a world full of artificially intelligent machines re-enacting every possible scenario that science fiction has presented to us. Students in the 23rd century might study the history of artificial intelligence in a computer science course, scoffing at our nascent machine learning algorithms just as we scoff at the ENIAC. But just as the ENIAC — and many other Turing-complete machines — laid the foundation for supercomputers, neural networks are laying the groundwork for androids that can mimic Picasso’s artistic flair.
What are neural networks?
It’s common practice to conceptualize an artificial neural network (ANN) as a biological neuron. Technically speaking, an artificial neural network is, as Ada Lovelace put it, a “calculus of the nervous system.” A really, really simplified version of the nervous system. So much so that a few neuroscientists have probably lost sleep over the analogy.
A good way to visualize a neural network is not to think of biological neurons. Instead, imagine a car wash. Yes, a car wash.
When you take your grime-stained car to your local car wash, you’re expecting to input the car.
Now, imagine that the washing process is hidden from you.
When the car wash outputs your car, it’s suddenly clean.
So, we can surmise that you input your car (x), it was forwarded to a hidden layer (h), and then it was forwarded to an output layer (y). This is an oversimplification, but what the analogy does depict is forward propagation. Of course, there is more going on in a neural network than the forwarding of data.
Neural networks consist of an input layer, one or more hidden layers, and an output layer. Note that layers are counted from the first hidden layer. The more hidden layers you have, the more complex the network is. Four or more hidden layers constitute a deep network. The neural network depicted in the above image is considered to be a 6X4X3X1 network.
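To make the layer description concrete, here is a minimal sketch of forward propagation through a 6X4X3X1 network like the one described above. The random weight initialization and the sigmoid activation are illustrative assumptions; the activations themselves are explained in the sections that follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    """Squash each value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes for a 6X4X3X1 network: 6 inputs, two hidden layers, 1 output
sizes = [6, 4, 3, 1]
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n) for n in sizes[1:]]

def forward(x):
    """Forward propagation: each layer's output feeds the next layer."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)
    return a

x = rng.standard_normal(6)  # one made-up input example
y = forward(x)
print(y.shape)  # (1,): a single output value
```

Whatever the input, the data flows strictly forward, layer by layer, just like the car moving through the car wash.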
Let’s take a look at the common components shared by all layers: synapses and neurons.
Synapses take the value stored in the previous layer and multiply it by a weight. The value of this weight can range from -1 to 1. As you’ll find out, deep learning is all about calibrating the weights to specific values to yield an accurate output. Think of a shower where you adjust the hot and cold knobs to get your ideal temperature. Afterwards, the synapses forward propagate their results to the neurons.
Neurons, on the other hand, have a larger responsibility: they have to add up all of the (weight × input) products and include a bias.
Here’s what the barebones equation looks like: output = (w₁x₁ + w₂x₂ + … + wₙxₙ) + bias.
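In code, that weighted sum plus bias can be sketched as follows. All of the numbers are made up purely for illustration.

```python
# A single neuron: the weighted sum of its inputs, plus a bias
inputs = [0.5, -1.0, 2.0]
weights = [0.8, 0.2, -0.4]  # each weight lies between -1 and 1
bias = 0.1

# Multiply each input by its weight, sum the products, then add the bias
z = sum(w * x for w, x in zip(weights, inputs)) + bias
print(z)  # -0.5
```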
What is the bias?
Well, picture a scale. A bias sits on the left platform of the scale. This left platform is marked with a 0. On the other platform, you have the summation. The number associated with the right platform is 1. The larger the bias, the harder it is for that neuron to output a 1. The smaller the bias, the easier it is for the neuron to output a 1.
Why is a bias needed?
Implementing a bias allows you to offset an input that has a value of 0. Since weights are multiplied by the inputs along synapses, and anything times 0 equals 0, without a bias you can end up with a final output that is bizarrely different from what you expected. A great argument for the need for a bias can be found in this Stack Overflow thread.
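A quick sketch makes the zero-input problem concrete: with an all-zero input, the weighted sum is 0 no matter what the weights are, and only the bias can shift the neuron’s output away from 0. The numbers are illustrative.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

zeros = [0.0, 0.0, 0.0]
weights = [0.7, -0.3, 0.5]

# Without a bias, an all-zero input always produces 0, regardless of weights:
print(neuron(zeros, weights, bias=0.0))   # 0.0
# With a bias, the neuron can still produce a nonzero value:
print(neuron(zeros, weights, bias=0.25))  # 0.25
```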
So we have our solution. We’re not finished yet, though. The neuron needs the potential to either fire or not, which is achieved through an activation function. There are several activation functions out there, each with its own unique flavor. Exploring each one in depth is beyond the scope of this article, but we’ll take a look at a step function and a sigmoid function to show the contrast between the two.
A step function is like an on/off switch. It’s binary. For anyone familiar with programming, step functions are equivalent to boolean conditionals. Take this pseudocode example:
threshold = 0.5
If Y > threshold, output 1.
Else, output 0.
In terms of neural networks, step functions work fine if the network can clearly identify a single class. Unfortunately, image identification almost always involves more than one class. For example, the digits 0–9 are the classes used when training a neural network on the dataset of handwritten digits found in the MNIST database.
What if all of the neurons fired because all the summations met the threshold? The results would be disastrous.
The solution is to use a nonlinear activation function, like a sigmoid function. When you pass your summation output into the sigmoid function, the range of your output will be between 0 and 1.
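Here is a minimal sketch contrasting the two activations; the threshold value is an illustrative choice.

```python
import math

def step(z, threshold=0.5):
    """Binary on/off activation: fires fully or not at all."""
    return 1 if z > threshold else 0

def sigmoid(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The step function jumps abruptly at the threshold...
print(step(0.51), step(0.49))  # 1 0
# ...while the sigmoid changes smoothly, giving graded outputs
print(round(sigmoid(0.0), 3))  # 0.5
print(round(sigmoid(4.0), 3))  # 0.982
```

That smoothness is what makes the sigmoid differentiable, which matters for the training process described next.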
Now that we have decided on our activation function, we can move on to the most advertised portion of machine learning: the training stage.
Training Neural Networks:
When you were a toddler, you probably made plenty of mistakes, causing your parents to rebuke you to show you how wrong you were. After a few more scoldings you were trained to see the error of your ways. Similarly, we train neural networks by determining the amount of error in a prediction through the use of a cost function. The mean square error is a common cost function used for this purpose.
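As a concrete sketch, here is the mean square error computed over a few made-up targets and predictions.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between desired and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

targets = [1.0, 0.0, 1.0]       # the desired values
predictions = [0.9, 0.2, 0.6]   # what the network actually output
print(mean_squared_error(targets, predictions))  # 0.07
```

The closer the predictions get to the targets, the closer this number gets to zero, which is exactly what training tries to achieve.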
The desired value is compared to the prediction. Once we attain the error value, we need to figure out how to minimize the cost function. Theoretically, all we need to do is adjust the weights in order to change the cost. The lower our cost, the more accurate our result will be. Remember our shower analogy? Well, think of all the weights along the synapses as individual knobs. We could adjust these knobs manually by computing every possibility, but now imagine that there are not two knobs but millions of knobs to adjust. The curse of dimensionality prevents anyone from trying to guess values via a brute force method.
Tired of analogies? Well, here’s a parable of our solution. It’s of a blind man who overshoots the location of his camp. The distance away from the camp is his error. He knows that his camp rests on the lowest elevation. He decides to minimize his error by moving if he senses that he’s going downhill. If he is in fact moving downhill, he will continue to increase his momentum, confident that he’s nearing the bottom.
This is essentially how gradient descent works. What’s happening is a mathematical process called differentiation. We find the derivative of the mean square error with respect to each weight. This determines the rate of change. If it is negative, the cost function is going downhill. If it is positive, it is going uphill. We then adjust the weights accordingly.
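Here is a minimal sketch of gradient descent on a one-dimensional cost; the specific cost function, starting point, and learning rate are illustrative choices, not from the article.

```python
def cost(w):
    """An illustrative cost: lowest at w = 3, like the camp at the valley floor."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the cost: negative means downhill lies to the right."""
    return 2.0 * (w - 3.0)

w = 0.0             # start far from the minimum
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(w)  # step opposite the slope

print(round(w, 4))  # 3.0, the bottom of the valley
```

Each step moves against the sign of the derivative, so the walker always heads downhill, and the step size shrinks as the slope flattens near the bottom.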
This weight adjustment process is called backpropagation. You can boil down backpropagation to the chain rule. In calculus, the chain rule is used to multiply the derivative of an outer function by the derivative of the inner function. With this rule, no matter how many hidden layers a network has, you will always be able to work your way from the nth layer to the first. Let’s reimagine a 3X2X1 network as nested functions f(f(XW1)W2).
You can equate backpropagation to taking apart a nested matryoshka doll. We’re popping out one function at a time, multiplying the inner by the outer until we reach our root layer. On the other hand, forward propagation is like putting together a matryoshka doll. When you combine forward propagation and backpropagation, you create a loop of incremental weight adjustments that steadily decrease the error. After multiple iterations, the weights will stabilize and the end result is an optimized output.
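Putting the pieces together, here is a minimal sketch of that forward/backward loop on an even smaller network than the 3X2X1 example: one input, one hidden neuron, one output. The starting weights, learning rate, and iteration count are all illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A tiny 1X1X1 network with made-up starting weights
w1, w2 = 0.5, 0.5
x, target = 1.0, 0.0
lr = 0.5

for _ in range(200):
    # Forward propagation: assemble the dolls, f(f(x*w1)*w2)
    h = sigmoid(x * w1)
    y = sigmoid(h * w2)
    # Backpropagation: peel the dolls apart with the chain rule
    d_y = 2 * (y - target)        # derivative of squared error w.r.t. y
    d_z2 = d_y * y * (1 - y)      # sigmoid'(z) = y * (1 - y)
    d_w2 = d_z2 * h               # gradient for the outer weight
    d_h = d_z2 * w2               # error flowing back into the hidden layer
    d_w1 = d_h * h * (1 - h) * x  # chain rule again for the inner weight
    # Gradient descent: nudge both knobs downhill
    w1 -= lr * d_w1
    w2 -= lr * d_w2

final = sigmoid(sigmoid(x * w1) * w2)
print(final)  # far closer to the target 0.0 than the initial ~0.58
```

Each loop iteration is one forward pass (prediction) followed by one backward pass (chained derivatives), exactly the cycle the analogy describes.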
In the beginning, I mentioned that neural networks may result in artistically inclined androids. Yes, this may be a fantasy. Some data scientists will be ecstatic if their network’s ImageNet classification accuracy can peak over 80 percent. This still shouldn’t discourage the most ardent Asimov fan, however, because data classification is at the forefront of AI nowadays. You know all those data science buzzwords you keep seeing in tech articles, and how deep learning seems to show up 9 out of 10 times? Well, recent breakthroughs in data classification are to blame for all the recent hoopla.
The primary type of neural network involved in these classifications is called a Convolutional Neural Network, or CNN, or ConvNet — whichever floats your boat. Neural networks come in many flavors depending on the problem at hand. They continue to break barriers and will be a prominent part of the technological landscape in the years to come.
One time I was talking to a friend about an app I built that fetched data from an API. He was lost after API.
Every field has its technical jargon, but the field of software development often either collapses words into acronyms or uses a normal word like “cloud” to describe servers.
Here are 7 quick buzzwords that’ll help demystify some of the jargon that your developer friends speak.
This has nothing to do with being an agile coder or thinker. It has everything to do with the development process of software. Agile development is a set of principles that ensures that software is continuously updated to meet the demands of consumers. For example, if users complain about certain bugs in software (there are always bugs), the developers should be able to come up with a patch within weeks rather than telling users to wait for Bugware 2.0.
If you haven’t guessed it yet, project managers are not only looking for ways to hack software, but also to hack people. DevOps is Agile’s sister with a focus on the cooperation of development people and IT people. If you think about this philosophy as a marriage of two fields, then it’s easier to digest. If you want the whole manifesto, check out Atlassian’s post.
When a developer says her company performs unit tests and seems pretty proud about that fact, know that she means that the development team tests almost every line of additional code added to their code base. Every unit added has to be tested. This is a rigorous process, so not all development shops unit test.
“My website is responsive!” exclaims Developer Larry.
Developer Larry did not just claim that his website is a sentient being. All he meant was that the elements (text, menu, etc.) of his website shrink to fit within the screen of an iPhone and expand to fit a widescreen monitor.
This isn’t science fiction. At this point, you’ve been living with AI for quite a while now. Data scientists cull data and feed them to computers so that the computers can do things like recognize facial patterns or autocomplete the rest of your sentence in your Gmail. This process is what most call Machine Learning, which is just a facet of AI.
Big data, as its name suggests, is a ton of data. Of course, it’s not that simple. A data scientist who uses the term would probably have a spark in their eye, not because there’s so much data (a lot of useless data is a headache), but because they are able to extract insights from that data that can lead to massive growth for the company they work for. Computers running machine learning algorithms also love big data. Everyone loves big data, it seems. In a way, you can say that Facebook’s carelessness with its use of big data is what landed it in hot water.
Ah, the cloud. “I sent my pictures to the cloud.” Contrary to the name, the Cloud is not a network of storage spaces in Heaven. It’s just a load of servers sitting somewhere in Seattle or something. Those servers are special because you can just have a third party company keep your files without having to deal with your own limited storage space.