Neural Networks Use Case

Gursimar Singh
9 min read · Mar 22, 2021


Deep learning is one of the most exciting artificial intelligence topics. It's a family of algorithms, loosely inspired by the biology of the brain, that has produced astonishing results in many areas: computer vision, natural language processing, speech recognition, and more.

Over the past five years, deep learning has expanded into a broad range of industries.

Many recent technological breakthroughs owe their existence to it. To name a few: Tesla's autonomous cars, photo-tagging systems at Facebook, virtual assistants such as Siri or Cortana, chatbots, and object-recognition cameras. In many of these areas, deep learning has reached human-level performance on cognitive tasks such as language understanding and image analysis.

Here’s an example of what deep learning algorithms are capable of doing: automatically detecting and labeling different objects in a scene.

Deep learning has also become a widely publicized tech topic.

Neural Networks

Neural networks reflect the behavior of the human brain, allowing computer programs to recognize patterns and solve common problems in the fields of AI, machine learning, and deep learning.

What are neural networks?

Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another.

Artificial neural networks (ANNs) are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.
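To make this layered structure concrete, here is a minimal sketch (an illustration, not taken from the article) of a single forward pass through an input layer, one hidden layer, and an output layer, using made-up NumPy weights and the threshold-style activation described above:

```python
import numpy as np

def step(z, threshold=0.0):
    # Threshold activation: a node "fires" (outputs 1) only if its weighted
    # sum exceeds the threshold; otherwise it passes nothing (0) along.
    return (z > threshold).astype(float)

rng = np.random.default_rng(0)

x = np.array([0.5, 0.9, 0.1])       # input layer: three features
W1 = rng.normal(size=(3, 4))        # weights from the input layer to 4 hidden nodes
b1 = rng.normal(size=4)             # hidden-layer biases
W2 = rng.normal(size=(4, 1))        # weights from the hidden layer to 1 output node
b2 = rng.normal(size=1)             # output-layer bias

hidden = step(x @ W1 + b1)          # only activated hidden nodes pass data on
output = step(hidden @ W2 + b2)     # output layer
print(hidden, output)
```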

Neural networks rely on training data to learn and improve their accuracy over time. However, once these learning algorithms are fine-tuned for accuracy, they are powerful tools in computer science and artificial intelligence, allowing us to classify and cluster data at a high velocity. Tasks in speech recognition or image recognition can take minutes versus hours when compared to the manual identification by human experts. One of the most well-known neural networks is Google’s search algorithm.

How do neural networks work?

Think of each individual node as its own linear regression model, composed of input data, weights, a bias (or threshold), and an output. The formula would look something like this:

∑ wᵢxᵢ + bias = w₁x₁ + w₂x₂ + w₃x₃ + bias

output = f(x) = 1 if ∑ wᵢxᵢ + b ≥ 0; 0 if ∑ wᵢxᵢ + b < 0

Once an input layer is determined, weights are assigned. These weights indicate the importance of each variable, with larger ones contributing more significantly to the output. All inputs are then multiplied by their respective weights and summed, and the sum is passed through an activation function. If the result exceeds the given threshold, the node "fires" (activates), passing data to the next layer of the network. The output of one node thus becomes the input of the next. This process of passing data from one layer to the next defines the network as a feedforward network.

Let’s break down what one single node might look like using binary values. We can apply this concept to a more tangible example, like whether you should go surfing (Yes: 1, No: 0). The decision to go or not to go is our predicted outcome, or y-hat. Let’s assume that there are three factors influencing your decision-making:

  1. Are the waves good? (Yes: 1, No: 0)
  2. Is the line-up empty? (Yes: 1, No: 0)
  3. Has there been a recent shark attack? (Yes: 0, No: 1)

Then, let's make the following assumptions, giving us these inputs:

  • X1 = 1, since the waves are pumping
  • X2 = 0, since the crowds are out
  • X3 = 1, since there hasn’t been a recent shark attack

Now, we need to assign some weights to determine importance. Larger weights signify that particular variables are of greater importance to the decision or outcome.

  • W1 = 5, since large swells don’t come around often
  • W2 = 2, since you’re used to the crowds
  • W3 = 4, since you have a fear of sharks

Finally, we'll assume a threshold value of 3, which translates to a bias value of –3. With all the various inputs, we can start plugging values into the formula to get the desired output.

Y-hat = (1 × 5) + (0 × 2) + (1 × 4) − 3 = 6

If we use the activation function from the beginning of this section, we can determine that the output of this node would be 1, since 6 is greater than 0. In this instance, you would go surfing; but if we adjust the weights or the threshold, we can achieve different outcomes from the model. When we observe one decision, like in the above example, we can see how a neural network could make increasingly complex decisions depending on the output of previous decisions or layers.
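As a quick sanity check, the same calculation can be written out in a few lines of Python (illustrative only; the weights, inputs, and bias are the ones from the example above):

```python
# The surfing example as a single perceptron-style node.
weights = [5, 2, 4]        # W1 (swell), W2 (crowds), W3 (sharks)
inputs  = [1, 0, 1]        # X1, X2, X3 from the example
bias    = -3               # a threshold of 3 expressed as a bias of -3

weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias  # (1*5)+(0*2)+(1*4)-3 = 6

# Step activation: fire (go surfing) only if the weighted sum is greater than 0.
output = 1 if weighted_sum > 0 else 0
print(weighted_sum, output)   # 6, 1 -> go surfing
```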

In the example above, we used perceptrons to illustrate some of the mathematics at play here, but neural networks leverage sigmoid neurons, which are distinguished by having values between 0 and 1. Since neural networks behave similarly to decision trees, cascading data from one node to another, having x values between 0 and 1 will reduce the impact of any given change of a single variable on the output of any given node, and subsequently, the output of the neural network.
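For contrast with the hard 0-or-1 step used in the perceptron example, here is a minimal sketch of a sigmoid, which squashes the weighted sum smoothly into the range between 0 and 1:

```python
import math

def sigmoid(z):
    """Squash any real-valued weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Unlike the hard step, a sigmoid neuron's output changes smoothly, so a small
# change in one input produces a small change in the output rather than a flip.
for z in (-4, -1, 0, 1, 4):
    print(z, round(sigmoid(z), 3))
```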

As we start to think about more practical use cases for neural networks, like image recognition or classification, we'll leverage supervised learning, or labeled datasets, to train the algorithm. As we train the model, we'll want to evaluate its accuracy using a cost (or loss) function; a common choice is the mean squared error (MSE):

Cost function = MSE = (1/2m) · Σ (ŷ(i) − y(i))², summed over i = 1 to m

In the equation,

  • i represents the index of the sample,
  • y-hat is the predicted outcome,
  • y is the actual value, and
  • m is the number of samples.

Ultimately, the goal is to minimize the cost function to ensure a correct fit for any given observation. The model adjusts its weights and bias, guided by the cost function, to reach the point of convergence, or the local minimum. The algorithm adjusts its weights through gradient descent, which determines the direction to take to reduce errors (i.e., minimize the cost function). With each training example, the parameters of the model adjust to gradually converge at the minimum.
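To make the idea concrete, here is a minimal gradient-descent sketch (illustrative only, with a made-up one-weight model and toy data) that repeatedly steps a single weight downhill on the MSE cost defined above:

```python
# Fit a single weight w so that predictions w * x approximate y,
# by stepping against the gradient of the MSE cost.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]          # true relationship: y = 2x
m = len(xs)

w = 0.0                             # initial weight
learning_rate = 0.05

for epoch in range(200):
    # Gradient of (1/2m) * sum((w*x - y)^2) with respect to w.
    grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / m
    w -= learning_rate * grad       # step downhill on the cost surface

print(round(w, 3))                  # converges toward 2.0
```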

See this IBM Developer article for a deeper explanation of the quantitative concepts involved in neural networks.

Most deep neural networks are feedforward, meaning they flow in one direction only, from input to output. However, you can also train your model through backpropagation, which moves in the opposite direction, from output to input. Backpropagation allows us to calculate and attribute the error associated with each neuron, so we can adjust and fit the parameters of the model appropriately.
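As a rough illustration of the idea, and not the specific method of any production system, here is a compact backpropagation sketch for a one-hidden-layer network with sigmoid activations, trained on a toy XOR-style dataset:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny XOR-style dataset: the output depends on a combination of both inputs,
# so the hidden layer is genuinely needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output

lr = 0.5
for _ in range(10_000):
    # Forward pass: input -> hidden -> output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: attribute the output error to each neuron, layer by layer.
    d_out = (out - y) * out * (1 - out)     # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)      # error attributed to each hidden neuron

    # Adjust the parameters in the direction that reduces the error.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # should move toward [0, 1, 1, 0]
```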

A Brief History of Neural Networks

Neural networks date back to the early 1940s when mathematicians Warren McCulloch and Walter Pitts built a simple algorithm-based system designed to emulate human brain function. Work in the field accelerated in 1957 when Cornell University’s Frank Rosenblatt conceived of the perceptron, the groundbreaking algorithm developed to perform complex recognition tasks. During the four decades that followed, the lack of computing power necessary to process large amounts of data put the brakes on advances. In the 2000s, thanks to the advent of greater computing power and more sophisticated hardware, as well as to the existence of vast data sets to draw from, computer scientists finally had what they needed, and neural networks and AI took off, with no end in sight. To understand how much the field has expanded in the new millennium, consider that ninety percent of internet data has been created since 2016. That pace will continue to accelerate, thanks to the growth of the Internet of Things (IoT).

For more background and an expansive timeline, read “The Definitive Guide to Machine Learning: Business Applications, Techniques, and Examples.”

Microsoft to accelerate Bing search with neural network

When we search Google's web index, we are only searching around 10 percent of the half-a-trillion or so pages that are potentially available. Much of the content in the larger deep web (not to be confused with the dark web) is buried further down in the sites that make up the visible surface web. The indexes of competitors like Yahoo and Bing (around 15 billion pages each) are still only a fraction of the size of Google's. To close this gap, Microsoft has recently pioneered sophisticated new Field-Programmable Gate Array (FPGA) technology to make massive web crawls more efficient and faster.

Google's engineers have previously estimated that a typical 0.2-second web query reflects a quantity of indexing and retrieval work equal to about 0.0003 kWh of energy per search. With over 100 billion lookups per month into their petabyte-scale index, well-executed page ranking has become a formidable proposition. Microsoft's approach with Bing has been to break the ranking portion of search into three parts: feature extraction, free-form expressions, and machine learning scoring.

Bing’s document selection service, which retrieves and filters the documents containing search terms, still runs on Xeon processors. Their ranking services, which score the filtered documents according to the relevance of the search results, have recently been ported to an FPGA-based system they call Project Catapult. Microsoft could likely afford custom ASICs (application-specific integrated circuits) to accelerate Bing’s ranking functions. But given the speed at which the software algorithms now change, it probably can’t afford not to use programmable FPGA hardware instead.

Traditionally, FPGAs have been the go-to device for very specific computing needs. Because you can easily reconfigure their internal structure, they're frequently used for prototyping processors. They're also handy for applications that need a large number of input or output connections on the chip. But there is another place where they are used no matter the cost: where absolute speed is of the essence. For example, if your device needs to calculate the total energy of all the hits to a massive satellite-based cosmic-ray-detecting scintillator array, decide which hits are real, and do it all in a few nanoseconds, software simply isn't up to the job.

Project Catapult was originally based on a PCI-Express card design from 2011 that used six Xilinx FPGAs linked with a controller. Integrating new devices into their existing servers, however, required several redesigns to adhere to strict limits on how much power the devices could draw and how much heat they could radiate. Their latest design uses a Stratix V GS D5 FPGA from Altera. For the hardcore FPGA crowd, this particular device has 1,590 digital signal processing blocks, 2,104 M20K memory blocks, and thirty-six 14.1 Gb/sec transceivers. As the Bing team announced at ISCA 2014, this platform enabled ranking with roughly half the number of servers used before.

The term Microsoft is using here is "convolutional neural network accelerator." Convolution is commonly used in signal-processing applications like computer vision and speech recognition, or anywhere spatial averaging or cross-correlation would be of service. In computer vision, for example, 2D convolution can be used to massage each pixel using information from its immediate neighbors to achieve various filtering effects. Convolutional neural networks (CNNs) are composed of small assemblies of artificial neurons, each of which focuses on just a small part of an image, known as its receptive field. CNNs have already bested humans at classifying objects in challenges like the ImageNet 1000. Classifying documents for ranking is a similar problem, and it is now one among many that Microsoft hopes to address with CNNs.
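To give a feel for what 2D convolution does to an image, here is a small sketch (illustrative only, with a toy 5×5 "image" and a hand-coded 3×3 averaging kernel; a CNN would instead learn its kernels from data):

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image; each output pixel is a weighted sum of
    # the pixel's immediate neighbourhood (no padding, stride 1).
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
blur = np.full((3, 3), 1.0 / 9.0)                  # 3x3 averaging (blur) filter
print(conv2d(image, blur))
```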

As we speak, Microsoft's engineers are looking to start using Altera's new Arria 10 FPGA. This chip is optimized for the kinds of floating-point-intensive operations that were traditionally the province of DSPs. It can run at teraflop speeds with three times the energy efficiency of a comparable GPU, and Microsoft hopes it will help them make significant gains in the search-and-rank business.

Written by Gursimar Singh

Google Developers Educator | Speaker | Consultant | Author @ freeCodeCamp | DevOps | Cloud Computing | Data Science and more
