
Physical neural networks?

20 Apr 2024

Modern AI models such as OpenAI’s GPT, Anthropic’s Claude, Meta’s LLaMA, and Google’s Gemini have set off a boom in data center construction and a staggering rally in Nvidia stock, thanks to demand for its (over USD 40,000) chips. Machine learning models also consume a great deal of electricity because of their compute-intensive training loops, and total power consumption for ML modeling rivals that of a small country. Small wonder that there is interest in energy-efficient machine learning methods; from the last article (emphasis mine):

One of the areas with the fastest-growing demand for energy is the form of machine learning called generative AI, which requires a lot of energy for training and a lot of energy for producing answers to queries. Training a large language model like OpenAI’s GPT-3, for example, uses nearly 1,300 megawatt-hours (MWh) of electricity, the annual consumption of about 130 US homes. According to the IEA, a single Google search takes 0.3 watt-hours of electricity, while a ChatGPT request takes 2.9 watt-hours. (An incandescent light bulb draws an average of 60 watt-hours of juice.) If ChatGPT were integrated into the 9 billion searches done each day, the IEA says, the electricity demand would increase by 10 terawatt-hours a year — the amount consumed by about 1.5 million European Union residents.
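As a quick sanity check, the ~10 TWh figure follows directly from the per-query numbers in the quote. A rough back-of-envelope, using only the figures quoted above:

```python
# Back-of-envelope check using only the figures quoted above.
searches_per_day = 9e9               # daily Google searches, per the IEA
wh_per_chatgpt_request = 2.9         # watt-hours per ChatGPT-style request

wh_per_year = searches_per_day * wh_per_chatgpt_request * 365
print(f"{wh_per_year / 1e12:.1f} TWh per year")   # ~9.5 TWh, i.e. roughly 10 TWh
```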

One potentially very exciting avenue is to develop neural networks built on fundamentally different physical representations: chips whose underlying physics is completely different from that of the silicon integrated circuits that make up, say, Nvidia H100s.

This post is a brief dive into two promising approaches to physical representations of neural networks: optical neural networks (which use photonic circuits, exploiting the energy-efficient transport of photons relative to electrons) and quantum neural networks (which use quantum-mechanical properties to perform certain matrix operations efficiently).


Optical neural networks: training with photons

Optical neural networks, or ONNs, are neural networks implemented with photonic chips; photonic chips, in turn, are desirable because moving photons around takes significantly less energy than moving electrons: photonic mobility comes almost for free.

That said, the “right” way to implement these things is still being researched, and there are several competing approaches.
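Whatever the hardware, the workhorse operation is the optical matrix-vector multiply. One common scheme in the photonic-accelerator literature factors a weight matrix via the SVD, W = UΣV†, realizes the two unitaries as meshes of beam splitters and phase shifters, and implements Σ as per-channel attenuation or amplification. Below is a minimal NumPy simulation of that idea; the function name `photonic_matvec` and the whole setup are mine, purely for illustration of the general scheme, not a description of any particular chip:

```python
import numpy as np

def photonic_matvec(W, x):
    """Apply W to x the way a coherent photonic mesh would: two unitary
    interferometer meshes (U and Vh from the SVD) with a diagonal layer of
    per-channel attenuation/amplification (the singular values) in between."""
    U, s, Vh = np.linalg.svd(W)
    y = Vh @ x        # first mesh: unitary transform of the optical modes
    y = s * y         # per-mode gain/loss implementing diag(s)
    return U @ y      # second mesh: another unitary transform

# Sanity check against an ordinary electronic matmul.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
x = rng.normal(size=4)
assert np.allclose(photonic_matvec(W, x), W @ x)
```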

As of this writing, one of the most promising real-life optical neural network chips is the Taichi system developed by researchers at Tsinghua University in Beijing, which can host an optical neural network with nearly 14 million parameters and has demonstrated striking energy efficiency compared to Nvidia’s leading H100 chips:

Neural networks that imitate the workings of the human brain now often generate art, power computer vision, and drive many more applications. Now a neural network microchip from China that uses photons instead of electrons, dubbed Taichi, can run AI tasks as well as its electronic counterparts with a thousandth as much energy, according to a new study. All in all, the researchers found Taichi displayed an energy efficiency of up to roughly 160 trillion operations per second per watt and an area efficiency of nearly 880 trillion multiply-accumulate operations (the most basic operation in neural networks) per square millimeter. This makes it more than 1,000 times more energy efficient than one of the latest electronic GPUs, the NVIDIA H100, as well as roughly 100 times more energy efficient and 10 times more area efficient than previous other optical neural networks.
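For a sense of scale, the quoted 160 trillion operations per second per watt works out to just a few femtojoules per operation; a trivial unit conversion:

```python
ops_per_second_per_watt = 160e12          # quoted efficiency: ~160 TOPS/W
joules_per_op = 1 / ops_per_second_per_watt
print(f"{joules_per_op * 1e15:.2f} fJ per operation")   # ~6.25 fJ
```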

There are some caveats, though:

Although the Taichi chip is compact and energy-efficient, Fang cautions that it relies on many other systems, such as a laser source and high-speed data coupling. These other systems are far more bulky than a single chip, “taking up almost a whole table,” she notes. In the future, Fang and her colleagues aim to add more modules onto the chips to make the whole system more compact and energy-efficient.

And how does it perform?

For instance, previous optical neural networks usually only possessed thousands of parameters—the connections between neurons that mimic the synapses linking biological neurons in the human brain. In contrast, Taichi boasts 13.96 million parameters. Previous optical neural networks were often limited to classifying data along just a dozen or so categories—for instance, figuring out whether images represented one of 10 digits. In contrast, in tests with the Omniglot database of 1,623 different handwritten characters from 50 different alphabets, Taichi displayed an accuracy of 91.89 percent, comparable to its electronic counterparts. The scientists also tested Taichi on the advanced AI task of content generation. They found it could produce music clips in the style of Johann Sebastian Bach and generate images of numbers and landscapes in the style of Vincent Van Gogh and Edvard Munch.

The above quotes are taken from this IEEE Spectrum article, which references the original article in Science linked above.


Quantum neural networks: manipulating Schrödinger’s equation for deep learning

Quantum neural networks (neural networks implemented on quantum computing architectures, taking advantage of quantum gates) have been considered since Feynman proposed quantum computing as a concept in the 1980s, though serious study of QNNs has gradually picked up pace since the early 2000s. In this section, we point out three relevant papers.

The first dates to 2011: Neural networks with quantum architecture and quantum learning by Panella and Martinelli proposes a way to implement QNNs with quantum circuits. Interestingly, they don’t use backprop at all; rather, the parameters are found by exhaustive search, which becomes feasible when we have access to a sufficiently sophisticated quantum computer that can evaluate candidate settings in superposition. This suggests that QNNs are quite different from classical neural networks: expensive backprop- and gradient-descent-based training routines can potentially be obviated altogether once we are allowed to use methods based on quantum superposition.
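To make “exhaustive parameter search” concrete, here is a purely classical caricature of my own: discretize the weights of a tiny perceptron and loop over every combination. Panella and Martinelli’s point is that a quantum computer can prepare all of these candidate settings in superposition and pick out a good one, rather than iterating through them one by one as below (toy illustration only, not their actual circuit construction):

```python
import itertools
import numpy as np

# Classical caricature of exhaustive parameter search: a 2-input perceptron
# learning AND by brute force over a coarse grid of weights and bias. The
# quantum scheme evaluates all candidate settings in superposition instead of
# looping over them one by one.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

grid = np.linspace(-1.0, 1.0, 9)          # discretized parameter values
best_err, best_params = None, None
for w1, w2, b in itertools.product(grid, repeat=3):
    pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
    err = int(np.sum(pred != y))
    if best_err is None or err < best_err:
        best_err, best_params = err, (w1, w2, b)

print("misclassifications:", best_err, "with (w1, w2, b) =", best_params)
```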

Quantum perceptron over a field and neural network architecture selection in a quantum computer by Da Silva et al (2016) proposes a QNN called “quantum perceptron over a field” (QPF) that directly generalizes a classical perceptron, and additionally proposes a quantum computing algorithm to search over weights and architectures in polynomial time.

Finally, Efficient learning for deep quantum neural networks by Beer et al (2019) proposes deep QNNs by defining a quantum perceptron as a unitary operator taking m qubits to n qubits. An L-layer QNN, by analogy with classical feedforward neural networks, is then a sequence of (generally non-commuting) unitary operators. To train these deep QNNs, they assume the training data comes as pairs of input qubit states and desired output qubit states, and take as the cost function the average fidelity between the network’s outputs and the desired outputs:

\[C = \frac{1}{N} \sum_{x=1}^{N} \langle \phi_x^{out} \vert \rho_x^{out} \vert \phi_x^{out} \rangle.\]
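Here \(\vert \phi_x^{out} \rangle\) is the desired output state for the \(x\)-th training pair and \(\rho_x^{out}\) is the state the network actually produces, so a perfectly trained network gives \(C = 1\). As a minimal NumPy sketch of the cost itself, with a single random unitary standing in for the trained stack of perceptron unitaries (purely my illustration, not the paper’s circuits):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_qubit_state():
    """A random single-qubit pure state |phi> as a normalized complex 2-vector."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

# Toy "network": one random 1-qubit unitary standing in for the trained stack
# of perceptron unitaries (illustration only).
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
U, _ = np.linalg.qr(A)                      # QR of a random complex matrix -> unitary

# Training pairs (|phi_in>, |phi_out>); the targets here are U|phi_in>, so a
# perfectly trained network should reach cost C = 1.
inputs = [random_qubit_state() for _ in range(5)]
targets = [U @ phi for phi in inputs]

def cost(network_U, inputs, targets):
    """C = (1/N) * sum_x <phi_x_out| rho_x_out |phi_x_out> (average fidelity)."""
    total = 0.0
    for phi_in, phi_target in zip(inputs, targets):
        out = network_U @ phi_in
        rho = np.outer(out, out.conj())     # output density matrix |out><out|
        total += np.real(phi_target.conj() @ rho @ phi_target)
    return total / len(inputs)

print(cost(U, inputs, targets))             # -> 1.0 up to floating point
```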

They also present a quantum analogue of the backpropagation algorithm, which is nonetheless more efficient thanks to the mathematics of unitary operators.


Final points

Alternative physical rethinkings of neural networks are an exciting area of present research, but even the most promising approaches (such as the Taichi chip discussed above) don’t yet have the representational capacity to match the needs of modern model architectures such as the GPTs, which can reach over one trillion parameters.

But given that larger models demand ever-larger amounts of training data, it remains to be seen whether the bottleneck is growing the representational capacity of these physical neural networks or something more fundamental to the “throw data at a big model and see what sticks” approach we seem insistent on using at the present juncture, such as the scarcity of useful data in our world and the sample-inefficient architectures underlying current foundation models.