Artificial intelligence is growing ever more capable at increasingly complex tasks, but it requires vast amounts of computing power to develop. A more efficient technique could save up to half the time, energy and computing power needed to train an AI model.
Deep learning models are typically composed of a huge grid of artificial neurons linked by “weights” (computer code that takes an input and passes on a changed output) that represent the synapses linking real neurons. By tinkering with these weights over thousands or millions of trials, it is possible to gradually train the model to carry out a task, such as identifying a person from a picture of their face or digitising text from an image of handwriting.
To train a model, researchers go through an iterative process of passing data in, assessing the quality of the output and then calculating a gradient that informs how the weights should be altered to improve performance. This process involves passing data from one side of the neural network to the other, via every link in the chain of artificial neurons, and then working backwards again to the beginning in order to calculate the gradient.
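As a rough illustration of that loop, here is a minimal sketch in plain Python. It assumes a toy "network" of just two artificial neurons in a chain, a squared-error measure of output quality, and a fixed step size. The function names (`forward`, `backward`, `train`), the `tanh` activation, the starting weights and the learning rate are all illustrative choices, not details taken from the research described here.

```python
import math

def forward(x, w1, w2):
    """Forward pass: push the input through a chain of two artificial neurons."""
    h = math.tanh(w1 * x)  # first neuron: weighted input, then a nonlinearity
    y = w2 * h             # second neuron: weighted output of the first
    return h, y

def backward(x, h, y, target, w2):
    """Backward pass: work from the output back to the input using the chain rule."""
    dloss_dy = 2.0 * (y - target)             # derivative of the squared error
    grad_w2 = dloss_dy * h                    # gradient for the last weight
    dloss_dh = dloss_dy * w2                  # carry the error signal one link back
    grad_w1 = dloss_dh * (1.0 - h * h) * x    # tanh'(z) = 1 - tanh(z)^2
    return grad_w1, grad_w2

def train(data, w1=0.5, w2=0.5, lr=0.1, steps=2000):
    """Repeat: pass data in, assess the output, compute gradients, adjust weights."""
    losses = []
    for _ in range(steps):
        total = 0.0
        for x, target in data:
            h, y = forward(x, w1, w2)
            total += (y - target) ** 2
            g1, g2 = backward(x, h, y, target, w2)
            w1 -= lr * g1  # nudge each weight against its gradient
            w2 -= lr * g2
        losses.append(total / len(data))
    return w1, w2, losses

# Hypothetical training data: targets produced by a chain with weights 1.5 and 2.0.
data = [(x / 4.0, 2.0 * math.tanh(1.5 * x / 4.0)) for x in range(-4, 5)]
w1, w2, losses = train(data)
print("first loss:", losses[0], "last loss:", losses[-1])
```

Every training step in this sketch makes one full trip forwards through the chain and one full trip backwards. Real models do the same across millions or billions of weights, which is why the process is so costly.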