Let w be a vector from the space ℝ^N, where N is the sum of the number of weights and the number of biases of the network. Let E be the error function we want to minimize.
SCG differs from other CGMs in two points: it computes the step size directly from a local quadratic approximation of E, built from a finite-difference estimate of the product of the Hessian E''(w) with the search direction, instead of performing a costly line search; and it regulates this approximation with a Levenberg-Marquardt-style scale parameter λ, raising λ when the Hessian estimate is not positive definite and adjusting λ at each iteration. This combination is the main contribution of SCG to both the fields of neural learning and optimization theory.
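The mechanics described above can be sketched in code. The following is a minimal NumPy implementation of Møller's SCG iteration under my own naming conventions (`scg`, `sigma0`, `lam`, etc. are illustrative, not from the source); it shows the finite-difference Hessian-vector product, the λ-scaling that enforces positive definiteness, and the trust-region-style adjustment of λ from the comparison parameter Δ.

```python
import numpy as np

def scg(E, grad, w, max_iter=200, tol=1e-8, sigma0=1e-4, lam=1e-6):
    """Sketch of Moller's scaled conjugate gradient minimization of E."""
    N = w.size
    lam_bar = 0.0
    r = -grad(w)                 # negative gradient = steepest-descent direction
    p = r.copy()                 # initial search direction
    success = True
    delta0 = 0.0
    for k in range(1, max_iter + 1):
        if success:
            # finite-difference estimate of the Hessian-vector product E''(w) p
            sigma = sigma0 / np.linalg.norm(p)
            s = (grad(w + sigma * p) - grad(w)) / sigma
            delta0 = p @ s
        # scale the curvature estimate with the current lambda
        p2 = p @ p
        delta = delta0 + (lam - lam_bar) * p2
        # if the scaled Hessian estimate is not positive definite, raise lambda
        if delta <= 0:
            lam_bar = 2 * (lam - delta / p2)
            delta = -delta + lam * p2
            lam = lam_bar
        # step size computed directly from the quadratic approximation
        mu = p @ r
        alpha = mu / delta
        # comparison parameter: how well the quadratic model predicted E
        w_new = w + alpha * p
        Delta = 2 * delta * (E(w) - E(w_new)) / mu**2
        if Delta >= 0:
            # the error decreased: accept the step
            w = w_new
            r_new = -grad(w)
            lam_bar = 0.0
            success = True
            if k % N == 0:
                p = r_new.copy()                       # periodic restart
            else:
                beta = (r_new @ r_new - r_new @ r) / mu
                p = r_new + beta * p                   # new conjugate direction
            r = r_new
            if Delta >= 0.75:
                lam = max(lam / 4, 1e-15)              # trust the model more
        else:
            lam_bar = lam
            success = False                            # retry with larger lambda
        if Delta < 0.25:
            lam = lam + delta * (1 - Delta) / p2       # trust the model less
        if np.linalg.norm(r) < tol:
            break
    return w
```

On a small positive-definite quadratic E(w) = ½wᵀAw − bᵀw this converges to the solution of Aw = b in a handful of iterations, with no line search anywhere in the loop.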
SCG has been shown to be considerably faster than both standard backpropagation and other CGMs [Mol93].