Improving the convergence of the backpropagation algorithm using learning rate adaptation methods

Published: 01 October 1999
Abstract

    No abstract available.

    References

    [1]
    Altman, M. (1961). Connection between gradient methods and Newton's method for functionals. Bull. Acad. Polon. Sci. Ser. Sci. Math. Astronom. Phys., 9, 877-880.
    [2]
    Armijo, L. (1966). Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics, 16, 1-3.
    [3]
    Battiti, R. (1989). Accelerated backpropagation learning: Two optimization methods. Complex Systems, 3, 331-342.
    [4]
    Battiti, R. (1992). First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4, 141-166.
    [5]
Becker, S., & Le Cun, Y. (1988). Improving the convergence of back-propagation learning with second order methods. In D. S. Touretzky, G. E. Hinton, & T. J. Sejnowski (Eds.), Proceedings of the 1988 Connectionist Models Summer School (pp. 29-37). San Mateo, CA: Morgan Kaufmann.
    [6]
    Booth, A. (1949). An application of the method of steepest descent to the solution of systems of nonlinear simultaneous equations. Quart. J. Mech. Appl. Math., 2, 460-468.
    [7]
    Cauchy, A. (1847). Méthode générale pour la résolution des systèmes d'équations simultanées. Comp. Rend. Acad. Sci. Paris, 25, 536-538.
    [8]
Chan, L. W., & Fallside, F. (1987). An adaptive training algorithm for back-propagation networks. Computer Speech and Language, 2, 205-218.
    [9]
    Darken, C., Chiang, J., & Moody, J. (1992). Learning rate schedules for faster stochastic gradient search. In Proceedings of the IEEE 2nd Workshop on Neural Networks for Signal Processing (pp. 3-12).
    [10]
    Demuth, H., & Beale, M. (1992). Neural network toolbox user's guide. Natick, MA: MathWorks.
    [11]
    Dennis, J. E., & Moré, J. J. (1977). Quasi-Newton methods, motivation and theory. SIAM Review, 19, 46-89.
    [12]
    Dennis, J. E., & Schnabel, R. B. (1983). Numerical methods for unconstrained optimization and nonlinear equations. Englewood Cliffs, NJ: Prentice-Hall.
    [13]
Fahlman, S. E. (1989). Faster-learning variations on back-propagation: An empirical study. In D. S. Touretzky, G. E. Hinton, & T. J. Sejnowski (Eds.), Proceedings of the 1988 Connectionist Models Summer School (pp. 38-51). San Mateo, CA: Morgan Kaufmann.
    [14]
Fakotakis, N., & Sirigos, J. (1996). A high-performance text-independent speaker recognition system based on vowel spotting and neural nets. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, 661-664.
    [15]
Fakotakis, N., & Sirigos, J. (forthcoming). A high-performance text-independent speaker identification and verification system based on vowel spotting and neural nets. IEEE Transactions on Speech and Audio Processing.
    [16]
Fisher, W., Zue, V., Bernstein, J., & Pallett, D. (1987). An acoustic-phonetic data base. Journal of the Acoustical Society of America, Suppl. A, 81, 581-592.
    [17]
    Goldstein, A. A. (1962). Cauchy's method of minimization. Numerische Mathematik, 4, 146-150.
    [18]
Gori, M., & Tesi, A. (1992). On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 76-85.
    [19]
    Hirose, Y., Yamashita, K., & Hijiya, S. (1991). Back-propagation algorithm which varies the number of hidden units. Neural Networks, 4, 61-66.
    [20]
Hoehfeld, M., & Fahlman, S. E. (1992). Learning with limited numerical precision using the cascade-correlation algorithm. IEEE Transactions on Neural Networks, 3, 602-611.
    [21]
Hsin, H.-C., Li, C.-C., Sun, M., & Sclabassi, R. J. (1995). An adaptive training algorithm for back-propagation neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 25, 512-514.
    [22]
    Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295-307.
    [23]
    Kelley, C. T. (1995). Iterative methods for linear and nonlinear equations. Philadelphia: SIAM.
    [24]
Kung, S. Y., Diamantaras, K., Mao, W. D., & Taur, J. S. (1991). Generalized perceptron networks with nonlinear discriminant functions. In R. J. Mammone & Y. Y. Zeevi (Eds.), Neural networks theory and applications (pp. 245-279). New York: Academic Press.
    [25]
    Le Cun, Y., Simard, P. Y., & Pearlmutter, B. A. (1993). Automatic learning rate maximization by on-line estimation of the Hessian's eigenvectors. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 156-163). San Mateo, CA: Morgan Kaufmann.
    [26]
    Lee, Y., Oh, S.-H., & Kim, M. W. (1993). An analysis of premature saturation in backpropagation learning. Neural Networks, 6, 719-728.
    [27]
Lisboa, P. J. G., & Perantonis, S. J. (1991). Complete solution of the local minima in the XOR problem. Network, 2, 119-124.
    [28]
Magoulas, G. D., Vrahatis, M. N., & Androulakis, G. S. (1996). A new method in neural network supervised training with imprecision. In Proceedings of the IEEE 3rd International Conference on Electronics, Circuits and Systems (pp. 287-290).
    [29]
    Magoulas, G. D., Vrahatis, M. N., & Androulakis, G. S. (1997). Effective back-propagation with variable stepsize. Neural Networks, 10, 69-82.
    [30]
    Magoulas, G. D., Vrahatis, M. N., Grapsa, T. N., & Androulakis, G. S. (1997). Neural network supervised training based on a dimension reducing method. In S. W. Ellacot, J. C. Mason, & I. J. Anderson (Eds.), Mathematics of neural networks: Models, algorithms and applications (pp. 245-249). Norwell, MA: Kluwer.
    [31]
Møller, M. F. (1993). A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 6, 525-533.
    [32]
    Nocedal, J. (1991). Theory of algorithms for unconstrained optimization. Acta Numerica, 199-242.
    [33]
    Ortega, J. M., & Rheinboldt, W. C. (1970). Iterative solution of nonlinear equations in several variables. New York: Academic Press.
    [34]
    Parker, D. B. (1987). Optimal algorithms for adaptive networks: Second order back-propagation, second order direct propagation, and second order Hebbian learning. In Proceedings of the IEEE International Conference on Neural Networks, 2, 593-600.
    [35]
Parlos, A. G., Fernandez, B., Atiya, A. F., Muthusami, J., & Tsai, W. K. (1994). An accelerated learning algorithm for multilayer perceptron networks. IEEE Transactions on Neural Networks, 5, 493-497.
    [36]
Pearlmutter, B. (1992). Gradient descent: Second-order momentum and saturating error. In J. E. Moody, S. J. Hanson, & R. P. Lippmann (Eds.), Advances in neural information processing systems, 4 (pp. 887-894). San Mateo, CA: Morgan Kaufmann.
    [37]
Pfister, M., & Rojas, R. (1993). Speeding-up backpropagation--A comparison of orthogonal techniques. In Proceedings of the Joint Conference on Neural Networks (pp. 517-523). Nagoya, Japan.
    [38]
    Riedmiller, M. (1994). Advanced supervised learning in multi-layer perceptrons--From backpropagation to adaptive learning algorithms. International Journal of Computer Standards and Interfaces, special issue, 5.
    [39]
Riedmiller, M., & Braun, H. (1993). A direct adaptive method for faster backpropagation learning: The Rprop algorithm. In Proceedings of the IEEE International Conference on Neural Networks (pp. 586-591). San Francisco, CA.
    [40]
    Rigler, A. K., Irvine, J. M., & Vogl, T. P. (1991). Rescaling of variables in back-propagation learning. Neural Networks, 4, 225-229.
    [41]
    Rojas, R. (1996). Neural networks: A systematic introduction. Berlin: Springer-Verlag.
    [42]
    Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 318-362). Cambridge, MA: MIT Press.
    [43]
    Schaffer, J., Whitley, D., & Eshelman, L. (1992). Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Proceedings of the International Workshop on Combinations of Genetic Algorithms and Neural Networks (pp. 1-37). Los Alamitos, CA: IEEE Computer Society Press.
    [44]
    Shultz, G. A., Schnabel, R. B., & Byrd, R. H. (1982). A family of trust region based algorithms for unconstrained minimization with strong global convergence properties (Tech. Rep. No. CU-CS216-82). University of Colorado.
    [45]
    Silva, F., & Almeida, L. (1990). Acceleration techniques for the back-propagation algorithm. Lecture Notes in Computer Science, 412, 110-119.
    [46]
    Sirigos, J., Darsinos, V., Fakotakis, N., & Kokkinakis, G. (1996). Vowel/nonvowel decision using neural networks and rules. In Proceedings of the 3rd IEEE International Conference on Electronics, Circuits, and Systems (pp. 510-513).
    [47]
Sirigos, J., Fakotakis, N., & Kokkinakis, G. (1995). A comparison of several speech parameters for speaker independent speech recognition and speaker recognition. In Proceedings of the 4th European Conference on Speech Communication and Technology.
    [48]
    Van der Smagt, P. P. (1994). Minimization methods for training feedforward neural networks. Neural Networks, 7, 1-11.
    [49]
Vogl, T. P., Mangis, J. K., Rigler, A. K., Zink, W. T., & Alkon, D. L. (1988). Accelerating the convergence of the backpropagation method. Biological Cybernetics, 59, 257-263.
    [50]
Watrous, R. L. (1987). Learning algorithms for connectionist networks: Applied gradient methods of nonlinear optimization. In Proceedings of the IEEE International Conference on Neural Networks, 2, 619-627.
    [51]
Wessels, L. F., & Barnard, E. (1992). Avoiding false local minima by proper initialization of connections. IEEE Transactions on Neural Networks, 3, 899-905.
    [52]
    Wolfe, P. (1969). Convergence conditions for ascent methods. SIAM Review, 11, 226-235.
    [53]
    Wolfe, P. (1971). Convergence conditions for ascent methods. II: Some corrections. SIAM Review, 13, 185-188.


    Published In

Neural Computation, Volume 11, Issue 7
    Oct. 1, 1999
    296 pages
ISSN: 0899-7667

    Publisher

    MIT Press

    Cambridge, MA, United States


    Qualifiers

    • Article


    Cited By

• (2019) Survey on Brain-Computer Interface. ACM Computing Surveys, 52(1), 1-32. doi:10.1145/3297713. Online publication date: 13-Feb-2019.
• (2017) Set-Membership Type-1 Fuzzy Logic System Applied to Fault Classification in a Switch Machine. IEEE Transactions on Intelligent Transportation Systems, 18(10), 2703-2712. doi:10.1109/TITS.2017.2659620. Online publication date: 29-Sep-2017.
• (2017) An enhanced receiver for an impulsive UWB-based PLC system for low-bit rate applications. Digital Signal Processing, 70(C), 145-154. doi:10.1016/j.dsp.2017.08.003. Online publication date: 1-Nov-2017.
• (2016) On the interplay of network structure and gradient convergence in deep learning. 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 488-495. doi:10.1109/ALLERTON.2016.7852271. Online publication date: 27-Sep-2016.
• (2015) Trunk stabilization of multi-legged robots using on-line learning via a NARX neural network compensator. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 6298-6303. doi:10.1109/IROS.2015.7354276. Online publication date: 28-Sep-2015.
• (2015) Convergence of Rprop and variants. Neurocomputing, 159(C), 90-95. doi:10.1016/j.neucom.2015.02.016. Online publication date: 2-Jul-2015.
• (2011) AVLR-EBP. Neural Processing Letters, 33(2), 201-214. doi:10.1007/s11063-011-9173-1. Online publication date: 1-Apr-2011.
• (2010) Standard additive fuzzy system for stock price forecasting. Proceedings of the Second International Conference on Intelligent Information and Database Systems: Part II, 279-288. doi:10.5555/1894808.1894842. Online publication date: 24-Mar-2010.
• (2010) Artificial neural network-based system for PET volume segmentation. Journal of Biomedical Imaging, 2010, 1-11. doi:10.1155/2010/105610. Online publication date: 1-Jan-2010.
• (2009) Self-scaled conjugate gradient training algorithms. Neurocomputing, 72(13-15), 3000-3019. doi:10.1016/j.neucom.2009.04.006. Online publication date: 1-Aug-2009.
