Improving the convergence of the backpropagation algorithm using learning rate adaptation methods

Published: 01 October 1999
Abstract

    No abstract available.

    References

    [1]
    Altman, M. (1961). Connection between gradient methods and Newton's method for functionals. Bull. Acad. Polon. Sci. Ser. Sci. Math. Astronom. Phys., 9, 877-880.
    [2]
    Armijo, L. (1966). Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics, 16, 1-3.
    [3]
    Battiti, R. (1989). Accelerated backpropagation learning: Two optimization methods. Complex Systems, 3, 331-342.
    [4]
    Battiti, R. (1992). First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4, 141-166.
    [5]
Becker, S., & Le Cun, Y. (1988). Improving the convergence of back-propagation learning with second order methods. In D. S. Touretzky, G. E. Hinton, & T. J. Sejnowski (Eds.), Proceedings of the 1988 Connectionist Models Summer School (pp. 29-37). San Mateo, CA: Morgan Kaufmann.
    [6]
    Booth, A. (1949). An application of the method of steepest descent to the solution of systems of nonlinear simultaneous equations. Quart. J. Mech. Appl. Math., 2, 460-468.
    [7]
    Cauchy, A. (1847). Méthode générale pour la résolution des systèmes d'équations simultanées. Comp. Rend. Acad. Sci. Paris, 25, 536-538.
    [8]
Chan, L. W., & Fallside, F. (1987). An adaptive training algorithm for back-propagation networks. Computer Speech and Language, 2, 205-218.
    [9]
    Darken, C., Chiang, J., & Moody, J. (1992). Learning rate schedules for faster stochastic gradient search. In Proceedings of the IEEE 2nd Workshop on Neural Networks for Signal Processing (pp. 3-12).
    [10]
    Demuth, H., & Beale, M. (1992). Neural network toolbox user's guide. Natick, MA: MathWorks.
    [11]
    Dennis, J. E., & Moré, J. J. (1977). Quasi-Newton methods, motivation and theory. SIAM Review, 19, 46-89.
    [12]
    Dennis, J. E., & Schnabel, R. B. (1983). Numerical methods for unconstrained optimization and nonlinear equations. Englewood Cliffs, NJ: Prentice-Hall.
    [13]
Fahlman, S. E. (1989). Faster-learning variations on back-propagation: An empirical study. In D. S. Touretzky, G. E. Hinton, & T. J. Sejnowski (Eds.), Proceedings of the 1988 Connectionist Models Summer School (pp. 38-51). San Mateo, CA: Morgan Kaufmann.
    [14]
Fakotakis, N., & Sirigos, J. (1996). A high-performance text-independent speaker recognition system based on vowel spotting and neural nets. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, 661-664.
    [15]
Fakotakis, N., & Sirigos, J. (forthcoming). A high-performance text-independent speaker identification and verification system based on vowel spotting and neural nets. IEEE Transactions on Speech and Audio Processing.
    [16]
Fisher, W., Zue, V., Bernstein, J., & Pallett, D. (1987). An acoustic-phonetic data base. Journal of the Acoustical Society of America, Suppl. A, 81, 581-592.
    [17]
    Goldstein, A. A. (1962). Cauchy's method of minimization. Numerische Mathematik, 4, 146-150.
    [18]
Gori, M., & Tesi, A. (1992). On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 76-85.
    [19]
    Hirose, Y., Yamashita, K., & Hijiya, S. (1991). Back-propagation algorithm which varies the number of hidden units. Neural Networks, 4, 61-66.
    [20]
Hoehfeld, M., & Fahlman, S. E. (1992). Learning with limited numerical precision using the cascade-correlation algorithm. IEEE Transactions on Neural Networks, 3, 602-611.
    [21]
Hsin, H.-C., Li, C.-C., Sun, M., & Sclabassi, R. J. (1995). An adaptive training algorithm for back-propagation neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 25, 512-514.
    [22]
    Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295-307.
    [23]
    Kelley, C. T. (1995). Iterative methods for linear and nonlinear equations. Philadelphia: SIAM.
    [24]
Kung, S. Y., Diamantaras, K., Mao, W. D., & Taur, J. S. (1991). Generalized perceptron networks with nonlinear discriminant functions. In R. J. Mammone & Y. Y. Zeevi (Eds.), Neural networks theory and applications (pp. 245-279). New York: Academic Press.
    [25]
    Le Cun, Y., Simard, P. Y., & Pearlmutter, B. A. (1993). Automatic learning rate maximization by on-line estimation of the Hessian's eigenvectors. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 156-163). San Mateo, CA: Morgan Kaufmann.
    [26]
    Lee, Y., Oh, S.-H., & Kim, M. W. (1993). An analysis of premature saturation in backpropagation learning. Neural Networks, 6, 719-728.
    [27]
Lisboa, P. J. G., & Perantonis, S. J. (1991). Complete solution of the local minima in the XOR problem. Network, 2, 119-124.
    [28]
Magoulas, G. D., Vrahatis, M. N., & Androulakis, G. S. (1996). A new method in neural network supervised training with imprecision. In Proceedings of the IEEE 3rd International Conference on Electronics, Circuits and Systems (pp. 287-290).
    [29]
    Magoulas, G. D., Vrahatis, M. N., & Androulakis, G. S. (1997). Effective back-propagation with variable stepsize. Neural Networks, 10, 69-82.
    [30]
    Magoulas, G. D., Vrahatis, M. N., Grapsa, T. N., & Androulakis, G. S. (1997). Neural network supervised training based on a dimension reducing method. In S. W. Ellacot, J. C. Mason, & I. J. Anderson (Eds.), Mathematics of neural networks: Models, algorithms and applications (pp. 245-249). Norwell, MA: Kluwer.
    [31]
Møller, M. F. (1993). A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 6, 525-533.
    [32]
    Nocedal, J. (1991). Theory of algorithms for unconstrained optimization. Acta Numerica, 199-242.
    [33]
    Ortega, J. M., & Rheinboldt, W. C. (1970). Iterative solution of nonlinear equations in several variables. New York: Academic Press.
    [34]
    Parker, D. B. (1987). Optimal algorithms for adaptive networks: Second order back-propagation, second order direct propagation, and second order Hebbian learning. In Proceedings of the IEEE International Conference on Neural Networks, 2, 593-600.
    [35]
Parlos, A. G., Fernandez, B., Atiya, A. F., Muthusami, J., & Tsai, W. K. (1994). An accelerated learning algorithm for multilayer perceptron networks. IEEE Transactions on Neural Networks, 5, 493-497.
    [36]
Pearlmutter, B. (1992). Gradient descent: Second-order momentum and saturating error. In J. E. Moody, S. J. Hanson, & R. P. Lippmann (Eds.), Advances in neural information processing systems, 4 (pp. 887-894). San Mateo, CA: Morgan Kaufmann.
    [37]
Pfister, M., & Rojas, R. (1993). Speeding-up backpropagation--A comparison of orthogonal techniques. In Proceedings of the Joint Conference on Neural Networks (pp. 517-523). Nagoya, Japan.
    [38]
    Riedmiller, M. (1994). Advanced supervised learning in multi-layer perceptrons--From backpropagation to adaptive learning algorithms. International Journal of Computer Standards and Interfaces, special issue, 5.
    [39]
Riedmiller, M., & Braun, H. (1993). A direct adaptive method for faster backpropagation learning: The Rprop algorithm. In Proceedings of the IEEE International Conference on Neural Networks (pp. 586-591). San Francisco, CA.
    [40]
    Rigler, A. K., Irvine, J. M., & Vogl, T. P. (1991). Rescaling of variables in back-propagation learning. Neural Networks, 4, 225-229.
    [41]
    Rojas, R. (1996). Neural networks: A systematic introduction. Berlin: Springer-Verlag.
    [42]
    Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 318-362). Cambridge, MA: MIT Press.
    [43]
    Schaffer, J., Whitley, D., & Eshelman, L. (1992). Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Proceedings of the International Workshop on Combinations of Genetic Algorithms and Neural Networks (pp. 1-37). Los Alamitos, CA: IEEE Computer Society Press.
    [44]
    Shultz, G. A., Schnabel, R. B., & Byrd, R. H. (1982). A family of trust region based algorithms for unconstrained minimization with strong global convergence properties (Tech. Rep. No. CU-CS216-82). University of Colorado.
    [45]
    Silva, F., & Almeida, L. (1990). Acceleration techniques for the back-propagation algorithm. Lecture Notes in Computer Science, 412, 110-119.
    [46]
    Sirigos, J., Darsinos, V., Fakotakis, N., & Kokkinakis, G. (1996). Vowel/nonvowel decision using neural networks and rules. In Proceedings of the 3rd IEEE International Conference on Electronics, Circuits, and Systems (pp. 510-513).
    [47]
Sirigos, J., Fakotakis, N., & Kokkinakis, G. (1995). A comparison of several speech parameters for speaker independent speech recognition and speaker recognition. In Proceedings of the 4th European Conference on Speech Communication and Technology.
    [48]
    Van der Smagt, P. P. (1994). Minimization methods for training feedforward neural networks. Neural Networks, 7, 1-11.
    [49]
Vogl, T. P., Mangis, J. K., Rigler, A. K., Zink, W. T., & Alkon, D. L. (1988). Accelerating the convergence of the backpropagation method. Biological Cybernetics, 59, 257-263.
    [50]
Watrous, R. L. (1987). Learning algorithms for connectionist networks: Applied gradient methods of nonlinear optimization. In Proceedings of the IEEE International Conference on Neural Networks, 2, 619-627.
    [51]
Wessels, L. F., & Barnard, E. (1992). Avoiding false local minima by proper initialization of connections. IEEE Transactions on Neural Networks, 3, 899-905.
    [52]
    Wolfe, P. (1969). Convergence conditions for ascent methods. SIAM Review, 11, 226-235.
    [53]
    Wolfe, P. (1971). Convergence conditions for ascent methods. II: Some corrections. SIAM Review, 13, 185-188.


    Published In

Neural Computation, Volume 11, Issue 7
    Oct. 1, 1999
    296 pages
ISSN: 0899-7667

    Publisher

    MIT Press

    Cambridge, MA, United States


    Qualifiers

    • Article


    Cited By

• (2019) Survey on Brain-Computer Interface. ACM Computing Surveys, 52(1), 1-32. doi:10.1145/3297713. Online publication date: 13-Feb-2019.
• (2017) Set-Membership Type-1 Fuzzy Logic System Applied to Fault Classification in a Switch Machine. IEEE Transactions on Intelligent Transportation Systems, 18(10), 2703-2712. doi:10.1109/TITS.2017.2659620. Online publication date: 29-Sep-2017.
• (2017) An enhanced receiver for an impulsive UWB-based PLC system for low-bit rate applications. Digital Signal Processing, 70(C), 145-154. doi:10.1016/j.dsp.2017.08.003. Online publication date: 1-Nov-2017.
• (2016) On the interplay of network structure and gradient convergence in deep learning. 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 488-495. doi:10.1109/ALLERTON.2016.7852271. Online publication date: 27-Sep-2016.
• (2015) Trunk stabilization of multi-legged robots using on-line learning via a NARX neural network compensator. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 6298-6303. doi:10.1109/IROS.2015.7354276. Online publication date: 28-Sep-2015.
• (2015) Convergence of Rprop and variants. Neurocomputing, 159(C), 90-95. doi:10.1016/j.neucom.2015.02.016. Online publication date: 2-Jul-2015.
• (2011) AVLR-EBP. Neural Processing Letters, 33(2), 201-214. doi:10.1007/s11063-011-9173-1. Online publication date: 1-Apr-2011.
• (2010) Standard additive fuzzy system for stock price forecasting. Proceedings of the Second International Conference on Intelligent Information and Database Systems: Part II, 279-288. doi:10.5555/1894808.1894842. Online publication date: 24-Mar-2010.
• (2010) Artificial neural network-based system for PET volume segmentation. Journal of Biomedical Imaging, 2010, 1-11. doi:10.1155/2010/105610. Online publication date: 1-Jan-2010.
• (2009) Self-scaled conjugate gradient training algorithms. Neurocomputing, 72(13-15), 3000-3019. doi:10.1016/j.neucom.2009.04.006. Online publication date: 1-Aug-2009.
