References

GOFAI (Good Old-Fashioned Artificial Intelligence)

Supplementary reading:

M.A. Arbib, 1989, The Metaphorical Brain 2: Neural Networks and Beyond, Wiley-Interscience.

M.A. Arbib, Ed., 1995, The Handbook of Brain Theory and Neural Networks, MIT Press (paperback).

M.A. Arbib and J. Grethe, Eds., and the Project Team of the University of Southern California Brain Project, 2001, Computing the Brain: A Guide to Neuroinformatics, San Diego: Academic Press.

A. Weitzenfeld, M.A. Arbib and A. Alexander, 2000, NSL Neural Simulation Language, MIT Press (in press). [http://www-hbp.usc.edu/_Documentation/NSL/Book/TOC.htm]

Reinforcement Learning

Baxter, J., Tridgell, A., and Weaver, L. (1998). KnightCap: A chess program that learns by combining TD(λ) with game-tree search. In Proceedings of the Fifteenth International Conference on Machine Learning, pp. 28-36.

Bertsekas, D. P., and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.

Crites, R. H., and Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 8, pp. 1017-1023. MIT Press, Cambridge, MA.

McCallum, A. K. (1995). Reinforcement Learning with Selective Perception and Hidden State. Ph.D. thesis, University of Rochester.

Nie, J., and Haykin, S. (1996). A dynamic channel assignment policy through Q-learning. CRL Report 334. Communications Research Laboratory, McMaster University, Hamilton, Ontario.

Precup, D., and Sutton, R. S. (1998). Multi-time models for temporally abstract planning. In Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, MA.

Singh, S. P., and Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems 9, pp. 974-980. MIT Press, Cambridge, MA.

Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44.

Sutton, R. S., and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.

Sutton, R. S., Precup, D., and Singh, S. (1998). Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales. Technical Report 98-74, Department of Computer Science, University of Massachusetts.

Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58-68.

Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. thesis, Cambridge University.

Zhang, W., and Dietterich, T. G. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In Advances in Neural Information Processing Systems 8, pp. 1024-1030. MIT Press, Cambridge, MA.