Interview with Prof. Xin Yao of the University of Birmingham, co-chairman of the 2017 IEEE Symposium on Computational Intelligence and Ensemble Learning (Interviewed by Dr. David Fogel, co-chair, 2017 IEEE Symposium Series on Computational Intelligence.)
Prof. Yao (IEEE Fellow) has been researching areas in computational intelligence since before I first met him in 1993. His interests include evolutionary computation, neural networks, meta-heuristic algorithms, data mining, and ensemble learning. Prof. Yao serves as co-chairman of the 2017 IEEE Symposium on Computational Intelligence and Ensemble Learning, to be held as part of the 2017 IEEE Symposium Series on Computational Intelligence in Honolulu, HI, Nov. 27-Dec. 1. I asked Prof. Yao to help identify the key issues in ensemble learning.
DF: What is ensemble learning and how does it differ from other approaches?
XY: An ensemble refers to a collection of machine learners. Ensemble learning studies the algorithms and techniques that are used to construct and train such a collection of learners so that they can perform a task collaboratively. This is different from the monolithic approach where the focus is on training a single learner. My personal belief is that as problems grow larger and more complex, it’ll be increasingly difficult to construct and train a monolithic learner. The divide-and-conquer strategy has to be used in such cases. Ensemble learning could be regarded as an automatic approach towards divide-and-conquer since it uses a collection of learners to perform a complex task. Of course, how to do the “division” automatically is an extremely interesting question. There are many techniques around, including those from the field of evolutionary computation (see an early paper, P. J. Darwen and X. Yao, “Speciation as automatic categorical modularization,” IEEE Transactions on Evolutionary Computation, 1(2):101-108, 1997.)
DF: [Smiling] OK, so I remember that paper from 20 years ago. I was editor-in-chief of the IEEE Transactions on Evolutionary Computation at the time so I have to remember, right? All right, where is ensemble learning used to best effect?
XY: Ensemble learning is most likely to be useful if we cannot find a perfect learner that is always correct. In this case, we can use a collection of imperfect learners to construct a stronger ensemble, provided that the errors made by different individual learners are negatively correlated or at least mutually independent. Our 1997 paper demonstrated that we could come up with a stronger ensemble player by combining different game-playing strategies. There have been many combination methods, linear or nonlinear, proposed over the years for different application scenarios. Leo Breiman has some very nice statistical discussions of ensembles, which I highly recommend to everyone. In terms of problem domains, ensemble learning has been used widely, especially for regression problems. It has been used in supervised learning, semi-supervised learning, and unsupervised learning. It has been used in online learning of data streams with concept drift, class imbalance learning, and reinforcement learning. It has also been used in learning to optimize by adaptively selecting search operators. The idea behind ensemble learning is closely linked to that behind algorithm portfolios in optimization. In fact, there has been some initial work transferring ideas from ensemble learning to algorithm portfolios. (K. Tang, P. Yang and X. Yao, “Negatively Correlated Search,” IEEE Journal on Selected Areas in Communications, 34(3):542-550, March 2016.)
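To make the intuition about independent errors concrete, here is a minimal, illustrative Python sketch (not taken from the interview or the cited papers; the data and noise model are invented for illustration). It simulates several imperfect regressors whose errors are roughly independent and shows that their simple average has a lower mean-squared error than the best individual:

```python
# Illustrative sketch: averaging imperfect learners with (roughly) independent
# errors yields a lower mean-squared error than any single learner.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_learners = 1000, 10

y = np.sin(np.linspace(0, 2 * np.pi, n_samples))          # target values
# Each "learner" predicts the target plus its own independent noise (its errors).
predictions = y + rng.normal(scale=0.5, size=(n_learners, n_samples))

mse_individual = ((predictions - y) ** 2).mean(axis=1)    # per-learner MSE
mse_ensemble = ((predictions.mean(axis=0) - y) ** 2).mean()

print("best single-learner MSE:", mse_individual.min())
print("simple-average ensemble MSE:", mse_ensemble)       # roughly noise_var / n_learners
```

With independent, zero-mean errors of equal variance, the variance of the averaged prediction shrinks roughly by a factor equal to the number of learners, which is why the ensemble MSE comes out well below the best individual MSE; negatively correlated errors would reduce it even further.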
DF: If you had to choose just one example of ensemble learning to point to a successful application, which one would it be?
XY: No single one stands out immediately for me, because there are so many that are so successful. However, let me mention the $1 million winner of the Netflix Prize back in 2009: “A key to achieving highly competitive results on the Netflix data is usage of sophisticated blending schemes, which combine the multiple individual predictors into a single final solution.” (Y. Koren, “The BellKor Solution to the Netflix Grand Prize,” (2009). URL: http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf. Accessed on 17/1/2017.)
DF: You have a strong background in evolutionary algorithms, including being the second editor-in-chief of IEEE Transactions on Evolutionary Computation (!). How did you get into ensemble learning?
XY: Ensemble learning and evolutionary computation share more similarities than differences. Both emphasize a population/collection of individuals/learners. Both emphasize diversity. In fact, my evolutionary computation background helped me a lot in getting into and understanding ensemble learning. I first got into ensemble learning when I was using evolutionary programming to evolve neural networks (X. Yao and Y. Liu, “A new evolutionary system for evolving artificial neural networks,” IEEE Transactions on Neural Networks, 8(3):694-713, May 1997.). Then it became kind of obvious that “two heads are better than one” (or, a population of heads is better than a population of one head). We reported some evidence to support that (X. Yao and Y. Liu, “Making use of population information in evolutionary artificial neural networks,” IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, 28(3):417-425, June 1998), which also proposed, for the first time, an evolutionary algorithm for selecting a subset of a population as an ensemble. My involvement with ensemble learning has grown from there and continues today.
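As a rough illustration of the subset-selection idea (a hypothetical sketch, not the algorithm from the 1998 paper), one can evolve bit-strings that mark which members of a trained population are included in the ensemble, using the validation error of the averaged subset as the fitness:

```python
# Hypothetical sketch of evolutionary ensemble-member selection: each bit of a
# candidate says whether a trained learner is included; fitness is the
# validation MSE of the simple average of the selected learners.
import numpy as np

rng = np.random.default_rng(1)
n_learners, n_val = 20, 200

y_val = np.sin(np.linspace(0, 2 * np.pi, n_val))                  # validation targets
preds = y_val + rng.normal(scale=0.6, size=(n_learners, n_val))   # learners' validation predictions

def fitness(mask):
    """MSE of the averaged predictions of the selected learners (lower is better)."""
    if mask.sum() == 0:
        return np.inf
    return ((preds[mask].mean(axis=0) - y_val) ** 2).mean()

# Simple (mu + lambda)-style loop with bit-flip mutation.
pop = rng.integers(0, 2, size=(10, n_learners)).astype(bool)
for _ in range(100):
    parents = pop[rng.integers(0, len(pop), size=len(pop))]
    children = parents ^ (rng.random(parents.shape) < 1.0 / n_learners)  # bit-flip mutation
    both = np.vstack([pop, children])
    both = both[np.argsort([fitness(m) for m in both])]
    pop = both[:10]                                                      # keep the best 10

best = pop[0]
print("selected", best.sum(), "of", n_learners, "learners; validation MSE:", fitness(best))
```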
DF: What are you working on personally in this area?
XY: Diversity is a key topic I’ve worked on closely, partly because it’s a key issue in ensemble learning, partly because it is related to but different from the diversity in evolutionary computation. At this moment, I’m focusing on ensemble approaches to online learning with concept drift and class imbalance learning, including formulating such learning as multi-objective learning problems.
DF: What led you to this work?
XY: It is fairly intuitive why diversity is essential. When we say “two heads are better than one,” we implicitly assume that the two heads think differently. Two positively correlated heads won’t provide much more useful information than either one of them. Such intuition about diversity can be analyzed and understood, at least for regression problems, based on the decomposition of the ensemble error into bias, variance, and covariance terms, where the covariance term can be seen as the diversity. This is more or less the idea behind negative correlation learning.
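For a simple averaging ensemble of M regressors f_1, ..., f_M, a standard form of the decomposition Prof. Yao refers to (written out here for reference; the exact formulation is not quoted from the interview) is:

```latex
% Bias-variance-covariance decomposition of the expected squared error of a
% simple averaging ensemble \bar{f} = \frac{1}{M}\sum_{i=1}^{M} f_i.
\[
  \mathbb{E}\!\left[(\bar{f} - y)^2\right]
    = \overline{\mathrm{bias}}^{\,2}
    + \frac{1}{M}\,\overline{\mathrm{var}}
    + \left(1 - \frac{1}{M}\right)\overline{\mathrm{covar}},
\]
where
\[
  \overline{\mathrm{bias}} = \frac{1}{M}\sum_{i}\bigl(\mathbb{E}[f_i] - y\bigr),\qquad
  \overline{\mathrm{var}} = \frac{1}{M}\sum_{i}\mathbb{E}\bigl[(f_i - \mathbb{E}[f_i])^2\bigr],
\]
\[
  \overline{\mathrm{covar}} = \frac{1}{M(M-1)}\sum_{i}\sum_{j \neq i}
      \mathbb{E}\bigl[(f_i - \mathbb{E}[f_i])(f_j - \mathbb{E}[f_j])\bigr].
\]
% A negative covariance term (i.e., diverse learners) pushes the ensemble error
% below what the individual variances alone would allow, which is the intuition
% behind negative correlation learning.
```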
DF: What do you think someone not working directly in ensemble learning would gain by attending your symposium at 2017 IEEE SSCI?
XY: One could find out how and why ensembles can be used to build a better and more robust learner than any single learner. One could also find out how the same idea can be used to build algorithm portfolios in optimization. Instead of including arbitrary individual algorithms, or the best ones in the literature, in a portfolio, there are techniques we can use to construct algorithm portfolios that are even stronger than the best single algorithm. (K. Tang, F. Peng, G. Chen and X. Yao, “Population-based Algorithm Portfolios with automated constituent algorithms selection,” Information Sciences, 279:94-104, 20 September, 2014). A participant in the symposium could also find out what has not been done, or has not been done well, in this area, and so identify opportunities for further advances.
Contact information:
Xin Yao: University of Birmingham, UK, x.yao@cs.bham.ac.uk. https://www.cs.bham.ac.uk/~xin/
David Fogel: Natural Selection, Inc., 6480 Weathers Pl., Suite 350, San Diego, CA 92121, dfogel@natural-selection.com. (858) 455-6449. www.natural-selection.com
The 2017 IEEE Symposium Series on Computational Intelligence can be found at: http://www.ele.uri.edu/ieee-ssci2017/.
See the references contained in the interview, and also:
James M. Keller, Derong Liu, and David B. Fogel, Fundamentals of Computational Intelligence: Neural Networks, Fuzzy Logic, and Evolutionary Computation, John Wiley, NY, 2016.
© 2017, David Fogel