Kilho Shin, Tetsuji Kuboyama, Haruhiko Nishimura
Mendel, 570-575, Jan 1, 2012
CWC (Combination of Weakest Components), a successful consistency-based feature selection algorithm, is known to outperform not only other consistency-based algorithms but also many non-consistency-based filter algorithms in terms of time-efficiency and predictive accuracy. In this paper, we propose two methods for developing a new consistency-based feature selection algorithm based on CWC: one improves the predictive accuracy of CWC, and the other supports the requirement of multiple outputs. With respect to predictive accuracy, CWC is a backward elimination algorithm, and we found through intensive experiments that the order in which features are examined for elimination can affect its performance to a large extent. We identified the best of the three orders that we investigated. With respect to multiple outputs, when we consider how feature selection is performed in the real world, we see that its role is to provide candidates for the feature set that best explains the target problem, and that the actual relevance of these candidates should be examined separately in a manner specific to the problem. In this sense, it is desirable for a feature selection algorithm to output multiple feature sets, but CWC can output only a single feature set per dataset. Our idea for solving this problem is to run CWC multiple times, taking advantage of its excellent time-efficiency, and we propose an algorithm for iterative execution of CWC that uses the outputs of the previous executions as feedback.
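
To make the two ideas in the abstract concrete, the following is a minimal sketch of consistency-based backward elimination and of repeated runs that feed earlier outputs back into the examination order. It is not the authors' implementation. The function names (`is_consistent`, `backward_eliminate`, `iterate_with_feedback`) are hypothetical. The consistency test used here (no two instances agree on the selected features while differing in label) is an assumption about the criterion. The feedback rule, which moves previously selected features to the front of the order so that they are examined for elimination first, is an illustrative assumption and not the scheme proposed in the paper.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


def is_consistent(rows: Sequence[Row], labels: Sequence[int],
                  features: Sequence[int]) -> bool:
    """Return True if no two rows agree on `features` but differ in label."""
    seen: Dict[Tuple[int, ...], int] = {}
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        # setdefault stores the first label seen for this key and returns it.
        if seen.setdefault(key, label) != label:
            return False
    return True


def backward_eliminate(rows: Sequence[Row], labels: Sequence[int],
                       order: Sequence[int]) -> List[int]:
    """Examine features in `order`; drop one only if consistency is preserved."""
    selected = list(order)
    for f in order:
        trial = [g for g in selected if g != f]
        if is_consistent(rows, labels, trial):
            selected = trial
    return sorted(selected)


def iterate_with_feedback(rows: Sequence[Row], labels: Sequence[int],
                          order: Sequence[int], runs: int) -> List[List[int]]:
    """Run the elimination repeatedly, collecting distinct feature sets.

    Assumed feedback rule (hypothetical): features selected by the previous
    run are examined first in the next run, so they are the first candidates
    for elimination.
    """
    results: List[List[int]] = []
    current = list(order)
    for _ in range(runs):
        selected = backward_eliminate(rows, labels, current)
        if selected in results:
            break  # no new feature set was produced
        results.append(selected)
        current = selected + [f for f in current if f not in selected]
    return results


if __name__ == "__main__":
    # Toy data: features 0 and 1 both copy the label, feature 2 is noise.
    rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    labels = [0, 0, 1, 1]
    print(backward_eliminate(rows, labels, [0, 1, 2]))
    print(backward_eliminate(rows, labels, [1, 0, 2]))
    print(iterate_with_feedback(rows, labels, [0, 1, 2], runs=3))
```

On the toy data, changing the examination order changes which of the two redundant features survives, which illustrates why the order matters. The iterative wrapper returns each distinct surviving set once and stops as soon as a run repeats an earlier output.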