SFLA-Based Gene Selection Approach for Improving Cancer Classification Accuracy

Document Type: Research Article

Authors

Department of Computer Engineering, University of Zanjan, Zanjan, Iran

Abstract

In this paper, we propose a new gene selection algorithm, called SFLA-FS, based on the Shuffled Frog Leaping Algorithm (SFLA), with the aim of improving cancer classification accuracy. Most biological datasets, such as cancer datasets, contain a large number of genes but only a few samples, and most of these genes are not useful for tasks such as cancer classification. Selecting an appropriate subset of genes is therefore important in bioinformatics and machine learning. The proposed method combines the advantages of filter and wrapper methods for gene subset selection. SFLA-FS consists of two phases: in the first phase, a filter method ranks the genes of the high-dimensional microarray data, and in the second phase, SFLA is applied for gene selection. The performance of SFLA-FS is evaluated for cancer classification on seven standard microarray cancer datasets, and the experimental results are compared with those obtained from several well-known existing gene selection algorithms. The results show that SFLA-FS selects markedly smaller gene subsets while achieving high cancer classification accuracy.
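
The abstract describes SFLA-FS only at the level of its two phases, so the following Python sketch is illustrative rather than the authors' implementation. Everything specific in it is a hypothetical choice: the filter criterion (a two-class Fisher-style score), the wrapper classifier (leave-one-out 1-nearest-neighbour accuracy), the small subset-size penalty, the binary "leap" rule, and all parameter values. The search itself follows the general SFLA scheme: frogs (here, bit masks over the top-ranked genes) are sorted by fitness and dealt into memeplexes; within each memeplex the worst frog leaps toward the memeplex best, then toward the global best, and is otherwise replaced by a random frog; the memeplexes are then shuffled back into one population.

```python
import random
from statistics import mean, pstdev


def fisher_scores(X, y):
    """Phase 1 filter (assumed criterion): two-class Fisher-style score per gene."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12))
    return scores


def loo_1nn_accuracy(X, y, genes):
    """Wrapper fitness (assumed classifier): leave-one-out 1-NN accuracy on `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((X[i][g] - X[k][g]) ** 2 for g in genes))
        correct += y[nearest] == y[i]
    return correct / len(X)


def sfla_select(X, y, top_k=20, n_frogs=20, n_memeplexes=4,
                local_steps=5, generations=10, alpha=0.01, seed=0):
    """Hypothetical binary SFLA over the top_k filter-ranked genes."""
    rng = random.Random(seed)
    scores = fisher_scores(X, y)
    # Phase 1: keep only the top_k genes by filter score.
    candidates = sorted(range(len(scores)), key=lambda j: -scores[j])[:top_k]

    def decode(mask):
        return [g for g, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        # Accuracy minus a small (assumed) penalty on subset size.
        return loo_1nn_accuracy(X, y, decode(mask)) - alpha * sum(mask) / top_k

    def random_frog():
        mask = [rng.randint(0, 1) for _ in range(top_k)]
        return (fitness(mask), mask)

    def leap(worst, guide):
        # Assumed binary leap: copy each differing bit from the guide with probability r.
        r = rng.random()
        return [g if w != g and rng.random() < r else w for w, g in zip(worst, guide)]

    # Phase 2: SFLA search; each frog is a (fitness, bit mask) pair.
    frogs = [random_frog() for _ in range(n_frogs)]
    for _ in range(generations):
        frogs.sort(key=lambda f: f[0], reverse=True)
        global_best = frogs[0]
        # Deal sorted frogs into memeplexes: frog i goes to memeplex i mod m.
        memeplexes = [frogs[m::n_memeplexes] for m in range(n_memeplexes)]
        for plex in memeplexes:
            for _ in range(local_steps):
                plex.sort(key=lambda f: f[0], reverse=True)
                worst = plex[-1]
                # Leap toward the memeplex best, then toward the global best.
                for guide in (plex[0][1], global_best[1]):
                    mask = leap(worst[1], guide)
                    fit = fitness(mask)
                    if fit > worst[0]:
                        plex[-1] = (fit, mask)
                        break
                else:
                    plex[-1] = random_frog()  # no improvement: replace with a random frog
        # Shuffle: merge the memeplexes back into one population.
        frogs = [f for plex in memeplexes for f in plex]
    best_fit, best_mask = max(frogs, key=lambda f: f[0])
    return decode(best_mask), best_fit


# Usage on synthetic data: genes 3 and 7 are made informative (illustrative only).
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[data_rng.gauss(0, 1) for _ in range(50)] for _ in y]
for row, lab in zip(X, y):
    row[3] += 3 * lab
    row[7] -= 3 * lab
genes, fit = sfla_select(X, y)
print("selected genes:", genes, "fitness:", round(fit, 3))
```

Only the genes that survive the filter phase enter the SFLA search, which keeps each frog short and bounds the number of expensive wrapper evaluations; this is the usual motivation for pairing a cheap filter with a wrapper on high-dimensional microarray data.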
