IETE Journal of Research


 
ARTICLE
Year : 2009  |  Volume : 55  |  Issue : 4  |  Page : 162-168

New Entropy-Based Method for Gene Selection


1 Department of Electrical and Electronic, Eng. Faculty, University Putra Malaysia - 43400 Serdang, Selangor, Malaysia
2 Cell and Molecular Biology, Biotechnology and Biomolecular Faculty, University Putra Malaysia - 43400 Serdang, Selangor, Malaysia
3 Obstetrics and Gynaecology, Medicine and Health Sciences Faculty, University Putra Malaysia - 43400 Serdang, Selangor, Malaysia
4 Computer and Communication Systems, Eng. Faculty, University Putra Malaysia - 43400 Serdang, Selangor, Malaysia

Date of Web Publication: 23-Sep-2009

Correspondence Address:
Hamid Mahmoodian
Department of Electrical and Electronic, Eng. Faculty, University Putra Malaysia - 43400 Serdang, Selangor
Malaysia

DOI: 10.4103/0377-2063.55985


   Abstract 

Dimension reduction and the selection of a small number of genes with high discriminative ability are important challenges in microarray data analysis. Selecting the top-ranked genes, each of which individually discriminates well between classes, is a traditional approach, but it does not account for redundancy among the genes. Previous results suggest that a subset of genes with a low degree of redundancy can represent the targeted classes more comprehensively than a subset containing redundant genes. In this paper, we use Shannon entropy and penalized logistic regression (PLR), as a probability estimator, to present a new algorithm for dimension reduction that collects a subset of representative genes from a gene expression profile. Breast cancer, leukemia, colon and lung datasets were classified by a PLR classifier using the genes chosen by the proposed gene selection algorithm. In most cases, the results compare well with those of other recent studies.

Keywords: Gene selection, Penalized logistic regression, Shannon theory.


How to cite this article:
Mahmoodian H, Marhaban M H, Rahim R A, Rosli R, Saripan I. New Entropy-Based Method for Gene Selection. IETE J Res 2009;55:162-8

How to cite this URL:
Mahmoodian H, Marhaban M H, Rahim R A, Rosli R, Saripan I. New Entropy-Based Method for Gene Selection. IETE J Res [serial online] 2009 [cited 2013 May 25];55:162-8. Available from: http://www.jr.ietejournals.org/text.asp?2009/55/4/162/55985


   1. Introduction


One of the fundamental challenges in bioinformatics pattern analysis is dealing with large datasets. Advances in microarray technology have made it possible to measure the expression levels of thousands of genes in a single experiment. Multiple techniques are available to analyze a given gene expression profile. A common characteristic of these techniques is that they select a subset of genes as features to overcome the curse of dimensionality. The optimality of a gene subset can only be evaluated on the available training and independent data, which provide only a rough approximation of the true data distribution.

In gene expression analysis studies, many gene selection methods have been developed to classify objects. These methods are divided into univariate and multivariate gene selection. The former considers genes individually and independently from a statistical perspective, whereas the latter considers the interactions between genes.

Examples of univariate approaches include t-score-based statistics, which sort genes by their t-test values [1],[2],[3],[4], and the maximum likelihood ratio approach, which ranks genes from most to least discriminating between two classes [5].

Multivariate approaches consider the interactions between genes and have been developed using methods such as PCA (Principal Component Analysis) or SVD (Singular Value Decomposition), which explicitly exploit the high-dimensional nature of the gene expression space. Some previous studies used SVD to reduce the dimension of gene expression profiles with minimal loss of information [6].

In recent studies, researchers have proposed many new gene selection techniques. Blazadonakis proposed a method combining support vector machines and neural networks [7]. Y. Liu used the wavelet transform to analyze DNA microarray data [8], and S. Li showed that a hybrid of Particle Swarm Optimization and a Genetic Algorithm (PSO/GA) can be used for gene selection [9].

One common univariate approach is to select the top-ranked genes, usually ranked by their discriminative power. However, in feature selection it has been recognized that combining two highly ranked features does not necessarily produce a better feature subset, because the two features may be redundant. Two features are redundant when the class-discriminative power of either one changes little if the other is removed. Two of the most important disadvantages of redundancy are lower classification efficiency and a less comprehensive representation of the targeted classes than a non-redundant subset of the same size provides [10].

Much research has been done to increase classification power and reduce redundancy. Information theory and mutual information are suitable for quantifying the amount of information in, and the relevance between, genes treated as random variables. However, mutual information is usually difficult to calculate for high-dimensional data. Cai et al. recently presented an algorithm based on mutual information, in which the probability distribution functions of the genes are estimated with Gaussian Parzen windows [11]. Liu proposed an algorithm based on normalized mutual information that optimizes a two-objective cost function: one objective is maximized to increase discriminative power, and the other is minimized to reduce redundancy between genes [12]. The main idea of this algorithm is similar to the one proposed earlier by Ding [13]. In other research [14], Furlanello used Shannon theory to improve gene elimination in the SVM-RFE algorithm [15]. He treated the weight of each gene in SVM-RFE as a random variable and eliminated low-entropy genes to decrease the computational cost of SVM-RFE.

In this paper, a new entropy-based dimension reduction algorithm is proposed that selects a subset of representative genes with high classification power. In this algorithm, we use penalized logistic regression (PLR) to estimate the probability that each gene belongs to a gene pool whose members share a defined common property, and then use Shannon entropy to select representatives of these genes. We thus address the problem of selecting a small subset of representative genes that is adequate to discriminate between the two classes of interest. We tested the algorithm on four cancer datasets (breast cancer, leukemia ALL/AML, colon and lung) to select subsets of representative genes, and used the PLR classifier to classify the tumors. External 10-fold cross-validation was used to assess the validity of the selected genes. The results show that the selected gene subsets discriminate the classes well while containing fewer genes than the subsets selected by other algorithms.

The rest of the paper is organized as follows: the mathematical framework is explained in Section 2, the proposed gene selection algorithm is presented in Section 3, and the results and conclusion are given in Sections 4 and 5, respectively.


   2. Mathematical Framework


2.1 Penalized Logistic Regression

We suppose that there are $m$ samples belonging to two labeled classes, A and B. Let $y_i$ denote the class of the $i$-th sample, with $y_i = 1$ if the $i$-th sample belongs to class A and $y_i = 0$ if it belongs to class B. Let $p_i$ be the probability that $y_i = 1$, and let $x_{ij}$ denote the value of the $j$-th feature in the $i$-th sample. The aim is to find $\alpha$ and $\beta_j$ such that

$$\log\frac{p_i}{1-p_i} = \alpha + \sum_{j=1}^{n} \beta_j x_{ij} \qquad (1)$$

subject to maximizing

$$l(\alpha, \beta) - \frac{\lambda}{2}\sum_{j=1}^{n} \beta_j^2 \qquad (2)$$

where $n$ is the number of features, $\lambda > 0$ is the penalty parameter, and

$$l(\alpha, \beta) = \sum_{i=1}^{m}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

is the log-likelihood [16]. This problem can be solved using Newton-Raphson steps and the iteratively re-weighted ridge regression (IRRR) algorithm [17]. When PLR is used as a classifier, the $i$-th sample is assigned to class A if $p_i > 0.5$ and to class B if $p_i < 0.5$.
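The fitting step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the penalized objective $l(\alpha,\beta) - (\lambda/2)\sum_j \beta_j^2$ with an unpenalized intercept, and the function names `fit_plr`, `predict_proba` and `classify` are hypothetical. Each Newton-Raphson update solves a weighted ridge system, which is the iteratively re-weighted ridge regression view of the same algorithm.

```python
import numpy as np


def fit_plr(X, y, lam=1.0, n_iter=50, tol=1e-8):
    """Ridge-penalized logistic regression fitted by Newton-Raphson.

    Maximizes l(alpha, beta) - (lam / 2) * sum(beta_j ** 2); alpha is not
    penalized. X is (m samples x n features); y holds 0/1 labels.
    """
    m, n = X.shape
    Xa = np.hstack([np.ones((m, 1)), X])       # prepend intercept column
    theta = np.zeros(n + 1)                    # theta = [alpha, beta_1..beta_n]
    penalty = lam * np.eye(n + 1)
    penalty[0, 0] = 0.0                        # leave alpha unpenalized
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ theta))
        grad = Xa.T @ (y - p) - penalty @ theta          # penalized score
        W = p * (1.0 - p)                                # IRLS weights
        hess = Xa.T @ (Xa * W[:, None]) + penalty        # negative Hessian
        step = np.linalg.solve(hess, grad)               # weighted ridge solve
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta[0], theta[1:]


def predict_proba(alpha, beta, X):
    """p_i from the logistic model (1)."""
    return 1.0 / (1.0 + np.exp(-(alpha + X @ beta)))


def classify(alpha, beta, X):
    """Class A (label 1) if p_i > 0.5, otherwise class B (label 0)."""
    return (predict_proba(alpha, beta, X) > 0.5).astype(int)
```

At convergence the penalized score is zero, so the fitted probabilities sum to the number of class-A samples and $X^\top(y - p) = \lambda\beta$.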

2.2 Shannon Theory

Shannon entropy (H) is the amount of information that may be gained by observing a system. It was originally developed by Claude Shannon for use in communication technology. Entropy measures the variation, or uncertainty, in a series of events: if the probability of an event is close to zero or close to one, the event carries little information.

Suppose the probability of occurrence of an event or feature $g$ in the $i$-th of $m$ trials is $p_i$ ($i = 1, 2, \ldots, m$). Then the entropy of feature $g$ is measured as follows (log denotes $\log_2(\cdot)$):



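As an illustration of why probabilities near zero or one carry little information, the following plain-Python sketch assumes that each trial contributes the binary entropy $h(p) = -p\log_2 p - (1-p)\log_2(1-p)$ of a two-outcome event and that these contributions are summed over the $m$ trials. Both the per-trial form and the summation are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-outcome event that occurs with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def feature_entropy(probs):
    """Assumed entropy of a feature: sum of per-trial binary entropies."""
    return sum(binary_entropy(p) for p in probs)


# A feature that is almost certain in every trial has low entropy;
# one that is close to 50/50 in every trial has high entropy.
print(feature_entropy([0.98, 0.97, 0.99]))
print(feature_entropy([0.5, 0.45, 0.55]))
```

Under this form, $h(p) = h(1-p)$, so a probability near zero is treated exactly like a probability near one.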

   3. Gene Selection Algorithm


In the proposed algorithm, we define two gene pools, each containing a small number of genes that share a common property. For instance, genepool1 and genepool2 can consist of the $u$ genes with the highest and the lowest correlation coefficients with the outcome, respectively, or of the $u$ highest- and lowest-ranked genes according to the SVM-RFE algorithm. PLR is then used to estimate the probability that each gene belongs to genepool1. If there are $m$ trials and $p_{ij}$ is the probability for the $j$-th gene in the $i$-th trial, the entropy of this gene is calculated by:



Suppose that the gene expression matrix $G_{n \times m}$ has $n$ genes and $m$ samples, which are classified into two labeled classes, 1 and 0. Let us define the vector $Y \in \mathbb{R}^m$ (the class labels of the samples),





we define PCC(R, Q) as the Pearson correlation coefficient between two vectors R and Q. The algorithm has two parts: gene ranking and gene selection. The first part describes how the genes are sorted, and the second describes how they are selected.
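For reference, PCC(R, Q) is the sample covariance of R and Q divided by the product of their standard deviations. A plain-Python sketch follows; the function name `pcc` and the convention of returning 0 for a constant vector are hypothetical choices made for this illustration.

```python
import math


def pcc(r, q):
    """Pearson correlation coefficient PCC(R, Q) of two equal-length vectors."""
    if len(r) != len(q) or len(r) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mr = sum(r) / len(r)
    mq = sum(q) / len(q)
    cov = sum((a - mr) * (b - mq) for a, b in zip(r, q))
    var_r = sum((a - mr) ** 2 for a in r)
    var_q = sum((b - mq) ** 2 for b in q)
    if var_r == 0 or var_q == 0:
        return 0.0  # hypothetical convention: a constant vector has no correlation
    return cov / math.sqrt(var_r * var_q)


# Correlating one gene's expression values with a binary class-label vector Y.
gene = [2.1, 2.4, 0.3, 0.5, 2.2, 0.1]
Y = [1, 1, 0, 0, 1, 0]
print(round(pcc(gene, Y), 3))
```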

3.1 Gene Ranking

The following steps describe the gene ranking procedure:

  1. Set $i = 1$.
  2. Calculate $T_j = \mathrm{PCC}(g_j^{(-i)}, Y^{(-i)})$ for all $j = 1, 2, \ldots, n$, giving $T \in \mathbb{R}^n$.
  3. Sort $T_j$ in descending order and choose the $u$ highest and the $u$ lowest genes of the sorted $T_j$ as genepool1 and genepool2, respectively.

    Thus genepool1 and genepool2 contain the $u$ genes with the highest and the lowest correlation with the binary class labels when the $i$-th sample is removed.
  4. Create a vector $Z \in \mathbb{R}^{2u}$ that assigns a binary class label to the genes of genepool1 and genepool2.
  5. Using vector $Z$ and the expression values of the genes in genepool1 and genepool2, find $\alpha$ and $\beta_s$ ($s = 1, 2, \ldots, m-1$) in (1). In this step we set $\lambda = 1$ for all datasets.
  6. Use PLR to estimate, for every gene, the probability of belonging to genepool1.
  7. Form the vector $P^{(-i)} \in \mathbb{R}^n$ containing the probabilities that the genes are in genepool1 when the $i$-th sample is left out.
  8. Increase $i$ ($i = i + 1$); if $i \le m$, go to step 2; otherwise, go to the next step.
  9. Form the probability matrix $P_{mat} \in \mathbb{R}^{n \times m}$ whose columns are $P^{(-i)}$ ($i = 1, 2, \ldots, m$).
  10. Calculate the entropy vector for all genes ($H \in \mathbb{R}^n$) by:



  11. Sort the vector $H$ in ascending order and use the top and bottom genes in the gene selection procedure.
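The ranking steps can be sketched end to end as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the helper names are hypothetical, PLR is fitted with a ridge penalty $(\lambda/2)\|\beta\|^2$ by Newton-Raphson, genes act as the observations and the $m-1$ remaining samples as the features in step 5, and the per-gene entropy in step 10 is assumed to be the sum over the $m$ trials of the binary entropy $-p\log_2 p - (1-p)\log_2(1-p)$.

```python
import numpy as np


def _fit_plr(X, y, lam=1.0, n_iter=50):
    """Ridge-penalized logistic regression via Newton-Raphson (intercept unpenalized)."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])
    theta = np.zeros(Xa.shape[1])
    pen = lam * np.eye(Xa.shape[1])
    pen[0, 0] = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ theta))
        grad = Xa.T @ (y - p) - pen @ theta
        hess = Xa.T @ (Xa * (p * (1.0 - p))[:, None]) + pen
        theta = theta + np.linalg.solve(hess, grad)
    return theta  # [alpha, beta_1, ..., beta_k]


def _pcc_rows(G, y):
    """Pearson correlation of every row (gene) of G with the label vector y."""
    Gc = G - G.mean(axis=1, keepdims=True)
    yc = y - y.mean()
    denom = np.linalg.norm(Gc, axis=1) * np.linalg.norm(yc) + 1e-12
    return (Gc @ yc) / denom


def _binary_entropy(P):
    """Assumed per-trial entropy in bits; clipping avoids log2(0)."""
    P = np.clip(P, 1e-12, 1.0 - 1e-12)
    return -(P * np.log2(P) + (1.0 - P) * np.log2(1.0 - P))


def rank_genes(G, Y, u=10, lam=1.0):
    """Entropy-based ranking. G: n genes x m samples; Y: 0/1 labels of length m."""
    n, m = G.shape
    P_mat = np.zeros((n, m))
    for i in range(m):                                # steps 1 and 8: leave sample i out
        keep = np.arange(m) != i
        G_i, Y_i = G[:, keep], Y[keep].astype(float)
        T = _pcc_rows(G_i, Y_i)                       # step 2
        order = np.argsort(-T)                        # step 3: descending correlation
        pools = np.r_[order[:u], order[-u:]]          # genepool1, then genepool2
        Z = np.r_[np.ones(u), np.zeros(u)]            # step 4
        theta = _fit_plr(G_i[pools], Z, lam)          # step 5: genes are the observations
        logits = theta[0] + G_i @ theta[1:]           # step 6: score every gene
        P_mat[:, i] = 1.0 / (1.0 + np.exp(-logits))   # steps 7 and 9
    H = _binary_entropy(P_mat).sum(axis=1)            # step 10 (assumed form)
    return np.argsort(H), H                           # step 11: ascending entropy
```

The first entries of the returned order are the low-entropy (top) genes and the last entries are the high-entropy (bottom) genes.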


In step 2, each sample is left out in turn, which acts as an internal leave-one-out cross-validation and smooths the calculated correlation coefficients between the genes and the class label. Note that an external 10-fold cross-validation is applied to all steps of gene ranking and gene selection. The value of $u$ in step 3 can vary from one to $n/2$. A larger $u$ gives a better probability approximation by PLR, but it increases both the dependence of the probability values on the genes in the gene pools and the computational cost.

Top (low-entropy) genes are considered representatives of genes that, in most trials, belong to genepool1 or genepool2 with high probability (a gene that is in genepool1 with low probability is in genepool2 with high probability, and vice versa). Bottom (high-entropy) genes are considered representatives of genes that, in most trials, belong to genepool1 and genepool2 with almost equal probability. High-entropy genes have high uncertainty in their correlation with the output class, so there is no strong reason to treat them either as noise or as effective markers. By defining two threshold values for the entropy, a low-dimensional subset of genes can be used for classification.

3.2 Gene Selection

Set an initial value, Gen_Num, for the maximum number of selected genes; it should be sufficiently large.

  1. w = 0
  2. Select w gene(s) with highest entropy
  3. v = 1
  4. Select v gene(s) with lowest entropy
  5. Classify the samples with the PLR classifier based on the selected genes (for the current v and w) and calculate the misclassification error.
  6. v = v + 1.

    If v < Gen_Num/2, go to step 4; otherwise, continue.
  7. w = w + 1.

    If w < Gen_Num/2, go to step 2; otherwise, continue.
  8. Select the subset of genes that gives the minimum misclassification error.
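A sketch of the search over w and v follows, again in NumPy and again hypothetical. It takes a gene order from an entropy ranking (lowest entropy first), fits a ridge-penalized logistic regression (penalty form assumed) for each candidate subset, and scores it by the pooled misclassification rate of a k-fold cross-validation, which equals the mean fold error when the folds have equal size. Gen_Num is passed as `gen_num`; the function and helper names are illustrative only.

```python
import numpy as np


def _fit_plr(X, y, lam=1.0, n_iter=50):
    """Ridge-penalized logistic regression via Newton-Raphson (intercept unpenalized)."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])
    theta = np.zeros(Xa.shape[1])
    pen = lam * np.eye(Xa.shape[1])
    pen[0, 0] = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ theta))
        grad = Xa.T @ (y - p) - pen @ theta
        hess = Xa.T @ (Xa * (p * (1.0 - p))[:, None]) + pen
        theta = theta + np.linalg.solve(hess, grad)
    return theta


def _cv_error(X, y, lam=1.0, k=10, seed=0):
    """Misclassification rate of k-fold cross-validated PLR (pooled over folds)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    wrong = 0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        theta = _fit_plr(X[train], y[train], lam)
        logit = theta[0] + X[test] @ theta[1:]
        wrong += int(np.sum((logit > 0).astype(int) != y[test]))  # p > 0.5 <=> logit > 0
    return wrong / len(y)


def select_genes(G, Y, order, gen_num=20, lam=1.0, k=10):
    """Search over w high-entropy and v low-entropy genes (gene selection steps 1-8).

    G: n genes x m samples; Y: 0/1 integer labels; order: gene indices by ascending entropy.
    """
    X_all = G.T                                   # samples x genes
    half = (gen_num + 1) // 2                     # loop while count < gen_num / 2
    best_err, best_genes = np.inf, None
    for w in range(0, half):                      # steps 1, 2 and 7
        high = order[len(order) - w:]             # w highest-entropy genes (empty if w == 0)
        for v in range(1, half):                  # steps 3, 4 and 6
            genes = np.r_[order[:v], high]
            err = _cv_error(X_all[:, genes], Y, lam, k)   # step 5
            if err < best_err:
                best_err, best_genes = err, genes
    return best_err, best_genes                   # step 8
```

Using the same fold assignment (fixed `seed`) for every candidate keeps the comparison between subsets fair. With Gen_Num = 300 the double loop evaluates roughly 150 × 149 subsets, so the search is the dominant cost.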


The PLR classifier is widely used for high-dimensional data classification. One advantage of PLR is that its decisions are based on the estimated vector β. The magnitudes of the elements of this vector indicate the discriminative power of the corresponding features (genes, in this research), like the weights in the SVM-RFE method. This property reduces the influence of features that lack class-discriminative power (e.g., genes uncorrelated with the output class), and it has been used in feature selection algorithms based on recursive feature elimination [14].
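To illustrate how the magnitudes of β can serve as feature weights, here is a hypothetical recursive-elimination loop in the spirit of SVM-RFE, with PLR in place of the SVM: each round refits PLR and drops the feature with the smallest |β_j|. Features are standardized first so that the magnitudes are comparable; that step, the ridge penalty form and all names are assumptions of this sketch.

```python
import numpy as np


def _fit_plr(X, y, lam=1.0, n_iter=50):
    """Ridge-penalized logistic regression via Newton-Raphson (intercept unpenalized)."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])
    theta = np.zeros(Xa.shape[1])
    pen = lam * np.eye(Xa.shape[1])
    pen[0, 0] = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ theta))
        grad = Xa.T @ (y - p) - pen @ theta
        hess = Xa.T @ (Xa * (p * (1.0 - p))[:, None]) + pen
        theta = theta + np.linalg.solve(hess, grad)
    return theta


def plr_rfe(X, y, n_keep, lam=1.0):
    """Repeatedly drop the feature with the smallest |beta| until n_keep remain."""
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # comparable scales
    remaining = list(range(X.shape[1]))                  # original column indices
    while len(remaining) > n_keep:
        theta = _fit_plr(X[:, remaining], y, lam)
        weakest = int(np.argmin(np.abs(theta[1:])))      # skip the intercept
        del remaining[weakest]
    return remaining
```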


   4. Results


To test the effectiveness of the proposed gene subset selection algorithm, we conducted experiments on four well-known cancer microarray datasets (breast cancer, leukemia, colon and lung). Penalized logistic regression (PLR) was used as the classifier, and external 10-fold cross-validation (repeated for 150 iterations) was used to validate the results for all datasets. The penalty value ($\lambda$) of the PLR classifier was chosen by a separate cross-validation to minimize the misclassification error. We set $u = 10$ (step 3 of the gene ranking algorithm) and Gen_Num = 300 (gene selection algorithm) for all datasets.

The misclassification error of this algorithm on the training data is the mean of the errors measured over 150 iterations, where the error in each iteration is the mean error of the 10-fold cross-validation. The percentage accuracy is measured by:

Percentage of accuracy (%) = 100 − percentage of misclassification error (%)
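The error averaging and the accuracy formula can be checked with a few lines of Python. The nested-list input (one list of per-fold error rates, as fractions, for each iteration) is an assumed data layout, and the function name is hypothetical.

```python
def accuracy_percent(fold_errors_per_iteration):
    """Percentage accuracy from per-fold misclassification rates.

    Each iteration's error is the mean over its folds; the reported error is
    the mean over iterations; accuracy = 100 - error (in percent).
    """
    iteration_errors = [sum(folds) / len(folds) for folds in fold_errors_per_iteration]
    mean_error = sum(iteration_errors) / len(iteration_errors)
    return 100.0 - 100.0 * mean_error


# Two iterations of 10-fold cross-validation: mean errors 0.05 and 0.10.
print(accuracy_percent([[0.0] * 9 + [0.5], [0.1] * 10]))
```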

For two of the four datasets below, extra samples are available; they are used as independent samples to validate the gene selection and classification procedure.

4.1 Breast Cancer Dataset

The breast cancer dataset [18] contains 77 training samples: 44 tumors with a relapse time greater than five years and 33 with a relapse time less than five years. In a pre-filtering step, about 5100 of the 25000 human genes were selected based on their P-values and a two-fold change in the ratio of their intensities.

[Table 1] shows the percentage accuracy in classifying the training samples and the independent samples, which consist of 19 extra samples (seven with a relapse time greater than five years and 12 with a relapse time less than five years). LE and HE denote the numbers of low-entropy and high-entropy genes, respectively.

From [Table 1], the best accuracy on the independent samples is achieved by a subset of 16 genes comprising the 13 lowest-entropy and the three highest-entropy genes. Comparison with other studies on the van't Veer dataset shows that the proposed algorithm performs equally well or better in classifying the independent samples (e.g., accuracies of 89.47%, 78.95%, 94.74% and 89.47% were achieved in [18], [19], [20] and [7], respectively).

The 16 genes that gave the highest accuracy on the independent samples are MAGEA1, BIRC5, CCNB2, GSTA3, CENPA, CCNE2, CDC2, DUSP9, BUB1, FLJ10156, FLJ20093, AW295902, AI332560, AI080735, AW131552 and AI962298. Seven of these 16 genes are in van't Veer's set of 231 genes, and two are among the 70 significant genes selected in that study. Some of the other selected genes (MAGEA1, GSTA3, DUSP9, FLJ20093) have been reported to be highly correlated with cancer [21],[22],[23], and CDC2 also appears in the list reported by [24], which analyzed node-positive and node-negative patients using gene expression profiles.

4.2 ALL/AML Dataset

The algorithm was applied to the leukemia dataset reported in [25], which contains 38 training samples (27 ALL and 11 AML) with 7129 genes, and 34 independent samples (20 ALL and 14 AML).

[Table 1] shows the percentage accuracy for the ALL/AML dataset with different subsets of selected genes, and [Table 2] presents some other recent results [26],[9],[27]. Comparing [Table 1] with the first row of [Table 2] shows that a subset of nine genes (seven LE and two HE) performs well in classifying both the training and the independent samples.

4.3 Colon Dataset

The colon dataset contains the expression levels of 2000 genes in 62 colon tissue samples (40 tumor and 22 normal) [28]. In this experiment, all samples are treated as training samples, and the mean misclassification error of external 10-fold cross-validation over 150 iterations is measured. [Table 3] shows the percentage accuracy for different subsets of selected genes. A subset of 10 genes (8 LE and 2 HE) selected by the proposed algorithm performs better than [26] and worse than [9] and [27] (see [Table 2]).

4.4 Lung Dataset

We evaluated the performance of the proposed method on the lung dataset [29]. The lung dataset consists of 12333 genes (after filtering out some missing data) and 181 samples, including 31 malignant pleural mesothelioma (MPM) and 150 adenocarcinoma (ADCA) samples. All samples were treated as training samples, and the percentage accuracy is presented in [Table 3]. It shows that more than 98% accuracy is achieved with a subset of 16 genes.


   5. Conclusion

[Table 4] shows the numbers of high-, medium- and low-correlation genes in each selected subset for the different datasets. For example, if we define 0.1 as the boundary between low and medium correlation values and 0.3 as the boundary between medium and high correlation values, [Table 4] shows that the proposed algorithm gathers gene subsets that contain more representative features and less redundancy among the genes, which in turn increases the discriminative ability of classifiers. Also, in all datasets, the number of selected genes with low correlation (less than 0.1) is smaller than the number of highly correlated genes (greater than 0.3).
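Counting the selected genes per correlation band is straightforward. The sketch below assumes that the bands are applied to the absolute value of each gene's PCC with the class label and that both boundary values belong to the medium band; both are assumptions, and the function name is hypothetical.

```python
def correlation_bins(correlations, low=0.1, high=0.3):
    """Count genes whose |PCC| with the class label is low, medium or high."""
    counts = {"low": 0, "medium": 0, "high": 0}
    for c in correlations:
        a = abs(c)                 # assumed: the sign of the correlation is ignored
        if a < low:
            counts["low"] += 1
        elif a <= high:
            counts["medium"] += 1
        else:
            counts["high"] += 1
    return counts


print(correlation_bins([0.05, -0.2, 0.35, -0.9]))
```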

In addition, since each sample is left out in turn to build the probability matrix (step 9 of gene ranking), the dependence of the results on particular samples is reduced. In this paper, we have proposed a new algorithm for gene subset selection in microarray datasets that achieves accuracy competitive with other proposed methods on four public datasets. The selected gene subsets include representatives of genes with high, medium and low correlation with the binary output class. Information theory is used to rank the genes according to the common property of the genes in gene pools 1 and 2; in this paper, that property is the correlation coefficient with the output class. Other criteria, such as the signal-to-noise ratio, could also be used to build the gene pools.

Authors


Hamid Mahmoodian received his Master's and Bachelor's degrees in robust control and artificial intelligence control, respectively, from Isfahan University of Technology, Esfahan, Iran. He is currently pursuing a PhD in medical engineering at University Putra Malaysia. His current research interests include bioinformatics, data mining, uncertainty in fuzzy modeling, and artificial intelligence for prediction.


Mohammad Hamiruce Marhaban received his PhD and BS degrees in Electronic Engineering from the University of Surrey (2003) and the University of Salford (1998), UK, respectively. He is an Associate Professor in the Electrical and Electronic Department of the Engineering Faculty, University Putra Malaysia. He is also an associate researcher at the Institute of Advanced Technology (ITMA) at UPM, program manager of intelligent systems, and research coordinator in the department. His research interests include control systems, artificial intelligence and computer vision. He is a member of the IEEE Control Systems, Communications, and Circuits and Systems Societies.


Raha Abdul Rahim received her BS in Microbiology from Oklahoma State University and her MS from the University of Oklahoma, USA. She completed her PhD in Molecular Biology at Strathclyde University, Scotland. She is a Professor in Microbial Genetics and heads the Department of Cell and Molecular Biology at the Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia. She currently lectures on Genetic Engineering and Cell and Molecular Biology in the undergraduate BS Biotechnology program. Her research interests include plasmid biology and the development of recombinant bacteria for the delivery of useful proteins.


Rozita Rosli completed her post-doctoral training at the Indiana University School of Medicine in 1996. She is currently an Associate Professor in Molecular Genetics and Deputy Dean for Research and Graduate Studies at the Faculty of Medicine and Health Sciences, University Putra Malaysia. She lectures on Medical Genetics in both the undergraduate and graduate programs. Her research interests include genetics, as well as the development of vaccines and therapeutics against breast cancer.


M. Iqbal Saripan received his B.Eng. degree in Electrical-Electronics Engineering from the Universiti Teknologi Malaysia (2001). He completed his Ph.D. degree in computer vision at the University of Surrey, United Kingdom (2006). He is currently a Lecturer and the Head of the Embedded and Intelligent Systems Engineering Research Group at the Department of Computer and Communication Systems Engineering, Faculty of Engineering, Universiti Putra Malaysia. His research interests are digital image processing, particularly medical imaging (SPECT, CT, ultrasound, PET), speech processing, artificial intelligence and embedded systems.

 
   References

1. W. Pan, "A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments," Bioinformatics, Vol. 18, No. 4, pp. 546-54, 2002.
2. O. Troyanskaya, M. Garber, P. Brown, D. Botstein, and R. Altman, "Nonparametric methods for identifying differentially expressed genes in microarray data," Bioinformatics, Vol. 18, No. 11, pp. 1454-61, 2002.
3. W. Pan, J. Lin, and C. Le, "How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach," Genome Biol., Vol. 3, No. 5, pp. 0022.1-22.10, 2002.
4. J. Li, H. Liu, J. Downing, A. Yeoh, and L. Wong, "Simple rules underlying gene expression profiles of more than six subtypes of acute lymphoblastic leukemia (ALL) patients," Bioinformatics, Vol. 19, No. 1, pp. 71-8, 2003.
5. W. Li and Y. Yang, "Zipf's law in importance of genes for cancer classification using microarray data," J. Theor. Biol., Vol. 219, No. 4, pp. 539-51, 2002.
6. O. Alter, P. Brown, and D. Botstein, "Singular value decomposition for genome-wide expression data processing and modeling," Proc. Natl. Acad. Sci. USA, Vol. 97, No. 18, pp. 10101-06, 2000.
7. M. Blazadonakis, M. Zervakis, M. Kounelakis, E. Biganzoli, and N. Lama, "Support vector machines and neural networks as marker selectors for cancer gene analysis," in Proc. 3rd International IEEE Conference on Intelligent Systems, 2006, pp. 626-30.
8. Y. Liu, "Cancer identification based on DNA microarray data," LNAI 4819, pp. 153-61, 2007.
9. S. Li, X. Wu, and M. Tan, "Gene selection using hybrid particle swarm optimization and genetic algorithm," Soft Computing, Vol. 12, pp. 1039-48, 2008.
10. L. Yu and H. Liu, "Redundancy based feature selection for microarray data," in Proc. Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, pp. 737-42.
11. R. Cai, Z. Hao, X. Yang, and W. Wen, "An efficient gene selection algorithm based on mutual information," Neurocomputing, Vol. 72, pp. 991-9, 2009.
12. X. Liu, A. Krishnan, and A. Mondry, "An entropy-based gene selection method for cancer classification using microarray data," BMC Bioinformatics, Vol. 6, p. 76, 2005.
13. C. Ding and H. Peng, "Minimum redundancy feature selection from microarray gene expression data," Computational Systems Bioinformatics, 2003.
14. Y. Guo, T. Hastie, and R. Tibshirani, "Regularized linear discriminant analysis and its application in microarrays," Biostatistics, Vol. 8, No. 1, pp. 86-100, 2007.
15. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, Vol. 46, No. 1-3, pp. 389-422, 2002.
16. I. Frank and J. Friedman, "A statistical view of some chemometric regression tools," Technometrics, Vol. 35, pp. 109-48, 1993.
17. M. Park and T. Hastie, "Penalized logistic regression for detecting gene interactions," Biostatistics, Vol. 9, pp. 30-50, 2007.
18. L. van't Veer, H. Dai, M. van de Vijver, Y. He, et al., "Gene expression profiling predicts clinical outcome of breast cancer," Nature, Vol. 415, pp. 530-36, 2002.
19. M. Blazadonakis, A. Peroglou, and M. Zervakis, "Using a single neuron as a marker selector: A breast cancer case study," in Proc. 29th Annual International Conference of the IEEE EMBS, Lyon, France, 2007, pp. 4219-22.
20. R. Shen, D. Ghosh, A. Chinnaiyan, and Z. Meng, "Eigengene-based linear discriminant model for tumor classification using gene expression microarray data," Bioinformatics, Vol. 22, pp. 2635-42, 2006.
21. M. Takahashi, S. Shichijo, M. Noguchi, M. Hirohata, and K. Itoh, "Identification of MAGE-1 and MAGE-4 proteins in spermatogonia and primary spermatocytes of testis," Cancer Res., Vol. 55, No. 16, pp. 3478-82, 1995.
22. R. Boidot, F. Vegran, D. Jacob, S. Chevrier, N. Gangneux, J. Taboureau, C. Oudin, V. Rainville, L. Mercier, and S. Lizard-Nacol, "The expression of BIRC5 is correlated with loss of specific chromosomal regions in breast carcinomas," Genes Chromosomes Cancer, Vol. 47, No. 4, pp. 299-308, 2008.
23. D. Stav, I. Bar, and J. Sandbank, "Usefulness of CDK5RAP3, CCNB2, and RAGE genes for the diagnosis of lung adenocarcinoma," Int. J. Biol. Markers, Vol. 22, No. 2, pp. 108-13, 2007.
24. C. Sotiriou, S. Neo, L. McShane, E. Korn, P. Long, A. Jazaeri, P. Martiat, S. Fox, A. Harris, and E. Liu, "Breast cancer classification and prognosis based on gene expression profiles from a population-based study," PNAS, Vol. 100, No. 18, pp. 10393-98, 2003.
25. T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M. Loh, J. Downing, M. Caligiuri, C. Bloomfield, and E. Lander, "Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring," Science, Vol. 286, pp. 531-37, 1999.
26. W. Xiong, Z. Cai, and J. Ma, "A DSRPCL-SVM approach to informative gene analysis," Genomics, Proteomics & Bioinformatics, Vol. 6, No. 2, 2008.
27. S. Li, X. Wu, and X. Hu, "Gene selection using genetic algorithm and support vector machines," Soft Computing, Vol. 12, pp. 693-98, 2008.
28. U. Alon, N. Barkai, D. Notterman, K. Gish, S. Ybarra, D. Mack, and A. Levine, "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays," Proc. Natl. Acad. Sci. USA, Vol. 96, pp. 6745-50, 1999.
29. G. Gordon, R. Jensen, L. Hsiao, S. Gullans, J. Blumenstock, S. Ramaswamy, W. Richards, D. Sugarbaker, and R. Bueno, "Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma," Cancer Research, Vol. 62, pp. 4963-67, 2002.



 
 
    Tables

  [Table 1], [Table 2], [Table 3], [Table 4]



 
