TY - GEN
T1 - Feature extraction from microarray expression data by integration of semantic knowledge
AU - Cho, Young Rae
AU - Xu, Xian
AU - Hwang, Woochang
AU - Zhang, Aidong
PY - 2007
Y1 - 2007
N2 - Microarray techniques give biologists a first peek into the molecular states of living tissues. Previous studies have shown that it is feasible to build sample classifiers using gene expression profiles. To build an effective sample classifier, dimension reduction is necessary, since classic pattern recognition algorithms do not work well in high-dimensional spaces. In this paper, we present a novel feature extraction algorithm based on the concept of virtual genes, which integrates microarray expression data sets with domain knowledge embedded in Gene Ontology (GO) annotations. We define a semantic similarity to measure the functional association between two genes using the annotations on each GO term. We then identify groups of genes, called virtual genes, that potentially interact with each other for a biological function. The correlation in the expression levels of virtual genes can be used to build a sample classifier. For a colon cancer data set, the integration of microarray expression data with GO annotations significantly improves the accuracy of sample classification by more than 10%.
AB - Microarray techniques give biologists a first peek into the molecular states of living tissues. Previous studies have shown that it is feasible to build sample classifiers using gene expression profiles. To build an effective sample classifier, dimension reduction is necessary, since classic pattern recognition algorithms do not work well in high-dimensional spaces. In this paper, we present a novel feature extraction algorithm based on the concept of virtual genes, which integrates microarray expression data sets with domain knowledge embedded in Gene Ontology (GO) annotations. We define a semantic similarity to measure the functional association between two genes using the annotations on each GO term. We then identify groups of genes, called virtual genes, that potentially interact with each other for a biological function. The correlation in the expression levels of virtual genes can be used to build a sample classifier. For a colon cancer data set, the integration of microarray expression data with GO annotations significantly improves the accuracy of sample classification by more than 10%.
UR - https://www.scopus.com/pages/publications/47349103891
U2 - 10.1109/ICMLA.2007.49
DO - 10.1109/ICMLA.2007.49
M3 - Conference contribution
AN - SCOPUS:47349103891
SN - 0769530699
SN - 9780769530697
T3 - Proceedings - 6th International Conference on Machine Learning and Applications, ICMLA 2007
SP - 606
EP - 611
BT - Proceedings - 6th International Conference on Machine Learning and Applications, ICMLA 2007
T2 - 6th International Conference on Machine Learning and Applications, ICMLA 2007
Y2 - 13 December 2007 through 15 December 2007
ER -