LASSO REGRESSION AND AN APPLICATION IN BREAST CANCER DATA ANALYSIS | Vân | TNU Journal of Science and Technology

About this article

Received: 25/04/22                Revised: 30/05/22                Published: 31/05/22

Authors

1. Nong Quynh Van, TNU - University of Education
2. Tran Dinh Hung, TNU - University of Education

Abstract


The LASSO is a regularized regression method proposed by Tibshirani in 1996. It performs variable selection and parameter estimation simultaneously in a linear regression model by shrinking some coefficients exactly to zero. The LASSO is particularly useful for analyzing microarray gene data, in which the number of predictors (genes) is much larger than the number of observations (patients). In this paper, we give a brief summary of the LASSO and apply it to gene expression data from breast cancer. The aim is to assess the gene interactions associated with breast cancer in microarray data. The results show that the LASSO performs relatively well in analyzing gene expression levels and identifies genes related to the breast cancer gene BRCA1, such as NBR2, AASDH, KIAA2013, VPS25, NBR1, SEC22C, RPL27, CBLN3, KHDRBS1, and XRCC2. Notably, the NBR2 gene is adjacent to BRCA1 on chromosome 17, and the two genes share the same promoter region. Regression-based analysis of this kind can therefore help us better understand the mechanism underlying the occurrence of breast cancer in young women.
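As an illustration of how an L1 penalty shrinks coefficients exactly to zero when predictors outnumber observations, the following is a minimal sketch only. It is not the procedure used in the paper: the cyclic coordinate-descent solver, the objective scaling (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1, the synthetic data, and all function names (soft_threshold, lasso_coordinate_descent, lambda_max, make_synthetic_data) are assumptions made here for illustration.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: move z toward zero by gamma, clipping at zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Approximately minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p entries each; y is a list of length n.
    Uses cyclic coordinate descent (an assumed solver, chosen for brevity).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes predictor j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def lambda_max(X, y):
    """Smallest penalty at which every coefficient is zero under this scaling."""
    n, p = len(X), len(X[0])
    return max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))


def make_synthetic_data(n=20, p=50, seed=0):
    """Hypothetical data with p > n and only three truly active predictors."""
    rng = random.Random(seed)
    true_beta = [0.0] * p
    true_beta[0], true_beta[1], true_beta[2] = 3.0, -2.0, 1.5
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(X[i][j] * true_beta[j] for j in range(p)) + 0.1 * rng.gauss(0.0, 1.0)
         for i in range(n)]
    return X, y, true_beta


X, y, true_beta = make_synthetic_data()
lam = 0.5 * lambda_max(X, y)  # an arbitrary penalty level, for illustration only
beta_hat = lasso_coordinate_descent(X, y, lam)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("number of predictors:", len(beta_hat))
print("indices with nonzero coefficients:", selected)
```

In an analysis such as the one described here, the rows of X would correspond to patients, the columns to gene expression levels, and y to the expression of the target gene. In practice the penalty would normally be chosen by cross-validation with an established package rather than fixed by hand as in this sketch.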

Keywords


Regression; Ordinary least squares; LASSO; L1 regularization; Penalized regression; Breast cancer

References


[1] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least Angle Regression (with discussion),” Annals of Statistics, vol. 32, pp. 407-499, 2004.

[2] J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, vol. 96, pp. 1348-1360, 2001.

[3] J. Fan and J. Lv, “A selective overview of variable selection in high dimensional feature space,” Statistica Sinica, vol. 20, pp. 101-148, 2010.

[4] W. J. Fu, “Penalized regression: The bridge versus the LASSO,” Journal of Computational and Graphical Statistics, vol. 7, pp. 397-416, 1998.

[5] H. Zou and T. Hastie, “Regularization and Variable Selection via the Elastic Net,” Journal of the Royal Statistical Society Series B, vol. 67, pp. 301-320, 2005.

[6] H. Zou, “The adaptive lasso and its oracle properties,” Journal of the American Statistical Association, vol. 101, pp. 1418-1429, 2006.

[7] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer: New York, NY, USA, 2008.

[8] M. R. Osborne, B. Presnell, and B. A. Turlach, “On the LASSO and its dual,” Journal of Computational and Graphical Statistics, vol. 9, no. 2, pp. 319-338, 2000.

[9] R. Tibshirani, “Regression shrinkage and selection via the LASSO,” Journal of the Royal Statistical Society Series B, vol. 58, pp. 267-288, 1996.

[10] A. S. Amusan and I. O. Adeshina, “Multicollinearity Regularization Using Lasso and Ridge Regression on Economic Data,” Kasu Journal of Mathematical Sciences, vol. 2, no. 2, pp. 43-54, 2021.

[11] A. Brown, F. Xu, H. Nicolai, B. Griffiths, A. Chambers, D. Black, and E. Solomon, “The 5' end of the BRCA1 gene lies within a duplicated region of human chromosome 17q21,” Oncogene, vol. 12, pp. 2507-2513, 1996.




DOI: https://doi.org/10.34238/tnu-jst.5901
