TY - GEN
T1 - A K-Means Approach to Clustering Disease Progressions
AU - Luong, Duc Thanh Anh
AU - Chandola, Varun
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/9/8
Y1 - 2017/9/8
AB - The k-means algorithm has been a workhorse of unsupervised machine learning for many decades, primarily owing to its simplicity and efficiency. The algorithm requires two key operations on the data: first, a distance metric to compare a pair of data objects, and second, a way to compute a representative (centroid) for a given set of data objects. These two requirements mean that k-means cannot be readily applied to time series data, in particular to the disease progression profiles often encountered in healthcare analysis. We present a k-means-inspired approach to clustering disease progression data. The proposed method represents a cluster as a set of weights corresponding to a set of splines fitted to the time series data and uses goodness of fit as the criterion for assigning time series to clusters. We use the algorithm to group patients suffering from Chronic Kidney Disease (CKD) based on their disease progression profiles. A qualitative analysis of the representative profiles for the learnt clusters reveals that this simple approach can identify groups of patients with interesting clinical characteristics. Additionally, we show how the representative profiles can be combined with a patient's observations to obtain an accurate patient-specific profile that can be used for extrapolating into the future.
KW - Chronic Kidney Disease
KW - Clustering
KW - K-means
UR - https://www.scopus.com/pages/publications/85032333974
U2 - 10.1109/ICHI.2017.18
DO - 10.1109/ICHI.2017.18
M3 - Conference contribution
AN - SCOPUS:85032333974
T3 - Proceedings - 2017 IEEE International Conference on Healthcare Informatics, ICHI 2017
SP - 268
EP - 274
BT - Proceedings - 2017 IEEE International Conference on Healthcare Informatics, ICHI 2017
A2 - Cummins, Mollie
A2 - Facelli, Julio
A2 - Meixner, Gerrit
A2 - Giraud-Carrier, Christophe
A2 - Nakajima, Hiroshi
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Healthcare Informatics, ICHI 2017
Y2 - 23 August 2017 through 26 August 2017
ER -