This talk will familiarize attendees with several clustering methods in the context of medicine. These unsupervised algorithms are used to find homogeneous groups (clusters) of objects: samples within one cluster should be similar to each other and different from samples in other clusters.

Such methods are used successfully in medicine and biology. One successful application is grouping genes with similar expression patterns. For example, agglomerative clustering (with average linkage and correlation as the similarity metric) can be applied to the NCI60 data (gene expression measurements for 60 tumor cell lines); the resulting dendrogram reveals groups of similar cell lines.
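The agglomerative clustering setup mentioned above can be sketched in a few lines of Python with SciPy. This is only an illustration under stated assumptions: the expression matrix is synthetic (12 hypothetical cell lines built from three made-up profiles), not the real NCI60 data, and names such as `expression` and `true_groups` are invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in for an expression matrix (NOT the real NCI60 data):
# 12 hypothetical cell lines x 50 genes, generated from three distinct profiles.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(3, 50))
true_groups = np.repeat([0, 1, 2], 4)
expression = profiles[true_groups] + 0.3 * rng.normal(size=(12, 50))

# Agglomerative clustering with average linkage; the "correlation" metric
# uses 1 - Pearson correlation between rows as the distance.
Z = linkage(expression, method="average", metric="correlation")

# Cut the tree into at most three flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")

# Compute the dendrogram layout without drawing it (no matplotlib needed).
tree = dendrogram(Z, no_plot=True)

print("cluster labels:", labels)
print("leaf order:", tree["leaves"])
```

Dropping `no_plot=True` (with matplotlib installed) draws the dendrogram itself, in which cell lines with correlated expression profiles are joined low in the tree.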

Clustering has further applications, such as improved missing-value imputation, outlier detection, and finding subpopulations in patient data, to name a few. These will be presented together with an explanation of the clustering algorithms applied.




Tadeusz Satława
Sano Centre for Computational Medicine,
Krakow, Poland

Date and time

Tuesday, 10 November 2020, 2:00-3:30 PM (CET)