Tilani Gunawardena
Algorithms: Clustering
• Grouping of records, observations, or cases into
classes of similar objects.
• A cluster is a collection of records that are
– Similar to one another
– Dissimilar to records in other clusters
What is Clustering?
• There is no target variable for clustering
• Clustering does not try to classify or predict the
values of a target variable.
• Instead, clustering algorithms seek to segment
the entire data set into relatively homogeneous
subgroups or clusters,
– Where the similarity of the records within the
cluster is maximized, and
– Similarity to records outside this cluster is
minimized.
Difference between Clustering and
Classification
• Identification of groups of records such that
similarity within a group is very high while the
similarity to records in other groups is very low.
– group data points that are close (or similar) to
each other
– identify such groupings (or clusters) in an
unsupervised manner
• Unsupervised: no information is provided to
the algorithm on which data points belong to
which clusters
• In other words,
– Clustering algorithms seek to construct clusters of
records such that the between-cluster variation (BCV)
is large compared to the within-cluster
variation (WCV)
Goal of Clustering
Between-cluster variation:
Within-cluster variation:
Goal of Clustering
between-cluster variation (BCV) is large
compared to the within-cluster variation (WCV)
(Intra-cluster distance) the sum of distances
between objects in the same cluster is
minimized
(Inter-cluster distance) while the distances
between different clusters are maximized
• Clustering techniques apply when there is no
class to be predicted
• As we've seen, clusters can be:
– disjoint vs. overlapping
– deterministic vs. probabilistic
– flat vs. hierarchical
• k-means Algorithm
– k-means clusters are disjoint, deterministic, and
flat
Clustering
Issues Related to Clustering
• How to measure similarity
– Euclidean Distance
– City-block Distance
– Minkowski Distance
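These measures can be sketched in a few lines of Python (an illustration; the function names are ours). Minkowski distance with p=1 and p=2 reduces to the city-block and Euclidean distances respectively:

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def euclidean(x, y):
    return minkowski(x, y, 2)   # straight-line distance

def city_block(x, y):
    return minkowski(x, y, 1)   # sum of absolute coordinate differences

print(euclidean((1, 2), (4, 6)))   # 5.0
print(city_block((1, 2), (4, 6)))  # 7.0
```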
• How to recode categorical variables?
• How to standardize or normalize numerical
variables?
– Min-max normalization: X* = (X − min(X)) / (max(X) − min(X))
– Z-score standardization: X* = (X − μ) / σ
• How many clusters do we expect to uncover?
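The two rescaling options can be sketched as follows (illustrative only):

```python
def min_max(values):
    """Rescale to [0, 1]: X* = (X - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize to mean 0, standard deviation 1: X* = (X - mu) / sigma."""
    mu = sum(values) / len(values)
    sigma = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mu) / sigma for v in values]

data = [10, 20, 30, 40, 50]
print(min_max(data))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(data))  # symmetric values with mean 0
```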
Type of Clustering
• Partitional clustering: Partitional algorithms
determine all clusters at once. They include:
– K-Means Clustering
– Fuzzy c-means clustering
– QT clustering
• Hierarchical Clustering:
– Agglomerative ("bottom-up"): Agglomerative
algorithms begin with each element as a
separate cluster and merge them into successively
larger clusters.
– Divisive ("top-down"): Divisive algorithms begin
with the whole set and proceed to divide it into
successively smaller clusters.
K-means Clustering
• Input: n objects (or points) and a number k
• Algorithm
1) Randomly assign K records to be the initial
cluster center locations
2) Assign each object to the group that has the
closest centroid
3) When all objects have been assigned,
recalculate the positions of the K centroids
4) Repeat steps 2 to 3 until convergence or
termination
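A minimal sketch of steps 1–4 in Python (illustrative only; it takes the initial centers as an argument rather than sampling them randomly, and assumes Euclidean distance):

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    centroids = [tuple(c) for c in init_centroids]
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recalculate the positions of the centroids
        new = [tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else m
               for pts, m in zip(clusters, centroids)]
        # Step 4: repeat until the centroids no longer change
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

centroids, clusters = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)],
                             init_centroids=[(0, 0), (10, 10)])
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```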
Termination Conditions
• The algorithm terminates when the centroids
no longer change.
• The SSE (sum of squared errors) value is less
than some small threshold value ε:

SSE = Σ_{i=1..k} Σ_{p ∈ Ci} d(p, mi)²

where p ∈ Ci represents each data point in
cluster i and mi represents the centroid of
cluster i.
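With the cluster assignments and centroids in hand, the SSE is a direct translation of the formula (sketch):

```python
import math

def sse(clusters, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    return sum(math.dist(p, m) ** 2
               for points, m in zip(clusters, centroids)
               for p in points)

# two clusters with centroids (0, 0) and (4, 4); each point lies at distance 1
print(sse([[(0, 1), (1, 0)], [(4, 5)]], [(0, 0), (4, 4)]))  # 3.0
```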
Example 1:
• Let's suppose the following points are the
delivery locations for pizza
• Let's locate three cluster centers randomly
• Find the distance of points as shown
• Assign the points to the nearest cluster center
based on the distance between each center
and the point
• Re-assign the cluster centers and locate the
nearest points
• Re-assign the cluster centers and locate the
nearest points, then calculate the distances
Form the three clusters
Example 2:
• Suppose that we have eight data points in
two-dimensional space as follows
• And suppose that we are interested in
uncovering k=2 clusters.
Point | Distance from m1 | Distance from m2 | Cluster membership
a–h: (to be filled in)

Distances use the Euclidean metric:
D(X, Y) = sqrt( Σ_{i=1..n} (x_i − y_i)² )
Point | Distance from m1 | Distance from m2 | Cluster membership
a | 2.00 | 2.24 |
b | 2.83 | 2.24 |
c | 3.61 | 2.83 |
d | 4.47 | 3.61 |
e | 1.00 | 1.41 |
f | 3.16 | 2.24 |
g | 0.00 | 1.00 |
h | 1.00 | 0.00 |

D(X, Y) = sqrt( Σ_{i=1..n} (x_i − y_i)² )
Point | Distance from m1 | Distance from m2 | Cluster membership
a | 2.00 | 2.24 | C1
b | 2.83 | 2.24 | C2
c | 3.61 | 2.83 | C2
d | 4.47 | 3.61 | C2
e | 1.00 | 1.41 | C1
f | 3.16 | 2.24 | C2
g | 0.00 | 1.00 | C1
h | 1.00 | 0.00 | C2
SSE = 2.00² + 2.24² + 2.83² + 3.61² + 1.00² + 2.24² + 0² + 0² ≈ 36
d(m1,m2)=1
Centroid of cluster 1:
[(1+1+1)/3, (3+2+1)/3] = (1, 2)
Centroid of cluster 2:
[(3+4+5+4+2)/5, (3+3+3+2+1)/5] = (3.6, 2.4)
Point | Distance from m1 | Distance from m2 | Cluster membership
a–h: (to be filled in)

m1 = (1, 2)
m2 = (3.6, 2.4)
Point | Distance from m1 | Distance from m2 | Cluster membership
a | 1.00 | 2.67 |
b | 2.24 | 0.85 |
c | 3.61 | 0.72 |
d | 4.12 | 1.52 |
e | 0.00 | 2.63 |
f | 3.00 | 0.57 |
g | 1.00 | 2.95 |
h | 1.41 | 2.13 |

m1 = (1, 2)
m2 = (3.6, 2.4)
Point | Distance from m1 | Distance from m2 | Cluster membership
a | 1.00 | 2.67 | C1
b | 2.24 | 0.85 | C2
c | 3.61 | 0.72 | C2
d | 4.12 | 1.52 | C2
e | 0.00 | 2.63 | C1
f | 3.00 | 0.57 | C2
g | 1.00 | 2.95 | C1
h | 1.41 | 2.13 | C1
SSE = 1.00² + 0.85² + 0.72² + 1.52² + 0² + 0.57² + 1.00² + 1.41² ≈ 7.88
d(m1, m2) = 2.63, where m1 = (1, 2) and m2 = (3.6, 2.4)
Centroid of cluster 1:
[(1+1+1+2)/4, (3+2+1+1)/4] = (1.25, 1.75)
Centroid of cluster 2:
[(3+4+5+4)/4, (3+3+3+2)/4] = (4, 2.75)
Point | Distance from m1 | Distance from m2 | Cluster membership
a–h: (to be filled in)

m1 = (1.25, 1.75)
m2 = (4, 2.75)
Point | Distance from m1 | Distance from m2 | Cluster membership
a | 1.27 | 3.01 |
b | 2.15 | 1.03 |
c | 3.02 | 0.25 |
d | 3.95 | 1.03 |
e | 0.35 | 3.09 |
f | 2.76 | 0.75 |
g | 0.79 | 3.47 |
h | 1.06 | 2.66 |

m1 = (1.25, 1.75)
m2 = (4, 2.75)
Point | Distance from m1 | Distance from m2 | Cluster membership
a | 1.27 | 3.01 | C1
b | 2.15 | 1.03 | C2
c | 3.02 | 0.25 | C2
d | 3.95 | 1.03 | C2
e | 0.35 | 3.09 | C1
f | 2.76 | 0.75 | C2
g | 0.79 | 3.47 | C1
h | 1.06 | 2.66 | C1

m1 = (1.25, 1.75)
m2 = (4, 2.75)
SSE = 1.27² + 1.03² + 0.25² + 1.03² + 0.35² + 0.75² + 0.79² + 1.06² ≈ 6.25
d(m1,m2)=2.93
Final Results
How to decide k?
• Unless the analyst has prior knowledge of the
number of underlying clusters:
– Compare the clustering solutions for each value of k
– Select the value of k that gives the smallest SSE
(watching for diminishing returns, since SSE always
decreases as k grows)
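A common way to apply this is the "elbow" heuristic: run the algorithm for several values of k and watch where the SSE curve flattens. A self-contained sketch (the toy data and the deterministic first-k initialization are ours, purely for illustration):

```python
import math

def kmeans_sse(points, k, max_iter=50):
    """Run Lloyd's algorithm and return the final SSE (first k points as seeds)."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new = [tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centroids[i]
               for i, pts in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return sum(math.dist(p, centroids[i]) ** 2
               for i, pts in enumerate(clusters) for p in pts)

pts = [(1, 1), (8, 8), (1, 2), (2, 1), (8, 9), (9, 8)]
for k in (1, 2, 3):
    print(k, round(kmeans_sse(pts, k), 2))
# the SSE drops sharply from k=1 to k=2 and only slightly after:
# for this data the elbow suggests k=2
```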
Summary
The k-means algorithm is a simple yet popular method for
cluster analysis
• Low complexity: O(nkt), where n = #points, k =
#clusters, t = #iterations
• Its performance is determined by initialisation and an
appropriate distance measure
• There are several variants of k-means to overcome its
weaknesses
– K-Medoids: resistance to noise and/or outliers (data that do
not comply with the general behaviour or model of the data)
– K-Modes: extension to categorical data clustering analysis
– CLARA: extension to deal with large data sets
– Gaussian mixture models (EM algorithm): handling
uncertainty of clusters
Hierarchical Clustering
• Build a tree-based hierarchical taxonomy
(dendrogram)
• One approach: recursive application of a partitional
clustering algorithm.
animal
├─ vertebrate: fish, reptile, amphibian, mammal
└─ invertebrate: worm, insect, crustacean
Hierarchical clustering and
dendrograms
• A hierarchical clustering on a set of objects D is a set
of nested partitions of D. It is represented by a binary
tree such that:
– The root node is a cluster that contains all data points
– Each (parent) node is a cluster made of two subclusters
(children)
– Each leaf node represents one data point (a singleton, i.e., a
cluster with only one item)
• A hierarchical clustering scheme is also called a
taxonomy. In data clustering the binary tree is called
a dendrogram.
• Clustering obtained
by cutting the
dendrogram at a
desired level: each
connected
component forms a
cluster.
Dendrogram: Hierarchical Clustering
Hierarchical clustering: forming
clusters
• Forming clusters from dendrograms
Hierarchical clustering
• There are two styles of hierarchical clustering algorithms
to build a tree from the input set S:
– Agglomerative (bottom-up):
• Beginning with singletons (sets with 1 element)
• Merging them until S is achieved as the root.
• At each step, the two closest clusters are merged into a new
combined cluster
• In this way, the number of clusters in the data set is reduced at each step
• Eventually, all records/elements are combined into a single huge cluster
• It is the most common approach.
– Divisive (top-down):
• All records are combined into one big cluster
• Then the most dissimilar records are split off, recursively
partitioning S until singleton sets are reached.
• Does not require the number of clusters k in advance
Two types of hierarchical clustering algorithms :
Agglomerative: “bottom-up”
Divisive: “top-down”
Hierarchical Agglomerative Clustering (HAC)
Algorithm
Start with all instances in their own cluster.
Until there is only one cluster:
Among the current clusters, determine the two
clusters, ci and cj, that are most similar.
Replace ci and cj with a single cluster ci ∪ cj
• Assumes a similarity function for determining the
similarity of two instances.
• Starts with all instances in a separate cluster and then
repeatedly joins the two clusters that are most similar
until there is only one cluster.
Classification of AHC
We can distinguish AHC algorithms according to the type of
distance measure used. There are two approaches:
Graph methods:
• Single link method
• Complete link method
• Group average method (UPGMA)
• Weighted group average method (WPGMA)
Geometric methods:
• Ward’s method
• Centroid method
• Median method
Lance–Williams Algorithm
Definition (Lance–Williams formula)
In AHC algorithms, the Lance–Williams formula
[Lance and Williams, 1967] is a recurrence
equation used to calculate the dissimilarity
between a cluster Ck and a cluster formed by
merging two other clusters Cl ∪ Cl′:

d(Ck, Cl ∪ Cl′) = αl·d(Ck, Cl) + αl′·d(Ck, Cl′)
                + β·d(Cl, Cl′) + γ·|d(Ck, Cl) − d(Ck, Cl′)|

where αl, αl′, β, γ are real numbers whose values
determine the particular AHC method.
AHC methods and the Lance-Williams
formula
Single link method
• Also known as the nearest neighbor method,
since it employs the nearest neighbor to
measure the dissimilarity between two
clusters
Cluster distance measure
• Single link
– Distance between closest elements in clusters
• Complete link
– Distance between farthest elements in clusters
• Centroids
– Distance between the centroids (means) of two clusters
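These three measures translate directly into code (sketch; the function names are ours):

```python
import math

def single_link(A, B):
    """Distance between the closest elements of the two clusters."""
    return min(math.dist(a, b) for a in A for b in B)

def complete_link(A, B):
    """Distance between the farthest elements of the two clusters."""
    return max(math.dist(a, b) for a in A for b in B)

def centroid_link(A, B):
    """Distance between the centroids (means) of the two clusters."""
    ca = tuple(sum(c) / len(A) for c in zip(*A))
    cb = tuple(sum(c) / len(B) for c in zip(*B))
    return math.dist(ca, cb)

A = [(0, 0), (0, 1)]
B = [(3, 0), (4, 0)]
print(single_link(A, B))    # 3.0
print(complete_link(A, B))  # ~4.12
print(centroid_link(A, B))  # ~3.54
```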
Single-link clustering
[Figure: nested clusters for six points (1–6) and the corresponding
dendrogram, with merge heights between 0 and 0.2]
Example 1: Single link method
[Figure-only slides: a step-by-step single-link merging example;
the images are not preserved in this transcript]
Example 2: Single link method
• x1 = (1, 2)
• x2 = (1, 2.5)
• x3 = (3, 1)
• x4 = (4, 0.5)
• x5 = (4, 2)
Merge X1 and X2
Merge X3 and X4
Merge {X3,X4} and X5
Merge {X1,X2} and {X3,X4,X5}
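This merge sequence can be reproduced with a small single-link agglomerative loop (a sketch, not an efficient implementation):

```python
import math

points = {'x1': (1, 2), 'x2': (1, 2.5), 'x3': (3, 1),
          'x4': (4, 0.5), 'x5': (4, 2)}

def single_link(A, B):
    """Single-link distance: closest pair across the two clusters."""
    return min(math.dist(points[a], points[b]) for a in A for b in B)

clusters = [frozenset([n]) for n in points]  # start with singletons
merges = []
while len(clusters) > 1:
    # pick the two closest clusters and merge them
    a, b = min(((A, B) for i, A in enumerate(clusters) for B in clusters[i + 1:]),
               key=lambda pair: single_link(*pair))
    clusters.remove(a)
    clusters.remove(b)
    clusters.append(a | b)
    merges.append(sorted(a | b))

for m in merges:
    print(m)
# ['x1', 'x2'], ['x3', 'x4'], ['x3', 'x4', 'x5'], and finally all five
```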
Example 3: Complete link method
• x1 = (1, 2)
• x2 = (1, 2.5)
• x3 = (3, 1)
• x4 = (4, 0.5)
• x5 = (4, 2)
Merge X1 and X2
Merge X3 and X4
Merge {X3,X4} and X5
Merge {X1,X2} and {X3,X4,X5}
The dendrogram:
Summary
Hierarchical Clustering
• For a dataset consisting of n points:
– O(n²) space: it requires storing the distance matrix
– O(n³) time complexity in most cases (agglomerative
clustering)
• Advantages
– Dendrograms are great for visualization
– Provides hierarchical relations between clusters
• Disadvantages
– Not easy to define levels for clusters
– Can never undo what was done previously (merges are greedy
and permanent)
– Sensitive to cluster distance measures and noise/outliers
– Empirically, other clustering techniques often outperform
hierarchical clustering
• There are several variants to overcome its weaknesses
– BIRCH: scalable to large data sets
– ROCK: clustering categorical data
– CHAMELEON: hierarchical clustering using dynamic modelling
Brigada Eskwela editable Certificate.pptx
 
Why study French Mackenzie Neale PowerPoint
Why study French Mackenzie Neale PowerPointWhy study French Mackenzie Neale PowerPoint
Why study French Mackenzie Neale PowerPoint
 
VRS An Strategic Approch to Meet Need of Organisation.pptx
VRS An Strategic Approch to Meet Need of Organisation.pptxVRS An Strategic Approch to Meet Need of Organisation.pptx
VRS An Strategic Approch to Meet Need of Organisation.pptx
 
BANG E BHARAT QSN SET by Amra Quiz Pagoler Dol
BANG E BHARAT QSN SET by Amra Quiz Pagoler DolBANG E BHARAT QSN SET by Amra Quiz Pagoler Dol
BANG E BHARAT QSN SET by Amra Quiz Pagoler Dol
 
New features of Maintenance Module in Odoo 17
New features of Maintenance Module in Odoo 17New features of Maintenance Module in Odoo 17
New features of Maintenance Module in Odoo 17
 

Hierarchical Clustering

  • 2. • Grouping of records, observations, or cases into classes of similar objects. • A cluster is a collection of records, – Similar to one another – Dissimilar to records in other clusters What is Clustering?
  • 5. • There is no target variable for clustering • Clustering does not try to classify or predict the values of a target variable. • Instead, clustering algorithms seek to segment the entire data set into relatively homogeneous subgroups or clusters, – Where the similarity of the records within the cluster is maximized, and – Similarity to records outside this cluster is minimized. Difference between Clustering and Classification
  • 6. • Identification of groups of records such that similarity within a group is very high while the similarity to records in other groups is very low. – group data points that are close (or similar) to each other – identify such groupings (or clusters) in an unsupervised manner • Unsupervised: no information is provided to the algorithm on which data points belong to which clusters • In other words, – Clustering algorithms seek to construct clusters of records such that the between-cluster variation (BCV) is large compared to the within-cluster variation (WCV) Goal of Clustering
  • 7. Between-cluster variation: Within-cluster variation: Goal of Clustering: the between-cluster variation (BCV) is large compared to the within-cluster variation (WCV) • (Intra-cluster distance) the sum of distances between objects in the same cluster is minimized • (Inter-cluster distance) while the distances between different clusters are maximized
  • 8. • Clustering techniques apply when there is no class to be predicted • As we've seen clusters can be: – disjoint vs. overlapping – deterministic vs. probabilistic – flat vs. hierarchical • k-means Algorithm – k-means clusters are disjoint, deterministic, and flat Clustering
  • 9. Issues Related to Clustering • How to measure similarity – Euclidean Distance – City-block Distance – Minkowski Distance • How to recode categorical variables? • How to standardize or normalize numerical variables? – Min-Max Normalization – Z-score standardization ((X − μ)/σ) • How many clusters do we expect to uncover?
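The distance measures and normalizations listed above can be sketched in a few lines of Python (a minimal illustration; the function names are my own):

```python
import math

def euclidean(x, y):
    # square root of the summed squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    # Manhattan distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    # generalizes both: p = 1 gives city-block, p = 2 gives Euclidean
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def min_max_normalize(values):
    # rescale a numeric variable to the range [0, 1]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    # standardize a numeric variable: (X - mean) / standard deviation
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / s for v in values]
```

For example, `euclidean((1, 3), (1, 1))` gives 2.0, and `min_max_normalize([1, 2, 3])` gives `[0.0, 0.5, 1.0]`.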
  • 10. Type of Clustering • Partitional clustering: Partitional algorithms determine all clusters at once. They include: – K-Means Clustering – Fuzzy c-means clustering – QT clustering • Hierarchical Clustering: – Agglomerative ("bottom-up"): Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. – Divisive ("top-down"): Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
  • 12. k-Means Clustering • Input: n objects (or points) and a number k • Algorithm 1) Randomly select k records to be the initial cluster center locations 2) Assign each object to the group that has the closest centroid 3) When all objects have been assigned, recalculate the positions of the k centroids 4) Repeat steps 2 and 3 until convergence or termination
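The four steps above can be sketched directly (a minimal illustration, not an optimized implementation; function and variable names are my own):

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # step 1: randomly select k records as the initial cluster centers
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # step 2: assign each point to the group with the closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # step 3: recalculate each centroid as the mean of its cluster
        new_centroids = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centroids[i]
            for i, pts in enumerate(clusters)
        ]
        # step 4: repeat until the centroids stop moving (convergence)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

cents, groups = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

On the tiny four-point example the two clusters converge to centroids (0, 0.5) and (10, 10.5) regardless of which records are picked initially.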
  • 14. Termination Conditions • The algorithm terminates when the centroids no longer change. • Alternatively, it stops when the SSE (sum of squared errors) falls below some small threshold value ε, where SSE = Σᵢ₌₁ᵏ Σ_{p ∈ Cᵢ} d(p, mᵢ)² • Here p ∈ Cᵢ ranges over the data points in cluster i and mᵢ is the centroid of cluster i.
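The SSE criterion translates directly into code (a small sketch; the function name is my own):

```python
import math

def sse(clusters, centroids):
    # sum over clusters i of the squared distance from each
    # point p in cluster Ci to that cluster's centroid mi
    return sum(
        math.dist(p, m) ** 2
        for pts, m in zip(clusters, centroids)
        for p in pts
    )
```

For instance, the cluster {(1, 3), (1, 2), (1, 1)} with centroid (1, 2) contributes 1 + 0 + 1 = 2 to the SSE.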
  • 15. Example 1: • Let's suppose the following points are the delivery locations for pizza
  • 16. • Let's locate three cluster centers randomly
  • 17. • Find the distance of points as shown
  • 18. • Assign the points to the nearest cluster center based on the distance between each center and the point
  • 19. • Re-assign the cluster centers and locate nearest points
  • 20. • Re-assign the cluster centers and locate nearest points, calculate the distance
  • 21. Form the three clusters
  • 22. Example 2: • Suppose that we have eight data points in two-dimensional space as follows • And suppose that we are interested in uncovering k=2 clusters.
  • 23. For each point a–h, compute its Euclidean distance D(X, Y) = √(Σᵢ₌₁ⁿ (xᵢ − yᵢ)²) from each centroid and record its cluster membership:
      Point | Distance from m1 | Distance from m2 | Cluster membership
      a–h   | (to be filled in)
  • 24. The Euclidean distances D(X, Y) = √(Σᵢ₌₁ⁿ (xᵢ − yᵢ)²) are:
      Point | Distance from m1 | Distance from m2
      a     | 2.00             | 2.24
      b     | 2.83             | 2.24
      c     | 3.61             | 2.83
      d     | 4.47             | 3.61
      e     | 1.00             | 1.41
      f     | 3.16             | 2.24
      g     | 0.00             | 1.00
      h     | 1.00             | 0.00
  • 25. Assigning each point to its nearest centroid:
      Point | Distance from m1 | Distance from m2 | Cluster membership
      a     | 2.00             | 2.24             | C1
      b     | 2.83             | 2.24             | C2
      c     | 3.61             | 2.83             | C2
      d     | 4.47             | 3.61             | C2
      e     | 1.00             | 1.41             | C1
      f     | 3.16             | 2.24             | C2
      g     | 0.00             | 1.00             | C1
      h     | 1.00             | 0.00             | C2
      SSE = 2² + 2.24² + 2.83² + 3.61² + 1² + 2.24² + 0² + 0² ≈ 36
      d(m1, m2) = 1
  • 26. Centroid of the cluster 1 is [(1+1+1)/3,(3+2+1)/3] =(1,2) Centroid of the cluster 2 is [(3+4+5+4+2)/5,(3+3+3+2+1)/5] =(3.6,2.4)
  • 27. With the updated centroids m1 = (1, 2) and m2 = (3.6, 2.4), recompute for each point a–h its distance from m1, its distance from m2, and its cluster membership.
  • 28. With m1 = (1, 2) and m2 = (3.6, 2.4):
      Point | Distance from m1 | Distance from m2
      a     | 1.00             | 2.67
      b     | 2.24             | 0.85
      c     | 3.16             | 0.72
      d     | 4.12             | 1.52
      e     | 0.00             | 2.63
      f     | 3.00             | 0.57
      g     | 1.00             | 2.95
      h     | 1.41             | 2.13
  • 29. Assigning each point to its nearest centroid:
      Point | Distance from m1 | Distance from m2 | Cluster membership
      a     | 1.00             | 2.67             | C1
      b     | 2.24             | 0.85             | C2
      c     | 3.16             | 0.72             | C2
      d     | 4.12             | 1.52             | C2
      e     | 0.00             | 2.63             | C1
      f     | 3.00             | 0.57             | C2
      g     | 1.00             | 2.95             | C1
      h     | 1.41             | 2.13             | C1
      SSE = 1² + 0.85² + 0.72² + 1.52² + 0² + 0.57² + 1² + 1.41² ≈ 7.88
      d(m1, m2) = 2.63
  • 30. Centroid of the cluster 1 is [(1+1+1+2)/4,(3+2+1+1)/4] =(1.25,1.75) Centroid of the cluster 2 is [(3+4+5+4)/4,(3+3+3+2)/4] =(4,2.75)
  • 31. With the updated centroids m1 = (1.25, 1.75) and m2 = (4, 2.75), recompute for each point a–h its distance from m1, its distance from m2, and its cluster membership.
  • 32. With m1 = (1.25, 1.75) and m2 = (4, 2.75):
      Point | Distance from m1 | Distance from m2
      a     | 1.27             | 3.01
      b     | 2.15             | 1.03
      c     | 3.02             | 0.25
      d     | 3.95             | 1.03
      e     | 0.35             | 3.09
      f     | 2.76             | 0.75
      g     | 0.79             | 3.47
      h     | 1.06             | 2.66
  • 33. Assigning each point to its nearest centroid:
      Point | Distance from m1 | Distance from m2 | Cluster membership
      a     | 1.27             | 3.01             | C1
      b     | 2.15             | 1.03             | C2
      c     | 3.02             | 0.25             | C2
      d     | 3.95             | 1.03             | C2
      e     | 0.35             | 3.09             | C1
      f     | 2.76             | 0.75             | C2
      g     | 0.79             | 3.47             | C1
      h     | 1.06             | 2.66             | C1
      SSE = 1.27² + 1.03² + 0.25² + 1.03² + 0.35² + 0.75² + 0.79² + 1.06² ≈ 6.25
      d(m1, m2) = 2.93
      The cluster memberships are unchanged from the previous iteration, so the algorithm has converged.
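The final centroids and SSE of this example can be checked with a short script. The point coordinates below are reconstructed from the distance tables (they are not listed explicitly in the slides), so treat them as an assumption:

```python
import math

# the eight points a..h of Example 2, inferred from the distance tables
pts = {'a': (1, 3), 'b': (3, 3), 'c': (4, 3), 'd': (5, 3),
       'e': (1, 2), 'f': (4, 2), 'g': (1, 1), 'h': (2, 1)}

# final cluster memberships from the last iteration
c1, c2 = ['a', 'e', 'g', 'h'], ['b', 'c', 'd', 'f']

def centroid(names):
    # mean of each coordinate over the cluster's members
    coords = [pts[n] for n in names]
    return tuple(sum(c) / len(coords) for c in zip(*coords))

m1, m2 = centroid(c1), centroid(c2)
total_sse = (sum(math.dist(pts[n], m1) ** 2 for n in c1)
             + sum(math.dist(pts[n], m2) ** 2 for n in c2))
print(m1, m2, total_sse)   # (1.25, 1.75) (4.0, 2.75) 6.25
```

This reproduces the centroids (1.25, 1.75) and (4, 2.75) and the final SSE of 6.25.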
  • 37. How to decide k? • Unless the analyst has prior knowledge of the number of underlying clusters: – Clustering solutions for each value of k are compared – The value of k resulting in the smallest SSE is selected
  • 38. Summary • The k-means algorithm is a simple yet popular method for clustering analysis • Low complexity: O(nkt), where t = #iterations • Its performance is determined by initialisation and an appropriate distance measure • There are several variants of k-means to overcome its weaknesses – K-Medoids: resistance to noise and/or outliers (data that do not comply with the general behaviour or model of the data) – K-Modes: extension to categorical data clustering analysis – CLARA: extension to deal with large data sets – Gaussian Mixture models (EM algorithm): handling uncertainty of clusters
  • 40. Hierarchical Clustering • Build a tree-based hierarchical taxonomy (dendrogram) • One approach: recursive application of a partitional clustering algorithm. [Figure: example taxonomy tree, with animal splitting into vertebrate (fish, reptile, amphibian, mammal) and invertebrate (worm, insect, crustacean)]
  • 41. Hierarchical clustering and dendrograms • A hierarchical clustering on a set of objects D is a set of nested partitions of D. It is represented by a binary tree such that: – The root node is a cluster that contains all data points – Each (parent) node is a cluster made of two subclusters (children) – Each leaf node represents one data point (a singleton, i.e., a cluster with only one item) • A hierarchical clustering scheme is also called a taxonomy. In data clustering the binary tree is called a dendrogram.
  • 42. • Clustering obtained by cutting the dendrogram at a desired level: each connected component forms a cluster. Dendrogram: Hierarchical Clustering
  • 43. Hierarchical clustering: forming clusters • Forming clusters from dendrograms
  • 44. Hierarchical clustering • There are two styles of hierarchical clustering algorithms to build a tree from the input set S: – Agglomerative (bottom-up): • Begins with singletons (sets with 1 element) • Merges them until S is achieved as the root • In each step, the two closest clusters are aggregated into a new combined cluster • In this way, the number of clusters in the data set is reduced at each step • Eventually, all records/elements are combined into a single huge cluster • It is the most common approach – Divisive (top-down): • All records are combined into one big cluster • Then the most dissimilar records are split off, recursively partitioning S until singleton sets are reached • Does not require the number of clusters k in advance
  • 45. Two types of hierarchical clustering algorithms: Agglomerative: “bottom-up” Divisive: “top-down”
  • 46. Hierarchical Agglomerative Clustering (HAC) Algorithm: Start with all instances in their own cluster. Until there is only one cluster: among the current clusters, determine the two clusters, ci and cj, that are most similar, and replace ci and cj with a single cluster ci ∪ cj. • Assumes a similarity function for determining the similarity of two instances. • Starts with all instances in a separate cluster and then repeatedly joins the two clusters that are most similar until there is only one cluster.
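The HAC loop above can be sketched directly. This is an illustrative implementation using the single-link (nearest-neighbour) cluster distance described later; function names are my own:

```python
import math

def single_link_hac(points):
    # start with every instance in its own cluster
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # find the two clusters whose closest members are nearest (single link)
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q)
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # record the merge, then replace the two clusters with their union
        merges.append((clusters[i][:], clusters[j][:], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

history = single_link_hac([(1, 2), (1, 2.5), (3, 1), (4, 0.5), (4, 2)])
```

Run on the five points of the single-link example later in the deck, the first merge joins (1, 2) and (1, 2.5) at distance 0.5, and the final merge happens at distance √5 ≈ 2.24.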
  • 47. Classification of AHC • We can distinguish AHC algorithms according to the type of distance measures used. There are two approaches: Graph methods: • Single link method • Complete link method • Group average method (UPGMA) • Weighted group average method (WPGMA) Geometric methods: • Ward’s method • Centroid method • Median method
  • 48. Lance–Williams Algorithm • Definition (Lance–Williams formula): In AHC algorithms, the Lance–Williams formula [Lance and Williams, 1967] is a recurrence equation used to calculate the dissimilarity between a cluster Ck and a cluster formed by merging two other clusters, Cl ∪ Cl′: d(Ck, Cl ∪ Cl′) = αl d(Ck, Cl) + αl′ d(Ck, Cl′) + β d(Cl, Cl′) + γ |d(Ck, Cl) − d(Ck, Cl′)| where αl, αl′, β, γ are real numbers.
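With the standard coefficient choices, the Lance–Williams recurrence reproduces the single-link and complete-link cluster distances. A small sketch, assuming the usual parameterisation (αl = αl′ = 1/2, β = 0, γ = ∓1/2):

```python
def lance_williams(d_kl, d_klp, d_llp, a_l, a_lp, beta, gamma):
    # distance from cluster Ck to the merged cluster Cl ∪ Cl',
    # given the pairwise cluster distances before the merge
    return a_l * d_kl + a_lp * d_klp + beta * d_llp + gamma * abs(d_kl - d_klp)

# single link:   gamma = -1/2  ->  min(d_kl, d_klp)
single = lance_williams(2.0, 5.0, 3.0, 0.5, 0.5, 0.0, -0.5)   # 2.0
# complete link: gamma = +1/2  ->  max(d_kl, d_klp)
complete = lance_williams(2.0, 5.0, 3.0, 0.5, 0.5, 0.0, 0.5)  # 5.0
```

Updating distances this way lets an AHC implementation avoid rescanning individual points after every merge.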
  • 49. AHC methods and the Lance-Williams formula
  • 50. Single link method • Also known as the nearest neighbor method, since it employs the nearest neighbor to measure the dissimilarity between two clusters
  • 51. Cluster distance measure • Single link – Distance between closest elements in clusters • Complete link – Distance between farthest elements in clusters • Centroids – Distance between centroids(means) of two clusters
  • 52. Single-link clustering [Figure: a set of nested clusters over points 1–6 and the corresponding dendrogram, with merge heights ranging from 0 to 0.2]
  • 67. Example 02 – Single link method • x1 = (1, 2) • x2 = (1, 2.5) • x3 = (3, 1) • x4 = (4, 0.5) • x5 = (4, 2)
  • 68. • x1 = (1, 2) • x2 = (1, 2.5) • x3 = (3, 1) • x4 = (4, 0.5) • x5 = (4, 2) Merge x1 and x2
  • 74. Merge {X1,X2} and {X3,X4,X5}
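This final merge distance can be checked numerically. Under single link, the distance between two clusters is the closest pair across them (variable names are my own):

```python
import math

# the five points of the single-link example
pts = {'x1': (1, 2), 'x2': (1, 2.5), 'x3': (3, 1),
       'x4': (4, 0.5), 'x5': (4, 2)}

def single_link(A, B):
    # single link: distance of the closest pair across the two clusters
    return min(math.dist(pts[a], pts[b]) for a in A for b in B)

d = single_link(['x1', 'x2'], ['x3', 'x4', 'x5'])
print(round(d, 3))   # 2.236, realised by the x1-x3 pair
```

So {x1, x2} and {x3, x4, x5} merge at distance √5 ≈ 2.24.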
  • 76. • x1 = (1, 2) • x2 = (1, 2.5) • x3 = (3, 1) • x4 = (4, 0.5) • x5 = (4, 2) Merge x1 and x2 Example 3 – Complete link method
  • 82. Merge {X1,X2} and {X3,X4,X5}
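Under complete link the same final merge happens at a larger distance, since the cluster distance is now the farthest pair across the two clusters (variable names are my own):

```python
import math

# the five points of the complete-link example
pts = {'x1': (1, 2), 'x2': (1, 2.5), 'x3': (3, 1),
       'x4': (4, 0.5), 'x5': (4, 2)}

def complete_link(A, B):
    # complete link: distance of the farthest pair across the two clusters
    return max(math.dist(pts[a], pts[b]) for a in A for b in B)

d = complete_link(['x1', 'x2'], ['x3', 'x4', 'x5'])
print(round(d, 3))   # 3.606, realised by the x2-x4 pair
```

So {x1, x2} and {x3, x4, x5} merge at distance √13 ≈ 3.61, illustrating how the choice of linkage changes the dendrogram's merge heights.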
  • 84. Summary – Hierarchical Clustering • For a dataset consisting of n points: – O(n²) space; it requires storing the distance matrix – O(n³) time complexity in most of the cases (agglomerative clustering) • Advantages – Dendrograms are great for visualization – Provides hierarchical relations between clusters • Disadvantages – Not easy to define levels for clusters – Can never undo what was done previously – Sensitive to cluster distance measures and noise/outliers – Experiments showed that other clustering techniques outperform hierarchical clustering • There are several variants to overcome its weaknesses – BIRCH: scalable to a large data set – ROCK: clustering categorical data – CHAMELEON: hierarchical clustering using dynamic modelling