Data Clusters

Clusters are collections of similar data
Clustering is a type of unsupervised learning
The Correlation Coefficient describes the strength of a relationship.

Clusters

Clusters are collections of data based on similarity.

Data points that lie close together in a graph can often be grouped into clusters.

In the graph below we can distinguish 3 different clusters:

Identifying Clusters

Clusters can hold a lot of valuable information, but clusters come in all sorts of shapes, so how can we recognize them?

The two main methods are:

Using Visualization
Using a Clustering Algorithm

Clustering

Clustering is a type of Unsupervised Learning.

Clustering tries to:

Collect similar data in groups
Collect dissimilar data in other groups

Clustering Methods

Density Method
Hierarchical Method
Partitioning Method
Grid-based Method

The Density Method considers points in a dense region to be more similar to each other, and different from points in a lower-density region. The density method has good accuracy. It also has the ability to merge clusters.
Two common algorithms are DBSCAN and OPTICS.
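
The density idea can be illustrated with a minimal DBSCAN-style sketch. It is a simplified illustration, not a reference implementation: the function names, the parameter names eps (neighborhood radius) and min_pts (minimum number of neighbors for a dense region), and the sample points are all assumptions made for this example.

```python
from math import dist

def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster_id         # i is a core point: start a new cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id # noise reachable from a core point joins as border
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels

# Hypothetical sample data: two dense groups and one isolated point.
sample = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (5, 15)]
print(dbscan(sample, eps=1.5, min_pts=3))
```

With these assumed values, the two dense groups get their own cluster ids and the isolated point is labeled -1 (noise).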

The Hierarchical Method forms the clusters in a tree-type structure. New clusters are formed using previously formed clusters.
Two common algorithms are CURE and BIRCH.
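
The tree-building idea can be sketched with simple bottom-up (agglomerative) merging. This is neither CURE nor BIRCH, only a small single-linkage illustration; the function names and the sample points are assumptions made for this example.

```python
from math import dist

def single_linkage(a, b):
    """Distance between two clusters = distance between their closest pair of points."""
    return min(dist(p, q) for p in a for q in b)

def agglomerative(points, n_clusters):
    """Start with one cluster per point and merge the two closest clusters
    until n_clusters remain. Returns (clusters, merges); merges records the tree."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > max(n_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]   # new cluster built from two earlier ones
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

# Hypothetical sample data.
sample = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(sample, n_clusters=3)
print(clusters)
```

Each entry in merges records which two earlier clusters were joined and at what distance, which is the information a tree diagram (dendrogram) would display.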

The Grid-based Method formulates the data into a finite number of cells that form a grid-like structure.
Two common algorithms are CLIQUE and STING.
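
The grid idea can be sketched by assigning each point to a cell and counting the points per cell. This is not CLIQUE or STING, only the first step they share in spirit; the function names, cell_size, min_count, and the sample points are assumptions made for this example.

```python
from collections import Counter
from math import floor

def grid_cells(points, cell_size):
    """Count how many points fall into each square grid cell."""
    return Counter((floor(x / cell_size), floor(y / cell_size)) for x, y in points)

def dense_cells(points, cell_size, min_count):
    """Return the cells that contain at least min_count points."""
    counts = grid_cells(points, cell_size)
    return {cell for cell, n in counts.items() if n >= min_count}

# Hypothetical sample data.
sample = [(0.5, 0.5), (1.2, 1.8), (1.9, 0.1),
          (8.1, 8.2), (9.5, 9.9),
          (4.5, 4.5)]
print(grid_cells(sample, cell_size=2))
print(dense_cells(sample, cell_size=2, min_count=2))
```

Because the work is done per cell rather than per pair of points, the cost depends mainly on the number of cells, which is one reason grid-based methods can be fast on large data sets.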

The Partitioning Method partitions the objects into k clusters and each partition forms one cluster.
One common algorithm is CLARANS.
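
Partitioning into k clusters can be sketched with k-means, a different and simpler partitioning algorithm than CLARANS, used here only because it is short. The function names, the naive choice of the first k points as starting centers, and the sample points are assumptions made for this example.

```python
from math import dist

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def k_means(points, k, max_iter=100):
    """Partition points into k groups. Assumes len(points) >= k.
    Naive initialization (an assumption): the first k points are the starting centers."""
    centroids = list(points[:k])
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group.
        new_centroids = [mean_point(g) if g else centroids[c]
                         for c, g in enumerate(groups)]
        if new_centroids == centroids:
            break                      # assignments are stable
        centroids = new_centroids
    return centroids, groups

# Hypothetical sample data.
sample = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, groups = k_means(sample, k=2)
print(groups)
```

Every point ends up in exactly one of the k groups, which is the defining property of a partitioning method. Real implementations usually choose the starting centers more carefully than this sketch does.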

Correlation Coefficient

The Correlation Coefficient (r) describes the strength and direction of the linear relationship between the x and y variables on a scatterplot.

The value of r is always between -1 and +1:

-1.00	Perfect downhill	Negative linear relationship.
-0.70	Strong downhill	Negative linear relationship.
-0.50	Moderate downhill	Negative linear relationship.
-0.30	Weak downhill	Negative linear relationship.
0		No linear relationship.
+0.30	Weak uphill	Positive linear relationship.
+0.50	Moderate uphill	Positive linear relationship.
+0.70	Strong uphill	Positive linear relationship.
+1.00	Perfect uphill	Positive linear relationship.
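
The value of r can be computed directly from the x and y values. The sketch below uses the standard Pearson formula (the sum of the products of the deviations, divided by the square root of the product of the summed squared deviations); the function name and the sample data are assumptions made for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # perfectly uphill data
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # perfectly downhill data
```

Note that r only measures linear relationships: points that follow a clear curve (for example a U shape) can still give a value near 0.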

[Figures: example scatterplots for Perfect Uphill (+1.00), Perfect Downhill (-1.00), Strong Uphill (+0.61), and No Relationship]
