Data Clusters

  • Clusters are collections of similar data
  • Clustering is a type of unsupervised learning
  • The Correlation Coefficient describes the strength and direction of a linear relationship

Clusters

Clusters are collections of data based on similarity.

Data points that lie close together in a graph can often be grouped into clusters.

In the graph below we can distinguish 3 different clusters:


Identifying Clusters

Clusters can hold a lot of valuable information, but clusters come in all sorts of shapes, so how can we recognize them?

The two main methods are:

  • Using Visualization
  • Using a Clustering Algorithm

Clustering

Clustering is a type of Unsupervised Learning.

Clustering tries to:

  • Collect similar data in the same group
  • Separate dissimilar data into different groups

Clustering Methods

  • Density Method
  • Hierarchical Method
  • Partitioning Method
  • Grid-based Method

The Density Method considers points in a dense region to be more similar to each other than to points in a lower-density region. It can find clusters of arbitrary shape and treats isolated points as noise. It can also merge neighbouring dense regions into a single cluster.
Two common algorithms are DBSCAN and OPTICS.
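
To make the density idea concrete, here is a minimal sketch of DBSCAN in plain Python. It is a simplified illustration, not an optimized implementation. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense point), the helper `region_query`, and the sample points are assumptions made for this example.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

# Made-up sample data: two dense groups and one isolated point.
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.1, 7.9), (7.9, 8.2), (4, 15)]
print(dbscan(points, eps=0.5, min_pts=2))
```

Points that end up labelled -1 belong to no dense region and are treated as noise.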

The Hierarchical Method forms the clusters in a tree-type structure. New clusters are formed using previously formed clusters.
Two common algorithms are CURE and BIRCH.
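
The tree-building idea can be sketched with a simple bottom-up (agglomerative) procedure. Note that this is single-linkage merging, a much simpler technique than CURE or BIRCH, and it is shown only to illustrate how new clusters are formed from previously formed ones. The function name `single_linkage` and the sample points are assumptions for this example.

```python
import math

def single_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns (clusters, merges). Each cluster is a list of point indices.
    merges records every merge step in order, which is the tree structure.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    merges = []

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of current clusters that are closest to each other.
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        del clusters[b]   # b > a, so delete b first to keep index a valid
        del clusters[a]
        clusters.append(merged)
    return clusters, merges

# Made-up sample data.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = single_linkage(points, n_clusters=3)
print(clusters)
print(merges)
```

Reading `merges` from first to last shows the tree being built from the leaves upward.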

The Grid-based Method divides the data space into a finite number of cells that form a grid-like structure. Clustering is then performed on the cells instead of on the individual points.
Two common algorithms are CLIQUE and STING.
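
A rough sketch of the grid idea: place each point in a square cell, keep the cells that hold enough points, and join neighbouring dense cells into one cluster. This is a simplified illustration and not CLIQUE or STING. The names `grid_cells`, `grid_clusters`, `cell_size` and `min_points`, and the sample data, are assumptions for this example.

```python
from collections import defaultdict

def grid_cells(points, cell_size):
    """Assign each 2-D point to a square grid cell. Returns {cell: [points]}."""
    cells = defaultdict(list)
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        cells[cell].append((x, y))
    return dict(cells)

def grid_clusters(points, cell_size, min_points):
    """Join adjacent dense cells (at least min_points points) into clusters."""
    cells = grid_cells(points, cell_size)
    dense = {cell for cell, members in cells.items() if len(members) >= min_points}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            cx, cy = stack.pop()
            group.extend(cells[(cx, cy)])
            # Visit the 8 surrounding cells and follow the dense ones.
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in dense and neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        clusters.append(group)
    return clusters

# Made-up sample data: two dense areas and one lone point at (4.0, 4.0).
points = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.5), (1.4, 0.1),
          (7.5, 7.5), (7.8, 7.1), (4.0, 4.0)]
for cluster in grid_clusters(points, cell_size=1, min_points=2):
    print(cluster)
```

Because the work is done per cell, the cost depends mostly on the number of cells rather than the number of points. The result does depend on the chosen cell size.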

The Partitioning Method partitions the objects into k clusters and each partition forms one cluster.
One common algorithm is CLARANS.
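
To illustrate partitioning into k clusters, here is a minimal sketch of k-means. K-means is a different and simpler partitioning algorithm than CLARANS, and it is used here only because it is short. The function name `k_means`, the parameters `iterations` and `seed`, and the sample points are assumptions for this example.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Partition points into k clusters. Returns (assignment, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k randomly chosen points
    assignment = None
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break                       # nothing changed: converged
        assignment = new_assignment
        # Step 2: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                 # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids

# Made-up sample data: two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
assignment, centroids = k_means(points, k=2)
print(assignment)
print(centroids)
```

The cluster numbers themselves are arbitrary. What matters is which points share the same number.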



Correlation Coefficient

The Correlation Coefficient (r) describes the strength and direction of the linear relationship between the x and y variables on a scatterplot.

The value of r is always between -1 and +1:

r        Strength             Direction
-1.00    Perfect downhill     Negative linear relationship.
-0.70    Strong downhill      Negative linear relationship.
-0.50    Moderate downhill    Negative linear relationship.
-0.30    Weak downhill        Negative linear relationship.
 0                            No linear relationship.
+0.30    Weak uphill          Positive linear relationship.
+0.50    Moderate uphill      Positive linear relationship.
+0.70    Strong uphill        Positive linear relationship.
+1.00    Perfect uphill       Positive linear relationship.
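
As a sketch of how r can be computed, the function below applies the standard Pearson formula: the sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations. The function name `pearson_r` and the sample data are assumptions for this example.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how x and y vary together around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator parts: how much x and y each vary on their own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up sample data.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # points on a rising straight line
print(pearson_r(x, [10, 8, 6, 4, 2]))   # points on a falling straight line
print(pearson_r(x, [2, 1, 4, 3, 5]))    # rising trend with some scatter
```

The first two calls give the perfect uphill and perfect downhill cases, and the third falls in the strong uphill range.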

Perfect Uphill +1.00:

Perfect Downhill -1.00:

Strong Uphill +0.61:

No Relationship:


W3Schools is optimized for learning and training. Examples might be simplified to improve reading and learning. Tutorials, references, and examples are constantly reviewed to avoid errors, but we cannot warrant full correctness of all content. While using W3Schools, you agree to have read and accepted our terms of use, cookie and privacy policy.

Copyright 1999-2024 by Refsnes Data. All Rights Reserved. W3Schools is Powered by W3.CSS.