Clustering Essentials: K-Means, Hierarchical & DBSCAN Quiz

Explore fundamental concepts of clustering algorithms including K-Means, Hierarchical, and DBSCAN, focusing on their characteristics, use-cases, and differences. This quiz helps you reinforce your knowledge of clustering techniques, parameters, and key principles essential for data science and unsupervised learning.

  1. K-Means Algorithm Initialization

    Which step is performed first in the K-Means clustering algorithm when grouping a set of data points?

    1. Randomly assigning cluster centers
    2. Merging closest data points
    3. Sorting all data points
    4. Calculating distances between all points

    Explanation: The first step in K-Means clustering is randomly assigning cluster centers (also known as centroids). This serves as the starting point before the algorithm iteratively updates the clusters. Calculating distances occurs after initialization, not before it. Sorting data points is not a standard part of the algorithm, and merging is associated with hierarchical clustering, not K-Means.
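
    As a rough sketch of that first step (toy data and variable names are illustrative only), the snippet below picks K existing points at random to serve as initial centroids, a common initialization strategy, before any distance calculations happen.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 2))  # toy data: 100 points in 2D
    k = 3

    # Step 1: pick K existing points at random to serve as initial centroids.
    initial_idx = rng.choice(len(X), size=k, replace=False)
    centroids = X[initial_idx]

    # Only after initialization does the algorithm compute distances and
    # assign each point to its nearest centroid.
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    print(centroids)
    print(labels[:10])
    ```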

  2. Cluster Shape Sensitivity

    What is a key limitation of K-Means clustering when applied to data with complex, non-spherical cluster shapes?

    1. It only discovers circular clusters
    2. It always produces overlapping clusters
    3. It ignores outliers completely
    4. It merges all data into one cluster

    Explanation: K-Means works best with clusters that are roughly spherical (circular in 2D), because it uses Euclidean distance from cluster centers. It does not inherently cause all data to merge into one cluster, nor does it always produce overlapping clusters. While K-Means can be sensitive to outliers, it does not totally ignore them.
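
    A quick way to see this limitation, assuming scikit-learn is available, is to run K-Means on a two-moons dataset, whose true clusters are non-spherical; the centroid-based assignment typically cuts across the moon shapes.

    ```python
    from sklearn.datasets import make_moons
    from sklearn.cluster import KMeans

    # Two interleaving half-circles: a classic non-spherical clustering case.
    X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # The centroid-based partition usually disagrees with the true moons,
    # because K-Means carves space into roughly spherical (convex) regions.
    agreement = max((labels == y_true).mean(), (labels != y_true).mean())
    print(f"agreement with true moon labels: {agreement:.2f}")
    ```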

  3. DBSCAN Fundamentals

    In the DBSCAN algorithm, which feature differentiates it from K-Means and Hierarchical clustering?

    1. Sensitivity to data sorting
    2. Use of centroids for clusters
    3. Ability to identify noise points
    4. Requirement to specify the number of clusters

    Explanation: DBSCAN can identify noise points that don't belong to any cluster by analyzing density, a capability not present in traditional K-Means or agglomerative hierarchical clustering. Unlike K-Means, DBSCAN does not require specifying the number of clusters. The result is independent of data sorting, and DBSCAN does not use centroids.
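
    For reference, scikit-learn's DBSCAN marks noise points with the label -1, something KMeans and AgglomerativeClustering do not do. The small sketch below uses toy data and illustrative parameter values.

    ```python
    import numpy as np
    from sklearn.cluster import DBSCAN

    # Two tight blobs plus a far-away outlier.
    X = np.array([[0, 0], [0.1, 0.2], [0.2, 0.1],
                  [5, 5], [5.1, 5.2], [5.2, 5.1],
                  [20, 20]])

    labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

    # Noise points receive the special label -1.
    print(labels)                        # e.g. [0 0 0 1 1 1 -1]
    print("noise points:", int((labels == -1).sum()))
    ```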

  4. Hierarchical Clustering Dendrogram

    What does a dendrogram represent in hierarchical clustering?

    1. A tree showing merging of clusters
    2. A table of cluster centroids
    3. A map of density regions
    4. A chart of distances between all points

    Explanation: A dendrogram visually displays how clusters are merged step by step in hierarchical clustering, resembling a tree structure. It is not a chart of distances, though merge distances are shown on the vertical axis. A table of centroids would be relevant for K-Means, and a density map pertains more to DBSCAN.
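
    As a minimal sketch (assuming SciPy and Matplotlib are installed), the following builds a dendrogram from a small dataset; the resulting tree records which clusters were merged and at what distance.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram
    import matplotlib.pyplot as plt

    X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])

    # Ward linkage: each row of Z records one merge step
    # (cluster i, cluster j, merge distance, size of the new cluster).
    Z = linkage(X, method="ward")

    dendrogram(Z)
    plt.xlabel("data point index")
    plt.ylabel("merge distance")
    plt.show()
    ```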

  5. DBSCAN Parameters

    Which two main parameters must be defined when using the DBSCAN algorithm?

    1. Epsilon and MinPts
    2. Alpha and Beta
    3. Iterations and K
    4. Gamma and Delta

    Explanation: DBSCAN requires Epsilon (the radius of the neighborhood searched around each point) and MinPts (the minimum number of points within that radius needed to form a dense region). Parameters like Alpha, Beta, Gamma, and Delta are unrelated to DBSCAN. 'Iterations' and 'K' pertain to iterative algorithms and K-Means specifically.
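
    In scikit-learn these two parameters appear as eps and min_samples. The sketch below (toy data, illustrative values) shows how changing eps alters what counts as a dense neighborhood, and therefore how many clusters and noise points come out.

    ```python
    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)),
                   rng.normal(5, 0.3, (50, 2))])

    for eps in (0.2, 0.5, 1.0):
        labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = int((labels == -1).sum())
        print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
    ```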

  6. Choosing Cluster Number in K-Means

    Which method is commonly used to select an appropriate value of K in the K-Means algorithm?

    1. Silhouette method
    2. Centroid swap
    3. Dendrogram threshold
    4. Elbow method

    Explanation: The Elbow method is frequently used to decide the optimal number of clusters by plotting the sum of squared errors versus the number of clusters. The Silhouette method also helps but is less commonly the first choice for basic users. Centroid swap is not a standard method, and dendrograms are used for hierarchical clustering, not K-Means.
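
    A typical elbow plot, assuming scikit-learn, uses the fitted model's inertia_ (the within-cluster sum of squared errors) over a range of K values; the "elbow" where the curve flattens suggests a reasonable K.

    ```python
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans
    import matplotlib.pyplot as plt

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

    ks = range(1, 9)
    sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
           for k in ks]

    plt.plot(list(ks), sse, marker="o")
    plt.xlabel("number of clusters K")
    plt.ylabel("sum of squared errors (inertia)")
    plt.show()  # look for the 'elbow' where the curve flattens
    ```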

  7. Agglomerative vs Divisive Hierarchical Clustering

    What is the main difference between agglomerative and divisive hierarchical clustering?

    1. Agglomerative sorts clusters by size, divisive sorts by density
    2. Agglomerative merges clusters, divisive splits them
    3. Agglomerative uses Elbow method, divisive uses DBSCAN
    4. Agglomerative uses centroids, divisive uses medoids

    Explanation: Agglomerative hierarchical clustering begins with individual points and merges them, whereas divisive starts with all points in a cluster and recursively splits them. Sorting by size or density does not differentiate these methods. Neither uses specific Elbow or DBSCAN approaches for cluster formation.
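
    As one concrete example of the bottom-up (agglomerative) direction, scikit-learn's AgglomerativeClustering starts from individual points and merges them; divisive (top-down) clustering is less common and, to my knowledge, not included in scikit-learn, so only the agglomerative side is sketched here.

    ```python
    from sklearn.datasets import make_blobs
    from sklearn.cluster import AgglomerativeClustering

    X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

    # Bottom-up: every point starts as its own cluster, and the two closest
    # clusters are merged repeatedly until n_clusters remain.
    labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
    print(labels[:10])
    ```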

  8. K-Means Output Interpretation

    After running K-Means clustering, you receive a set of centroids and labels for each data point. What does each centroid represent?

    1. The mean position of all points in a cluster
    2. The farthest point from the cluster
    3. The highest-density area
    4. The smallest cluster region

    Explanation: Each centroid corresponds to the mean position (average) of all points assigned to its cluster in feature space. It is not the farthest point from the cluster, as centroids are centrally located. Highest-density area might be closer to DBSCAN's core point concept, and smallest region is not a property of centroids.
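
    You can check this interpretation directly (assuming scikit-learn and NumPy): recomputing the mean of the points in each cluster reproduces the fitted cluster_centers_ once the algorithm has converged.

    ```python
    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans

    X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Each centroid is just the mean of the points assigned to its cluster.
    manual_centroids = np.array([X[km.labels_ == i].mean(axis=0)
                                 for i in range(3)])
    diff = np.abs(manual_centroids - km.cluster_centers_).max()
    print(f"max difference from fitted centroids: {diff:.2e}")  # ~0 at convergence
    ```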

  9. Cluster Membership in DBSCAN

    How does DBSCAN determine if a point should be added to a cluster?

    1. Through random initialization
    2. Based on density of neighboring points
    3. By direct centroid assignment
    4. Via size sorting of clusters

    Explanation: A point in DBSCAN is added to a cluster if it has enough neighboring points within a certain radius, revealing dense regions. Centroid assignment is a feature of K-Means, not DBSCAN. Random initialization is more relevant for K-Means, and DBSCAN does not involve sorting clusters by size.
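
    A hand-rolled check of that rule (toy data, illustrative eps/MinPts values): a point qualifies as a DBSCAN core point, and can pull its neighbors into a cluster, when at least MinPts points lie within distance eps of it.

    ```python
    import numpy as np

    X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                  [4.0, 4.0], [4.1, 4.2], [9.0, 9.0]])
    eps, min_pts = 0.5, 3

    # Count how many points (including itself) fall inside each point's
    # eps-neighborhood; sufficiently dense points are DBSCAN core points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbor_counts = (dists <= eps).sum(axis=1)
    is_core = neighbor_counts >= min_pts
    print(neighbor_counts)  # [3 3 3 2 2 1]
    print(is_core)          # [ True  True  True False False False]
    ```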

  10. Cluster Overlap Handling

    Which statement best describes how K-Means deals with data points that are equally close to two centroids?

    1. Assigns the point to one centroid arbitrarily
    2. Merges centroids together
    3. Flags the point as noise
    4. Divides the point's membership equally

    Explanation: If a data point is equally close to two or more centroids in K-Means, it is assigned to one of them arbitrarily, typically by whatever tie-breaking rule the implementation uses (for example, the lowest-index centroid). K-Means does not support fractional memberships as in soft clustering, nor does it flag such points as noise; merging centroids is not a standard operation in this algorithm.
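
    In practice the "arbitrary" choice comes down to the implementation's tie-breaking rule. For example, NumPy's argmin, which many assignment-step implementations rely on, returns the first index when distances are exactly equal. A small illustration, assuming NumPy:

    ```python
    import numpy as np

    # A point lying exactly midway between two centroids.
    point = np.array([0.0, 0.0])
    centroids = np.array([[-1.0, 0.0], [1.0, 0.0]])

    distances = np.linalg.norm(centroids - point, axis=1)
    print(distances)           # [1. 1.]  -- an exact tie
    print(distances.argmin())  # 0: the first (lowest-index) centroid wins
    ```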