Clustering Essentials: K-Means, Hierarchical & DBSCAN Quiz

Explore key concepts of clustering with this quiz focused on the K-Means, Hierarchical, and DBSCAN algorithms. Assess your understanding of how these clustering methods work, their strengths, and how they differ across common clustering scenarios.

  1. K-Means Cluster Centers

    In the K-Means clustering algorithm, what do the cluster centers (centroids) represent in a dataset of two-dimensional points?

    1. The data point farthest from all other points
    2. The median value of each cluster's coordinates
    3. Randomly chosen points during initialization
    4. The arithmetic mean of all points assigned to a cluster

    Explanation: In K-Means, each cluster center, or centroid, is calculated as the arithmetic mean of all points assigned to its cluster, which minimizes the within-cluster variance. The farthest data point is not the centroid; it is more likely an outlier or edge point. The median is not used to compute K-Means centroids. While initialization may start from randomly chosen points, the centroids are then updated to the cluster means on every iteration.
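
    For intuition, here is a minimal Python sketch (assuming NumPy; the 2-D cluster is hypothetical) showing that a centroid is just the arithmetic mean of its cluster's points:

        import numpy as np

        # Hypothetical 2-D points already assigned to one cluster
        cluster_points = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])

        # The centroid is the arithmetic mean along each coordinate
        centroid = cluster_points.mean(axis=0)
        print(centroid)  # [2. 3.]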

  2. Number of Clusters in K-Means

    Which of the following accurately describes how the number of clusters is determined in K-Means clustering?

    1. It is specified by the user before running the algorithm
    2. It is always equal to the maximum data dimension
    3. It is chosen by the algorithm based on the data size
    4. It is automatically determined using dendrograms

    Explanation: K-Means requires the user to specify the desired number of clusters, known as 'k', before the algorithm starts. Dendrograms relate to hierarchical clustering, not K-Means. The algorithm itself does not automatically select k based on data size. There is no requirement that the number of clusters equals the data's dimensionality.
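
    As a sketch using scikit-learn (a library choice assumed here, not named in the quiz; the data are random placeholders), k is passed in explicitly as n_clusters:

        import numpy as np
        from sklearn.cluster import KMeans

        X = np.random.rand(100, 2)  # 100 hypothetical 2-D points

        # The user chooses k up front; the algorithm never changes it
        kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
        print(kmeans.labels_[:10])  # each point gets a label in {0, 1, 2}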

  3. Hierarchical Clustering Structure

    What structure does hierarchical clustering produce to represent the grouping of data points?

    1. A confusion matrix
    2. A tree-like dendrogram
    3. A scatter plot
    4. A single flat list

    Explanation: Hierarchical clustering creates a dendrogram, which is a tree-like structure showing how clusters are merged or split at various steps. It is not represented as a flat list, which lacks hierarchical information. A scatter plot visualizes data but does not represent clustering structure. A confusion matrix is used for classification evaluation, not clustering.
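
    A minimal sketch with SciPy and Matplotlib (assumed tooling; the data are hypothetical) that builds and draws such a dendrogram:

        import numpy as np
        import matplotlib.pyplot as plt
        from scipy.cluster.hierarchy import linkage, dendrogram

        X = np.random.rand(20, 2)        # hypothetical 2-D points

        Z = linkage(X, method='ward')    # records each pairwise merge
        dendrogram(Z)                    # tree-like plot of those merges
        plt.show()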

  4. DBSCAN Core Points

    In DBSCAN, what is a 'core point' within the dataset?

    1. A point with at least a minimum number of neighbors within a specified distance
    2. A random starting point for clusters
    3. The farthest point from the dataset center
    4. A point that does not belong to any cluster

    Explanation: A core point in DBSCAN must have at least a minimum number of neighboring points within a set distance (epsilon). It is not defined by being farthest from the dataset center. Points that do not belong to any cluster are noise or outlier points, not core points. Random starting points belong to algorithms like K-Means, not DBSCAN.
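
    A rough sketch of that definition in plain NumPy (the function name and the eps and min_samples defaults are illustrative; counting the point itself in its own neighborhood, as scikit-learn does, is assumed):

        import numpy as np

        def is_core_point(point, X, eps=0.5, min_samples=5):
            # Count how many points of X lie within distance eps of
            # `point` (the point itself is included in the count)
            distances = np.linalg.norm(X - point, axis=1)
            return np.sum(distances <= eps) >= min_samples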

  5. Shape of Clusters in K-Means

    Which statement best describes the shape of clusters that K-Means performs well on?

    1. Clusters forming concentric circles
    2. Clusters that overlap heavily
    3. Clusters that are roughly spherical and equally sized
    4. Clusters of any arbitrary shape and size

    Explanation: K-Means works best with clusters that are roughly spherical and similarly sized because it assigns each point to its nearest centroid by Euclidean distance, which partitions the space into convex regions. It therefore struggles with arbitrary shapes or heavily overlapping clusters. Clusters forming concentric circles confuse K-Means, as no centroid placement can align with ring-shaped groups.
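
    One way to see this, sketched with scikit-learn (assumed here; both datasets are synthetic), is to run K-Means on spherical blobs and on concentric circles and compare the labelings:

        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs, make_circles

        X_blobs, _ = make_blobs(n_samples=200, centers=2, random_state=0)
        X_circles, _ = make_circles(n_samples=200, factor=0.3, noise=0.05,
                                    random_state=0)

        for name, X in [('blobs', X_blobs), ('circles', X_circles)]:
            # Blobs separate cleanly; the circles are typically cut
            # straight through, with each ring split between centroids
            labels = KMeans(n_clusters=2, n_init=10,
                            random_state=0).fit_predict(X)
            print(name, labels[:10])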

  6. Noise in DBSCAN

    How does DBSCAN handle noise or outlier data points in clustering?

    1. It explicitly labels them as noise and excludes them from clusters
    2. It merges noise points into the largest cluster
    3. It assigns all points to the nearest cluster
    4. It ignores noise completely and does not mark them

    Explanation: DBSCAN labels points that qualify as neither core nor border points as noise, and these are excluded from every cluster. Assigning every point to the nearest cluster is what K-Means does, not DBSCAN. Merging noise into the largest cluster would degrade clustering quality, and DBSCAN does not ignore noise; it labels such points distinctly.
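
    In scikit-learn's implementation (assumed here; the tiny dataset is contrived so one point is isolated), noise points receive the special label -1:

        import numpy as np
        from sklearn.cluster import DBSCAN

        X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.1], [8.0, 8.0]])

        labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
        print(labels)  # [ 0  0  0 -1]: the isolated point is noise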

  7. Clustering Hierarchies

    Which clustering approach naturally produces a hierarchy of nested clusters?

    1. K-Means clustering
    2. Hierarchical clustering
    3. DBSCAN
    4. Principal Component Analysis

    Explanation: Hierarchical clustering forms a tree structure (dendrogram) with nested clusters at multiple levels. K-Means and DBSCAN only assign data to flat groups, not hierarchies. Principal Component Analysis is not a clustering technique, but a dimensionality reduction method, so it does not form clusters at all.
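
    A short SciPy sketch (assumed tooling, hypothetical data): cutting the same linkage tree at different levels yields nested clusterings, from coarse to fine:

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        X = np.random.rand(20, 2)
        Z = linkage(X, method='average')

        coarse = fcluster(Z, t=2, criterion='maxclust')  # 2 clusters
        fine = fcluster(Z, t=4, criterion='maxclust')    # 4 sub-clusters
        print(coarse)
        print(fine)   # every fine cluster sits inside one coarse cluster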

  8. DBSCAN Parameters

    Which two main parameters must be set for DBSCAN to work properly?

    1. Tree depth and splitting criterion
    2. Variance threshold and number of neighbors
    3. Number of clusters and learning rate
    4. Epsilon (maximum distance) and minimum points per cluster

    Explanation: DBSCAN requires 'epsilon' (the maximum distance for two points to count as neighbors) and the minimum number of points required to form a dense region. There is no learning rate or preset number of clusters in DBSCAN, unlike in K-Means. Tree depth and splitting criteria relate to decision trees, not clustering. A variance threshold and a number of neighbors pertain to other methods, such as feature selection and k-nearest neighbors, not DBSCAN.
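
    These two parameters map directly onto scikit-learn's DBSCAN arguments (the values below are illustrative, not recommendations):

        from sklearn.cluster import DBSCAN

        db = DBSCAN(eps=0.3,        # epsilon: max distance between neighbors
                    min_samples=5)  # points needed to form a dense region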

  9. K-Means Initialization

    Why can the initial choice of centroids affect the results of K-Means clustering?

    1. It prevents any further iteration after the first step
    2. It always finds the unique global minimum regardless of starting point
    3. It produces clusters of identical shapes every time
    4. It can cause the algorithm to converge to different local minima

    Explanation: K-Means is sensitive to initial centroid placement and may converge to different local minima, leading to various clusterings in different runs. It does not guarantee a unique global minimum due to random initialization. Shapes of clusters can vary, and iterations proceed regardless of the starting points; the process is not halted prematurely.
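
    A sketch with scikit-learn (assumed; random data) that makes this visible by disabling restarts (n_init=1) and varying the seed; the final inertia (within-cluster sum of squares) can differ between runs:

        import numpy as np
        from sklearn.cluster import KMeans

        X = np.random.rand(200, 2)
        for seed in range(3):
            km = KMeans(n_clusters=4, init='random', n_init=1,
                        random_state=seed).fit(X)
            print(seed, round(km.inertia_, 3))  # may differ per seed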

  10. Cluster Assignment in Hierarchical Clustering

    How are data points assigned to clusters in agglomerative hierarchical clustering?

    1. By splitting all points into singletons at the beginning and never merging
    2. By assigning points to clusters based on maximum density
    3. By randomly assigning points at each iteration
    4. By iteratively merging the closest pairs of clusters until a stopping criterion is met

    Explanation: Agglomerative hierarchical clustering starts with each point as a singleton and iteratively merges the closest clusters until a desired number or level is reached. Assigning points by density refers to DBSCAN, not hierarchical clustering. Random assignment does not create meaningful clusters, and points are not kept as singletons indefinitely, but merged into clusters.
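
    Finally, a minimal scikit-learn sketch of the agglomerative process (assumed library; hypothetical data): each point begins as its own cluster, and the closest pairs are merged until n_clusters remain:

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering

        X = np.random.rand(30, 2)

        # Starts from 30 singletons; Ward linkage merges the pair of
        # clusters whose union increases within-cluster variance least
        agg = AgglomerativeClustering(n_clusters=3, linkage='ward').fit(X)
        print(agg.labels_)  # final cluster index per point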