1.3 Coarse graining part I - Clustering algorithms » Quiz Solution
1. Why does a clustering algorithm like k-means produce a coarse-graining?
A. it tells you how to map a high-dimensional description with lots of variables to a small number of discrete categories.
B. it's automated, so that the human user can't manipulate the answer.
C. the clusters that the algorithm finds are scientifically meaningful.
D. it transforms the data points using their averages.
Answer: (A). (B) is wrong because coarse-grainings are not defined by being "without human intervention". Indeed, some of the most common examples of coarse-graining involve human judgement, e.g., when a professor sorts essays into A, B, C and D grades. (C) is wrong because a coarse-graining doesn't have to be scientifically meaningful; we could end up with crappy clusters that tell us nothing interesting, but the result is still a coarse-graining. (D) is wrong because using a summary, or coarse-grained, variable doesn't by itself make an algorithm a coarse-graining: if I take the mean value of all the variables and add it on to each one, I have moved the data around, but not simplified it (the shift can be undone, since the new mean is exactly twice the old one). Note that if I subtract the mean value, I have done a (very light) coarse-graining: the transformation is irreversible (I no longer know the mean offset), and it reduces the information (if I had n data points, I now only need to keep n-1, because the final one must make the average come out to zero).
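To make answer (A) concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, its parameters, and the toy data are illustrative assumptions, not code from the course. The thing to notice is the output: each data point described by 5 real-valued variables is replaced by a single cluster label, which is the many-variables-to-few-categories map that makes k-means a coarse-graining.

```python
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Minimal Lloyd's-algorithm k-means.

    Returns (centroids, labels): labels[i] is the discrete cluster index
    assigned to points[i].
    """
    rng = random.Random(seed)
    # Initialise centroids at k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point takes the label of its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave the centroid where it is if its cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Made-up data: ten points with 5 variables each, forming two well-separated groups.
group_a = [[0.1 * i] * 5 for i in range(5)]
group_b = [[10.0 + 0.1 * i] * 5 for i in range(5)]
points = group_a + group_b

centroids, labels = kmeans(points, k=2)
print(labels)  # one small integer per point, instead of 5 real numbers
```

Which group ends up labelled 0 and which 1 depends on the random initialisation; what doesn't change is that the whole description collapses to two symbols, and the labels alone can't be inverted to recover the original points.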
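The contrast in the answer to (D) can be checked with a few lines of Python. This is an illustrative sketch with hypothetical helper names (`add_mean`, `undo_add_mean`, `subtract_mean`, `recover_last`) and made-up numbers: adding the mean can be undone, while subtracting it sends differently-offset datasets to the same output and makes the last value redundant.

```python
def add_mean(xs):
    """Shift every value by the mean of the data."""
    m = sum(xs) / len(xs)
    return [x + m for x in xs]

def undo_add_mean(ys):
    """Invert add_mean: the shifted data has mean 2m, so m is recoverable."""
    m = sum(ys) / len(ys) / 2
    return [y - m for y in ys]

def subtract_mean(xs):
    """Center the data so that it averages to zero."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def recover_last(centered_prefix):
    """Centered values sum to zero, so the last one is fixed by the first n-1."""
    return -sum(centered_prefix)


data = [3.0, 5.0, 10.0]
shifted = [x + 100.0 for x in data]  # same shape, different offset

# Adding the mean is reversible: nothing has been thrown away.
print(undo_add_mean(add_mean(data)))  # [3.0, 5.0, 10.0]

# Subtracting the mean is not: both datasets land on the same centered values.
print(subtract_mean(data), subtract_mean(shifted))

# Only n-1 centered values need to be stored; the last is implied.
centered = subtract_mean(data)
print(recover_last(centered[:-1]), centered[-1])
```

The many-to-one step (two different inputs, one output) is what makes the mean subtraction a coarse-graining, however light.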
2. Does the majority vote algorithm correspond to a "hard" or a "soft" clustering?
A. hard
B. soft
Answer: (A) hard. Any particular 10x10 block of grid points gets mapped to a single discrete symbol, black or white. In a soft clustering, the block would instead be mapped to (for example) a probability ("25% white").
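The difference between the two kinds of clustering can be sketched in Python. The encoding (1 = white, 0 = black), the function names, the tie-breaking rule and the toy grid are all assumptions made for illustration: `majority_vote` is a hard clustering that sends each 10x10 patch to one symbol, while `soft_vote` is a soft variant that keeps the fraction of white cells instead.

```python
def majority_vote(grid, block=10):
    """Hard coarse-graining: each block x block patch becomes one symbol,
    1 (white) if most cells are white, else 0 (black).
    Assumes a square grid whose side is a multiple of `block`;
    exact ties (an assumption) go to black."""
    n = len(grid)
    coarse = []
    for r in range(0, n, block):
        row = []
        for c in range(0, n, block):
            cells = [grid[i][j] for i in range(r, r + block) for j in range(c, c + block)]
            row.append(1 if sum(cells) > len(cells) / 2 else 0)
        coarse.append(row)
    return coarse

def soft_vote(grid, block=10):
    """Soft alternative: each patch becomes the fraction of white cells (e.g. 0.25)."""
    n = len(grid)
    return [
        [
            sum(grid[i][j] for i in range(r, r + block) for j in range(c, c + block)) / block**2
            for c in range(0, n, block)
        ]
        for r in range(0, n, block)
    ]


# Made-up 20x20 grid (1 = white, 0 = black): left half all white,
# plus 25 white cells in the top-right 10x10 patch.
size = 20
grid = [[1 if j < 10 else 0 for j in range(size)] for _ in range(size)]
for i in range(10):
    for j in range(10, 20):
        if i * 10 + (j - 10) < 25:
            grid[i][j] = 1

print(majority_vote(grid))  # [[1, 0], [1, 0]]
print(soft_vote(grid))      # [[1.0, 0.25], [1.0, 0.0]]
```

The top-right patch shows the difference: the hard rule reports it simply as black, discarding the fact that a quarter of its cells were white, while the soft version keeps that "25% white" figure.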