similarity measure

The glossary is being gradually proof checked, but may have typos and misspellings.

Given two data items, we often need to calculate some measure or metric of how similar they are. For example, this may be used by a clustering algorithm. For discrete-valued features this might simply be a count of how many features are identical. For continuous-valued features some distance measure may be used, such as Euclidean distance or Manhattan (city block) distance, but to be a similarity measure this would usually be inverted in some way (e.g. 1/distance) so that higher values mean more similar items.
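
As a minimal sketch of the ideas above, the following Python shows a matching count for discrete-valued features and two distance measures for continuous-valued features. The function names are illustrative only, and the inversion 1/(1 + distance) is an assumption chosen here in place of plain 1/distance so that identical items (distance 0) do not cause a division by zero.

```python
import math


def matching_similarity(a, b):
    """Count how many discrete-valued features are identical in a and b."""
    if len(a) != len(b):
        raise ValueError("items must have the same number of features")
    return sum(1 for x, y in zip(a, b) if x == y)


def euclidean_distance(a, b):
    """Straight-line distance between two continuous-valued items."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """City block distance: the sum of absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_to_similarity(distance):
    """Invert a distance so that higher values mean more similar.

    Assumption: 1 / (1 + distance) is used instead of 1 / distance
    so that a distance of zero gives a similarity of 1.0.
    """
    return 1.0 / (1.0 + distance)


if __name__ == "__main__":
    # Discrete-valued features: colour, size, shape
    print(matching_similarity(["red", "small", "round"],
                              ["red", "large", "round"]))

    # Continuous-valued features
    p, q = (0.0, 0.0), (3.0, 4.0)
    print(distance_to_similarity(euclidean_distance(p, q)))
    print(distance_to_similarity(manhattan_distance(p, q)))
```

Other inversions are possible, for example negating the distance or applying a decaying exponential; which is appropriate depends on the algorithm that consumes the similarity values.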

Used in Chap. 7: pages 91, 92; Chap. 8: page 108; Chap. 9: page 118; Chap. 10: pages 134, 141; Chap. 12: page 185; Chap. 16: pages 240, 242, 248; Chap. 18: page 286; Chap. 21: page 340

Also known as: similarity, similarity metrics

Used in glossary entries: clustering, Euclidean distance