
Garson, G. D. (2014). Cluster analysis. Asheboro, NC: Statistical Associates Publishers.

Instant availability without passwords in Kindle format on Amazon: click http://www.amazon.com/dp/B009442Y1G.
Tutorial on the free Kindle for PC Reader app: click here.
Obtain the free Kindle Reader app for any device: click here.
Delayed availability with passwords in free pdf format: right-click here on Dec. 3 and save file.
Register to obtain a password: click here.
ISBN: 978-1-62638-030-1
ASIN: B009442Y1G.
© 2014 by G. David Garson and Statistical Associates Publishers. Worldwide rights reserved in all languages and on all media. Permission is not granted to copy, distribute, or post e-books or passwords.


An illustrated tutorial and introduction to cluster analysis using SPSS, SAS, SAS Enterprise Miner, and Stata for examples. Suitable for introductory graduate-level study.

The 2014 edition is a major update to the 2012 edition. Among the new features are these:

The full content is now available from Statistical Associates Publishers. Click here.

Below is the unformatted table of contents.

Table of Contents
Overview	10
Data examples in this volume	10
Key Concepts and Terms	12
Terminology	12
Distances (proximities)	12
Cluster formation	12
Cluster validity	12
Types of cluster analysis	14
Types of cluster analysis by software package	14
Disjoint clustering	15
Hierarchical clustering	15
Overlapping clustering	16
Fuzzy clustering	16
Hierarchical cluster analysis in SPSS	16
SPSS Input for hierarchical clustering	16
Example	16
The main "Hierarchical Cluster Analysis" dialog	17
Statistics button	18
Plots button	19
Methods button	20
SPSS output for hierarchical cluster analysis	21
Proximity table	21
Cluster membership table	22
Agglomeration Schedule	22
Dendrogram	24
Icicle plots	27
Summary measures	28
Hierarchical cluster analysis in SAS	29
SAS input for hierarchical cluster analysis	29
Example	29
Data setup	29
SAS syntax	30
SAS output for hierarchical cluster analysis	31
Simple statistics table	31
Eigenvalues of the covariance matrix table	31
Root mean square coefficients	32
Cluster history table	33
Dendrogram	34
Icicle Plots	36
Cluster membership table	36
Saving data to file	37
Hierarchical cluster analysis in Stata	38
Stata input for hierarchical cluster analysis	38
Stata output for hierarchical cluster analysis	40
Agglomeration coefficients	40
Dendrogram	41
Saving cluster membership values	42
Cluster membership table	43
K-means cluster analysis	44
Overview	44
Example	45
K-means cluster analysis in SPSS	45
SPSS input	45
Main K-means dialog	45
The Iterate button	47
The Save button	48
The Options button	49
SPSS Output for K-Means cluster analysis	50
The ANOVA table	50
Number of cases in each cluster	51
Getting different clusters	52
Cluster membership table	52
K-Means cluster analysis in SAS	53
Overview	53
Example	54
SAS input for k-means cluster analysis	54
SAS output for k-means cluster analysis	55
The "Statistics for Variables" table	55
Criteria for determining k	57
The "Cluster Summary" table	60
Cluster membership and distance values	61
Crosstabulation tables	61
Cluster separation plots	62
K-Means cluster analysis in Stata	64
Example	64
Stata input for k-means cluster analysis	64
The main kmeans clustering command	64
Obtaining descriptive statistics	65
Obtaining distance information	65
Obtaining cluster separation plots	65
Comparing kmeans and kmedians solutions	66
Stata output for k-means cluster analysis	66
Cluster membership assignments	66
Descriptive statistics	67
Distance coefficients	69
Cluster separation plots	70
Comparing kmeans and kmedians solutions	71
Two-step cluster analysis in SPSS	72
Overview	72
Cluster feature tree (CF tree)	73
Proximity	73
Example	74
SPSS input for two-step clustering	74
The main two-step clustering dialog	74
Options button dialog	75
Output button dialog	78
SPSS output for two-step clustering	79
Autoclustering table	79
Cluster distribution table	81
Centroids (cluster profiles) table	81
Model summary	82
The "Cluster Quality" graph	82
The "Cluster Sizes" pie chart	82
The "Predictor Importance" chart	83
The "Clusters" table	84
The "Cell Distribution" chart	85
The "Cluster Comparison" chart	86
Nearest neighbor analysis in SPSS	87
Overview	87
Target variables	87
Selecting k	87
Feature variables	88
Focal cases	88
Case labels	89
Partitions and cross-validation	89
Example	89
SPSS input	90
The user interface	90
The "Variables" tab	90
The "Neighbors" tab	91
The "Features" tab	92
The "Partitions" tab	93
The "Save" tab	95
The "Output" tab	96
The "Options" tab	97
SPSS output	97
Overview	97
The "Case Processing Summary" table	98
The "Predictor Space" plot	98
The "Peers Chart"	101
The "k Nearest Neighbors and Distances" table	102
"k and Predictor Selection" plots	103
"Quadrant Map" maps	104
The "Error Summary" table	105
SAS PROC ACECLUS: Pre-processing for elliptical clusters	106
Overview	106
Example	106
SAS input	107
Overview	107
Set-up	107
Plot of original data	108
Using PROC ACECLUS to transform the data	108
Plot of transformed data	109
K-means clustering of transformed data	109
K-means clustering of original data	110
SAS output	110
Plot of untransformed data	110
Data transformation with PROC ACECLUS	111
Plot of transformed data	112
K-means (PROC FASTCLUS) results with original vs. transformed data	113
SAS PROC VARCLUS: Oblique principal components cluster analysis	115
Overview	115
The PROC VARCLUS default method	115
PROC VARCLUS variations	115
Example	116
SAS input	116
SAS output	119
The dendrogram from PROC TREE	119
The cluster summary table	119
The R-squared table	121
The standardized scoring coefficients table	122
The cluster structure table	123
The table of inter-cluster correlations	124
The cluster history summary statistics table	125
Cluster membership	126
Cluster scores	127
SAS PROC MODECLUS: Nonparametric density cluster analysis	127
Overview	127
Interpreting p-values	129
Example	129
SAS input	130
PROC MODECLUS specifications	130
PROC MODECLUS command syntax	131
SAS output	133
First pass: Selecting the optimal radius	133
Second pass: Generating main output	136
PROC MODECLUS: Nearest neighbor analysis	141
SAS syntax for nearest neighbor lists/distances	141
SAS output for nearest neighbor analysis	142
Kohonen clustering in SAS Enterprise Miner	144
Overview of Kohonen clustering	144
Kohonen Clustering in SAS Enterprise Miner: Setup	144
Kohonen Clustering in SAS Enterprise Miner: Modeling	153
Overview	153
The flow chart model	154
Node overview	156
The "Input Data" node	156
The "SOM/Kohonen" node	157
The "Segment Profile" node	159
Kohonen Clustering in SAS Enterprise Miner: Output	160
Results of the "Input Data" node	160
Results of the "SOM/Kohonen" node	161
Results of the "Segment Profile" node	165
Other Forms of Cluster Analysis	173
Expectation maximization (EM) clustering	173
Cross-classification to determine k	173
Distributional characteristics	173
Classification probabilities	174
Q-mode factor analysis	174
Multidimensional scaling	175
Discriminant function analysis	175
F-ratio methods	176
Assumptions	176
Randomization	176
Data level	176
Independence of observations	176
Data distribution	177
Comparable scaling	177
GLM assumptions	178
Sample size	178
Outliers	178
Frequently Asked Questions	178
Should data be standardized prior to running cluster analysis?	178
What are alternative linkage methods?	180
SPSS	180
SAS	181
Stata	182
What are alternative distance measures?	183
SPSS	183
SAS	191
Stata	193
It is acknowledged that k-means and hierarchical clustering are inefficient and inaccurate for large datasets, but what is the evidence that two-step clustering does better?	194
Can I cluster variables instead of cases?	194
Can I cluster repeated measures data?	194
Isn't discriminant analysis the same as cluster analysis?	195
What is the ratio of distance measure used in autoclustering in two-step cluster analysis?	195
How does SAS's PROC MODECLUS work?	196
How does joining and dissolving work in SAS PROC MODECLUS?	196
What is the rationale for the stability value criterion in SAS PROC MODECLUS?	198
What does the content of OUTSTAT= files look like for PROC VARCLUS?	199
What is BIRCH clustering?	200
What is ClustanGraphics?	200
What is SaTScan?	201
Where can I find cluster software for R?	201
How does cluster analysis compare with factor analysis and multidimensional scaling?	201
Acknowledgments	201
Bibliography	201
Page count: 207