g_cluster(G;S;XX;A;N;Z)

Returns a model corresponding to the clustering of points in data.

Function type

Vector only

Syntax

g_cluster(G;S;XX;A;N;Z)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.
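
The grouping rules above can be illustrated with a small Python sketch. It is not the platform's implementation: the helper name group_rows and the sample table are hypothetical, and None stands in for N/A.

```python
from collections import defaultdict


def group_rows(rows, group_cols):
    """Map each distinct tuple of grouping values to the indices of its rows."""
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        # With no grouping columns the key is () for every row,
        # so all rows fall into a single group.
        key = tuple(row[col] for col in group_cols)
        groups[key].append(idx)
    return dict(groups)


# Hypothetical sample table; None plays the role of N/A.
rows = [
    {"store": "A", "region": "east"},
    {"store": "A", "region": "east"},
    {"store": "B", "region": None},
    {"store": "B", "region": None},
    {"store": "B", "region": "west"},
]

by_store_region = group_rows(rows, ["store", "region"])
all_rows = group_rows(rows, [])
```

In this sketch the two rows with a missing region form their own group, mirroring the rule that N/A is a valid grouping value.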

S integer The name of a column in which every row evaluates to 1 or 0, indicating whether that row is included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.
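
As a rough illustration of the selection rules for S, the following Python sketch filters rows by a 0/1 flag column. The function name select_rows and the use of ValueError are assumptions made for the example; the actual error reported by g_cluster is not specified here.

```python
def select_rows(rows, flags=None):
    """Keep the rows whose flag is 1; keep every row when no flags are given."""
    if flags is None:
        # Omitted S: all rows take part in the calculation.
        return list(rows)
    if len(flags) != len(rows):
        raise ValueError("flags must have one entry per row")
    for flag in flags:
        # Any value other than 1 or 0 is treated as an error.
        if flag not in (0, 1):
            raise ValueError(f"selection value must be 1 or 0, got {flag!r}")
    return [row for row, flag in zip(rows, flags) if flag == 1]


selected = select_rows([10, 20, 30], [1, 0, 1])
unfiltered = select_rows([10, 20])
```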

XX integer or decimal A space- or comma-separated list of column names

This denotes the data for clustering.

A text A string that specifies the clustering algorithm to use
A may have the value:
  • 'kmeans' for k-means clustering
    Note: When A is 'kmeans', it is mandatory to specify N.
N integer The number of clusters into which to partition the data
Z integer or decimal A list of two elements that specify:
  • the maximum number of iterations to use in the clustering routine (integer)
  • the number of additional starting points from which to try the clustering

If Z is omitted, then the default values of 1000 (iterations) and 0 (extra starting points) are used.
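
To show how N and the two elements of Z might interact, here is a minimal k-means sketch in Python. It rests on several assumptions that this page does not confirm: it uses Lloyd's algorithm, draws the initial centers at random from the data points, and keeps the run with the lowest within-cluster sum of squares. The names kmeans, lloyd, extra_starts, and seed are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """0-based index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def lloyd(points, n, max_iter, rng):
    """One k-means run from a random start; returns (centers, cost)."""
    centers = [tuple(p) for p in rng.sample(points, n)]
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for i in range(n):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(centers[i])
        if new_centers == centers:
            break  # converged before reaching max_iter
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, cost


def kmeans(points, n, max_iter=1000, extra_starts=0, seed=0):
    """Run 1 + extra_starts attempts and keep the lowest-cost result."""
    rng = random.Random(seed)
    runs = [lloyd(points, n, max_iter, rng) for _ in range(1 + extra_starts)]
    return min(runs, key=lambda run: run[1])


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, cost = kmeans(points, n=2, max_iter=1000, extra_starts=3)
```

The defaults in this sketch mirror the documented defaults for Z (1000 iterations, 0 extra starting points). Extra starting points cost more computation but reduce the chance of settling on a poor local optimum.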

Return Value

For every row in each group defined by G (restricted to rows where S=1, if S is specified), g_cluster applies the clustering algorithm named by A, as configured by N and Z, to the data in XX and returns a special type representing a model for each group in the data.

The model that g_cluster returns can be used as an argument to param and classify.
Assuming M is the column containing the result of g_cluster, use the following function calls to obtain the desired information:
param(M;'centers';D I)
D'th dimension of the center of the I'th cluster
  • D ranges from 1 to the number of elements in XX above
  • I ranges from 1 to N above
classify(XX;M;)
Discrete cluster assignment I for each point in XX
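
The indexing conventions of these two calls can be mimicked in Python as follows. This is only a sketch: the functions center_coordinate and classify_points are hypothetical stand-ins, and assigning each point to the nearest center by Euclidean distance is an assumption based on how k-means models are normally applied.

```python
def center_coordinate(centers, d, i):
    """D'th coordinate of the I'th cluster center, both counted from 1."""
    return centers[i - 1][d - 1]


def classify_points(points, centers):
    """1-based index of the nearest center for each point."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    assignments = []
    for point in points:
        best = min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))
        assignments.append(best + 1)  # shift to the 1..N numbering
    return assignments


# Hypothetical model with N = 2 clusters over two columns.
centers = [(0.0, 0.0), (10.0, 10.0)]
second_coord_of_cluster_2 = center_coordinate(centers, d=2, i=2)
labels = classify_points([(1, 1), (9, 8)], centers)
```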