`g_cluster(G;S;XX;A;N;Z)`

Returns a model corresponding to the clustering of points in data.

Function type

Vector only

Syntax

g_cluster(G;S;XX;A;N;Z)

Input

Argument	Type	Description
`G`	any	A space- or comma-separated list of column names. Rows are in the same group if their values for all of the columns listed in `G` are the same. If `G` is omitted, all rows are considered to be in the same group. If any of the columns listed in `G` contain N/A, the N/A value is considered a valid grouping value.
`S`	integer	The name of a column in which every row evaluates to 1 or 0, which determines whether that row is included in the calculation. If `S` is omitted, all rows are considered by the function (subject to any prior row selections). If any of the values in `S` are neither 1 nor 0, an error is returned.
`XX`	integer or decimal	A space- or comma-separated list of column names. These columns hold the data to be clustered.
`A`	text	A string that specifies the clustering algorithm to use. `A` may have the value `'kmeans'` for k-means clustering. Note: When `A` is `'kmeans'`, `N` must be specified.
`N`	integer	The number of clusters into which to partition the data.
`Z`	integer or decimal	A list of two elements that specify: (1) the maximum number of iterations to be used in the clustering routine (integer), and (2) the number of extra starting points from which to try the clustering. If `Z` is omitted, the default values of `1000` (iterations) and `0` (extra starting points) are used.
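
To make the roles of `N` and `Z` concrete, the following Python sketch shows a generic k-means routine with an iteration cap and optional extra random starts. It is not 1010data code and does not claim to match the platform's implementation; the names `lloyd`, `kmeans`, `max_iter`, `extra_starts`, and `seed` are hypothetical, and keeping the lowest-cost run across starts is an assumption about how extra starting points would typically be used.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd(points, centers, max_iter):
    """Run k-means (Lloyd) iterations from the given centers; return (centers, cost)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: put each point in the bucket of its nearest center.
        buckets = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            buckets[nearest].append(p)
        # Update step: move each center to the mean of its bucket
        # (an empty bucket keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(b) for col in zip(*b)) if b else centers[i]
            for i, b in enumerate(buckets)
        ]
        if new_centers == centers:  # converged before reaching the cap
            break
        centers = new_centers
    cost = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, cost

def kmeans(points, n_clusters, max_iter=1000, extra_starts=0, seed=0):
    """Try 1 + extra_starts random starts and keep the lowest-cost result."""
    rng = random.Random(seed)
    runs = [lloyd(points, rng.sample(points, n_clusters), max_iter)
            for _ in range(1 + extra_starts)]
    return min(runs, key=lambda run: run[1])

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, cost = kmeans(points, n_clusters=2, max_iter=1000, extra_starts=3)
```

In this sketch the defaults `max_iter=1000` and `extra_starts=0` mirror the documented defaults for `Z`. Because each extra start can only replace the result with a lower-cost one, raising `extra_starts` trades run time for a lower chance of settling on a poor local optimum.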

Return Value

For every row in each group defined by G (and for those rows where S=1, if specified), g_cluster applies the clustering algorithm specified by A (as modified by parameters N and Z) to the data in XX and returns a special type representing a model for each group in the data.
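
The grouping and selection behavior described above can be sketched in Python as follows. This is an illustration only, not 1010data code: `fit_per_group` and `centroid` are hypothetical names, the rows are made-up sample data, and `centroid` is a deliberately trivial stand-in for a real clustering routine. The sketch also simplifies the output to one model per group key rather than a column with a value for every row.

```python
from collections import defaultdict

def centroid(points):
    """Stand-in 'model': the mean of the points (a real routine would cluster them)."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def fit_per_group(rows, group_cols, value_cols, fit, select_col=None):
    """Fit one model per distinct combination of values in group_cols."""
    groups = defaultdict(list)
    for row in rows:
        if select_col is not None:
            flag = row[select_col]
            if flag not in (0, 1):
                # Mirrors the documented rule that S must be 1 or 0.
                raise ValueError(f"selection value must be 0 or 1, got {flag!r}")
            if flag == 0:
                continue  # row is excluded from the calculation
        # None stands in for N/A here and is treated as a valid grouping value.
        key = tuple(row[c] for c in group_cols)
        groups[key].append(tuple(row[c] for c in value_cols))
    return {key: fit(pts) for key, pts in groups.items()}

rows = [
    {"store": "A", "keep": 1, "x": 1.0, "y": 2.0},
    {"store": "A", "keep": 1, "x": 3.0, "y": 4.0},
    {"store": "A", "keep": 0, "x": 100.0, "y": 100.0},  # excluded by the selection
    {"store": None, "keep": 1, "x": 5.0, "y": 5.0},     # N/A forms its own group
]
models = fit_per_group(rows, ["store"], ["x", "y"], centroid, select_col="keep")
```

Passing `group_cols=[]` puts every row under the single empty key, which corresponds to omitting `G`.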

The model that g_cluster returns can be used as an argument to:

param(M;P;I) to extract the clustering model parameters, or
classify(XX;M;Z) to classify data points (i.e., assign them to clusters)

Assuming M is the column containing the result of g_cluster, use the following function calls to obtain the desired information:

Function call	Description
`param(M;'centers';D I)`	The D'th dimension of the center of the I'th cluster. D ranges from 1 to the number of columns listed in `XX`; I ranges from 1 to `N`.
`classify(XX;M;)`	The discrete cluster assignment I for each point in `XX`.
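
As a rough illustration of what these two calls retrieve, the Python sketch below models the cluster centers as a plain list and shows the 1-based `D` and `I` indexing alongside a nearest-center assignment. It is not 1010data code: `center_coordinate` and `classify_point` are hypothetical helpers, the centers are made-up values, and assigning each point to its nearest center by Euclidean distance is an assumption based on standard k-means, not a documented detail of `classify`.

```python
def center_coordinate(centers, d, i):
    """Return the d'th dimension of the i'th center, both counted from 1."""
    return centers[i - 1][d - 1]

def classify_point(point, centers):
    """Return the 1-based index of the center nearest to the point."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return distances.index(min(distances)) + 1

# Made-up centers for a model with two columns in XX and N = 2.
centers = [(1.0, 2.0), (8.0, 9.0)]

second_dim_of_first = center_coordinate(centers, d=2, i=1)
first_dim_of_second = center_coordinate(centers, d=1, i=2)
label_a = classify_point((0.5, 2.5), centers)
label_b = classify_point((7.0, 9.5), centers)
```

With these centers, the point `(0.5, 2.5)` falls in cluster 1 and `(7.0, 9.5)` in cluster 2, matching the convention that I ranges from 1 to N.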