`g_glm(G;S;Y;XX;Z)`

Returns a generalized linear model corresponding to the regression of a dependent variable on one or more independent variables. (Available as of prod-9)

Function type

Vector only

Syntax

g_glm(G;S;Y;XX;Z)

Input

Argument	Type	Description
`G`	any	A space- or comma-separated list of column names. Rows are in the same group if their values for all of the columns listed in `G` are the same. If `G` is omitted, all rows are considered to be in the same group. If any of the columns listed in `G` contain N/A, the N/A value is considered a valid grouping value.
`S`	integer	The name of a column in which every row evaluates to 1 or 0, which determines whether that row is included in the calculation. If `S` is omitted, all rows are considered by the function (subject to any prior row selections). If any of the values in `S` are neither 1 nor 0, an error is returned.
`Y`	integer or decimal	A column name denoting the dependent variable. If the `fit_type` specified by the `Z` parameter is `logistic`, `Y` can only take the values `0` or `1`. If the `fit_type` specified by the `Z` parameter is `gaussian`, `Y` can take any decimal value.
`XX`	integer or decimal	A space- or comma-separated list of column names denoting the independent variable(s). To include an intercept term, the first element of `XX` must be the special value `1`. Otherwise, no intercept is included in the model estimation.
`Z`	text	A list of key-value pairs that provide control over the model fitting. For example: `'fit_type' 'logistic' 'eps' '1e-6'`. The options you may specify for the `Z` parameter are: `'fit_type' 'value'`: The `value` associated with `fit_type` determines the type of model generated by `g_glm` and can be one of the following: `gaussian` (Gaussian model), `logistic` (logistic model), or `poisson` (Poisson model; available as of version 11.01). The default is `gaussian`. `'eps' 'value'`: The `value` associated with `eps` is a numerical value that determines the convergence criterion. At each iteration in the solver, the relative change in the deviance is computed and compared to this value. If the relative change is smaller than the value specified, the model has converged. The default is 1e-8. `'MAX_IT' 'value'`: The `value` associated with `MAX_IT` is an integer that determines the maximum number of iterations allowed. If the solver reaches a number of iterations greater than this value, it returns with the model parameter `convergence` set to `false`. The default is 30.
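
The interaction between the intercept term, `eps`, and `MAX_IT` can be illustrated outside of 1010data with a small Python sketch. This is not the g_glm implementation: it assumes a textbook Fisher scoring loop for a logistic model with one predictor plus an intercept, and it assumes the absolute value of the deviance change is used in the stopping rule. The function names `deviance` and `fit_logistic` are hypothetical; the keys of the returned dictionary only mirror the model parameter names documented for g_glm.

```python
import math

def deviance(b0, b1, xs, ys):
    """Binomial deviance (-2 * log-likelihood) for a 0/1 response."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * total

def fit_logistic(xs, ys, eps=1e-8, max_it=30):
    """Fit y ~ b0*1 + b1*x by Fisher scoring; the constant 1 is the intercept column."""
    b0 = b1 = 0.0
    dev_prev = deviance(b0, b1, xs, ys)  # deviance of the starting model
    delta = float("inf")
    converged = False
    iterations = 0
    while iterations < max_it and not converged:
        iterations += 1
        # Score vector g and Fisher information H for the columns [1, x].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Update step: beta += H^{-1} g, using the closed-form 2x2 inverse.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        dev = deviance(b0, b1, xs, ys)
        # Relative change in deviance; abs() is an assumption of this sketch.
        delta = abs(dev - dev_prev) / (0.1 + dev_prev)
        converged = delta <= eps
        dev_prev = dev
    return {"b": (b0, b1), "dev": dev_prev, "delta": delta,
            "fisher_iterations": iterations, "convergence": converged}

# Made-up data, chosen so that the two classes overlap.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
print(fit_logistic(xs, ys))
print(fit_logistic(xs, ys, max_it=1))  # stops at the iteration cap
```

In this sketch, capping the loop at a single iteration leaves `convergence` false, which corresponds to the behavior described for `MAX_IT`.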

Return Value

For every row in each group defined by G (and for those rows where S=1, if specified), g_glm computes a regression of the dependent variable Y on one or more independent variables XX and returns a generalized linear model for each group in the data.

Note: g_glm may be much slower if there is significant multicollinearity in the data (i.e., if two or more of the independent variables XX are nearly perfectly correlated with each other).

The model that g_glm returns can be used as an argument to the following functions:

param(M;P;I) to extract the numerical value of a regression model parameter
cparam(M;P;I) to extract the text value of a regression model parameter

Assuming M is the column containing the result of g_glm, use the following function calls to obtain the desired information:

param(M;'b';N)

Nth coefficient of the model (corresponding to the Nth data column in XX)

param(M;'se';N)

Standard error of the Nth coefficient (computed from the Fisher information)

param(M;'tv';N)

Standardized value of the Nth coefficient (standardized beta values have a mean of 0)

param(M;'pv';N)

p-value of the Nth coefficient

If the fit_type is logistic, the normal CDF is used to compute p-values; otherwise, Student’s t CDF is used.

param(M;'dev';)

Deviance at last iteration

param(M;'delta';)

Relative change of the deviance from one iteration to the next:

((dev - dev0) / (0.1 + dev0))

param(M;'fisher_iterations';)

Number of iterations taken by g_glm

cparam(M;'fit_type';)

Type of model generated by g_glm

cparam(M;'convergence';)

Boolean value indicating whether the model converged

Returns true when the relative change in deviance is less than or equal to the convergence epsilon:

(delta <= convEPS)

Otherwise, returns false.

param(M;'df';)

Degrees of freedom

param(M;'convEPS';)

Convergence epsilon

param(M;'valcnt';)

Number of observations in the model
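
As a rough illustration of how the b, se, tv, and pv parameters might relate for a logistic fit, the Python sketch below assumes that the standardized value is the coefficient divided by its standard error and that the p-value is two-sided. Neither assumption is stated in this reference, and the function names are hypothetical. The normal CDF is evaluated through the complementary error function.

```python
import math

def standardized_value(b, se):
    """Assumed relation: coefficient divided by its standard error."""
    return b / se

def p_value_normal(z):
    """Two-sided p-value from the normal CDF (two-sidedness is an assumption).

    2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical coefficient and standard error, for illustration only.
z = standardized_value(1.2, 0.4)
print(z, p_value_normal(z))
```

For a Gaussian fit, Student's t CDF with the model's degrees of freedom would take the place of the normal CDF.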

Example

The following example uses g_glm(G;S;Y;XX;Z) to fit a logistic regression model on Fisher's Iris data set (pub.demo.mleg.uci.iris) for the Iris-virginica class and outputs the related model parameters.

<base table="pub.demo.mleg.uci.iris"/>
<willbe name="response" value="class='Iris-virginica'"/>
<willbe name="glm_results" value="g_glm(;;response;1,sepal_length,sepal_width,
petal_length,petal_width; 'fit_type' 'logistic' )"/>
<willbe name="beta_bias" value="param(glm_results;'b';1)"/>
<willbe name="beta_1" value="param(glm_results;'b';2)"/>
<willbe name="beta_2" value="param(glm_results;'b';3)"/>
<willbe name="beta_3" value="param(glm_results;'b';4)"/>
<willbe name="beta_4" value="param(glm_results;'b';5)"/>
<note>Standard error</note>
<willbe name="se_bias" value="param(glm_results;'se';1)"/>
<willbe name="se_1" value="param(glm_results;'se';2)"/>
<willbe name="se_2" value="param(glm_results;'se';3)"/>
<willbe name="se_3" value="param(glm_results;'se';4)"/>
<willbe name="se_4" value="param(glm_results;'se';5)"/>
<note>t</note>
<willbe name="t_bias" value="param(glm_results;'tv';1)"/>
<willbe name="t_1" value="param(glm_results;'tv';2)"/>
<willbe name="t_2" value="param(glm_results;'tv';3)"/>
<willbe name="t_3" value="param(glm_results;'tv';4)"/>
<willbe name="t_4" value="param(glm_results;'tv';5)"/>
<note>p-values</note>
<willbe name="p_bias" value="param(glm_results;'pv';1)"/>
<willbe name="p_1" value="param(glm_results;'pv';2)"/>
<willbe name="p_2" value="param(glm_results;'pv';3)"/>
<willbe name="p_3" value="param(glm_results;'pv';4)"/>
<willbe name="p_4" value="param(glm_results;'pv';5)"/>
<note>Residual Deviance</note>
<willbe name="dev" value="param(glm_results;'dev';)"/>
<note>Delta</note>
<willbe name="delta" value="param(glm_results;'delta';)"/>
<note>Fisher iterations</note>
<willbe name="fisher_iterations" 
 value="param(glm_results;'fisher_iterations';)"/>
<note>Convergence</note>
<willbe name="convergence" value="cparam(glm_results;'convergence';)"/>
<willbe name="fit_type" value="cparam(glm_results;'fit_type';)"/>
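
To show how the extracted coefficients line up with the entries of XX, here is a minimal Python sketch that assumes the standard logistic link: the first coefficient pairs with the special value `1` (the intercept), and each remaining coefficient pairs with the corresponding column. The function name `predict_probability` and the coefficient values are hypothetical; they are not output of the example above.

```python
import math

def predict_probability(betas, features):
    """betas[0] pairs with the intercept; betas[1:] pair with features, in order."""
    if len(betas) != len(features) + 1:
        raise ValueError("expected one coefficient per feature plus an intercept")
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], features))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients in the order beta_bias, beta_1, ..., beta_4.
betas = [-1.0, 0.5, 0.0, 0.25, 0.0]
# One row: sepal_length, sepal_width, petal_length, petal_width (made-up values).
row = [2.0, 3.0, 4.0, 0.2]
print(predict_probability(betas, row))
```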