g_glm(G;S;Y;XX;Z)

Returns a generalized linear model corresponding to the regression of a dependent variable on one or more independent variables. (Available as of prod-9)

Function type

Vector only

Syntax

g_glm(G;S;Y;XX;Z)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to 1 or 0, which determines whether that row is included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.
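The grouping and selection rules for G and S can be illustrated with a short Python sketch. It is not how g_glm is implemented. It assumes rows are plain dictionaries and uses Python's None to stand in for N/A. The helper name select_and_group is hypothetical.

```python
from collections import defaultdict


def select_and_group(rows, group_cols, select_col=None):
    """Partition dict rows by their G values, keeping only rows where S == 1.

    Hypothetical helper: None plays the role of N/A and is kept as an
    ordinary grouping value rather than being dropped.
    """
    groups = defaultdict(list)
    for row in rows:
        if select_col is not None:
            s = row[select_col]
            if s not in (0, 1):
                # Mirrors the rule that any S value other than 1 or 0 is an error
                raise ValueError(f"{select_col} must be 0 or 1, got {s!r}")
            if s == 0:
                continue  # row not selected
        # With no group columns, every row shares the single key ().
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row)
    return dict(groups)
```

Because None is hashable, rows whose grouping value is N/A form their own group instead of being discarded. Calling the helper with an empty group_cols list places all selected rows in one group, which corresponds to omitting G.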

Y integer or decimal A column name denoting the dependent variable

If the fit_type specified by the Z parameter is logistic, Y can only take the values 0 or 1.

If the fit_type specified by the Z parameter is gaussian, Y can take any decimal value.

XX integer or decimal A space- or comma-separated list of column names denoting the independent variable(s)

To include an intercept term, make the special value 1 the first element of XX. Otherwise, the model is estimated without an intercept.
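The following hypothetical Python helper sketches how an XX list maps to regressors. It builds one row of the design matrix per observation and adds a constant column of 1s only when the first element of XX is the special value 1. Rows are assumed to be dictionaries keyed by column name. This illustrates the rule above and is not g_glm's parser.

```python
def design_matrix(rows, xx):
    """Build regressor rows from an XX string such as "1,sepal_length,sepal_width".

    Hypothetical helper: a leading "1" adds a constant intercept column;
    every other entry is read as a column name in each dict row.
    """
    names = xx.replace(",", " ").split()  # XX may be space- or comma-separated
    has_intercept = bool(names) and names[0] == "1"
    columns = names[1:] if has_intercept else names
    matrix = []
    for row in rows:
        values = [float(row[c]) for c in columns]
        matrix.append([1.0] + values if has_intercept else values)
    return matrix
```

For example, `design_matrix([{"a": 2, "b": 3}], "1,a,b")` yields `[[1.0, 2.0, 3.0]]`, while `"a b"` yields `[[2.0, 3.0]]` with no intercept column.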

Z text A list of key-value pairs that provide control over the model fitting

For example: 'fit_type' 'logistic' 'eps' '1e-6'

The options you may specify for the Z parameter are:

'fit_type' 'value'

The value associated with fit_type determines the type of model generated by g_glm and can be one of the following:

  • gaussian (Gaussian model)
  • logistic (logistic model)
  • poisson (Poisson model) (Available as of version 11.01)

The default is gaussian.

'eps' 'value'

The value associated with eps is a numerical value that sets the convergence criterion. At each solver iteration, the relative change in the deviance is computed and compared to this value. If the relative change is smaller than the specified value, the model has converged.

The default is 1e-8.

'MAX_IT' 'value'

The value associated with MAX_IT is an integer value that sets the maximum number of iterations allowed. If the solver exceeds this number of iterations without converging, it returns with the convergence model parameter set to false.

The default is 30.
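The eps and MAX_IT options control an iterative solver. The Python sketch below fits a model by iteratively reweighted least squares (Fisher scoring), which is a standard way to fit GLMs. It is not g_glm's implementation. It assumes the canonical link for each fit_type (identity for gaussian, logit for logistic, log for poisson) and starts from all-zero coefficients. This page specifies neither choice. Convergence uses the relative deviance change defined for the delta parameter, ((dev - dev0) / (0.1 + dev0)), but the sketch compares its absolute value with eps, which is also an assumption. The function fit_glm and its return keys are hypothetical names chosen to mirror the param names.

```python
import math


def _logistic(eta):
    eta = max(-30.0, min(30.0, eta))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-eta))


def _poisson_unit_dev(y, mu):
    # y * log(y / mu) is taken as 0 when y == 0
    return 2.0 * ((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu))


# fit_type -> (inverse link, dmu/deta, unit deviance), assuming canonical links.
FAMILIES = {
    "gaussian": (lambda eta: eta, lambda mu: 1.0, lambda y, mu: (y - mu) ** 2),
    "logistic": (_logistic, lambda mu: mu * (1.0 - mu),
                 lambda y, mu: -2.0 * math.log(mu if y == 1 else 1.0 - mu)),
    "poisson": (math.exp, lambda mu: mu, _poisson_unit_dev),
}


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_glm(x, y, fit_type="gaussian", eps=1e-8, max_it=30):
    """Hypothetical IRLS fit; x is a list of regressor rows, y a list of responses."""
    inv_link, dmu_deta, unit_dev = FAMILIES[fit_type]
    p = len(x[0])

    def linear(beta, xi):
        return sum(bj * v for bj, v in zip(beta, xi))

    def deviance(beta):
        return sum(unit_dev(yi, inv_link(linear(beta, xi))) for xi, yi in zip(x, y))

    beta = [0.0] * p
    dev0 = dev = deviance(beta)
    delta, converged, iterations = float("nan"), False, 0
    for iterations in range(1, max_it + 1):
        # Accumulate the weighted normal equations X'WX beta = X'Wz.
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for xi, yi in zip(x, y):
            eta = linear(beta, xi)
            mu = inv_link(eta)
            w = dmu_deta(mu)             # IRLS weight for a canonical link
            z = eta + (yi - mu) / w      # working response
            for j in range(p):
                xtwz[j] += w * xi[j] * z
                for k in range(p):
                    xtwx[j][k] += w * xi[j] * xi[k]
        beta = _solve(xtwx, xtwz)
        dev = deviance(beta)
        delta = (dev - dev0) / (0.1 + dev0)
        if abs(delta) <= eps:            # assumption: absolute relative change
            converged = True
            break
        dev0 = dev
    return {"b": beta, "dev": dev, "delta": delta,
            "fisher_iterations": iterations, "convergence": converged}
```

For a gaussian fit, the first iteration already produces the least-squares solution. The sketch therefore reports convergence on the second iteration, once the deviance stops changing. Setting max_it=1 in that case returns with convergence set to False.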

Return Value

For each group defined by G (considering only rows where S=1, if S is specified), g_glm regresses the dependent variable Y on the independent variables XX and returns a generalized linear model for that group.

Note: g_glm may be much slower if there is significant multicollinearity in the data (i.e., if two or more of the independent variables XX are nearly perfectly correlated with each other).

The model that g_glm returns can be used as an argument to the following functions:
  • param(M;P;I) to extract the numerical value of a regression model parameter
  • cparam(M;P;I) to extract the text value of a regression model parameter

Assuming M is the column containing the result of g_glm, use the following function calls to obtain the desired information:
param(M;'b';N)
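Nth coefficient of the model (corresponding to the Nth element of XX; if XX begins with the intercept term 1, N=1 returns the intercept)
param(M;'se';N)
Standard error of the Nth coefficient (computed from the Fisher information)
param(M;'tv';N)
t-value of the Nth coefficient: the coefficient divided by its standard error, which has a mean of 0 under the null hypothesis that the coefficient is 0
param(M;'pv';N)
p-value of the Nth coefficient

If the fit_type is logistic, the normal CDF is used to compute p-values; otherwise, Student’s t CDF is used.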

param(M;'dev';)
Deviance at last iteration
param(M;'delta';)
Relative change of the deviance between the last two iterations, where dev0 is the deviance at the previous iteration:

((dev - dev0) / (0.1 + dev0))

param(M;'fisher_iterations';)
Number of iterations taken by g_glm
cparam(M;'fit_type';)
Type of model generated by g_glm
cparam(M;'convergence';)
Boolean value indicating whether the model converged

Returns true when the relative change in deviance is less than or equal to the convergence epsilon:

(delta <= convEPS)

Otherwise, returns false.

param(M;'df';)
Degrees of freedom
param(M;'convEPS';)
Convergence epsilon
param(M;'valcnt';)
Number of observations in the model
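The Python sketch below shows one way the b, se, tv, and pv values relate to each other. It rests on several assumptions that this page does not confirm. The standard errors are taken as the square roots of the diagonal of the inverse Fisher information matrix, optionally scaled by a dispersion estimate that this page does not describe. tv is taken as b divided by se, and p-values are assumed to be two-sided. As stated above, the sketch uses the normal CDF for logistic fits and Student's t CDF with df degrees of freedom otherwise. To stay within the standard library, it computes Student's t CDF by applying Simpson's rule to the t density. All function names are hypothetical.

```python
import math


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("Fisher information matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def normal_two_sided_p(t):
    # P(|Z| >= |t|) for a standard normal Z
    return math.erfc(abs(t) / math.sqrt(2.0))


def student_t_two_sided_p(t, df, steps=2000):
    # P(|T| >= |t|) = 1 - 2 * integral of the t density from 0 to |t| (Simpson's rule)
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def coefficient_table(b, fisher, fit_type, df, dispersion=1.0):
    """Return (b, se, tv, pv) per coefficient; names and dispersion are assumptions."""
    cov = invert(fisher)
    table = []
    for j, bj in enumerate(b):
        se = math.sqrt(dispersion * cov[j][j])
        tv = bj / se
        if fit_type == "logistic":
            pv = normal_two_sided_p(tv)
        else:
            pv = student_t_two_sided_p(tv, df)
        table.append((bj, se, tv, pv))
    return table
```

For instance, with a diagonal Fisher information of 4 and 1 and coefficients 2 and 1, the sketch gives standard errors 0.5 and 1 and t-values 4 and 1. Under a logistic fit, the p-value for the second coefficient is about 0.317.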

Example

The following example uses g_glm(G;S;Y;XX;Z) to fit a logistic regression model that predicts membership in the Iris-virginica class from the four flower measurements in Fisher's Iris data set (pub.demo.mleg.uci.iris), and then extracts the resulting model parameters.

<base table="pub.demo.mleg.uci.iris"/>
<willbe name="response" value="class='Iris-virginica'"/>
<willbe name="glm_results" value="g_glm(;;response;1,sepal_length,sepal_width,
petal_length,petal_width; 'fit_type' 'logistic' )"/>
<note>Coefficients</note>
<willbe name="beta_bias" value="param(glm_results;'b';1)"/>
<willbe name="beta_1" value="param(glm_results;'b';2)"/>
<willbe name="beta_2" value="param(glm_results;'b';3)"/>
<willbe name="beta_3" value="param(glm_results;'b';4)"/>
<willbe name="beta_4" value="param(glm_results;'b';5)"/>
<note>Standard error</note>
<willbe name="se_bias" value="param(glm_results;'se';1)"/>
<willbe name="se_1" value="param(glm_results;'se';2)"/>
<willbe name="se_2" value="param(glm_results;'se';3)"/>
<willbe name="se_3" value="param(glm_results;'se';4)"/>
<willbe name="se_4" value="param(glm_results;'se';5)"/>
<note>t-values</note>
<willbe name="t_bias" value="param(glm_results;'tv';1)"/>
<willbe name="t_1" value="param(glm_results;'tv';2)"/>
<willbe name="t_2" value="param(glm_results;'tv';3)"/>
<willbe name="t_3" value="param(glm_results;'tv';4)"/>
<willbe name="t_4" value="param(glm_results;'tv';5)"/>
<note>p-values</note>
<willbe name="p_bias" value="param(glm_results;'pv';1)"/>
<willbe name="p_1" value="param(glm_results;'pv';2)"/>
<willbe name="p_2" value="param(glm_results;'pv';3)"/>
<willbe name="p_3" value="param(glm_results;'pv';4)"/>
<willbe name="p_4" value="param(glm_results;'pv';5)"/>
<note>Residual Deviance</note>
<willbe name="dev" value="param(glm_results;'dev';)"/>
<note>Delta</note>
<willbe name="delta" value="param(glm_results;'delta';)"/>
<note>Fisher iterations</note>
<willbe name="fisher_iterations" 
 value="param(glm_results;'fisher_iterations';)"/>
<note>Convergence</note>
<willbe name="convergence" value="cparam(glm_results;'convergence';)"/>
<willbe name="fit_type" value="cparam(glm_results;'fit_type';)"/>