g_logreg(G;S;Y;XX;Z)

Returns a model corresponding to the logistic regression of one or more independent variables against a given dependent variable.

Function type

Vector only

Syntax

g_logreg(G;S;Y;XX;Z)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

Y integer or decimal A column name denoting the dependent variable
XX integer or decimal A space- or comma-separated list of column names denoting the independent variable(s)

The first element of XX must be the special value 1 for the constant (intercept) term in the model.

Z text and decimal A list of pairs of option names and option values that control convergence criteria

For example: 'cgdeveps' 0.0000001 'lreps' 0.000000001

The options you may specify for the Z parameter are:

'method' value
Currently, value must be 'trirls' (for Truncated Iteratively-Reweighted Least Squares)
Note: This is the default if the 'method' option is not specified.
The following options can be specified for the 'trirls' method:
'rrlambda' value
Specify λ for ridge regularization, which penalizes large coefficients by adding λ times the sum of the squared coefficients

Default is 0 (no ridge regularization)

Note: The default formerly was 10.0, so values will be slightly different than before.
'cgdeveps' value
Specify ε such that conjugate gradient iterations are terminated when the relative difference of the deviance drops below ε

Default is 0.005 - reduce for tighter fit

'cgeps' value
Specify ε such that conjugate gradient iterations are terminated when the residual drops below ε

Disabled by default - if specified, use a value below 0.001 for tighter fit

'lreps' value
Specify ε such that IRLS iterations are terminated if the relative difference of the deviance drops below ε

Default is 0.05 - reduce for tighter fit

Note: This option is independent of other settings.
Note: Only one of 'cgdeveps' or 'cgeps' may be specified!

To obtain closer fits for low-dimensional problems, try reducing 'cgdeveps' and 'lreps'.
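The effect of 'rrlambda' and 'lreps' can be illustrated with a small stand-alone sketch in Python. This is not the g_logreg implementation: it is plain IRLS with a direct linear solve in place of the truncated conjugate gradient step, so 'cgdeveps' and 'cgeps' have no counterpart here. The function names, the choice to penalize every coefficient (including the intercept), the scaling of the ridge term (λ added to the diagonal of XᵀWX), and the use of the unpenalized deviance in the stopping test are all assumptions made for illustration.

```python
import math

def sigmoid(t):
    # Logistic function, written to avoid overflow for large |t|.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def deviance(X, y, b):
    # Deviance = -2 * log-likelihood of the logistic model (no ridge term).
    total = 0.0
    for row, yi in zip(X, y):
        p = sigmoid(sum(bj * xj for bj, xj in zip(b, row)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -2.0 * total

def solve(A, v):
    # Gaussian elimination with partial pivoting (assumes A is non-singular).
    n = len(v)
    M = [A[i][:] + [v[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def irls_logreg(X, y, rrlambda=0.0, lreps=0.05, max_iter=50):
    # Each row of X starts with the constant 1 for the intercept term.
    p = len(X[0])
    b = [0.0] * p
    dev = deviance(X, y, b)
    for iteration in range(1, max_iter + 1):
        # Build (X'WX + lambda*I) and the penalized gradient X'(y - mu) - lambda*b.
        H = [[rrlambda if i == j else 0.0 for j in range(p)] for i in range(p)]
        g = [-rrlambda * bj for bj in b]
        for row, yi in zip(X, y):
            mu = sigmoid(sum(bj * xj for bj, xj in zip(b, row)))
            w = mu * (1.0 - mu)
            for i in range(p):
                g[i] += row[i] * (yi - mu)
                for j in range(p):
                    H[i][j] += w * row[i] * row[j]
        step = solve(H, g)
        b = [bj + sj for bj, sj in zip(b, step)]
        new_dev = deviance(X, y, b)
        # Stop when the relative change in deviance drops below lreps.
        if abs(dev - new_dev) / max(abs(new_dev), 1e-300) < lreps:
            return b, iteration
        dev = new_dev
    return b, max_iter

# Hypothetical data: intercept column plus one predictor.
X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
y = [0, 0, 1, 0, 1, 1]
coef_default, iters_default = irls_logreg(X, y)            # loose tolerance
coef_tight, iters_tight = irls_logreg(X, y, lreps=1e-9)    # tighter fit
```

In this sketch, lowering lreps only lets the same sequence of iterations run longer, which is why a smaller value gives a closer fit at the cost of more work. A nonzero rrlambda shrinks the coefficients toward zero and also keeps the linear system solvable when columns of X are nearly collinear.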

Return Value

For each group defined by G (using only those rows where S=1, if S is specified), g_logreg computes a logistic regression of the dependent variable Y on the independent variable(s) XX and returns a value of a special type representing the model for that group.
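The grouping and selection rules can be pictured with the following Python sketch. It is only an illustration of the semantics described for G and S, not how g_logreg is implemented; the function name partition_rows, the dictionary-based row layout, and the use of None to stand for N/A are assumptions.

```python
from collections import defaultdict

def partition_rows(rows, group_cols=(), select_col=None):
    # Split rows into groups keyed by their values in group_cols.
    # None stands in for N/A and is treated as an ordinary grouping value.
    groups = defaultdict(list)
    for row in rows:
        if select_col is not None:
            s = row[select_col]
            if s not in (0, 1):
                raise ValueError("selection column must contain only 1 or 0")
            if s == 0:
                continue  # row is excluded from the calculation
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row)
    return dict(groups)

# Hypothetical rows: 'store' plays the role of G, 'sel' the role of S.
rows = [
    {"store": "A", "sel": 1, "x": 0.5, "y": 0},
    {"store": "A", "sel": 0, "x": 1.5, "y": 1},
    {"store": None, "sel": 1, "x": 2.0, "y": 1},
    {"store": None, "sel": 1, "x": 0.1, "y": 0},
]
by_store = partition_rows(rows, group_cols=("store",), select_col="sel")
everything = partition_rows(rows)  # no G: all rows fall into one group
```

Each group produced this way would receive its own model. When group_cols is empty, every row shares the empty key, which mirrors the rule that omitting G places all rows in the same group.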

The model that g_logreg returns can be used as an argument to the following functions:
  • param(M;P;I) to extract the regression model parameters
  • score(XX;M;Z) to score data points according to the regression model
Note: g_logreg may be much slower if there is significant multicollinearity in the data (i.e., if two or more of the independent variables XX are nearly perfectly correlated with each other).
Assuming M is the column containing the result of g_logreg, use the following function calls to obtain the desired information:
param(M;'b';N)
Nth coefficient of the model (corresponding to the Nth data column in XX)
score(XX;M;)
Predicted Y for data points XX according to the model