g_liblinear(G;S;Y;XX;Z)

Returns a model of a given data set using one of the 10 supported underlying algorithms, which include logistic regression as well as both support vector classification and regression. (Available as of prod-9)

Function type

Vector only

Syntax

g_liblinear(G;S;Y;XX;Z)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

Y integer or decimal A column name denoting the dependent variable

For a classification or logistic regression analysis, this is a column of labels.

For a support vector regression analysis, this is a column of continuous data.

XX integer or decimal A space- or comma-separated list of column names denoting the independent variable(s)

If you wish to include a bias, the first element of XX must be the special value 1 for the constant (intercept) term in the linear model.

Z text and decimal A list of key-value pairs that modify the underlying method

For example: 'solver_type' 'L1R_L2LOSS_SVC'

The options you may specify for the Z parameter are:

'solver_type' 'value'

The value associated with solver_type determines the underlying algorithm used by g_liblinear and may be one of the following:

L2R_LR L2-regularized logistic regression (primal)
L2R_L2LOSS_SVC_DUAL L2-regularized L2-loss support vector classification (dual)
L2R_L2LOSS_SVC L2-regularized L2-loss support vector classification (primal)
L2R_L1LOSS_SVC_DUAL L2-regularized L1-loss support vector classification (dual)
L1R_L2LOSS_SVC L1-regularized L2-loss support vector classification
L1R_LR L1-regularized logistic regression
L2R_LR_DUAL L2-regularized logistic regression (dual)
L2R_L2LOSS_SVR L2-regularized L2-loss support vector regression (primal)
L2R_L2LOSS_SVR_DUAL L2-regularized L2-loss support vector regression (dual)
L2R_L1LOSS_SVR_DUAL L2-regularized L1-loss support vector regression (dual)

'violation_cost' 'value'

The violation_cost parameter trades off training error against model complexity. A small violation_cost tolerates more training errors and favors a simpler model; if it is too small, the model may underfit. A large violation_cost penalizes training errors heavily, so the model fits the training data more strictly.

The default is 1.0.

'sensitivity' 'value'

The sensitivity value affects the smoothness of the SVM’s response and the number of support vectors, so both the complexity and the generalization capability of the model depend on it. Choosing an optimal sensitivity value requires knowledge of the noise level in the data.

The default is 0.1.

'stopping_crit' 'value'

The value associated with stopping_crit determines the convergence criterion. At each iteration of the algorithm, the relative change in the deviance is computed and compared to this value. If the relative change is smaller than the value, the model is considered to have converged. A lower value gives a more accurate solution but may take longer, and if the value is too small, the algorithm may never converge.

The default is 0.1.

'na_to_zero' 'value'

When the value associated with na_to_zero is set to 1, N/A values in the result will be set to 0.

The default is 0.
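
As a rough sketch of how several of these options might be combined, the following assumes that multiple key-value pairs are written one after another, separated by spaces, inside the single Z argument, as the single-pair example above suggests. The column names y, x1, and x2 and the result column svr_model are hypothetical, and the option values are illustrative only, not recommendations.

```xml
<note>Hypothetical columns y, x1, x2. Assumes key-value pairs in Z are separated by spaces.</note>
<note>The leading 1 in XX requests a bias (intercept) term.</note>
<willbe name="svr_model" value="g_liblinear(;;y;1 x1 x2;'solver_type' 'L2R_L2LOSS_SVR' 'violation_cost' 0.5 'sensitivity' 0.05 'stopping_crit' 0.01)"/>
```

Any option left out of Z would keep the default listed above.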

Return Value

For every row in each group defined by G (and for those rows where S=1, if specified), g_liblinear computes a fast, large-scale classification or regression, according to the method specified by the Z parameter.
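
To illustrate the G and S arguments, the sketch below trains one model per value of a grouping column, using only the rows flagged for selection. It reuses the Bank Marketing table and column names that appear in the example on this page. The is_train column and its age-based rule are hypothetical and exist only to produce a column of 1s and 0s.

```xml
<base table="pub.demo.mleg.uci.bankmarketing"/>
<willbe name="label" value="g_enum(;;;y)"/>
<note>is_train is a hypothetical 0/1 selection column; the age rule is arbitrary.</note>
<willbe name="is_train" value="if(age&lt;50;1;0)"/>
<note>G=contact gives one group per contact value; S=is_train limits the calculation to rows where is_train=1.</note>
<willbe name="model_by_contact" value="g_liblinear(contact;is_train;label;1 age duration campaign;'solver_type' 'L2R_L2LOSS_SVC')"/>
```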

g_liblinear supports 10 different types of underlying algorithms, including three types of logistic regression, four types of support vector classification, and three types of support vector regression.

Note: In a logistic regression, g_liblinear may be much slower if there is significant multicollinearity in the data (i.e., if two or more of the independent variables XX are nearly perfectly correlated with each other).

The model that g_liblinear returns can be used as an argument to the following functions:
score(XX;M;Z)
Score data points when g_liblinear is used to train a continuous regression model or a logistic regression
Valid solver types include:
  • L2R_LR
  • L1R_LR
  • L2R_LR_DUAL
  • L2R_L2LOSS_SVR
  • L2R_L2LOSS_SVR_DUAL
  • L2R_L1LOSS_SVR_DUAL
Note: Do not include the bias (intercept) in the XX parameter to score(XX;M;Z), even if it was specified in the XX parameter to g_liblinear(G;S;Y;XX;Z).
classify(XX;M;Z)
Classify data points when g_liblinear is used to train a discrete model
Valid solver types include:
  • L2R_L2LOSS_SVC_DUAL
  • L2R_L2LOSS_SVC
  • L2R_L1LOSS_SVC_DUAL
  • L1R_L2LOSS_SVC
param(M;P;I)
Extract the model parameters
Assuming M is the column containing the result of g_liblinear, use the following function calls to obtain the desired information:
param(M;'solver_type';)
Algorithm used to train the model:
0 L2-regularized logistic regression (primal)
1 L2-regularized L2-loss support vector classification (dual)
2 L2-regularized L2-loss support vector classification (primal)
3 L2-regularized L1-loss support vector classification (dual)
5 L1-regularized L2-loss support vector classification
6 L1-regularized logistic regression
7 L2-regularized logistic regression (dual)
11 L2-regularized L2-loss support vector regression (primal)
12 L2-regularized L2-loss support vector regression (dual)
13 L2-regularized L1-loss support vector regression (dual)
param(M;'violation_cost';)
Penalty factor
param(M;'sensitivity';)
Sensitivity
param(M;'stopping_crit';)
Stopping criterion
param(M;'nr_weight';)
Number of weight multipliers assigned
param(M;'weight_label';N)
Label of the weight multiplier
param(M;'weight';N)
Weight multiplier
param(M;'nr_class';)
Number of classes
param(M;'nr_feature';)
Number of features
param(M;'model_weights';N)
Array of all the model weights

The array will have length nr_class * nr_feature (including the bias feature, if the bias is positive).

If there are two classes, however, the array will have length nr_feature (including the bias feature).

param(M;'labels';N)
List of the distinct class labels
param(M;'bias';)
Intercept value

If no bias was specified, the result is -1.

param(M;'max_iter';)
Whether the maximum number of iterations was reached

The result is 1 if the maximum number of iterations was reached, 0 otherwise.

param(M;'support_vectors';N)
Number of support vectors

If solver_type is 1 or 3, the result is the number of support vectors at each subproblem.

If solver_type is 7, the result is a list of zeros.

If solver_type is 0 or 2, the result is a list of N/A values.

If solver_type is 11, 12, or 13, the results are meaningless.

param(M;'num_iterations';N)
Number of iterations

If solver_type is 1, 3, 5, 6, or 7, the result is the number of iterations at each subproblem.

If solver_type is 0 or 2, the result is N/A.

param(M;'valcnt';)
Count of valid observations in the data
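
The score(XX;M;Z) entry above notes that the bias must not be repeated when scoring. A minimal sketch of that pattern follows, reusing the Bank Marketing table and column names from the example on this page. The result column names lr_model and lr_score are hypothetical, and leaving the Z argument of score empty is an assumption.

```xml
<base table="pub.demo.mleg.uci.bankmarketing"/>
<willbe name="label" value="g_enum(;;;y)"/>
<note>Training: the leading 1 in XX requests a bias (intercept) term.</note>
<willbe name="lr_model" value="g_liblinear(;;label;1 age duration campaign;'solver_type' 'L2R_LR')"/>
<note>Scoring: the same columns are listed without the 1. Z is left empty (assumption).</note>
<willbe name="lr_score" value="score(age duration campaign;lr_model;)"/>
```

L2R_LR is one of the solver types listed as valid for score; a support vector classification model would be passed to classify instead.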

Example

The following example uses g_liblinear(G;S;Y;XX;Z) to train a support vector classification model that predicts whether a client of a particular banking institution subscribed to a term deposit. This example uses the information in the Bank Marketing data set (pub.demo.mleg.uci.bankmarketing).

<base table="pub.demo.mleg.uci.bankmarketing"/>
<library>
  <block name="enum" fields="job,marital,education,default,housing,loan,contact,month,day_of_week,poutcome">
    <foreach enum="{@fields}">
      <willbe name="{@enum}_enum" value="g_enum(;;;{@enum})"/>
    </foreach>
  </block>
</library>
<insert block="enum"/>
<willbe name="label" value="g_enum(;;;y)"/>
<willbe name="model" value="g_liblinear(;;label;1 job_enum marital_enum 
        education_enum default_enum housing_enum loan_enum contact_enum 
        month_enum day_of_week_enum poutcome_enum age duration campaign 
        pdays previous empvarrate conspriceidx consconfidx euribor3m 
        nremployed; 'solver_type' 'L1R_L2LOSS_SVC')"/>
<willbe name="predictions" value="classify(job_enum marital_enum 
        education_enum default_enum housing_enum loan_enum contact_enum 
        month_enum day_of_week_enum poutcome_enum age duration campaign 
        pdays previous empvarrate conspriceidx consconfidx euribor3m 
        nremployed;model;)"/>
<willbe name="score" value="if(predictions=label;1;0)"/>
<willbe name="sum" value="g_sum(;;score)"/>
<willbe name="total" value="g_cnt(;)"/>
<willbe name="accuracy" value="(sum/total)*100"/>
<sel value="g_first1(;;)"/>
<colord cols="model,sum,total,accuracy"/>
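
Continuing from the model column in the example above, the following sketch reads a few model parameters back with param(M;P;I). The new column names are hypothetical, and only parameters documented above with an empty third argument are used. These lines are meant to sit after the willbe that defines model and before the final sel and colord operations.

```xml
<note>Hypothetical column names; model is the g_liblinear result from the example.</note>
<willbe name="n_class" value="param(model;'nr_class';)"/>
<willbe name="n_feature" value="param(model;'nr_feature';)"/>
<willbe name="bias" value="param(model;'bias';)"/>
<willbe name="hit_max_iter" value="param(model;'max_iter';)"/>
<willbe name="n_valid" value="param(model;'valcnt';)"/>
```

According to the descriptions above, hit_max_iter would be 1 if the maximum number of iterations was reached and 0 otherwise.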

Additional Information

Details about the liblinear library provided by the Machine Learning Group at National Taiwan University can be found at: LIBLINEAR -- A Library for Large Linear Classification.