g_info_iv(G;S;X;Y)

Returns the information value (IV) of X with respect to Y, where Y is a binary (0/1) target. (Available as of version 10.42)

Function type

Vector only

Description

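The information value (IV) is defined as:

$$\mathrm{IV} = \sum_{x} \left( \mathrm{Prob}(X = x \mid Y = 1) - \mathrm{Prob}(X = x \mid Y = 0) \right) \ln \frac{\mathrm{Prob}(X = x \mid Y = 1)}{\mathrm{Prob}(X = x \mid Y = 0)}$$

where the sum runs over the distinct values x of X, Prob(X = x | Y = 1) is the probability that X = x given that Y = 1, and Prob(X = x | Y = 0) is the probability that X = x given that Y = 0.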
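As a purely illustrative check, assume the standard IV formula IV = Σ (Prob(X = x | Y = 1) − Prob(X = x | Y = 0)) · ln(Prob(X = x | Y = 1) / Prob(X = x | Y = 0)) with the natural logarithm, and suppose (hypothetically) that X has two categories, a and b, with Prob(X = a | Y = 1) = 0.6, Prob(X = b | Y = 1) = 0.4, Prob(X = a | Y = 0) = 0.3, and Prob(X = b | Y = 0) = 0.7:

$$
\begin{aligned}
\mathrm{IV} &= (0.6 - 0.3)\ln\frac{0.6}{0.3} + (0.4 - 0.7)\ln\frac{0.4}{0.7} \\
&= 0.3 \times 0.6931 + (-0.3) \times (-0.5596) \\
&\approx 0.2079 + 0.1679 \approx 0.376
\end{aligned}
$$

Each term is non-negative because the difference and the logarithm always have the same sign, which is why the IV is never negative. An IV of about 0.376 would fall in the "strong" band of Table 1.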

g_info_iv(G;S;X;Y) provides a metric for the absolute strength with which X predicts Y.

Syntax

g_info_iv(G;S;X;Y)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to 1 or 0; that value determines whether the row is included in the calculation.

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

X any A column name

This column should contain categorical (unordered) data.

Y integer or decimal A column name

This column must contain only the values 0 and 1.
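The G and S arguments can be combined in a single call. The following sketch is illustrative only: it uses the table pub.demo.mleg.uci.bankmarketing and its columns y, job, marital, and housing, and the selection column sel_housing is hypothetical, assuming that housing holds the text value 'yes' for clients with a housing loan. It computes the IV of job with respect to y01 separately for each value of marital, considering only the rows where sel_housing is 1.

```xml
<base table="pub.demo.mleg.uci.bankmarketing"/>
<!-- Y: 1 when y is 'yes', otherwise 0 -->
<willbe name="y01" value="y='yes'"/>
<!-- S (hypothetical): 1 for rows where housing is 'yes', otherwise 0 -->
<willbe name="sel_housing" value="housing='yes'"/>
<!-- G=marital, S=sel_housing, X=job, Y=y01:
     one IV per marital group, computed only over rows where sel_housing=1 -->
<willbe name="iv_job_by_marital" value="g_info_iv(marital;sel_housing;job;y01)" format="dec:5"/>
```

Because the IV is computed per group, rows with different values of marital can show different IVs for the same column job.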

Return Value

For every row in each group defined by G (and for those rows where S=1, if specified), g_info_iv(G;S;X;Y) returns a numeric value greater than or equal to 0.

The following table provides some heuristics for evaluating the value returned by g_info_iv(G;S;X;Y):

Table 1. Heuristics for evaluating the IV
Value Strength
less than 0.02 unpredictive
between 0.02 and 0.1 weak
between 0.1 and 0.3 medium strength
greater than 0.3 strong
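To label an IV with the bands of Table 1, a computed column can map the value to a band. This is a sketch that assumes 1010data's if(C1;V1;C2;V2;...;D) conditional function; the column names iv_job and iv_band are hypothetical. Table 1 does not say which band owns a value of exactly 0.1, so this sketch assigns it to "medium strength".

```xml
<base table="pub.demo.mleg.uci.bankmarketing"/>
<willbe name="y01" value="y='yes'"/>
<willbe name="iv_job" value="g_info_iv(;;job;y01)" format="dec:5"/>
<!-- Conditions are tested in order; the last argument is the default.
     Only '>' and '>=' are used, so no '<' needs escaping inside the XML attribute. -->
<willbe name="iv_band"
 value="if(iv_job>0.3;'strong';iv_job>=0.1;'medium strength';iv_job>=0.02;'weak';'unpredictive')"/>
```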

g_info_iv(G;S;X;Y) may also be used to compare the IVs of different columns relative to Y for feature selection.

Example

The following example uses g_info_iv(G;S;X;Y) and g_info_woe(G;S;X;Y) to calculate the information value (IV) and weight of evidence (WoE) for the columns job, marital, education, default, housing, and loan in the table pub.demo.mleg.uci.bankmarketing. The columns with an IV greater than 0.02 are then passed to g_logreg(G;S;Y;XX;Z) and score(XX;M;Z) by way of their corresponding WoE columns.

<base table="pub.demo.mleg.uci.bankmarketing"/>
<willbe name="y01" value="y='yes'"/>
<foreach var="job,marital,education,default,housing,loan">
  <willbe name="iv_{@var}" value="g_info_iv(;;{@var};y01)" format="dec:5"/>
  <willbe name="iw_{@var}" value="g_info_woe(;;{@var};y01)" format="dec:5"/>
</foreach>
<colord cols="y01,iv_*"/>
<note>For this example, only those columns with an IV greater
than 0.02 are specified to g_logreg and score.</note>
<willbe name="model" format="dec:5"
 value="g_logreg(;;y01;1,iw_job,iw_marital,iw_education,iw_default;)"/>
<willbe name="score" format="dec:5" 
 value="score(1,iw_job,iw_marital,iw_education,iw_default;model;)"/>
<colord cols="iw_job,iw_marital,iw_education,iw_default,score"/>