g_info_iv(G;S;X;Y)
Returns the information value (IV) of X
given Y. (Available as of version 10.42)
Function type
Vector only
Description
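The information value (IV) is defined as:

$$\mathrm{IV} = \sum_{X} \bigl(\mathrm{Prob}(X \mid Y = 1) - \mathrm{Prob}(X \mid Y = 0)\bigr) \, \ln\frac{\mathrm{Prob}(X \mid Y = 1)}{\mathrm{Prob}(X \mid Y = 0)}$$

where the sum is taken over the distinct values of X, Prob(X | Y = 1) is the probability of X given that Y = 1, and Prob(X | Y = 0) is the probability of X given that Y = 0.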
g_info_iv(G;S;X;Y) provides a metric for determining the absolute
strength of X as a predictor of Y.
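To make the definition concrete, the following Python sketch computes an information value for a small made-up sample. It assumes the commonly used form of the definition, in which each distinct value of X contributes (Prob(X | Y = 1) - Prob(X | Y = 0)) times the natural log of their ratio. The function name `info_iv`, the sample data, and the choice to skip categories that never occur for one of the two Y values are assumptions made for illustration; this is not the implementation behind g_info_iv(G;S;X;Y).

```python
import math
from collections import Counter


def info_iv(xs, ys):
    """Information value of categorical xs with respect to binary ys (0 or 1).

    Hypothetical helper for illustration only.
    """
    # Count how often each category occurs among the Y=1 rows and the Y=0 rows.
    pos = Counter(x for x, y in zip(xs, ys) if y == 1)
    neg = Counter(x for x, y in zip(xs, ys) if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ys must contain at least one 0 and at least one 1")

    iv = 0.0
    for x in set(pos) | set(neg):
        p1 = pos[x] / n_pos  # Prob(X = x | Y = 1)
        p0 = neg[x] / n_neg  # Prob(X = x | Y = 0)
        if p1 == 0 or p0 == 0:
            # Assumption: skip categories where the log ratio is undefined.
            continue
        iv += (p1 - p0) * math.log(p1 / p0)
    return iv


# Made-up sample: category "a" is mostly Y=1, category "b" is mostly Y=0.
xs = ["a", "a", "a", "b", "b", "b", "b", "b"]
ys = [1, 1, 0, 1, 0, 0, 0, 0]
iv = info_iv(xs, ys)
print(round(iv, 4))
```

Every term in the sum is non-negative, because the difference and the logarithm always share the same sign. When X is distributed identically for Y = 1 and Y = 0, every term is zero and so is the total.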
Syntax
g_info_iv(G;S;X;Y)
Input
| Argument | Type | Description |
|---|---|---|
| G | any | A space- or comma-separated list of column names. Rows are in the same group if their values for all of the columns listed in G are the same. If G is omitted, all rows are considered to be in the same group. |
| S | integer | The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation. If S is omitted, all rows are included in the calculation. |
| X | any | A column name. This column contains categorical or unordered data. |
| Y | integer or decimal | A column name. This column must only contain values of 0 or 1. |
Return Value
For every row in each group defined by G (and for those rows where
S=1, if specified),
g_info_iv(G;S;X;Y) returns a numeric value greater than or equal to
0.
The following table provides some heuristics for evaluating the value returned by
g_info_iv(G;S;X;Y):
| Value | Strength |
|---|---|
| less than 0.02 | unpredictive |
| 0.02 to 0.1 | weak |
| 0.1 to 0.3 | medium strength |
| greater than 0.3 | strong |
g_info_iv(G;S;X;Y) may also be used to compare the IVs of different columns
relative to Y for feature selection.
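As a rough sketch of how these heuristics might be applied during feature selection, the Python snippet below maps an IV to its strength label and ranks a few candidate columns. The function name `iv_strength` and the IV figures are invented for illustration. The treatment of the exact boundary values is also an assumption, since the table does not say which band 0.02, 0.1, or 0.3 falls into.

```python
def iv_strength(iv):
    """Map an information value to a heuristic strength label.

    Hypothetical helper. Boundary handling is an assumption:
    0.02 counts as weak, 0.1 and 0.3 count as medium strength.
    """
    if iv < 0.02:
        return "unpredictive"
    if iv < 0.1:
        return "weak"
    if iv <= 0.3:
        return "medium strength"
    return "strong"


# Made-up IVs for three hypothetical columns, ranked from strongest to weakest.
candidate_ivs = {"col_a": 0.25, "col_b": 0.004, "col_c": 0.06}
ranked = sorted(candidate_ivs, key=candidate_ivs.get, reverse=True)
labels = {name: iv_strength(candidate_ivs[name]) for name in ranked}
print(labels)
```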
Example
The following example uses g_info_iv(G;S;X;Y) and
g_info_woe(G;S;X;Y) to calculate the information value (IV) and
weight of evidence (WoE) for the columns job,
marital, education, default,
housing, and loan in the table
pub.demo.mleg.uci.bankmarketing. The columns that have an IV greater
than 0.02 are then passed to g_logreg(G;S;Y;XX;Z) and
score(XX;M;Z) using their corresponding WoE columns.
<base table="pub.demo.mleg.uci.bankmarketing"/>
<willbe name="y01" value="y='yes'"/>
<foreach var="job,marital,education,default,housing,loan">
  <willbe name="iv_{@var}" value="g_info_iv(;;{@var};y01)" format="dec:5"/>
  <willbe name="iw_{@var}" value="g_info_woe(;;{@var};y01)" format="dec:5"/>
</foreach>
<colord cols="y01,iv_*"/>
<note>For this example, only those columns with an IV greater than 0.02 are specified to g_logreg and score.</note>
<willbe name="model" format="dec:5" value="g_logreg(;;y01;1,iw_job,iw_marital,iw_education,iw_default;)"/>
<willbe name="score" format="dec:5" value="score(1,iw_job,iw_marital,iw_education,iw_default;model;)"/>
<colord cols="iw_job,iw_marital,iw_education,iw_default,score"/>

