g_info_iv(G;S;X;Y)
Returns the information value (IV) of X provided Y. (Available as of version 10.42)
Function type
Vector only
Description
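The information value (IV) is defined as:

$$\mathrm{IV} = \sum_{X} \bigl(\mathrm{Prob}(X \mid Y = 1) - \mathrm{Prob}(X \mid Y = 0)\bigr)\,\ln\frac{\mathrm{Prob}(X \mid Y = 1)}{\mathrm{Prob}(X \mid Y = 0)}$$

where the sum runs over the distinct values of X and Prob(X | Y = 1) is the probability of X, given that Y = 1. Similarly for Prob(X | Y = 0).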
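To make the definition concrete, here is a minimal Python sketch, assuming the standard form of IV: the sum over the categories of X of (Prob(X | Y = 1) − Prob(X | Y = 0)) · ln(Prob(X | Y = 1) / Prob(X | Y = 0)). The function name `info_iv` is hypothetical and is not part of 1010data. Skipping categories that occur with only one value of Y is an assumption of this sketch, not documented behavior of g_info_iv.

```python
import math
from collections import Counter

def info_iv(xs, ys):
    """Information value of categorical xs with respect to binary ys (0/1)."""
    pos = Counter(x for x, y in zip(xs, ys) if y == 1)  # counts of X where Y = 1
    neg = Counter(x for x, y in zip(xs, ys) if y == 0)  # counts of X where Y = 0
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ys must contain both 0 and 1")
    iv = 0.0
    for x in set(pos) | set(neg):
        p1 = pos[x] / n_pos  # Prob(X = x | Y = 1)
        p0 = neg[x] / n_neg  # Prob(X = x | Y = 0)
        # Assumption: skip categories where the log term is undefined.
        if p1 == 0 or p0 == 0:
            continue
        iv += (p1 - p0) * math.log(p1 / p0)
    return iv

xs = ["a", "a", "a", "b", "a", "b", "b", "b"]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
example_iv = info_iv(xs, ys)  # equals ln 3, about 1.0986
```

In this toy data, category a covers 75% of the Y = 1 rows but only 25% of the Y = 0 rows (and the reverse for b), so each category contributes 0.5 · ln 3 and the total is ln 3.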
g_info_iv(G;S;X;Y) provides a metric for determining the absolute strength of X as a predictor of Y.
Syntax
g_info_iv(G;S;X;Y)
Input
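Argument | Type | Description |
---|---|---|
G | any | A space- or comma-separated list of column names. Rows are in the same group if their values for all of the columns listed in G are the same. If G is omitted, all rows are considered to be in the same group. If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value. |
S | integer | The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation. If S is omitted, all rows will be considered by the function (subject to any prior row selections). If any of the values in S are neither 1 nor 0, an error is returned. |
X | any | A column name. This column contains categorical or unordered data. |
Y | integer or decimal | A column name. This column must contain only values of 0 or 1. |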
Return Value
For every row in each group defined by G (and for those rows where S=1, if specified), g_info_iv(G;S;X;Y) returns a numeric value greater than or equal to 0.
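The grouping and selection semantics can be sketched in Python as follows. This is an illustration only: `g_info_iv_sketch` and `iv` are hypothetical names, rows are modeled as dictionaries, and returning `None` for unselected rows (or for groups lacking one of the two Y values) is an assumption of the sketch rather than documented 1010data behavior.

```python
import math
from collections import Counter, defaultdict

def iv(pairs):
    """IV of (x, y) pairs with y in {0, 1}; None if either class is empty."""
    pos = Counter(x for x, y in pairs if y == 1)
    neg = Counter(x for x, y in pairs if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        return None
    total = 0.0
    for x in set(pos) | set(neg):
        p1, p0 = pos[x] / n_pos, neg[x] / n_neg
        if p1 > 0 and p0 > 0:  # assumption: skip undefined log terms
            total += (p1 - p0) * math.log(p1 / p0)
    return total

def g_info_iv_sketch(rows, g, s, x, y):
    """Per-row IV: one value per group, repeated on every selected row."""
    groups = defaultdict(list)
    for r in rows:
        if s is None or r[s] == 1:  # only selected rows enter the calculation
            groups[tuple(r[c] for c in g)].append((r[x], r[y]))
    ivs = {key: iv(pairs) for key, pairs in groups.items()}
    return [
        ivs[tuple(r[c] for c in g)] if (s is None or r[s] == 1) else None
        for r in rows
    ]

data = (
    [("n", 1, "a", 1)] * 3 + [("n", 1, "b", 1)]
    + [("n", 1, "a", 0)] + [("n", 1, "b", 0)] * 3
    + [("n", 0, "b", 1)]  # not selected, so excluded from group "n"
    + [("s", 1, "a", 1), ("s", 1, "b", 1), ("s", 1, "a", 0), ("s", 1, "b", 0)]
)
rows = [dict(zip(("region", "sel", "x", "y"), t)) for t in data]
out = g_info_iv_sketch(rows, ["region"], "sel", "x", "y")
```

Every selected row in group "n" carries the same value (ln 3), every row in group "s" carries 0 because x is independent of y there, and the unselected row carries `None`.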
The following table provides some heuristics for evaluating the value returned by g_info_iv(G;S;X;Y):
Value | Strength |
---|---|
less than 0.02 | unpredictive |
0.02 to 0.1 | weak |
0.1 to 0.3 | medium strength |
greater than 0.3 | strong |
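As a small illustration, the heuristics can be encoded as a lookup. The function name `iv_strength` is hypothetical. The table does not say which band the boundary value 0.1 belongs to; this sketch assumes it counts as medium strength, while 0.02 counts as weak and 0.3 as medium strength, following the "less than" and "greater than" wording of the first and last rows.

```python
def iv_strength(iv):
    """Map an information value to the heuristic strength label."""
    if iv < 0:
        raise ValueError("information value cannot be negative")
    if iv < 0.02:
        return "unpredictive"
    if iv < 0.1:  # assumption: 0.1 itself falls in the next band
        return "weak"
    if iv <= 0.3:
        return "medium strength"
    return "strong"

labels = [iv_strength(v) for v in (0.01, 0.05, 0.2, 0.45)]
```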
g_info_iv(G;S;X;Y) may also be used to compare the IVs of different columns relative to Y for feature selection.
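A feature-selection pass of this kind might look like the following Python sketch. It assumes the standard IV definition (the sum over categories of the difference of the two conditional probabilities times the natural log of their ratio). The helper names and the toy columns `strong_col` and `noise_col` are made up for illustration, and the 0.02 cutoff is taken from the heuristics table.

```python
import math
from collections import Counter

def info_iv(xs, ys):
    """Information value of categorical xs with respect to binary ys (0/1)."""
    pos = Counter(x for x, y in zip(xs, ys) if y == 1)
    neg = Counter(x for x, y in zip(xs, ys) if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    total = 0.0
    for x in set(pos) | set(neg):
        p1, p0 = pos[x] / n_pos, neg[x] / n_neg
        if p1 > 0 and p0 > 0:  # assumption: skip undefined log terms
            total += (p1 - p0) * math.log(p1 / p0)
    return total

def rank_by_iv(columns, ys):
    """Return (name, IV) pairs sorted from strongest to weakest."""
    scored = [(name, info_iv(xs, ys)) for name, xs in columns.items()]
    return sorted(scored, key=lambda pair: -pair[1])

y = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "noise_col": ["a", "b", "a", "b", "a", "b", "a", "b"],   # independent of y
    "strong_col": ["a", "a", "a", "b", "a", "b", "b", "b"],  # tracks y
}
ranked = rank_by_iv(columns, y)
selected = [name for name, value in ranked if value > 0.02]
```

Here `strong_col` ranks first and is the only column kept, since `noise_col` has the same distribution for both values of y and therefore an IV of 0.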
Example
The following example uses g_info_iv(G;S;X;Y) and g_info_woe(G;S;X;Y) to calculate the information value (IV) and the information-theoretic weight of evidence (WoE) for the columns job, marital, education, default, housing, and loan in the table pub.demo.mleg.uci.bankmarketing. The columns that have an IV greater than 0.02 are then specified to g_logreg(G;S;Y;XX;Z) and score(XX;M;Z) using their corresponding WoE columns.
<base table="pub.demo.mleg.uci.bankmarketing"/>
<willbe name="y01" value="y='yes'"/>
<foreach var="job,marital,education,default,housing,loan">
  <willbe name="iv_{@var}" value="g_info_iv(;;{@var};y01)" format="dec:5"/>
  <willbe name="iw_{@var}" value="g_info_woe(;;{@var};y01)" format="dec:5"/>
</foreach>
<colord cols="y01,iv_*"/>
<note>For this example, only those columns with an IV greater than 0.02 are specified to g_logreg and score.</note>
<willbe name="model" format="dec:5" value="g_logreg(;;y01;1,iw_job,iw_marital,iw_education,iw_default;)"/>
<willbe name="score" format="dec:5" value="score(1,iw_job,iw_marital,iw_education,iw_default;model;)"/>
<colord cols="iw_job,iw_marital,iw_education,iw_default,score"/>