g_info_woe(G;S;X;Y)

Returns the information-theoretic weight of evidence (WoE) of X given Y. (Available as of version 10.42)

Function type

Vector only

Description

The weight of evidence (WoE) is defined as:

when the value Xi occurs at least twice in the column X. If the value Xi occurs exactly once, then info_WOE(i) is defined to be 0.
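
For orientation, the conventional textbook form of weight of evidence for a category value is sketched below. It is an assumption that g_info_woe(G;S;X;Y) uses exactly this expression; the natural logarithm, the absence of any smoothing, and the symbols n_{i,1}, n_{i,0}, N_1, and N_0 are all illustrative choices rather than part of the function's documented definition. Here n_{i,1} and n_{i,0} are the numbers of rows where X = X_i with Y=1 and Y=0 respectively, and N_1 and N_0 are the total numbers of rows with Y=1 and Y=0 in the group.

```latex
% Textbook weight of evidence (assumed form, not confirmed for g_info_woe)
\[
\mathrm{WoE}(X_i)
  = \ln \frac{P(X = X_i \mid Y = 1)}{P(X = X_i \mid Y = 0)}
  = \ln \frac{n_{i,1} / N_1}{n_{i,0} / N_0}
\]

% Worked example with hypothetical counts: N_1 = 100, N_0 = 900,
% and a value X_i seen in 30 rows with Y=1 and 90 rows with Y=0.
\[
\mathrm{WoE}(X_i)
  = \ln \frac{30 / 100}{90 / 900}
  = \ln \frac{0.30}{0.10}
  = \ln 3 \approx 1.0986
\]
```

Under this form, a positive result means the value is over-represented among rows with Y=1, and a negative result means it is over-represented among rows with Y=0. Swapping the two counts in the hypothetical example (10 rows with Y=1 and 270 rows with Y=0) gives ln(0.10/0.30), approximately -1.0986.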

g_info_woe(G;S;X;Y) may be used for transforming a categorical or unordered column into a numerical ordered column.

Syntax

g_info_woe(G;S;X;Y)

Input

Argument | Type | Description
G | any | A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S | integer | The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

X | any | A column name

This column contains categorical or unordered data.

Y | integer or decimal | A column name

This column must contain only the values 0 and 1.

Return Value

For every row in each group defined by G (and for those rows where S=1, if specified), g_info_woe(G;S;X;Y) returns a numeric value in which larger values indicate a greater chance of observing Y=1 and smaller values indicate a greater chance of observing Y=0.

Example

The following example uses the function g_info_woe(G;S;X;Y) to transform the categorical columns job and education in the table pub.demo.mleg.uci.bankmarketing into the numerical columns iwoe_job and iwoe_education. These numerical columns can then be used by g_logreg(G;S;Y;XX;Z) and score(XX;M;Z).

<base table="pub.demo.mleg.uci.bankmarketing"/>
<willbe name="y01" value="y='yes'"/>
<willbe name="iwoe_job" value="g_info_woe(;;job;y01)" format="dec:5"/>
<willbe name="iwoe_education" 
 value="g_info_woe(;;education;y01)" format="dec:5"/>
<note>The iwoe columns transform the text columns into numerical 
columns that can be used for modeling</note>
<willbe name="model" value="g_logreg(;;y01;1,iwoe_job,iwoe_education;)"/>
<willbe name="score" 
 value="score(1,iwoe_job,iwoe_education;model;)" format="dec:5"/>
<colord cols="job,education,iwoe_job,iwoe_education,score"/>
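
The example above leaves both G and S empty. As a minimal sketch of how those two arguments might be supplied, the variation below computes the weight of evidence of job separately within each value of a grouping column and only over selected rows. The column names marital and age are assumptions about the contents of pub.demo.mleg.uci.bankmarketing, and the names age30plus and iwoe_job_by_marital are hypothetical, chosen only for illustration.

```xml
<base table="pub.demo.mleg.uci.bankmarketing"/>
<willbe name="y01" value="y='yes'"/>
<note>Hypothetical selection column: 1 for rows to include, 0 otherwise 
(assumes the table has a numeric age column)</note>
<willbe name="age30plus" value="age>=30"/>
<note>G=marital groups the rows (assumed column name); S=age30plus 
restricts the calculation to rows where the selection column is 1</note>
<willbe name="iwoe_job_by_marital" 
 value="g_info_woe(marital;age30plus;job;y01)" format="dec:5"/>
```

Because G is specified, the same job value can receive a different result in each marital group.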