Bank Marketing Data Set
This data set comes from the UC Irvine Machine Learning Repository. It describes a direct marketing campaign run by a Portuguese banking institution to get its clients to subscribe for a term deposit.
Source
This data set was obtained by downloading bank-additional-full.csv (contained in bank-additional.zip) from https://archive.ics.uci.edu/ml/datasets/Bank+Marketing.
The table contains 41,188 rows and 21 columns.
The path to this data set is pub.demo.mleg.uci.bankmarketing.
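As a rough illustration of working with the downloaded file outside the hosted table, the sketch below reads it with Python's standard csv module. It assumes the file is semicolon-delimited with a quoted header row, as the UCI download is. The function name load_rows and the inline sample values are hypothetical and serve only as an illustration.

```python
import csv
import io


def load_rows(fileobj):
    """Read semicolon-delimited records into a list of dicts keyed by column name."""
    reader = csv.DictReader(fileobj, delimiter=";", quotechar='"')
    return list(reader)


# Tiny inline sample in the same layout (made-up values, not real records).
sample = io.StringIO(
    '"age";"job";"emp.var.rate";"y"\n'
    '56;"housemaid";1.1;"no"\n'
    '37;"services";-1.8;"yes"\n'
)
rows = load_rows(sample)
print(len(rows), rows[0]["job"], rows[1]["y"])
```

Passing an open handle to bank-additional-full.csv (opened with newline="") instead of the sample should return 41,188 records with 21 fields each. All values come back as strings, so numeric columns still need converting.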
Input Variables
There are 20 columns in the table that provide information about each client, such as age, marital status, and education level. A subset of these relates to the client's contact history, such as the month and day of the week the last contact of the current campaign was made, as well as the number of days since the client was last contacted in a previous campaign. Ten of the columns are categorical, meaning that they contain textual values that correspond to a particular category for a given variable.
Column Name | Description | Type |
---|---|---|
age | Age of the client | Numeric |
job | Client's occupation | Categorical |
marital | Marital status | Categorical. Note: divorced means divorced or widowed |
education | Client's education level | Categorical |
default | Indicates whether the client has credit in default | Categorical |
housing | Indicates whether the client has a housing loan | Categorical |
loan | Indicates whether the client has a personal loan | Categorical |
contact | Type of contact communication | Categorical |
month | Month that last contact was made | Categorical |
day_of_week | Day that last contact was made | Categorical |
duration | Duration of last contact in seconds | Numeric. Note: This attribute highly affects the output target (e.g., if duration = 0 then y = no). Yet the duration is not known before a call is performed. Also, after the end of the call, y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model. |
campaign | Number of contacts performed during this campaign for this client (including last contact) | Numeric |
pdays | Number of days since the client was last contacted in a previous campaign | Numeric. Note: 999 means the client was not previously contacted |
previous | Number of contacts performed before this campaign for this client | Numeric |
poutcome | Outcome of the previous marketing campaign | Categorical |
empvarrate | Employment variation rate (quarterly indicator). Note: This column was named emp.var.rate in the original data set. | Numeric |
conspriceidx | Consumer price index (monthly indicator). Note: This column was named cons.price.idx in the original data set. | Numeric |
consconfidx | Consumer confidence index (monthly indicator). Note: This column was named cons.conf.idx in the original data set. | Numeric |
euribor3m | Euribor 3-month rate (daily indicator) | Numeric |
nremployed | Number of employees (quarterly indicator). Note: This column was named nr.employed in the original data set. | Numeric |
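The notes in the table above (renamed columns, the 999 sentinel in pdays, and the caution about duration) could be applied to a raw record along the following lines. This is a minimal sketch, not the procedure used to build the table: prepare_inputs is a hypothetical helper, records are assumed to be dicts of strings keyed by the original column names, and the sample values are made up.

```python
def prepare_inputs(row, keep_duration=False):
    """Return a cleaned copy of one record (dict of column name -> value)."""
    # Drop the dots from column names, e.g. emp.var.rate -> empvarrate.
    clean = {name.replace(".", ""): value for name, value in row.items()}

    # pdays uses 999 as a sentinel for "not previously contacted";
    # represent that case as None rather than as a number of days.
    days = int(clean["pdays"])
    clean["pdays"] = None if days == 999 else days

    # duration is only known after the call, so drop it unless benchmarking.
    if not keep_duration:
        clean.pop("duration", None)
    return clean


# Made-up record using the original (dotted) column names.
example = {
    "age": "56",
    "duration": "261",
    "pdays": "999",
    "emp.var.rate": "1.1",
    "nr.employed": "5191.0",
}
cleaned = prepare_inputs(example)
print(cleaned)
```

Setting keep_duration=True keeps the duration column for benchmark runs. Replacing 999 with None is one possible choice; a separate indicator column would work equally well.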
Output Variable
There is one column in the table that corresponds to our target value.
Column Name | Description | Type |
---|---|---|
y | Indicates whether the client has subscribed for a term deposit | Binary (yes or no) |
Dummy Variables
Since we cannot use textual data in our analysis, categorical variables are coded as dummy variables. Each dummy variable represents one of the categories in a categorical column.
Column Name | Description | Type |
---|---|---|
yy | Client subscribes for a term deposit | Boolean (0 or 1) |
hsng | Client has a housing loan | Boolean (0 or 1) |
h_unk | Unknown if the client has a housing loan | Boolean (0 or 1) |
def | Client has credit in default | Boolean (0 or 1) |
d_unk | Unknown if the client has credit in default | Boolean (0 or 1) |
loans | Client has a personal loan | Boolean (0 or 1) |
l_unk | Unknown if the client has a personal loan | Boolean (0 or 1) |
nonxst | Previous outcome of marketing campaign is nonexistent | Boolean (0 or 1) |
succ | Previous outcome of marketing campaign was a success | Boolean (0 or 1) |
blue | Client occupation: blue-collar worker | Boolean (0 or 1) |
tech | Client occupation: technician | Boolean (0 or 1) |
j_unk | Client occupation: unknown | Boolean (0 or 1) |
svcs | Client occupation: services | Boolean (0 or 1) |
mgmt | Client occupation: management | Boolean (0 or 1) |
ret | Client occupation: retired | Boolean (0 or 1) |
entr | Client occupation: entrepreneur | Boolean (0 or 1) |
self | Client occupation: self-employed | Boolean (0 or 1) |
maid | Client occupation: housemaid | Boolean (0 or 1) |
unemp | Client occupation: unemployed | Boolean (0 or 1) |
stud | Client occupation: student | Boolean (0 or 1) |
marr | Marital status: married | Boolean (0 or 1) |
sgl | Marital status: single | Boolean (0 or 1) |
m_unk | Marital status: unknown | Boolean (0 or 1) |
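The dummy coding described above can be sketched as follows for a single record. This is an illustration under assumptions rather than the code that produced the table: dummy_code is a hypothetical helper, the category strings "yes" and "unknown" are assumed to be the values stored in the categorical columns, and l_unk is assumed to flag an unknown personal-loan status by analogy with h_unk and d_unk. The occupation dummies are omitted for brevity and would follow the same pattern.

```python
def dummy_code(row):
    """Map one record's categorical values to 0/1 dummy variables."""

    def flag(column, category):
        # 1 if the record falls in the given category, otherwise 0.
        return 1 if row[column] == category else 0

    return {
        "yy": flag("y", "yes"),
        "hsng": flag("housing", "yes"),
        "h_unk": flag("housing", "unknown"),
        "def": flag("default", "yes"),
        "d_unk": flag("default", "unknown"),
        "loans": flag("loan", "yes"),
        "l_unk": flag("loan", "unknown"),  # assumed meaning, see lead-in
        "nonxst": flag("poutcome", "nonexistent"),
        "succ": flag("poutcome", "success"),
        "marr": flag("marital", "married"),
        "sgl": flag("marital", "single"),
        "m_unk": flag("marital", "unknown"),
    }


# Made-up record for illustration.
record = {
    "y": "no",
    "housing": "yes",
    "default": "unknown",
    "loan": "no",
    "poutcome": "nonexistent",
    "marital": "divorced",
}
dummies = dummy_code(record)
print(dummies)
```

A category without its own dummy, such as divorced for marital status, is represented by all of that column's dummies being 0, so it acts as the reference category.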