Census Income Data Set
This data set was obtained from the UC Irvine Machine Learning Repository and contains weighted census data extracted from the 1994 and 1995 Current Population Surveys conducted by the U.S. Census Bureau.
Source
This data set was obtained by downloading census-income.data (contained in census-income.data.gz) from http://archive.ics.uci.edu/ml/datasets/Census-Income+(KDD).
The original table contains 199,523 rows and 42 columns. An additional column, edu_year, has been added to aid in the analysis (see Input Variables).
The path to this data set is pub.demo.mleg.uci.censusincome.
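The downloaded census-income.data file is plain comma-separated text with no header row. The following is a minimal sketch for streaming it with Python's standard library. It assumes that the raw field order matches the Input Variables table, with instance_weight as the 25th field (right after det_hh_summ), following the repository's census-income.names file. Check that file before relying on this order. It also assumes that fields may be padded with spaces. RAW_COLUMNS, parse_line and read_records are hypothetical names, and edu_year is absent because it is derived later.

```python
import gzip
from typing import Dict, Iterator

# ASSUMPTION: raw field order taken from the Input Variables table, with
# instance_weight inserted as field 25; verify against census-income.names.
RAW_COLUMNS = [
    "age", "class_worker", "det_ind_code", "det_occ_code", "education",
    "wage_per_hour", "hs_college", "marital_stat", "major_ind_code",
    "major_occ_code", "race", "hisp_origin", "sex", "union_member",
    "unemp_reason", "full_or_part_emp", "capital_gains", "capital_losses",
    "stock_dividends", "tax_filer_stat", "region_prev_res", "state_prev_res",
    "det_hh_fam_stat", "det_hh_summ", "instance_weight", "mig_chg_msa",
    "mig_chg_reg", "mig_move_reg", "mig_same", "mig_prev_sunbelt", "num_emp",
    "fam_under_18", "country_father", "country_mother", "country_self",
    "citizenship", "own_or_self", "vet_question", "vet_benefits",
    "weeks_worked", "year", "income_50k",
]


def parse_line(line: str) -> Dict[str, str]:
    """Split one comma-separated record and strip the spaces around each field."""
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    if len(fields) != len(RAW_COLUMNS):
        raise ValueError(f"expected {len(RAW_COLUMNS)} fields, got {len(fields)}")
    return dict(zip(RAW_COLUMNS, fields))


def read_records(path: str) -> Iterator[Dict[str, str]]:
    """Stream records from the .gz archive without decompressing it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                yield parse_line(line)
```

For example, `for rec in read_records("census-income.data.gz"): ...` yields one dict of strings per row. Every value stays a string at this stage.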
Input Variables
There are 42 columns in the table that provide demographic and employment-related information.
Column Name | Description | Type |
---|---|---|
age | Age of the worker | Numeric |
class_worker | Class of worker | Categorical |
det_ind_code | Industry code | Numeric |
det_occ_code | Occupation code | Numeric |
education | Level of education | Categorical |
wage_per_hour | Wage per hour | Numeric |
hs_college | Enrolled in educational institution last week | Categorical |
marital_stat | Marital status | Categorical |
major_ind_code | Major industry code | Categorical |
major_occ_code | Major occupation code | Categorical |
race | Race | Categorical |
hisp_origin | Hispanic origin | Categorical |
sex | Sex | Categorical |
union_member | Member of a labor union | Categorical |
unemp_reason | Reason for unemployment | Categorical |
full_or_part_emp | Full- or part-time employment status | Categorical |
capital_gains | Capital gains | Numeric |
capital_losses | Capital losses | Numeric |
stock_dividends | Dividends from stocks | Numeric |
tax_filer_stat | Tax filer status | Categorical |
region_prev_res | Region of previous residence | Categorical |
state_prev_res | State of previous residence | Categorical |
det_hh_fam_stat | Detailed household and family status | Categorical |
det_hh_summ | Detailed household summary in household | Categorical |
mig_chg_msa | Migration code: change in MSA | Categorical |
mig_chg_reg | Migration code: change in region | Categorical |
mig_move_reg | Migration code: move within region | Categorical |
mig_same | Lived in this house one year ago | Categorical |
mig_prev_sunbelt | Migration: previous residence in Sunbelt | Categorical |
num_emp | Number of persons who worked for employer | Numeric |
fam_under_18 | Family members under 18 | Categorical |
country_father | Country of birth of father | Categorical |
country_mother | Country of birth of mother | Categorical |
country_self | Country of birth | Categorical |
citizenship | Citizenship | Categorical |
own_or_self | Own business or self-employed? | Numeric |
vet_question | Filled in included questionnaire for Veterans Administration | Categorical |
vet_benefits | Veterans benefits | Numeric |
weeks_worked | Weeks worked in the year | Numeric |
year | Year of survey | Numeric |
income_50k | Income less than or greater than $50,000 | Categorical |
edu_year | Number of years of education | Numeric |
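Raw fields arrive as text, so a loader has to decide which columns to convert using the Type column above. The following is a minimal sketch. It assumes records are dicts of strings keyed by these column names, and it also converts instance_weight from the weight table. NUMERIC_COLUMNS and coerce_types are hypothetical names.

```python
from typing import Dict, Union

# Columns typed "Numeric" in the tables of this page (instance_weight included).
NUMERIC_COLUMNS = {
    "age", "det_ind_code", "det_occ_code", "wage_per_hour", "capital_gains",
    "capital_losses", "stock_dividends", "num_emp", "own_or_self",
    "vet_benefits", "weeks_worked", "year", "edu_year", "instance_weight",
}


def coerce_types(record: Dict[str, str]) -> Dict[str, Union[float, str]]:
    """Convert Numeric columns to float; leave Categorical columns as strings."""
    return {
        name: float(value) if name in NUMERIC_COLUMNS else value
        for name, value in record.items()
    }
```

Columns such as det_ind_code and det_occ_code are codes. Even though the table labels them Numeric, some analyses may prefer to treat them as categories.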
The edu_year column is derived from the education column according to the following mapping:
education | edu_year |
---|---|
Children | 0 |
Less than 1st grade | 0.5 |
1st 2nd 3rd or 4th grade | 2.5 |
5th or 6th grade | 5.5 |
7th and 8th grade | 7.5 |
9th grade | 9 |
10th grade | 10 |
11th grade | 11 |
12th grade no diploma | 12 |
High school graduate | 12 |
Some college but no degree | 14 |
Associates degree-academic program | 14 |
Associates degree-occup /vocational | 14 |
Bachelors degree(BA AB BS) | 16 |
Masters degree(MA MS MEng MEd MSW MBA) | 18 |
Prof school degree (MD DDS DVM LLB JD) | 20 |
Doctorate degree(PhD EdD) | 21 |
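The mapping above can be expressed as a lookup table. The following is a minimal sketch that copies the education strings exactly as listed, including their irregular spacing, and strips surrounding whitespace from the input before the lookup. EDU_YEAR and edu_year are hypothetical names.

```python
from typing import Dict

# education -> edu_year, copied from the mapping table above.
EDU_YEAR: Dict[str, float] = {
    "Children": 0,
    "Less than 1st grade": 0.5,
    "1st 2nd 3rd or 4th grade": 2.5,
    "5th or 6th grade": 5.5,
    "7th and 8th grade": 7.5,
    "9th grade": 9,
    "10th grade": 10,
    "11th grade": 11,
    "12th grade no diploma": 12,
    "High school graduate": 12,
    "Some college but no degree": 14,
    "Associates degree-academic program": 14,
    "Associates degree-occup /vocational": 14,
    "Bachelors degree(BA AB BS)": 16,
    "Masters degree(MA MS MEng MEd MSW MBA)": 18,
    "Prof school degree (MD DDS DVM LLB JD)": 20,
    "Doctorate degree(PhD EdD)": 21,
}


def edu_year(education: str) -> float:
    """Return years of education; raises KeyError for a category not in the table."""
    return EDU_YEAR[education.strip()]
```

Raising `KeyError` on an unknown category, rather than returning a default, makes any mismatch between the raw strings and the table visible immediately.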
Weight Variable
There is one column in the table that corresponds to the weight value.
Column Name | Description | Type |
---|---|---|
instance_weight | Indicates the number of people in the population that each record represents due to stratified sampling | Numeric |
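Each record stands for instance_weight people, so population-level statistics weight every record by that value instead of counting rows equally. The following is a minimal sketch of a weighted mean, $\sum_i w_i x_i / \sum_i w_i$, with $w_i$ taken from instance_weight. The function name is hypothetical.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Population estimate sum(w_i * x_i) / sum(w_i), with w_i = instance_weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

For example, `weighted_mean([10, 20], [1, 3])` gives 17.5, whereas the unweighted mean of the same two rows is 15.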
Output Variable
There is one column in the table that corresponds to our target value.
Column Name | Description | Type |
---|---|---|
wage_per_hour | Wage per hour (multiplied by 100). For example, a value of 1200 corresponds to $12.00/hr. | Numeric |
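Because the stored target is the hourly wage multiplied by 100, it must be divided by 100 before it is reported in dollars. The following is a minimal sketch of that conversion. The function name is hypothetical.

```python
def wage_dollars(wage_per_hour: float) -> float:
    """Convert the stored wage_per_hour (dollars x 100) to dollars per hour."""
    return wage_per_hour / 100


# Usage: format the example value from the table above.
print(f"${wage_dollars(1200):.2f}/hr")  # prints "$12.00/hr"
```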