Commit d0822726 authored by PengnanZ, committed by Colbry, Dirk

Add files via upload

parent cc232a0e
%% Cell type:markdown id:7a42b048 tags:
# U.S. Census Data Tutorial
%% Cell type:code id:319735dc tags:
``` python
#!pip install censusdata
```
%% Cell type:markdown id:7baf9942 tags:
If the censusdata package is not in your environment, uncomment the line above to install it with pip.
Reference: the [CensusData library](https://jtleider.github.io/censusdata/index.html)
%% Cell type:code id:83acab08 tags:
``` python
import pandas as pd
import re
import numpy as np
import censusdata
```
%% Cell type:markdown id:04dc2a46 tags:
### Main Methods
[CensusData API Documentation](https://jtleider.github.io/censusdata/api.html)
%% Cell type:code id:333e3815 tags:
``` python
# Search for ACS 2015-2019 5-year estimate variables where the concept
# includes the text 'population'.
sample = censusdata.search('acs5', 2019, 'concept',
                           lambda value: re.search('population', value, re.IGNORECASE))
```
%% Cell type:markdown id:cb0b1ea5 tags:
**Parameters:**
* src (str) – Census data source: ```'acs1'``` for **ACS 1-year estimates**, ```'acs5'``` for **ACS 5-year estimates**, ```'acs3'``` for **ACS 3-year estimates**, ```'acsse'``` for **ACS 1-year supplemental estimates**, ```'sf1'``` for **SF1 data**.
* year (int) – Year of data.
* field (str) – Field in which to search.
* criterion (str or function) – Search criterion. Either string to search for, or a function which will be passed the value of field and return True if a match and False otherwise.
* tabletype (str, optional) – Type of table from which variables are drawn (only applicable to ACS data). Options are ```'detail'``` (detail tables), ```'subject'``` (subject tables), ```'profile'``` (data profile tables), ```'cprofile'``` (comparison profile tables).

**Returns:**
List of 3-tuples containing variable names, concepts, and labels matching the search criterion.

**Return type:**
list
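
To make the ```criterion``` parameter more concrete, here is a minimal sketch of the two accepted forms (a string or a function) applied to a small local list of 3-tuples. It does not call the library. The ```filter_variables``` helper, the ```FIELDS``` mapping, and the two ```X0000…``` rows are hypothetical, and treating a string criterion as a case-insensitive substring is an assumption made for illustration, not a description of how ```censusdata.search``` is implemented.

```python
import re

# Hypothetical stand-in for a few (name, concept, label) tuples. The first row
# is copied from the tutorial output; the other two are made up.
variables = [
    ("B01003_001E", "TOTAL POPULATION", "Estimate!!Total"),
    ("X00001_001E", "MEDIAN AGE", "Estimate!!Total"),
    ("X00002_001E", "Population in households", "Estimate!!Total"),
]

# Assumed field names for this sketch only.
FIELDS = {"name": 0, "concept": 1, "label": 2}


def filter_variables(rows, field, criterion):
    """Keep rows whose chosen field satisfies the criterion (string or function)."""
    if isinstance(criterion, str):
        # Assumption: a plain string means a case-insensitive substring match.
        needle = criterion.lower()
        test = lambda value: needle in value.lower()
    else:
        # A function receives the field value and returns something truthy on a match.
        test = criterion
    return [row for row in rows if test(row[FIELDS[field]])]


by_string = filter_variables(variables, "concept", "population")
by_function = filter_variables(
    variables, "concept",
    lambda value: re.search("population", value, re.IGNORECASE),
)
print([row[0] for row in by_string])
```

Both calls select the same two rows here, which is why the tutorial's lambda and a simple string can often be used interchangeably for this kind of search.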
%% Cell type:code id:4984d1b7 tags:
``` python
print(len(sample))
```
%% Output
10765
%% Cell type:markdown id:ede1fed8 tags:
This is the number of matches for our search. In this case, 10765 variables in the 2019 ACS 5-year estimates have a concept that includes the text 'population'.
%% Cell type:code id:dbffaeca tags:
``` python
print(sample[0])
```
%% Output
('B01003_001E', 'TOTAL POPULATION', 'Estimate!!Total')
%% Cell type:markdown id:07ae53a2 tags:
Let's use the first result as an example. Based on the output above, the first variable is 'B01003_001E', which is the total population estimate in the parent table B01003.
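
The relationship between a variable name and its parent table can be sketched with plain string handling. The ```split_variable``` helper below is hypothetical. It assumes the name has the form table ID, underscore, line number, and a one-letter suffix, where 'E' marks an estimate.

```python
def split_variable(name):
    """Split a variable name such as 'B01003_001E' into its parts."""
    table, _, item = name.partition("_")
    # Assumption: the last character is a value-type suffix ('E' for estimate).
    return {"table": table, "line": item[:-1], "suffix": item[-1]}


parts = split_variable("B01003_001E")
print(parts)  # {'table': 'B01003', 'line': '001', 'suffix': 'E'}
```

The ```table``` value is what gets passed to ```censusdata.censustable``` when listing every variable of the parent table.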
%% Cell type:markdown id:a97df25b tags:
Once you know the parent table you're interested in, you can use the ```printtable``` function to get a clean readout of all its variables and check whether there are others you might be interested in.
%% Cell type:code id:fe3d19e4 tags:
``` python
censusdata.printtable(censusdata.censustable('acs5', 2019, 'B01003'))
```
%% Output
Variable | Table | Label | Type
-------------------------------------------------------------------------------------------------------------------
B01003_001E | TOTAL POPULATION | !! Estimate Total | int
-------------------------------------------------------------------------------------------------------------------
%% Cell type:markdown id:60ac12ee tags:
### Data download
%% Cell type:markdown id:d74c7914 tags:
If you want to download data for a specific state, county, etc., start at **Step 1**. Otherwise, start at **Step 3**.
**Step 1** To download the data for a specific state, you need to find its geography code. The ```geographies``` function is built for that.
%% Cell type:code id:501fec9c tags:
``` python
states = censusdata.geographies(censusdata.censusgeo([('state', '*')]), 'acs5', 2019)
print(states['Michigan'])
```
%% Output
Summary level: 040, state:26
%% Cell type:markdown id:6de55a1a tags:
**Step 2** If you want county-level data, do almost the same thing, but add the county after the state. For example:
%% Cell type:code id:bb5ad44b tags:
``` python
counties = censusdata.geographies(censusdata.censusgeo([('state', '26'), ('county', '*')]), 'acs5', 2019)
print(counties['Wayne County, Michigan'])
```
%% Output
Summary level: 050, state:26> county:163
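
The printed geography above packs the summary level and each level's code into one line. As a reading aid, the sketch below splits that printed text into a dictionary using only string operations. The ```parse_geo``` helper is hypothetical and assumes the exact layout shown in the output. When the censusdata package is available, working with the returned geography object directly is likely the more robust choice.

```python
def parse_geo(text):
    """Parse text like 'Summary level: 050, state:26> county:163' into a dict."""
    summary, _, levels = text.partition(", ")
    result = {"sumlevel": summary.split(":")[1].strip()}
    # Levels are separated by '>' and each one is written as 'name:code'.
    for part in levels.split(">"):
        name, _, code = part.strip().partition(":")
        result[name] = code
    return result


geo = parse_geo("Summary level: 050, state:26> county:163")
print(geo)  # {'sumlevel': '050', 'state': '26', 'county': '163'}
```

The ```state``` and ```county``` values are the codes used in Step 3.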
%% Cell type:markdown id:e610da56 tags:
**Step 3** Now it is time to download the data you want. This example uses Wayne County, Michigan. If you don't have a state or county code, leave it as ```'*'```.
%% Cell type:code id:d7dac8c6 tags:
``` python
data = censusdata.download('acs5', 2019,
                           censusdata.censusgeo([('state', '26'),
                                                 ('county', '163'),
                                                 ('block group', '*')]),
                           ['B01003_001E'])
```
%% Cell type:markdown id:d19c7015 tags:
This is the number of rows in the downloaded data, one per block group.
%% Cell type:code id:44f953f0 tags:
``` python
len(data)
```
%% Output
1822
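
The list of (level, code) pairs passed to ```censusdata.censusgeo``` in Step 3 can be assembled by a small helper that falls back to ```'*'``` when a code is not known. The ```build_geo_spec``` function is a hypothetical convenience, not part of the censusdata package, and it does not check whether the Census API accepts a wildcard at every level for a given dataset.

```python
def build_geo_spec(state="*", county="*", block_group="*"):
    """Return (level, code) pairs, using '*' wherever no code is given."""
    return [("state", state), ("county", county), ("block group", block_group)]


# Wayne County, Michigan, all block groups (codes taken from Steps 1 and 2).
spec = build_geo_spec(state="26", county="163")
print(spec)  # [('state', '26'), ('county', '163'), ('block group', '*')]
```

The resulting ```spec``` has the same shape as the list written out by hand in Step 3, so it could be passed as ```censusdata.censusgeo(spec)```.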
%% Cell type:markdown id:57c5df9b tags:
### Extra (data formatting, slice)
This part covers some optional extra steps, such as renaming the columns with pandas and slicing the data by Census Tract with ```census_cut``` from ```Help_Functions```.
%% Cell type:code id:1af89c80 tags:
``` python
column_name = ['TOTAL POPULATION']
data.columns = column_name
```
%% Cell type:code id:02d0bfed tags:
``` python
new_indices = []
for index in data.index.tolist():
    new_indices.append(index)
data.index = new_indices
```
%% Cell type:code id:9a537ede tags:
``` python
data.head()
```
%% Output
TOTAL POPULATION
Block Group 0, Census Tract 9901, Wayne County,... 0
Block Group 3, Census Tract 5104, Wayne County,... 238
Block Group 5, Census Tract 5528, Wayne County,... 1546
Block Group 3, Census Tract 5014, Wayne County,... 757
Block Group 2, Census Tract 5044, Wayne County,... 427
%% Cell type:markdown id:54f9165e tags:
### ```census_cut``` usage
%% Cell type:code id:6f39eed2 tags:
``` python
from Help_Functions import census_cut
import re
```
%% Cell type:markdown id:229c79aa tags:
For example, suppose we want the data for the areas covered by Census Tracts 5303, 5304, 5316, and 5317.
%% Cell type:code id:58749d91 tags:
``` python
Tracts = ['Census Tract 5303', 'Census Tract 5304','Census Tract 5316', 'Census Tract 5317']
```
%% Cell type:code id:336301fe tags:
``` python
df = census_cut(Tracts, data)
df
```
%% Output
TOTAL POPULATION
Block Group 2, Census Tract 5304, Wayne County,... 647
Block Group 1, Census Tract 5304, Wayne County,... 398
Block Group 3, Census Tract 5303, Wayne County,... 702
Block Group 1, Census Tract 5303, Wayne County,... 281
Block Group 2, Census Tract 5317, Wayne County,... 905
Block Group 1, Census Tract 5317, Wayne County,... 648
Block Group 2, Census Tract 5316, Wayne County,... 1094
Block Group 1, Census Tract 5316, Wayne County,... 761
Block Group 2, Census Tract 5303, Wayne County,... 329
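
The row selection shown above comes down to building one boolean per index label. The sketch below reproduces that idea without pandas, on a few made-up labels, and adds a stricter match so that 'Census Tract 5303' does not also select a tract such as '5303.01'. The ```tract_mask``` helper and the sample labels are hypothetical. Whether the stricter match is wanted is an assumption, since the tutorial's ```census_cut``` uses a plain ```re.search```.

```python
import re


def tract_mask(tracts, labels):
    """Return one boolean per label: True if it names any requested tract."""
    # re.escape treats the tract name literally. The negative lookahead rejects
    # a match that continues with another digit or a decimal suffix.
    patterns = [re.compile(re.escape(t) + r"(?![\d.])") for t in tracts]
    return [any(p.search(label) for p in patterns) for label in labels]


# Made-up labels in the style of the downloaded index.
labels = [
    "Block Group 1, Census Tract 5303, Wayne County, Michigan",
    "Block Group 2, Census Tract 5303.01, Wayne County, Michigan",
    "Block Group 1, Census Tract 9901, Wayne County, Michigan",
]
mask = tract_mask(["Census Tract 5303"], labels)
print(mask)  # [True, False, False]
```

Because the mask always holds exactly one entry per label, it can be used to index a DataFrame with the same number of rows.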
# +
import pandas as pd
import numpy as np
import re


def census_cut(tracts, data):
    '''
    Select the rows of a censusdata download that belong to the given Census Tracts.

    Parameters:
        tracts: A list of strings naming the Census Tracts, such as 'Census Tract 0000'.
        data: Data downloaded with the censusdata package.
    Return:
        result: A new set of data that only includes the rows for the given Census Tracts.
    '''
    mask = []
    for index in data.index:
        string = str(index)
        # Append exactly one boolean per row, even if several tracts match,
        # so the mask always has the same length as the data.
        mask.append(any(re.search(tract, string) for tract in tracts))
    result = data[mask]
    return result
# Censusdata_tutorial
This is a tutorial on how to use the censusdata package. It also includes an extra ```Help_Functions.py``` file with a helper function for slicing the data downloaded by using cens