Chicago School of Data Days - Sept. 20, 2014
David Eads @eads | NPR Visuals @nprviz
Based on an NPR Visuals blog post
Slides: http://recoveredfactory.net/data-workflow
Presentation source: github.com/eads/data-workflow | Made with Tarbell
MRAPs And Bayonets: What We Know About The Pentagon's 1033 Program
by Arezou Rezvani, Jessica Pupovac, David Eads and Tyler Fisher
#!/bin/bash
echo 'IMPORT DATA'
echo '-----------'
./import.sh
echo 'CREATE SUMMARY FILES'
echo '--------------------'
./summarize.sh
echo 'EXPORT PROCESSED DATA'
echo '---------------------'
./export.sh
# NSN parsing example
if header == 'nsn':
nsn_parts = cell_value.split('-')
row_dict['federal_supply_class'] = nsn_parts[0]
# Example import statement
echo "Import FIPS crosswalk"
psql leso -c "CREATE TABLE fips (
county varchar,
state varchar,
fips varchar
);"
psql leso -c "COPY fips FROM '`pwd`/src/fips_crosswalk.csv' DELIMITER ',' CSV HEADER;"
in2csv
command to extract each sheetcsvstack
command to combine the sheets and add a grouping columncsvcut
command to remove a pointless "row number" column
echo "Import agency.csv"
in2csv --sheet "State Agencies" data.xlsx > state_agencies.csv
in2csv --sheet "Federal Agencies" data.xlsx > federal_agencies.csv
in2csv --sheet "Tribal Agencies" data.xlsx > tribal_agencies.csv
csvstack -n "agency_type" -g "state,federal,tribal"
state_agencies.csv
federal_agencies.csv
tribal_agencies.csv | csvcut -c "1,3,4" > agencies.csv
select
c.full_name,
c.code as federal_supply_class,
sum((d.quantity * d.acquisition_cost)) as total_cost
from data as d
join codes as c on d.federal_supply_class = c.code
group by c.full_name, c.code
order by c.full_name
select
c.name,
sum(quantity * acquisition_cost) as total_cost,
extract(year from ship_date) as year
from data as d
join codes as c on d.federal_supply_category = c.code
group by c.name, year
order by year desc
Rand Paul: "Mr. Estevez, in the NPR investigation of 1033 program they list that 12,000 bayonets have been given out. What purpose are bayonets being given out for?"