Chapter 3 Data transformation
While the main dataset was easy to get started with we had to create various new variables from the existing variable to uncover some interesting findings.
3.1 Change In Rank
To understand the impact of civilian misconduct cases on Officer Rank we created a Change In Rank metric using the rank_incident
(Rank at the time of Incident) and rank_now
(Officer’s rank when the data was fetched)
Created new variable
rank_after
which was calculated for each officer as a leadrank_incident
from the next complaint against the same officer.Created
nth_complaint
variable by officer based on the temporal ordering of complaints.Created
nth_complaint_disposition
variable by officer based on the temporal ordering of complaints and the board disposition.
unique_mos_id | board_disposition_top | year_received | rank_incident | rank_now | rank_after | nth_complaint | nth_complaint_disposition |
---|---|---|---|---|---|---|---|
2 | Exonerated | 2009 | Sergeant | Lieutenant | Sergeant | 1 | 1 |
2 | Exonerated | 2009 | Sergeant | Lieutenant | Sergeant | 2 | 2 |
2 | Exonerated | 2010 | Sergeant | Lieutenant | Sergeant | 3 | 3 |
2 | Substantiated | 2011 | Sergeant | Lieutenant | Sergeant | 4 | 1 |
2 | Unsubstantiated | 2011 | Sergeant | Lieutenant | Sergeant | 5 | 1 |
2 | Unsubstantiated | 2012 | Sergeant | Lieutenant | Sergeant | 6 | 2 |
2 | Unsubstantiated | 2012 | Sergeant | Lieutenant | Sergeant | 7 | 3 |
2 | Unsubstantiated | 2012 | Sergeant | Lieutenant | Sergeant | 8 | 4 |
2 | Exonerated | 2013 | Sergeant | Lieutenant | Sergeant | 9 | 4 |
2 | Unsubstantiated | 2013 | Sergeant | Lieutenant | Lieutenant | 10 | 5 |
- Created ordered rank columns with lower the Rank number higher the Rank as follows
order_df <- data.frame(rank = c('1','2','3','4','5','6','7','8'),
label = c('Chiefs and other ranks','Inspector','Deputy Inspector','Captain','Lieutenant','Sergeant','Detective','Police Officer'))
- Created
change_in_rank
andchange_in_rank_top
Columns by differencingrank_incident
andrank_after
as follows
new_df <- merge(clean_data_exp, order_df, by.x = "rank_after", by.y = "label", all.x = TRUE, all.y = FALSE) %>%
rename(rank_after_n = rank) %>%
merge(order_df, by.x = "rank_incident", by.y = "label", all.x = TRUE, all.y = FALSE) %>%
rename(rank_incident_n = rank) %>%
merge(order_df, by.x = "rank_now", by.y = "label", all.x = TRUE, all.y = FALSE) %>%
rename(rank_now_n = rank) %>%
mutate(change_in_rank =-as.integer(rank_after_n)+as.integer(rank_incident_n))%>%
arrange(unique_mos_id, year_month_received)
new_df$change_in_rank_top = as.factor(sign(new_df$change_in_rank))
3.2 Interactive Plot
- Created condensed .csv file for the D3 Interactive Plot
raw_data1 <- raw_data%>%
mutate(complainant_ethnicity=replace(complainant_ethnicity, complainant_ethnicity=="", "Unknown"))
complainant_ethnicity_by_precinct <- raw_data1 %>%
group_by(precinct, complainant_ethnicity) %>%
summarise(count=n(), .groups="drop")
complainant_ethnicity_by_precinct1 <- complainant_ethnicity_by_precinct %>%
pivot_wider(names_from = complainant_ethnicity, values_from = count, values_fill = 0)
write.csv(complainant_ethnicity_by_precinct1 , file = "~/group24-EDAV-Project/barchart.csv", row.names=FALSE