Chapter 3 Data transformation

While the main dataset was easy to get started with we had to create various new variables from the existing variable to uncover some interesting findings.

3.1 Change In Rank

To understand the impact of civilian misconduct cases on Officer Rank we created a Change In Rank metric using the rank_incident (Rank at the time of Incident) and rank_now (Officer’s rank when the data was fetched)

  • Created new variable rank_after which was calculated for each officer as a lead rank_incident from the next complaint against the same officer.

  • Created nth_complaint variable by officer based on the temporal ordering of complaints.

  • Created nth_complaint_disposition variable by officer based on the temporal ordering of complaints and the board disposition.

unique_mos_id board_disposition_top year_received rank_incident rank_now rank_after nth_complaint nth_complaint_disposition
2 Exonerated 2009 Sergeant Lieutenant Sergeant 1 1
2 Exonerated 2009 Sergeant Lieutenant Sergeant 2 2
2 Exonerated 2010 Sergeant Lieutenant Sergeant 3 3
2 Substantiated 2011 Sergeant Lieutenant Sergeant 4 1
2 Unsubstantiated 2011 Sergeant Lieutenant Sergeant 5 1
2 Unsubstantiated 2012 Sergeant Lieutenant Sergeant 6 2
2 Unsubstantiated 2012 Sergeant Lieutenant Sergeant 7 3
2 Unsubstantiated 2012 Sergeant Lieutenant Sergeant 8 4
2 Exonerated 2013 Sergeant Lieutenant Sergeant 9 4
2 Unsubstantiated 2013 Sergeant Lieutenant Lieutenant 10 5
  • Created ordered rank columns with lower the Rank number higher the Rank as follows
order_df <- data.frame(rank = c('1','2','3','4','5','6','7','8'),
                       label = c('Chiefs and other ranks','Inspector','Deputy Inspector','Captain','Lieutenant','Sergeant','Detective','Police Officer'))
  • Created change_in_rank and change_in_rank_top Columns by differencing rank_incident and rank_after as follows
new_df <- merge(clean_data_exp, order_df, by.x = "rank_after", by.y = "label", all.x = TRUE, all.y = FALSE) %>%
  rename(rank_after_n = rank) %>% 
  merge(order_df, by.x = "rank_incident", by.y = "label", all.x = TRUE, all.y = FALSE) %>%
  rename(rank_incident_n = rank) %>% 
  merge(order_df, by.x = "rank_now", by.y = "label", all.x = TRUE, all.y = FALSE) %>%
  rename(rank_now_n = rank) %>% 
  mutate(change_in_rank =-as.integer(rank_after_n)+as.integer(rank_incident_n))%>% 
  arrange(unique_mos_id, year_month_received)
  
new_df$change_in_rank_top = as.factor(sign(new_df$change_in_rank))

3.2 Interactive Plot

  • Created condensed .csv file for the D3 Interactive Plot
raw_data1 <- raw_data%>%
  mutate(complainant_ethnicity=replace(complainant_ethnicity, complainant_ethnicity=="", "Unknown"))

complainant_ethnicity_by_precinct <- raw_data1 %>% 
  group_by(precinct, complainant_ethnicity) %>% 
  summarise(count=n(), .groups="drop")



complainant_ethnicity_by_precinct1 <- complainant_ethnicity_by_precinct %>% 
pivot_wider(names_from = complainant_ethnicity, values_from = count, values_fill = 0)

write.csv(complainant_ethnicity_by_precinct1 , file = "~/group24-EDAV-Project/barchart.csv", row.names=FALSE