Chapter 4 Missing values
4.1 Missing Values by Column
##            unique_mos_id               first_name                last_name
##                        0                        0                        0
##              command_now                shield_no             complaint_id
##                        0                        0                        0
##           month_received            year_received             month_closed
##                        0                        0                        0
##              year_closed      command_at_incident     rank_abbrev_incident
##                        0                     1544                        0
##          rank_abbrev_now                 rank_now            rank_incident
##                        0                        0                        0
##            mos_ethnicity               mos_gender         mos_age_incident
##                        0                        0                        0
##    complainant_ethnicity       complainant_gender complainant_age_incident
##                     4464                     4195                     4812
##                fado_type               allegation                 precinct
##                        0                        1                       24
##           contact_reason      outcome_description        board_disposition
##                      199                       56                        0
Among the columns used in the analysis, values were missing only for complainant_ethnicity (4,464 rows), complainant_gender (4,195 rows), complainant_age_incident (4,812 rows), and precinct (24 rows).
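Per-column counts of this kind can be obtained in base R with `colSums(is.na(...))`. The sketch below is only illustrative: the data frame name `complaints` and its values are made up, and only the column names are taken from the output above.

```r
# Toy stand-in for the complaints data; the values are made up for illustration.
complaints <- data.frame(
  complainant_ethnicity    = c("Black", NA, "Hispanic", NA),
  complainant_gender       = c("Male", NA, "Female", "Male"),
  complainant_age_incident = c(34, NA, 27, NA),
  precinct                 = c(75, 46, NA, 44),
  board_disposition        = c("Exonerated", "Unsubstantiated",
                               "Exonerated", "Unsubstantiated")
)

# is.na() gives a logical matrix; colSums() counts the TRUEs in each column.
missing_by_column <- colSums(is.na(complaints))
print(missing_by_column)

# Keep only the names of columns that have at least one missing value.
columns_with_missing <- names(missing_by_column)[missing_by_column > 0]
print(columns_with_missing)
```

Filtering on `missing_by_column > 0` gives the short list of affected columns directly, which avoids reading it off the full table by eye.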
## NOTE: In the following pairs of variables, the missingness pattern of the second is a subset of the first.
## Please verify whether they are in fact logically distinct variables.
##      [,1]                       [,2]
## [1,] "command_at_incident"      "allegation"
## [2,] "complainant_ethnicity"    "allegation"
## [3,] "complainant_gender"       "allegation"
## [4,] "complainant_age_incident" "allegation"
The plot displays only the rows that have at least one missing value. The missing values for complainant age, gender, and ethnicity occur together in most instances. This suggests that the information was simply unavailable for the complainant, probably because of inconsistent bookkeeping.
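The co-occurrence can also be checked numerically by tabulating each row's missingness pattern. This is a minimal base R sketch on made-up data. It is not the code that produced the plot, and the name `complaints` is an assumption.

```r
# Toy stand-in for the three complainant columns; values are made up.
complaints <- data.frame(
  complainant_ethnicity    = c("Black", NA, "Hispanic", NA, "White"),
  complainant_gender       = c("Male", NA, "Female", NA, NA),
  complainant_age_incident = c(34, NA, 27, NA, 51)
)

# Logical matrix: TRUE where a value is missing.
is_missing <- is.na(complaints)

# Rows with at least one missing value (the rows such a plot would show).
rows_with_missing <- complaints[rowSums(is_missing) > 0, ]

# Collapse each row's TRUE/FALSE pattern into a string and tabulate it.
pattern <- apply(
  is_missing, 1,
  function(row) paste(ifelse(row, "NA", "ok"), collapse = " / ")
)
pattern_counts <- sort(table(pattern), decreasing = TRUE)
print(pattern_counts)
```

If the three variables really do go missing together, the all-`NA` pattern should dominate the counts among the rows that have any missing value.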
The number of rows with missing values was small compared with the total number of rows. The data were transformed so that all missing values were replaced by the label “Not Known”.
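One way to carry out such a replacement in base R is sketched below. The data frame `complaints`, its values, and the choice of columns are assumptions for illustration; the original transformation code is not shown in this chapter.

```r
# Toy stand-in for the complaints data; values are made up.
complaints <- data.frame(
  complainant_gender       = c("Male", NA, "Female"),
  complainant_age_incident = c(34, NA, 27),
  precinct                 = c(75, 46, NA),
  stringsAsFactors = FALSE
)

# Columns to relabel; assumed here to be the ones with missing values.
columns_to_fill <- c("complainant_gender", "complainant_age_incident", "precinct")

complaints_filled <- complaints
for (column in columns_to_fill) {
  # Convert to character first: a numeric column cannot hold a text label.
  values <- as.character(complaints_filled[[column]])
  values[is.na(values)] <- "Not Known"
  complaints_filled[[column]] <- values
}

print(complaints_filled)
```

Note that relabelling turns numeric columns such as complainant_age_incident into character columns, so they can afterwards be treated only as categories unless the label is excluded and the column is converted back.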