Chapter 4 Missing values

4.1 Missing Values by Column

##            unique_mos_id               first_name                last_name 
##                        0                        0                        0 
##              command_now                shield_no             complaint_id 
##                        0                        0                        0 
##           month_received            year_received             month_closed 
##                        0                        0                        0 
##              year_closed      command_at_incident     rank_abbrev_incident 
##                        0                     1544                        0 
##          rank_abbrev_now                 rank_now            rank_incident 
##                        0                        0                        0 
##            mos_ethnicity               mos_gender         mos_age_incident 
##                        0                        0                        0 
##    complainant_ethnicity       complainant_gender complainant_age_incident 
##                     4464                     4195                     4812 
##                fado_type               allegation                 precinct 
##                        0                        1                       24 
##           contact_reason      outcome_description        board_disposition 
##                      199                       56                        0

Among the columns used for analysis, values were missing only for complainant_ethnicity, complainant_gender, complainant_age_incident, and precinct.
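Per-column counts like the ones listed above can be produced with base R. The following is a minimal sketch, not necessarily the code used for this report: the data frame `complaints` and its values are made up for illustration, and only the column names come from the output above.

```r
# Toy stand-in for the complaints data; the values are hypothetical.
complaints <- data.frame(
  complainant_gender       = c("Male", NA, "Female", NA),
  complainant_age_incident = c(34, NA, 27, 41),
  precinct                 = c(75, 44, NA, 67),
  fado_type                = c("Force", "Abuse of Authority", "Force", "Discourtesy")
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(complaints))
print(na_counts)

# Names of the columns that have at least one missing value.
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)
```

Filtering `na_counts` this way makes it easy to compare the affected columns against the set of columns actually used in the analysis.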

## NOTE: In the following pairs of variables, the missingness pattern of the second is a subset of the first.
##  Please verify whether they are in fact logically distinct variables.
##      [,1]                       [,2]        
## [1,] "command_at_incident"      "allegation"
## [2,] "complainant_ethnicity"    "allegation"
## [3,] "complainant_gender"       "allegation"
## [4,] "complainant_age_incident" "allegation"

The plot displays only the rows that had at least one missing value. It is clear that the missing values for complainant age, gender, and ethnicity occur together in most instances. This suggests that the information was simply unavailable for the complainant, probably because of inconsistent record keeping.
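The co-occurrence of missing demographic fields can also be checked numerically. The sketch below assumes a data frame named `complaints` with hypothetical values. It counts, for each row, how many of the three complainant fields are missing, then computes the share of incomplete rows in which all three are missing together.

```r
# Toy stand-in for the complaints data; the values are hypothetical.
complaints <- data.frame(
  complainant_ethnicity    = c("Black", NA, "Hispanic", NA, "White"),
  complainant_gender       = c("Male", NA, "Female", NA, NA),
  complainant_age_incident = c(34, NA, 27, NA, 52)
)

demo_cols <- c("complainant_ethnicity", "complainant_gender",
               "complainant_age_incident")

# Number of missing demographic fields in each row (0 to 3).
n_missing <- rowSums(is.na(complaints[demo_cols]))
print(table(n_missing))

# Among rows with at least one missing field, the share where all three are missing.
incomplete <- n_missing > 0
share_all_missing <- mean(n_missing[incomplete] == length(demo_cols))
print(share_all_missing)
```

A share close to 1 would support the reading that the three fields tend to be absent as a group rather than independently.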

The number of rows with missing values was small compared to the total number of rows. The data was transformed so that all missing values were replaced by the label “Not Known”.
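One way to apply such a replacement is sketched below, under stated assumptions: `complaints` is a toy data frame with hypothetical values, and only categorical columns are relabelled. A numeric column such as complainant_age_incident would be coerced to character by the same step, and how it was handled in this analysis is not stated here.

```r
# Toy stand-in for the complaints data; the values are hypothetical.
complaints <- data.frame(
  complainant_gender    = c("Male", NA, "Female"),
  complainant_ethnicity = c(NA, "Black", "Hispanic"),
  stringsAsFactors = FALSE
)

# Share of rows with at least one missing value, computed before replacement.
share_incomplete <- mean(!complete.cases(complaints))
print(share_incomplete)

# Replace NA with the label "Not Known" in the chosen categorical columns.
cat_cols <- c("complainant_gender", "complainant_ethnicity")
complaints[cat_cols] <- lapply(complaints[cat_cols], function(x) {
  x <- as.character(x)        # also handles factor columns safely
  x[is.na(x)] <- "Not Known"
  x
})
print(complaints)
```

Converting to character first avoids the invalid-level warning that R raises when a new label is assigned into a factor.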