# Dataset Loading and Cleaning
We started by loading the full dataset (273,951 rows and 29 columns) into the table df_311. We then created a new dataset, clean_311, containing 50,000 rows randomly sampled from df_311. We removed several variables from the original dataset, including some locational data (latitude and longitude, district information, etc.) as well as columns with too many missing values. We may analyze some of these variables later, but for our initial analysis we chose to omit them and focus on other factors.
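The subset described above is drawn at random, and the sampling code below does not set a seed, so the 50,000 rows (and the counts reported later) would change each time the document is knit. A minimal base R sketch of a repeatable draw follows; the data frame `toy`, the helper `draw_sample()`, and the seed value 311 are hypothetical stand-ins, not part of the original analysis. In the tidyverse pipeline, calling `set.seed()` immediately before `sample_n()` should serve the same purpose.

```r
# Hypothetical stand-in for df_311: 1,000 rows with an id column
toy <- data.frame(id = 1:1000, value = seq(0, 1, length.out = 1000))

# Draw n rows without replacement after fixing the RNG state.
# Note: set.seed() changes the global RNG state as a side effect.
draw_sample <- function(data, n, seed) {
  set.seed(seed)
  data[sample(nrow(data), n), , drop = FALSE]
}

s1 <- draw_sample(toy, 50, seed = 311)
s2 <- draw_sample(toy, 50, seed = 311)
identical(s1, s2)  # TRUE: the same seed reproduces the same rows
```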
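Three columns in the dataset (open_dt, target_dt, and closed_dt) contain date-time information. We converted these columns to dates and used them to compute a new variable, duration, the number of days between when a case was opened and when it was closed.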
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.1.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
df_311 <- read_csv(here::here("dataset-ignore", "311_data.csv"))
## Rows: 273951 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (20): ontime, case_status, closure_reason, case_title, subject, reason,...
## dbl (6): case_enquiry_id, fire_district, city_council_district, neighborho...
## dttm (3): open_dt, target_dt, closed_dt
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
clean_311 <- df_311 %>%
  select(-c(case_enquiry_id, closure_reason, latitude, longitude, closedphoto, submittedphoto,
            pwd_district, city_council_district, neighborhood_services_district, ward, precinct)) %>%  # drop unused columns
  sample_n(50000)  # simple random sample of 50,000 cases
clean_311 <- clean_311 %>%
  mutate(across(c(open_dt, closed_dt, target_dt), as.Date),  # keep only the calendar date
         duration = difftime(closed_dt, open_dt, units = "days"))  # days from opening to closing
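Converting with `as.Date()` discards the time of day before `difftime()` is applied, so `duration` counts calendar-day boundaries crossed rather than elapsed time. The base R sketch below uses two hypothetical timestamps in UTC (the time zone `read_csv()` assigns to parsed date-times by default) to contrast the two approaches; which one is preferable depends on whether sub-day precision matters for the question at hand.

```r
# Hypothetical timestamps standing in for open_dt / closed_dt (UTC)
open_dt   <- as.POSIXct(c("2021-03-01 23:00:00", "2021-03-02 09:00:00"), tz = "UTC")
closed_dt <- as.POSIXct(c("2021-03-02 01:00:00", NA), tz = "UTC")  # NA = case still open

# Date-level duration, as in the cleaning step: times of day are dropped first
duration_dates <- difftime(as.Date(closed_dt), as.Date(open_dt), units = "days")

# Timestamp-level duration: keeps fractional days
duration_exact <- difftime(closed_dt, open_dt, units = "days")

as.numeric(duration_dates)  # 1 and NA: a two-hour case counts as one day
as.numeric(duration_exact)  # 0.08333333 and NA
```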
# Exploratory Data Analysis
We begin our exploratory data analysis by looking at two location variables, location_zipcode and neighborhood, to see how much our observations vary by location. There are 30 unique zip codes and 23 unique neighborhoods, and neighborhood has substantially fewer missing values than location_zipcode.

We then analyze the variables subject, reason, and type, because they contain most of the case-specific information for each observation. Our sample contains 11 unique subjects, 44 unique reasons, and 159 unique types. We show two bar plots, one for subject and one for reason, to visualize the relative frequencies of the different subjects and reasons for service requests. In our subset of the data, the most common subjects are public works and transportation services. Reasons for service requests are more varied, but most reasons correspond to only one subject.

Finally, we looked at the range of values in duration. About 84% of closed cases have a duration of less than seven days, but some outlier cases stay open for hundreds of days.
clean_311 %>% ggplot() + geom_bar(aes(x=location_zipcode, fill = ontime)) + coord_flip()

clean_311 %>% ggplot() + geom_bar(aes(x = neighborhood, fill = ontime)) + coord_flip()

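The comparison of missing values between `location_zipcode` and `neighborhood` can be made explicit by counting `NA`s per column. A small base R sketch on a hypothetical `toy` data frame with invented values (on the real data one would pass `clean_311[c("location_zipcode", "neighborhood")]`):

```r
# Invented values, not drawn from the 311 data
toy <- data.frame(
  location_zipcode = c("02134", NA, "02118", NA, "02134"),
  neighborhood     = c("Allston", "Roxbury", "South End", NA, "Allston")
)

na_counts <- colSums(is.na(toy))   # number of missing values per column
na_share  <- colMeans(is.na(toy))  # proportion of rows missing per column
na_counts  # location_zipcode 2, neighborhood 1
na_share   # location_zipcode 0.4, neighborhood 0.2
```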
#EDA for reasons for request, later want to compare this against locational and demographic information
unique(clean_311$subject)
## [1] "Mayor's 24 Hour Hotline" "Inspectional Services"
## [3] "Property Management" "Public Works Department"
## [5] "Transportation - Traffic Division" "Parks & Recreation Department"
## [7] "Boston Water & Sewer Commission" "Animal Control"
## [9] "Boston Police Department" "Neighborhood Services"
## [11] "Consumer Affairs & Licensing"
unique(clean_311$reason)
## [1] "Needle Program" "Building"
## [3] "Health" "Graffiti"
## [5] "Street Cleaning" "Enforcement & Abandoned Vehicles"
## [7] "Highway Maintenance" "Sanitation"
## [9] "Trees" "Park Maintenance & Safety"
## [11] "Code Enforcement" "Street Lights"
## [13] "Housing" "Recycling"
## [15] "Signs & Signals" "Generic Noise Disturbance"
## [17] "Catchbasin" "Employee & General Comments"
## [19] "Animal Issues" "Abandoned Bicycle"
## [21] "Environmental Services" "Traffic Management & Engineering"
## [23] "Noise Disturbance" "Sidewalk Cover / Manhole"
## [25] "Administrative & General Requests" "Weights and Measures"
## [27] "Notification" "Pothole"
## [29] "Neighborhood Services Issues" "Fire Hydrant"
## [31] "Operations" "Air Pollution Control"
## [33] "Bridge Maintenance" "Programs"
## [35] "General Request" "Massport"
## [37] "Office of The Parking Clerk" "Cemetery"
## [39] "Boston Bikes" "Alert Boston"
## [41] "Valet" "Consumer Affairs Issues"
## [43] "Billing" "Parking Complaints"
unique(clean_311$type)
## [1] "Needle Pickup"
## [2] "Egress"
## [3] "Unsanitary Conditions - Establishment"
## [4] "Graffiti Removal"
## [5] "CE Collection"
## [6] "Requests for Street Cleaning"
## [7] "Parking Enforcement"
## [8] "PWD Graffiti"
## [9] "Schedule a Bulk Item Pickup"
## [10] "Tree Maintenance Requests"
## [11] "Ground Maintenance"
## [12] "Improper Storage of Trash (Barrels)"
## [13] "Missed Trash/Recycling/Yard Waste/Bulk Item"
## [14] "Street Light Outages"
## [15] "Request for Pothole Repair"
## [16] "Heat - Excessive Insufficient"
## [17] "Request for Recycling Cart"
## [18] "Sign Repair"
## [19] "Contractor Complaints"
## [20] "Unsanitary Conditions - Employees"
## [21] "Building Inspection Request"
## [22] "Contractors Complaint"
## [23] "Abandoned Vehicles"
## [24] "Knockdown Replacement"
## [25] "Recycling Cart Return"
## [26] "Work Hours-Loud Noise Complaints"
## [27] "Schedule a Bulk Item Pickup SS"
## [28] "Undefined Noise Disturbance"
## [29] "Catchbasin"
## [30] "Pest Infestation - Residential"
## [31] "Pick up Dead Animal"
## [32] "General Comments For a Program or Policy"
## [33] "Unsatisfactory Living Conditions"
## [34] "Equipment Repair"
## [35] "Animal Generic Request"
## [36] "Traffic Signal Inspection"
## [37] "Sidewalk Repair (Make Safe)"
## [38] "Abandoned Bicycle"
## [39] "Work w/out Permit"
## [40] "Unsafe Dangerous Conditions"
## [41] "Animal Lost"
## [42] "Chronic Dampness/Mold"
## [43] "General Lighting Request"
## [44] "Pigeon Infestation"
## [45] "Poor Conditions of Property"
## [46] "Sticker Request"
## [47] "Request for Snow Plowing (Emergency Responder)"
## [48] "Illegal Posting of Signs"
## [49] "Unsatisfactory Utilities - Electrical Plumbing"
## [50] "Requests for Traffic Signal Studies or Reviews"
## [51] "Rodent Activity"
## [52] "Protection of Adjoining Property"
## [53] "Loud Parties/Music/People"
## [54] "Sidewalk Cover / Manhole"
## [55] "New Sign Crosswalk or Pavement Marking"
## [56] "Illegal Dumping"
## [57] "Illegal Occupancy"
## [58] "New Tree Requests"
## [59] "Sidewalk Repair"
## [60] "Empty Litter Basket"
## [61] "Unshoveled Sidewalk"
## [62] "Tree in Park"
## [63] "Missing Sign"
## [64] "Bed Bugs"
## [65] "Tree Emergencies"
## [66] "Public Works General Request"
## [67] "Maintenance Complaint - Residential"
## [68] "Parking on Front/Back Yards (Illegal Parking)"
## [69] "Zoning"
## [70] "Request for Snow Plowing"
## [71] "Parks Lighting/Electrical Issues"
## [72] "Scanning Overcharge"
## [73] "Notification"
## [74] "Working Beyond Hours"
## [75] "Automotive Noise Disturbance"
## [76] "Cross Metering - Sub-Metering"
## [77] "BWSC Pothole"
## [78] "Dumpster & Loading Noise Disturbances"
## [79] "Roadway Repair"
## [80] "Fire Hydrant"
## [81] "Space Savers"
## [82] "Parks General Request"
## [83] "Park Improvement Requests"
## [84] "Transportation General Request"
## [85] "Street Light Knock Downs"
## [86] "Exceeding Terms of Permit"
## [87] "Mice Infestation - Residential"
## [88] "Request for Litter Basket Installation"
## [89] "Short Term Rental"
## [90] "Electrical"
## [91] "Utility Casting Repair"
## [92] "Plumbing"
## [93] "No Utilities - Food Establishment - Flood"
## [94] "Illegal Auto Body Shop"
## [95] "Pavement Marking Inspection"
## [96] "Maintenance - Homeowner"
## [97] "Rooftop & Mechanical Disturbances"
## [98] "Food Alert - Confirmed"
## [99] "Sewage/Septic Back-Up"
## [100] "Bridge Maintenance"
## [101] "Heat/Fuel Assistance"
## [102] "Litter Basket Maintenance"
## [103] "Big Buildings Online Request"
## [104] "General Comments For An Employee"
## [105] "No-Tow Complaint Confirmation"
## [106] "Overcrowding"
## [107] "StreetLight Pole WO"
## [108] "No Utilities Residential - Electricity"
## [109] "Student Move-in Issues"
## [110] "Pole Compliance"
## [111] "Misc. Snow Complaint"
## [112] "Major System Failure"
## [113] "Animal Noise Disturbances"
## [114] "BWSC General Request"
## [115] "Overflowing or Un-kept Dumpster"
## [116] "Trash on Vacant Lot"
## [117] "Abandoned Building"
## [118] "Animal Found"
## [119] "Product Short Measure"
## [120] "Construction Debris"
## [121] "Aircraft Noise Disturbance"
## [122] "Unsanitary Conditions - Food"
## [123] "Pavement Marking Maintenance"
## [124] "Rental Unit Delivery Conditions"
## [125] "General Traffic Engineering Request"
## [126] "Phone Bank Service Inquiry"
## [127] "Traffic Signal Repair"
## [128] "No Utilities - Food Establishment - Sewer"
## [129] "No Utilities - Food Establishment - Water"
## [130] "Upgrade Existing Lighting"
## [131] "Parking Meter Repairs"
## [132] "Occupying W/Out A Valid CO/CI"
## [133] "Illegal Rooming House"
## [134] "Pickup/Clear Conduit"
## [135] "City/State Snow Issues"
## [136] "No Utilities Residential - Water"
## [137] "No Utilities Residential - Gas"
## [138] "Poor Ventilation"
## [139] "Install New Lighting"
## [140] "Cemetery Maintenance Request"
## [141] "Squalid Living Conditions"
## [142] "Food Alert - Unconfirmed"
## [143] "Bicycle Issues"
## [144] "Illegal Vending"
## [145] "Item Price Missing"
## [146] "Fire in Food Establishment"
## [147] "Alert Boston"
## [148] "Mosquitoes (West Nile)"
## [149] "Utility Call-In"
## [150] "Carbon Monoxide"
## [151] "Valet Parking Problems"
## [152] "Public Events Noise Disturbances"
## [153] "Planting"
## [154] "Requests for Directional or Roadway Changes"
## [155] "Rat Bite"
## [156] "Mechanical"
## [157] "Billing Complaint"
## [158] "Street Light Longterm Repair"
## [159] "Private Parking Lot Complaints"
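Because `unique()` returns `NA` as one of its values when a column has missing entries, reading category counts off the last printed index can overstate them. A base R sketch that counts distinct non-missing values per column, on a hypothetical `toy` data frame; `dplyr::n_distinct(x, na.rm = TRUE)` is a tidyverse alternative.

```r
# Hypothetical stand-in columns; the real analysis would use clean_311
toy <- data.frame(
  subject = c("Public Works Department", "Animal Control", NA, "Public Works Department"),
  reason  = c("Street Cleaning", "Animal Issues", "Sanitation", "Street Cleaning"),
  type    = c("Requests for Street Cleaning", "Pick up Dead Animal", NA,
              "Requests for Street Cleaning")
)

# unique() keeps NA as a value, so drop it before counting
n_levels <- sapply(toy, function(col) length(unique(na.omit(col))))
n_levels  # subject 2, reason 3, type 2
```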
ggplot(clean_311) + geom_bar(aes(subject, fill = neighborhood), na.rm = TRUE) + coord_flip() + theme(legend.position = "none")

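The subject bar plot ranks subjects visually; a frequency table gives the same ranking numerically. A base R sketch with an invented `subject` vector (on the real data, `clean_311$subject` would take its place):

```r
# Invented subject values; NA stands for a missing subject
subject <- c("Public Works Department", "Transportation - Traffic Division",
             "Public Works Department", NA, "Animal Control",
             "Public Works Department", "Transportation - Traffic Division")

# table() drops NA by default; sort so the most frequent subject comes first
counts <- sort(table(subject), decreasing = TRUE)
counts

# Proportions of non-missing requests, rounded for display
round(prop.table(counts), 2)
```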
ggplot(clean_311) + geom_bar(aes(reason, fill = subject), na.rm = TRUE) + coord_flip()

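The observation that most reasons map to a single subject can be checked by counting distinct subjects within each reason. The sketch below uses a hypothetical `toy` data frame with invented reason/subject pairs; applied to the real data it would use `clean_311$subject` and `clean_311$reason`.

```r
# Invented reason/subject pairs for illustration only
toy <- data.frame(
  reason  = c("Street Cleaning", "Street Cleaning", "Sanitation", "Trees", "Trees"),
  subject = c("Public Works Department", "Public Works Department",
              "Public Works Department", "Parks & Recreation Department",
              "Public Works Department")
)

# Number of distinct subjects recorded for each reason
subjects_per_reason <- tapply(toy$subject, toy$reason, function(s) length(unique(s)))
subjects_per_reason  # Sanitation 1, Street Cleaning 1, Trees 2

# Share of reasons that map to exactly one subject
mean(subjects_per_reason == 1)  # 2 of 3 reasons
```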
#EDA for duration of requests
ggplot(clean_311) + geom_histogram(aes(duration), na.rm = TRUE, bins = 100)
## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.

mean(clean_311$duration < 7, na.rm = TRUE)
## [1] 0.8427656
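`mean(clean_311$duration < 7, na.rm = TRUE)` gives the share of *closed* cases with a duration under seven days, since cases without a `closed_dt` have an `NA` duration and are dropped. The sketch below, on invented durations, shows the same calculation after converting the `difftime` to plain numeric days; plotting such a numeric column (for example `aes(as.numeric(duration, units = "days"))`) instead of the `difftime` should also avoid the ggplot2 scale message above. The median is included only as a suggestion for summarizing a long-tailed variable.

```r
# Invented durations in days; NA stands for a case that is still open
duration <- as.difftime(c(0.5, 2, 6.9, 7, 40, 300, NA), units = "days")

# Plain numeric days, which ggplot2 treats as an ordinary continuous variable
duration_days <- as.numeric(duration, units = "days")

# Share of closed cases lasting under 7 days; na.rm = TRUE drops open cases
share_under_week <- mean(duration_days < 7, na.rm = TRUE)
share_under_week  # 0.5

# The median is less affected by the long tail than the mean
median(duration_days, na.rm = TRUE)  # 6.95
mean(duration_days, na.rm = TRUE)
```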