Creating an Arrow IPC file from the NYC 311 data

Authors
Affiliations

Jun Yan

University of Connecticut

Douglas Bates

University of Wisconsin - Madison

Abstract

We show conversion of a CSV file containing data on 311 service calls in New York City during March, 2023 to the Arrow IPC (Inter-Process Communication) file format (also called the Feather V2 format), which can be read quickly and easily into several different data science environments. This conversion is done in Julia but it could equally well have been done in a different language, if that is more convenient for the person doing the conversion.

Reading the CSV file as a DataFrame

First we attach the packages to be used,

using Arrow, CSV, Dates, DataFrames

The first two lines of the CSV file are

open("./data/311_half_Mar_2023.csv", "r") do io # open the file
    println(readline(io))     # read and print the first line
    println(readline(io))     # read and print the next line
end;  # this open/do/end construction automatically closes the file when done
"Unique.Key","Created.Date","Closed.Date","Agency","Agency.Name","Complaint.Type","Descriptor","Location.Type","Incident.Zip","Incident.Address","Street.Name","Cross.Street.1","Cross.Street.2","Intersection.Street.1","Intersection.Street.2","Address.Type","City","Landmark","Facility.Type","Status","Due.Date","Resolution.Description","Resolution.Action.Updated.Date","Community.Board","BBL","Borough","X.Coordinate..State.Plane.","Y.Coordinate..State.Plane.","Open.Data.Channel.Type","Park.Facility.Name","Park.Borough","Vehicle.Type","Taxi.Company.Borough","Taxi.Pick.Up.Location","Bridge.Highway.Name","Bridge.Highway.Direction","Road.Ramp","Bridge.Highway.Segment","Latitude","Longitude","Location","Zip.Codes","Community.Districts","Borough.Boundaries","City.Council.Districts","Police.Precincts"
57000688,"03/09/2023 03:05:25 PM","03/13/2023 08:21:11 AM","DOE","Department of Education","School Maintenance","Other School Condition","School","","","","","","","","","","","","Closed","","The Department of Education determined that this complaint is not within its jurisdiction because it concerns a private school. Please contact the private school directly.","03/13/2023 08:21:17 AM","Unspecified BROOKLYN",NA,"BROOKLYN",989013,173052,"PHONE","Unspecified","BROOKLYN","","","","","","","",40.6416631014762,-73.9828374867859,"(40.641663101476226, -73.98283748678588)",17620,2,2,27,39
Need to scroll right to see the whole output line

The output from this code block has very long lines. You will need to scroll right in the output to read it all.

Note that the character on the right of the title bar for callout blocks like this, currently a v-shape for this block, is a toggle to show or collapse the contents of the block. Try clicking on that character to collapse the contents of the block.

In what follows the callout blocks are initially collapsed and you must click on the > character on the right end of the title bar to show the contents.

We now have enough information, such as the date-time format used, to do an initial conversion of the CSV file to a DataFrame, using several optional arguments to customize the process. (Details about each of these optional arguments are given in callout blocks below.)

df = CSV.read(
    "./data/311_half_Mar_2023.csv",  # file name
    DataFrame,           # output table type
    normalizenames=true, # convert column names to valid symbols
    missingstring=[      # strings that are used to indicate missing values
        "",
        "Unspecified",
        "unspecified",
        "Unknown",
        "unknown",
        "NA",
    ],
    dateformat=dateformat"m/d/Y I:M:S p", # DateTimes look like this
    downcast=true,       # use smallest type of Int that can represent the column 
    stripwhitespace=true # strip leading and trailing whitespace 
)
125826×46 DataFrame
125801 rows omitted
Row Unique_Key Created_Date Closed_Date Agency Agency_Name Complaint_Type Descriptor Location_Type Incident_Zip Incident_Address Street_Name Cross_Street_1 Cross_Street_2 Intersection_Street_1 Intersection_Street_2 Address_Type City Landmark Facility_Type Status Due_Date Resolution_Description Resolution_Action_Updated_Date Community_Board BBL Borough X_Coordinate_State_Plane_ Y_Coordinate_State_Plane_ Open_Data_Channel_Type Park_Facility_Name Park_Borough Vehicle_Type Taxi_Company_Borough Taxi_Pick_Up_Location Bridge_Highway_Name Bridge_Highway_Direction Road_Ramp Bridge_Highway_Segment Latitude Longitude Location Zip_Codes Community_Districts Borough_Boundaries City_Council_Districts Police_Precincts
Int32 DateTime DateTime? String15 String String String? String? String7? String? String? String? String? String? String? String15? String31? String? String15? String15? DateTime? String? DateTime? String31 Int64? String15? Int32? Int32? String7 String? String15? String15? String15? String? String31? String? String? String? Float64? Float64? String? Int16? Int8? Int8? Int8? Int8?
1 57000688 2023-03-09T15:05:25 2023-03-13T08:21:11 DOE Department of Education School Maintenance Other School Condition School missing missing missing missing missing missing missing missing missing missing missing Closed missing The Department of Education determined that this complaint is not within its jurisdiction because it concerns a private school. Please contact the private school directly. 2023-03-13T08:21:17 Unspecified BROOKLYN missing BROOKLYN 989013 173052 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.6417 -73.9828 (40.641663101476226, -73.98283748678588) 17620 2 2 27 39
2 56993837 2023-03-08T21:27:34 2023-05-03T07:17:20 TLC Taxi and Limousine Commission For Hire Vehicle Complaint Driver Complaint - Non Passenger Street 10075 995 MADISON AVENUE MADISON AVENUE EAST 77 STREET EAST 78 STREET EAST 77 STREET EAST 78 STREET ADDRESS NEW YORK MADISON AVENUE missing Closed missing The Taxi and Limousine Commission (TLC) reviewed your complaint. The driver or owner pled guilty to a violation and paid a fine. TLC has sent you a written confirmation of this outcome. 2023-05-03T07:17:24 08 MANHATTAN 1013927501 MANHATTAN 994507 221687 ONLINE missing MANHATTAN missing missing 995 MADISON AVENUE, MANHATTAN (NEW YORK), NY, 10075 missing missing missing missing 40.7751 -73.963 (40.775149965980326, -73.96296687549152) 10092 23 4 51 11
3 57049479 2023-03-15T19:26:41 2023-03-21T21:16:53 HPD Department of Housing Preservation and Development UNSANITARY CONDITION PESTS RESIDENTIAL BUILDING 10459 1010 BRYANT AVENUE BRYANT AVENUE missing missing missing missing ADDRESS BRONX missing missing Closed missing The Department of Housing Preservation and Development inspected the following conditions. No violations were issued. The complaint has been closed. 2023-03-21T00:00:00 02 BRONX 2027560001 BRONX 1015090 239359 PHONE missing BRONX missing missing missing missing missing missing missing 40.8236 -73.8886 (40.82360689628868, -73.8885703894082) 10937 8 5 43 24
4 57056381 2023-03-16T23:58:53 2023-03-17T02:44:31 NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk 10455 541 UNION AVENUE UNION AVENUE EAST 147 STREET EAST 149 STREET EAST 147 STREET EAST 149 STREET ADDRESS BRONX UNION AVENUE missing Closed missing The Police Department responded to the complaint and determined that police action was not necessary. 2023-03-17T02:44:41 01 BRONX 2025820042 BRONX 1010420 235260 MOBILE missing BRONX missing missing missing missing missing missing missing 40.8124 -73.9055 (40.81237137041546, -73.90545977958996) 10933 49 5 35 23
5 57023097 2023-03-12T15:15:54 2023-04-19T00:00:00 DOB Department of Buildings Building/Use Illegal Conversion Of Residential Building/Space missing 11365 69-35 166 STREET 166 STREET missing missing missing missing ADDRESS FRESH MEADOWS missing missing Assigned missing The Department of Buildings attempted to investigate this complaint but could not gain access to the location. Please schedule an appointment for a follow up inspection by contacting the appropriate unit at Department of Buildings. You can find contact information at https://www1.nyc.gov/site/buildings/about/contact-us.page. 2023-04-19T00:00:00 08 QUEENS 4069350033 QUEENS 1038821 206195 UNKNOWN missing QUEENS missing missing missing missing missing missing missing 40.7325 -73.8031 (40.732465966173585, -73.80309625279911) 14507 25 3 24 65
6 56958927 2023-03-04T09:14:44 2023-03-04T10:21:46 DPR Department of Parks and Recreation Animal in a Park Injured Wildlife Park 11691 25 BEACH 28 STREET BEACH 28 STREET BEND SEAGIRT AVENUE BEND SEAGIRT AVENUE ADDRESS FAR ROCKAWAY BEACH 28 STREET missing Closed missing NYC Parks has reviewed the reported condition and corrected the problem. 2023-03-04T10:21:51 14 QUEENS missing QUEENS 1050687 155489 UNKNOWN missing QUEENS missing missing missing missing missing missing missing 40.5932 -73.7608 (40.593208594326164, -73.76078103938401) 20529 51 3 47 59
7 57082890 2023-03-19T18:51:07 2023-03-20T08:28:00 DOT Department of Transportation Street Condition Pothole missing 11374 AUSTIN STREET AUSTIN STREET 63 DRIVE 64 ROAD missing missing BLOCKFACE QUEENS missing N/A Closed missing The Department of Transportation inspected this complaint and repaired the problem. 2023-03-20T08:28:00 06 QUEENS missing QUEENS missing missing UNKNOWN missing QUEENS missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing
8 57062847 2023-03-16T22:39:01 2023-03-17T01:10:14 NYPD New York City Police Department Noise - Residential Banging/Pounding Residential Building/House 11364 212-04 75 AVENUE 75 AVENUE 210 STREET BELL BOULEVARD 210 STREET BELL BOULEVARD ADDRESS OAKLAND GARDENS 75 AVENUE missing Closed missing The Police Department responded to the complaint but officers were unable to gain entry into the premises. 2023-03-17T01:10:19 11 QUEENS 4077480500 QUEENS 1050720 208220 ONLINE missing QUEENS missing missing missing missing missing missing missing 40.7379 -73.7601 (40.73794265357658, -73.76014232450228) 14506 26 3 16 69
9 56981582 2023-03-07T17:32:00 2023-03-07T17:40:00 DEP Department of Environmental Protection Water System No Water (WNW) missing 11234 6924 AVENUE L AVENUE L E 69 ST E 70 ST missing missing ADDRESS BROOKLYN missing missing Closed missing The Department of Environmental Protection investigated this complaint and found it to be a temporary condition. 2023-03-07T17:40:00 18 BROOKLYN 3083560049 BROOKLYN 1007660 167173 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.6255 -73.9157 (40.62549684383184, -73.91566718936745) 13825 5 2 8 38
10 56989554 2023-03-08T11:41:47 2023-03-12T13:56:42 HPD Department of Housing Preservation and Development HEAT/HOT WATER APARTMENT ONLY RESIDENTIAL BUILDING 10468 2286 SEDGWICK AVENUE SEDGWICK AVENUE missing missing missing missing ADDRESS BRONX missing missing Closed missing The Department of Housing Preservation and Development was not able to gain access to your apartment or others in the building to inspect for a lack of heat or hot water. The complaint has been closed. If the condition still exists, please file a new complaint. 2023-03-12T00:00:00 07 BRONX 2032250149 BRONX 1009196 253132 PHONE missing BRONX missing missing missing missing missing missing missing 40.8614 -73.9098 (40.86142834654178, -73.90981505703196) 11606 24 5 29 34
11 57152044 2023-03-26T17:09:23 2023-03-26T22:21:41 NYPD New York City Police Department Illegal Parking Double Parked Blocking Traffic Street/Sidewalk 10034 9 HENSHAW STREET HENSHAW STREET RIVERSIDE DRIVE DYCKMAN STREET RIVERSIDE DRIVE DYCKMAN STREET ADDRESS NEW YORK HENSHAW STREET missing Closed missing The Police Department responded to the complaint and took action to fix the condition. 2023-03-26T22:21:49 12 MANHATTAN 1022460125 MANHATTAN 1003624 255090 MOBILE missing MANHATTAN missing missing missing missing missing missing missing 40.8668 -73.93 (40.86681644892715, -73.92995331784537) 13092 47 4 39 22
12 57124879 2023-03-24T01:19:51 2023-03-24T02:51:56 NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 10463 2676 HEATH AVENUE HEATH AVENUE WEST 193 STREET WEST KINGSBRIDGE ROAD WEST 193 STREET WEST KINGSBRIDGE ROAD ADDRESS BRONX HEATH AVENUE missing Closed missing The Police Department issued a summons in response to the complaint. 2023-03-24T02:52:00 07 BRONX 2032400010 BRONX 1010303 256514 PHONE missing BRONX missing missing missing missing missing missing missing 40.8707 -73.9058 (40.87070770667293, -73.90579987814343) 11272 24 5 29 34
13 57148781 2023-03-26T14:14:18 2023-03-26T17:05:37 NYPD New York City Police Department Graffiti Police Report Not Requested Store/Commercial 11373 45 AVENUE 45 AVENUE 45 AVENUE BROADWAY 45 AVENUE BROADWAY INTERSECTION missing missing missing Closed missing The Police Department responded and upon arrival those responsible for the condition were gone. 2023-03-26T17:05:43 04 QUEENS missing QUEENS 1017225 209572 PHONE missing QUEENS missing missing missing missing missing missing missing 40.7418 -73.881 (40.74184186444486, -73.8810025231725) 14784 66 3 5 68
125815 56928915 2023-03-01T14:00:00 2023-03-15T12:51:00 DOT Department of Transportation Street Light Condition Street Light Out missing 11365 missing missing missing missing 69 AVENUE 168 STREET INTERSECTION QUEENS missing missing Closed missing Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. 2023-03-15T12:51:00 08 QUEENS missing QUEENS 1039328 206376 UNKNOWN missing QUEENS missing missing missing missing missing missing missing 40.733 -73.8013 (40.73295962284684, -73.80126541379411) 14507 25 3 24 65
125816 57028889 2023-03-14T03:03:03 2023-03-14T06:01:34 NYPD New York City Police Department Illegal Parking Overnight Commercial Storage Street/Sidewalk 11103 24-37 49 STREET 49 STREET BROOKLYN QUEENS EXPRESSWAY WEST 25 AVENUE BROOKLYN QUEENS EXPRESSWAY WEST 25 AVENUE ADDRESS ASTORIA 49 STREET missing Closed missing The Police Department responded to the complaint and took action to fix the condition. 2023-03-14T06:01:39 01 QUEENS 4010160023 QUEENS 1010893 218162 ONLINE missing QUEENS missing missing missing missing missing missing missing 40.7654 -73.9038 (40.765440563155224, -73.9038188686141) 16860 39 3 4 72
125817 56977801 2023-03-07T12:04:41 2023-03-07T14:09:09 NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 11206 342 TEN EYCK STREET TEN EYCK STREET WATERBURY STREET BOGART STREET WATERBURY STREET BOGART STREET ADDRESS BROOKLYN TEN EYCK STREET missing Closed missing The Police Department responded to the complaint and determined that police action was not necessary. 2023-03-07T14:09:13 01 BROOKLYN 3030290027 BROOKLYN 1001939 198530 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.7116 -73.9362 (40.71157805562249, -73.93619438372028) 17213 36 2 30 56
125818 56933591 2023-03-01T16:27:10 2023-03-05T17:32:57 HPD Department of Housing Preservation and Development UNSANITARY CONDITION MOLD RESIDENTIAL BUILDING 11216 1236 PACIFIC STREET PACIFIC STREET missing missing missing missing ADDRESS BROOKLYN missing missing Closed missing The following complaint conditions are still open. HPD may attempt to contact you to verify the correction of the condition or may conduct an inspection. 2023-03-05T00:00:00 08 BROOKLYN 3012060031 BROOKLYN 997897 186188 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.6777 -73.9508 (40.67770929012322, -73.95079919734158) 17618 16 2 49 49
125819 57054603 2023-03-15T06:49:34 2023-03-15T08:54:02 NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 11419 115-02 ATLANTIC AVENUE ATLANTIC AVENUE 115 STREET 116 STREET 115 STREET 116 STREET ADDRESS SOUTH RICHMOND HILL ATLANTIC AVENUE missing Closed missing The Police Department responded and upon arrival those responsible for the condition were gone. 2023-03-15T08:54:09 09 QUEENS 4094010001 QUEENS 1031061 191855 ONLINE missing QUEENS missing missing missing missing missing missing missing 40.6932 -73.8312 (40.69315062980763, -73.83119561317962) 24014 46 3 46 60
125820 57199244 2023-03-31T16:46:20 missing DOHMH Department of Health and Mental Hygiene Construction Lead Dust In Common Areas 3+ Family Apartment Building 11215 161 PROSPECT PARK WEST PROSPECT PARK WEST 10 STREET 11 STREET 10 STREET 11 STREET ADDRESS BROOKLYN PROSPECT PARK WEST missing In Progress missing missing missing 06 BROOKLYN 3010950050 BROOKLYN 990416 181008 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.6635 -73.9778 (40.66349972132597, -73.97777478483737) missing 14 2 27 50
125821 57199802 2023-03-31T11:15:46 2023-04-03T00:00:00 DOB Department of Buildings Boilers Boiler - Defective/Inoperative/No Permit missing 10301 631 HOWARD AVENUE HOWARD AVENUE missing missing missing missing ADDRESS STATEN ISLAND missing N/A Closed missing The Department of Buildings investigated this complaint and issued an Office of Administrative Trials and Hearings (OATH) summons. 2023-04-03T00:00:00 01 STATEN ISLAND 5006200001 STATEN ISLAND 957806 163766 UNKNOWN missing STATEN ISLAND missing missing missing missing missing missing missing 40.6161 -74.0952 (40.61613680750325, -74.0952492197216) 10369 4 1 13 74
125822 57116487 2023-03-22T17:47:08 missing DPR Department of Parks and Recreation New Tree Request For One Address Street 10025 209 WEST 93 STREET WEST 93 STREET AMSTERDAM AVENUE BROADWAY AMSTERDAM AVENUE BROADWAY ADDRESS NEW YORK WEST 93 STREET missing In Progress missing NYC Parks will inspect the site to determine if it is suitable for a new street tree, including a review of potential conflicts with other infrastructure. If the site is found to be suitable, a tree will be planted during the next available planting season. You may review recently completed and upcoming street tree plantings in your neighborhood by visiting the NYC Parks Tree Work Hub at nyc.gov/parks/treework. 2023-04-12T09:02:03 07 MANHATTAN 1012410025 MANHATTAN 991932 227949 MOBILE missing MANHATTAN missing missing missing missing missing missing missing 40.7923 -73.9723 (40.792340101747136, -73.97225680841004) 12422 20 4 19 15
125823 56954525 2023-03-04T16:23:00 2023-03-08T10:50:00 DEP Department of Environmental Protection Lead Lead Kit Request (Residential) (L10) missing 10011 220 WEST 16 STREET WEST 16 STREET 7 AVE 8 AVE missing missing ADDRESS NEW YORK missing N/A Closed missing The Department of Environmental Protection (DEP) mailed you the free lead test kit you requested. The kit includes instructions for collecting a water sample and contact information for DEP's Lead Program Unit if you have any questions. Please use the included pre-paid label to mail the sample to the lab for analysis. It may take the lab up to 30 days to complete their analysis and mail you the test results. This Service Request is closed and no further status will be available from 311. 2023-03-08T10:50:00 04 MANHATTAN 1007650055 MANHATTAN 984375 208960 ONLINE missing MANHATTAN missing missing missing missing missing missing missing 40.7402 -73.9995 (40.74022355178869, -73.99954892134235) 12074 12 4 10 6
125824 57132547 2023-03-24T21:57:44 2023-03-24T23:18:11 NYPD New York City Police Department Illegal Parking Commercial Overnight Parking Street/Sidewalk 10306 38 STERLING AVENUE STERLING AVENUE 10 STREET CLAWSON STREET 10 STREET CLAWSON STREET ADDRESS STATEN ISLAND STERLING AVENUE missing Closed missing The Police Department responded and upon arrival those responsible for the condition were gone. 2023-03-24T23:18:15 02 STATEN ISLAND missing STATEN ISLAND 952271 146897 PHONE missing STATEN ISLAND missing missing missing missing missing missing missing 40.5698 -74.1151 (40.56981658187845, -74.11510599391917) 10693 30 1 14 76
125825 57102740 2023-03-21T05:10:12 2023-03-21T05:16:30 NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House 10002 243 HENRY STREET HENRY STREET CLINTON STREET MONTGOMERY STREET CLINTON STREET MONTGOMERY STREET ADDRESS NEW YORK HENRY STREET missing Closed missing The Police Department responded to the complaint and took action to fix the condition. 2023-03-21T05:16:37 03 MANHATTAN 1002860018 MANHATTAN 988256 199320 ONLINE missing MANHATTAN missing missing missing missing missing missing missing 40.7138 -73.9855 (40.71376319312491, -73.98554956996014) 11723 70 4 32 4
125826 56950149 2023-03-04T00:27:00 2023-03-04T10:15:00 DOT Department of Transportation Traffic Signal Condition Post missing 11423 missing missing missing missing JAMAICA AVENUE 201 STREET INTERSECTION QUEENS missing N/A Closed missing Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. 2023-03-04T10:15:00 12 QUEENS missing QUEENS 1051269 199315 UNKNOWN missing QUEENS missing missing missing missing missing missing missing 40.7135 -73.7582 (40.7134965426944, -73.758249944708) 24332 41 3 16 61

Convert the column names from the CSV file into a form that can be parsed as a Symbol in Julia. In particular, strip whitespace at the beginning and end of the name and replace embedded blanks or . characters by underscores.

Columns of a DataFrame (and other type of tables) are usually extracted as “properties” of the table, either as, e.g. getproperty(df, :Unique_Key), or, more commonly, with the dot operator as, e.g. df.Unique_Key. The expression :Unique_Key or the right operand of df.Unique_Key evaluates to a Symbol. If the column name has an embedded blank or . this form cannot be used and the name must be quoted, either as Symbol("Unique.Key") or var"Unique.Key".

Thus it is an advantage to rewrite the column names so that each name can be used “as is” with the dot operator.

Enumerate the various forms of missing data indicators used in the file. Completing this list is often a process of trial-and-error.

Provide a Dates.DateFormat object that will be applied to text columns if they match the pattern, converting to DateTime values.

The construction dateformat"..." is a call to a “string macro” in Julia. That is, it is just a shorthand for replacing the string with the Dates.DateFormat object created from the string.

For columns of integer values, choose the smallest integer type, such as Int8, Int16, etc., that can represent all the values in the column.

Validity checking

Review the storage type for each column

In the output (remember, you must scroll horizontally to see it all) there is a description of the type of the elements of the column directly under the column name. A type name that ends in ? indicates that missing values are present in the column. Thus, for example, the Unique_Key is designated Int32 meaning the values are stored as 32-bit signed integers with no missing values. The Created_Date column, designated DateTime, is a DateTime column with no missing values but the Closed_Date, designated DateTime?, is also a DateTime, but can (and does) have missing values.

Some columns are designated as String whereas others are designated as String7 or String15. These are storage-saving representations used when the whole column consists of short character strings.

There is an anomaly in these short strings for the Agency column. The Agency acronyms should be 5 characters at most, which would be stored as String7, but they are stored as String15.

To see why, we check the unique combinations of the Agency and Agency_Name columns

unique(select(df, :Agency, :Agency_Name))
15×2 DataFrame
Row Agency Agency_Name
String15 String
1 DOE Department of Education
2 TLC Taxi and Limousine Commission
3 HPD Department of Housing Preservation and Development
4 NYPD New York City Police Department
5 DOB Department of Buildings
6 DPR Department of Parks and Recreation
7 DOT Department of Transportation
8 DEP Department of Environmental Protection
9 DOHMH Department of Health and Mental Hygiene
10 DSNY Department of Sanitation
11 DHS Department of Homeless Services
12 EDC Economic Development Corporation
13 DCA Department of Consumer Affairs
14 NYC311-PRD Department of Environmental Protection
15 OTI Office of Technology and Innovation

These columns should be in one-to-one correspondence but they are not.

At this point we would need to check with the data providers to determine why two different Agency designations are used for the “Department of Environmental Protection”.

The other problem shown in the types is that the values in the Incident_Zip column are stored as String7? when they should be Int32?.

Checking the unique values

sort(unique(df.Incident_Zip))
227-element Vector{Union{Missing, String7}}:
 "02062"
 "07002"
 "10000"
 "10001"
 "10002"
 "10003"
 "10004"
 "10005"
 "10006"
 "10007"
 "10009"
 "10010"
 "10011"
 ⋮
 "11693"
 "11694"
 "11697"
 "12212"
 "12345"
 "19034"
 "21031"
 "75007"
 "78758"
 "82001"
 "na"
 missing

shows the problem is that someone got creative and used na as a missing value indicator.

Recreate the DataFrame with "na" in the missingstring vector

df = CSV.read(
    "./data/311_half_Mar_2023.csv",  # file name
    DataFrame,           # output table type
    normalizenames=true, # convert column names to valid symbols
    missingstring=[      # strings that are used to indicate missing vals
        "",
        "Unspecified",
        "unspecified",
        "Unknown",
        "unknown",
        "NA",
        "na",
    ],
    dateformat=dateformat"m/d/Y I:M:S p", # DateTimes look like this
    downcast=true,       # use smallest type of Int that can represent the column 
    stripwhitespace=true # strip leading and trailing whitespace 
)
125826×46 DataFrame
125801 rows omitted
Row Unique_Key Created_Date Closed_Date Agency Agency_Name Complaint_Type Descriptor Location_Type Incident_Zip Incident_Address Street_Name Cross_Street_1 Cross_Street_2 Intersection_Street_1 Intersection_Street_2 Address_Type City Landmark Facility_Type Status Due_Date Resolution_Description Resolution_Action_Updated_Date Community_Board BBL Borough X_Coordinate_State_Plane_ Y_Coordinate_State_Plane_ Open_Data_Channel_Type Park_Facility_Name Park_Borough Vehicle_Type Taxi_Company_Borough Taxi_Pick_Up_Location Bridge_Highway_Name Bridge_Highway_Direction Road_Ramp Bridge_Highway_Segment Latitude Longitude Location Zip_Codes Community_Districts Borough_Boundaries City_Council_Districts Police_Precincts
Int32 DateTime DateTime? String15 String String String? String? Int32? String? String? String? String? String? String? String15? String31? String? String15? String15? DateTime? String? DateTime? String31 Int64? String15? Int32? Int32? String7 String? String15? String15? String15? String? String31? String? String? String? Float64? Float64? String? Int16? Int8? Int8? Int8? Int8?
1 57000688 2023-03-09T15:05:25 2023-03-13T08:21:11 DOE Department of Education School Maintenance Other School Condition School missing missing missing missing missing missing missing missing missing missing missing Closed missing The Department of Education determined that this complaint is not within its jurisdiction because it concerns a private school. Please contact the private school directly. 2023-03-13T08:21:17 Unspecified BROOKLYN missing BROOKLYN 989013 173052 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.6417 -73.9828 (40.641663101476226, -73.98283748678588) 17620 2 2 27 39
2 56993837 2023-03-08T21:27:34 2023-05-03T07:17:20 TLC Taxi and Limousine Commission For Hire Vehicle Complaint Driver Complaint - Non Passenger Street 10075 995 MADISON AVENUE MADISON AVENUE EAST 77 STREET EAST 78 STREET EAST 77 STREET EAST 78 STREET ADDRESS NEW YORK MADISON AVENUE missing Closed missing The Taxi and Limousine Commission (TLC) reviewed your complaint. The driver or owner pled guilty to a violation and paid a fine. TLC has sent you a written confirmation of this outcome. 2023-05-03T07:17:24 08 MANHATTAN 1013927501 MANHATTAN 994507 221687 ONLINE missing MANHATTAN missing missing 995 MADISON AVENUE, MANHATTAN (NEW YORK), NY, 10075 missing missing missing missing 40.7751 -73.963 (40.775149965980326, -73.96296687549152) 10092 23 4 51 11
3 57049479 2023-03-15T19:26:41 2023-03-21T21:16:53 HPD Department of Housing Preservation and Development UNSANITARY CONDITION PESTS RESIDENTIAL BUILDING 10459 1010 BRYANT AVENUE BRYANT AVENUE missing missing missing missing ADDRESS BRONX missing missing Closed missing The Department of Housing Preservation and Development inspected the following conditions. No violations were issued. The complaint has been closed. 2023-03-21T00:00:00 02 BRONX 2027560001 BRONX 1015090 239359 PHONE missing BRONX missing missing missing missing missing missing missing 40.8236 -73.8886 (40.82360689628868, -73.8885703894082) 10937 8 5 43 24
4 57056381 2023-03-16T23:58:53 2023-03-17T02:44:31 NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk 10455 541 UNION AVENUE UNION AVENUE EAST 147 STREET EAST 149 STREET EAST 147 STREET EAST 149 STREET ADDRESS BRONX UNION AVENUE missing Closed missing The Police Department responded to the complaint and determined that police action was not necessary. 2023-03-17T02:44:41 01 BRONX 2025820042 BRONX 1010420 235260 MOBILE missing BRONX missing missing missing missing missing missing missing 40.8124 -73.9055 (40.81237137041546, -73.90545977958996) 10933 49 5 35 23
5 57023097 2023-03-12T15:15:54 2023-04-19T00:00:00 DOB Department of Buildings Building/Use Illegal Conversion Of Residential Building/Space missing 11365 69-35 166 STREET 166 STREET missing missing missing missing ADDRESS FRESH MEADOWS missing missing Assigned missing The Department of Buildings attempted to investigate this complaint but could not gain access to the location. Please schedule an appointment for a follow up inspection by contacting the appropriate unit at Department of Buildings. You can find contact information at https://www1.nyc.gov/site/buildings/about/contact-us.page. 2023-04-19T00:00:00 08 QUEENS 4069350033 QUEENS 1038821 206195 UNKNOWN missing QUEENS missing missing missing missing missing missing missing 40.7325 -73.8031 (40.732465966173585, -73.80309625279911) 14507 25 3 24 65
6 56958927 2023-03-04T09:14:44 2023-03-04T10:21:46 DPR Department of Parks and Recreation Animal in a Park Injured Wildlife Park 11691 25 BEACH 28 STREET BEACH 28 STREET BEND SEAGIRT AVENUE BEND SEAGIRT AVENUE ADDRESS FAR ROCKAWAY BEACH 28 STREET missing Closed missing NYC Parks has reviewed the reported condition and corrected the problem. 2023-03-04T10:21:51 14 QUEENS missing QUEENS 1050687 155489 UNKNOWN missing QUEENS missing missing missing missing missing missing missing 40.5932 -73.7608 (40.593208594326164, -73.76078103938401) 20529 51 3 47 59
7 57082890 2023-03-19T18:51:07 2023-03-20T08:28:00 DOT Department of Transportation Street Condition Pothole missing 11374 AUSTIN STREET AUSTIN STREET 63 DRIVE 64 ROAD missing missing BLOCKFACE QUEENS missing N/A Closed missing The Department of Transportation inspected this complaint and repaired the problem. 2023-03-20T08:28:00 06 QUEENS missing QUEENS missing missing UNKNOWN missing QUEENS missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing
8 57062847 2023-03-16T22:39:01 2023-03-17T01:10:14 NYPD New York City Police Department Noise - Residential Banging/Pounding Residential Building/House 11364 212-04 75 AVENUE 75 AVENUE 210 STREET BELL BOULEVARD 210 STREET BELL BOULEVARD ADDRESS OAKLAND GARDENS 75 AVENUE missing Closed missing The Police Department responded to the complaint but officers were unable to gain entry into the premises. 2023-03-17T01:10:19 11 QUEENS 4077480500 QUEENS 1050720 208220 ONLINE missing QUEENS missing missing missing missing missing missing missing 40.7379 -73.7601 (40.73794265357658, -73.76014232450228) 14506 26 3 16 69
9 56981582 2023-03-07T17:32:00 2023-03-07T17:40:00 DEP Department of Environmental Protection Water System No Water (WNW) missing 11234 6924 AVENUE L AVENUE L E 69 ST E 70 ST missing missing ADDRESS BROOKLYN missing missing Closed missing The Department of Environmental Protection investigated this complaint and found it to be a temporary condition. 2023-03-07T17:40:00 18 BROOKLYN 3083560049 BROOKLYN 1007660 167173 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.6255 -73.9157 (40.62549684383184, -73.91566718936745) 13825 5 2 8 38
10 56989554 2023-03-08T11:41:47 2023-03-12T13:56:42 HPD Department of Housing Preservation and Development HEAT/HOT WATER APARTMENT ONLY RESIDENTIAL BUILDING 10468 2286 SEDGWICK AVENUE SEDGWICK AVENUE missing missing missing missing ADDRESS BRONX missing missing Closed missing The Department of Housing Preservation and Development was not able to gain access to your apartment or others in the building to inspect for a lack of heat or hot water. The complaint has been closed. If the condition still exists, please file a new complaint. 2023-03-12T00:00:00 07 BRONX 2032250149 BRONX 1009196 253132 PHONE missing BRONX missing missing missing missing missing missing missing 40.8614 -73.9098 (40.86142834654178, -73.90981505703196) 11606 24 5 29 34
11 57152044 2023-03-26T17:09:23 2023-03-26T22:21:41 NYPD New York City Police Department Illegal Parking Double Parked Blocking Traffic Street/Sidewalk 10034 9 HENSHAW STREET HENSHAW STREET RIVERSIDE DRIVE DYCKMAN STREET RIVERSIDE DRIVE DYCKMAN STREET ADDRESS NEW YORK HENSHAW STREET missing Closed missing The Police Department responded to the complaint and took action to fix the condition. 2023-03-26T22:21:49 12 MANHATTAN 1022460125 MANHATTAN 1003624 255090 MOBILE missing MANHATTAN missing missing missing missing missing missing missing 40.8668 -73.93 (40.86681644892715, -73.92995331784537) 13092 47 4 39 22
12 57124879 2023-03-24T01:19:51 2023-03-24T02:51:56 NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 10463 2676 HEATH AVENUE HEATH AVENUE WEST 193 STREET WEST KINGSBRIDGE ROAD WEST 193 STREET WEST KINGSBRIDGE ROAD ADDRESS BRONX HEATH AVENUE missing Closed missing The Police Department issued a summons in response to the complaint. 2023-03-24T02:52:00 07 BRONX 2032400010 BRONX 1010303 256514 PHONE missing BRONX missing missing missing missing missing missing missing 40.8707 -73.9058 (40.87070770667293, -73.90579987814343) 11272 24 5 29 34
13 57148781 2023-03-26T14:14:18 2023-03-26T17:05:37 NYPD New York City Police Department Graffiti Police Report Not Requested Store/Commercial 11373 45 AVENUE 45 AVENUE 45 AVENUE BROADWAY 45 AVENUE BROADWAY INTERSECTION missing missing missing Closed missing The Police Department responded and upon arrival those responsible for the condition were gone. 2023-03-26T17:05:43 04 QUEENS missing QUEENS 1017225 209572 PHONE missing QUEENS missing missing missing missing missing missing missing 40.7418 -73.881 (40.74184186444486, -73.8810025231725) 14784 66 3 5 68
125815 56928915 2023-03-01T14:00:00 2023-03-15T12:51:00 DOT Department of Transportation Street Light Condition Street Light Out missing missing missing missing missing missing 69 AVENUE 168 STREET INTERSECTION QUEENS missing missing Closed missing Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. 2023-03-15T12:51:00 08 QUEENS missing QUEENS 1039328 206376 UNKNOWN missing QUEENS missing missing missing missing missing missing missing 40.733 -73.8013 (40.73295962284684, -73.80126541379411) 14507 25 3 24 65
125816 57028889 2023-03-14T03:03:03 2023-03-14T06:01:34 NYPD New York City Police Department Illegal Parking Overnight Commercial Storage Street/Sidewalk missing 24-37 49 STREET 49 STREET BROOKLYN QUEENS EXPRESSWAY WEST 25 AVENUE BROOKLYN QUEENS EXPRESSWAY WEST 25 AVENUE ADDRESS ASTORIA 49 STREET missing Closed missing The Police Department responded to the complaint and took action to fix the condition. 2023-03-14T06:01:39 01 QUEENS 4010160023 QUEENS 1010893 218162 ONLINE missing QUEENS missing missing missing missing missing missing missing 40.7654 -73.9038 (40.765440563155224, -73.9038188686141) 16860 39 3 4 72
125817 56977801 2023-03-07T12:04:41 2023-03-07T14:09:09 NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk missing 342 TEN EYCK STREET TEN EYCK STREET WATERBURY STREET BOGART STREET WATERBURY STREET BOGART STREET ADDRESS BROOKLYN TEN EYCK STREET missing Closed missing The Police Department responded to the complaint and determined that police action was not necessary. 2023-03-07T14:09:13 01 BROOKLYN 3030290027 BROOKLYN 1001939 198530 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.7116 -73.9362 (40.71157805562249, -73.93619438372028) 17213 36 2 30 56
125818 56933591 2023-03-01T16:27:10 2023-03-05T17:32:57 HPD Department of Housing Preservation and Development UNSANITARY CONDITION MOLD RESIDENTIAL BUILDING missing 1236 PACIFIC STREET PACIFIC STREET missing missing missing missing ADDRESS BROOKLYN missing missing Closed missing The following complaint conditions are still open. HPD may attempt to contact you to verify the correction of the condition or may conduct an inspection. 2023-03-05T00:00:00 08 BROOKLYN 3012060031 BROOKLYN 997897 186188 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.6777 -73.9508 (40.67770929012322, -73.95079919734158) 17618 16 2 49 49
125819 57054603 2023-03-15T06:49:34 2023-03-15T08:54:02 NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk missing 115-02 ATLANTIC AVENUE ATLANTIC AVENUE 115 STREET 116 STREET 115 STREET 116 STREET ADDRESS SOUTH RICHMOND HILL ATLANTIC AVENUE missing Closed missing The Police Department responded and upon arrival those responsible for the condition were gone. 2023-03-15T08:54:09 09 QUEENS 4094010001 QUEENS 1031061 191855 ONLINE missing QUEENS missing missing missing missing missing missing missing 40.6932 -73.8312 (40.69315062980763, -73.83119561317962) 24014 46 3 46 60
125820 57199244 2023-03-31T16:46:20 missing DOHMH Department of Health and Mental Hygiene Construction Lead Dust In Common Areas 3+ Family Apartment Building missing 161 PROSPECT PARK WEST PROSPECT PARK WEST 10 STREET 11 STREET 10 STREET 11 STREET ADDRESS BROOKLYN PROSPECT PARK WEST missing In Progress missing missing missing 06 BROOKLYN 3010950050 BROOKLYN 990416 181008 PHONE missing BROOKLYN missing missing missing missing missing missing missing 40.6635 -73.9778 (40.66349972132597, -73.97777478483737) missing 14 2 27 50
125821 57199802 2023-03-31T11:15:46 2023-04-03T00:00:00 DOB Department of Buildings Boilers Boiler - Defective/Inoperative/No Permit missing missing 631 HOWARD AVENUE HOWARD AVENUE missing missing missing missing ADDRESS STATEN ISLAND missing N/A Closed missing The Department of Buildings investigated this complaint and issued an Office of Administrative Trials and Hearings (OATH) summons. 2023-04-03T00:00:00 01 STATEN ISLAND 5006200001 STATEN ISLAND 957806 163766 UNKNOWN missing STATEN ISLAND missing missing missing missing missing missing missing 40.6161 -74.0952 (40.61613680750325, -74.0952492197216) 10369 4 1 13 74
125822 57116487 2023-03-22T17:47:08 missing DPR Department of Parks and Recreation New Tree Request For One Address Street missing 209 WEST 93 STREET WEST 93 STREET AMSTERDAM AVENUE BROADWAY AMSTERDAM AVENUE BROADWAY ADDRESS NEW YORK WEST 93 STREET missing In Progress missing NYC Parks will inspect the site to determine if it is suitable for a new street tree, including a review of potential conflicts with other infrastructure. If the site is found to be suitable, a tree will be planted during the next available planting season. You may review recently completed and upcoming street tree plantings in your neighborhood by visiting the NYC Parks Tree Work Hub at nyc.gov/parks/treework. 2023-04-12T09:02:03 07 MANHATTAN 1012410025 MANHATTAN 991932 227949 MOBILE missing MANHATTAN missing missing missing missing missing missing missing 40.7923 -73.9723 (40.792340101747136, -73.97225680841004) 12422 20 4 19 15
125823 56954525 2023-03-04T16:23:00 2023-03-08T10:50:00 DEP Department of Environmental Protection Lead Lead Kit Request (Residential) (L10) missing missing 220 WEST 16 STREET WEST 16 STREET 7 AVE 8 AVE missing missing ADDRESS NEW YORK missing N/A Closed missing The Department of Environmental Protection (DEP) mailed you the free lead test kit you requested. The kit includes instructions for collecting a water sample and contact information for DEP's Lead Program Unit if you have any questions. Please use the included pre-paid label to mail the sample to the lab for analysis. It may take the lab up to 30 days to complete their analysis and mail you the test results. This Service Request is closed and no further status will be available from 311. 2023-03-08T10:50:00 04 MANHATTAN 1007650055 MANHATTAN 984375 208960 ONLINE missing MANHATTAN missing missing missing missing missing missing missing 40.7402 -73.9995 (40.74022355178869, -73.99954892134235) 12074 12 4 10 6
125824 57132547 2023-03-24T21:57:44 2023-03-24T23:18:11 NYPD New York City Police Department Illegal Parking Commercial Overnight Parking Street/Sidewalk missing 38 STERLING AVENUE STERLING AVENUE 10 STREET CLAWSON STREET 10 STREET CLAWSON STREET ADDRESS STATEN ISLAND STERLING AVENUE missing Closed missing The Police Department responded and upon arrival those responsible for the condition were gone. 2023-03-24T23:18:15 02 STATEN ISLAND missing STATEN ISLAND 952271 146897 PHONE missing STATEN ISLAND missing missing missing missing missing missing missing 40.5698 -74.1151 (40.56981658187845, -74.11510599391917) 10693 30 1 14 76
125825 57102740 2023-03-21T05:10:12 2023-03-21T05:16:30 NYPD New York City Police Department Noise - Residential Loud Music/Party Residential Building/House missing 243 HENRY STREET HENRY STREET CLINTON STREET MONTGOMERY STREET CLINTON STREET MONTGOMERY STREET ADDRESS NEW YORK HENRY STREET missing Closed missing The Police Department responded to the complaint and took action to fix the condition. 2023-03-21T05:16:37 03 MANHATTAN 1002860018 MANHATTAN 988256 199320 ONLINE missing MANHATTAN missing missing missing missing missing missing missing 40.7138 -73.9855 (40.71376319312491, -73.98554956996014) 11723 70 4 32 4
125826 56950149 2023-03-04T00:27:00 2023-03-04T10:15:00 DOT Department of Transportation Traffic Signal Condition Post missing missing missing missing missing missing JAMAICA AVENUE 201 STREET INTERSECTION QUEENS missing N/A Closed missing Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. 2023-03-04T10:15:00 12 QUEENS missing QUEENS 1051269 199315 UNKNOWN missing QUEENS missing missing missing missing missing missing missing 40.7135 -73.7582 (40.7134965426944, -73.758249944708) 24332 41 3 16 61

Further checks

Are the Unique_Key values unique?

length(unique(df.Unique_Key)) == nrow(df)
true

Are the Created_Date values all from March, 2023?

extrema(df.Created_Date)
(DateTime("2023-03-01T00:00:27"), DateTime("2023-03-31T23:44:52"))

Are the Closed_Date values in the expected range?

extrema(skipmissing(df.Closed_Date))
(DateTime("2022-11-30T17:42:00"), DateTime("2023-05-23T22:41:42"))

There are missing values in the Closed_Date column which must be skipped when evaluating the extremes.

Now we can see a problem with the record giving the earliest Closed_Date because it is before the Created_Date range.

We save the indicators of rows with early closing or immediate closing for later manual checking.

earlyclosing = @. !ismissing(df.Closed_Date) && (df.Closed_Date < df.Created_Date)
immediateclosing = @. !ismissing(df.Closed_Date) && (df.Closed_Date == df.Created_Date)
(early = count(earlyclosing), immediate = count(immediateclosing))
(early = 301, immediate = 4279)

Dot vectorization in Julia provides “syntactic sugar” to vectorize a scalar operation. Prefacing an expression with @. applies dot vectorization to all the function calls and operators in the expression.

If we want to examine the rows with these characteristics we index the rows with the BitVector created from the logical expression. To see the first 3 rows with early closings

earlyfrm = df[earlyclosing, :]
first(earlyfrm, 3)
3×46 DataFrame
Row Unique_Key Created_Date Closed_Date Agency Agency_Name Complaint_Type Descriptor Location_Type Incident_Zip Incident_Address Street_Name Cross_Street_1 Cross_Street_2 Intersection_Street_1 Intersection_Street_2 Address_Type City Landmark Facility_Type Status Due_Date Resolution_Description Resolution_Action_Updated_Date Community_Board BBL Borough X_Coordinate_State_Plane_ Y_Coordinate_State_Plane_ Open_Data_Channel_Type Park_Facility_Name Park_Borough Vehicle_Type Taxi_Company_Borough Taxi_Pick_Up_Location Bridge_Highway_Name Bridge_Highway_Direction Road_Ramp Bridge_Highway_Segment Latitude Longitude Location Zip_Codes Community_Districts Borough_Boundaries City_Council_Districts Police_Precincts
Int32 DateTime DateTime? String15 String String String? String? Int32? String? String? String? String? String? String? String15? String31? String? String15? String15? DateTime? String? DateTime? String31 Int64? String15? Int32? Int32? String7 String? String15? String15? String15? String? String31? String? String? String? Float64? Float64? String? Int16? Int8? Int8? Int8? Int8?
1 57008185 2023-03-10T09:14:00 2023-03-09T09:13:00 DOT Department of Transportation Street Light Condition Street Light Out missing 10308 missing missing missing missing FERN AVENUE LAMOKA AVENUE INTERSECTION STATEN ISLAND missing missing Pending missing Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. 2023-03-09T09:13:00 03 STATEN ISLAND missing STATEN ISLAND 940237 139664 UNKNOWN missing STATEN ISLAND missing missing missing missing missing missing missing 40.5499 -74.1584 (40.54991185122116, -74.15837431589053) 10695 15 1 9 76
2 57115432 2023-03-22T09:19:00 2023-03-21T09:18:00 DOT Department of Transportation Street Light Condition Multiple St Lts Dayburning missing 10464 PELHAM BAY PARK PELHAM BAY PARK BRUCKNER BLVD EASTCHESTER BAY missing missing PLACENAME BRONX missing missing Pending missing Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. 2023-03-21T09:18:00 28 BRONX 2056500001 BRONX 1036955 255379 UNKNOWN missing BRONX missing missing missing missing missing missing missing 40.8675 -73.8094 (40.86747358275403, -73.8094430998493) 11273 44 5 12 28
3 57171848 2023-03-29T11:49:00 2023-03-24T11:49:00 DOT Department of Transportation Street Light Condition Street Light Out missing 11368 FLUSHING MEADOWS PARK FLUSHING MEADOWS PARK 111 ST COLLEGE POINT BLVD missing missing PLACENAME CORONA missing missing Pending missing Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. 2023-03-24T11:49:00 81 QUEENS 4020180001 QUEENS 1027989 209784 UNKNOWN missing QUEENS missing missing missing missing missing missing missing 40.7424 -73.8422 (40.742377064999275, -73.84215697754966) 14510 67 3 21 68

In this case the use of : as the second index indicates that all the columns should be included in the subset.

These three are from the “Department of Transport” and, in fact, all rows with early closing dates are from the DOT

unique(select(earlyfrm, :Agency, :Agency_Name))
1×2 DataFrame
Row Agency Agency_Name
String15 String
1 DOT Department of Transportation

Is Location redundant if Latitude and Longitude are given?

This is a tricky question because the values of Latitude and Longitude are stored to 17 significant digits, which is much more accuracy than could possibly be measured, but equality comparisons, either as floating-point numbers or as strings with 17 significant digits, would require an exact match.

To check for consistency we need to convert the Location to a Latitude-Longitude pair of floating point numbers then check for approximate equality of these values to the recorded Latitude and Longitude. We’ll leave that as an exercise for the reader and just assure you that the Location column is redundant and can be dropped.

Saving the dataset as an Arrow IPC file

We use Arrow.write to write the dataframe in the Arrow IPC format.

fn = Arrow.write(    # returns the file name
    "./data/311_half_Mar_2023.arrow",
    select(df, Not(:Location)),
)

The Arrow file is less than one third the size of the original CSV file but contains much more specificity about the data types and representations. Furthermore it can be read into other data science environments without needing to tinker with arguments to CSV file-reading functions.

The Arrow IPC format and the Arrow.write function in the Julia package allow for optional compression of the contents using lz4 or zstd compression, if further reduction of the file size is desired. The downside of compression is that an uncompressed file can be memory-mapped, allowing for extremely fast input, whereas the compressed files must be decompressed into memory.

This file can be read into Julia

tbl = Arrow.Table(fn)
Arrow.Table with 125826 rows, 45 columns, and schema:
 :Unique_Key                Int32
 :Created_Date              DateTime
 :Closed_Date               Union{Missing, DateTime}
 :Agency                    String
 :Agency_Name               String
 :Complaint_Type            String
 :Descriptor                Union{Missing, String}
 :Location_Type             Union{Missing, String}
 :Incident_Zip              Union{Missing, Int32}
 :Incident_Address          Union{Missing, String}
 :Street_Name               Union{Missing, String}
 :Cross_Street_1            Union{Missing, String}
 :Cross_Street_2            Union{Missing, String}
 ⋮                          
 :Taxi_Pick_Up_Location     Union{Missing, String}
 :Bridge_Highway_Name       Union{Missing, String}
 :Bridge_Highway_Direction  Union{Missing, String}
 :Road_Ramp                 Union{Missing, String}
 :Bridge_Highway_Segment    Union{Missing, String}
 :Latitude                  Union{Missing, Float64}
 :Longitude                 Union{Missing, Float64}
 :Zip_Codes                 Union{Missing, Int16}
 :Community_Districts       Union{Missing, Int8}
 :Borough_Boundaries        Union{Missing, Int8}
 :City_Council_Districts    Union{Missing, Int8}
 :Police_Precincts          Union{Missing, Int8}

or R

> tibble::glimpse(arrow::read_ipc_file("./data/311_half_Mar_2023.arrow"))
Rows: 125,826
Columns: 45
$ Unique_Key                     <int> 57000688, 56993837, 57049479, 57056381,…
$ Created_Date                   <dttm> 2023-03-09 09:05:25, 2023-03-08 15:27:
$ Closed_Date                    <dttm> 2023-03-13 03:21:11, 2023-05-03 02:17:
$ Agency                         <fct> DOE, TLC, HPD, NYPD, DOB, DPR, DOT, NYP…
$ Agency_Name                    <fct> Department of Education, Taxi and Limou…
$ Complaint_Type                 <fct> School Maintenance, For Hire Vehicle Co…
$ Descriptor                     <chr> "Other School Condition", "Driver Compl…
$ Location_Type                  <fct> School, Street, RESIDENTIAL BUILDING, S…
$ Incident_Zip                   <int> NA, 10075, 10459, 10455, 11365, 11691, …
$ Incident_Address               <chr> NA, "995 MADISON AVENUE", "1010 BRYANT …
$ Street_Name                    <chr> NA, "MADISON AVENUE", "BRYANT AVENUE", …
$ Cross_Street_1                 <chr> NA, "EAST   77 STREET", NA, "EAST  147 …
$ Cross_Street_2                 <chr> NA, "EAST   78 STREET", NA, "EAST  149
$ Intersection_Street_1          <chr> NA, "EAST   77 STREET", NA, "EAST  147 …
$ Intersection_Street_2          <chr> NA, "EAST   78 STREET", NA, "EAST  149
$ Address_Type                   <fct> NA, ADDRESS, ADDRESS, ADDRESS, ADDRESS,…
$ City                           <fct> NA, NEW YORK, BRONX, BRONX, FRESH MEADO…
$ Landmark                       <chr> NA, "MADISON AVENUE", NA, "UNION AVENUE…
$ Facility_Type                  <fct> NA, NA, NA, NA, NA, NA, N/A, NA, NA, NA…
$ Status                         <fct> Closed, Closed, Closed, Closed, Assigne…
$ Due_Date                       <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Resolution_Description         <fct> "The Department of Education determined…
$ Resolution_Action_Updated_Date <dttm> 2023-03-13 03:21:17, 2023-05-03 02:17:
$ Community_Board                <fct> Unspecified BROOKLYN, 08 MANHATTAN, 02
$ BBL                            <int64> NA, 1013927501, 2027560001, 202582004…
$ Borough                        <fct> BROOKLYN, MANHATTAN, BRONX, BRONX, QUEE…
$ X_Coordinate_State_Plane_      <int> 989013, 994507, 1015090, 1010420, 10388…
$ Y_Coordinate_State_Plane_      <int> 173052, 221687, 239359, 235260, 206195,…
$ Open_Data_Channel_Type         <fct> PHONE, ONLINE, PHONE, MOBILE, UNKNOWN, …
$ Park_Facility_Name             <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Park_Borough                   <fct> BROOKLYN, MANHATTAN, BRONX, BRONX, QUEE…
$ Vehicle_Type                   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Taxi_Company_Borough           <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Taxi_Pick_Up_Location          <chr> NA, "995 MADISON AVENUE, MANHATTAN (NEW…
$ Bridge_Highway_Name            <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Bridge_Highway_Direction       <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Road_Ramp                      <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Bridge_Highway_Segment         <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Latitude                       <dbl> 40.64166, 40.77515, 40.82361, 40.81237,…
$ Longitude                      <dbl> -73.98284, -73.96297, -73.88857, -73.90…
$ Zip_Codes                      <int> 17620, 10092, 10937, 10933, 14507, 2052…
$ Community_Districts            <int> 2, 23, 8, 49, 25, 51, NA, 26, 5, 24, 47…
$ Borough_Boundaries             <int> 2, 4, 5, 5, 3, 3, NA, 3, 2, 5, 4, 5, 3,…
$ City_Council_Districts         <int> 27, 51, 43, 35, 24, 47, NA, 16, 8, 29, …
$ Police_Precincts               <int> 39, 11, 24, 23, 65, 59, NA, 69, 38, 34,…

The R arrow package appears to read the DateTime values as if they are UTC values then prints them in the local time zone.

or in Python with pyarrow or with polars. (Note that the Arrow IPC format is called feather in the pyarrow package.)

>>> import pyarrow.feather as fea
>>> import polars as pl
>>> fea.read_table('./data/311_half_Mar_2023.arrow')
pyarrow.Table
Unique_Key: int32 not null
Created_Date: timestamp[ms] not null
Closed_Date: timestamp[ms]
Agency: dictionary<values=string, indices=int8, ordered=0> not null
Agency_Name: dictionary<values=string, indices=int8, ordered=0> not null
Complaint_Type: dictionary<values=string, indices=int16, ordered=0> not null
Descriptor: string
Location_Type: dictionary<values=string, indices=int8, ordered=0>
Incident_Zip: int32
Incident_Address: string
Street_Name: string
Cross_Street_1: string
Cross_Street_2: string
Intersection_Street_1: string
Intersection_Street_2: string
Address_Type: dictionary<values=string, indices=int8, ordered=0>
City: dictionary<values=string, indices=int8, ordered=0>
Landmark: string
Facility_Type: dictionary<values=string, indices=int8, ordered=0>
Status: dictionary<values=string, indices=int8, ordered=0>
Due_Date: timestamp[ms]
Resolution_Description: dictionary<values=string, indices=int16, ordered=0>
Resolution_Action_Updated_Date: timestamp[ms]
Community_Board: dictionary<values=string, indices=int8, ordered=0> not null
BBL: int64
Borough: dictionary<values=string, indices=int8, ordered=0>
X_Coordinate_State_Plane_: int32
Y_Coordinate_State_Plane_: int32
Open_Data_Channel_Type: dictionary<values=string, indices=int8, ordered=0> not null
Park_Facility_Name: dictionary<values=string, indices=int16, ordered=0>
Park_Borough: dictionary<values=string, indices=int8, ordered=0>
Vehicle_Type: dictionary<values=string, indices=int8, ordered=0>
Taxi_Company_Borough: dictionary<values=string, indices=int8, ordered=0>
Taxi_Pick_Up_Location: string
Bridge_Highway_Name: dictionary<values=string, indices=int8, ordered=0>
Bridge_Highway_Direction: dictionary<values=string, indices=int8, ordered=0>
Road_Ramp: dictionary<values=string, indices=int8, ordered=0>
Bridge_Highway_Segment: dictionary<values=string, indices=int16, ordered=0>
Latitude: double
Longitude: double
Zip_Codes: int16
Community_Districts: int8
Borough_Boundaries: int8
City_Council_Districts: int8
Police_Precincts: int8
----
Unique_Key: [[57000688,56993837,57049479,57056381,57023097,...,57116487,56954525,57132547,57102740,56950149]]
Created_Date: [[2023-03-09 15:05:25.000,2023-03-08 21:27:34.000,2023-03-15 19:26:41.000,2023-03-16 23:58:53.000,2023-03-12 15:15:54.000,...,2023-03-22 17:47:08.000,2023-03-04 16:23:00.000,2023-03-24 21:57:44.000,2023-03-21 05:10:12.000,2023-03-04 00:27:00.000]]
Closed_Date: [[2023-03-13 08:21:11.000,2023-05-03 07:17:20.000,2023-03-21 21:16:53.000,2023-03-17 02:44:31.000,2023-04-19 00:00:00.000,...,null,2023-03-08 10:50:00.000,2023-03-24 23:18:11.000,2023-03-21 05:16:30.000,2023-03-04 10:15:00.000]]
Agency: [  -- dictionary:
["DOE","TLC","HPD","NYPD","DOB",...,"DHS","EDC","DCA","NYC311-PRD","OTI"]  -- indices:
[0,1,2,3,4,...,5,7,3,3,6]]
Agency_Name: [  -- dictionary:
["Department of Education","Taxi and Limousine Commission","Department of Housing Preservation and Development","New York City Police Department","Department of Buildings",...,"Department of Sanitation","Department of Homeless Services","Economic Development Corporation","Department of Consumer Affairs","Office of Technology and Innovation"]  -- indices:
[0,1,2,3,4,...,5,7,3,3,6]]
Complaint_Type: [  -- dictionary:
["School Maintenance","For Hire Vehicle Complaint","UNSANITARY CONDITION","Illegal Parking","Building/Use",...,"Building Drinking Water Tank","Taxi Licensee Complaint","Peeling Paint","Green Taxi Report","Bus Stop Shelter Placement"]  -- indices:
[0,1,2,3,4,...,30,80,3,7,16]]
Descriptor: [["Other School Condition","Driver Complaint - Non Passenger","PESTS","Posted Parking Sign Violation","Illegal Conversion Of Residential Building/Space",...,"For One Address","Lead Kit Request (Residential) (L10)","Commercial Overnight Parking","Loud Music/Party","Post"]]
Location_Type: [  -- dictionary:
["School","Street","RESIDENTIAL BUILDING","Street/Sidewalk",null,...,"Building Entrance","Public Stairs","Cafeteria - Public School","Commercial","Speed Reducer"]  -- indices:
[0,1,2,3,null,...,1,null,3,6,null]]
Incident_Zip: [[null,10075,10459,10455,11365,...,null,null,null,null,null]]
Incident_Address: [[null,"995 MADISON AVENUE","1010 BRYANT AVENUE","541 UNION AVENUE","69-35 166 STREET",...,"209 WEST   93 STREET","220 WEST   16 STREET","38 STERLING AVENUE","243 HENRY STREET",null]]
...
>>> pl.read_ipc('./data/311_half_Mar_2023.arrow', use_pyarrow=True)
shape: (125_826, 45)
┌──────────┬────────────┬───────────┬────────┬───┬────────────┬────────────┬────────────┬────────────┐
│ Unique_K ┆ Created_Da ┆ Closed_Da ┆ Agency ┆ … ┆ Community_ ┆ Borough_Bo ┆ City_Counc ┆ Police_Pre │
│ ey       ┆ te         ┆ te        ┆ ---    ┆   ┆ Districts  ┆ undaries   ┆ il_Distric ┆ cincts     │
---------       ┆ cat    ┆   ┆ ------        ┆ ts         ┆ ---
│ i32      ┆ datetime[m ┆ datetime[ ┆        ┆   ┆ i8         ┆ i8         ┆ ---        ┆ i8         │
│          ┆ s]         ┆ ms]       ┆        ┆   ┆            ┆            ┆ i8         ┆            │
╞══════════╪════════════╪═══════════╪════════╪═══╪════════════╪════════════╪════════════╪════════════╡
570006882023-03-092023-03-1 ┆ DOE    ┆ … ┆ 222739
│          ┆ 15:05:253         ┆        ┆   ┆            ┆            ┆            ┆            │
│          ┆            ┆ 08:21:11  ┆        ┆   ┆            ┆            ┆            ┆            │
569938372023-03-082023-05-0 ┆ TLC    ┆ … ┆ 2345111
│          ┆ 21:27:343         ┆        ┆   ┆            ┆            ┆            ┆            │
│          ┆            ┆ 07:17:20  ┆        ┆   ┆            ┆            ┆            ┆            │
570494792023-03-152023-03-2 ┆ HPD    ┆ … ┆ 854324
│          ┆ 19:26:411         ┆        ┆   ┆            ┆            ┆            ┆            │
│          ┆            ┆ 21:16:53  ┆        ┆   ┆            ┆            ┆            ┆            │
570563812023-03-162023-03-1 ┆ NYPD   ┆ … ┆ 4953523
│          ┆ 23:58:537         ┆        ┆   ┆            ┆            ┆            ┆            │
│          ┆            ┆ 02:44:31  ┆        ┆   ┆            ┆            ┆            ┆            │
│ …        ┆ …          ┆ …         ┆ …      ┆ … ┆ …          ┆ …          ┆ …          ┆ …          │
569545252023-03-042023-03-0 ┆ DEP    ┆ … ┆ 124106
│          ┆ 16:23:008         ┆        ┆   ┆            ┆            ┆            ┆            │
│          ┆            ┆ 10:50:00  ┆        ┆   ┆            ┆            ┆            ┆            │
571325472023-03-242023-03-2 ┆ NYPD   ┆ … ┆ 3011476
│          ┆ 21:57:444         ┆        ┆   ┆            ┆            ┆            ┆            │
│          ┆            ┆ 23:18:11  ┆        ┆   ┆            ┆            ┆            ┆            │
571027402023-03-212023-03-2 ┆ NYPD   ┆ … ┆ 704324
│          ┆ 05:10:121         ┆        ┆   ┆            ┆            ┆            ┆            │
│          ┆            ┆ 05:16:30  ┆        ┆   ┆            ┆            ┆            ┆            │
569501492023-03-042023-03-0 ┆ DOT    ┆ … ┆ 4131661
│          ┆ 00:27:004         ┆        ┆   ┆            ┆            ┆            ┆            │
│          ┆            ┆ 10:15:00  ┆        ┆   ┆            ┆            ┆            ┆            │
└──────────┴────────────┴───────────┴────────┴───┴────────────┴────────────┴────────────┴────────────┘

The columns that are described as <fct> in the glimpse output from R (and dictionary in the output from fea.read_table and cat (for “categorical”) in the Polars data frame) are factor representations, consisting of a (short) vector of the unique values, called the levels of the factor, and an integer vector of indices into the levels. The conversion of a column of character strings, many of which are repetitions, to a factor or “dictionary encoded” representation, can result in considerable storage saving, at the expense of a trivial amount of processing. For example, storing the Agency_Name as a vector of strings takes about 34 MB but, after dictionary encoding, it takes up about 1/8 MB.

The output from fea.read_table is often the most informative representation of the structure of the table.