using Arrow, CSV, Dates, DataFrames
Creating an Arrow IPC file from the NYC 311 data
Abstract
We show conversion of a CSV file containing data on 311 service calls in New York City during March, 2023 to the Arrow IPC (Inter-Process Communication) file format (also called the Feather V2 format), which can be read quickly and easily into several different data science environments. This conversion is done in Julia but it could equally well have been done in a different language, if that is more convenient for the person doing the conversion.
Reading the CSV file as a DataFrame
First we attach the packages to be used,
The first two lines of the CSV file are
open("./data/311_half_Mar_2023.csv", "r") do io # open the file
println(readline(io)) # read and print the first line
println(readline(io)) # read and print the next line
end; # this open/do/end construction automatically closes the file when done
"Unique.Key","Created.Date","Closed.Date","Agency","Agency.Name","Complaint.Type","Descriptor","Location.Type","Incident.Zip","Incident.Address","Street.Name","Cross.Street.1","Cross.Street.2","Intersection.Street.1","Intersection.Street.2","Address.Type","City","Landmark","Facility.Type","Status","Due.Date","Resolution.Description","Resolution.Action.Updated.Date","Community.Board","BBL","Borough","X.Coordinate..State.Plane.","Y.Coordinate..State.Plane.","Open.Data.Channel.Type","Park.Facility.Name","Park.Borough","Vehicle.Type","Taxi.Company.Borough","Taxi.Pick.Up.Location","Bridge.Highway.Name","Bridge.Highway.Direction","Road.Ramp","Bridge.Highway.Segment","Latitude","Longitude","Location","Zip.Codes","Community.Districts","Borough.Boundaries","City.Council.Districts","Police.Precincts"
57000688,"03/09/2023 03:05:25 PM","03/13/2023 08:21:11 AM","DOE","Department of Education","School Maintenance","Other School Condition","School","","","","","","","","","","","","Closed","","The Department of Education determined that this complaint is not within its jurisdiction because it concerns a private school. Please contact the private school directly.","03/13/2023 08:21:17 AM","Unspecified BROOKLYN",NA,"BROOKLYN",989013,173052,"PHONE","Unspecified","BROOKLYN","","","","","","","",40.6416631014762,-73.9828374867859,"(40.641663101476226, -73.98283748678588)",17620,2,2,27,39
The output from this code block has very long lines. You will need to scroll right in the output to read it all.
Note that the character on the right of the title bar for callout blocks like this, currently a v
-shape for this block, is a toggle to show or collapse the contents of the block. Try clicking on that character to collapse the contents of the block.
In what follows the callout blocks are initially collapsed and you must click on the >
character on the right end of the title bar to show the contents.
We now have enough information, such as the date-time format used, to do an initial conversion of the CSV file to a DataFrame
, using several optional arguments to customize the process. (Details about each of these optional arguments are given in callout blocks below.)
= CSV.read(
df "./data/311_half_Mar_2023.csv", # file name
# output table type
DataFrame, =true, # convert column names to valid symbols
normalizenames=[ # strings that are used to indicate missing values
missingstring"",
"Unspecified",
"unspecified",
"Unknown",
"unknown",
"NA",
],=dateformat"m/d/Y I:M:S p", # DateTimes look like this
dateformat=true, # use smallest type of Int that can represent the column
downcast=true # strip leading and trailing whitespace
stripwhitespace )
Row | Unique_Key | Created_Date | Closed_Date | Agency | Agency_Name | Complaint_Type | Descriptor | Location_Type | Incident_Zip | Incident_Address | Street_Name | Cross_Street_1 | Cross_Street_2 | Intersection_Street_1 | Intersection_Street_2 | Address_Type | City | Landmark | Facility_Type | Status | Due_Date | Resolution_Description | Resolution_Action_Updated_Date | Community_Board | BBL | Borough | X_Coordinate_State_Plane_ | Y_Coordinate_State_Plane_ | Open_Data_Channel_Type | Park_Facility_Name | Park_Borough | Vehicle_Type | Taxi_Company_Borough | Taxi_Pick_Up_Location | Bridge_Highway_Name | Bridge_Highway_Direction | Road_Ramp | Bridge_Highway_Segment | Latitude | Longitude | Location | Zip_Codes | Community_Districts | Borough_Boundaries | City_Council_Districts | Police_Precincts |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Int32 | DateTime | DateTime? | String15 | String | String | String? | String? | String7? | String? | String? | String? | String? | String? | String? | String15? | String31? | String? | String15? | String15? | DateTime? | String? | DateTime? | String31 | Int64? | String15? | Int32? | Int32? | String7 | String? | String15? | String15? | String15? | String? | String31? | String? | String? | String? | Float64? | Float64? | String? | Int16? | Int8? | Int8? | Int8? | Int8? | |
1 | 57000688 | 2023-03-09T15:05:25 | 2023-03-13T08:21:11 | DOE | Department of Education | School Maintenance | Other School Condition | School | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | Closed | missing | The Department of Education determined that this complaint is not within its jurisdiction because it concerns a private school. Please contact the private school directly. | 2023-03-13T08:21:17 | Unspecified BROOKLYN | missing | BROOKLYN | 989013 | 173052 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.6417 | -73.9828 | (40.641663101476226, -73.98283748678588) | 17620 | 2 | 2 | 27 | 39 |
2 | 56993837 | 2023-03-08T21:27:34 | 2023-05-03T07:17:20 | TLC | Taxi and Limousine Commission | For Hire Vehicle Complaint | Driver Complaint - Non Passenger | Street | 10075 | 995 MADISON AVENUE | MADISON AVENUE | EAST 77 STREET | EAST 78 STREET | EAST 77 STREET | EAST 78 STREET | ADDRESS | NEW YORK | MADISON AVENUE | missing | Closed | missing | The Taxi and Limousine Commission (TLC) reviewed your complaint. The driver or owner pled guilty to a violation and paid a fine. TLC has sent you a written confirmation of this outcome. | 2023-05-03T07:17:24 | 08 MANHATTAN | 1013927501 | MANHATTAN | 994507 | 221687 | ONLINE | missing | MANHATTAN | missing | missing | 995 MADISON AVENUE, MANHATTAN (NEW YORK), NY, 10075 | missing | missing | missing | missing | 40.7751 | -73.963 | (40.775149965980326, -73.96296687549152) | 10092 | 23 | 4 | 51 | 11 |
3 | 57049479 | 2023-03-15T19:26:41 | 2023-03-21T21:16:53 | HPD | Department of Housing Preservation and Development | UNSANITARY CONDITION | PESTS | RESIDENTIAL BUILDING | 10459 | 1010 BRYANT AVENUE | BRYANT AVENUE | missing | missing | missing | missing | ADDRESS | BRONX | missing | missing | Closed | missing | The Department of Housing Preservation and Development inspected the following conditions. No violations were issued. The complaint has been closed. | 2023-03-21T00:00:00 | 02 BRONX | 2027560001 | BRONX | 1015090 | 239359 | PHONE | missing | BRONX | missing | missing | missing | missing | missing | missing | missing | 40.8236 | -73.8886 | (40.82360689628868, -73.8885703894082) | 10937 | 8 | 5 | 43 | 24 |
4 | 57056381 | 2023-03-16T23:58:53 | 2023-03-17T02:44:31 | NYPD | New York City Police Department | Illegal Parking | Posted Parking Sign Violation | Street/Sidewalk | 10455 | 541 UNION AVENUE | UNION AVENUE | EAST 147 STREET | EAST 149 STREET | EAST 147 STREET | EAST 149 STREET | ADDRESS | BRONX | UNION AVENUE | missing | Closed | missing | The Police Department responded to the complaint and determined that police action was not necessary. | 2023-03-17T02:44:41 | 01 BRONX | 2025820042 | BRONX | 1010420 | 235260 | MOBILE | missing | BRONX | missing | missing | missing | missing | missing | missing | missing | 40.8124 | -73.9055 | (40.81237137041546, -73.90545977958996) | 10933 | 49 | 5 | 35 | 23 |
5 | 57023097 | 2023-03-12T15:15:54 | 2023-04-19T00:00:00 | DOB | Department of Buildings | Building/Use | Illegal Conversion Of Residential Building/Space | missing | 11365 | 69-35 166 STREET | 166 STREET | missing | missing | missing | missing | ADDRESS | FRESH MEADOWS | missing | missing | Assigned | missing | The Department of Buildings attempted to investigate this complaint but could not gain access to the location. Please schedule an appointment for a follow up inspection by contacting the appropriate unit at Department of Buildings. You can find contact information at https://www1.nyc.gov/site/buildings/about/contact-us.page. | 2023-04-19T00:00:00 | 08 QUEENS | 4069350033 | QUEENS | 1038821 | 206195 | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7325 | -73.8031 | (40.732465966173585, -73.80309625279911) | 14507 | 25 | 3 | 24 | 65 |
6 | 56958927 | 2023-03-04T09:14:44 | 2023-03-04T10:21:46 | DPR | Department of Parks and Recreation | Animal in a Park | Injured Wildlife | Park | 11691 | 25 BEACH 28 STREET | BEACH 28 STREET | BEND | SEAGIRT AVENUE | BEND | SEAGIRT AVENUE | ADDRESS | FAR ROCKAWAY | BEACH 28 STREET | missing | Closed | missing | NYC Parks has reviewed the reported condition and corrected the problem. | 2023-03-04T10:21:51 | 14 QUEENS | missing | QUEENS | 1050687 | 155489 | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.5932 | -73.7608 | (40.593208594326164, -73.76078103938401) | 20529 | 51 | 3 | 47 | 59 |
7 | 57082890 | 2023-03-19T18:51:07 | 2023-03-20T08:28:00 | DOT | Department of Transportation | Street Condition | Pothole | missing | 11374 | AUSTIN STREET | AUSTIN STREET | 63 DRIVE | 64 ROAD | missing | missing | BLOCKFACE | QUEENS | missing | N/A | Closed | missing | The Department of Transportation inspected this complaint and repaired the problem. | 2023-03-20T08:28:00 | 06 QUEENS | missing | QUEENS | missing | missing | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing |
8 | 57062847 | 2023-03-16T22:39:01 | 2023-03-17T01:10:14 | NYPD | New York City Police Department | Noise - Residential | Banging/Pounding | Residential Building/House | 11364 | 212-04 75 AVENUE | 75 AVENUE | 210 STREET | BELL BOULEVARD | 210 STREET | BELL BOULEVARD | ADDRESS | OAKLAND GARDENS | 75 AVENUE | missing | Closed | missing | The Police Department responded to the complaint but officers were unable to gain entry into the premises. | 2023-03-17T01:10:19 | 11 QUEENS | 4077480500 | QUEENS | 1050720 | 208220 | ONLINE | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7379 | -73.7601 | (40.73794265357658, -73.76014232450228) | 14506 | 26 | 3 | 16 | 69 |
9 | 56981582 | 2023-03-07T17:32:00 | 2023-03-07T17:40:00 | DEP | Department of Environmental Protection | Water System | No Water (WNW) | missing | 11234 | 6924 AVENUE L | AVENUE L | E 69 ST | E 70 ST | missing | missing | ADDRESS | BROOKLYN | missing | missing | Closed | missing | The Department of Environmental Protection investigated this complaint and found it to be a temporary condition. | 2023-03-07T17:40:00 | 18 BROOKLYN | 3083560049 | BROOKLYN | 1007660 | 167173 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.6255 | -73.9157 | (40.62549684383184, -73.91566718936745) | 13825 | 5 | 2 | 8 | 38 |
10 | 56989554 | 2023-03-08T11:41:47 | 2023-03-12T13:56:42 | HPD | Department of Housing Preservation and Development | HEAT/HOT WATER | APARTMENT ONLY | RESIDENTIAL BUILDING | 10468 | 2286 SEDGWICK AVENUE | SEDGWICK AVENUE | missing | missing | missing | missing | ADDRESS | BRONX | missing | missing | Closed | missing | The Department of Housing Preservation and Development was not able to gain access to your apartment or others in the building to inspect for a lack of heat or hot water. The complaint has been closed. If the condition still exists, please file a new complaint. | 2023-03-12T00:00:00 | 07 BRONX | 2032250149 | BRONX | 1009196 | 253132 | PHONE | missing | BRONX | missing | missing | missing | missing | missing | missing | missing | 40.8614 | -73.9098 | (40.86142834654178, -73.90981505703196) | 11606 | 24 | 5 | 29 | 34 |
11 | 57152044 | 2023-03-26T17:09:23 | 2023-03-26T22:21:41 | NYPD | New York City Police Department | Illegal Parking | Double Parked Blocking Traffic | Street/Sidewalk | 10034 | 9 HENSHAW STREET | HENSHAW STREET | RIVERSIDE DRIVE | DYCKMAN STREET | RIVERSIDE DRIVE | DYCKMAN STREET | ADDRESS | NEW YORK | HENSHAW STREET | missing | Closed | missing | The Police Department responded to the complaint and took action to fix the condition. | 2023-03-26T22:21:49 | 12 MANHATTAN | 1022460125 | MANHATTAN | 1003624 | 255090 | MOBILE | missing | MANHATTAN | missing | missing | missing | missing | missing | missing | missing | 40.8668 | -73.93 | (40.86681644892715, -73.92995331784537) | 13092 | 47 | 4 | 39 | 22 |
12 | 57124879 | 2023-03-24T01:19:51 | 2023-03-24T02:51:56 | NYPD | New York City Police Department | Blocked Driveway | No Access | Street/Sidewalk | 10463 | 2676 HEATH AVENUE | HEATH AVENUE | WEST 193 STREET | WEST KINGSBRIDGE ROAD | WEST 193 STREET | WEST KINGSBRIDGE ROAD | ADDRESS | BRONX | HEATH AVENUE | missing | Closed | missing | The Police Department issued a summons in response to the complaint. | 2023-03-24T02:52:00 | 07 BRONX | 2032400010 | BRONX | 1010303 | 256514 | PHONE | missing | BRONX | missing | missing | missing | missing | missing | missing | missing | 40.8707 | -73.9058 | (40.87070770667293, -73.90579987814343) | 11272 | 24 | 5 | 29 | 34 |
13 | 57148781 | 2023-03-26T14:14:18 | 2023-03-26T17:05:37 | NYPD | New York City Police Department | Graffiti | Police Report Not Requested | Store/Commercial | 11373 | 45 AVENUE | 45 AVENUE | 45 AVENUE | BROADWAY | 45 AVENUE | BROADWAY | INTERSECTION | missing | missing | missing | Closed | missing | The Police Department responded and upon arrival those responsible for the condition were gone. | 2023-03-26T17:05:43 | 04 QUEENS | missing | QUEENS | 1017225 | 209572 | PHONE | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7418 | -73.881 | (40.74184186444486, -73.8810025231725) | 14784 | 66 | 3 | 5 | 68 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
125815 | 56928915 | 2023-03-01T14:00:00 | 2023-03-15T12:51:00 | DOT | Department of Transportation | Street Light Condition | Street Light Out | missing | 11365 | missing | missing | missing | missing | 69 AVENUE | 168 STREET | INTERSECTION | QUEENS | missing | missing | Closed | missing | Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. | 2023-03-15T12:51:00 | 08 QUEENS | missing | QUEENS | 1039328 | 206376 | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.733 | -73.8013 | (40.73295962284684, -73.80126541379411) | 14507 | 25 | 3 | 24 | 65 |
125816 | 57028889 | 2023-03-14T03:03:03 | 2023-03-14T06:01:34 | NYPD | New York City Police Department | Illegal Parking | Overnight Commercial Storage | Street/Sidewalk | 11103 | 24-37 49 STREET | 49 STREET | BROOKLYN QUEENS EXPRESSWAY WEST | 25 AVENUE | BROOKLYN QUEENS EXPRESSWAY WEST | 25 AVENUE | ADDRESS | ASTORIA | 49 STREET | missing | Closed | missing | The Police Department responded to the complaint and took action to fix the condition. | 2023-03-14T06:01:39 | 01 QUEENS | 4010160023 | QUEENS | 1010893 | 218162 | ONLINE | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7654 | -73.9038 | (40.765440563155224, -73.9038188686141) | 16860 | 39 | 3 | 4 | 72 |
125817 | 56977801 | 2023-03-07T12:04:41 | 2023-03-07T14:09:09 | NYPD | New York City Police Department | Blocked Driveway | No Access | Street/Sidewalk | 11206 | 342 TEN EYCK STREET | TEN EYCK STREET | WATERBURY STREET | BOGART STREET | WATERBURY STREET | BOGART STREET | ADDRESS | BROOKLYN | TEN EYCK STREET | missing | Closed | missing | The Police Department responded to the complaint and determined that police action was not necessary. | 2023-03-07T14:09:13 | 01 BROOKLYN | 3030290027 | BROOKLYN | 1001939 | 198530 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.7116 | -73.9362 | (40.71157805562249, -73.93619438372028) | 17213 | 36 | 2 | 30 | 56 |
125818 | 56933591 | 2023-03-01T16:27:10 | 2023-03-05T17:32:57 | HPD | Department of Housing Preservation and Development | UNSANITARY CONDITION | MOLD | RESIDENTIAL BUILDING | 11216 | 1236 PACIFIC STREET | PACIFIC STREET | missing | missing | missing | missing | ADDRESS | BROOKLYN | missing | missing | Closed | missing | The following complaint conditions are still open. HPD may attempt to contact you to verify the correction of the condition or may conduct an inspection. | 2023-03-05T00:00:00 | 08 BROOKLYN | 3012060031 | BROOKLYN | 997897 | 186188 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.6777 | -73.9508 | (40.67770929012322, -73.95079919734158) | 17618 | 16 | 2 | 49 | 49 |
125819 | 57054603 | 2023-03-15T06:49:34 | 2023-03-15T08:54:02 | NYPD | New York City Police Department | Blocked Driveway | No Access | Street/Sidewalk | 11419 | 115-02 ATLANTIC AVENUE | ATLANTIC AVENUE | 115 STREET | 116 STREET | 115 STREET | 116 STREET | ADDRESS | SOUTH RICHMOND HILL | ATLANTIC AVENUE | missing | Closed | missing | The Police Department responded and upon arrival those responsible for the condition were gone. | 2023-03-15T08:54:09 | 09 QUEENS | 4094010001 | QUEENS | 1031061 | 191855 | ONLINE | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.6932 | -73.8312 | (40.69315062980763, -73.83119561317962) | 24014 | 46 | 3 | 46 | 60 |
125820 | 57199244 | 2023-03-31T16:46:20 | missing | DOHMH | Department of Health and Mental Hygiene | Construction Lead Dust | In Common Areas | 3+ Family Apartment Building | 11215 | 161 PROSPECT PARK WEST | PROSPECT PARK WEST | 10 STREET | 11 STREET | 10 STREET | 11 STREET | ADDRESS | BROOKLYN | PROSPECT PARK WEST | missing | In Progress | missing | missing | missing | 06 BROOKLYN | 3010950050 | BROOKLYN | 990416 | 181008 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.6635 | -73.9778 | (40.66349972132597, -73.97777478483737) | missing | 14 | 2 | 27 | 50 |
125821 | 57199802 | 2023-03-31T11:15:46 | 2023-04-03T00:00:00 | DOB | Department of Buildings | Boilers | Boiler - Defective/Inoperative/No Permit | missing | 10301 | 631 HOWARD AVENUE | HOWARD AVENUE | missing | missing | missing | missing | ADDRESS | STATEN ISLAND | missing | N/A | Closed | missing | The Department of Buildings investigated this complaint and issued an Office of Administrative Trials and Hearings (OATH) summons. | 2023-04-03T00:00:00 | 01 STATEN ISLAND | 5006200001 | STATEN ISLAND | 957806 | 163766 | UNKNOWN | missing | STATEN ISLAND | missing | missing | missing | missing | missing | missing | missing | 40.6161 | -74.0952 | (40.61613680750325, -74.0952492197216) | 10369 | 4 | 1 | 13 | 74 |
125822 | 57116487 | 2023-03-22T17:47:08 | missing | DPR | Department of Parks and Recreation | New Tree Request | For One Address | Street | 10025 | 209 WEST 93 STREET | WEST 93 STREET | AMSTERDAM AVENUE | BROADWAY | AMSTERDAM AVENUE | BROADWAY | ADDRESS | NEW YORK | WEST 93 STREET | missing | In Progress | missing | NYC Parks will inspect the site to determine if it is suitable for a new street tree, including a review of potential conflicts with other infrastructure. If the site is found to be suitable, a tree will be planted during the next available planting season. You may review recently completed and upcoming street tree plantings in your neighborhood by visiting the NYC Parks Tree Work Hub at nyc.gov/parks/treework. | 2023-04-12T09:02:03 | 07 MANHATTAN | 1012410025 | MANHATTAN | 991932 | 227949 | MOBILE | missing | MANHATTAN | missing | missing | missing | missing | missing | missing | missing | 40.7923 | -73.9723 | (40.792340101747136, -73.97225680841004) | 12422 | 20 | 4 | 19 | 15 |
125823 | 56954525 | 2023-03-04T16:23:00 | 2023-03-08T10:50:00 | DEP | Department of Environmental Protection | Lead | Lead Kit Request (Residential) (L10) | missing | 10011 | 220 WEST 16 STREET | WEST 16 STREET | 7 AVE | 8 AVE | missing | missing | ADDRESS | NEW YORK | missing | N/A | Closed | missing | The Department of Environmental Protection (DEP) mailed you the free lead test kit you requested. The kit includes instructions for collecting a water sample and contact information for DEP's Lead Program Unit if you have any questions. Please use the included pre-paid label to mail the sample to the lab for analysis. It may take the lab up to 30 days to complete their analysis and mail you the test results. This Service Request is closed and no further status will be available from 311. | 2023-03-08T10:50:00 | 04 MANHATTAN | 1007650055 | MANHATTAN | 984375 | 208960 | ONLINE | missing | MANHATTAN | missing | missing | missing | missing | missing | missing | missing | 40.7402 | -73.9995 | (40.74022355178869, -73.99954892134235) | 12074 | 12 | 4 | 10 | 6 |
125824 | 57132547 | 2023-03-24T21:57:44 | 2023-03-24T23:18:11 | NYPD | New York City Police Department | Illegal Parking | Commercial Overnight Parking | Street/Sidewalk | 10306 | 38 STERLING AVENUE | STERLING AVENUE | 10 STREET | CLAWSON STREET | 10 STREET | CLAWSON STREET | ADDRESS | STATEN ISLAND | STERLING AVENUE | missing | Closed | missing | The Police Department responded and upon arrival those responsible for the condition were gone. | 2023-03-24T23:18:15 | 02 STATEN ISLAND | missing | STATEN ISLAND | 952271 | 146897 | PHONE | missing | STATEN ISLAND | missing | missing | missing | missing | missing | missing | missing | 40.5698 | -74.1151 | (40.56981658187845, -74.11510599391917) | 10693 | 30 | 1 | 14 | 76 |
125825 | 57102740 | 2023-03-21T05:10:12 | 2023-03-21T05:16:30 | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | 10002 | 243 HENRY STREET | HENRY STREET | CLINTON STREET | MONTGOMERY STREET | CLINTON STREET | MONTGOMERY STREET | ADDRESS | NEW YORK | HENRY STREET | missing | Closed | missing | The Police Department responded to the complaint and took action to fix the condition. | 2023-03-21T05:16:37 | 03 MANHATTAN | 1002860018 | MANHATTAN | 988256 | 199320 | ONLINE | missing | MANHATTAN | missing | missing | missing | missing | missing | missing | missing | 40.7138 | -73.9855 | (40.71376319312491, -73.98554956996014) | 11723 | 70 | 4 | 32 | 4 |
125826 | 56950149 | 2023-03-04T00:27:00 | 2023-03-04T10:15:00 | DOT | Department of Transportation | Traffic Signal Condition | Post | missing | 11423 | missing | missing | missing | missing | JAMAICA AVENUE | 201 STREET | INTERSECTION | QUEENS | missing | N/A | Closed | missing | Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. | 2023-03-04T10:15:00 | 12 QUEENS | missing | QUEENS | 1051269 | 199315 | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7135 | -73.7582 | (40.7134965426944, -73.758249944708) | 24332 | 41 | 3 | 16 | 61 |
Convert the column names from the CSV file into a form that can be parsed as a Symbol
in Julia
. In particular, strip whitespace at the beginning and end of the name and replace embedded blanks or .
characters by underscores.
Columns of a DataFrame (and other type of tables) are usually extracted as “properties” of the table, either as, e.g. getproperty(df, :Unique_Key)
, or, more commonly, with the dot operator as, e.g. df.Unique_Key
. The expression :Unique_Key
or the right operand of df.Unique_Key
evaluates to a Symbol. If the column name has an embedded blank or .
this form cannot be used and the name must be quoted, either as Symbol("Unique.Key")
or var"Unique.Key"
.
Thus it is an advantage to rewrite the column names so that each name can be used “as is” with the dot operator.
Enumerate the various forms of missing data indicators used in the file. Completing this list is often a process of trial-and-error.
Provide a Dates.DateFormat
object that will be applied to text columns if they match the pattern, converting to DateTime
values.
The construction dateformat"..."
is a call to a “string macro” in Julia. That is, it is just a shorthand for replacing the string with the Dates.DateFormat
object created from the string.
For columns of integer values, choose the smallest integer type, such as Int8
, Int16
, etc., that can represent all the values in the column.
Validity checking
Review the storage type for each column
In the output (remember, you must scroll horizontally to see it all) there is a description of the type of the elements of the column directly under the column name. A type name that ends in ?
indicates that missing values are present in the column. Thus, for example, the Unique_Key
is designated Int32
meaning the values are stored as 32-bit signed integers with no missing values. The Created_Date
column, designated DateTime
, is a DateTime
column with no missing values but the Closed_Date
, designated DateTime?
, is also a DateTime
, but can (and does) have missing values.
Some columns are designated as String
whereas others are designated as String7
or String15
. These are storage-saving representations used when the whole column consists of short character strings.
There is an anomaly in these short strings for the Agency
column. The Agency
acronyms should be 5 characters at most, which would be stored as String7
, but they are stored as String15
.
To see why, we check the unique combinations of the Agency
and Agency_Name
columns
unique(select(df, :Agency, :Agency_Name))
Row | Agency | Agency_Name |
---|---|---|
String15 | String | |
1 | DOE | Department of Education |
2 | TLC | Taxi and Limousine Commission |
3 | HPD | Department of Housing Preservation and Development |
4 | NYPD | New York City Police Department |
5 | DOB | Department of Buildings |
6 | DPR | Department of Parks and Recreation |
7 | DOT | Department of Transportation |
8 | DEP | Department of Environmental Protection |
9 | DOHMH | Department of Health and Mental Hygiene |
10 | DSNY | Department of Sanitation |
11 | DHS | Department of Homeless Services |
12 | EDC | Economic Development Corporation |
13 | DCA | Department of Consumer Affairs |
14 | NYC311-PRD | Department of Environmental Protection |
15 | OTI | Office of Technology and Innovation |
These columns should be in one-to-one correspondence but they are not.
At this point we would need to check with the data providers to determine why two different Agency
designations are used for the “Department of Environmental Protection”.
The other problem shown in the types is that the values in the Incident_Zip
column are stored as String7?
when they should be Int32?
.
Checking the unique values
sort(unique(df.Incident_Zip))
227-element Vector{Union{Missing, String7}}:
"02062"
"07002"
"10000"
"10001"
"10002"
"10003"
"10004"
"10005"
"10006"
"10007"
"10009"
"10010"
"10011"
⋮
"11693"
"11694"
"11697"
"12212"
"12345"
"19034"
"21031"
"75007"
"78758"
"82001"
"na"
missing
shows the problem is that someone got creative and used na
as a missing value indicator.
Recreate the DataFrame with "na"
in the missingstring
vector
= CSV.read(
df "./data/311_half_Mar_2023.csv", # file name
# output table type
DataFrame, =true, # convert column names to valid symbols
normalizenames=[ # strings that are used to indicate missing vals
missingstring"",
"Unspecified",
"unspecified",
"Unknown",
"unknown",
"NA",
"na",
],=dateformat"m/d/Y I:M:S p", # DateTimes look like this
dateformat=true, # use smallest type of Int that can represent the column
downcast=true # strip leading and trailing whitespace
stripwhitespace )
Row | Unique_Key | Created_Date | Closed_Date | Agency | Agency_Name | Complaint_Type | Descriptor | Location_Type | Incident_Zip | Incident_Address | Street_Name | Cross_Street_1 | Cross_Street_2 | Intersection_Street_1 | Intersection_Street_2 | Address_Type | City | Landmark | Facility_Type | Status | Due_Date | Resolution_Description | Resolution_Action_Updated_Date | Community_Board | BBL | Borough | X_Coordinate_State_Plane_ | Y_Coordinate_State_Plane_ | Open_Data_Channel_Type | Park_Facility_Name | Park_Borough | Vehicle_Type | Taxi_Company_Borough | Taxi_Pick_Up_Location | Bridge_Highway_Name | Bridge_Highway_Direction | Road_Ramp | Bridge_Highway_Segment | Latitude | Longitude | Location | Zip_Codes | Community_Districts | Borough_Boundaries | City_Council_Districts | Police_Precincts |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Int32 | DateTime | DateTime? | String15 | String | String | String? | String? | Int32? | String? | String? | String? | String? | String? | String? | String15? | String31? | String? | String15? | String15? | DateTime? | String? | DateTime? | String31 | Int64? | String15? | Int32? | Int32? | String7 | String? | String15? | String15? | String15? | String? | String31? | String? | String? | String? | Float64? | Float64? | String? | Int16? | Int8? | Int8? | Int8? | Int8? | |
1 | 57000688 | 2023-03-09T15:05:25 | 2023-03-13T08:21:11 | DOE | Department of Education | School Maintenance | Other School Condition | School | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | Closed | missing | The Department of Education determined that this complaint is not within its jurisdiction because it concerns a private school. Please contact the private school directly. | 2023-03-13T08:21:17 | Unspecified BROOKLYN | missing | BROOKLYN | 989013 | 173052 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.6417 | -73.9828 | (40.641663101476226, -73.98283748678588) | 17620 | 2 | 2 | 27 | 39 |
2 | 56993837 | 2023-03-08T21:27:34 | 2023-05-03T07:17:20 | TLC | Taxi and Limousine Commission | For Hire Vehicle Complaint | Driver Complaint - Non Passenger | Street | 10075 | 995 MADISON AVENUE | MADISON AVENUE | EAST 77 STREET | EAST 78 STREET | EAST 77 STREET | EAST 78 STREET | ADDRESS | NEW YORK | MADISON AVENUE | missing | Closed | missing | The Taxi and Limousine Commission (TLC) reviewed your complaint. The driver or owner pled guilty to a violation and paid a fine. TLC has sent you a written confirmation of this outcome. | 2023-05-03T07:17:24 | 08 MANHATTAN | 1013927501 | MANHATTAN | 994507 | 221687 | ONLINE | missing | MANHATTAN | missing | missing | 995 MADISON AVENUE, MANHATTAN (NEW YORK), NY, 10075 | missing | missing | missing | missing | 40.7751 | -73.963 | (40.775149965980326, -73.96296687549152) | 10092 | 23 | 4 | 51 | 11 |
3 | 57049479 | 2023-03-15T19:26:41 | 2023-03-21T21:16:53 | HPD | Department of Housing Preservation and Development | UNSANITARY CONDITION | PESTS | RESIDENTIAL BUILDING | 10459 | 1010 BRYANT AVENUE | BRYANT AVENUE | missing | missing | missing | missing | ADDRESS | BRONX | missing | missing | Closed | missing | The Department of Housing Preservation and Development inspected the following conditions. No violations were issued. The complaint has been closed. | 2023-03-21T00:00:00 | 02 BRONX | 2027560001 | BRONX | 1015090 | 239359 | PHONE | missing | BRONX | missing | missing | missing | missing | missing | missing | missing | 40.8236 | -73.8886 | (40.82360689628868, -73.8885703894082) | 10937 | 8 | 5 | 43 | 24 |
4 | 57056381 | 2023-03-16T23:58:53 | 2023-03-17T02:44:31 | NYPD | New York City Police Department | Illegal Parking | Posted Parking Sign Violation | Street/Sidewalk | 10455 | 541 UNION AVENUE | UNION AVENUE | EAST 147 STREET | EAST 149 STREET | EAST 147 STREET | EAST 149 STREET | ADDRESS | BRONX | UNION AVENUE | missing | Closed | missing | The Police Department responded to the complaint and determined that police action was not necessary. | 2023-03-17T02:44:41 | 01 BRONX | 2025820042 | BRONX | 1010420 | 235260 | MOBILE | missing | BRONX | missing | missing | missing | missing | missing | missing | missing | 40.8124 | -73.9055 | (40.81237137041546, -73.90545977958996) | 10933 | 49 | 5 | 35 | 23 |
5 | 57023097 | 2023-03-12T15:15:54 | 2023-04-19T00:00:00 | DOB | Department of Buildings | Building/Use | Illegal Conversion Of Residential Building/Space | missing | 11365 | 69-35 166 STREET | 166 STREET | missing | missing | missing | missing | ADDRESS | FRESH MEADOWS | missing | missing | Assigned | missing | The Department of Buildings attempted to investigate this complaint but could not gain access to the location. Please schedule an appointment for a follow up inspection by contacting the appropriate unit at Department of Buildings. You can find contact information at https://www1.nyc.gov/site/buildings/about/contact-us.page. | 2023-04-19T00:00:00 | 08 QUEENS | 4069350033 | QUEENS | 1038821 | 206195 | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7325 | -73.8031 | (40.732465966173585, -73.80309625279911) | 14507 | 25 | 3 | 24 | 65 |
6 | 56958927 | 2023-03-04T09:14:44 | 2023-03-04T10:21:46 | DPR | Department of Parks and Recreation | Animal in a Park | Injured Wildlife | Park | 11691 | 25 BEACH 28 STREET | BEACH 28 STREET | BEND | SEAGIRT AVENUE | BEND | SEAGIRT AVENUE | ADDRESS | FAR ROCKAWAY | BEACH 28 STREET | missing | Closed | missing | NYC Parks has reviewed the reported condition and corrected the problem. | 2023-03-04T10:21:51 | 14 QUEENS | missing | QUEENS | 1050687 | 155489 | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.5932 | -73.7608 | (40.593208594326164, -73.76078103938401) | 20529 | 51 | 3 | 47 | 59 |
7 | 57082890 | 2023-03-19T18:51:07 | 2023-03-20T08:28:00 | DOT | Department of Transportation | Street Condition | Pothole | missing | 11374 | AUSTIN STREET | AUSTIN STREET | 63 DRIVE | 64 ROAD | missing | missing | BLOCKFACE | QUEENS | missing | N/A | Closed | missing | The Department of Transportation inspected this complaint and repaired the problem. | 2023-03-20T08:28:00 | 06 QUEENS | missing | QUEENS | missing | missing | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing | missing |
8 | 57062847 | 2023-03-16T22:39:01 | 2023-03-17T01:10:14 | NYPD | New York City Police Department | Noise - Residential | Banging/Pounding | Residential Building/House | 11364 | 212-04 75 AVENUE | 75 AVENUE | 210 STREET | BELL BOULEVARD | 210 STREET | BELL BOULEVARD | ADDRESS | OAKLAND GARDENS | 75 AVENUE | missing | Closed | missing | The Police Department responded to the complaint but officers were unable to gain entry into the premises. | 2023-03-17T01:10:19 | 11 QUEENS | 4077480500 | QUEENS | 1050720 | 208220 | ONLINE | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7379 | -73.7601 | (40.73794265357658, -73.76014232450228) | 14506 | 26 | 3 | 16 | 69 |
9 | 56981582 | 2023-03-07T17:32:00 | 2023-03-07T17:40:00 | DEP | Department of Environmental Protection | Water System | No Water (WNW) | missing | 11234 | 6924 AVENUE L | AVENUE L | E 69 ST | E 70 ST | missing | missing | ADDRESS | BROOKLYN | missing | missing | Closed | missing | The Department of Environmental Protection investigated this complaint and found it to be a temporary condition. | 2023-03-07T17:40:00 | 18 BROOKLYN | 3083560049 | BROOKLYN | 1007660 | 167173 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.6255 | -73.9157 | (40.62549684383184, -73.91566718936745) | 13825 | 5 | 2 | 8 | 38 |
10 | 56989554 | 2023-03-08T11:41:47 | 2023-03-12T13:56:42 | HPD | Department of Housing Preservation and Development | HEAT/HOT WATER | APARTMENT ONLY | RESIDENTIAL BUILDING | 10468 | 2286 SEDGWICK AVENUE | SEDGWICK AVENUE | missing | missing | missing | missing | ADDRESS | BRONX | missing | missing | Closed | missing | The Department of Housing Preservation and Development was not able to gain access to your apartment or others in the building to inspect for a lack of heat or hot water. The complaint has been closed. If the condition still exists, please file a new complaint. | 2023-03-12T00:00:00 | 07 BRONX | 2032250149 | BRONX | 1009196 | 253132 | PHONE | missing | BRONX | missing | missing | missing | missing | missing | missing | missing | 40.8614 | -73.9098 | (40.86142834654178, -73.90981505703196) | 11606 | 24 | 5 | 29 | 34 |
11 | 57152044 | 2023-03-26T17:09:23 | 2023-03-26T22:21:41 | NYPD | New York City Police Department | Illegal Parking | Double Parked Blocking Traffic | Street/Sidewalk | 10034 | 9 HENSHAW STREET | HENSHAW STREET | RIVERSIDE DRIVE | DYCKMAN STREET | RIVERSIDE DRIVE | DYCKMAN STREET | ADDRESS | NEW YORK | HENSHAW STREET | missing | Closed | missing | The Police Department responded to the complaint and took action to fix the condition. | 2023-03-26T22:21:49 | 12 MANHATTAN | 1022460125 | MANHATTAN | 1003624 | 255090 | MOBILE | missing | MANHATTAN | missing | missing | missing | missing | missing | missing | missing | 40.8668 | -73.93 | (40.86681644892715, -73.92995331784537) | 13092 | 47 | 4 | 39 | 22 |
12 | 57124879 | 2023-03-24T01:19:51 | 2023-03-24T02:51:56 | NYPD | New York City Police Department | Blocked Driveway | No Access | Street/Sidewalk | 10463 | 2676 HEATH AVENUE | HEATH AVENUE | WEST 193 STREET | WEST KINGSBRIDGE ROAD | WEST 193 STREET | WEST KINGSBRIDGE ROAD | ADDRESS | BRONX | HEATH AVENUE | missing | Closed | missing | The Police Department issued a summons in response to the complaint. | 2023-03-24T02:52:00 | 07 BRONX | 2032400010 | BRONX | 1010303 | 256514 | PHONE | missing | BRONX | missing | missing | missing | missing | missing | missing | missing | 40.8707 | -73.9058 | (40.87070770667293, -73.90579987814343) | 11272 | 24 | 5 | 29 | 34 |
13 | 57148781 | 2023-03-26T14:14:18 | 2023-03-26T17:05:37 | NYPD | New York City Police Department | Graffiti | Police Report Not Requested | Store/Commercial | 11373 | 45 AVENUE | 45 AVENUE | 45 AVENUE | BROADWAY | 45 AVENUE | BROADWAY | INTERSECTION | missing | missing | missing | Closed | missing | The Police Department responded and upon arrival those responsible for the condition were gone. | 2023-03-26T17:05:43 | 04 QUEENS | missing | QUEENS | 1017225 | 209572 | PHONE | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7418 | -73.881 | (40.74184186444486, -73.8810025231725) | 14784 | 66 | 3 | 5 | 68 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
125815 | 56928915 | 2023-03-01T14:00:00 | 2023-03-15T12:51:00 | DOT | Department of Transportation | Street Light Condition | Street Light Out | missing | missing | missing | missing | missing | missing | 69 AVENUE | 168 STREET | INTERSECTION | QUEENS | missing | missing | Closed | missing | Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. | 2023-03-15T12:51:00 | 08 QUEENS | missing | QUEENS | 1039328 | 206376 | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.733 | -73.8013 | (40.73295962284684, -73.80126541379411) | 14507 | 25 | 3 | 24 | 65 |
125816 | 57028889 | 2023-03-14T03:03:03 | 2023-03-14T06:01:34 | NYPD | New York City Police Department | Illegal Parking | Overnight Commercial Storage | Street/Sidewalk | missing | 24-37 49 STREET | 49 STREET | BROOKLYN QUEENS EXPRESSWAY WEST | 25 AVENUE | BROOKLYN QUEENS EXPRESSWAY WEST | 25 AVENUE | ADDRESS | ASTORIA | 49 STREET | missing | Closed | missing | The Police Department responded to the complaint and took action to fix the condition. | 2023-03-14T06:01:39 | 01 QUEENS | 4010160023 | QUEENS | 1010893 | 218162 | ONLINE | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7654 | -73.9038 | (40.765440563155224, -73.9038188686141) | 16860 | 39 | 3 | 4 | 72 |
125817 | 56977801 | 2023-03-07T12:04:41 | 2023-03-07T14:09:09 | NYPD | New York City Police Department | Blocked Driveway | No Access | Street/Sidewalk | missing | 342 TEN EYCK STREET | TEN EYCK STREET | WATERBURY STREET | BOGART STREET | WATERBURY STREET | BOGART STREET | ADDRESS | BROOKLYN | TEN EYCK STREET | missing | Closed | missing | The Police Department responded to the complaint and determined that police action was not necessary. | 2023-03-07T14:09:13 | 01 BROOKLYN | 3030290027 | BROOKLYN | 1001939 | 198530 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.7116 | -73.9362 | (40.71157805562249, -73.93619438372028) | 17213 | 36 | 2 | 30 | 56 |
125818 | 56933591 | 2023-03-01T16:27:10 | 2023-03-05T17:32:57 | HPD | Department of Housing Preservation and Development | UNSANITARY CONDITION | MOLD | RESIDENTIAL BUILDING | missing | 1236 PACIFIC STREET | PACIFIC STREET | missing | missing | missing | missing | ADDRESS | BROOKLYN | missing | missing | Closed | missing | The following complaint conditions are still open. HPD may attempt to contact you to verify the correction of the condition or may conduct an inspection. | 2023-03-05T00:00:00 | 08 BROOKLYN | 3012060031 | BROOKLYN | 997897 | 186188 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.6777 | -73.9508 | (40.67770929012322, -73.95079919734158) | 17618 | 16 | 2 | 49 | 49 |
125819 | 57054603 | 2023-03-15T06:49:34 | 2023-03-15T08:54:02 | NYPD | New York City Police Department | Blocked Driveway | No Access | Street/Sidewalk | missing | 115-02 ATLANTIC AVENUE | ATLANTIC AVENUE | 115 STREET | 116 STREET | 115 STREET | 116 STREET | ADDRESS | SOUTH RICHMOND HILL | ATLANTIC AVENUE | missing | Closed | missing | The Police Department responded and upon arrival those responsible for the condition were gone. | 2023-03-15T08:54:09 | 09 QUEENS | 4094010001 | QUEENS | 1031061 | 191855 | ONLINE | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.6932 | -73.8312 | (40.69315062980763, -73.83119561317962) | 24014 | 46 | 3 | 46 | 60 |
125820 | 57199244 | 2023-03-31T16:46:20 | missing | DOHMH | Department of Health and Mental Hygiene | Construction Lead Dust | In Common Areas | 3+ Family Apartment Building | missing | 161 PROSPECT PARK WEST | PROSPECT PARK WEST | 10 STREET | 11 STREET | 10 STREET | 11 STREET | ADDRESS | BROOKLYN | PROSPECT PARK WEST | missing | In Progress | missing | missing | missing | 06 BROOKLYN | 3010950050 | BROOKLYN | 990416 | 181008 | PHONE | missing | BROOKLYN | missing | missing | missing | missing | missing | missing | missing | 40.6635 | -73.9778 | (40.66349972132597, -73.97777478483737) | missing | 14 | 2 | 27 | 50 |
125821 | 57199802 | 2023-03-31T11:15:46 | 2023-04-03T00:00:00 | DOB | Department of Buildings | Boilers | Boiler - Defective/Inoperative/No Permit | missing | missing | 631 HOWARD AVENUE | HOWARD AVENUE | missing | missing | missing | missing | ADDRESS | STATEN ISLAND | missing | N/A | Closed | missing | The Department of Buildings investigated this complaint and issued an Office of Administrative Trials and Hearings (OATH) summons. | 2023-04-03T00:00:00 | 01 STATEN ISLAND | 5006200001 | STATEN ISLAND | 957806 | 163766 | UNKNOWN | missing | STATEN ISLAND | missing | missing | missing | missing | missing | missing | missing | 40.6161 | -74.0952 | (40.61613680750325, -74.0952492197216) | 10369 | 4 | 1 | 13 | 74 |
125822 | 57116487 | 2023-03-22T17:47:08 | missing | DPR | Department of Parks and Recreation | New Tree Request | For One Address | Street | missing | 209 WEST 93 STREET | WEST 93 STREET | AMSTERDAM AVENUE | BROADWAY | AMSTERDAM AVENUE | BROADWAY | ADDRESS | NEW YORK | WEST 93 STREET | missing | In Progress | missing | NYC Parks will inspect the site to determine if it is suitable for a new street tree, including a review of potential conflicts with other infrastructure. If the site is found to be suitable, a tree will be planted during the next available planting season. You may review recently completed and upcoming street tree plantings in your neighborhood by visiting the NYC Parks Tree Work Hub at nyc.gov/parks/treework. | 2023-04-12T09:02:03 | 07 MANHATTAN | 1012410025 | MANHATTAN | 991932 | 227949 | MOBILE | missing | MANHATTAN | missing | missing | missing | missing | missing | missing | missing | 40.7923 | -73.9723 | (40.792340101747136, -73.97225680841004) | 12422 | 20 | 4 | 19 | 15 |
125823 | 56954525 | 2023-03-04T16:23:00 | 2023-03-08T10:50:00 | DEP | Department of Environmental Protection | Lead | Lead Kit Request (Residential) (L10) | missing | missing | 220 WEST 16 STREET | WEST 16 STREET | 7 AVE | 8 AVE | missing | missing | ADDRESS | NEW YORK | missing | N/A | Closed | missing | The Department of Environmental Protection (DEP) mailed you the free lead test kit you requested. The kit includes instructions for collecting a water sample and contact information for DEP's Lead Program Unit if you have any questions. Please use the included pre-paid label to mail the sample to the lab for analysis. It may take the lab up to 30 days to complete their analysis and mail you the test results. This Service Request is closed and no further status will be available from 311. | 2023-03-08T10:50:00 | 04 MANHATTAN | 1007650055 | MANHATTAN | 984375 | 208960 | ONLINE | missing | MANHATTAN | missing | missing | missing | missing | missing | missing | missing | 40.7402 | -73.9995 | (40.74022355178869, -73.99954892134235) | 12074 | 12 | 4 | 10 | 6 |
125824 | 57132547 | 2023-03-24T21:57:44 | 2023-03-24T23:18:11 | NYPD | New York City Police Department | Illegal Parking | Commercial Overnight Parking | Street/Sidewalk | missing | 38 STERLING AVENUE | STERLING AVENUE | 10 STREET | CLAWSON STREET | 10 STREET | CLAWSON STREET | ADDRESS | STATEN ISLAND | STERLING AVENUE | missing | Closed | missing | The Police Department responded and upon arrival those responsible for the condition were gone. | 2023-03-24T23:18:15 | 02 STATEN ISLAND | missing | STATEN ISLAND | 952271 | 146897 | PHONE | missing | STATEN ISLAND | missing | missing | missing | missing | missing | missing | missing | 40.5698 | -74.1151 | (40.56981658187845, -74.11510599391917) | 10693 | 30 | 1 | 14 | 76 |
125825 | 57102740 | 2023-03-21T05:10:12 | 2023-03-21T05:16:30 | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | missing | 243 HENRY STREET | HENRY STREET | CLINTON STREET | MONTGOMERY STREET | CLINTON STREET | MONTGOMERY STREET | ADDRESS | NEW YORK | HENRY STREET | missing | Closed | missing | The Police Department responded to the complaint and took action to fix the condition. | 2023-03-21T05:16:37 | 03 MANHATTAN | 1002860018 | MANHATTAN | 988256 | 199320 | ONLINE | missing | MANHATTAN | missing | missing | missing | missing | missing | missing | missing | 40.7138 | -73.9855 | (40.71376319312491, -73.98554956996014) | 11723 | 70 | 4 | 32 | 4 |
125826 | 56950149 | 2023-03-04T00:27:00 | 2023-03-04T10:15:00 | DOT | Department of Transportation | Traffic Signal Condition | Post | missing | missing | missing | missing | missing | missing | JAMAICA AVENUE | 201 STREET | INTERSECTION | QUEENS | missing | N/A | Closed | missing | Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. | 2023-03-04T10:15:00 | 12 QUEENS | missing | QUEENS | 1051269 | 199315 | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7135 | -73.7582 | (40.7134965426944, -73.758249944708) | 24332 | 41 | 3 | 16 | 61 |
Further checks
Are the Unique_Key
values unique?
length(unique(df.Unique_Key)) == nrow(df)
true
Are the Created_Date
values all from March, 2023?
extrema(df.Created_Date)
(DateTime("2023-03-01T00:00:27"), DateTime("2023-03-31T23:44:52"))
Are the Closed_Date
values in the expected range?
extrema(skipmissing(df.Closed_Date))
(DateTime("2022-11-30T17:42:00"), DateTime("2023-05-23T22:41:42"))
There are missing values in the Closed_Date
column which must be skipped when evaluating the extremes.
Now we can see a problem with the record giving the earliest Closed_Date
because it is before the Created_Date
range.
We save the indicators of rows with early closing or immediate closing for later manual checking.
= @. !ismissing(df.Closed_Date) && (df.Closed_Date < df.Created_Date)
earlyclosing = @. !ismissing(df.Closed_Date) && (df.Closed_Date == df.Created_Date)
immediateclosing = count(earlyclosing), immediate = count(immediateclosing)) (early
(early = 301, immediate = 4279)
@.
do?
Dot vectorization in Julia provides “syntactic sugar” to vectorize a scalar operation. Prefacing an expression with @.
applies dot vectorization to all the function calls and operators in the expression.
If we want to examine the rows with these characteristics we index the rows with the BitVector
created from the logical expression. To see the first 3 rows with early closings
= df[earlyclosing, :]
earlyfrm first(earlyfrm, 3)
Row | Unique_Key | Created_Date | Closed_Date | Agency | Agency_Name | Complaint_Type | Descriptor | Location_Type | Incident_Zip | Incident_Address | Street_Name | Cross_Street_1 | Cross_Street_2 | Intersection_Street_1 | Intersection_Street_2 | Address_Type | City | Landmark | Facility_Type | Status | Due_Date | Resolution_Description | Resolution_Action_Updated_Date | Community_Board | BBL | Borough | X_Coordinate_State_Plane_ | Y_Coordinate_State_Plane_ | Open_Data_Channel_Type | Park_Facility_Name | Park_Borough | Vehicle_Type | Taxi_Company_Borough | Taxi_Pick_Up_Location | Bridge_Highway_Name | Bridge_Highway_Direction | Road_Ramp | Bridge_Highway_Segment | Latitude | Longitude | Location | Zip_Codes | Community_Districts | Borough_Boundaries | City_Council_Districts | Police_Precincts |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Int32 | DateTime | DateTime? | String15 | String | String | String? | String? | Int32? | String? | String? | String? | String? | String? | String? | String15? | String31? | String? | String15? | String15? | DateTime? | String? | DateTime? | String31 | Int64? | String15? | Int32? | Int32? | String7 | String? | String15? | String15? | String15? | String? | String31? | String? | String? | String? | Float64? | Float64? | String? | Int16? | Int8? | Int8? | Int8? | Int8? | |
1 | 57008185 | 2023-03-10T09:14:00 | 2023-03-09T09:13:00 | DOT | Department of Transportation | Street Light Condition | Street Light Out | missing | 10308 | missing | missing | missing | missing | FERN AVENUE | LAMOKA AVENUE | INTERSECTION | STATEN ISLAND | missing | missing | Pending | missing | Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. | 2023-03-09T09:13:00 | 03 STATEN ISLAND | missing | STATEN ISLAND | 940237 | 139664 | UNKNOWN | missing | STATEN ISLAND | missing | missing | missing | missing | missing | missing | missing | 40.5499 | -74.1584 | (40.54991185122116, -74.15837431589053) | 10695 | 15 | 1 | 9 | 76 |
2 | 57115432 | 2023-03-22T09:19:00 | 2023-03-21T09:18:00 | DOT | Department of Transportation | Street Light Condition | Multiple St Lts Dayburning | missing | 10464 | PELHAM BAY PARK | PELHAM BAY PARK | BRUCKNER BLVD | EASTCHESTER BAY | missing | missing | PLACENAME | BRONX | missing | missing | Pending | missing | Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. | 2023-03-21T09:18:00 | 28 BRONX | 2056500001 | BRONX | 1036955 | 255379 | UNKNOWN | missing | BRONX | missing | missing | missing | missing | missing | missing | missing | 40.8675 | -73.8094 | (40.86747358275403, -73.8094430998493) | 11273 | 44 | 5 | 12 | 28 |
3 | 57171848 | 2023-03-29T11:49:00 | 2023-03-24T11:49:00 | DOT | Department of Transportation | Street Light Condition | Street Light Out | missing | 11368 | FLUSHING MEADOWS PARK | FLUSHING MEADOWS PARK | 111 ST | COLLEGE POINT BLVD | missing | missing | PLACENAME | CORONA | missing | missing | Pending | missing | Service Request status for this request is available on the Department of Transportationâ\u80\u99s website. Please click the â\u80\u9cLearn Moreâ\u80\u9d link below. | 2023-03-24T11:49:00 | 81 QUEENS | 4020180001 | QUEENS | 1027989 | 209784 | UNKNOWN | missing | QUEENS | missing | missing | missing | missing | missing | missing | missing | 40.7424 | -73.8422 | (40.742377064999275, -73.84215697754966) | 14510 | 67 | 3 | 21 | 68 |
:
as an index indicates all values for that axis
In this case the use of :
as the second index indicates that all the columns should be included in the subset.
These three are from the “Department of Transport” and, in fact, all rows with early closing dates are from the DOT
unique(select(earlyfrm, :Agency, :Agency_Name))
Row | Agency | Agency_Name |
---|---|---|
String15 | String | |
1 | DOT | Department of Transportation |
Is Location redundant if Latitude and Longitude are given?
This is a tricky question because the values of Latitude
and Longitude
are stored to 17 significant digits, which is much more accuracy than could possibly be measured, but equality comparisons, either as floating-point numbers or as strings with 17 significant digits, would require an exact match.
To check for consistency we need to convert the Location
to a Latitude-Longitude
pair of floating point numbers then check for approximate equality of these values to the recorded Latitude
and Longitude
. We’ll leave that as an exercise for the reader and just assure you that the Location
column is redundant and can be dropped.
Saving the dataset as an Arrow IPC file
We use Arrow.write
to write the dataframe in the Arrow IPC format.
= Arrow.write( # returns the file name
fn "./data/311_half_Mar_2023.arrow",
select(df, Not(:Location)),
)
The Arrow file is less than one third the size of the original CSV file but contains much more specificity about the data types and representations. Furthermore it can be read into other data science environments without needing to tinker with arguments to CSV file-reading functions.
The Arrow IPC format and the Arrow.write
function in the Julia package allow for optional compression of the contents using lz4
or zstd
compression, if further reduction of the file size is desired. The downside of compression is that an uncompressed file can be memory-mapped, allowing for extremely fast input, whereas the compressed files must be decompressed into memory.
This file can be read into Julia
= Arrow.Table(fn) tbl
Arrow.Table with 125826 rows, 45 columns, and schema:
:Unique_Key Int32
:Created_Date DateTime
:Closed_Date Union{Missing, DateTime}
:Agency String
:Agency_Name String
:Complaint_Type String
:Descriptor Union{Missing, String}
:Location_Type Union{Missing, String}
:Incident_Zip Union{Missing, Int32}
:Incident_Address Union{Missing, String}
:Street_Name Union{Missing, String}
:Cross_Street_1 Union{Missing, String}
:Cross_Street_2 Union{Missing, String}
⋮
:Taxi_Pick_Up_Location Union{Missing, String}
:Bridge_Highway_Name Union{Missing, String}
:Bridge_Highway_Direction Union{Missing, String}
:Road_Ramp Union{Missing, String}
:Bridge_Highway_Segment Union{Missing, String}
:Latitude Union{Missing, Float64}
:Longitude Union{Missing, Float64}
:Zip_Codes Union{Missing, Int16}
:Community_Districts Union{Missing, Int8}
:Borough_Boundaries Union{Missing, Int8}
:City_Council_Districts Union{Missing, Int8}
:Police_Precincts Union{Missing, Int8}
or R
> tibble::glimpse(arrow::read_ipc_file("./data/311_half_Mar_2023.arrow"))
: 125,826
Rows: 45
Columns$ Unique_Key <int> 57000688, 56993837, 57049479, 57056381,…
$ Created_Date <dttm> 2023-03-09 09:05:25, 2023-03-08 15:27:…
$ Closed_Date <dttm> 2023-03-13 03:21:11, 2023-05-03 02:17:…
$ Agency <fct> DOE, TLC, HPD, NYPD, DOB, DPR, DOT, NYP…
$ Agency_Name <fct> Department of Education, Taxi and Limou…
$ Complaint_Type <fct> School Maintenance, For Hire Vehicle Co…
$ Descriptor <chr> "Other School Condition", "Driver Compl…
$ Location_Type <fct> School, Street, RESIDENTIAL BUILDING, S…
$ Incident_Zip <int> NA, 10075, 10459, 10455, 11365, 11691, …
$ Incident_Address <chr> NA, "995 MADISON AVENUE", "1010 BRYANT …
$ Street_Name <chr> NA, "MADISON AVENUE", "BRYANT AVENUE", …
$ Cross_Street_1 <chr> NA, "EAST 77 STREET", NA, "EAST 147 …
$ Cross_Street_2 <chr> NA, "EAST 78 STREET", NA, "EAST 149 …
$ Intersection_Street_1 <chr> NA, "EAST 77 STREET", NA, "EAST 147 …
$ Intersection_Street_2 <chr> NA, "EAST 78 STREET", NA, "EAST 149 …
$ Address_Type <fct> NA, ADDRESS, ADDRESS, ADDRESS, ADDRESS,…
$ City <fct> NA, NEW YORK, BRONX, BRONX, FRESH MEADO…
$ Landmark <chr> NA, "MADISON AVENUE", NA, "UNION AVENUE…
$ Facility_Type <fct> NA, NA, NA, NA, NA, NA, N/A, NA, NA, NA…
$ Status <fct> Closed, Closed, Closed, Closed, Assigne…
$ Due_Date <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Resolution_Description <fct> "The Department of Education determined…
$ Resolution_Action_Updated_Date <dttm> 2023-03-13 03:21:17, 2023-05-03 02:17:…
$ Community_Board <fct> Unspecified BROOKLYN, 08 MANHATTAN, 02 …
$ BBL <int64> NA, 1013927501, 2027560001, 202582004…
$ Borough <fct> BROOKLYN, MANHATTAN, BRONX, BRONX, QUEE…
$ X_Coordinate_State_Plane_ <int> 989013, 994507, 1015090, 1010420, 10388…
$ Y_Coordinate_State_Plane_ <int> 173052, 221687, 239359, 235260, 206195,…
$ Open_Data_Channel_Type <fct> PHONE, ONLINE, PHONE, MOBILE, UNKNOWN, …
$ Park_Facility_Name <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Park_Borough <fct> BROOKLYN, MANHATTAN, BRONX, BRONX, QUEE…
$ Vehicle_Type <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Taxi_Company_Borough <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Taxi_Pick_Up_Location <chr> NA, "995 MADISON AVENUE, MANHATTAN (NEW…
$ Bridge_Highway_Name <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Bridge_Highway_Direction <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Road_Ramp <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Bridge_Highway_Segment <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Latitude <dbl> 40.64166, 40.77515, 40.82361, 40.81237,…
$ Longitude <dbl> -73.98284, -73.96297, -73.88857, -73.90…
$ Zip_Codes <int> 17620, 10092, 10937, 10933, 14507, 2052…
$ Community_Districts <int> 2, 23, 8, 49, 25, 51, NA, 26, 5, 24, 47…
$ Borough_Boundaries <int> 2, 4, 5, 5, 3, 3, NA, 3, 2, 5, 4, 5, 3,…
$ City_Council_Districts <int> 27, 51, 43, 35, 24, 47, NA, 16, 8, 29, …
$ Police_Precincts <int> 39, 11, 24, 23, 65, 59, NA, 69, 38, 34,…
The R arrow
package appears to read the DateTime values as if they are UTC values then prints them in the local time zone.
or in Python with pyarrow
or with polars. (Note that the Arrow IPC format is called feather
in the pyarrow
package.)
>>> import pyarrow.feather as fea
>>> import polars as pl
>>> fea.read_table('./data/311_half_Mar_2023.arrow')
pyarrow.Tablenot null
Unique_Key: int32 not null
Created_Date: timestamp[ms]
Closed_Date: timestamp[ms]<values=string, indices=int8, ordered=0> not null
Agency: dictionary<values=string, indices=int8, ordered=0> not null
Agency_Name: dictionary<values=string, indices=int16, ordered=0> not null
Complaint_Type: dictionary
Descriptor: string<values=string, indices=int8, ordered=0>
Location_Type: dictionary
Incident_Zip: int32
Incident_Address: string
Street_Name: string
Cross_Street_1: string
Cross_Street_2: string
Intersection_Street_1: string
Intersection_Street_2: string<values=string, indices=int8, ordered=0>
Address_Type: dictionary<values=string, indices=int8, ordered=0>
City: dictionary
Landmark: string<values=string, indices=int8, ordered=0>
Facility_Type: dictionary<values=string, indices=int8, ordered=0>
Status: dictionary
Due_Date: timestamp[ms]<values=string, indices=int16, ordered=0>
Resolution_Description: dictionary
Resolution_Action_Updated_Date: timestamp[ms]<values=string, indices=int8, ordered=0> not null
Community_Board: dictionary
BBL: int64<values=string, indices=int8, ordered=0>
Borough: dictionary
X_Coordinate_State_Plane_: int32
Y_Coordinate_State_Plane_: int32<values=string, indices=int8, ordered=0> not null
Open_Data_Channel_Type: dictionary<values=string, indices=int16, ordered=0>
Park_Facility_Name: dictionary<values=string, indices=int8, ordered=0>
Park_Borough: dictionary<values=string, indices=int8, ordered=0>
Vehicle_Type: dictionary<values=string, indices=int8, ordered=0>
Taxi_Company_Borough: dictionary
Taxi_Pick_Up_Location: string<values=string, indices=int8, ordered=0>
Bridge_Highway_Name: dictionary<values=string, indices=int8, ordered=0>
Bridge_Highway_Direction: dictionary<values=string, indices=int8, ordered=0>
Road_Ramp: dictionary<values=string, indices=int16, ordered=0>
Bridge_Highway_Segment: dictionary
Latitude: double
Longitude: double
Zip_Codes: int16
Community_Districts: int8
Borough_Boundaries: int8
City_Council_Districts: int8
Police_Precincts: int8----
57000688,56993837,57049479,57056381,57023097,...,57116487,56954525,57132547,57102740,56950149]]
Unique_Key: [[2023-03-09 15:05:25.000,2023-03-08 21:27:34.000,2023-03-15 19:26:41.000,2023-03-16 23:58:53.000,2023-03-12 15:15:54.000,...,2023-03-22 17:47:08.000,2023-03-04 16:23:00.000,2023-03-24 21:57:44.000,2023-03-21 05:10:12.000,2023-03-04 00:27:00.000]]
Created_Date: [[2023-03-13 08:21:11.000,2023-05-03 07:17:20.000,2023-03-21 21:16:53.000,2023-03-17 02:44:31.000,2023-04-19 00:00:00.000,...,null,2023-03-08 10:50:00.000,2023-03-24 23:18:11.000,2023-03-21 05:16:30.000,2023-03-04 10:15:00.000]]
Closed_Date: [[-- dictionary:
Agency: [ "DOE","TLC","HPD","NYPD","DOB",...,"DHS","EDC","DCA","NYC311-PRD","OTI"] -- indices:
[0,1,2,3,4,...,5,7,3,3,6]]
[-- dictionary:
Agency_Name: [ "Department of Education","Taxi and Limousine Commission","Department of Housing Preservation and Development","New York City Police Department","Department of Buildings",...,"Department of Sanitation","Department of Homeless Services","Economic Development Corporation","Department of Consumer Affairs","Office of Technology and Innovation"] -- indices:
[0,1,2,3,4,...,5,7,3,3,6]]
[-- dictionary:
Complaint_Type: [ "School Maintenance","For Hire Vehicle Complaint","UNSANITARY CONDITION","Illegal Parking","Building/Use",...,"Building Drinking Water Tank","Taxi Licensee Complaint","Peeling Paint","Green Taxi Report","Bus Stop Shelter Placement"] -- indices:
[0,1,2,3,4,...,30,80,3,7,16]]
["Other School Condition","Driver Complaint - Non Passenger","PESTS","Posted Parking Sign Violation","Illegal Conversion Of Residential Building/Space",...,"For One Address","Lead Kit Request (Residential) (L10)","Commercial Overnight Parking","Loud Music/Party","Post"]]
Descriptor: [[-- dictionary:
Location_Type: [ "School","Street","RESIDENTIAL BUILDING","Street/Sidewalk",null,...,"Building Entrance","Public Stairs","Cafeteria - Public School","Commercial","Speed Reducer"] -- indices:
[0,1,2,3,null,...,1,null,3,6,null]]
[10075,10459,10455,11365,...,null,null,null,null,null]]
Incident_Zip: [[null,"995 MADISON AVENUE","1010 BRYANT AVENUE","541 UNION AVENUE","69-35 166 STREET",...,"209 WEST 93 STREET","220 WEST 16 STREET","38 STERLING AVENUE","243 HENRY STREET",null]]
Incident_Address: [[null,
...>>> pl.read_ipc('./data/311_half_Mar_2023.arrow', use_pyarrow=True)
125_826, 45)
shape: (
┌──────────┬────────────┬───────────┬────────┬───┬────────────┬────────────┬────────────┬────────────┐
│ Unique_K ┆ Created_Da ┆ Closed_Da ┆ Agency ┆ … ┆ Community_ ┆ Borough_Bo ┆ City_Counc ┆ Police_Pre │--- ┆ ┆ Districts ┆ undaries ┆ il_Distric ┆ cincts │
│ ey ┆ te ┆ te ┆ --- ┆ --- ┆ --- ┆ cat ┆ ┆ --- ┆ --- ┆ ts ┆ --- │
│ --- ┆ i8 │
│ i32 ┆ datetime[m ┆ datetime[ ┆ ┆ ┆ i8 ┆ i8 ┆
│ ┆ s] ┆ ms] ┆ ┆ ┆ ┆ ┆ i8 ┆ │
╞══════════╪════════════╪═══════════╪════════╪═══╪════════════╪════════════╪════════════╪════════════╡57000688 ┆ 2023-03-09 ┆ 2023-03-1 ┆ DOE ┆ … ┆ 2 ┆ 2 ┆ 27 ┆ 39 │
│ 15:05:25 ┆ 3 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ 08:21:11 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆ 56993837 ┆ 2023-03-08 ┆ 2023-05-0 ┆ TLC ┆ … ┆ 23 ┆ 4 ┆ 51 ┆ 11 │
│ 21:27:34 ┆ 3 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ 07:17:20 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆ 57049479 ┆ 2023-03-15 ┆ 2023-03-2 ┆ HPD ┆ … ┆ 8 ┆ 5 ┆ 43 ┆ 24 │
│ 19:26:41 ┆ 1 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ 21:16:53 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆ 57056381 ┆ 2023-03-16 ┆ 2023-03-1 ┆ NYPD ┆ … ┆ 49 ┆ 5 ┆ 35 ┆ 23 │
│ 23:58:53 ┆ 7 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ 02:44:31 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │56954525 ┆ 2023-03-04 ┆ 2023-03-0 ┆ DEP ┆ … ┆ 12 ┆ 4 ┆ 10 ┆ 6 │
│ 16:23:00 ┆ 8 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ 10:50:00 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆ 57132547 ┆ 2023-03-24 ┆ 2023-03-2 ┆ NYPD ┆ … ┆ 30 ┆ 1 ┆ 14 ┆ 76 │
│ 21:57:44 ┆ 4 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ 23:18:11 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆ 57102740 ┆ 2023-03-21 ┆ 2023-03-2 ┆ NYPD ┆ … ┆ 70 ┆ 4 ┆ 32 ┆ 4 │
│ 05:10:12 ┆ 1 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ 05:16:30 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆ 56950149 ┆ 2023-03-04 ┆ 2023-03-0 ┆ DOT ┆ … ┆ 41 ┆ 3 ┆ 16 ┆ 61 │
│ 00:27:00 ┆ 4 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ 10:15:00 ┆ ┆ ┆ ┆ ┆ ┆ │
│ ┆ ┆ └──────────┴────────────┴───────────┴────────┴───┴────────────┴────────────┴────────────┴────────────┘
The columns that are described as <fct>
in the glimpse
output from R (and dictionary
in the output from fea.read_table
and cat
(for “categorical”) in the Polars data frame) are factor
representations, consisting of a (short) vector of the unique values, called the levels
of the factor, and an integer vector of indices into the levels. The conversion of a column of character strings, many of which are repetitions, to a factor or “dictionary encoded” representation, can result in considerable storage saving, at the expense of a trivial amount of processing. For example, storing the Agency_Name
as a vector of strings takes about 34 MB but, after dictionary encoding, it takes up about 1/8 MB.
The output from fea.read_table
is often the most informative representation of the structure of the table.