
Moving from New York to Toronto

8/18/2020

​Battle of the Neighbourhoods

1. Introduction

Moving from one country, city or neighbourhood to another is something most of us have to deal with at some point in our lives, whether for personal, work or other reasons. We often make only basic enquiries into the positives and negatives of a new neighbourhood before moving, but today detailed, objective information about neighbourhoods is available and can be leveraged to ensure a neighbourhood meets our criteria in terms of local conveniences, purchase and rental prices, proximity to schools and more. Often, though, we are leaving a familiar and comfortable neighbourhood where we have lived for some time and are looking for something similar to what we know.

The purpose of this project is to characterise the neighbourhoods of New York and Toronto and classify them as a means of gaining insight into their similarity against key metrics often considered when moving. In this way, someone theoretically moving from their favourite neighbourhood in New York can instantly narrow their search to similar neighbourhoods in Toronto.

2. Methodology

2.1 Data

To be able to generate these insights, data on the various neighbourhoods and postcodes within both New York and Toronto will be required. This has already been provided through the Applied Data Science coursework as follows:
​
New York Neighbourhoods - https://geo.nyu.edu/catalog/nyu_2451_34572
Toronto Neighbourhoods - https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
​

Furthermore, for the characterisation of these neighbourhoods, data on local venues and other objective metrics describing the neighbourhood will be required and can be retrieved from the Foursquare API. An example of the Foursquare API output processed into tabular format is as follows:
Picture
The Foursquare API requires latitude and longitude as an input to determine the features within a specified search radius. It is therefore necessary to map the New York and Toronto neighbourhood postcode data to their equivalent latitude and longitude. This requires the retrieval of an additional dataset provided by Cognitive Class AI.

Toronto Geospatial dataset - http://cocl.us/Geospatial_data
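
As a rough illustration of why coordinates are needed, a venue request for a single neighbourhood could be built along the following lines. This is a minimal sketch, not the code used for this project: it assumes the legacy Foursquare v2 `venues/explore` endpoint, placeholder credentials, and hypothetical values for `limit` and the version date. The 500 m radius is the one used later in this analysis. No request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the legacy Foursquare v2 "explore" endpoint (the post does not name one).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius_m=500, limit=100, version="20200818"):
    """Return the request URL for venues around one latitude/longitude pair."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (YYYYMMDD), hypothetical value
        "ll": f"{lat},{lng}",            # centre of the search
        "radius": radius_m,              # search radius in metres
        "limit": limit,                  # maximum venues returned, hypothetical value
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Illustrative coordinates only.
url = build_explore_url(43.7533, -79.3297, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"])
```

The same function would be called once per neighbourhood, which is why every row of both datasets needs a latitude and longitude.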

2.2 Processing

For the purpose of extracting insights into the similarity of neighbourhoods in New York to those in Toronto it was first necessary to ensure that all the necessary data was at hand. This included the following information on each neighbourhood:
​
  • Name
  • Postcode
  • Latitude
  • Longitude
The New York neighbourhood dataset downloaded from NYU contained all the necessary data, including latitude and longitude, so no further data retrieval was needed. The neighbourhood dataset for Toronto did not include latitude and longitude values, so these had to be retrieved from a separate source. Geospatial data on Toronto was provided by Cognitive Class AI and allowed the neighbourhood coordinates to be looked up by postcode to complete the Toronto dataset.
Picture
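
The postcode lookup described above amounts to a join on the postcode column. The sketch below shows the idea with the standard library only; in practice a dataframe merge would be the more usual choice. The column names and the sample rows are assumptions made for illustration (`M0X` is a made-up code with no coordinates), not the actual contents of either dataset.

```python
import csv
import io

# Illustrative stand-ins for the two Toronto sources (assumed column names and values).
NEIGHBOURHOODS_CSV = """Postcode,Neighbourhood
M3A,Parkwoods
M4A,Victoria Village
M0X,Hypothetical Area
"""

GEOSPATIAL_CSV = """Postal Code,Latitude,Longitude
M3A,43.7533,-79.3297
M4A,43.7259,-79.3156
"""


def attach_coordinates(neighbourhoods_csv, geospatial_csv):
    """Join neighbourhood rows to coordinates by postcode.

    Returns (merged_rows, postcodes_without_coordinates).
    """
    coords = {
        row["Postal Code"]: (float(row["Latitude"]), float(row["Longitude"]))
        for row in csv.DictReader(io.StringIO(geospatial_csv))
    }
    merged, missing = [], []
    for row in csv.DictReader(io.StringIO(neighbourhoods_csv)):
        code = row["Postcode"]
        if code in coords:
            lat, lng = coords[code]
            merged.append({
                "Name": row["Neighbourhood"],
                "Postcode": code,
                "Latitude": lat,
                "Longitude": lng,
            })
        else:
            missing.append(code)  # keep track of rows that could not be completed
    return merged, missing


merged, missing = attach_coordinates(NEIGHBOURHOODS_CSV, GEOSPATIAL_CSV)
print(merged)
print("No coordinates for:", missing)
```

Reporting the unmatched postcodes separately makes it easy to check that no neighbourhood is silently dropped from the Toronto dataset.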
Using the Folium Python package it was then possible to visualise the neighbourhood datasets for New York and Toronto in their entirety.
​
New York
Picture
Toronto
Picture
To gauge the similarity in the lifestyle associated with living in neighbourhoods in New York and Toronto, an analysis was carried out of the venue types located within a 500 m radius of each neighbourhood's centre, as defined by its coordinates. These data were retrieved from the Foursquare API for all neighbourhoods within New York and Toronto.
Picture
To analyse neighbourhood similarity across both regions, the separate venue datasets were concatenated and one-hot encoded, then aggregated by neighbourhood to give the frequency of each venue category. The assumption is that a higher venue frequency in a neighbourhood is linked to the lifestyle habits of its residents (e.g. a higher frequency of coffee shops is likely to occur where a higher number of residents want to visit coffee shops).
Picture
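
One-hot encoding each venue and then averaging per neighbourhood is equivalent to taking each category's share of the venues in that neighbourhood. A minimal standard-library sketch of that calculation follows; the venue rows are invented for illustration, and a real run would use the concatenated New York and Toronto venue data.

```python
from collections import Counter, defaultdict

# Hypothetical (neighbourhood, venue category) rows.
venues = [
    ("Neighbourhood A", "Coffee Shop"),
    ("Neighbourhood A", "Coffee Shop"),
    ("Neighbourhood A", "Bar"),
    ("Neighbourhood A", "Bakery"),
    ("Neighbourhood B", "Park"),
    ("Neighbourhood B", "Coffee Shop"),
]


def venue_frequencies(venue_rows):
    """Return (sorted category list, {neighbourhood: frequency vector})."""
    categories = sorted({category for _, category in venue_rows})
    counts = defaultdict(Counter)
    for neighbourhood, category in venue_rows:
        counts[neighbourhood][category] += 1  # sum of the one-hot columns
    table = {}
    for neighbourhood, counter in counts.items():
        total = sum(counter.values())
        # Mean of the one-hot columns = share of venues in each category.
        table[neighbourhood] = [counter[c] / total for c in categories]
    return categories, table


categories, frequencies = venue_frequencies(venues)
print(categories)
for name, vector in frequencies.items():
    print(name, vector)
```

If the same neighbourhood name occurs in both cities, the key would need to include the city as well so that the two are not merged.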
This dataset could then be clustered using the k-means algorithm to determine groups of like neighbourhoods, and therefore an appropriate label for each neighbourhood in the entire dataset. For the purposes of this exercise, 10 clusters were assigned across the entire dataset to better account for the diversity of the two cities.
Picture
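
To make the clustering step concrete, here is a small dependency-free sketch of k-means (Lloyd's algorithm) applied to venue-frequency vectors. It is not the implementation used for this project, which would more typically rely on a library. The neighbourhood names and vectors are hypothetical, and the deterministic farthest-point initialisation is a simplification chosen so that the example is reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Cluster vectors into k groups; returns (labels, centroids)."""
    # Simplified initialisation: start from the first vector, then repeatedly
    # add the vector farthest from the centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(v, centroids[i]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical frequency vectors over three venue categories.
names = ["NY-A", "NY-B", "TO-A", "TO-B"]
vectors = [
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.7, 0.3, 0.0],
    [0.0, 0.2, 0.8],
]
labels, centroids = kmeans(vectors, k=2)
cluster_of = dict(zip(names, labels))
print(cluster_of)
```

Because both cities are clustered together, a New York neighbourhood and a Toronto neighbourhood can end up with the same label, which is what makes the cross-city comparison possible.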

With this approach it was then possible to visualise similarities in neighbourhoods between the two cities, find neighbourhoods in Toronto similar to those in New York (and vice versa) and understand the frequency of venues associated with each cluster.

3. Results

Using this data, the neighbourhoods across the entire New York/Toronto dataset were successfully clustered into 10 groups with similar venue frequency distributions. Visualising the cluster labelling in both cities we see the following:
​
New York
Picture
Where:
Cluster 0, Cluster 1, Cluster 2, Cluster 3, Cluster 4, Cluster 5, Cluster 6, Cluster 7, Cluster 8, Cluster 9
Toronto
Picture
Where:
Cluster 0, Cluster 1, Cluster 2, Cluster 3, Cluster 4, Cluster 5, Cluster 6, Cluster 7, Cluster 8, Cluster 9
This allows us to see which neighbourhoods are similar between the two cities and how similar neighbourhoods are distributed within each city. For the purpose of finding a neighbourhood in Toronto with a similar lifestyle to one in New York, however, it helps to understand the makeup of the venue distribution within the labelled clusters.
Picture
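
One way to summarise the makeup of each cluster is to average the frequency vectors of its members and list the highest-ranking venue categories. The sketch below assumes this approach; the categories, vectors and labels are hypothetical and do not reproduce the results shown above.

```python
def top_categories_per_cluster(categories, vectors, labels, top_n=3):
    """Return {cluster label: top_n categories by mean frequency}."""
    by_cluster = {}
    for vector, label in zip(vectors, labels):
        by_cluster.setdefault(label, []).append(vector)

    summary = {}
    for label, members in sorted(by_cluster.items()):
        # Mean frequency of each category across the cluster's neighbourhoods.
        means = [sum(col) / len(members) for col in zip(*members)]
        ranked = sorted(zip(categories, means), key=lambda pair: pair[1], reverse=True)
        summary[label] = [name for name, _ in ranked[:top_n]]
    return summary


# Hypothetical inputs: three categories, three neighbourhoods, two clusters.
categories = ["Beach", "Coffee Shop", "Park"]
vectors = [
    [0.0, 0.75, 0.25],
    [0.25, 0.5, 0.25],
    [0.75, 0.0, 0.25],
]
labels = [0, 0, 1]

summary = top_categories_per_cluster(categories, vectors, labels, top_n=2)
print(summary)
```

A cluster whose top categories share an obvious theme (for example beaches and parks) is easier to interpret as a lifestyle than one whose top categories are unrelated.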
4. Discussion

As shown in the results section of this report, a meaningful categorisation of the neighbourhoods of New York and Toronto was successfully carried out. While some clusters lack a clear theme in venue frequency that might reflect a lifestyle, the venues and locations of clusters 2, 3, 6, 8 and 9 describe quite well lifestyles that are, respectively, diverse, near open spaces, close to public transport, good for metropolitan coffee and restaurants, and suited to beach-going.

Take a hypothetical situation where a New Yorker lives in the outer suburb of Bayside, where they enjoy many of the benefits of the metropolitan lifestyle, including proximity to a number of coffee shops, bars, restaurants and bakeries, while still being somewhat removed from the city. We can then use the cluster mapping to find candidate neighbourhoods in Toronto.
​
Bayside
Picture
Immediately we can see from the Toronto cluster map that the cluster 8 neighbourhoods of Bay Shores and Cliffside West might be suitable locations to move to given their similarity of venue distribution and relative remoteness from the centre of the city.

Bay Shores
Picture
​Cliffside West
Picture
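
The lookup performed by eye on the cluster maps can also be expressed as a simple filter on the cluster labels. In this sketch the cluster 8 label for Bay Shores and Cliffside West comes from the discussion above, Bayside's label is inferred from it, and "Example Heights" is a made-up row added only to show that other clusters are excluded.

```python
# (city, neighbourhood, cluster label)
labelled = [
    ("New York", "Bayside", 8),         # label inferred from the discussion
    ("Toronto", "Bay Shores", 8),
    ("Toronto", "Cliffside West", 8),
    ("Toronto", "Example Heights", 3),  # hypothetical row
]


def similar_neighbourhoods(rows, home_city, home_name, target_city):
    """Return target-city neighbourhoods sharing a cluster with the home one."""
    home_clusters = {
        cluster for city, name, cluster in rows
        if city == home_city and name == home_name
    }
    if not home_clusters:
        raise KeyError(f"{home_name!r} not found in {home_city!r}")
    return sorted(
        name for city, name, cluster in rows
        if city == target_city and cluster in home_clusters
    )


candidates = similar_neighbourhoods(labelled, "New York", "Bayside", "Toronto")
print(candidates)
```

Sharing a cluster only narrows the search; factors such as distance from the city centre still have to be judged from the map.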
5. Conclusion

​​
By applying k-means clustering, it was possible to determine the similarity of neighbourhoods in New York and Toronto based on the venue distribution within a 500 m radius of each neighbourhood's centre. With more development, a tool such as this could prove useful in the search for a new home when movers are looking to relocate to a similar neighbourhood. Future development could introduce new data, including average house and rent prices and crime rates, and perform parameter sweeps on the venue radius, the number of clusters and the clustering method.