Exploration of clustering techniques over geographical and other dimensions
This post explains attempts to define more homogeneous neighborhoods for a project about predicting gentrification in Philadelphia. For more context on the project as a whole, see this post.
Neighborhoods are an important characteristic of cities, many with evolving subcultures and lifestyles within them. Computationally, defining the boundaries of a neighborhood can be very difficult. Measures aggregated over predefined spatial boundaries can misrepresent the underlying data, an issue known as the Modifiable Areal Unit Problem (MAUP). Instead, clustering techniques can be used to delineate more internally homogeneous regions for analysis. …
Leveraging spatial indices for geospatial feature engineering
This post explains k-nearest neighbors as a feature engineering technique in geospatial machine learning for a project about predicting gentrification in Philadelphia. For more context on the project as a whole, see this post.
Many geospatial datasets record the locations of specific events, such as incidents of crime. To use this data as features for house price prediction, I had to convert the city-wide event data into a consistent, per-parcel feature. One technique I used was computing the average distance from each parcel to its k nearest events. For example…
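As a rough illustration of the idea of averaging the distance to the k nearest events, here is a minimal sketch in Python. It is not the code from the post: the function name `mean_knn_distance`, the sample coordinates, and the choice of k are all made up for illustration, and it uses a brute-force search for readability where a real pipeline would more likely query a spatial index such as a k-d tree. It also assumes the coordinates are already in a projected system, so that Euclidean distance is meaningful.

```python
import math


def mean_knn_distance(parcel, events, k):
    """Average Euclidean distance from one parcel to its k nearest events.

    Brute force: computes the distance to every event, then keeps the k
    smallest. If there are fewer than k events, all of them are used.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    distances = sorted(math.dist(parcel, event) for event in events)
    nearest = distances[:k]
    if not nearest:
        return float("nan")  # no events at all, so the feature is undefined
    return sum(nearest) / len(nearest)


# Hypothetical projected coordinates (for example, metres), not real data.
parcels = [(0.0, 0.0), (10.0, 10.0)]
crime_events = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0), (10.0, 11.0)]

# One feature value per parcel.
features = [mean_knn_distance(p, crime_events, k=2) for p in parcels]
```

Each parcel ends up with a single number that shrinks as events cluster around it, which is what makes the result usable as a consistent per-parcel feature.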
Over the last eight years, the Philadelphia housing market has recovered from the recession and is primed to accelerate. At the same time, thousands of impoverished tenants struggle to find and keep reasonably priced housing. Affordable housing initiatives have drawn criticism over where new housing projects are placed, particularly the concentration of new developments in already low-income areas. While one can argue that locating affordable housing projects in these areas keeps tenants close to their existing communities, it also concentrates them away from possible economic growth and social mobility.
A journey through the SoundCloud network
Though SoundCloud’s journey has been anything but stable, one key advantage it has over other music streaming services is a prime combination of content distribution and social networking. As a novice producer starting out myself, I asked the question on everyone’s mind: How do songs go viral? Using networks, I tried to figure it out.
My original hypothesis was based on the idea of “Mavens”, introduced in Malcolm Gladwell’s The Tipping Point.
“Mavens are […] information specialists who we rely on to connect us to new information.”
In this case, a Maven would be the…
Current Data Science Master’s student at the University of Pennsylvania