Exploration of clustering techniques over geographical and other dimensions

This post explains attempts to define more homogeneous neighborhoods for a project about predicting gentrification in Philadelphia. For more context on the project as a whole, see this post.

Neighborhoods are a defining characteristic of cities, and many have their own evolving subcultures and lifestyles. Computationally, defining the boundaries of a neighborhood can be very difficult. Measures aggregated over predefined spatial boundaries can misrepresent the underlying data, an issue known as the Modifiable Areal Unit Problem (MAUP). Instead, clustering techniques can be used to delineate more internally homogeneous regions for analysis. …

Leveraging spatial indices for geospatial feature engineering

This post explains k-nearest neighbors as a feature engineering technique in geospatial machine learning for a project about predicting gentrification in Philadelphia. For more context on the project as a whole, see this post.

Many geospatial datasets record the locations of specific events, such as incidents of crime. To use this data as features for house price prediction, the city-wide event data had to be converted into a consistent, per-parcel feature. One technique I used was computing the average distance to the k nearest neighbors of each event. For example…
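As a rough illustration of the idea, here is a minimal sketch that assumes the feature is the mean distance from each parcel to its k nearest events. That reading, the function name `mean_knn_distance`, and the coordinates are all assumptions made for illustration, not the code or data from the project. It uses a brute-force scan to stay self-contained; on city-scale data a spatial index such as a k-d tree would avoid comparing every parcel with every event.

```python
import math

def mean_knn_distance(parcels, events, k=3):
    """Return, for each parcel (x, y), the mean Euclidean distance to its k nearest events.

    Hypothetical helper: assumes coordinates are in a projected system
    (e.g. meters), so straight-line distance is meaningful.
    """
    if k < 1 or k > len(events):
        raise ValueError("k must be between 1 and the number of events")
    features = []
    for px, py in parcels:
        # Brute force: distance from this parcel to every event, sorted ascending.
        dists = sorted(math.hypot(px - ex, py - ey) for ex, ey in events)
        # Average only the k smallest distances.
        features.append(sum(dists[:k]) / k)
    return features

# Made-up coordinates for illustration only (not real Philadelphia data).
parcels = [(0.0, 0.0), (10.0, 0.0)]
events = [(3.0, 4.0), (0.0, 1.0), (6.0, 8.0), (10.0, 2.0)]
crime_dist = mean_knn_distance(parcels, events, k=2)
```

Each entry of `crime_dist` is one per-parcel value that could be joined back to the parcel table as a feature column. A smaller value means events are concentrated nearby, and the choice of k controls how local the measure is.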

Forecasting gentrification in Philadelphia to inform affordable housing policy


Over the last eight years, the Philadelphia housing market has recovered from recession and is primed to accelerate. At the same time, thousands of impoverished tenants struggle to find and maintain reasonably priced housing. Affordable housing initiatives have drawn criticism over the placement of new housing projects, particularly the concentration of new developments in already low-income areas. While one can argue that locating affordable housing projects in these areas keeps tenants close to their existing communities, it also concentrates them away from possible economic growth and social mobility.

Gentrification is a major source of neighborhood change…

A journey through the SoundCloud network

Though SoundCloud’s journey has been anything but stable, one key advantage it has over other music streaming services is its combination of content distribution and social networking. As a novice producer starting out myself, I asked the question on everyone’s mind: How do songs go viral? Using networks, I tried to figure it out.

My original hypothesis was based on the idea of “Mavens”, described in Malcolm Gladwell’s The Tipping Point.

“Mavens are […] information specialists who we rely on to connect us to new information.”

In this case, a Maven would be the…

Prateek Agarwal

Current Data Science Masters student at University of Pennsylvania
