Remote Sensing Blog

Which Scientific Approach Does Spatial Data Science Follow? Causality or Data Mining?

09 Aug 2019

In science, we use causality to answer "why", whereas data science largely relies on data mining and correlation. Spatial data science combines spatial science and data science, so the question arises: which scientific approach do we use to answer questions in spatial data science, causality or data mining? In fact, spatial data science employs both approaches to answer spatial questions.

In the following, we look at a few spatial data science problems and how each might be solved using the causality and data mining approaches:

Estimating Store Revenue:

Imagine we want to estimate the revenue of a store at a given spatial location. How could we use either the causality approach or the data mining approach?

The causality approach would investigate the supply of and demand for the products, focusing on the location and capacity of nearby competing stores and on mapping the spatial distribution of potential customers. For that, we could apply service accessibility models such as the gravity model or the two-step floating catchment area method. We can then estimate the customer shares of nearby stores and eventually arrive at a revenue estimate for the given location.
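To make the gravity model step concrete, here is a minimal sketch of a Huff-style formulation, in which each store's pull on a customer zone grows with its attractiveness and falls with distance. The function names, the distance exponent `beta=2.0`, the use of floor area as attractiveness, and all the numbers are assumptions for illustration only.

```python
def huff_shares(attractiveness, distances_km, beta=2.0):
    """Share of one zone's customers going to each store.

    Each store's weight is attractiveness / distance**beta, and the
    weights are normalised so the shares sum to 1.
    Assumes every distance is strictly positive.
    """
    weights = [a / (d ** beta) for a, d in zip(attractiveness, distances_km)]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_revenue(store_index, attractiveness, zones, beta=2.0):
    """Sum, over all customer zones, the spending captured by one store."""
    revenue = 0.0
    for zone in zones:
        shares = huff_shares(attractiveness, zone["distances_km"], beta)
        revenue += zone["spending"] * shares[store_index]
    return revenue


# Hypothetical inputs: store 0 is the new store, store 1 is a competitor.
attractiveness = [1000.0, 2000.0]  # e.g. floor area in square metres (assumed)
zones = [
    {"spending": 100_000.0, "distances_km": [1.0, 2.0]},
    {"spending": 50_000.0, "distances_km": [2.0, 1.0]},
]

new_store_revenue = estimate_revenue(0, attractiveness, zones)
competitor_revenue = estimate_revenue(1, attractiveness, zones)
print(round(new_store_revenue, 2), round(competitor_revenue, 2))
```

In practice the exponent and the attractiveness measure would have to be calibrated against observed shopping behaviour; the sketch only shows how customer shares turn into a revenue figure.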

The data mining approach would require us to collect data on all existing stores that have features similar to the new store and then use the revenue of those stores to estimate the revenue of the new one. Another data mining approach would be to build a regression model. The target variable of the regression model would be revenue, and the input variables would be characteristics of the stores and their locations. Once we have fitted the regression model, we could use it to estimate store revenue.

The typical data mining approach works better for small retail businesses, whereas the causality approach works better for larger stores such as grocery stores and department stores.
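The first data mining variant described above, estimating from similar existing stores, can be sketched as a nearest-neighbour average. This is only one possible reading of "similar": the two features (floor area and nearby population), the min-max scaling, the choice of `k`, and the sample figures are all assumptions.

```python
import math


def knn_revenue(existing, new_features, k=3):
    """Average the revenue of the k existing stores most similar to the new one.

    existing: list of (features, revenue) pairs, features being a tuple of numbers.
    Features are min-max scaled so that no single unit dominates the distance.
    """
    n = len(new_features)
    lows = [min(f[i] for f, _ in existing) for i in range(n)]
    highs = [max(f[i] for f, _ in existing) for i in range(n)]

    def scale(features):
        # A constant feature has zero range; divide by 1.0 in that case.
        return [
            (features[i] - lows[i]) / ((highs[i] - lows[i]) or 1.0)
            for i in range(n)
        ]

    target = scale(new_features)
    ranked = sorted(existing, key=lambda item: math.dist(scale(item[0]), target))
    nearest = ranked[:k]
    return sum(revenue for _, revenue in nearest) / len(nearest)


# Hypothetical stores: (floor area in m^2, population within 1 km) -> annual revenue
existing_stores = [
    ((100, 5000), 200_000),
    ((120, 5500), 240_000),
    ((500, 20000), 900_000),
    ((480, 21000), 950_000),
]

estimate = knn_revenue(existing_stores, (110, 5200), k=2)
print(estimate)
```

Swapping the averaging step for a fitted regression model gives the second variant; the data collection step is the same in both cases.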

Estimating Travel Time:

What time should a taxi leave Square One, Mississauga, in order to arrive at Toronto Union Station at 9:00 a.m. on Monday?

The causality approach would involve the steps of trip generation and distribution to build an origin-destination table, modal split to model the choice of transportation mode, and trip assignment to roads, from which the travel time estimate is obtained.

The data mining approach would use historical taxi trajectory data sets and find similar trajectories that cover the origin, Square One, Mississauga, and the destination, Toronto Union Station, under similar conditions such as the time (9:00 a.m.), the day (Monday), the weather, and the occurrence of any special events. Based on trajectory pattern mining and matching, the travel time can be estimated as well.
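A heavily simplified sketch of this matching idea follows. It assumes the trajectories have already been reduced to trip records with an origin, a destination, an arrival time, and a duration; the record layout, the one-hour window, and the sample trips are hypothetical, and real trajectory mining would match on the route itself rather than on labels.

```python
from datetime import datetime, timedelta
from statistics import median


def suggest_departure(trips, origin, destination, arrive_by, hour_window=1):
    """Suggest a departure time from the median duration of similar past trips.

    A past trip counts as similar if it has the same origin and destination,
    arrived on the same weekday, and arrived within hour_window hours of the
    target hour. Returns None when no similar trip is found.
    """
    similar = [
        t["duration_min"]
        for t in trips
        if t["origin"] == origin
        and t["destination"] == destination
        and t["arrival"].weekday() == arrive_by.weekday()
        and abs(t["arrival"].hour - arrive_by.hour) <= hour_window
    ]
    if not similar:
        return None
    return arrive_by - timedelta(minutes=median(similar))


# Hypothetical trip records; durations are invented for illustration.
trips = [
    {"origin": "Square One", "destination": "Union Station",
     "arrival": datetime(2019, 7, 22, 9, 5), "duration_min": 50},   # Monday
    {"origin": "Square One", "destination": "Union Station",
     "arrival": datetime(2019, 7, 29, 8, 55), "duration_min": 60},  # Monday
    {"origin": "Square One", "destination": "Union Station",
     "arrival": datetime(2019, 8, 5, 9, 0), "duration_min": 55},    # Monday
    {"origin": "Square One", "destination": "Union Station",
     "arrival": datetime(2019, 8, 6, 9, 0), "duration_min": 40},    # Tuesday
    {"origin": "Square One", "destination": "Union Station",
     "arrival": datetime(2019, 8, 5, 14, 0), "duration_min": 35},   # Monday, off-peak
]

departure = suggest_departure(
    trips, "Square One", "Union Station", datetime(2019, 8, 12, 9, 0)
)
print(departure)
```

Weather and special events could be added as further filter conditions in the same way, provided the trip records carry those attributes.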

Predicting Flooding Locations:

Which areas of the city of Toronto would be flooded at a precipitation rate of one inch per hour for five hours?

We can use either a conventional hydrological process model with spatial data such as a digital elevation model, which is a causality approach, or a heuristic data mining approach with pattern matching, as explained in the slide.

Predicting Air Pollution:

Imagine we want to predict the air pollution level at the University of Toronto at any given time in the future.

The causality approach would feed a pollutant emission inventory and weather conditions into a computational fluid dynamics model to estimate the air pollution level at a given location and time.

The data mining approach would train a machine learning model that takes in predictors and historical observations and learns a mapping function to the air pollution level.
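As a minimal sketch of "learning a mapping function", the example below fits an ordinary least squares model that maps two predictors to a pollution level. Linear regression stands in here for whatever machine learning model would actually be used, and the predictors (previous-hour concentration and wind speed), the units, and the synthetic observations are assumptions.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X) w = X'y.

    Returns [intercept, w1, w2, ...]. Solved with Gauss-Jordan elimination
    and partial pivoting; assumes the predictors are not collinear.
    """
    rows = [[1.0] + list(x) for x in X]  # prepend a constant for the intercept
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def predict(weights, x):
    """Apply the learned mapping to one set of predictor values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Synthetic history: (previous-hour level, wind speed) -> level one hour later.
# Generated from level = 10 + 0.5 * previous - 2 * wind, purely for illustration.
X = [(20, 1), (30, 2), (40, 1), (25, 3), (35, 4)]
y = [18.0, 21.0, 28.0, 16.5, 19.5]

weights = fit_linear(X, y)
forecast = predict(weights, (30, 2))
print([round(w, 3) for w in weights], round(forecast, 3))
```

Unlike the computational fluid dynamics route, nothing in this model encodes how pollutants disperse; it only reproduces whatever relationship is present in the historical observations.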