Weekly Meeting with Prof. Wang, 09/26/2012

  • Ideas
  1. The roles of metadata in integrating CyberGIS-based web portals
  • Discussion
  1. Fundamental semantic frameworks are necessary to facilitate the integration of CyberGIS-based geospatial portals.
  2. A starting point for defining such semantic frameworks is to carry out case studies, develop semantic frameworks for each case, and identify the common elements of those frameworks.
  3. To develop semantic frameworks in an objective manner, we need to establish principles to which those frameworks can conform. Example principles are as follows: 1) with such frameworks, human involvement in the process of software integration should be minimized; 2) such frameworks should guide or improve software integration; and 3) such frameworks not only should support the mechanical process of software integration but also should help humans understand the integration process by means of provenance.

Weekly Meeting with Prof. Wang, 08/20/2012

  • Agenda
  1. Broad ideas for new research
  • Discussion points
  1. Are these ideas worth pursuing?
  • Ideas
  1. An analytic framework for spatial exploration of social media data
    • Myung-Hwa (MH): Current analyses of social media data (e.g., georeferenced data from Wikipedia, tweets, etc.) rarely go beyond visualizations such as heatmaps. These kinds of visualization always raise the so-what question. To obtain meaningful outputs from social media data, we need to introduce a systematic (?) analytic framework for spatial exploration of social media data. The first exploratory step is to examine relationships among spatial proximity, cultural association (derived from social media data), and other socioeconomic variables (e.g., economic development). The spatial analysis literature offers various ways of quantifying spatial proximity. Examples include physical distance, inter-regional flows, travel distance, and social-network-based distances. One research direction is to examine how the association between spatial proximity and cultural similarity varies across these measures of spatial proximity. Also, by using spatial proximity and cultural similarity as the two axes of a plane, we can map how the position of a city or place on that plane has changed over time. When dealing with multiple spatial units, we can also group those units according to their traces on the plane. The main idea is that we need a well-framed exploratory framework for analyzing the spatial dimensions beneath social media data, rather than rigorous confirmatory frameworks.
    • Shaowen (SW): Before analyzing social media data, we first need to address the issue of uncertainty in the data. The locations of places mentioned in social media data cannot be determined precisely because the text lacks context and current geocoding procedures have limited capability to disambiguate it. Also, texts in social media data usually refer to “places” rather than “locations”. Thus, an important issue is how to demarcate and represent the boundaries of places and how to incorporate those boundaries into analytic frameworks.
  2. A generic solution for integrating user interfaces
    • MH: The current approach to integrating user interfaces is not generic, meaning each integration has to be implemented differently from one application to another. A better approach would be to create a mashup-like framework. In this framework, a team member of the CyberGIS project contributes his/her user interfaces together with descriptions of the related user interactions and supporting backend services. This contributed artifact becomes a component of a mashup API. Any user of this API can then mix and match the contributed components as they want. The benefit of the proposed approach is that it allows for the creation and management of reusable APIs while contributed user interfaces can still be customized to satisfy the requirements of individual integration work. In addition to this mashup approach, the mechanisms for accessing the CyberGIS gateway can range from web portals to desktop clients (e.g., Google Drive).
    • SW: This idea can be generalized further into the concept of a “user interface template.” The template idea includes taxonomies of user interfaces and relevant user interactions that are frequently required for spatial analysis. Abstract user interface templates can be defined for each analysis task and can be further refined into specific implementations that fit each context of use. A challenge here is to specify the taxonomies with the consensus of the spatial analysis communities.
    • Literature
  1. Semantic approaches to the RMMS application
    • MH: Some users of the RMMS application contribute data voluntarily. Also, the main role of the RMMS application is to integrate various types of spatial data. Can we renovate some components of the RMMS application under the frameworks of volunteered geographic information (VGI) and the semantic web? For example, we can apply semantic-web-based approaches to data integration and show their benefits in the current use cases of the RMMS application.
    • SW: The connection between the RMMS app and the VGI framework is rather weak. How about focusing on tracing temporal changes in the data managed by the RMMS? The framework of data provenance can be introduced here. Related questions are database modelling for tracing temporal changes as well as documentation and visualization of such changes. Also, we need to show what values these approaches add to the RMMS application.

Weekly Meeting with Prof. Wang, 07/22/2012

  • Agenda
  1. Developing research problems
  • Discussion points
  1. Is the following idea worth pursuing? If not, what other approaches should I take to find a better idea?
  2. How can I develop the abstract idea into solid research problems?
  • Idea: A CyberGIS-enabled inference framework for detecting space-time clusters on a network space
  1. As research interest in network-constrained phenomena increases, so does the number of spatial analysis methods that explicitly consider the characteristics of a network space. Multiple methods for analyzing global clustering and local clusters on a Euclidean space have been extended to a network space. Examples include the global K-function, local indicators of spatial association, and kernel density estimation. In spite of these advancements, statistical methods for detecting space-time clusters on a network space are still not available in the current literature of geographic information science. This lack of methods is attributable to two factors. First, it has not been examined how existing null models of space-time point processes (e.g., inhomogeneous Poisson and Cox processes) can be realized or extended to reflect space-time behaviors and interactions of observations (as in agent-based modeling) or distributions of spatial covariates on a network space. In other words, we do not have solid frameworks for generating reference distributions for network-constrained statistics of space-time clusters. Second, empirical approaches to simulating null space-time models (e.g., Monte Carlo simulations) are computationally expensive and thus difficult to carry out in general desktop computing environments. The goal of this research is threefold:
    • To provide a network version of the Q statistics, which were originally developed to describe global space-time clustering and detect local space-time clusters on a Euclidean space
    • To develop an inference framework through which researchers can test various alternative hypotheses of network-constrained space-time patterns. Examples of those alternative hypotheses are as follows:
    1. Each event occurs randomly, and the probability of its occurrence is spatiotemporally uniform.
    2. Event occurrences are correlated, and the probability of an event occurrence is spatiotemporally uniform.
    3. Each event occurs randomly, and the probability of its occurrence is inhomogeneous. There can be multiple approaches to defining a function of inhomogeneous probabilities. Examples include the use of spatial covariates, known populations, and agent-based models.
    4. Event occurrences are correlated, and the probability of an event occurrence is inhomogeneous.
    • To develop parallel computational frameworks to ease the use of the proposed inference framework
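
As a rough illustration of how reference distributions for hypotheses 1 and 3 above might be generated, the following minimal Python sketch draws independent space-time events on a toy network. Everything in it is an assumption made for illustration: the `EDGES` table, the single covariate weight per edge, and the function names are hypothetical, and the correlated cases (hypotheses 2 and 4) are not covered. The second function computes the usual upper-tail Monte Carlo pseudo p-value from a set of simulated statistics.

```python
import random

# Hypothetical toy network: edge id -> (length, covariate weight).
EDGES = {"a": (100.0, 1.0), "b": (50.0, 4.0), "c": (150.0, 0.5)}


def simulate_events(n, edges, t_max, inhomogeneous=False, rng=None):
    """Draw n independent space-time events on a network.

    Each event is (edge_id, offset_along_edge, time). Under the uniform
    null (hypothesis 1) an edge is chosen with probability proportional
    to its length; under the inhomogeneous null (hypothesis 3) it is
    chosen with probability proportional to length * covariate weight.
    """
    rng = rng or random.Random()
    ids = list(edges)
    weights = [
        edges[e][0] * (edges[e][1] if inhomogeneous else 1.0) for e in ids
    ]
    events = []
    for _ in range(n):
        edge = rng.choices(ids, weights=weights, k=1)[0]
        offset = rng.uniform(0.0, edges[edge][0])  # position along the edge
        t = rng.uniform(0.0, t_max)                # time is uniform here
        events.append((edge, offset, t))
    return events


def monte_carlo_p_value(observed_stat, simulated_stats):
    """Upper-tail pseudo p-value: (1 + #{sim >= obs}) / (1 + #sims)."""
    exceed = sum(1 for s in simulated_stats if s >= observed_stat)
    return (1 + exceed) / (1 + len(simulated_stats))
```

In this sketch the space-time statistic itself (e.g., a network version of the Q statistics) is left out; any function that maps a list of events to a number could be evaluated on each simulated pattern and passed to `monte_carlo_p_value`.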

Weekly Meeting with Prof. Wang, 07/09/2012

  • Agenda
  1. Potential research problems
  2. Logistics: dissertation defense schedule
  • Discussion points
  1. Which problems are worth studying, feasible, low-hanging fruit, and needed for the lab?
  2. How should I approach the problems? What strategies are most likely to succeed in solving the problems?
  • Research problems
  1. A holistic metadata model for distributed spatial analysis
    • Distributed spatial analysis refers to the process of analyzing spatial data through static or dynamic interconnections of distributed geographic information services and computing resources. For distributed spatial analysis, researchers usually undertake workflow modeling. They 1) decompose a research problem into small analytical tasks, 2) arrange the tasks in the form of a workflow, 3) identify and link the distributed services and computing resources that are needed for each task, 4) load input data and parameters into the workflow, 5) execute and monitor the workflow, and 6) interpret and validate the workflow and its outputs. At each step of this workflow modeling process, researchers require metadata about various aspects of distributed services and computing resources. Nonetheless, little research has been done so far to elucidate the details of the metadata required for distributed spatial analysis and to formulate a model for organizing and documenting such metadata. To fill this research gap, this study analyzes the informational needs of workflow modeling and develops a holistic model of the metadata required for distributed spatial analysis.
  2. A visual query system for spatial provenance metadata
    • Spatial provenance refers to “lineage and workflow information of spatial data manipulation and related analysis” (Wang et al., 2008). Documentation of spatial provenance facilitates the interpretation, replication, and validation of analysis results, thereby helping to improve the overall quality of geographic research. As this important role of spatial provenance has been increasingly recognized, multiple studies have proposed metadata models for spatial provenance and have demonstrated their utility in the context of database update and spatial error propagation. Despite these demonstrations of the usefulness of spatial provenance metadata, it remains challenging for researchers to make meaningful queries against the provenance metadata. This study seeks to mitigate the difficulties in querying spatial provenance metadata by developing a system in which researchers use predefined visual metaphors or representations to retrieve meaningful information from a database of provenance metadata.
  3. Parallelizing Monte Carlo tests for the network-constrained local K-function
    • The network-constrained local K-function is a method for identifying spatial clusters in a set of point events distributed over a network space, such as traffic accidents and street crimes (Yamada and Thill, 2006). This method is useful for determining both the locations and sizes of spatial clusters as it tests spatial clustering of point events at multiple locations and multiple scales. While this specificity of the local K-function helps to delineate spatial clusters, this benefit does not come without expense. The local K-function is computationally intensive as it uses a statistical inference framework based on Monte Carlo tests in which network-constrained random point patterns are repeatedly simulated and tested for spatial clustering at multiple locations and scales on a network space. This research focuses on developing and evaluating parallel algorithms for the Monte Carlo tests to enable efficient estimation of the local K-function on a large network space.
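
The parallelization idea in problem 3 can be sketched as follows, since each simulated pattern is independent of the others and can be farmed out as a separate task. This is a simplified stand-in and not the Yamada and Thill (2006) method: it assumes a precomputed shortest-path distance matrix `dist`, restricts events to network nodes, uses a plain count of events within a radius as the local statistic, and tests a single scale. All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def local_count(dist_row, event_nodes, radius):
    """Number of events within `radius` network distance of a reference node."""
    return sum(1 for v in event_nodes if dist_row[v] <= radius)


def one_simulation(args):
    """Simulate one random pattern and return its local counts at every node."""
    dist, n_events, radius, seed = args
    rng = random.Random(seed)  # one reproducible random stream per task
    nodes = [rng.randrange(len(dist)) for _ in range(n_events)]
    return [local_count(row, nodes, radius) for row in dist]


def parallel_mc_test(dist, observed_nodes, radius, n_sims, executor, base_seed=0):
    """Return observed local counts and upper-tail pseudo p-values per node."""
    observed = [local_count(row, observed_nodes, radius) for row in dist]
    tasks = [
        (dist, len(observed_nodes), radius, base_seed + k) for k in range(n_sims)
    ]
    # Simulations are independent, so they can be mapped over any executor.
    sims = list(executor.map(one_simulation, tasks))
    p_values = []
    for i, obs in enumerate(observed):
        exceed = sum(1 for s in sims if s[i] >= obs)
        p_values.append((1 + exceed) / (1 + n_sims))
    return observed, p_values


# Usage with a toy 4-node path network (unit-length edges), so the
# shortest-path distance between nodes i and j is simply |i - j|.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
with ThreadPoolExecutor(max_workers=2) as pool:
    observed, p_values = parallel_mc_test(dist, [0, 0, 0, 1], 1, 99, pool)
```

A thread pool is used here only to keep the example portable. Because the work is CPU-bound pure Python, a `concurrent.futures.ProcessPoolExecutor` passed as `executor` would be the more natural choice in practice; the task function is defined at module level so that it can be pickled for that purpose.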