Mining Geospatial Knowledge on the Social Web

Suradej Intagorn, Kristina Lerman
DOI: 10.4018/978-1-4666-2788-8.ch007


Up-to-date geospatial information can help the crisis management community coordinate its response. In addition to data that is created and curated by experts, there is an abundance of user-generated, user-curated data on Social Web sites such as Flickr, Twitter, and Google Earth. User-generated data and metadata can be used to harvest knowledge, including geospatial knowledge that will help solve real-world problems such as information discovery, geospatial information integration, and data management. This paper proposes a method for acquiring geospatial knowledge, in the form of places and relations between them, from user-generated data and metadata on the Social Web. The key to acquiring geospatial knowledge from social metadata is the ability to accurately represent places. The authors describe a simple, efficient algorithm for finding a non-convex boundary of a region from a sample of points from that region. Used within a procedure that learns part-of relations between places from real-world data extracted from the social photo-sharing site Flickr, the proposed algorithm leads to more precise relations than the earlier method and helps uncover knowledge not contained in expert-curated geospatial knowledge bases.
Chapter Preview


When disaster strikes, people increasingly turn to social media sites to reach out to friends and family, to post information about current conditions, including images and videos, and to receive updates about shelters and safe harbors. The humanitarian relief community could potentially integrate this data with geospatial information contained in maps, satellite imagery, and news reports to monitor the unfolding situation, assess damage, and help coordinate relief efforts. As an example, images posted on the social photo-sharing site Flickr both before and after a tornado devastated a town could be combined with eyewitness accounts and missing persons reports on the microblogging service Twitter to create a detailed view of the affected area and its population. First-hand accounts on Twitter could then be used to monitor the availability of shelter and critical supplies, such as fresh water. The challenge, however, is to link the places people talk about in their posts to actual geospatial entities, since ordinary people are highly unlikely to use terms from a predefined geospatial vocabulary and may refer to places that are not formally defined, such as neighborhoods and landmarks.

We address this problem by automatically mining social media content to learn about places and relations between them. In addition to creating rich content in the form of text documents, images, and videos on Social Web sites such as Flickr and YouTube, people often annotate content with keywords, called tags, that they use to label and categorize it, as well as with geographic coordinates, or geo-tags. Although social metadata lacks a controlled vocabulary and predefined structure, it reflects how a community organizes knowledge, including geospatial knowledge. A corpus of social metadata created by large numbers of people can be mined to reveal concepts (Plangprasopchok, 2004), including places and relations between them (Keating, 2005). Community-generated knowledge that is automatically extracted from social metadata can complement expert-curated geospatial knowledge (Keating & Montoya, 2005; Kavouras et al., 2006), such as Geonames. Community-generated knowledge is more likely to stay complete and current, since it is learned from metadata that is distributed and dynamic in nature (Golder & Huberman, 2006). It is also more likely to reflect the colloquial folk knowledge that people use to talk about places.

Recently we proposed a method for aggregating geo-tagged data created by thousands of users of the social photo-sharing site Flickr to learn places and relations between them (Intagorn et al., 2010). The method represents a place by the coordinates of the geo-tagged images that Flickr users labeled with the place name, and it uses geospatial subsumption to learn relations between places. Our key challenge is to represent places efficiently and accurately. In the original work, we used convex hulls to represent places, but found they did a poor job, since places were often concave. To address this problem, we present a simple, computationally efficient algorithm to find a possibly concave contour of a planar shape. Our method starts with a bounding box that subsumes all points and gradually erodes it until the boundary converges to a polygon that best represents the shape. We evaluate the method on a data set consisting of US ZIP codes. We then apply it to learn the boundaries of places extracted from social metadata. We show that the new method enables us to learn more precise relations between places using geospatial subsumption. Some of the relations we learn are novel and not found in formal directories, for example, that Wild Animal Park is in San Diego. While not technically correct, such expressions of folk knowledge are still quite useful.
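
To make the convex-hull baseline and the idea of geospatial subsumption concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the fraction-of-points score, and the toy L-shaped coordinates are all illustrative assumptions, and the sketch covers only the earlier convex-hull representation, not the bounding-box erosion algorithm. It shows why a convex hull can overstate a concave place: points in the notch of an L-shaped region fall inside the hull even though they lie outside the region.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain; it repeats the other chain's start.
    return lower[:-1] + upper[:-1]


def in_hull(hull: List[Point], p: Point) -> bool:
    """True if p is inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def subsumption_score(parent_pts: List[Point], child_pts: List[Point]) -> float:
    """Fraction of the child's points inside the parent's convex hull.

    Hypothetical scoring rule chosen for illustration only.
    """
    hull = convex_hull(parent_pts)
    inside = sum(in_hull(hull, p) for p in child_pts)
    return inside / len(child_pts)


# Toy data: an L-shaped "place" sampled on a grid (concave at the top right).
l_shape = [(x, y) for x in range(5) for y in range(5) if x <= 1 or y <= 1]
# Points in the notch of the L, outside the region itself.
notch = [(2.0, 2.0), (2.5, 2.0), (2.0, 2.5)]

print(convex_hull(l_shape))
print(subsumption_score(l_shape, notch))
```

In this toy case the notch points receive a full score even though none of them belongs to the L-shaped region, which is the kind of false part-of relation a tighter, possibly concave boundary is meant to avoid.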
