Towards an Embedding-Based Approach for the Geolocation of Texts and Users on Social Networks

Sarra Hasni

Source Title: Interdisciplinary Approaches to Spatial Optimization Issues

DOI: 10.4018/978-1-7998-1954-7.ch012

Abstract

The geolocation of textual data shared on social networks such as Twitter is attracting growing attention. Since such data are supported by advanced geographic information systems for multipurpose spatial analysis, new ways of extending the paradigm of geolocated data are emerging. Unlike the statistical language models widely adopted in prior work, the authors propose a new approach, adapted to the geolocation of both tweets and users, that applies embedding models. The authors strengthen the geolocation strategy with sequential modelling using recurrent neural networks to determine the importance of words in tweets with respect to contextual information. They evaluate how well this strategy determines the locations of unstructured texts that reflect users' unlimited writing styles. In particular, the authors demonstrate that semantic properties and word forms can be effective for geolocating texts without specifying local words or topic descriptions per region.

Chapter Preview

Introduction

The paradigm of geospatial data is currently undergoing a radical transformation driven by the evolution of related technologies. For example, Geographic Information Systems (GIS) are used more widely as their storage and management capacities grow. This makes it more feasible to support data from even unofficial sources (Sui, 2011). Among those data, geotagged messages (tweets) published on the location-based social network (LBSN) Twitter constitute a considerable part of Big GeoData and have proven useful for many purposes. For example, such messages report on human practices and daily lives, which makes them valuable for epidemiological monitoring (Allen, 2016), analysis of geolocated sentiments (Yaqot, 2018), and the prevention and resolution of crimes (Corso, 2017).

Despite their effectiveness, geotagged tweets remain limited in their ability to bridge the gap between the physical world and the virtual one. In fact, previous studies show that fewer than 0.85% of tweets are geolocated. This limitation has motivated several works on user and tweet geolocation based on textual content analysis, which aim to make the relationship between texts and space concrete (Han, 2012). In these works, particular attention has been paid to statistical language models. For example, a given word may be representative of a region if it is used there more frequently than other words (Cheng, 2010). In other cases, frequent terms and topics are characterized by a set of relevant geospatial features that make them useful for distinguishing between regions.
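
As a rough illustration of the frequency-based idea, the following sketch assigns a text to the region whose smoothed unigram word counts give it the highest likelihood. It is a minimal toy under stated assumptions, not the method of (Cheng, 2010) or of any cited work: the function names, the add-one smoothing, the two region labels, and the four example sentences are all hypothetical and chosen only for readability.

```python
import math
from collections import Counter, defaultdict


def train_unigram_models(labelled_tweets):
    """Count word occurrences per region from (text, region) pairs."""
    counts = defaultdict(Counter)
    for text, region in labelled_tweets:
        counts[region].update(text.lower().split())
    return counts


def predict_region(text, counts):
    """Return the region whose add-one-smoothed unigram model scores the text highest."""
    vocab = {word for counter in counts.values() for word in counter}
    best_region, best_score = None, -math.inf
    for region, counter in counts.items():
        total = sum(counter.values())
        # The extra +1 in the denominator reserves probability mass for unseen words.
        score = sum(
            math.log((counter[word] + 1) / (total + len(vocab) + 1))
            for word in text.lower().split()
        )
        if score > best_score:
            best_region, best_score = region, score
    return best_region


# Hypothetical toy corpus: invented sentences, not real tweets.
toy_corpus = [
    ("subway delay in brooklyn again", "new_york"),
    ("bagel and coffee before the subway", "new_york"),
    ("fog over the bay this morning", "san_francisco"),
    ("cable car ride by the bay", "san_francisco"),
]
models = train_unigram_models(toy_corpus)
```

In this toy setting, a word that never appeared in the training data contributes only smoothing mass, so it carries almost no geographic signal. This is one simple way to see why purely frequency-based models can struggle with unseen vocabulary.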

From a close study of the proposed geolocation strategies, we believe that their reliance on statistical language models limits their performance. Specifically, we assume that the propagation of topics across social networks makes estimating their distribution more complex. For example, a user may report an event that occurs in a location different from the one where the user is located. A second limitation, which we consider the most critical, is the rigidity of these models. In particular, they seem less effective at handling new tweets, which often contain out-of-vocabulary (OOV) words. In addition, the selective choice of local words limits their performance.

Given these problems, we consider that the geolocation task must be approached by paying more attention to word properties. In other words, we have to strengthen the relationship between inherent textual properties and geospatial dimensions. In particular, we think that measuring the geo-semantic distribution may be effective, as in (Ballatore et al., 2013) and (Hu et al., 2017). Starting from the assumption that similar meanings occur in similar contexts, we consider that word embedding models can be effective for measuring the distribution of words and topics in space. Nevertheless, their usefulness depends on their ability to determine the geographical belonging of a context that occurs in several regions at the same time. In addition, we enable our geolocation strategy to handle new tweets that contain OOV words and new topics by employing an imitation-based character embedding model. We adopt a recurrent neural network (RNN) architecture for sequential modelling and an attention mechanism for measuring word importance. In this way, we increase the value of contextual information to implicitly delimit the sparse uses of words with spatial indications across social networks.
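
To make the role of an attention mechanism more concrete, the sketch below computes softmax weights over per-word vectors and pools them into a single text vector. It is only an illustration of generic dot-product attention pooling, assuming small hand-written vectors in place of real RNN hidden states and a hypothetical `query` vector in place of learned parameters; it is not the architecture described in this chapter.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, query):
    """Weight each word vector by softmax(dot(vector, query)) and sum them."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(vec, query)) for vec in word_vectors]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, word_vectors))
        for d in range(len(query))
    ]
    return weights, pooled


# Toy 2-dimensional vectors standing in for the per-word states of a sequence model.
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
# Hypothetical context vector; in a trained model this would be learned.
query = [1.0, 1.0]

weights, tweet_vector = attention_pool(hidden_states, query)
```

Here the second word receives the largest weight because its vector aligns best with the query, so it dominates the pooled representation that a downstream classifier would map to a location.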

Evaluated on three corpora, the results show that our theoretical choices are valid for geolocating individual tweets. They offer more efficiency and scalability in the face of non-standard words and topic variety. Moreover, geo-semantics prove to be better local indicators for texts shared on social networks than word frequencies. Sequential modelling and the attention model also prove effective at differentiating between similar writing styles. However, we estimate that they could be more valuable for geolocating users if further linguistic properties were considered. It is particularly interesting to redefine word importance when dealing with a single language variant.

The rest of the paper is organized as follows. Section 2 gives an overview of related concepts in order to better understand the particularities of the geolocation task. Section 3 reviews existing work on both user and tweet geolocation. Section 4 describes our geolocation strategy. Section 5 presents our results. Finally, Section 6 presents a conclusion and future work.
