Dr. Shalin Hai-Jew speaks on social media and her forthcoming IGI Global title

Social Media Data Extraction and Content Analysis: A Q&A with Dr. Shalin Hai-Jew

By IGI Global on Jul 26, 2016
Dr. Shalin Hai-Jew is one of IGI Global's most esteemed editors and contributors. An instructional designer at Kansas State University (K-State), Dr. Hai-Jew has taught at the university and college levels for many years (including four years in the People’s Republic of China) and was tenured at Shoreline Community College. She has Bachelor’s degrees in English and psychology, a Master’s degree in Creative Writing from the University of Washington (Hugh Paradise Scholar), and an Ed.D. in Educational Leadership with a focus on public administration from Seattle University (where she was a Morford Scholar). Dr. Hai-Jew's newest title, Social Media Data Extraction and Content Analysis, explores various social networking platforms and the technologies being utilized to gather and analyze information being posted to these venues. Highlighting emergent research, analytical techniques, and best practices in data extraction in global electronic culture, this publication is an essential reference source for researchers, academics, and professionals. Dr. Hai-Jew recently took some time to speak with IGI Global on her ongoing research and exciting new book.
How did you become interested in social media?

I really like the affordances of social media platforms for research but am not by disposition a social media sharer. I don’t have a smart phone with a camera. I will only Tweet during a Twitter campaign for a professional organization that I’m part of, but that’s about it. I’m cyber-squatting a few social media sites, but I’m still unhatched on Twitter. I guess I do share a bit on SlideShare, but that’s a great space for sharing slideshows and documents, and they’re great about enabling updates to contents. So I won’t say that I don’t share socially at all…but I do so in a very narrow channel…for work purposes. That said, if you give me even a small window, I will gladly access and study social media data…because there is a lot there that is of interest to me.

What is the state of social media currently?

Social media platforms are the social “commons” or “public square” of the current age. It’s where people flock to engage in image-making and social performance. People will perform for an imagined audience. For example, recent news stories have covered the wide use of fembots for an online service that professed to connect married people with possible extramarital partners. It’s where people go to people-watch.

People are hardwired to be interested in other people, to engage others and to understand issues through the so-called “personality frame.” We intuitively “follow” those we admire or who intrigue us. Our eyes go to others’ faces, and our brains are rewarded based on social shout-outs and kudos. People care about their own reputations in the eyes of others, and they care about social relationships. These tendencies are amped up through social media, where people can imagine their hordes of fans and where social bots are deployed to create a sense of camaraderie where none may actually exist. People can be led on by their imaginations and their wishful thinking.

Any who have aspirations for affecting others’ awareness and behaviors at population scale, whether they have political, social, commercial, or other aims, have to have some presence online if they want any hope of making a difference. Think hashtag campaigns. Think some pretty impressive crowd-funding efforts. Most modern human endeavors worth doing involve some level of group coordination and cooperation. Over a decade in, people are still engaged and entranced with social media.

Social media providers have been hard at work, too, to keep people engaged. The recent Pokemon phenom is yet another example of mass distraction, like the selfie trend before it.

In this decade or so, we’ve started to de-mythify what social media can and cannot do. For years, the goal was to spark “virality,” and there are now deeper understandings about the kinds of messages which go viral, and how to tell when something is starting to spark and trend. One core factor is that a message has to cross social networks in terms of human interest. Time is a factor, and the speed of spread of the message is important. Even so, all messages have a decay curve of some initial attention and then basically being forgotten over time.

In the mad rush for others’ attention, only a few have remained on the sidelines…those who work in security writ large, those who work in information (and know how much can be known about another through unintentional data leakage), privacy advocates, and other all-around skeptics. And maybe we can add technological Luddites. It takes a real discipline to keep secrets while engaging in regular communications.

In recent years, have social media platforms met your perceived expectations?

So much of how people think of the world is through social frames and social constructs. People do social reference; they look to others to know how to interpret the events around them. They judge others’ attractiveness and social standing based on who those people socialize with; it’s that “company you keep” phenomenon. It’s intriguing to see how social relationships on social media platforms so often follow power law distribution curves, with a few who garner most of the attention and the common folk and wannabes existing somewhere along the “long tail,” with just a friend or two following. The research on activating people is also eye-opening. While people can create a huge following and a lot of attention on a thing, that attention does not always translate into money spent or behavioral activation. It is hard to change people’s minds because of built-in confirmation biases and how people selectively pay attention to information that supports what they already think. We all wear interpretive lenses that are informed by what we want to believe. Many times, there is a lot of noise and smoke and rumor but no “fire”.

In your opinion, why is social media currently such an area of research interest?

So beyond the human factor, researchers are finding value to exploring social media data. First, generically speaking, researchers can collect a lot of data at minimal cost. This is data in the wild, and it is empirical data. After some cleaning to remove spam and other noise, this data can be analyzed with software tools that offer quantitative and statistical based insights.

They can study various phenomena at mass Web scale. They can localize their interests to particular regions. They can probe data by group variables, such as slicing population responses by demographic factors (think age, gender, class, race, ethnicity, regions, education level, and other aspects), by outcome variables, and so on.

We’re in a time when research interests have aligned with a major outpouring of information and ways to access and harness that information for research ends.

Technologically, is it difficult to extract data from social media?

Yes and no. A number of social media platforms have application programming interfaces (APIs) which enable developers to access some public (non-protected) data from their platforms, but these are often limited amounts of data—both in the sense of rate-limiting (amount accessible in a time period) and also in terms of total amount of data available. Other platforms are built on open technologies which are crawlable and which enable access to contents, to trace data, and to metadata.
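To make the rate-limiting point concrete, here is a minimal Python sketch of paging through an API while staying under a request budget. It is only an illustration under stated assumptions: `collect_posts`, `fetch_page`, the cursor scheme, and the budget parameters are hypothetical names invented for this example and do not correspond to any particular platform's API.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (posts_on_this_page, next_cursor_or_None). A real one would wrap
# whatever API client a given platform provides.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def collect_posts(
    fetch_page: FetchPage,
    max_requests: int,
    min_interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Collect public posts page by page without exceeding a request budget.

    max_requests   -- assumed cap on requests per collection run
    min_interval_s -- assumed minimum pause between consecutive requests
    """
    posts: List[dict] = []
    cursor: Optional[str] = None
    for request_number in range(max_requests):
        if request_number > 0:
            sleep(min_interval_s)  # pause between requests, not before the first
        page, cursor = fetch_page(cursor)
        posts.extend(page)
        if cursor is None:  # the platform reports no further pages
            break
    return posts
```

Whatever the budget leaves uncollected is simply missing from the dataset, which is one reason API-derived samples are often partial.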

Then, there are research software tools that harness web browser add-ons to extract data from social media platforms. These tools are designed around graphical user interfaces (GUIs) and are fairly well documented in terms of procedures and processes. A number of open-source and free high-level programming (scripting) languages have defined methods for how to scrape data from social media platforms, with some able to even tap the Deep Web by auto-filling out web forms.

Then, there are stand-alone web browser add-ons that may be used to capture text, URLs, imagery, and videos from various sites.

In other words, the cost-of-entry to capturing social media data contents is comparatively low. The data is there for the taking, and there’s lots of it. What could go wrong? Here is where researcher training and professional skepticism come in handy. Online, there is a lot of cheap talk (vs. costly signaling). There is a lot of automated malicious ‘bot activity and masquerading. No researcher worth his or her salt should fall for illusion over fact. Beyond this, researchers not only have to be expert in their respective fields, but they have to know what is going on on the social media platforms, and they have to know the affordances and constraints of their software tools.

Further, most researchers who need an N = all will have to go to respected commercial companies to acquire the full dataset and to run queries on big data. To do this, the researcher and research team would have to have a fair amount of sophistication in terms of how they structure their queries, how they analyze the data, and how they represent their findings in publications and presentations.

What are some common methods for analyzing social media data?

What I share is only going to be fairly limited. Manual methods are not uncommon. Researchers have their unique areas of expertise, and the extracted data may be analyzed by researcher teams to very positive effects. There are automated theme and subtheme extraction approaches; this topic modeling enables data summarization. Network analysis is another common approach to understand relationships between social media accounts…but also between concepts and data… Once a method is discovered, researchers can be very creative in applying that to various contexts. Cluster analyses are common—to capture similarity relationships. Sentiment analysis is also common, to understand positive and negative polarities of expressed opinions. Emotion analysis is a spin-off of sentiment analysis and offers even richer insights than sentiment alone. Linguistic analysis is common. Geographical data may be extracted from social media datasets, and this data is used to identify relationships between spatial proximity and distance, and captured research variables. There’s work going on to use computer vision and open-source computer-vision technologies to extract insights from still images and videos. Of course, every analytical approach applied in small scale can be applied in big data scale.
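As a rough illustration of the sentiment-analysis idea mentioned above, the following Python sketch scores a message by counting words from a tiny polarity lexicon. The lexicon, the `polarity` function, and the thresholds are all made up for this example; research-grade tools rely on much larger, validated lexicons or trained models and handle negation, sarcasm, and context, which this sketch does not.

```python
import re
from typing import Dict

# Toy lexicon, invented for illustration: +1 for positive words, -1 for negative.
TOY_LEXICON: Dict[str, int] = {
    "great": 1, "love": 1, "good": 1,
    "bad": -1, "hate": -1, "awful": -1,
}


def polarity(text: str, lexicon: Dict[str, int] = TOY_LEXICON) -> str:
    """Label a message positive, negative, or neutral by summing word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude lowercase tokenizer
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `polarity("I love this great book")` sums to +2 and is labeled positive, while a message with no lexicon words is labeled neutral.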

For all the insights available through machine learning and data mining, this is not to shortchange the affordances of human expertise. There is always human researcher oversight over automated processes. The researchers have to create coherence between their own domain expertise and their technology-enabled research insights.

The trick, in part, is to learn from these data sources and methods…without getting swamped or overwhelmed by them. The fundamentals still apply: What are important research questions in the field? What are solid methods to shed light on the issues? How can researchers mitigate their own cognitive biases in their research?

What are people learning about and through social media?

There are thousands of research articles related to various social media platforms and findings from both manual and auto-coded (machine-coded) insights. Researchers have gone to social media to understand collective intelligence and sentiment around particular issues. They have explored social media for business, e-governance, security, and other strategies. While there are unique locally-focused cases, there are also boggling big data approaches that include tens of millions of records. I sound like a rube when I mention the massive scale sizes of data because it’s really been years since the whole Web was mapped, and the data collection continues apace. There are a number of methods to advance understandings in the various fields.

Tell us a bit about your new book, Social Media Data Extraction and Content Analysis.

Gladly. So Social Media Data Extraction and Content Analysis was a project which I started in March 2015. It’s been in development about a year and a few months. The proofs were just returned this past week. The book has four sections: Part 1: Modeling with Social Data, Part 2: Analytics from the Online Crowd, Part 3: Tapping Specific Social Media Platforms, and Part 4: Applied Uses of Social Media Data for Awareness and Problem-solving.

The authors hail from multiple continents, and they are informed and expert in their respective fields. To see who the authors are and their chapters, the book description is available at the book's website.
