
Keep It Clean! How to Clean and Validate Social Listening Data

You wrote your query and segmented out athletics, but is the data you’re receiving accurate, relevant to your school, and categorized correctly? Even the most carefully written queries and rules may let irrelevant mentions seep in, so validating (i.e., cleaning) the data is crucial to your social listening strategy.

Why Data Cleaning Is Necessary

Chances are your school shares its name, nickname, or acronym with a person, place, school, or term. For example, let’s look at the fictional University of California, Sunnydale (UC Sunnydale) from Buffy the Vampire Slayer.

Students may refer to the school simply as Sunnydale. However, since it's also the name of the city, searching for just Sunnydale would result in mentions of the city itself or other places with it in the name.

The school's abbreviation is UC☼D, but since symbols and searches don't work well together, the acronym with just letters could be one of the following:

  • UCSD, which is also the acronym for the University of California San Diego.
  • UCS, which is the acronym for several schools, companies, and terms.

Search tools such as location or proximity operators (e.g., Sunnydale within five words of campus or school) can help narrow your search, but it's difficult to eliminate irrelevant mentions completely without also eliminating relevant ones.
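To make the proximity idea concrete, here is a minimal Python sketch of a "within five words" check. It is only an approximation for illustration: the function name `within_n_words` and the sample mentions are hypothetical, and real social listening tools have their own operator syntax that this does not reproduce.

```python
import re

def within_n_words(text, term, context_terms, n=5):
    """Return True if `term` appears within `n` words of any context term.

    Matching is on whole, lowercased words, so possessives such as
    "Sunnydale's" are not matched in this simplified version.
    """
    words = re.findall(r"[\w']+", text.lower())
    term = term.lower()
    context = {c.lower() for c in context_terms}
    term_positions = [i for i, w in enumerate(words) if w == term]
    context_positions = [i for i, w in enumerate(words) if w in context]
    return any(abs(i - j) <= n for i in term_positions for j in context_positions)

# Hypothetical sample mentions
mentions = [
    "Heading back to Sunnydale campus for finals week",
    "Traffic in downtown Sunnydale is terrible today",
    "Sunnydale was the best four years of my life",
]

matches = [m for m in mentions if within_n_words(m, "Sunnydale", ["campus", "school"])]
```

Only the first mention passes the filter. The second is correctly excluded, but the third, which is plausibly a graduate talking about the school, is excluded too. That is the trade-off in miniature: a tighter filter also drops relevant mentions.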

This is why the human touch is so important.

Humans > Software

Social listening software can do a lot of things, but it’s not perfect. Human review is a vital step in the process. While most social listening software has built-in tools to help organize data, it can’t draw the same conclusions a human can. As an analyst, I’m in my clients’ data, reviewing all their mentions daily. As a result, I’m able to:

  • Get to know the conversation happening around the school and begin to identify trends in the conversation.
  • Ensure the data I’m presenting to them is accurate and categorized correctly.
  • Adjust the sentiment appropriately.

How to Validate Data

When I clean client data, I follow these steps.

  • Use workflow management tools within our social listening software to mark mentions as checked. This helps me keep track of mentions I’ve viewed as I verify their relevance to the client.
  • View a list with snippets of all unchecked mentions since the last time I reviewed them by filtering out checked mentions.
  • Review athletics and non-athletics mentions separately to make sure they’re properly sorted, and update the segmentation rules I set up if I see trends.
  • Mark relevant and properly categorized mentions as checked.
  • Manually add/remove categories/tags from improperly categorized mentions, and mark them as checked.
  • Add mentions that are not relevant to the school to an irrelevant category and filter those out of my dashboard.
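The steps above can be sketched as a small data model. This is an illustration only, assuming mentions have been exported into Python objects. The `Mention` class, the helper functions, and the category names are hypothetical and do not correspond to any particular tool's features.

```python
from dataclasses import dataclass, field

@dataclass
class Mention:
    author: str
    text: str
    categories: set = field(default_factory=set)
    checked: bool = False

def unchecked(mentions):
    """Mentions that have not been reviewed yet (the review queue)."""
    return [m for m in mentions if not m.checked]

def mark_checked(mention, add=(), remove=()):
    """Fix categories if needed, then mark the mention as reviewed."""
    mention.categories |= set(add)
    mention.categories -= set(remove)
    mention.checked = True

def dashboard(mentions):
    """Everything except mentions tagged as irrelevant."""
    return [m for m in mentions if "irrelevant" not in m.categories]

# Hypothetical sample data
mentions = [
    Mention("ucs_news", "UC Sunnydale opens new library", {"non-athletics"}),
    Mention("fan42", "UC Sunnydale wins the homecoming game", {"non-athletics"}),
    Mention("cityguide", "Best tacos in Sunnydale", {"non-athletics"}),
]

queue = unchecked(mentions)
mark_checked(queue[0])                                               # relevant, correctly categorized
mark_checked(queue[1], add={"athletics"}, remove={"non-athletics"})  # miscategorized
mark_checked(queue[2], add={"irrelevant"})                           # about the city, not the school

visible = dashboard(mentions)
```

After this pass the review queue is empty, the homecoming mention sits under athletics, and the taco mention stays in the data set but is hidden from the dashboard.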

Depending on the volume of a client’s mentions, this process can be done manually, but if the client has several hundred mentions per day, it can be quite time-consuming to go through each individual mention. The following tips can help the process go more quickly and smoothly.

  • Batch mentions by terms and/or authors that you know are relevant to your school (e.g., owned content) to quickly mark them all as relevant and checked.
  • When you start to see trending topics/tweets that are relevant, irrelevant, or miscategorized, batch those and mark or adjust them all at once.
  • If you’re repeatedly seeing irrelevant mentions from the same authors or about the same topics, update your query (if possible), write rules for common irrelevant terms/conversation topics, or mark authors as irrelevant and filter them out of your dashboard.
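Batching can be sketched in a few lines of Python. This is a hedged illustration, assuming mentions are available as plain dictionaries. The `batch_mark` helper, the author handles, and the term list are all hypothetical.

```python
def batch_mark(mentions, predicate, category):
    """Tag and check every unchecked mention that satisfies `predicate`.

    Returns the number of mentions handled in this batch.
    """
    count = 0
    for m in mentions:
        if not m["checked"] and predicate(m):
            m["categories"].add(category)
            m["checked"] = True
            count += 1
    return count

# Hypothetical lists built up from earlier review sessions
OWNED_AUTHORS = {"ucsunnydale", "ucs_admissions"}
IRRELEVANT_TERMS = ["city council"]

def new_mention(author, text):
    return {"author": author, "text": text, "categories": set(), "checked": False}

mentions = [
    new_mention("ucsunnydale", "Apply by March 1"),
    new_mention("ucs_admissions", "Campus tours this week"),
    new_mention("localnews", "Sunnydale city council votes tonight"),
    new_mention("student1", "Loving my classes at UC Sunnydale"),
]

# Owned content is known to be relevant, so mark it all at once.
owned = batch_mark(mentions, lambda m: m["author"] in OWNED_AUTHORS, "relevant")

# Recurring off-topic conversation gets tagged irrelevant in one pass.
noise = batch_mark(
    mentions,
    lambda m: any(term in m["text"].lower() for term in IRRELEVANT_TERMS),
    "irrelevant",
)

# Whatever is left still needs a human look.
remaining = [m for m in mentions if not m["checked"]]
```

Two batch passes clear three of the four mentions, leaving only the student's post for manual review. The batches shrink the queue; they don't replace the human check.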

The Cleaner, the Better

Digging into your data daily is important to make sure it’s relevant, and it also helps you really get to know your data and the online conversation. You’ll start to recognize authors and trends, which will help you clean your data faster, understand the conversation, and even spot growing or potential issues early.

The post Keep It Clean! How to Clean and Validate Social Listening Data originally appeared on the Campus Sonar Brain Waves blog.

Emily Prell

Emily Prell is a Campus Sonar analyst who spends her days creating and optimizing social listening queries, cleaning and categorizing data, performing data analysis, and providing data-driven insights to clients through reports, dashboards, and presentations. Emily's love for analysis grew out of keeping stats during NBA games. Now she loves analyzing data and crunching numbers to help campuses and universities become more engaged with their students.
