Keep It Clean! How to Clean and Validate Social Listening Data

You wrote your query and segmented out athletics, but is your data accurate, relevant, and categorized correctly? Even the most carefully written queries and rules may let irrelevant mentions seep in, so validating (i.e., cleaning) the data is crucial to your social listening strategy. That's where expert human analysts come in.
Why Data Cleaning Is Necessary
Chances are your school shares its name, nickname, or acronym with a person, place, school, or term. For example, let’s look at the fictional University of California, Sunnydale (UC Sunnydale) from Buffy the Vampire Slayer.
Students may refer to the school simply as Sunnydale. However, since it's also the name of the city, searching for just Sunnydale would result in mentions of the city itself or other places with it in the name.
Okay but i cant be the only one sick of everyone in #Sunnydale dressing like theyre in #SouthPark. Girl, the desert is a sensible drive from the city. Y'all needa chill with the parkas and beanies 😒😂
— Reneaux Ruffin (@ReneauxRuffin) July 19, 2018
The school's abbreviation is UC☼D, but since symbols and searches don't work well together, the acronym with just letters could be one of the following:
- UCSD, which is also the acronym for the University of California San Diego.
My boss was sad that today was my last day and that she won’t see me for my bday so she took me to the UCSD bookstore to buy me a UCSD shirt. I’m crying, she’s so wholesome 😭 she also offered me 2 full time jobs just so I can stay in her team 😭😭
— julz (@juliannangyn) July 13, 2018
- UCS, which is the acronym for several schools, companies, and terms.
#Lego needs to make a #UCS republic #venator class cruiser I would buy that in a second #starwars #CloneWars
— Antony D Burlace (@realADBurlace) July 13, 2018
Search tools such as location or proximity operators (e.g., Sunnydale within five words of campus or school) can help narrow your search, but it can be difficult to completely eliminate irrelevant mentions without also eliminating relevant ones.
This is why the human touch is so important.
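To make the proximity trade-off concrete, here is a minimal Python sketch of a "within five words" check. It is not how any particular social listening tool implements its operators; the function name `within_n_words` and the sample mentions are made up for illustration.

```python
import re

def within_n_words(text, target, context_terms, n=5):
    """Return True if `target` appears within `n` words of any context term."""
    words = re.findall(r"[\w']+", text.lower())
    context = {term.lower() for term in context_terms}
    target_positions = [i for i, w in enumerate(words) if w == target.lower()]
    context_positions = [i for i, w in enumerate(words) if w in context]
    return any(abs(t - c) <= n for t in target_positions for c in context_positions)

# Hypothetical sample mentions, for illustration only.
mentions = [
    "Move-in day on the Sunnydale campus was packed",      # about the school
    "Everyone in Sunnydale is dressing like it's winter",  # about the city
    "I love Sunnydale, best four years of my life",        # about the school, no context word
]

matched = [m for m in mentions if within_n_words(m, "Sunnydale", ["campus", "school"])]
```

Only the first mention matches. The rule correctly drops the city mention, but it also drops the third mention, which a human reader would recognize as being about the school.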
Humans > Software
Social listening software can do a lot of things, but it’s not perfect. The human eye is a vital step in the process. While most social listening software has built-in tools to help organize data, it can’t draw the same conclusions a human can. As analysts, we're in our clients’ data, reviewing all their mentions daily. As a result, we're able to:
- Get to know the conversation happening around the school and begin to identify trends in the conversation.
- Ensure the data we're presenting to them is accurate and categorized correctly.
- Adjust the sentiment appropriately.
How to Validate Data
When we clean client data, we follow these steps.
- Use workflow management tools within our social listening software to mark mentions as checked. This helps keep track of mentions we’ve viewed as we verify their relevance to the client.
- Filter out checked mentions to view a list, with snippets, of every mention that has come in since our last review.
- Review athletics and non-athletics mentions separately to make sure they’re properly sorted, and update the segmentation rules we set up if we see trends.
- Mark relevant and properly categorized mentions as checked.
- Manually add/remove categories/tags from improperly categorized mentions, and mark them as checked.
- Add mentions that are not relevant to the school to an irrelevant category and filter those out of the dashboard.
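The steps above can be sketched in Python as a simple checked/unchecked workflow. This is a sketch under stated assumptions, not the API of any real listening platform: the `Mention` class, the tag names, and the helper functions are all hypothetical stand-ins for what the software's workflow tools do.

```python
from dataclasses import dataclass, field

@dataclass
class Mention:
    text: str
    tags: set = field(default_factory=set)
    checked: bool = False

def unchecked(mentions):
    """List the mentions nobody has reviewed yet."""
    return [m for m in mentions if not m.checked]

def review(mention, correct_tags=None, relevant=True):
    """Fix the tags if needed, then mark the mention as checked."""
    if not relevant:
        mention.tags = {"irrelevant"}
    elif correct_tags is not None:
        mention.tags = set(correct_tags)
    mention.checked = True

def dashboard(mentions):
    """Everything except mentions tagged irrelevant."""
    return [m for m in mentions if "irrelevant" not in m.tags]

# Hypothetical mentions, all initially sorted as non-athletics.
mentions = [
    Mention("UC Sunnydale wins the home opener", {"non-athletics"}),
    Mention("Registration opens Monday at UC Sunnydale", {"non-athletics"}),
    Mention("Traffic in downtown Sunnydale is awful", {"non-athletics"}),
]

review(mentions[0], correct_tags={"athletics"})  # miscategorized: fix the tag
review(mentions[1])                              # relevant and correctly tagged
review(mentions[2], relevant=False)              # about the city, not the school
```

After the three reviews, `unchecked(mentions)` is empty and `dashboard(mentions)` contains only the two school mentions.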
Depending on the volume of a client’s mentions, this process can be done manually, but if the client has several hundred mentions per day, it can be quite time-consuming to go through each individual mention. The following tips can help the process go more quickly and smoothly.
- Batch mentions by terms and/or authors that you know are relevant to your school (e.g., owned content) to quickly mark them all as relevant and checked.
- When you start to see trending topics/tweets that are relevant, irrelevant, or miscategorized, batch those and mark or adjust them all at once.
- If you’re repeatedly seeing irrelevant mentions from the same authors or about the same topics, update your query (if possible), write rules for common irrelevant terms/conversation topics, or mark authors as irrelevant and filter them out of your dashboard.
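As a rough illustration of the batching tips above, the Python sketch below marks groups of mentions in one pass instead of one at a time. The account handles, the term list, and the `batch_mark` helper are assumptions made up for this example; in practice you would use whatever bulk-edit and rule features your listening software provides.

```python
def batch_mark(mentions, predicate, relevant):
    """Mark every unchecked mention matching `predicate`; return how many changed."""
    count = 0
    for m in mentions:
        if not m["checked"] and predicate(m):
            m["relevant"] = relevant
            m["checked"] = True
            count += 1
    return count

# Hypothetical owned accounts and known-noise terms.
OWNED_AUTHORS = {"@UCSunnydale", "@UCSunnydaleAdmissions"}
IRRELEVANT_TERMS = {"lego", "venator"}

mentions = [
    {"author": "@UCSunnydale", "text": "Welcome back, students!", "checked": False, "relevant": None},
    {"author": "@brickfan", "text": "Lego needs a UCS Venator", "checked": False, "relevant": None},
    {"author": "@someone", "text": "Is UCS open today?", "checked": False, "relevant": None},
]

# Batch 1: owned content is relevant by definition.
owned = batch_mark(mentions, lambda m: m["author"] in OWNED_AUTHORS, relevant=True)

# Batch 2: a recurring off-topic conversation.
noise = batch_mark(
    mentions,
    lambda m: any(term in m["text"].lower() for term in IRRELEVANT_TERMS),
    relevant=False,
)

# Whatever is left still needs a human look.
remaining = [m for m in mentions if not m["checked"]]
```

Each batch clears one mention here, leaving a single ambiguous mention for manual review. On a client with several hundred mentions per day, the same two passes could clear a large share of the queue.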
The Cleaner, the Better
Digging into your data daily is important to make sure it’s relevant, but you’ll also begin to really get to know your data and the online conversation. You’ll start to recognize authors and trends, which will help you clean your data faster, understand the conversation, and even spot growing or potential issues before they escalate.