The promise went something like this...
Let AI do the heavy lifting of understanding the voice of the customer at scale and the effect it was having on your business.
Give CX teams more time to figure out how to improve the experience for customers in a data driven way and implement solutions quickly.
Unfortunately, the devil was in the details. What we all expected was the ability to quickly understand what customers were saying and the impact it had on metrics the business cared about.
Instead, what we got were rigid black box systems with long setup times and large amounts of ongoing manual maintenance required. Worse still, these solutions had no way to handle the unknown unknowns of customer feedback. They needed us to tell them what to look for, meaning they could only capture known threads of discourse. So how exactly did we get here?
What is the traditional approach to text analytics?
To understand the traditional approach to text analytics, we first need to look at how a human would solve the problem without technology. The most popular way of doing this is called Thematic Analysis. It’s a fairly complicated process, but it can be summarised as follows:
Initial Exploration: The first step is to get a general feel for the data. Read over some pieces of feedback, let's say a few hundred randomly sampled pieces, paying particular attention to any repeating patterns in the language that you can see. At the conclusion of this step, you should have some idea about the high-level themes or topics people are talking about in the data.
Code Generation: This can also be referred to as label generation. In this step, we are formalising what we have found in step 1 by coming up with a set list of codes for the text. Unfortunately, this requires some more reading of data, perhaps another 500 pieces of randomly sampled data. By the end of this step, we should have a list of codes/labels that we have created. They should be descriptive enough so that the consumer can understand the “idea” behind the code and the meaning it is trying to capture.
Review of Codes: While this step is optional, it is good practice to ensure high quality outputs. Take another random data sample, a few hundred data points, and begin applying the codes to the text. Pay particular attention to any text that doesn’t get any codes/labels assigned to it or codes that have potentially been missed. If required, update the list of codes/labels.
Manually code/label the data: Now the fun really begins. Go back to the start of the data and read through it all, piece by piece, assigning the appropriate codes/labels as you go. If you want to maintain high standards, you should have more than one person do this, and you should put quality controls in place like inter-coder reliability and intra-coder reliability.
Rinse and repeat: As you can well imagine, human language isn’t static. For example, prior to 2020, very few people in the western world knew what a coronavirus was, and it certainly wasn’t appearing in customer feedback. For data sources that are ongoing (like surveys, support centres, social listening, etc.), which is most data sources in a CX use case, it’s not enough to set and forget the codes. You need to periodically review. Conservatively, you need to do so at least once a quarter. For high quality results, you want to be doing this at least monthly.
(This is a simplified version of a true Thematic Analysis. I’d encourage you to read the Wikipedia article if you want a more exhaustive definition.)
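To make the quality control in step 4 concrete, here is a minimal sketch of one common inter-coder reliability measure, Cohen's kappa, which compares how often two coders agree against how often they would agree by chance. The article doesn't say which measure to use, so treat this as one reasonable option. The function name, the sample codes, and the simplification that each piece of feedback gets exactly one code are all assumptions made for illustration.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where both coders chose the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: probability that both pick the same code independently,
    # estimated from each coder's own code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two people to the same six pieces of feedback.
coder_a = ["pricing", "delivery", "delivery", "support", "pricing", "support"]
coder_b = ["pricing", "delivery", "support", "support", "pricing", "delivery"]
print(round(cohens_kappa(coder_a, coder_b), 2))
```

A value near 1 means the coders are applying the codes consistently. A value near 0 means their agreement is no better than chance, which usually signals that the code definitions need tightening.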
Exhausting, right? If you are still doing this process today, you have both my condolences and respect. But it isn’t all bad news. Technology to the rescue!
Traditional text analytics solutions saw the inefficiencies of the manual approach and applied technology to try and help. They do this by automating step 4, manually code/label the data. If you give these solutions the codes/labels you want to use and examples of where you have done this manually, they can train the system to do this for you with acceptable accuracy. So what’s the downside?
Well, as I said previously, the devil is in the details. In this instance, it comes in a few forms. Firstly, you still have to provide the codes/labels, which means manually reading at least some of the data. Secondly, the accuracy of these systems is closely tied to the amount of training data you give them. In order to provide the training data, you need to code/label some data, at least 1,000 data points. Lastly, as we know, human language isn’t static. You will need to continually perform step 5, a periodic review of new data, to try and capture new codes/labels as they emerge. When you find a new code, you will need to manually code enough data so that you can provide the solution with some training data.
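The limitation is easier to see in a toy example. The sketch below is not how any particular vendor's product works. It is a deliberately simple word-overlap classifier, with hypothetical function names, stop words, and sample feedback, that shows the general pattern: the system learns from hand-labelled examples and can only ever answer with a code it has already been given.

```python
from collections import Counter, defaultdict

# Small illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"the", "is", "a", "for", "what", "you", "my", "too",
             "of", "because", "so", "why", "to"}

def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stop words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

def train(labelled_examples):
    """Count how often each word appears under each hand-assigned code."""
    word_counts = defaultdict(Counter)
    for text, code in labelled_examples:
        word_counts[code].update(tokenize(text))
    return word_counts

def predict(word_counts, text):
    """Return the known code whose training words overlap most with the text,
    or None when the text shares no words with any known code."""
    words = tokenize(text)
    scores = {code: sum(counts[w] for w in words)
              for code, counts in word_counts.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical hand-labelled training data (step 4 done manually).
training_data = [
    ("The price is too high", "pricing"),
    ("Too expensive for what you get", "pricing"),
    ("My parcel arrived late", "delivery"),
    ("Delivery took two weeks", "delivery"),
]
model = train(training_data)

print(predict(model, "Why is delivery so late?"))
print(predict(model, "Stores closed because of coronavirus"))
```

The first piece of feedback matches a known code. The second is about something nobody labelled, so the best this sketch can do is report that nothing matched. It has no way to propose a new code on its own.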
"Traditional text analytics has augmented the human approach to understanding text by attempting to automate one step of the process."
Because of this, they are inherently constrained by how a human would approach the problem. Crucially, they don't solve the discovery challenge, which is that most people don't have time to constantly review their data to check for new codes, leading to companies missing those previously mentioned unknown unknowns.
What does the new wave of solutions look like?
What if we flipped this script on its head?
What would the solution look like if we set aside for a minute how a human would solve the problem and instead focus on how a machine could solve this problem?
What if technology could solve the code identification problem and the coding problem? Or, put another way, what if technology could tell us both what our customers are talking about and how often they are talking about it?
Customer insights teams would get large amounts of time back, allowing them to spend more time doing what they do best - taking the time to understand how what people are saying impacts their behaviour and sharing actionable insights with the business. To me, this is what insights should be about!
I think the best way to conceptualize the difference between traditional text analytics solutions and the ideal solution that I’m describing is ‘search vs discovery’.
Traditional text analytics approaches use a search-based approach. You tell them the codes you want to find and train the system to go and find them. You’re searching for known knowns.
The alternative I described is a discovery-based approach.
The technology tells you what you need to know, capturing the known knowns, but also the known unknowns and, most importantly, the unknown unknowns.
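As a rough illustration of what "discovery" means mechanically, the sketch below groups unlabelled feedback by shared vocabulary without being told any codes up front. It is a toy: it uses greedy clustering on word overlap (Jaccard similarity), and the function names, stop words, threshold, and sample feedback are all assumptions. Real discovery-based products will rely on far more capable language models, and the article does not specify which techniques they use.

```python
# Small illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "was", "is", "to", "my", "very", "too",
             "due", "up", "now", "again", "much"}

def tokens(text):
    """Set of lowercased words with basic punctuation and stop words removed."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return words - STOPWORDS - {""}

def discover_themes(feedback, threshold=0.25):
    """Greedy single-pass clustering: each piece of feedback joins the most
    similar existing theme, or starts a new theme if nothing is close enough."""
    clusters = []  # each cluster holds its vocabulary and its feedback items
    for text in feedback:
        words = tokens(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            union = words | cluster["vocab"]
            # Jaccard similarity between this text and the theme's vocabulary.
            sim = len(words & cluster["vocab"]) / len(union) if union else 0.0
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["items"].append(text)
            best["vocab"] |= words
        else:
            clusters.append({"vocab": set(words), "items": [text]})
    return clusters

# Hypothetical unlabelled feedback; no codes are supplied anywhere.
feedback = [
    "Delivery was late again",
    "Late delivery, very annoying",
    "The price went up too much",
    "Price is too high now",
    "Store closed due to coronavirus",
    "Coronavirus closed my local store",
]
themes = discover_themes(feedback)
for theme in themes:
    # Report how often each discovered theme occurs and the words that define it.
    print(len(theme["items"]), sorted(theme["vocab"]))
```

Nobody told the code to look for coronavirus, yet the feedback about it ends up in its own group with a count attached. That is the "what are they talking about, and how often" pairing described above, in miniature.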
If you're using a traditional text analytics tool today, ask yourself this: how long did it take you or your vendor to add a COVID code to your text analytics ontology?
The only reason customer insights teams knew to do this was because COVID was in the news. Just imagine if something was affecting your customer base that wasn't in the news, and you missed it. This is exactly why you can't rely on a human understanding of intent to guide your discovery methodology.
Traditional text analytics has failed CX teams because it started from the question of how to make the human process more efficient, rather than asking how technology could solve the problem regardless of how humans solve it.
Understanding customer feedback and acting on that understanding has to be agile. Eight out of ten consumers rate the experience they receive with a brand as just as important as the product on offer. Taking weeks or months to notice an important issue affecting your customers’ experience is no longer acceptable.