Introduction
A year ago and with permission, we went through the process of analysing Campaign Brief articles and commentary.
A key finding from this work was that comments about agency work are net positive overall (22% are negative, 63% are positive, the rest are truly neutral), when using advanced sentiment scoring techniques.
However, if you’ve spent any time at a creative/media/PR agency or follow them (or their people) on LinkedIn, you’ll know that 22% feels low. Within the industry, there is a perception that negative comments are common and problematic. This is best represented by tongue-in-cheek promotional work from agencies and industry initiatives to address the matter.
When we first identified the 22%, we were aware that perceived negativity was likely to be higher than what our models were suggesting.
Consequently, our first iterative question was:
- Is negative commentary more likely to be strongly remembered and felt?
We are fortunate at Critical Truth that one of our founding partners (Jonathan O’Hara) is a PhD in Psychology (among many, many other talents). His explanation was:
- Negativity bias - we are more likely to remember the bad than the good
- The Pollyanna principle - we tend to favour positive language and framing, even when the underlying message is not positive
But that isn’t the end of the matter. We are aware of the limitations of sentiment scoring (even advanced methods). While the fundamental methods of scoring have advanced massively (accounting for valence shifters, for example), they are not geared towards assessing additional context or potential undertones.
Historical attempts to achieve this are based on relatively archaic ancillary resources (e.g. the NRC emotion lexicon).
We (Critical Truth, who have all worked agency side) are also aware of how comments are typically presented on Campaign Brief. Something could be sentimentally neutral, but tonally negative by virtue of some linguistic or cultural undercurrent.
Thus, the leading question for this project is:
- Are the sentiment scoring models missing something?
Sentiment modelling only tells us the sum of a sentence’s parts: how good or bad the individual words are, and the combinations in which they are used. But what is the target of the good/bad, and what common linguistic intricacies might lie within?
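To make that concrete, here is a minimal sketch of lexicon-based scoring. The post does not name its exact tooling, so VADER here is an assumption rather than the authors’ pipeline; the point is that such a scorer sums word-level valence and adjusts for valence shifters (negation, intensifiers), but knows nothing about who or what is being discussed.

```python
# Illustrative only: VADER is one lexicon-based scorer that handles valence
# shifters (negation, intensifiers). It is not necessarily the tool used here.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

comments = [
    "Amazing! Well done XYZ!",                    # overtly positive words
    "This is not a good campaign.",               # negation flips 'good'
    "Hmm. Not sure what this is trying to say.",  # mildly worded, no overtly 'bad' words
]

for text in comments:
    # 'compound' is the normalised overall score on a -1 to +1 scale
    print(f"{analyzer.polarity_scores(text)['compound']:+.3f}  {text}")
```

The last example is the kind of comment this piece is concerned with: the scorer only sees word-level valence, not who or what is being criticised.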
If they are missing something, the following research questions are:
- What are they missing?
- What adjustments need to be made to sentiment scoring in order to get a more accurate portrayal?
Once we understand these, our hope for this research is that we can:
- Help commenters take pause and consider their writing, from the perspective of the recipient
- Inform commentary guidelines
- Potentially establish a baseline for moderation
Exploratory Data Analysis (EDA) and feature development
With more data and a refined process, we find 10 common article themes on Campaign Brief. Qualitative review allows us to aggregate these into three higher-order themes: The Work, Company news, and Awards.
Comment volume is not evenly balanced with article volume. Articles about the work make up 35% of articles, but account for 57% of comments. Company news is 48% of articles, and 38% of comments. Awards account for 17% of articles, and only 3.5% of comments.
Consequently, any detailed analysis of comment sentiment discounts articles within Awards.
Sentiment score distributions, as found last year, are net positive. However, articles about The Work are less likely to attract highly positive comments than Company news (36% highly positive, compared to 48%).
We also find that shorter comments (<15 words) are much more likely to be highly positive and more likely to be scored accurately. These comments contain direct, effusive praise - ‘Amazing! Well done XYZ!’. Given the context of our RQ (are the sentiment models missing something?), we focused the development of incremental features on comments that are >15 words in length.
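As a minimal illustration of that filtering step (the column names and example rows are assumptions, not the actual dataset):

```python
import pandas as pd

# Hypothetical comment table; the real schema is not disclosed in the post.
comments = pd.DataFrame({
    "comment_id": [1, 2, 3],
    "text": [
        "Amazing! Well done XYZ!",
        "Love it.",
        "A longer, more considered comment about the idea and the execution "
        "that runs well past the fifteen-word cut-off and so qualifies here.",
    ],
})

comments["word_count"] = comments["text"].str.split().str.len()

# Incremental features are only developed for longer comments (>15 words),
# since short comments are overwhelmingly direct, effusive praise.
long_comments = comments[comments["word_count"] > 15]
print(long_comments[["comment_id", "word_count"]])
```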
Something that can be inferred as negative doesn’t need to be linguistically emotive.
Take this comment, for example:
“Yeah.. I don’t get it. Feel it misses the mark. Like why? Why should I tap? What are you trying to tell the consumer?”
This is a harsh comment, but only by virtue of the context we, the reader, are aware of. This is commentary on somebody’s work that they are undoubtedly proud of, and we know that. A sentiment model does not. All it knows is that there aren’t really any overtly bad words in there.
Consequently, its sentiment score is -0.015, slightly negative.
At a purist level, this evaluation is probably right. It is only slightly negative, but it feels worse.
This is because sentiment modelling doesn’t capture subtleties or nuance, nor does it account for context.
If we can identify those, we will better understand the disparity between what is calculated and what is perceived or felt.
We identified some of these subtle nuances and intricacies across the components of contextual sentiment (e.g. criticism, praise), targets (who the comment is aimed at), linguistic tropes (e.g. puns or innuendo), and tonality (e.g. sarcasm, humour). These four components were identified during EDA.
Within the components, we analysed the comments to find requisite attributes that our analysis should prioritise. We did this by:
- Contextual sentiment
  - Isolating common predictors of positive/negative sentiment within themes
- Targets
  - Named Entity Recognition (NER) within themes
- Linguistic tropes
  - Identifying the best examples of comments across our 5 buckets of sentiment grouping and qualitatively identifying tropes from this list
- Tone
  - As above, but using this list
We then applied some novel methods with LLMs to calculate the probable presence of any of those attributes across the four components.
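The exact prompts and models are not disclosed, so the following is only a hedged sketch of the general approach; the model name, prompt wording, attribute subset, and output schema are all assumptions.

```python
import json
from openai import OpenAI  # any LLM client with a chat interface would do

client = OpenAI()

# Subset of attributes for illustration; the study's full attribute list is larger.
ATTRIBUTES = {
    "contextual_sentiment": ["criticism", "complaint", "praise", "celebration"],
    "targets": ["the idea", "the execution", "individuals", "clients"],
    "linguistic_tropes": ["innuendo", "antonomasia", "simile", "cliche"],
    "tonal_intricacies": ["disparagement", "contempt", "sarcasm", "humour", "pessimism"],
}

def score_attributes(comment: str) -> dict:
    """Ask the LLM for a 0-100 confidence that each attribute is present."""
    prompt = (
        "For the comment below, return JSON with a 0-100 confidence score for "
        "the presence of each attribute.\n"
        f"Attributes: {json.dumps(ATTRIBUTES)}\n"
        f"Comment: {comment!r}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model, not the one used in the study
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

The resulting 0-100 confidences per attribute are what the rest of the analysis cross-references against the NLP sentiment scores.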
In summary, the EDA and feature development surfaced the following:
- Comments are disproportionately focused on agency announcements about work they have launched (60% of comments, 35% of articles)
- Even with 12 months more data, raw sentiment of comments still tracks net positive (>50% positive)
- Short comments (<15 words) make up 62% of all comments and are more likely to be positive
- Criticism is observable in 40% of comments with >15 words
- Disparagement is the most common tonal intricacy employed in comments with >15 words, where the article is about agency work
- Innuendo is the most commonly employed linguistic trope
- In positive comments, the most likely targets are individuals or groups of individuals
- In negative comments, the most likely targets are the idea and the execution
RQ1 - Are the sentiment scoring models missing something?
To answer this question, we leverage the attributes of ‘Contextual sentiment’ and cross-reference them with sentiment scores. This allows us to identify misalignment between what the NLP sentiment scoring models see and what the LLMs infer from contextual sentiment. Where the LLMs identify criticism or complaint but the NLP models score sentiment as positive, we label it as misaligned. We also label as misaligned where the LLMs identify praise or celebration but the NLP models score the comment as negative.
Sentiment scores tend to be on a -1 to +1 scale, while the attributes of Contextual sentiment are expressed as a confidence of detection within a comment (0-100).
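A minimal sketch of that misalignment rule (the detection threshold is an assumption; the post does not state the cut-off used):

```python
def is_misaligned(nlp_score: float, llm_conf: dict, detect_threshold: int = 50) -> bool:
    """Flag comments where the NLP score and LLM contextual labels disagree.

    nlp_score: sentiment on a -1 to +1 scale.
    llm_conf:  0-100 confidence of each contextual-sentiment attribute.
    """
    negative_context = max(llm_conf.get("criticism", 0),
                           llm_conf.get("complaint", 0)) >= detect_threshold
    positive_context = max(llm_conf.get("praise", 0),
                           llm_conf.get("celebration", 0)) >= detect_threshold

    if negative_context and nlp_score > 0:
        return True   # reads critical, scored positive
    if positive_context and nlp_score < 0:
        return True   # reads celebratory, scored negative
    return False

# Example: faint-praise wording, but the LLM flags criticism with 80% confidence
print(is_misaligned(0.12, {"criticism": 80, "praise": 10}))  # True
```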
- We find alignment on sentiment between LLM contextual labeling and NLP sentiment scoring 75% of the time.
- But, are the sentiment scoring models missing something? Yes. A 25% disparity in alignment is not to be ignored.
- Misalignment is most likely to occur in comments about the work (26%), and particularly celebrity endorsed campaigns (28%). Creative leadership news is where the strongest alignment can be found (85%).
RQ2 - What are they missing?
To answer this question, we leverage the attributes within ‘Tonal intricacies’, ‘Linguistic tropes’, and ‘Targets of comment’ across articles where sentiment is misaligned.
The attributes most likely to be associated with misalignment sit within tonal intricacies, which are identifiable in 98% of comments. Those attributes are:
- Disparagement: present in 6.4% of all comments, with 36% misalignment
- Contempt: present in 3.4% of all comments, with 35% misalignment
- Pessimism: present in 2.5% of all comments, with 36% misalignment
- Sarcasm: present in 2.5% of all comments, with 32% misalignment
- Humour: present in 3.6% of all comments, with 23% misalignment
Linguistic tropes are less common overall (detectable in 50% of comments), but the most likely sources of misalignment come from:
- Innuendo: present in 2.5% of all comments, with 31% misalignment
- Antonomasia: present in 1% of all comments, with 26% misalignment
Targets are detectable in 84% of all comments, and the attributes most associated with misalignment are:
- The idea: present in 2% of all comments, with 34% misalignment
- The execution: present in 4.3% of all comments, with 26% misalignment
RQ3 - What adjustments need to be made to sentiment scoring in order to get a more accurate portrayal?
To answer this question, we leverage the attributes within ‘Tonal intricacies’, ‘Linguistic tropes’, and ‘Targets of comment’ as predictors of sentiment where alignment is achieved between LLMs labeling contextual sentiment and NLP sentiment scoring. We use the outcomes of this to predict how a sentiment score should be adjusted.
Contextual sentiment sets the starting point of sentiment (if misaligned).
Tonal intricacies, linguistic tropes, and targets then act as multipliers on a comment’s sentiment (if detected). A simplified sketch of how such an adjustment could work follows the list of predictors below.
Predictors of sentiment
- Disparagement will reduce sentiment by 21%
  - When combined with other tonal intricacies, this effect increases (e.g. with sarcasm and contempt, to 29%)
- Humour can be used negatively or positively, but its detrimental effect is stronger than its positive one
- All linguistic tropes bring down sentiment, particularly innuendo, simile, and cliche
- Comments that reference people and teams act as strong multipliers of sentiment
- Comments that reference the idea, the execution, clients, other commenters, or society at large are strong predictors and multipliers of negative sentiment
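The fitted model itself is not published, so the following is only a hedged sketch of how such multipliers could be applied. The -21% disparagement figure above is used as an illustrative weight; the detection threshold, function shape, and sign-flip behaviour are all assumptions.

```python
def adjust_sentiment(nlp_score: float, llm_conf: dict, detect_threshold: int = 50) -> float:
    """Hedged illustration of the adjustment logic; not the fitted model."""
    adjusted = nlp_score

    # 1. Contextual sentiment sets the starting point when misaligned:
    #    a positively scored comment the LLM reads as criticism flips sign.
    if llm_conf.get("criticism", 0) >= detect_threshold and adjusted > 0:
        adjusted = -adjusted

    # 2. Detected tonal intricacies act as multipliers. The -21% figure for
    #    disparagement is reported above; other weights come from the model.
    if llm_conf.get("disparagement", 0) >= detect_threshold:
        adjusted *= 0.79 if adjusted > 0 else 1.21

    # Keep the result on the -1 to +1 scale.
    return max(-1.0, min(1.0, adjusted))
```

In practice the weights would come from the fitted predictors above rather than hand-coded constants.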
Adjusted sentiment
- Adjusting sentiment for comments longer than 15 words (38% of all comments), we find that 55% of comments about The Work are negative
- Including all comments and adjustments, 27% of comments about The Work are negative
- For comparison, unadjusted comments about The Work are negative 20% of the time
Discussion
Our hope for outcomes of this analysis was to:
- Help commenters take pause and consider their writing, from the perspective of the recipient
- Inform commentary guidelines
- Potentially establish a baseline for moderation
Our discussion will address each of these directly. We will do so with negativity bias and the Pollyanna principle in mind. While we did find that sentiment scoring models are missing something(s), the total sentiment of commentary is still net positive. This flips when one only considers comments of more than 15 words in length, but those are 38% of all comments.
Negativity bias and the Pollyanna principle are still the leading explanations for the disparity between measured sentiment and perceived sentiment.
This does not diminish the importance of the perception. Just because comments are sentimentally positive in a linguistic sense does not mean they will be read as such by the reader. Using these techniques, we have a greater appreciation for what the overlooked aspects might be.
Help commenters take pause and consider their writing, from the perspective of the recipient
A long comment is more likely to contain nuance that might exacerbate inference of criticism and negativity.
However, if one is to share what one considers to be a nuanced critique of the work published, consider:
- Disparagement is all too easily inferred
- Sarcasm hurts
- Cliches, similes, innuendos, hyperbole, and rhetorical questioning offer nothing constructive and just hurt the creators of the work
- Always balance with recognition for the people behind the work
Inform commentary guidelines
Criticism tends to be veiled behind linguistic vagueness, whereas praise is overt and effusive. As such:
- If you must offer a critique, be specific and acknowledge the humans
- Lazy use of tropes to convey an ethereal sentiment should be avoided
- Jokes must be in good faith
Potentially establish a baseline for moderation
This approach has demonstrated that comments can be moderated manually, automatically, and potentially by the community.
Manual moderation should be the last resort but is not unmanageable with the above guidelines. Campaign Brief receives an average of 21 comments per business day overall, but this is heavily biased towards articles about the work, which contain almost 60% of all comments.
If one wishes to automate this process, simple sentiment analysis will fulfill the requirement 75% of the time.
For those that fall outside of the 75%, the most reliable methodology would be to detect disparagement, contempt, and sarcasm. These are easily veiled in a manner that won’t be detectable by sentiment scoring, but are readily detected by LLMs and can be flagged in an automated fashion via API.
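A hedged sketch of such a two-stage moderation pass follows. The negative-score threshold, the 50-point detection cut-off, and the use of VADER for the cheap first stage are assumptions; `llm_scorer` stands in for an LLM call like the score_attributes() sketch earlier.

```python
from typing import Callable
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
FLAG_ATTRIBUTES = ("disparagement", "contempt", "sarcasm")

def needs_review(comment: str,
                 llm_scorer: Callable[[str], dict],
                 detect_threshold: int = 50) -> bool:
    """Two-stage moderation check: cheap NLP sentiment first, LLM tone second.

    llm_scorer returns 0-100 attribute confidences for a comment, e.g. the
    score_attributes() sketch earlier in this piece.
    """
    # Stage 1: simple sentiment scoring covers the bulk of comments.
    if analyzer.polarity_scores(comment)["compound"] < -0.5:
        return True  # plainly negative wording: flag without an LLM call

    # Stage 2: veiled negativity that sentiment scoring misses is best caught
    # by LLM detection of disparagement, contempt, and sarcasm.
    tonal = llm_scorer(comment).get("tonal_intricacies", {})
    return any(tonal.get(attr, 0) >= detect_threshold for attr in FLAG_ATTRIBUTES)
```

Comments flagged this way could then fall to manual or community moderation, as above.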
Thank you for reading!