Analyzing Sentiments in Tech Communities

Understanding Apple’s Brand Perception Through YouTube’s Tech Communities

Code: Github

Abstract

Understanding consumer sentiment and emerging trends in brand perception is crucial for companies that want to stay competitive. This project studies conversations about tech products, notably Apple’s, across multiple categories. We hope to answer questions about the dominant themes on YouTube influencer channels, how consumers perceive Apple’s products compared to competitors’, and which features drive this sentiment. Our data includes video transcripts and comments from tech influencers on YouTube (MKBHD, Unbox Therapy and CNET), with an emphasis on balanced representation of Apple and competitor products. The findings of this analysis can help Apple and other tech companies make decisions about marketing strategies and brand positioning.

Introduction

This project addresses four key questions to delve into consumer perceptions. We aimed to (i) identify specific features that drive sentiment, (ii) investigate dominant themes in influencer discussions, (iii) compare consumer perceptions of Apple’s products to those of competitors, and (iv) assess the impact of creator sentiments on viewer comments.

To answer these questions, we carried out an aspect-based sentiment analysis (using DeBERTa ABSA) across video transcripts and comments on multiple YouTube channels and product categories for both Apple and competitor brands.

Our preliminary findings show that (i) users prioritize features such as build, design and performance in Apple products with a largely positive sentiment towards these aspects. In terms of influencer themes, (ii) the discussions revolve around display quality, materials, performance and user interaction.

When it comes to consumer perceptions, (iii) Apple has a more positive sentiment than its competitors in the Phones and VR categories, while in the other categories its sentiment is less positive. Lastly, (iv) our findings suggest that consumer comments are largely independent of influencer sentiments.

Data Collection

Our data collection process involved scraping video transcripts and comments from popular YouTube tech influencers. We picked 3 YouTube channels for this purpose - Marques Brownlee (MKBHD), Unbox Therapy and CNET. We picked these channels because they are highly active on YouTube and review every new tech product in detail. They also have a high number of subscribers, which suggests their videos would have a high number of comments for us to analyze.

We focused on videos discussing tech products in 4 categories - phones, earphones, watches, and VR headsets, both from Apple and its competitors. We wanted to make sure that we cover a range of products to get a holistic understanding of brand perception and we believe these 4 categories encompass the current tech landscape quite well. We also made an effort to have a balanced representation of each brand in our dataset. For this purpose, we limited our data extraction to the same number for every product-brand combination. For instance, we collected information on 30 videos for the category ‘Phones’ for Apple, and similarly 30 videos each for the competitors - Google and Samsung.

To collect this data, we first identified Apple’s direct competitors in each category and defined keywords for the products launched by both Apple and the competition. For the category ‘Earphones’, since there were not enough videos about any one competitor, we decided to group all competitors together.

Product	Brand	Keywords
VR headsets	Meta	'meta quest'
VR headsets	Apple	'apple vision'
Watches	Google	'pixel watch'
Watches	Samsung	'galaxy watch', 'watch 6 classic', 'watch 5 pro'
Watches	Apple	'apple watch', 'watch se', 'watch ultra'
Earphones	Competitors - Pixel, Samsung	'pixel buds', 'galaxy buds', 'bose qc', 'sony wh-1000'
Earphones	Apple	'airpods'
Phones	Samsung	'S24', 'S23', 'S22'
Phones	Google	'Pixel 8', 'Pixel 7', 'Pixel 6'
Phones	Apple	'iPhone 15', 'iPhone 14', 'iPhone 13'

We then filtered for relevant videos using these keywords: if a video title contained any of the keywords, that video’s information was extracted. This helped us get a comprehensive list of videos. For instance, the keyword ‘iPhone 15’ also fetches videos that have ‘iPhone 15 Pro Max’ in their titles.

Finally, we extracted the video transcripts and comments corresponding to each video using Google’s YouTube API. In total, we had 235 video transcripts and 4415 comments.
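The title matching described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names are hypothetical, the matching is assumed to be a case-insensitive substring test, and the cap of 30 is the figure quoted for the 'Phones' category.

```python
from collections import defaultdict

# Keywords per (product, brand), copied from the table above (subset shown).
KEYWORDS = {
    ("Phones", "Apple"): ["iPhone 15", "iPhone 14", "iPhone 13"],
    ("Phones", "Google"): ["Pixel 8", "Pixel 7", "Pixel 6"],
    ("Phones", "Samsung"): ["S24", "S23", "S22"],
}

MAX_VIDEOS_PER_GROUP = 30  # per product-brand cap mentioned for phones


def match_group(title, keywords=KEYWORDS):
    """Return the first (product, brand) whose keyword occurs in the title, else None."""
    lowered = title.lower()
    for group, words in keywords.items():
        # Assumption: case-insensitive substring match on the title.
        if any(word.lower() in lowered for word in words):
            return group
    return None


def select_videos(titles, limit=MAX_VIDEOS_PER_GROUP):
    """Group matching titles by (product, brand), keeping at most `limit` per group."""
    selected = defaultdict(list)
    for title in titles:
        group = match_group(title)
        if group is not None and len(selected[group]) < limit:
            selected[group].append(title)
    return dict(selected)


# Hypothetical titles, for illustration only.
titles = [
    "iPhone 15 Pro Max Review",
    "Pixel 8 Pro: The Smart One",
    "My Desk Setup Tour",
]
selected = select_videos(titles)
```

Substring matching is what lets 'iPhone 15' pick up 'iPhone 15 Pro Max', but short keywords such as 'S24' can also match unrelated titles, so a manual check of the results is a sensible follow-up.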

Analysis

Data Preprocessing

After collecting the data for both video transcripts and comments, the next step was to process the data to prepare it for analysis. The preprocessing of our text data involved the usual steps undertaken in the NLP pipeline - removal of stop words, conversion of all characters to lowercase, removal of numbers and special characters from the text; as well as some cleaning steps that are specific to social media text such as removal of emojis, mentions and hashtags.

To illustrate with an example, this comment on MKBHD’s video about the Apple Vision Pro ‘That’s completely true pal !👍I am just wondering who will spend $4000 for no app support, 2hour battery life etc... Lots of software dev ops just walked away... Remember app support is always crucial 😮 ’ after preprocessing became ‘thats completely true pal wondering spend app support hour battery life etc lots software dev ops walked away remember app support always crucial’. We can observe that the comment text is now free from numbers, special characters, and emojis and can now be converted into meaningful tokens for sentiment analysis.

In addition to this, we also noticed that some comments are not relevant to the discussion about the product. These may be generic comments about the creator or the video quality. So in our next step of preprocessing, we filtered out comments that were not relevant to the product with the help of keywords from the video transcripts. Along with these keywords we also used some additional stop words including the names of the YouTube channels to manually filter some more irrelevant comments. As a result of the above, we were left with 2880 of the original 4415 comments.
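A minimal sketch of these cleaning steps, using only the standard library, is shown below. The stop-word list here is a small illustrative stand-in (the list actually used is not given), and `preprocess` and `is_relevant` are hypothetical helper names.

```python
import re

# Small illustrative stop-word list; an assumption, not the list used in the project.
STOP_WORDS = {"i", "am", "just", "who", "will", "for", "no", "of", "is", "the", "a"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip mentions/hashtags, keep letters only, drop stop words."""
    text = text.lower()
    text = re.sub(r"[@#]\w+", " ", text)    # remove mentions and hashtags
    text = re.sub(r"['’]", "", text)        # "that's" -> "thats"
    text = re.sub(r"[^a-z\s]", " ", text)   # remove numbers, punctuation, emojis
    return [tok for tok in text.split() if tok not in stop_words]


def is_relevant(tokens, product_keywords):
    """Keep a comment only if it mentions at least one product keyword."""
    return any(tok in product_keywords for tok in tokens)


tokens = preprocess(
    "That’s completely true pal !👍 @mkbhd #apple who will spend $4000 "
    "for no app support, 2hour battery life"
)
keep = is_relevant(tokens, {"battery", "app", "display"})
```

Here relevance is reduced to a simple keyword hit; the manual filtering pass described above would still be needed for comments that slip through.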

VADER Sentiment Analysis

To get a basic understanding of overall sentiments of users in the comment section, we performed VADER sentiment analysis on the preprocessed comment data.

First, we compared the overall sentiment among users for Apple products with that for competitors’ products. Comments on Apple products were positive less often, and negative more often, than comments on competitors’ products.

In addition to this, to get an in-depth understanding of where Apple is lacking, we performed a sentiment analysis using VADER for each product category for Apple vs competitors. By doing this, we saw that for phones as well as VR headsets, Apple is doing comparatively better, with more positive sentiment in the comment section than its competitors (Google and Samsung phones, and Meta VR). However, Apple is lagging in the other product categories, namely earphones and watches.
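The per-category comparison can be sketched as below. VADER returns a `compound` score between -1 and 1 for each text; the ±0.05 cut-offs are the thresholds conventionally recommended for VADER and are an assumption here, since the text does not state which thresholds were used. The scoring function is passed in as a parameter so the aggregation logic stays independent of the library.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(comments, score_fn):
    """Percentage of positive/neutral/negative comments per (category, brand).

    `comments` is an iterable of (category, brand, text) tuples and
    `score_fn(text)` returns a compound score.
    """
    counts = {}
    for category, brand, text in comments:
        label = label_from_compound(score_fn(text))
        counts.setdefault((category, brand), Counter())[label] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {
            label: 100.0 * counter[label] / total
            for label in ("positive", "neutral", "negative")
        }
    return shares


# Hypothetical scores standing in for real VADER output.
fake_scores = {"a": 0.6, "b": -0.4, "c": 0.0, "d": 0.3}
comments = [("Phones", "Apple", key) for key in fake_scores]
shares = sentiment_shares(comments, fake_scores.get)
```

With the `vaderSentiment` package installed, `score_fn` could be `lambda text: SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, ideally creating the analyzer once and reusing it.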

To get a clearer picture about which features or aspects drive this sentiment for each product, we then delved into topic modeling to investigate sentiments based on different features.

Topic Modeling

Using Latent Dirichlet Allocation (LDA) from the Gensim library on our transcript data, we extracted the top topics being discussed in the videos. We initially extracted 10 topics for every brand-product combination (e.g., Apple-Watch), and every topic had around 15 keywords that are likely to co-occur in documents.

Every keyword has a score attached to it that denotes the probability of that keyword belonging to the topic. In one Apple Watch topic, for example, keywords like 'phone', 'battery', and 'workout' suggest discussions about features of the Apple Watch, such as its connectivity with the iPhone, battery life, and fitness tracking capabilities. Words like 'like', 'going', and 'day' might indicate discussions about users' daily routines or activities with the Apple Watch. The last value in each topic is a coherence score, which measures how well-defined the topic is; a negative value suggests that the topic's keywords may not relate strongly to each other.
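To give a feel for how a coherence score can be computed, here is a simplified, standard-library sketch of the UMass measure. The text does not say which coherence measure was used, so UMass is an assumption; Gensim's own implementation differs in details such as smoothing. The idea is that a topic scores higher when its top words tend to appear in the same documents.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """Simplified UMass coherence: mean of log((D(wi, wj) + 1) / D(wj)) over
    word pairs, where wj is the higher-ranked word and D counts documents."""
    doc_sets = [set(doc) for doc in documents]

    def doc_count(*words):
        # Number of documents containing all the given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    scores = []
    for j, i in combinations(range(len(top_words)), 2):  # j < i, so wj ranks higher
        w_j, w_i = top_words[j], top_words[i]
        denom = doc_count(w_j)
        if denom == 0:
            continue  # skip words that never occur in the corpus
        scores.append(math.log((doc_count(w_i, w_j) + 1) / denom))
    return sum(scores) / len(scores) if scores else 0.0


# Tiny hypothetical corpus of tokenized transcripts.
docs = [
    ["battery", "workout", "phone"],
    ["battery", "workout"],
    ["like", "day"],
    ["phone", "going"],
    ["battery", "phone"],
]
coherent = umass_coherence(["battery", "workout"], docs)
incoherent = umass_coherence(["battery", "day"], docs)
```

In this toy corpus 'battery' and 'workout' co-occur often, while 'battery' and 'day' never do, so the second topic receives the lower, negative score.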

In our analysis across all products, we found that the topics ‘Build’, ‘Design’, and ‘Performance’ are the most discussed topics in the videos.

Upon observation of the topics and keywords for every product-brand combination we analyzed the general themes of discussion about the products. We then manually came up with a list of 8-12 keywords for each product relevant for conducting an aspect-based sentiment analysis.

Product	Keywords
VR	'performance', 'experience', 'gaming', 'mixed reality', 'gyroscope', 'audio', 'visual', 'technology', 'specification'
Watches	'features', 'design', 'fitness', 'health', 'comfort', 'display', 'comparison', 'battery', 'integration', 'sleep', 'sensor'
Earphones	'features', 'wireless', 'sound quality', 'comfort', 'technology', 'design', 'noise cancellation'
Phones	'features', 'build', 'camera', 'video', 'updates', 'connectivity', 'charging', 'power', 'performance', 'comparisons'

Aspect Based Sentiment Analysis

After defining the relevant topics for each product with the help of topic modeling, we decided to conduct an aspect-based sentiment analysis on both the transcripts and the comments to understand which aspects drive particularly positive or negative sentiment. For this purpose, we used the model DeBERTa ABSA available on HuggingFace.
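An aspect-based model scores a piece of text with respect to one aspect at a time, so the text first has to be paired with the aspects it mentions. The sketch below shows one way to build those pairs and aggregate the results. It is not the project's code: the sentence splitting, the helper names, and the 'Positive' label string are assumptions, and the classifier is a stub standing in for the DeBERTa ABSA model.

```python
import re
from collections import defaultdict


def aspect_sentence_pairs(text, aspects):
    """Yield (sentence, aspect) pairs for every sentence that mentions an aspect."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for sentence in sentences:
        lowered = sentence.lower()
        for aspect in aspects:
            if aspect in lowered:
                yield sentence, aspect


def positive_share_by_aspect(texts, aspects, classify):
    """Share of 'Positive' labels per aspect.

    `classify(sentence, aspect)` returns a label string for that pair.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for text in texts:
        for sentence, aspect in aspect_sentence_pairs(text, aspects):
            totals[aspect] += 1
            if classify(sentence, aspect) == "Positive":
                positives[aspect] += 1
    return {aspect: positives[aspect] / totals[aspect] for aspect in totals}


def stub_classifier(sentence, aspect):
    """Hypothetical stand-in for the ABSA model; ignores the aspect entirely."""
    return "Positive" if "great" in sentence.lower() else "Negative"


texts = ["The camera is great. Charging is slow.", "Great camera, weak charging!"]
shares = positive_share_by_aspect(texts, ["camera", "charging"], stub_classifier)
```

In practice `classify` would wrap the Hugging Face model, passing the sentence and the aspect as a text pair. Unlike the stub, a real ABSA model can assign different labels to 'camera' and 'charging' within the same sentence.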

Our plan was to first analyze the perception of the influencers towards these topics by measuring the sentiment towards the selected products of each brand, and compare them.

We plotted the aspect-based sentiment for every brand-channel combination. This helps reveal whether certain influencers usually rate products from a certain brand higher, or whether someone is more critical and hence slightly biased.

For the ‘Phones’ category, the plot shows that MKBHD has the most positive sentiment towards Apple products, and the gap between him and the other influencers is significantly larger for Apple than for other brands. CNET also has a comparatively better perception of Apple, as its perception of Google and Samsung is the worst among the 3 influencers. Unbox Therapy, on the other hand, has much less positive sentiment towards Apple than towards Google and Samsung.

In the second step of this process, we checked whether the influencers’ perception affects how viewers perceive the products. Additionally, this would give us an idea of how different products are rated for the identified aspects that are discussed in the videos.

A similar analysis for the ‘Phones’ category on comments data revealed the following plot:

While in the transcripts analysis we saw the same sentiment trend across aspects for the phones, we can see that the viewers tend to have a more balanced opinion where they like particular aspects of each brand’s products. For instance, we can see that across all 3 influencers, the ‘Power’ aspect of Google phones drives a much higher positive sentiment than Apple and Samsung.

We saw in the transcripts analysis that MKBHD’s sentiments for Apple are overwhelmingly positive, but that is not the case when it comes to comments under his channel, and the sentiments are more evenly spread out. In fact, the comments under this channel have the most positive sentiments for Google.

The most positive sentiment for Samsung phones comes from comments under the 'Unbox Therapy' channel. In general, the comments under this channel often have the most positive sentiment for each aspect.

For CNET, while we saw a preference for Apple through the transcripts, the comments under the channel seem to be more balanced with the least positive sentiment for Samsung phones.

This shows that while creators may have some bias towards certain brands, the viewers in the comments do not necessarily agree with their opinions every time and their perception of the brand often differs from the creator.
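The comparison above is based on reading the plots side by side. One way to put a number on it, offered here as a possible extension rather than something the analysis did, is to correlate the share of positive sentiment in transcripts with the share in comments across channel-brand pairs. The values below are hypothetical placeholders, not results from this project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical positive-sentiment shares, one entry per (channel, brand) pair.
transcript_share = [0.80, 0.55, 0.60]
comment_share = [0.50, 0.62, 0.48]
r = pearson(transcript_share, comment_share)
```

A coefficient near zero would be consistent with comments being largely independent of creator sentiment, though with only a handful of channel-brand pairs such a number should be read with caution.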

Discussion and Conclusions

The results of our project provide insight into how a brand is perceived online by creators as well as consumers. Revisiting our research questions:

(i) Sentiment towards the features that matter most to users is positive for most Apple products. Based on the assumption that influencers discuss the topics that users are most likely to find valuable in order to garner more views and engagement, users mostly look for features like build, design and performance while looking at a product. Apple products tend to fare well in these aspects, as indicated by the positive consumer sentiments.
(ii) The most discussed themes in the videos are ‘Display’, ‘Build’, ‘Performance’, and ‘Interactivity’. Influencers discuss these topics the most across products and brands, which may give brands an insight into which areas to focus on during product development and in marketing strategies.
(iii) The overall sentiment for Apple is more positive when it comes to Phones and VR headsets, while competitors’ products have a higher positive sentiment in Earphones and Watches. While Apple leads the Phones and VR categories by 2% and 2.45% more positive sentiment respectively, it lags behind by 12% in Earphones and 10.2% in Watches. This also explains why the overall perception of Apple, when combining all products, is less positive than that of the competitors.
(iv) The sentiments of the commenters are largely independent of the sentiments of the influencers for most products and brands, except for VR products. This is an interesting finding: even though some influencers may be biased towards certain brands, for monetary or non-monetary reasons, the users commenting on these videos do not seem to be affected by that bias and give their honest opinions in the comments. The only exception is VR headsets, which may be because it is a new category with too little market penetration for users to have formed their own opinions.

One of the biggest limitations we found during our preliminary analysis is that users post non-specific comments on YouTube videos. For instance, a lot of comments were about the YouTuber’s camera quality. These kinds of broad and non-specific comments can introduce bias in our analysis and can make it difficult to pinpoint sentiments that are explicitly related to the tech products and their features.

Another limitation of this study is the difficulty of accurately categorizing sentiments. Because of the complexity of human language, which can include many idioms, slang and context-specific phrases, the machine learning model we used may not be able to generate accurate sentiment scores.

Other limitations include our use of a limited set of keywords to filter relevant videos. Also, since our study focused only on YouTube comments, it may not fully represent true consumer sentiment, which may be expressed differently on other platforms.

Future work can extend the scope of our study. We could broaden our data collection to include user perceptions from different platforms such as Instagram, X and Reddit. A more diverse set of perceptions can give us a more holistic view of consumer sentiment. Additionally, our current study used a rule-based approach (VADER) for the overall sentiment analysis; in future work we could look into more advanced machine learning models that can help us capture the language nuances discussed above and increase the accuracy of sentiment analysis.

We can also compare the sentiment analysis results with a company's actual sales data, which can help us understand how closely user sentiment correlates with consumers’ actual purchasing behavior.

Shinjini Guha