Video game review report

Author

Matt Crump

Published

April 17, 2023

Report overview

This report summarises reviews submitted for Video Game products on Amazon from 1999 to 2018 made available by Ni et al. (2019). In total there are 497577 reviews in the dataset.

Number of reviews by year

The below histogram shows the number of video game reviews submitted to Amazon by year. From 1999 reviews largely increased year-on-year which is unsurprising given the growth of Amazon and access to the internet. The dataset shows the peak number of reviews was 2015 with a decline from 2016 to 2018. It is likely that this reflects the dataset being incomplete for recent years rather than the number of reviews declining in reality.

Verified users

The dataset contains details of whether the review was based on a verified purchase. From Amazon Community:

An “Amazon Verified Purchase” review means that we’ve verified that the person writing the review purchased the product from Amazon, and didn’t receive the product at a big discount. Reviews that are not marked “Amazon Verified Purchase” are valuable as well, but, either we cannot confirm that the product was purchased from Amazon, or that the customer paid a price that is available to most Amazon shoppers.

Table 1 shows the number of reviews based on verified and unverified purchases.

Table 1: Number of reviews by purchase status
verified counts
FALSE 164932
TRUE 332645

Whilst the number of verified reviews is substantially larger than the number of unverified reviews, the below histogram demonstrates that this has not been a consistent trend and that the large increase in the number of reviews is largely driven by an increase in verified reviews.

`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.

Review ratings

Overall

Amazon review ratings are provided on a scale of 1 (worst) to 5 (best) stars. The histogram below shows the total number of reviews assigned each rating.

By purchase status

However, if you break this data down by verified purchases status you can see that whilst the number of verified and unverified reviews with 1 to 4 star reviews are similar, there is a very large number of 5 star reviews for verified purchases compared to unverified purchases.

#Get number of reviews per year

rating_by_verified_counts <- review_data %>%
  group_by(rating,verified) %>%
  summarize(counts = n())
`summarise()` has grouped output by 'rating'. You can override using the
`.groups` argument.
#ggplot(data = reviews_by_year, mapping = aes(x=year, y=counts))
#ggplot(reviews_by_year, aes(x=year, y=counts))

# plot
ggplot(data = rating_by_verified_counts, 
       mapping = aes(x=rating, 
                     y=counts, 
                     group = verified,
                     fill = verified))+
  geom_bar(stat= "identity", position = "dodge")+
  xlab("")+
  ylab("")+
  ggtitle("Review Ratings",
          subtitle = "By purchase status")+
  theme(
    panel.background = element_rect(fill = "lightgrey",
                                colour = "lightgrey",
                                linewidth = 0.5, linetype = "solid"),
    panel.grid.major=element_line(colour="black",linewidth = .05),
    panel.grid.minor=element_line(colour="lightgrey"))

Average ratings

By purchase status

Average ratings for verified reviews were higher (both mean and median) than for unverified review, likely driven by the number of 5-star reviews for verified reviews.

Table 2: Average ratings by purchase status
Verified Mean rating Median rating
FALSE 3.91 4
TRUE 4.37 5

By year and purchase status

Average ratings for verified purchases tended to increase over time, while average ratings for unverified purchases tended to decrease over time.

year_rating_status <- review_data %>%
  group_by(year,verified) %>%
  summarise(mean_rating = mean(rating),.groups = 'drop')

# plot
ggplot(data = year_rating_status, 
       mapping = aes(x=year, 
                     y=mean_rating, 
                     group = verified,
                     color = verified))+
  geom_line()+
  geom_point()+
  scale_x_continuous(breaks = 1999:2018)+
  scale_color_manual(values = c('#1b9e77','#d95f02'))+
  theme(axis.text.x = element_text(angle = 85, vjust = 1, hjust=1))+
  xlab("")+
  ylab("")+
  ggtitle("Average Ratings",
          subtitle = "By year and purchase status")+
  theme(
    panel.background = element_rect(fill = "lightgrey",
                                colour = "lightgrey",
                                linewidth = 0.5, linetype = "solid"),
    panel.grid.major=element_line(colour="black",linewidth = .05),
    panel.grid.minor=element_line(colour="lightgrey"))