Brew to success

by Tim Brunner, Dana Kalaaji, Cyril Golaz, Alexander Odermatt & Ajkuna Seipi

🌟 Motivation 🌟

Brewing the perfect beer is a task many set out to accomplish and few, if any, ever achieve. When it comes to crafting a beer, choosing the right ingredients, temperature, brewing equipment and overall recipe design is crucial, and each of these choices multiplies the number of possible combinations. It is this vastness of possibilities that captivates so many brewers around the globe who chase the perfect beer. With our project, we aim to break down the flood of information supplied by the beer rating websites BeerAdvocate and RateBeer into easily digestible pieces, giving interested readers meaningful insights that might even increase their chances of brewing a well-liked beer.

Our mission is to extract the best combination of beer characteristics for each month. The goal is to recommend to users and breweries the most popular combination, the one that will get your taste buds dancing! To achieve this, we analyze several characteristics of beers for each month: the alcohol content, the location of the breweries, and the aroma, taste, appearance and palate ratings. Our analysis is mostly based on time series analysis coupled with statistical tests and keyword analysis.

Say goodbye to the same-old beer that is always lying in your fridge, and get ready to break free from your beer routine and embark with us on this adventure that will allow you to experience THE beer that perfectly matches the mood of the month.

Convinced? Then let’s set off together on the hoppy, malty, and fizzy road to the perfect beer!


🍻 A popular Beer 🍻


To determine whether a beer is popular based on its ratings, we first need to establish the threshold rating that defines a beer as popular. To do so, let’s examine the distribution of ratings from both BeerAdvocate and RateBeer.



As observed in these distributions, there are notable differences between the ratings of the two datasets. First, RateBeer has a significantly larger number of ratings than BeerAdvocate. Moreover, the distributions themselves differ, most notably in their averages, since users on RateBeer tend to give harsher ratings than those on BeerAdvocate. From these findings, we conclude that the ratings of the two datasets need to be analyzed separately.

To streamline our analysis, we will categorize ratings into two types: good and bad.

A good rating is one that surpasses the average rating within its respective distribution.

A bad rating is one that falls below the average rating within its corresponding distribution.
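The good/bad split can be sketched in a few lines. This is a minimal illustration, assuming each dataset's ratings are available as a plain list of numbers; the function name `label_ratings` and the sample values are hypothetical. Ratings exactly equal to the mean are left unlabelled, since the two definitions above only cover ratings strictly above or below the average.

```python
from statistics import mean

def label_ratings(ratings):
    """Label each rating 'good' or 'bad' relative to the mean of its OWN dataset.

    Returns (dataset_mean, labels). A rating exactly equal to the mean gets None,
    because it is neither above nor below the average.
    """
    avg = mean(ratings)
    labels = []
    for r in ratings:
        if r > avg:
            labels.append("good")
        elif r < avg:
            labels.append("bad")
        else:
            labels.append(None)
    return avg, labels

# Toy values: each dataset is labelled against its own mean,
# because RateBeer users rate more harshly than BeerAdvocate users.
ba_mean, ba_labels = label_ratings([4.0, 3.5, 4.5, 3.0])
rb_mean, rb_labels = label_ratings([3.2, 3.8, 2.9, 3.5])
```

Computing the mean per dataset, rather than over the union of both, is what keeps a harsh RateBeer rating from being counted as bad simply because BeerAdvocate users are more generous.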

Now, let’s define our concept of popularity.

A beer is deemed popular if it accumulates a substantial number of good ratings and maintains a notably high average rating. An average rating is considered high if it exceeds the mean rating of the corresponding dataset’s distribution.
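As a sketch, the popularity criterion combines two checks per beer. The text does not fix what "a substantial number" of good ratings means, so `min_good` below is a hypothetical parameter, and the input format of `(beer_id, rating)` pairs from a single dataset is an assumption.

```python
from collections import defaultdict
from statistics import mean

def popular_beers(ratings, dataset_mean, min_good):
    """Return the ids of beers that
    (a) have at least `min_good` good ratings (rating > dataset_mean), and
    (b) have an average rating above `dataset_mean`.

    `ratings`: iterable of (beer_id, rating) pairs from ONE dataset.
    `min_good`: hypothetical cut-off standing in for "a substantial number".
    """
    by_beer = defaultdict(list)
    for beer_id, rating in ratings:
        by_beer[beer_id].append(rating)

    popular = []
    for beer_id, scores in by_beer.items():
        n_good = sum(1 for s in scores if s > dataset_mean)
        if n_good >= min_good and mean(scores) > dataset_mean:
            popular.append(beer_id)
    return sorted(popular)

# Toy data: "b" has a perfect score but only one rating, so it is not popular.
toy = [("a", 4.5), ("a", 4.0), ("a", 3.0), ("b", 5.0), ("c", 3.0), ("c", 3.2)]
result = popular_beers(toy, dataset_mean=3.75, min_good=2)
```

Requiring both conditions is what separates a beer with many good ratings from one with a handful of perfect scores, which is the distinction drawn in the next section.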


Who are our consumers?


We would like to focus on a single country for our analysis, as ratings from users in different regions can produce mixed results that represent no country in reality. This is especially important because our analysis is conducted on a monthly basis.
For example, a certain type of beer may be appreciated in summer and not in winter. But depending on the region of the world, summer and winter fall in different months of the year, so aggregating the monthly ratings of that beer over all users would hide this seasonal preference.
By narrowing down to one country, we get a clearer picture of what people in that specific place like.


As shown in the plot, a significant majority of users come from the United States. We need to keep this in mind, as it implies a potential bias towards American taste in the ratings. To make our analysis more precise and avoid mixing different tastes and seasons from different parts of the world, we decided to focus exclusively on consumers from the United States.
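The country shares and the US-only filter could look like the sketch below. It assumes that user locations are strings such as "United States, California" or "Germany", i.e. that the country precedes the first comma; this format, the helper names and the sample data are assumptions for illustration.

```python
from collections import Counter

def country_of(location):
    """Hypothetical parser: assumes 'United States, California' or 'Germany'
    style strings, with the country before the first comma."""
    if not location:
        return None
    return location.split(",")[0].strip()

def user_country_shares(user_locations):
    """Fraction of users per country. `user_locations` maps user_id -> location;
    users without a location are skipped."""
    countries = [c for c in map(country_of, user_locations.values()) if c]
    counts = Counter(countries)
    return {c: n / len(countries) for c, n in counts.items()}

def keep_us_ratings(ratings, user_locations):
    """Keep only (user_id, beer_id, rating) tuples written by users in the US.
    Users with an unknown location are dropped."""
    return [r for r in ratings
            if country_of(user_locations.get(r[0])) == "United States"]

locs = {"u1": "United States, California", "u2": "United States, Oregon",
        "u3": "Germany", "u4": "Canada"}
shares = user_country_shares(locs)
us_only = keep_us_ratings(
    [("u1", "b1", 4.0), ("u3", "b1", 3.5), ("u2", "b2", 4.2), ("u5", "b3", 3.0)],
    locs,
)
```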

Moreover, identifying the preferences of US users specifically aligns better with our definition of popularity. According to that definition, a beer only becomes popular when it accumulates many good ratings, which requires a large user base. For instance, a beer rated by 10 users, all giving a perfect 5/5, may not be considered as popular as a beer rated by 1000 users, even if not all of those ratings are 5/5. By choosing the United States, the country with the most users, we increase the likelihood of obtaining such a large number of ratings.

When is a Beer Popular?

Does popularity fluctuate based on the season or, more specifically, on a monthly basis?

Given that certain dishes gain more popularity during specific times of the year, why wouldn’t the same be true for beers?

To answer this question, let’s first assume that individuals rate a beer at the very moment they drink it. Under this assumption, a rating posted in January means that the consumer drank the beer in January. In other words, the number of ratings a beer receives in a given period of the year reflects how often it was consumed during that period.

Let’s observe if there is any seasonality in the beers’ ratings.
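Under this assumption, the seasonality signal is the mean number of ratings per calendar month, averaged over the years. A minimal sketch, assuming rating dates are already available as `datetime.date` objects (the function name and sample dates are hypothetical):

```python
from collections import Counter
from datetime import date

def mean_ratings_per_month(rating_dates):
    """Average, over the years present in the data, of the number of ratings
    posted in each calendar month (1 = January ... 12 = December).

    A month with no ratings in a given year contributes 0 for that year.
    """
    per_year_month = Counter((d.year, d.month) for d in rating_dates)
    years = {d.year for d in rating_dates}
    return {
        m: sum(per_year_month[(y, m)] for y in years) / len(years)
        for m in range(1, 13)
    }

dates = [date(2010, 1, 5), date(2010, 1, 20), date(2010, 10, 3),
         date(2011, 1, 9), date(2011, 10, 1), date(2011, 10, 15),
         date(2011, 10, 30)]
monthly = mean_ratings_per_month(dates)
```

Averaging per year, instead of pooling all years, keeps one unusually busy year from dominating the monthly profile.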


Upon examining the above plot, it becomes evident that the popularity of beers experiences notable fluctuations during specific periods of the year. While variations exist between different years, a clear trend emerges when considering the mean number of ratings per month. There is a distinct rise in ratings towards the end and beginning of the year, during the month of March, and in August. Additionally, a small peak is observable in October, which may be associated with the renowned Oktoberfest.

Let’s examine the average monthly number of ratings for some notable beer styles: Oktoberfest, American Pale Ale (APA), and Belgian Strong Dark Ale.


As anticipated, the results are quite gratifying! The Oktoberfest beer experiences a significant surge in popularity in October, which aligns perfectly with expectations. The rise in the number of ratings is steep, reaching an average of more than 800 and underscoring the beer’s heightened appeal at this time of the year. In the other months, by contrast, the average number of ratings remains below 200.

Regarding the American Pale Ale (APA), its popularity remains relatively stable throughout the year. However, discernible peaks in the number of ratings are observed during the summer and in October.

Lastly, we note a substantial spike in the number of ratings for the Belgian Strong Dark Ale towards the end of the year.


🌅 On the road to the perfect Beer 🌅

Now armed with foundational information, we embark on our quest to craft the perfect beer. Let the brew-tiful journey begin! 🍻✨

The central query guiding our exploration is as follows: What attributes contribute to the popularity of a beer?


Influence of the Alcohol by Volume (ABV) on the Ratings 📈

Let’s explore how the alcohol by volume (ABV) impacts the ratings, keeping in mind our quest for the perfect beer for each month. Is there a discernible pattern in the ABV of beers throughout the year? Let’s look at the plot below, where we only consider good ratings.
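One simple way to summarise this pattern is the mean ABV of the beers behind good ratings, month by month. The sketch below assumes rows of `(month, abv, rating)` and a dataset mean computed beforehand; the plot may show the full distribution rather than a mean, so this is only an illustrative summary.

```python
from collections import defaultdict
from statistics import mean

def mean_abv_of_good_ratings(rows, dataset_mean):
    """rows: iterable of (month, abv, rating).

    Keeps only good ratings (rating > dataset_mean) with a known ABV and
    returns {month: mean ABV of the rated beers}.
    """
    abvs = defaultdict(list)
    for month, abv, rating in rows:
        if rating > dataset_mean and abv is not None:
            abvs[month].append(abv)
    return {m: mean(v) for m, v in sorted(abvs.items())}

rows = [(1, 8.0, 4.2), (1, 10.0, 4.5), (1, 4.5, 3.0),
        (7, 5.0, 4.1), (7, 4.0, 4.0), (7, 9.0, 3.1)]
abv_by_month = mean_abv_of_good_ratings(rows, dataset_mean=3.75)
```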


As revealed by this plot, a distinct pattern emerges. Beers with lower ABV are favored in the summer 🌞, while there’s a noticeable shift towards higher ABV options during the colder months ❄️.
Let’s now explore how this affects the overall rating for each month.


From this heatmap, we can see that although there is a noticeable pattern in the ABV preferred each month, the average rating associated with the preferred ABV does not vary much from one month to another. However, the ratings are spread differently across ABV values depending on the month, which can affect the overall ranking of ABV values for each month. For instance, toward higher ABV values, the colors are darker in December than in August. To this end we extracted
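A heatmap of this kind can be built as a table of average ratings per (month, ABV bin) cell, from which the best-rated ABV bin of each month can be read off. This is a sketch under assumptions: rows of `(month, abv, rating)` and a 1-percentage-point bin width, which is a hypothetical choice.

```python
from collections import defaultdict

def abv_month_heatmap(rows, bin_width=1.0):
    """Average rating for each (month, ABV bin) cell.

    rows: iterable of (month, abv, rating). ABV values are floored to bins of
    `bin_width` percentage points (e.g. 7.4 -> 7.0 with the default width).
    Returns {(month, abv_bin): average rating}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, abv, rating in rows:
        abv_bin = (abv // bin_width) * bin_width
        sums[(month, abv_bin)] += rating
        counts[(month, abv_bin)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def best_abv_per_month(heatmap):
    """For each month, the ABV bin with the highest average rating."""
    best = {}
    for (month, abv_bin), avg in heatmap.items():
        if month not in best or avg > heatmap[(month, best[month])]:
            best[month] = abv_bin
    return best

rows = [(12, 8.2, 4.0), (12, 8.7, 4.2), (12, 5.1, 3.6),
        (8, 5.3, 3.9), (8, 8.1, 3.7)]
hm = abv_month_heatmap(rows)
best = best_abv_per_month(hm)
```

Because each cell is an average, a month whose ratings are spread over many ABV bins can still end up with a different top bin than a month where they are concentrated, which is the effect described above.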


Influence of the Location on the Ratings 🌍

Now, let’s find out if and how the geographical location influences beer ratings.


Percentage of appearances of each brewery location among all monthly top beers of the years 2005 to 2016. For every month of the years 2005 to 2016, the best and most rated beer was determined. From this pool of beers, we take the locations of the breweries producing them and show how often each location appears, as a percentage.
Among the locations, US locations, especially California, are extremely prevalent. Grouped together, US locations make up more than 90% of all locations in this dataset, and across the years there is not a single month in which the US is not the most prevalent location.
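Given one brewery location per monthly top beer, the percentages above reduce to a frequency count. The sketch assumes locations formatted as "United States, California" (so that a prefix check identifies US breweries); both the format and the sample list are assumptions.

```python
from collections import Counter

def location_shares(top_beer_locations):
    """Percentage of appearances of each brewery location among monthly top beers.

    top_beer_locations: one location string per (year, month) top beer.
    """
    counts = Counter(top_beer_locations)
    total = sum(counts.values())
    return {loc: 100 * n / total for loc, n in counts.most_common()}

def us_share(top_beer_locations):
    """Percentage of top beers brewed anywhere in the United States
    (assumes US locations start with 'United States')."""
    us = sum(1 for loc in top_beer_locations if loc.startswith("United States"))
    return 100 * us / len(top_beer_locations)

locs = ["United States, California"] * 3 + ["United States, Oregon", "Belgium"]
shares = location_shares(locs)
us_pct = us_share(locs)
```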


To better grasp the spatial aspect of these locations and their distribution across months, let’s plot them on a world map.


Location of the breweries of the monthly top beers between 2005 and 2016. The term monthly top beers refers to the beers with the most and the best ratings for each month within the years 2005 to 2016. Points on the map indicate the locations of the breweries that produced these top beers, and their size reflects the number of monthly top beers each brewery has brought forth over the years. To keep the plot uncluttered and to show differences between months more clearly, each frame corresponds to one month, from 1 (January) to 12 (December), and can be changed with the slider at the bottom.
The two plots above make it obvious that the breweries producing the top beer of the month are most frequently located in the US. How does this relate to the fact that we only selected reviews by users located in the US? One could assume that these users are more likely to review, or even favor, beers from US breweries. An observational study confirmed this: whether the reviewed beer comes from a brewery within the US is a statistically significant factor in determining whether a review is likely to give a rating above the mean.
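The text does not say which test the observational study used. As a stand-in, the association between "brewery in the US" and "rating above the mean" can be checked with a Pearson chi-square test of independence on a 2x2 contingency table, sketched below with hypothetical counts. With one degree of freedom, the p-value is `erfc(sqrt(chi2 / 2))`.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square test of independence on a 2x2 table
    (no continuity correction; assumes no empty row or column).

    table rows: brewery in US / brewery not in US
    table cols: rating above mean / rating not above mean
    Returns (chi2 statistic, p-value) with 1 degree of freedom.
    """
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, not taken from the datasets.
stat, p = chi2_2x2([[60, 40], [40, 60]])
```

A chi-square test only detects association; unlike the study described above, it does not control for other factors such as beer style or ABV.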


In wrapping up our exploration of brewery locations for the perfect beer, the conclusion is crystal clear: the top location across all months is the US. The dominance is striking, with a prevalence exceeding 90% among the top monthly beers. Looking further into the US, California stands out, accounting for close to 60% on its own. These findings can be attributed to our focus on reviews from US users, who tend to consume and review local brews.


Influence of the Different Keywords on the Ratings 📝

Last but not least, we look into the reviews to identify the keywords that appear most frequently for aroma, taste, appearance, and palate.

Predominant Good Keyword for each Characteristic…

It is important to verify the validity of the extracted good keywords: what if the same keywords also appear for a less popular beer? To validate our results, we select a less popular beer and extract its most frequent keywords for each month. To select the least popular beer, we could take the one with the lowest number of ratings, but this could leave us with a single rating, which does not reflect the opinion of multiple people. A further analysis showed that a beer receives on average 18 reviews, with a standard deviation of approximately 89. We therefore restrict ourselves to beers that have received at least 100 reviews, and among them we select the one with the lowest average rating.
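The selection rule (at least 100 reviews, then the lowest average rating) can be sketched as follows, assuming the ratings come as `(beer_id, rating)` pairs; the function name and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def least_popular_beer(ratings, min_reviews=100):
    """Among beers with at least `min_reviews` ratings, return the id with the
    lowest average rating, or None if no beer qualifies.

    ratings: iterable of (beer_id, rating) pairs.
    """
    by_beer = defaultdict(list)
    for beer_id, rating in ratings:
        by_beer[beer_id].append(rating)
    eligible = {b: mean(r) for b, r in by_beer.items() if len(r) >= min_reviews}
    if not eligible:
        return None
    return min(eligible, key=eligible.get)

# "x" has the lowest score but a single rating, so the threshold excludes it.
data = [("x", 1.0)] + [("y", 3.0)] * 100 + [("z", 2.5)] * 120
choice = least_popular_beer(data)
```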

First, we plot the number of occurrences of the top good keywords per month for each characteristic.
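Counting keyword occurrences per month for one characteristic can be sketched as below. The keyword vocabulary for each characteristic, the simple lowercase word tokenizer and the `(month, text)` input restricted to good ratings are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

def keyword_counts_by_month(reviews, vocabulary):
    """Count how often each keyword of one characteristic appears per month.

    reviews: iterable of (month, text) pairs, good ratings only.
    vocabulary: set of lowercase keywords for the characteristic
                (a hypothetical list, e.g. for aroma).
    Returns {month: Counter({keyword: occurrences})}.
    """
    counts = defaultdict(Counter)
    for month, text in reviews:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in vocabulary:
                counts[month][token] += 1
    return counts

def top_keyword_per_month(counts):
    """Most frequent keyword for each month that has any keyword at all."""
    return {m: c.most_common(1)[0][0] for m, c in sorted(counts.items()) if c}

aroma_vocab = {"pumpkin", "citrus", "pine", "honey"}
reviews = [(10, "Big pumpkin aroma, pumpkin pie spice"), (10, "Citrus and pumpkin"),
           (7, "Bright citrus, hint of pine"), (7, "Citrus nose")]
counts = keyword_counts_by_month(reviews, aroma_vocab)
top = top_keyword_per_month(counts)
```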

Aroma

What is very interesting in this plot is that we can clearly see an impact of the season on the aroma. For example, in September and October there is a peak for the keyword “pumpkin”, and those months correspond to the pumpkin season. During the summer months the keywords are more fruity, whereas in the cold months we observe keywords such as the sweet “honey” or “ipa”.


Taste

From the plot, we observe some kind of seasonality in the top keyword: once a keyword becomes the top one, it remains the best keyword for a period of time. For example, “sweet” is the best keyword from January to March and does not become the best keyword again for the rest of the year. We observe the same phenomenon for the other top keywords. This suggests that there is some correlation between the month and the preferred taste of a beer.


Appearance

From the above plot, we notice that the appearance changes very smoothly from one month to the next. The preferred appearance goes from light all the way to dark and then back. For example, in January we start with an “orange” appearance, then switch to something slightly darker in March with “amber”, and reach a “black” appearance in April. Then in June we go back to something lighter with “caramel”, and reach “orange” again in July, which is slightly lighter than caramel. We conclude that the change in appearance needs to be smooth: it would not be appropriate to recommend a very light appearance one month and a very dark one the next.


Palate

We observe the same phenomenon as for the appearance: the change of palate between months needs to be smooth.


… But are these good keywords also used in a negative way?

Next, to check whether the good keywords are relevant, we plot, for each month and each characteristic, the keywords that are used in good as well as in bad ratings. This allows us to decide whether a good keyword describing a characteristic is only used positively, or whether many bad reviews also use it in a negative way. In the latter case, it would not make sense to select this keyword as the best descriptor for the characteristic. Let’s see the results.
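The comparison can be sketched by counting each keyword separately in good and in bad reviews. The share of good uses computed here is one hypothetical way to quantify what the plots show; the input format `(rating, text)` for one month and one characteristic, the vocabulary and the dataset mean are assumptions.

```python
import re
from collections import Counter

def keyword_usage(reviews, dataset_mean, vocabulary):
    """For each keyword, count its occurrences in good and in bad reviews.

    reviews: iterable of (rating, text) for one month and one characteristic.
    Reviews rated exactly at the mean are ignored.
    Returns {keyword: (good_count, bad_count)}.
    """
    good, bad = Counter(), Counter()
    for rating, text in reviews:
        tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t in vocabulary]
        if rating > dataset_mean:
            good.update(tokens)
        elif rating < dataset_mean:
            bad.update(tokens)
    return {k: (good[k], bad[k]) for k in sorted(set(good) | set(bad))}

def good_share(usage):
    """Fraction of each keyword's uses that come from good reviews."""
    return {k: g / (g + b) for k, (g, b) in usage.items()}

reviews = [(4.5, "sweet and malty"), (4.2, "nice sweet finish"),
           (2.5, "too sweet"), (4.0, "bitter")]
usage = keyword_usage(reviews, 3.75, {"sweet", "bitter", "malty"})
shares = good_share(usage)
```

A keyword with a share close to 1 is used almost only in good reviews, which is the property the conclusion below relies on.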


Aroma


Taste


Appearance


Palate


As we see, some keywords are used in both good and bad ratings. This actually makes sense: some people like an aspect whereas others dislike it, yet the aspect is still described the same way. For example, sweet stays sweet regardless of whether we like it or not.

From the plots, we conclude that it does make sense to use the good keyword with the highest occurrence to describe our perfect beer. Indeed, only a few bad ratings use those keywords, so we can assume that many people like this attribute.


💛 The Perfect Beer for Each Month is .... 💛

...drumroll...

Drawing from our deep dive into the beer cosmos, here’s the ultimate beer guide to make your taste buds groove with the seasons:

Month      ABV   Location  Aroma    Taste   Appearance  Palate
January    8.0%  US        Honey    Sweet   Orange      Light
February   8.0%  US        Honey    Sweet   Orange      Light
March      8.0%  US        Citrus   Sweet   Amber       Medium
April      8.0%  US        Bourbon  Stout   Dark        Thick
May        8.0%  US        Bourbon  Stout   Dark        Thick
June       5.0%  US        Citrus   Fresh   Caramel     Medium
July       5.0%  US        Pine     Bitter  Orange      Medium
August     5.0%  US        Pine     IPA     Orange      Light
September  6.0%  US        Pumpkin  Spice   Brown       Light
October    6.0%  US        Pumpkin  Spice   Brown       Light
November   8.0%  US        IPA      IPA     Amber       Medium
December   8.0%  US        IPA      IPA     Amber       Medium


🌟 And That's a Wrap! 🌟

Thank you for taking this malty journey into the beer-niverse with us. Here’s to brew-tiful memories and hoppy adventures… Cheers! 🍻✨