Bellabeat case study

Manchester United performance analysis 2020

Dark Light


Bellabeat is a high-tech company that manufactures health-focused smart products including Bellabeat app, Leaf, Time and Spring. It also offers a subscription-based membership program for users giving them access to personalised guidance on having a healthy lifestyle. Bellabeat has positioned itself as a tech-driven wellness company for women.

Bellabeat has been investing extensively on digital marketing including Google Search and being active on social media platforms. The co-founder Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth.

Statement of business task

Smart devices are a big part of people’s everyday life. As a smart device manufacturer, Bellabeat can benefit from learning the trend of smart device usage and make data-driven business strategies to explore opportunities for growth.

Key stakeholders:

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
  • Sando Mur: Mathematician and Bellabeat’s co-founder; key member of the Bellabeat executive team

Good starting points would be:

  1. Investigating the products that have similar functionality
  2. Drill down to explore user behaviors when using the product in order to identify any usage trends
  3. Apply the trend on Bellabeat products to identify recommendations on functionality and marketing strategies

Insights from the investigation can help Bellabeat to identify weaknesses of their products and new functions or even inspire ideas on new products.

It can also help to inform marketing strategies like knowing the segment of the age group that is the most active user of smart devices, then Bellabeat can invest in focusing on a specific age group when doing marketing campaigns in the future.

Prepare the data

  • The data used in this analysis is the Fitbit Fitness Tracker Data made available by Mobius stored on Kaggle.
  • This dataset is under CC0: Public Domain license meaning the creator has waive his right to the work under the copyright law.
  • The data contains personal fitness tracker from thirty fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
  • These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016-05.12.2016.
  • The dataset has in total 18 files in .csv format organized in long format.
ROCCC analysis
  • Reliability : LOW – dataset was collected from 30 individuals whose gender is unknown.
  • Originality : LOW – third party data collect using Amazon Mechanical Turk.
  • Comprehensive : MEDIUM – dataset contains multiple fields on daily activity intensity, calories used, daily steps taken, daily sleep time and weight record.
  • Current : MEDIUM – data is 5 years old but the habit of how people live does not change over a few years
  • Cited : HIGH – data collector and source is well documented
Data selection

Focus is on daily usage of the Fitbit device as it should provide a high-level insight on the usage pattern of smart devices. Thus the following files from the dataset has been selected:

  • dailyActivity_merged.csv
  • dailyCalories_merged.csv
  • dailyIntensities_merged.csv
  • dailySteps_merged.csv
  • sleepDay_merged.csv
  • weightLogInfo_merged.csv

Process the data

Google spreadsheet and Google BigQuery will be used to process the data as the tool functionality fits the purpose.

Data cleaning

sleepDay_merged.csv and weightLogInfo_merged.csv are loaded into Google sheet for data cleaning. The fields “SleepDay” and “Date” were not correctly formatted. The following steps has been done:

  1. The date column has been select and format to ‘Date’ using spreadsheet function
  2. Time in the column has been removed as time is irrelevant in this analysis
  3. AM/PM indicator is also removed
Data integrity

The selected data has been loaded into Google BigQuery for analysis. The following queries have been run to check the number of unique Id in each table

SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged`
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.dailyCalories_merged`
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.dailyIntensities_merged`
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.dailySteps_merged`
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.sleepDay_merged`
SELECT DISTINCT Id FROM `first-sandbox-100001.weightLogInfo_merged`

Result(distinct Id in each table):

  • 33
  • 33
  • 33
  • 33
  • 24
  • 8

The result shows the dataset is inconsistent as we expect 30 unique Id on all tables. The sleepDay_merged table and the weigthLogInfo_merged table have the highest inconsistencies with 6 and 22 input missing. This would affect the result of the analysis.

Data analysis

Hypothesis has been made with the data available on activity, sleep time and weight.

  • There is a relationship between activity level and calories burnt.
  • There is a relationship between activity level and sleep time
  • There is a relationship between activity level and weight

In order to find out the relation and validate the hypothesis, four queries have been constructed to aggregate the data for analysis.

## For finding activity level and calories burnt

SELECT Id, ActivityDate,Calories, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance,SedentaryActiveDistance, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged`
WHERE VeryActiveDistance+ModeratelyActiveDistance+LightActiveDistance <> 0 AND VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes <> 0
## For finding activity level and calories burnt

SELECT Id, ActivityDate,Calories, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, (VeryActiveDistance+ModeratelyActiveDistance+LightActiveDistance) AS TotalActiveDistance,SedentaryActiveDistance, (VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes) AS TotalActiveMinutes, SedentaryMinutes
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged`

TotalActiveDistance is the sum of the VeryActiveDistance, ModeratelyActiveDistance and LightActiveDistance that can be useful to find the relation between calories burnt and activity level.

## For finding relationship between activity level and sleep time

SELECT activity.Id, ActivityDate,Calories, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, (VeryActiveDistance + ModeratelyActiveDistance) AS ActiveDistance, (LightActiveDistance+SedentaryActiveDistance) AS non_ActiveDistance, (VeryActiveMinutes+FairlyActiveMinutes) AS ActiveMinutes, (LightlyActiveMinutes+SedentaryMinutes) AS non_ActiveMinutes
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged` AS activity
INNER JOIN `first-sandbox-100001.Fitbit_data.sleepDay_merged` AS sleep
ON activity.Id = sleep.Id AND activity.ActivityDate = sleep.SleepDay

ActiveDistance, non-ActiveDistance, ActiveMinutes and non-ActiveMinutes have been calculated to find out the relationship on sleep quality versus a person’s activity in a day.

## For finding relationship between activiy and weight/BMI

SELECT activity.Id, Calories, BMI, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance,SedentaryActiveDistance, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged` AS activity
INNER JOIN `first-sandbox-100001.Fitbit_data.weightLogInfo_merged` AS weight
ON activity.Id = weight.Id AND activity.ActivityDate = weight.Date

When comparing the relationship between activity and weight, using BMI is a more consistent metric as weight can also be affected by height. BMI also accounts for height, which is a more universal metric to compare whether a person is underweight or overweight.


Activity level and calories burnt relation

Coefficient of determination (Very active minutes vs Calories burnt)= 0.375

From the above chart we can see that a person who has higher active minutes tends to burn more calories in a day, the more time they spend inactive, the lower calories they tend to burn in a day.

Activity level and sleep quality relation

Coefficient of determination = 0.345

This visualisation compares a person’s non-active minutes versus the minutes asleep. We can see that the more time a person spent non-active, the lower the time they are asleep in bed. It is a negative relation which implies non-activity negatively impact sleep quality.

Activity level and BMI relation
Activity level vs BMI

The visualisation above compares the average active minutes and average non-active minutes versus the average BMI of the users. It shows a relation that a person who has a higher average non-active minutes tends to have a higher average BMI. The small size sample in this particular comparison hinders the accuracy though.

Recommendation and act


From the analysis result, it is clear that there is a clear trend in non-active people having a negative lifestyle. The three relations we found during the analysis includes:

  • Very-active minutes has a positive relation to calories burnt
  • Active person has a positive relation to sleep quality
  • Non-active person is more likely to have a high BMI
Recommendations to business

As these relations are the analysis results of participants who use smart devices to track their activity statistics, we can apply these to make data-driven decisions on Bellabeat future products/functionality:

  1. Bellabeat can include function in Bellabeat app to alert user who tends to have a high number to sedentary minutes
  2. Bellabeat can include timely notification in Leaf/Time to motivate user to move around regularly to reduce their sedentary minutes
  3. Bellabeat can use the relation between high sedentary minutes and BMI to promote an active lifestyle can reduce body fat and create better health with Bellabeat products
  4. Bellabeat can further enhance their sleep tracking function to promote the sleep/non-active relation. Use this as an incentive to purchase Bellabeat products: create a better sleeping habit by being more active in everyday life.
Related Posts