Bellabeat is a high-tech company that manufactures health-focused smart products, including the Bellabeat app, Leaf, Time and Spring. It also offers a subscription-based membership program that gives users access to personalised guidance on living a healthy lifestyle. Bellabeat has positioned itself as a tech-driven wellness company for women.
Bellabeat has been investing extensively in digital marketing, including Google Search, and is active on social media platforms. Co-founder Urška Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth.
Statement of business task
Smart devices are a big part of people’s everyday lives. As a smart device manufacturer, Bellabeat can benefit from learning how smart devices are used and from building data-driven business strategies to explore opportunities for growth.
Key stakeholders:
- Urška Sršen: Bellabeat’s co-founder and Chief Creative Officer
- Sando Mur: Mathematician and Bellabeat’s co-founder; key member of the Bellabeat executive team
Good starting points would be:
- Investigate products that have similar functionality
- Drill down into user behaviour when using the product to identify usage trends
- Apply those trends to Bellabeat products to derive recommendations on functionality and marketing strategy
Insights from the investigation can help Bellabeat identify weaknesses in its products, find missing functions, or even inspire ideas for new products.
They can also inform marketing strategy. For example, knowing which age group uses smart devices most actively would let Bellabeat focus future marketing campaigns on that group.
Prepare the data
- The data used in this analysis is the Fitbit Fitness Tracker Data, made available by Mobius on Kaggle.
- The dataset is released under the CC0: Public Domain license, meaning the creator has waived their rights to the work under copyright law.
- The data contains personal fitness tracker records from thirty Fitbit users who consented to the submission of their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
- The datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016 (12 March to 12 May 2016).
- The dataset consists of 18 .csv files organized in long format.
- Reliability: LOW – the dataset was collected from only 30 individuals whose gender is unknown.
- Originality: LOW – third-party data collected using Amazon Mechanical Turk.
- Comprehensive: MEDIUM – the dataset contains multiple fields on daily activity intensity, calories used, daily steps taken, daily sleep time and weight records.
- Current: MEDIUM – the data is 5 years old, but people’s daily habits do not change much over a few years.
- Cited: HIGH – the data collector and source are well documented.
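The focus is on daily usage of the Fitbit device, as it should provide high-level insight into the usage pattern of smart devices. The following files from the dataset have therefore been selected:
- dailyActivity_merged.csv
- dailyCalories_merged.csv
- dailyIntensities_merged.csv
- dailySteps_merged.csv
- sleepDay_merged.csv
- weightLogInfo_merged.csv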
Process the data
Google Sheets and Google BigQuery are used to process the data, as their functionality fits the purpose.
sleepDay_merged.csv and weightLogInfo_merged.csv were loaded into Google Sheets for data cleaning because the fields “SleepDay” and “Date” were not correctly formatted. The following steps were carried out:
- The date column was selected and formatted as ‘Date’ using the spreadsheet’s formatting function
- The time portion of the column was removed, as time is irrelevant in this analysis
- The AM/PM indicator was also removed
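The same clean-up could also be done in BigQuery rather than in the spreadsheet. The sketch below is an illustration only: it assumes the raw column is a string such as `4/12/2016 12:00:00 AM` (month/day/year with a 12-hour clock), and both the sample values and the output name `SleepDate` are hypothetical rather than taken from the analysis.

```sql
-- Hypothetical raw values; the format string is an assumption about the raw export.
WITH raw AS (
  SELECT '4/12/2016 12:00:00 AM' AS SleepDay UNION ALL
  SELECT '5/1/2016 12:00:00 AM'
)
SELECT
  SleepDay,
  -- The SAFE. prefix returns NULL instead of failing when a value does not match the format.
  -- DATE() then drops the time portion, which is not needed for a daily analysis.
  DATE(SAFE.PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', SleepDay)) AS SleepDate
FROM raw
```

Rows where `SleepDate` comes back as NULL would point to values that do not follow the assumed format and need a manual look.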
The selected data was loaded into Google BigQuery for analysis. The following queries were run to check the number of unique Id values in each table:
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged`;
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.dailyCalories_merged`;
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.dailyIntensities_merged`;
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.dailySteps_merged`;
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.sleepDay_merged`;
SELECT DISTINCT Id FROM `first-sandbox-100001.Fitbit_data.weightLogInfo_merged`;
Result (distinct Id in each table):
The result shows that the dataset is inconsistent, as 30 unique Id values are expected in every table. The sleepDay_merged and weightLogInfo_merged tables have the largest gaps, with 6 and 22 users missing respectively. Any analysis involving sleep or weight therefore rests on a smaller sample and is less reliable.
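The six checks can also be collapsed into a single result set, which makes the per-table counts easier to compare side by side. A minimal sketch, assuming all six tables live in the `Fitbit_data` dataset as in the other queries; the output column names `table_name` and `unique_ids` are illustrative:

```sql
-- One row per table with its number of distinct Ids.
SELECT 'dailyActivity_merged' AS table_name, COUNT(DISTINCT Id) AS unique_ids
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged`
UNION ALL
SELECT 'dailyCalories_merged', COUNT(DISTINCT Id)
FROM `first-sandbox-100001.Fitbit_data.dailyCalories_merged`
UNION ALL
SELECT 'dailyIntensities_merged', COUNT(DISTINCT Id)
FROM `first-sandbox-100001.Fitbit_data.dailyIntensities_merged`
UNION ALL
SELECT 'dailySteps_merged', COUNT(DISTINCT Id)
FROM `first-sandbox-100001.Fitbit_data.dailySteps_merged`
UNION ALL
SELECT 'sleepDay_merged', COUNT(DISTINCT Id)
FROM `first-sandbox-100001.Fitbit_data.sleepDay_merged`
UNION ALL
SELECT 'weightLogInfo_merged', COUNT(DISTINCT Id)
FROM `first-sandbox-100001.Fitbit_data.weightLogInfo_merged`
ORDER BY unique_ids DESC
```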
Three hypotheses were formed from the available data on activity, sleep time and weight:
- There is a relationship between activity level and calories burnt.
- There is a relationship between activity level and sleep time.
- There is a relationship between activity level and weight.
To test these hypotheses, four queries were constructed to aggregate the data for analysis.
-- For finding activity level and calories burnt
SELECT
  Id, ActivityDate, Calories,
  TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance,
  VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance,
  VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged`
WHERE VeryActiveDistance + ModeratelyActiveDistance + LightActiveDistance <> 0
  AND VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes <> 0
ORDER BY TotalSteps DESC
-- For finding total active distance and total active minutes versus calories burnt
SELECT
  Id, ActivityDate, Calories,
  TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance,
  (VeryActiveDistance + ModeratelyActiveDistance + LightActiveDistance) AS TotalActiveDistance,
  SedentaryActiveDistance,
  (VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes) AS TotalActiveMinutes,
  SedentaryMinutes
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged`
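TotalActiveDistance is the sum of VeryActiveDistance, ModeratelyActiveDistance and LightActiveDistance, and TotalActiveMinutes is the sum of VeryActiveMinutes, FairlyActiveMinutes and LightlyActiveMinutes. Both totals are useful for finding the relation between calories burnt and activity level.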
-- For finding relationship between activity level and sleep time
SELECT
  activity.Id, ActivityDate, Calories,
  TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed,
  TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance,
  (VeryActiveDistance + ModeratelyActiveDistance) AS ActiveDistance,
  (LightActiveDistance + SedentaryActiveDistance) AS non_ActiveDistance,
  (VeryActiveMinutes + FairlyActiveMinutes) AS ActiveMinutes,
  (LightlyActiveMinutes + SedentaryMinutes) AS non_ActiveMinutes
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged` AS activity
INNER JOIN `first-sandbox-100001.Fitbit_data.sleepDay_merged` AS sleep
  ON activity.Id = sleep.Id
  AND activity.ActivityDate = sleep.SleepDay
ActiveDistance, non_ActiveDistance, ActiveMinutes and non_ActiveMinutes are calculated to explore how a person’s activity during the day relates to their sleep.
-- For finding relationship between activity and weight/BMI
SELECT
  activity.Id, Calories, BMI,
  TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance,
  VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance,
  VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged` AS activity
INNER JOIN `first-sandbox-100001.Fitbit_data.weightLogInfo_merged` AS weight
  ON activity.Id = weight.Id
  AND activity.ActivityDate = weight.Date
When comparing activity with weight, BMI is a more consistent metric than raw weight. Because BMI normalises weight by height, it can be compared across people of different heights to judge whether a person is underweight or overweight.
Activity level and calories burnt relation
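Coefficient of determination (very active minutes vs calories burnt) = 0.375
The chart shows that a person with more very active minutes tends to burn more calories in a day, while the more time they spend inactive, the fewer calories they tend to burn. With a coefficient of determination of 0.375, very active minutes account for roughly 37.5% of the variation in daily calories, so the relation is moderate rather than strong.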
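The write-up does not state how the coefficient of determination was obtained. One way to reproduce such a figure directly in BigQuery is sketched below, assuming a simple linear fit with a single predictor, in which case the coefficient of determination equals the squared Pearson correlation. Whether the zero-activity filter from the first query was applied is unknown, so the result may differ slightly from 0.375; the output column names are illustrative.

```sql
-- Pearson correlation and its square for very active minutes vs calories.
-- For a one-predictor linear fit, r squared is the coefficient of determination.
SELECT
  CORR(VeryActiveMinutes, Calories) AS r,
  POW(CORR(VeryActiveMinutes, Calories), 2) AS r_squared,
  COUNT(*) AS days_used
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged`
```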
Activity level and sleep quality relation
Coefficient of determination = 0.345
This visualisation compares a person’s non-active minutes with their minutes asleep. The more time a person spends non-active, the less time they spend asleep. This negative relation suggests that inactivity is associated with poorer sleep, although the correlation alone does not establish that one causes the other.
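Because a coefficient of determination is never negative, it cannot show the direction of a relation by itself. One way to confirm the sign is to look at the Pearson correlation on the joined activity and sleep data. The sketch below reuses the join and the non-active minutes definition (LightlyActiveMinutes + SedentaryMinutes) from the sleep query; the output column names are illustrative.

```sql
-- A negative r would confirm that more non-active minutes go with fewer minutes asleep.
SELECT
  CORR(activity.LightlyActiveMinutes + activity.SedentaryMinutes,
       sleep.TotalMinutesAsleep) AS r,
  POW(CORR(activity.LightlyActiveMinutes + activity.SedentaryMinutes,
           sleep.TotalMinutesAsleep), 2) AS r_squared,
  COUNT(*) AS paired_days  -- number of activity/sleep day pairs behind the estimate
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged` AS activity
INNER JOIN `first-sandbox-100001.Fitbit_data.sleepDay_merged` AS sleep
  ON activity.Id = sleep.Id
  AND activity.ActivityDate = sleep.SleepDay
```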
Activity level and BMI relation
The visualisation above compares the average active minutes and average non-active minutes with the average BMI of the users. It shows that a person with more non-active minutes on average tends to have a higher average BMI. However, weight records exist for only 8 of the 30 expected users, so the small sample limits the reliability of this comparison.
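To make the sample size behind this comparison visible, the per-user averages can be computed together with the number of matched days. This is a sketch only: it assumes the same active/non-active split as the sleep query, which the BMI chart may or may not have used, and the output column names are illustrative.

```sql
-- Per-user averages of BMI and activity, limited to days that have a weight log entry.
SELECT
  activity.Id,
  AVG(weight.BMI) AS avg_bmi,
  AVG(activity.VeryActiveMinutes + activity.FairlyActiveMinutes) AS avg_active_minutes,
  AVG(activity.LightlyActiveMinutes + activity.SedentaryMinutes) AS avg_non_active_minutes,
  COUNT(*) AS matched_days  -- how many days support each user's averages
FROM `first-sandbox-100001.Fitbit_data.dailyActivity_merged` AS activity
INNER JOIN `first-sandbox-100001.Fitbit_data.weightLogInfo_merged` AS weight
  ON activity.Id = weight.Id
  AND activity.ActivityDate = weight.Date
GROUP BY activity.Id
ORDER BY avg_bmi DESC
```

A small number of rows, or low `matched_days` values, would indicate that the comparison rests on very few observations.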
Recommendation and act
The analysis points to a consistent trend: less active users tend to show less healthy outcomes. The three relations found are:
- Very active minutes are positively related to calories burnt
- Being active is positively related to sleep quality, measured here as time asleep
- Less active people are more likely to have a high BMI
Recommendations to business
As these relations come from participants who use smart devices to track their activity, they can be used to make data-driven decisions about future Bellabeat products and functionality:
- Bellabeat can add a function to the Bellabeat app that alerts users who tend to accumulate a high number of sedentary minutes
- Bellabeat can add timely notifications to Leaf and Time that motivate users to move around regularly and reduce their sedentary minutes
- Bellabeat can use the relation between high sedentary minutes and BMI to promote the message that an active lifestyle, supported by Bellabeat products, can help reduce body fat and improve health
- Bellabeat can further enhance its sleep tracking function and highlight the relation between inactivity and sleep. This can serve as an incentive to purchase Bellabeat products: build a better sleeping habit by being more active in everyday life.