Marketing Funnel Analysis

Conversion Rates and Velocity by Cohorts

By Oz Guner in projects

April 16, 2023

Goal

The goal of this project is to study conversion rates and conversion rate velocities by cohorts. While you can use different tools to achieve the same results, I will be including the R code step by step to (hopefully) create impactful visualizations.

Setup

We have a few simple datasets in hand that we downloaded from our system of records. Let’s look at our base dataset:

#install the readr package to read our csv, which contains basic data
library(readr)
df <- read_csv("C:/Users/ozeng/Documents/R Repository/portfolio/portfolio/content/project/2023-04-16-marketing-funnel-analysis/base.csv")

#view data
df
## # A tibble: 4 × 3
##   month         created converted
##   <chr>           <dbl>     <dbl>
## 1 January 2023     3235        60
## 2 February 2023    2899        57
## 3 March 2023       2888        30
## 4 April 2023        744        18

Looks like this data is aggregated by month. Let’s find the conversion rate for each month.

# create a new column for the conversion rate
df$conversion_rate <- df$converted / df$created
# print the updated data frame
df
## # A tibble: 4 × 4
##   month         created converted conversion_rate
##   <chr>           <dbl>     <dbl>           <dbl>
## 1 January 2023     3235        60          0.0185
## 2 February 2023    2899        57          0.0197
## 3 March 2023       2888        30          0.0104
## 4 April 2023        744        18          0.0242

Let’s create a visualization of the conversion rate by month so we have a more insightful look into our marketing funnel:

library(plotly)

# sort the data by month
df$month <- factor(df$month, levels = c("January 2023", "February 2023", "March 2023", "April 2023"))

# create the plot
plot_ly(df, x = ~month) %>%
  add_trace(y = ~created, name = "Created", type = "bar", marker = list(color = "#4aa564"),
            text = ~paste0(created)) %>%
  add_trace(y = ~converted, name = "Converted", type = "bar", marker = list(color = "#fdb81e"),
            text = ~paste0(converted)) %>%
  add_trace(y = ~conversion_rate, name = "Conversion Rate", type = "scatter", mode = "lines", yaxis = "y2",
            line = list(color = "#046b99"), marker = list(color = "#046b99"),
            text = ~paste0(scales::percent(conversion_rate, accuracy = 0.01))) %>%
  layout(title = "Conversion Rate by Month",
         xaxis = list(title = "Month"),
         yaxis = list(title = "Created/Converted Count", side = "right",
                      tickfont = list(size = 12),
                      tickformat = ",.0f",
                      showticklabels = TRUE,
                      showgrid = FALSE),
         yaxis2 = list(title = "Conversion Rate (%)", side = "left", overlaying = "y",
                       tickfont = list(size = 12),
                       tickformat = ".2%",
                       tickvals = seq(0, 1, 0.1),
                       tickmode = "linear",
                       textfont = list(size = 12),
                       showticklabels = TRUE,
                       showgrid = TRUE,
                       zeroline = TRUE),
         legend = list(x = 0.5, y = -0.2, orientation = "h",
                       font = list(size = 12),
                       xanchor = "center",
                       yanchor = "top"))

This chart communicates the leads created in a month and converted in the same month. February seems to have a higher conversion rate than January. April is going really well and March might need a little help.

There’s one caveat in looking at conversion rates in a simple way. The 57 units converted in February might not all be created in February. There’s another way to look at this problem: The cohort velocity method. Let’s create a spaghetti chart that showcases how fast leads from a cohort convert to a paid client:

##    created_cohort leads_signups days_to_convert converted_to_paid
## 1          23-Jan          3235              53                 5
## 2          23-Feb          2899              15                 4
## 3          23-Jan          3235              19                 4
## 4          23-Apr           744               2                 4
## 5          23-Apr           744               0                 4
## 6          23-Feb          2899               9                 5
## 7          23-Jan          3235               1                 4
## 8          23-Jan          3235               0                 7
## 9          23-Feb          2899               2                 5
## 10         23-Mar          2888               1                 4
## 11         23-Feb          2899               0                 6
## 12         23-Mar          2888               0                 4

Notice how the data is structured. Each record indicates a month cohort, the total leads/signups in that month, and the total leads/signups converted at specific days each month. Whenever there’s at least one unit converted at a specific day interval, one record will be created. Now let’s calculate the conversion rate for each record:

## # A tibble: 107 × 3
## # Groups:   days_to_convert [60]
##    days_to_convert created_cohort conversion_rate
##              <int> <chr>                    <dbl>
##  1               1 23-Jan                0.00124 
##  2               1 23-Mar                0.00139 
##  3               2 23-Apr                0.00538 
##  4               2 23-Feb                0.00172 
##  5               2 23-Jan                0.000927
##  6               2 23-Mar                0.00104 
##  7               3 23-Apr                0.00269 
##  8               3 23-Feb                0.000690
##  9               3 23-Mar                0.000693
## 10               4 23-Jan                0.000618
## # ℹ 97 more rows

In this case, a conversion rate is assigned for each record. If we are going to take a holistic view to look at the total leads converted at day X, we’d need to create a running conversion rate calculation, as follows:

Notice that using the cumsum() method and grouping by cohort, you can create a running conversion rate in that specific cohort, which literally sums up the cumulative values.

Next, we’ll create a plotly chart:

Remember that, in the single conversion rate method, our conversion rate for January was 1.85% and February was 1.97%. However, the cohort-based conversion rate paints a clearer picture by eliminating the time bias. January actually has a higher conversion rate with 2.41% versus February’s 2.1%. Moreover, January was converting at higher levels at on average on Day 75, where February is today: 2.26% vs 2.1%.

Other ways to look at conversion velocity.

Another way to look at conversion velocity is by custom segmentation. Examples can include intent levels, demographics, and product actions. Let’s look at the same data from another perspective:

##      channel leads_signups days_to_convert converted_to_paid
## 1  Channel 2           835               8                 3
## 2  Channel 2           835               7                 5
## 3  Channel 2           835               6                 4
## 4  Channel 2           835               4                 5
## 5  Channel 2           835               3                 6
## 6  Channel 2           835               2                 5
## 7  Channel 2           835               1                 5
## 8  Channel 2           835               0                 7
## 9  Channel 3           333              19                 4
## 10 Channel 3           333               4                 3
## 11 Channel 1          1949              53                 5
## 12 Channel 1          1949              22                 3
## 13 Channel 1          1949              15                 3
## 14 Channel 1          1949               0                 3

Similar to the other dataset, this dataset communicates the speed to convert broken down by channels instead of time. Regardless of when they were created we can look at how leads from different channels convert.

In this case, Channel 3 converts the best, followed by Channel 2 and Channel 1. Looking at this data with time controls could introduce even more insights as different channels might convert at different circumstances.

Conclusion

Conversion rates and conversion velocities are powerful tools in a marketer’s toolkit. It’s important to look at the same data from different perspectives to gain more insightful and actionable information.