Analyze Retention From Raw User Data

If you follow along with tools like Keen.io or MixPanel, you’d get the impression that calculating retention was a simple matter of just asking your analytics tool for retention numbers. Keen.io even wrote a very streamlined how-to in their blog, Radical Transparency: How to do a retention analysis (even including scripts you can use on top of your Keen account)

But what about when you don’t have pre-packaged analytics tools setup in advance, and still need answers to these questions? This was one problem I ran into while jumping into BuyAds.com.

This post explains how to calculate a retention analysis using only your app’s raw user data, without falling back on any 3rd party analytics tools. It’s not that hard. ;^]

Example User Retention Over Time

What is retention?

Retention is a measure of how well your app keeps its users over time, sort of the opposite of Churn. Improving your app’s retention means more people sticking around for longer, which makes it much easier to grow your user base and build up strong network effects.

A retention curve is a way of describing the rates at which users abandon your app over the span of their lifecycle, and usually captures the idea that most people are likely to leave right away, but, the longer someone does stick around, the more likely they are to stay even longer.

Defining what makes an active / retained user, in the context of your app:

Most consumer apps calculate retention and “active” users based on when they last logged into the app, but this isn’t a golden rule.  Your definition of “users retained” might differ from that of your peers, or the standard, consumer-app model. Do what makes sense.

For instance, when calculating the retention of BuyAds.com advertisers, we decided to measure “latest activity” in terms of when someone last placed a purchase, rather than when they last logged in, because purchases have a much stronger impact on revenue than log-ins and that’s what we’re trying to model in our case.

Query for users, when they joined and when they were last active.

Once you’ve defined your concepts, call up your raw data. My query for BuyAds looked like:

SELECT
user.email,
user.created_at,
MAX(order.created_at) as latest_purchase_at
FROM
user
LEFT JOIN
order ON order.buyer_id=user.id
WHERE
user.role=’buyer’
GROUP BY
user.id
;

This gives us the raw data we need to calculate how many of our users remain “alive” after each week of the average lifecycle.

Query your app for the relevant data, and save it to a CSV that we can post-process using the following Ruby script.

Raw CSV data describing the lifecycle lengths of my users.

Raw CSV data describing the lifecycle lengths of my users.

Crunching raw user data into a simple curve:

Once you’ve queried for the raw data, you’ll be able to compute the averages you need in order to see the patterns that exist.

The basic idea is to:

  1. Iterate from 1 to 52, call it X.
  2. Take a subset of your users who joined at least X weeks ago, count them.
  3. Count the number of users who were still active X weeks after joining.
  4. Store the resulting % of users who were retained after X weeks.
  5. Go back to #1, incrementing X.

Peep the ruby I used to perform this calculation: https://gist.github.com/jfeldstein/5479549

Now you have the data to paint retention curve.

You can graph it so it looks super sweet (like the top of this post).

Or you can work it into the math of your growth model for your app. Rahul Vohra wrote a solid 3-part walkthrough on assembling a full Excel model describing your app’s growth or decay over time, which I’d recommend to take this further.

 

Liked that post? Try this one next:

Jordan is a freelance engineer with full-stack chops, and an eye for analytics and growth.