Listening to your users: Inferring Affinities and Interests based on actual time spent vs clicks or pageloads

Personalized recommendations rely on the idea that you know the interests of your audience. In the absence of explicit feedback, interests are generally derived from clickstream data: session and event (e.g. click) data. But given that sessions can be short-lived (bounces) and clicks can be unintentional, simply counting them is unlikely to reflect the true interests of your audience.

At Blueshift, we choose to actively follow along each individual’s storyline and extract intelligence from every event, gathering insights into the user’s intent and interests so we can provide better recommendations.

Let’s look at a real user example

In the table below, we see an actual clickstream of events from a user on blueshiftreads.com.

| Timestamp | Session ID | Event | Category | Book title |
|-----------|------------|-------|----------|------------|
| 12:30:24 | session_id1 | view | Biography & Autobiography > Personal Memoirs | Eat Pray Love |
| 12:31:29 | session_id1 | view | Drama > American > General | Death of a Salesman |
| 13:48:49 | session_id2 | view | Science > Physics > General | Physics of the Impossible |
| 13:49:02 | session_id2 | view | Biography & Autobiography > Personal Memoirs | Eat Pray Love |
| 13:49:09 | session_id2 | view | Health & Fitness > Diet & Nutrition > Nutrition | The Omnivore’s Dilemma |
| 13:49:19 | session_id2 | view | Health & Fitness > Diet & Nutrition > Nutrition | The Omnivore’s Dilemma |
| 13:49:35 | session_id2 | view | Poetry > American > General | Leaves of Grass |
| 14:09:47 | session_id2 | view | Poetry > American > General | Leaves of Grass |
| 14:10:02 | session_id2 | add_to_cart | Poetry > American > General | Leaves of Grass |

This specific user interacted during two different sessions, browsing books from different categories. If we try to come up with the top categories for this user based on the number of sessions in which each category was viewed, we get:

| Rank | Category | Session count |
|------|----------|---------------|
| 1 | Biography & Autobiography > Personal Memoirs | 2 |
| 2 | Health & Fitness > Diet & Nutrition > Nutrition | 1 |
| 3 | Poetry > American > General | 1 |
| 4 | Science > Physics > General | 1 |

As you can see in the table above, Personal Memoirs is the top category, while the three other categories tie for second place (here the tie is broken alphabetically, but other tie-breaking rules can be applied).

Time spent ranking

At Blueshift, we developed algorithms to re-rank these categories according to the time the user actually spent on your products and categories:

| Rank | Category | Time spent (s) |
|------|----------|----------------|
| 1 | Poetry > American > General | 1212 |
| 2 | Biography & Autobiography > Personal Memoirs | 72 |
| 3 | Health & Fitness > Diet & Nutrition > Nutrition | 26 |
| 4 | Science > Physics > General | 13 |

Here, we rank ‘Poetry > American > General’ above the other categories. Note that at the end of the original event stream above, the user actually did add the book from that category to the cart. Even if we had ignored that event, our time-based ranking would still have captured a category of genuine interest to this user.
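Blueshift’s production algorithm isn’t spelled out here, but the numbers in the table above can be reproduced with a simple rule: attribute the gap between two consecutive views in the same session to the category of the earlier view. A minimal Python sketch of that idea, using the clickstream from the first table (a simplification, not the full Blueshift algorithm):

```python
from collections import defaultdict
from datetime import datetime

# The user's clickstream from the table above: (time, session, event, category)
events = [
    ("12:30:24", "session_id1", "view", "Biography & Autobiography > Personal Memoirs"),
    ("12:31:29", "session_id1", "view", "Drama > American > General"),
    ("13:48:49", "session_id2", "view", "Science > Physics > General"),
    ("13:49:02", "session_id2", "view", "Biography & Autobiography > Personal Memoirs"),
    ("13:49:09", "session_id2", "view", "Health & Fitness > Diet & Nutrition > Nutrition"),
    ("13:49:19", "session_id2", "view", "Health & Fitness > Diet & Nutrition > Nutrition"),
    ("13:49:35", "session_id2", "view", "Poetry > American > General"),
    ("14:09:47", "session_id2", "view", "Poetry > American > General"),
    ("14:10:02", "session_id2", "add_to_cart", "Poetry > American > General"),
]

def time_spent_per_category(events):
    """Attribute the gap between consecutive views within a session
    to the category of the earlier view (a simplifying assumption)."""
    views = [e for e in events if e[2] == "view"]
    spent = defaultdict(float)
    for (t1, s1, _, cat), (t2, s2, _, _) in zip(views, views[1:]):
        if s1 != s2:  # session boundary: no dwell time attributable
            continue
        gap = (datetime.strptime(t2, "%H:%M:%S")
               - datetime.strptime(t1, "%H:%M:%S")).total_seconds()
        spent[cat] += gap
    return sorted(spent.items(), key=lambda kv: kv[1], reverse=True)

for cat, secs in time_spent_per_category(events):
    print(f"{secs:6.0f}s  {cat}")
# 1212s Poetry, 72s Personal Memoirs, 26s Nutrition, 13s Physics
```

Running this on the example reproduces the table exactly; note that ‘Drama > American > General’ gets no dwell time because it was the last view of its session.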

There’s more: decayed time spent

You should be careful not to rely on detailed information from a single user on a single day: if the user did buy the book they added to the cart, that purchase might actually signal that they are no longer interested in that specific category of products. Furthermore, you want to adapt to user interests that change over time.

That’s why we implemented what we call a decayed time spent algorithm: it combines the time spent by users over a certain period (say, the last week) and weighs recent time spent more heavily in the ranking than time spent further back (say, 14 days ago).

Weighting by recency this way allows recommendations to adapt quickly to shifting user interests, for example when users shop during the holidays and may be looking for gifts for others as well as for themselves.
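The exact weighting scheme isn’t given in this post, but exponential decay is a common way to implement this kind of recency weighting. A minimal sketch, assuming a hypothetical 7-day half-life (the half-life is a tuning knob, not a Blueshift-published value):

```python
import math

HALF_LIFE_DAYS = 7.0                  # hypothetical tuning knob
DECAY = math.log(2) / HALF_LIFE_DAYS  # decay rate per day

def decayed_time_spent(observations):
    """observations: (seconds_spent, age_in_days) pairs for one
    (user, category). Recent time spent counts more than old time spent."""
    return sum(sec * math.exp(-DECAY * age) for sec, age in observations)

# 600s spent yesterday outweighs 1000s spent two weeks ago:
print(decayed_time_spent([(600, 1)]))    # ~543.4
print(decayed_time_spent([(1000, 14)]))  # 250.0 (two half-lives, so one quarter)
```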

From user-level signal to site-wide signal

Many product recommendations are related to site-wide top categories of products, like ‘top viewed’. Using our time-based algorithms, we can rank these top categories better. Let’s look at another example from blueshiftreads.com, where we show a slice (ranks 20-25, to be exact) of the top 25 most popular categories.

Using classical session counting, we obtain the following ranking:

| Rank | Category | Session count |
|------|----------|---------------|
| 20 | Juvenile Fiction > People & Places > United States > African American | 5358 |
| 21 | Juvenile Fiction > Girls & Women | 5291 |
| 22 | Juvenile Fiction > Family > General | 5265 |
| 23 | Fiction > Contemporary Women | 5215 |
| 24 | Fiction > Thrillers > Suspense | 4971 |
| 25 | Fiction > Mystery & Detective > Women Sleuths | 4804 |

However, when we re-rank these categories based on the actual time spent by users, we see that ‘Juvenile Fiction > Girls & Women’ drops from position 21 (above) to position 23 (below), even though it had 76 more user sessions in the 7 days over which this was calculated. User sessions are no guarantee of actual interest (i.e., time spent).

| Rank | Category | Time spent |
|------|----------|------------|
| 20 | Juvenile Fiction > People & Places > United States > African American | 102164972 |
| 21 | Juvenile Fiction > Family > General | 100447985 |
| 22 | Fiction > Contemporary Women | 98897169 |
| 23 | Juvenile Fiction > Girls & Women | 98340874 |
| 24 | Fiction > Thrillers > Suspense | 91140081 |
| 25 | Fiction > Mystery & Detective > Women Sleuths | 87372604 |

Furthermore, if we rank the categories using our decayed time spent, we see that ‘Fiction > Contemporary Women’ is now ranked highest of those three categories (21), while it was the lowest (23) in the original list. This indicates that this category received the most time spent by users in the most recent past.

| Rank | Category | Time score |
|------|----------|------------|
| 20 | Juvenile Fiction > People & Places > United States > African American | 28461106.29 |
| 21 | Fiction > Contemporary Women | 28179308.93 |
| 22 | Juvenile Fiction > Girls & Women | 28068989.26 |
| 23 | Juvenile Fiction > Family > General | 27608048.02 |
| 24 | Fiction > Thrillers > Suspense | 26102829.31 |
| 25 | Fiction > Mystery & Detective > Women Sleuths | 24597921.38 |

Ok, why bother?

So why bother re-ranking? Well, most catalogs exhibit a Long Tail in the distribution of the popularity of their content: very few items are very popular, while lots of items are very unpopular. No matter how you rank the popularity of the top 10 categories (by sessions, clicks, time, …) out of a 1,000-category catalog, these extremely popular categories will always be on top. Just have a look at the top 20 categories from blueshiftreads.com:

[Figure: time spent for the top 20 categories on blueshiftreads.com (blog_post_time_spent_top20)]

As you can see, the top 5 categories do a lot better than the rest. For most businesses there is a lot of value in promoting content from categories other than these few favorites. Therefore, if you can avoid down-ranking categories that are genuinely interesting to users, and do this consistently over your whole catalog, you will be able to recommend products from the appropriate category to the users who care about it. In other words, you will avoid the pitfall of recommending an overly popular yet generic product to your users.

But doesn’t time spent relate to sessions/clicks anyway?

Yes and no. It is true that more sessions correlate with more time spent on categories, but not to the same extent: a session’s length can range from a second to tens of minutes. Have a look at the graph below.

What we see is the ranking of the 1000+ categories (on the X-axis) for blueshiftreads.com by popularity (on the Y-axis, logarithmic scale) over 7 days, in terms of 3 different metrics:

  • The blue line represents ranking by session count. It is very smooth because it really ranks all categories just in descending order of session count. This is the standard ranking.
  • The red line represents ranking by time spent by the users. It is equally smooth in the beginning (left) because it ‘agrees’ with the session ranking: as mentioned above, the top popular categories will always be on top. But quite soon, the line becomes spiky: the ranking disagrees with session count, and the spikes indicate that this ranking would reorder the categories in a different way (promoting different categories to the top).
  • The green line is the decayed time spent ranking: the same observation holds as for the time spent ranking. This algorithm also disagrees with session count, and would reorder many categories in the long tail to promote categories of interest to the user.

[Figure: ranking of the 1000+ categories by session count, time spent, and decayed time spent (blog_post_time_spent_ranking_plot)]

This re-ranking is exactly what you need to stop recommending the same popular categories to users who have indicated (through time spent) an interest in other categories.

Practical AI for Growth Marketers

A.I. has had a media resurgence in the recent past, thanks to incessant coverage in every outlet and overblown hype for and against what it all means. Underneath the hyperbole there are real breakthroughs, but also many challenges and practical considerations in using these innovations. This post by CrowdFlower, a crowdsourcing platform used by many for improving the RoI of A.I. projects, puts it well: “A.I is a pragmatic technology that can be applied to solving today’s problems but you need to understand the limiting beliefs of A.I, and replace myths with truths”.

Growth marketers at B2C organizations face particularly formidable challenges in using A.I. or machine learning in their day-to-day efforts. The data at their disposal spans many sources, updates via real-time streams, and likely runs into petabytes in size. Here are a few practical considerations for realizing a good RoI on your A.I. project investments.


Simple vs Diverse data formats:

Today’s customers are tethered to their devices 24/7 and switch between them seamlessly. Advances in Big Data technologies like Hadoop have made it easy to capture raw data in diverse formats and store it across several different data stores, usually called data lakes, spanning SQL systems, NoSQL systems, flat files and Excel sheets. As a growth marketer, this is the raw gold mine you are working with, and you should prioritize capturing data in any format over shoehorning it into a particular data store or schema. The A.I. tools you invest in should adapt to this mix of structured and unstructured data.


Real Time vs Batch mode:

The half-life of consumer intent is getting shorter with each passing year, and customers expect “on-demand” experiences that are contextually relevant and personalized to them across every device. Growth marketers should prioritize simpler A.I. algorithms and processes that can adapt well to real-time data over more complex batch-mode solutions that may need several hours or days to execute. Pay close attention to the training time it takes to build and deploy A.I. models, and to how fast they can incorporate new data.


Complete vs Sparse data:

While it’s ideal to know every attribute and preference of all your users, in reality you will end up with incomplete or partially known data fields despite your best efforts. B2C growth marketers in particular should expect this from day one and invest in tools and solutions that adapt well to incomplete data. Take user location, for example: there may be a mix of user-provided location data, device lat/long, IP-to-geo lookups, inferences from content viewed or searches performed, and more. As a growth marketer, you should prefer A.I. tools that can adapt to this mix of data and output best-effort answers for the widest user base, rather than tools that only work on the few users with complete and clean data.


Size of training data:

Most A.I. algorithms expect training data to be fed to them, and the size and availability of training data is a big obstacle to using them effectively. Certain classes of A.I. algorithms, like boosted random forests, are better at adapting to the size of the training data than convolutional neural networks (a.k.a. deep learning). Growth marketers should prefer algorithms that can work with limited training data and have built-in sampling techniques to deal with disproportionate class sizes.


Black box vs Explainable Models:

A.I. algorithms come in many forms, from easy-to-understand decision trees to complex black boxes like deep Boltzmann machines. Navigating the black boxes can be tricky: what works today may not work tomorrow, and they need very careful tuning to yield even short-term results. Growth marketers should prefer A.I. algorithms that explain their outputs and help the marketer understand the various factors, and the weights given to them, in arriving at that output. Tools that iterate quickly and incorporate domain-specific knowledge easily are likely to work better in the long term than hyper-optimized black boxes with enticing short-term yields.

When it comes to the nitty-gritty of it all, remember that A.I. is no magic bullet but a practical tool for achieving your goals.

Keep calm and A.I on.

Send Time Optimization or Engage Time Optimization?

Marketers should adapt their send time to each user individually, and send campaigns closer to the times when they are more likely to engage in downstream activity.

As you might have read in our previous blog post “Re-Thinking Send Time Optimization in the age of the Always On Customer“, Blueshift focuses on “Engage Time Optimization” rather than what marketers traditionally call “Send Time Optimization”. Since we posted that article, we have elaborated on the details of the development of that feature on Quora (When is the best time (day) to send out e-mails?). In this post, we would like to share more of those insights, and advocate for optimizing downstream user engagement metrics rather than initial open rates.

The idea of “Send Time Optimization” is not new, and has been around for quite some time. One of the more recent reports on this was posted by MailChimp in 2014, but articles and discussions on this topic go back as far as 2009 and earlier. The data science team at Blueshift followed the hypothesis that if there is a specific hour of the day, or day of the week, at which an audience is more likely to engage, that should be reflected in increased open (or even click) rates when messages are sent at different times.

Open Rates vs Click Rates

In order to observe this effect (or the absence of it), we analyzed over 2 billion messages that were sent through Blueshift. Some of the results are presented in the graphs below for one of our biggest clients.

Through the Lens of Open Rates

“irrespective of the segment that was targeted, the audience size and the send time, the open rate is the highest in the first two hours after the send”

We looked at the open rate (%, shown on the Y-axis) in the first 24 hours after the send was executed (in hours, shown on the X-axis).

[Figure: open rates (%) during the first 24 hours after send, for 18 campaigns (open_rates)]

What you see are 18 email campaigns from one client over the period of one month (totaling over 20 million emails). On the top left, we see campaigns sent out on Monday; next, Tuesday; and so on, through Saturday on the bottom right. There were no campaigns on Sunday for this client during this month. These campaigns were sent to audiences ranging from tens of thousands of users in specialized segments (e.g. highly engaged customers) to large segments of 2–3M users. The send times varied from 5AM to 12PM (shown in parentheses in the legend).

What you can see from this graph is that even though the campaigns were sent out on different days of the week and at different hours, the initial response in terms of open rates is very predictable in the first hours. The conclusion from these plots is that irrespective of the segment that was targeted, the audience size and the send time, the open rate is highest in the first two hours after the send. Depending on the actual time of the send you can achieve a slightly higher open rate in the first hour, but you might lose more ‘area’ in the following hours, accumulating to more or less the same open rate after a few hours.
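For readers who want to run a similar analysis on their own send logs, here is a hypothetical helper (not Blueshift’s internal tooling) that bins opens by hours since send and produces the kind of per-hour open-rate curve plotted above:

```python
from collections import Counter

def open_rate_by_hour(send_time, open_times, n_sent, horizon_hours=24):
    """Per-hour open rate (%) for the first `horizon_hours` after a send.
    send_time / open_times are epoch seconds; one entry per observed open."""
    by_hour = Counter(int((t - send_time) // 3600)
                      for t in open_times if t >= send_time)
    return [100.0 * by_hour[h] / n_sent for h in range(horizon_hours)]

# e.g. 3 opens in the first hour and 1 in the second, out of 100 sends:
curve = open_rate_by_hour(0, [120, 600, 3500, 4000], n_sent=100)
print(curve[:3])  # [3.0, 1.0, 0.0]
```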

Through the Lens of Click Rates

Naturally, the question comes to mind whether there is any measurable effect when we look at clicks, which can be considered a deeper form of engagement by the users who received the message:

[Figure: click rates (%) during the first 24 hours after send (click_rates)]

As you can see from this second set of graphs, where the Y-axis represents the click rate (%), we observed very similar behavior: the actual response rate in terms of clicks does not change significantly when a campaign is sent at a different time.

We came to the same conclusion when repeating this experiment for opens and clicks for other clients in our dataset. After more in-depth analysis, we observed that users who were targeted in email campaigns at certain times showed engagement (e.g. visits to the website or app) at other times. Users prefer to engage deeply at certain hours of the day, while casually browsing throughout. Marketers should adapt their send time to each user individually, and send campaigns closer to the times when they are more likely to engage in downstream activity. You can find more info about this “Engage Time Optimization” in this post.


Re-Thinking Send Time Optimization in the age of the Always On Customer

Many email service providers tout Send Time Optimization as an add-on feature and promise marketers that they can tailor their marketing campaigns to the exact time their customers are expected to open their emails. It’s tempting to take that at face value and think it’s a silver bullet for improving your customer engagement. Our internal research, after analyzing over a billion emails sent through the Blueshift platform over the last year, has shown that in the age of smartphones and always-on connectivity, the notion of “Send Time Optimization” needs some serious re-thinking.

Stop Optimizing to “Open Rates”

“look at full downstream activity and measure what windows of time their customers are more likely to follow through and complete specific goals”

Today’s perpetually connected customers are much more likely to have frequent bursts of activity around the clock than a recurring habit of opening their emails at a certain time of day, or clicking through to sites or apps at a specific hour. What, then, does “Send Time Optimization” mean for marketers? Instead of optimizing for immediate opens, marketers need to look at full downstream activity and measure the windows of time in which their customers are more likely to follow through and complete specific goals, rather than when they open or click emails. The true measure of success should be specific conversion goals, or the sum total of time spent on your site or apps.

As a results-driven marketer, ask yourself: “Would you rather have someone who opened a message, or someone who converted and made a purchase?”

Enter => Engagement Time Optimization

Blueshift’s recently released Engage Time Optimization computes windows of time for each user in which they are more likely to engage fully, rather than optimizing for immediate opens or clicks. We look at the sum total of time spent by each customer over a long period of time and rank each hour of the day based on time spent and how deep into the conversion funnel they went. You can access the “hour affinity” for each user through the segments panel under the “User Affinity” tab inside our application dashboard.

[Screenshot: Engage Time Optimization hour affinity under the “User Affinity” tab in the Blueshift dashboard]
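The exact scoring Blueshift uses isn’t published, but the description above (time spent per hour, weighted by how deep into the funnel the user went) suggests a shape like the following sketch; the funnel weights here are purely illustrative assumptions:

```python
from collections import defaultdict

# Illustrative funnel weights -- not Blueshift's published values.
FUNNEL_WEIGHT = {"view": 1.0, "add_to_cart": 3.0, "purchase": 10.0}

def hour_affinity(engagements):
    """engagements: (hour_of_day, seconds_spent, event_type) tuples for one
    user over a long window. Returns hours ranked best-first."""
    score = defaultdict(float)
    for hour, seconds, event in engagements:
        score[hour] += seconds * FUNNEL_WEIGHT.get(event, 1.0)
    return sorted(score, key=score.get, reverse=True)

# A user who browses at lunch but buys in the evening ranks 8pm first:
print(hour_affinity([(12, 900, "view"), (20, 300, "view"), (20, 120, "purchase")]))
# [20, 12]
```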


You can use these “hour affinities” like any other user affinity attribute during segment creation and tailor campaigns to specific audiences. For example, you can create segments of users who prefer “morning” hours by picking 5am to 8am, or those who prefer “evening” hours by picking 5pm to 8pm, or any other combination. We believe this offers a powerful alternative to the traditional “Send Time Optimization” feature by tailoring campaigns to customers based on their full-funnel behavior rather than on immediate opens or clicks.



If you’d like to see a demo or request more information on Engagement Time Optimization, contact us via our site or email us at hello@getblueshift.com.



Obama on Technology, AI and an Optimistic Future

President Obama chatting with Ito and Scott Dadich


“This year, Artificial Intelligence will become more than just a computer science problem. Everybody needs to understand how A.I. behaves.”

Recent advances in computer science and AI (more specifically, advances in building and running large convolutional neural networks) have given fresh fodder to the age-old debate about how technology is replacing workers and making us all obsolete. The current political climate only amplifies the anxiety and generates FUD (Fear, Uncertainty, and Doubt) about our collective future. So it’s very refreshing to see President Obama re-framing the discussion in this Wired article, talking about our common humanity and a confidence in our ability to solve problems. If one can ignore the media hype and peek below the surface, there are real opportunities to build solutions to many seemingly intractable problems.

Machine learning, data mining and deep learning techniques can nudge us to lead healthier lives, change our habits and build stronger communities. Imagine AI-powered tools that remind us, in the context of whatever we are doing in our daily lives, to consider factors we may have missed; help us overcome biases in thinking fast and slow; present information in ways that help us build better financial portfolios aligned with our long-term interests; prevent us from being defrauded, phished or scammed online; help us communicate with everyone on the planet across language boundaries; and more. That’s the optimistic future we can aspire to, and it’s refreshing to see this possibility being talked about.


Read the full article on Wired.com