Researching Shared Viewing on Netflix

This April, Bitesize UX hosted the Netflix UX Research Contest for its community.

The challenge?

“You are tasked as a UX researcher to investigate how groups sharing a TV or living space make decisions about what movies or series to watch together.”

The constraints?

“You have a total of three weeks to propose and theoretically execute your research plan. Your plan must be resource-efficient and feasible within a limited scope and timeframe. Consider the availability of tools and participants.”

Below, I’ll run through my proposed research plan (which won 1st place—thank you Bitesize crew!✨)

The following work is speculative, and Netflix is not associated with this contest or entry in any capacity.


Overview

The following research plan aims to uncover the tools and behaviors couples and housemates rely on to decide what to watch together on Netflix. We aim to identify pain points in our application, and areas for potential feature improvements.

Our overarching goal is to enhance the Netflix recommendation system for shared accounts.

The fun part of this contest—

Was imagining what resources I might have access to as a UX Researcher at Netflix.

While preparing my research plan, I found Netflix’s robust research site. I saw they had an entire area of research focusing just on Recommendations.

This inspired me to think big. I decided we would certainly have access to a research lab. This was a key factor in deciding to go with a lab-based, controlled-observation approach rather than in-the-field descriptive research, as you’ll see further down.

 

Imagining a Dream Team

Although we were told we could pretend to be Head of Research at Netflix, I wanted a role directly involved in conducting research as well as planning it.

I set myself as a Lead Researcher with three UXRs under my wing. I intended for us to work in efficient pairs.

 

Framing My Research Questions

I thought about what the larger Netflix design & development team could do with the data our research uncovered. What kind of information would best help us design solutions?

What matters to users?

I wanted to uncover which subjective and objective factors matter to users when they know they will be viewing a show as a group, as opposed to individually. We could use this information to add or highlight the most relevant features of the recommendations system, and pare away or reduce features that weren’t important.

Seeking inspiration inside, and out

Which Netflix features do users actively use now, and why? Do users utilize any outside tools to make a decision, and if so, what do they use them for? We can take inspiration from those outside tools to refine our solutions, and ensure that what we’re already doing right stays on track.

“How long on average does it take for each group to make a decision about what to watch?”

My last research question targets one of the success metrics I talk about later: “time to complete task.” I decided we could aim to improve the recommendations system for groups such that it lowers the average time it takes to make a group decision by 20%, hopefully resulting in reduced stress and increased delight for users.

 

Methodology

In a real scenario, I would probably have some secondary research by the Netflix Research team to lean on. For the sake of sticking to the limitations of the contest, I decided to pretend that research didn’t exist.

Instead, I imagined our team was spearheading an untapped area of research. Therefore, our research would be exploratory/formative in nature since “we don’t know what we don’t know” yet.

Qualitative methods would help me uncover new insights, which could inform the design of prototypes that could be further tested and evaluated down the road. This is how I started to conceptualize my research plan as part of a multi-phase process.

Formative qualitative research

Combining user interviews with in-lab, controlled observations

 

How did I select my research methods?

Initially, I wanted to send researchers into users’ homes to immerse themselves in ethnographic observation.

I imagined what it might realistically be like to deploy a set of researchers over 1-2 days to observe our participants in person. It seemed ambitious, but also time- and resource-intensive.

I then considered remote ethnographic research. Our team could ship cameras with built-in privacy shutters (like the Brio webcam pictured). We would ask participants to open the camera shutter and record themselves when they sat down as a group to watch a show. The recordings would be sent back to our team.

This sounds easier than sending researchers in person. But it comes with a risk of technology problems, as this great article by UXR Chloe Evans highlights. Will participants set the camera up correctly? What if they forget to open the privacy shutter? What about lighting, and sound?

I couldn’t confirm whether a fail-proof camera ecosystem designed specifically for remote ethnographic research already exists.

So, I decided to invite our participants to our own lab, where we could limit any disruptions by external factors.

 

In-Lab Group Observation

While we run the risk of participants behaving atypically in our lab due to the Hawthorne Effect, I figured it would be more convenient to perform a research sprint on our own turf.

Before each observation session, researchers would review the questions below to guide their note-taking:

  • What factors influence participants’ decision making (show genre? actors? mood? etc.)?

  • What stumbling blocks do participants experience, and how do they correct them?

  • How quickly or slowly do participants seem to come to a decision (and what seems to influence that)?

 

I prepared a script for the researchers to follow once the participants were in the prepared room.

This closed, passive observation would form the bulk of my research and represent the greatest use of our team’s resources.

 

Of course, finding participants who can come into the lab on short notice is another challenge.

So I took a step back to consider how we might recruit and qualify our participants. Maybe we could capture some interesting attitudinal data at the same time.

 

Recruiting and Screening Participants

Screener Survey and Deployment

I determined the qualifications for test participants (see above) and designed a screener survey to qualify applicants (see here on Google Forms, and below). I tailored my survey questions to include or exclude the right participants.

We would deploy the survey to our target audience via email blast and in-app pop-up to Netflix users local to our lab, and harness the power of local news to send out a call for applicants.
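To make the include/exclude logic concrete, here’s a minimal sketch of how screener responses could be scored. Every field name and threshold below is a hypothetical stand-in (the real criteria live in the screener survey itself); it only illustrates the idea of qualifying groups who share a screen, watch together regularly, and can travel to the lab.

```python
# Hypothetical screener-qualification logic. The actual survey questions and
# cut-offs are in the Google Forms screener above; every field and threshold
# here is an illustrative stand-in, not the real criteria.
from dataclasses import dataclass


@dataclass
class ScreenerResponse:
    watches_with_others: bool        # regularly watches Netflix with a partner/housemates
    group_size: int                  # people who share the TV or living space
    co_watch_sessions_per_week: int  # how often the group watches together
    miles_from_lab: float            # needs to be local enough to visit the lab
    available_short_notice: bool     # can attend a session within the 3-week window


def qualifies(r: ScreenerResponse) -> bool:
    """Return True if this applicant's group should advance to the intake interview."""
    return (
        r.watches_with_others
        and r.group_size >= 2
        and r.co_watch_sessions_per_week >= 1
        and r.miles_from_lab <= 30
        and r.available_short_notice
    )


if __name__ == "__main__":
    applicant = ScreenerResponse(True, 3, 2, 12.5, True)
    print(qualifies(applicant))  # True -> invite the group's representative to interview
```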

 

User Interviews

Before our test groups came into the lab, I would have our team conduct remote interviews with the “lead,” or representative, of each group.

This representative would be the primary Netflix account holder for the group and would receive the rewards for participating in our research, as outlined below.

These user interviews would help our team further qualify participants, and provide attitudinal data to line up against the behavioral data that would come from our in-lab observations.

I prepared an interview script for my researchers to follow.

We want to confirm that the personal information users submitted in the survey is true and accurate.

Follow-up questions could include: If you succeeded, what was the deciding factor for you all? If you did not, what prevented you from making a decision? We would also specifically ask for a story about a time the group could not make a decision: why, and what did they do instead?

 

Designing Participant Rewards

Funnily enough, part of the inspiration to interview just one person from each group came from fiddling around with ideas for what reward might be appropriate for test participants after they completed their sessions.

Initially, I came up with Option 1 below, which I thought wouldn’t be too crazy… until I did the math. Maybe that’s within budget for Netflix, but I wanted to reel it in a bit.

 

Option 1

  • 6 months free of Premium Subscription (a $137+ value)

  • $50 Visa gift card

    For every participant.

$187 x approx. 12 participants = $2,244

 

Option 2

  • 6 months free of Premium Subscription (a $137+ value) for the primary account owner of each group

$137 x 4 (1 per group) = $548

 

I figured there would only need to be one person in each group who actually owns the Netflix account everyone watches shows from. Therefore, that person having access to a higher tier of Netflix subscription meant everyone else in the group would benefit. Hence, Option 2. And who knows, maybe the test participants would like the experience of the higher tier plan so much, they’ll enroll again after the reward runs out… 😉

While ~$600 might be a steep cost for some UXR teams, I imagined it would be within Netflix’s budget. Plus, I wanted users to really get some value back for helping us out.
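As a sanity check, the budget arithmetic behind both options fits in a few lines. The dollar figures and counts are the approximate values quoted above:

```python
# Incentive budget for the two reward options, using the approximate figures above.
PREMIUM_6_MONTHS = 137  # ~value of 6 months of the Premium plan, in USD
GIFT_CARD = 50          # Visa gift card per participant, in USD
PARTICIPANTS = 12       # ~12 individual participants across all groups
GROUPS = 4              # 4 viewing groups, 1 primary account holder each

option_1 = (PREMIUM_6_MONTHS + GIFT_CARD) * PARTICIPANTS  # reward every participant
option_2 = PREMIUM_6_MONTHS * GROUPS                      # reward only the account holders

print(f"Option 1: ${option_1}")  # Option 1: $2244
print(f"Option 2: ${option_2}")  # Option 2: $548
```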

 

Now I knew how our team would find, screen, and conduct research with our participants. That left the question of balancing all of these activities (and more) within the 3 week time frame.

 

Doing Schedule Math

The 3 week time limit we were given to prepare and conduct the research had a strong influence on the design of my plan. After some math-ing back and forth, I landed on the below breakdown of hours.

User In-Take Interviews

  • 45 mins per user (interview + break)

  • x 4 users
    = 180 mins
    = 3 hours

 

In-Lab Group Observations

  • 15 minutes, welcoming groups

  • 60 minutes (brief and study), 2 groups running concurrently

  • 60 min break

  • 15 minutes, welcoming groups

  • 60 minutes (brief and study), 2 groups running concurrently
    = 210 mins
    = 3.5 hours
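For anyone who wants to double-check the math, here’s the same arithmetic as a quick sketch (all durations in minutes, taken straight from the breakdown above):

```python
# Session-time math behind the schedule breakdown above (durations in minutes).
INTERVIEW_SLOT = 45  # interview + break, per group representative
GROUP_LEADS = 4

interview_total = INTERVIEW_SLOT * GROUP_LEADS  # 180 min = 3 hours

# Two observation rounds; two groups run concurrently within each round.
observation_total = 15 + 60 + 60 + 15 + 60      # welcome, study, break, welcome, study

print(interview_total, interview_total / 60)    # 180 3.0
print(observation_total, observation_total / 60)  # 210 3.5
```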

This helped me plan the final schedule below, from planning to execution to synthesis. This schedule would give us enough time to gather data without exhausting either participants or researchers, or creating too large a volume of data for researchers to unpack the following week.

 

How to Measure our Success?

This was the hardest part of developing the research plan—how might we determine whether all this research helped us improve group viewing for Netflix users?

 


The qualitative research our team would conduct would be rich with narratives, experiences, and emotions, but it would be challenging to measure quantitatively.

I decided to lean into the idea of a multi-phase process. Instead of seeking to measure success during our first, formative phase of research (the user interviews and in-lab observation), we would use that data to run a Design & Research sprint.

The full feature improvement and development cycle could look like this:

Formative Research (Phase 1) ➡️ Design & Research Sprint (Phase 2) ➡️ Development, QA (Phase 3) ➡️ Launch ➡️ Management (Phase 4)

 

Measuring Success in the Design/UXR Sprint

It’s during this sprint that we could establish and measure the metrics that would define success.

During the sprint, designers and developers would break out into teams to iterate on the current group viewing experience and propose new features.

 

What Metrics should we Measure for?

The designs produced during the sprint would be tested with users, and UX researchers would interview those test participants to measure the following metrics.

Learnability

I picked up the concept of learnability from games UX/UI. When gaming, users often take breaks from their games, sometimes for weeks or even months. Designing a user interface that is easily “learnable” (or, easy to pick back up after a long break) is key.

I figure, same for Netflix. If we can design features that users understand and recall easily, then we can reduce the amount of technology-based friction for groups trying to choose a show.

 

Happiness (as part of the Google HEART framework) is an important metric for seeing whether we are successfully fulfilling user needs. Our UXRs can poll users on how satisfied they felt using our new experience, rating it on a clearly defined scale.

Task completion or success (also part of Google HEART), while not a metric that always directly translates to actual success, might apply well in this situation, where groups might take a long time struggling to make a decision. If their struggle goes on long enough, they might give up. Thus, reducing the time it takes to agree on a show would be helpful, as it could reflect a reduction in user frustration.

Users would try the designs produced during the sprint, and then be interviewed by UXRs.

Some additional metrics we could measure, which I thought of after submitting my contest entry (see the sketch after this list):

  • How many user groups go on to actually finish the show they chose as a group?

  • How many “decision fails” (i.e. shows they began watching and then collectively chose to stop watching) do they experience before landing on success (or not)? (And what factors play into each of those fail/success moments?)
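To make these metrics a bit more concrete, here’s an illustrative sketch of how they might roll up after the sprint’s test sessions. The session records, field names, and baseline figure are invented for the example; only the 20% time-to-decision target comes from the research goals stated earlier.

```python
# Illustrative metric roll-up for the sprint's test sessions. The session data,
# field names, and baseline are hypothetical; only the "20% faster group decisions"
# target comes from the research goals stated earlier.
from statistics import mean

sessions = [
    # minutes to agree on a title, happiness (1-5), abandoned picks ("decision fails"), finished the show?
    {"minutes_to_decision": 7.5,  "happiness": 4, "decision_fails": 1, "finished_show": True},
    {"minutes_to_decision": 12.0, "happiness": 3, "decision_fails": 2, "finished_show": False},
    {"minutes_to_decision": 5.0,  "happiness": 5, "decision_fails": 0, "finished_show": True},
]

BASELINE_MINUTES = 11.0                  # hypothetical Phase 1 average time to decide
target_minutes = BASELINE_MINUTES * 0.8  # success goal: 20% faster group decisions

avg_minutes = mean(s["minutes_to_decision"] for s in sessions)
avg_happiness = mean(s["happiness"] for s in sessions)
avg_fails = mean(s["decision_fails"] for s in sessions)
completion_rate = sum(s["finished_show"] for s in sessions) / len(sessions)

print(f"time to decision: {avg_minutes:.1f} min (target <= {target_minutes:.1f} min)")
print(f"happiness: {avg_happiness:.1f}/5 | decision fails: {avg_fails:.1f} | finished show: {completion_rate:.0%}")
```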

 

Research Tools of the Trade

Joe Formica (of Bitesize UX) introduced us to the newly rebranded Lyssna.com (formerly UsabilityHub), which is rumored to begin offering screener surveys soon. It seems like an interesting tool to help with participant management.

A screenshot from DoveTail’s qualitative analysis landing page.

During my research for this project, I stumbled across DoveTail, which seems to offer powerful tools like automated text analysis (with several data import options) and a way to combine multiple media forms (clips from recordings, notes) into one neat presentation.

Definitely bookmarking DoveTail for the future!

 

Summary

 

Contest Results

 

From Bitesize UX: “After much deliberation, based on criteria of solution completeness, execution, and presentation, we are thrilled to announce Irene Geller as the winner of the Netflix UX Research challenge!

“Below is a quote from our judges to highlight some aspects of Irene’s submission that stood out to us:

“Irene nailed the screening process, which is essential in a study like this. Netflix has a massive audience, and there is a lot of nuance in the relationships of ‘co-watchers’ that would certainly impact their behavior and decision-making process. Irene did an excellent job of recognizing this, and creating a recruitment process that reflected the importance of a high-quality pool of participants.

“While there’s no way to know which research plan would yield the most useful insights (that’s why you do the research), I felt that Irene’s proposed plan gave the best likelihood of understanding and improving the co-watching experience. This could be as a direct result of the proposed research, or in subsequent studies based on those findings.

“Ultimately, Irene’s proposal incorporated a detailed plan for observing the shared decision-making process, which I initially thought was the key piece to understanding the process of finding something to watch. While there were other plans that included an observation method, Irene’s was among the most thorough. It was supported by other methods outlined, including a detailed recruitment process and initial qualitative interviews. I think Irene’s plan would produce some really interesting insights related to the questions and goals outlined in the brief.

“Congratulations, Irene!!!”

 

Thank you so much to the judges & Bitesize UX crew for being awesome and sponsoring this challenge🙏


What do you think of my research plan? Are there any gaps I didn’t account for? Are YOU a researcher at Netflix? 👀

Feel free to reach out and let me know your thoughts via LinkedIn!
