2025 Netflix Workshop on Personalization, Recommendation and Search (PRS)


Friday, May · 8:30 AM – 6:00 PM PDT

Event Details

The ninth Netflix workshop on Personalization, Recommendation and Search (PRS) aims to bring together practitioners and researchers working in these domains to share ideas, information, and approaches, and to build bridges between these communities.

Please register in advance using the RSVP button above. Registration will close when we reach capacity (as has happened in prior years) or on Friday, May 2nd, whichever comes first. So if you're interested, don't delay.

If you are interested in presenting a poster during the workshop, please fill out this form before Monday, April 14th. Authors of accepted posters will be notified by Monday, April 21st.

The event will be in-person only, at our beautiful Netflix campus in Los Gatos, CA.

This @NetflixResearch workshop is organized by:

Justin Basilico - jbasilico[at]netflix.com

Grace Huang - ghuang[at]netflix.com

Sudarshan Lamkhede - slamkhede[at]netflix.com

Kriti Kohli - kritik[at]netflix.com

Aish Fenton - afenton[at]netflix.com

Nathan Kallus - nkallus[at]netflix.com

Linas Baltrunas - lbaltrunas[at]netflix.com

Anne Cocos - acocos[at]netflix.com

Ogheneovo Dibie - odibie[at]netflix.com

Gary Tang - garyt[at]netflix.com

For questions, contact prs-organizers[at]netflix.com

Previous PRS workshops: 2024, 2023, 2022, 2021, 2019, 2018, 2017, 2016.

Tentative Agenda

Note: Sequencing and timing of sessions are still subject to change.

8:30 AM PDT

Registration Opens

Breakfast & Coffee

9:15 AM PDT

Welcome & Opening remarks

Workshop Organizers (Netflix)

9:30 AM PDT

Rethinking Machine Learning in the Era of AI Product Development

Zachary Lipton (CMU, Abridge)

10:00 AM PDT

Learning to Recommend via Generative Optimization

Adith Swaminathan (Netflix)

10:30 AM PDT

Break

Registration closes at 11:00 AM

11:00 AM PDT

Graph Transformers in Practice: Kumo’s Approach to Personalization at Scale

Hema Raghavan (Kumo.AI)

11:30 AM PDT

Synthetic Evaluations & GenAI Application Development for Finance

Edgar Meij (Bloomberg)

12:00 PM PDT

Lunch

12:30 PM PDT

Poster Session

1:30 PM PDT

Putting the 'You' in YouTube: Better Personalization through Larger Models

Lexi Baugher (YouTube/Google)

2:00 PM PDT

From Many Models to Few: Instacart's LLM-Driven Approach to Search and Discovery

Tejaswi Tenneti (Instacart)

2:30 PM PDT

Fireside chat with Kevin Scott in conversation with Elizabeth Stone on AI and Personalization

Kevin Scott (Microsoft), Elizabeth Stone (Netflix)

3:00 PM PDT

Break

3:30 PM PDT

Domain Adapting Open Weight Models to Unlock Spotify Catalog Understanding

Divita Vohra and Jacqueline Wood (Spotify)

4:00 PM PDT

Evolution of Netflix Recommendations: Unleashing the Power of Multi-task and Foundation Models for Scalable Recommendation

Ying Li and Ko-Jen Hsiao (Netflix)

4:30 PM PDT

Closing & Happy hour

6:00 PM PDT

End of event


Speakers

Kevin Scott

Microsoft

Kevin Scott is Microsoft’s chief technology officer. Prior to joining Microsoft, he was senior vice president of engineering and operations at LinkedIn, where he helped build the technology and engineering teams and led the company through an IPO and six years of rapid growth. Earlier in his career, he oversaw mobile ads engineering at Google, including the integration of Google’s $750 million acquisition of AdMob. Scott is the host of the podcast Behind the Tech, which features interviews with technology heroes who have helped create the tech industry of today. He also authored the book “Reprogramming the American Dream,” which explores how artificial intelligence can be realistically used to serve the interests of everyone, not just the privileged few. He is a member of the board of directors of Shopify, an adviser to several Silicon Valley startups, an active angel investor, the founder of the non-profit organization Behind the Tech, an emeritus of the Anita Borg Institute’s board of trustees and a trustee of The Scott Foundation.

Elizabeth Stone

Netflix

Elizabeth Stone was named Chief Technology Officer in October 2023. She joined Netflix in 2020, serving first as Vice President, Product Data Science & Engineering and more recently as Vice President, Data & Insights. Elizabeth previously worked in finance as a trader at Merrill Lynch and economist at Analysis Group before transitioning to the tech industry, where she served as Chief Operating Officer at Nuna and Vice President, Science at Lyft. She is a graduate of MIT and Stanford University.

Zachary Lipton

CMU, Abridge

Zachary Lipton is the Chief Technology Officer and Chief Scientist of Abridge, the leading platform for AI-based ambient listening technology in healthcare. Abridge’s technology uses speech and language models to unburden doctors from repetitive clerical tasks, liberating them to focus on patient care. He is also the Raj Reddy Associate Professor of Machine Learning at Carnegie Mellon University, where he directs the Approximately Correct Machine Intelligence (ACMI) lab. His research focuses on the theoretical and engineering foundations of robust and adaptive machine learning algorithms, applications to both prediction and decision-making problems in clinical medicine, natural language processing, and the impact of machine learning systems on society. A key theme in his current work is to take advantage of causal structure underlying the observed data while producing algorithms that are compatible with the modern deep learning power tools that dominate practical applications. He is the founder of the Approximately Correct blog (approximatelycorrect.com) and a co-author of Dive Into Deep Learning, an interactive open-source book drafted entirely through Jupyter notebooks that has reached millions of readers. He can be found on X (@zacharylipton), GitHub (@zackchase), or his lab's website (acmilab.org).

Divita Vohra

Spotify

Divita Vohra is a Senior Product Manager at Spotify, where she leads AI infrastructure and tooling for scalable, responsible, and high-impact ML systems across real-time and batch workflows. Her work powers personalization and discovery for 600+ million users and spans fine-tuning, evaluation, and inference. She introduced the open-source framework Ray into Spotify’s ML stack and contributes to industry efforts on model and system cards for transparent AI. Previously, she worked on ML infrastructure at Capital One. She holds an M.S. in Computer Science from Georgia Tech and a B.S. in Computer Engineering from Virginia Tech.

Adith Swaminathan

Netflix

Adith Swaminathan is a research scientist in the Machine Learning and Inference Research team at Netflix. He studies machine learning to improve decision-making for user-interactive systems, including recommender systems, LLM-augmented agents, interactive games and cloud computing systems. He received a PhD from Cornell University in 2017 and was a researcher at Microsoft Research Redmond from 2017 to 2024.

Lexi Baugher

YouTube/Google

Lexi Baugher is a Distinguished Software Engineer at YouTube. She received a Bachelor of Science in Engineering and Applied Science from Caltech. She has worked at Google for 23 years. She is currently the Area Tech Lead of YouTube Discovery, overseeing development of YouTube's recommendation systems. She lives in San Francisco with her husband, kid, and cat, and enjoys baseball, running, and baking.

Tejaswi Tenneti

Instacart

Tejaswi Tenneti is currently a Director of Machine Learning at Instacart, the North American leader in online grocery. Prior to Instacart, Tejaswi was a tech lead in machine learning teams at Apple and Oracle where he worked on various applications related to Search and Recommendations for local maps data and Enterprise. Tejaswi holds a BS from IIIT, Allahabad and an MS from Stanford University specializing in AI.

Ying Li

Netflix

Ying Li is a Research Scientist in the Core Recommendation team at Netflix. At Netflix, her work focuses on core recommendation ranking and large-scale recommendation systems. Prior to Netflix, she was an Applied Scientist at Amazon, focusing on cold-start classification and large-scale extreme classification using NLP. She obtained her Ph.D. from the University of California, Los Angeles, and her B.S. from Peking University. She co-chaired the REVEAL (Reinforcement learning-based recommender systems at scale) workshop at RecSys 2022.

Ko-Jen Hsiao

Netflix

Ko-Jen (Mark) Hsiao is a research scientist/engineer at Netflix, with 8 years of experience working on Netflix's core recommendation systems. He has expertise in applying machine learning at scale, constructing and A/B testing ranking algorithms and recommendation systems. His current focus is on developing foundational models for various algorithms that drive Netflix's personalized experiences. He earned his PhD from the University of Michigan, where he specialized in combining disparate information for machine learning applications. In his free time, he enjoys outdoor activities with his three kids.

Hema Raghavan

Kumo.AI

Hema Raghavan is Vice President of Engineering and Co-founder of Kumo AI where she is responsible for developing the AI technology to help Kumo users build better ML models. Previously, Raghavan was Senior Director of Engineering at LinkedIn where she led a globally distributed diverse team that built AI and ML solutions for fueling LinkedIn’s growth, including People You May Know and the company’s Air Traffic Controller AI that governed member communications. She has also worked as a Research Staff Member at IBM and a Scientist at Yahoo!. Raghavan has a PhD in Computer Science from the University of Massachusetts Amherst, and a degree in Computer Engineering from the University of Mumbai.

Edgar Meij

Bloomberg

Edgar Meij is the head of the AI Platforms division in Bloomberg’s Artificial Intelligence (AI) group, and leads 10+ teams of engineers and researchers that are responsible for all key AI, NLP, ML, LLM/GenAI, and Search technology platforms used across the company. Edgar holds a Ph.D. in computer science from the University of Amsterdam and has an extensive track record in artificial intelligence, information retrieval, natural language processing, machine learning, large-scale computing infrastructures, knowledge graphs, and semantic search. He has published more than 150 papers in top international venues, which have been cited more than 3,000 times. He is also a (Senior) Program Committee member of virtually every major conference in the field (including The Web Conference, WSDM, ECIR, SIGIR, CIKM, and EMNLP), has organized tutorials and workshops at those same conferences, has served as sponsorship co-chair for The Web Conference, ECIR, and ICAIF, and as co-chair for the Industry Track (SIRIP) for SIGIR 2024.

Jacqueline Wood

Spotify

Jacqueline Wood is a Senior Machine Learning Engineer at Spotify, specializing in adapting open-weight large language models to enhance catalog understanding and personalization. She works on LLM fine-tuning and inference at scale, focusing on optimizing cost and latency to unlock innovative recommendation capabilities across Spotify’s catalog of music, podcasts, and audiobooks. Previously, she developed Spotify’s lifetime value model to forecast the long-term impact of decisions on Spotify’s users and creators. She holds a BA in Statistics from UC Berkeley.

Talks & Abstracts

Title: Fireside chat with Kevin Scott in conversation with Elizabeth Stone on AI and Personalization

Speakers: Kevin Scott (Microsoft), Elizabeth Stone (Netflix)

Abstract:

Join us for a fireside chat with Microsoft CTO Kevin Scott, a leader who has played a critical role in Microsoft’s AI transformation, in conversation with Netflix CTO Elizabeth Stone. Together, they'll explore emerging trends, groundbreaking advances, and insights into how AI is shaping the future of personalized user experiences.

Title: Domain Adapting Open Weight Models to Unlock Spotify Catalog Understanding

Speakers: Divita Vohra and Jacqueline Wood (Spotify)

Abstract:

Open-weight Large Language Models (LLMs) provide a powerful foundation for personalization and content understanding but lack intrinsic awareness of Spotify’s rich catalog and dynamic user ecosystem. In this talk, we present our approach to domain-adapting open-weight LLMs by introducing structured representations of catalog entities – such as artists, episodes, and audiobooks – using semantic tokenization techniques. By grounding these models in Spotify’s unique data, we unlock a range of impactful use cases, including playlist sequencing, cold-start video recommendations, and personalized podcast experiences.

We’ll explore key technical challenges, including designing robust catalog tokenization workflows, training strategies for integrating platform-specific knowledge, and bridging world knowledge with domain expertise in production-scale systems. This work highlights how generative models can be transformed into scalable, domain-aware systems that deeply understand Spotify’s content and users, paving the way for the next generation of personalized experiences.
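Semantic tokenization of catalog entities is often done with residual quantization: an item embedding is mapped to a short sequence of codebook indices, and each index becomes a new token the LLM can read and emit. The sketch below assumes tiny hand-picked codebooks and a hypothetical token format (`<sid_level_code>`); it illustrates the general technique, not Spotify's actual pipeline.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding: Sequence[float],
                      codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Map an embedding to one code index per codebook level.

    At each level, pick the codeword nearest to the current residual, then
    subtract it so the next (finer) level encodes what is left over.
    """
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        best = min(range(len(codebook)),
                   key=lambda i: squared_distance(residual, codebook[i]))
        codes.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return codes


def to_semantic_tokens(codes: Sequence[int]) -> List[str]:
    # Hypothetical format: one new vocabulary entry per (level, code) pair.
    return [f"<sid_{level}_{code}>" for level, code in enumerate(codes)]


# Two levels of 2-D codewords (hypothetical values for illustration).
CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],   # coarse level
    [[0.1, 0.1], [-0.1, 0.1], [0.0, -0.2]],  # fine level
]

artist_embedding = [0.9, 0.15]
codes = residual_quantize(artist_embedding, CODEBOOKS)
print(codes, to_semantic_tokens(codes))
```

Because nearby embeddings tend to share their coarse codes, related entities share token prefixes, which is one reason this kind of scheme is attractive for grounding an LLM in a catalog.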

Title: Rethinking Machine Learning in the Era of AI Product Development

Speaker: Zachary Lipton (CMU, Abridge)

Abstract:

In this informal talk, I’ll share some observations and lessons learned from building Abridge, a company whose core technology listens to doctor-patient conversations and reads patient histories, automatically drafting notes, orders, codes, flowsheets, and other key artifacts, liberating doctors to focus on patient care. Over the past two years, Abridge has grown from an academic prototype into a core part of care delivery, now supporting tens of thousands of clinicians across millions of patient encounters each week. In particular, I’ll reflect on how, in the past few years, many of the frames through which we have pursued machine learning problems have been inverted. In an older paradigm, we collected (or inherited) data, then trained models, then evaluated, then deployed. Now, we prototype capabilities without labeled data, then evaluate, then engage in limited deployments, and only then train models. The notion of “the distribution” has given way to a flexible view in which datasets grow organically, the definition of “gold standard” is a moving target, and any one metric or dataset is taken only to be one among many components of a larger sensory apparatus. Moreover, as the success of AI projects has grown more tightly coupled to in vivo experimentation and data flywheels, siloed research organizations have given way to more dynamic teams in which deep learning modeling and post-training orchestration are woven into the fabric of product development alongside software engineering and interaction design. This talk will reflect on how the experience of building Abridge has reshaped my perspective on applied machine learning and some of the core problems (e.g., distribution shift) that anchor my research agenda.

Title: Learning to Recommend via Generative Optimization

Speaker: Adith Swaminathan (Netflix)

Abstract:

By leveraging Large Foundation Models (LFMs) trained on internet-scale data, recommender agents can ingest world knowledge, enhance user and content understanding, and interpret complex user feedback. Yet, LFMs alone are insufficient for personalized recommendations; integrating external tools such as item catalogs and user interaction histories remains essential. Current workflows for LFM-augmented recommender agents involve manually tuning exotic parameters, including prompts and orchestration code. This talk describes the Trace platform, which enables generative optimization of these parameters using rich feedback. Trace allows agent designers to operate at a meta level; rather than crafting recommendations or tuning parameters by hand, we design optimizers that iteratively refine the agent's performance. Adopting this meta-optimization perspective surfaces new research challenges. First, for agents to learn effectively we should design workflows that yield optimizable computation graphs. Second, feedback design is important so that generative optimizers resist overfitting and reward hacking. Finally, we describe workflow design patterns through which optimization aligns with intended learning objectives. By embracing this optimization perspective, we draw lessons from how neural networks co-evolved with their optimizers, and take a step towards self-improving recommender agents that continuously learn, adapt and personalize at scale.

Title: Putting the 'You' in YouTube: Better Personalization through Larger Models

Speaker: Lexi Baugher (YouTube/Google)

Abstract:

Large recommender systems power the world's biggest online media platforms, finding patterns in past user feedback data in order to match billions of viewers to billions of pieces of content. As users' expectations for platform performance increase, it becomes important for our recommender system to perform better under a broader range of circumstances - even ones that the system hasn't seen before.

However, it is too easy to build a system that focuses narrowly on amplifying the choices that produced the best outcomes yesterday, and harder to build one that understands when these choices aren't optimal for tomorrow. The rise of LLMs presents us with inspiration for how large models can help us with a planet-sized personalization challenge. This inspiration suggests three approaches: scaling traditional recommender models, delegating planning tasks to LLMs, and building a generative recommender model.

This talk will present these approaches and show how they can be applied in production settings.

Title: From Many Models to Few: Instacart's LLM-Driven Approach to Search and Discovery

Speaker: Tejaswi Tenneti (Instacart)

Abstract:

Traditional deep learning models have revolutionized search and recommendations for e-commerce. But each component in the system, such as query understanding, retrieval, and ranking, often requires its own bespoke model, complicating maintenance and optimization. Imagine replacing these with a few LLMs to deliver hyper-personalized results pages.
In this talk, discover how Instacart transformed its search and recommendation systems by:
- Generating offline content for improved recommendations
- Replacing multiple query understanding models with a single LLM
- Employing LLMs for generative retrieval
- Using agents (an ongoing effort) to orchestrate the overall workflow and produce the final search results page

Title: Evolution of Netflix Recommendations: Unleashing the Power of Multi-task and Foundation Models for Scalable Recommendation

Speakers: Ying Li and Ko-Jen Hsiao (Netflix)

Abstract:

In this talk, we delve into the evolution of Netflix's homepage recommendation system, which scales recommendations across diverse canvases and content types through multi-task learning and foundation models. The Netflix homepage employs a 2D "Lolomo" framework (a list of lists of movies) to present a variety of content types, organized into themed canvases like "Your Next Watch," "Action Movies," "Mobile Games," and "We Think You'll Love These." Traditionally, numerous personalization algorithms were developed to rank these canvases and items. However, maintaining such a vast array of algorithms does not scale and makes it hard to accommodate new business needs. To address this scalability challenge, we introduced "Hydra" models, which use multi-task learning to consolidate diverse ranking signals and models. This innovation simplifies the system and facilitates the seamless integration of new business requirements, such as live and game recommendations, across all canvases. Furthermore, Hydra models are integrated with the Foundation Model (FM) to enhance user and item understanding. Inspired by large language models, the FM centralizes the learning of member preferences and item insights. It efficiently disseminates these insights across various downstream applications, including homepage recommendations, search, messaging, and evidence.
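The consolidation idea behind multi-task ranking can be sketched as one shared member-item representation feeding several task heads, with each canvas blending the task predictions it cares about. Everything below is hypothetical (the elementwise-product "shared bottom", task names, head weights, and per-canvas blends); it shows inference only and is not the actual Hydra architecture.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_representation(member: List[float], item: List[float]) -> List[float]:
    # Shared bottom: elementwise product of member and item embeddings.
    return [m * i for m, i in zip(member, item)]


def task_heads(hidden: List[float], heads: Dict[str, List[float]]) -> Dict[str, float]:
    # One linear head per task on top of the same shared representation.
    return {task: sigmoid(sum(w * h for w, h in zip(weights, hidden)))
            for task, weights in heads.items()}


def canvas_score(task_probs: Dict[str, float], canvas_weights: Dict[str, float]) -> float:
    # Each canvas blends the task predictions it cares about; unlisted tasks get weight 0.
    return sum(canvas_weights.get(task, 0.0) * p for task, p in task_probs.items())


# Hypothetical head weights and embeddings.
HEADS = {
    "play": [0.5, 1.0, -0.2],
    "long_watch": [0.3, 0.8, 0.4],
    "game_install": [1.2, -0.5, 0.1],
}
member, item = [1.0, 0.5, 0.2], [0.8, 1.0, 0.5]
probs = task_heads(shared_representation(member, item), HEADS)
print(canvas_score(probs, {"play": 0.6, "long_watch": 0.4}))  # e.g. a title row
print(canvas_score(probs, {"game_install": 1.0}))             # e.g. a games row
```

Under this structure, supporting a new business need means adding a head and a canvas blend rather than a new standalone model.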

Title: Graph Transformers in Practice: Kumo’s Approach to Personalization at Scale

Speaker: Hema Raghavan (Kumo.AI)

Abstract:

Personalization and risk detection in enterprise environments often rely on complex pipelines and manual feature engineering, limiting speed and flexibility. At Kumo, we take a different approach. By leveraging Graph Transformers, we learn directly from relational data—capturing deeper context and delivering real-time predictions without traditional data prep.
In this talk, I’ll share how Kumo’s platform operationalizes this technology to power large-scale recommendation systems and detect fraud patterns with high accuracy. I’ll walk through the core architecture, discuss deployment in production, and highlight key learnings from applying Graph Transformers across diverse use cases. Attendees will gain a practical understanding of how this model class can simplify workflows, accelerate experimentation, and unlock more value from enterprise data.
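Learning directly from relational data typically starts by treating table rows as nodes and foreign-key references as edges, so a graph model can attend over linked rows instead of relying on hand-built joins and aggregate features. The table, column names, and rows below are hypothetical, and the sketch covers only this graph-construction step, not Kumo's Graph Transformer.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Node = Tuple[str, int]  # (table name, primary key)

# Hypothetical fact table linking users to items.
transactions = [
    {"txn_id": 100, "user_id": 1, "item_id": 10},
    {"txn_id": 101, "user_id": 1, "item_id": 11},
    {"txn_id": 102, "user_id": 2, "item_id": 10},
]


def build_graph(rows: Sequence[Dict[str, int]], table: str, pk: str,
                foreign_keys: Dict[str, str]) -> Dict[Node, List[Node]]:
    """Turn each foreign-key reference in `rows` into an undirected edge."""
    adjacency: Dict[Node, List[Node]] = defaultdict(list)
    for row in rows:
        row_node = (table, row[pk])
        for fk_column, target_table in foreign_keys.items():
            target_node = (target_table, row[fk_column])
            adjacency[row_node].append(target_node)
            adjacency[target_node].append(row_node)
    return adjacency


graph = build_graph(transactions, "transactions", "txn_id",
                    {"user_id": "users", "item_id": "items"})

# Items within two hops of user 1 (user -> transaction -> item).
two_hop = {n2 for n1 in graph[("users", 1)] for n2 in graph[n1] if n2[0] == "items"}
print(sorted(two_hop))
```

A model operating on this graph sees a user's purchase history as neighborhood structure, which is what replaces the manual feature engineering step described above.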

Title: Synthetic Evaluations & GenAI Application Development for Finance

Speaker: Edgar Meij (Bloomberg)

Abstract:

Financial decision-making is a multifaceted process that encompasses identifying and summarizing key information, generating and vetting trade ideas, locating liquidity, as well as conducting post-trade analyses and crafting reports. It involves domain-specific financial and critical thinking skills. Therefore, building meaningful AI-driven applications in this setting places specific constraints on evaluation at all stages of the application lifecycle – ranging from ideation and development to post-release monitoring. Even though evaluation has mostly been a manual process, LLMs hold promise for faster, easier, and (sometimes) more accurate judgments and annotations. This talk will focus on evaluation and application development in the setting of capital markets financial services, the emergence of synthetic evaluation paradigms, implications in light of recent work, and open challenges in this emerging field.

Posters

The following posters will be presented from 12:30 to 1:30 PM PDT at the workshop.

Enhancing Large Language Models with Domain-Specific Content Knowledge for Improved Recommendations

Zhe Zhang, Yesu Feng (Netflix)

In the entertainment sector, content significantly influences viewer decisions, necessitating a Large Language Model (LLM) with specialized knowledge of the entertainment catalog. Traditional LLMs, while proficient in capturing broad factual information, struggle with static knowledge and lack domain-specific expertise. This paper explores post-training knowledge injection via instruction tuning to address these challenges. By converting content into diverse Question-Answer pairs, the model gains a nuanced understanding of titles, enhancing its ability to generalize to new instructions. This enriched context is expected to improve the model's accuracy in recommending the next title, aligning more closely with individual viewer preferences and histories.
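Converting catalog content into question-answer pairs can be as simple as filling templates from title metadata. The field names, templates, and record below are hypothetical, and the poster's approach likely produces far more diverse pairs; the sketch only shows the shape of the resulting instruction-tuning examples.

```python
from typing import Dict, List

# Hypothetical question templates keyed by catalog metadata field.
TEMPLATES = {
    "genre": "What genre is {name}?",
    "year": "In what year was {name} released?",
    "cast": "Who stars in {name}?",
}


def title_to_qa_pairs(title: Dict[str, object]) -> List[Dict[str, str]]:
    """Turn one catalog record into instruction/response training examples."""
    name = title["name"]
    pairs = []
    for field, template in TEMPLATES.items():
        if field not in title:
            continue  # skip facts the catalog record does not contain
        value = title[field]
        answer = ", ".join(value) if isinstance(value, list) else str(value)
        pairs.append({"instruction": template.format(name=name), "response": answer})
    return pairs


record = {"name": "Example Show", "genre": "Sci-Fi Drama", "year": 2021,
          "cast": ["A. Actor", "B. Actor"]}
for pair in title_to_qa_pairs(record):
    print(pair)
```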

LiNR: Model Based Neural Retrieval on GPUs at LinkedIn

Madhu Arun (LinkedIn)

The poster presents LinkedIn’s large-scale, GPU-based retrieval system for the out-of-network feed. The new retrieval system supports a billion-sized index on GPU models, where both items and model weights are integrated into the model binary. Viewing index construction as a form of model training, we describe scaling our system for large indexes, incorporating full scans and efficient filtering. A key focus is on enabling attribute-based pre-filtering for exhaustive GPU searches, addressing the common challenge of post-filtering in KNN searches that often reduces system quality. We believe this represents one of the industry’s first live-updated model-based retrieval indexes. Applied to out-of-network post recommendations on LinkedIn Feed, it has contributed to a +0.1% lift in daily unique professional users. We envisage this as a step towards integrating retrieval and ranking into a single GPU model, simplifying complex infrastructures and enabling end-to-end optimization of the entire differentiable infrastructure through gradient descent.
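Why post-filtering hurts KNN quality is easy to see in a toy exhaustive search. This is a pure-Python CPU scan with hypothetical embeddings and a hypothetical `lang` attribute, not LinkedIn's GPU implementation: post-filtering takes the top k by score and then drops ineligible items (possibly returning fewer than k), while pre-filtering scans only eligible items.

```python
import heapq
from typing import Callable, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def knn_post_filter(query, index, attrs, keep: Callable[[Dict], bool], k: int) -> List[int]:
    # Top-k by score first, then discard items that fail the filter.
    top = heapq.nlargest(k, range(len(index)), key=lambda i: dot(query, index[i]))
    return [i for i in top if keep(attrs[i])]


def knn_pre_filter(query, index, attrs, keep: Callable[[Dict], bool], k: int) -> List[int]:
    # Exhaustive scan restricted to eligible items, so up to k results survive.
    eligible = [i for i in range(len(index)) if keep(attrs[i])]
    return heapq.nlargest(k, eligible, key=lambda i: dot(query, index[i]))


def is_english(a: Dict) -> bool:
    return a["lang"] == "en"


index = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]]
attrs = [{"lang": "de"}, {"lang": "de"}, {"lang": "en"}, {"lang": "en"}, {"lang": "en"}]
query = [1.0, 0.0]
print(knn_post_filter(query, index, attrs, is_english, k=2))
print(knn_pre_filter(query, index, attrs, is_english, k=2))
```

Here the two best-scoring items fail the filter, so post-filtering returns nothing while pre-filtering still returns two eligible results.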

Counterfactual Inference under Thompson Sampling

Olivier Jeunen (Aampe)

Recommender systems exemplify sequential decision-making under uncertainty, strategically deciding what content to serve to users, to optimise a range of potential objectives.
To balance the explore-exploit trade-off successfully, Thompson sampling provides a natural and widespread paradigm to probabilistically select which action to take.
Questions of causal and counterfactual inference, which underpin use-cases like offline evaluation, are not straightforward to answer in these contexts.
Specifically, whilst most existing estimators rely on action propensities, these are not readily available under Thompson sampling procedures.

We derive exact and efficiently computable expressions for action propensities under a variety of parameter and outcome distributions, enabling the use of off-policy estimators in Thompson sampling scenarios.
This opens up a range of practical use-cases where counterfactual inference is crucial, including unbiased offline evaluation of recommender systems, as well as general applications of causal inference in online advertising, personalisation, and beyond.
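For the simplest case, two arms with independent Gaussian posteriors, the Thompson sampling propensity has a closed form: arm 0 is chosen when θ0 - θ1 > 0, and that difference is Gaussian with mean μ0 - μ1 and variance σ0² + σ1², so P(arm 0) = Φ((μ0 - μ1) / sqrt(σ0² + σ1²)). The sketch below covers only this two-arm Gaussian case (the poster derives expressions for more general settings, which this does not reproduce) and plugs the logged propensities into a standard inverse propensity scoring (IPS) estimator. The function names and log format are hypothetical.

```python
import math
from typing import List, Tuple


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ts_propensities_two_gaussian(mu: Tuple[float, float],
                                 sigma: Tuple[float, float]) -> Tuple[float, float]:
    """Exact probability that Thompson sampling picks each of two arms,
    assuming independent Gaussian posteriors N(mu[a], sigma[a]**2)."""
    z = (mu[0] - mu[1]) / math.sqrt(sigma[0] ** 2 + sigma[1] ** 2)
    p0 = normal_cdf(z)
    return p0, 1.0 - p0


def ips_estimate(logs: List[Tuple[int, float, Tuple[float, float]]],
                 target: Tuple[float, float]) -> float:
    """IPS value estimate of a target policy from logged TS rounds.

    Each log entry is (chosen_arm, reward, logging_propensities), where the
    propensities were computed from the posterior at that round.
    """
    total = 0.0
    for arm, reward, propensities in logs:
        total += target[arm] / propensities[arm] * reward
    return total / len(logs)


# One logged round with hypothetical posteriors: store the propensities alongside the action.
print(ts_propensities_two_gaussian(mu=(0.6, 0.4), sigma=(0.1, 0.1)))
```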

LLM for Member Taste Summarization

Thea Wang (Netflix)

In today’s fast-evolving digital space, personalization is key to enhancing user experience and satisfaction. As LLMs demonstrate remarkable capabilities in NLP tasks, we explore fine-tuning open-source LLMs for member taste summarization to improve personalization.

Tweedie Regression for Video Recommendation System

Qiang Chen (Tubi)

Modern recommendation systems aim to increase click-through rates (CTR) for a better user experience, commonly by treating ranking as a classification task focused on predicting CTR. However, there is a gap between this method and the actual objectives of businesses across different sectors. In video recommendation services, the objective of video on demand (VOD) extends beyond merely encouraging clicks to guiding users to discover their true interests, leading to increased watch time. Longer watch time, in turn, leads to more revenue through more opportunities to present online display advertisements. This research addresses the issue by reframing the problem from classification to regression, with a focus on maximizing revenue through user viewing time. Because positive labels are scarce in recommendation data, the study introduces the Tweedie loss function, which is better suited to this scenario than the traditional mean squared error loss. The paper also provides insights into how the Tweedie process captures users’ diverse interests. Our offline simulation and online A/B test revealed that we can substantially enhance our core business objectives: user engagement in terms of viewing time and, consequently, revenue. Additionally, we provide a theoretical comparison between the Tweedie loss and the commonly employed viewing-time-weighted logloss, highlighting why Tweedie regression stands out as an efficient solution. We further outline a framework for designing a loss function that focuses on a single objective.
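The abstract does not spell out the loss, so for reference: with a log link (predicted mean μ = exp(f)) and variance power 1 < p < 2, the standard Tweedie negative log-likelihood, dropping terms that do not depend on the prediction, is loss(y, f) = -y·μ^(1-p)/(1-p) + μ^(2-p)/(2-p). For p in this range the distribution is compound Poisson-Gamma, with a point mass at zero, which is why it fits zero-heavy watch-time targets. This is a minimal sketch of that textbook form, not Tubi's implementation; the default p = 1.5 and the sample data are assumptions.

```python
import math
from typing import Sequence


def tweedie_loss(y_true: Sequence[float], raw_pred: Sequence[float], p: float = 1.5) -> float:
    """Mean Tweedie negative log-likelihood (constant terms dropped), log link.

    y_true: non-negative targets such as watch time, often exactly zero.
    raw_pred: model outputs f, with predicted mean mu = exp(f).
    p: variance power in (1, 2).
    """
    if not 1.0 < p < 2.0:
        raise ValueError("variance power p must lie in (1, 2)")
    total = 0.0
    for y, f in zip(y_true, raw_pred):
        total += -y * math.exp(f * (1.0 - p)) / (1.0 - p) + math.exp(f * (2.0 - p)) / (2.0 - p)
    return total / len(y_true)


def tweedie_grad(y: float, f: float, p: float = 1.5) -> float:
    # d(loss)/df; it is zero exactly when exp(f) == y, so the optimum predicts the mean.
    return -y * math.exp(f * (1.0 - p)) + math.exp(f * (2.0 - p))


# Hypothetical watch minutes for five impressions: mostly zeros, a few long watches.
watch_minutes = [0.0, 0.0, 45.0, 0.0, 120.0]
constant_pred = [math.log(33.0)] * 5  # predict the sample mean everywhere
print(tweedie_loss(watch_minutes, constant_pred))
```

Unlike a classification loss on "clicked or not", every minute watched contributes to the target, so the model is trained directly toward expected viewing time.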

Finding Interest Needle in Popularity Haystack: Improving Retrieval by Modeling Item Exposure

Amit Jaspal, Nicolas Bievre (Meta)

Recommender systems operate in closed feedback loops, where user interactions reinforce popularity bias, leading to over-recommendation of already popular items while under-exposing niche or novel content. Existing bias mitigation methods, such as Inverse Propensity Scoring (IPS) and Off-Policy Correction (OPC), primarily operate at the ranking stage or during training, lacking explicit real-time control over exposure dynamics. In this work, we introduce an exposure-aware retrieval scoring approach, which explicitly models item exposure probability and adjusts retrieval-stage ranking at inference time. Unlike prior work, this method decouples exposure effects from engagement likelihood, enabling controlled trade-offs between fairness and engagement in large-scale recommendation platforms. We validate our approach through online A/B experiments in a real-world video recommendation system, demonstrating a 25% increase in uniquely retrieved items and a 40% reduction in the dominance of over-popular content, all while maintaining overall user engagement levels. Our results establish a scalable, deployable solution for mitigating popularity bias at the retrieval stage, offering a new paradigm for bias-aware personalization.
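The abstract does not give the scoring rule. One simple hypothetical way to decouple exposure from engagement at inference time is to discount an engagement estimate by a power of the item's exposure probability, score = log p(engage) - α·log p(exposure), where α is the trade-off knob. The functional form, α, and the item names and probabilities below are all assumptions for illustration, not Meta's method.

```python
import math
from typing import Dict, List


def exposure_aware_scores(engage_prob: Dict[str, float],
                          exposure_prob: Dict[str, float],
                          alpha: float) -> Dict[str, float]:
    """Hypothetical retrieval score: log engagement minus alpha * log exposure.

    alpha = 0 recovers pure engagement ranking; larger alpha boosts items that
    are rarely exposed relative to their engagement likelihood.
    """
    return {item: math.log(engage_prob[item]) - alpha * math.log(exposure_prob[item])
            for item in engage_prob}


def retrieve(engage_prob: Dict[str, float], exposure_prob: Dict[str, float],
             alpha: float, k: int) -> List[str]:
    scores = exposure_aware_scores(engage_prob, exposure_prob, alpha)
    return sorted(scores, key=scores.get, reverse=True)[:k]


engage = {"viral_clip": 0.30, "niche_doc": 0.20, "indie_film": 0.18}
exposure = {"viral_clip": 0.90, "niche_doc": 0.10, "indie_film": 0.05}
print(retrieve(engage, exposure, alpha=0.0, k=2))  # engagement only
print(retrieve(engage, exposure, alpha=0.5, k=2))  # exposure-aware
```

Because α is applied at inference time, the engagement/exposure trade-off can be tuned without retraining the underlying models, which matches the "real-time control" motivation above.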

Leveraging Multimodality for Netflix Recommender Systems

Emma Kong, Asad Abbasi, David Fagnan, Bowei Yan, Aneesh Vartakavi, Dhaval Patel, Elliot Chow (Netflix)

When logging into Netflix, members are greeted with a diverse array of titles on the homepage. To enrich members’ browsing and discovery experience, we present diverse evidence assets for each title, placed in different locations. For example, the most prominent asset is the artwork, or box-art image, which showcases the main characters and underscores the title's theme. We also provide a comprehensive synopsis and evidence cards to outline the content, eye-catching badges to draw users' attention, and supplemental videos (trailers) to offer a sneak peek into the shows.

With the exciting advances in multimodal models, computer vision, and large language models (LLMs), we are motivated to incorporate rich asset information, such as images, text, and videos, into our recommendation algorithms.

Our initial step is to leverage multimodal embeddings in the evidence personalization domain. We have established a paved path for experimenting with various types of embeddings in our image, video, and text evidence personalization, as well as in specific applications such as query-aware evidence on the search page.

Personalized explanations via GenAI

JJ Tan, Emma Kong, David Fagnan (Netflix)

Recent advancements in Generative AI (GenAI) empower us to generate free-text personalized explanations that connect a recommended title to a member's viewing history. These explanations are not only more expressive but can also incorporate a broader context, offering an unprecedented level of personalization with evidence creation that surpasses conventional evidence assets. Our goal is to harness AI to empower our recommendation system with reasoning and explainability, making our recommendation algorithm more transparent to our Netflix members.

Real-Time Recommendation Reranking with Goal-Oriented Linear Optimization

Shreyas S Vidyarthi, Sukanya Moorthy (Intuit Credit Karma)

This work presents a novel approach to real-time personalization of recommendations by integrating user-specific goals through linear optimization. Traditional recommendation systems often rely on static models trained on delayed data, limiting their ability to adapt to dynamic user preferences and immediate needs. We address this limitation by introducing a two-stage architecture: (1) a GenAI agent invokes a recommender-system tool and generates an initial ranked list of recommendations based on a robust but potentially outdated feature set; (2) upon user interaction, we capture the user's specific goal in real time and employ a lightweight linear optimization layer to dynamically re-rank the initial recommendations by treating the user-specified goal as a constraint within the optimization problem. This approach allows us to maintain the efficacy of the pre-generated recommendations while ensuring precise alignment with the user's immediate objectives. Our method offers a computationally efficient and scalable solution for real-time personalization, enhancing user experience and engagement within the FinTech domain.
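One way to treat a user's goal as a constraint is a small 0/1 integer linear program: choose k items that maximize total relevance subject to the chosen items' total goal value meeting a threshold. The abstract does not say how the goal is encoded, so the per-item `goal_value` (for example, hypothetical estimated monthly savings), the threshold, and all data below are assumptions. Brute-force enumeration keeps the sketch dependency-free and is only sensible for short candidate lists; a real system would use an LP/MIP solver.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def goal_constrained_rerank(items: Sequence[str],
                            relevance: Sequence[float],
                            goal_value: Sequence[float],
                            k: int,
                            min_goal: float) -> Optional[List[str]]:
    """Pick k items maximizing total relevance s.t. total goal_value >= min_goal."""
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for combo in combinations(range(len(items)), k):
        if sum(goal_value[i] for i in combo) < min_goal:
            continue  # violates the user's stated goal
        objective = sum(relevance[i] for i in combo)
        if best is None or objective > best[0]:
            best = (objective, combo)
    if best is None:
        return None  # infeasible; caller can fall back to the initial ranking
    # Present the chosen items in descending relevance order.
    chosen = sorted(best[1], key=lambda i: relevance[i], reverse=True)
    return [items[i] for i in chosen]


# Initial ranking from the recommender (hypothetical scores and savings).
items = ["card_a", "card_b", "loan_c", "loan_d"]
relevance = [0.9, 0.8, 0.5, 0.4]
savings = [0.0, 10.0, 60.0, 40.0]
print(goal_constrained_rerank(items, relevance, savings, k=2, min_goal=50.0))
```

The pre-generated relevance scores are reused unchanged; only the lightweight selection step reacts to the goal, which mirrors the split between the two stages described above.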

Reward Alignment for Recommendation Systems using Two Stage Training

Swanand Joshi, Jaewook Yu, Varad Pathak, Anuj Shah, Gary Tang, Kriti Kohli (Netflix)

Our mission at Netflix is to entertain every member by recommending the right shows to them. Conventional recommendation systems typically focus on immediate metrics such as clicks or short-term engagement, but these measures often fail to reflect a user's enduring satisfaction. Our approach aims to recommend content that delivers both immediate appeal and lasting enjoyment, ultimately providing greater value to our members and fostering long-term retention. Over time, however, pursuing this goal surface by surface leads each surface ranker to optimize its own set of reward-maximizing policies, making it harder to innovate holistically across the entire recommendation stack. It also significantly increases the compute cost of training and maintaining bespoke reward-aligned policies for each surface.

To address these challenges, we are moving from independent policies for each surface's recommendation model to a centralized "core behavior value" that aligns all ranking models through one consistent algorithmic procedure: two-stage training.

Our proposed two-stage training process decouples reward optimization from the training of large recommendation models. The first stage pre-trains a policy without any specific reward optimization, while the second stage incorporates engineered proxy rewards that better reflect long-term satisfaction. This approach is backward compatible and offers multiple implementation options, including fine-tuning the base policy on smaller datasets optimized for longer-term objectives.

By adopting this two-stage training methodology, we have significantly improved compute efficiency, saving approximately 70% in pre-processing and model training pipelines across surface models including the homepage, category tabs, and candidate generation. The approach also enables independent innovation tracks for researchers working separately on engagement models and on long-term satisfaction metrics.
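The abstract describes the procedure only at a high level. The sketch below illustrates the decoupling with a toy logistic model: stage 1 pre-trains on engagement labels with no reward shaping, and stage 2 fine-tunes from the stage-1 weights on a smaller dataset whose soft targets are an engineered proxy reward. The features (`clickbait_signal`, `quality_signal`), proxy-reward values, and hyperparameters are all hypothetical; the talk does not specify the fine-tuning objective, and production rankers are far larger models.

```python
import math
from typing import List, Sequence, Tuple

# (features, target). In stage 1 the target is an engagement label (0 or 1);
# in stage 2 it is a hypothetical proxy reward in [0, 1].
Example = Tuple[Sequence[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sgd_logistic(weights: Sequence[float], data: Sequence[Example],
                 epochs: int, lr: float) -> List[float]:
    """Plain SGD on (soft-label) binary cross-entropy, starting from `weights`."""
    w = list(weights)
    for _ in range(epochs):
        for x, target in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Gradient of BCE with respect to the logit is (p - target).
            for j, xj in enumerate(x):
                w[j] -= lr * (p - target) * xj
    return w


def two_stage_train(pretrain_data: Sequence[Example],
                    finetune_data: Sequence[Example], dim: int):
    # Stage 1: reward-agnostic pre-training on plentiful engagement data.
    base = sgd_logistic([0.0] * dim, pretrain_data, epochs=50, lr=0.1)
    # Stage 2: start from the base weights and fit a small dataset whose
    # targets are engineered proxy rewards for long-term satisfaction.
    aligned = sgd_logistic(base, finetune_data, epochs=200, lr=0.1)
    return base, aligned


# Features: [bias, clickbait_signal, quality_signal] (toy, hypothetical).
pretrain = [([1.0, 1.0, 0.0], 1.0), ([1.0, 0.0, 1.0], 1.0), ([1.0, 0.0, 0.0], 0.0)]
finetune = [([1.0, 1.0, 0.0], 0.1), ([1.0, 0.0, 1.0], 0.9), ([1.0, 0.0, 0.0], 0.1)]
base, aligned = two_stage_train(pretrain, finetune, dim=3)
print("base:", base)
print("aligned:", aligned)
```

In this toy run, fine-tuning shifts weight away from the clickbait feature (which earned clicks but little proxy reward) toward the quality feature, while the stage-1 weights remain available for reuse by other reward-alignment passes.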

CATE-based Treatment Covariate Calibration for Messaging Personalization

Ishan Gupta, Matthew Wood (Netflix)

The Messaging Selection Algorithm (MESA) is the core algorithm used by the Netflix Messaging Personalization System (MPS) to create a personalized ranking of messages for each user based on their interests. MESA relies on accurate predictions of the individual treatment effect of a message on a user's engagement. When historical data for specific messages is insufficient, treatment effect estimates can fluctuate unpredictably, which can harm the user experience when specific message intents are under- or over-selected. We present CATE-based Treatment Covariate Calibration, a technique for calibrating the treatment effect estimates of any black-box model using inverse propensity score (IPS)-weighted causal effect estimates. The approach is designed for extensibility, allowing treatment covariates such as message intent or delivery channel to be incorporated directly into the calibration action space. This flexibility enables more granular calibration across multiple dimensions of the recommendation system. Applied to MESA, the technique has improved the calibration of individual treatment effect estimates for models trained with both the S-Learner and Double Machine Learning causal learning frameworks.
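The abstract does not spell out the calibration map. A minimal sketch, assuming an additive per-group correction: within each treatment-covariate group (for example, message intent), compare the black-box model's mean predicted effect with an inverse-propensity-weighted (Horvitz-Thompson) estimate of that group's average effect from logged data, then shift the model's predictions by the difference. The record fields and group names are hypothetical, and the technique presented in the talk may use a richer calibration map.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LoggedRecord:
    group: str               # treatment covariate, e.g. message intent (hypothetical)
    treated: bool            # whether the message was actually sent
    outcome: float           # observed engagement
    propensity: float        # logging policy's P(treated) for this user
    predicted_effect: float  # black-box model's individual effect estimate


def ips_ate(records: List[LoggedRecord]) -> float:
    """Horvitz-Thompson IPS estimate of the average treatment effect."""
    total = 0.0
    for r in records:
        if r.treated:
            total += r.outcome / r.propensity
        else:
            total -= r.outcome / (1.0 - r.propensity)
    return total / len(records)


def fit_group_offsets(records: Iterable[LoggedRecord]) -> Dict[str, float]:
    """Additive correction per group: IPS effect estimate minus mean predicted effect."""
    by_group: Dict[str, List[LoggedRecord]] = defaultdict(list)
    for r in records:
        by_group[r.group].append(r)
    offsets = {}
    for group, rs in by_group.items():
        mean_pred = sum(r.predicted_effect for r in rs) / len(rs)
        offsets[group] = ips_ate(rs) - mean_pred
    return offsets


def calibrate(predicted_effect: float, group: str, offsets: Dict[str, float]) -> float:
    # Groups with no logged data are left uncalibrated.
    return predicted_effect + offsets.get(group, 0.0)


logs = [
    LoggedRecord("reminder", True, 1.0, 0.5, 0.25),
    LoggedRecord("reminder", True, 0.0, 0.5, 0.25),
    LoggedRecord("reminder", False, 0.0, 0.5, 0.25),
    LoggedRecord("reminder", False, 0.0, 0.5, 0.25),
    LoggedRecord("new_release", True, 1.0, 0.5, 0.5),
    LoggedRecord("new_release", False, 1.0, 0.5, 0.5),
]
offsets = fit_group_offsets(logs)
print(offsets)  # {'reminder': 0.25, 'new_release': -0.5}
```

Extending the calibration action space to another covariate, such as delivery channel, would amount to grouping on a composite key like `(intent, channel)` in this sketch.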

Multi-layer Bandit Algorithm for Personalizing Ranking and Pacing of Netflix Messages

Ishan Gupta, Kevin Mercurio, Sergi Perez, Matthew Wood (Netflix)

The Messaging Personalization System (MPS) is the Netflix algorithm responsible for providing a personalized messaging experience for each user by choosing the best messages and delivery channel (e.g., push or email) as well as the timing and frequency of messages. Personalizing message frequency is critical for a high-quality user experience: sending too many messages may lead to fatigue or opt-outs, while sending too few may reduce user engagement. We present the design of a two-layer algorithm that simultaneously optimizes for both short- and long-term user engagement. The first layer, a slow policy, runs on a weekly cadence to determine the optimal message pacing for each user across delivery channels, and it targets the causal effect of messaging on long-term engagement signals (e.g., aggregated weekly). This personalized message pacing is then passed as an input to the second layer, a ranking policy triggered on a more frequent cadence (e.g., daily). This fast ranking policy targets short-term engagement signals and makes the final ranking and targeting decisions for a user while adhering to the constraints established by the slow policy.
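A minimal sketch of the slow/fast split, assuming the slow layer's output is a weekly message cap per channel and the fast layer ranks each day's candidates by a short-term score. The long-term uplift estimates are taken as given, the bandit's exploration and reward estimation are omitted, and all names and numbers are hypothetical; the point is only how the slow layer's pacing decision constrains the fast layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def slow_policy(long_term_uplift: Dict[str, Dict[int, float]]) -> Dict[str, int]:
    """Weekly layer: per channel, pick the pacing level (messages per week)
    with the highest estimated long-term uplift. Exploration is omitted."""
    return {channel: max(levels, key=levels.get) for channel, levels in long_term_uplift.items()}


@dataclass
class FastPolicy:
    """Daily layer: ranks candidates on short-term score within the weekly caps."""
    weekly_caps: Dict[str, int]
    sent_this_week: Dict[str, int] = field(default_factory=dict)

    def select(self, candidates: List[Tuple[str, str, float]], max_per_day: int = 1) -> List[str]:
        # candidates: (message_id, channel, short_term_score)
        chosen: List[str] = []
        for message_id, channel, _score in sorted(candidates, key=lambda c: c[2], reverse=True):
            if len(chosen) == max_per_day:
                break
            # Respect the pacing constraint set by the slow layer.
            if self.sent_this_week.get(channel, 0) < self.weekly_caps.get(channel, 0):
                chosen.append(message_id)
                self.sent_this_week[channel] = self.sent_this_week.get(channel, 0) + 1
        return chosen


caps = slow_policy({"push": {0: 0.0, 2: 0.3, 5: 0.1}, "email": {0: 0.05, 1: 0.02}})
fast = FastPolicy(caps)
daily_candidates = [
    [("new_season", "email", 0.9), ("top_pick", "push", 0.6)],  # day 1
    [("reminder", "push", 0.7), ("trailer", "push", 0.5)],       # day 2
    [("live_event", "push", 0.8)],                               # day 3
]
for day, candidates in enumerate(daily_candidates, start=1):
    print(day, fast.select(candidates))
# 1 ['top_pick']   (email cap is 0, so the higher-scored email is skipped)
# 2 ['reminder']
# 3 []             (the weekly push cap of 2 is used up)
```

Keeping the cap in the fast layer's state means the daily ranker never needs to reason about long-term effects; it only needs to know how much of the weekly budget remains.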

Towards Addressing Title Cold-Starting in LLM GenRec

Yongchang Hao, Rein Houthooft, Jiangwei Pan, Justin Basilico (Netflix)

Large Language Models (LLMs), pre-trained on extensive internet discussions, present a new paradigm for improving title (i.e., show and movie) recommendations. Generative recommenders (GenRec) built on LLMs have already shown promising results compared to previous baselines. However, the frequent release of new content introduces the challenge of title cold-starting: new titles that are absent from the interaction sequence datasets are never learned by the model and therefore cannot be recommended. To alleviate this issue, this work explores methods for incorporating new titles into LLM-based recommendation systems without compromising their inherent strengths, aiming to preserve the advantages LLMs offer in predicting and recommending newly released titles.

Adaptive Multi Turn Intent Classification and Discovery

Sukanya Moorthy, Shreyas Vidyarthi (Intuit Credit Karma)

This work introduces a novel, multi-stage framework for robust intent detection in production dialogue-based search assistants and query rerouting systems. It addresses challenges such as ambiguity in multi-turn interactions, malicious intent detection, continuous adaptation to new intents, and initial data scarcity. The framework uses a staged approach: starting with few-shot learning via Large Language Models (LLMs) and a high-level ontology, it transitions to a supervised classifier with a sliding window for improved contextual awareness as data grows. Unsupervised clustering and weak supervision enable continuous learning, granular intent detection, and identification of harmful requests. A final hybrid architecture integrates the classifier, clustering, and LLMs to handle edge cases and novel intents, ensuring rapid deployment and long-term scalability.
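The abstract names the ingredients but not the wiring. As a minimal sketch, assume the supervised classifier sees the last few turns joined into one context string (the sliding window), and that the hybrid stage routes low-confidence predictions to an LLM. Both the classifier and the LLM are stubbed with hypothetical toy callables, the window size and confidence threshold are illustrative, and the clustering and weak-supervision stages are not shown.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Classifier = Callable[[str], Tuple[str, float]]  # context -> (intent, confidence)


class MultiTurnIntentRouter:
    """Classify each turn using a sliding window of recent turns; defer
    low-confidence cases to a fallback (e.g. an LLM)."""

    def __init__(self, classifier: Classifier, fallback: Callable[[str], str],
                 window: int = 3, threshold: float = 0.7) -> None:
        self.classifier = classifier
        self.fallback = fallback
        self.threshold = threshold
        self.turns: Deque[str] = deque(maxlen=window)  # oldest turns drop off automatically

    def route(self, utterance: str) -> str:
        self.turns.append(utterance)
        context = " | ".join(self.turns)
        intent, confidence = self.classifier(context)
        if confidence >= self.threshold:
            return intent
        return self.fallback(context)


# Toy stand-ins for the supervised classifier and the LLM (hypothetical).
def keyword_classifier(context: str) -> Tuple[str, float]:
    return ("credit_card_search", 0.9) if "card" in context else ("unknown", 0.3)


def llm_fallback(context: str) -> str:
    return "needs_llm_review"


with_context = MultiTurnIntentRouter(keyword_classifier, llm_fallback, window=3)
print(with_context.route("I want a new card"))   # credit_card_search
print(with_context.route("with no annual fee"))  # credit_card_search (earlier turn still in window)

single_turn = MultiTurnIntentRouter(keyword_classifier, llm_fallback, window=1)
single_turn.route("I want a new card")
print(single_turn.route("with no annual fee"))   # needs_llm_review
```

The follow-up turn "with no annual fee" is ambiguous on its own; only the windowed context lets the classifier resolve it without falling back to the LLM.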

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation

Mingfu Liang, Xi Liu, Huayu Li, Jiyan Yang, Nancy Yu, et al. (Meta)

Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling up recommendation models and advancing their design can bring significant performance improvements. However, as model scale grows, these studies drift further from industrial practice because they often neglect two fundamental challenges of industrial-scale applications. First, the served model has restricted training and inference budgets; exceeding them may increase latency and impair user experience. Second, large volumes of data arrive in a streaming mode, with data distributions shifting dynamically as new users and ads join and existing users and ads leave the system. We propose the External Large Foundation Model (ExFM) framework to address these overlooked challenges. Specifically, we develop external distillation and a data augmentation system (DAS) to control the computational cost of training and inference while maintaining high performance. We design the teacher as a foundation model (FM) that can serve multiple students as vertical models (VMs), amortizing its building cost. We propose an Auxiliary Head and a Student Adapter to mitigate the data distribution gap between the FM and the VMs caused by streaming data. Comprehensive experiments on internal industrial-scale applications and public datasets demonstrate significant performance gains from ExFM.
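One reading of the Auxiliary Head idea is that the student receives the teacher's predictions through a separate head, so the head used for serving is supervised by true labels only. A minimal sketch of such a loss, assuming binary labels, one logit from each of two student heads, and a teacher probability already joined to each example (the role the abstract gives the DAS); the `distill_weight` parameter is hypothetical, and neither the paper's exact losses nor the Student Adapter are reproduced here.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for a logit and a (soft) target."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def student_loss(serving_logit: float, aux_logit: float, label: float,
                 teacher_prob: float, distill_weight: float = 1.0) -> float:
    # Serving head: fit the ground-truth label only.
    label_loss = bce_with_logits(serving_logit, label)
    # Auxiliary head: fit the teacher's prediction. In a full model both heads
    # would share a backbone, so teacher knowledge reaches the serving head
    # only indirectly through the shared layers.
    distill_loss = bce_with_logits(aux_logit, teacher_prob)
    return label_loss + distill_weight * distill_loss


print(student_loss(0.0, 0.0, label=1.0, teacher_prob=0.5))  # 2 * ln(2), about 1.386
```

Setting `distill_weight` to zero recovers ordinary training on labels alone, which makes the teacher's contribution easy to ablate.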

Generative Search: Building an Interactive Discovery Experience by Leveraging LLMs

Aditya Singha, Chris Samarinas, Ding Tong, Ehsan Golami, Matteo Rinaldi, Shahrzad Naseri, Spencer L'Heureux, Sudarshan Lamkhede, Vito Ostuni, Yesu Feng, Zhe Zhang (Netflix)

We discuss how we built a more interactive discovery experience for Netflix that uses LLMs to understand complex user queries and provide useful recommendations. Attendees will also be able to try out the beta.
