I was an invited speaker at the multi-armed bandit workshop organized by Gui Liberali at the Erasmus Center for Optimization of Digital Experiments in Rotterdam:

 

https://www.erim.eur.nl/e-code-erasmus-centre-for-optimization-of-digital-experiments/workshop-on-multi-armed-bandits-and-learning-algorithms/programme/

 

There was an amazing line-up of speakers, featuring decision-making researchers from very diverse communities.

 

I presented my work on Delayed Conversions. My poster is here and the slides are similar to those presented at UW last month :)

Updated: May 7, 2018

On Tuesday, May 1st, I'll be talking about stochastic bandits with delayed feedback at the University of Washington. It's a real honor and pleasure to speak there and I truly thank Joseph Salmon and Zaid Harchaoui for their invitation.

 

This is a joint work with Olivier Cappé and Vianney Perchet that was accepted at UAI 2017.

 

Abstract:

Almost all real-world implementations of bandit algorithms actually deal with delayed feedback: after a customer is presented an ad, their click (if any) is not sent within milliseconds but rather minutes, hours, or even days, depending on the application. Moreover, this problem is coupled with an observation ambiguity: while the system is waiting for a click feedback, the customer might already have decided not to click at all, and the learner will never receive the awaited reward.

In this talk we introduce a delayed feedback model for stochastic bandits. We first consider the situation where the learner has infinite patience and show that in that case the problem is actually no harder than the non-delayed one and can be solved similarly. However, this comes at a huge memory cost of O(T), where T is the length of the experiment. Thus, we introduce a short-memory setting that mitigates this issue at the price of an additional censoring effect on the feedback, which we carefully handle. We present an asymptotically optimal algorithm together with a regret bound and empirically demonstrate its behavior on simulated data.
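To illustrate the censoring effect described above, here is a minimal simulation sketch, not the paper's algorithm: conversions for a single arm happen with probability theta but arrive after a random delay, and the learner only credits those arriving within a fixed memory window. The exponential delay model, the function name, and all parameters are illustrative assumptions.

```python
import math
import random

def simulate_windowed_conversions(theta, delay_rate, horizon, window, seed=0):
    """Fraction of plays credited with a conversion under window censoring.

    Each round the arm is played; a conversion occurs with probability
    `theta` and arrives after an exponential delay with rate `delay_rate`
    (a hypothetical delay model). Only conversions arriving within
    `window` rounds of the play are observed (censoring).
    """
    rng = random.Random(seed)
    observed = 0
    for _ in range(horizon):
        if rng.random() < theta:  # a conversion will eventually happen
            delay = rng.expovariate(delay_rate)
            if delay <= window:   # ...but it is only seen if it is fast enough
                observed += 1
    return observed / horizon

# The naive windowed estimate targets theta * P(delay <= window),
# so it is biased downward; dividing by the (known) tail probability
# recovers theta, mirroring the careful handling of censoring.
censored = simulate_windowed_conversions(0.5, 1.0, 200_000, 2.0, seed=42)
debiased = censored / (1.0 - math.exp(-1.0 * 2.0))
```

With these numbers the censored estimate concentrates around 0.5 * (1 - e^{-2}) ≈ 0.43, and the debiased one around the true theta = 0.5.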

 

Slides here

On Friday, October 20th, I defended my PhD on Statistical Models of User Behavior for Sequential Learning under Bandit Feedback.

Many thanks to everyone who came or joined the party afterwards!