BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//IFDS - ECPv6.0.1.1//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-ORIGINAL-URL:https://ifds.info
X-WR-CALDESC:Events for IFDS
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20240310T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20241103T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240216T133000
DTEND;TZID=America/Los_Angeles:20240216T143000
DTSTAMP:20260514T210524
CREATED:20240318T212625Z
LAST-MODIFIED:20240318T212625Z
UID:2884-1708090200-1708093800@ifds.info
SUMMARY:Offline Multi-task Transfer RL with Representational Penalization
DESCRIPTION:Speaker Bio: Avinandan is a second-year PhD student\, advised by Maryam Fazel and Lillian Ratliff. His interests are in sequential learning and game theory.\n\nAbstract: We study the problem of representational transfer in offline Reinforcement Learning (RL)\, where a learner has access to episodic data from a number of source tasks collected a priori\, and aims to learn a shared representation to be used in finding a good policy for a target task. Unlike in online RL\, where the agent interacts with the environment while learning a policy\, in the offline setting there cannot be such interactions in either the source tasks or the target task\; thus multi-task offline RL can suffer from incomplete coverage. We propose an algorithm to compute pointwise uncertainty measures for the learnt representation\, and establish a data-dependent upper bound for the suboptimality of the learnt policy for the target task. Our algorithm leverages the collective exploration done by source tasks to mitigate poor coverage at some points by a few tasks\, thus overcoming the limitation of needing uniformly good coverage for a meaningful transfer by existing offline algorithms. We complement our theoretical results with empirical evaluation on a rich-observation MDP which requires many samples for complete coverage. Our findings illustrate the benefits of penalizing and quantifying the uncertainty in the learnt representation.
URL:https://ifds.info/event/offline-multi-task-transfer-rl-with-representational-penalization/
LOCATION:CSE (Allen) 403
CATEGORIES:MLOpt@UWash
END:VEVENT
END:VCALENDAR