Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion

Lecture No.: jz-yjsb-2023-y016

Lecture title: Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion

Speaker: Prof. Li Xia (夏俐), Sun Yat-sen University

Time: December 1, 2023, 15:30

Venue: Conference room, 4th floor, Science and Education Building, East Area, Fucheng Road Campus

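Audience: Graduate and undergraduate students of the Department of Information Management, School of Computer and Artificial Intelligence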

Host: School of Computer and Artificial Intelligence

About the speaker:

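Li Xia is a professor at the School of Business, Sun Yat-sen University. Xia received bachelor's and Ph.D. degrees from the Department of Automation, Tsinghua University, in 2002 and 2007, respectively, and was a joint-training student at the Hong Kong University of Science and Technology during the doctoral program. After graduation, Xia conducted research at IBM Research China and at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. From 2011 to 2019, Xia taught at the Department of Automation, Tsinghua University, serving as lecturer and then associate professor (doctoral supervisor), before moving to Sun Yat-sen University in 2019. Xia's main research interests are theoretical studies of Markov decision processes, reinforcement learning, queueing theory, and game theory, as well as their applications in energy, finance, and other fields. Xia has published more than 100 papers, holds 3 US patents and 8 Chinese patents, and has led 4 National Natural Science Foundation of China projects, 3 sub-projects of the National Key R&D Program, and several collaborative R&D projects with Huawei and other companies. Xia serves as an associate editor (AE) of leading international SCI journals, including IEEE Transactions on Automation Science and Engineering and Discrete Event Dynamic Systems. Xia's academic honors include the Second Prize of the Natural Science Award for Higher Education Institutions from the Ministry of Education, received in 2021 and 2014.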

Abstract:

CVaR (conditional value at risk) is an important risk measure in financial engineering. Traditional studies on optimizing CVaR metrics usually address single-stage problems. When extended to multi-stage scenarios, the CVaR risk function is not additive across stages, so it does not fit the standard MDP (Markov decision process) model and the principle of dynamic programming fails. In this talk, we study the MDP optimization problem under the long-run CVaR criterion using a tool called sensitivity-based optimization. By introducing a pseudo CVaR metric, we convert the original problem into a bilevel MDP problem: the inner level is a standard MDP that optimizes the pseudo CVaR, and the outer level is an optimization problem over a single auxiliary variable. We derive a CVaR difference formula that quantifies the difference between the long-run CVaR values under any two randomized policies. With this difference formula, we prove the optimality of deterministic policies. We also obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for locally optimal policies but only a necessary condition for globally optimal policies. We further develop a policy-iteration-type algorithm to optimize CVaR efficiently, and we prove that it converges to local optima in the mixed policy space. Finally, we conduct a numerical experiment on portfolio management to demonstrate the main results. Our work may shed light on dynamically optimizing CVaR from a sensitivity viewpoint.
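
The abstract does not define the pseudo CVaR metric. As an illustration only, the sketch below assumes the standard Rockafellar-Uryasev form, in which the CVaR of a cost X at level alpha is the minimum over a single auxiliary variable y of y + E[(X - y)^+] / (1 - alpha). It checks this identity on a small sample of equally likely costs, which mirrors the outer level of a bilevel formulation; the inner MDP is not modeled here. The function names and the sample data are hypothetical and are not taken from the talk.

```python
def pseudo_cvar(costs, y, alpha):
    """Assumed pseudo CVaR: y + E[(X - y)^+] / (1 - alpha) for equally likely costs."""
    n = len(costs)
    mean_excess = sum(max(c - y, 0.0) for c in costs) / n
    return y + mean_excess / (1.0 - alpha)


def cvar_by_outer_search(costs, alpha):
    """Minimize the pseudo CVaR over the single auxiliary variable y.

    For an empirical distribution the objective is convex and piecewise
    linear in y, so a minimizer can be found among the sample points.
    """
    return min(pseudo_cvar(costs, y, alpha) for y in costs)


def cvar_tail_average(costs, alpha):
    """Direct definition: mean of the worst (1 - alpha) fraction of costs.

    Assumes (1 - alpha) * len(costs) is a positive integer.
    """
    n = len(costs)
    k = round((1.0 - alpha) * n)
    worst = sorted(costs)[n - k:]
    return sum(worst) / k


# Hypothetical sample: ten equally likely costs, confidence level 0.8.
costs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
alpha = 0.8
print(cvar_by_outer_search(costs, alpha))  # agrees with the tail average up to rounding
print(cvar_tail_average(costs, alpha))     # mean of the two worst costs, 9 and 10
```

In this sample the two computations agree, which is why a CVaR objective can be traded for a search over one scalar plus an expectation-type objective that is additive per stage.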