Chapter 93: RL for Algorithmic Trading
Learning objectives

- Simulate a simple stock market with one asset (e.g. a price that follows a random walk or a simple mean-reverting process).
- Design an MDP: state = (price, position, cash, or derived features); actions = buy / sell / hold (possibly with a size component); reward = profit (or a risk-adjusted return).
- Train an agent (e.g. DQN or PPO) on this MDP and evaluate its Sharpe ratio (mean return divided by the standard deviation of returns, over episodes or over time).
- Discuss risk management: position limits, drawdown, and transaction costs, and how the reward and state design affect the agent's behavior.
- Relate the exercise to trading and finance anchor scenarios (state = market + portfolio, action = trade, reward = profit or Sharpe ratio).

Concept and real-world RL ...
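The objectives mention simulating a single asset whose price follows a mean-reverting process. A minimal sketch of such a simulator is below; the function name and all parameter values (`mu`, `theta`, `sigma`) are illustrative choices, not prescribed by the chapter:

```python
import numpy as np

def simulate_prices(n_steps=252, p0=100.0, mu=100.0, theta=0.05, sigma=1.0, seed=0):
    """Simulate a mean-reverting (Ornstein-Uhlenbeck style) price path.

    Each step pulls the price back toward the long-run mean `mu`
    at rate `theta`, plus Gaussian noise with scale `sigma`.
    Setting theta=0 recovers a plain random walk.
    """
    rng = np.random.default_rng(seed)
    prices = np.empty(n_steps)
    p = p0
    for t in range(n_steps):
        p = p + theta * (mu - p) + sigma * rng.standard_normal()
        prices[t] = p
    return prices
```

With `theta > 0` the stationary standard deviation is roughly `sigma / sqrt(2 * theta)`, so the path oscillates around `mu`; this gives a learnable structure (buy low, sell high) that a purely random walk would not.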
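The MDP design described above (state = price, position, cash; actions = buy / sell / hold; reward = profit net of costs) can be sketched as a small environment class. This is a hypothetical gym-style interface, not a real library API; the class name, cost model, and position limit are assumptions for illustration:

```python
import numpy as np

class SingleAssetTradingEnv:
    """Minimal single-asset trading MDP (illustrative sketch).

    State:  (current price, current position, cash)
    Action: 0 = hold, 1 = buy one unit, 2 = sell one unit
    Reward: change in mark-to-market portfolio value, net of a
            fixed per-trade transaction cost.
    """

    def __init__(self, prices, cost_per_trade=0.01, max_position=5):
        self.prices = np.asarray(prices, dtype=float)
        self.cost = cost_per_trade
        self.max_position = max_position  # hard position limit (risk management)
        self.reset()

    def reset(self):
        self.t = 0
        self.position = 0
        self.cash = 0.0
        return self._state()

    def _state(self):
        return np.array([self.prices[self.t], self.position, self.cash])

    def _value(self):
        # Mark-to-market portfolio value at the current price.
        return self.cash + self.position * self.prices[self.t]

    def step(self, action):
        value_before = self._value()
        price = self.prices[self.t]
        if action == 1 and self.position < self.max_position:      # buy one unit
            self.position += 1
            self.cash -= price + self.cost
        elif action == 2 and self.position > -self.max_position:   # sell one unit
            self.position -= 1
            self.cash += price - self.cost
        self.t += 1
        done = self.t == len(self.prices) - 1
        # Reward is realized after the price moves, so it includes both
        # the trade's cost and the next step's mark-to-market P&L.
        reward = self._value() - value_before
        return self._state(), reward, done
```

Because the reward is the step-to-step change in portfolio value, summing rewards over an episode recovers total profit; swapping in a risk-adjusted reward (e.g. penalizing drawdown or variance) changes the behavior the agent learns, which is one of the discussion points above.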
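For evaluation, the Sharpe ratio named in the objectives (mean return over its standard deviation) is a one-liner; the annualization factor and `eps` guard are common conventions added here as assumptions:

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252, eps=1e-12):
    """Annualized Sharpe ratio of a sequence of per-period returns.

    Scales mean/std by sqrt(periods_per_year) to annualize daily returns;
    `eps` avoids division by zero when volatility is zero.
    """
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / (r.std() + eps)
```

Evaluating the trained agent on held-out simulated price paths and reporting the Sharpe ratio, rather than raw profit, penalizes strategies that earn their returns through high volatility.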