new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Jun 1

Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets

As LLM agents advance, they are increasingly mediating economic decisions, ranging from product discovery to transactions, on behalf of users. Such applications promise benefits but also raise many questions about agent accountability and value for users. Addressing these questions requires understanding how agents behave in realistic market conditions. However, previous research has largely evaluated agents in constrained settings, such as single-task marketplaces (e.g., negotiation) or structured two-agent interactions. Real-world markets are fundamentally different: they require agents to handle diverse economic activities and coordinate within large, dynamic ecosystems where multiple agents with opaque behaviors may engage in open-ended dialogues. To bridge this gap, we investigate two-sided agentic marketplaces where Assistant agents represent consumers and Service agents represent competing businesses. To study these interactions safely, we develop Magentic-Marketplace-- a simulated environment where Assistants and Services can operate. This environment enables us to study key market dynamics: the utility agents achieve, behavioral biases, vulnerability to manipulation, and how search mechanisms shape market outcomes. Our experiments show that frontier models can approach optimal welfare-- but only under ideal search conditions. Performance degrades sharply with scale, and all models exhibit severe first-proposal bias, creating 10-30x advantages for response speed over quality. These findings reveal how behaviors emerge across market conditions, informing the design of fair and efficient agentic marketplaces.

MicrosoftResearch Microsoft Research
·
Oct 27, 2025 2

Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces

The deployment of Large Language Models (LLMs) as autonomous economic agents introduces systemic risks that extend beyond individual capability failures. As agents transition to directly interacting with marketplaces, their collective behavior can amplify volatility and mask deception at scale. We introduce the Agent Bazaar, a multi-agent simulation framework for evaluating Economic Alignment, the capacity of agentic systems to preserve market stability and integrity. We identify two failure modes: (1) Algorithmic Instability in a B2C market ("The Crash"), where firms amplify price volatility until the market collapses, and (2) Sybil Deception in a C2C market ("The Lemon Market"), where a single deceptive agent controlling multiple coordinated seller identities floods the market with fraudulent listings, eroding trust and consumer welfare. We evaluate frontier and open-weight models across both scenarios and find that models largely fail to self-regulate, with failure severity varying by model rather than by size. We propose economically aligned harnesses, Stabilizing Firms and Skeptical Guardians, that improve outcomes but remain fragile under harder market conditions. To close this gap, we train agents with REINFORCE++ using an adaptive curriculum, producing a 9B model that outperforms all evaluated frontier and open-weight models. We propose the Economic Alignment Score (EAS), a 4-component scalar metric aggregating stability, integrity, welfare, and profitability, enabling direct cross-model comparison. Our results show that economic alignment is orthogonal to general capability and can be directly trained with targeted RL.

The Subtle Interplay between Square-root Impact, Order Imbalance & Volatility: A Unifying Framework

In this work, we aim to reconcile several apparently contradictory observations in market microstructure: is the famous "square-root law" of metaorder impact, which decays with time, compatible with the random-walk nature of prices and the linear impact of order imbalances? Can one entirely explain the volatility of prices as resulting from the flow of uninformed metaorders that mechanically impact them? We introduce a new theoretical framework to describe metaorders with different signs, sizes and durations, which all impact prices as a square-root of volume but with a subsequent time decay. We show that, as in the original propagator model, price diffusion is ensured by the long memory of cross-correlations between metaorders. In order to account for the effect of strongly fluctuating volumes q of individual trades, we further introduce two q-dependent exponents, which allow us to describe how the moments of generalized volume imbalance and the correlation between price changes and generalized order flow imbalance scale with T. We predict in particular that the corresponding power-laws depend in a non-monotonic fashion on a parameter a, which allows one to put the same weight on all child orders or to overweight large ones, a behaviour that is clearly borne out by empirical data. We also predict that the correlation between price changes and volume imbalances should display a maximum as a function of a, which again matches observations. Such noteworthy agreement between theory and data suggests that our framework correctly captures the basic mechanism at the heart of price formation, namely the average impact of metaorders. We argue that our results support the "Order-Driven" theory of excess volatility, and are at odds with the idea that a "Fundamental" component accounts for a large share of the volatility of financial markets.

  • 2 authors
·
Mar 3

What Is Your AI Agent Buying? Evaluation, Implications and Emerging Questions for Agentic E-Commerce

Online marketplaces will be transformed by autonomous AI agents acting on behalf of consumers. Rather than humans browsing and clicking, vision-language-model (VLM) agents can parse webpages, evaluate products, and transact. This raises a fundamental question: what do AI agents buy, and why? We develop ACES, a sandbox environment that pairs a platform-agnostic VLM agent with a fully programmable mock marketplace to study this question. We first conduct basic rationality checks in the context of simple tasks, and then, by randomizing product positions, prices, ratings, reviews, sponsored tags, and platform endorsements, we obtain causal estimates of how frontier VLMs actually shop. Models show strong but heterogeneous position effects: all favor the top row, yet different models prefer different columns, undermining the assumption of a universal "top" rank. They penalize sponsored tags and reward endorsements. Sensitivities to price, ratings, and reviews are directionally human-like but vary sharply in magnitude across models. Motivated by scenarios where sellers use AI agents to optimize product listings, we show that a seller-side agent that makes minor tweaks to product descriptions, targeting AI buyer preferences, can deliver substantial market-share gains if AI-mediated shopping dominates. We also find that modal product choices can differ across models and, in some cases, demand may concentrate on a few select products, raising competition questions. Together, our results illuminate how AI agents may behave in e-commerce settings and surface concrete seller strategy, platform design, and regulatory questions in an AI-mediated ecosystem.

  • 5 authors
·
Aug 4, 2025 2

Reinforcement Learning Framework for Quantitative Trading

The inherent volatility and dynamic fluctuations within the financial stock market underscore the necessity for investors to employ a comprehensive and reliable approach that integrates risk management strategies, market trends, and the movement trends of individual securities. By evaluating specific data, investors can make more informed decisions. However, the current body of literature lacks substantial evidence supporting the practical efficacy of reinforcement learning (RL) agents, as many models have only demonstrated success in back testing using historical data. This highlights the urgent need for a more advanced methodology capable of addressing these challenges. There is a significant disconnect in the effective utilization of financial indicators to better understand the potential market trends of individual securities. The disclosure of successful trading strategies is often restricted within financial markets, resulting in a scarcity of widely documented and published strategies leveraging RL. Furthermore, current research frequently overlooks the identification of financial indicators correlated with various market trends and their potential advantages. This research endeavors to address these complexities by enhancing the ability of RL agents to effectively differentiate between positive and negative buy/sell actions using financial indicators. While we do not address all concerns, this paper provides deeper insights and commentary on the utilization of technical indicators and their benefits within reinforcement learning. This work establishes a foundational framework for further exploration and investigation of more complex scenarios.

  • 2 authors
·
Nov 12, 2024

Universal features of price formation in financial markets: perspectives from Deep Learning

Using a large-scale Deep Learning approach applied to a high-frequency database containing billions of electronic market quotes and transactions for US equities, we uncover nonparametric evidence for the existence of a universal and stationary price formation mechanism relating the dynamics of supply and demand for a stock, as revealed through the order book, to subsequent variations in its market price. We assess the model by testing its out-of-sample predictions for the direction of price moves given the history of price and order flow, across a wide range of stocks and time periods. The universal price formation model is shown to exhibit a remarkably stable out-of-sample prediction accuracy across time, for a wide range of stocks from different sectors. Interestingly, these results also hold for stocks which are not part of the training sample, showing that the relations captured by the model are universal and not asset-specific. The universal model --- trained on data from all stocks --- outperforms, in terms of out-of-sample prediction accuracy, asset-specific linear and nonlinear models trained on time series of any given stock, showing that the universal nature of price formation weighs in favour of pooling together financial data from various stocks, rather than designing asset- or sector-specific models as commonly done. Standard data normalizations based on volatility, price level or average spread, or partitioning the training data into sectors or categories such as large/small tick stocks, do not improve training results. On the other hand, inclusion of price and order flow history over many past observations is shown to improve forecasting performance, showing evidence of path-dependence in price dynamics.

  • 2 authors
·
Mar 19, 2018

Dynamical Linear Bandits

In many real-world sequential decision-making problems, an action does not immediately reflect on the feedback and spreads its effects over a long time frame. For instance, in online advertising, investing in a platform produces an instantaneous increase of awareness, but the actual reward, i.e., a conversion, might occur far in the future. Furthermore, whether a conversion takes place depends on: how fast the awareness grows, its vanishing effects, and the synergy or interference with other advertising platforms. Previous work has investigated the Multi-Armed Bandit framework with the possibility of delayed and aggregated feedback, without a particular structure on how an action propagates in the future, disregarding possible dynamical effects. In this paper, we introduce a novel setting, the Dynamical Linear Bandits (DLB), an extension of the linear bandits characterized by a hidden state. When an action is performed, the learner observes a noisy reward whose mean is a linear function of the hidden state and of the action. Then, the hidden state evolves according to linear dynamics, affected by the performed action too. We start by introducing the setting, discussing the notion of optimal policy, and deriving an expected regret lower bound. Then, we provide an optimistic regret minimization algorithm, Dynamical Linear Upper Confidence Bound (DynLin-UCB), that suffers an expected regret of order mathcal{O} Big( d sqrt{T}{(1-rho)^{3/2}} Big), where rho is a measure of the stability of the system, and d is the dimension of the action vector. Finally, we conduct a numerical validation on a synthetic environment and on real-world data to show the effectiveness of DynLin-UCB in comparison with several baselines.

  • 3 authors
·
Nov 16, 2022

Financial Risk Assessment via Long-term Payment Behavior Sequence Folding

Online inclusive financial services encounter significant financial risks due to their expansive user base and low default costs. By real-world practice, we reveal that utilizing longer-term user payment behaviors can enhance models' ability to forecast financial risks. However, learning long behavior sequences is non-trivial for deep sequential models. Additionally, the diverse fields of payment behaviors carry rich information, requiring thorough exploitation. These factors collectively complicate the task of long-term user behavior modeling. To tackle these challenges, we propose a Long-term Payment Behavior Sequence Folding method, referred to as LBSF. In LBSF, payment behavior sequences are folded based on merchants, using the merchant field as an intrinsic grouping criterion, which enables informative parallelism without reliance on external knowledge. Meanwhile, we maximize the utility of payment details through a multi-field behavior encoding mechanism. Subsequently, behavior aggregation at the merchant level followed by relational learning across merchants facilitates comprehensive user financial representation. We evaluate LBSF on the financial risk assessment task using a large-scale real-world dataset. The results demonstrate that folding long behavior sequences based on internal behavioral cues effectively models long-term patterns and changes, thereby generating more accurate user financial profiles for practical applications.

  • 7 authors
·
Nov 22, 2024

The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems

Online allocation problems with resource constraints are central problems in revenue management and online advertising. In these problems, requests arrive sequentially during a finite horizon and, for each request, a decision maker needs to choose an action that consumes a certain amount of resources and generates reward. The objective is to maximize cumulative rewards subject to a constraint on the total consumption of resources. In this paper, we consider a data-driven setting in which the reward and resource consumption of each request are generated using an input model that is unknown to the decision maker. We design a general class of algorithms that attain good performance in various input models without knowing which type of input they are facing. In particular, our algorithms are asymptotically optimal under independent and identically distributed inputs as well as various non-stationary stochastic input models, and they attain an asymptotically optimal fixed competitive ratio when the input is adversarial. Our algorithms operate in the Lagrangian dual space: they maintain a dual multiplier for each resource that is updated using online mirror descent. By choosing the reference function accordingly, we recover the dual sub-gradient descent and dual multiplicative weights update algorithm. The resulting algorithms are simple, fast, and do not require convexity in the revenue function, consumption function and action space, in contrast to existing methods for online allocation problems. We discuss applications to network revenue management, online bidding in repeated auctions with budget constraints, online proportional matching with high entropy, and personalized assortment optimization with limited inventory.

  • 3 authors
·
Nov 4, 2021

Harnessing Deep Q-Learning for Enhanced Statistical Arbitrage in High-Frequency Trading: A Comprehensive Exploration

The realm of High-Frequency Trading (HFT) is characterized by rapid decision-making processes that capitalize on fleeting market inefficiencies. As the financial markets become increasingly competitive, there is a pressing need for innovative strategies that can adapt and evolve with changing market dynamics. Enter Reinforcement Learning (RL), a branch of machine learning where agents learn by interacting with their environment, making it an intriguing candidate for HFT applications. This paper dives deep into the integration of RL in statistical arbitrage strategies tailored for HFT scenarios. By leveraging the adaptive learning capabilities of RL, we explore its potential to unearth patterns and devise trading strategies that traditional methods might overlook. We delve into the intricate exploration-exploitation trade-offs inherent in RL and how they manifest in the volatile world of HFT. Furthermore, we confront the challenges of applying RL in non-stationary environments, typical of financial markets, and investigate methodologies to mitigate associated risks. Through extensive simulations and backtests, our research reveals that RL not only enhances the adaptability of trading strategies but also shows promise in improving profitability metrics and risk-adjusted returns. This paper, therefore, positions RL as a pivotal tool for the next generation of HFT-based statistical arbitrage, offering insights for both researchers and practitioners in the field.

  • 1 authors
·
Sep 13, 2023

Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets

Polymarket is a prediction market platform where users can speculate on future events by trading shares tied to specific outcomes, known as conditions. Each market is associated with a set of one or more such conditions. To ensure proper market resolution, the condition set must be exhaustive -- collectively accounting for all possible outcomes -- and mutually exclusive -- only one condition may resolve as true. Thus, the collective prices of all related outcomes should be \1, representing a combined probability of 1 of any outcome. Despite this design, Polymarket exhibits cases where dependent assets are mispriced, allowing for purchasing (or selling) a certain outcome for less than (or more than) 1, guaranteeing profit. This phenomenon, known as arbitrage, could enable sophisticated participants to exploit such inconsistencies. In this paper, we conduct an empirical arbitrage analysis on Polymarket data to answer three key questions: (Q1) What conditions give rise to arbitrage (Q2) Does arbitrage actually occur on Polymarket and (Q3) Has anyone exploited these opportunities. A major challenge in analyzing arbitrage between related markets lies in the scalability of comparisons across a large number of markets and conditions, with a naive analysis requiring O(2^{n+m}) comparisons. To overcome this, we employ a heuristic-driven reduction strategy based on timeliness, topical similarity, and combinatorial relationships, further validated by expert input. Our study reveals two distinct forms of arbitrage on Polymarket: Market Rebalancing Arbitrage, which occurs within a single market or condition, and Combinatorial Arbitrage, which spans across multiple markets. We use on-chain historical order book data to analyze when these types of arbitrage opportunities have existed, and when they have been executed by users. We find a realized estimate of 40 million USD of profit extracted.

  • 4 authors
·
Aug 4, 2025

TRADES: Generating Realistic Market Simulations with Diffusion Models

Financial markets are complex systems characterized by high statistical noise, nonlinearity, and constant evolution. Thus, modeling them is extremely hard. We address the task of generating realistic and responsive Limit Order Book (LOB) market simulations, which are fundamental for calibrating and testing trading strategies, performing market impact experiments, and generating synthetic market data. Previous works lack realism, usefulness, and responsiveness of the generated simulations. To bridge this gap, we propose a novel TRAnsformer-based Denoising Diffusion Probabilistic Engine for LOB Simulations (TRADES). TRADES generates realistic order flows conditioned on the state of the market, leveraging a transformer-based architecture that captures the temporal and spatial characteristics of high-frequency market data. There is a notable absence of quantitative metrics for evaluating generative market simulation models in the literature. To tackle this problem, we adapt the predictive score, a metric measured as an MAE, by training a stock price predictive model on synthetic data and testing it on real data. We compare TRADES with previous works on two stocks, reporting an x3.27 and x3.47 improvement over SoTA according to the predictive score, demonstrating that we generate useful synthetic market data for financial downstream tasks. We assess TRADES's market simulation realism and responsiveness, showing that it effectively learns the conditional data distribution and successfully reacts to an experimental agent, giving sprout to possible calibrations and evaluations of trading strategies and market impact experiments. We developed DeepMarket, the first open-source Python framework for market simulation with deep learning. Our repository includes a synthetic LOB dataset composed of TRADES's generates simulations. We release the code at github.com/LeonardoBerti00/DeepMarket.

  • 3 authors
·
Jan 31, 2025

Brewing Discontent: How U.S. Reciprocal Tariffs on Coffee Could Echo the Boston Tea Party

This research employs quantitative techniques interpreted through relevant economic theories to analyze a proposed U.S. "Discounted Reciprocal Tariff" structure. Statistical modeling (linear regression) quantifies the policy's consistent 'discounted reciprocity' pattern, which is interpreted using a Game Theory perspective on strategic interaction. Machine learning (K-Means clustering) identifies distinct country typologies based on tariff exposure and Economic Complexity Index (ECI), linking the policy to Economic Complexity theory. The study's primary application focuses on the major coffee exporting sector, utilizing simulation modeling grounded in principles of demand elasticity and substitution to project potential trade flow impacts. Specifically, for coffee, this simulation demonstrates how the proposed tariff differentials can induce significant substitution effects, projecting a potential shift in U.S. import demand away from high-tariff origins toward lower-tariff competitors. This disruption, stemming from the tariffs impacting exporting countries, is projected to ultimately increase coffee prices for consumers in the United States. Findings throughout are contextualized within Political Economy considerations. Overall, the study demonstrates how integrating regression, clustering, and simulation with economic theory exemplified through the coffee sector analysis provides a robust framework for assessing the potential systemic impacts, including consumer price effects, of strategic trade policies.

  • 1 authors
·
Apr 2, 2025

Empirical Study of Market Impact Conditional on Order-Flow Imbalance

In this research, we have empirically investigated the key drivers affecting liquidity in equity markets. We illustrated how theoretical models, such as Kyle's model, of agents' interplay in the financial markets, are aligned with the phenomena observed in publicly available trades and quotes data. Specifically, we confirmed that for small signed order-flows, the price impact grows linearly with increase in the order-flow imbalance. We have, further, implemented a machine learning algorithm to forecast market impact given a signed order-flow. Our findings suggest that machine learning models can be used in estimation of financial variables; and predictive accuracy of such learning algorithms can surpass the performance of traditional statistical approaches. Understanding the determinants of price impact is crucial for several reasons. From a theoretical stance, modelling the impact provides a statistical measure of liquidity. Practitioners adopt impact models as a pre-trade tool to estimate expected transaction costs and optimize the execution of their strategies. This further serves as a post-trade valuation benchmark as suboptimal execution can significantly deteriorate a portfolio performance. More broadly, the price impact reflects the balance of liquidity across markets. This is of central importance to regulators as it provides an all-encompassing explanation of the correlation between market design and systemic risk, enabling regulators to design more stable and efficient markets.

  • 1 authors
·
Apr 17, 2020

Position Auctions in AI-Generated Content

We consider an extension to the classic position auctions in which sponsored creatives can be added within AI generated content rather than shown in predefined slots. New challenges arise from the natural requirement that sponsored creatives should smoothly fit into the context. With the help of advanced LLM technologies, it becomes viable to accurately estimate the benefits of adding each individual sponsored creatives into each potential positions within the AI generated content by properly taking the context into account. Therefore, we assume one click-through rate estimation for each position-creative pair, rather than one uniform estimation for each sponsored creative across all positions in classic settings. As a result, the underlying optimization becomes a general matching problem, thus the substitution effects should be treated more carefully compared to standard position auction settings, where the slots are independent with each other. In this work, we formalize a concrete mathematical model of the extended position auction problem and study the welfare-maximization and revenue-maximization mechanism design problem. Formally, we consider two different user behavior models and solve the mechanism design problems therein respectively. For the Multinomial Logit (MNL) model, which is order-insensitive, we can efficiently implement the optimal mechanisms. For the cascade model, which is order-sensitive, we provide approximately optimal solutions.

  • 10 authors
·
Jun 3, 2025

Entire Chain Uplift Modeling with Context-Enhanced Learning for Intelligent Marketing

Uplift modeling, vital in online marketing, seeks to accurately measure the impact of various strategies, such as coupons or discounts, on different users by predicting the Individual Treatment Effect (ITE). In an e-commerce setting, user behavior follows a defined sequential chain, including impression, click, and conversion. Marketing strategies exert varied uplift effects at each stage within this chain, impacting metrics like click-through and conversion rate. Despite its utility, existing research has neglected to consider the inter-task across all stages impacts within a specific treatment and has insufficiently utilized the treatment information, potentially introducing substantial bias into subsequent marketing decisions. We identify these two issues as the chain-bias problem and the treatment-unadaptive problem. This paper introduces the Entire Chain UPlift method with context-enhanced learning (ECUP), devised to tackle these issues. ECUP consists of two primary components: 1) the Entire Chain-Enhanced Network, which utilizes user behavior patterns to estimate ITE throughout the entire chain space, models the various impacts of treatments on each task, and integrates task prior information to enhance context awareness across all stages, capturing the impact of treatment on different tasks, and 2) the Treatment-Enhanced Network, which facilitates fine-grained treatment modeling through bit-level feature interactions, thereby enabling adaptive feature adjustment. Extensive experiments on public and industrial datasets validate ECUPs effectiveness. Moreover, ECUP has been deployed on the Meituan food delivery platform, serving millions of daily active users, with the related dataset released for future research.

  • 9 authors
·
Feb 3, 2024

Dynamic Factor Analysis of Price Movements in the Philippine Stock Exchange

The intricate dynamics of stock markets have led to extensive research on models that are able to effectively explain their inherent complexities. This study leverages the econometrics literature to explore the dynamic factor model as an interpretable model with sufficient predictive capabilities for capturing essential market phenomena. Although the model has been extensively applied for predictive purposes, this study focuses on analyzing the extracted loadings and common factors as an alternative framework for understanding stock price dynamics. The results reveal novel insights into traditional market theories when applied to the Philippine Stock Exchange using the Kalman method and maximum likelihood estimation, with subsequent validation against the capital asset pricing model. Notably, a one-factor model extracts a common factor representing systematic or market dynamics similar to the composite index, whereas a two-factor model extracts common factors representing market trends and volatility. Furthermore, an application of the model for nowcasting the growth rates of the Philippine gross domestic product highlights the potential of the extracted common factors as viable real-time market indicators, yielding over a 34% decrease in the out-of-sample prediction error. Overall, the results underscore the value of dynamic factor analysis in gaining a deeper understanding of market price movement dynamics.

  • 6 authors
·
Oct 8, 2025

Algorithmic Content Selection and the Impact of User Disengagement

Digital services face a fundamental trade-off in content selection: they must balance the immediate revenue gained from high-reward content against the long-term benefits of maintaining user engagement. Traditional multi-armed bandit models assume that users remain perpetually engaged, failing to capture the possibility that users may disengage when dissatisfied, thereby reducing future revenue potential. In this work, we introduce a model for the content selection problem that explicitly accounts for variable user engagement and disengagement. In our framework, content that maximizes immediate reward is not necessarily optimal in terms of fostering sustained user engagement. Our contributions are twofold. First, we develop computational and statistical methods for offline optimization and online learning of content selection policies. For users whose engagement patterns are defined by k distinct levels, we design a dynamic programming algorithm that computes the exact optimal policy in O(k^2) time. Moreover, we derive no-regret learning guarantees for an online learning setting in which the platform serves a series of users with unknown and potentially adversarial engagement patterns. Second, we introduce the concept of modified demand elasticity which captures how small changes in a user's overall satisfaction affect the platform's ability to secure long-term revenue. This notion generalizes classical demand elasticity by incorporating the dynamics of user re-engagement, thereby revealing key insights into the interplay between engagement and revenue. Notably, our analysis uncovers a counterintuitive phenomenon: although higher friction (i.e., a reduced likelihood of re-engagement) typically lowers overall revenue, it can simultaneously lead to higher user engagement under optimal content selection policies.

  • 4 authors
·
Feb 18, 2025

GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

Large Language Models (LLMs) show significant potential in economic and strategic interactions, where communication via natural language is often prevalent. This raises key questions: Do LLMs behave rationally? Can they mimic human behavior? Do they tend to reach an efficient and fair outcome? What is the role of natural language in the strategic interaction? How do characteristics of the economic environment influence these dynamics? These questions become crucial concerning the economic and societal implications of integrating LLM-based agents into real-world data-driven systems, such as online retail platforms and recommender systems. While the ML community has been exploring the potential of LLMs in such multi-agent setups, varying assumptions, design choices and evaluation criteria across studies make it difficult to draw robust and meaningful conclusions. To address this, we introduce a benchmark for standardizing research on two-player, sequential, language-based games. Inspired by the economic literature, we define three base families of games with consistent parameterization, degrees of freedom and economic measures to evaluate agents' performance (self-gain), as well as the game outcome (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and utilize it to collect a dataset of LLM vs. LLM interactions across numerous game configurations and an additional dataset of human vs. LLM interactions. Through extensive experimentation, we demonstrate how our framework and dataset can be used to: (i) compare the behavior of LLM-based agents to human players in various economic contexts; (ii) evaluate agents in both individual and collective performance measures; and (iii) quantify the effect of the economic characteristics of the environments on the behavior of agents.

  • 6 authors
·
Oct 7, 2024 2

An Introduction to Artificial Prediction Markets for Classification

Prediction markets are used in real life to predict outcomes of interest such as presidential elections. This paper presents a mathematical theory of artificial prediction markets for supervised learning of conditional probability estimators. The artificial prediction market is a novel method for fusing the prediction information of features or trained classifiers, where the fusion result is the contract price on the possible outcomes. The market can be trained online by updating the participants' budgets using training examples. Inspired by the real prediction markets, the equations that govern the market are derived from simple and reasonable assumptions. Efficient numerical algorithms are presented for solving these equations. The obtained artificial prediction market is shown to be a maximum likelihood estimator. It generalizes linear aggregation, existent in boosting and random forest, as well as logistic regression and some kernel methods. Furthermore, the market mechanism allows the aggregation of specialized classifiers that participate only on specific instances. Experimental comparisons show that the artificial prediction markets often outperform random forest and implicit online learning on synthetic data and real UCI datasets. Moreover, an extensive evaluation for pelvic and abdominal lymph node detection in CT data shows that the prediction market improves adaboost's detection rate from 79.6% to 81.2% at 3 false positives/volume.

  • 2 authors
·
Jul 8, 2012

AIMM: An AI-Driven Multimodal Framework for Detecting Social-Media-Influenced Stock Market Manipulation

Market manipulation now routinely originates from coordinated social media campaigns, not isolated trades. Retail investors, regulators, and brokerages need tools that connect online narratives and coordination patterns to market behavior. We present AIMM, an AI-driven framework that fuses Reddit activity, bot and coordination indicators, and OHLCV market features into a daily AIMM Manipulation Risk Score for each ticker. The system uses a parquet-native pipeline with a Streamlit dashboard that allows analysts to explore suspicious windows, inspect underlying posts and price action, and log model outputs over time. Due to Reddit API restrictions, we employ calibrated synthetic social features matching documented event characteristics; market data (OHLCV) uses real historical data from Yahoo Finance. This release makes three contributions. First, we build the AIMM Ground Truth dataset (AIMM-GT): 33 labeled ticker-days spanning eight equities, drawing from SEC enforcement actions, community-verified manipulation cases, and matched normal controls. Second, we implement forward-walk evaluation and prospective prediction logging for both retrospective and deployment-style assessment. Third, we analyze lead times and show that AIMM flagged GME 22 days before the January 2021 squeeze peak. The current labeled set is small (33 ticker-days, 3 positive events), but results show preliminary discriminative capability and early warnings for the GME incident. We release the code, dataset schema, and dashboard design to support research on social media-driven market surveillance.

  • 1 authors
·
Dec 17, 2025

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

Most recommendation benchmarks evaluate how well a model imitates user behavior. In financial advisory, however, observed actions can be noisy or short-sighted under market volatility and may conflict with a user's long-term goals. Treating what users chose as the sole ground truth, therefore, conflates behavioral imitation with decision quality. We introduce Conv-FinRe, a conversational and longitudinal benchmark for stock recommendation that evaluates LLMs beyond behavior matching. Given an onboarding interview, step-wise market context, and advisory dialogues, models must generate rankings over a fixed investment horizon. Crucially, Conv-FinRe provides multi-view references that distinguish descriptive behavior from normative utility grounded in investor-specific risk preferences, enabling diagnosis of whether an LLM follows rational analysis, mimics user noise, or is driven by market momentum. We build the benchmark from real market data and human decision trajectories, instantiate controlled advisory conversations, and evaluate a suite of state-of-the-art LLMs. Results reveal a persistent tension between rational decision quality and behavioral alignment: models that perform well on utility-based ranking often fail to match user choices, whereas behaviorally aligned models can overfit short-term noise. The dataset is publicly released on Hugging Face, and the codebase is available on GitHub.

TheFinAI The Fin AI
·
Feb 18 2

Introduction to Multi-Armed Bandits

Multi-armed bandits a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the years, covered in several books and surveys. This book provides a more introductory, textbook-like treatment of the subject. Each chapter tackles a particular line of work, providing a self-contained, teachable technical introduction and a brief review of the further developments; many of the chapters conclude with exercises. The book is structured as follows. The first four chapters are on IID rewards, from the basic model to impossibility results to Bayesian priors to Lipschitz rewards. The next three chapters cover adversarial rewards, from the full-feedback version to adversarial bandits to extensions with linear rewards and combinatorially structured actions. Chapter 8 is on contextual bandits, a middle ground between IID and adversarial bandits in which the change in reward distributions is completely explained by observable contexts. The last three chapters cover connections to economics, from learning in repeated games to bandits with supply/budget constraints to exploration in the presence of incentives. The appendix provides sufficient background on concentration and KL-divergence. The chapters on "bandits with similarity information", "bandits with knapsacks" and "bandits and agents" can also be consumed as standalone surveys on the respective topics.

  • 1 authors
·
Apr 15, 2019

Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse

Machine learning models routinely automate decisions in applications like lending and hiring. In such settings, consumer protection rules require companies that deploy models to explain predictions to decision subjects. These rules are motivated, in part, by the belief that explanations can promote recourse by revealing information that individuals can use to contest or improve their outcomes. In practice, many companies comply with these rules by providing individuals with a list of the most important features for their prediction, which they identify based on feature importance scores from feature attribution methods such as SHAP or LIME. In this work, we show how these practices can undermine consumers by highlighting features that would not lead to an improved outcome and by explaining predictions that cannot be changed. We propose to address these issues by highlighting features based on their responsiveness score -- i.e., the probability that an individual can attain a target prediction by changing a specific feature. We develop efficient methods to compute responsiveness scores for any model and any dataset. We conduct an extensive empirical study on the responsiveness of explanations in lending. Our results show that standard practices in consumer finance can backfire by presenting consumers with reasons without recourse, and demonstrate how our approach improves consumer protection by highlighting responsive features and identifying fixed predictions.

  • 4 authors
·
Oct 29, 2024

Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest

Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements. This creates the potential for LLMs to face conflicts of interest, where the most beneficial response to a user may not be aligned with the company's incentives. For instance, a sponsored product may be more expensive but otherwise equal to another; in this case, what does (and should) the LLM recommend to the user? In this paper, we provide a framework for categorizing the ways in which conflicting incentives might lead LLMs to change the way they interact with users, inspired by literature from linguistics and advertising regulation. We then present a suite of evaluations to examine how current models handle these tradeoffs. We find that a majority of LLMs forsake user welfare for company incentives in a multitude of conflict of interest situations, including recommending a sponsored product almost twice as expensive (Grok 4.1 Fast, 83%), surfacing sponsored options to disrupt the purchasing process (GPT 5.1, 94%), and concealing prices in unfavorable comparisons (Qwen 3 Next, 24%). Behaviors also vary strongly with levels of reasoning and users' inferred socio-economic status. Our results highlight some of the hidden risks to users that can emerge when companies begin to subtly incentivize advertisements in chatbots.

  • 5 authors
·
Apr 8

Real-Time Bidding by Reinforcement Learning in Display Advertising

The majority of online display ads are served through real-time bidding (RTB) --- each ad display impression is auctioned off in real-time when it is just being generated from a user visit. To place an ad automatically and optimally, it is critical for advertisers to devise a learning algorithm to cleverly bid an ad impression in real-time. Most previous works consider the bid decision as a static optimization problem of either treating the value of each impression independently or setting a bid price to each segment of ad volume. However, the bidding for a given ad campaign would repeatedly happen during its life span before the budget runs out. As such, each bid is strategically correlated by the constrained budget and the overall effectiveness of the campaign (e.g., the rewards from generated clicks), which is only observed after the campaign has completed. Thus, it is of great interest to devise an optimal bidding strategy sequentially so that the campaign budget can be dynamically allocated across all the available impressions on the basis of both the immediate and future rewards. In this paper, we formulate the bid decision process as a reinforcement learning problem, where the state space is represented by the auction information and the campaign's real-time parameters, while an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy to optimize the advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem from the large real-world auction volume and campaign budget is well handled by state value approximation using neural networks.

  • 7 authors
·
Jan 10, 2017

Deep Neural Net with Attention for Multi-channel Multi-touch Attribution

Customers are usually exposed to online digital advertisement channels, such as email marketing, display advertising, paid search engine marketing, along their way to purchase or subscribe products( aka. conversion). The marketers track all the customer journey data and try to measure the effectiveness of each advertising channel. The inference about the influence of each channel plays an important role in budget allocation and inventory pricing decisions. Several simplistic rule-based strategies and data-driven algorithmic strategies have been widely used in marketing field, but they do not address the issues, such as channel interaction, time dependency, user characteristics. In this paper, we propose a novel attribution algorithm based on deep learning to assess the impact of each advertising channel. We present Deep Neural Net With Attention multi-touch attribution model (DNAMTA) model in a supervised learning fashion of predicting if a series of events leads to conversion, and it leads us to have a deep understanding of the dynamic interaction effects between media channels. DNAMTA also incorporates user-context information, such as user demographics and behavior, as control variables to reduce the estimation biases of media effects. We used computational experiment of large real world marketing dataset to demonstrate that our proposed model is superior to existing methods in both conversion prediction and media channel influence evaluation.

  • 5 authors
·
Sep 6, 2018

Computational Foundations for Strategic Coopetition: Formalizing Collective Action and Loyalty

Mixed-motive multi-agent settings are rife with persistent free-riding because individual effort benefits all members equally, yet each member bears the full cost of their own contribution. Classical work by Holmström established that under pure self-interest, Nash equilibrium is universal shirking. While i* represents teams as composite actors, it lacks scalable computational mechanisms for analyzing how collective action problems emerge and resolve in coopetitive settings. This technical report extends computational foundations for strategic coopetition to team-level dynamics, building on companion work formalizing interdependence/complementarity (arXiv:2510.18802) and trust dynamics (arXiv:2510.24909). We develop loyalty-moderated utility functions with two mechanisms: loyalty benefit (welfare internalization plus intrinsic contribution satisfaction) and cost tolerance (reduced effort burden for loyal members). We integrate i* structural dependencies through dependency-weighted team cohesion, connecting member incentives to team-level positioning. The framework applies to both human teams (loyalty as psychological identification) and multi-agent systems (alignment coefficients and adjusted cost functions). Experimental validation across 3,125 configurations demonstrates robust loyalty effects (15.04x median effort differentiation). All six behavioral targets achieve thresholds: free-riding baseline (96.5%), loyalty monotonicity (100%), effort differentiation (100%), team size effect (100%), mechanism synergy (99.5%), and bounded outcomes (100%). Empirical validation using published Apache HTTP Server (1995-2023) case study achieves 60/60 points, reproducing contribution patterns across formation, growth, maturation, and governance phases. Statistical significance confirmed at p<0.001, Cohen's d=0.71.

  • 2 authors
·
Jan 20

MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading

High-frequency trading (HFT) that executes algorithmic trading in short time scales, has recently occupied the majority of cryptocurrency market. Besides traditional quantitative trading methods, reinforcement learning (RL) has become another appealing approach for HFT due to its terrific ability of handling high-dimensional financial data and solving sophisticated sequential decision-making problems, e.g., hierarchical reinforcement learning (HRL) has shown its promising performance on second-level HFT by training a router to select only one sub-agent from the agent pool to execute the current transaction. However, existing RL methods for HFT still have some defects: 1) standard RL-based trading agents suffer from the overfitting issue, preventing them from making effective policy adjustments based on financial context; 2) due to the rapid changes in market conditions, investment decisions made by an individual agent are usually one-sided and highly biased, which might lead to significant loss in extreme markets. To tackle these problems, we propose a novel Memory Augmented Context-aware Reinforcement learning method On HFT, a.k.a. MacroHFT, which consists of two training phases: 1) we first train multiple types of sub-agents with the market data decomposed according to various financial indicators, specifically market trend and volatility, where each agent owns a conditional adapter to adjust its trading policy according to market conditions; 2) then we train a hyper-agent to mix the decisions from these sub-agents and output a consistently profitable meta-policy to handle rapid market fluctuations, equipped with a memory mechanism to enhance the capability of decision-making. Extensive experiments on various cryptocurrency markets demonstrate that MacroHFT can achieve state-of-the-art performance on minute-level trading tasks.

  • 6 authors
·
Jun 20, 2024

PreBit -- A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin

Bitcoin, with its ever-growing popularity, has demonstrated extreme price volatility since its origin. This volatility, together with its decentralised nature, make Bitcoin highly subjective to speculative trading as compared to more traditional assets. In this paper, we propose a multimodal model for predicting extreme price fluctuations. This model takes as input a variety of correlated assets, technical indicators, as well as Twitter content. In an in-depth study, we explore whether social media discussions from the general public on Bitcoin have predictive power for extreme price movements. A dataset of 5,000 tweets per day containing the keyword `Bitcoin' was collected from 2015 to 2021. This dataset, called PreBit, is made available online. In our hybrid model, we use sentence-level FinBERT embeddings, pretrained on financial lexicons, so as to capture the full contents of the tweets and feed it to the model in an understandable way. By combining these embeddings with a Convolutional Neural Network, we built a predictive model for significant market movements. The final multimodal ensemble model includes this NLP model together with a model based on candlestick data, technical indicators and correlated asset prices. In an ablation study, we explore the contribution of the individual modalities. Finally, we propose and backtest a trading strategy based on the predictions of our models with varying prediction threshold and show that it can used to build a profitable trading strategy with a reduced risk over a `hold' or moving average strategy.

  • 2 authors
·
May 30, 2022

LiveTradeBench: Seeking Real-World Alpha with Large Language Models

Large language models (LLMs) achieve strong performance across benchmarks--from knowledge quizzes and math reasoning to web-agent tasks--but these tests occur in static settings, lacking real dynamics and uncertainty. Consequently, they evaluate isolated reasoning or problem-solving rather than decision-making under uncertainty. To address this, we introduce LiveTradeBench, a live trading environment for evaluating LLM agents in realistic and evolving markets. LiveTradeBench follows three design principles: (i) Live data streaming of market prices and news, eliminating dependence on offline backtesting and preventing information leakage while capturing real-time uncertainty; (ii) a portfolio-management abstraction that extends control from single-asset actions to multi-asset allocation, integrating risk management and cross-asset reasoning; and (iii) multi-market evaluation across structurally distinct environments--U.S. stocks and Polymarket prediction markets--differing in volatility, liquidity, and information flow. At each step, an agent observes prices, news, and its portfolio, then outputs percentage allocations that balance risk and return. Using LiveTradeBench, we run 50-day live evaluations of 21 LLMs across families. Results show that (1) high LMArena scores do not imply superior trading outcomes; (2) models display distinct portfolio styles reflecting risk appetite and reasoning dynamics; and (3) some LLMs effectively leverage live signals to adapt decisions. These findings expose a gap between static evaluation and real-world competence, motivating benchmarks that test sequential decision making and consistency under live uncertainty.

  • 3 authors
·
Nov 5, 2025 2

Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification

Inverse reinforcement learning (IRL) aims to infer an agent's preferences (represented as a reward function R) from their behaviour (represented as a policy pi). To do this, we need a behavioural model of how pi relates to R. In the current literature, the most common behavioural models are optimality, Boltzmann-rationality, and causal entropy maximisation. However, the true relationship between a human's preferences and their behaviour is much more complex than any of these behavioural models. This means that the behavioural models are misspecified, which raises the concern that they may lead to systematic errors if applied to real data. In this paper, we analyse how sensitive the IRL problem is to misspecification of the behavioural model. Specifically, we provide necessary and sufficient conditions that completely characterise how the observed data may differ from the assumed behavioural model without incurring an error above a given threshold. In addition to this, we also characterise the conditions under which a behavioural model is robust to small perturbations of the observed policy, and we analyse how robust many behavioural models are to misspecification of their parameter values (such as e.g.\ the discount rate). Our analysis suggests that the IRL problem is highly sensitive to misspecification, in the sense that very mild misspecification can lead to very large errors in the inferred reward function.

  • 2 authors
·
Mar 11, 2024

AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets

Large Language Models (LLMs) have demonstrated remarkable potential as autonomous agents, approaching human-expert performance through advanced reasoning and tool orchestration. However, decision-making in fully dynamic and live environments remains highly challenging, requiring real-time information integration and adaptive responses. While existing efforts have explored live evaluation mechanisms in structured tasks, a critical gap remains in systematic benchmarking for real-world applications, particularly in finance where stringent requirements exist for live strategic responsiveness. To address this gap, we introduce AI-Trader, the first fully-automated, live, and data-uncontaminated evaluation benchmark for LLM agents in financial decision-making. AI-Trader spans three major financial markets: U.S. stocks, A-shares, and cryptocurrencies, with multiple trading granularities to simulate live financial environments. Our benchmark implements a revolutionary fully autonomous minimal information paradigm where agents receive only essential context and must independently search, verify, and synthesize live market information without human intervention. We evaluate six mainstream LLMs across three markets and multiple trading frequencies. Our analysis reveals striking findings: general intelligence does not automatically translate to effective trading capability, with most agents exhibiting poor returns and weak risk management. We demonstrate that risk control capability determines cross-market robustness, and that AI trading strategies achieve excess returns more readily in highly liquid markets than policy-driven environments. These findings expose critical limitations in current autonomous agents and provide clear directions for future improvements. The code and evaluation data are open-sourced to foster community research: https://github.com/HKUDS/AI-Trader.

  • 6 authors
·
Nov 30, 2025

MM-DREX: Multimodal-Driven Dynamic Routing of LLM Experts for Financial Trading

The inherent non-stationarity of financial markets and the complexity of multi-modal information pose significant challenges to existing quantitative trading models. Traditional methods relying on fixed structures and unimodal data struggle to adapt to market regime shifts, while large language model (LLM)-driven solutions - despite their multi-modal comprehension - suffer from static strategies and homogeneous expert designs, lacking dynamic adjustment and fine-grained decision mechanisms. To address these limitations, we propose MM-DREX: a Multimodal-driven, Dynamically-Routed EXpert framework based on large language models. MM-DREX explicitly decouples market state perception from strategy execution to enable adaptive sequential decision-making in non-stationary environments. Specifically, it (1) introduces a vision-language model (VLM)-powered dynamic router that jointly analyzes candlestick chart patterns and long-term temporal features to allocate real-time expert weights; (2) designs four heterogeneous trading experts (trend, reversal, breakout, positioning) generating specialized fine-grained sub-strategies; and (3) proposes an SFT-RL hybrid training paradigm to synergistically optimize the router's market classification capability and experts' risk-adjusted decision-making. Extensive experiments on multi-modal datasets spanning stocks, futures, and cryptocurrencies demonstrate that MM-DREX significantly outperforms 15 baselines (including state-of-the-art financial LLMs and deep reinforcement learning models) across key metrics: total return, Sharpe ratio, and maximum drawdown, validating its robustness and generalization. Additionally, an interpretability module traces routing logic and expert behavior in real time, providing an audit trail for strategy transparency.

  • 9 authors
·
Sep 5, 2025

Deep Reinforcement Learning for Quantitative Trading

Artificial Intelligence (AI) and Machine Learning (ML) are transforming the domain of Quantitative Trading (QT) through the deployment of advanced algorithms capable of sifting through extensive financial datasets to pinpoint lucrative investment openings. AI-driven models, particularly those employing ML techniques such as deep learning and reinforcement learning, have shown great prowess in predicting market trends and executing trades at a speed and accuracy that far surpass human capabilities. Its capacity to automate critical tasks, such as discerning market conditions and executing trading strategies, has been pivotal. However, persistent challenges exist in current QT methods, especially in effectively handling noisy and high-frequency financial data. Striking a balance between exploration and exploitation poses another challenge for AI-driven trading agents. To surmount these hurdles, our proposed solution, QTNet, introduces an adaptive trading model that autonomously formulates QT strategies through an intelligent trading agent. Incorporating deep reinforcement learning (DRL) with imitative learning methodologies, we bolster the proficiency of our model. To tackle the challenges posed by volatile financial datasets, we conceptualize the QT mechanism within the framework of a Partially Observable Markov Decision Process (POMDP). Moreover, by embedding imitative learning, the model can capitalize on traditional trading tactics, nurturing a balanced synergy between discovery and utilization. For a more realistic simulation, our trading agent undergoes training using minute-frequency data sourced from the live financial market. Experimental findings underscore the model's proficiency in extracting robust market features and its adaptability to diverse market conditions.

  • 5 authors
·
Dec 25, 2023

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth independent, delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot. We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.

  • 3 authors
·
Dec 20, 2013

Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and Algorithms

This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth. Due to the uncertainty of strategies' performances in the future market, which are often based on specific models and statistical assumptions, investors often mitigate risk and enhance robustness by combining multiple strategies, akin to common approaches in collective learning prediction. However, the absence of a distribution-free and consistent preference framework complicates decisions of combination due to the ambiguous objective. To address this gap, we introduce a novel framework for decision-making in combining strategies, irrespective of market conditions, by establishing the investor's preference between decisions and then forming a clear objective. Through this framework, we propose a combinatorial strategy construction, free from statistical assumptions, for any scale of component strategies, even infinite, such that it meets the determined criterion. Finally, we test the proposed strategy along with its accelerated variant and some other multi-strategies. The numerical experiments show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios, in which their cumulative wealths eventually exceed those of the best component strategies while the accelerated strategy significantly improves performance.

  • 1 authors
·
Jun 5, 2024

Deep Learning for Solving and Estimating Dynamic Models in Economics and Finance

This script offers an implementation-oriented introduction to deep learning methods for solving and estimating high-dimensional dynamic stochastic models in economics and finance. Its starting point is the curse of dimensionality: heterogeneous-agent economies, overlapping-generations models with aggregate risk, continuous-time models with occasionally binding constraints, climate-economy models, and macro-finance environments with many assets and frictions generate state and parameter spaces that strain classical tensor-product grid methods. The exposition is organized around four complementary methodologies. Deep Equilibrium Nets embed discrete-time equilibrium conditions into neural-network loss functions. Physics-Informed Neural Networks approximate continuous-time Hamilton--Jacobi--Bellman, Kolmogorov forward, and related partial differential equations. Deep surrogate models provide fast, differentiable approximations to expensive structural models, while Gaussian processes add a probabilistic layer that quantifies approximation uncertainty; together they support estimation, sensitivity analysis, and constrained policy design. Gaussian-process-based dynamic programming, combined with active learning and dimension reduction, extends value-function iteration to very large continuous state spaces. Applications span representative-agent and international real business cycle models, overlapping-generations and heterogeneous-agent economies, continuous-time macro-finance, structural estimation by simulated method of moments, and climate economics under uncertainty. Companion notebooks in TensorFlow and PyTorch invite hands-on experimentation. These notes are a deliberately subjective and inevitably incomplete snapshot of a rapidly evolving field, aimed at equipping PhD students and researchers to engage with this frontier hands-on.

  • 1 authors
·
May 13