Chinmay Maheshwari

Email: chinmay_maheshwari@jhu.edu

Office: S619, Wyman Park Building, Baltimore, MD

About Me

I am an Assistant Professor in the Department of Electrical and Computer Engineering at Johns Hopkins University, and a member of the Data Science and AI Institute. I obtained my PhD in Electrical Engineering and Computer Sciences from the University of California, Berkeley in 2025, where I was advised by Prof. Shankar Sastry. I completed my undergraduate studies (B.Tech and M.Tech) at the Indian Institute of Technology (IIT) Bombay in 2019, where I received the Institute Academic Medal. In 2025, I was selected as an NSF CPS Rising Star and received the Lotfi A. Zadeh Prize.

Research Interests

My research develops theoretical and algorithmic frameworks to design and analyze interactions of strategic autonomous agents with one another and with humans, in domains such as mobility (air and road), multi‑robot systems and energy systems. Grounded in tools from AI, systems theory, optimization and economics, my work aims to build efficient, equitable and safe autonomous technologies for multi‑agent systems. My research has three main themes:

  1. Decentralized Learning in Multi‑agent Systems: Developing decentralized learning rules for strategic agents in dynamic, uncertain and resource‑constrained multi‑agent environments, with guarantees on convergence and performance.
  2. Real‑Time Multi‑Agent Decision‑Making: Creating algorithms for provably efficient, safe and robust decision‑making in competitive environments, especially in real time.
  3. Adaptive Incentives and Markets: Designing adaptive incentives and market mechanisms, under limited information about agents’ preferences and learning algorithms, to align strategic agent behavior with societal objectives such as efficiency, equity and safety.

Publications

A complete list of my publications is included below. You may also visit my Google Scholar profile.

Under review

  • EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization Co‑authored with Chinmay Pimpalkhare and Debasish Chatterjee.

    View paper

    Abstract: Min–max optimization arises in many domains such as game theory, adversarial machine learning, robust optimization, control, and signal processing. In convex–concave min–max optimization, gradient-based methods are well understood and enjoy strong guarantees. However, in the absence of convexity or concavity, existing approaches study convergence to an approximate saddle point or to first-order stationary points, which may be arbitrarily far from global optima. In this work, we present an algorithmic apparatus for computing globally optimal solutions in convex–non-concave and non-convex–concave min–max optimization. For convex–non-concave min–max problems, we employ a reformulation that transforms them into non-concave–convex max–min problems with suitably defined feasible sets and objective functions; the new form can be viewed as a generalization of Sion's minimax theorem to convex–non-concave min–max optimization. Next, we introduce EXOTIC, an Exact, Optimistic, Tree-based algorithm for solving the reformulated max–min problem. EXOTIC employs an iterative convex optimization solver to (approximately) solve the inner minimization, and a hierarchical tree search for the outer maximization that optimistically selects promising regions to explore based on the approximate solutions returned by the convex optimization solver. We establish an upper bound on its optimality gap as a function of the number of calls to the inner solver, the solver's convergence rate, and additional problem-dependent parameters. Both the algorithmic apparatus and its accompanying theoretical analysis also apply to non-convex–concave min–max optimization. In addition, we propose a class of benchmark convex–non-concave min–max problems along with their analytical global solutions, providing a testbed for evaluating min–max optimization algorithms. Empirically, EXOTIC outperforms gradient-based methods on this benchmark as well as on existing numerical benchmark problems from the literature. Finally, we demonstrate the utility of EXOTIC by computing security strategies in multi-player games with three or more players, a computationally challenging task that, to our knowledge, no prior method solves exactly.
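As a toy illustration of the outer-optimistic-search / inner-convex-solve structure described above (this is my own minimal sketch, not the authors' implementation; the objective, the midpoint-based cell-selection rule, and the ternary-search inner solver are all illustrative assumptions), consider a 1-D problem that is convex in x and non-concave in y:

```python
import math

# Toy sketch of the EXOTIC structure: outer optimistic interval search for
# the max over y, inner convex solve (ternary search) for the min over x,
# on f(x, y) = x^2 + x*sin(3y), convex in x and non-concave in y.
# On [-1, 1]^2 the true max-min value is 0 (attained at y = 0, x = 0).

def inner_min(f, y, lo=-1.0, hi=1.0, iters=60):
    """Ternary search: approximately minimize the convex map x -> f(x, y)."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1, y) < f(m2, y):
            hi = m2
        else:
            lo = m1
    return f((lo + hi) / 2, y)

def optimistic_maxmin(f, y_lo=-1.0, y_hi=1.0, splits=40):
    """Repeatedly split the y-cell whose midpoint looks most promising."""
    cells = [(y_lo, y_hi)]
    best = -math.inf
    for _ in range(splits):
        # optimism: pick the cell with the best inner-solver value at its midpoint
        lo, hi = max(cells, key=lambda c: inner_min(f, (c[0] + c[1]) / 2))
        cells.remove((lo, hi))
        mid = (lo + hi) / 2
        best = max(best, inner_min(f, mid))
        cells += [(lo, mid), (mid, hi)]
    return best

val = optimistic_maxmin(lambda x, y: x * x + x * math.sin(3 * y))
```

Here the search recovers the global value 0 because the first midpoint it probes is y = 0; the real algorithm's guarantees, of course, do not rely on such luck.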

  • Congestion Pricing for Efficiency and Equity: Theory and Applications to the San Francisco Bay Area Co‑authored with Kshitij Kulkarni, Druv Pai, Rachel Yang, Manxi Wu and Shankar Sastry.

    View paper

    Abstract: Congestion pricing, while adopted by many cities to alleviate traffic congestion, raises concerns about widening socioeconomic disparities due to its disproportionate impact on low-income travelers. We address this concern by proposing a new class of congestion pricing schemes that not only minimize total travel time, but also incorporate an equity objective, reducing disparities in the relative change in travel costs across populations with different incomes, following the implementation of tolls. Our analysis builds on a congestion game model with heterogeneous traveler populations. We present four pricing schemes that account for practical considerations, such as the ability to charge differentiated tolls to various traveler populations and the option to toll all or only a subset of edges in the network. We evaluate our pricing schemes in the calibrated freeway network of the San Francisco Bay Area. We demonstrate that the proposed congestion pricing schemes improve both the total travel time and the equity objective compared to the current pricing scheme. Our results further show that pricing schemes charging differentiated prices to traveler populations with varying value-of-time lead to a more equitable distribution of travel costs compared to those that charge a homogeneous price to all.

Journal publications

  • Adaptive Incentive Design with Learning Agents IEEE Transactions on Automatic Control (conditionally accepted), 2025.
    Co‑authored with Kshitij Kulkarni, Manxi Wu and Shankar Sastry.

    View paper

    Abstract: We propose an adaptive incentive mechanism that learns the optimal incentives in environments where players continuously update their strategies. Our mechanism updates incentives based on each player's externality, defined as the difference between the player's marginal cost and the operator's marginal cost at each time step. The proposed mechanism updates the incentives on a slower timescale compared to the players' learning dynamics, resulting in a two-timescale coupled dynamical system. Notably, this mechanism is agnostic to the specific learning dynamics used by players to update their strategies. We show that any fixed point of this adaptive incentive mechanism corresponds to the optimal incentive mechanism, ensuring that the Nash equilibrium coincides with the socially optimal strategy. Additionally, we provide sufficient conditions under which the adaptive mechanism converges to a fixed point. Our results apply to both atomic and non-atomic games. To demonstrate the effectiveness of our proposed mechanism, we verify the convergence conditions in two practically relevant classes of games: atomic aggregative games and non-atomic routing games.
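To make the two-timescale structure concrete, here is a minimal sketch on an assumed quadratic congestion-type game (the cost functions, step sizes, and game are my own illustrative choices, not the paper's model): players run gradient play on their incentivized costs at a fast timescale, while the planner nudges each incentive toward that player's current externality at a slow timescale.

```python
# Illustrative sketch (assumed quadratic game, NOT the paper's exact model):
# player i has base cost f_i(x) = (x_i - 1)^2 + x_i * X with X = sum(x),
# pays the incentivized cost f_i(x) + p_i * x_i, and the social cost is
# Phi(x) = sum_i (x_i - 1)^2 + X^2, so the externality is
# e_i = dPhi/dx_i - df_i/dx_i = X - x_i.

def simulate(n=2, steps=20000, alpha=0.05, beta=0.001):
    x = [0.0] * n  # players' strategies (fast timescale)
    p = [0.0] * n  # planner's incentives (slow timescale)
    for _ in range(steps):
        X = sum(x)
        # fast: gradient step on the incentivized cost f_i + p_i * x_i
        x = [xi - alpha * (2 * (xi - 1) + X + xi + pi)
             for xi, pi in zip(x, p)]
        # slow: nudge each incentive toward the current externality
        X = sum(x)
        p = [pi + beta * ((X - xi) - pi) for xi, pi in zip(x, p)]
    return x, p

x, p = simulate()
# For this game the socially optimal strategy is x_i = 1/3, and the
# fixed-point incentive equals the externality there, p_i = 1/3.
```

The separation alpha >> beta mirrors the paper's requirement that incentives move slowly relative to the players' learning dynamics.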

  • Markov α‑Potential Games IEEE Transactions on Automatic Control (to appear), 2026.
    Co‑authored with Xin Guo, Xinyu Li, Manxi Wu and Shankar Sastry.

    View paper

    Abstract: We propose a new framework of Markov α-potential games to study Markov games. We show that any Markov game with finite states and actions is a Markov α-potential game, and establish the existence of an associated α-potential function. Any optimizer of an α-potential function is shown to be an α-stationary Nash equilibrium. We study two practically significant classes of Markov games, Markov congestion games and perturbed Markov team games, via the framework of Markov α-potential games, with an explicit characterization of an upper bound on α and its relation to game parameters. Additionally, we provide a semi-infinite linear programming based formulation to obtain an upper bound on α for any Markov game. Furthermore, we study two equilibrium approximation algorithms, namely the projected gradient-ascent algorithm and the sequential maximum improvement algorithm, along with their Nash regret analysis, and corroborate the results with numerical experiments.
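For readers unfamiliar with the framework, the defining condition can be sketched as follows (this is my paraphrase of the abstract, so treat the exact notation and quantifiers as assumptions): a Markov game is a Markov α-potential game if there exists a potential function Φ whose change under any unilateral deviation tracks the deviating player's change in value up to α.

```latex
% Sketched alpha-potential condition (paraphrased; notation assumed):
% for every state s, every player i, and every unilateral deviation
% \pi_i' from a joint policy \pi = (\pi_i, \pi_{-i}),
\[
\left| \left( V_i^{s}(\pi_i', \pi_{-i}) - V_i^{s}(\pi) \right)
     - \left( \Phi^{s}(\pi_i', \pi_{-i}) - \Phi^{s}(\pi) \right) \right|
\le \alpha ,
\]
% so alpha = 0 recovers an (exact) Markov potential game.
```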

  • Independent and Decentralized Learning in Markov Potential Games IEEE Transactions on Automatic Control (to appear), 2025.
    Co‑authored with Manxi Wu, Druv Pai and Shankar Sastry.

    View paper

    Abstract: We study multi-agent reinforcement learning dynamics and analyze their asymptotic behavior in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized setting, where players do not know the game parameters and cannot communicate or coordinate. In each stage, players update their estimates of the Q-function, which evaluates their total contingent payoff, in an asynchronous manner based on the realized one-stage reward. Then, players independently update their policies by incorporating an optimal one-stage deviation strategy based on the estimated Q-function. Inspired by the actor-critic algorithm in single-agent reinforcement learning, a key feature of our learning dynamics is that agents update their Q-function estimates at a faster timescale than their policies. Leveraging tools from two-timescale asynchronous stochastic approximation theory, we characterize the convergent set of the learning dynamics.

  • Privacy Preserving Mechanisms for Coordinating Airspace Usage in Advanced Air Mobility ACM Journal on Autonomous Transportation Systems, 2025.
    Co‑authored with Maria Mendoza, Victoria Tuck, Pan‑Yang Su, Victor Qin, Sanjit Seshia, Hamsa Balakrishnan and Shankar Sastry.

    View paper

    Abstract: Advanced Air Mobility (AAM) operations are expected to transform air transportation while challenging current air traffic management practices. By introducing a novel market-based mechanism, we address the problem of on-demand allocation of capacity-constrained airspace to AAM vehicles with heterogeneous and private valuations. We model airspace and air infrastructure as a collection of contiguous regions (or sectors) with constraints on the number of vehicles that simultaneously enter, stay, or exit each region. Vehicles request access to airspace with trajectories spanning multiple regions at different times. We use the graph structure of our airspace model to formulate the allocation problem as a path allocation problem on a time-extended graph. To ensure that the cost information of AAM vehicles remains private, we introduce a novel mechanism that allocates each vehicle a budget of “air-credits” (an artificial currency) and anonymously charges prices for traversing the edges of the time-extended graph. We seek to compute a competitive equilibrium that ensures that: (i) capacity constraints are satisfied, (ii) a strictly positive resource price implies that the sector capacity is fully utilized, and (iii) the allocation is integral and optimal for each AAM vehicle given current prices, without requiring access to individual vehicle utilities. However, a competitive equilibrium with integral allocations may not always exist. We provide sufficient conditions for the existence and computation of a fractional-competitive equilibrium, where allocations can be fractional. Building on these theoretical insights, we propose a distributed, iterative, two-step algorithm that: (1) computes a fractional competitive equilibrium, and (2) derives an integral allocation from this equilibrium. We validate the effectiveness of our approach in allocating trajectories for the emerging urban air mobility service of drone delivery.

  • Convergence of Decentralized Actor–Critic Algorithm in General‑Sum Markov Games IEEE Control Systems Letters and American Control Conference, 2025.
    Co‑authored with Manxi Wu and Shankar Sastry.

    View paper

    Abstract: Markov games provide a powerful framework for modeling strategic multi-agent interactions in dynamic environments. Traditionally, convergence properties of decentralized learning algorithms in these settings have been established only for special cases, such as Markov zero-sum and potential games, which do not fully capture real-world interactions. In this letter, we address this gap by studying the asymptotic properties of learning algorithms in general-sum Markov games. In particular, we focus on a decentralized algorithm where each agent adopts an actor-critic learning dynamic with asynchronous step sizes. This decentralized approach enables agents to operate independently, without requiring knowledge of others’ strategies or payoffs. We introduce the concept of a Markov Near-Potential Function (MNPF) and demonstrate that it serves as an approximate Lyapunov function for the policy updates in the decentralized learning dynamics, which allows us to characterize the convergent set of strategies. We further strengthen our result under specific regularity conditions and with finite Nash equilibria.

  • Stabilization under round-robin scheduling of control inputs in nonlinear systems Automatica, 2021.
    Co‑authored with Sukumar Srikant and Debasish Chatterjee.

    View paper

    Abstract: We study stability of multivariable control-affine nonlinear systems under sparsification of feedback controllers. Given a stabilizing feedback, sparsification in our context refers to the scheduling of the individual control inputs one at a time in rapid periodic sweeps over the set of control inputs, which corresponds to round-robin scheduling. We prove that if a locally asymptotically stabilizing feedback controller is sparsified via the round-robin scheme and each control action is scaled appropriately, then the corresponding equilibrium of the resulting system is stabilized when the scheduling is sufficiently fast; under mild additional conditions, local asymptotic stabilization of the corresponding equilibrium can also be guaranteed. Moreover, the basin of attraction of the equilibrium of the scheduled system remains the same as that of the original system under sufficiently fast switching. Our technical tools are derived from optimal control theory, and our results also contribute to the literature on the stability of switched systems in the fast switching regime. Illustrative numerical examples depicting several subtle features of our results are included.

  • Optimal Multiplexing of Discrete-Time Constrained Control Systems on Matrix Lie Groups IEEE Transactions on Automatic Control, 2020.
    Co‑authored with Sukumar Srikant and Debasish Chatterjee.

    View paper

    Abstract: In this article, we study a constrained optimal control problem for an ensemble of control systems in a centralized setting. Each system evolves on a matrix Lie group, and must satisfy given state and control action constraints pointwise in time. In addition, the controller must be shared between the plants in the sense that at any time instant the control signal may be sent to only one plant while minimizing a given objective function. We provide first-order necessary conditions for optimality in the form of a Pontryagin maximum principle for such optimal control problems.

  • Effect of Progressive Tool Wear on the Evolution of the Dynamic Stability Limits in High-Speed Micromilling of Ti-6Al-4V J. Manuf. Sci. Eng., 2019.
    Co‑authored with Rinku Mittal, Salil Kulkarni and Ramesh Singh.

    View paper

    Abstract: Micromilling can fabricate complex features in a wide range of engineering materials with an excellent finish but the limited flexural stiffness of the micro-end mill can result in catastrophic tool failure. This issue can be overcome by using high rotational speeds. Note that the combination of high rotational speeds and low flexural stiffness can induce process instability which is aggravated by the accelerated wear of the micro-tools at high speeds, specifically, for Ti-alloys. The effect of progressive tool wear on the stability has been investigated in micromilling of Ti-6Al-4V. For incorporating tool wear, the cutting force coefficients are modeled as a function of initial and instantaneous cutting edge radius (CER) and feed per tooth. The initial CER of the micro-tool is considered due to the inherent variability in the tool grinding process. A significant increase (85–114%) in the instantaneous CER is observed with an increase in the length of cut. A 2DOF time-domain model based on semi-discretization method has been used to characterize the evolution of stability limits with an increase in the length of cut. The progressive tool wear affects the stability limits along with the initial CER and the feed per tooth. At higher speeds (90,000–110,000 rpm), the effect of progressive tool wear is pronounced and the stability limits reduce by ∼30% in that range.

  • Machine Learning Enabled Computational Screening of Inorganic Solid Electrolytes for Suppression of Dendrite Formation in Lithium Metal Anodes ACS Central Science, 2018.
    Co‑authored with Zeeshan Ahmad, Tian Xie, Jeffrey Grossman and Venkat Viswanathan.

    View paper

    Abstract: Next generation batteries based on lithium (Li) metal anodes have been plagued by the dendritic electrodeposition of Li metal on the anode during cycling, resulting in short circuit and capacity loss. Suppression of dendritic growth through the use of solid electrolytes has emerged as one of the most promising strategies for enabling the use of Li metal anodes. We perform a computational screening of over 12,000 inorganic solids based on their ability to suppress dendrite initiation in contact with Li metal anode. Properties for mechanically isotropic and anisotropic interfaces that can be used in stability criteria for determining the propensity of dendrite initiation are usually obtained from computationally expensive first-principles methods. In order to obtain a large data set for screening, we use machine-learning models to predict the mechanical properties of several new solid electrolytes. The machine-learning models are trained on purely structural features of the material, which do not require any first-principles calculations. We train a graph convolutional neural network on the shear and bulk moduli because of the availability of a large training data set with low noise due to low uncertainty in their first-principles-calculated values. We use gradient boosting regressor and kernel ridge regression to train the elastic constants, where the choice of the model depends on the size of the training data and the noise that it can handle. The material stiffness is found to increase with an increase in mass density and ratio of Li and sublattice bond ionicity, and decrease with increase in volume per atom and sublattice electronegativity. Cross-validation/test performance suggests our models generalize well. We predict over 20 mechanically anisotropic interfaces between Li metal and four solid electrolytes which can be used to suppress dendrite growth. Our screened candidates are generally soft and highly anisotropic, and present opportunities for simultaneously obtaining dendrite suppression and high ionic conductivity in solid electrolytes.

Conference publications

  • α‑RACER: Real‑Time Algorithm for Game‑Theoretic Motion Planning and Control in Autonomous Racing using Near‑Potential Function Learning for Decision and Control (L4DC), 2025.
    Co‑authored with Dvij Kalaria and Shankar Sastry.

    View paper

    Abstract: Autonomous racing extends beyond the challenge of controlling a racecar at its physical limits. Professional racers employ strategic maneuvers to outwit competing opponents and secure victory. While modern control algorithms can achieve human-level performance by computing offline racing lines for single-car scenarios, research on real-time algorithms for multi-car autonomous racing is limited. To bridge this gap, we develop a game-theoretic modeling framework that incorporates the competitive aspects of autonomous racing, such as overtaking and blocking, through a novel policy parametrization, while operating the car at its limits. Furthermore, we propose an algorithmic approach to compute the (approximate) Nash equilibrium strategy, which represents the optimal approach in the presence of competing agents. Specifically, we introduce an algorithm inspired by the recently introduced framework of dynamic near-potential functions, enabling real-time computation of the Nash equilibrium. Our approach comprises two phases: offline and online. During the offline phase, we use simulated racing data to learn a near-potential function that approximates utility changes for agents. This function facilitates the online computation of approximate Nash equilibria by maximizing its value. We evaluate our method in a head-to-head 3-car racing scenario, demonstrating superior performance compared to several existing baselines.

  • Incentive‑Compatible Vertiport Reservation in Advanced Air Mobility: An Auction‑Based Approach Conference on Decision and Control (CDC), 2024.
    Co‑authored with Pan‑Yang Su, Victoria Tuck and Shankar Sastry.

    View paper

    Abstract: The rise of advanced air mobility (AAM) is expected to become a multibillion-dollar industry in the near future. Market-based mechanisms are touted to be an integral part of AAM operations, which comprise heterogeneous operators with private valuations. In this work, we study the problem of designing a mechanism to coordinate the movement of electric vertical take-off and landing (eVTOL) aircraft, operated by multiple operators each having heterogeneous valuations associated with their fleet, between vertiports, while enforcing the arrival, departure, and parking constraints at vertiports. Particularly, we propose an incentive-compatible and individually rational vertiport reservation mechanism that maximizes a social welfare metric, which encapsulates the objective of maximizing the overall valuations of all operators while minimizing the congestion at vertiports. Additionally, we improve the computational tractability of designing the reservation mechanism by proposing a mixed binary linear programming approach that leverages the network flow structure.

  • Understanding the Impact of Coalitions between EV Charging Stations Conference on Decision and Control (CDC), 2024.
    Co‑authored with Sukanya Kudva, Kshitij Kulkarni, Anil Aswani and Shankar Sastry.

    View paper

    Abstract: The rapid growth of electric vehicles (EVs) is driving the expansion of charging infrastructure globally. As charging stations become ubiquitous, their substantial electricity consumption can influence grid operation and electricity pricing. Naturally, some groups of charging stations, which could be jointly operated by a company, may coordinate to decide their charging profile. While coordination among all charging stations is ideal, it is unclear if coordination of some charging stations is better than no coordination. In this paper, we analyze this intermediate regime between no and full coordination of charging stations. We model EV charging as a non-cooperative aggregative game, where each station's cost is determined by both monetary payments tied to reactive electricity prices on the grid and its sensitivity to deviations from a desired charging profile. We consider a solution concept that we call C-Nash equilibrium, which is tied to a coalition C of charging stations coordinating to reduce their costs. We provide sufficient conditions, in terms of the demand and sensitivity of charging stations, to determine when independent (i.e., uncoordinated) operation of charging stations could result in lower overall costs to all charging stations, both those in the coalition and those outside it. Somewhat counter to common intuition, we show numerical instances where allowing charging stations to operate independently is better than coordinating a subset of stations as a coalition. Jointly, these results provide operators of charging stations insights into how to coordinate their charging behavior, and open several research directions.

  • Follower Agnostic Learning in Stackelberg Games Conference on Decision and Control (CDC), 2024.
    Co‑authored with James Cheng, Shankar Sastry, Lillian Ratliff and Eric Mazumdar.

    View paper

    Abstract: In this paper, we present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner. Unlike previous works, our approach works even when the leader has no knowledge of the followers' utility functions, strategy spaces, or learning algorithms. Our algorithm introduces a unique gradient estimator, leveraging specially designed strategies to probe followers. In a departure from traditional assumptions of optimal play, we model followers' responses using a convergent adaptation rule, allowing for realistic and dynamic interactions. The leader constructs the gradient estimator solely based on observations of followers' actions. We provide both non-asymptotic convergence rates to stationary points of the leader's objective and demonstrate asymptotic convergence to a local Stackelberg equilibrium. To validate the effectiveness of our algorithm, we use it to solve the problem of incentive design on a large-scale transportation network, showcasing its robustness even when the leader lacks access to followers' demand information.

  • Dynamic Tolling in Arc‑based Traffic Assignment Models Allerton Conference on Communication, Control and Computing, 2023.
    Co‑authored with Chih‑Yuan Chiu, Pan‑Yang Su and Shankar Sastry.

    View paper

    Abstract: Tolling in traffic networks offers a popular measure to minimize overall congestion. Existing toll designs primarily focus on congestion in route-based traffic assignment models (TAMs), in which travelers make a single route selection from source to destination. However, these models do not reflect real-world traveler decisions because they preclude deviations from a chosen route, and because the enumeration of all routes is computationally expensive. To address these limitations, our work focuses on arc-based TAMs, in which travelers sequentially select individual arcs (or edges) on the network to reach their destination. We first demonstrate that marginal pricing, a tolling scheme commonly used in route-based TAMs, also achieves socially optimal congestion levels in our arc-based formulation. Then, we use perturbed best response dynamics to model the evolution of travelers’ arc selection preferences over time, and a marginal pricing scheme to capture the social planner’s adaptive toll updates in response. We prove that our adaptive learning and marginal pricing dynamics converge to a neighborhood of the socially optimal loads and tolls. We then present empirical results that verify our theoretical claims.
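As a minimal illustration of marginal-cost tolling under perturbed best-response dynamics (a classic two-link Pigou example of my own choosing, not the paper's arc-based network; the logit temperature and step size are assumptions):

```python
import math

# Two-link Pigou network (illustrative example): latencies l1(x) = 1 and
# l2(x) = x, with unit demand split as (1 - x2, x2).
# Marginal-cost toll on edge e: tau_e = x_e * l_e'(x_e); at the social
# optimum x2 = 1/2 this gives tau2 = 1/2 (and tau1 = 0).

def perturbed_br_equilibrium(tau2, eta=0.05, steps=5000, temp=0.01):
    """Damped logit (perturbed best-response) dynamics on the link-2 load."""
    x2 = 0.5
    for _ in range(steps):
        c1, c2 = 1.0, x2 + tau2                        # tolled link costs
        br = 1.0 / (1.0 + math.exp((c2 - c1) / temp))  # logit response
        x2 += eta * (br - x2)                          # damped update
    return x2

x_untolled = perturbed_br_equilibrium(0.0)  # near all-demand-on-link-2
x_tolled = perturbed_br_equilibrium(0.5)    # near the social optimum 1/2
```

Without the toll, nearly all traffic piles onto the variable-latency link; with the marginal-cost toll, the perturbed dynamics settle at the socially optimal split.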

  • Arc-Based Traffic Assignment: Equilibrium Characterization and Learning Conference on Decision and Control (CDC), 2023.
    Co‑authored with Chih‑Yuan Chiu, Pan‑Yang Su and Shankar Sastry.

    View paper

    Abstract: Arc-based traffic assignment models (TAMs) are a popular framework for modeling traffic network congestion generated by self-interested travelers who sequentially select arcs based on their perceived latency on the network. However, existing arc-based TAMs either assign travelers to cyclic paths, or do not extend to networks with bidirectional arcs (edges) between nodes. To overcome these difficulties, we propose a new modeling framework for stochastic arc-based TAMs. Given a traffic network with bidirectional arcs, we replicate its arcs and nodes to construct a directed acyclic graph (DAG), which we call the Condensed DAG (CoDAG) representation. Self-interested travelers sequentially select arcs on the CoDAG representation to reach their destination. We show that the associated equilibrium flow, which we call the Condensed DAG equilibrium, exists, is unique, and can be characterized as a strictly convex optimization problem. Moreover, we propose a discrete-time dynamical system that captures a natural adaptation rule employed by self-interested travelers to learn about the emergent congestion on the network. We show that the arc flows generated by this adaptation rule converge to a neighborhood of the Condensed DAG equilibrium. To our knowledge, our work is the first to study learning and adaptation in an arc-based TAM. Finally, we present numerical results that corroborate our theoretical results.

  • Competing Bandits in Time Varying Matching Markets Learning for Decision and Control (L4DC), 2023.
    Co‑authored with Deepan Muthirayan, Pramod Khargonekar and Shankar Sastry.

    View paper

    Abstract: We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has a fixed, known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study how to learn when the preferences of the players are time-varying and unknown. Our contribution is a methodology that can handle any type of preference structure and variation scenario. We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of \(\tilde{\mathcal{O}}(L^{1/2}_T T^{1/2})\), where \(L_T\) is the number of changes in the underlying preferences of the agents. Therefore, we show that the optimal rates for single-agent learning can be achieved in spite of the competition, up to a constant factor. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.

  • Online Learning for Traffic Navigation in Congested Networks Algorithmic Learning Theory (ALT), 2023.
    Co‑authored with Sreenivas Gollapudi, Kostas Kollias and Manxi Wu.

    View paper

    Abstract: We develop an online learning algorithm for a navigation platform to route travelers in a congested network with multiple origin-destination (o-d) pairs while simultaneously learning unknown cost functions of road segments (edges) using the crowd-sourced data. The number of travel requests is randomly realized, and the travel time of each edge is stochastically distributed with the mean being a linear function that increases with the edge load (the number of travelers who take the edge). In each step of our algorithm, the platform updates the estimates of cost function parameters using the collected travel time data, and maintains a rectangular confidence interval of each parameter. The platform then routes travelers in the next step using an optimistic strategy based on the lower bound of the up-to-date confidence interval. The key aspects of our setting include (i) the size and the spatial distribution of collected travel time data depend on travelers’ routing strategies; (ii) we evaluate the regret of our algorithm for platforms with different objectives, ranging from minimizing the social cost to minimizing the individual cost of self-interested users. We prove that the regret upper bound of our algorithm is \(\mathcal{O}(\sqrt{T}\log(T)|E|)\), where \(T\) is the time horizon, and \(|E|\) is the number of edges in the network. Furthermore, we show that the regret bound decreases as the number of travelers increases, which implies that the platform learns faster with a larger user population. Finally, we implement our algorithm on the network of New York City, and demonstrate the efficacy of the proposed algorithm.
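A stripped-down sketch of the optimistic-routing idea (two parallel edges, scalar slope estimates, and an ad hoc 1/sqrt(n) confidence bonus are all my simplifying assumptions; the paper's algorithm maintains rectangular confidence intervals over a full network):

```python
import random

# Sketch: two parallel edges with mean travel time theta_e per unit load.
# The platform routes on the edge with the smallest optimistic
# (lower-confidence-bound) cost and refines its slope estimates from
# noisy travel-time observations.

def optimistic_routing(theta=(1.0, 2.0), horizon=2000, seed=0):
    rng = random.Random(seed)
    n = [0, 0]        # number of observations per edge
    est = [0.0, 0.0]  # empirical estimates of the slopes theta_e
    picks = [0, 0]
    for _ in range(horizon):
        # lower confidence bound; unobserved edges are maximally optimistic
        lcb = [est[e] - 1.0 / n[e] ** 0.5 if n[e] else float("-inf")
               for e in range(2)]
        e = lcb.index(min(lcb))
        picks[e] += 1
        obs = theta[e] + rng.gauss(0.0, 0.1)  # noisy travel time, unit load
        n[e] += 1
        est[e] += (obs - est[e]) / n[e]       # running-mean update
    return picks, est

picks, est = optimistic_routing()
```

The optimism forces a brief trial of the slower edge, after which almost all traffic is routed on the genuinely faster one, whose slope estimate sharpens accordingly.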

  • Inducing Social Optimality in Games via Adaptive Incentive Design Conference on Decision and Control (CDC), 2022.
    Co‑authored with Kshitij Kulkarni, Manxi Wu and Shankar Sastry.

    View paper

    Abstract: How can a social planner adaptively incentivize selfish agents who are learning in a strategic environment to induce a socially optimal outcome in the long run? We propose a two-timescale learning dynamics to answer this question in games. In our learning dynamics, players adopt a class of learning rules to update their strategies at a faster timescale, while a social planner updates the incentive mechanism at a slower timescale. In particular, the update of the incentive mechanism is based on each player’s externality, which is evaluated as the difference between the player’s marginal cost and the society’s marginal cost in each time step. We show that any fixed point of our learning dynamics corresponds to the optimal incentive mechanism such that the corresponding Nash equilibrium also achieves social optimality. We also provide sufficient conditions for the learning dynamics to converge to a fixed point so that the adaptive incentive mechanism eventually induces a socially optimal outcome. Finally, as an example, we demonstrate that the sufficient conditions for convergence are satisfied in Cournot competition with finite players.
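A toy two-timescale loop in the spirit of this abstract, under invented assumptions: a symmetric two-player quadratic congestion game, gradient-based player updates on the fast timescale, and a slow toll update that tracks each player's externality:

```python
import numpy as np

# Hypothetical game: players choose loads x1, x2 >= 0, latency equals total
# load Q, each unit is worth v to its player. Player i's (tolled) cost is
# Q * x_i - v * x_i + tau_i * x_i.
v = 4.0
x = np.array([1.0, 1.0])
tau = np.zeros(2)
eta_fast, eta_slow = 0.05, 0.005   # separated step sizes = two timescales

for _ in range(20000):
    Q = x.sum()
    # Fast timescale: players follow the gradient of their own tolled cost.
    grad = (Q + x) - v + tau             # d/dx_i [Q x_i - v x_i + tau_i x_i]
    x = np.maximum(x - eta_fast * grad, 0.0)
    # Slow timescale: planner moves each toll toward the player's externality,
    # society's marginal cost 2Q minus the player's own marginal cost (Q + x_i).
    externality = 2 * x.sum() - (x.sum() + x)   # equals the opponent's load
    tau += eta_slow * (externality - tau)

# The socially optimal total load minimizes Q^2 - v*Q, i.e. Q* = v / 2 = 2.
```

At the fixed point the tolls equal the externalities and the induced Nash load coincides with the social optimum, which is the structure the abstract describes.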

  • Decentralized, communication- and coordination-free learning in structured matching markets International Conference on Neural Information Processing Systems (NeurIPS), 2022.
    Co‑authored with Shankar Sastry and Eric Mazumdar.

    View paper

    Abstract: We study the problem of online learning in competitive settings in the context of two-sided matching markets. In particular, one side of the market, the agents, must learn their preferences over the other side, the firms, through repeated interaction while competing with other agents for successful matches. We propose a class of decentralized, communication- and coordination-free algorithms that agents can use to reach their stable match in structured matching markets. In contrast to prior work, the proposed algorithms make decisions based solely on an agent's own history of play and require no foreknowledge of the firms' preferences. Our algorithms are constructed by splitting the problem into a statistical component, learning one's preferences from noisy observations, and a competitive component, contending with other agents for firms. We show that under realistic structural assumptions on the underlying preferences of the agents and firms, the proposed algorithms incur a regret which grows at most logarithmically in the time horizon. However, we note that in the worst case it may grow exponentially in the size of the market.

  • Dynamic Tolling for Inducing Socially Optimal Traffic Loads American Control Conference (ACC), 2022.
    Co‑authored with Kshitij Kulkarni, Manxi Wu and Shankar Sastry.

    View paper

    Abstract: How should tolls be designed to induce socially optimal traffic loads when dynamically arriving travelers make selfish routing decisions? We propose two-timescale discrete-time stochastic dynamics that adaptively adjust the toll prices on a parallel-link network while accounting for the updates of traffic loads induced by the incoming and outgoing travelers and their route choices. The updates of loads and tolls in our dynamics have three key features: (i) the total demand of incoming and outgoing travelers is stochastically realized; (ii) travelers are myopic and selfish in that they choose routes according to a perturbed best response given the current latency and tolls on parallel links; (iii) the tolls are updated at a slower timescale than the loads. We show that the loads and the tolls eventually concentrate in a neighborhood of the fixed point, which corresponds to the socially optimal load and toll price. Moreover, the fixed-point load is also a stochastic user equilibrium with respect to the toll price. Our results can help traffic authorities efficiently manage traffic loads in response to the arrival and departure of travelers.

  • Zeroth-Order Methods for Convex-Concave Min-max Problems: Applications to Decision-Dependent Risk Minimization International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
    Co‑authored with Chih-Yuan Chiu, Eric Mazumdar, Shankar Sastry and Lillian Ratliff.

    View paper

    Abstract: Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose the random reshuffling-based gradient-free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite sum structure. We prove that the algorithm enjoys the same convergence rate as that of zeroth-order algorithms for convex minimization problems. We deploy the algorithm to solve the distributionally robust strategic classification problem, where gradient information is not readily available, by reformulating the latter into a finite-dimensional convex-concave min-max problem. Through illustrative simulations, we observe that our proposed approach learns models that are simultaneously robust against adversarial distribution shifts and strategic decisions from the data sources, and outperforms existing methods from the strategic classification literature.
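A minimal sketch of the two ingredients, a two-point zeroth-order gradient estimate and an optimistic gradient step, on an invented toy convex-concave objective (not the paper's strategic classification problem, and without the random-reshuffling finite-sum structure):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy convex-concave objective with saddle point at (0, 0).
def f(x, y):
    return 0.5 * x**2 + x * y - 0.5 * y**2

def zo_grad(x, y, delta=1e-3):
    # Two-point random-direction estimates of grad_x f and grad_y f,
    # using only function evaluations (no gradient oracle).
    u, w = rng.standard_normal(), rng.standard_normal()
    gx = (f(x + delta * u, y) - f(x - delta * u, y)) / (2 * delta) * u
    gy = (f(x, y + delta * w) - f(x, y - delta * w)) / (2 * delta) * w
    return gx, gy

x, y = 2.0, -2.0
gx_prev, gy_prev = 0.0, 0.0
eta = 0.05
for _ in range(5000):
    gx, gy = zo_grad(x, y)
    # Optimistic update: extrapolate using the previous gradient estimate,
    # descending in x and ascending in y.
    x -= eta * (2 * gx - gx_prev)
    y += eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy
```

The iterates drift to the saddle point even though the algorithm never queries a gradient, which is the regime the abstract targets.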

  • Round-robin temporal scheduling of exponentially stabilizing controllers Asian Control Conference (ASCC), 2019.
    Co‑authored with Sukumar Srikant and Debasish Chatterjee.

    View paper

    Abstract: We study the behavior of a control-affine nonlinear system under periodic switching of the active control channel. The periodic switching between the control channels is implemented in a round-robin fashion. We claim that if a globally exponentially stabilizing feedback controller is sparsified using the round-robin scheme and the control action to each channel is scaled appropriately, then the resulting system is stabilized. As a numerical proof of concept, we illustrate the effect of such sparsification on a linearized model of a coupled inverted pendulum–simple harmonic oscillator system.
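The round-robin scaling idea can be illustrated on an invented toy system (a marginally stable oscillator with identity input matrix and feedback u = -x, not the pendulum-oscillator model from the paper): each integration step activates one input channel and scales its command by the number of channels, and the state still decays:

```python
import numpy as np

# Hypothetical example: harmonic oscillator xdot = A x + u with two input
# channels (B = I) and stabilizing full-state feedback u = -K x.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
K = np.eye(2)
m, dt = 2, 0.001   # m channels; round-robin period = one Euler step

x = np.array([1.0, 1.0])
norms = [np.linalg.norm(x)]
for k in range(20000):
    u_full = -K @ x
    u = np.zeros(2)
    ch = k % m                 # only one channel is active per step...
    u[ch] = m * u_full[ch]     # ...with its input scaled by the channel count
    x = x + dt * (A @ x + u)   # forward-Euler integration
    norms.append(np.linalg.norm(x))
```

Averaged over one round-robin cycle, the scaled sparse inputs reproduce the full feedback -Kx, which is why fast switching preserves stability here.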

Teaching Experience

I am fortunate to have contributed to teaching and content creation in the following courses:

As Instructor

As Teaching Assistant

Contact

Feel free to reach out if you are interested in collaboration, advising, or co‑advising opportunities.

Email: chinmay_maheshwari@jhu.edu

Office: S619, Wyman Park Building, Baltimore, MD

Google Scholar | ORCID | ResearchGate