new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Jun 16

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

While significant progress has been made in single-view 3D human pose estimation, multi-view 3D human pose estimation remains challenging, particularly in terms of generalizing to new camera configurations. Existing attention-based transformers often struggle to accurately model the spatial arrangement of keypoints, especially in occluded scenarios. Additionally, they tend to overfit specific camera arrangements and visual scenes from training data, resulting in substantial performance drops in new settings. In this study, we introduce a novel Multi-View State Space Modeling framework, named MV-SSM, for robustly estimating 3D human keypoints. We explicitly model the joint spatial sequence at two distinct levels: the feature level from multi-view images and the person keypoint level. We propose a Projective State Space (PSS) block to learn a generalized representation of joint spatial arrangements using state space modeling. Moreover, we modify Mamba's traditional scanning into an effective Grid Token-guided Bidirectional Scanning (GTBS), which is integral to the PSS block. Multiple experiments demonstrate that MV-SSM achieves strong generalization, outperforming state-of-the-art methods: +10.8 on AP25 (+24%) on the challenging three-camera setting in CMU Panoptic, +7.0 on AP25 (+13%) on varying camera arrangements, and +15.3 PCP (+38%) on Campus A1 in cross-dataset evaluations. Project Website: https://aviralchharia.github.io/MV-SSM

On composition and decomposition operations for vector spaces, graphs and matroids

In this paper, we study the ideas of composition and decomposition in the context of vector spaces, graphs and matroids. For vector spaces V_{AB}, treated as collection of row vectors, with specified column set Auplus B, we define V_{SP}lrarv V_{PQ}, Scap Q= emptyset, to be the collection of all vectors (f_S,f_Q) such that (f_S,f_P)in V_{SP}, (f_P,f_Q)in V_{PQ}. An analogous operation G_{SP}lrarg G_{PQ}equivd G_{PQ} can be defined in relation to graphs G_{SP}, G_{PQ}, on edge sets Suplus P, Puplus Q, respectively in terms of an overlapping subgraph G_P which gets deleted in the right side graph (see for instance the notion of k-sum oxley). For matroids we define the `linking' M_{SP}lrarm M_{PQ} equivd (M_{SP}vee M_{PQ})times (Suplus Q), denoting the contraction operation by 'times'. In each case, we examine how to minimize the size of the `overlap' set P, without affecting the right side entity. In the case of vector spaces, there is a polynomial time algorithm for achieving the minimum, which we present. Similar ideas work for graphs and for matroids under appropriate conditions. Next we consider the problem of decomposition. Here, in the case of vector spaces, the problem is to decompose V_{SQ} as V_{SP}lrarv V_{PQ}, with minimum size P. We give a polynomial time algorithm for this purpose. In the case of graphs and matroids we give a solution to this problem under certain restrictions.

  • 1 authors
·
Jul 13, 2023

Yet another argument in favour of NP=CoNP

This article shows yet another proof of NP=CoNP$. In a previous article, we proved that NP=PSPACE and from it we can conclude that NP=CoNP immediately. The former proof shows how to obtain polynomial and, polynomial in time checkable Dag-like proofs for all purely implicational Minimal logic tautologies. From the fact that Minimal implicational logic is PSPACE-complete we get the proof that NP=PSPACE. This first proof of NP=CoNP uses Hudelmaier linear upper-bound on the height of Sequent Calculus minimal implicational logic proofs. In an addendum to the proof of NP=PSPACE, we observe that we do not need to use Hudelmaier upper-bound since any proof of non-hamiltonicity for any graph is linear upper-bounded. By the CoNP-completeness of non-hamiltonicity, we obtain NP=CoNP as a corollary of the first proof. In this article we show the third proof of CoNP=NP, also providing polynomial size and polynomial verifiable certificates that are Dags. They are generated from normal Natural Deduction proofs, linear height upper-bounded too, by removing redundancy, i.e., repeated parts. The existence of repeated parts is a consequence of the redundancy theorem for a family of super-polynomial proofs in the purely implicational Minimal logic. It is mandatory to read at least two previous articles to get the details of the proof presented here. The article that proves the redundancy theorem and the article that shows how to remove the repeated parts of a normal Natural Deduction proof to have a polynomial Dag certificate for minimal implicational logic tautologies.

  • 1 authors
·
Dec 28, 2020

Finding Kissing Numbers with Game-theoretic Reinforcement Learning

Since Isaac Newton first studied the Kissing Number Problem in 1694, determining the maximal number of non-overlapping spheres around a central sphere has remained a fundamental challenge. This problem is the local analogue of Hilbert's 18th problem, bridging geometry, number theory, and information theory. Although significant progress has been made through lattices and codes, the irregularities of high-dimensional geometry, dimensional structure variability, and combinatorial explosion beyond Go limit the scalability and generality of existing methods. Here we model the problem as a two-player matrix completion game and train the reinforcement learning system, PackingStar, to play the games. The matrix entries represent pairwise cosines of sphere center vectors. One player fills entries while another corrects suboptimal ones to improve exploration quality, cooperatively maximizing the matrix size, corresponding to the kissing number. These matrices are decomposed into representative substructures, providing diverse bases and structural constraints that steer subsequent games and make extremely large spaces tractable. PackingStar surpasses records from dimensions 25 to 31 and sets new lower bounds for generalized kissing numbers under various angular constraints. It achieves the first breakthrough beyond rational structures from 1971 in 13 dimensions and discovers over 6000 new structures in other dimensions. Notably, some configurations challenge long-held antipodal paradigms, revealing algebraic correspondences with finite simple groups as well as geometric relationships across dimensions. Inspired by these patterns, humans devised further improved constructions. These results demonstrate AI's power to explore high-dimensional spaces beyond human intuition via extreme-scale reinforcement learning and open new pathways for the Kissing Number Problem and broader geometry research.

  • 9 authors
·
Feb 10

Hyperbolic Category Discovery

Generalized Category Discovery (GCD) is an intriguing open-world problem that has garnered increasing attention. Given a dataset that includes both labelled and unlabelled images, GCD aims to categorize all images in the unlabelled subset, regardless of whether they belong to known or unknown classes. In GCD, the common practice typically involves applying a spherical projection operator at the end of the self-supervised pretrained backbone, operating within Euclidean or spherical space. However, both of these spaces have been shown to be suboptimal for encoding samples that possesses hierarchical structures. In contrast, hyperbolic space exhibits exponential volume growth relative to radius, making it inherently strong at capturing the hierarchical structure of samples from both seen and unseen categories. Therefore, we propose to tackle the category discovery challenge in the hyperbolic space. We introduce HypCD, a simple Hyperbolic framework for learning hierarchy-aware representations and classifiers for generalized Category Discovery. HypCD first transforms the Euclidean embedding space of the backbone network into hyperbolic space, facilitating subsequent representation and classification learning by considering both hyperbolic distance and the angle between samples. This approach is particularly helpful for knowledge transfer from known to unknown categories in GCD. We thoroughly evaluate HypCD on public GCD benchmarks, by applying it to various baseline and state-of-the-art methods, consistently achieving significant improvements.

  • 3 authors
·
Apr 8, 2025

Denotational validation of higher-order Bayesian inference

We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such inference algorithms as manipulating intermediate representations of probabilistic programs using higher-order functions and inductive types, and their denotational semantics. Semantic accounts of continuous distributions use measurable spaces. However, our use of higher-order functions presents a substantial technical difficulty: it is impossible to define a measurable space structure over the collection of measurable functions between arbitrary measurable spaces that is compatible with standard operations on those functions, such as function application. We overcome this difficulty using quasi-Borel spaces, a recently proposed mathematical structure that supports both function spaces and continuous distributions. We define a class of semantic structures for representing probabilistic programs, and semantic validity criteria for transformations of these representations in terms of distribution preservation. We develop a collection of building blocks for composing representations. We use these building blocks to validate common inference algorithms such as Sequential Monte Carlo and Markov Chain Monte Carlo. To emphasize the connection between the semantic manipulation and its traditional measure theoretic origins, we use Kock's synthetic measure theory. We demonstrate its usefulness by proving a quasi-Borel counterpart to the Metropolis-Hastings-Green theorem.

  • 10 authors
·
Nov 8, 2017

Lie Group Decompositions for Equivariant Neural Networks

Invariance and equivariance to geometrical transformations have proven to be very useful inductive biases when training (convolutional) neural network models, especially in the low-data regime. Much work has focused on the case where the symmetry group employed is compact or abelian, or both. Recent work has explored enlarging the class of transformations used to the case of Lie groups, principally through the use of their Lie algebra, as well as the group exponential and logarithm maps. The applicability of such methods to larger transformation groups is limited by the fact that depending on the group of interest G, the exponential map may not be surjective. Further limitations are encountered when G is neither compact nor abelian. Using the structure and geometry of Lie groups and their homogeneous spaces, we present a framework by which it is possible to work with such groups primarily focusing on the Lie groups G = GL^{+}(n, R) and G = SL(n, R), as well as their representation as affine transformations R^{n} rtimes G. Invariant integration as well as a global parametrization is realized by decomposing the `larger` groups into subgroups and submanifolds which can be handled individually. Under this framework, we show how convolution kernels can be parametrized to build models equivariant with respect to affine transformations. We evaluate the robustness and out-of-distribution generalisation capability of our model on the standard affine-invariant benchmark classification task, where we outperform all previous equivariant models as well as all Capsule Network proposals.

  • 2 authors
·
Oct 17, 2023

Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing

Sphere packing, Hilbert's eighteenth problem, asks for the densest arrangement of congruent spheres in n-dimensional Euclidean space. Although relevant to areas such as cryptography, crystallography, and medical imaging, the problem remains unresolved: beyond a few special dimensions, neither optimal packings nor tight upper bounds are known. Even a major breakthrough in dimension n=8, later recognised with a Fields Medal, underscores its difficulty. A leading technique for upper bounds, the three-point method, reduces the problem to solving large, high-precision semidefinite programs (SDPs). Because each candidate SDP may take days to evaluate, standard data-intensive AI approaches are infeasible. We address this challenge by formulating SDP construction as a sequential decision process, the SDP game, in which a policy assembles SDP formulations from a set of admissible components. Using a sample-efficient model-based framework that combines Bayesian optimisation with Monte Carlo Tree Search, we obtain new state-of-the-art upper bounds in dimensions 4-16, showing that model-based search can advance computational progress in longstanding geometric problems. Together, these results demonstrate that sample-efficient, model-based search can make tangible progress on mathematically rigid, evaluation limited problems, pointing towards a complementary direction for AI-assisted discovery beyond large-scale LLM-driven exploration.

  • 6 authors
·
Dec 4, 2025 2

Vietoris--Rips Shadow for Euclidean Graph Reconstruction

The shadow of an abstract simplicial complex K with vertices in R^N is a subset of R^N defined as the union of the convex hulls of simplices of K. The Vietoris--Rips complex of a metric space (S,d) at scale β is an abstract simplicial complex whose each k-simplex corresponds to (k+1) points of S within diameter β. In case Ssubsetmathbb R^2 and d(a,b)=|a-b| the standard Euclidean metric, the natural shadow projection of the Vietoris--Rips complex is already proved by Chambers et al. to induce isomorphisms on π_0 and π_1. We extend the result beyond the standard Euclidean distance on Ssubsetmathbb R^N to a family of path-based metrics, d^varepsilon_{S}. From the pairwise Euclidean distances of points in S, we introduce a family (parametrized by varepsilon) of path-based Vietoris--Rips complexes R^varepsilon_β(S) for a scale β>0. If SsubsetR^2 is Hausdorff-close to a planar Euclidean graph G, we provide quantitative bounds on scales β,varepsilon for the shadow projection map of the Vietoris--Rips complex of (S,d^varepsilon_S) at scale β to induce π_1-isomorphism. This paper first studies the homotopy-type recovery of Gsubsetmathbb R^N using the abstract Vietoris--Rips complex of a Hausdorff-close sample S under the d^varepsilon_S metric. Then, our result on the π_1-isomorphism induced by the shadow projection lends itself to providing also a geometrically close embedding for the reconstruction. Based on the length of the shortest loop and large-scale distortion of the embedding of G, we quantify the choice of a suitable sample density varepsilon and a scale β at which the shadow of R^varepsilon_β(S) is homotopy-equivalent and Hausdorff-close to G.

  • 3 authors
·
Jun 2, 2025

Fast, Expressive SE(n) Equivariant Networks through Weight-Sharing in Position-Orientation Space

Based on the theory of homogeneous spaces we derive geometrically optimal edge attributes to be used within the flexible message-passing framework. We formalize the notion of weight sharing in convolutional networks as the sharing of message functions over point-pairs that should be treated equally. We define equivalence classes of point-pairs that are identical up to a transformation in the group and derive attributes that uniquely identify these classes. Weight sharing is then obtained by conditioning message functions on these attributes. As an application of the theory, we develop an efficient equivariant group convolutional network for processing 3D point clouds. The theory of homogeneous spaces tells us how to do group convolutions with feature maps over the homogeneous space of positions R^3, position and orientations R^3 {times} S^2, and the group SE(3) itself. Among these, R^3 {times} S^2 is an optimal choice due to the ability to represent directional information, which R^3 methods cannot, and it significantly enhances computational efficiency compared to indexing features on the full SE(3) group. We support this claim with state-of-the-art results -- in accuracy and speed -- on five different benchmarks in 2D and 3D, including interatomic potential energy prediction, trajectory forecasting in N-body systems, and generating molecules via equivariant diffusion models.

  • 5 authors
·
Oct 4, 2023

Derivations and Sobolev functions on extended metric-measure spaces

We investigate the first-order differential calculus over extended metric-topological measure spaces. The latter are quartets mathbb X=(X,τ,{sf d},mathfrak m), given by an extended metric space (X,{sf d}) together with a weaker topology τ (satisfying suitable compatibility conditions) and a finite Radon measure mathfrak m on (X,τ). The class of extended metric-topological measure spaces encompasses all metric measure spaces and many infinite-dimensional metric-measure structures, such as abstract Wiener spaces. In this framework, we study the following classes of objects: - The Banach algebra {rm Lip}_b(X,τ,{sf d}) of bounded τ-continuous {sf d}-Lipschitz functions on X. - Several notions of Lipschitz derivations on X, defined in duality with {rm Lip}_b(X,τ,{sf d}). - The metric Sobolev space W^{1,p}(mathbb X), defined in duality with Lipschitz derivations on X. Inter alia, we generalise both Weaver's and Di Marino's theories of Lipschitz derivations to the extended setting, and we discuss their connections. We also introduce a Sobolev space W^{1,p}(mathbb X) via an integration-by-parts formula, along the lines of Di Marino's notion of Sobolev space, and we prove its equivalence with other approaches, studied in the extended setting by Ambrosio, Erbar and Savaré. En route, we obtain some results of independent interest, among which are: - A Lipschitz-constant-preserving extension result for τ-continuous {sf d}-Lipschitz functions. - A novel and rather robust strategy for proving the equivalence of Sobolev-type spaces defined via an integration-by-parts formula and those obtained with a relaxation procedure. - A new description of an isometric predual of the metric Sobolev space W^{1,p}(mathbb X).

  • 2 authors
·
Mar 3, 2025

Measuring the Intrinsic Dimension of Objective Landscapes

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.

  • 4 authors
·
Apr 24, 2018

Weighted least-squares approximation with determinantal point processes and generalized volume sampling

We consider the problem of approximating a function from L^2 by an element of a given m-dimensional space V_m, associated with some feature map varphi, using evaluations of the function at random points x_1,dots,x_n. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPP) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features varphi(x_i). We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples n = O(mlog(m)), that means that the expected L^2 error is bounded by a constant times the best approximation error in L^2. Also, further assuming that the function is in some normed vector space H continuously embedded in L^2, we further prove that the approximation is almost surely bounded by the best approximation error measured in the H-norm. This includes the cases of functions from L^infty or reproducing kernel Hilbert spaces. Finally, we present an alternative strategy consisting in using independent repetitions of projection DPP (or volume sampling), yielding similar error bounds as with i.i.d. or volume sampling, but in practice with a much lower number of samples. Numerical experiments illustrate the performance of the different strategies.

  • 2 authors
·
Dec 21, 2023

Understanding and Improving Hyperbolic Deep Reinforcement Learning

The performance of reinforcement learning (RL) agents depends critically on the quality of the underlying feature representations. Hyperbolic feature spaces are well-suited for this purpose, as they naturally capture hierarchical and relational structure often present in complex RL environments. However, leveraging these spaces commonly faces optimization challenges due to the nonstationarity of RL. In this work, we identify key factors that determine the success and failure of training hyperbolic deep RL agents. By analyzing the gradients of core operations in the Poincaré Ball and Hyperboloid models of hyperbolic geometry, we show that large-norm embeddings destabilize gradient-based training, leading to trust-region violations in proximal policy optimization (PPO). Based on these insights, we introduce Hyper++, a new hyperbolic PPO agent that consists of three components: (i) stable critic training through a categorical value loss instead of regression; (ii) feature regularization guaranteeing bounded norms while avoiding the curse of dimensionality from clipping; and (iii) using a more optimization-friendly formulation of hyperbolic network layers. In experiments on ProcGen, we show that Hyper++ guarantees stable learning, outperforms prior hyperbolic agents, and reduces wall-clock time by approximately 30%. On Atari-5 with Double DQN, Hyper++ strongly outperforms Euclidean and hyperbolic baselines. We release our code at https://github.com/Probabilistic-and-Interactive-ML/hyper-rl .

univie University of Vienna
·
Dec 16, 2025 2

Manifold k-NN: Accelerated k-NN Queries for Manifold Point Clouds

k-nearest neighbor (k-NN) search is a fundamental primitive in geometry processing and computer graphics. While spatial partitioning structures such as kd-trees are standard, they are often manifold-blind, failing to exploit the intrinsic low-dimensional structure of points sampled from 2-manifolds. Recent advances in dynamic programming-based nearest neighbor search (DP-NNS) leverage incrementally constructed Voronoi diagrams to accelerate queries, where each site p maintains a list of successors that progressively refine its Voronoi cell. However, DP-NNS is restricted to single nearest neighbor (k=1) searches, precluding their adoption in applications that require local neighborhood statistics. In this paper, we generalize the DP-NNS framework to support arbitrary k-NN queries for manifold-aligned data. Our approach is founded on the geometric observation that if p_i is the nearest neighbor of a query q in P, then the second nearest neighbor of q must reside either within the prefix set P_{1:i-1} = {p_1, \dots, p_{i-1}} or within p_i's successor list. By recursively extending this principle, we introduce Manifold k-NN, a recursive algorithmic scheme that significantly outperforms conventional kd-trees for manifold-aligned data. Our method achieves a 1\times--10\times speedup in volume-to-surface query scenarios and inherently supports dynamic prefix queries -- enabling k-NN searches within any subset P_{1:m} (m \leq n) with zero overhead. Furthermore, we extend the framework to support point deletion via local Delaunay updates, providing a complete suite of dynamic operations for point set modification. Comprehensive experiments on diverse geometric datasets demonstrate the efficiency and broad applicability of our approach for modern graphics pipelines. Source code is available at https://github.com/sssomeone/manifold-knn.

  • 7 authors
·
May 3