Research

Streaming Algorithms: Count-Min Sketch with Conservative Updates

Context. Online counting of item occurrences in high-rate data streams is essential for extracting key features (e.g., detecting top-k items). Since streams may contain billions of distinct items, algorithms must operate with sublinear memory, which inevitably introduces estimation errors. The Count-Min Sketch (CMS) is a widely used solution: it maintains a compact counter array that supports approximate frequency queries with a memory–accuracy trade-off. A refinement, CMS with Conservative Update (CMS-CU), improves accuracy but, unlike vanilla CMS, has long lacked theoretical guarantees.

Contributions. We provide tractable analytic frameworks that bound the frequency estimation error of CMS-CU in two complementary regimes:

  • Frequent items under stationary stochastic streams — see [C1], [J1].
  • Infrequent items under arbitrary streams — see [C5], [J3].

In both settings, the main obstacle is analyzing processes similar to balls-and-bins with the power of d choices, except that, on ties, all minimal bins among the d selected ones are incremented, rather than a single bin chosen by a tie-breaking rule. In [C1, J1], we show that the analysis boils down to a constrained variant of this process: a random hypergraph is drawn once, its vertices are the counters, each hyperedge encodes an admissible d-tuple, and at every step a hyperedge is selected according to a fixed, non-uniform probability distribution modeling the heterogeneity of the items in the stream. In [C5, J3], by contrast, all hyperedges are admissible and are selected with homogeneous probabilities.
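To make the update rule concrete, here is a minimal Python sketch of CMS-CU (class and parameter names are mine for illustration): a query returns the minimum of the d selected counters, and an update increments only the counters currently equal to that minimum, which is exactly the "increment all minimal bins" behavior described above.

```python
import hashlib

class CMSConservativeUpdate:
    """Count-Min Sketch with Conservative Update (CMS-CU), minimal sketch.

    depth rows x width counters; an update raises only the row counters
    equal to the current minimum, so estimates never fall below the true
    count but grow more slowly than in vanilla CMS."""

    def __init__(self, width: int, depth: int):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cols(self, item):
        # One hashed column per row (illustrative hashing, not production-grade).
        return [
            int.from_bytes(
                hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest(),
                "big",
            ) % self.width
            for row in range(self.depth)
        ]

    def query(self, item) -> int:
        # The minimum over rows is the least-overestimating counter.
        return min(self.table[r][c] for r, c in enumerate(self._cols(item)))

    def update(self, item):
        cols = self._cols(item)
        current = min(self.table[r][c] for r, c in enumerate(cols))
        # Conservative update: increment only the minimal counters.
        for r, c in enumerate(cols):
            if self.table[r][c] == current:
                self.table[r][c] += 1
```

A vanilla CMS would instead increment all d counters on every update; the conservative rule is what creates the tie-coupled balls-and-bins dynamics analyzed above.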

Publications

No‑Regret Caching and Similarity Caching

Context (No‑Regret Caching). Caches are small memories that speed up data retrieval. Caching policies strategically select cache content based on previous requests to maximize a utility function, such as the number of requests answered directly by the cache. Worst‑case analysis of caching algorithms evaluates their performance against an optimal policy with knowledge of future requests. A standard way to formalize this is the regret metric: the difference between the utility of a static optimum and that of the algorithm. Numerous caching policies that leverage online convex optimization and exhibit vanishing regret have been proposed. Nevertheless, these algorithms are computationally expensive and require complete knowledge of prior requests, which may not be feasible in practical scenarios like caching at a cellular base station.

Contributions. In [J4, C3], we show that a Follow‑The‑Perturbed‑Leader adaptation achieves near‑optimal regret bounds in a partial‑observability setting with O(1) update time.
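The core idea can be sketched in a few lines of Python (my own illustration, not the exact algorithm of [J4, C3]; the O(1)-update and partial-feedback machinery is omitted): each file's cumulative request count is perturbed by Gaussian noise drawn once, and the cache always holds the k files with the largest perturbed counts.

```python
import heapq
import random

def ftpl_caching(requests, catalog, cache_size, eta=1.0, seed=0):
    """Follow-The-Perturbed-Leader for caching (illustrative sketch).

    Each file's cumulative request count is perturbed by a one-time
    Gaussian noise term scaled by eta; the cache holds the top-k files
    by perturbed count. Returns the number of cache hits."""
    rng = random.Random(seed)
    noise = {f: eta * rng.gauss(0.0, 1.0) for f in catalog}
    counts = {f: 0 for f in catalog}
    cache = set(heapq.nlargest(cache_size, catalog, key=lambda f: noise[f]))
    hits = 0
    for r in requests:
        if r in cache:
            hits += 1
        counts[r] += 1
        # Recompute the cache from perturbed cumulative counts.
        # (A practical implementation would update this incrementally.)
        cache = set(heapq.nlargest(
            cache_size, catalog, key=lambda f: counts[f] + noise[f]))
    return hits
```

The perturbation makes the cache content a randomized function of past counts, which is what yields the vanishing-regret guarantee of FTPL-style algorithms.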

Publications

Context (Similarity Caching). In this problem, items are represented as points within a metric space, enabling the quantification of similarity between any two items through their distance. Similarity caching extends the traditional caching problem by allowing requests for items to be approximately satisfied with similar cached items. The objective is to optimize a utility function that considers how similar the requested item is to the cached item used for the response.

Contributions. In [J2, C2], we propose an analytical framework for computing the hit ratio, i.e., the fraction of requests satisfied directly by the cache, of an adaptation of the well‑known Least‑Recently‑Used (LRU) policy to similarity caching under stationary requests. This computation is challenging as it involves examining a Markov process whose state space grows exponentially. To make the problem tractable, we introduce a surrogate similarity caching policy with independent caching decisions and identify conditions under which this fictitious policy closely approximates the LRU‑based similarity caching variant.
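An LRU adaptation to similarity caching can be sketched as follows (my own illustration; the threshold rule and names are assumptions, not necessarily the exact variant analyzed in [J2, C2]): a request is an approximate hit if some cached item lies within a distance threshold, in which case the serving item is refreshed; otherwise the requested item is inserted and the least recently used item is evicted.

```python
from collections import OrderedDict

def sim_lru(requests, cache_size, threshold, dist):
    """Similarity-LRU sketch over a metric space.

    A request x is an approximate hit if some cached item y satisfies
    dist(x, y) <= threshold; then y is moved to the most-recent
    position. On a miss, x itself is inserted and the least recently
    used item is evicted. Returns the hit ratio."""
    cache = OrderedDict()  # keys kept in LRU -> MRU order
    hits = 0
    for x in requests:
        best = min(cache, key=lambda y: dist(x, y), default=None)
        if best is not None and dist(x, best) <= threshold:
            hits += 1
            cache.move_to_end(best)        # refresh the serving item
        else:
            cache[x] = None                # insert the requested item
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the LRU item
    return hits / len(requests)
```

Because a hit can be served by a neighbor rather than the item itself, cache states are coupled through the geometry of the request space, which is what blows up the Markov chain analysis and motivates the surrogate policy with independent caching decisions.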

Publications

Repeated Proportional Allocation Auction Games

Context. Decentralized resource allocation in large‑scale systems is a fundamental problem that has been extensively studied in network economics. The Kelly mechanism, also called the proportional allocation auction, is a simple and efficient scheme in which each agent receives a share of the resource proportional to its bid. The resulting game has been shown to achieve near‑optimal social welfare at its Nash equilibrium. These guarantees, however, assume that agents can actually play this equilibrium, which in turn requires every player to know the utilities of all others. A more realistic setting assumes that each agent knows only its own utility, yet can adapt its bid over repeated, synchronous rounds using limited feedback, such as the aggregate bid of all players.

Contributions. In [C4], we study the repeated game induced by the Kelly mechanism. We prove that when all agents update their bids via a dual‑averaging no‑regret algorithm and their utilities grow logarithmically with the resource they receive, the system converges to the Nash equilibrium of the one‑shot game.
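The dynamics can be illustrated with a short simulation (my own sketch, with ad-hoc step size and bid bounds, not the exact scheme of [C4]): agent i bids b_i, receives the share x_i = b_i / Σ_j b_j, and has payoff w_i·log(x_i) − b_i; each agent accumulates its payoff gradients in a dual variable and bids a clipped scaling of it.

```python
def kelly_dual_averaging(weights, rounds=2000, eta=0.01, lo=0.05, hi=2.0):
    """Repeated Kelly (proportional allocation) auction with a simple
    dual-averaging update (illustrative sketch).

    Agent i bids b_i, receives the share x_i = b_i / sum(b), and earns
    w_i * log(x_i) - b_i (log utility minus payment). Each agent
    accumulates payoff gradients in z_i and bids clip(eta * z_i)."""
    n = len(weights)
    z = [1.0 / eta] * n  # dual accumulators; initial bids equal 1.0
    bids = [max(lo, min(hi, eta * zi)) for zi in z]
    for _ in range(rounds):
        total = sum(bids)  # aggregate-bid feedback, same for all agents
        for i, w in enumerate(weights):
            # Gradient of w * log(b_i / total) - b_i with respect to b_i.
            grad = w * (1.0 / bids[i] - 1.0 / total) - 1.0
            z[i] += grad
        # Synchronous round: all agents rebid from their dual variables.
        bids = [max(lo, min(hi, eta * zi)) for zi in z]
    return bids
```

For n identical agents with w_i = w, the first-order condition w(1/b − 1/(nb)) = 1 gives the symmetric one-shot Nash bid b* = w(n−1)/n, e.g. b* = 0.5 for two agents with w = 1, which the simulated bids approach.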

Publications