CPM 2023

34th Annual Symposium on Combinatorial Pattern Matching

Marne-la-Vallée, France, June 26–28, 2023

Keynote speakers

Olgica Milenkovic

University of Illinois Urbana-Champaign, USA

Olgica Milenkovic

Load Balancing for Distributed Storage Codes with Dynamically Changing File Popularities

The problem of designing codes for distributed storage systems has received significant attention due to its practical applications and interesting analytical challenges. Many relevant distributed coding schemes are based on combinatorial designs which ensure efficient content regeneration and system recovery. An example includes the conceptually simple and easy-to-implement fractional repetition codes.
One important problem mostly overlooked in coding for distributed storage pertains to access balancing (i.e., load balancing) where one requires that the number of access requests to servers during data download is close-to-uniform. To enable such a balancing scheme, one needs to take into account the popularities of the stored data chunks or files as these dictate the number of access requests.
We introduce access balancing fractional repetition codes that represent the first instance of combinatorial designs with both classical intersection and new labeling (coloring) constraints. The codes maintain the properties of classical fractional repetition codes but also ensure that the average popularities of data files at the servers are close to balanced. We then describe a dynamic popularity model and explain how it relates to combinatorial trades. We conclude the talk by introducing a new problem of designing perturbation-stable trades and provide constructions for the same based on specialized graph models.

Tatiana Starikovskaya

École Normale Supérieure, France

Tatiana Starikovskaya

Approximate membership in formal languages

Formal language membership is a classical problem with myriad applications. Intuitively, the goal is to succinctly describe the subset of strings that carry important information in order to be able to identify them efficiently. For instance, searching for all substrings that belong to a given regular language is a common task in web scraping. Furthermore, determining the well-formedness of a data file can be reduced to the Dyck language membership problem. Moreover, repetitions in biological sequences, such as palindromes and squares, often indicate sites of functional significance. The quick growth of data volume and the presence of noise in applications demand efficient and robust methods for membership testing, and in this talk, we will provide a survey of such methods and open questions.

Virginia Vassilevska Williams

Massachusetts Institute of Technology, USA

Virginia Vassilevska Williams

Fredman’s Trick Meets Dominance Product: counting and detection equivalences and more

Two useful tools in graph algorithms are Fredman's trick (used in many All-Pairs Shortest Paths algorithms) and Matousek's Dominance product algorithm (used e.g. to solve all-pairs bottleneck paths or nondecreasing paths). In this talk I will give a new application of these techniques with some intriguing consequences. For instance, we will see that many well-studied problems such as 3SUM, Min-Plus convolution, the Min-Plus product of matrices and versions of All-Pairs Shortest Paths are fine-grained equivalent to their natural counting variants.
In particular, for 3SUM the result is as follows. In 3SUM one is given three lists of n integers each, A,B and C and one needs to determine whether there are a in A, b in B and c in C so that a+b+c=0. In the counting version #3SUM, given the same input, one needs to count for every c in C the number of pairs (a,b) so that a+b+c=0. Both 3SUM and #3SUM have simple Õ(n2) time algorithms. A basic open problem is whether 3SUM admits a ``truly subquadratic'', O(n2-ϵ) time algorithm for any ϵ > 0, and it is often conjectured that this is not the case. Our result shows that 3SUM has a truly subquadratic time algorithm if and only if the seemingly harder #3SUM problem has a truly subquadratic algorithm.
We will discuss other applications of the techniques and relationships to some pattern matching questions such as Text-to-Pattern Hamming Distances.
Based on joint work with Timothy Chan, Yinzhan Xu and Ce Jin.