Directions: The Grad Center is at 34th St and 5th Ave, diagonally across the street from the Empire State Building. When you enter without a CUNY ID, you first sign in with the guards at the front, round desk, using any ID; then you walk around the desk to the elevators, and tell the trusting guard there that you've signed in.
Send a blank email here to subscribe to the mailing list of talk announcements.
Upcoming Talks - Spring 2016
- 4/04/16: Liang Zhao (CUNY Graduate Center)
Title: Matrix Low-rank Approximation with Random Matrices
Abstract. Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. In this talk I will present a fast stochastic algorithm developed by Halko, Martinsson, and Tropp in 2011, which in many cases is superior to the classical algorithms in both speed and robustness. I will provide an error analysis using results from linear algebra and random matrix theory. I will also mention recent developments of the algorithm and the application of randomization to Gaussian elimination without pivoting.
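The basic range-finder behind the Halko, Martinsson, and Tropp scheme is short enough to sketch in NumPy. This is a minimal prototype under the simplest assumptions (Gaussian test matrix, fixed oversampling), with illustrative names, not the full algorithm from the paper:

```python
import numpy as np

def randomized_low_rank(A, k, p=5):
    """Rank-k approximation via a randomized range-finder.

    p is an oversampling parameter; 5-10 is typical in practice.
    """
    m, n = A.shape
    rng = np.random.default_rng(0)
    Omega = rng.standard_normal((n, k + p))  # Gaussian test matrix
    Y = A @ Omega                            # sample the range of A
    Q, _ = np.linalg.qr(Y)                   # orthonormal basis for the sample
    B = Q.T @ A                              # small (k+p) x n matrix
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                            # lift back to the original space
    return U[:, :k], s[:k], Vt[:k, :]

# Example: a matrix of exact rank 5 is recovered almost exactly.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_low_rank(A, k=5)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

Only two passes over A are needed (to form Y and B), which is where the speed advantage over classical deterministic algorithms comes from.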
Past Talks
- 3/18/16: Samuel Bald (CUNY Graduate Center)
Title: Applications of the Probabilistic Method
Abstract. “The probabilistic method is a powerful tool for tackling many problems in discrete mathematics. Roughly speaking, the method works as follows: Trying to prove that a structure with certain desired properties exists, one defines an appropriate probability space of structures and then shows that the desired properties hold in this space with positive probability.” - from The Probabilistic Method
I will discuss some applications of the celebrated probabilistic method, drawing on material from Prof. Joel Spencer’s Random Graphs class at NYU this semester. Focus will be on general probabilistic techniques and methodologies, demonstrated by elegant applications to specific areas and problems. Prof. Spencer briefly covered a few algorithmic applications, and I intend to present these as well.
Topics include:
i) Method - diagonal Ramsey numbers, (un)balancing vectors, edge-connectivity, sum-free subsets
ii) Alterations - independent set, dominating set, 2-coloring uniform hypergraphs
iii) Second Moment - prime divisors, threshold functions for subgraphs
- 3/11/16: Itai Feigenbaum (Columbia)
Title: Optimization in Strategic Environments
Abstract. In this talk, we consider the problem of optimization with incomplete data. The missing data is privately held by agents whose objectives are different from ours, and who can manipulate that data in order to advance those objectives. Our choice of optimization algorithm induces a manipulation game among the agents. Thus, our goal is to design algorithms that induce "good" game-theoretic equilibria, in terms of approximating optimality (having small worst-case approximation ratio). In our work, we design such algorithms for strategic variants of two classical optimization problems. The first is the knapsack problem, where the items are privately owned by agents, each trying to maximize the total value of her own items included in the knapsack. The second is a facility location problem, where each agent's utility/disutility equals her distance from the facility, but her location is known only to her.
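The talk's mechanisms are for strategic knapsack and facility location on general metrics; as a minimal taste of the area, here is the textbook one-dimensional facility-location mechanism, which places the facility at the median of the reported locations. This is a standard classroom example of a strategyproof mechanism, not necessarily the talk's construction:

```python
def median_mechanism(reports):
    """Locate the facility at a (lower) median of reported locations.

    On a line this is strategyproof: an agent's misreport either
    leaves the median unchanged or pushes it further from her.
    """
    xs = sorted(reports)
    return xs[(len(xs) - 1) // 2]  # lower median

truth = [0.0, 2.0, 3.0, 10.0, 11.0]
facility = median_mechanism(truth)                    # placed at 3.0
# The agent at 10.0 gains nothing by exaggerating rightward...
pull = median_mechanism([0.0, 2.0, 3.0, 100.0, 11.0])  # median unchanged
# ...and reporting to the left only drags the facility away from her.
push = median_mechanism([0.0, 2.0, 3.0, 2.5, 11.0])
```

The median also minimizes the total distance to the true locations, so here truthfulness and optimality coexist; the talk's settings are precisely those where such exact optimality is impossible and approximation ratios enter.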
- 2/19/16: John Connor (Brooklyn College)
Title: A Constructive Lower Bound On Zimin Word Avoidance
Abstract. In 1984 Zimin completely characterized the unavoidable words over finite alphabets by defining what are now known as Zimin words. We give a constructive lower bound for Zimin word avoidance: after a short overview of the subject and a brief description of past results, we extend these results to construct a word avoiding the Zimin word of order n+1 from the set of words avoiding the Zimin word of order n.
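The Zimin words themselves are defined by a simple recursion, Z_1 = x_1 and Z_{n+1} = Z_n x_{n+1} Z_n, so |Z_n| = 2^n - 1. A quick sketch (using digits as letters, so it is only unambiguous for n ≤ 9):

```python
def zimin(n):
    """Zimin word Z_n over the alphabet {1, ..., n}:
    Z_1 = "1" and Z_{k} = Z_{k-1} + str(k) + Z_{k-1}."""
    word = "1"
    for k in range(2, n + 1):
        word = word + str(k) + word
    return word

# zimin(1) == "1", zimin(2) == "121", zimin(3) == "1213121"
```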
If time permits, I have a separate result showing an interesting (but very technical) relationship between Zimin words and number theory.
- 3/04/16: Mayank Goswami (Max-Planck Institute for Informatics)
Title: Geometry in algorithms for big data: dynamic optimality, load balancing and Teichmuller maps
Abstract. I will talk about algorithms for three types of large data sets: centralized, distributed, and visual. The focus will be on a classical algorithmic problem from each type, and the role of geometry in the recent progress on these problems.
1) Binary search trees (BSTs) with rotations can adapt to various kinds of structure in search sequences, achieving amortized access times substantially better than the O(log n) worst-case guarantee. A famous example of such a BST is the Splay tree, invented by Sleator and Tarjan in 1985, which gave rise to the dynamic optimality conjecture, one of the most important open problems in data structures for online algorithms. After stating the conjecture, I will mention its geometric counterpart, and describe some of the recent progress, most notably the resolution of the thirty-year-old traversal conjecture.
2) Load balancing on networks is a long-standing open problem; it is already NP-hard in the centralized setting, leaving very little hope for an efficient distributed algorithm. Even for the simple case of sensors distributed uniformly in some planar domain, no efficient approximation algorithm was known. I will describe an algorithm that routes using the medial axis of the domain, which turns out to be the first constant-factor approximation for load balancing in this setting.
3) The study of conformal (angle-preserving) maps forms the basis of complex analysis, and recently conformal geometry has found many applications in computer graphics, vision and medical imaging. However, given two surfaces and some boundary constraints, in general it is not possible to find a conformal map between them. One wants a map that minimizes the maximum angular distortion. Such maps are called Teichmuller maps, and although their existence was proved by Teichmuller in the 1940s, no constructive procedure was known until now. I will briefly describe an algorithm for computing such maps between polygons.
Fall 2015
- 9/04/15: Jiemin Zeng (Stony Brook)
Title: Exact and Approximation Algorithms for Data Mule Scheduling in a Sensor Network
Abstract. We consider the fundamental problem of scheduling data mules for managing a wireless sensor network. A data mule tours around a sensor network and can help with network maintenance such as data collection and battery recharging/replacement. We assume that each sensor has a fixed data generation rate and a capacity (upper bound on storage size). If the data mule arrives after the storage capacity is met, additional data generated is lost. In this paper we formulate several fundamental problems for the best schedule of single or multiple data mules and provide algorithms with provable performance. First, we consider using a single data mule to collect data from sensors, and we aim to maximize the data collection rate. We then generalize this model to consider k data mules. Additionally, we study the problem of minimizing the number of data mules such that it is possible for them to collect all data, without loss. For the above problems, when we assume that the capacities of all sensors are the same, we provide exact algorithms for special cases and constant-factor approximation algorithms for more general cases. We also show that several of these problems are NP-hard. When we allow sensor capacities to differ, we have a constant-factor approximation for each of the aforementioned problems when the ratio of the maximum capacity to the minimum capacity is constant.
- 9/11/15: Reuven Bar-Yehuda (Technion)
Title: Growing Half-Balls: Minimizing Storage and Communication Costs in CDNs
Abstract. The Dynamic Content Distribution problem addresses the trade-off between storage and delivery costs in modern virtual Content Delivery Networks (CDNs). That is, a video file can be stored in multiple places so that the request of each user is served from a location that is near the user. This minimizes delivery costs, but incurs a storage cost. This problem is NP-hard even in grid networks. In this paper, we present a constant factor approximation algorithm for grid networks. We also present an $O(\log \delta)$-competitive algorithm, where $\delta$ is the normalized diameter of the network, for general networks with general metrics. We show a matching lower bound by using a reduction from online undirected Steiner tree. Our algorithms use a rather intuitive approach that has an elegant representation in geometric terms. Paper.
- 9/18/15: informal meeting
- 9/25/15: CUNY Tuesday schedule, no meeting
- 11am 10/02/15 (special time!): Reuven Bar-Yehuda (Technion)
Title: A Unified Approach to Approximating Resource Allocation and Scheduling
Abstract. We present a general framework for solving resource allocation and scheduling problems. Given a resource of fixed size, we present algorithms that approximate the maximum throughput or the minimum loss by a constant factor. Our approximation factors apply to many problems, among which are: (i) real-time scheduling of jobs on parallel machines, (ii) bandwidth allocation for sessions between two endpoints, (iii) general caching, (iv) dynamic storage allocation, and (v) bandwidth allocation on optical line and ring topologies. For some of these problems we provide the first constant factor approximation algorithm. Our algorithms are simple and efficient and are based on the local-ratio technique. We note that they can equivalently be interpreted within the primal-dual schema. Paper.
- 2pm Wednesday 10/07/15, rm 3305 (special day/time/location!): Dror Rawitz (Bar-Ilan University)
Title: Rent, Lease or Buy: Randomized Strategies for Multislope Ski Rental
Abstract. In the Multislope Ski Rental problem, the user needs a certain resource for some unknown period of time. To use the resource, the user must subscribe to one of several options, each of which consists of a one-time setup cost ("buying price"), and cost proportional to the duration of the usage ("rental rate"). The larger the price, the smaller the rent. The actual usage time is determined by an adversary, and the goal of an algorithm is to minimize the cost by choosing the best option at any point in time. Multislope Ski Rental is a natural generalization of the classical Ski Rental problem (where the only options are pure rent and pure buy), which is one of the fundamental problems of online computation. The Multislope Ski Rental problem is an abstraction of many problems, where online choices cannot be modeled by just two alternatives, e.g., power management in systems which can be shut down in parts. In this work we study randomized online strategies for Multislope Ski Rental. Our results include an algorithm that produces the best possible randomized online strategy for any additive instance, where the cost of switching from one alternative to another is the difference in their buying prices; and an e-competitive randomized strategy for any (non-additive) instance. We also provide a randomized strategy with a matching lower bound for the case of two slopes, where both slopes have a positive rent. Joint work with Zvi Lotker and Boaz Patt-Shamir.
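For orientation, the classical two-option problem admits a simple deterministic break-even strategy, rent until cumulative rent reaches the buying price and then buy, which is 2-competitive; the talk's multislope and randomized strategies generalize this baseline. A sketch with illustrative names:

```python
def online_cost(buy, rent, days):
    """Break-even strategy for classical (two-slope) ski rental:
    rent for k days where k * rent == buy, then buy on day k + 1."""
    k = buy // rent              # days we are willing to keep renting
    if days <= k:
        return rent * days       # the season ended before we bought
    return rent * k + buy        # rented k days, then bought

def offline_cost(buy, rent, days):
    """Clairvoyant optimum: rent the whole time or buy on day one."""
    return min(rent * days, buy)

# The adversary's best move is to end the season right after we buy,
# which pins the competitive ratio at exactly 2 when rent divides buy.
ratio = max(online_cost(100, 1, d) / offline_cost(100, 1, d)
            for d in range(1, 300))
```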
- 10/16/15: Chao Chen (Queens College)
Title: Modes of a Discrete Graphical Model
Abstract. A discrete graphical model defines a distribution over a space of discrete label configurations, e.g. segmentations of an image, gesture labelings of a video, etc. We define modes as local maxima, namely, configurations each of which has a bigger probability than all its neighbors. These modes provide a geometric description of the topographical landscape of the distribution. They also constitute a candidate set for the prediction of a diverse set of structured outputs, e.g., a set of different hypotheses of a given observation. The computation of the modes is not trivial. We design different algorithms to compute the M most probable modes when the graphical model is a simple chain, a junction chain or a tree. We also discuss the remaining open problems.
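The definition of a mode can be made concrete by brute force over all configurations of a small chain model (the talk's algorithms, of course, avoid this exponential enumeration). The model and names below are illustrative:

```python
import itertools

def chain_score(x, unary, pairwise):
    """Unnormalized log-probability of labeling x under a chain model."""
    s = sum(unary[i][x[i]] for i in range(len(x)))
    s += sum(pairwise[i][x[i]][x[i + 1]] for i in range(len(x) - 1))
    return s

def brute_force_modes(n_vars, n_labels, unary, pairwise):
    """All modes: labelings scoring strictly higher than every
    configuration that differs in exactly one variable."""
    out = []
    for x in itertools.product(range(n_labels), repeat=n_vars):
        base = chain_score(x, unary, pairwise)
        neighbors = (x[:i] + (lab,) + x[i + 1:]
                     for i in range(n_vars)
                     for lab in range(n_labels) if lab != x[i])
        if all(chain_score(y, unary, pairwise) < base for y in neighbors):
            out.append(x)
    return out

# A 3-variable binary chain that rewards neighboring agreement has
# exactly two modes: all-zeros and all-ones.
unary = [[0.0, 0.0]] * 3
pairwise = [[[1.0, 0.0], [0.0, 1.0]]] * 2
found = brute_force_modes(3, 2, unary, pairwise)
```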
Related publications: Chao Chen, Vladimir Kolmogorov, Yan Zhu, Dimitris Metaxas and Christoph Lampert: "Computing the M most probable modes of a graphical model", in AISTATS 2013
Chao Chen, Han Liu, Dimitris Metaxas and Tianqi Zhao: "Mode estimation for high dimensional discrete tree distributions", in NIPS 2014
- 2pm Wednesday 10/23/15, rm 3305 (special day/time/location!): Chao Chen (Queens College)
Title: Modes of a Discrete Graphical Model (continued)
Abstract and related publications: as in the 10/16/15 talk above.
- 10/30/15: Devorah Kletenik (Brooklyn College)
Title: Discrete Stochastic Submodular Maximization: Adaptive vs. Non-Adaptive vs. Offline
Abstract. We consider the problem of stochastic monotone submodular function maximization, subject to constraints. We give results on adaptivity gaps, and on the gap between the optimal offline and online solutions. We present a procedure that transforms a decision tree (adaptive algorithm) into a non-adaptive chain. We prove that this chain achieves at least $\tau$ times the utility of the decision tree, over a product distribution and binary state space, where $\tau=\min_{i,j} \Pr[x_i=j]$. This proves an adaptivity gap of $\frac{1}{\tau}$ (which is 2 in the case of a uniform distribution) for the problem of stochastic monotone submodular maximization subject to state-independent constraints. For a cardinality constraint, we prove that a simple adaptive greedy algorithm achieves an approximation factor of $(1-\frac{1}{e^\tau})$ with respect to the optimal offline solution; previously, it has been proven that the algorithm achieves an approximation factor of $(1-\frac{1}{e})$ with respect to the optimal adaptive online solution. Finally, we show that there exists a non-adaptive solution for the stochastic max coverage problem that is within a factor $(1-\frac{1}{e})$ of the optimal adaptive solution and within a factor of $\tau(1-\frac{1}{e})$ of the optimal offline solution.
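For contrast with the stochastic setting, the deterministic greedy algorithm for max coverage, the source of the $(1-\frac{1}{e})$ factors above, can be sketched in a few lines (example data is illustrative):

```python
def greedy_max_coverage(sets, k):
    """Pick k sets greedily by marginal coverage. For deterministic
    max coverage this achieves a (1 - 1/e) approximation."""
    covered, chosen = set(), []
    for _ in range(k):
        # Index of the set covering the most not-yet-covered elements.
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
chosen, covered = greedy_max_coverage(sets, k=2)  # covers all six elements
```

In the stochastic version each element's state is random and revealed only when probed, which is what separates adaptive, non-adaptive, and offline performance.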
Joint work with Lisa Hellerstein and Patrick Lin.
- 11/06/15: Talya Eden (Tel Aviv University)
Title: Approximately Counting Triangles in Sublinear Time
Paper.
Abstract. We consider the problem of estimating the number of triangles in a graph. This problem has been extensively studied in both theory and practice, but all existing algorithms read the entire graph. In this work we design a sublinear-time algorithm for approximating the number of triangles in a graph, where the algorithm is given query access to the graph. The allowed queries are degree queries, vertex-pair queries and neighbor queries.
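As a point of contrast, an exact count must read the whole graph; a simple dense-matrix baseline uses the identity that trace(A^3) counts each triangle six times (once per vertex, in each direction):

```python
import numpy as np

def exact_triangles(adj):
    """Exact triangle count via trace(A^3) / 6 for an undirected,
    simple graph given as a 0/1 adjacency matrix."""
    A = np.asarray(adj, dtype=np.int64)
    return int(np.trace(A @ A @ A)) // 6

# K4 has C(4,3) = 4 triangles; a path has none.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
t_K4 = exact_triangles(K4)
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
t_path = exact_triangles(path)
```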
We show that for any given approximation parameter $0<\epsilon<1$, the algorithm provides an estimate $\widehat{t}$ such that with high constant probability, $(1-\epsilon)\cdot t < \widehat{t} < (1+\epsilon)\cdot t$, where $t$ is the number of triangles in the graph $G$. The expected query complexity of the algorithm is $(\frac{n}{t^{1/3}} + \min\{m, \frac{m^{3/2}}{t}\}) \cdot \mathrm{poly}(\log n, 1/\epsilon)$, where $n$ is the number of vertices in the graph and $m$ is the number of edges, and the expected running time is $(\frac{n}{t^{1/3}} + \frac{m^{3/2}}{t}) \cdot \mathrm{poly}(\log n, 1/\epsilon)$. We also prove that $\Omega(\frac{n}{t^{1/3}} + \min\{m, \frac{m^{3/2}}{t}\})$ queries are necessary, thus establishing that the query complexity of this algorithm is optimal up to polylogarithmic factors in $n$ (and the dependence on $1/\epsilon$).
- 11/13/15: Theory Day, no meeting
- 11/20/15: NYCAC, no meeting
- 2pm Wednesday 11/25/15, rm 3305 (special day/time/location!): Gui Citovsky (Stony Brook)
Title: Conflict-Free Covering
Abstract. Let P={C1,C2,…,Cn} be a set of color classes, where each color class Ci consists of a set of points. In this talk, we address a family of covering problems in which each color class must be covered but no two points from the same color class are allowed to be covered by the same geometric object. We consider the case that P is restricted to be on a line and the case that P is anywhere in the Euclidean plane. We examine the following two objectives:
1. Cover at least one point from each color class.
2. Cover exactly one point from each color class.
We prove that the problems in this family are NP-complete (or NP-hard) and offer several constant-factor approximation algorithms.
This is joint work with Esther M. Arkin, Aritra Banik, Paz Carmi, Matthew J. Katz, Joseph S. B. Mitchell and Marina Simakov.
Paper.
- 12/04/15: Itai Feigenbaum (Columbia)
Title: Selfish Knapsack
Paper.
Abstract. We consider a selfish variant of the knapsack problem. In our version, the items are owned by agents, and each agent can misrepresent the set of items she owns---either by avoiding reporting some of them (understating), or by reporting additional ones that do not exist (overstating). Each agent's objective is to maximize, within the items chosen for inclusion in the knapsack, the total valuation of her own chosen items. Our goal, in this context, is to minimize the worst-case approximation ratio for social welfare at equilibrium. We show that a randomized greedy mechanism yields a $2$-approximation in Bayes-Nash equilibrium (in worst-case over all distributions). We then explore each type of manipulation separately. We show that when agents only overstate, the greedy mechanism becomes a strategyproof $2$-approximate mechanism (in pointwise worst-case). When agents only understate, we consider two cases. The first is when the competition is between two agents, where we provide a randomized strategyproof $\frac{5+4\sqrt{2}}{7} \approx 1.522$-approximate mechanism. The second is when there is only one manipulative agent, where we provide a deterministic strategyproof $\frac{1+\sqrt{5}}{2} \approx 1.618$-approximate mechanism. We also provide several lower bounds on the approximation ratio obtainable by strategyproof mechanisms, which in turn proves the optimality of some of our mechanisms. Finally, we consider a model where agents can misreport their items' properties rather than existence. Specifically, each agent owns a single item, whose value-to-size ratio is publicly known, but whose actual value is not. We show that a simple adaptation of our greedy mechanism is strategyproof and $2$-approximate, and provide a matching lower bound; we also show that no deterministic strategyproof mechanism can provide a bounded approximation ratio.
- 12/11/15: Supriyo Chakraborty (IBM Research)
Title: Balancing Behavioral Privacy and Information Utility While Sharing Sensory Data Streams
Abstract. Our personal, social, work, and urban spaces are being increasingly instrumented using sensors of various modalities, resulting in the generation and sharing of large amounts of personal sensor data. Embedded in this data are digital footprints of our daily lives, and, therefore, the problem of protecting the behavioral privacy of users while simultaneously maintaining the utility of the shared data has become more relevant than ever before.
Traditional approaches to addressing privacy threats have primarily focused on data sanitization for disassociating the user's identity from shared data and on anonymization techniques to make a user indistinguishable within a sub-population. However, due to the nature of applications in domains such as mobile health, insurance, etc., user identity is often an inalienable part of the shared data. Thus, instead of identity privacy, we focus on protecting the privacy of inferences that can be derived from the shared data.
In this talk, we will present a formalization of the notion of inference privacy for sensor data, discuss privacy metrics and algorithms for achieving inference privacy, and outline a system architecture for their realization.