publications
2025
- Sequential Diversification with Provable Guarantees. Honglian Wang, Sijing Tu, and Aristides Gionis. In WSDM’25, 2025.
Diversification is a useful tool for exploring large collections of information items. It has been used to reduce redundancy and cover multiple perspectives in information-search settings. Diversification finds applications in many different domains, including presenting search results of information-retrieval systems and selecting suggestions for recommender systems. Interestingly, existing measures of diversity are defined over sets of items, rather than evaluating sequences of items. This design choice comes in contrast with commonly-used relevance measures, which are distinctly defined over sequences of items, taking into account the ranking of items. The importance of employing sequential measures is that information items are almost always presented in a sequential manner, and during their information-exploration activity users tend to prioritize items with higher ranking. In this paper, we study the problem of maximizing sequential diversity. This is a new measure of diversity, which accounts for the ranking of the items, and incorporates item relevance and user behavior. The overarching framework can be instantiated with different diversity measures, and here we consider the measures of sum diversity and coverage diversity. The problem was recently proposed by Coppolillo et al. (2024), who introduce empirical methods that work well in practice. Our paper is a theoretical treatment of the problem: we establish the problem hardness and present algorithms with constant approximation guarantees for both diversity measures we consider. Experimentally, we demonstrate that our methods are competitive against strong baselines.
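The abstract does not spell out the algorithms, so as a rough illustration of what a sequential sum-diversity objective can look like, the sketch below greedily fills ranking slots, discounting each slot by an assumed geometric continuation probability `p_continue` as a stand-in for the user-behavior model. Function and parameter names are hypothetical, and this is not the paper's approximation algorithm.

```python
import numpy as np

def greedy_sequential_sum_diversity(dist, rel, k, p_continue=0.8):
    """Illustrative greedy: fill k ranking slots, scoring each candidate by
    relevance plus its summed distance to already-placed items, discounted by
    the (assumed geometric) probability that the user reaches the slot."""
    n = dist.shape[0]
    chosen = []
    for pos in range(k):
        reach = p_continue ** pos                 # chance the user sees slot `pos`
        best_i, best_score = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            div = sum(dist[i, j] for j in chosen)  # sum-diversity w.r.t. earlier slots
            score = reach * (rel[i] + div)
            if score > best_score:
                best_i, best_score = i, score
        chosen.append(best_i)
    return chosen

# toy usage with random points and relevance scores
rng = np.random.default_rng(0)
X = rng.random((20, 5))
dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # pairwise distances
rel = rng.random(20)                                       # item relevance
print(greedy_sequential_sum_diversity(dist, rel, k=5))
```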
- OptiRefine: Densest Subgraphs and Maximum Cuts with k Refinements. Sijing Tu, Aleksa Stankovic, Stefan Neumann, and Aristides Gionis. Data Mining and Knowledge Discovery, 2025.
Data-analysis tasks often involve an iterative process, which requires refining previous solutions. For instance, when analyzing social networks, we may obtain initial communities based on noisy metadata, and we want to improve them by adding influential nodes and removing non-important ones, without making too many changes. However, classic optimization algorithms, which typically find solutions from scratch, potentially return communities that are very dissimilar to the initial one. To mitigate these issues, we introduce the OptiRefine framework. The framework optimizes initial solutions by making a small number of refinements, thereby ensuring that the new solution remains close to the initial solution while simultaneously achieving a near-optimal solution for the optimization problem. We apply the OptiRefine framework to two classic graph-optimization problems: densest subgraph and maximum cut. For the densest-subgraph problem, we optimize a given subgraph’s density by adding or removing k nodes. We show that this novel problem is a generalization of k-densest subgraph, and provide constant-factor approximation algorithms for k = Ω(n) refinements. We also study a version of maximum cut in which the goal is to improve a given cut. We provide connections to maximum cut with cardinality constraints and give an optimal approximation algorithm in most parameter regimes under the Unique Games Conjecture for k = Ω(n) refinements. We evaluate our theoretical methods and scalable heuristics on synthetic and real-world data and show that they are highly effective in practice.
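As a point of reference for the refinement setting, here is a minimal hill-climbing heuristic that improves a given subgraph's density with at most k single-node additions or removals. It only illustrates the problem formulation; it is not the OptiRefine approximation algorithm and carries no guarantee.

```python
import networkx as nx

def refine_densest_subgraph(G, S0, k):
    """Hill-climbing sketch: starting from node set S0, apply at most k
    single-node additions/removals, each time taking the move that most
    improves the density |E(S)| / |S|."""
    S = set(S0)
    density = lambda T: (G.subgraph(T).number_of_edges() / len(T)) if T else 0.0
    for _ in range(k):
        best_S, best_d = S, density(S)
        for v in G.nodes:
            cand = S - {v} if v in S else S | {v}
            if cand and density(cand) > best_d:
                best_S, best_d = cand, density(cand)
        if best_S == S:          # no single move improves the density
            break
        S = best_S
    return S

# toy usage on a standard benchmark graph
G = nx.karate_club_graph()
print(sorted(refine_densest_subgraph(G, S0={0, 1, 2, 3}, k=5)))
```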
- Streaming Stochastic Submodular Maximization with On-Demand User Requests. Honglian Wang, Sijing Tu, Lutz Oettershagen, and Aristides Gionis. In NeurIPS’25, 2025.
We explore a novel problem in streaming submodular maximization, inspired by the dynamics of news-recommendation platforms. We consider a setting where users can visit a news website at any time, and upon each visit, the website must display up to k news items. User interactions are inherently stochastic: each news item presented to the user is consumed with a certain acceptance probability, and each news item covers certain topics. Our goal is to design a streaming algorithm that maximizes the expected total topic coverage. To address this problem, we establish a connection to submodular maximization subject to a matroid constraint. We show that we can effectively adapt previous methods to address our problem when the number of user visits is known in advance or memory linear in the stream length is available. However, in more realistic scenarios, where only an upper bound on the visits and sublinear memory are available, these algorithms fail to guarantee any bounded performance. To overcome these limitations, we introduce a new online streaming algorithm that achieves a competitive ratio of 1/(8δ), where δ controls the approximation quality. Moreover, it requires only a single pass over the stream and uses memory independent of the stream length. Empirically, our algorithms consistently outperform the baselines.
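To make the objective concrete, the sketch below computes the expected topic coverage of a display set under independent item acceptances and runs a plain offline greedy for a single visit. The data layout is invented for illustration, and this is not the streaming algorithm from the paper.

```python
from itertools import chain

def expected_topic_coverage(items, chosen):
    """items: dict id -> (accept_prob, set_of_topics).
    Expected number of topics covered, assuming independent acceptances:
    a topic is covered unless every chosen item mentioning it is rejected."""
    topics = set(chain.from_iterable(t for _, t in items.values()))
    total = 0.0
    for topic in topics:
        miss = 1.0
        for i in chosen:
            p, ts = items[i]
            if topic in ts:
                miss *= (1.0 - p)
        total += 1.0 - miss
    return total

def greedy_display(items, k):
    """Offline greedy for a single visit: pick up to k items with the largest
    marginal gain in expected coverage (a sketch, not the streaming method)."""
    chosen = []
    for _ in range(k):
        base = expected_topic_coverage(items, chosen)
        gains = {i: expected_topic_coverage(items, chosen + [i]) - base
                 for i in items if i not in chosen}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        chosen.append(best)
    return chosen

news = {1: (0.9, {"politics"}), 2: (0.5, {"politics", "sports"}),
        3: (0.7, {"tech"}), 4: (0.2, {"sports", "tech"})}
print(greedy_display(news, k=2))
```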
- Fair Committee Selection under Ordinal Preferences and Limited Cardinal Information. Ameet Gadekar, Aristides Gionis, Suhas Thejaswi, and Sijing Tu. In AAMAS Extended Abstracts’26, 2025. (Authors in alphabetical order.)
We study the problem of fair k-committee selection under an egalitarian objective. Given n agents partitioned into m groups (e.g., demographic quotas), the goal is to aggregate their preferences to form a committee of size k that guarantees minimum representation from each group while minimizing the maximum cost incurred by any agent. We model this setting as the ordinal fair k-center problem, where agents are embedded in an unknown metric space, and each agent reports a complete preference ranking (i.e., ordinal information) over all agents, consistent with the underlying distance metric (i.e., cardinal information). The cost incurred by an agent with respect to a committee is defined as its distance to the closest committee member. The quality of an algorithm is evaluated using the notion of distortion, which measures the worst-case ratio between the cost of the committee produced by the algorithm and the cost of an optimal committee computed with complete access to the underlying metric space. When cardinal information is not available, no constant distortion is possible for the ordinal k-center problem, even without fairness constraints, when k ≥ 3 [Burkhardt et al., AAAI’24]. To overcome this hardness, we allow limited access to cardinal information by querying the metric space. In this setting, our main contribution is a factor-5 distortion algorithm that requires only O(k log^2 k) queries. Along the way, we present an improved factor-3 distortion algorithm using O(k^2) queries.
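For orientation only, the classic farthest-first heuristic for k-center is sketched below as the full-information baseline: it needs all pairwise distances and ignores the group constraints, which is exactly the information regime the paper avoids by combining rankings with a small number of distance queries. It is not the paper's query-efficient algorithm.

```python
def greedy_k_center(dist, k):
    """Farthest-first (Gonzalez) heuristic for k-center, assuming full access
    to the metric: repeatedly add the agent farthest from its closest center."""
    n = len(dist)
    centers = [0]                                  # arbitrary first center
    for _ in range(k - 1):
        far = max(range(n), key=lambda i: min(dist[i][c] for c in centers))
        centers.append(far)
    return centers

# toy usage: six agents placed on a line
dist = [[abs(i - j) for j in range(6)] for i in range(6)]
print(greedy_k_center(dist, k=3))
```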
2024
- The Impact of External Sources on the Friedkin-Johnsen Model. Charlotte Out, Sijing Tu, Stefan Neumann, and Ahad N. Zehmakan. In CIKM’24, Boise, ID, USA, 2024.
To obtain a foundational understanding of timeline algorithms and viral content in shaping public opinions, computer scientists started to study augmented versions of opinion-formation models from sociology. In this paper, we generalize the popular Friedkin–Johnsen model to include the effects of external media sources on opinion formation. Our goal is to mathematically analyze the influence of biased media, arising from factors such as manipulated news reporting or the phenomenon of false balance. Within our framework, we examine the scenario of two opposing media sources, which do not adapt their opinions like ordinary nodes, and analyze the conditions and the number of periods required for radicalizing the opinions in the network. When both media sources possess equal influence, we theoretically characterize the final opinion configuration. In the special case where there is only a single media source present, we prove that media sources which do not adapt their opinions are significantly more powerful than those which do. Lastly, we conduct experiments on real-world and synthetic datasets, showing that our theoretical guarantees closely align with experimental simulations.
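A minimal simulation of the kind of dynamics studied here, assuming the standard Friedkin–Johnsen update with media sources modeled as extra neighbors whose opinions never change; the weighting scheme and data are invented for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def fj_with_media(W, s, media_links, media_ops, iters=200):
    """Sketch of Friedkin-Johnsen dynamics with non-adaptive media sources.
    W: symmetric n-by-n weight matrix; s: innate opinions in [-1, 1];
    media_links[i][m]: weight of the edge from user i to media source m;
    media_ops[m]: fixed opinion of media source m (it never updates)."""
    n = len(s)
    x = s.astype(float).copy()
    for _ in range(iters):
        new = np.empty(n)
        for i in range(n):
            num = s[i] + W[i] @ x + media_links[i] @ media_ops
            den = 1.0 + W[i].sum() + media_links[i].sum()
            new[i] = num / den          # weighted average of innate opinion and neighbors
        x = new
    return x

# toy usage: random graph, two opposing media sources with opinions -1 and +1
rng = np.random.default_rng(1)
n = 8
W = rng.integers(0, 2, (n, n)).astype(float)
W = np.triu(W, 1); W += W.T
s = rng.uniform(-1, 1, n)
media_links = rng.uniform(0, 1, (n, 2))
media_ops = np.array([-1.0, 1.0])
print(np.round(fj_with_media(W, s, media_links, media_ops), 3))
```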
2023
- Adversaries with Limited Information in the Friedkin-Johnsen Model. Sijing Tu, Stefan Neumann, and Aristides Gionis. In KDD’23, Long Beach, CA, USA, 2023.
In recent years, online social networks have been the target of adversaries who seek to introduce discord into societies, to undermine democracies, and to destabilize communities. Often the goal is not to favor a certain side of a conflict but to increase disagreement and polarization. To get a mathematical understanding of such attacks, researchers use opinion-formation models from sociology, such as the Friedkin–Johnsen model, and formally study how much discord the adversary can produce when altering the opinions of only a small set of users. In this line of work, it is commonly assumed that the adversary has full knowledge about the network topology and the opinions of all users. However, the latter assumption is often unrealistic in practice, where user opinions are not available or simply difficult to estimate accurately. To address this concern, we raise the following question: Can an attacker sow discord in a social network, even when only the network topology is known? We answer this question affirmatively. We present approximation algorithms for detecting a small set of users who are highly influential for the disagreement and polarization in the network. We show that when the adversary radicalizes these users, and if the initial disagreement/polarization in the network is not very high, then our method gives a constant-factor approximation relative to the setting in which the user opinions are known. To find the set of influential users, we provide a novel approximation algorithm for a variant of MaxCut in graphs with positive and negative edge weights. We experimentally evaluate our methods, which have access only to the network topology, and find that they perform similarly to methods that have access to the network topology and all user opinions. We further present an NP-hardness proof, which was left as an open question by Chen and Racz [IEEE Transactions on Network Science and Engineering, 2021].
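For concreteness, the sketch below computes the disagreement and polarization indices commonly used in this line of work at the Friedkin–Johnsen equilibrium z = (I + L)^{-1} s; the exact normalization in the paper may differ, and the adversary's seed-selection algorithm itself is not shown.

```python
import numpy as np

def fj_equilibrium(W, s):
    """Equilibrium opinions of the Friedkin-Johnsen model: z = (I + L)^{-1} s,
    where L is the graph Laplacian of the symmetric weight matrix W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(s)) + L, s)

def disagreement_polarization(W, s):
    """Disagreement as the Laplacian quadratic form z^T L z (summed squared
    opinion differences across edges) and polarization as the squared
    deviation of equilibrium opinions from their mean (a common convention)."""
    z = fj_equilibrium(W, s)
    L = np.diag(W.sum(axis=1)) - W
    disagreement = z @ L @ z
    polarization = ((z - z.mean()) ** 2).sum()
    return disagreement, polarization

# toy usage on a random graph with random innate opinions
rng = np.random.default_rng(2)
n = 10
W = rng.integers(0, 2, (n, n)).astype(float)
W = np.triu(W, 1); W += W.T
s = rng.uniform(-1, 1, n)
print(disagreement_polarization(W, s))
```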
2022
- A Viral Marketing-Based Model for Opinion Dynamics in Online Social Networks. Sijing Tu and Stefan Neumann. In WWW’22, Virtual Event, Lyon, France, 2022.
Online social networks provide a medium for citizens to form opinions on different societal issues, and a forum for public discussion. They also expose users to viral content, such as breaking news articles. In this paper, we study the interplay between these two aspects: opinion formation and information cascades in online social networks. We present a new model that allows us to quantify how users change their opinion as they are exposed to viral content. Our model is a combination of the popular Friedkin–Johnsen model for opinion dynamics and the independent cascade model for information propagation. We present algorithms for simulating our model, and we provide approximation algorithms for optimizing certain network indices, such as the sum of user opinions or the disagreement-controversy index; our approach can be used to obtain insights into how much viral content can increase these indices in online social networks. Finally, we evaluate our model on real-world datasets. We show experimentally that marketing campaigns and polarizing content have vastly different effects on the network: while the former have only a limited effect on the polarization in the network, the latter can increase the polarization by up to 59% even when only 0.5% of the users start sharing polarizing content. We believe that this finding sheds some light on the growing segregation in today’s online media.
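The information-propagation half of the model follows the independent cascade process, sketched below; how the reached users' opinions then feed into the Friedkin–Johnsen dynamics is specific to the paper and not reproduced here.

```python
import random

def independent_cascade(neighbors, probs, seeds, rng=random.Random(0)):
    """One simulation of the independent cascade model: each newly activated
    node gets a single chance to activate each inactive neighbor, succeeding
    independently with the edge's propagation probability."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in active and rng.random() < probs[(u, v)]:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active      # users reached by the viral content

# toy usage: a small directed graph with uniform propagation probability 0.5
neighbors = {0: [1, 2], 1: [2, 3], 2: [3], 3: []}
probs = {(u, v): 0.5 for u in neighbors for v in neighbors[u]}
print(independent_cascade(neighbors, probs, seeds={0}))
```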
2020
- Tell me something my friends do not know: Diversity maximization in social networks. Antonis Matakos, Sijing Tu, and Aristides Gionis. Knowledge and Information Systems, 2020.
- Co-exposure Maximization in Online Social Networks. Sijing Tu, Cigdem Aslay, and Aristides Gionis. In NeurIPS, 2020.
Social media has created new ways for citizens to stay informed on societal matters and participate in political discourse. However, with its algorithmically curated and virally propagating content, social media has contributed further to the polarization of opinions by reinforcing users’ existing viewpoints. An emerging line of research seeks to understand how content-recommendation algorithms can be redesigned to mitigate societal polarization amplified by social-media interactions. In this paper, we study the problem of allocating seed users to opposing campaigns: drawing on the equal-time rule of political campaigning on traditional media, our goal is to allocate seed users to campaigners with the aim of maximizing the expected number of users who are co-exposed to both campaigns. We show that the problem of maximizing co-exposure is NP-hard and that its objective function is neither submodular nor supermodular. However, by exploiting a connection to a submodular function that acts as a lower bound to the objective, we are able to devise a greedy algorithm with a provable approximation guarantee. We further provide a scalable instantiation of our approximation algorithm by introducing a novel extension to the notion of random reverse-reachable sets for efficiently estimating the expected co-exposure. We experimentally demonstrate the quality of our proposal on real-world social networks.
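As a naive alternative to the reverse-reachable-set machinery, expected co-exposure can be estimated by plain Monte Carlo simulation, assuming both campaigns spread under independent cascades with the same edge probabilities. The sketch below shows only this estimation step, not the seed-allocation algorithm, and the graph data is invented for illustration.

```python
import random

def simulate_reach(neighbors, probs, seeds, rng):
    """Single independent-cascade simulation; returns the set of reached users."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in active and rng.random() < probs[(u, v)]:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def estimate_co_exposure(neighbors, probs, seeds_a, seeds_b, samples=1000):
    """Monte Carlo estimate of the expected number of users reached by BOTH
    campaigns, assuming the two cascades diffuse independently."""
    rng = random.Random(0)
    total = 0
    for _ in range(samples):
        reach_a = simulate_reach(neighbors, probs, seeds_a, rng)
        reach_b = simulate_reach(neighbors, probs, seeds_b, rng)
        total += len(reach_a & reach_b)
    return total / samples

# toy usage: two seed users feeding into a shared downstream audience
neighbors = {0: [2], 1: [2], 2: [3, 4], 3: [], 4: []}
probs = {(u, v): 0.6 for u in neighbors for v in neighbors[u]}
print(estimate_co_exposure(neighbors, probs, seeds_a={0}, seeds_b={1}))
```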