Hi! My name is Zhenyu Huang (黄振宇). I’m a Ph.D. student at the College of Computer Science, Sichuan University, China, advised by Professor Xi Peng [website]. In 2020, I interned at Baidu NLP, mentored by Xinyan Xiao [website].
My research interests span Multimodal Learning, Noisy Label Learning, and Unsupervised Learning. Currently, I focus on designing robust multimodal learning models with strong empirical performance and real-world deployability.
Cross-modal matching, which aims to establish the correspondence between two different modalities, is fundamental to a variety of tasks such as cross-modal retrieval and vision-and-language understanding. Although a large number of cross-modal matching methods have been proposed and have achieved remarkable progress in recent years, almost all of these methods implicitly assume that the multimodal training data are correctly aligned. In practice, however, such an assumption is extremely expensive or even impossible to satisfy. Based on this observation, we reveal and study a latent and challenging direction in cross-modal matching, named noisy correspondence, which could be regarded as a new paradigm of noisy labels. Different from traditional noisy labels, which mainly refer to errors in category labels, our noisy correspondence refers to mismatched paired samples. To solve this new problem, we propose a novel method for learning with noisy correspondence, named Noisy Correspondence Rectifier (NCR). In brief, NCR divides the data into clean and noisy partitions based on the memorization effect of neural networks and then rectifies the correspondence via an adaptive prediction model in a co-teaching manner. To verify the effectiveness of our method, we conduct extensive experiments on Flickr30K, MS-COCO, and Conceptual Captions, using image-text matching as a showcase. The code is available at www.pengxi.me.
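The division step described above can be illustrated with the common small-loss criterion: fit a two-component Gaussian mixture over the per-sample losses and treat the low-mean component as the clean partition. Below is a minimal numpy sketch under that assumption; the helper name and the toy losses are illustrative, not NCR’s actual implementation.

```python
import numpy as np

def split_clean_noisy(losses, n_iters=50):
    """Fit a two-component 1-D Gaussian mixture to per-sample losses via EM
    and flag samples belonging to the low-mean (likely clean) component."""
    x = np.asarray(losses, dtype=float)
    # Initialization: means at the extremes, shared variance, equal weights.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var() + 1e-6] * 2)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each sample.
        dens = pi / np.sqrt(2 * np.pi * var) * np.exp(
            -(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / len(x)
    clean_comp = int(np.argmin(mu))  # low-loss component = likely clean
    return resp[:, clean_comp] > 0.5  # boolean mask: True = clean

# Toy example: 80 well-matched pairs (low loss), 20 mismatched pairs (high loss).
losses = np.concatenate([np.full(80, 0.1), np.full(20, 2.0)])
mask = split_clean_noisy(losses)
```

In the actual method, the posterior probability of cleanliness would also feed the subsequent rectification step rather than only a hard split.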
NeurIPS Oral
Partially View-aligned Clustering
Zhenyu Huang, Peng Hu, Joey Tianyi Zhou, and
2 more authors
In Proceedings of the 34th Conference on Neural Information Processing Systems, NeurIPS’2020, Dec 2020
In this paper, we study one challenging issue in multi-view data clustering. To be specific, for two data matrices \(\mathbf{X}^{(1)}\) and \(\mathbf{X}^{(2)}\) corresponding to two views, we do not assume that \(\mathbf{X}^{(1)}\) and \(\mathbf{X}^{(2)}\) are fully aligned row-wise. Instead, we assume that only a small portion of the matrices has established correspondence in advance. Such a partially view-aligned problem (PVP) arises because capturing or establishing fully aligned multi-view data requires intensive labor, and it has rarely been touched so far to the best of our knowledge. To solve this practical and challenging problem, we propose a novel multi-view clustering method termed partially view-aligned clustering (PVC). To be specific, PVC proposes to use a differentiable surrogate of the non-differentiable Hungarian algorithm and recasts it as a pluggable module. As a result, the category-level correspondence of the unaligned data could be established in a latent space learned by a neural network, while a common space across different views is learned using the “aligned” data. Extensive experimental results show promising results of our method in clustering partially view-aligned data.
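A differentiable surrogate of the Hungarian algorithm can be illustrated with Sinkhorn-style iterative normalization, a standard relaxation that turns a similarity matrix into a near doubly-stochastic soft permutation. The sketch below is a minimal numpy illustration of the idea; the similarity matrix and temperature are toy values, not the paper’s exact module.

```python
import numpy as np

def sinkhorn(scores, n_iters=100, tau=0.1):
    """Soft, differentiable assignment: exponentiate the score matrix at
    temperature tau, then alternately normalize rows and columns so the
    result approaches a doubly-stochastic (soft permutation) matrix."""
    P = np.exp(scores / tau)
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # row normalization
        P /= P.sum(axis=0, keepdims=True)  # column normalization
    return P

# Toy cross-view similarity between 3 samples of view 1 and view 2;
# the underlying correspondence is the permutation 0->2, 1->0, 2->1.
S = np.array([[0.1, 0.2, 0.9],
              [0.8, 0.1, 0.3],
              [0.2, 0.9, 0.1]])
P = sinkhorn(S)
alignment = P.argmax(axis=1)  # recovered correspondence per view-1 sample
```

Because every operation in the loop is differentiable, such a module can be plugged into a network and trained end-to-end, unlike the discrete Hungarian algorithm.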
IJCAI
Multi-view Spectral Clustering Network
Zhenyu Huang, Joey Tianyi Zhou, Xi Peng, and
3 more authors
In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI’2019, Aug 2019
Multi-view clustering aims to cluster data from diverse sources or domains, and has drawn considerable attention in recent years. In this paper, we propose a novel multi-view clustering method named multi-view spectral clustering network (MvSCN), which, to the best of our knowledge, could be the first deep version of multi-view spectral clustering. To deeply cluster multi-view data, MvSCN incorporates the local invariance within every single view and the consistency across different views into a novel objective function, where the local invariance is defined by a deep metric learning network rather than the Euclidean distance adopted by traditional approaches. In addition, we enforce and reformulate an orthogonal constraint as a novel layer stacked on an embedding network, which brings two advantages: jointly optimizing the neural network while performing matrix decomposition, and avoiding trivial solutions. Extensive experiments on four challenging datasets demonstrate the effectiveness of our method compared with 10 state-of-the-art approaches in terms of three evaluation metrics.
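An orthogonal constraint recast as a layer can be illustrated with Cholesky-based orthogonalization: given embeddings H with Gram matrix H^T H = L L^T, the output Y = H L^{-T} satisfies Y^T Y = I, ruling out the trivial solution where all embeddings collapse to a point. This is a minimal numpy sketch of that construction under assumed details, not the exact MvSCN layer.

```python
import numpy as np

def orthogonal_layer(H):
    """Map embeddings H (n x k) to Y with Y^T Y = I via Cholesky-based
    orthogonalization: factor H^T H = L L^T, then set Y = H L^{-T}.
    Every step is differentiable, so it can sit on top of a network."""
    gram = H.T @ H                       # k x k Gram matrix
    L = np.linalg.cholesky(gram)         # lower-triangular factor
    Y = H @ np.linalg.inv(L).T           # whitened, orthonormal output
    return Y

# Toy embeddings from some upstream network.
H = np.random.default_rng(0).normal(size=(100, 5))
Y = orthogonal_layer(H)
```

The identity follows directly: Y^T Y = L^{-1} (H^T H) L^{-T} = L^{-1} L L^T L^{-T} = I, so the constraint holds by construction rather than by a penalty term.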