I am a Ph.D. candidate at Korea Advanced Institute of Science and Technology (KAIST), advised by Prof. Jinwoo Shin. During my studies, I was fortunate to intern twice at Amazon Web Services (AWS) in Seattle, WA, in 2021 and 2022. I am also a recipient of the Qualcomm Innovation Fellowship Korea 2020 for two of my papers. Previously, I received a B.S. in Mathematics and Computer Science from KAIST in 2017.

My research goal is to understand why neural networks behave so differently from our brains and, ultimately, how our brains make inferences. Specifically, I am interested in discovering simple priors (if they exist) that would close the gap between neural network and human perception. Many topics are related, particularly robustness (or generalization) under distribution shifts, e.g., adversarial, out-of-distribution, and label shifts, to name a few.

Email: jongheonj (at) kaist dot ac dot kr


:sparkles:  News

  • Sept 2022: Our paper “NOTE: Robust Continual Test-time Adaptation Against Temporal Correlation” will be presented at NeurIPS 2022.
  • Sept 2022: I will re-join AWS AI (Seattle, WA) as a returning intern and work until December.
  • Aug 2022: Three papers will be presented at ECCV Workshop 2022.
  • July 2022: Our paper “SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation” will be presented at ECCV 2022.
  • June 2022: Our paper “Consistency Regularization for Adversarial Robustness” will be presented at AAAI 2022.

:page_with_curl:  Publications

(*: Equal contribution, C: Conference, W: Workshop, P: Preprint)

2022

tl;dr: The first test-time adaptation method concerning temporally correlated data.

Abstract

Test-time adaptation (TTA) is an emerging paradigm that addresses distributional shifts between training and testing phases without additional data acquisition or labeling cost; only unlabeled test data streams are used for continual model adaptation. Previous TTA schemes assume that the test samples are independent and identically distributed (i.i.d.), even though they are often temporally correlated (non-i.i.d.) in application scenarios, e.g., autonomous driving. We discover that most existing TTA methods fail dramatically under such scenarios. Motivated by this, we present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams. Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN) that corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS) that simulates i.i.d. data stream from non-i.i.d. stream in a class-balanced manner. Our evaluation with various datasets, including real-world non-i.i.d. streams, demonstrates that the proposed robust TTA not only outperforms state-of-the-art TTA algorithms in the non-i.i.d. setting, but also achieves comparable performance to those algorithms under the i.i.d. assumption.
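As an illustration of the PBRS idea (keeping a small memory class-balanced with respect to *predicted* labels so that a non-i.i.d. stream looks closer to i.i.d.), here is a minimal Python sketch. The class and method names are hypothetical, and the eviction rule (evict from a current majority class, otherwise fall back to per-class reservoir sampling) is an assumption for illustration, not the paper's exact algorithm.

```python
import random
from collections import Counter


class PredictionBalancedReservoir:
    """Fixed-size memory kept class-balanced w.r.t. *predicted* labels.

    Hypothetical sketch of the PBRS idea; eviction details are assumptions.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.memory = []          # list of (sample, predicted_label) pairs
        self.seen = Counter()     # stream samples seen so far, per predicted class
        self.rng = random.Random(seed)

    def _evict_one_of(self, label):
        # Remove a uniformly random stored sample whose predicted label is `label`.
        idxs = [i for i, (_, lbl) in enumerate(self.memory) if lbl == label]
        self.memory.pop(self.rng.choice(idxs))

    def add(self, sample, pred):
        self.seen[pred] += 1
        if len(self.memory) < self.capacity:
            self.memory.append((sample, pred))
            return
        counts = Counter(lbl for _, lbl in self.memory)
        top = max(counts.values())
        majority = [c for c, n in counts.items() if n == top]
        if pred not in majority:
            # Make room by evicting from a currently over-represented class.
            self._evict_one_of(self.rng.choice(majority))
            self.memory.append((sample, pred))
        elif self.rng.random() < counts[pred] / self.seen[pred]:
            # Otherwise, standard reservoir sampling within the incoming class.
            self._evict_one_of(pred)
            self.memory.append((sample, pred))
```

For example, feeding a temporally correlated stream of 90 samples predicted as class 0 followed by 10 predicted as class 1 into a buffer of capacity 10 ends with five stored samples per class, whereas plain reservoir sampling would typically keep mostly class 0.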

tl;dr: We propose extending the Information Bottleneck with a nuisance variable for out-of-distribution generalization.

Abstract

The information bottleneck (IB) principle is one of the natural approaches to obtaining a succinct representation x -> z for a given downstream task x -> y: namely, it finds z that (a) maximizes the (task-relevant) mutual information I(z; y), while (b) minimizing I(x; z) to constrain the capacity of z for better generalization. In practical scenarios where the training data is limited, however, the IB objective may not be able to prevent z from co-adapting to so-called "shortcut" signals, i.e., features present only in the training data that are predictive yet compressible enough. Such features typically stem from biases in data acquisition and generalize poorly to new (but still semantically aligned) environments. To bypass this failure mode, we extend the standard IB framework to also model the nuisance information with respect to z, namely z_n, so that (z, z_n) can reconstruct x: by minimizing I(z_n; y) in addition to the IB objective, z can now encode more diverse y-related signals in x, while the remaining information is disentangled from z. Our experimental results show that the representation learned with our proposed training consistently improves various notions of robustness over standard VIB training, e.g., novelty detection and corruption robustness, without relying on data augmentations.
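Schematically, the extended objective described above can be collected into a single loss to minimize. This is a sketch under assumptions: the weights $\beta$, $\lambda$, $\gamma$, the decoder $g$, and the reconstruction loss $\mathcal{L}_{\mathrm{rec}}$ are illustrative placeholders, not the paper's exact formulation.

```latex
\min \; -I(z; y) \;+\; \beta\, I(x; z) \;+\; \lambda\, I(z_n; y) \;+\; \gamma\, \mathcal{L}_{\mathrm{rec}}\big(x,\, g(z, z_n)\big)
```

The first two terms are the standard IB trade-off; the third pushes label information out of $z_n$, and the last asks $(z, z_n)$ to jointly reconstruct $x$.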

tl;dr: A more sensible training method for randomized smoothing by incorporating a sample-wise control of target robustness.

Abstract

Any classifier can be "smoothed out" under Gaussian noise to build a new classifier that is provably robust to $\ell_2$-adversarial perturbations, viz., by averaging its predictions over the noise via randomized smoothing. In this paper, we propose a simple training method that leverages the fundamental trade-off between accuracy and (adversarial) robustness to obtain more robust smoothed classifiers, in particular, through a sample-wise control of robustness over the training samples. We make this control feasible by using "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input: specifically, we vary the training objective depending on this proxy to filter out samples that are unlikely to benefit from the worst-case (adversarial) objective. Our experiments show that the proposed method, despite its simplicity, consistently improves certified robustness over state-of-the-art training methods. Somewhat surprisingly, we find that these improvements persist even for other notions of robustness, e.g., to various types of common corruptions.
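A minimal Python sketch of the "accuracy under Gaussian noise" proxy and the sample-wise routing it enables, assuming a generic `classify` callable on lists of floats. The function names and the threshold `tau` are hypothetical, and sending samples at or above `tau` to the worst-case objective is an assumption based on the abstract, not the paper's exact rule.

```python
import random


def accuracy_under_noise(classify, x, label, sigma, n=100, seed=0):
    """Monte Carlo estimate: fraction of n Gaussian-noised copies of x classified as `label`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        hits += int(classify(noisy) == label)
    return hits / n


def split_by_proxy(classify, batch, sigma, tau, n=100):
    """Route (x, label) pairs by the proxy; `tau` is a hypothetical threshold.

    Samples with proxy >= tau go to the worst-case (adversarial) objective;
    the rest stay on the average-case (Gaussian-noise) objective only.
    """
    worst_case, average_case = [], []
    for x, label in batch:
        p = accuracy_under_noise(classify, x, label, sigma, n)
        (worst_case if p >= tau else average_case).append((x, label))
    return worst_case, average_case
```

The proxy is the same kind of Monte Carlo estimate that randomized smoothing itself relies on, which is what makes it cheap to compute during training.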

Abstract

Semi-supervised learning (SSL) has been a powerful strategy for incorporating a few labels into learning better representations. In this paper, we focus on a practical scenario in which one aims to apply SSL when the unlabeled data may contain out-of-class samples, i.e., samples that cannot be assigned one-hot labels from the closed set of classes in the labeled data; in other words, the unlabeled data is an open set. Specifically, we introduce OpenCoS, a simple framework for handling this realistic semi-supervised learning scenario, based upon a recent framework of self-supervised visual representation learning. We first observe that the out-of-class samples in the open-set unlabeled dataset can be identified effectively via self-supervised contrastive learning. Then, OpenCoS utilizes this information to overcome the failure modes of existing state-of-the-art semi-supervised methods, by utilizing one-hot pseudo-labels and soft labels for the identified in- and out-of-class unlabeled data, respectively. Our extensive experimental results show the effectiveness of OpenCoS in the presence of out-of-class samples, making state-of-the-art semi-supervised methods suitable for diverse scenarios involving open-set unlabeled data.
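A minimal Python sketch of the labeling rule described above, assuming that in- vs. out-of-class detection is done by thresholding the cosine similarity between a sample's contrastive embedding and per-class prototype embeddings. The prototypes, `threshold`, and `temperature` are hypothetical choices for illustration; OpenCoS's actual detection score and soft-label construction may differ.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def label_unlabeled(embedding, prototypes, threshold=0.5, temperature=0.1):
    """Return (is_in_class, label_vector) for one unlabeled embedding."""
    sims = [cosine(embedding, p) for p in prototypes]
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] >= threshold:
        # Identified as in-class: hard one-hot pseudo-label.
        return True, [1.0 if k == best else 0.0 for k in range(len(sims))]
    # Identified as out-of-class: soft label from temperature-scaled similarities.
    scaled = [s / temperature for s in sims]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return False, [e / total for e in exps]
```

Giving out-of-class samples a soft label, rather than forcing a one-hot pseudo-label, avoids committing them to a class they do not belong to.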

BibTeX

@misc{park2021opencos,
  title={Open{CoS}: Contrastive Semi-supervised Learning for Handling Open-set Unlabeled Data},
  author={Jongjin Park and Sukmin Yun and Jongheon Jeong and Jinwoo Shin},
  year={2021},
  eprint={2107.08943},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

tl;dr: (a) VisA, a new larger-scale benchmark for industrial anomaly detection and segmentation; (b) a novel self-supervised pre-training method targeting downstream anomaly detection tasks on the benchmark.

Abstract

Visual anomaly detection is commonly used in industrial quality inspection. In this paper, we present a new dataset as well as a new self-supervised learning method for ImageNet pre-training to improve anomaly detection and segmentation in 1-class and 2-class 5/10/high-shot training setups. We release the Visual Anomaly (VisA) Dataset consisting of 10,821 high-resolution color images (9,621 normal and 1,200 anomalous samples) covering 12 objects in 3 domains, making it the largest industrial anomaly detection dataset to date. Both image and pixel-level labels are provided. We also propose a new self-supervised framework - SPot-the-difference (SPD) - which can regularize contrastive self-supervised pre-training, such as SimSiam, MoCo and SimCLR, to be more suitable for anomaly detection tasks. Our experiments on VisA and MVTec-AD dataset show that SPD consistently improves these contrastive pre-training baselines and even the supervised pre-training. For example, SPD improves Area Under the Precision-Recall curve (AU-PR) for anomaly segmentation by 5.9% and 6.8% over SimSiam and supervised pre-training respectively in the 2-class high-shot regime. We open-source the project at http://github.com/amazon-research/spot-diff.

tl;dr: Consistency regularization can also prevent robust overfitting in adversarial training.

  • Also appeared at ICML AdvML Workshop 2021 as an oral presentation
  • Won the Best Paper Award from the Korean Artificial Intelligence Association 2021
Abstract

Adversarial training (AT) is currently one of the most successful methods to obtain adversarial robustness in deep neural networks. However, the phenomenon of robust overfitting, i.e., robustness starting to decrease significantly during AT, has been problematic: it not only forces practitioners to consider a bag of tricks for successful training, e.g., early stopping, but also incurs a significant generalization gap in robustness. In this paper, we propose an effective regularization technique that prevents robust overfitting by optimizing an auxiliary "consistency" regularization loss during AT. Specifically, we discover that data augmentation is a quite effective tool to mitigate overfitting in AT, and develop a regularization that forces the predictive distributions, obtained after attacking two different augmentations of the same instance, to be similar to each other. Our experimental results demonstrate that such a simple regularization technique brings significant improvements in the test robust accuracy of a wide range of AT methods. More remarkably, we also show that our method can significantly help the model generalize its robustness against unseen adversaries, e.g., other types of perturbations or larger perturbations than those used during training. Code is available at https://github.com/alinlab/consistency-adversarial.
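A minimal Python sketch of a consistency term between the predictions on two attacked augmentations of the same instance. The abstract does not specify the divergence; the Jensen-Shannon divergence and the `temperature` parameter below are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    # KL(p || q); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_consistency(logits_a, logits_b, temperature=1.0):
    """Jensen-Shannon divergence between two temperature-scaled predictions."""
    p = softmax(logits_a, temperature)
    q = softmax(logits_b, temperature)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

In training, this term would be added, with some weight, to the adversarial training loss of each instance; it is zero when both attacked views yield the same prediction and is bounded by ln 2.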

BibTeX

@inproceedings{tack2022consistency,
  title={Consistency Regularization for Adversarial Robustness},
  author={Jihoon Tack and Sihyun Yu and Jongheon Jeong and Minseon Kim and Sung Ju Hwang and Jinwoo Shin},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2022}
}

2021

tl;dr: Overconfident inputs near the data may cause adversarial vulnerability in randomized smoothing, and regularizing them toward uniform confidence improves robustness.

  • Also appeared at ICML AdvML Workshop 2021
Abstract

Randomized smoothing is currently a state-of-the-art method to construct a certifiably robust classifier from neural networks against $\ell_2$-adversarial perturbations. Under this paradigm, the robustness of a classifier is aligned with its prediction confidence, i.e., higher confidence from a smoothed classifier implies better robustness. This motivates us to rethink the fundamental trade-off between accuracy and robustness in terms of calibrating the confidences of a smoothed classifier. In this paper, we propose a simple training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup: it trains on convex combinations of samples along the direction of adversarial perturbation for each input. The proposed procedure effectively identifies over-confident, near off-class samples as a cause of limited robustness in smoothed classifiers, and offers an intuitive way to adaptively set a new decision boundary between these samples for better robustness. Our experimental results demonstrate that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers compared to existing state-of-the-art robust training methods.
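A minimal Python sketch of the self-mixup step, assuming the adversarial counterpart `x_adv` has already been computed and that the mixed target moves toward the uniform distribution as the input moves toward `x_adv` (in line with the tl;dr above). The function name and the one-hot target are simplifications; SmoothMix's actual targets and the sampling of the mixing coefficient may differ.

```python
def self_mixup(x, x_adv, target, lam):
    """Mix an input with its adversarial counterpart and soften the target.

    x, x_adv : flat lists of floats of the same length
    target   : probability vector for x (e.g., one-hot)
    lam      : mixing coefficient in [0, 1]
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Convex combination along the adversarial direction x -> x_adv.
    x_mix = [(1 - lam) * a + lam * b for a, b in zip(x, x_adv)]
    # Interpolate the target toward the uniform distribution over k classes.
    k = len(target)
    y_mix = [(1 - lam) * t + lam / k for t in target]
    return x_mix, y_mix
```

With `lam = 0` the original pair is returned unchanged; larger `lam` gives points closer to `x_adv` a less confident target, which is how over-confident near off-class points get regularized.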

BibTeX

@inproceedings{jeong2021smoothmix,
  title={Smooth{Mix}: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness},
  author={Jongheon Jeong and Sejun Park and Minkyu Kim and Heung-Chang Lee and Doguk Kim and Jinwoo Shin},
  booktitle={Advances in Neural Information Processing Systems},
  year={2021},
  url={https://openreview.net/forum?id=nlEQMVBD359}
}

tl;dr: We propose a novel GAN discriminator, showing that contrastive representation learning, e.g., SimCLR, and GANs can benefit each other when they are jointly trained.

Abstract

Recent works on Generative Adversarial Networks (GANs) are actively revisiting various data augmentation techniques as an effective way to prevent discriminator overfitting. It is still unclear, however, which augmentations could actually improve GANs, and in particular, how to apply a wider range of augmentations in training. In this paper, we propose a novel way to address these questions by incorporating a recent contrastive representation learning scheme into the GAN discriminator, coined ContraD. This "fusion" enables the discriminators to work with much stronger augmentations without increasing their training instability, thereby preventing discriminator overfitting in GANs more effectively. Even better, we observe that contrastive learning itself also benefits from our GAN training, i.e., by maintaining discriminative features between real and fake samples, suggesting a strong coherence between the two worlds: good contrastive representations are also good for GAN discriminators, and vice versa. Our experimental results show that GANs with ContraD consistently improve FID and IS compared to other recent techniques incorporating data augmentations, while still maintaining highly discriminative features in the discriminator in terms of linear evaluation. Finally, as a byproduct, we also show that our GANs trained in an unsupervised manner (without labels) can induce many conditional generative models via simple latent sampling, leveraging the learned features of ContraD. Code is available at https://github.com/jh-jeong/ContraD.

BibTeX

@inproceedings{jeong2021training,
  title={Training {GAN}s with Stronger Augmentations via Contrastive Discriminator},
  author={Jongheon Jeong and Jinwoo Shin},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=eo6U4CAwVmg}
}

2020

tl;dr: Consistency controls robustness in the world of randomized smoothing, like TRADES in adversarial training.

  • Also appeared at ICML UDL Workshop 2020
  • Won Qualcomm Innovation Fellowship Korea 2020
Abstract

Randomized smoothing, a recent technique, has shown that worst-case (adversarial) $\ell_2$-robustness can be transformed into average-case Gaussian robustness by "smoothing" a classifier, i.e., by considering its averaged prediction over Gaussian noise. In this paradigm, one should rethink the notion of adversarial robustness in terms of the generalization ability of a classifier under noisy observations. We found that the trade-off between accuracy and certified robustness of smoothed classifiers can be largely controlled by simply regularizing the prediction consistency over noise. This relationship allows us to design a robust training objective without approximating a non-existing smoothed classifier, e.g., via soft smoothing. Our experiments under various deep neural network architectures and datasets show that the "certified" $\ell_2$-robustness can be dramatically improved with the proposed regularization, even achieving results better than or comparable to those of state-of-the-art approaches with significantly lower training cost and fewer hyperparameters.
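A minimal Python sketch of one way to measure "prediction consistency over noise": given a classifier's softmax outputs on m Gaussian-noised copies of the same input, average the KL divergence from their mean prediction to each of them. Using this KL form as the regularizer is an assumption for illustration; in training it would be added, with some weight, to the classification loss on the noisy copies.

```python
import math


def consistency_loss(noisy_probs, eps=1e-12):
    """Average KL(mean || p_i) over m probability vectors p_i for one input.

    noisy_probs : list of m softmax outputs, one per noisy copy of the input
    eps         : floor on p_i entries to avoid division by zero
    """
    m = len(noisy_probs)
    k = len(noisy_probs[0])
    # Mean prediction over the noisy copies (a Monte Carlo "smoothed" output).
    mean = [sum(p[j] for p in noisy_probs) / m for j in range(k)]
    total = 0.0
    for p in noisy_probs:
        total += sum(mj * math.log(mj / max(pj, eps))
                     for mj, pj in zip(mean, p) if mj > 0)
    return total / m
```

The loss is zero exactly when all noisy copies receive the same prediction, and grows as predictions under noise disagree.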

BibTeX

@inproceedings{jeong2020consistency,
 author = {Jeong, Jongheon and Shin, Jinwoo},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
 pages = {10558--10570},
 publisher = {Curran Associates, Inc.},
 title = {Consistency Regularization for Certified Robustness of Smoothed Classifiers},
 url = {https://proceedings.neurips.cc/paper/2020/file/77330e1330ae2b086e5bfcae50d9ffae-Paper.pdf},
 volume = {33},
 year = {2020}
}

tl;dr: Contrastive representations are surprisingly good at discriminating OOD samples, and contrasting also “OOD-like” augmentations can further improve their performances.

  • Won Qualcomm Innovation Fellowship Korea 2020
Abstract

Novelty detection, i.e., identifying whether a given sample is drawn from outside the training distribution, is essential for reliable machine learning. To this end, there have been many attempts at learning a representation well-suited for novelty detection and designing a score based on such representation. In this paper, we propose a simple, yet effective method named contrasting shifted instances (CSI), inspired by the recent success of contrastive learning of visual representations. Specifically, in addition to contrasting a given sample with other instances as in conventional contrastive learning methods, our training scheme contrasts the sample with distributionally-shifted augmentations of itself. Based on this, we propose a new detection score that is specific to the proposed training scheme. Our experiments demonstrate the superiority of our method under various novelty detection scenarios, including unlabeled one-class, unlabeled multi-class and labeled multi-class settings, with various image benchmark datasets. Code and pre-trained models are available at https://github.com/alinlab/CSI.

BibTeX

@inproceedings{tack2020csi,
 author = {Tack, Jihoon and Mo, Sangwoo and Jeong, Jongheon and Shin, Jinwoo},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
 pages = {11839--11852},
 publisher = {Curran Associates, Inc.},
 title = {CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances},
 url = {https://proceedings.neurips.cc/paper/2020/file/8965f76632d7672e7d3cf29c87ecaa0c-Paper.pdf},
 volume = {33},
 year = {2020}
}

tl;dr: Adversarial examples translated from majority to minority classes can serve as surprisingly effective minority samples to prevent overfitting under class imbalance.

Abstract

In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks struggle to generalize under a balanced testing criterion. In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples (e.g., images) from more-frequent classes. This simple approach enables a classifier to learn more generalizable features of minority classes by transferring and leveraging the diversity of the majority information. Our experimental results on a variety of class-imbalanced datasets show that the proposed method significantly improves generalization on minority classes compared to existing re-sampling or re-weighting methods. The performance of our method even surpasses that of previous state-of-the-art methods for imbalanced classification.
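A minimal Python sketch of translating a majority-class sample toward a minority class by gradient ascent on the target-class log-probability. A toy linear softmax classifier is assumed so that the gradient has a closed form; the classifier, `step`, `iters`, and the stopping rule are hypothetical, and the actual method involves further components not shown here.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    # Toy linear classifier: z_k = W[k] . x + b[k]
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def translate_to_class(W, b, x, target, step=0.1, iters=100):
    """Gradient ascent on log p(target | x) until `target` is predicted.

    For z = W x + b: d log softmax(z)[t] / dx = W[t] - sum_k p_k W[k].
    """
    x = list(x)
    for _ in range(iters):
        p = softmax(logits(W, b, x))
        if max(range(len(p)), key=p.__getitem__) == target:
            break  # the sample is now classified as the minority class
        grad = [W[target][d] - sum(p[k] * W[k][d] for k in range(len(W)))
                for d in range(len(x))]
        x = [xi + step * gi for xi, gi in zip(x, grad)]
    return x
```

The translated point keeps most of the original sample's content while crossing into the minority class, which is how majority-class diversity can be reused as extra minority samples.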

BibTeX

@InProceedings{kim2020M2m,
  author = {Kim, Jaehyung and Jeong, Jongheon and Shin, Jinwoo},
  title = {M2m: Imbalanced Classification via Major-to-Minor Translation},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  month = {June},
  year = {2020}
}

2019

tl;dr: Any CNN can become more efficient by “re-allocating” unnecessary channels to increase the kernel size.

Abstract

Recent progress in deep convolutional neural networks (CNNs) has enabled a simple paradigm of architecture design: larger models typically achieve better accuracy. Due to this, in modern CNN architectures, it becomes more important to design models that generalize well under certain resource constraints, e.g., the number of parameters. In this paper, we propose a simple way to improve the capacity of any CNN model having large-scale features, without adding more parameters. In particular, we modify a standard convolutional layer to have a new functionality of channel-selectivity, so that the layer is trained to select important channels to re-distribute their parameters. Our experimental results under various CNN architectures and datasets demonstrate that the proposed new convolutional layer allows new optima that generalize better via efficient resource utilization, compared to the baseline.

BibTeX

@InProceedings{jeong2020training,
  title = 	 {Training {CNN}s with Selective Allocation of Channels},
  author =       {Jeong, Jongheon and Shin, Jinwoo},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {3080--3090},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/jeong19c/jeong19c.pdf},
  url = 	 {https://proceedings.mlr.press/v97/jeong19c.html}
}

2016


:briefcase:  Work Experience


:medal_sports:  Honors & Awards


:handshake:  Professional Services

  • Conference reviewers
    • Neural Information Processing Systems (NeurIPS)
    • International Conference on Learning Representations (ICLR)
    • International Conference on Machine Learning (ICML)
    • AAAI Conference on Artificial Intelligence (AAAI)
  • Journal reviewers
    • International Journal of Computer Vision (IJCV)
    • Transactions on Machine Learning Research (TMLR)
    • ACM Transactions on Modeling and Performance Evaluation of Computing Systems (ACM ToMPECS)