NoisyTwins: Class-Consistent and Diverse Image Generation through StyleGANs

1 Vision and AI Lab, IISc Bangalore; 2 Google Research; 3 IIT BHU Varanasi; 4 BITS Pilani

CVPR 2023

iNat19 qualitative results

Samples from real iNaturalist 2019 classes (with class frequency), images generated by StyleGAN2-ADA, and images generated after adding the proposed NoisyTwins. NoisyTwins achieves diverse, class-consistent, high-quality generation using only 38 samples on average.


StyleGANs are at the forefront of controllable image generation as they produce a latent space that is semantically disentangled, making it suitable for image editing and manipulation. However, the performance of StyleGANs severely degrades when trained via class-conditioning on large-scale long-tailed datasets. We find that one reason for degradation is the collapse of latents for each class in the W latent space. With NoisyTwins, we first introduce an effective and inexpensive augmentation strategy for class embeddings, which then decorrelates the latents based on self-supervision in the W space. This decorrelation mitigates collapse, ensuring that our method preserves intraclass diversity with class-consistency in image generation. We show the effectiveness of our approach on large-scale real-world long-tailed datasets of ImageNet-LT and iNaturalist 2019, where our method outperforms other methods by ∼ 19% on FID, establishing a new state-of-the-art.


For the i-th sample of class c_i, we create twin augmentations (c̃_i^a, c̃_i^b) by sampling from a Gaussian centered at the class embedding (µ_{c_i}). After this, we concatenate them with the same z_i and obtain (w̃_i^a, w̃_i^b) from the mapping network, which we stack in batches of augmented latents (W̃^A and W̃^B). The twin (w̃_i^a, w̃_i^b) vectors are then made invariant to augmentations (similar) in the latent space by minimizing cross-correlation between the latents of two augmented batches (W̃^A and W̃^B).

Overview of NoisyTwins
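The twin-augmentation step described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a Barlow Twins-style objective in which the cross-correlation matrix between the two latent batches is pushed toward the identity (diagonal toward 1 for invariance, off-diagonal toward 0 for decorrelation). The helper names, the toy one-layer mapping network, the noise scale `sigma`, and the weight `off_diag_weight` are all hypothetical choices made for illustration.

```python
import numpy as np


def twin_class_embeddings(mu, sigma, rng):
    """Draw two noisy copies of each class embedding from N(mu, sigma^2 I)."""
    c_a = mu + rng.normal(0.0, sigma, size=mu.shape)
    c_b = mu + rng.normal(0.0, sigma, size=mu.shape)
    return c_a, c_b


def cross_correlation(w_a, w_b, eps=1e-8):
    """Cross-correlation matrix (d x d) between two batches of latents (n x d)."""
    n = w_a.shape[0]
    # Standardize each latent dimension over the batch.
    a = (w_a - w_a.mean(axis=0)) / (w_a.std(axis=0) + eps)
    b = (w_b - w_b.mean(axis=0)) / (w_b.std(axis=0) + eps)
    return a.T @ b / n


def twins_loss(w_a, w_b, off_diag_weight=0.005):
    """Assumed Barlow Twins-style loss: diagonal -> 1, off-diagonal -> 0."""
    c = cross_correlation(w_a, w_b)
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)          # invariance term
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # decorrelation term
    return float(on_diag + off_diag_weight * off_diag)


# Toy setup: sizes and the class-embedding table are illustrative only.
rng = np.random.default_rng(0)
n, num_classes, z_dim, c_dim, w_dim = 256, 10, 8, 8, 8
labels = rng.integers(0, num_classes, size=n)
class_table = rng.normal(size=(num_classes, c_dim))
mu = class_table[labels]                          # class embedding per sample

c_a, c_b = twin_class_embeddings(mu, sigma=0.1, rng=rng)
z = rng.normal(size=(n, z_dim))                   # the same z is shared by both twins

# Hypothetical stand-in for the mapping network: one fixed random layer.
weights = rng.normal(size=(z_dim + c_dim, w_dim)) / np.sqrt(z_dim + c_dim)


def toy_mapping(z, c):
    return np.tanh(np.concatenate([z, c], axis=1) @ weights)


w_a = toy_mapping(z, c_a)                         # batch of augmented latents A
w_b = toy_mapping(z, c_b)                         # batch of augmented latents B
loss = twins_loss(w_a, w_b)
print(f"twin loss on toy latents: {loss:.4f}")
```

In a real training loop this term would be added to the generator objective and differentiated through the mapping network; the sketch only evaluates it on fixed random weights.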

Qualitative Results


We find that existing SotA methods show either collapsed (a) or arbitrary (b) image generation for tail classes. With NoisyTwins, we observe diverse and class-consistent image generation, even for classes having only 5-6 images. Tail classes gain diversity through knowledge transferred from head classes, as they share parameters.


We observe that the noise-only baseline suffers from mode collapse and class confusion for tail categories, as shown on the left. Despite this, the mean iFID based on Inception V3 is lower for StyleGAN2-ADA+Noise and higher for the diverse and class-consistent NoisyTwins. Hence, this metric does not align with the qualitative results. On the other hand, the proposed mean iFID_CLIP is lower for NoisyTwins, demonstrating its reliability.
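As a rough illustration of what a mean intra-class FID measures, the sketch below computes the standard Fréchet distance between Gaussians fitted to real and generated features of each class, then averages over classes. It is an assumption-laden sketch, not the evaluation code behind the reported numbers: the function names are hypothetical, and the random arrays stand in for features that would come from a backbone such as Inception V3 or CLIP.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets (n x d)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # Tr((cov_r cov_f)^(1/2)) is the sum of square roots of the eigenvalues
    # of the product, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_f) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)


def mean_ifid(feats_real, labels_real, feats_fake, labels_fake):
    """Average the per-class Frechet distance over all classes in the real set."""
    scores = [
        frechet_distance(feats_real[labels_real == c], feats_fake[labels_fake == c])
        for c in np.unique(labels_real)
    ]
    return float(np.mean(scores))


# Toy features standing in for backbone embeddings (illustrative only).
rng = np.random.default_rng(0)
feats_real = rng.normal(size=(200, 4))
labels = np.arange(200) % 2                 # two classes, 100 samples each
feats_fake = feats_real.copy()
feats_fake[labels == 1] += 1.0              # shift class 1 by 1 in every dimension

score = mean_ifid(feats_real, labels, feats_fake, labels)
print(f"mean iFID on toy features: {score:.4f}")
```

In this toy case class 0 is matched exactly and class 1 differs only by a mean shift of 1 in each of 4 dimensions, so the per-class distances are about 0 and 4 and the mean is about 2. Swapping the feature extractor changes only the inputs to `mean_ifid`, which is the sense in which an Inception-based and a CLIP-based variant can rank the same generators differently.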

Other Applications

Few-Shot Generation

ImageNet Carnivores


Qualitative comparison on few-shot ImageNet Carnivores dataset

Animal Faces


Qualitative comparison on few-shot Animal Faces dataset

Fine-tuning after mode collapse


Fine-tuning Results: (Top) FID curve during fine-tuning with NoisyTwins on the CIFAR10-LT dataset. (Bottom) Diverse images of the truck class generated after fine-tuning the baseline with NoisyTwins.

More Results



Qualitative Analysis on iNaturalist2019 (1010 classes). Examples of generations from various classes for the evaluated baselines. The baseline ADA suffers from mode collapse, whereas gSR suffers from class confusion, particularly for tail classes, as seen above on the left. NoisyTwins generates diverse and class-consistent images across all categories.

Related Links


  author    = {Rangwani, Harsh and Bansal, Lavish and Sharma, Kartik and Karmali, Tejan and Jampani, Varun and Babu, R. Venkatesh},
  title     = {NoisyTwins: Class-Consistent and Diverse Image Generation through StyleGANs},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},