NULA - Base
v0.1.0 · Anti-Aliased Residual CNN · CIFAR-10 · Adversarial Robustness
From left to right: Clean (32x32), Bilinear Resize (8x8), Hard Decimation (stride-2),
and Checkerboard Aliasing (high-frequency injection).
NULA is an anti-aliased residual convolutional neural network for CIFAR-10 image classification, trained to be robust against perturbations that exploit downsampling operations.
Classical image models rely on fragile high-frequency cues, which are exactly the components that downsampling operators destroy or alias.
NULA is trained to reduce this dependence and instead form representations that remain stable under information-destroying transformations such as resizing, decimation, and aliasing-style perturbations.
NULA explicitly targets robustness to operators that change the sampling structure of the input, rather than generic adversarial perturbations.
Problem
The downsampling operations considered here are linear, many-to-one maps. Information is destroyed, and the null space is large:
any such map D : ℝⁿ → ℝᵐ with m < n has a non-trivial null space.
- dim(ker(D)) = n - rank(D) ≥ n - m > 0
This is exploitable.
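As a minimal sketch of this dimension count, assume a 1-D signal of length n = 8 and stride-2 decimation written out as an explicit matrix. The helper names `decimation_matrix` and `nullity` are illustrative and do not come from the repository.

```python
import numpy as np

def decimation_matrix(n, k):
    """Matrix of the map that keeps every k-th sample of a length-n signal."""
    kept = np.arange(0, n, k)
    D = np.zeros((len(kept), n))
    D[np.arange(len(kept)), kept] = 1.0
    return D

def nullity(D):
    """dim(ker(D)) = n - rank(D), by the rank-nullity theorem."""
    return D.shape[1] - np.linalg.matrix_rank(D)

D = decimation_matrix(n=8, k=2)   # maps R^8 -> R^4
print(D.shape, nullity(D))        # nullity is at least n - m = 4
```

Half of the input directions are invisible to this operator, and the same count applies per axis for images.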
Bilinear Resize
Bilinear resize computes a weighted average over local neighborhoods. Any input signal decomposes as:
- x = x_low + x_high
where x_low contains the low-frequency components preserved by the operator, and x_high contains the high-frequency residual.
Under downsampling D and upsampling U:
D(x) ≈ x_low
U(D(x)) ≈ x_low
x_high ∈ ker(D)
so x_high is annihilated, to the same approximation. An attacker can inject signal into x_high, perturbing the image while leaving the downsampled representation essentially unchanged.
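A minimal sketch of this blindness, assuming the x0.5 resize is modeled as 2x2 block averaging (which is what bilinear interpolation with half-pixel sample centers reduces to at exactly half resolution; other resize conventions differ). `downsample_avg2` is an illustrative name, and the random array stands in for one image channel.

```python
import numpy as np

def downsample_avg2(x):
    """Halve resolution by averaging each non-overlapping 2x2 block."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.random((32, 32))                 # stand-in for one image channel

# A perturbation that sums to zero inside every 2x2 block lies in ker(D).
i, j = np.indices(x.shape)
delta = 0.05 * (-1.0) ** (i + j)

same = np.allclose(downsample_avg2(x + delta), downsample_avg2(x))
print(same)
```

The perturbed and clean images differ at every pixel, yet their downsampled versions agree up to floating-point error.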
Decimation
Decimation keeps every k-th pixel and discards the rest:
- x_dec[i, j] = x[ki, kj]
No smoothing is applied before subsampling.
By the Nyquist-Shannon theorem, signals with frequency content above 1/(2k) of the original sampling rate are aliased: they fold back into lower frequencies and corrupt the representation.
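The fold-back can be seen on a single tone. The sketch below assumes a 1-D cosine at 0.375 cycles per sample and k = 2, so the tone sits above the 1/(2k) = 0.25 limit; the specific frequencies are chosen for illustration only.

```python
import numpy as np

n = np.arange(64)
k = 2                                   # decimation factor
f_high = 24 / 64                        # cycles per sample, above 1/(2k) = 0.25
x = np.cos(2 * np.pi * f_high * n)

x_dec = x[::k]                          # hard decimation, no pre-filter

# After decimation the tone is indistinguishable from a lower-frequency one.
m = np.arange(len(x_dec))
f_alias = 8 / 32                        # cycles per decimated sample
alias = np.cos(2 * np.pi * f_alias * m)
print(np.allclose(x_dec, alias))
```

The decimated samples carry no trace of the original frequency; they match a slower tone exactly.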
Checkerboard Attack
Visualization of the Clean Input (x), the Perturbed Input (x + δ) at ε=0.05, and the Amplified Perturbation Signal (δ).
- δ[i, j] = ε · (-1)^(i+j)
This is a Nyquist injection. It is the highest spatial frequency representable on a discrete grid. Under stride-2 subsampling S_2:
- (S_2 δ)[i, j] = δ[2i, 2j] = ε · (-1)^(2i+2j) = ε · 1 = ε
The subsampled result is a constant.
The checkerboard pattern collapses to a DC offset and loses all spatial structure.
Equivalently:
- S₂(δ − ε·𝟙) = 0, i.e. δ − ε·𝟙 ∈ ker(S₂)
Once the constant offset is removed, the perturbation lies in the null space of the stride-2 operator. A network whose first resolution reduction is an unfiltered stride-2 subsampling sees only a uniform offset downstream of that step, not the checkerboard.
The result is an image perceptually identical to the original, with a manipulated activation pattern upstream of the first downsampling operation.
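The collapse to a constant can be checked numerically. This is a small sketch on a 32x32 grid with ε = 0.05; the array names are illustrative.

```python
import numpy as np

eps = 0.05
i, j = np.indices((32, 32))
delta = eps * (-1.0) ** (i + j)         # checkerboard at the Nyquist frequency

sub = delta[::2, ::2]                   # stride-2 subsampling S_2
print(np.allclose(sub, eps))            # every kept sample equals +eps

# Removing the constant offset leaves a signal that S_2 maps to zero.
centered = delta - eps
print(np.allclose(centered[::2, ::2], 0.0))
```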
Let f = g ∘ D, where g is the remainder of the network.
For any δ ∈ ker(D):
- f(x + δ) = g(D(x + δ)) = g(D(x)) = f(x)
The network is blind to the entire subspace ker(D).
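To illustrate the identity with something executable, the sketch below uses a random matrix as a hypothetical linear downsampler D and a toy nonlinear g; neither is the NULA network. A basis of ker(D) is read off the singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 4
D = rng.standard_normal((m, n))          # hypothetical linear downsampler
W = rng.standard_normal((3, m))          # hypothetical weights of the remainder g

def f(x):
    """f = g o D with a toy nonlinear g."""
    return np.tanh(W @ (D @ x))

# Rows of Vt beyond rank(D) form an orthonormal basis of ker(D).
_, s, Vt = np.linalg.svd(D)
rank = int(np.sum(s > 1e-10))
null_basis = Vt[rank:]                   # shape (n - rank, n)

x = rng.standard_normal(n)
delta = null_basis.T @ rng.standard_normal(n - rank)   # arbitrary element of ker(D)

print(np.linalg.norm(delta), np.allclose(f(x + delta), f(x)))
```

The perturbation has non-zero norm, but the output of f does not change, whatever g is.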
Approach
Anti-aliased Downsampling (BlurPool)
Standard strided convolutions perform subsampling without enforcing a band-limit, causing aliasing.
BlurPool2d introduces a low-pass filter before subsampling:
- x → (low-pass filter) → subsample
The filter is a normalized fixed binomial kernel [1, 2, 1] ⊗ [1, 2, 1], applied depthwise: one filter per channel with no cross-channel mixing.
This enforces approximate band-limitedness prior to resolution reduction. It reduces aliasing artifacts and makes feature extraction more stable under downsampling.
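The following is a single-channel NumPy sketch of blur-then-subsample, not the repository's BlurPool2d. Reflect padding at the border is an assumption, and a depthwise version would apply the same function to each channel independently.

```python
import numpy as np

def blur_pool2d(x, stride=2):
    """Blur one channel with the normalized [1,2,1] x [1,2,1] kernel, then subsample."""
    k1 = np.array([1.0, 2.0, 1.0])
    kernel = np.outer(k1, k1)
    kernel /= kernel.sum()                       # normalize so weights sum to 1
    H, W = x.shape
    xp = np.pad(x, 1, mode="reflect")            # assumption: reflect padding
    blurred = np.zeros((H, W))
    for di in range(3):
        for dj in range(3):
            blurred += kernel[di, dj] * xp[di:di + H, dj:dj + W]
    return blurred[::stride, ::stride]

eps = 0.05
i, j = np.indices((32, 32))
checker = eps * (-1.0) ** (i + j)

naive = checker[::2, ::2]          # plain stride-2: constant offset eps survives
anti = blur_pool2d(checker)        # blur first: the Nyquist pattern is removed
print(np.abs(naive).max(), np.abs(anti).max())
```

With plain striding the checkerboard turns into a constant offset of size ε, whereas the binomial filter has a zero at the Nyquist frequency and removes the pattern before subsampling.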
Squeeze-and-Excitation (SE) blocks
SE blocks perform channel-wise reweighting.
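s = σ(W₂ δ(W₁ GAP(x)))
x → s ⊙ x
Here GAP is global average pooling, W₁ and W₂ are the bottleneck weights, σ is the gating function (a sigmoid in the standard SE formulation), and δ denotes the inner nonlinearity, not the perturbation δ used above.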
The bottleneck dimension is max(C // r, 1) where r = 16, keeping the recalibration lightweight relative to the feature dimension.
The network learns to suppress channels that carry unstable high-frequency information and amplify channels that carry structurally stable features.
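A NumPy sketch of the recalibration for one (C, H, W) feature map follows. It assumes a sigmoid gate and a ReLU inner nonlinearity, omits biases, and uses random stand-in weights; the actual NULA block may differ in these details.

```python
import numpy as np

def se_block(x, W1, W2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    Assumes sigma is a sigmoid and the inner nonlinearity is ReLU.
    """
    z = x.mean(axis=(1, 2))                  # squeeze: GAP -> (C,)
    h = np.maximum(W1 @ z, 0.0)              # bottleneck -> (C_mid,)
    s = 1.0 / (1.0 + np.exp(-(W2 @ h)))      # gate in (0, 1) -> (C,)
    return s[:, None, None] * x, s           # excite: per-channel rescale

C, r = 128, 16
C_mid = max(C // r, 1)                       # bottleneck width from the text
rng = np.random.default_rng(0)
W1 = rng.standard_normal((C_mid, C)) * 0.1   # random stand-in weights
W2 = rng.standard_normal((C, C_mid)) * 0.1
x = rng.standard_normal((C, 8, 8))

y, s = se_block(x, W1, W2)
print(y.shape, C_mid, float(s.min()), float(s.max()))
```

Each channel is scaled by a single number between 0 and 1, so the block can attenuate a channel but never mixes spatial positions.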
FIRST EVALUATION (Base)
The model in the first evaluation was trained for clean classification only, without adversarial augmentation.
| Perturbation | Accuracy | Δ from clean accuracy |
|---|---|---|
| No Perturbation | 91.95% | — |
| Resize x0.5 (bilinear) | 59.83% | −32.12% |
| Resize x0.25 (bilinear) | 24.82% | −67.13% |
| Decimate x2 | 30.03% | −61.92% |
| Checkerboard ε = 0.03 | 75.47% | −16.48% |
| Checkerboard ε = 0.05 | 44.99% | −46.96% |
This is a catastrophic collapse of model performance under representation instability. The representation inside the network is highly sensitive to aliasing and not stable under non-invertible transformations.
For checkerboard perturbations: ∃δ s.t. ||δ||∞ ≤ ε, but argmax f(x + δ) ≠ argmax f(x).
For resize/decimate: the transformation preserves class identity while destroying high-frequency structure.
The model fails because its representations are not invariant to the loss of this structure.
SECOND EVALUATION (Robust)
For the second evaluation, NULA was retrained from scratch under a modified training distribution.
During training, images were stochastically exposed to resolution-degrading transformations such as:
- resize-down/up
- hard decimation
- anti-aliased blur-decimation
The training objective becomes:
- min_θ E_{ (x, y) ~ D, T ~ 𝓣 } [ L(f_θ(T(x)), y) ]
where 𝓣 is a distribution over resolution-degrading operators T, and D here denotes the data distribution rather than a downsampling operator.
The model is forced to learn representations that remain predictive under transformations that destroy or corrupt high-frequency information.
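One way to read the objective is as a per-sample draw of T before the loss is computed. The sketch below is hypothetical and is not the contents of augmentations.py: the transform family, the uniform sampling weights, and upsampling by pixel repetition are all assumptions made for illustration.

```python
import numpy as np

def resize_down_up(x, k=2):
    """Average k x k blocks, then upsample by pixel repetition (assumed)."""
    H, W = x.shape
    low = x.reshape(H // k, k, W // k, k).mean(axis=(1, 3))
    return np.repeat(np.repeat(low, k, axis=0), k, axis=1)

def hard_decimate(x, k=2):
    """Keep every k-th pixel, then upsample by pixel repetition (assumed)."""
    return np.repeat(np.repeat(x[::k, ::k], k, axis=0), k, axis=1)

def identity(x):
    return x

TRANSFORMS = [identity, resize_down_up, hard_decimate]   # hypothetical family

def sample_transform(rng):
    """Draw T from a uniform distribution over the family (assumed weights)."""
    return TRANSFORMS[rng.integers(len(TRANSFORMS))]

rng = np.random.default_rng(0)
x = rng.random((32, 32))
batch = [sample_transform(rng)(x) for _ in range(8)]   # T(x) fed to the loss
print({b.shape for b in batch})
```

Every transform returns an image at the original resolution, so the network input shape is unchanged while the high-frequency content varies from sample to sample.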
| Perturbation | Accuracy | Δ from base model (same perturbation) |
|---|---|---|
| No Perturbation | 89.42% | −2.53% |
| Resize x0.5 | 85.37% | +25.54% |
| Resize x0.25 | 71.80% | +46.98% |
| Decimate x2 | 85.02% | +54.99% |
| Checkerboard ε = 0.03 | 89.43% | +13.96% |
| Checkerboard ε = 0.05 | 89.39% | +44.40% |
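The last column can be recomputed from the accuracy columns of the two tables; the differences are percentage points. The dictionary keys below are shorthand labels introduced here, while the numbers are copied from the tables.

```python
base = {"clean": 91.95, "resize_0.5": 59.83, "resize_0.25": 24.82,
        "decimate_2": 30.03, "checker_0.03": 75.47, "checker_0.05": 44.99}
robust = {"clean": 89.42, "resize_0.5": 85.37, "resize_0.25": 71.80,
          "decimate_2": 85.02, "checker_0.03": 89.43, "checker_0.05": 89.39}

# Gain of the robust model over the base model, in percentage points.
gain = {k: round(robust[k] - base[k], 2) for k in base}

# Worst-case accuracy across all listed conditions.
worst_base = min(base.values())
worst_robust = min(robust.values())
print(gain, worst_base, worst_robust)
```

Reading the two minima side by side, the worst listed condition moves from 24.82% to 71.80%, at a cost of 2.53 points of clean accuracy.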
- For augmentation functions, see augmentations.py
- For the adversarial training loop, see train_robust.py
Interpretation
The baseline model relies on high-frequency components that are not stable under downsampling or aliasing. These components lie in regions of the input space that are not preserved by common sampling operators.
As a result, small perturbations aligned with these unstable directions cause large changes in the model’s activations.
The robust variant shifts reliance toward input components in the row space of the downsampling operator (the orthogonal complement of ker(D)), which is the subspace that survives projection.
These features encode structural information at scales that are preserved under frequency loss, rather than fine-grained detail that is annihilated by the null space.
The result is a model whose decision boundary is anchored to geometry that persists through information destruction.
Usage
NULA is hosted on the Hugging Face Hub and can be loaded directly via the transformers library.
    from transformers import pipeline

    classifier = pipeline(
        "image-classification",
        model="MamaPearl/nula-cifar10-robust-v0",
        trust_remote_code=True,
    )

    # Run on any image URL or local path
    results = classifier("https://path-to-your-image.jpg")
    for res in results:
        print(f"{res['label']}: {res['score']:.2%}")
For manual loading, see infer.py
Input tensors should have shape (B, C, H, W).
Preprocessing Requirements:
- Color Mode: RGB (images should be converted via .convert("RGB"))
- Input Size: 32x32 (standard for CIFAR-10)
- Normalization: Mean [0.5, 0.5, 0.5] and std [0.5, 0.5, 0.5] (scales pixels from [0, 1] to [-1, 1])
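These requirements can be sketched in NumPy as follows. The function `preprocess` is illustrative and is not the code in infer.py; it assumes the image has already been converted to RGB and resized to 32x32, and it covers only the scaling, normalization, and layout steps.

```python
import numpy as np

def preprocess(img_uint8):
    """Turn one 32x32 RGB uint8 image (H, W, C) into a (1, C, H, W) float batch."""
    assert img_uint8.shape == (32, 32, 3), "expects a 32x32 RGB image"
    x = img_uint8.astype(np.float32) / 255.0        # [0, 255] -> [0, 1]
    x = (x - 0.5) / 0.5                             # mean 0.5, std 0.5 -> [-1, 1]
    x = np.transpose(x, (2, 0, 1))                  # HWC -> CHW
    return x[None]                                  # add batch dimension

img = np.zeros((32, 32, 3), dtype=np.uint8)
img[..., 0] = 255                                   # pure red test image
batch = preprocess(img)
print(batch.shape, batch.min(), batch.max())
```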
NULA Architecture
| Component | Details |
|---|---|
| Stem | 3 → 128, Conv3×3, BatchNorm, SiLU |
| Stage 1 | 128 → 128, residual, no downsample |
| Stage 2 | 128 → 256, residual, BlurPool downsample |
| Stage 3 | 256 → 512, residual, BlurPool downsample |
| Head | GlobalAvgPool → Linear(512, 512) → SiLU → Dropout(0.3) → Linear(512, 10) |
SE blocks are applied at each stage with reduction factor 16.
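As a rough consistency check on the table, the sketch below traces feature-map shapes for a 32x32 input and counts the head parameters. It assumes each BlurPool downsample uses stride 2 and that both linear layers carry biases; neither detail is stated in the table.

```python
def trace_shapes(size=32):
    """Follow (channels, height, width) through the stages in the table."""
    stages = [("stem", 128, 1), ("stage1", 128, 1),
              ("stage2", 256, 2), ("stage3", 512, 2)]   # (name, out_ch, stride)
    shapes = []
    for name, channels, stride in stages:
        size //= stride                                 # assumed stride-2 BlurPool
        shapes.append((name, channels, size, size))
    return shapes

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

for row in trace_shapes():
    print(row)
print(linear_params(512, 512) + linear_params(512, 10))
```

Under these assumptions the final stage produces a 512-channel 8x8 map, which global average pooling reduces to the 512-dimensional vector expected by the head.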
Citation
If you use this model or repository in your research, please cite:
@misc{mamapearl_nula_2026,
  author = {MamaPearl},
  title = {NULA: Robust CIFAR-10 Classification via Anti-Aliased Downsampling and Adversarial Augmentation},
  month = apr,
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MamaPearl/nula-cifar10-robust-v0}},
}
Authors
MamaPearl · @mamapearli
License
This project is licensed under the MIT License. See LICENSE for more information.