U²-Net — LiteRT (TFLite) GPU, FP16

On-device LiteRT (.tflite) conversion of U²-Net for salient-object segmentation and background removal. U²-Net is a nested U-structure (a "U-Net of U-Nets", purely convolutional) that predicts a single-channel saliency mask. Using that mask as the alpha channel cuts the subject out of its background and leaves everything else transparent.

The model runs entirely on the LiteRT CompiledModel GPU accelerator (ML Drift): every op is GPU-native, with no CPU fallback and no Flex ops. Because the network is a pure CNN, it converts with litert-torch without any custom rewrites.

Files

File	Size	Description
`u2net_fp16.tflite`	88 MB	float16 weights, GPU-compatible

I/O

Input: [1, 3, 320, 320] float32, NCHW, RGB. Preprocessing: resize to 320×320, divide all pixel values by the image's maximum value, then apply ImageNet normalization per channel (mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]).
Output: [1, 1, 320, 320] float32 saliency mask in [0, 1] (sigmoid). Resize the mask back to the original image size and use it as the foreground alpha.
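The preprocessing described above can be sketched in plain Kotlin as follows. This is a minimal illustration, not the sample app's code: the function name `argbToNchw` is hypothetical, the pixels are assumed to be packed ARGB integers (the layout `Bitmap.getPixels` returns) that have already been resized, and the fallback for an all-black image is an assumption.

```kotlin
// ImageNet statistics from the I/O section, in R, G, B order.
val MEAN = floatArrayOf(0.485f, 0.456f, 0.406f)
val STD = floatArrayOf(0.229f, 0.224f, 0.225f)

/**
 * Converts packed ARGB pixels (row-major, already resized to width x height)
 * into a planar NCHW float array of size 3 * height * width.
 * Hypothetical helper: the name and signature are illustrative only.
 */
fun argbToNchw(pixels: IntArray, width: Int, height: Int): FloatArray {
    require(pixels.size == width * height) { "pixel count must equal width * height" }
    val plane = width * height

    // Per-image maximum over all three colour channels.
    var maxValue = 0
    for (p in pixels) {
        val r = (p shr 16) and 0xFF
        val g = (p shr 8) and 0xFF
        val b = p and 0xFF
        maxValue = maxOf(maxValue, maxOf(r, g, b))
    }
    // Assumption: for an all-black image, divide by 1 instead of 0.
    val scale = if (maxValue == 0) 1f else maxValue.toFloat()

    val out = FloatArray(3 * plane)
    for (i in 0 until plane) {
        val p = pixels[i]
        val r = ((p shr 16) and 0xFF) / scale
        val g = ((p shr 8) and 0xFF) / scale
        val b = (p and 0xFF) / scale
        // Planar layout: all R values, then all G values, then all B values.
        out[i] = (r - MEAN[0]) / STD[0]
        out[plane + i] = (g - MEAN[1]) / STD[1]
        out[2 * plane + i] = (b - MEAN[2]) / STD[2]
    }
    return out
}
```

For this model, `width` and `height` would both be 320, and the returned array has the 1 × 3 × 320 × 320 element count the input tensor expects.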

Usage (Android, LiteRT CompiledModel)

val model = CompiledModel.create(
    context.assets, "u2net_fp16.tflite",
    CompiledModel.Options(Accelerator.GPU), null
)
val inputs = model.createInputBuffers()
val outputs = model.createOutputBuffers()
inputs[0].writeFloat(nchwFloatArray)   // [1,3,320,320]
model.run(inputs, outputs)
val mask = outputs[0].readFloat()      // [1,1,320,320] in [0,1]
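
Once the mask has been read back, turning it into a cut-out might look like the sketch below. It assumes the mask has already been resized to the same pixel count as the image (the resizing step is omitted here) and that the pixels are non-premultiplied packed ARGB. `applyMaskAsAlpha` is a hypothetical helper name, not part of LiteRT.

```kotlin
/**
 * Writes a [0, 1] saliency mask into the alpha channel of packed ARGB pixels.
 * Assumes mask.size == pixels.size, i.e. the mask was already resized
 * to the image resolution. Returns a new array; inputs are not modified.
 */
fun applyMaskAsAlpha(pixels: IntArray, mask: FloatArray): IntArray {
    require(pixels.size == mask.size) { "mask and image must have the same pixel count" }
    return IntArray(pixels.size) { i ->
        // Clamp defensively, then map [0, 1] to [0, 255] with rounding.
        val alpha = (mask[i].coerceIn(0f, 1f) * 255f + 0.5f).toInt()
        // Keep the original RGB bits and replace only the alpha byte.
        (alpha shl 24) or (pixels[i] and 0x00FFFFFF)
    }
}
```

Keeping the soft mask values, rather than thresholding them, preserves smoother edges around hair and other fine detail. A hard cut-out would threshold the mask before this step.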

A complete Android sample (live camera + gallery background removal) is available in google-ai-edge/litert-samples.

Performance

~147 ms per frame (about 6.8 frames per second) on a Pixel 8a (Tensor G3, Mali GPU).
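
The card does not say how this latency was measured. One common way to get a comparable number is to average several timed runs after a few warm-up runs, as in the sketch below. `averageMillis` is a hypothetical helper, and the warm-up and run counts in the usage note are arbitrary choices.

```kotlin
/**
 * Returns the mean wall-clock time of [block] in milliseconds,
 * measured over [runs] executions after [warmup] untimed executions.
 */
fun averageMillis(warmup: Int, runs: Int, block: () -> Unit): Double {
    require(runs > 0) { "runs must be positive" }
    // Warm-up runs absorb one-time costs such as shader compilation.
    repeat(warmup) { block() }
    val start = System.nanoTime()
    repeat(runs) { block() }
    val elapsedNanos = System.nanoTime() - start
    return elapsedNanos / 1e6 / runs
}
```

With the usage snippet above, a call such as `averageMillis(5, 50) { model.run(inputs, outputs) }` would time inference only, excluding preprocessing and mask compositing.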

Conversion notes

Converted with litert-torch (full U2NET, 44M params) and float16-quantized with ai-edge-quantizer. Verified: all ops are GPU-native, and the output correlation with the PyTorch reference is 1.0 for the FP32 conversion and ~0.9999 for the FP16 build.
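
The card does not specify which correlation measure was used. Assuming it is the Pearson correlation between the flattened output masks, a check of this kind could be sketched as follows (`pearson` is a hypothetical helper name):

```kotlin
import kotlin.math.sqrt

/**
 * Pearson correlation coefficient between two equally sized float arrays,
 * accumulated in double precision. Returns NaN if either array is constant.
 */
fun pearson(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size && a.isNotEmpty()) { "arrays must be non-empty and equal in size" }
    val meanA = a.sumOf { it.toDouble() } / a.size
    val meanB = b.sumOf { it.toDouble() } / b.size
    var cov = 0.0
    var varA = 0.0
    var varB = 0.0
    for (i in a.indices) {
        val da = a[i] - meanA
        val db = b[i] - meanB
        cov += da * db
        varA += da * da
        varB += db * db
    }
    return cov / sqrt(varA * varB)
}
```

Here `a` would hold the reference mask exported from PyTorch and `b` the mask produced by the converted model for the same input. Because correlation is insensitive to a constant offset or scale, pairing it with a maximum absolute difference gives a stricter check.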

License & attribution

This is a format conversion of the official U²-Net weights (no architectural changes); all credit to the original authors.
