GasTwinFormer: A Hybrid Vision Transformer for Livestock Methane Emission Segmentation and Dietary Classification in Optical Gas Imaging

  • Toqi Tahamid Sarker¹*
  • Mohamed G Embaby²*
  • Taminul Islam¹
  • Amer AbuGhazaleh²
  • Khaled R Ahmed¹
¹School of Computing, ²School of Agricultural Sciences; Southern Illinois University, Carbondale, USA
* Equal contribution
ICCV 2025

Method Summary

  • Mix Twin Encoder: Alternates spatially-reduced global attention with locally-grouped self-attention to balance global context and local detail.
  • LR-ASPP Decoder: Lightweight multi-scale aggregation using encoder features (F1+F2+F3) for precise plume boundaries at real-time speed.
  • Multi-task Head: Joint methane segmentation and dietary classification in one network.
  • Optimized Design: Best attention pattern EL-EL-EL-EL, LSA window 5×5, Gaussian Plume loss for segmentation.
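The trade-off between the two attention types can be made concrete by counting query-key pairs per attention layer. The sketch below is only an illustration: the 120×160 feature map (a 640×480 frame at an assumed stride of 4) and the reduction ratio of 8 are hypothetical values, not taken from the paper. Only the 5×5 window comes from the summary above.

```python
def full_attention_pairs(h, w):
    """Query-key pairs when every token attends to every other token."""
    n = h * w
    return n * n


def spatially_reduced_pairs(h, w, reduction):
    """Global attention where keys/values are downsampled by `reduction` per axis."""
    n_queries = h * w
    n_keys = (h // reduction) * (w // reduction)
    return n_queries * n_keys


def local_window_pairs(h, w, window):
    """Locally-grouped attention: each token attends only within its window."""
    if h % window or w % window:
        raise ValueError("feature map must be divisible by the window size")
    return h * w * window * window


# Assumed feature-map size and reduction ratio (illustrative only).
H, W = 120, 160
print("full global      :", full_attention_pairs(H, W))
print("spatially reduced:", spatially_reduced_pairs(H, W, reduction=8))
print("local 5x5 windows:", local_window_pairs(H, W, window=5))
```

Under these assumed sizes, both variants cost far less than full global attention, which is why combining them can retain global context and local detail at a small compute budget.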

Parameters: 3.348M   FLOPs: 3.428G   Speed: 114.9 FPS

[Figure: GasTwinFormer architecture overview.]

Abstract

Livestock methane emissions represent 32% of human-caused methane production, making automated monitoring critical for climate mitigation strategies. We introduce GasTwinFormer, a hybrid vision transformer for real-time methane emission segmentation and dietary classification in optical gas imaging through a novel Mix Twin encoder alternating between spatially-reduced global attention and locally-grouped attention mechanisms. Our architecture incorporates a lightweight LR-ASPP decoder for multi-scale feature aggregation and enables simultaneous methane segmentation and dietary classification in a unified framework. We contribute the first comprehensive beef cattle methane emission dataset using OGI, containing 11,694 annotated frames across three dietary treatments. GasTwinFormer achieves 74.47% mIoU and 83.63% mF1 for segmentation while maintaining exceptional efficiency with only 3.348M parameters, 3.428G FLOPs, and 114.9 FPS inference speed. Additionally, our method achieves perfect dietary classification accuracy (100%), demonstrating the effectiveness of leveraging diet-emission correlations. Extensive ablation studies validate each architectural component, establishing GasTwinFormer as a practical solution for real-time livestock emission monitoring.

Results

Segmentation

mIoU: 74.47%   mF1: 83.63%

Efficiency

Params: 3.348M   FLOPs: 3.428G   FPS: 114.9
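To relate the reported throughput to a real-time budget, the sketch below converts FPS to per-frame latency and compares it with the 30 FPS frame rate of the dataset videos. The `measure_fps` helper and its `step` argument are hypothetical; they illustrate one common way to time a model and are not the authors' benchmarking procedure, whose hardware and settings are not given on this page.

```python
import time


def latency_ms(fps):
    """Average per-frame latency implied by a throughput in frames per second."""
    return 1000.0 / fps


def realtime_headroom(model_fps, camera_fps):
    """How many times faster than the camera frame rate the model runs."""
    return model_fps / camera_fps


def measure_fps(step, warmup=5, iters=50):
    """Time a zero-argument callable `step` (stand-in for one forward pass)."""
    for _ in range(warmup):          # warm-up runs are excluded from timing
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return iters / elapsed


print(f"{latency_ms(114.9):.2f} ms per frame")
print(f"{realtime_headroom(114.9, 30):.2f}x the 30 FPS camera rate")
```

At 114.9 FPS the model needs about 8.7 ms per frame, well inside the roughly 33 ms available between frames at 30 FPS.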

Diet Classification

Accuracy: 100%

Qualitative Visualizations

Methane plume segmentation overlays for three diets: MD (mixed diet), HG (high grain), and HF (high forage). Each diet is shown for Ground Truth (GT), BiSeNetV2, UperNet, GasFormer, SegFormer, Twins PCPVT-S, and GasTwinFormer.

[Figure: three groups of panels (MD, HG, HF), one panel per method listed above.]

Dataset

Beef cattle methane emission dataset captured with a TELEDYNE FLIR Gx320 OGI camera in black-hot thermal mode, annotated for methane plume segmentation and labeled by dietary treatment.

Overview

Total frames             208,149
Annotated plume frames   11,694 (5.6%)
Total videos             19
Animals                  12 beef cattle
Frame resolution         640×480 (PNG)
Frame rate               30 FPS
Spectral range           3.2–3.4 μm
Thermal sensitivity      < 15 mK

Distribution by Dietary Treatment

Diet               Images          Videos   Train / Val / Test
High Forage (HF)   2,730 (23.4%)   10       1,906 / 404 / 420
Mixed Diet (MD)    4,658 (39.8%)   5        3,258 / 696 / 704
High Grain (HG)    4,306 (36.8%)   4        3,013 / 644 / 649
Total              11,694          19       8,177 / 1,744 / 1,773

  • Splitting: Temporal per-video split (70%/15%/15%) to evaluate on future time points.
  • Annotations: Multi-stage pipeline combining classical processing, GasFormer refinement, and manual selection of best overlay masks.
  • Format: 8-bit grayscale PNGs (0–255) with methane plume overlays for visualization.
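
The temporal per-video split can be sketched as follows: each video's annotated frames are kept in time order, and the earliest 70% go to training, the next 15% to validation, and the last 15% to testing. This is a minimal illustration, not the authors' script; the rounding rule, the video IDs, and the frame indices are all assumptions.

```python
def temporal_split(frames, train_pct=70, val_pct=15):
    """Split one video's time-ordered frames without shuffling.

    Integer arithmetic avoids float rounding; the remainder goes to test.
    """
    n = len(frames)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = frames[:n_train]
    val = frames[n_train:n_train + n_val]
    test = frames[n_train + n_val:]
    return train, val, test


def split_dataset(videos):
    """Apply the temporal split to every video and pool the results."""
    splits = {"train": [], "val": [], "test": []}
    for video_id, frames in videos.items():
        train, val, test = temporal_split(frames)
        splits["train"] += [(video_id, f) for f in train]
        splits["val"] += [(video_id, f) for f in val]
        splits["test"] += [(video_id, f) for f in test]
    return splits


# Hypothetical videos: frame indices already sorted by capture time.
videos = {"vid_a": list(range(100)), "vid_b": list(range(40))}
splits = split_dataset(videos)
print({name: len(items) for name, items in splits.items()})
```

Because the split is made per video rather than over pooled frames, validation and test frames always come later in time than the training frames from the same recording.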