GasTwinFormer: A Hybrid Vision Transformer for Livestock Methane Emission Segmentation and Dietary Classification in Optical Gas Imaging

Toqi Tahamid Sarker¹, Mohamed Embaby², Taminul Islam¹, Amer AbuGhazaleh¹, Khaled R Ahmed¹

¹Southern Illinois University Carbondale, USA     ²University of California, Davis, USA
{toqitahamid.sarker, taminul.islam, aabugha, khaled.ahmed}@siu.edu, memaby@ucdavis.edu
ICCV 2025

Method Summary

  • Mix Twin Encoder: Alternates spatially-reduced global attention with locally-grouped self-attention to balance global context and local detail.
  • LR-ASPP Decoder: Lightweight multi-scale aggregation using encoder features (F1+F2+F3) for precise plume boundaries at real-time speed.
  • Multi-task Head: Joint methane segmentation and dietary classification in one network.
  • Optimized Design: The best attention pattern is EL-EL-EL-EL (E = EMA block, L = LSA block), with a 5×5 LSA window and Gaussian Plume loss for segmentation.
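
To make the trade-off between the two attention types concrete, the sketch below counts query-key pairs for one attention layer. It is an illustrative calculation, not the authors' code: the stride-4 token grid (120×160 for a 640×480 frame) and the reduction ratio of 8 are assumptions, and `attention_pairs` is a hypothetical helper. Only the 5×5 window comes from the summary above.

```python
def attention_pairs(height, width, sr_ratio=None, window=None):
    """Count query-key pairs for one attention layer on a height x width token grid."""
    tokens = height * width
    if sr_ratio is not None:
        # Spatially-reduced global attention: keys/values are downsampled
        # by sr_ratio along each axis, queries stay at full resolution.
        keys = (height // sr_ratio) * (width // sr_ratio)
        return tokens * keys
    if window is not None:
        # Locally-grouped attention: each token attends only to the
        # window x window tokens in its own group.
        return tokens * window * window
    # Plain global attention: every token attends to every token.
    return tokens * tokens


# Assumed stage-1 grid: 640x480 frame at stride 4 (assumption, not from the page).
H, W = 120, 160
full = attention_pairs(H, W)
reduced = attention_pairs(H, W, sr_ratio=8)   # sr_ratio=8 is an assumption
local = attention_pairs(H, W, window=5)       # 5x5 window from the summary

print(full, reduced, local)
print(full // reduced, full // local)
```

Under these assumptions, both variants cut the pair count by one to three orders of magnitude relative to plain global attention. That is consistent with alternating them to keep global context affordable while preserving local detail.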

Parameters: 3.348M   FLOPs: 3.428G   Speed: 114.9 FPS

Architecture

[Figure: GasTwinFormer architecture]

GasTwinFormer Architecture: Mix Twin Encoder alternates between Efficient Multi-head Attention (EMA) blocks with spatially-reduced global attention and Locally-grouped Self-Attention (LSA) blocks. Features from F1, F2, and F3 are aggregated by the LR-ASPP decoder for methane segmentation, while F4 feeds the dietary classification head.
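
The routing described in the caption can be sketched as a small shape calculation. This is a minimal illustration that assumes the usual hierarchical strides of 4, 8, 16, and 32 for F1 to F4. The page does not state the strides, and `pyramid_shapes` is a hypothetical helper.

```python
def pyramid_shapes(height, width, strides=(4, 8, 16, 32)):
    """Return the (rows, cols) of each encoder feature map F1..F4.

    The strides are an assumption (typical for hierarchical vision
    transformers); they are not given on this page.
    """
    return {f"F{i}": (height // s, width // s)
            for i, s in enumerate(strides, start=1)}


shapes = pyramid_shapes(480, 640)  # 640x480 frames, as in the dataset

# Per the caption: F1, F2, F3 go to the LR-ASPP decoder (segmentation),
# and F4 goes to the dietary classification head.
seg_inputs = [shapes[name] for name in ("F1", "F2", "F3")]
cls_input = shapes["F4"]

print(seg_inputs)
print(cls_input)
```

Under the assumed strides, the decoder would see three progressively coarser maps while the classifier works on the coarsest, most semantic one.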

Abstract

Livestock methane emissions represent 32% of human-caused methane production, making automated monitoring critical for climate mitigation strategies. We introduce GasTwinFormer, a hybrid vision transformer for real-time methane emission segmentation and dietary classification in optical gas imaging through a novel Mix Twin encoder alternating between spatially-reduced global attention and locally-grouped attention mechanisms. Our architecture incorporates a lightweight LR-ASPP decoder for multi-scale feature aggregation and enables simultaneous methane segmentation and dietary classification in a unified framework. We contribute the first comprehensive beef cattle methane emission dataset using OGI, containing 11,694 annotated frames across three dietary treatments. GasTwinFormer achieves 74.47% mIoU and 83.63% mF1 for segmentation while maintaining exceptional efficiency with only 3.348M parameters, 3.428G FLOPs, and 114.9 FPS inference speed. Additionally, our method achieves perfect dietary classification accuracy (100%), demonstrating the effectiveness of leveraging diet-emission correlations. Extensive ablation studies validate each architectural component, establishing GasTwinFormer as a practical solution for real-time livestock emission monitoring.

Results

[Figure: Accuracy vs Efficiency comparison]

Accuracy vs Efficiency Trade-off: GasTwinFormer (blue) achieves the highest mIoU (74.47%) with minimal parameters (3.348M) and FLOPs (3.428G), outperforming both CNN-based methods (orange) and other vision transformers in the efficiency-accuracy space.

Segmentation

mIoU: 74.47%   mF1: 83.63%
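
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them for a binary plume mask. It assumes both scores are averaged over two classes (background and plume) with pixel counts pooled across the inputs. The page does not spell out the exact evaluation protocol, and `binary_scores` is a hypothetical helper.

```python
def binary_scores(pred, target):
    """Return (mIoU, mF1) averaged over background (0) and plume (1).

    pred and target are equally sized 2-D lists of 0/1 labels.
    """
    ious, f1s = [], []
    for cls in (0, 1):
        tp = fp = fn = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                if p == cls and t == cls:
                    tp += 1      # predicted cls, truly cls
                elif p == cls:
                    fp += 1      # predicted cls, truly the other class
                elif t == cls:
                    fn += 1      # missed a pixel of cls
        union = tp + fp + fn
        ious.append(tp / union if union else 1.0)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 1.0)
    return sum(ious) / 2, sum(f1s) / 2


# Toy 2x3 example, not real data.
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 0],
          [0, 1, 1]]
miou, mf1 = binary_scores(pred, target)
print(miou, mf1)
```

Because F1 equals 2·IoU / (1 + IoU) for each class, mF1 is always at least as large as mIoU. This matches the ordering of the two reported numbers.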

Efficiency

Params: 3.348M   FLOPs: 3.428G   FPS: 114.9
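
As a quick sanity check on the real-time claim, the reported throughput can be converted into per-frame latency and compared with the 30 FPS capture rate listed in the Dataset section. The calculation is illustrative only; the page does not state the hardware on which the 114.9 FPS figure was measured.

```python
fps = 114.9          # reported inference speed
camera_fps = 30.0    # capture rate listed in the Dataset section

latency_ms = 1000.0 / fps               # time spent per frame
frame_budget_ms = 1000.0 / camera_fps   # time available per captured frame
headroom = fps / camera_fps             # how many times faster than capture

print(f"latency per frame: {latency_ms:.2f} ms")
print(f"budget per frame:  {frame_budget_ms:.2f} ms")
print(f"headroom:          {headroom:.2f}x")
```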

Diet Classification

Accuracy: 100%

Qualitative Comparisons

[Figure: Qualitative segmentation results]

Visual comparison of methane plume segmentation across three dietary treatments and seven methods. Rows show the temporal progression of methane emission events. GasTwinFormer produces cleaner boundaries and handles plume dispersion better than the baseline methods.

Video Demonstrations

Real-time methane plume segmentation results from GasTwinFormer across different dietary treatments. Yellow overlays indicate detected methane emissions from beef cattle.
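
A yellow overlay of this kind can be produced by alpha-blending a color onto the grayscale frame wherever the predicted mask is set. The sketch below is an assumption about how such a visualization might be rendered, not the authors' rendering code. The blend weight, the exact RGB value for yellow, and the `overlay_plume` helper are all hypothetical.

```python
def overlay_plume(gray, mask, alpha=0.4, color=(255, 255, 0)):
    """Blend `color` onto an 8-bit grayscale frame where mask is nonzero.

    gray and mask are equally sized 2-D lists; the result is a 2-D list
    of (R, G, B) tuples. alpha and color are illustrative defaults.
    """
    out = []
    for g_row, m_row in zip(gray, mask):
        row = []
        for g, m in zip(g_row, m_row):
            if m:
                # Weighted mix of the original intensity and the overlay color.
                row.append(tuple(round((1 - alpha) * g + alpha * c) for c in color))
            else:
                # Unmasked pixels keep their gray value in all three channels.
                row.append((g, g, g))
        out.append(row)
    return out


# Toy 1x2 frame: left pixel is background, right pixel is plume.
blended = overlay_plume([[100, 200]], [[0, 1]])
print(blended)
```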

High Forage (HF)

10 videos • High fiber diet with predominantly forage-based nutrition

FLIR0380

FLIR0378

FLIR0427

FLIR0384

FLIR0368

FLIR0371

Mixed Diet (MD)

5 videos • Balanced diet with mixed forage and concentrate

FLIR0388

FLIR0386

FLIR0393

FLIR0387

FLIR0434

High Grain (HG)

4 videos • High energy diet with predominantly grain-based concentrate

FLIR0394

FLIR0396

FLIR0441

FLIR0397

Dataset

Beef cattle methane emission dataset captured with a TELEDYNE FLIR Gx320 OGI camera in black-hot thermal mode, annotated for methane plume segmentation and labeled by dietary treatment.

Overview

Total frames: 208,149
Annotated plume frames: 11,694 (5.6%)
Total videos: 19
Animals: 12 beef cattle
Frame resolution: 640×480 (PNG)
Frame rate: 30 FPS
Spectral range: 3.2–3.4 μm
Thermal sensitivity: < 15 mK

Distribution by Dietary Treatment

Diet               Images          Videos   Train / Val / Test
High Forage (HF)   2,730 (23.4%)   10       1,906 / 404 / 420
Mixed Diet (MD)    4,658 (39.8%)   5        3,258 / 696 / 704
High Grain (HG)    4,306 (36.8%)   4        3,013 / 644 / 649
Total              11,694          19       8,177 / 1,744 / 1,773

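
The counts in the table can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from this page; the variable names are illustrative.

```python
# Train / val / test frame counts per diet, copied from the table.
splits = {
    "HF": (1906, 404, 420),
    "MD": (3258, 696, 704),
    "HG": (3013, 644, 649),
}
total_frames = 208149  # all captured frames, annotated or not

per_diet = {diet: sum(counts) for diet, counts in splits.items()}
annotated = sum(per_diet.values())

# Column-wise sums give the overall train / val / test sizes.
split_totals = tuple(sum(col) for col in zip(*splits.values()))
split_percent = tuple(round(100 * n / annotated, 1) for n in split_totals)
annotated_percent = round(100 * annotated / total_frames, 1)

print(per_diet)
print(annotated, split_totals, split_percent, annotated_percent)
```

The per-diet sums and the overall totals reproduce the table, and the realized split lands close to the nominal 70%/15%/15%.
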
  • Splitting: Temporal per-video split (70%/15%/15%) to evaluate on future time points.
  • Annotations: Multi-stage pipeline combining classical processing, GasFormer refinement, and manual selection of best overlay masks.
  • Format: 8-bit grayscale PNGs (0–255) with methane plume overlays for visualization.
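
A temporal per-video split like the one described above can be sketched as follows. This is a minimal illustration under stated assumptions: each video's frames are ordered by index, the first 70% go to training, the next 15% to validation, and the remainder to testing. The rounding rule and the `temporal_split` helper are hypothetical, since the page gives only the nominal percentages.

```python
def temporal_split(frame_ids, train_pct=70, val_pct=15):
    """Split one video's frames chronologically into train / val / test.

    Earlier frames go to training and later frames to validation and
    test, so evaluation always happens on future time points.
    """
    ordered = sorted(frame_ids)
    n = len(ordered)
    n_train = n * train_pct // 100   # integer arithmetic avoids float rounding
    n_val = n * val_pct // 100
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]  # whatever remains (about 15%)
    return train, val, test


# Toy video with 20 annotated frames, indexed 0..19.
train, val, test = temporal_split(range(20))
print(len(train), val, test)
```

Applying the split per video, rather than shuffling frames across the dataset, keeps near-duplicate neighboring frames from landing on both sides of the train/test boundary.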

BibTeX

@article{sarker2025gastwinformer,
  title={GasTwinFormer: A Hybrid Vision Transformer for Livestock Methane Emission Segmentation and Dietary Classification in Optical Gas Imaging},
  author={Sarker, Toqi Tahamid and Embaby, Mohamed and Islam, Taminul and AbuGhazaleh, Amer and Ahmed, Khaled R},
  journal={arXiv preprint arXiv:2508.15057},
  year={2025}
}