GasTwinFormer: Livestock Methane Emission Segmentation and Diet Classification

Method Summary

Mix Twin Encoder: Alternates spatially-reduced global attention with locally-grouped self-attention (LSA) to balance global context and local detail.
LR-ASPP Decoder: Lightweight multi-scale aggregation using encoder features (F1+F2+F3) for precise plume boundaries at real-time speed.
Multi-task Head: Joint methane segmentation and dietary classification in one network.
Optimized Design: The best-performing configuration uses the attention pattern EL-EL-EL-EL, a 5×5 LSA window, and a Gaussian Plume loss for segmentation.
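To make the trade-off between the two attention types concrete, the sketch below counts query-key pairs per layer for full global attention, spatially-reduced global attention, and locally-grouped attention. Only the 5×5 window comes from the summary above. The 120×160 token grid and the reduction ratio of 8 are illustrative assumptions, and the function name is hypothetical. It is not the authors' implementation.

```python
def attention_pairs(height, width, window=5, reduction=8):
    """Count query-key pairs for one attention head on a height x width token grid.

    `window` is the side of a local attention window (5 in the summary above).
    `reduction` is an assumed per-axis pooling factor for keys and values.
    """
    if height % window or width % window:
        raise ValueError("grid must be divisible by the window size (or padded first)")
    if height % reduction or width % reduction:
        raise ValueError("grid must be divisible by the reduction ratio")

    tokens = height * width
    # Full global attention: every token attends to every token.
    full_global = tokens * tokens
    # Spatially-reduced global attention: every token attends to a pooled key grid.
    reduced_keys = (height // reduction) * (width // reduction)
    reduced_global = tokens * reduced_keys
    # Locally-grouped attention: every token attends only inside its own window.
    local = tokens * window * window
    return {
        "full_global": full_global,
        "reduced_global": reduced_global,
        "local": local,
    }


# Assumed example grid: 120 x 160 tokens (not a figure taken from the paper).
costs = attention_pairs(120, 160, window=5, reduction=8)
for name, pairs in costs.items():
    print(f"{name}: {pairs:,} query-key pairs")
```

Under these assumptions the local variant is the cheapest by a wide margin, while the reduced global variant still lets every token see the whole frame. This is the usual motivation for alternating the two.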

Parameters: 3.348M
FLOPs: 3.428G
Speed: 114.9 FPS

GasTwinFormer architecture overview.

Abstract

Livestock methane emissions represent 32% of human-caused methane production, making automated monitoring critical for climate mitigation strategies. We introduce GasTwinFormer, a hybrid vision transformer for real-time methane emission segmentation and dietary classification in optical gas imaging (OGI). Its novel Mix Twin encoder alternates between spatially-reduced global attention and locally-grouped attention mechanisms. The architecture incorporates a lightweight LR-ASPP decoder for multi-scale feature aggregation and performs methane segmentation and dietary classification simultaneously in a unified framework. We contribute the first comprehensive beef cattle methane emission dataset using OGI, containing 11,694 annotated frames across three dietary treatments. GasTwinFormer achieves 74.47% mIoU and 83.63% mF1 for segmentation while remaining highly efficient, with only 3.348M parameters, 3.428G FLOPs, and an inference speed of 114.9 FPS. Our method also achieves 100% dietary classification accuracy, demonstrating the effectiveness of leveraging diet-emission correlations. Extensive ablation studies validate each architectural component, establishing GasTwinFormer as a practical solution for real-time livestock emission monitoring.

Results

Segmentation

mIoU: 74.47%
mF1: 83.63%
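For readers unfamiliar with the two segmentation metrics, here is a minimal sketch of how mean IoU and mean F1 are commonly derived from a pixel-level confusion matrix. It assumes both metrics are averaged over classes (for example background and plume). The page does not state the exact evaluation protocol, and the toy matrix is invented purely for illustration.

```python
def segmentation_scores(confusion):
    """Return (mIoU, mF1) from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious, f1s = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # true c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp     # predicted c, true other
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    # Class-averaged scores (assumed averaging scheme).
    return sum(ious) / n, sum(f1s) / n


# Toy two-class example (background, plume); the counts are made up.
toy = [[900, 20],
       [30, 50]]
miou, mf1 = segmentation_scores(toy)
print(f"mIoU = {miou:.4f}, mF1 = {mf1:.4f}")
```

For any class, F1 is at least as large as IoU, which is why the reported mF1 is higher than the reported mIoU.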

Efficiency

Params: 3.348M
FLOPs: 3.428G
FPS: 114.9
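As a rough way to read the speed figure, the snippet below converts the reported 114.9 FPS into per-frame latency and compares it with the 30 FPS frame rate listed for the dataset's camera. Throughput depends on the hardware and batch settings used for measurement, which this page does not specify, so treat the result as a back-of-the-envelope estimate.

```python
model_fps = 114.9    # reported inference speed
camera_fps = 30.0    # frame rate listed in the dataset overview

latency_ms = 1000.0 / model_fps          # time spent per frame by the model
frame_interval_ms = 1000.0 / camera_fps  # time between consecutive camera frames
headroom = model_fps / camera_fps        # how many times faster than the camera

print(f"latency per frame: {latency_ms:.2f} ms")
print(f"camera frame interval: {frame_interval_ms:.2f} ms")
print(f"headroom over camera rate: {headroom:.2f}x")
```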

Diet Classification

Accuracy: 100%

Qualitative Visualizations

Methane plume segmentation overlays across diets, shown in three panels: MD (mixed diet), HG (high grain), and HF (high forage). Each panel compares the ground truth with predictions from BiSeNetV2, UPerNet, GasFormer, SegFormer, Twins PCPVT-S, and GasTwinFormer.

Dataset

Beef cattle methane emission dataset captured with a TELEDYNE FLIR Gx320 OGI camera in black-hot thermal mode, annotated for methane plume segmentation and labeled by dietary treatment.

Overview
Total frames	208,149
Annotated plume frames	11,694 (5.6%)
Total videos	19
Animals	12 beef cattle
Frame resolution	640×480 (PNG)
Frame rate	30 FPS
Spectral range	3.2–3.4 μm
Thermal sensitivity	< 15 mK

Distribution by Dietary Treatment
Diet	Images	Videos	Train / Val / Test
High Forage (HF)	2,730 (23.4%)	10	1,906 / 404 / 420
Mixed Diet (MD)	4,658 (39.8%)	5	3,258 / 696 / 704
High Grain (HG)	4,306 (36.8%)	4	3,013 / 644 / 649
Total	11,694	19	8,177 / 1,744 / 1,773
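The table can be sanity-checked with a few lines of arithmetic. The sketch below re-derives the totals and prints each diet's share and its actual train/val/test proportions from the counts listed above. The dictionary layout is just one convenient way to hold the numbers.

```python
# Train / val / test frame counts per dietary treatment, copied from the table.
splits = {
    "HF": (1906, 404, 420),
    "MD": (3258, 696, 704),
    "HG": (3013, 644, 649),
}

total = sum(sum(counts) for counts in splits.values())
# Column-wise totals across diets: (train, val, test).
column_totals = tuple(sum(col) for col in zip(*splits.values()))

for diet, (train, val, test) in splits.items():
    n = train + val + test
    print(
        f"{diet}: {n} frames ({n / total:.1%} of all annotated frames), "
        f"train/val/test = {train / n:.1%} / {val / n:.1%} / {test / n:.1%}"
    )
print(f"Total: {total} frames, train/val/test = {column_totals}")
```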

Splitting: Temporal per-video split (70%/15%/15% train/val/test), so that validation and test frames come from later time points than the training frames of the same video.
Annotations: Multi-stage pipeline combining classical processing, GasFormer refinement, and manual selection of best overlay masks.
Format: 8-bit grayscale PNGs (0–255) with methane plume overlays for visualization.
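A temporal per-video split of the kind described above can be sketched as follows. This is an illustration under assumptions, not the authors' script: the function name, the rounding of split boundaries, and the frame file names are all hypothetical.

```python
def temporal_split(frames, ratios=(0.70, 0.15, 0.15)):
    """Split one video's chronologically ordered frames into train/val/test.

    The earliest frames go to training and the latest to testing, so
    evaluation always happens on later time points than training.
    """
    n = len(frames)
    train_end = round(n * ratios[0])
    val_end = train_end + round(n * ratios[1])
    return frames[:train_end], frames[train_end:val_end], frames[val_end:]


# Hypothetical video with 100 annotated frames, already sorted by time.
frames = [f"frame_{i:04d}.png" for i in range(100)]
train, val, test = temporal_split(frames)
print(len(train), len(val), len(test))
```

Applying the function to each video separately, then pooling the results, keeps every video represented in all three subsets while avoiding random interleaving of near-identical neighbouring frames across subsets.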

GasTwinFormer: A Hybrid Vision Transformer for Livestock Methane Emission Segmentation and Dietary Classification in Optical Gas Imaging