Abstract:
Objectives: High-precision subsurface velocity inversion is a core objective in seismic exploration because it directly determines the resolution of structural imaging and geological interpretation. Traditional full waveform inversion (FWI) casts this task as a partial differential equation (PDE)-constrained optimization problem. Although accurate in theory, conventional FWI is computationally expensive, depends strongly on an accurate initial velocity model, and is prone to cycle-skipping, which often traps the inversion in local minima. Purely data-driven deep learning approaches have recently been introduced to bypass these limitations by learning a direct nonlinear mapping from seismic data to velocity fields. However, such networks lack physical constraints, are limited by the scarcity of high-quality, large-scale labeled field datasets, and generalize poorly to complex, unseen geological structures. To bridge rigorous geophysical principles and efficient deep learning, this study proposes a physics-informed self-adaptive convolutional inversion operator network framework (PISC_InvONet). The goal is a robust, computationally efficient hybrid inversion paradigm that fuses data-driven representation learning with physics-guided forward modeling constraints, thereby reducing the dependence on accurate initial models, alleviating the demand for massive labeled datasets, and improving the accuracy, adaptability, and generalization of seismic velocity inversion in complex geological environments.
Methods: The proposed PISC_InvONet framework is built on a deep operator network architecture, which replaces a mapping between finite-dimensional vectors with a mapping between function spaces. Its backbone is the self-adaptive convolutional inversion operator network (SC_InvONet), a dual-branch architecture consisting of a branch network that processes the seismic data and a trunk network that models the acquisition geometry. To handle the multiscale nature of spatiotemporal seismic records, the branch network takes discretized seismic waveforms as input and applies an adaptive dynamic residual convolutional module. Unlike a convolution with fixed kernels, the deformable convolution in this module dynamically adjusts its receptive field to follow the kinematics of wave propagation, which also suppresses random noise. In addition, a self-attention mechanism combined with Haar wavelet downsampling captures long-range spatial dependencies and extracts implicit structural features from the data. To address the loss of generalization caused by varying acquisition geometries, the trunk network explicitly encodes the spatial coordinates of the seismic sources. The branch and trunk features are then fused by a dot-product operation, so that the model can handle data acquired under different source configurations without retraining. Training follows a two-stage hybrid strategy. First, SC_InvONet is pre-trained on the available datasets with a combination of L1, L2, and multi-scale structural similarity (MS-SSIM) losses to establish a robust baseline mapping. The pre-trained weights are then transferred to initialize PISC_InvONet, which is fine-tuned under the guidance of a physics-based forward operator.
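The branch-trunk dot-product fusion described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: small random tanh MLPs stand in for the convolutional branch network and for the trunk network, and all names and shapes (`mlp`, `fuse`, `n_out`, `p`) are hypothetical. The sketch only shows how features derived from the seismic records and from the source coordinates are contracted over a shared latent axis.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Tiny tanh MLP; a stand-in for either sub-network (hypothetical)."""
    for w in weights[:-1]:
        x = np.tanh(x @ w)
    return x @ weights[-1]

def fuse(branch_feat, trunk_feat):
    """Dot-product fusion over the shared latent axis p.

    branch_feat: (n_shots, n_out, p) features from the seismic records
    trunk_feat:  (n_shots, p)        features from the source coordinates
    returns:     (n_shots, n_out)    fused features, one row per shot
    """
    return np.einsum("sop,sp->so", branch_feat, trunk_feat)

# Assumed toy sizes, chosen only for illustration.
n_shots, n_samples, n_out, p = 5, 64, 16, 8
records = rng.standard_normal((n_shots, n_samples))  # flattened shot gathers
src_xy = rng.uniform(0.0, 1.0, size=(n_shots, 2))    # normalized source coordinates

branch_w = [rng.standard_normal((n_samples, 32)) * 0.1,
            rng.standard_normal((32, n_out * p)) * 0.1]
trunk_w = [rng.standard_normal((2, 32)),
           rng.standard_normal((32, p))]

branch_feat = mlp(records, branch_w).reshape(n_shots, n_out, p)
trunk_feat = mlp(src_xy, trunk_w)
fused = fuse(branch_feat, trunk_feat)
```

Because the source coordinates enter only through `trunk_feat`, the same branch features yield different fused outputs when the sources move, which is the mechanism the text credits for handling new source configurations without retraining.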
The predicted velocity fields are forward-modeled with a 2D acoustic wave equation, solved by an 8th-order spatial finite-difference scheme with perfectly matched layer (PML) boundaries, to generate synthetic seismic data. The residual between these synthetic data and the observed data serves as a physics-guided loss that enforces geophysical consistency and iteratively updates the network parameters.
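The physics-guided loss can be sketched as follows, under several simplifying assumptions that are not taken from the paper: constant density, second-order time stepping, a single shot, and zero-padded reflecting edges in place of the PML boundaries. The function names, grid sizes, and wavelet parameters are hypothetical. The sketch only evaluates the data residual; in the actual framework the loss would also have to be differentiated with respect to the network parameters, for example with an automatic-differentiation library.

```python
import numpy as np

# Standard 8th-order central-difference weights for a second derivative.
C = (-205 / 72, 8 / 5, -1 / 5, 8 / 315, -1 / 560)

def laplacian8(p, dx):
    """8th-order 2D Laplacian; zero padding gives reflecting edges (no PML)."""
    q = np.pad(p, 4)
    n0, n1 = p.shape
    lap = 2.0 * C[0] * p  # centre weight counted once per axis
    for k in range(1, 5):
        lap += C[k] * (q[4 + k:4 + k + n0, 4:4 + n1] + q[4 - k:4 - k + n0, 4:4 + n1]
                       + q[4:4 + n0, 4 + k:4 + k + n1] + q[4:4 + n0, 4 - k:4 - k + n1])
    return lap / dx ** 2

def forward(vel, src_iz, src_ix, rec_iz, wavelet, dx, dt):
    """Simulate one shot and record the pressure along row rec_iz."""
    p_old = np.zeros_like(vel)
    p = np.zeros_like(vel)
    data = np.zeros((len(wavelet), vel.shape[1]))
    for it, w in enumerate(wavelet):
        p_new = 2.0 * p - p_old + (vel * dt) ** 2 * laplacian8(p, dx)
        p_new[src_iz, src_ix] += dt ** 2 * w  # inject the source term
        p_old, p = p, p_new
        data[it] = p[rec_iz, :]
    return data

def physics_loss(vel_pred, d_obs, **geom):
    """Mean squared residual between synthetic and observed data."""
    return np.mean((forward(vel_pred, **geom) - d_obs) ** 2)

# Toy setup (assumed values): 40 x 40 grid, 10 m spacing, 15 Hz Ricker source.
nz, nx, nt, dx, dt, f0 = 40, 40, 300, 10.0, 1e-3, 15.0
t = np.arange(nt) * dt
a = (np.pi * f0 * (t - 1.0 / f0)) ** 2
wavelet = (1.0 - 2.0 * a) * np.exp(-a)
geom = dict(src_iz=2, src_ix=20, rec_iz=2, wavelet=wavelet, dx=dx, dt=dt)

v_true = np.full((nz, nx), 2000.0)
v_true[15:, :] = 2500.0               # two-layer "true" model
v_wrong = np.full((nz, nx), 2000.0)   # prediction that misses the interface

d_obs = forward(v_true, **geom)
loss_true = physics_loss(v_true, d_obs, **geom)
loss_wrong = physics_loss(v_wrong, d_obs, **geom)
```

The time step satisfies the stability limit of this stencil for the velocities used here. A prediction that reproduces the true model gives a zero residual, while the prediction that misses the interface gives a positive one.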
Results: The proposed method was evaluated on three datasets of increasing complexity: CurveVel-B (synthetic folds), Simulate-SEG-Salt (complex salt domes), and the Marmousi benchmark model. Ablation studies of the architecture demonstrated the critical role of the trunk network: under a 20-meter source perturbation, the baseline model without source encoding suffered severe structural distortion, whereas SC_InvONet maintained high fidelity and reduced the mean absolute error (MAE) and mean squared error (MSE) by approximately 64% and 79%, respectively. Comparisons between the purely data-driven SC_InvONet and the physics-constrained PISC_InvONet showed that integrating the forward modeling operator corrects non-physical artifacts. At steep and complex fold boundaries, where the data-driven model blurred geological interfaces, PISC_InvONet delivered sharp reconstructions by confining the solution space to physically plausible wave dynamics. In cross-dataset generalization tests (pre-training on Simulate-SEG-Salt and testing on Marmousi), PISC_InvONet recovered complex fault networks, steeply dipping layers, and deep high-velocity unconformities without requiring a carefully smoothed initial velocity field, whereas conventional FWI diverged under similar initial conditions. The framework was also resilient to degraded data quality. Even with intense Gaussian noise and extensive randomly missing traces applied simultaneously, PISC_InvONet preserved the macroscopic geological structures and the spatial continuity of subsurface boundaries, with absolute errors concentrated mainly around sharp structural transitions rather than spread across the whole model.
Conclusions: The PISC_InvONet framework establishes a hybrid inversion paradigm by tightly coupling neural inversion operators with physical constraints. The adaptive dynamic convolution, the self-attention mechanism, and the explicit encoding of source locations improve the extraction of multi-scale waveform features and the adaptability to varying acquisition geometries. With its two-stage strategy of pre-training followed by physics-guided fine-tuning, the framework mitigates both the reliance of conventional FWI on high-quality initial models and the reliance of purely data-driven methods on large-scale labeled datasets. The experiments indicate that PISC_InvONet provides superior structural fidelity, robust noise resistance, and good cross-dataset generalization. The method offers a stable, computationally efficient, and geophysically consistent solution for large-scale practical seismic imaging, and provides a basis for extending physics-informed inversion to 3D and elastic-wave settings.