Abstract:
Objectives Image matching is a basic step in many vision applications.Aiming at the challenging problem in image matching that the consistent contents between a matched image pair generally occupy much less regions, we convert the whole image matching to local object matching. And the Siamese structure based feature extraction network is employed to produce the discriminative features for local objects. The proposed feature could effectively match the consistent objects between image pairs, and further complete the whole image matching task.
Methods We solve the image matching task by matching the consistent objects within image pairs. The local object patches are firstly detected based on edge boxes algorithm, and the proper objects, which satisfy the inputs of the feature extraction network, are selected according to the size of the detected object patches. Then the feature extraction network is constructed based on the Siamese structure, and the network is trained based on comparison mechanism.The training process makes the feature distances of the consistent objects close with each other, and those of the inconsistent objects far from each other.Thus, the object features could effectively match the consistent objects and distinguish the inconsistent. Finally the object distances are calculated to constitute the similarity matrix, and the consistent objects are detected based on the mutual matching mechanism, the image matching task is ultimately completed according to the number of the consistent objects.
Results We predict whether or not two images are matched by detecting the consistent objects between the image pair, and the core of the whole method is to construct the discriminative features for local objects. The experimental results demonstrate that the proposed Siamese structure based feature extraction network is capable of producing the high representative features, which could effectively match the consistent objects and distinguish the inconsistent objects. Comparing with the existing networks, the proposed feature extraction network could achieve better matching performance on the test datasets.In the image matching experiments, the proposed method could outperform the other approaches.In addition, the proposed method only describes the critical objects within images, which greatly reduces the data volume.Thus, it could achieve high efficiency.
Conclusions The core of image matching is to decide whether two images contain the consistent objects, and the background contents in two images are useless in image matching practice.However, the existing methods generally describe the overall image as a whole, and all the image contents need to be process, which increases the data volume and limits the matching performance.Instead of representing the whole image directly, we convert the image matching to local object matching.If there exist several consistent objects between an image pair, the images can be predicted as matched. Only the critical objects within images are described, the irrelevant contents, out of the object regions, is not involved in the object matching, which actually elements the impact of the inconsistent contents, and make the object matching more accurate. As the useless contents are not described, the computation cost is greatly reduced, and the high matching efficiency could be achieved. The experimental results indicate that the local object based method could effectively solve the image matching task with high efficiency.