This paper proposes a robust visual tracking system that uses images acquired from a color camera and a thermal camera to track a target in real time. The thermal camera, which observes the heat emitted by a target such as a human body or a vehicle, complements the color camera when tracking the target in a cluttered environment or under occlusion. Unlike conventional color-thermal tracking, which simply verifies target hypotheses in the two image domains, the proposed system uses a sampling multiple importance resampling scheme to generate and verify hypotheses efficiently. The better hypotheses from the color and thermal images are selected for evaluation with a sparse appearance representation, so that a target under severe occlusion can still be identified in real time. By adaptively fusing the hypotheses from the color and thermal images, the resampling scheme accounts for both diversity and convergence.