After years of extensive study, salient motion detection has achieved substantial performance improvements, driven primarily by the rapid development of self-adaptive top-down modeling techniques. Nevertheless, almost all conventional solutions remain insufficiently robust for video sequences captured by hand-held cameras, mainly because the positional alignment information indispensable for top-down background modeling is absent. In contrast, bottom-up video saliency detection methods, while achieving excellent salient motion detection in both stationary and nonstationary videos, still perform poorly in scenarios with large amounts of dynamic background. In this letter, we explore a bottom-up saliency framework by introducing a novel spatiotemporal regional filtering method to handle the dynamic background problem. Our key rationale is to assign large saliency values to regions with stable spatiotemporal coherency while suppressing irregularly repeating dynamic backgrounds. To the best of our knowledge, this is the first work to address the dynamic background problem from the perspective of bottom-up video saliency. We conduct extensive quantitative evaluations on publicly available benchmarks to validate the effectiveness and robustness of our method.
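The core idea of favoring spatiotemporally coherent regions can be illustrated with a minimal, purely hypothetical sketch (not the authors' actual filter): each region's per-frame saliency is down-weighted by its temporal fluctuation, so a steadily salient object keeps a high score while a flickering dynamic background (e.g., waves or foliage) is suppressed.

```python
# Illustrative sketch only: all names and the exact weighting scheme are
# assumptions, not the method proposed in the letter.
from statistics import mean, pstdev

def coherency_filter(region_saliency, stability_weight=1.0):
    """region_saliency: dict mapping region_id -> list of per-frame
    saliency values. Returns dict region_id -> filtered saliency."""
    filtered = {}
    for rid, values in region_saliency.items():
        m = mean(values)                 # average saliency of the region
        s = pstdev(values)               # temporal fluctuation of the region
        coherency = 1.0 / (1.0 + stability_weight * s)  # stable -> near 1
        filtered[rid] = m * coherency    # erratic regions are down-weighted
    return filtered

# A steadily salient object vs. an erratic dynamic-background region:
scores = coherency_filter({
    "object": [0.80, 0.82, 0.79, 0.81],  # high and temporally stable
    "waves":  [0.90, 0.10, 0.85, 0.05],  # high peaks but erratic
})
```

Under this toy weighting, the stable object region retains a larger filtered saliency than the erratic background region, mirroring the rationale stated above.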