The increasing resolution of video content, combined with the storage and processing limitations of mobile devices, calls for new compression techniques for video coding. At the same time, achieving higher compression rates without compromising quality makes the coding process increasingly complex. In the HEVC reference software, the most time-consuming step is Motion Estimation (ME), which accounts for 40% of the total execution time. ME relies on a Block Matching Algorithm (BMA) that searches a set of candidate blocks for the one that minimizes a cost function. In this work, we propose a hardware architecture that executes the BMA using the Sum of Absolute Transformed Differences (SATD) as the cost function. The architecture applies a criterion that eliminates candidates while still guaranteeing the same result as an exhaustive search over all candidates. The first implementation, used for 8×12 and 12×8 blocks, becomes more energy efficient than the exhaustive-search architecture once 18.6% of the candidates are eliminated. The second implementation, used for 12×16 and 16×12 blocks, needs a higher elimination rate: at least 34% of the candidates must be eliminated before it becomes more energy efficient than the exhaustive search.
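The Python sketch below illustrates an SATD-based block matching search with lossless candidate elimination. It is not the proposed hardware architecture. The abstract does not state which elimination criterion is used, so the one here is an assumption chosen for illustration. The transform used for each 4×4 tile is an unnormalized Hadamard transform. Its DC coefficient equals the sum of the tile's 16 differences, and SATD adds the absolute value of that coefficient to 15 other non-negative terms. The sum of the |DC| terms over all tiles is therefore a cheap lower bound on SATD.

The sketch makes three further assumptions:
- SATD is computed by tiling the block into 4×4 transforms.
- Block sizes are written as width×height.
- The helper names (`satd`, `dc_bound`, `full_search`, `search_with_elimination`) are hypothetical.

```python
import random

# Unnormalized 4x4 Hadamard matrix (entries +/-1). Assumption: the block is
# tiled into 4x4 sub-blocks; real encoders may also use 8x8 transforms.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def hadamard_4x4(blk):
    """Return H4 * blk * H4^T for a 4x4 list-of-lists block."""
    tmp = [[sum(H4[i][k] * blk[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)] for i in range(4)]


def tiles(blk):
    """Yield the 4x4 sub-blocks of a block whose height and width are multiples of 4."""
    for y in range(0, len(blk), 4):
        for x in range(0, len(blk[0]), 4):
            yield [row[x:x + 4] for row in blk[y:y + 4]]


def satd(d):
    """Sum of absolute Hadamard-transformed differences over all 4x4 tiles."""
    return sum(abs(c) for t in tiles(d) for row in hadamard_4x4(t) for c in row)


def subblock_sums(blk):
    """Sum of the 16 samples in each 4x4 tile (additions only, no transform)."""
    return [sum(map(sum, t)) for t in tiles(blk)]


def get_block(frame, x, y, w, h):
    """Extract the w x h block whose top-left corner is at column x, row y."""
    return [row[x:x + w] for row in frame[y:y + h]]


def diff(a, b):
    """Element-wise difference a - b of two equally sized blocks."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def dc_bound(cur_sums, ref_blk):
    """Lower bound on SATD(cur - ref_blk): each tile's DC coefficient equals the
    sum of its 16 differences, and SATD adds |DC| to 15 other non-negative terms."""
    return sum(abs(c - r) for c, r in zip(cur_sums, subblock_sums(ref_blk)))


def full_search(cur, ref, cands):
    """Exhaustive BMA: evaluate the SATD of every candidate position (x, y)."""
    h, w = len(cur), len(cur[0])
    best_cost, best_mv = None, None
    for x, y in cands:
        cost = satd(diff(cur, get_block(ref, x, y, w, h)))
        if best_cost is None or cost < best_cost:
            best_cost, best_mv = cost, (x, y)
    return best_cost, best_mv


def search_with_elimination(cur, ref, cands):
    """Same search, but skip the full SATD whenever the cheap lower bound
    already reaches the best cost found so far (lossless elimination)."""
    h, w = len(cur), len(cur[0])
    cur_sums = subblock_sums(cur)
    best_cost, best_mv, eliminated = None, None, 0
    for x, y in cands:
        ref_blk = get_block(ref, x, y, w, h)
        if best_cost is not None and dc_bound(cur_sums, ref_blk) >= best_cost:
            eliminated += 1  # SATD >= bound >= best_cost, so this candidate cannot win
            continue
        cost = satd(diff(cur, ref_blk))
        if best_cost is None or cost < best_cost:
            best_cost, best_mv = cost, (x, y)
    return best_cost, best_mv, eliminated


if __name__ == "__main__":
    random.seed(0)
    ref = [[random.randrange(256) for _ in range(32)] for _ in range(32)]
    # An 8x12 (width x height) current block: a noisy copy of the reference at (10, 9).
    cur = [[p + random.randint(-2, 2) for p in row] for row in get_block(ref, 10, 9, 8, 12)]
    cands = [(x, y) for y in range(32 - 12 + 1) for x in range(32 - 8 + 1)]
    full = full_search(cur, ref, cands)
    cost, mv, n_elim = search_with_elimination(cur, ref, cands)
    assert (cost, mv) == full  # same best candidate as the exhaustive search
    print(f"best MV {mv}, SATD {cost}, eliminated {n_elim}/{len(cands)} candidates")
```

A candidate is skipped only when its bound already reaches the best cost found so far. Both functions also keep the first strictly better candidate. As a result, they return the same cost and motion vector, and only the number of full SATD evaluations differs. Whether skipping that work saves energy in hardware depends on the extra cost of computing the bound.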