The Adaptive Loop Filter (ALF) was recently developed to improve video coding performance. It is inserted between deblocking and inter-prediction, which makes deblocking and ALF highly time-critical because they are performed sequentially. In this paper, we propose an efficient decoder architecture that integrates deblocking and ALF. The architecture not only performs deblocking and ALF in parallel but also reduces area cost as much as possible. These goals are achieved by a shared, hybrid-organized memory architecture and a one-block-two-edge parallel strategy that uses a novel filter order. The proposed architecture is implemented in Verilog HDL and achieves real-time decoding of 1080p video at 30 fps when running at 211 MHz on a Xilinx Virtex-5 FPGA.
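To put the real-time figure in perspective, the clock rate can be converted into a rough cycle budget per block. The sketch below is only a back-of-the-envelope estimate: the 16x16 processing unit and the function name `cycles_per_block` are assumptions made for illustration and are not stated in the text above.

```python
import math


def cycles_per_block(clock_hz, width, height, fps, block=16):
    """Average clock cycles available per block at a given frame rate.

    The block size (16x16 by default) is an assumption; the frame is
    padded up to a whole number of blocks in each dimension.
    """
    blocks_x = math.ceil(width / block)
    blocks_y = math.ceil(height / block)
    blocks_per_second = blocks_x * blocks_y * fps
    return clock_hz / blocks_per_second


# 1080p at 30 fps with a 211 MHz clock, as quoted in the abstract.
budget = cycles_per_block(211e6, 1920, 1080, 30)
print(f"{budget:.0f} cycles per 16x16 block")
```

Under these assumptions, deblocking and ALF together must finish one block in roughly 860 cycles on average, which illustrates why running the two filters in parallel rather than back to back matters.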