This paper presents a general-purpose approach to parallel pixel processing on the GPU. It describes the essential dataset structuring, correct data type assignment, and kernel configuration for the CUDA application programming interface. The paper also explains data movement and how to keep the GPU optimally saturated with computation. Data transfers are analyzed in relation to computation, especially for embarrassingly parallel problems. The paper identifies the pitfalls of large dataset transfers and low computational intensity. A list of optimization techniques used for pixel processing on the GPU is also included.