Artificial neural networks are widely used computing systems applied to a broad range of tasks, classification problems being a common application. However, most of this research focuses on one- and two-dimensional data, such as feature vectors and images; comparatively little work addresses three-dimensional media such as video clips. This gap can be attributed to a shortage of adequate computational resources, available training datasets, and suitable frameworks for implementing such networks, along with hardware constraints. This paper proposes an alternative methodology: rather than feeding three-dimensional video data directly to a deep convolutional neural network, the data are first preprocessed. By taking sequential segments from multiple frames of a single video clip and combining them into a single image, the temporal dimension of the video is encoded as a two-dimensional image. This process, called temporal slicing, is repeated across the entire spatial extent of the video. The result is spatio-temporal data encoded in a spatial format, which can then be propagated through a convolutional neural network as ordinary image data. This method is less resource-intensive and considerably faster than existing three-dimensional convolutional methods, while achieving significantly higher accuracy than those architectures.
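The abstract does not spell out the exact slicing scheme, but one plausible reading of "taking sequential segments from multiple frames and combining them into a single image" is the classic row-wise temporal slice: for each spatial row, stack that row from every frame, so time becomes one spatial axis of the output image. The sketch below, with the hypothetical helper name `temporal_slices`, illustrates this interpretation with NumPy; it is a minimal illustration of the idea, not the paper's implementation.

```python
import numpy as np

def temporal_slices(video):
    """Encode the temporal axis of a video as a stack of 2D images.

    video: array of shape (T, H, W, C) -- T frames of H x W pixels.
    Returns an array of shape (H, T, W, C): one image per spatial row,
    where each output image stacks that row from every frame, so time
    becomes the vertical axis of the image.
    """
    # Move the spatial row axis to the front; each entry video[:, y]
    # (shape (T, W, C)) becomes one temporal-slice image.
    return np.transpose(video, (1, 0, 2, 3))

# Toy example: 8 frames of a 4x6 RGB video.
video = np.random.rand(8, 4, 6, 3)
slices = temporal_slices(video)
assert slices.shape == (4, 8, 6, 3)   # H slice images, each T x W
assert np.array_equal(slices[2], video[:, 2])
```

Each resulting slice image is an ordinary two-dimensional array and can be fed to a standard 2D convolutional network, which is what lets the spatio-temporal content be processed as image data.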