Video summarization is an important multimedia task for applications such as video indexing and retrieval, video surveillance, human-computer interaction and video "storyboarding". In this paper, we present a new approach for automatic summarization of video collections that leverages a structured minimum-risk classifier and efficient submodular inference. To test the accuracy of the predicted summaries we utilize a recently-proposed measure (V-JAUNE) that considers both the content and frame order of the original video. Qualitative and quantitative tests over two action video datasets - the ACE and the MSR DailyActivity3D datasets - show that the proposed approach delivers more accurate summaries than the compared minimum-risk and syntactic approaches.