This paper presents an ultra-low-power acoustic sensing and object recognition microsystem for Internet of Things applications. The microsystem targets unattended ground sensor nodes, where a lifetime of decades is desired without battery replacement. The system incorporates a microelectromechanical systems (MEMS) microphone as a frontend sensor, along with active circuitry to identify target objects. We introduce an algorithm-circuit cross-optimization to realize a 12-nW stand-alone microsystem that integrates the analog frontend with the digital backend signal classifier. Frequency-domain analysis of the target audio signals reveals that the system can operate with a relatively low bandwidth (<500 Hz) and SNR (>3 dB), which significantly relaxes the power constraints on both the analog frontend and the digital backend circuits. To further relax the current requirement of the preceding amplifier, we propose an 8-bit successive-approximation-register (SAR) analog-to-digital converter designed with a greatly reduced sampling capacitance (<50 fF). For the digital backend, we propose a feature extractor based on a serialized tones-of-interest discrete Fourier transform, which replaces conventional parallel feature extraction using the fast Fourier transform and its high power and area cost. This approach reduces area and thus leakage power, which often dominates the overall power consumption. The proposed system identifies a number of target objects, including an electrical generator, a small car, and a truck, with >95% reliability, and consumes only 12 nW under continuous monitoring.
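The idea behind a serialized tones-of-interest discrete Fourier transform can be illustrated with a minimal Python sketch. It evaluates the DFT only at a few chosen frequencies, one tone at a time, reusing a single pair of accumulators rather than computing every bin in parallel as an FFT would. This is not the circuit described here: the function names, the 1 kHz sample rate (chosen only to be consistent with a bandwidth below 500 Hz), and the example tone frequencies are all assumptions made for illustration.

```python
import math


def tones_of_interest_dft(samples, sample_rate_hz, tones_hz):
    """Return the normalized DFT magnitude at each requested tone.

    Tones are processed one after another (serialized), so only one
    real/imaginary accumulator pair is live at any time.
    """
    n = len(samples)
    magnitudes = []
    for tone in tones_hz:
        re = 0.0
        im = 0.0
        # Phase advance per sample for this tone.
        step = 2.0 * math.pi * tone / sample_rate_hz
        for i, x in enumerate(samples):
            re += x * math.cos(step * i)
            im -= x * math.sin(step * i)
        magnitudes.append(math.hypot(re, im) / n)
    return magnitudes


def make_tone(freq_hz, sample_rate_hz, num_samples):
    """Generate a unit-amplitude sine wave (hypothetical test input)."""
    return [
        math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
        for i in range(num_samples)
    ]


# Assumed parameters: 1 kHz sampling, three illustrative tones.
SAMPLE_RATE_HZ = 1000
TONES_HZ = [50, 100, 200]
signal = make_tone(100, SAMPLE_RATE_HZ, 1000)
features = tones_of_interest_dft(signal, SAMPLE_RATE_HZ, TONES_HZ)
```

With K tones and N samples the sketch performs on the order of K·N multiply-accumulates, which is attractive only when K is small. The hardware benefit claimed in the text comes from the reduced area and leakage of a serialized datapath, which a software model like this does not capture.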