The aim of the spoken term detection task is to find occurrences of user-entered keywords in an archive of audio recordings. The techniques typically used are vocabulary-independent, relying only on the available acoustic information. In this scenario, however, we rely exclusively on the acoustic model, which is a drawback when it is unreliable, for example when the input is noisy. In...
This paper presents an audio keyword detection method for highlight retrieval in basketball video. The keywords include shoe-squeaking sounds, speech, cheering, long whistles and short whistles, which correspond to basketball game events. After feature analysis, the Simple Excellent Feature Combination based on the Pearson Correlation Coefficient (SEFC-PCC) is used to select efficient features, which contributes...
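The abstract does not spell out the SEFC-PCC procedure, but its core ingredient, ranking candidate features by their Pearson correlation with the class label, can be sketched as follows. This is an illustrative sketch only; the function names and data are hypothetical, not the paper's algorithm.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(feature_columns, labels, k):
    """Rank feature columns by |PCC| against the labels; keep the top-k indices."""
    scores = [(abs(pearson(col, labels)), i)
              for i, col in enumerate(feature_columns)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]
```

For example, with three feature columns and binary labels, the column that tracks the labels most closely is selected first.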
Most existing research in the area of emotion recognition has focused on short segments or utterances of speech. In this paper we propose a machine learning system for classifying the overall sentiment of long conversations as Positive or Negative. Our system has three main phases: first, it divides a call into short segments; second, it applies machine learning to recognize the emotion of each...
This paper presents a system that gives a mobile robot the ability to recognize a target speaker's speech, even while the robot performs an action and multiple speakers are talking in the room. The problems associated with this system are twofold: (1) While the robot is moving, its joints inevitably generate ego-motion noise from their motors. (2) Recognizing target speech against other interfering speech...
Different kinds of features in the time, spectral and cepstral domains are used for musical genre classification. In this paper, through the fusion of short-term timbral features and long-term rhythmic features, we propose a novel method in which a musical genre vector is constructed using the likelihood ratios of GMMs (Gaussian Mixture Models) and a radar chart is applied to provide a visualized style...
Auditory-based front-ends for speech recognition have been compared before, but this paper focuses on two of the most promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC). Standard Mel-Frequency Cepstral Coefficients (MFCC)...
The automatic classification of audio data is an effective way to organize large-scale collections of audio files. In this paper, an automatic content-based audio classification model using Centroid Neural Networks (CNN) with a Divergence Measure is proposed. The Divergence-based Centroid Neural Network (DCNN) algorithm, which employs the divergence measure as its distance measure, is used for clustering...
A novel framework for background music identification is proposed in this paper. Given an audio signal that mixes background music with speech/noise, we identify the music part using source music data. Conventional methods that take the whole audio signal for identification are inappropriate in terms of efficiency and accuracy. In our framework, the audio content is filtered through speech...
This paper addresses the task of automatically detecting outcomes of social interaction patterns, using non-verbal audio cues in competitive role-playing games (RPGs). For our experiments, we introduce a new data set which features 3 hours of audio-visual recordings of the popular “Are you a Werewolf?” RPG. Two problems are approached in this paper: Detecting lying or suspicious behavior using non-verbal...
We describe experiments in visual-only language identification (VLID), in which only lip shape, appearance and motion are used to determine the language of a spoken utterance. In previous work, we had shown that this is possible in speaker-dependent mode, i.e. identifying the language spoken by a multi-lingual speaker. Here, by appropriately modifying techniques that have been successful in audio...
The Partitioned Feature-based Classifier (PFC) is proposed in this paper. PFC does not use the entire feature vector extracted from the original data at once to classify each datum, but instead uses only groups of related features to classify data separately. In the training stage, the contribution rate of each feature-vector group is derived from the accuracy of each feature...
Sound localization systems (SLS) identify the direction of a sound source. However, most approaches focus on near-field identification, i.e. 1–2 m. In this paper we develop a novel algorithm for far-field sound localization based on the average magnitude difference function (AMDF), thereby extending the distance to 5 m. The far-field SLS is implemented on a field-programmable gate array (FPGA)...
With the rapid growth in audio data volume, research in content-based audio retrieval has gained impetus in the last decade. Audio classification serves as the fundamental step towards it. Accuracy in classifying data relies on the strength of the features and on the efficacy of the classification scheme. In this work, we have focused on the features only. We have restricted ourselves further...
The pitch period is an important parameter in speech recognition and speech synthesis, and pitch period detection has long been a focus of audio processing research. The traditional AMDF-based algorithm and its improved version, the LV-AMDF-based algorithm, easily lead to doubling or halving errors in pitch detection. To solve these problems, the characteristics and shortcomings of the AMDF and LV-AMDF functions...
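The AMDF this abstract builds on can be sketched in a few lines: the average magnitude difference at each candidate lag is computed, and the lag with the deepest valley is taken as the pitch period. This is a minimal illustrative sketch, not the paper's algorithm; the lag range and the synthetic test signal are assumptions. Note that multiples of the true period also form valleys, which is exactly the source of the doubling/halving errors the abstract mentions.

```python
import math

def amdf(x, lag):
    """Average magnitude difference of the signal against itself at a given lag."""
    n = len(x) - lag
    return sum(abs(x[i] - x[i + lag]) for i in range(n)) / n

def pitch_period(x, min_lag, max_lag):
    """Return the lag with the deepest AMDF valley in [min_lag, max_lag].
    If the search range spans multiples of the true period, their valleys
    can be picked instead (doubling error)."""
    return min(range(min_lag, max_lag + 1), key=lambda t: amdf(x, t))

# Synthetic 8 kHz sine at 200 Hz -> true pitch period of 40 samples.
sr, f0 = 8000, 200
x = [math.sin(2 * math.pi * f0 * n / sr) for n in range(400)]
```

With the search range restricted to a single octave, the detector recovers the 40-sample period of the test tone.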
In this paper, the spectral characteristics of uninterpolated and interpolated signals are analyzed, and a new audio spectral measure, band-partitioning spectral smoothness (BPSS), is proposed. For interpolated signals, the spectral smoothness in the high-frequency band produced by interpolation is much smaller than in the other frequency bands. The signal spectrum is then partitioned into several...
This paper presents a model-free and training-free two-phase method for audio segmentation that separates monophonic heterogeneous audio files into acoustically homogeneous regions where each region contains a single sound. A rough segmentation separates audio input into audio clips based on silence detection in the time domain. Then a self-similarity matrix, based on selected audio features in the...
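The second phase described above rests on a self-similarity matrix over per-clip feature vectors. A minimal sketch of that construction, assuming cosine similarity and treating the vectors below as stand-ins for real audio features (the abstract does not state which similarity measure the paper uses):

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def self_similarity(frames):
    """S[i][j] = similarity of frame i and frame j. Acoustically homogeneous
    regions appear as high-similarity blocks along the diagonal."""
    return [[cosine(fi, fj) for fj in frames] for fi in frames]
```

Segment boundaries can then be read off where the diagonal blocks end, i.e. where similarity between adjacent frames drops.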
Audio is a useful complementary modality to video for healthcare monitoring. In this paper, we investigate the use of hierarchical hidden Markov models (HHMMs) for healthcare audio event classification. We show that HHMMs can handle audio events with recursive patterns to improve classification performance. We also propose a model fusion method to cover the large variations often present in healthcare...
The human voice is primarily a carrier of speech, but it also contains non-linguistic features unique to a speaker and indicative of various speaker demographics, e.g. gender, nativity, ethnicity. Such characteristics are helpful cues for audio/video search and retrieval. In this paper, we evaluate the effects of various low-, mid-, and high-level features for effective classification of speaker characteristics...
Provisioning mobile audio and video services is a difficult challenge, since bandwidth and processing resources are limited in the mobile environment. Audio content is present in most multimedia services; however, user expectations of perceived audio quality differ for speech and non-speech content. Therefore, automatic voice or speech detection is needed in order to maximize perceived...
This paper presents a novel feature group for on-line speech/music segmentation in the broadcast news domain. The features are based on the variance of mel-frequency cepstral coefficients (MFCCV). The idea behind the feature-group construction is the energy variation in a narrow frequency sub-band, which is greater for speech than for music. For feature discrimination and segmentation ability evaluation...
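The MFCCV idea can be illustrated as a sliding-window variance over a single cepstral-coefficient track. MFCC extraction itself is assumed to have been done already; the sequence passed in below is a hypothetical stand-in for one coefficient track, and the window length is an assumption, not the paper's setting.

```python
def variance(xs):
    """Population variance of a sequence."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def mfccv(coeff_track, win):
    """Sliding-window variance of one MFCC coefficient; per the abstract,
    this variance tends to be larger for speech than for music."""
    return [variance(coeff_track[i:i + win])
            for i in range(len(coeff_track) - win + 1)]
```

Thresholding or classifying these variance values frame by frame then yields an on-line speech/music decision.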