In the first part of this study, basic concepts of forensic phonetics such as voice, speech, and the vocal tract are explained. In the second part, visual and auditory montage detection methods used in forensic phonetics, a subfield of digital forensics, are examined. The most frequently used visual and auditory analysis methods were identified through a review of the literature. Then...
Due to the continuing growth of the web, the Internet is now broadly used to fulfill diverse information needs. Sometimes, more precise information related to specific domains such as healthcare, information that satisfies the user's need, is not available on the Internet. There is a specific category of users, such as doctors, who are really interested in videos related to disease diagnosis...
Gaze analysis in dynamic environments has remained an unresolved problem due to the complexities that pertain to the detection and tracking of objects in the visual environment. This study provides a solution to the problem for face-to-face communication, in which the visual objects in the environment are faces. The application that has been developed for this purpose is able to detect and track faces...
One way to visualize an intercultural dialogue is to plot keywords jointly used by the intercultural speakers to see how the keywords are positioned relative to each other, with the position of the keywords signifying some kind of similarity relationship. We processed a Japanese transcription of a Korean-Japanese dialogue using Word2Vec and the t-SNE algorithm to generate various 2D plots of the noun words...
The use of surveillance cameras as a monitoring tool for home environments, the elderly, and children has become a common practice. However, people with visual impairments have difficulty using this kind of device because it relies only on visual information. To address this problem, this work proposes a solution that combines deep learning techniques for object recognition in the video...
Depression is a cognitive impairment, which according to the World Health Organisation is the leading cause of disability worldwide. One key trait of depression is psychomotor retardation, which adversely affects both emotional and physical behaviour of an individual. In this paper we perform experiments on the Audio Visual Emotion recognition Challenge 2016 — Depression Classification sub-Challenge...
In this paper, the problem of age estimation is addressed based on two modalities: speech utterances and speakers' face images. The proposed age estimation framework employs the Shifted Covariates REgression Analysis for Multi-way data (SCREAM) model, which combines Parallel Factor Analysis 2 and Principal Covariates Regression. SCREAM is able to extract a few latent variables from multi-way data...
This paper describes the techniques used in the submitted video presenting an interaction scenario, realised using the Neuro-Inspired Companion (NICO) robot. NICO engages the users in a personalised conversation where the robot always tracks the users' face, remembers them and interacts with them using natural language. NICO can also learn to perform tasks such as remembering and recalling objects...
Every language has different characteristics, one of which is how the language is pronounced. Pronunciation accompanied by emotional expression makes these characteristics even more distinctive. This research proposes the establishment of a natural Indonesian viseme order influenced by the expression of emotion. This system converts the text input of an Indonesian sentence into a sequence of Indonesian...
Mass casualty events caused by a biological weapon require fully capable first response teams. However, human first responders are equipped with protective gear, which limits their capabilities to complete tasks. Robots can be employed to work collaboratively with the first responders in order to augment the human's reduced abilities. The robot needs to understand and adapt to the human's workload...
Modern students have been brought up on literature in the “fantasy” style and on computer games, so reading scientific literature and watching science films seems boring to them. To solve the problem of increasing students' interest in learning, we offer a “Quest”-style MOOC.
Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail. While one new captioning approach, dense captioning, can potentially describe images in finer levels of detail by captioning many regions within an image, it in turn is unable to...
Visual narrative is often a combination of explicit information and judicious omissions, relying on the viewer to supply missing details. In comics, most movements in time and space are hidden in the gutters between panels. To follow the story, readers logically connect panels together by inferring unseen actions through a process called closure. While computers can now describe the content of natural...
Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature...
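A hedged sketch of the regression step described above, using scikit-learn's random forest; the synthetic arrays below are stand-ins for the paper's actual speech features and face-model parameters, whose dimensions and composition are assumptions here:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: 500 frames of per-frame speech features
# (e.g. phonemic-label encodings plus acoustic features) mapped to
# 10 face-model parameters (e.g. blendshape weights).
X = rng.normal(size=(500, 20))                 # speech feature vectors
W = rng.normal(size=(20, 10))
Y = X @ W + 0.1 * rng.normal(size=(500, 10))   # synthetic target parameters

# Fit one forest for the multi-output regression from features to parameters.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y)

# At animation time, each incoming speech-feature frame is mapped to
# a full set of face-model parameters.
params = forest.predict(X[:1])
print(params.shape)
```

Per-frame predictions like this are typically smoothed over time before driving the face model, since independent frame-wise regression can jitter.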
The selection of adequate job candidates is a very long and challenging process for every employer. The system presented in this paper aims to decrease the time needed for candidate selection at the pre-employment stage using automatic personality screening based on visual, audio, and lexical cues from short video clips. The system is built to predict candidates' scores on the Big Five personality traits and to...
This paper is part of a larger effort to detect manipulations of video by searching for and combining the evidence of multiple types of inconsistencies between the audio and visual channels. Here, we focus on inconsistencies between the type of scenes detected in the audio and visual modalities (e.g., audio indoor, small room versus visual outdoor, urban), and inconsistencies in speaker identity tracking...
Touchscreen assistive technology is designed to support speech interaction between visually disabled people and mobile devices, allowing the use of a choreography of gestures to interact with a touch user interface. This paper presents an evaluation of VoiceOver, the screen reader in Apple Inc. products, conducted within the research project Visually impaired users touching the screen: A user evaluation of...
Reliable visual features that encode the articulator movements of speakers can dramatically improve the decoding accuracy of automatic speech recognition systems when combined with the corresponding acoustic signals. In this paper, a novel framework is proposed to utilize audio-visual speech not only during decoding but also for training better acoustic models. In this framework, a multi-stream hidden...
Recognition of human emotions may be crucial in certain applications, such as human-computer interaction, monitoring of the elderly, or understanding the affective state of learners during a course. To this end, and depending on the application and the environment, one may use physiological parameters (e.g., heart rate, brain activity), which are typically obtrusive, or analyze other modalities...
We propose a novel method for developing a static storyboard for video clips included with biomedical research literature. The technique uses both the visual and audio content of the video to select candidate key frames for the storyboard. From the visual channel, the intra-frames are extracted using the FFmpeg tool. The IBM Watson speech-to-text service is used to extract words from the audio channel, from which...
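The intra-frame extraction step can be sketched with FFmpeg's standard `select` filter, which keeps only frames whose picture type is I. This is a minimal sketch built around the documented filter syntax; the file names are placeholders, not the paper's actual data:

```python
import shlex

def iframe_cmd(video: str, out_pattern: str = "iframe_%03d.png") -> list[str]:
    """Build an ffmpeg command that extracts only intra-coded (I) frames.

    select=eq(pict_type\\,I) keeps I-frames; -vsync vfr drops the
    timestamps of discarded frames so outputs are not duplicated.
    """
    return [
        "ffmpeg", "-i", video,
        "-vf", "select=eq(pict_type\\,I)",
        "-vsync", "vfr",
        out_pattern,
    ]

cmd = iframe_cmd("paper_video.mp4")
print(shlex.join(cmd))
```

The command list can be passed directly to `subprocess.run(cmd, check=True)` once FFmpeg is installed; the extracted I-frames then serve as the candidate key frames for the storyboard.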