Abstracts
14:00 – 15:30 |
Opening Session & Keynote Speakers
Workshop Chair: Catherine Quinquis (France Telecom, France) |
Prof Jens Blauert, Professor Emeritus (University of Bochum,
Germany); Models of the Binaural Hearing System: The Precedence Effect
MODELS OF THE BINAURAL HEARING SYSTEM
(co-authored by Jonas Braasch)
– Prominent Features of Binaural Hearing
– Architecture of a Model of Binaural Hearing
– The Jeffress Processor
– The Lindemann/Gaik Extensions
– Interpreting Binaural Activity
– The Effect of Interaural Incoherence
– Binaural Speech Enhancement
– Problems of Current Binaural Models
– Future Work
THE PRECEDENCE EFFECT
(co-authored by Jonas Braasch)
The acoustic modality is of paramount importance for human inter-individual
communication. Consequently, the human auditory system is highly
differentiated and able to perform sophisticated tasks such as the
identification, recognition and segregation of concurrent sound sources in
acoustically adverse conditions, e.g. in reverberant or noisy environments. To
this end, the different stages of the system at the peripheral, sub-cortical
and cortical levels act in a coordinated manner.
In this lecture we take the auditory Precedence Effect as an example to
discuss the role of the different stages of the auditory system in complex
sound-localisation tasks. Further, we consider different strategies of
modelling auditory functions. |
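As a rough illustration of the Jeffress-type cross-correlation stage named in the outline above, the following Python sketch estimates an interaural time difference (ITD) as the lag that maximises the cross-correlation of the two ear signals. The synthetic input, sampling rate and lag range are assumptions made for this example only; the code is not taken from the lecture material.

    import numpy as np

    def estimate_itd(left, right, fs, max_itd=1e-3):
        """Estimate the interaural time difference as the lag (within a
        physiologically plausible range of about +/-1 ms) that maximises the
        cross-correlation of the two ear signals, in the spirit of a
        Jeffress-type coincidence (delay-line) model."""
        max_lag = int(max_itd * fs)
        lags = np.arange(-max_lag, max_lag + 1)
        corr = [np.dot(np.roll(left, lag), right) for lag in lags]
        best_lag = lags[int(np.argmax(corr))]
        return best_lag / fs                     # seconds; positive when the right ear lags

    fs = 48000
    src = np.random.randn(int(0.05 * fs))        # broadband test signal, 50 ms
    left = src
    right = np.roll(src, 12)                     # right ear delayed by 12 samples (0.25 ms)
    print(estimate_itd(left, right, fs) * 1e3, "ms")   # expected: about 0.25 ms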
Prof Sabine Meunier (CNRS LMA, Marseille, France)
Loudness can be defined as the subjective intensity of a sound, that is, how
strong a sound seems to a listener. Although this definition seems to link
loudness only with sound intensity, loudness depends on other parameters of
the sound. The links between loudness, sound pressure level, frequency,
bandwidth and duration are well known. Based on research carried out over many
years, loudness models have been developed, some of which are now included in
standards. Nowadays, research focuses on the loudness of non-stationary sounds
and on the effect of context. The question of which psychophysical method is
best suited to measure loudness is still a topical one.
In this presentation, the relationship between loudness, sound pressure
level, frequency, bandwidth and duration will be shown and the questions
addressed by current research will be presented. |
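One of the loudness/level relationships the presentation will review can be illustrated with the classic textbook rule that loudness in sones roughly doubles for each 10 phon increase above 40 phon; the sketch below is a minimal example of that rule only, not the speaker's loudness model.

    def sones_from_phons(loudness_level_phon):
        """Textbook approximation, valid above roughly 40 phon: 40 phon equals
        1 sone, and loudness roughly doubles for each 10 phon increase."""
        return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)

    for phon in (40, 50, 60, 70, 80):
        print(phon, "phon ->", sones_from_phons(phon), "sone")
    # 1, 2, 4, 8, 16 sone: each 10 phon step (10 dB for a 1 kHz tone)
    # is heard as roughly twice as loud.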
Master of Ceremonies: Jean-Yves Monfort (France Telecom, France) |
15:30 – 16:00 |
Coffee break |
16:00 – 17:30 |
Round Table with SDOs (Standards Development Organizations) |
09:00 – 10:30 |
SESSION 1: Loudness
Coordinator: Gerald Lecucq (Alcatel, France) |
Sridhar Kalluri (Starkey Hearing Research Center, USA): Effect on
sound quality of extending the bandwidth of amplification to high
frequencies in hearing-impaired listeners
While it is becoming possible for hearing aids to provide a broader frequency
range of amplification than was possible in the past, there is little
consistent objective evidence that a greater audible bandwidth gives
perceptual benefit to hearing-impaired (HI) listeners. This study investigates whether
extending the bandwidth of reception to high frequencies gives an
improvement of sound quality to HI listeners, as it does for normal-hearing
listeners.
We address this question by asking 10 moderate HI listeners to rate their
preference in terms of sound quality of different upper frequency limits of
amplification (4, 6, 8, 10 and 12 kHz) in paired-comparison trials. Subjects
rate the quality of 3 music samples and 1 speech sample, with all samples
selected to have significant high-frequency content. Stimuli are amplified
linearly according to a high-frequency version of the CAMEQ prescription
that compensates for the subject’s hearing loss.
Inconsistent findings from past studies regarding the efficacy of increasing
bandwidth may be due to insufficient audibility of high-frequency energy.
The problem stems from the difficulty of verifying sound levels at the ear
drum at high frequencies. The present study addresses verification of
audibility by measuring sound levels in each subject with a probe microphone
placed approximately 2-3 mm from the ear drum. The proximity of the probe
tip to the ear drum helps overcome the across-subject variability in the
levels of high-frequency components of sound due to individual differences
in ear-canal geometry. The study also verifies audibility by measuring the
ability of individual subjects to discriminate the different bandwidth
conditions for every stimulus sample used in the assessment of sound
quality.
We will discuss the results of the experiment and its implications for the
bandwidth of amplification in moderate HI listeners. |
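A minimal, hypothetical sketch of how paired-comparison preferences of the kind described above could be tallied into a per-condition score follows; the trial data and the simple win-proportion scoring are invented for illustration and are not the authors' analysis.

    from collections import Counter

    # Hypothetical trial outcomes: (preferred upper limit, rejected upper limit) in kHz.
    trials = [(10, 4), (8, 4), (12, 6), (10, 6), (8, 6), (12, 10), (6, 4), (10, 8)]

    wins = Counter(preferred for preferred, _ in trials)
    appearances = Counter()
    for a, b in trials:
        appearances[a] += 1
        appearances[b] += 1

    # Preference score: proportion of comparisons each bandwidth condition won.
    for condition in (4, 6, 8, 10, 12):
        score = wins[condition] / appearances[condition]
        print(condition, "kHz:", wins[condition], "of", appearances[condition], "->", round(score, 2))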
Arnault Nagle (France Telecom, France): Assessment of audio
codecs in the context of VoIP audio conferencing: Monaural vs diotic
listening
In VoIP audio conferencing, audio rendering is usually proposed over either
handsets or headphones, which means two distinct kinds of listening
condition: monaural or diotic. The goal of our study is to determine whether
listening over the monaural or diotic condition has an impact on the
perceived quality of speech processed by VoIP codecs.
We performed two ACR tests: one in narrowband and one in wideband. Each test
had two sessions: one with monaural listening and the other with diotic
listening. Both tests were performed using 32 different listeners, divided
into four groups of eight listeners each. The processed speech material was
presented randomly to each group, seated in an acoustically conditioned
sound room following the P.800 requirements. The speech material used was
extracted from the France Telecom French speech database, which consists of
simple, meaningful short sentences recorded in a quiet environment.
The listening level was not the same in the monaural and diotic conditions, in
order to keep the loudness equivalent. It was set to 79 dB SPL for the
monaural condition, whereas a decrease of 10 dB was applied per channel over
the whole bandwidth for the diotic condition (69 dB SPL).
It is shown that the listening condition has a significant effect on the
perceived codec quality. For diotic listening, quality is judged more severely
when speech is degraded, for instance by packet loss or a low bit rate. Diotic
listening seems to help subjects better discriminate degradations. In
addition, the difference in listening level between the monaural and diotic
conditions tends to mask noise defects, which points out the potential weight
of the listening level in quality evaluation.
Depending on the codec, that is, on the degradation introduced (packet loss or
bit rate), the impact can be more or less pronounced, resulting in shifts in
codec ranking between the two listening modes. Conversely, in comparison with
monaural listening, diotic listening highlights the benefits of high-quality
codecs. These results suggest that audio codecs should be chosen carefully for
each use case. |
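The 79 dB SPL monaural versus 69 dB SPL per-channel diotic levels quoted above amount to a fixed correction for binaural loudness summation; the tiny sketch below just restates that arithmetic, with the 10 dB figure taken from this abstract rather than from any general loudness model.

    def diotic_level_per_channel(monaural_level_db_spl, summation_correction_db=10.0):
        """Per-channel presentation level for diotic listening intended to match
        the loudness of a monaural presentation, assuming the fixed correction
        used in the study described above (an assumption of this example)."""
        return monaural_level_db_spl - summation_correction_db

    print(diotic_level_per_channel(79.0))   # -> 69.0 dB SPL per channel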
10:30 – 11:00 |
Coffee break |
11:00 – 12:30 |
SESSION 2: Modelling: binaural, spatialisation
Coordinator: Thomas Sporer (Fraunhofer, Germany) |
Gunilla Berndtsson (Ericsson Research, Sweden): Creation of
test material that simulates the stereo capture of a teleconference site
We consider audio- and videoconferencing to be a key application for
wideband and super wideband stereo codecs. Hence it is crucial that these
codecs perform well on stereo audio signals captured at the participating
sites. In order to test the performance of these codecs for this key
application it is important to have good test material that is
representative of such teleconferencing sessions.
In this contribution we begin by discussing the audio scene to be captured
and its key spatial audio components, which are the reverberated signals of
the main speakers in the room and background noise containing both diffuse
components and spatially placed components such as interfering talkers. We
then discuss the captured stereo image of this audio scene and the most
important spatial characteristics that need to be preserved in order to
deliver a good stereo image of the audio scene. We go on to describe the main
stereo capture methods used and the kind of spatial characteristics their
stereo images are able to deliver.
The contribution closes by proposing several concrete audio scenes that we
feel are representative of a teleconferencing session and methods for
creating test material that simulates the proposed audio scenes. |
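One hypothetical way to synthesise such a test scene, sketched below, is to place a (reverberated) talker in the stereo image with a constant-power pan and add independent noise in each channel as a stand-in for the diffuse background; the pan law, levels and signals are assumptions of this example, not the authors' capture method.

    import numpy as np

    def pan_constant_power(mono, azimuth):
        """Place a mono signal in the stereo image; azimuth runs from
        -1.0 (hard left) to +1.0 (hard right)."""
        theta = (azimuth + 1.0) * np.pi / 4.0          # map to 0..pi/2
        return np.cos(theta) * mono, np.sin(theta) * mono

    fs, dur = 32000, 2.0                               # assumed super-wideband-ish rate
    n = int(fs * dur)
    talker = np.random.randn(n)                        # stand-in for a reverberated talker
    left_t, right_t = pan_constant_power(talker, azimuth=-0.5)   # slightly to the left

    noise_gain = 10.0 ** (-20.0 / 20.0)                # diffuse floor 20 dB below the talker
    left = left_t + noise_gain * np.random.randn(n)
    right = right_t + noise_gain * np.random.randn(n)
    stereo = np.stack([left, right], axis=1)           # stereo test item for the codec under test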
Peter Hughes (BT Group, UK): Conferencing with Spatial Audio
Most people take part in telephone conferences from time to time, and will
be very familiar with both their benefits (practical multi-party
communications) and drawbacks (stilted conversations, difficulty in
recognizing who is talking, frequent poor-quality audio, etc.). In addition,
teleconferencing can be fatiguing due to the telephony quality, the relatively
long duration of teleconferences compared to normal telephone calls, and the
majority of the time being spent listening rather than talking.
Contrast this to real life, where we hear sounds from all around us and our
two ears enable us to not only locate where a sound is coming from and turn
to face it if required, but also to filter it out from background noise or
other talkers. This gives us the ability to focus on single conversations
amongst many, the so-called 'cocktail party effect'.
To investigate the benefits of employing spatial sound in audio conferencing
systems, a PC-based SIP VoIP client called the “Senate” has been developed
with the following key features:
- PC-based audio client capable of playing both streamed speech and
local sound files in a spatial environment.
- Spatial sound using HRTF-based 3D audio processing or cinematic 5-channel
reproduction.
- Wideband speech using the AMR-WB wideband coder.
- Named talkers with graphic icons in a virtual room.
- Visual indication of who is talking.
- Audio Smileys: the ability to mix sound effects and other audio into the
transmitted audio stream.
This paper will discuss a number of topics based on the Senate including the
benefits of spatial audio conferencing, extensions into the consumer market,
implementation issues including efficient network usage and some ideas for
user interfaces for both PCs and other communications devices. |
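The HRTF-based rendering mentioned in the feature list boils down to convolving each talker's mono signal with a left/right head-related impulse response pair and summing the results; the sketch below shows only that core operation, assumes the HRIR arrays are already available, and is not the Senate implementation.

    import numpy as np
    from scipy.signal import fftconvolve

    def spatialise(mono, hrir_left, hrir_right):
        """Render a mono talker at the position encoded by an HRIR pair
        (both impulse responses are assumed to have the same length)."""
        return np.stack([fftconvolve(mono, hrir_left),
                         fftconvolve(mono, hrir_right)], axis=1)

    def mix_conference(talkers, hrirs):
        """Spatialise each talker with its own HRIR pair and sum into one binaural mix."""
        rendered = [spatialise(sig, hl, hr) for sig, (hl, hr) in zip(talkers, hrirs)]
        length = max(r.shape[0] for r in rendered)
        mix = np.zeros((length, 2))
        for r in rendered:
            mix[:r.shape[0], :] += r
        return mix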
Mansoor Hyder (University of Tübingen, Germany): 3D Telephony
Telephony is a well-established and important tool for interpersonal
communication. Despite the revolutionary expansion in the use of telephony
brought about by IP-based services and mobile phones, telephony as a medium
has stagnated.
The basic principle of a microphone and speaker has not changed. The major
limitation of today's telephony systems is that the location of a person
speaking cannot be identified. This adds to the problem of poor quality,
especially in multi-user scenarios. This research therefore aims to extend
telephony into the third dimension. This will enable users to locate sound
sources in space; after all, our ears and perceptual abilities are naturally
binaural, something that has not yet been exploited by the telecommunications
industry.
Building on IP-based telephony, advanced codecs, and recent developments in
micro-mechanical tracking sensor technology, components are emerging that make
a fully fledged 3D telephony system feasible at modest cost.
In this work we describe the design of a system using innovative 3D audio
rendering based on Uni-Verse, head tracking using a MEMS sensor, and an
IP-based VoIP protocol. We will also give an overview of the work in progress
on the implementation of our prototype.
This 3D phone can be used in a conferencing solution, where conference calls
would be more realistic because the participants could identify who is
talking by locating the origin of the sound. In addition, many non-verbal
cues, such as head or body movements, can be heard through the resulting
changes in acoustic delays and echoes. |
12:30 – 14:00 |
Lunch break |
14:00 – 15:30 |
SESSION 3: Artificial Head, Ear and Mouth
Coordinator: Luc Madec (B&K, Denmark) |
Hans Gierlich (HEAD acoustics GmbH, Germany): Optimum frequency
response characteristics for Wideband Terminals
In ETSI standards ES 202 739 and ES 202 740, a new testing technique for the
measurement of wideband terminals is introduced. Tolerance masks are given
for the sending and receiving frequency response characteristics. As an
important new concept in these standards, the free-field reference point,
rather than the ERP, is used for determining the response characteristics in
the receiving direction. Nevertheless, the question remains open as to what
extent the frequency response tolerance masks in the sending as well as the
receiving direction can be relaxed without compromising good wideband
transmission performance.
Subjective tests have been carried out in order to derive the impact of
non-optimum receiving frequency response characteristics on the perceived speech
sound quality. Different experiments are described and the results are
discussed. Based on the test results and respective frequency response
characteristics, a tolerance mask is proposed which guarantees a maximum
speech sound quality in receiving direction, assuming impairments solely
stemming from different frequency response characteristics.
In an additional set of experiments, the listening quality in sending
direction was assessed under different types of background noise, and using
different types of wideband terminals. The aim of this investigation was to
find desirable sending frequency response characteristics with and without
background noise at the near end, and to possibly give general
recommendations in case of speech with near end background noise. The
subjective experiments are introduced and the results will be discussed. |
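A small, hypothetical sketch of how a measured response could be checked against a tolerance mask of the kind discussed above is given below; the mask values are placeholders and are neither the masks of ES 202 739/740 nor the mask proposed in the talk.

    # Hypothetical tolerance mask: (frequency in Hz, lower limit in dB, upper limit in dB).
    mask = [(100, -6.0, 6.0), (200, -4.0, 4.0), (1000, -3.0, 3.0),
            (4000, -3.0, 3.0), (7000, -6.0, 6.0)]

    def check_against_mask(response_db, mask):
        """response_db maps frequency (Hz) to the measured response relative to
        the target (dB); returns the mask points that are violated."""
        violations = []
        for freq, lower, upper in mask:
            value = response_db.get(freq)
            if value is not None and not (lower <= value <= upper):
                violations.append((freq, value, lower, upper))
        return violations

    measured = {100: -2.1, 200: 0.5, 1000: 1.2, 4000: -3.8, 7000: 2.0}
    print(check_against_mask(measured, mask))   # -> [(4000, -3.8, -3.0, 3.0)]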
Gaetan Lorho, David Isherwood (Nokia Corporation): Acoustic impedance characteristics of artificial ears for telephonometric use
"Artificial ears are an integral part of the audio design process for
telephony devices such as mobile phones. The mechanical and
electro-acoustical characteristics of these artificial ears should primarily
provide an overall acoustic impedance similar to that of the average human
ear over a given frequency range. This paper presents work conducted within
the ITU-T Study Group 12 to quantify the degree of similarity between human
ears and a subset of ITU-T Rec. P.57 Type 3 artificial ears with respect to
their acoustic impedance when measured using a mobile phone-like device. |
15:30 – 16:00 |
Coffee break |
16:00 – 17:30 |
SESSION 4: Terminals characteristics and teleconferencing
Coordinator: Hans Gierlich (HEAD acoustics GmbH, Germany) |
Pascal Huart (Cisco, France): User perception and end-point
characteristics
Phone acoustic characteristics can only be adjusted to a limited extent
using embedded real-time signal processing; therefore, the endpoint audio
performance should be considered from the earliest stages of the design.
The presentation intends to cover some endpoint characteristic limitations
and the end-user perception of band extensions. The limitations considered
concern the transducers as well as the mechanical and industrial design of a
typical enterprise phone. |
Christian Hoene (University of Tübingen, Germany): An
Open-Source Softphone for Network Musical Performances
Playing musical instruments over the telephone is very demanding, because
the quality requirements are far higher than those of a normal conversation.
First, the acoustic latency or "mouth-to-ear" delay must be about 20 ms,
because acoustic waves travel a distance of about 7 meters in 20 ms. Any
larger delay, and hence distance, makes it difficult for musicians to keep
synchronized [1]. Second, the transmission shall provide high-quality
reproduction of sound that is very faithful to the original.
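The 20 ms and 7 m figures above are simply the delay budget multiplied by the speed of sound; a one-line check, with the room-temperature speed of sound as the only assumption:

    SPEED_OF_SOUND_M_PER_S = 343.0           # approximate value in air at about 20 degrees C
    delay_s = 0.020                          # 20 ms mouth-to-ear budget quoted above
    print(SPEED_OF_SOUND_M_PER_S * delay_s)  # -> 6.86 m, i.e. roughly the 7 meters quoted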
Network music performance solutions have been presented [2][3] using, for
example, the Ultra-Low-Delay codec of Fraunhofer IIS [4]. However, in
today's world of Internet telephony, softphones can be downloaded and used
for free, VoIP-to-VoIP calls can be made without paying any fees, and many
VoIP and SIP applications are available as open source. Thus, it will be
difficult to make network music performances a success story if one has to
pay for the license of a patented codec.
In this work, we present our open-source softphone, in which we combine the
open-source phone "Ekiga" [5] with the Bluetooth "SBC" audio codec [6][7]
and a packet loss concealment algorithm based on ITU G.711 Appendix I [8].
The latter we extend to support the full bandwidth. Our softphone solution
allows high-quality stereo audio at very low algorithmic delays and modest
compression rates. All algorithms are free of royalties and their source
code is available.
In addition, we show the results of subjective MUSHRA tests [9] and
objective assessment using ITU P.862.2 [10] and BS.1387-1 [11] on the
audio quality of our solution, testing the performance of the packet loss
concealment in cases of speech, singing and audio source material. We also
measure the coding performance of Bluetooth SBC while varying encoding
parameters such as the number of subbands, quantization bits and compression
modes.
Finally, we conclude with an outlook on further tasks in research and
standardization. |
Hans Gierlich (HEAD acoustics GmbH, Germany): Echo perception
in wideband telecommunication scenarios
So far, subjective tests leading to echo loss requirements have been
conducted mostly in narrowband telecommunication systems. It can be assumed
that the requirements in wideband telecommunication systems may be
different, first, due to the higher quality expectation of the users and
second due to the different perception of high frequency echo components.
Furthermore one-dimensional instrumental parameters such as weighted
terminal coupling loss (TCLw) cannot adequately describe echo impairments.
In addition, frequency dependent and temporal echo impairments may have to
be taken into account.
In a subjective test different types of echo impairments introduced using an
echo simulation were investigated. The subjective testes were conducted
according to ITU-T Rec. P.831. The wideband terminal was simulated including
the typical sidetone path. The test conditions and the test procedure will
be described in detail. The results of the subjective test will be discussed
and conclusions will be drawn on the required spectral echo attenuation. |
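To see why a single figure such as TCLw cannot capture spectral echo structure, the hedged sketch below collapses a frequency-dependent echo loss curve into one number by a plain in-band power average; this is a simplified stand-in, not the actual ITU-T weighting procedure, and the example curves are invented.

    import numpy as np

    def single_number_echo_loss(freqs_hz, loss_db, band=(300.0, 3400.0)):
        """Collapse a frequency-dependent echo loss curve into one number by
        averaging the echo power over the band and converting back to dB
        (a simplified stand-in for TCLw-style single-number figures)."""
        freqs_hz = np.asarray(freqs_hz, dtype=float)
        loss_db = np.asarray(loss_db, dtype=float)
        in_band = (freqs_hz >= band[0]) & (freqs_hz <= band[1])
        mean_echo_power = np.mean(10.0 ** (-loss_db[in_band] / 10.0))
        return -10.0 * np.log10(mean_echo_power)

    freqs = np.arange(100, 8001, 100)
    flat = np.full(freqs.shape, 40.0)                   # 40 dB of echo loss everywhere
    tilted = np.where(freqs < 2000, 50.0, 37.0)         # much weaker loss above 2 kHz
    print(single_number_echo_loss(freqs, flat))         # 40.0 dB
    print(single_number_echo_loss(freqs, tilted))       # also about 40 dB, despite the tilt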
09:00 – 10:30 |
SESSION 5: Test methodologies: extensions, new parameters, test signals,
calibration
Coordinator: Slawek Zielinski (University of Surrey, UK) |
Alexander Raake (Deutsche Telekom Laboratories, Germany):
Conversational speech quality of spatialized audio conferences
In previous listening tests, the advantage of a spatialized over a non-spatialized
sound rendering in multiparty audio conferencing has been proven, for
example, in terms of a higher speech intelligibility, better speaker
identification, higher focal assurance (retaining who said what in the
conference) and user preference. However, only very few studies have
addressed the potential advantages of spatial audio in an actual
conversation situation. In this presentation, we describe a conversation
test method for assessing the speech quality of audio conferences with
remote interlocutors. The method is based on a set of realistic conversation
test scenarios: The first set aims at audio conferences held in a business
context, the second set at conferences held in a private or spare time
setting; at this stage, the test scenarios are applicable to conferences
with three interlocutors. The paper reports on the results of two
conversation test series carried out with the business set of the
conversation test scenarios ("3CTS", 3-party Conversation Test Scenarios).
The test results show a limited quality differentiation of spatialized
versus non-spatialized speech, and also of narrowband, wideband and fullband
speech (diotic or dichotic presentation). In our presentation, we analyze
the possible reasons for this observation based on different technical and
non-technical criteria. |
Thierry Etamé (France Telecom, France): Characterization of the
multidimensional perceptive space for current speech and sound codecs
The purpose of our work is to produce a reference system that can simulate
and calibrate degradations of speech and audio codecs which are currently
used on telecommunications networks, for subjective assessment tests of
voice quality. At first, 20 wideband codecs are evaluated through subjective
tests with the general goal of producing the multidimensional perceptive
space underlying the perception of current degradations. Then, from a
verbalization task, it appears that the identified attributes are
clear/muffle, high-frequency noise, noise on speech and hiss. Finally, these
dimensions are characterized with correlates such as spectral centroid,
spectral flatness measure, Mean Opinion Score and correlation coefficient. |
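As a minimal sketch of two of the signal-based correlates named above, spectral centroid and spectral flatness can be computed from a magnitude spectrum as follows; the framing and normalisation are common textbook choices and the test signals are arbitrary, so this is illustrative rather than the study's exact procedure.

    import numpy as np

    def spectral_centroid(signal, fs):
        """Amplitude-weighted mean frequency of the magnitude spectrum, in Hz."""
        spectrum = np.abs(np.fft.rfft(signal))
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
        return np.sum(freqs * spectrum) / np.sum(spectrum)

    def spectral_flatness(signal):
        """Ratio of geometric to arithmetic mean of the power spectrum:
        close to 1 for noise-like spectra, close to 0 for tonal ones."""
        power = np.abs(np.fft.rfft(signal)) ** 2 + 1e-12     # floor avoids log(0)
        return np.exp(np.mean(np.log(power))) / np.mean(power)

    fs = 16000
    t = np.arange(0, 0.5, 1.0 / fs)
    tone = np.sin(2 * np.pi * 1000 * t)          # tonal: flatness near 0, centroid near 1 kHz
    noise = np.random.randn(t.size)              # noise-like: flatness near 1
    print(spectral_centroid(tone, fs), spectral_flatness(tone))
    print(spectral_centroid(noise, fs), spectral_flatness(noise))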
Yu Jiao (University of Surrey, UK): Towards consistent assessment of audio quality of systems with different available bandwidth
Historically, different methods were used for the assessment of quality of
narrow-band speech, wide-band speech and broad-band audio signals.
Consequently, various assessment techniques were developed and
compartmentalised according to the bandwidth of associated applications. In
the near future, the distinction between audio systems based on their
bandwidth may become blurred and the boundaries between them may even be
completely removed, since new telecommunication systems will allow users to
transmit and reproduce not only speech but also music and sound effects.
In addition, systems will be capable of reproduction of binaural and
multichannel audio signals, making it possible to render accurate 3D audio
scenes. These developments pose new challenges for both objective and
subjective assessment of audio quality in a consistent manner and there is a
need for the development of new, more universal standards for audio
assessment.
In this presentation it will be shown that the traditional methods of
subjective speech quality assessment, such as the ones described in the
ITU-T P.800 Recommendation, could be combined with the methods that are
commonly used in audio quality assessment, e.g. the one standardised in
the ITU-R BS.1534 Recommendation. However, an important problem of defining
a fixed frame-of-reference has to be addressed in this new development,
which could be achieved by means of a direct anchoring technique. A live
demonstration of the computer interface based on the new method will be made
during the presentation. |
10:30 – 11:00 |
Coffee break |
11:00 – 12:30 |
Wrap-up session and conclusions
Coordinator: Jean-Yves Monfort (France Telecom, France) |