Reproduction of Auditorium Spatial Impression with Binaural and Stereophonic Sound Systems
Audio Engineering Society
Convention Paper 6485
Presented at the 118th Convention
2005 May 28–31
Barcelona, Spain
This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration
by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request
and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org.
All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the
Journal of the Audio Engineering Society.
Paolo Martignon¹, Andrea Azzali¹, Densil Cabrera², Andrea Capra¹, and Angelo Farina¹
¹ Industrial Engineering Department, Università di Parma, Via delle Scienze, 43100 Parma, Italy
paolo.martignon@inwind.it
² School of Architecture, Design Science and Planning, University of Sydney, Sydney, NSW 2006, Australia
densil@arch.usyd.edu.au
ABSTRACT
Binaural room impulse responses convolved with anechoic recordings are commonly used in auditorium acoustics
design and research. Binaural and stereophonic (O.R.T.F.) room impulse responses, which had been recorded in five
concert auditoria, were used in this study to test the spatial audio quality of four reproduction systems: conventional
stereophony, binaural headphones, stereo dipole, and double stereo dipole. Anechoic music, convolved with the
impulse responses, was reproduced over these systems. The systems were matched as closely as possible to each
other, and to the sound levels that would occur in the auditoria for the musical source. In a subjective test, subjects
rated the room size, sound source distance and realism of the reproduction. The stereo dipole and O.R.T.F.
stereophonic systems appear to work better than the headphone and double stereo dipole systems.
1. INTRODUCTION
Binaural audio recordings and binaural room impulse
responses convolved with anechoic recordings are
commonly used in auditorium and room acoustics
design and research. Without individualization, such
recordings and convolutions may be subject to
substantial spatial distortions when listened to using
headphones or other playback systems designed for
binaural signals. Since localization of sound around the
aural axis depends largely on the highly individual
acoustical filtering provided by pinnae, localization is a
primary aspect of this spatial distortion. Nevertheless,
non-individualized binaural recordings are very
convenient, in terms of being easy to obtain through
room acoustical measurement and computer simulation,
as well as from existing databases. Despite their
limitations, they can certainly be helpful in appreciating
the acoustical qualities of auditoria, at least in relative
Martignon et al.
Binaural and stereophonic systems
terms. This study examines three options for presenting
audio recordings from concert auditoria in binaural
format, as well as conventional stereophonic
presentation. It investigates the ability of the audio
reproduction formats to convey sound source distance
and room size in the context of concert auditoria, and
rates the subjectively assessed realism of the audio
formats.
1.1. Two-channel audio formats
This section summarizes key characteristics of the audio
formats considered in this research project.
1.1.1. Binaural techniques
Dummy head recordings and binaural simulations
record or predict the sound at the ears, which can then
be reproduced using headphones or other techniques
including cross-talk canceling loudspeaker systems. A
thorough review of binaural techniques, especially using
headphone presentation, is given by Møller [1]. He
summarizes the problems of binaural headphone
techniques as including localization errors around the
cones of confusion (and especially the difficulty in
establishing a frontally localized source), and a lack of
response of the system to head movements. While the
former of these problems can be solved using
individualization, and the latter using head-tracking, the
present paper is concerned with systems with neither
individualization nor head-tracking. Other authors cite
inside-the-head localization as a problem, but Møller et
al. [2] find no instances of this in tests using a carefully
calibrated non-individualized binaural headphone
system. Headphone equalization is probably the most
subtle key aspect of using a non-individualized binaural
headphone system: simply reproducing a dummy head
recording over unequalized headphones means that the
sound is subject to the manufacturer’s designed
frequency response (which is unlikely to be optimized
for binaural reproduction), and to the effects of both
the dummy head’s ear and the listener’s ear. One
solution involves compensating for the non-flat transfer
function between the headphones and the microphones
of the original dummy head used to make the
recordings. Møller et al. [2] find that the error in
auditory distance perception increases when using a
non-individualized headphone binaural system (compared
to individualized headphone binaural, and to natural
listening, for source distances of up to 5 m), but they did
not find a systematic shift in perceived distance.
Cross-talk cancellation provides an alternative to
headphones for presenting binaural recordings and
simulations. Originally proposed in the 1960s [3, 4], this
approach was famously used for auditorium acoustical
assessment by Schroeder et al. in 1974 [5]. This
technique reproduces the sound from the two ears of a
head (or model or simulation thereof) at the two ears of
a listener, using at least two loudspeakers. At a
specified head position, the cross-talk from the right
loudspeaker to left ear, and from the left loudspeaker to
right ear, is cancelled by signals from the
complementary loudspeaker. There are limits to this at
low frequencies, because inter-aural level differences
are naturally small or negligible. The short wavelengths
at high frequencies can make the listener’s head position
critical for effective operation. Cross-talk cancellation
also requires an absorbent acoustic environment to be
effective.
More recently, a refinement of cross-talk cancellation
known as the stereo dipole has been developed,
investigated and applied. This is a type of cross-talk
cancellation where the two loudspeakers are located
close together, so as to approximate co-located
monopole and dipole sources. Kirkeby et al. [6] find
that this configuration (with a 10º interval between
loudspeakers as seen by the listener) minimizes the
ringing artifacts in the cross-talk canceling filters, and
expands the area in which the cross-talk cancellation is
effective (allowing greater listener head movement, [cf.
7]). The cost of closely located sound sources is that the
low frequencies require a great boost, and so cross-talk
cancellation at low frequencies becomes very
inefficient. One solution to this problem is to have
greater separation between low frequency drivers than
high frequency drivers. Another solution is to institute a
cut-off frequency below which cross-talk cancellation is
abandoned, and the loudspeakers merely reproduce the
binaural channels without additional processing. The
present study, which uses stereo dipole, applies both of
these solutions.
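The low-frequency cost of closely spaced loudspeakers can be illustrated with a minimal sketch. It is not the filter design used in this study: it assumes two point sources and two point "ears" in a free field (no head or pinna model), a hypothetical ear spacing of 0.18 m, a speed of sound of 343 m/s, and a simple regularized inversion with an arbitrary regularization constant `beta`. All function names are illustrative. The 10º loudspeaker interval and the 1.5 m distance are taken from the text.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def plant_matrix(freq_hz, half_span_deg=5.0, distance_m=1.5, ear_spacing_m=0.18):
    """Return H[ear][speaker] for two point sources and two point 'ears'.

    Free-field model only: no head shadow, no pinna filtering (assumption).
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    theta = math.radians(half_span_deg)
    speakers = [(-distance_m * math.sin(theta), distance_m * math.cos(theta)),
                (distance_m * math.sin(theta), distance_m * math.cos(theta))]
    ears = [(-ear_spacing_m / 2.0, 0.0), (ear_spacing_m / 2.0, 0.0)]
    H = []
    for ex, ey in ears:
        row = []
        for sx, sy in speakers:
            r = math.hypot(sx - ex, sy - ey)           # path length
            row.append(cmath.exp(-1j * k * r) / r)     # spherical spreading
        H.append(row)
    return H


def matmul(A, B):
    """Product of two 2x2 complex matrices."""
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def hermitian(A):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]


def inverse(A):
    """Inverse of a 2x2 complex matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def crosstalk_canceller(H, beta=1e-6):
    """Regularized inverse C = (H^H H + beta I)^-1 H^H at one frequency."""
    Hh = hermitian(H)
    G = matmul(Hh, H)
    G[0][0] += beta
    G[1][1] += beta
    return matmul(inverse(G), Hh)


def peak_gain(freq_hz):
    """Largest filter magnitude required at one frequency."""
    C = crosstalk_canceller(plant_matrix(freq_hz))
    return max(abs(c) for row in C for c in row)


if __name__ == "__main__":
    for f in (100.0, 250.0, 1000.0, 2000.0):
        print(f"{f:6.0f} Hz: peak canceller gain {peak_gain(f):.2f}")
```

Under these assumptions the required gain grows sharply as frequency falls, because the two loudspeaker-to-ear paths become nearly identical and the matrix approaches singularity. This is consistent with abandoning cancellation below a cut-off frequency or widening the low-frequency driver spacing.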
One clear advantage of the stereo dipole technique over
binaural headphones is its ability to generate frontally
located auditory images. Having the loudspeakers at
what is probably the most important position for
localization appears to solve this problem. Another
related advantage is that, to the extent that the system
tolerates head movements, the sound field is not locked
to the head, and so localization may be able to benefit
from at least small head movements.
AES 118th Convention, Barcelona, Spain, 2005 May 28–31
The double stereo dipole is an extension of the simple
stereo dipole system, with both front and rear stereo
dipole loudspeaker pairs. This facilitates the impression
of sound coming from behind the listener. However,
the listener head position becomes critical for this
loudspeaker arrangement, because the desired
interference between front and rear stereo dipoles
occurs over a quarter of a wavelength.
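To give a sense of scale for the quarter-wavelength constraint mentioned above, the following short sketch computes a quarter of a wavelength at a few frequencies. The speed of sound (343 m/s) and the example frequencies are assumptions chosen for illustration, not values from the paper.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value


def quarter_wavelength_m(freq_hz, c=SPEED_OF_SOUND):
    """Quarter of the acoustic wavelength in metres at freq_hz."""
    return c / (4.0 * freq_hz)


if __name__ == "__main__":
    for f in (250.0, 1000.0, 4000.0):  # illustrative frequencies
        print(f"{f:6.0f} Hz: {100.0 * quarter_wavelength_m(f):.1f} cm")
```

At 1 kHz this distance is under 9 cm, which suggests why small displacements of the head matter for the front and rear pairs to interfere as intended.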
1.1.2. Conventional stereophony
Conventional two-channel stereophony is perhaps not
used at all in auditorium acoustics research. However,
it is very commonly used in music reproduction for
entertainment purposes, and there are innumerable
recordings of musical performances in auditoria made
using various stereophonic microphone techniques. The
present study uses the O.R.T.F. stereophonic
microphone array, consisting of two cardioid
microphones separated by 17 cm and by an angle of
110º. In a comparison of various stereophonic
microphone arrays, Hugonnet and Jouhaneau [8] find
that coincident techniques (such as XY and MS) yield
the most accurate lateral localization, while closely
spaced techniques (including O.R.T.F.) yield the finest
distance discrimination. In another comparison, Ceoen
[9] found a subjective preference for recordings made
using the O.R.T.F. system (these were recordings of an
orchestra in an auditorium), and this preference appears
to be due to the configuration’s ability to convey the
spatial impression of the auditorium [10].
2. METHOD
2.1. Auditoria and impulse response measurements
This study exploits a collection of auditorium impulse
responses previously made by Farina and colleagues
[11]. The key characteristic of the selected impulse
responses is that the same equipment and procedure were
used in each case, with the signal gain structures fully
documented. Measurements had been made using a
dodecahedron loudspeaker plus a subwoofer as the
sound source on stage. The test signal was an
exponential swept sine wave. Equalization had been
applied to this signal for a constant spatially averaged
output power from the loudspeaker. A Neumann KU70
dummy head was used as the binaural microphone, with
a pair of Neumann AK40 cardioid microphones in the
O.R.T.F. configuration for two channel stereophonic
recording. In addition, a Soundfield B-format
microphone, which includes an omnidirectional output
channel, was on a boom 1 m ahead of the dummy head.
This configuration and method is described in more
detail by Farina and Ayalon [11].
The five auditoria used in this study were the large,
medium and small halls in Rome’s Parco della Musica,
Parma’s Auditorium Paganini, and Kirishima’s Miyama
Conseru in Japan. Two receiver positions were chosen
for each auditorium. In every case, the receiver was on
the longitudinal axis of symmetry of the auditorium, and
the source 1 m off this axis, on the stage.
Room acoustical parameters were extracted from the
selected impulse responses. These included
reverberation time (T30), early decay time, clarity index
(C80), speech transmission index, bass ratio, treble
ratio, lateral fraction, and inter-aural cross correlation
coefficient (IACC). Octave band values were
transformed to single number values using the
recommendations in ISO 3382 [12]. Strength factor (G)
was not determined, but the reproduced sound pressure
level (Leq) of each stimulus (see below) was.
2.2. Listening room and apparatus
The listening room floor was 4.5 m × 3.2 m, with a
ceiling height of 4.2 m. Sound absorbing panels were
attached to most of the wall space up to a height of 2 m.
Absorbers were also suspended near the ceiling, and
placed on the floor. Materials likely to absorb low
frequency sound (such as cardboard panels and boxes)
were included in the room acoustical absorption. The
measured mid-frequency reverberation time (using the
experiment loudspeakers as sources, and dummy head
in the subject’s position as receiver) was 0.2 s, with an
increase in reverberation time in the low frequency range.
Background noise level, with the audio equipment
operating, was measured at NCB 25 [13].
The axis of symmetry of the loudspeaker array was not
aligned with the room, nor was the listening position in
the room’s center. Loudspeakers were at a distance of
1.5 m from the listening position. Prototype Audiolink
AL105 loudspeakers were used for the conventional
stereophonic pair, ±30º from the median line of
symmetry. Genelec S30D reference studio monitors
were used for the front stereo dipole, on their sides so
that the tweeters were 22 cm apart, the mid-range
drivers 43 cm apart, and the woofers 83 cm apart
(measuring between driver centres). This corresponds
to respective angles of 4º, 8º, and 16º from the median
line of symmetry (the angle seen by the subject between
loudspeakers is double these values). The rear stereo
dipole pair had QSC AD-S82H loudspeakers, with
driver centers separated by 45 cm, corresponding to a 9º
angle from the midline.
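The relationship between driver separation and angle from the median line can be checked with a short sketch. It assumes that the 1.5 m distance is measured from the listening position to each driver, so that the half-angle is the arcsine of half the separation divided by the distance; the paper does not state this convention, and the function name is illustrative.

```python
import math


def half_angle_deg(separation_m, distance_m=1.5):
    """Angle from the median line to one driver of a symmetric pair.

    Assumes distance_m is the listener-to-driver distance (hypothetical
    convention), so sin(angle) = (separation / 2) / distance.
    """
    return math.degrees(math.asin((separation_m / 2.0) / distance_m))


if __name__ == "__main__":
    pairs = [("front tweeters", 0.22), ("front mid-range", 0.43),
             ("front woofers", 0.83), ("rear pair", 0.45)]
    for name, sep in pairs:
        angle = half_angle_deg(sep)
        print(f"{name}: {angle:.1f} deg from midline, {2 * angle:.1f} deg between drivers")
```

Under this assumption the rounded results agree with the angles quoted in the text. Measuring the 1.5 m along the median line instead (arctangent) changes the values by well under one degree.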
Although different loudspeaker models were used, the
frequency responses of all systems were matched using
4096 tap inverse filters between 100 Hz and 20 kHz,
developed using the algorithm of Kirkeby et al. [14].
One point in favour of this system matching was that the
audio content of the experiment was undemanding on
the loudspeakers, having little low frequency content
and requiring only modest sound pressure levels at the
listening position. Specifically, inverse filters were
designed: (i) for the conventional stereophonic system
to flatten the frequency response to an omnidirectional
measurement microphone at the listener position; (ii) for
the headphones to flatten the frequency response from
the headphones to the dummy head; and (iii) for the
stereo dipole systems, to provide cross-talk cancellation
from 250 Hz and a flat frequency response between the
binaural channels and dummy head (in the listening
position) from 100 Hz.
Although the room had windows, they were almost
entirely covered with opaque panels, so that the
experiment was conducted in the light of the computer
monitor, with just a little additional ambient light. Most
of the surfaces in the room, at least below a height of
2 m, were dark grey or black, and little other than the
experiment computer display was visible to a subject
once their eyes had adapted to the computer monitor.
Figure 1 Sketch of the listening room configuration.
2.3. Stimulus generation
A calibrated anechoic recording was used in this project
so that the reproduced sound pressure levels could be
realistic. This was of a piano accordion, with a
measurement microphone at a distance of 2.5 m directly
in front of the performer. The music was “La ballata di
Michè” (“Miky’s Ballad”), by Fabrizio de Andrè: a
waltz, with a legato melody and articulated
accompaniment. The octave band sound pressure levels
of the source, normalised to 1 m, are shown in Figure 2.
The A-weighted Leq of the piano accordion normalized
to 1 m is 80 dB(A). The recording was approximately
45 seconds in duration.
Figure 2 Octave band equivalent sound pressure level of
the accordion, normalized to a microphone distance of
1 m.
Impulse responses created using a dodecahedron
loudspeaker are not ideal for use in listening
experiments (convolved with anechoic recordings).
Typical sound sources, such as individual musical
instruments or a human voice, are usually directional,
rather than omnidirectional. An omnidirectional source
will yield a lower direct-to-reverberant energy ratio than
a source directed to the listener in an auditorium,
resulting in reduced clarity for the listener. A second
limitation of dodecahedral loudspeakers is that their
sensitivity as a function of frequency and radiation
angle varies substantially due to interference between
the twelve drivers. At high frequencies, the individual
drivers also have their own directivity, resulting in uneven
sound radiation. The duration of an anechoic impulse
response from a dodecahedral array is long, determined
by the size of the dodecahedron. Although the room
impulse responses used in this study were made with a
dodecahedral loudspeaker (plus subwoofer), some
attempt was made to address these problems. Firstly,
the spatially averaged spectral irregularity of the
loudspeaker was compensated for by equalising the
measurement signal (as mentioned previously). This is
probably an adequate solution for all but the direct
sound. Secondly, the direct sound was addressed by
substituting the measured direct impulse with an ideal
direct impulse. In the case of the O.R.T.F. impulse
responses, this ideal signal was simply a single sample
impulse, which has an almost flat frequency response up
to the Nyquist frequency. For the dummy head the
signal was the 0º anechoic impulse response for that
dummy head. The direct sound of each room impulse
response was measured, using a 256-sample fast Fourier
transform (Blackman-Harris window, sampling rate of
48 kHz) centered on the first major peak in the impulse
response. The 256 sample ideal signals (with the
impulse peak at the 129th sample) were substituted for
the direct sound, scaled to have the same acoustic
energy as the original 256 samples (measured at 500
Hz). The remaining part of the room impulse responses,
consisting of early reflections and reverberant decay,
was attenuated by 3 dB relative to the direct sound,
thereby producing a simplistic approximation of a sound
source with a directivity index of 3 dB facing the
listening position.
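The direct-sound substitution described above can be sketched as follows. This is a simplified illustration rather than the authors' processing chain: it takes the largest absolute sample as the "first major peak", matches broadband energy instead of energy measured at 500 Hz, and uses illustrative function names. The 256-sample window, the peak at the 129th sample, and the 3 dB attenuation of the remainder follow the text.

```python
import math


def unit_impulse(n=256, peak_index=128):
    """Single-sample impulse with its peak at the 129th sample (index 128)."""
    signal = [0.0] * n
    signal[peak_index] = 1.0
    return signal


def substitute_direct_sound(ir, ideal, tail_attenuation_db=3.0):
    """Replace the direct sound of ir with an energy-matched ideal signal.

    Simplifications (assumptions, not from the paper): the direct sound is
    located at the largest absolute sample, and energy is matched broadband.
    """
    n = len(ideal)
    peak = max(range(len(ir)), key=lambda i: abs(ir[i]))
    start = peak - n // 2  # aligns the measured peak with index n // 2
    if start < 0 or start + n > len(ir):
        raise ValueError("window around the direct sound exceeds the response")

    window = ir[start:start + n]
    e_orig = sum(x * x for x in window)
    e_ideal = sum(x * x for x in ideal)
    scale = math.sqrt(e_orig / e_ideal)  # equal energy in the window

    gain = 10.0 ** (-tail_attenuation_db / 20.0)  # attenuate everything else
    out = [x * gain for x in ir]
    out[start:start + n] = [x * scale for x in ideal]
    return out


if __name__ == "__main__":
    # Toy response: direct sound at sample 300, one reflection at sample 600.
    ir = [0.0] * 1000
    ir[300], ir[301], ir[600] = 1.0, -0.5, 0.2
    edited = substitute_direct_sound(ir, unit_impulse())
    print(edited[300], edited[600])
```

For the dummy head case, `ideal` would be that head's 0º anechoic impulse response rather than a unit impulse, with one call per ear.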
The edited impulse responses (both O.R.T.F. and dummy
head) were convolved with the anechoic recording of
piano accordion, at a constant gain. In order to calibrate
the gains of the playback systems in the listening room,
a 500 Hz octave band noise signal was created with a
known level difference to the anechoic recording
microphone calibration tone. This was convolved with
the direct impulse only of one of the auditorium situations
(O.R.T.F. format) using the same processing gain
structure as for the music convolutions. The reproduced
sound pressure level of the stereophonic loudspeaker
system was adjusted to match that predicted by the
source-receiver distance in the auditorium (assuming
direct sound only). This established the playback gain
structure for the stereophonic system, such that the
speech and accordion were reproduced in the listening
room at approximately the same sound pressure levels
as would have occurred in the auditoria.
Verification of the impulse response relative calibration
was done by examining the relationship between the
direct sound level and source-receiver distance.
Notwithstanding effects of very early reflections,
dissipation of acoustic energy in the air, and variation in
loudspeaker directivity (depending on its orientation),
the direct sound pressure level at the receiving position
should follow the free field ideal of -6 dB per doubling
of distance. Consistency with this principle was
examined at 500 Hz (where air dissipation should be
negligible, and the loudspeaker omnidirectional), as
illustrated in Figure 3. There is general agreement
between measurement and theory, with an rms error of
less than 1 dB, but deviations of up to 2 dB.
Figure 3 Comparison between theoretical free field and
measured sound levels for various receiver positions in
the five auditoria, at 500 Hz.
While the gains of the three binaural playback systems
could be matched simply by dummy head
measurements at the listening position, there is, to some
extent, an arbitrary relationship between the
stereophonic and binaural system gains, because their
spatial sensitivity is different, and spatial sensitivity
varies substantially with frequency in the case of the
binaural system. It is certainly possible to match the
microphone systems for free field sensitivity, or for
diffuse field sensitivity, but these results are quite
different, and in an auditorium the sound-field is at
neither of these extremes. Therefore a simple approach
to microphone system matching was taken in the
playback system, such that the mean broadband sound
pressure level difference of equivalent recordings (room
impulse responses convolved with anechoic speech or
accordion) was 0 dB (standard deviation of 1.2 dB).
Having some stimuli with somewhat greater or lesser
sound pressure levels over the binaural systems (relative
to the stereo system) could influence the subjective
parameters investigated, and as such was considered to
be a useful component in the subjective comparison
between these systems.