(Continuation of Question 8/16) Motivation
Voice activity detection (VAD) is widely used in telecommunications networks
as a means of differentiating between wanted and unwanted in-band audio signals,
for example, to obtain trunking efficiency in circuit multiplication equipment
or to ensure correct operation of echo control and other signal enhancement
devices.
The proposal for generic sound activity detection (GSAD) is motivated by two
problems:
- With rapid changes in the telecommunication network environment, more and
more multimedia services are being provided. Although the network is evolving
from a voice to a multimedia network, most VAD algorithms are still mainly
designed to handle voice signals and cannot work properly in the presence of
rich audio signals, which include voice, music, background environmental noise,
information tones, etc.
- Historically, VAD algorithms have been developed separately for individual
network elements and applications, and there are currently numerous VAD
algorithms. However, they are based on different principles, which makes it
difficult to provide common performance enhancements across all VADs.
Therefore, it is beneficial to develop a generic sound (rather than voice)
activity detector, which can be applied across a range of applications. The
benefits from a standardised GSAD are:
- Enhanced performance to deal with new types of in-band audio signals
- Reduced development time and cost for new equipment requiring sound activity
detection, e.g. codecs, circuit multiplication equipment, echo control, signal
enhancement devices, VoIP gateways, terminal adapters, etc.
- Opportunity for use in existing speech and audio coders which do not include
VAD
Study items
Study items to be considered include, but are not limited to:
- Definition and classification of applications and associated performance
requirements for generic sound activity detection
- Definition of algorithm(s) suitable for generic sound activity detection
meeting the applications and performance requirements
- Definition of the test conditions and evaluation procedures to be applied in
selecting between candidate algorithms on the basis of objective and subjective
performance, in conjunction with SG 12
- Selection and specification of procedures to be used in verifying the
implementation of the selected algorithm or algorithms
- Considerations on how to help measure and mitigate climate change
Tasks
Tasks include, but are not limited to:
- Develop Terms of Reference for GSAD algorithms for different applications
- Assist SG 12 in developing new Recommendations on testing methodologies
- Solicit proposals and conduct selection test(s) for candidate algorithms to
meet these Terms of Reference
- Develop new Recommendation(s) based on the outcome of the(se) selection test(s)
An up-to-date status of work under this Question is found in the SG 16 work
programme (http://itu.int/ITU-T/workprog/wp_search.aspx?isn_sg=554).
Relationships
Recommendations:
- G.700-series speech and audio coding Recommendations
- G.76X-series circuit multiplication Recommendations
- G.799.X-series voice over IP gateway Recommendations
- G.16X-series speech enhancement Recommendations
- P.800-series methods for objective and subjective assessment of quality
Recommendations
- Q.115.x-series protocols for the control of signal processing network elements
and functions
Questions:
- 7, 9, 10/16 on speech and audio coding
- 14, 15, 16, 18/16 on network signal processing
Study Groups:
- ITU-T SG 2 to identify other potential user applications
- ITU-T SG 9 on applications in digital cable systems and IPTV
- ITU-T SG 11 on signalling requirements and protocols
- ITU-T SG 12 on speech and audio quality evaluation of specified algorithms
- ITU-T SG 13 on NGN and on speech and audio coding in IMT
- ITU-R SG 5 to ensure compatibility with mobile transmission system constraints
Other Bodies:
- 3GPP, 3GPP2
- ETSI TISPAN
- IETF
- TIA