News

For the latest information on MPEG-H Audio, please visit the Audioblog website: https://www.audioblog.iis.fraunhofer.com/tag/mpeg-h

FAQ

What is MPEG-H Audio?

MPEG-H Audio is a new, next-generation audio technology providing more realism through sound from above as well as around the listener. With its unique personalization features, MPEG-H Audio offers viewers great flexibility to actively engage with the content and adapt it to their own preferences. Regardless of the device, the MPEG-H Audio System delivers the best sound experience possible.

What's the benefit of MPEG-H Audio compared to legacy audio codecs?

MPEG-H Audio is a complete audio solution and much more than just a codec. Among other things, it offers the following major advantages compared to legacy audio codecs:

1) Immersive Sound: MPEG-H Audio allows the transmission of three-dimensional immersive audio (3D-audio) by adding elevated sound sources above and below the listener's position. MPEG-H Audio has been specifically designed for flexible loudspeaker signaling including traditional layouts such as stereo, 5.1 and 7.1, as well as 3D configurations such as 5.1+4H, 7.1+4H or 22.2, and even layouts yet to be defined. Within MPEG-H Audio, immersive sound can be carried as channels, objects or as a combination of those.

2) Interactive and Personalized Sound: MPEG-H Audio enables the listener to interact with the content and create personalized audio experiences. The advanced interactivity options range from simple adjustments, for example, increasing or decreasing the dialogue level in relation to other audio elements, to advanced scenarios in which audio elements may be selected and adjusted in level and/or position as preferred by the listener and under the limits authored by the content creator. 

3) Universal Delivery: MPEG-H offers flexibility by delivering the same bit stream through different distribution platforms (e.g., terrestrial, satellite, broadband or mobile networks) to all types of devices (e.g., TV set, AVR, soundbar, set-top box, tablet, virtual reality gear with 360-degree video) in various environments, for example, living room, home theater, or noisy mobile environments.

What is the MPEG-H Audio standard?

MPEG-H Audio is an international standard developed by the ISO/IEC Moving Picture Experts Group (MPEG), the organisation which has a long history in audio coding with mp3 and the AAC codec family. The MPEG-H Audio standard (ISO/IEC 23008-3) specifies two relevant profiles – Low Complexity (LC) and Baseline (BL) – essential for the broadcast and streaming industry, which allow decoding and rendering of immersive, 3D-audio content while enabling advanced personalization features. Audio objects may be used alone or in combination with channels for efficient delivery and reproduction of immersive sound. The use of these audio objects allows for interactivity or personalization of a program by adjusting the gain or position of the objects during playback. Details about the MPEG-H Audio standard can be found here.

Which audio codec is used for MPEG-H Audio?

MPEG-H Audio is a complete audio solution. It does not use other audio codecs; instead, its codec functionality builds upon the developments from previous generations of MPEG audio codecs such as the AAC codec family.

What are the use cases of MPEG-H Audio?

MPEG-H Audio enriches the audio experience by combining immersive sound and advanced personalization options with bit rate efficient, universal delivery to meet today's consumer needs.

The MPEG-H Audio System has proven to be the most advanced audio solution for enhancing the broadcast and streaming services for sport events, empowering the audience to experience the emotion of the sports arena in their living room and to decide what is more important for themselves, for example, listening only to the crowd of their favorite team or focusing on the commentary. Read more here and here

As with sport events, streaming of live concerts is another major use case where service providers are eager to enhance their services with immersive sound and interactivity options. Read more here and here

The advanced accessibility features of the MPEG-H Audio system are essential for the elderly and visually or hearing impaired audience. With its Dialog Enhancement and advanced Audio Description Services, MPEG-H Audio makes broadcast audio more accessible for all viewers.

What standards currently support MPEG-H Audio?

MPEG-H was adopted in several broadcast, streaming and virtual reality standards. A list can be found at the end of this page.

What MPEG-H Audio content and services are available?

MPEG-H Audio powers the music format 360 Reality Audio, initiated by Sony. The first 360 Reality Audio immersive music streaming services from Amazon Music HD, Deezer, nugs.net, Sony Select and TIDAL launched in fall 2019, with more than 3000 songs currently available. Major labels supporting the 360RA initiative include Sony Music Entertainment, Universal Music, and Warner Music.

The MPEG-H Audio System is used as the sole audio system in the world’s first terrestrial UHD TV service in South Korea. The system launched in May 2017, and commercial services from KBS, MBC and SBS have been on air 24/7 since then.

What available end-consumer devices support MPEG-H Audio?

A growing number of devices support MPEG-H Audio, like the Sennheiser Ambeo sound bar, the Amazon Echo Studio smart speaker or the Google ChromeCast Ultra 4K, as well as TV sets from Samsung and LG for the UHD TV service in South Korea.

What is the bitrate of an MPEG-H program?

Because of the flexibility of MPEG-H Audio when it comes to signal configurations, there is no simple answer to that question, as the bitrate depends on the number of signals (channel signals or object signals). With an increasing number of signals in a configuration, the efficiency of the codec increases and the resulting total bitrate is smaller than the sum of single-encoded signals. 

The following table indicates bitrates for some common channel configurations and for combinations of channel and object signals, starting with stereo and 5.1 surround and continuing to several 3D configurations (indicated by "H" for the height channels) and combinations of 3D channel configurations and different numbers of object signals.

All given examples use a total of 16 or fewer signals, which is covered by "Level 3" in the MPEG-H Audio standard, except for the last configuration, "22.2", which is covered by "Level 4".

Bit rates in kbit/s for                   Good         Excellent    Transparent
2.0                                       48           64           96
5.1                                       128          192          256
5.1+2H                                    160          256          320
5.1+4H                                    192          320          448
7.1+4H / 5.1+4H + 2 Objects               256 - 288    384 - 420    512 - 576
7.1+4H + 3 Objects / 5.1+4H + 5 Objects   352 - 384    480 - 576    640 - 768
22.2                                      512          768          1024

Scale according to MUSHRA, Recommendation ITU-R BS.1534-3
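
As a worked example of the efficiency statement above, the following sketch divides the table's bit rates by the number of signals in each channel-only configuration. The signal counts are an assumption (every channel, including LFE channels, is counted as one signal), and the function name is purely illustrative.

```python
# Bit rates in kbit/s copied from the channel-only rows of the table above.
BITRATES = {
    "2.0":    {"good": 48,  "excellent": 64,  "transparent": 96},
    "5.1":    {"good": 128, "excellent": 192, "transparent": 256},
    "5.1+2H": {"good": 160, "excellent": 256, "transparent": 320},
    "5.1+4H": {"good": 192, "excellent": 320, "transparent": 448},
    "22.2":   {"good": 512, "excellent": 768, "transparent": 1024},
}

# Assumption: every channel, including LFE channels, counts as one signal.
SIGNALS = {"2.0": 2, "5.1": 6, "5.1+2H": 8, "5.1+4H": 10, "22.2": 24}


def kbps_per_signal(layout: str, quality: str) -> float:
    """Average bit rate per signal for one table entry."""
    return BITRATES[layout][quality] / SIGNALS[layout]


for layout in BITRATES:
    rate = kbps_per_signal(layout, "good")
    print(f"{layout:7s} {rate:5.1f} kbit/s per signal")
```

Under this counting, the "Good" rate per signal drops from 24 kbit/s for stereo to 19.2 kbit/s for 5.1+4H.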

Can the MPEG-H stream be multiplexed alongside AAC streams?

Existing broadcast services that use AAC/HE-AAC stereo or surround audio can be enhanced with the advanced MPEG-H Audio features by simply adding an MPEG-H Audio stream to the multiplex. All audio and video broadcast encoders that support MPEG-H Audio can create a multiplex containing the AAC stream as well as the MPEG-H Audio stream. The former can be decoded by legacy receivers, and the latter will be decoded by newer receivers.
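
The receiver-side decision can be pictured with a small sketch. It only models the selection logic described above. The class, the codec labels and the preference order are hypothetical, and real receivers select streams based on the signaling of the transport layer in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioStream:
    codec: str   # illustrative labels: "aac", "he-aac", "mpeg-h"
    label: str


def pick_stream(streams, supported_codecs,
                preferred=("mpeg-h", "he-aac", "aac")):
    """Return the most preferred stream the receiver can decode, or None."""
    for codec in preferred:
        if codec not in supported_codecs:
            continue
        for stream in streams:
            if stream.codec == codec:
                return stream
    return None


# One multiplex carrying both a legacy AAC stream and an MPEG-H stream.
multiplex = [AudioStream("aac", "stereo"), AudioStream("mpeg-h", "immersive")]

legacy = pick_stream(multiplex, {"aac"})            # legacy receiver
modern = pick_stream(multiplex, {"aac", "mpeg-h"})  # newer receiver
print(legacy.label, modern.label)
```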

How can a user interact with audio elements inside an MPEG-H Audio stream?

MPEG-H Audio enabled devices natively offer a "User Interface" that displays all the interactivity options enabled by an MPEG-H stream. Depending on the content creator's intentions, each MPEG-H stream may offer different interactivity options to viewers at home, who can use the User Interface to personalise their content.

How do the encoder and decoder know what options are embedded in the MPEG-H Audio stream?

An MPEG-H Audio scene comprises the audio content itself together with additional metadata. This metadata is created during production and contains all necessary information to render the audio content in arbitrary reproduction layouts and to ensure the best audio experience on any platform.

How do I ensure the integrity of metadata during production?

MPEG-H Audio has been carefully designed for enhancing broadcast, streaming and immersive music applications. To ensure the integrity of metadata in an SDI-based environment at any production step, the metadata is delivered in the "Control Track". The Control Track is a "time-code like" audio signal and can be treated as a regular audio channel. This ensures the synchronization of metadata with its corresponding audio and video signals. The Control Track is robust enough to survive A/D and D/A conversions, level changes, sample rate conversions or frame-wise editing. The Control Track does not force audio equipment to be put into data mode or non-audio mode in order to pass through.

What is the MPEG-H Control Track?

The MPEG-H Control Track is a unique solution for delivering the metadata aligned with the audio and video data through existing SDI-based infrastructures. The Control Track is a "time-code like" PCM audio signal that can be carried on an extra SDI or WAV-file channel. It can be edited in a video editor just as any other audio signal.

It allows transport of the metadata tightly coupled with the audio content over any medium offering transport of PCM data, such as SDI, MADI, or AoIP. The Control Track can be treated like any other audio signal and is robust against sample rate conversions or level changes. The metadata contained in the Control Track is aligned to the audio and video data, thus any configuration change in live or post production can be applied at every video frame boundary.
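
To give a feeling for what "every video frame boundary" means in audio terms, the sketch below computes how many audio samples fall into one video frame and which frame boundary follows a given sample position. The sample rate and frame rates are assumed example values. How the Control Track itself encodes this alignment is not described here.

```python
import math
from fractions import Fraction


def samples_per_video_frame(sample_rate: int, frame_rate: Fraction) -> Fraction:
    """Exact number of audio samples per video frame (may be fractional)."""
    return Fraction(sample_rate) / frame_rate


def next_frame_boundary(sample_index: int, sample_rate: int,
                        frame_rate: Fraction) -> int:
    """Number of the first video frame starting at or after sample_index."""
    spf = samples_per_video_frame(sample_rate, frame_rate)
    return math.ceil(Fraction(sample_index) / spf)


# Assumed example values: 48 kHz audio with 25 fps and 29.97 fps video.
print(samples_per_video_frame(48000, Fraction(25)))           # whole number
print(samples_per_video_frame(48000, Fraction(30000, 1001)))  # fractional
print(next_frame_boundary(5000, 48000, Fraction(25)))
```

With 25 fps video, one frame spans 1920 samples, so a change requested at sample 5000 would take effect at frame 3 (sample 5760) in this model.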

What is the MPEG-H Production Format?

The MPEG-H Production Format (MPF) is a multi-channel PCM audio file which contains all the audio content and production metadata of the MPEG-H Audio scene. The metadata is stored as a Control Track, which is a timecode-like PCM audio signal and one of the audio tracks in the multichannel wave-file.
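
Because an MPF file is a multichannel PCM wave-file, its tracks can be separated with ordinary WAV tooling. The sketch below builds a small synthetic 16-bit file in memory and extracts one channel from it. It is not an MPF reader: which channel carries the Control Track is not stated here, so the channel index is a hypothetical parameter, and real files may use other bit depths.

```python
import io
import struct
import wave


def write_test_wav(num_channels: int, num_frames: int,
                   sample_rate: int = 48000) -> bytes:
    """Build a 16-bit PCM WAV in memory; channel c holds the constant value c."""
    frame = struct.pack(f"<{num_channels}h", *range(num_channels))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(num_channels)
        writer.setsampwidth(2)
        writer.setframerate(sample_rate)
        writer.writeframes(frame * num_frames)
    return buf.getvalue()


def extract_channel(wav_bytes: bytes, channel: int) -> list[int]:
    """De-interleave one channel from a 16-bit PCM WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        num_channels = reader.getnchannels()
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        raw = reader.readframes(reader.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return list(samples[channel::num_channels])


wav_bytes = write_test_wav(num_channels=6, num_frames=4)
print(extract_channel(wav_bytes, channel=5))  # hypothetical track index
```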

What is the Audio Definition Model (ADM)?

The Audio Definition Model (ADM) according to ITU-R BS.2076 defines an open metadata format for production, exchange and archiving of next-generation audio (NGA) content in file-based workflows. Its comprehensive metadata syntax allows describing many types of audio content including channel-, object-, and scene-based representations for immersive and interactive audio experiences. A serial representation of the Audio Definition Model (S-ADM) is specified in ITU-R BS.2125 and defines a segmentation of the original ADM for use in linear workflows such as real-time production for broadcasting and streaming applications.
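
The following sketch shows what reading such metadata can look like. The XML fragment is heavily simplified and not a conformant ADM document. It only uses a few element and attribute names from ITU-R BS.2076 (audioProgramme, audioContent, audioObject), and the IDs and names in it are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, non-conformant fragment for illustration only.
ADM_FRAGMENT = """
<audioFormatExtended>
  <audioProgramme audioProgrammeID="APR_1001" audioProgrammeName="Main">
    <audioContentIDRef>ACO_1001</audioContentIDRef>
  </audioProgramme>
  <audioContent audioContentID="ACO_1001" audioContentName="Dialogue">
    <audioObjectIDRef>AO_1001</audioObjectIDRef>
  </audioContent>
  <audioObject audioObjectID="AO_1001" audioObjectName="Commentary"/>
</audioFormatExtended>
"""


def list_objects(xml_text: str) -> dict:
    """Map each audioObject ID to its name."""
    root = ET.fromstring(xml_text)
    return {
        obj.get("audioObjectID"): obj.get("audioObjectName")
        for obj in root.iter("audioObject")
    }


print(list_objects(ADM_FRAGMENT))
```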

What is the MPEG-H ADM Profile?

The MPEG-H ADM Profile defines constraints on ITU-R BS.2076 and ITU-R BS.2125 that enable interoperability with established NGA content production and distribution systems for MPEG-H Audio as defined in ISO/IEC 23008-3. 

The freely available Fraunhofer ADM Info Tool is a software utility that provides support in creating profile-conformant ADM metadata. Its conformance check framework runs input ADM metadata against an exhaustive set of checks derived from the MPEG-H ADM Profile, gathering detailed reports of any encountered conformance issues and providing information on how to resolve them.

Is there an automatic conversion of Dolby Atmos content to MPEG-H?

With the MPEG-H Conversion Tool, Fraunhofer offers a simple one-click solution for converting existing Dolby Atmos BWF/ADM files into the MPEG-H Production Format. The tool is available to partners under an evaluation agreement. Please contact Fraunhofer for more information.

Where can I get MPEG-H Audio Production Tools?

Fraunhofer IIS offers Production Tools, bundled in the MPEG-H Authoring Suite. The Suite consists of the MPEG-H Authoring Plug-in (MHAPi) version 3.5 and the standalone MPEG-H Authoring Tool (MHAT) version 3.5. 

Register here for a download of the MPEG-H Authoring Suite

Other options for producing MPEG-H include the New Audio Technology Spatial Audio Designer and Blackmagic DaVinci Resolve Studio for post-production workflows, as well as the Linear Acoustic AMS and the Jünger MMA Hardware for live production with MPEG-H Audio.

What can I do with the freely available MPEG-H Authoring Suite (MAS)?

The MPEG-H Authoring Suite (MAS) is a set of tools that make the production of MPEG-H Audio content easier, faster, more intuitive, and more powerful. They support the recently published MPEG-H ADM Profile, as well as binaural monitoring for immersive audio reproduction over headphones. 

The MPEG-H Authoring Plug-in (MHAPi) takes you through all the steps of creating object- or channel-based MPEG-H Audio productions inside a VST3- or AAX-enabled digital audio workstation (DAW). You will be able to export your immersive and interactive MPEG-H Audio scenes to either MPEG-H Production Format (MPF) or MPEG-H BWF/ADM, containing audio and metadata and ready for distribution via MPEG-H-enabled channels. 

The MPEG-H Authoring Tool (MHAT) is a new software tool for Mac and Windows that helps you create MPEG-H metadata with existing audio material. The MHAT allows for easy MPEG-H authoring without the need of a digital audio workstation (DAW). You can define specific MPEG-H parameters, instantly listen to your configurations and export your authored mixes as MPEG-H Production Format (MPF), MPEG-H BWF/ADM or as a template export in an XML file. 

What is required to enable object-based production with MPEG-H in existing production workflows?

Object-based production requires a metadata authoring step for the object-based interactivity and accessibility features as well as for loudness measurement. No single answer fits all production environments and requirements. Instead, there is a range of typical workflows, from simple, automated or preset-based authoring that fits the most common content types up to comprehensive authoring workflows for advanced applications. See here for more information.

What is the "MPEG-H Audio authoring step"?

The MPEG-H Audio System has been designed such that content creators can define multiple presets and explore new creative options. A broadcaster can prepare mixes (including the default or main mix of the program) using authoring tools that specify an ensemble of gain and position settings for objects to create preset mix selections that can be presented on a simple menu to the user. Even more control of the audio elements in a program is possible and can be enabled in the "advanced MPEG-H Audio interactivity menu" by enthusiast viewers. All interactivity features offered to the user are strictly defined by the broadcaster during metadata creation. This process of generating metadata is called "authoring" and is the most important difference in production of MPEG-H Audio content compared to a legacy production.
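
The idea of authored presets and authored limits can be illustrated with a minimal sketch. The data model below is hypothetical and is not the MPEG-H metadata format. It only shows that a preset is a set of per-object gain settings and that any user adjustment is clamped to the range the content creator allowed.

```python
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    min_gain_db: float = 0.0   # lower limit set during authoring
    max_gain_db: float = 0.0   # upper limit set during authoring
    gain_db: float = 0.0       # current playback gain

    def set_user_gain(self, requested_db: float) -> float:
        """Apply a requested gain, clamped to the authored limits."""
        self.gain_db = max(self.min_gain_db,
                           min(self.max_gain_db, requested_db))
        return self.gain_db


def apply_preset(objects: dict, preset: dict) -> None:
    """A preset maps object names to gains in dB; limits still apply."""
    for name, gain_db in preset.items():
        objects[name].set_user_gain(gain_db)


# Hypothetical scene and presets for a sports broadcast.
objects = {
    "dialogue": AudioObject("dialogue", min_gain_db=-6.0, max_gain_db=9.0),
    "crowd": AudioObject("crowd", min_gain_db=-12.0, max_gain_db=3.0),
}
PRESETS = {
    "default": {"dialogue": 0.0, "crowd": 0.0},
    "dialogue_plus": {"dialogue": 6.0, "crowd": -3.0},
}

apply_preset(objects, PRESETS["dialogue_plus"])
objects["dialogue"].set_user_gain(20.0)  # request exceeds the authored limit
print(objects["dialogue"].gain_db, objects["crowd"].gain_db)
```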

How can I export the audio alongside the metadata and ensure the integrity in all production steps?

There are multiple solutions, depending on the production scenario. Using the tools of the MPEG-H Authoring Suite in post-production, audio and metadata can be exported as:

MPEG-H BWF/ADM: An MPEG-H BWF/ADM (short for Broadcast Wave Format with embedded Audio Definition Model metadata) file is a multichannel wave-file which contains all the audio and metadata for the MPEG-H scene. The exported BWF/ADM file is compliant with the MPEG-H ADM Profile. Loudness will be measured during export and will be embedded into the exported file.

MPF: An MPF (short for MPEG-H Production Format) file is a multichannel wave-file which contains all the audio and metadata for the MPEG-H scene. The metadata is stored in the Control Track, which is one of the audio tracks in the multichannel wave-file and contains a modulated signal that is robust against sample rate conversions or level changes. Loudness will be measured during export and will be embedded into the exported file.

XML: This export option is intended for special applications that make use of MPEG-H scene definitions as XML representation. The XML is accompanied by a multichannel wave-file containing the audio essence.

For more information, watch this video on Vimeo or this video on YouTube

For MPEG-H live productions, the Authoring and Monitoring Units (AMAU) export the audio signals and the Control Track in real time. This allows transport of the metadata tightly coupled with the audio content over any medium offering transport of PCM data, such as SDI, MADI, or AoIP. The Control Track can be treated like any other audio signal and is robust against sample rate conversions or level changes.

For more information watch this video 

Can I export MPEG-H compliant ADM using the Authoring Tools?

Yes, the MPEG-H Authoring Suite supports the export of audio and metadata as BWF/ADM according to the MPEG-H ADM Profile.

What is the recommended MPEG-H loudspeaker configuration?

MPEG-H Audio has been specifically designed for flexible loudspeaker signaling including traditional layouts such as stereo, 5.1 and 7.1, as well as 3D-Audio configurations with height channels, like 5.1+4H and 7.1+4H, or configurations with height, mid and lower-layer channels, for example 22.2, or even yet to be defined layouts. 

The loudspeaker configuration depends on the requirements of the intended production. Recommendations for loudspeaker placement, studio design and production workflows can be found here.

Is there an option to monitor on headphones with binaural rendering?

Yes, this option is available in version 3.5 of the MPEG-H Authoring Suite.

How does MPEG-H support downmix?

MPEG-H Audio supports downmixing to typical, common speaker layouts with a set of pre-defined downmix configurations. Additionally, it comes with customizable downmix options enabling content-specific downmixing that is configurable for each layout.
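
As a rough illustration of what a configurable downmix does, the sketch below folds one 5.1 frame down to stereo. The coefficients are generic ITU-style values and the channel order is an assumption. Neither is taken from the MPEG-H downmix configurations, which are not reproduced here.

```python
import math

# Assumption: generic default of about -3 dB, not an MPEG-H downmix matrix.
DEFAULT_GAIN = 1 / math.sqrt(2)


def downmix_51_to_stereo(frame, center_gain=DEFAULT_GAIN,
                         surround_gain=DEFAULT_GAIN):
    """Fold one 5.1 frame to stereo.

    frame is (L, R, C, LFE, Ls, Rs); this channel order is assumed,
    and the LFE channel is dropped in this sketch.
    """
    left, right, center, _lfe, left_sur, right_sur = frame
    out_left = left + center_gain * center + surround_gain * left_sur
    out_right = right + center_gain * center + surround_gain * right_sur
    return out_left, out_right


# Default coefficients versus a content-specific choice for the center.
print(downmix_51_to_stereo((0.0, 0.0, 1.0, 0.0, 0.0, 0.0)))
print(downmix_51_to_stereo((0.0, 0.0, 1.0, 0.0, 0.0, 0.0), center_gain=1.0))
```

Exposing the gains as parameters mirrors the idea of content-specific downmixing: the same function serves both the default and a customized configuration.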

Are there example sessions or templates to be used for the MPEG-H Authoring Suite?

Yes, the MPEG-H Authoring Suite comes with a set of template sessions for Nuendo, Pro Tools, Reaper and Sequoia.

How can I get training or tutorials for MPEG-H Audio production?

As a first step, we'd like to recommend our series of tutorial videos to help you get started with MPEG-H Authoring using our MPEG-H Authoring Plug-in. 

Watch on YouTube

Watch on Vimeo

If you have further questions, you can always get in touch with our MPEG-H Tool experts via: productiontools-techsupport@iis.fraunhofer.de

Can I export MPEG-H compliant ADM using the Authoring Tools?

Yes, the Authoring Tools support the export of audio and metadata as MPEG-H Production Format, MPEG-H BWF/ADM and XML.

Download

STANDARDS & SPECIFICATIONS

ISO/IEC 23008-3: "Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 3: 3D audio"

---

ATSC: A/342 Part 3:2017, MPEG-H System

Digital Video Broadcasting (DVB): ETSI TS 101 154, Specification for the use of Video and Audio Coding in Broadcasting and Broadband Applications

TTA (TTAK-KO-07.0127R3): Transmission and Reception for Terrestrial UHDTV Broadcasting Service

ABNT NBR 15602-2, Digital terrestrial television – Video coding, audio coding and multiplexing Part 2: Audio coding, Amendment 1

---

SCTE: SCTE 242-3, Next Generation Audio Coding Constraints for Cable Systems: Part 3 - MPEG-H Audio Coding Constraints

UHD Forum: Ultra HD Forum Guidelines

International Telecommunication Union (ITU): Recommendation ITU-R BS.1196-7 (01/2019), Audio coding for digital broadcasting

---

ISO/IEC 23000-19:2020, Information technology — Multimedia application format (MPEG-A) — Part 19: Common media application format (CMAF) for segmented media

CTA: CTA-5001, Web Application Video Ecosystem – Content Specification

DASH-IF: Guidelines for Implementation: DASH-IF Interoperability Point for ATSC 3.0

HbbTV: HbbTV 2.0.2 Specification (ETSI TS 102 796): Hybrid Broadcast Broadband TV

---

3GPP: ETSI TS 126 118 v15.0.0 (2018-10) 5G: 3GPP Virtual reality profiles for streaming applications (3GPP TS 26.118 version 15.0.0 Release 15)

VR-IF: VR Industry Forum Guidelines

ISO/IEC 23090-2:2019, Information technology — Coded representation of immersive media — Part 2: Omnidirectional media format

---

Digital Video Broadcasting (DVB): ETSI EN 300 468, Specification for Service Information (SI) in DVB systems

Digital Video Broadcasting (DVB): MPEG-DASH Profile for Transport of ISO BMFF Based DVB Services over IP Based Networks

SCTE: SCTE 243-3, Next Generation Audio Coding Constraints for Cable Systems: Part 3 - Carriage of MPEG-H Audio

For more information, please visit www.iis.fraunhofer.de/mpeg-h