For the last three years, 11 partners from all over Europe have been working on a €9.5 million project to launch what they call “the ultra high resolution interactive television service of the future. A full system comprising appropriate capturing and analysis technology, networking components and various terminal devices”. The project is FascinatE (which stands for Format-Agnostic SCript-based INterAcTive Experience).
The FascinatE project partners are, in alphabetical order, Alcatel-Lucent, ARRI, the BBC, Fraunhofer HHI (Heinrich Hertz Institute), Interactive Institute, Joanneum Research Digital, Softeco Sismat, Technicolor, TNO Innovation for Life, Universitat Politècnica de Catalunya BarcelonaTech, and the University of Salford, Manchester.
The project required the partners to develop new video and audio capture systems and scripting systems to control the shot framing options presented to each viewer. Video content is captured using UltraHD panorama cameras and additional cameras, including 3D rigs. Video and ambient audio are compiled into a layered scene representation together with metadata.
Intelligent networking components interpret the layered scene representation and adapt the content depending on the type of service or the capabilities of the target device.
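To make the idea concrete, here is a hypothetical sketch in Python of how a layered scene representation and device-dependent adaptation might fit together. The class names, fields and figures are our own illustration, not the project's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class SceneLayer:
    kind: str          # e.g. "panorama", "broadcast-camera", "ambient-audio"
    resolution: tuple  # (width, height) in pixels; (0, 0) for audio layers
    metadata: dict = field(default_factory=dict)

@dataclass
class LayeredScene:
    layers: list

def adapt_for_device(scene, max_width):
    """Keep only the layers a terminal can handle, mimicking the network
    components that tailor the stream to device capabilities."""
    return LayeredScene([l for l in scene.layers
                         if l.resolution[0] <= max_width])

# Illustrative layer sizes only
scene = LayeredScene([
    SceneLayer("panorama", (7000, 2000)),
    SceneLayer("broadcast-camera", (1920, 1080)),
    SceneLayer("ambient-audio", (0, 0)),
])
mobile = adapt_for_device(scene, max_width=1920)
# the wide panorama layer is dropped for a phone-sized terminal
```

In this sketch the adaptation simply drops layers, but the same structure could equally drive cropping or transcoding decisions per device.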
The whole range of terminal devices is catered for, from high-resolution immersive displays for larger audiences, through home viewing environments, down to individuals’ mobile devices.
Work is being done on interactivity, with hand gestures for large devices and touch for personal devices.
The 3D audio for FascinatE is recorded using an MH Acoustics em32 Eigenmike® microphone array (above), a 32-channel spherical array that embeds 32 individually calibrated, professional-quality 14 mm electret microphones in a rigid sphere baffle. As an example of how it’s used, for the test shoot – a Premier League match between Chelsea and Wolverhampton Wanderers – the Eigenmike was used:
“to record the soundfield near the principal cameras for a comprehensive spatial audio representation of the auditory scene. When played back over a suitable ambisonics sound system, the result is a very realistic impression of being in that auditory environment. In FascinatE, the recorded soundfield can be processed such that an accurate impression of ‘being there’ can be obtained wherever the user navigates to in the visual scene.”
Lifted directly from the FascinatE website, here’s a description of how the audio was handled:
Creating a format-agnostic interactive broadcast experience poses some interesting challenges to the partners from Technicolor and the University of Salford, who are responsible for the audio aspects of FascinatE. Of chief importance is the need to record the given audio scene in such a way that the content can be rendered on any reproduction system at the user end and can update depending on the dynamic viewing point. This demands a paradigm shift from how audio has traditionally been recorded for broadcast.

Instead of broadcasting to match a specific hardware set-up such as stereo, 5.1, 7.1 etc., we adopt an object-oriented approach which can be reproduced on any system. The audio scene is considered to be made up of a set of audio objects (point sources with a specific location) and an ambient sound field contribution. The challenge at the recording side is therefore to record the sound field as well as the content and location of the audio objects at the scene.

This often involves completely different recording techniques from what is considered standard practice in the broadcast industry. Ideally, each sound source would be individually close-miked and tracked in space; however, in many cases (such as the first FascinatE test shoot at a football match) this is not possible, and the content and position of the audio objects need to be derived by processing the signals from the available microphones near to the sources.
The ambient sound field can also be recorded in such a way that it can be updated to match a given viewing position, for example using sound field microphones such as the Eigenmike® (Figure 1) or the SoundField® microphone, which record the three-dimensional sound field at a given point. With audio objects and sound field accurately recorded, it is possible to encode these sources in various sound field representations, such as ambisonics B-format or wave field synthesis (WFS), which can in turn be decoded into any output format. As the user pans around the visual scene, it is possible to both rotate and translate this sound field to match the new viewing position based on camera pan and zoom.

On the rendering side, it is important that the audio updates accurately with the changing view and that it matches the user’s preferences. FascinatE bridges the gap between passive-viewer and active-participant scenarios. Current television broadcasts could be considered passive viewing, where the audio remains stationary regardless of the camera position; conversely, active-participant viewing is more akin to a video game scenario, where the audio updates completely with the viewing position. Of interest for FascinatE is which of these viewing paradigms the user subscribes to when navigating round the scene. Future work will therefore be centred not only on recording the audio scene such that the content is format agnostic, but also on determining how best to render the audio to match user preferences.
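As a rough illustration of the object-oriented encoding and the sound field rotation described in the passage above, here is a minimal Python sketch. It assumes first-order ambisonics in the traditional FuMa B-format convention (channel order W, X, Y, Z); the function names and signal are our own, not anything from the FascinatE system itself.

```python
import numpy as np

def encode_bformat(signal, azimuth_deg, elevation_deg=0.0):
    """Encode a mono audio object (point source) into first-order
    B-format (FuMa convention, channel order W, X, Y, Z)."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    w = signal / np.sqrt(2.0)             # omnidirectional component
    x = signal * np.cos(az) * np.cos(el)  # front-back dipole
    y = signal * np.sin(az) * np.cos(el)  # left-right dipole
    z = signal * np.sin(el)               # up-down dipole
    return np.stack([w, x, y, z])

def rotate_bformat_yaw(bformat, angle_deg):
    """Rotate the sound field about the vertical axis, e.g. to track a
    camera pan. W and Z are unaffected; only X and Y mix."""
    a = np.radians(angle_deg)
    w, x, y, z = bformat
    return np.stack([w,
                     x * np.cos(a) - y * np.sin(a),
                     x * np.sin(a) + y * np.cos(a),
                     z])

# A source straight ahead (0 degrees azimuth); rotating the field by
# 90 degrees moves it to the side. (Pan-direction sign conventions
# vary; this rotates sources counterclockwise as seen from above.)
fs = 48000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 440 * t)   # stand-in for a real signal
scene = encode_bformat(voice, azimuth_deg=0.0)
panned = rotate_bformat_yaw(scene, 90.0)
# multiple objects plus an Eigenmike ambience bed would simply be
# summed channel by channel before decoding to the target layout
```

Note how the rotation touches only the X and Y channels: that is what makes B-format convenient for pan-and-zoom navigation, since the field can be reoriented after recording without re-miking anything.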
Calrec has not yet been directly involved with FascinatE, but we like it. If you’ve been involved and you’d like to share your thoughts, please post a comment. We’ve got plenty of experience with surround mics – we did engineer the first-ever SoundField, after all – but we’re not that familiar with the Eigenmike. If you’ve used one, or are something of an expert, we seek enlightenment!