Can we create the “holodeck”?
The challenge of 3D television
Christoph Dosch and David Wood
Arranging a television system so that viewers can see three-dimensional (3D) pictures is both simple and complex. ITU’s Radiocommunication Sector (ITU–R) has agreed a new study topic on 3D television (see article in ITU News of September 2008), and over the next year will be building up knowledge of the options.
3D imagery and the human visual sense
Christoph Dosch is General Manager, Collaborative Research, at the Institut für Rundfunktechnik GmbH (IRT) in Munich, Germany. He is Chairman of ITU–R Study Group 6 (Broadcasting service).
David Wood is Head of New Technology at the European Broadcasting Union in Geneva, Switzerland. He is Chairman of ITU–R Working Party 6C (Programme production and quality assessment).
It seems simple, but it is actually very difficult to do 3D television well, at least with today’s technology. Many books, articles and treatises have been written about 3D and 3D television. Early demonstrations of stereoscopic colour television and of stereoscopic film projection, which used polarization-plane and colour techniques to separate the content for the viewer’s right eye from that for the left eye, quickly revealed the problems faced when dealing with the human visual sense.
A series of elements, termed “depth cues”, contribute to our perception of depth. These include the relative size of objects of known size, masking effects (when one object is in front of another), perspective and, most important of all, “binocular disparity”.
Binocular disparity is the difference between the same scene as seen by the left and right eyes, and is the most powerful depth cue for normally-sighted people. It is easy to discover the effect of binocular disparity simply by closing one eye after the other, and noticing how the image shifts.
We might imagine that all we need to do to achieve 3D television is to show “left-eye” and “right-eye” pictures on a flat screen (“planar 3D”) and deliver them separately to each eye of the viewer, and the job is almost done. True, but it may not be done “well”. This applies whether the two pictures are delivered by analogue or by digital means.
All planar 3D systems developed to date cause some degree of “eye fatigue”. Some of the causes can be removed by such means as precise registration of the images for the left and right eye; but two causes in particular are not easily removed, whether the television system is analogue or digital. These are sometimes termed the “accommodation conflict” and the “infinity separation problem”. They, and other problems, limit the short-term prospects for a home 3D television system that can be watched comfortably for long periods.
Nevertheless, in spite of scepticism that a fatigue-free 3D television system can be made, it must be studied. It is a subject of great interest in broadcasting today. In addition, some movie companies are investing heavily in 3D as the potential saviour of the “hall cinema” in the face of the growth of high-definition television (or HDTV) “home cinema”. If fatigue-free 3D cinema can be found, then fatigue-free 3D television can also be found. We should have an attitude of “open-minded scepticism”.
Simple planar 3D
Planar 3D films are made by recording separate images for the left eye and the right eye from two cameras spaced a certain distance apart. The spacing chosen affects the disparity between the left-eye and the right-eye pictures, and thus the viewer’s sense of depth. It is true that this will lead to depth perception, but it almost inevitably results in eye fatigue after a while. Nevertheless, the technique is widely used for (stereoscopic) photography and movie making, and it has been tested many times for television.
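How the camera spacing drives disparity can be sketched with the idealised model of two parallel pinhole cameras. That model is an assumption made here for illustration, not something the studies described in this article prescribe. In it, the disparity on the sensor equals the focal length multiplied by the camera spacing and divided by the distance to the object. The function name and the sample figures are hypothetical.

```python
def sensor_disparity_mm(focal_length_mm, camera_spacing_mm, object_distance_mm):
    """Disparity on the sensor for two parallel pinhole cameras (idealised model)."""
    if object_distance_mm <= 0:
        raise ValueError("object distance must be positive")
    return focal_length_mm * camera_spacing_mm / object_distance_mm


# Hypothetical set-up: 25 mm lenses, object 2 m away, three camera spacings.
for spacing_mm in (32.5, 65.0, 130.0):
    disparity = sensor_disparity_mm(25.0, spacing_mm, 2000.0)
    print(f"spacing {spacing_mm:6.1f} mm -> disparity {disparity:.4f} mm")
```

In this simplified model, doubling the spacing doubles the disparity of every object, which is why the choice of spacing changes the sense of depth for the whole scene.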
With and without glasses
There is a range of tools that allow each eye to view its own picture. These include techniques that use “glasses” and techniques that do not.
Of the systems with glasses, the “best” are probably those that use orthogonal (mutually perpendicular) polarization planes for the left-eye and right-eye pictures, with matching filters in the viewer’s glasses. Light from each picture is filtered so that only one plane of polarization is passed. This is easy to arrange in a cinema, and the system is used in movie theatres, but it is more difficult to arrange in a television display. Test television systems have been developed on the basis of this method, using either two projection devices projecting onto the same surface, or two displays placed at right angles so that a combined image can be seen using a semi-silvered mirror. In both cases, these devices are “non-standard” television receivers.
There are alternatives which could be used, in principle, more easily in a standard-television display. One is to use different colorimetric arrangements for each of the two pictures, coupled with glasses which filter appropriately. Also, a relatively new notch filter colour separation technique that can be used in projection systems has been developed at Ulm University in Germany, and has been taken up by Dolby in North America.
Another technique, which can be used for television, is time multiplexing of the display, with consecutive left and right images and shuttered glasses. This was quite widely used in television (sometimes called “interlaced stereo”) for packaged media in the days of the cathode ray tube (CRT). It is still used in movie theatres today, such as IMAX, sometimes in conjunction with polarization plane separation. In the CRT environment, a major shortcoming of interlaced stereo was image flicker, since each eye would see only 25 or 30 images per second, rather than 50 or 60. To overcome this, the display rate could be doubled to 100 or 120 Hz to allow flicker-free reception.
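The flicker arithmetic can be written out as a small sketch. It assumes that the display simply alternates left and right images, so that each eye receives an equal share of the display rate; the function name is hypothetical.

```python
def per_eye_rate_hz(display_rate_hz, views=2):
    """Images per second reaching each eye when the views alternate in time."""
    if views < 1:
        raise ValueError("at least one view is needed")
    return display_rate_hz / views


# Display rates mentioned in the text: conventional and doubled.
for rate_hz in (50, 60, 100, 120):
    print(f"display at {rate_hz} Hz -> {per_eye_rate_hz(rate_hz):.0f} images per second per eye")
```

Doubling the display rate restores the 50 or 60 images per second that each eye would receive from a conventional picture.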
A “virtual reality” headset is another example of a technique using glasses and is often used for video games.
Technologies that allow viewing without glasses fall into two categories. In the first are techniques that use lenses to direct each eye’s view towards separate picture elements. This is done by fronting the screen with a ribbed (lenticular) surface. In the second category are techniques that front the screen with barrier slots, which perform a similar function. In such systems, two views (left and right) or more than two (multi-camera 3D) can be used. However, since the picture elements (stripes or points) of each view have to be laid next to each other, the number of views reduces the resolution available to each. There is a trade-off between resolution and ease of viewing. Arrangements can be made with this type of system to track head or eye movements, and thus change the barrier position, giving the viewer more freedom of head movement.
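The trade-off between the number of views and resolution can be illustrated with a simple count. The sketch assumes that the views are interleaved column by column across the panel, which is only one possible layout, and the 1920-column panel is a hypothetical example.

```python
def columns_per_view(panel_columns, views):
    """Picture-element columns left for each view under column interleaving."""
    if views < 1:
        raise ValueError("at least one view is needed")
    return panel_columns // views


# Hypothetical 1920-column panel shared between 2, 4 and 8 views.
for views in (2, 4, 8):
    print(f"{views} views -> {columns_per_view(1920, views)} columns per view")
```

More views give the viewer more positions from which a correct stereo pair can be seen, but each view is then built from fewer picture elements.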
The accommodation problem
All of these systems work after a fashion, but all suffer from (among other shortcomings) the same fundamental psycho-physical problem of “accommodation conflict”. To understand this we need to consider what happens in the brain when depth is perceived.
When we see “normally”, an inverted image is formed on the retina at the back of each eye, and the two images differ by a parallax disparity. The brain takes in these two images and “fuses” them into a single image that appears to be seen from the centre of the forehead. This is termed the “cyclopean” image. Objects in the cyclopean image have depth proportional to the disparity between the left and right eye images. The brain actually “projects forward” the cyclopean image, in our mind, to its correct position. This process of fusing inverted images and mental forward projection is an amazing one that reminds us of how far we have to go to achieve what the human body does automatically.
When we look at objects, our two eyes turn inward (converge) to those objects. At the same time, our eyes focus (accommodate) on the point of convergence, in a control loop, to maximize the sharpness of the image on the retina.
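The link between convergence and focus can be put into numbers with elementary geometry. The sketch below assumes an object straight ahead of the viewer and uses the eye separation of about 65 mm quoted later in this article; the function names are hypothetical.

```python
import math

EYE_SEPARATION_MM = 65.0  # approximate figure quoted in the article


def vergence_angle_deg(distance_mm, eye_separation_mm=EYE_SEPARATION_MM):
    """Angle between the two lines of sight for a point straight ahead."""
    return math.degrees(2.0 * math.atan(eye_separation_mm / (2.0 * distance_mm)))


def accommodation_dioptres(distance_mm):
    """Focusing demand, in dioptres, for an object at the given distance."""
    return 1000.0 / distance_mm


# In natural viewing both quantities follow the same object distance.
for distance_mm in (500.0, 1000.0, 3000.0):
    print(f"{distance_mm:6.0f} mm: converge {vergence_angle_deg(distance_mm):.2f} deg, "
          f"focus {accommodation_dioptres(distance_mm):.2f} D")
```

Both values fall together as the object moves away, which is the coupling that the eye's control loop relies on.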
The Victorian “stereoscope” displayed two images to the right and left eyes, giving an illusion of depth. The same principle is used in stereoscopic television.
In our simple planar 3D, we are confronted with two images on a planar screen which have a disparity. The information the brain receives is that there are objects at different distances before it, and therefore it tries to get the eyes to focus on them and to point at them. Unfortunately, the sharpest image on the retina is obtained by focusing on the plane of the screen, and this is not where the objects “appear” to be. In short, the brain is confused, and the viewer experiences discomfort when watching for long periods of time.
With time and practice, viewers can train their brain to ignore some of the information and focus on the screen; in other words, to separate functions that are normally performed in concert. But viewers are never completely comfortable, and this causes the eye fatigue that those who have tried stereoscopic systems know well.
There are other causes of eye fatigue, some of which can be reduced by careful alignment and registration of the left and right pictures. This can be easier to do with digital systems, and thus such systems can cause less eye fatigue.
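The size of this conflict can be estimated with a purely geometric sketch. It assumes a viewer looking straight at the screen, an eye separation of 65 mm, and a horizontal shift between the right-eye and left-eye image points that is positive when the object should appear behind the screen. The function names and sample values are hypothetical, and the model ignores everything except geometry.

```python
def perceived_distance_mm(shift_mm, viewing_distance_mm, eye_separation_mm=65.0):
    """Distance at which the two lines of sight cross for a given on-screen shift.

    Returns infinity when the shift equals or exceeds the eye separation,
    because the lines of sight are then parallel or diverging.
    """
    if shift_mm >= eye_separation_mm:
        return float("inf")
    return viewing_distance_mm * eye_separation_mm / (eye_separation_mm - shift_mm)


def focus_convergence_mismatch_dioptres(shift_mm, viewing_distance_mm, eye_separation_mm=65.0):
    """Difference between focus on the screen and focus at the perceived distance."""
    perceived = perceived_distance_mm(shift_mm, viewing_distance_mm, eye_separation_mm)
    return abs(1000.0 / viewing_distance_mm - 1000.0 / perceived)


# Hypothetical viewer 2 m from the screen.
for shift_mm in (-65.0, 0.0, 32.5):
    z = perceived_distance_mm(shift_mm, 2000.0)
    gap = focus_convergence_mismatch_dioptres(shift_mm, 2000.0)
    print(f"shift {shift_mm:6.1f} mm -> perceived at {z:.0f} mm, mismatch {gap:.2f} D")
```

Only objects placed in the plane of the screen (zero shift) give no mismatch in this model; any other depth asks the eyes to converge at one distance while focusing at another.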
The other difficulty of stereoscopic television is that of “infinity separation”. When objects are viewed at infinity, the two eyes have to point directly forward. This means that objects that are supposed to be at an infinite distance from the viewer need to be displaced on the display by the same distance as the eyes are apart, which is about 65 mm. This can be arranged in a cinema, where the two projectors or lenses can be adjusted in situ so that the infinity separation is always the absolute figure of 65 mm. But televisions in the home come in a variety of screen sizes, and it is not possible to know before a transmission what difference in the transmitted signal will produce an absolute distance of 65 mm on a display.
If the separation on the viewer’s display is wider than 65 mm, this is particularly uncomfortable, because the two eyes have to point outwards. If it is narrower than 65 mm, infinity moves forward.
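The screen-size problem can be illustrated with a short sketch. It assumes that the transmitted signal carries the infinity shift as a fixed fraction of the picture width, so that the physical separation scales with the width of the display. The screen widths, the tolerance and the function names are hypothetical.

```python
EYE_SEPARATION_MM = 65.0  # approximate figure quoted in the article


def on_screen_separation_mm(fraction_of_picture_width, screen_width_mm):
    """Physical separation produced by a shift stored as a fraction of picture width."""
    return fraction_of_picture_width * screen_width_mm


def infinity_effect(separation_mm, eye_separation_mm=EYE_SEPARATION_MM, tolerance_mm=0.5):
    """Describe what a given infinity separation does to the viewer."""
    if separation_mm > eye_separation_mm + tolerance_mm:
        return "eyes must diverge"
    if separation_mm < eye_separation_mm - tolerance_mm:
        return "infinity moves forward"
    return "infinity correct"


# Hypothetical programme adjusted to give 65 mm on a screen 1000 mm wide.
shift_fraction = EYE_SEPARATION_MM / 1000.0
for width_mm in (500.0, 1000.0, 2000.0):
    separation = on_screen_separation_mm(shift_fraction, width_mm)
    print(f"screen {width_mm:6.0f} mm wide -> {separation:6.1f} mm: {infinity_effect(separation)}")
```

The same signal is therefore correct on only one screen width, which is why a fixed setting that works in a cinema does not carry over to home displays.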
There are also important measures that need to be taken to reduce eye fatigue in 3D programme production grammar. They include restricting depth of scene, positioning the key objects in the plane of the screen, and having a restricted lens-separation to near object distance ratio. Furthermore, generally, the whole 3D scene needs to be “in focus” from front to back. Thus, 3D television in this form does bring new creative opportunities in some ways, but also restricts the creative freedom of programme makers in others.
A simple planar 3D system always gives the same depth cues, whatever the position of the head or body. This is not what happens in real life, where disparity changes with head position. This leads us to consider how to create “real life” 3D.
Seeing reality through the object wave
To understand what kind of signals would give 3D pictures that cause no eye fatigue and allow head movement, we need to go back to the way light illuminates objects in real life.
When light hits an object, all wavelengths, except that of the “colour” of the object, are absorbed by the object surface. The light of the remaining wavelength is reflected and refracted. When our eyes are in the path of this light, we “see” the object via the small aperture of our eyes and their lenses.
However, the light that passes through space towards our eyes is not a point source of light. Rather, it is the summation of light as it emerges from all points and all angles. This totality of light rays — effectively what would pass through an empty picture frame if we held it between us and the object — is termed the “object wave”. It is the recording of this “object wave” that would produce the complete fatigue-free reproduction of the real image for us to see, if we could capture it.
The object wave, like all waves, has amplitude, wavelength and phase. If we could capture all three we could record the “object wave”. If we were to have all information about the object wave, problems such as the accommodation conflict would disappear. Unfortunately, we do not currently have the means to record in a sensor any more than the amplitude of the wave. If you hold up a photographic plate to light (that is, to an object wave) and expose it anywhere in space, all that is recorded is a blur. The plate has no way of recording either the wavelength or the phase of the wave.
Normal cameras solve the recording of wavelength by recording separate amplitudes of filtered colour components (R, G, B) which the eye cannot distinguish from true colours, because the eye itself separates light into three components anyway. The lens and aperture of the camera present to the recording medium the light image as it passes through a point. What we need to record for “total reality” is different: it is the object wave over a window surface area.
Dennis Gabor, the inventor of holography, recognized that there is a way to record the object wave as an amplitude only. He solved the wavelength recording problem by illuminating the object with a single-wavelength laser beam. He solved the phase recording problem by folding the phase information into the amplitude information. This is done by creating an interference pattern with the original laser beam that illuminates the object. The interference pattern created can be recorded on a plate. Then the amplitude and phase information can be restored by illuminating the plate. The result is a single-wavelength version of the object wave. Not perfect, but a step in the right direction.
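Why interference lets an amplitude-only recording carry phase can be shown at a single point on the plate. The sketch below assumes monochromatic light represented by complex numbers and a reference beam whose phase at that point is taken as zero; the amplitudes and the function name are hypothetical. A plate responds to intensity, the squared magnitude of the field.

```python
import cmath
import math


def recorded_intensities(object_amplitude, object_phase_rad, reference_amplitude=1.0):
    """Return (intensity of object wave alone, intensity of object plus reference)."""
    object_field = cmath.rect(object_amplitude, object_phase_rad)
    reference_field = cmath.rect(reference_amplitude, 0.0)  # reference phase assumed zero
    alone = abs(object_field) ** 2
    with_reference = abs(object_field + reference_field) ** 2
    return alone, with_reference


# Same object amplitude, three different phases.
for phase in (0.0, math.pi / 2, math.pi):
    alone, combined = recorded_intensities(0.5, phase)
    print(f"phase {phase:.2f} rad: alone {alone:.2f}, with reference {combined:.2f}")
```

The intensity of the object wave alone is the same for every phase, whereas the combined intensity varies with the cosine of the phase difference, so the recorded pattern keeps the phase information in a form the plate can store.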
At present, it is difficult to conceive of an electronic means of recording time-successive slices of component colours of the interference pattern of an object wave. The storage capacity of a photographic plate used for holography is massive, and well beyond current electronic means of delivery. If a practical sensor or method can be found to record the phase of the object wave as well as the amplitude, a solution to real 3D television will be near.
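A rough sense of the scale involved comes from the spacing of the interference fringes. For two plane waves of wavelength λ meeting at an angle θ, the fringe spacing is λ / (2 sin(θ/2)). The sketch below assumes at least two samples per fringe in each direction. The wavelength, beam angle and plate size are illustrative assumptions and not figures from the studies described here.

```python
import math


def fringe_spacing_mm(wavelength_mm, beam_angle_deg):
    """Fringe spacing for two plane waves meeting at the given angle."""
    return wavelength_mm / (2.0 * math.sin(math.radians(beam_angle_deg) / 2.0))


def samples_needed(plate_width_mm, plate_height_mm, wavelength_mm, beam_angle_deg):
    """Samples needed to resolve the fringes with two samples per fringe each way."""
    per_mm = 2.0 / fringe_spacing_mm(wavelength_mm, beam_angle_deg)
    return math.ceil(plate_width_mm * per_mm) * math.ceil(plate_height_mm * per_mm)


# Assumed values: red laser light at 633 nm, beams 30 degrees apart, 100 mm square plate.
spacing = fringe_spacing_mm(633e-6, 30.0)
total = samples_needed(100.0, 100.0, 633e-6, 30.0)
print(f"fringe spacing {spacing * 1000:.2f} micrometres, about {total:.1e} samples per exposure")
```

Under these assumptions a single exposure of one colour component already calls for tens of billions of samples, before any time-successive sequence is considered.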
Creating a 3D television system that can be viewed in comfort poses great challenges. We can — and should — examine “stereoscopic systems” to see how well we can make them work, and how we can arrange compatibility with normal television channels. It may be possible to achieve compatible 3D television systems which can be watched comfortably for the length of a programme. But at the same time, we need to continue the fundamental research into recording the “object wave”, to make possible 3D television which really is equivalent to “being there”.