14.3 Aural Styles

Users who cannot see won't, obviously, benefit from the visual styling that most of CSS enables. For these users, what matters is not the drop shadows or rounded corners, but the actual textual content of the page—which must be rendered audibly if they are to understand it. The blind are not the only user demographic that can benefit from aural rendering of web content. A user agent embedded in a car, for example, might use aural styles to enliven the reading of web content such as driving directions or even the driver's email.

In order to meet the needs of these users, CSS2 introduced a section describing aural styles. As of this writing, there are two user agents that support, at least to some degree, aural styles: Emacspeak and Fonix SpeakThis. In spite of this, CSS2.1 effectively deprecates the media type aural and all of the properties associated with it. The current specification includes a note to the effect that future versions of CSS are likely to use the media type speech to represent spoken renderings of documents, but it does not describe any details.

Due to this odd confluence of emerging implementation and deprecation, we will only briefly look at the properties of aural style sheets.

14.3.1 Speaking

At the most basic level, you must determine whether a given element's content should be rendered aurally at all. In aural style sheets, this is handled with the property speak.

speak

Values

normal | none | spell-out | inherit

Initial value

normal

Applies to

all elements

Inherited

yes

Computed value

as specified

The default value, normal, is used to indicate that an element's content should be spoken. If an element's content should not be spoken for some reason, then the value none is used. Even though an element's aural rendering may be suppressed using none, you may override the value on descendant elements, which would thus be rendered. In the following example, the text "Navigation:" would not be rendered aurally, but the text "Home" would be:

<div style="speak: none;">

Navigation:

<a href="home.html" style="speak: normal;">Home</a>

</div>

If an element and its descendants must be prevented from rendering aurally, use display: none instead. In this example, none of the content of the div will be rendered aurally (or in any other medium, for that matter):

<div style="display: none;">

Navigation:

<a href="home.html" style="speak: normal;">Home</a>

</div>

The third value of speak is spell-out, which will most likely be used in conjunction with acronyms or other content that should be spelled out. For example, the following fragment of markup would be rendered aurally as T-E-D-S, or "tee eee dee ess":

<acronym style="speak: spell-out;" title="Technology Evangelism and 

  Developer Support">TEDS</acronym>

14.3.1.1 Punctuation and numbers

There are two other properties that affect the way in which element content is rendered aurally. The first affects the rendering of punctuation and is called (appropriately enough) speak-punctuation.

speak-punctuation

Values

code | none | inherit

Initial value

none

Applies to

all elements

Inherited

yes

Computed value

as specified

Given the default value of none, punctuation is rendered aurally as pauses of appropriate lengths, although CSS does not define these lengths. To pick an example, the pause representing a period (and thus the end of a sentence) might be twice as long as the pause representing a comma. Pause lengths are likely to be language-dependent.

With the value code, punctuation is actually rendered aurally. Thus, the following example would be rendered as, "avast comma ye scalawags exclamation point":

<p style="speak-punctuation: code;">Avast, ye scalawags!</p>

To use another example, the following fragment might be rendered aurally as, "a left bracket href right bracket left curly brace color colon red semicolon right curly brace":

<code style="speak-punctuation: code;">a[href] {color: red;}</code>
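
As a rough illustration of what speak-punctuation: code asks of a user agent, here is a minimal Python sketch. The table of spoken names and the function speak_punctuation_code are hypothetical; CSS2 does not say what each mark should be called, and a real speech engine would cover far more punctuation.

```python
# Hypothetical spoken names for a few punctuation marks.
# CSS2 does not define these; they simply mirror the examples in the text.
PUNCTUATION_NAMES = {
    ",": "comma",
    "!": "exclamation point",
    "[": "left bracket",
    "]": "right bracket",
    "{": "left curly brace",
    "}": "right curly brace",
    ":": "colon",
    ";": "semicolon",
}


def speak_punctuation_code(text: str) -> str:
    """Replace each known punctuation mark with its spoken name."""
    pieces = []
    for char in text:
        if char in PUNCTUATION_NAMES:
            pieces.append(f" {PUNCTUATION_NAMES[char]} ")
        else:
            pieces.append(char)
    # Collapse the extra whitespace introduced around the spoken names.
    return " ".join("".join(pieces).split()).lower()


print(speak_punctuation_code("Avast, ye scalawags!"))
print(speak_punctuation_code("a[href] {color: red;}"))
```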

Just as speak-punctuation affects how punctuation is rendered, speak-numeral defines how numbers are spoken.

speak-numeral

Values

digits | continuous | inherit

Initial value

continuous

Applies to

all elements

Inherited

yes

Computed value

as specified

The default value continuous means that the number is spoken as a whole number, whereas digits causes numbers to be read individually. Consider:

<p style="speak-numeral: continuous;">23</p>

<p style="speak-numeral: digits;">23</p>

The aural rendering of the first paragraph would be "twenty-three," whereas the second paragraph would be rendered as "two three." Numeric renderings are, as with punctuation, language-dependent but undefined.

14.3.1.2 Speaking table headers

In the aural rendering of a table, it can be easy to lose track of what the cell data actually means. If you're on the 9th row of a 12-row table, and the 6th cell in that row is "21.77," what are the odds you'll remember what the 6th column represents? Will you even remember to what the numbers in this row relate? Table headers provide this information and are easy to check visually. To solve this problem in the aural medium, CSS2 introduced speak-header.

speak-header

Values

once | always | inherit

Initial value

once

Applies to

elements containing table header information

Inherited

yes

Computed value

as specified

By default, a user agent will render the content of a table header only once, when the cell is encountered. The other alternative is to always render the table header information when a cell relating to that header is rendered.

Let's consider the following simple table as an example:

<table id="colors">

<caption>Favorite Color</caption>

<tr id="headers">

<th>Jim</th><th>Joe</th><th>Jane</th>

</tr>

<tr>

<td>red</td><td>green</td><td>blue</td>

</tr>

</table>

Without any styles applied, the aural rendering of this table would be, "Favorite Color Jim Joe Jane red green blue." You can probably figure out what all that means, but imagine a table containing the favorite colors of 10 or 20 people. Now, suppose you apply the following styles to this table:

#colors {speak-header: always;}

#headers {speak: none;}

The aural rendering of the table should then be, "Favorite Color Jim red Joe green Jane blue." This is much easier to understand, and it will continue to be no matter how large the table might grow.
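
To make the difference between the two renderings concrete, here is a small Python sketch of the reading order. The function render_table and its parameters are made up for illustration; it assumes one header per column and ignores everything a real user agent would do with voices and pauses.

```python
def render_table(caption, headers, rows, speak_header="once", speak_header_row=True):
    """Return the words spoken for a simple table, in reading order.

    speak_header mirrors the CSS property value ("once" or "always").
    speak_header_row=False mimics "speak: none" on the header row.
    """
    spoken = [caption]
    if speak_header_row:
        spoken.extend(headers)
    for row in rows:
        for header, cell in zip(headers, row):
            if speak_header == "always":
                # Repeat the related header before every data cell.
                spoken.append(header)
            spoken.append(cell)
    return " ".join(spoken)


headers = ["Jim", "Joe", "Jane"]
rows = [["red", "green", "blue"]]

print(render_table("Favorite Color", headers, rows))
print(render_table("Favorite Color", headers, rows,
                   speak_header="always", speak_header_row=False))
```

Adding more rows to the list shows why always scales better: every data cell stays paired with its header, however long the table becomes.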

Note that the document language itself defines the method of determining an element's role as a table header. Markup languages may also have ways to associate header information with elements or groups of elements—for example, the attributes scope and axis in HTML4.

14.3.1.3 Speech rate

In addition to ways to affect the style of speech, CSS also offers speech-rate, which is used to set the speed with which content is aurally rendered.

The values are defined as follows:

<number>: Specifies the speaking rate in words per minute. This is likely to vary by language, since some languages are spoken more quickly than others.
x-slow: Equivalent to 80 words per minute.
slow: Equivalent to 120 words per minute.
medium: Equivalent to 180-200 words per minute.
fast: Equivalent to 300 words per minute.
x-fast: Equivalent to 500 words per minute.
faster: Increases the current speech rate by 40 words per minute.
slower: Decreases the current speech rate by 40 words per minute.

Here are two examples of extreme changes in speech rate:

*.duh {speech-rate: x-slow;}

div#disclaimer {speech-rate: x-fast;}

CSS does not define how the speech rate is altered. A user agent could draw out each word, stretch out the pauses between words, or both.
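
The keyword values can be read as a simple lookup, with faster and slower applied to the inherited rate. The Python sketch below is only an illustration: the function resolve_speech_rate is hypothetical, and taking 180 for medium is an assumption, since the keyword covers the range 180-200.

```python
# Words-per-minute equivalents of the keywords listed above.
# "medium" spans 180-200 wpm; 180 is an arbitrary pick for this sketch.
SPEECH_RATE_KEYWORDS = {
    "x-slow": 80,
    "slow": 120,
    "medium": 180,
    "fast": 300,
    "x-fast": 500,
}
RELATIVE_STEP = 40  # words per minute added or removed by faster/slower


def resolve_speech_rate(value: str, inherited_rate: float) -> float:
    """Turn a speech-rate value into words per minute."""
    if value == "faster":
        return inherited_rate + RELATIVE_STEP
    if value == "slower":
        return inherited_rate - RELATIVE_STEP
    if value in SPEECH_RATE_KEYWORDS:
        return SPEECH_RATE_KEYWORDS[value]
    return float(value)  # a plain <number> is already words per minute


print(resolve_speech_rate("x-slow", 180))   # 80
print(resolve_speech_rate("faster", 180))   # 220
```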

14.3.2 Volume

In an aural medium, one of the most important aspects of presentation is the volume of the sound produced by the user agent. Enter the aptly named property, volume.

The values are defined as follows:

<number>: Provides a numeric representation of the volume. 0 corresponds to the minimum audible volume, which is not the same as being silent; 100 corresponds to the maximum comfortable volume.
<percentage>: Calculated as a percentage of the inherited value.
silent: No sound is produced, which is different from the numeric value 0. This is the aural equivalent of visibility: hidden.
x-soft: Equivalent to the numeric value 0.
soft: Equivalent to the numeric value 25.
medium: Equivalent to the numeric value 50.
loud: Equivalent to the numeric value 75.
x-loud: Equivalent to the numeric value 100.

It's important to note that the volume value (say that five times fast!) defines the median volume, not the precise volume of every sound produced. Thus, the content of an element with volume: 50; may well be rendered with sounds that go above and below that level, especially if the voice is highly inflected or has a dynamic range of sounds.

The numeric range is likely to be user-configured, since only an individual user can determine his minimum audible volume level (0) and maximum comfortable volume level (100). As an example, a user might decide that the minimum audible volume is a 34dB tone, and the maximum comfortable volume is an 84dB tone. This means there is a 50dB range between 0 and 100, and each increase of one in the value will mean a half-dB increase in the median volume. In other words, volume: soft; would translate to a median volume of 46.5dB.

Percentage values have an effect analogous to their effect in font-size: they increase or decrease the value based on the parent element's value. For example:

div.marine {volume: 60;}

big {volume: 125%;}



<div class="marine">

When I say jump, I mean <big>rabbit</big>, you maggots!

</div>

Given the audio range described before, the content of the div element here would be spoken with a median volume of 64dB. The exception is the big element, which is 125% the parent's value of 60. This is computed as 75, which is equivalent to 71.5dB.

If a percentage value would place an element's computed numeric value outside the range of 0 through 100, the value is clipped to the nearest value. Suppose you were to change the previous styles to read:

div.marine {volume: 60;}

big {volume: 200%;}

This would cause the big element's volume value to be computed as 120; this would then be clipped to 100, which corresponds in this case to a median volume of 84dB.
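
The arithmetic in this section can be checked with a short Python sketch. The function names are invented for illustration, the 34dB and 84dB endpoints are the hypothetical user settings from the example above, and the keyword silent is left out because it has no numeric equivalent.

```python
VOLUME_KEYWORDS = {"x-soft": 0, "soft": 25, "medium": 50, "loud": 75, "x-loud": 100}


def resolve_volume(value: str, inherited: float = 50.0) -> float:
    """Compute the numeric volume (0-100) for a keyword, number, or percentage."""
    if value in VOLUME_KEYWORDS:
        number = VOLUME_KEYWORDS[value]
    elif value.endswith("%"):
        # Percentages are taken relative to the inherited value.
        number = inherited * float(value[:-1]) / 100
    else:
        number = float(value)
    # Values outside 0-100 are clipped to the nearest end of the range.
    return max(0.0, min(100.0, number))


def median_decibels(volume: float, min_db: float = 34.0, max_db: float = 84.0) -> float:
    """Map a 0-100 volume onto the user's audible-to-comfortable dB range."""
    return min_db + (max_db - min_db) * volume / 100


div_volume = resolve_volume("60")
print(median_decibels(resolve_volume("soft")))               # 46.5
print(median_decibels(div_volume))                           # 64.0
print(median_decibels(resolve_volume("125%", div_volume)))   # 71.5
print(median_decibels(resolve_volume("200%", div_volume)))   # 84.0
```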

The advantage of defining volume in this way is that it permits the same style sheet to serve in different environments. For example, the settings for 0 and 100 will be different in a library than they will be in a car, but the values will effectively correspond to the same intended auditory effects in each setting.

14.3.3 Giving Voice

To this point, we've talked about ways to affect the aural presentation, but what we've left out is a way to choose the voice used to aurally render content. For this, CSS defines a property called voice-family, which works much like font-family.

voice-family

Values

[[<specific-voice> | <generic-voice> ],]* [<specific-voice> | <generic-voice> ] | inherit

Initial value

user agent-dependent

Applies to

all elements

Inherited

yes

Computed value

as specified

As with font-family, voice-family allows the author to supply a comma-separated list of voices that can be used in the rendering of an element's content. The user agent looks for the first voice in the list and uses it if the voice is available. If not, the user agent looks for the next voice in the list, and so on, until it either finds a specific voice or runs out of specified voices.

Thanks to the way the value syntax is defined, you can provide a number of specific or generic families in any order. Therefore, you can end your value with a specific family instead of a generic one. For example:

h1 {voice-family: Mark, male, Joe;}
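
The fallback process can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of any real user agent: choose_voice is a made-up name, generic families are treated as ordinary names the user agent either offers or doesn't, and quoted voice names are not handled.

```python
def choose_voice(voice_family: str, available_voices: set):
    """Return the first listed voice the user agent can supply, if any."""
    for name in (part.strip() for part in voice_family.split(",")):
        if name in available_voices:
            return name
    # Nothing matched: the user agent falls back to its own default voice.
    return None


print(choose_voice("Mark, male, Joe", {"Joe", "male"}))  # male
print(choose_voice("Mark, male, Joe", {"Joe"}))          # Joe
print(choose_voice("Mark, male, Joe", set()))            # None
```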

CSS2.x does not define generic family values, but mentions that male, female, and child are all possible. Therefore, you might style the elements of an XML document as follows:

rosen {voice-family: Gary, Scott, male;}

guild {voice-family: Tim, Jim, male;}

claud {voice-family: Donald, Ian, male;}

gertr {voice-family: Joanna, Susan, female;}

albert {voice-family: Bobby, Paulie, child;}

The actual voice chosen to render a given element will affect the way the user perceives that element, since some voices will be pitched higher or lower than others, or may be more or less monotonic. CSS provides ways to affect these aspects of a voice as well.

14.3.4 Altering the Voice

Once you've gotten the user agent to use a particular voice in the aural rendering of the content, you might want to alter some of its aspects. For example, a voice might sound right, except it's pitched too high for your liking. Another voice might be a little too "dynamic" but otherwise meet your needs. CSS defines properties to affect all of the vocal aspects.

14.3.4.1 Changing the pitch

Obviously, different voices have different pitches. To pick the most basic of examples, male voices average around 120Hz, whereas female voices average in the vicinity of 210Hz. Thus, every voice family will have its own default pitch. CSS allows authors to alter this pitch using the property pitch.

There is no explicit definition of the keywords x-low through x-high, so the most that can be said about them is that each one will be a higher pitch than the one before it. This is similar to the way the font-size keywords xx-small through xx-large are not precisely defined, but each must be larger than the one before it.

Frequency values are a different matter. If you define an explicit pitch frequency, then the voice will be altered so that its average pitch matches the value you supply. For example:

h1 {pitch: 150Hz;}

The effects can be dramatic if an unexpected voice is used. Let's consider an example where an element is given two voice-family possibilities and a pitch frequency:

h1 {voice-family: Jethro, Susie; pitch: 100Hz;}

For the purposes of this example, assume that the default pitch of "Jethro" is 110Hz, and the default pitch for "Susie" is 200Hz. If "Jethro" gets picked, then h1 elements will be read with the voice pitched slightly lower than normal. If "Jethro" isn't available and "Susie" is used instead, there will be an enormous, and potentially bizarre, change from the voice's default.
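
One way to get a feel for how different those two outcomes are is to express each change as a musical interval. The Python sketch below does that; measuring in semitones is a convenience chosen here, not anything CSS defines, and the default pitches are the assumed values from the example.

```python
import math


def pitch_shift_semitones(default_hz: float, target_hz: float) -> float:
    """Size of the pitch change, in equal-tempered semitones (negative = lower)."""
    return 12 * math.log2(target_hz / default_hz)


# "Jethro" (assumed default 110Hz) lowered to 100Hz: under two semitones.
print(round(pitch_shift_semitones(110, 100), 2))  # -1.65
# "Susie" (assumed default 200Hz) lowered to 100Hz: a full octave.
print(pitch_shift_semitones(200, 100))            # -12.0
```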

Regardless of what pitch is used in an element's rendering, you can influence the dynamic range of the pitch by using the property pitch-range.

pitch-range

Values

<number> | inherit

Initial value

50

Applies to

all elements

Inherited

yes

Computed value

as specified

The purpose of pitch-range is to raise or lower the inflection in a given voice. The lower the pitch range, the closer all pitches will be to the average, resulting in a monotonic voice. The default value, 50, yields "normal" inflections. Values higher than that will increase the degree of "animation" in the voice.

14.3.4.2 Stress and richness

A companion property to pitch-range is stress, which is intended to help authors minimize or exaggerate the stress patterns in a language.

stress

Values

<number> | inherit

Initial value

50

Applies to

all elements

Inherited

yes

Computed value

as specified

Every human language has, to some degree, stress patterns. In English, for example, sentences have different parts that call for different stress. The previous sentence might look something like this:

<sentence>

  <primary>In English,</primary>

  <tertiary>for example,</tertiary>

  <secondary>sentences have different parts that call for 

different stress.</secondary>

</sentence>

A style sheet defining stress levels for each portion of the sentence might say:

primary {stress: 65;}

secondary {stress: 50;}

tertiary {stress: 33;}

This leads to a decrease in stress for the less important parts of a sentence, and a greater stress on the parts that are considered more important. stress values are language-dependent, so the same value may lead to different stress levels and patterns. CSS does not define such differences (which probably doesn't surprise you by now).

Similar in many ways to stress is richness.

richness

Values

<number> | inherit

Initial value

50

Applies to

all elements

Inherited

yes

Computed value

as specified

The higher a voice's richness value, the greater its "brightness" and the more it will "carry" in a room. Lower values will lead to a softer, more "mellifluous" voice (to quote the CSS2 specification). Thus, an actor's soliloquy might be given richness: 80; and a sotto voce aside might get richness: 25;.

14.3.5 Pauses and Cues

In visual design, it's possible to draw extra attention to an element by giving it extra margins to separate it from everything else or by adding borders. This causes the eye to be drawn toward these elements. In aural presentation, the closest equivalent is the ability to insert pauses and audible cues around an element.

14.3.5.1 Pauses

All spoken language relies on pauses of some form. The short gaps between words, phrases, and sentences are as critical to understanding the meaning as the words themselves. In a sense, pauses are like the auditory equivalent of margins, in that both serve to separate the element from its surrounding content. In CSS, three properties can be used to insert pauses into a document: pause-before, pause-after, and pause.

pause-before, pause-after

Values

<time> | <percentage> | inherit

Initial value

0

Apply to

all elements

Inherited

no

Computed value

the absolute time value

With the <time> value format, you can express the length of a pause in either seconds or milliseconds. For example, let's say you want a full two-second pause after an h1 element. Either of the following rules would have that effect:

h1 {pause-after: 2s;}

h1 {pause-after: 2000ms;}  /* the same length of time as '2s' */

Percentages are a little trickier, as they are calculated in relation to a measure-implied value of speech-rate. No, really! Let's see how this works. First, consider the following:

h1 {speech-rate: 180;}

This means any h1 element will be aurally rendered at about three words per second. Now consider:

h1 {speech-rate: 180; pause-before: 200%;}

The percentage is calculated based on the average word length. In this case, a word will take 333.33 milliseconds to speak, so 200% of that is 666.66 milliseconds. Put another way, there will be a pause before each h1 of about two-thirds of a second. If you alter the rule so the speech-rate value is 120, the pause will be a full second long.
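
The calculation is easy to verify with a short Python sketch. The function name pause_milliseconds is invented for illustration, and it assumes, as described above, that a percentage is taken relative to the time one average word takes at the current speech-rate.

```python
def pause_milliseconds(percentage: float, speech_rate_wpm: float) -> float:
    """Length of a percentage-based pause for a given speech-rate."""
    ms_per_word = 60_000 / speech_rate_wpm  # one minute shared among the words
    return ms_per_word * percentage / 100


print(round(pause_milliseconds(200, 180), 2))  # 666.67
print(pause_milliseconds(200, 120))            # 1000.0
```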

The shorthand pause brings together pause-before and pause-after.

pause

Values

[[<time> | <percentage>]{1,2} ] | inherit

Initial value

0

Applies to

all elements

Inherited

no

Computed value

see individual properties (pause-before, etc.)

If you supply only one value, then it's taken as the pause value both before and after an element. If you supply two values, then the first one is the pause before the element, and the second one is the pause after. Thus, the following rules are all equivalent:

pre {pause: 1s;}

pre {pause: 1s 1s;}

pre {pause-before: 1s; pause-after: 1s;}

14.3.5.2 Cues

If pauses aren't enough to call attention to an element, you can insert audio cues before and after it, which are the auditory equivalent of borders. Like the pause properties, there are three cue properties: cue-before, cue-after, and cue.

cue-before, cue-after

Values

<uri> | none | inherit

Initial value

none

Applies to

all elements

Inherited

no

Computed value

for <uri> values, the absolute URI; otherwise, none

By supplying the URI of an audio resource, the user agent is directed to load that resource and play it before (or after) an element. Suppose you want to precede each unvisited hyperlink in a document with a chime, and every visited link with a beep. The rules would look something like this:

a:link {cue-before: url(chime.mp3);}

a:visited {cue-before: url(beep.wav);}

The shorthand property cue acts as you'd expect.

cue

Values

[ <cue-before> || <cue-after> ] | inherit

Initial value

none

Applies to

all elements

Inherited

no

Computed value

see individual properties (cue-before, etc.)

As with pause, supplying a single value for cue means that value will be used for both the before and after cues. Two values means the first is used for the before cue, and the second is used for the after cue. Therefore, the following rules are all equivalent:

a[href] {cue: url(ping.mp3);}

a[href] {cue: url(ping.mp3) url(ping.mp3);}

a[href] {cue-before: url(ping.mp3); cue-after: url(ping.mp3);}

14.3.5.3 Pauses, cues, and generated content

Both pauses and cues are played "outside" any generated content. Consider:

h1 {cue: url(trumpet.mp3);}

h1:before {content: "Behold! ";}

h1:after {content: ". Verily!";}



<h1>The Beginning</h1>

The audio rendering of this element would be, roughly, "(trumpets) Behold! The Beginning. Verily! (trumpets)."

CSS does not specify whether pauses go "outside" cues or vice versa, so the behavior of auditory user agents in this regard cannot be predicted.

14.3.6 Background Sounds

Visual elements can have backgrounds, so it's only fair that audible elements should be able to have backgrounds as well. In the aural medium, this is accomplished by playing a sound while the element is being spoken. The property used to accomplish this is play-during.

play-during

Values

<uri> [mix || repeat]? | auto | none | inherit

Initial value

auto

Applies to

all elements

Inherited

no

Computed value

for <uri> values, the absolute URI; otherwise, as specified

The simplest example is playing a single sound at the beginning of an element's aural rendering:

h1 {play-during: url(trumpets.mp3);}

Given this rule, any h1 element would be spoken while the sound file trumpets.mp3 plays at the same time. The sound file is played once. If it is shorter than the time it takes to speak the element's contents, then it stops before the element is finished. If it is longer than the necessary time, then the sound stops once all of the element's content has been spoken.

If you want a sound to repeat throughout the entire speaking of an element, add the keyword repeat. This is the auditory equivalent of background-repeat: repeat:

div.ocean {play-during: url(wave.wav) repeat;}

Like visible backgrounds, background sounds do not composite by default. Consider the following situation:

a:link {play-during: url(chains.mp3) repeat;}

em {play-during: url(bass.mp3) repeat;}



<a href="http://www.example.com/">This is a <em>really great</em> site!</a>

What will happen is that chains.mp3 will play repetitively behind the text of the link, except for the text of the em element. For that text, the chains will not be heard, but instead bass.mp3 will be heard. The parent's background sound is not heard, just as its background would not be seen behind the em element if both elements had visible backgrounds.

If you want to combine the two, the keyword mix comes into play:

a:link {play-during: url(chains.mp3) repeat;}

em {play-during: url(bass.mp3) repeat mix;}

Now chains.mp3 will be heard behind all of the link text, including the text in the em element. For that element, both chains.mp3 and bass.mp3 will be heard mixed together.

The analogy with visible backgrounds breaks down with the value none. This keyword cuts off all background sounds, including any that may belong to any ancestor elements. Thus, given the following rules, the em text will have no background sounds at all—neither bass.mp3 nor chains.mp3 will be heard:

a:link {play-during: url(chains.mp3) repeat;}

em {play-during: none;}



<a href="http://www.example.com/">This is a <em>really great</em> site!</a>
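
The way replacing, mixing, and none interact can be summarized in a short Python sketch. The function audible_sounds and its input format are hypothetical: each entry stands for the play-during value of one element, listed from the outermost ancestor down to the element being spoken, and repeat is ignored because it does not change which sounds are heard.

```python
def audible_sounds(chain):
    """Return the background sounds heard while the innermost element is spoken.

    Each entry in chain is "auto", "none", or a (uri, mix) tuple.
    """
    playing = []
    for value in chain:
        if value == "auto":
            continue  # the ancestor's sound simply keeps playing
        if value == "none":
            playing = []  # silence, including any ancestor sounds
        else:
            uri, mix = value
            # With mix, the new sound joins the ancestors'; without it, it replaces them.
            playing = playing + [uri] if mix else [uri]
    return playing


link = ("chains.mp3", False)
print(audible_sounds([link, ("bass.mp3", False)]))  # ['bass.mp3']
print(audible_sounds([link, ("bass.mp3", True)]))   # ['chains.mp3', 'bass.mp3']
print(audible_sounds([link, "none"]))               # []
```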

14.3.7 Positioning Sounds

When only one person is speaking, the sound emanates from one point in space, unless of course that person is moving around. In a conversation involving multiple people, the sound of each voice will come from a different point in space.

With the availability of high-end audio systems and 3D sound, it should be possible to position sounds within that space. CSS2.x defines two properties to accomplish this, one of which defines the angle of a sound's source on a horizontal plane, and the second of which defines the source's angle on a vertical plane. The placement of sounds along the horizontal plane is handled using azimuth.

azimuth

Values

<angle> | [[ left-side | far-left | left | center-left | center | center-right | right | far-right | right-side ] || behind ] | leftwards | rightwards | inherit

Initial value

center

Applies to

all elements

Inherited

yes

Computed value

normalized angle

Angle values can come in three units: deg (degrees), grad (grads), and rad (radians). The possible ranges for these unit types are 0-360deg, 0-400grad, and 0-6.2831853rad. Negative values are permitted, but they are recalculated as positive values. For example, -45deg is equivalent to 315deg (360-45), and -50grad would be the same as 350grad (400-50).
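
Normalizing an angle amounts to wrapping it into one full circle for its unit. Here is a minimal Python sketch of that idea; the function names are invented for illustration, and it relies on Python's % operator returning a non-negative result when the divisor is positive.

```python
import math

# Size of a full circle in each CSS angle unit.
FULL_CIRCLE = {"deg": 360.0, "grad": 400.0, "rad": 2 * math.pi}


def normalize_angle(value: float, unit: str) -> float:
    """Wrap an angle into the range from 0 up to one full circle, same unit."""
    return value % FULL_CIRCLE[unit]


def to_degrees(value: float, unit: str) -> float:
    """Normalize an angle and express it in degrees."""
    return normalize_angle(value, unit) / FULL_CIRCLE[unit] * 360.0


print(normalize_angle(-45, "deg"))   # 315.0
print(normalize_angle(-50, "grad"))  # 350.0
print(to_degrees(-50, "grad"))       # 315.0
```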

Most of the keywords are simply equivalents of angle values. These are shown in Table 14-1, using degrees as the angle value of choice, and illustrated visually in Figure 14-11. The last column of Table 14-1 shows the equivalents of the keywords in the first column being used in conjunction with behind.

Table 14-1. azimuth keyword and angle equivalents

Keyword         Angle               Behind
center          0                   180deg / -180deg
center-right    20deg / -340deg     160deg / -200deg
right           40deg / -320deg     140deg / -220deg
far-right       60deg / -300deg     120deg / -240deg
right-side      90deg / -270deg     90deg / -270deg
center-left     340deg / -20deg     200deg / -160deg
left            320deg / -40deg     220deg / -140deg
far-left        300deg / -60deg     240deg / -120deg
left-side       270deg / -90deg     270deg / -90deg

Figure 14-11. The horizontal plane, seen from above

Note that the keyword behind cannot be combined with an angle value. It can be used only in conjunction with one of the keywords listed in Table 14-1.

There are two keywords in addition to those listed in Table 14-1: leftwards and rightwards. The effect of the former is to subtract 20deg from the current angle value of azimuth, and the latter adds 20deg to the value. For example:

body {azimuth: right-side;}  /* equivalent to 90deg */

h1 {azimuth: leftwards;}

The computed angle value of azimuth for the h1 element is 70deg. Now consider the following situation:

body {azimuth: behind;}  /* equivalent to 180deg */

h1 {azimuth: leftwards;}  /* computes to 160deg */

The effect of leftwards, given these rules, will be to make the sound move to the right, not the left. It's strange, but that's how CSS2 is written. Similarly, using rightwards in the previous example would cause the h1 element's sound source to move 20 degrees to the left, since its azimuth would be computed as 200deg.
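
The relative keywords are plain arithmetic on the inherited angle, which is why they behave oddly behind the listener. The Python sketch below shows this; the helper side is a made-up convenience that reports which half of the horizontal plane an angle falls in.

```python
def leftwards(azimuth_deg: float) -> float:
    """Subtract 20 degrees from the current azimuth, wrapping around the circle."""
    return (azimuth_deg - 20) % 360


def rightwards(azimuth_deg: float) -> float:
    """Add 20 degrees to the current azimuth, wrapping around the circle."""
    return (azimuth_deg + 20) % 360


def side(azimuth_deg: float) -> str:
    """Report the listener's side for an angle between 0 and 360 degrees."""
    if 0 < azimuth_deg < 180:
        return "right"
    if 180 < azimuth_deg < 360:
        return "left"
    return "center line"  # exactly in front (0) or exactly behind (180)


print(leftwards(90))                         # 70
print(leftwards(180), side(leftwards(180)))    # 160 right
print(rightwards(180), side(rightwards(180)))  # 200 left
```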

Much like azimuth, only simpler, is elevation, which places sounds in the vertical plane.

Like azimuth, elevation accepts degree, grad, and radian angles. The three angle-equivalent keywords are above (90 degrees), level (0), and below (-90 degrees). These are illustrated in Figure 14-12.

Figure 14-12. The vertical plane, seen from the right side

The relative-placement keywords, higher and lower, either add or subtract 10 degrees from the current elevation angle. Therefore, in the following example, h1 elements that are children of the body will be placed 10 degrees above the horizontal plane:

body {elevation: level;}  /* equivalent to 0 */

body > h1 {elevation: higher;}
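
Here is a minimal Python sketch of how elevation values might be resolved. It handles only keywords and degree values, the function name is invented, and clamping the result to the range -90deg through 90deg is an assumption about how a user agent would treat higher and lower at the extremes.

```python
ELEVATION_KEYWORDS = {"below": -90.0, "level": 0.0, "above": 90.0}
RELATIVE_STEP = 10.0  # degrees added or removed by higher/lower


def resolve_elevation(value: str, inherited: float = 0.0) -> float:
    """Compute an elevation angle in degrees from a keyword or a deg value."""
    if value == "higher":
        angle = inherited + RELATIVE_STEP
    elif value == "lower":
        angle = inherited - RELATIVE_STEP
    elif value in ELEVATION_KEYWORDS:
        angle = ELEVATION_KEYWORDS[value]
    else:
        angle = float(value[:-3]) if value.endswith("deg") else float(value)
    # Assumption: keep the result between straight down and straight up.
    return max(-90.0, min(90.0, angle))


body = resolve_elevation("level")
print(resolve_elevation("higher", body))  # 10.0
print(resolve_elevation("higher", 85.0))  # 90.0
```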

14.3.7.1 Combining azimuth with elevation

When values for azimuth and elevation are taken together, they define a point in an imaginary sphere whose center is the user. Figure 14-13 illustrates this sphere, along with some cardinal points and the values that would place sounds in those positions.

Figure 14-13. Three-dimensional aural space

Imagine that as you sit in a chair, there is a point halfway between straight ahead and your right, and halfway between the horizon and the zenith. This point could be described as azimuth: 45deg; elevation: 45deg;. Now imagine a sound source at the same elevation but located halfway between your left and a point directly behind you. This source could be described in any of the following ways:

azimuth: -135deg; elevation: 45deg;

azimuth: 225deg; elevation: 45deg;

azimuth: left behind; elevation: 45deg;  /* approximate: 'left behind' is 220deg */
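
To see how the two angles combine into a single position, you can convert them to a point on a unit sphere around the listener. The Python sketch below uses an axis convention chosen purely for illustration (x to the listener's right, y straight ahead, z up); CSS does not define any coordinate system.

```python
import math


def sound_position(azimuth_deg: float, elevation_deg: float):
    """Convert azimuth and elevation into (x, y, z) on a unit sphere.

    Assumed axes: x points right, y points straight ahead, z points up.
    """
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    x = math.cos(elevation) * math.sin(azimuth)
    y = math.cos(elevation) * math.cos(azimuth)
    z = math.sin(elevation)
    return (x, y, z)


# The same point expressed with a negative and a positive azimuth.
print(sound_position(-135, 45))
print(sound_position(225, 45))
```

Both calls describe a point behind and to the left of the listener (negative x and y) and above the horizontal plane (positive z).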

It is entirely possible that positioned sounds would be of assistance to a user in separating cues from other audio sources, or to create positionally separate special material:

a[href] {cue: url(ping.wav); azimuth: behind; elevation: 30deg;}

voices.onhigh {play-during: url(choir.mp3); elevation: above;}
