Glossary

Ambitus: The ambitus of a voice is the range, or the distance, between the highest and the lowest singable note. The staff view can show the typical ambitus of speaking and singing voices and of singable overtones.

Amplitude: The amplitude is the maximum value of a signal over a given period of time. This correlates to the intensity and to the perceived loudness of a signal. It has no unit, but is scaled into the range [-1, 1], where -1 and 1 represent the largest values that a particular file format can encode.

Analyzer View: The Analyzer View is the central window in VoceVista and contains one or two sub-windows that can show the Spectrogram, the Spectrum, or both. More about the Analyzer View

Auto Marker: An Auto Marker is a type of Marker that is automatically created for each recorded segment. In other words, every time you press record, and then stop, a new auto marker is created to mark the recorded time period.

Bit Depth: Same as Sample Size.

Cent: A cent is one hundredth of a semitone of the tempered scale, that is, one hundredth of the distance between two adjacent keys on the piano. In other words, two consecutive keys on the piano (regardless of whether they are black or white) are 100 cents apart. The cent is used to measure very small intervals. One octave is divided into 1200 cents.
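
As a rough illustration of the definition above, the interval between two frequencies can be converted to cents with a base-2 logarithm, since an octave (a frequency ratio of 2:1) is 1200 cents. The function name is made up for this sketch and is not part of VoceVista.

```python
import math

def cents_between(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents (positive if f2 is the higher tone)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

# One octave: a doubling of the frequency.
octave = cents_between(220.0, 440.0)

# One tempered semitone: a frequency ratio of 2 ** (1 / 12).
semitone = cents_between(440.0, 440.0 * 2 ** (1 / 12))
```

Here `octave` comes out as 1200 cents and `semitone` as 100 cents, up to floating-point rounding.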

Clipping: Clipping is the effect when parts of the recorded audio signal are too loud to be represented by the used sample format, and are therefore cut off. For example, the audio format may be able to represent sample values between -1.0 and 1.0. If the incoming signal contains values larger than 1.0, they will all be set to 1.0, which causes a loss of information, and a distortion of the signal.
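
A minimal sketch of the effect, assuming a sample format limited to the range -1.0 to 1.0 as in the example above. The helper name `clip` is hypothetical.

```python
def clip(samples, limit=1.0):
    """Force every sample into the range [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in samples]

signal = [0.2, 0.9, 1.4, 1.7, -1.2, -0.5]
clipped = clip(signal)
# 1.4 and 1.7 both end up as 1.0, so the difference between them is lost.
```

Values inside the range pass through unchanged. Only the peaks are flattened, which is what distorts the signal.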

dB: Short for Decibel.

Decibel: The decibel is a logarithmic unit that indicates the ratio of an intensity relative to a reference level. When used to represent the intensity of an audio signal or of individual frequency components, the reference level is 0dB, which represents the loudest sound that can be encoded in a particular file format. A decibel value of 0dB equals an amplitude of 1. All intensities that are smaller than the loudest reference level have a negative decibel value. The available range depends on the bit depth of the file format. With 16 bit, the smallest intensity that can be represented is -90dB, and with 24 bit, it is approximately -140dB.
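
The relation between amplitude and decibel value can be sketched as follows. This assumes the common convention of 20 times the base-10 logarithm of the amplitude, with full scale (amplitude 1) as the reference, and signed integer samples for the bit depth estimate. Both function names are illustrative only.

```python
import math

def amplitude_to_db(amplitude):
    """Level in dB relative to full scale (amplitude 1.0 gives 0 dB).

    An amplitude of exactly 0 has no finite dB value and raises ValueError.
    """
    return 20.0 * math.log10(abs(amplitude))

def smallest_step_db(bits):
    """Level of the smallest non-zero value of a signed integer sample format."""
    return amplitude_to_db(1.0 / 2 ** (bits - 1))

full_scale = amplitude_to_db(1.0)   # 0 dB
half_scale = amplitude_to_db(0.5)   # roughly -6 dB
floor_16 = smallest_step_db(16)     # roughly -90 dB
floor_24 = smallest_step_db(24)     # roughly -138 dB
```

Under these assumptions the results are close to the -90dB and approximately -140dB quoted above.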

Dynamic Range: The dynamic range is the ratio between the largest and the smallest value that can be represented by a given format. The dynamic range is typically measured in decibels. In digital audio, common dynamic range values are approximately 90dB (for 16-bit audio) and 140dB (for 24-bit audio).

EGG: Short for Electroglottograph.

Electroglottograph (EGG): The Electroglottograph (or "EGG") is a small device that makes it possible to estimate the closing and opening of the glottis (Wikipedia), the opening between the vocal folds.

Equal Temperament: A tuning system that divides the octave into twelve equal semitones, each 100 cents apart. It is the default tuning of the modern piano and most Western music. Tones in this system are referred to as tempered tones; in VoceVista, the snap-to-note feature snaps slider lines to the nearest tempered tone.

FFT Window Function: The window function is a set of coefficients between 0 and 1 that are multiplied with a sequence of samples before taking the FFT of this sequence. The purpose of this is to reduce mathematical artifacts in the spectrum arising from discontinuities between the beginning and the end of the signal. Read more about the FFT Window function on Wikipedia.
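
As an example of such a set of coefficients, the sketch below builds a Hann window, one common choice among many. Which window functions VoceVista offers is not stated here, so treat the choice and the function names as assumptions.

```python
import math

def hann_window(n):
    """Symmetric Hann window of length n (n >= 2), values between 0 and 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(samples, window):
    """Multiply each sample by the matching window coefficient."""
    return [s * w for s, w in zip(samples, window)]

window = hann_window(8)
windowed = apply_window([1.0] * 8, window)
```

The coefficients fall to zero at both ends, so the windowed block starts and ends at zero and the discontinuity between its beginning and end disappears.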

Fast Fourier Transform: The FFT is a mathematical process that converts a series of samples in the time domain (such as a digital audio recording) into a list of frequencies and their intensity. VoceVista can run FFTs at up to 2²⁰ (1,048,576) bins in the Pro edition; see the high-resolution audio spectrum analyzer overview.
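
To show what the transform produces, here is a deliberately naive discrete Fourier transform in plain Python. An FFT computes the same result much faster, so this is only a sketch for small inputs and not how VoceVista is implemented. The sampling rate, block size, and test tone are made-up values.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) DFT. Returns the scaled magnitude of bins 0 to n/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags

sample_rate = 64   # samples per second (illustrative)
n = 64             # number of samples in the block
tone_hz = 8        # frequency of the test sine wave

signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(n)]
mags = dft_magnitudes(signal)

# The bin with the largest magnitude, converted back to a frequency in Hz.
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / n
```

For this test tone the strongest bin corresponds to 8 Hz, the frequency of the sine wave that went in.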

File / Marker List: The File / Marker List is a window that lists the Markers of the current file. It can also show a list of recently used files, or a list of search results. It can also be used to add and edit markers and marker descriptions.

File Description: The File Description is a special type of Marker that is automatically added to every file. Each file has a description, which is the first entry in the marker list, and which has the round information icon as symbol. It can be used to add a description to the file (such as what it contains, when it was recorded, where, with whom, and any other relevant information).

Filter: Short for Frequency Filter.

Formant: A formant is a peak in the spectrum resulting from harmonics meeting a resonance. The literature on the voice does not always clearly distinguish between formants, resonances, and overtones. Overtones are frequency components of a sound that may be amplified by the vocal tract if they match the frequency of a resonance, and the resulting peak could be called a formant.

Frequency: The frequency is the number of cycles per second. The unit of frequency is the Hertz (Hz). The frequency of a sound wave determines its pitch.

Frequency Domain: A representation of a signal as a function of frequency rather than time. The Spectrum is a frequency-domain view: it shows which frequencies are present in the signal and how strongly, but not when each event occurred. Compare to Time Domain.

Frequency Filter: Frequency Filters are a tool to isolate individual parts of a recording in the frequency domain and make them louder or quieter. This makes it possible, for example, to listen only to specific frequencies in a recording, or to remove them entirely.

Frequency Resolution: The frequency resolution of the Spectrum is the difference in Hz between two frequencies that the analyzer can distinguish. The frequency resolution can be set on the Analyzer Settings page. Smaller values show more detail in the Spectrum and Spectrogram, but they also require more processing power and can make the program slower. For a discussion of how resolution depends on FFT size and sampling rate, with numerical examples up to 2²⁰ bins, see the high-resolution audio spectrum analyzer page.
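
A small sketch of the dependency on FFT size and sampling rate, assuming the usual relation that the spacing between FFT bins is the sampling rate divided by the FFT size. The window function can widen the frequencies that are distinguishable in practice, and how VoceVista defines its own setting is not stated here. The function names are hypothetical.

```python
def bin_width_hz(sample_rate_hz, fft_size):
    """Spacing between neighbouring FFT bins in Hz."""
    return sample_rate_hz / fft_size

def window_duration_s(sample_rate_hz, fft_size):
    """Length of audio, in seconds, that one FFT of this size covers."""
    return fft_size / sample_rate_hz

# (FFT size, bin width in Hz, duration in seconds) at 44100 samples per second.
table = [
    (size, bin_width_hz(44100, size), window_duration_s(44100, size))
    for size in (2 ** 12, 2 ** 16, 2 ** 20)
]
```

The two quantities are reciprocal: a larger FFT gives narrower bins but needs a longer stretch of audio, which is the trade-off described under Time Resolution.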

Fundamental: For a tone that has multiple harmonic components, the fundamental tone is the frequency that forms the base of an overtone scale that contains all these harmonics. In most cases the fundamental is the pitch that a human listener will identify when hearing the tone.

Harmonic: Harmonic is another word for overtone, with one small difference: Harmonics are counted such that the fundamental is the first harmonic, while overtones are counted such that the first overtone is the second harmonic.

Harmonic Series: The harmonic series is the set of frequencies that are all integer multiples of a fundamental frequency.
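
The series, and the difference between counting harmonics and counting overtones, can be sketched as follows. The function name is illustrative.

```python
def harmonic_series(fundamental_hz, count):
    """First `count` harmonics. Harmonic 1 is the fundamental itself."""
    return [fundamental_hz * n for n in range(1, count + 1)]

harmonics = harmonic_series(100.0, 5)

# Overtones skip the fundamental: overtone k is harmonic k + 1.
overtones = harmonics[1:]
```

With a fundamental of 100Hz, the first harmonic is 100Hz, while the first overtone is 200Hz.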

Hertz (Hz): Hertz is the unit of frequency to indicate the number of cycles per second of a periodic phenomenon. It is named after the German physicist Heinrich Hertz.

Intensity: The intensity is a measure of how loud or strong a signal is. The Waveform shows the intensity of the entire recording for each point in time, while the Spectrum shows the intensities of the individual frequency components. The intensity can be expressed as amplitude, or in decibels.
The intensity is not identical to the loudness of the whole signal or of the frequency components, because the human ear perceives different frequencies differently. For example, if two tones are played with the same intensity, one with 100Hz, and the other with 1000Hz, a human listener might hear one as louder than the other, even though they have the same amplitude when leaving the speaker. The intensity that VoceVista can show is therefore not the loudness experienced by a human listener, but the sound pressure level recorded by the microphone.

Intonation: How accurately a singer or instrumentalist produces the intended pitches in a piece of music. Good intonation means the notes match the chosen tuning system closely enough that the music sounds in tune. VoceVista makes intonation visible by showing the pitch of each note against musical notation and by reporting deviation from the nearest tempered tone in cents.

IPA: Short for International Phonetic Alphabet, a standardized set of symbols for representing the sounds of spoken language. VoceVista uses IPA symbols on the Vowel Chart to label vowels independently of any specific language’s spelling.

Lin: Short form of linear. Opposite of logarithmic. On a linear scale, numbers with the same distance have the same difference.

Log: Short form of logarithmic. Opposite of linear. A log scale can be useful to display numbers that range from very small to very large, especially values that represent quantities perceived by humans. On a log scale, numbers with the same distance to each other have the same ratio, whereas on a linear scale, numbers with the same distance have the same difference. The piano has a log scale. All octaves are the same distance apart, as each octave is a doubling of the frequency. If the piano is projected on a linear scale, the piano keys become progressively wider.
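
The difference between the two scales can be made concrete by computing where a frequency lands on an axis. This is a sketch with made-up function names, and the frequency range is just an example.

```python
import math

def log_position(freq_hz, low_hz, high_hz):
    """Position between 0 and 1 on a logarithmic axis from low_hz to high_hz."""
    return math.log(freq_hz / low_hz) / math.log(high_hz / low_hz)

def lin_position(freq_hz, low_hz, high_hz):
    """Position between 0 and 1 on a linear axis from low_hz to high_hz."""
    return (freq_hz - low_hz) / (high_hz - low_hz)

octaves = [110.0, 220.0, 440.0, 880.0]  # each tone is one octave above the last
log_pos = [log_position(f, 110.0, 880.0) for f in octaves]
lin_pos = [lin_position(f, 110.0, 880.0) for f in octaves]
```

On the log axis the four tones are evenly spaced, because each step has the same ratio. On the linear axis each octave takes up twice as much room as the one below it.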

Long-Term view: The Long-Term view is part of the Analyzer View and shows things that span a relatively long range of time, such as a Spectrogram, a melody, or a musical piece. The Long-Term view has a frequency scale and a time scale.

LTAS: Short for Long-Term Average Spectrum. An LTAS is computed by averaging many spectra over an extended period of a recording, often the entire recording or a long marked section. This averaging smooths out the detail of individual notes and reveals the overall spectral fingerprint of a voice or instrument, such as the singer’s formant cluster around 3 kHz, or the typical balance of low and high harmonics. More about LTAS
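
The averaging step can be sketched as a bin-by-bin mean over a list of spectra. Whether the average is taken over linear magnitudes, power, or decibel values is a design choice that is not specified here. This sketch simply averages whatever numbers it is given, and the data is invented.

```python
def long_term_average(spectra):
    """Average several equal-length spectra bin by bin."""
    count = len(spectra)
    return [sum(column) / count for column in zip(*spectra)]

# Three short spectra with three frequency bins each (illustrative values).
frames = [
    [0.0, 1.0, 0.2],
    [0.0, 0.5, 0.4],
    [0.0, 0.0, 0.6],
]
ltas = long_term_average(frames)
```

The second bin varies strongly from frame to frame, but the average keeps only its typical level.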

MIDI: Short for Musical Instrument Digital Interface, a standard protocol to encode messages to electronic instruments. In VoceVista, MIDI output is used to play the keys of the piano keyboard and of Note Sliders. It can be sent to the standard MIDI synthesizer that is part of the operating system, or it can be sent to external instruments connected to the computer. MIDI is also used as a file format to store a musical piece as a sequence of notes.

Marker: A marker marks a specific point in time, or a time range, in a recording. It can hold text to name and describe the area of interest. Markers can be used as searchable bookmarks to easily find specific points in a recording, and to add comments and notes. There are four types of markers: Auto Markers, Range Markers, Point Markers, and the File Description. More about Markers

Mono: A mono recording has one channel, for example the input of a single microphone.

Note: In music, a note is a single pitched sound of a defined duration, and the written symbol that represents it. Within VoceVista, "note" may refer to (1) the musical note as written on the Staff View, (2) a Note Slider overlaying the spectrogram, or (3) an entry in a MIDI sequence — all of which share the same underlying idea of a pitch with a time extent.

Note Slider: A Note Slider is a visual tool that is laid over the Spectrogram. Each slider has a fundamental frequency and, optionally, a start and end time, so it can be understood as a musical note — much like a note in a MIDI sequence. A slider can be played as a sound. It can also be drawn out to show the overtones and undertones of its fundamental frequency, which makes it useful for highlighting a specific note, illustrating principles of music theory and acoustics, or transcribing a piece of music. Note Sliders are sometimes referred to simply as Notes, and were called Overtone Sliders in earlier versions.

Octave: An octave is the interval between two pitches whose frequencies are in a 2:1 ratio. A note one octave above another sounds like the "same note" at a higher pitch and shares its letter name (e.g., A3 and A4). Because the ear perceives pitch logarithmically, equal octaves appear as equal distances on a logarithmic frequency scale and on the piano keyboard.

Oscilloscope: A display that shows how a signal changes over time on a two-dimensional graph, where one axis is time, and the other axis is the intensity of the signal. In VoceVista, an oscilloscope display can be shown by zooming very far into the Waveform View.

Overtone: An overtone is a tone that relates to a specific fundamental tone. Each overtone has a frequency that is a whole multiple of the fundamental frequency. For example, if the fundamental has a frequency of 100Hz, its overtones have 200Hz, 300Hz, etc. Also called harmonic, or partial tone.

Overtone Slider: Earlier name for Note Slider. The two terms refer to the same visual tool; Note Slider is the current name.

Partial tone: Another word for overtone.

Pitch: Pitch is a perceptual property of a sound that corresponds to the fundamental frequency of a tone. Pitch makes it possible to classify tones as higher or lower. Pitch is not a purely objective physical property because a human listener may perceive the pitch of a tone differently from its measurable fundamental frequency. More about Pitch

Playback Cursor: Another word for Time Cursor, especially during Playback.

Point Marker: A Point Marker is a type of Marker which marks a specific point in time and has no range.

Profiles: Profiles are a set of user settings that can be stored and retrieved. Profiles can contain most settings that can be changed by the user, such as the range of the frequency scale, the arrangement of toolbar buttons, or the display configuration. When a profile is saved, the current state of those settings is written into the profile. When the profile is later activated, all affected settings will be set to the value in the profile.

Range Marker: A Range Marker is a type of Marker that marks a period of time with a beginning and an end.

Resonance: A resonance is a property of the vocal tract with a specific frequency. The vocal tract has multiple resonances, each of which amplifies sound at or near its own frequency. The sound can come from the vocal folds, but it may also come from other sources.

Ruler: A ruler is a visual aid that marks a specific frequency or amplitude. Over the Spectrogram, rulers are similar to Note Sliders in that they represent a frequency. However, contrary to sliders, rulers have no label, no overtones, and cannot be played. They are simply a visual tool.

Sample: A single measurement of sound pressure, or amplitude. In a digital recording, sound is stored as a sequence of numbers. A sound wave travels through the air and moves the membrane of a microphone. The microphone converts this mechanical movement into an electrical current, and the sound card reads out this current many times per second and stores each sample as a number that can be further processed by the computer.

Sample Size: The number of bits of each sample in a digital recording. Common values are 16, 24 and 32 bit. Larger values can represent a larger dynamic range of intensities. Also referred to as Bit Depth.

Sampling Rate: The number of discrete measurements (or samples) per second stored in a digital audio recording. The sampling rate determines the frequency range that can be represented by an audio file. The highest representable frequency is half the sampling rate. For example, in a file with a sampling rate of 44100 Hz, the highest frequency that can be displayed in the Spectrum is 22050 Hz.
Common values are 44100 samples per second for CD-Quality sound, or 48000, 96000 and 192000 samples per second for studio-quality sound.

Short-Term view: The Short-Term view is part of the Analyzer View and shows things that span a relatively short range of time, such as a single Spectrum. The Short-Term view has a frequency scale and an intensity scale. However, the intensity scale only applies to the Spectrum, and not to the pitch value.

Singer’s Formant: A cluster of formants in the range of approximately 2.5 - 3.5 kHz that is characteristic of trained classical singing voices. The clustering of formants in this region produces a strong peak in the spectrum that allows the voice to project over an orchestra without amplification, since orchestral sound has relatively little energy in that band. The singer’s formant is typically visible as a bump on the high end of the LTAS of an operatic voice.

Snapping: A behavior in VoceVista where a moved Note Slider line aligns to a nearby reference — most commonly the nearest tempered tone (snap to musical note) or the nearest peak in the spectrum (snap to spectral peak). Snapping can be enabled, disabled, or temporarily inverted with the Alt key while dragging. Useful both for setting a precise reference pitch and for measuring the actual frequency of a sung or played note.

Sound Generator: A built-in tool in VoceVista for synthesizing reference tones, glides, scales, and harmonic stacks directly from the program. Useful for ear training, illustrating acoustic phenomena, generating playback material to practice along with, and providing a controlled signal for testing analyzer settings. More about the Sound Generator

Spectrogram: The Spectrogram is a series of spectra. Whereas the Spectrum shows a single frequency-intensity diagram, the Spectrogram shows many such diagrams side by side. Therefore, the Spectrogram is a two-dimensional diagram where one axis shows time, and the other shows the frequency. The intensity of each frequency at a specific point in time is represented by the color of this point.

Spectrum: The Spectrum shows the strength of the individual frequency components in a piece of sound at a specific point in time. The Spectrum is a two-dimensional diagram, where one axis shows the frequency, and the other shows the intensity of each frequency.

Staff View: The Staff View shows a musical staff with treble and bass clefs. The location of the staff lines corresponds loosely to the location of the associated pitch on the frequency scale. When notes are played on the piano or the Note Sliders, they are shown as musical notes on the staff view.

Stereo: A stereo recording has two channels. To make a stereo recording, you need a recording device with two separate microphones. Stereo recordings are normally used to add depth to a recording by reproducing sound as a human listener would hear it with two ears. However, the two channels can also be used for different purposes, for example to record the sound from within an organ with one microphone, and the sound from the outside with another.

Subharmonic: A frequency that is a whole-number fraction of a fundamental tone — the same set of frequencies as undertones. The two terms are largely interchangeable; subharmonic emphasizes the mathematical relationship to the fundamental, while undertone emphasizes the audible tone. Subharmonics also appear as a vocal phenomenon (often called vocal fry or creaky voice) when the vocal folds vibrate in alternating cycles, producing pitches at 1/2 or 1/3 of the perceived fundamental.

Tempered Tone: A tone whose frequency matches one of the twelve notes of the equal-tempered scale.

Timbre: Often called tone color, timbre is the perceptual quality of a sound that lets us tell apart two notes of the same pitch and loudness — for example, the same note sung as an "ah" and an "ee", or played on a violin versus a flute. Timbre is determined by the relative strengths of the harmonics (and other components) in the sound, and by how those strengths change over time. In VoceVista, the Spectrum and Spectrogram are the main tools for visualizing timbre.

Time Cursor: Green line that indicates the time in the recording that is currently being played (or that will be played next). Also, when the Spectrogram and the Spectrum are both visible, the Time Cursor determines the time position of the Spectrum.

Time Domain: A representation of a signal as a function of time. The Waveform is the classic time-domain view: it shows the changing amplitude of the signal sample by sample, but does not directly reveal which frequencies are involved. Compare to Frequency Domain.

Time Range Slider: The Time Range Slider is a graphical interface element on the Timeline View that shows the current time range of the Spectrogram and the Waveform.

Time Resolution: The time resolution of the analyzer determines the length of a piece of a recording that the analyzer uses to calculate its Spectrum or pitch. A lower time resolution means that the analyzer can look at a longer piece of a recording. This will give more accuracy in the frequency domain at the expense of resolution in the time domain.

Timeline: The Timeline View shows an overview of the entire recording. It is similar to the Waveform View, except that the Timeline is zoomed out further than the Spectrogram and may show the whole recording, while the Waveform always shows the same time range as the Spectrogram.

Tuning: The process of adjusting an instrument or voice so that its notes match a chosen reference pitch system (most commonly equal temperament with A4 = 440 Hz). Also: the system itself, e.g. equal temperament tuning, just intonation tuning. VoceVista can be used as a precise tuning aid by showing the deviation of a played note from the nearest tempered tone in cents.
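
How such a deviation could be computed is sketched below, assuming equal temperament with A4 = 440 Hz. The note naming (MIDI number 69 for A4, middle C written as C4) is one common convention and an assumption of this sketch. It is not necessarily how VoceVista labels notes, and the function name is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_tempered_tone(freq_hz, a4_hz=440.0):
    """Return (note name, tempered frequency in Hz, deviation in cents)."""
    semitones_from_a4 = 12.0 * math.log2(freq_hz / a4_hz)
    nearest = round(semitones_from_a4)
    midi_number = 69 + nearest  # assumed convention: A4 is MIDI note 69
    name = NOTE_NAMES[midi_number % 12] + str(midi_number // 12 - 1)
    tempered_hz = a4_hz * 2.0 ** (nearest / 12.0)
    deviation_cents = 100.0 * (semitones_from_a4 - nearest)
    return name, tempered_hz, deviation_cents

# A tone played slightly sharp of A4.
name, tempered_hz, deviation = nearest_tempered_tone(445.0)
```

For 445 Hz the nearest tempered tone is A4 at 440 Hz, and the deviation is a little under 20 cents sharp. A negative deviation would mean the tone is flat.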

Undertone: An undertone is a tone that relates to a specific fundamental tone. Each undertone has a frequency that is the fundamental frequency divided by a whole number, so undertones follow the sequence 1/2, 1/3, 1/4, 1/5, etc. For example, if the fundamental has 100Hz, the undertones have the frequencies 50Hz, 33.33Hz, 25Hz, 20Hz, etc. Each undertone is a tone that has the reference tone as one of its overtones.
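
The sequence can be sketched in a few lines. The function name is illustrative only.

```python
def undertones(fundamental_hz, count):
    """First `count` undertones at 1/2, 1/3, 1/4, ... of the fundamental."""
    return [fundamental_hz / n for n in range(2, count + 2)]

series = undertones(100.0, 4)

# Multiplying each undertone by its divisor gives back the reference tone,
# which is why the reference tone is an overtone of every undertone.
back_to_reference = [u * n for u, n in zip(series, range(2, 6))]
```

For a reference tone of 100Hz this gives 50Hz, about 33.33Hz, 25Hz, and 20Hz, matching the example above.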

Vibrato: A periodic modulation of the pitch of a sustained tone, typically a few cycles per second. Vibrato gives a sung or played tone its characteristic warmth and is one of the things singers and instrumentalists train deliberately. VoceVista can visualize the rate (cycles per second), extent (depth in cents or semitones), and regularity of vibrato in the Analyzer View and the dedicated Vibrato View. More about Vibrato

Vocal Folds: Two folds of muscle and mucosa in the larynx that open and close to produce the buzzing source sound of the human voice. Often also called vocal cords. The frequency at which the vocal folds vibrate determines the pitch of the voice; the resulting source sound is then shaped by the resonances of the vocal tract into the recognizable timbre of speech and singing.

Vocal Tract: The air-filled space above the vocal folds — comprising the throat (pharynx), the mouth, and (when open to it) the nasal cavity — through which sound from the vocal folds passes before leaving the body. The shape of the vocal tract creates resonances that amplify some frequencies and attenuate others, producing the formants that define vowels and a singer’s individual sound.

Vowel: A speech sound produced with an open vocal tract, where its quality (a, e, i, o, u, ä, etc.) is determined by the shape of the tract — particularly the position of the tongue and lips. Each vowel corresponds to a characteristic pattern of formants; the first two formants alone are usually enough to distinguish most vowels, which is why the Vowel Chart is a two-dimensional plot of F1 against F2.

Vowel Chart: The vowel chart shows the first and second resonance frequencies of the oral cavity (sometimes called Formants) that are used in many languages to form a specific vowel. The chart is a two-dimensional diagram where one axis represents the first, and the other the second formant. The vowels are shown as symbols from the International Phonetic Alphabet (IPA).

Waveform: The Waveform View shows the samples of a digital recording. When the displayed time range is very small (in other words, when the view is zoomed in very far), the individual samples are shown, as on an oscilloscope. When the view is zoomed out, each pixel shows an aggregate with the maximum and minimum values of the samples contained in the time range corresponding to this pixel. The values in the vertical middle of the Waveform show the Root Mean Square (RMS) of the signal.
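
The zoomed-out aggregation can be sketched like this: the samples are split into one block per pixel column, and the minimum, maximum, and RMS of each block are computed. This illustrates the idea and is not VoceVista's actual drawing code. The function name and sample values are made up.

```python
import math

def summarize_pixels(samples, pixel_count):
    """Return one (min, max, rms) tuple per pixel column.

    Samples are split into equal blocks. Leftover samples at the end that do
    not fill a whole block are ignored in this simplified version.
    """
    chunk = max(1, len(samples) // pixel_count)
    columns = []
    for start in range(0, chunk * pixel_count, chunk):
        block = samples[start:start + chunk]
        if not block:
            break  # fewer samples than pixels
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        columns.append((min(block), max(block), rms))
    return columns

samples = [0.0, 0.5, -0.5, 1.0, -1.0, 0.25, -0.25, 0.0]
columns = summarize_pixels(samples, 2)
```

Each column keeps the extremes of its block, so short peaks stay visible when zoomed out, while the RMS value gives a steadier impression of the signal level.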