High-resolution real-time audio spectrum analyzer
VoceVista Video is high-resolution, real-time audio spectrum analyzer software for Windows and macOS. It takes a live signal from a microphone or instrument, or a recorded file, and renders the spectrogram on the fly with smooth scrolling at the display's full refresh rate (120 Hz on a current MacBook Pro; higher on fast desktop monitors). The Pro edition resolves the signal into over a million FFT bins, enough detail to separate closely spaced partials in inharmonic sounds like singing bowls, gongs, and bells, and to make precise measurements for instrument design and acoustic research.
Most spectrum analyzer software caps out well below this. Years of optimization went into making the combination of high bin counts, high refresh rate, and smoothly scrolling display practical on a normal laptop.
Audio spectrum analyzer specifications
| Specification | Value |
|---|---|
| Platform | Windows and macOS |
| FFT bin count | 256 to 1,048,576 (2⁸ to 2²⁰) (Pro edition; Basic and Live cap at 8,192 pts = 8 kpts = 2¹³) |
| Frequency resolution | down to ~0.046 Hz / bin (at 48 kHz sample rate, 2²⁰ bins) |
| Measurable frequency range | half the sample rate (Nyquist), capped by the microphone in practice (see "How resolution and range work" below for values at common sample rates) |
| Spectrogram refresh rate | limited by the monitor, not the analyzer (120 Hz on a current MacBook Pro; higher on fast desktop displays) |
| Spectrogram scrolling | smooth, continuous |
| Sample rate | limited by the hardware, not the software (up to 192 kHz on common audio interfaces; arbitrary for loaded files) |
| Window functions | 12 selectable (Rectangle, Hamming, Hann, Blackman, Nuttall, Bartlett, Blackman-Nuttall (default), Blackman-Harris, Dolph-Chebyshev, Flat Top, Hann-Poisson, Gaussian) |
How resolution and range work
An FFT spectrum analyzer has two distinct frequency properties, resolution and range, plus a third property in time. They are easy to mix up, and they do different things.
Frequency resolution (bin width) is how finely the analyzer can tell two close frequencies apart. It equals the sample rate divided by the FFT size. This is the axis where bin count earns its keep: more bins at the same sample rate means a narrower bin, which separates partials that would otherwise merge into one peak. Bin count alone is not the resolution; the resolution always depends on the sample rate too.
Frequency range (maximum measurable frequency) is the highest frequency the analyzer can show at all. It is the Nyquist limit, half the sample rate. Sample rate alone controls this; the bin count has nothing to do with it. In practice this range is bounded again by the microphone, which is usually the real limit (see below).
Time resolution is the third axis: how often the spectrum updates and how long an analysis window the FFT looks at. Frequency resolution and time resolution trade off against each other, because narrower bins need a longer time window. VoceVista exposes both as independent controls, so you can pick the balance that fits the task.
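The trade-off can be put in numbers with a minimal sketch. It assumes the analysis window spans the full FFT size with no zero-padding (an assumption made for illustration, not a statement about how VoceVista computes its spectra); the function names are hypothetical.

```python
def window_duration_s(sample_rate_hz, fft_size):
    """Seconds of audio spanned by one window of fft_size samples."""
    return fft_size / sample_rate_hz


def bin_width_hz(sample_rate_hz, fft_size):
    """Bin width in Hz: the reciprocal of the window duration."""
    return sample_rate_hz / fft_size


SAMPLE_RATE_HZ = 44_100  # illustrative choice

for exponent in (10, 13, 16, 20):
    n = 2 ** exponent
    print(f"2^{exponent:<2} points: "
          f"{bin_width_hz(SAMPLE_RATE_HZ, n):8.3f} Hz per bin, "
          f"{window_duration_s(SAMPLE_RATE_HZ, n):7.3f} s window")
```

Under that assumption, bin width times window duration is always 1: an 8,192-point window at 44.1 kHz covers about 0.19 s of audio, while a 2²⁰-point window covers nearly 24 s.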
Put together: bin count plus sample rate set the resolution, sample rate alone sets the range, and the analysis window sets how that resolution is spent against time. At the Pro edition's maximum of around a million bins, here is what resolution and range come out to at common sample rates:
| Sample rate | Max measurable frequency (Nyquist) | Bin width at 2²⁰ |
|---|---|---|
| 22 kHz | 11 kHz | 0.021 Hz |
| 44.1 kHz | 22.05 kHz | 0.042 Hz |
| 48 kHz | 24 kHz | 0.046 Hz |
| 96 kHz | 48 kHz | 0.092 Hz |
| 192 kHz | 96 kHz | 0.183 Hz |
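The table rows follow directly from the two rules above (Nyquist is half the sample rate; bin width is sample rate divided by FFT size). A short sketch that recomputes them, with hypothetical function names:

```python
def nyquist_hz(sample_rate_hz):
    """Highest measurable frequency: half the sample rate."""
    return sample_rate_hz / 2


def bin_width_hz(sample_rate_hz, fft_size):
    """Frequency resolution: sample rate divided by FFT size."""
    return sample_rate_hz / fft_size


FFT_SIZE = 2 ** 20  # the Pro edition maximum quoted in the text

for rate_hz in (44_100, 48_000, 96_000, 192_000):
    print(f"{rate_hz / 1000:6.1f} kHz -> "
          f"Nyquist {nyquist_hz(rate_hz) / 1000:6.2f} kHz, "
          f"bin width {bin_width_hz(rate_hz, FFT_SIZE):.3f} Hz")
```

The same two functions can be used to check any other combination of sample rate and FFT size.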
Holding the sample rate constant at 44.1 kHz (the common rate for vocal analysis and CD-quality audio), here is the bin width at each available FFT size. Sizes up to 8,192 pts (8 kpts, 2¹³) are available in all editions; sizes from 16,384 pts (16 kpts, 2¹⁴) up to 1,048,576 pts (1 Mpts, 2²⁰) require the Pro edition.
| FFT size (points) | Bin width at 44.1 kHz |
|---|---|
| 256 pts ( 2⁸) | 172 Hz |
| 512 pts ( 2⁹) | 86.1 Hz |
| 1,024 pts (1 kpts, 2¹⁰) | 43.1 Hz |
| 2,048 pts (2 kpts, 2¹¹) | 21.5 Hz |
| 4,096 pts (4 kpts, 2¹²) | 10.8 Hz |
| 8,192 pts (8 kpts, 2¹³) | 5.38 Hz |
| 16,384 pts (16 kpts, 2¹⁴) | 2.69 Hz |
| 32,768 pts (32 kpts, 2¹⁵) | 1.35 Hz |
| 65,536 pts (64 kpts, 2¹⁶) | 0.673 Hz |
| 131,072 pts (128 kpts, 2¹⁷) | 0.336 Hz |
| 262,144 pts (256 kpts, 2¹⁸) | 0.168 Hz |
| 524,288 pts (512 kpts, 2¹⁹) | 0.084 Hz |
| 1,048,576 pts (1 Mpts, 2²⁰) | 0.042 Hz |
A note on range. The microphone, not the analyzer, sets the real-world upper limit. Ordinary microphones roll off at around 20 kHz, which is well covered by a 44.1 or 48 kHz sample rate. Professional measurement microphones extend up to roughly 30 to 50 kHz; anything beyond that needs laboratory-grade instrumentation or ultrasonic transducers. So while the software will happily run at 192 kHz on the right audio interface, the limit you usually hit first is the microphone. For specific picks across the price range, see our microphone recommendations.
A note on resolution. For comparison, the default setting for singing analysis is 8,192 pts (8 kpts, 2¹³) at 44.1 kHz, also the maximum in the Basic and Live editions. That gives about 5.4 Hz per bin, plenty for vocal pedagogy where harmonics are spaced apart by 100 Hz or more, but not enough to separate the closely spaced inharmonic partials of bowls and gongs in their low register. At the same sample rate, the Pro edition's 1,048,576 pts (1 Mpts, 2²⁰) gives about 0.042 Hz per bin: 128 times finer.
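As a rough guide to choosing a setting, the sketch below finds the smallest power-of-two FFT size whose bin width is at or below a target spacing, and reports whether that size falls within the 8,192-point cap. The function names are hypothetical, and the rule of thumb is an assumption: a bin width equal to the partial spacing is a lower bound, since the window function's main lobe spans more than one bin, so the size actually needed may be larger.

```python
ALL_EDITIONS_MAX = 2 ** 13  # 8,192 pts, per the text
PRO_MAX = 2 ** 20           # 1,048,576 pts, per the text


def min_fft_size(sample_rate_hz, target_bin_width_hz, smallest=256):
    """Smallest power-of-two size with bin width <= target."""
    if target_bin_width_hz <= 0:
        raise ValueError("target bin width must be positive")
    n = smallest
    while sample_rate_hz / n > target_bin_width_hz:
        n *= 2
    return n


def edition_needed(fft_size):
    """Which size tier (as described in the text) covers fft_size."""
    if fft_size <= ALL_EDITIONS_MAX:
        return "all editions"
    if fft_size <= PRO_MAX:
        return "Pro"
    return "beyond the listed maximum"


for spacing_hz in (100.0, 5.0, 1.0, 0.05):
    size = min_fft_size(44_100, spacing_hz)
    print(f"{spacing_hz:6.2f} Hz spacing -> {size:>9,} pts ({edition_needed(size)})")
```

At 44.1 kHz, a 100 Hz spacing is already covered by 512 points, while a 1 Hz spacing calls for at least 65,536 points.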
Dynamic range and bit depth
Dynamic range, the spread between the loudest and quietest signals the analyzer can distinguish, is limited separately at each stage of the audio chain. VoceVista records at up to 32-bit floating-point internally, which is wide enough that the recording stage itself imposes no practical limit; imported FLAC files are supported at 24-bit and 32-bit (nominally 144 dB and beyond). On the analysis side, the FFT itself has a usable dynamic range of about 100 dB in practice, set by the sidelobe attenuation of the window function; the 12 selectable windows listed in the specs table trade this dynamic range against main-lobe width. On live signals the limit is usually reached well before any of that: a good audio interface gives 100 to 120 dB of dynamic range, and the microphone plus room noise typically cost a further 20 dB. The overall dynamic range you see is the minimum across these stages, and in practice the hardware sets the ceiling, not the analyzer.
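The "minimum across stages" reasoning can be sketched in a few lines. The bit-depth formula is the standard ratio for fixed-point samples, 20·log10(2^bits); the per-stage numbers below are illustrative values taken from the ranges quoted above, not measurements, and the function names are hypothetical.

```python
import math


def pcm_dynamic_range_db(bits):
    """Nominal dynamic range of fixed-point audio with the given bit depth."""
    return 20 * math.log10(2 ** bits)


def limiting_stage(stages_db):
    """Return (name, dB) of the stage with the smallest dynamic range."""
    return min(stages_db.items(), key=lambda item: item[1])


# Illustrative chain; every figure here is an assumption for the example.
interface_db = 110.0                      # within the 100 to 120 dB range
stages = {
    "24-bit file": pcm_dynamic_range_db(24),
    "FFT window sidelobes": 100.0,        # "about 100 dB in practice"
    "audio interface": interface_db,
    "microphone + room": interface_db - 20.0,
}

name, value_db = limiting_stage(stages)
print(f"24-bit nominal range: {pcm_dynamic_range_db(24):.1f} dB")
print(f"limiting stage: {name} at {value_db:.0f} dB")
```

With these example figures the microphone and room come out as the limiting stage, which matches the point that the hardware, not the analyzer, sets the ceiling.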
See it on real signals
A few examples of the kinds of signals where high resolution earns its keep.
Where this resolution actually matters
For most vocal pedagogy and everyday musical practice you do not need anything close to a million bins. These are the cases where you do.
Singing bowls and Tibetan bowls
Bowls produce closely spaced inharmonic partials, often within a few hertz of each other in the low register. At about 23 Hz per bin (a typical 2,048-point FFT at 48 kHz), several partials collapse into a single smeared peak. Sub-hertz bin width separates them cleanly.
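To make the collapse concrete, the sketch below maps two partials to their nearest FFT bin at both sizes. The partial frequencies (180.0 Hz and 183.5 Hz) are hypothetical values chosen for illustration, not measurements of a real bowl, and `bin_index` is a hypothetical helper.

```python
def bin_index(freq_hz, sample_rate_hz, fft_size):
    """Index of the FFT bin whose center is nearest to freq_hz."""
    return round(freq_hz * fft_size / sample_rate_hz)


SAMPLE_RATE_HZ = 48_000
PARTIALS_HZ = (180.0, 183.5)  # hypothetical closely spaced partials

for fft_size in (2_048, 2 ** 20):
    bins = [bin_index(f, SAMPLE_RATE_HZ, fft_size) for f in PARTIALS_HZ]
    verdict = "same bin" if bins[0] == bins[1] else "separate bins"
    print(f"{fft_size:>9,} pts: bins {bins} -> {verdict}")
```

At 2,048 points both partials land in bin 8; at 2²⁰ points they land in bins 3932 and 4009, dozens of bins apart.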
Gongs and bells
Gongs and bells have rich inharmonic spectra, with dozens of modes packed into a narrow frequency band. Resolving the individual modes, and tracking their decay rates, requires fine bin width and a high refresh rate together.
Piano inharmonicity and stretched tuning
Real piano strings produce partials that are progressively sharper than integer multiples of the fundamental. Measuring that stretch curve, the basis for piano tuning, depends on accurate partial-frequency tracking down to a fraction of a hertz.
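The stretch is commonly modeled with the stiff-string formula f_n = n · f₀ · √(1 + B·n²), where B is the string's inharmonicity coefficient and f₀ is the fundamental the string would have with no stiffness. The sketch below uses that textbook model; the value of B is a hypothetical placeholder, since real coefficients vary by string and instrument.

```python
import math


def partial_hz(n, f0_hz, b):
    """Frequency of partial n for a stiff string (textbook model)."""
    return n * f0_hz * math.sqrt(1 + b * n * n)


def cents_sharp(n, b):
    """How far partial n sits above the exact harmonic n * f0, in cents."""
    return 1200 * math.log2(math.sqrt(1 + b * n * n))


F0_HZ = 261.63   # roughly C4; illustrative
B = 0.0004       # hypothetical inharmonicity coefficient

for n in (1, 2, 4, 8):
    harmonic = n * F0_HZ
    actual = partial_hz(n, F0_HZ, B)
    print(f"partial {n}: {actual:9.2f} Hz "
          f"(+{actual - harmonic:5.2f} Hz, +{cents_sharp(n, B):5.2f} cents)")
```

For the low partials the deviation from the exact harmonic is a fraction of a hertz, which is why bin widths well below 1 Hz are needed to measure it.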
Instrument design and acoustic measurement
Characterizing the modes of bodies, plates, strings, and pipes calls for the same kind of detail: closely spaced resonances that lower-resolution analyzers fold into a single bump.
Tuning of zithers, harps, and historical instruments
When the tuning system itself is the subject of study, or is stretched or tempered in non-obvious ways, you need to read the actual frequencies of the strings, not what an equal-tempered overlay assumes they are.
Editions
The maximum FFT size, and therefore the finest frequency resolution available, differs by edition:
| Edition | Max FFT bin count | Finest frequency resolution at 44.1 kHz |
|---|---|---|
| Overtone Analyzer | 8,192 pts (8 kpts, 2¹³) | 5.38 Hz / bin |
| VoceVista Video | 8,192 pts (8 kpts, 2¹³) | 5.38 Hz / bin |
| VoceVista Video Pro | 1,048,576 pts (1 Mpts, 2²⁰) | 0.042 Hz / bin |
For a full comparison of what each edition includes beyond FFT resolution, see the editions page.
Try it on your own signal
The free 30-day trial includes the full feature set. Point it at a bowl, a gong, an instrument, or whatever you want to look at in detail.