While waiting for a calibrated microphone to arrive, I thought I'd try some simple audio spectrum tests using an Agilent 33622A and an Android spectrum analyzer app, called Speedy Spectrum, running on an Asus Nexus 7 tablet.  My intent was to play with a basic setup to get a sense of how the equipment I had at hand performed and to see if what I've read about psychoacoustic mapping could be demonstrated in a simple and measurable way.  What came out of these exploratory tests was interesting, insightful and encouraging. In the interest of full disclosure I will admit to not being an expert in audio measurement.  I know how to run a waveform generator and a spectrum analyzer and I know how to make subjective assessments of what I hear, but I don't know much about the techniques used to measure audio.  Learning about those methods through experimentation and study is an important part of the fun for me. I invite experts in audio reading this blog to share their knowledge and correct any false claims, errors in use of terminology, or other missteps I make along the way.


My plan at this point is to use instrumentation to characterize my ears' ability, or inability, to resolve information in a carefully created audio signal.  To carry out these characterizations I will use an Agilent 33622A Waveform Generator, specifically its summing feature, which allows precise combination of two waveforms of variable frequency and amplitude.  The 33622A will drive powered M-Audio AV40 speakers.  There will be two "ears" listening to the sound produced by the AV40 speakers.  My human ears, a little worn and degraded from optimum performance, but still connected to a human auditory processing path, will provide a subjective assessment of masking effects.  The second "ear" will be a Dayton Audio EMM-6 electret measurement microphone (not yet arrived) coupled to a Behringer Xenyx 302 USB interface, feeding digitized audio into a Windows laptop running spectrum analysis software (specific software to be determined).  This "ear" will provide an objective assessment of audio signals heard by my human ears.  My hope is that my subjective assessment combined with insights provided by numerical data from the generator and measurement configuration will allow me to devise a method to mask low bit rate information within well-defined audio content.

My suspicion, formed many years ago while working in broadcast radio, was that I would encounter many imperfections in my audio setup.  Even though the AV40s are billed as "studiophile" monitor speakers, I do not expect them to have a flat frequency response across the nominal 20 Hz to 20 kHz range attributed to ideal human hearing.  In addition, since I am conducting these experiments in my study and not in an anechoic chamber, I expect there will be reflections, nulls, peaks and other real-room effects confounding the open air measurement of audio.  These are not detriments in my view as they represent real world listening situations.  If psychoacoustic masking is to be effective it must work in real listening situations.  Nevertheless, I started my explorations by taking some simple acoustic measurements to see what I was dealing with.


Speaker frequency response in my study


The image below shows the physical setup that I characterized with the Speedy Spectrum app on my Nexus 7 tablet.

Spectral response set up.jpg

The Agilent 33622A on top of the instrument stack was the signal source for the tests.  The two channels of the 33622A were frequency and amplitude coupled.  One channel drove the M-Audio speakers, the other channel was fed to an Agilent 53230A frequency counter.  An Agilent 34461A operating in trend mode monitored the AC rms voltage at the speaker inputs.  A Tektronix MDO4104-3 oscilloscope provided a time domain representation of the signals fed to the speakers.  For this first test I drove only the right-hand speaker.  The Asus Nexus 7 tablet running the Speedy Spectrum app is seen in the photograph situated in front of the right speaker.


So, the plan was to sweep the Agilent 33622A from 20 Hz to 20 kHz with the Speedy Spectrum app running in peak detect mode. This would give me a rough idea of the response of the speakers + room + tablet microphone combination.


The first surprise revealed itself as soon as I started the app.  See the spectrum capture below.  In a fairly quiet room, what the heck was causing the significant peaks at 502 Hz, 9.5 kHz, and 19.1 kHz?

SpeedySpectrumCapture unusual peaks.png

Some detective work revealed the following:  The 9.5 kHz and 19.1 kHz peaks (19.1 kHz appears to be the second harmonic of 9.5 kHz) are always there, no matter where I run the app; indoors, outdoors, daytime, nighttime, quiet space or noisy space - they don't go away, don't get quieter or louder.  Are they generated in the tablet, or are they an artifact of the spectrum analysis app?  I don't know, but I will accept their mysterious constant presence.


I found the 502 Hz source through a process of elimination.  As users of modern test equipment are well aware, just about every instrument these days is equipped with a cooling fan.  After sitting at my desk for a while concentrating on experiments, or writing a blog, I no longer notice the constant drone of the fans. However, when I picked up the tablet and left the study, the 502 Hz dropped into the noise floor.  That suggested the source was in the study.  I turned off each piece of test equipment in sequence looking for the 502 Hz peak to disappear.  When I turned off the Agilent 33622A the peak disappeared.  It would appear the fan in the 33622A at steady state generates a robust acoustic or mechanical vibration at 502 Hz.  Oh well.


While attempting to establish a reference level of 40 dB at 1 kHz I discovered that small movements of my body affected the detected signal amplitude. My assumption was that my movements changed the reflection path of the audio signal, either adding to or subtracting from the wavefront arriving at the microphone.  Recognizing these effects led me to lower my expectations for repeatable results.  I now expected to get a rough representation of frequency response in broad strokes only.
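
The add-or-subtract effect of a reflection can be sketched with a simple two-path model: the direct sound plus one reflected copy that travels a little farther. This is only an illustration, not a measurement of my study. The speed of sound (343 m/s), the single reflection, and the relative reflection amplitude `reflection_gain` are all assumed values chosen for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def two_path_level_db(freq_hz, extra_path_m, reflection_gain=0.5):
    """Level in dB, relative to the direct sound alone, of a direct wave
    plus one reflection that travels extra_path_m farther.

    reflection_gain is a hypothetical relative amplitude of the reflection.
    """
    # Phase lag of the reflection caused by the extra travel distance.
    phase = 2.0 * math.pi * freq_hz * extra_path_m / SPEED_OF_SOUND
    # Sum the two waves as phasors: 1 + g * exp(-j * phase).
    real = 1.0 + reflection_gain * math.cos(phase)
    imag = -reflection_gain * math.sin(phase)
    return 20.0 * math.log10(math.hypot(real, imag))


if __name__ == "__main__":
    # At 1 kHz the wavelength is about 34 cm, so a path change of roughly
    # 17 cm moves the sum from reinforcement to partial cancellation.
    for extra_cm in (0, 5, 10, 17, 25, 34):
        level = two_path_level_db(1000.0, extra_cm / 100.0)
        print(f"extra path {extra_cm:3d} cm -> {level:+.1f} dB")
```

With these assumed numbers, a body movement that shifts a reflection path by a few centimeters is enough to move the 1 kHz level by several dB, which is consistent with what the tablet showed.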


The 33622A was configured to sweep from 20 Hz to 20 kHz over a duration of 240 seconds so as not to overwhelm the computations occurring in the tablet.  A representative spectral record of one sweep is shown below alongside a short video clip of a portion of the spectrum being captured.

SpeedySpectrumCapture room spectral response.jpg
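
For a sense of how the 240 second sweep spends its time, here is a small sketch of the instantaneous frequency versus time. The text above does not say whether the sweep was linear or logarithmic, so both are shown as assumptions; the function names are made up for this example.

```python
import math

F_START_HZ = 20.0     # sweep start frequency
F_STOP_HZ = 20000.0   # sweep stop frequency
SWEEP_S = 240.0       # sweep duration in seconds


def linear_sweep_freq(t_s):
    """Frequency at time t_s if the sweep is linear in frequency (assumed)."""
    return F_START_HZ + (F_STOP_HZ - F_START_HZ) * t_s / SWEEP_S


def log_sweep_freq(t_s):
    """Frequency at time t_s if the sweep is logarithmic (assumed)."""
    return F_START_HZ * (F_STOP_HZ / F_START_HZ) ** (t_s / SWEEP_S)


if __name__ == "__main__":
    for t in (0.0, 60.0, 120.0, 180.0, 240.0):
        print(f"t = {t:5.0f} s   linear {linear_sweep_freq(t):8.1f} Hz"
              f"   log {log_sweep_freq(t):8.1f} Hz")
```

A linear sweep passes through everything below 1 kHz in about the first 12 seconds, while a logarithmic sweep spends 80 seconds on each decade, so the choice affects how much detail the peak-detect display can capture at low frequencies.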


The spectrum is neither flat nor repeatable.  Any movement near the setup causes multi-dB shifts in the response.  During the sweep the Agilent 34461A in trend mode indicated the signal driving the speakers did not change by more than about 100 μV, yet we are seeing swings of about 29 dB in the spectral record.  The response of the speakers contributes to the shape of the response, as does the response of the microphone in the tablet, but I think room acoustics contribute most to the overall shape of the response.  What I've learned from this experiment is that I can record a spectral response using this setup and that the response is very sensitive to the room state.
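
To put the 100 μV and 29 dB figures on the same scale, the short sketch below converts between amplitude ratios and dB. It assumes the app's dB scale follows the usual 20·log10 amplitude convention, and it uses a hypothetical drive level of 100 mV rms, since the actual level is not recorded above.

```python
import math


def db_from_voltage_ratio(v1, v2):
    """Level difference in dB between two amplitudes (20*log10 convention)."""
    return 20.0 * math.log10(v1 / v2)


def voltage_ratio_from_db(db):
    """Amplitude ratio corresponding to a level difference in dB."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    drive_v = 0.100      # hypothetical drive level in volts rms (assumption)
    drift_v = 100e-6     # largest drift seen on the 34461A
    drift_db = db_from_voltage_ratio(drive_v + drift_v, drive_v)
    print(f"100 uV drift on {drive_v * 1000:.0f} mV: {drift_db:.4f} dB")
    print(f"29 dB swing is an amplitude ratio of {voltage_ratio_from_db(29.0):.1f}")
```

Under that assumed drive level the electrical drift is under a hundredth of a dB, while a 29 dB swing is an amplitude ratio of roughly 28 to 1, so the variation has to come from somewhere after the speaker terminals.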


Preview of next experiment


By using the summing feature in the Agilent 33622A I was able to generate a primary tone and experiment with frequency, phase and amplitude for a secondary masking tone.  I discovered that the Speedy Spectrum app is capable of sensing and representing sinusoids that are quite close in frequency and of significantly different amplitudes.  I also found, subjectively, that I will probably be able to mask a secondary tone so that it is nearly imperceptible to a listener.  If the secondary tone is close in frequency (within 100 Hz) and lower in amplitude by several dB, it becomes difficult to subjectively detect its presence.  I tried to run present/absent masking tone tests but they were somewhat corrupted by a noticeable burst of noise upon switching the summing feature on and off on the 33622A.  The 33622A was not designed to produce noiseless switching and that is understandable as noiseless switching is probably not important in most applications.
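
The idea of a strong primary tone summed with a weaker nearby tone, and a spectrum analysis that can still separate them, can be sketched in software. This is not how the 33622A or the Speedy Spectrum app works internally. The sample rate, the 1000 Hz and 1050 Hz frequencies, and the 20 dB level difference are hypothetical values picked for the example, and the single-frequency Goertzel algorithm stands in for whatever analysis the app performs.

```python
import math


def two_tone(n_samples, sample_rate, f1_hz, a1, f2_hz, a2):
    """Sum of two sinusoids, mimicking a primary tone plus a secondary tone."""
    return [
        a1 * math.sin(2.0 * math.pi * f1_hz * n / sample_rate)
        + a2 * math.sin(2.0 * math.pi * f2_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def goertzel_amplitude(samples, sample_rate, freq_hz):
    """Estimate the amplitude of the component at freq_hz (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2.0 * math.sqrt(max(power, 0.0)) / len(samples)


if __name__ == "__main__":
    rate = 8000                      # samples per second (assumed)
    signal = two_tone(rate, rate, 1000.0, 1.0, 1050.0, 0.1)  # 1 second of audio
    primary = goertzel_amplitude(signal, rate, 1000.0)
    secondary = goertzel_amplitude(signal, rate, 1050.0)
    print(f"primary   {primary:.3f}")
    print(f"secondary {secondary:.3f}")
    print(f"secondary relative level {20.0 * math.log10(secondary / primary):.1f} dB")
```

One second of data gives a frequency resolution of about 1 Hz, so two tones 50 Hz apart are easy for an analyzer to separate even when the ear has trouble picking out the weaker one.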


I'll report my findings with single tone masking in my next blog.