This blog post examines part of the story of how music synthesis evolved during a key period toward the end of the 20th century. What is music or sound synthesis? It is the creation of music and sounds without the need for actual acoustic instruments such as percussion, wind and brass instruments. The 1960s and early- to mid-1970s were a fantastic period for music worldwide. Elvis of course, Johnny Cash, John Denver, The Beatles, Elton John, the incredible Rolling Stones and Led Zeppelin were all making records, and acoustic instruments ruled. Computer-based sound synthesis was in its infancy and was rarely used back then, although some did use it: the earliest digital computer sound synthesizer software dates to the late 1950s at Bell Labs.
Programmable Analog Sound Synthesis
But there are other ways to create sounds electronically too.
Analog synthesizers rely on resistors, capacitors, inductors and active components to form oscillators, noise generators, filters, amplifiers and modulators. The first programmable analog synthesizer was developed in the mid-1950s by RCA and was known as the RCA Electronic Music Synthesizer. It is described below, with reference to the diagram above.
RCA were keen to be able to create music at low cost without requiring an expensive orchestra. If they had to hire just one musician to create the music then they could use the synthesizer to do the work of the orchestra and therefore create very cost-effective broadcast material.
RCA’s synthesizer relied on electromagnetically energised mechanical tuning forks, with pickups to convert the oscillation into an electrical signal. These fixed oscillators and voltage controlled oscillators (VCOs) were mixed to produce the difference signal, which was then passed into a non-linear function known as the Octaver. The Octaver produced eight simultaneous outputs of the desired note, one per octave. Each output was a triangle wave rich in harmonics, which gave a more instrument-like sound (i.e. it contributed to the ‘timbre’ of an instrument), and the selected output was then further processed by later functions downstream.
The ‘Frequency Glider’ portion was used to make tones sweep in order to create ‘portamento’ effects which are used as frequency sliding transitions between notes. It achieved this by replicating the input frequency (using a frequency-to-voltage converter connected to a voltage controlled oscillator); the controlling voltage is low-pass filtered to achieve the sweep characteristic.
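The low-pass filtering idea behind the glide can be sketched in a few lines of Python. This is only a digital analogy under stated assumptions, not a model of RCA's circuit: the function name `glide`, the 1000 Hz control rate and the 0.1 second time constant are all hypothetical, and the filter is applied directly to a frequency value rather than to a voltage feeding a VCO.

```python
import math

def glide(target_hz, start_hz, time_constant_s, sample_rate=1000):
    """Smooth a stepped pitch-control value with a one-pole low-pass filter."""
    # Coefficient of a discrete RC low-pass filter (assumed control rate).
    alpha = 1.0 - math.exp(-1.0 / (time_constant_s * sample_rate))
    current = start_hz
    out = []
    for target in target_hz:
        # Move a fixed fraction of the remaining distance toward the target.
        current += alpha * (target - current)
        out.append(current)
    return out

# The control value jumps from 220 Hz to 440 Hz and then holds for one second.
steps = [440.0] * 1000
swept = glide(steps, start_hz=220.0, time_constant_s=0.1)
```

Instead of jumping, the values in `swept` slide from 220 Hz toward 440 Hz, quickly at first and then more slowly, which is the portamento characteristic described above. A longer time constant gives a slower slide.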
The ‘Growth Decay and Duration’ control was used to apply an attack, decay and release amplitude envelope to the note. This worked using a capacitor charge or discharge curve for parts of the envelope, switched by relays, and the resulting envelope voltage was used to bias an amplifier. Some instruments have their note energy spread out over time, while others release energy in a much shorter period, and this can be mimicked using the amplitude envelope feature.
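As a rough illustration of an envelope built from capacitor charge and discharge curves, here is a minimal Python sketch. The function `rc_envelope`, its parameters and the two example settings are assumptions made for illustration; the real instrument used relays and analog components, and its actual time constants are not given here.

```python
import math

def rc_envelope(attack_tc, hold_s, release_tc, total_s, sample_rate=1000):
    """Envelope that charges toward 1 while the note is held, then discharges toward 0."""
    level = 0.0
    env = []
    for n in range(int(total_s * sample_rate)):
        t = n / sample_rate
        if t < hold_s:
            target, tc = 1.0, attack_tc    # "relay" selects the charging path
        else:
            target, tc = 0.0, release_tc   # "relay" selects the discharging path
        # Exponential approach to the target, like a capacitor through a resistor.
        level += (target - level) * (1.0 - math.exp(-1.0 / (tc * sample_rate)))
        env.append(level)
    return env

# Energy released quickly (struck or plucked) versus spread out over time (bowed or blown).
plucked = rc_envelope(attack_tc=0.005, hold_s=0.05, release_tc=0.3, total_s=1.0)
bowed = rc_envelope(attack_tc=0.2, hold_s=0.8, release_tc=0.1, total_s=1.0)
```

Multiplying a tone by `plucked` or `bowed` sample by sample would play the same role as the envelope voltage biasing the amplifier.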
Vibrato type effects can be created using the ‘Low Frequency Modulator’ module which actually modulated the amplitude, not the frequency; apparently by modulating amplitude with a non-sinusoidal signal, the human ear is tricked into thinking a vibrato effect is occurring.
All of these features of the RCA Electronic Music Synthesizer were switched in using relays under the control of a punched paper roll (eventually replaced with optical scanning of inked marks on paper). The final output was mixed with a second channel and then used to create a record using a turntable and cutting tool. The two channels were a clever scheme to solve a couple of issues. Firstly, it allowed two notes to be played simultaneously (for example one note could be decaying while the next was being struck). Secondly, it allowed notes to be set up in advance on one channel while the other one was playing a note. This meant that relay timing differences would not cause audible sound artifacts while the various modules of the synthesizer were being configured. Each channel was enabled or disabled by the presence or absence of envelope settings on the paper roll. This can be seen in the example paper roll fragment here (image from RCA’s patent).
The short (2-minute) video here shows what the RCA Electronic Music Synthesizer sounded like; the audio is from a 1950s recording marked “RCA Victor LM 1922 ‘The Synthesis of Music’” (source: Lowell) and contains a bit of Brahms’s Hungarian Dance No. 1.
Some later models contained further digital control circuitry but the actual processing of the sound (i.e. synthesis) was analog. Analog synthesizers were exceedingly expensive and required much cabling to set up different sounds. They also required calibration/tuning at all times; any change in temperature meant that the notes were off-pitch.
Synthesis Goes Digital
A lot of the early work in digital sound synthesis occurred at Bell Labs and at Stanford from the late 1950s. By this time programmable computer use was growing. Max Mathews, Director of the Behavioral Research Laboratory in Bell Labs at Murray Hill in New Jersey, made good use of IBM’s Type 704 computer installed at IBM HQ in New York. He developed a computer program in 1957 called MUSIC (or MUSIC 1 since there was a whole series of them) that could play music based on a set of instructions. MUSIC digitally generated a triangle wave, and the instructions could specify the frequency, duration and amplitude. The digital values were converted to sound using a 12-bit digital to analogue converter (DAC).
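To give a feel for what such a program had to compute, here is a minimal Python sketch that turns a frequency, duration and amplitude into 12-bit sample values for a triangle wave. It is not Mathews's code: the function name `triangle_note`, the 10,000 samples per second rate and the signed coding of the DAC values are assumptions made for illustration.

```python
def triangle_note(freq_hz, duration_s, amplitude, sample_rate=10000, bits=12):
    """Return integer DAC codes for a triangle wave; amplitude is 0.0 to 1.0."""
    full_scale = 2 ** (bits - 1) - 1   # 2047 for a signed 12-bit range (assumed coding)
    samples = []
    for n in range(int(duration_s * sample_rate)):
        phase = (n * freq_hz / sample_rate) % 1.0   # position within one cycle, 0..1
        # Piecewise-linear triangle: rises from -1 to +1, then falls back to -1.
        value = 4.0 * phase - 1.0 if phase < 0.5 else 3.0 - 4.0 * phase
        samples.append(round(amplitude * value * full_scale))
    return samples

# One "instruction": play 440 Hz for half a second at 80% amplitude.
note = triangle_note(freq_hz=440.0, duration_s=0.5, amplitude=0.8)
```

Each entry in `note` is one number that would be handed to the DAC in turn; a score is then just a list of such frequency, duration and amplitude triples.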
The program was very primitive but he followed that up with MUSIC 2 and then MUSIC 3 in 1960. Later MUSIC 4 and MUSIC 5 were developed too. These so-called MUSIC-N versions had enhancements such as additional channels, additional waveforms and different amplitude envelopes. In 1961 Max Mathews provided the music for the speech synthesized ‘Daisy’ song that was voiced by software created by John L. Kelly Jr and Lou Gerstman. It still sounds shockingly eerie today. The Daisy scene in the movie 2001: A Space Odyssey is a reference to this moment in history.
This was still a primitive time for digital synthesis however. It was very difficult to produce sounds that were instrument-like. A real music instrument does not have notes sounding exactly like a triangle wave. It is clear that a music instrument note will have a main frequency but there will be other frequencies present in the note’s spectrum too. These could be harmonic or non-harmonic depending on the instrument. Furthermore these frequencies may change over time while the note is being played. One approach to computer based music would be to record real music instruments and then play them back, i.e. sampling. This was not feasible in the 1950-1960 period because sufficient memory to do this was expensive and not a plentiful resource in computers.
Today there are dozens of sound synthesis techniques: some combine or remove tones, some distort waveforms, and some rely on modelling the sound characteristics of real instruments. All of these techniques require a computer, and one that was all but impossible to obtain in the early 1960s! The Type 704 had magnetic core memory with a 12 microsecond access time and could support just under 12,000 floating-point operations per second.
So, well into the 1960s, music synthesis was fairly primitive. One important thing though was that Mathews published details of his MUSIC computer program in a magazine called Science.
The diagram above shows how he explained his program; it shows an example that could be programmed in MUSIC. The inputs on the left side are parameters which can change over time. The other blocks (apart from the summation triangle) are generators; they generate a signal based on their inputs. Mathews created a convention where one of the inputs would represent frequency, and the other would represent amplitude. The AND-gate type symbols in the diagram above (that is where the similarity ends of course; these have nothing to do with Boolean logic gates) have their top input as amplitude, and their bottom input for selecting the desired frequency. The video also shows an attempt to replicate the short fragment of composition that is in Mathews’s paper, using open source Csound.
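The generator convention described above can be sketched in Python (rather than in MUSIC or Csound) as a function with an amplitude input and a frequency input, plus a summation block. The names `oscillator` and `mix`, the sine waveform and the 8000 Hz sample rate are hypothetical choices for illustration and are not taken from Mathews's paper.

```python
import math

def oscillator(amplitude, frequency, sample_rate=8000):
    """Generator block: per-sample amplitude and frequency inputs in, one signal out."""
    phase = 0.0
    out = []
    for amp, freq in zip(amplitude, frequency):
        out.append(amp * math.sin(2.0 * math.pi * phase))
        # Advance the phase by this sample's frequency, so the input may vary over time.
        phase = (phase + freq / sample_rate) % 1.0
    return out

def mix(*signals):
    """Summation block: add several signals sample by sample."""
    return [sum(values) for values in zip(*signals)]

n = 8000  # one second at the assumed sample rate
steady = oscillator([0.5] * n, [440.0] * n)
fading = oscillator([0.5 * (1 - i / n) for i in range(n)], [660.0] * n)
output = mix(steady, fading)
```

Because the inputs are themselves lists of samples, the output of one generator can be fed into the amplitude or frequency input of another, which is what makes the block-diagram style so flexible.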
Digital Sound Synthesis Improves a Lot: FM Synthesis
A professor of music at Stanford in California, John Chowning, happened to read Max Mathews’s article, paid him a visit, and decided to experiment with the software. Chowning was able to get access to the computer at Stanford’s Artificial Intelligence (AI) lab, and several years later made a discovery. It was all to do with changing the frequency of one oscillator with the output from another. Ordinarily this would be a very common thing to do for a vibrato effect. The main note-generating oscillator would be set to a centre tone of (say) 100Hz, and another oscillator at a very low frequency of (say) 5Hz would modulate it such that the centre tone moves slightly up and down around the 100Hz figure.
In Chowning’s case he had set the modulating frequency to something higher. He experimented with values of 50Hz, 100Hz and so on, which caused some very interesting sounds to appear. They appeared rich in tones. He also noticed that by changing the amount of frequency deviation that the carrier was subjected to, the sound changed yet again. Through systematic experiments he was able to determine the settings he needed (ratio of carrier to modulating frequency, and frequency deviation) in order to generate sounds with some or all harmonics present within a particular bandwidth. He could alter the bandwidth by changing the amount of frequency deviation. With exact integer ratios of carrier to modulating frequency he could choose to have all harmonics, every other harmonic, or other recurring patterns, as if the teeth of a comb had been pulled out in a pattern.
Furthermore, with no additional code it was possible to create sounds with non-harmonic content too. This was possible because any components below zero hertz get folded back into the spectrum, where their amplitudes add to or subtract from those of the harmonics already there. But if the ratio of carrier to modulating frequency was not a ratio of integers, the folded-back content could fall in between the harmonics and therefore generate non-harmonic components too.
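The comb-like patterns and the folding effect are easy to check with a little arithmetic. The Python sketch below lists where the FM components fall, assuming they sit at the carrier frequency plus or minus whole multiples of the modulating frequency, with anything below zero hertz reflected back. The function name `fm_sidebands` and the choice of four sideband pairs are assumptions for illustration, and the sketch gives frequencies only, not amplitudes.

```python
def fm_sidebands(carrier_hz, modulator_hz, orders=4):
    """List the frequencies carrier +/- n * modulator, reflecting negatives about 0 Hz."""
    freqs = set()
    for n in range(-orders, orders + 1):
        f = carrier_hz + n * modulator_hz
        freqs.add(round(abs(f), 6))   # a component below 0 Hz folds back to |f|
    return sorted(freqs)

print(fm_sidebands(100, 100))     # [0, 100, 200, 300, 400, 500]  -> every harmonic of 100 Hz
print(fm_sidebands(100, 200))     # [100, 300, 500, 700, 900]     -> every other harmonic
print(fm_sidebands(100, 141.4))   # non-integer ratio: components fall between the harmonics
```

With the 1:1 and 1:2 ratios the folded components land back on multiples of 100 Hz, while the third case produces components that are not multiples of any common audible fundamental, which is the non-harmonic content mentioned above.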
It turns out that real instruments can have harmonic and non-harmonic content, and so this was a significant step towards simulating instruments.
By using Mathews’s MUSIC software it was possible for Chowning to write code to dynamically change the modulation frequency and frequency deviation as the note was being played. This means that the timbre of the note can change while it sounds. Combined with an overall amplitude envelope to simulate the striking of instruments and the distribution of note energy, it was possible to fairly easily fool the ear into thinking that a real instrument was being played. It is much like the way human taste can be fooled into thinking jelly beans taste approximately like apple or pear through the correct mixing of chemicals which stimulate certain parts of the sense of taste.
To summarize this important discovery: how does FM synthesis do what it does? How does it produce rich effects that can sound like real instruments? And why does a guitar sound different to a piano, even though the same note may be played?
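A minimal Python sketch of a note whose timbre changes while it plays is shown here. It is not Chowning's code or a MUSIC-N instrument: the function `fm_note`, the exponential decay shape, the 8000 Hz sample rate and the example values are all illustrative assumptions. For simplicity the modulating frequency is held fixed and only the amount of modulation is varied, which corresponds to changing the frequency deviation during the note.

```python
import math

def fm_note(carrier_hz, modulator_hz, peak_index, duration_s, sample_rate=8000):
    """FM tone whose amount of modulation and amplitude both decay during the note."""
    samples = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        envelope = math.exp(-3.0 * t / duration_s)   # shared decay curve (assumed shape)
        index = peak_index * envelope                 # bright at the start, duller later
        modulator = index * math.sin(2.0 * math.pi * modulator_hz * t)
        samples.append(envelope * math.sin(2.0 * math.pi * carrier_hz * t + modulator))
    return samples

# A non-integer carrier-to-modulator ratio with a decaying envelope gives a struck, bell-like tone.
bell_like = fm_note(carrier_hz=200.0, modulator_hz=280.0, peak_index=10.0, duration_s=2.0)
```

Writing `bell_like` out as audio (for example with Python's standard `wave` module after scaling to integers) lets you hear how the sound starts bright and settles toward a purer tone as the modulation dies away.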
It relies on the astounding discovery that music instrument sound waves have a constantly changing spectral mix (a mix of frequencies and amplitude) around the main note that is being played. It then takes another leap to realize that it could be possible to approximately model a music instrument sound by harnessing Frequency Modulation (FM). Traditionally FM for radio communications uses a radio frequency (RF) centre frequency, with this centre ‘carrier’ frequency being adjusted or ‘modulated’ by the audio signal. With FM Synthesis for music, the carrier frequency becomes far lower, into the audio band.
FM can appear difficult to understand from a spectral point of view. The video below shows the FM sound spectrum (only first 2500Hz of spectrum is displayed) as the modulation index (the ratio of the peak frequency deviation to the modulating frequency) is gradually increased. Then the video shows the effect of changing the modulating frequency. It is possible to see patterns, and also to observe where interesting sounds appear.
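For readers who prefer a formula, the standard textbook description of a simple FM tone may help when interpreting the spectrum. The symbols here are chosen for this post rather than copied from any one source: A is the amplitude, f_c the carrier frequency, f_m the modulating frequency, d the peak frequency deviation and I the modulation index defined above.

```latex
e(t) = A \sin\bigl(2\pi f_c t + I \sin(2\pi f_m t)\bigr), \qquad I = \frac{d}{f_m}

e(t) = A \sum_{n=-\infty}^{\infty} J_n(I)\, \sin\bigl(2\pi (f_c + n f_m)\, t\bigr)
```

The second line is the same signal written as a sum of sine waves: the components sit at the carrier frequency plus or minus whole multiples of the modulating frequency, and the strength of each one is set by the Bessel function J_n(I). As I increases, more of the J_n(I) terms become significant, which is why the spectrum spreads out as the modulation index is raised.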
Chowning’s famous paper contained examples of how to generate particular instrument sounds; he showed exactly how to configure MUSIC-N such that the sound of a clarinet, a brass instrument, a bassoon and a woodwind instrument could be simulated. The video further below shows the results of running a MUSIC-N-like piece of software on the BeagleBone Black in an effort to reproduce the sounds described in Chowning’s paper.
Selling FM Synthesis
Stanford University was keen to get this new FM Sound Synthesis technology out in the field and it approached one of the largest electric organ manufacturers in the US, the Hammond Organ Company. Their Hammond Organs relied on a rotating motor causing oscillations in a pickup, which were then amplified. It was very electromechanical.
Chowning’s FM Synthesis was so radically different to the traditional electric organ that it actually ended up disrupting the market. But at that time, it seemed sensible to approach the Hammond Organ Company. Unfortunately their engineers didn’t quite know what to make of the patent that Stanford wanted to license to them. The engineers were familiar with mechanics and motors; they didn’t know what to do with an algorithm which is what Chowning’s paper was about.
For a while it seemed that FM Synthesis would be difficult to sell.
In Japan after WW2 there was a big drive to invest in technology and transform the country. The Japanese government actively took steps to ensure this. Its Ministry of International Trade and Industry (MITI) advised businesses on how to license technology from overseas.
Separately, in the United States in the 1950s there were several major lawsuit rulings against US technology companies such as DuPont, RCA (1958), IBM (1952) and AT&T, concerning issues such as monopolization, patent pools or licensing arrangements, and generally unfair advantage. As a result of these lawsuits many of these firms had to allow the licensing of their patents. For Japanese firms it was a huge opportunity to acquire know-how and then refine the production process and turn the patents into realizable products. One example is the bipolar junction transistor (BJT) invented at AT&T by Bardeen, Brattain and Shockley. Sony (which at the time had less than 0.5% of the number of employees it has now) acquired a license to manufacture them, spent years refining the manufacturing process and eventually ended up with its own inventions such as the tunnel diode and highly successful products such as the Walkman.
Yamaha too was an organ maker at the same time as the Hammond Organ Company. However, Yamaha was unique in that it tried to diversify into other markets such as motorbikes. Yamaha had also moved beyond electromechanical organs with their transistorized organ called the Electone D-1. It had a lot of transistor oscillators! The problem with analog oscillators based on inductors or capacitors is that the frequency drifts. A change in room temperature can be enough to cause the organ to sound out of tune. It was the same problem that plagued other manufacturers of analog synthesizers too.
When Yamaha heard about John Chowning’s FM Synthesis they licensed the technology on the spot. They had to produce an instrument very quickly, otherwise there were penalties. They produced the Electone GX1, which still relied on analog oscillators but used the FM algorithm. It was an extremely expensive instrument (tens of thousands of dollars). However, five years later Yamaha succeeded in launching the DX7, which had the FM synthesis algorithms inside two integrated circuits (ICs). The DX7 had six oscillators which could be rearranged in 32 different combinations and, when set up with parameters, could create a virtually unlimited set of sounds. The DX7 was one of the most successful synthesizers of all time and cost under US$2000.
The 20-year patent finally expired in 1994 and today Yamaha enjoys more than $2bn revenue each year from musical instruments. Stanford did well too; it received $20M for its patent and that was exceptional at the time.
Replicating Chowning’s Work
Out of curiosity I decided to try to replicate the sounds that John Chowning described in his paper. A descendant of MUSIC is a piece of open source software called Csound. I compiled it for the BeagleBone Black and tested it. The video below shows the FM algorithm block diagram from Chowning’s paper and how it maps to the Csound code. The audio results are included.
Some analog and digital sound synthesis techniques from the 1950s onwards were discussed, culminating in FM sound synthesis which had a major effect on music in the 1980s and 1990s. Today it is possible to recreate the same effects using low cost hardware such as the BeagleBone Black.
There is the possibility of small errors in the information above. I’m not a historian or an expert in music. Occasionally there was conflicting information and I have tried to find more than one reputable source for information which I have relied upon. I don’t believe there are any errors but if you notice any, or have more information in areas, please let me know.
Information was taken from multiple sources; the main ones are listed below.
The Digital Computer as a Musical Instrument – M. V. Mathews
The Synthesis of Complex Audio Spectra by Means of Frequency Modulation – John M. Chowning
The Music Machine – Edited by Curtis Roads
We Were Burning – Bob Johnstone