Session 1: pvan
Session 2: mqan
To work through these examples, you will need to have installed the SNDAN programs, and GSVIEW or an equivalent eps viewer. The directory containing the programs (e.g. c:\sndan\bin) needs to be on your system PATH. See Installing SNDAN and Using the Windows Console if you are unsure about the procedures.
To begin with, if you have not already done so, open a Windows Console session (MSDOS prompt), and make the sndan/examples directory your current directory. If this is on drive C, you would need to type
Some soundfiles in this directory are used in the examples below.
Two example sessions are presented, the first using phase vocoder analysis
(pvan), and the second using McAulay-Quatieri (MQ) 'partial tracking'
analysis (mqan). When you have worked through these sessions, try
analysing and processing the other soundfiles included in this distribution.
You will then be ready to work through the rest of this documentation,
and embark on your own projects.
Before you can analyse a sound, a 'header file' needs to be created.
This contains documentary information about the sound, some of which is
used to annotate the graphic displays. The program to do this is mkheader.
In this session mkheader will be run interactively, which means that it will ask you for each piece of information in turn. You need to give mkheader a filename:
Enter the information following the prompts (printed below in bold):
Performer is: ctptf4.wav (unless you know the performer!)
Instrument used: C trumpet
You are next asked for the date the sound was recorded. Just enter the current date, unless you have reason otherwise. This is simply text, so any reasonably compact format is acceptable. For example:
Date recorded: Feb 1st 2000
Pitch played: F4
Dynamic level: mf
[or, you could enter the peak amplitude of the file]
Vibrato used? no
File contains [all/middle/end] of tone: all
Enter comments: slightly unsteady
(or whatever else might be useful. You can also leave this section blank. Comments can run over multiple lines - press <Enter> in the usual way to terminate the line, then press CTRL-Z <Enter> to finish.)
Note that when using mkheader interactively, you can use spaces between words for each item. If you type everything into the command line, each item with spaces will need to be enclosed in double-quotes.
You can now analyse the sound, using the phase vocoder program pvan. There is one parameter that requires some thought - the analysis frequency. The more accurately you can set this, the better the analysis will be.
In this case it is easy, as you already know the musical note - F4. Referring to the Pitch Table, you can see that F4 has a frequency of 349.228 Hz. You can just use 349, as pvan can easily handle the small difference (even the best players wobble in pitch a little!). The full command line for pvan will be:
pvan 349 ctptf4.head ctptf4.wav ctptf4.an
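If the Pitch Table is not to hand, the same figure can be computed. The following is a minimal Python sketch, not part of SNDAN, assuming twelve-tone equal temperament with A4 = 440 Hz and scientific pitch notation; the function name note_to_hz is purely illustrative.

```python
import re

# Semitone offsets of the natural notes within an octave (C = 0).
_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name, a4=440.0):
    """Convert a note name such as 'F4' or 'F#3' to a frequency in Hz.

    Assumes equal temperament and scientific pitch notation (A4 = 440 Hz).
    """
    m = re.fullmatch(r"([A-Ga-g])([#b]?)(-?\d+)", name)
    if m is None:
        raise ValueError("unrecognised note name: %r" % name)
    letter, accidental, octave = m.groups()
    semitone = _SEMITONES[letter.upper()]
    if accidental == "#":
        semitone += 1
    elif accidental == "b":
        semitone -= 1
    # MIDI note number: C4 = 60, A4 = 69.
    midi = 12 * (int(octave) + 1) + semitone
    return a4 * 2.0 ** ((midi - 69) / 12.0)

if __name__ == "__main__":
    print("F4", round(note_to_hz("F4"), 3))
```

Rounding the result to the nearest whole number gives a value that can be passed to pvan as the analysis frequency.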
Pvan analyses the sound into frames of harmonics. In the process it prints a lot of numeric information to the screen, which you can safely ignore. However, at the end it prints a very comprehensive summary of the sound - including all the information you put into the head file. Some of this is rather technical, but you can see, for example, that the frames are only 1.5 milliseconds apart, and that there are 63 harmonics.
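The figure of 63 harmonics is consistent with the analysis frequency: harmonics of 349 Hz can only be represented up to the Nyquist limit (half the sample rate). The following is a minimal sketch of that arithmetic, assuming the file is sampled at 44100 Hz. Both the sample rate and the rule itself are assumptions made for illustration; pvan may apply its own rule, so check the summary it prints.

```python
def max_harmonics(analysis_hz, sample_rate=44100):
    """Number of harmonics of analysis_hz lying at or below the Nyquist frequency.

    sample_rate defaults to 44100 Hz, which is an assumption about the file.
    """
    nyquist = sample_rate / 2.0
    return int(nyquist // analysis_hz)

if __name__ == "__main__":
    print(max_harmonics(349))
```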
If you like, you can immediately check the accuracy of the analysis by resynthesising "ctptf4.an". Use addsyn for this:
addsyn ctptf4.an ctptf4syn.wav
You can now display the analysis data using monan. This is easily the dominant program in the SNDAN package for analysis and transformation. It has a large number of commands, some of them rather technical. This example only hints at the possibilities available.
Run monan with the name of the analysis file:
You are asked how many harmonics to load; the default is to load all of them, so just press <Enter>.
This brings you to the main command prompt within monan. Expert users will go directly to the command they want to use, but at this stage you are just starting, and will probably just want to know what you could do. So, as the monan prompt suggests, call up the list of commands by typing 'lc'.
Somewhere in this list you will see the 3D display command, 'pp'. Type 'pp', and when the pp menu is displayed, type '1' to print the graph. At this point you will really know if GSTOOLS has been successfully installed. If it has, an attractive colour 3D plot of the sound will appear after a second or so. If not, well, you need to install GSTOOLS or an equivalent eps viewer (see Installing SNDAN).
Assuming all is well: it is now up to you to interpret what you see in this picture. Note that monan has only printed the first 20 harmonics. To see all of them, type '3' (you are still at the pp command level), and set harmonics 1 to 63 to be displayed. Then type '1' to plot the new graph.
On first inspection, the original range, the first 20 harmonics, seems about right, as there appears to be very little energy in the upper harmonics. However, to be sure, and to inspect those upper harmonics more closely, you can change to a Decibel plot (type '5' at the pp prompt, followed by '1' to get the new plot). Those high harmonics can be seen much more clearly. They may have seemed very quiet before, but this is a trick of the numbers; in an amplitude plot we really only see the loud parts of the sound. They are quiet, but certainly audible. This could be called a 'loudness' plot, as it reflects much more closely what we hear.
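The 'trick of the numbers' comes from the logarithmic nature of the Decibel scale. The following minimal sketch uses the standard conversion for amplitude values, 20 * log10(amplitude / reference). The function name, the floor value and the sample amplitudes are hypothetical, and monan's own scaling may differ in detail.

```python
import math

def amplitude_to_db(amplitude, reference=1.0, floor_db=-120.0):
    """Return the level of amplitude relative to reference, in decibels.

    Values at or below zero are clamped to floor_db (an arbitrary choice).
    """
    if amplitude <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(amplitude / reference))

if __name__ == "__main__":
    # Hypothetical peak amplitudes for a loud low harmonic and a quiet high one.
    for label, amp in (("loud", 1.0), ("quiet", 0.01)):
        print(label, amp, amplitude_to_db(amp))
```

A harmonic at one hundredth of the peak amplitude occupies only 1% of the height of a linear plot, yet it lies about 40 dB below the peak, which is why it shows up clearly on the Decibel plot.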
So, how will it affect the sound if you simply eliminate these harmonics? This is important if, for example, you want to resample the sound at a lower sample rate. It is a simple matter to test this by resynthesizing using only the lower harmonics. Exit the pp menu level by typing 'q', and then type 'sy', monan's synthesis command.
This asks for several parameters, most of which you will want to leave alone (just press <Enter> at each prompt) - but take note of the options available, which include pitch-shifting and time-scaling (for more control over time-scaling, see the addsyn programs). The first parameter asks for an outfile name - type "ctptf4_30.wav". The third parameter asked for is the range of harmonics to keep, so here you need to type 1 followed by 30, to keep the first 30 harmonics. Accept the defaults for everything else, and the synthesis is performed.
You can play the sound within monan using the sp command. This launches the player application installed as the associated program under Windows (the program that would run if you double-clicked on a WAVE file).
Of course, you need to give the sp command a filename, so you have to type it in. But can you remember it exactly? If you have command history ("doskey") running, you don't need to - step back through the commands using the up-arrow key until you find the filename you had typed in previously.
You have probably noticed that the trumpet tone is none too steady. There seem to be wobbles in both pitch and loudness. Indeed, using the 3D plot, the amplitude profiles show clear peaks through the sound. There is also a lot of lower-level noise (thick black lines) on the edges of the amplitude profiles. Of course, this is just one note, and in the context of a live performance it is hardly important. However, if this sound is to be used on a sampler, irregularities of this kind are generally undesirable.
One simple but powerful transformation available in monan is the facility to smooth the harmonic profiles (amplitude or frequency), using a low-pass filter. Try the amplitude change first. Type 'sm' (you need to be at the top menu level in monan), and enter 'a' to apply the smoothing to amplitude. You have the option of offsetting the start time, which enables you to avoid flattening the attack, but for now accept the default of zero. The cut-off frequency is then required. For this task it must be low: try 5 Hz. Then redisplay using pp, command 1 (you also need the 'linear' amplitude scale). You should see that the amplitude envelopes are much smoother now, while the larger-scale peaks are preserved. In fact, this offers a powerful tool for broad-band noise reduction, as each harmonic is processed individually, but harmonics are not simply removed or reduced in amplitude, as would be the case using a conventional filter.
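To illustrate what low-pass smoothing of a single harmonic's envelope does, here is a minimal sketch. It is not monan's implementation: the filter design used by sm is not described here, so a one-pole low-pass run forwards and then backwards stands in for it. The function name and the synthetic test envelope are hypothetical; the frame spacing of 1.5 milliseconds is the value reported by pvan above.

```python
import math

def smooth_envelope(values, cutoff_hz, frame_interval_s):
    """Low-pass filter one harmonic's envelope (amplitude or frequency).

    A one-pole filter is run forwards and then backwards so that the
    smoothed curve is not delayed relative to the original. This is a
    stand-in; the filter monan uses may differ.
    """
    if not values:
        return []
    # Standard one-pole coefficient for the given cutoff and frame rate.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz * frame_interval_s)

    def one_pass(seq):
        out, state = [], seq[0]
        for x in seq:
            state += alpha * (x - state)
            out.append(state)
        return out

    forward = one_pass(values)
    backward = one_pass(forward[::-1])
    return backward[::-1]

if __name__ == "__main__":
    dt = 0.0015  # frame spacing in seconds
    # Synthetic envelope: slow 1 Hz swell plus a fast 60 Hz ripple.
    env = [1.0 + 0.3 * math.sin(2 * math.pi * 1.0 * n * dt)
               + 0.1 * math.sin(2 * math.pi * 60.0 * n * dt)
           for n in range(2000)]
    smoothed = smooth_envelope(env, cutoff_hz=5.0, frame_interval_s=dt)
    print(len(smoothed), min(smoothed), max(smoothed))
```

With a 5 Hz cut-off the slow swell passes through almost unchanged while the fast ripple is strongly attenuated, which mirrors the behaviour described above: the larger-scale peaks survive and the fine noise on the profile is reduced.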
You can also apply sm to smooth frequency wobbles, in the same way. To see how erratic the frequencies are, use the 'af' command. This creates a bar plot of the amplitude of each harmonic, but shows frequency deviations as lateral deviations on the display. The broader the wiggles, the greater the frequency deviation.
At any stage, you can synthesise the modified data using sy, and play it using sp. Also note that in this Windows version a single level 'undo' has been implemented. Using the 'un' command you can return to the previous state and repeat a command with changed parameters.
[SNDAN Home page]
Analyse the sound.
Display the analysis data.
Convert to .an format.
Develop further in monan.
Analyse a difficult sound.
Mqan's analysis method is very different from that of pvan, although both start off by applying a running FFT spectrum analysis to the sound. While pvan assumes that the source contains harmonics that deviate only slightly from their predicted positions, mqan takes the analysis several stages further. Firstly it scans each FFT frame for peaks, which are deemed to contain true frequency components ('partials') of the sound. Secondly it attempts to link peaks from frame to frame, to form 'partial tracks'. It is thus able, in principle, to create a coherent analysis of a sound containing wide and arbitrary pitch changes. Inevitably, the track generation process will stumble if peaks are ambiguous, and mqan, especially, requires that partial tracks do not overlap, so it is not infallible (this is after all an 'analysis model' - not all sounds, especially those with significant amounts of noise, fit this model at all well). Nevertheless the quality of the analysis, if handled carefully, can be surprisingly good even for challenging sounds such as drums (for example the file "doumbek.wav" included with this distribution).
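The two stages described above, peak picking and frame-to-frame linking, can be sketched in a few lines. This is a deliberately simplified illustration and not mqan's code: the function names, the max_jump_hz parameter and the greedy nearest-neighbour matching are all assumptions, and a full McAulay-Quatieri tracker resolves competing matches more carefully.

```python
def find_peaks(magnitudes, bin_hz, threshold):
    """Return (frequency, magnitude) for each local maximum above threshold."""
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        m = magnitudes[k]
        if m > threshold and m > magnitudes[k - 1] and m >= magnitudes[k + 1]:
            peaks.append((k * bin_hz, m))
    return peaks

def link_peaks(prev_freqs, next_freqs, max_jump_hz):
    """Greedily pair each frequency in one frame with the nearest unused
    frequency in the following frame, if it lies within max_jump_hz.

    Returns a list of (index_in_prev, index_in_next) pairs. Unmatched
    entries in prev_freqs are tracks that end; unmatched entries in
    next_freqs are tracks that begin.
    """
    links, used = [], set()
    for i, f in enumerate(prev_freqs):
        best, best_dist = None, max_jump_hz
        for j, g in enumerate(next_freqs):
            dist = abs(g - f)
            if j not in used and dist <= best_dist:
                best, best_dist = j, dist
        if best is not None:
            used.add(best)
            links.append((i, best))
    return links

if __name__ == "__main__":
    # Hypothetical magnitude spectrum with 100 Hz between bins.
    frame = [0, 1, 5, 1, 0, 0.2, 3, 0.2, 0]
    print(find_peaks(frame, bin_hz=100.0, threshold=0.5))
    # Peaks near 200 and 600 Hz continue; the 1000 Hz peak starts a new track.
    print(link_peaks([200.0, 600.0], [210.0, 590.0, 1000.0], max_jump_hz=50.0))
```

Because each peak in the next frame can be claimed only once, two tracks can never merge in this sketch, which is one simple way of expressing the requirement that tracks do not overlap.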
However, for this example session a relatively non-challenging sound is used, "tenor.wav", a short sample of a tenor voice with a rich vibrato. As before, we need to identify the fundamental frequency for the analysis, which in this case is G3. Of course, you will play the file first; try to 'analyse' it in your head as you do so - are there any distinctive features in the sound that you would expect to appear in the analysis?
You should by now have no difficulty in creating a suitable head file for "tenor.wav" using mkheader; you might as well call the file "tenor.head".
According to the Pitch Table, G3 corresponds to a frequency of 195.998 Hz. So in this case we could reasonably set the analysis frequency at 196Hz.
The command line for mqan is much the same as for pvan:
mqan 196 tenor.head tenor.wav tenor.mq 20
The only parameter that is new is the last one, 'threshold'. Depending on the source, this can have quite an impact on the analysis, as it determines the loudness threshold below which spectral peaks are discarded. In this case the peaks will be very clear, so the value is almost immaterial. A value of 20 corresponds to 76dB below peak; enough to discard low-level noise while still preserving all salient components.
However, it is important to make a check on the quality of the analysis. The program for this is mqsyn2:
mqsyn2 tenor.mq tenorsyn.wav 1 1
The last two arguments implement simple time-varying time-scaling. For a basic quality check this is inappropriate, so the tempo factor will be 1 throughout. It is well worth experimenting with later, of course!
For .mq files, the display program is mqplot. Unlike the much more powerful program monan, it offers no transformations. Later, you can convert "tenor.mq" into the .an format, and load that into monan.
mqplot is very interactive; it will even ask for a filename if you just type in the program name.
However, there are only a few required arguments, so if you like you can type them all in:
mqplot tenor.mq 100 6000 3 2
This produces a 2D plot of frequency against time. The tracks reveal the vibrato very clearly. Note that though tracks cannot cross over each other in this implementation, they can occupy the same space at different times. This is quite different from the fixed harmonic structure assumed by pvan.
You are now in a small command menu similar to those in monan. One useful command here is 'f' to change the vertical frequency scale. You need to know the minimum and maximum fundamental frequency in the sound, for the next stage in this session, so you need to zoom in on the lowest track. Set a low limit of 100 and an upper limit of 300. This will show that the fundamental lies within the range 160 to 220. If you like, run 'f' again with these limits to confirm this range.
You may also have realised that the lower limit is well under the fundamental frequency of 196 used when "tenor.wav" was analysed. Mqan in fact surreptitiously drops the given analysis frequency by an octave, to allow for just such deviations.
The plot created above shows that the vibrato of the voice has been captured very successfully. The next step is to convert "tenor.mq" into the .an format so that the data can be developed and studied further with monan. The program to convert .mq files into .an files is mqtoan. It is important to appreciate that the format of the .an file itself does not have to be limited to the fixed harmonics extracted by pvan. Each harmonic is simply defined by a time-varying amplitude and a time-varying frequency deviation. With pvan analysis the deviation will always be within the bounds of the fundamental frequency. However, in principle the deviations written in the file could range much more widely. The process is by no means infallible; at the end of this session an example is presented where mqtoan fails to make the conversion.
The command line is rather more extended than those you have had to use hitherto:
mqtoan tenor.mq tenor.an 160 220 20 0.03 160 100
The first three numbers here are familiar, as they correspond to the minimum and maximum frequencies determined above, and the threshold amplitude used to analyse the sound. Since the sound is strongly harmonic, the 'harmonic acceptance interval' can be small; here the recommended value of 0.03 is used. The fundamental frequency can reasonably be the same as the minimum frequency. Finally, you can either set a deliberately low value for the number of harmonics to retain, or a generous value, as here, to get all of them. As it happens, when mqtoan is run as above, it creates 68 harmonics.
Load "tenor.an" into monan, following the procedure in Session 1. If you can resist the temptation to draw another colour 3D plot, you can instead go directly to the command pt, which plots the musical pitch of the note against time, i.e. the vertical scale is marked in musical notes rather than in raw frequency. You are asked to accept or alter various parameters. Accept the octave range suggested, and set an amplitude threshold of zero (you may as well see everything). Finally, the 'Allowable pitch range' need not be more than a semitone in this case, though it does no harm to allow more. Any deviations outside the allowable range are not plotted.
At last, monan draws a plot of the pitch profile of the voice. Does it reflect the observations you made when you listened to "tenor.wav"? You may have heard the slight rise in pitch at the end, and you may have thought the overall pitch was very stable. Did you sense that, despite the fact that it sounds like a G, most of the pitch actually lies below G, and that the vibrato ranges over a semitone?
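To relate raw frequencies to the musical scale used by the pitch plot, it helps to express deviations in cents (100 cents to the equal-tempered semitone). The following is a minimal sketch of the standard conversion; the function name and the sample frequencies are hypothetical and are not taken from the analysis of "tenor.wav".

```python
import math

def cents_between(freq_hz, reference_hz):
    """Interval from reference_hz to freq_hz in cents (100 per semitone).

    Negative values mean freq_hz lies below the reference.
    """
    return 1200.0 * math.log2(freq_hz / reference_hz)

if __name__ == "__main__":
    g3 = 196.0  # analysis frequency used for tenor.wav
    # Hypothetical instantaneous frequencies, chosen for illustration only.
    for f in (185.0, 190.0, 196.0, 200.0):
        print(f, round(cents_between(f, g3), 1))
```

A frequency of about 185 Hz lies roughly 100 cents, a full semitone, below G3, which gives a sense of how wide a vibrato 'ranging over a semitone' is in terms of raw frequency.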
You can now try out the smoothing command sm you used in Session 1, with the aim of reducing the vibrato. You will need to apply smoothing to both frequency and amplitude, but it is worth comparing the effects of smoothing one or the other. This is where the undo command un is useful. You can apply the first transformation, to pitch, say, and synthesise the new data. Then you can go back to the original data by typing 'un', and smooth the amplitudes.
Since vibrato is typically around 5Hz, to remove it the filter cut-off frequency will need to be lower than that. You may sometimes find that it is a good idea to apply smoothing in a couple of stages, dropping the cut-off frequency each time.
Included in this distribution is a soundfile called 'wobble.wav'. This is a simple sine-wave with several up-down pitch sweeps in it. Such a sound will give some difficulty to pvan, but mqan can manage very well. "Wobble.wav" is of little musical merit, but it is useful here to demonstrate the differing capabilities and limitations of pvan, mqan and mqtoan.
For convenience, the header file "wobble.head" has already been prepared.
To analyse with pvan, use this command line:
pvan 300 wobble.head wobble.wav wobblepv.an
To analyse with mqan, use this:
mqan 300 wobble.head wobble.wav wobblemq.mq 20
You now need to resynthesise these files, to compare the quality of the analysis:
addsyn wobblepv.an wobpvsyn.wav
mqsyn2 wobblemq.mq wobmqsyn.wav
It will be very obvious that pvan has struggled. Although the source signal is clear enough, it is accompanied by audible extra modulation artefacts. mqan, on the other hand, has achieved a virtually flawless result.
You can inspect "wobblepv.an" in monan using the ftc command, which draws a colour or grey-scale sonogram plot. The added artefacts generated by pvan can be seen very clearly. For "wobblemq.mq", use mqplot with the standard 2D display. The display frequency range for both files can be 200 to 1200.
Also, take a look at the sizes of both files. pvan has no choice but to keep each analysis frame, and all the harmonics expected for that frame. mqan, on the other hand, can reduce the analysis data substantially, so that MQ files will vary considerably in size, depending on the complexity of the source sound.
Unfortunately it has proved impossible to convert "wobblemq.mq" into an .an format file, as, perhaps unsurprisingly, the fundamental frequency tracker is unable to find a fundamental pitch. The fcheck stage of mqtoan prints a stream of messages saying "sparse frame detected" - equivalent to analysing an empty file. It is hoped that a later version will be able to overcome this limitation.