Duke University and University of Illinois at
Urbana-Champaign:
Performance Sept. 28th
3-5PM ET
How can you make sense of the sound you are hearing when in the
Studio?
The Cameras in the Studio Align to
Different Parts of the Room
The SoundSense studio is a result of technical and musical engineering.
The hardware that helps in the production of the sounds that you hear
includes nine cameras, fifteen computers, eight speakers, two
microphones and a mixing board. All the computers and cameras
communicate through the internet. The music is a mixture of synthesized
and sampled sounds that are played with a software package called
SuperCollider. When you're in the studio, you control the music you hear
by how you move in the room. Moving with different amounts of motion
results in different rhythms being played. Moving in different areas of
the room produces different types of music. For example, one part of the
room plays the congas!
How does the Studio react to you?
Part of the Room Sampled by a Camera
The studio space is divided into nine different regions where each
region corresponds to the view of one of the nine cameras. Two times a
second, the computer takes a snapshot through the webcam. The computer
figures out the amount of motion in each camera's view by comparing the
variation in pixel values of two consecutive snapshots. If the motion
detected is above a certain amount, then the computer monitoring the
camera notifies another computer to synthesize music. Based on the
amount of motion the camera has detected, the music computer
synthesizes differing rhythms and kinds of music.
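As a rough illustration of comparing two consecutive snapshots, here is a minimal Python sketch. It assumes each snapshot is a small grid of grayscale pixel values from 0 to 255 and measures motion as the average absolute change per pixel. The function name `motion_amount` and the sample data are made up for this example; the Studio's actual software may measure motion differently.

```python
def motion_amount(prev_frame, curr_frame):
    """Average absolute pixel change between two equally sized grayscale frames."""
    total = 0
    count = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += abs(c - p)
            count += 1
    return total / count


# Two tiny 3x3 "snapshots" (made-up data, not real camera output).
still = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
moved = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]

print(motion_amount(still, still))  # identical frames: 0.0
print(motion_amount(still, moved))  # one pixel changed by 90 out of 9 pixels: 10.0
```

Averaging over all pixels keeps the number comparable between cameras with different image sizes, which is one reasonable choice among several.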
How does it work in detail?
The video from each of the cameras in the Studio is sent to a
computer over the Internet. A software program running on that computer
then takes one image and compares it to the image captured right before
it. It does this by essentially "subtracting" the two images from each
other. If the two images are exactly alike (meaning nothing moved or
changed in the room) then the difference between the images when they
are subtracted is zero. If someone were to move in the room between the
capturing of images, then there would be some difference between the
two images, so the computer would record some number greater than zero.
A small number resulting from the subtraction corresponds to a little
movement while a large value after subtraction corresponds to a lot of
movement. The results of these calculations are sent to another
computer that runs the "SuperCollider" program that synthesizes the
music. Because we can transform the amount of movement into a numerical
value based on the difference between successive snapshots, we can
compare that value against a set amount (called a "threshold"): if the
value is above the threshold, music is synthesized. What type of music
is synthesized depends on this value as well: there are three
thresholds, and the music is synthesized differently depending on which
of them the amount of motion exceeds. More movement results in a more
complex rhythm, while less movement triggers a simpler rhythm.
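The threshold idea can be sketched in a few lines of Python. The three threshold values below are invented for illustration (the text does not give the real ones), and treating "below every threshold" as level 0, meaning silence, is also an assumption.

```python
from bisect import bisect_left

# Hypothetical threshold values; the Studio's real settings are not stated.
THRESHOLDS = [5.0, 20.0, 50.0]


def motion_level(motion_value, thresholds=THRESHOLDS):
    """Count how many thresholds the motion value is strictly above.

    0 means no threshold was exceeded (assumed here to mean no music);
    1, 2 and 3 stand for increasingly complex rhythms.
    """
    return bisect_left(thresholds, motion_value)


for value in (0.0, 6.0, 20.5, 100.0):
    print(value, "->", motion_level(value))
```

Because the thresholds are kept in a sorted list, adding or retuning a level only means editing that list.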
Due to the communication between the cameras and the music
synthesizing computer, each camera corresponds to a different part of the collective
"song" that results when the room is full of people. Each camera and
computer continually carries out this process and all can function at
the same time. Therefore, by moving around the room, you are personally
creating the music that you hear.
Here's a model of the Studio constructed using Virtools. To download the plug-in
for Windows, go here;
for other operating systems, go here.
How does the music work?
The music is constructed using 8-bar rhythmic and melodic templates.
Each instrument has four rhythmic and melodic templates corresponding to
the four levels of motion that can be detected. The rhythms grow in
complexity: the rhythms in the first template are much simpler than the
rhythms in the last template. As the level of motion changes, the
music jumps from one template to another while keeping its place in
the 8-bar phrase.
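One way to picture template switching is the Python sketch below. It assumes four steps per bar and uses invented hit/rest patterns; the real templates live in SuperCollider and are not shown in the text. The key point is that the position in the phrase comes from a running step counter, so changing the motion level never resets it.

```python
BARS = 8
STEPS_PER_BAR = 4                    # assumed resolution, for illustration only
PHRASE_LEN = BARS * STEPS_PER_BAR

# Hypothetical rhythm templates, one per motion level: 1 = hit, 0 = rest.
TEMPLATES = [
    [1, 0, 0, 0] * BARS,   # level 0: simplest
    [1, 0, 1, 0] * BARS,   # level 1
    [1, 0, 1, 1] * BARS,   # level 2
    [1, 1, 0, 1] * BARS,   # level 3: busiest
]


def play(levels_per_step):
    """Return the hit/rest played at each step, given the motion level at each step."""
    out = []
    for step, level in enumerate(levels_per_step):
        position = step % PHRASE_LEN      # place in the 8-bar phrase
        out.append(TEMPLATES[level][position])
    return out


# Six steps at level 0, then six at level 3: the new template is picked up
# at step 6 of the phrase instead of starting over from step 0.
print(play([0] * 6 + [3] * 6))  # [1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
```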
Additionally, some instruments are programmed to improvise rhythms and
ornaments using simple rules that determine how many of them
there can be in a given time period, without specifying exactly when these new
rhythms and ornaments will happen. Combined with the fixed rhythm templates, the
effect can be one of improvisation supported by a solid musical
foundation.
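A rule of this kind might look like the following Python sketch: the number of ornaments in a period is fixed, but where they fall is left to chance. The function name `scatter_ornaments` and the step counts are assumptions made for this example, not the Studio's actual rules.

```python
import random


def scatter_ornaments(steps_in_period, count, rng):
    """Choose `count` distinct steps within the period at which to play an ornament."""
    return sorted(rng.sample(range(steps_in_period), count))


rng = random.Random()          # pass random.Random(seed) for repeatable results
for bar in range(3):
    # Rule: exactly 3 ornaments somewhere in each 16-step period.
    print("period", bar, "ornaments at steps", scatter_ornaments(16, 3, rng))
```

Each period gets the same number of ornaments, yet the pattern differs every time, which is what gives the improvised feel on top of the fixed templates.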
Marimba
Finally, some of the synthesized instruments are based on physical
models of acoustic instruments. For example, the marimba sound is not a
sampled marimba, but a banded waveguide instrument that recreates the modal
frequencies of a marimba. Metallic gong sounds are synthesized by tuning biquad filters
to the modal frequencies of sampled gongs. These synthesis techniques
offer more flexibility during performance than ordinary samples.
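To give a feel for tuning filters to modal frequencies, here is a small Python sketch of a bank of two-pole resonators (a simple kind of biquad filter), each "struck" with a single impulse. The mode frequencies, gains and decay values are invented for illustration, and this is not the banded waveguide model used for the marimba.

```python
import math


def resonator_impulse_response(freq_hz, decay, sample_rate, num_samples):
    """Impulse response of a two-pole resonator that rings at freq_hz.

    `decay` is the pole radius (just below 1.0); closer to 1.0 rings longer.
    """
    w = 2.0 * math.pi * freq_hz / sample_rate
    a1 = 2.0 * decay * math.cos(w)
    a2 = -decay * decay
    y1 = y2 = 0.0
    out = []
    for n in range(num_samples):
        x = 1.0 if n == 0 else 0.0       # a single "strike"
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def modal_sound(modes, sample_rate=44100, num_samples=44100):
    """Mix one resonator per (frequency, gain, decay) mode."""
    mix = [0.0] * num_samples
    for freq_hz, gain, decay in modes:
        ring = resonator_impulse_response(freq_hz, decay, sample_rate, num_samples)
        for i, sample in enumerate(ring):
            mix[i] += gain * sample
    return mix


# Hypothetical gong-like modes: (frequency in Hz, gain, decay).
gong = modal_sound([(220.0, 1.0, 0.9995), (563.0, 0.5, 0.9990), (911.0, 0.3, 0.9985)])
print(len(gong), "samples synthesized")
```

Because each mode is just a few numbers, the pitch or ring time can be changed during a performance, which is harder to do with a fixed recording.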
What other mathematical methods are there to determine motion in the
StudioScape?
The method used to "subtract" the images is just fine for this type of
application. However, if we wanted to know exactly where in the room the
movement was happening instead of just whether movement is occurring or
not in a general area of space, a more complicated process called
convolution would be used. In convolution, the computer represents each image
as a matrix of pixel values, where a pixel is the smallest unit of an
image. Then the two images are compared by multiplying the first matrix
by a reversed and translated version of the second matrix. You can
think of this multiplication as taking the second image, and moving it
around over the first image systematically so that every possible way
that the second image might fit onto the first is inspected. As the
second image is moved over the first image, each pixel of image one and
image two is multiplied together in all the different positions and then
added together. The results show how closely the two images match at
each position, and so reflect whether and where image one and image two
differ.
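The sliding multiply-and-add can be written directly in Python. The sketch below assumes the images are small grids of numbers and computes the full two-dimensional convolution; the function name `convolve2d` and the tiny matrices are made up for the example.

```python
def convolve2d(a, b):
    """Full 2D convolution of two matrices given as lists of rows."""
    ha, wa = len(a), len(a[0])
    hb, wb = len(b), len(b[0])
    out = [[0] * (wa + wb - 1) for _ in range(ha + hb - 1)]
    for m in range(ha):
        for n in range(wa):
            for p in range(hb):
                for q in range(wb):
                    # Each pixel of `a` is multiplied with each pixel of `b`
                    # and added into the output cell for that relative shift.
                    out[m + p][n + q] += a[m][n] * b[p][q]
    return out


image_one = [[1, 2],
             [3, 4]]
image_two = [[0, 1],
             [1, 0]]

for row in convolve2d(image_one, image_two):
    print(row)
# [0, 1, 2]
# [1, 5, 4]
# [3, 4, 0]
```

The four nested loops show why this is slow for real images: every pixel of one image meets every pixel of the other.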
Because we want the music to be played as you move, the time it takes
for the program to figure out if a change has occurred in the room is
very important. For this reason, another technique is used so that the
processing can happen as fast as possible and in real-time. The
Fourier Transform can be used to reduce the amount of time the
program takes to determine if there is a difference between images. The Fourier
Transform is a mathematical function that transforms a function that
depends on time into one that depends on frequency. We can do this
because time depends on frequency via the relationship
frequency = 1 / period. The fast
Fourier transform algorithm is used to change each original time-domain
image matrix into a matrix that depends on frequency. On these new
matrices, convolution becomes a simple element-by-element
multiplication, which is carried out much more quickly than before, and
then the inverse fast Fourier transform is used to obtain the same
result as convolution would but in much less time.
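The speed-up trick can be checked on a small example. The Python sketch below works on one-dimensional lists for brevity (for images, the same steps would be applied along both rows and columns), and uses a simple recursive fast Fourier transform written just for this illustration. It transforms both inputs, multiplies them element by element, transforms back, and compares the answer with ordinary convolution.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(a, b):
    """Convolution via FFT: transform, multiply element-wise, transform back."""
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:
        size *= 2                          # zero-pad up to a power of two
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    product = [x * y for x, y in zip(fa, fb)]
    back = fft(product, inverse=True)
    return [(v / size).real for v in back[:result_len]]


def direct_convolve(a, b):
    """Ordinary convolution, for comparison."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


a = [1, 2, 3]
b = [0, 1, 0.5]
print(direct_convolve(a, b))                        # [0, 1, 2.5, 4.0, 1.5]
print([round(v, 6) for v in fft_convolve(a, b)])    # same values, up to rounding
```

On inputs this small the direct method is actually faster; the FFT route pays off as the images get large.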
How can I get involved in more projects in StudioScape?
Math and science are directly applicable to many exciting
activities and projects; the MiX TAPEStry project in the StudioScape
project is just one example. There are countless creative
ways that technology can let us represent all sorts of things with
sound, opening exciting new ways of exploring everything from scientific
experiments to artistic endeavors.