MiX TAPEStry

Duke University and University of Illinois at Urbana-Champaign:

Performance Sept. 28th 3-5PM ET

How can you make sense of the sound you are hearing when in the Studio?

Studio Abstract View

The Cameras in the Studio Align to Different Parts of the Room

The SoundSense studio is a result of technical and musical engineering. The hardware that produces the sounds you hear includes nine cameras, fifteen computers, eight speakers, two microphones and a mixing board. All the computers and cameras communicate through the Internet. The music is a mixture of synthesized and sampled sounds that are played with a software package called SuperCollider. When you're in the studio, you control the music you hear by how you move in the room. Different amounts of motion result in different rhythms being played, and moving in different areas of the room produces different types of music. For example, one part of the room plays the congas!

How does the Studio react to you?

The studio space is divided into nine different regions where each region corresponds to the view of one of the nine cameras. Two times a second, the computer takes a snapshot through the webcam.

Camera Screenshot

Part of the Room Sampled by A Camera

The computer figures out the amount of motion in each camera's view by comparing the variation in pixel values of two consecutive snapshots. If the motion detected is above a certain amount, then the computer monitoring the camera notifies another computer to synthesize music. Based on the amount of the motion the camera has detected, the music computer synthesizes differing rhythms and kinds of music.



How does it work in detail?

The video from each of the cameras in the Studio is sent to a computer over the Internet. A software program running on that computer then takes one image and compares it to the image captured right before it. It does this by essentially "subtracting" one image from the other. If the two images are exactly alike (meaning nothing moved or changed in the room), then the difference between the images is zero. If someone moves in the room between the two captures, then there is some difference between the two images, so the computer records a number greater than zero. A small number resulting from the subtraction corresponds to a little movement, while a large number corresponds to a lot of movement.

The results of these calculations are sent to another computer that runs the "SuperCollider" program that synthesizes the music. Since the amount of movement is now a numerical value based on the difference between consecutive snapshots, music is synthesized whenever this value is above a certain amount (called a "threshold"). What type of music is synthesized depends on this value as well: there are three thresholds, and the music is synthesized differently depending on which of them the amount of motion exceeds. More movement results in a more complex rhythm, while less movement triggers a simpler rhythm.
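The subtraction and threshold steps described above can be sketched in a few lines of Python. This is only an illustration, not the Studio's actual software: the function names, the tiny 3x3 "snapshots", and the threshold values (10, 50, 200) are all made up for the example.

```python
def motion_amount(previous, current):
    """Add up the absolute pixel differences between two equally sized grayscale frames."""
    total = 0
    for prev_row, cur_row in zip(previous, current):
        for p, c in zip(prev_row, cur_row):
            total += abs(c - p)
    return total


def motion_level(amount, thresholds=(10, 50, 200)):
    """Count how many thresholds the motion amount is above.

    The threshold values are hypothetical. Level 0 means no threshold was
    exceeded; level 3 means all three were.
    """
    return sum(1 for t in thresholds if amount > t)


# Two tiny 3x3 "snapshots"; one pixel changes brightness between them.
frame_a = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
frame_b = [[0, 0, 0], [0, 40, 0], [0, 0, 0]]

still = motion_amount(frame_a, frame_a)  # identical frames, nothing moved
moved = motion_amount(frame_a, frame_b)  # one pixel changed by 30

print(still, motion_level(still))
print(moved, motion_level(moved))
```

Taking the absolute value of each difference keeps a pixel that got brighter from cancelling out one that got darker, so any change in the picture pushes the total up.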

Due to the communication between the cameras and the music synthesizing computer, each camera corresponds to a different part of the collective "song" that results when the room is full of people. Each camera and computer continually carries out this process and all can function at the same time. Therefore, by moving around the room, you are personally creating the music that you hear.

Here's a model of the Studio constructed using Virtools. Viewing it requires the Virtools plug-in, which can be downloaded for Windows and for other operating systems.

How does the music work?

The music is constructed using 8-bar rhythmic and melodic templates. Each instrument has four rhythmic and melodic templates corresponding to the four levels of motion that can be detected. The rhythms grow in complexity: the rhythms in the first template are much simpler than the rhythms in the last template. As the levels of motion change, the music jumps from one template to another while maintaining its correct location in the 8-bar phrase.
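Here is a rough Python sketch of how jumping between templates can keep the music's place in the phrase. The rhythms themselves, the choice of four beats per bar, and the numbering of levels from 0 to 3 are assumptions made for the example; the real templates live in the SuperCollider code.

```python
BEATS_PER_BAR = 4                 # assumed meter
PHRASE_BEATS = 8 * BEATS_PER_BAR  # one 8-bar phrase


def make_template(hit_every):
    """Hypothetical rhythm for one phrase: 1 = play on this beat, 0 = rest."""
    return [1 if beat % hit_every == 0 else 0 for beat in range(PHRASE_BEATS)]


# Four templates, simplest first: one per motion level (0 to 3).
templates = [make_template(8), make_template(4), make_template(2), make_template(1)]


def play(levels_per_beat):
    """Given the motion level at each successive beat, return what gets played."""
    output = []
    for beat, level in enumerate(levels_per_beat):
        # The position in the phrase depends only on the beat count,
        # so switching templates never resets the phrase.
        position = beat % PHRASE_BEATS
        output.append(templates[level][position])
    return output


# Quiet for one bar, then lots of motion for the next bar.
print(play([0, 0, 0, 0, 3, 3, 3, 3]))
```

Because the phrase position is tracked separately from the template choice, a dancer who suddenly speeds up lands in the middle of the busier rhythm rather than restarting it from bar one.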

Additionally, some instruments are programmed to improvise rhythms and ornaments using simple rules. These rules determine how many times an instrument can play in a given time period without specifying exactly when the new rhythms and ornaments will happen. Combined with the fixed rhythm templates, the effect can be one of improvisation supported by a solid musical foundation.
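One simple way to picture such a rule is shown below in Python. It is a guess at the general idea, not the project's actual rule: the function name, the example rhythm, and the limit of two extra notes are all invented for illustration.

```python
import random


def improvise(base_rhythm, max_extra, rng):
    """Add up to max_extra ornament notes on beats where the fixed rhythm rests.

    The rule limits how many notes are added, but leaves their timing to chance.
    """
    rests = [i for i, hit in enumerate(base_rhythm) if hit == 0]
    count = rng.randint(0, min(max_extra, len(rests)))
    extra = set(rng.sample(rests, count))
    return [1 if (hit or i in extra) else 0 for i, hit in enumerate(base_rhythm)]


base = [1, 0, 0, 0, 1, 0, 0, 0]  # hypothetical fixed template for two bars
rng = random.Random(7)           # seeded only so the example is repeatable
print(improvise(base, max_extra=2, rng=rng))
```

Every note of the fixed template is still played, which is what keeps the "solid musical foundation" under the improvised notes.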

Marimba

Finally, some of the synthesized instruments are based on physical models of acoustic instruments. For example, the marimba sound is not a sampled marimba, but a banded waveguide instrument that recreates the modal frequencies of a marimba. Metallic gong sounds are synthesized by tuning biquad filters to the modal frequencies of sampled gongs. These synthesis techniques offer more flexibility during performance than ordinary samples.
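To give a feel for how a filter tuned to a modal frequency "rings", here is a small Python sketch. It uses a two-pole resonator, which is a simple special case of a biquad filter, and it is not the project's SuperCollider code. The modal frequencies, gains, and decay values are hypothetical, not measured from a real gong.

```python
import math


def resonator(signal, freq_hz, decay, sample_rate):
    """Two-pole resonator: y[n] = x[n] + 2*r*cos(w)*y[n-1] - r*r*y[n-2].

    freq_hz sets the ringing frequency; decay (r, below 1) sets how fast it dies away.
    """
    w = 2 * math.pi * freq_hz / sample_rate
    a1 = 2 * decay * math.cos(w)
    a2 = decay * decay
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def strike(modes, length, sample_rate=8000):
    """Feed a single impulse (the 'strike') into one resonator per mode and mix the results."""
    impulse = [1.0] + [0.0] * (length - 1)
    mix = [0.0] * length
    for freq_hz, gain, decay in modes:
        ring = resonator(impulse, freq_hz, decay, sample_rate)
        mix = [m + gain * r for m, r in zip(mix, ring)]
    return mix


# (frequency in Hz, gain, decay) for each mode: made-up values for illustration.
HYPOTHETICAL_MODES = [(220.0, 1.0, 0.999), (563.0, 0.5, 0.998), (1021.0, 0.25, 0.997)]
tone = strike(HYPOTHETICAL_MODES, length=8000)  # one second of sound at 8000 samples per second
print(len(tone), tone[0])
```

Because the sound comes from a handful of numbers rather than a recording, any of them can be changed while the music is playing, which is the kind of flexibility a fixed sample cannot offer.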

What other mathematical methods are there to determine motion in the StudioScape?

The method used to "subtract" the images works well for this type of application. However, if we wanted to know exactly where in the room the movement was happening, instead of just whether movement is occurring in a general area of space, a more complicated process called convolution would be used. In convolution, the computer represents each image as a matrix of pixel values, where a pixel is the smallest unit of an image. Then the two images are compared by multiplying the first matrix by a reversed and translated version of the second matrix. You can think of this multiplication as taking the second image and moving it systematically over the first image, so that every possible way that the second image might fit onto the first is inspected. At each position, the overlapping pixels of image one and image two are multiplied together and the products are added up. The results reflect whether or not there is a difference between image one and image two.

Mathematically we can write the convolution as:

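$$O(i,j) = \sum_{k=1}^{m} \sum_{l=1}^{n} I(i+k-1,\, j+l-1)\, K(k,l)$$

where $I$ is the image with $M$ rows and $N$ columns, $K$ is the kernel with $m$ rows and $n$ columns, $O$ is the output, i runs from 1 to M - m + 1 and j runs from 1 to N - n + 1. From: http://homepages.inf.ed.ac.uk/rbf/HIPR2/convolve.htm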
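The sliding, multiplying and adding can be written out directly in Python. This sketch follows the form of the sum given on the HIPR2 page cited above, with a large image of M rows and N columns and a smaller matrix (the "kernel") of m rows and n columns. Python counts from 0 rather than 1, so the indices are shifted by one. The function name and the sample numbers are made up for the example.

```python
def convolve(image, kernel):
    """Slide the kernel over the image; at each position multiply overlapping entries and add."""
    M, N = len(image), len(image[0])
    m, n = len(kernel), len(kernel[0])
    output = []
    for i in range(M - m + 1):          # i runs over M - m + 1 positions
        row = []
        for j in range(N - n + 1):      # j runs over N - n + 1 positions
            total = 0
            for k in range(m):
                for l in range(n):
                    total += image[i + k][j + l] * kernel[k][l]
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(convolve(image, kernel))  # [[6, 8], [12, 14]]
```

Notice the four nested loops: every output value needs m times n multiplications, which is why this direct approach gets slow for large images.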

Because we want the music to be played as you move, the time it takes for the program to figure out if a change has occurred in the room is very important. For this reason, another technique is used so that the processing can happen as fast as possible and in real time. The Fourier Transform can be used to reduce the amount of time the program takes to determine if there is a difference between images. The Fourier Transform is a mathematical operation that transforms a function that depends on time into one that depends on frequency. We can do this because time and frequency are related: $f = 1/T$, where $f$ is the frequency and $T$ is the period (the time taken by one cycle).

The fast Fourier transform algorithm is used to change the original time-domain image matrix into a matrix that depends on frequency. In the frequency domain, convolution reduces to multiplying the two transformed matrices element by element, which is much quicker than before. The inverse fast Fourier transform is then used to obtain the same result as convolution would give, but in much less time.
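The idea can be tried out on a single row of pixels. The Python sketch below is a simplified, one-dimensional illustration, not the Studio's code. It assumes the row length is a power of two and uses "circular" convolution, where the row wraps around at its ends. It computes the convolution once the slow, direct way and once with a small hand-written fast Fourier transform, and the two answers agree.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 fast Fourier transform; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    sign = 1 if inverse else -1
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_convolve(a, b):
    """Circular convolution straight from the definition: n * n multiplications."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]


def fft_convolve(a, b):
    """Same result: transform, multiply element by element, transform back."""
    n = len(a)
    product = [x * y for x, y in zip(fft(a), fft(b))]
    return [v.real / n for v in fft(product, inverse=True)]


row_a = [1, 2, 3, 4, 0, 0, 0, 0]  # made-up pixel values
row_b = [0, 1, 0, 0, 0, 0, 0, 0]

slow = direct_convolve(row_a, row_b)
fast = fft_convolve(row_a, row_b)
print(slow)
print([round(v, 6) for v in fast])
```

For a row of n pixels the direct method needs about n times n multiplications, while the transform route needs on the order of n times log n, and that gap grows quickly as images get larger.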

How can I get involved in more projects in StudioScape?

Barbara Jordan

Math and science are directly applicable to many exciting activities and projects; the MiX TAPEStry project in StudioScape is just one example. There are countless creative ways that technology can let us represent all sorts of things with sound, opening exciting new ways of exploring everything from scientific experiments to artistic endeavors.
