MP3 compression works by reducing (or approximating) the accuracy of certain components of sound that are considered (by psychoacoustic analysis) to be beyond the hearing capabilities of most humans. This method is commonly referred to as perceptual coding or as psychoacoustic modeling. The remaining audio information is then recorded in a space-efficient manner, using MDCT and FFT algorithms. Compared to CD-quality digital audio, MP3 compression can commonly achieve a 75 to 95% reduction in size. For example, an MP3 encoded at a constant bitrate of 128 kbit/s would result in a file approximately 9% of the size of the original CD audio. In the early 2000s, compact disc players increasingly adopted support for playback of MP3 files on data CDs.
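The 9% figure follows from comparing bit rates. CD audio is 44,100 samples per second per channel, 16 bits per sample, two channels, or 1411.2 kbit/s. The sketch below divides a constant MP3 bit rate by that figure. It assumes the file contains only the audio stream and ignores metadata such as ID3 tags.

```python
# CD audio: 44,100 samples/s per channel, 16 bits per sample, 2 channels.
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbit/s


def size_fraction(mp3_kbps: float) -> float:
    """Fraction of the CD-audio size occupied by a constant-bit-rate MP3.

    Assumption: metadata (ID3 tags etc.) is ignored; only the audio stream counts.
    """
    return mp3_kbps / CD_BITRATE_KBPS


for kbps in (64, 128, 192, 320):
    frac = size_fraction(kbps)
    print(f"{kbps:>3} kbit/s: {frac:6.1%} of CD size, {1 - frac:6.1%} reduction")
```

At 128 kbit/s this gives about 9.1% of the CD size, a reduction of roughly 91%.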
During the development of the MUSICAM encoding software, Stoll and Dehery's team made thorough use of a set of high-quality audio assessment material selected by a group of audio professionals from the European Broadcasting Union, which was later used as a reference for the assessment of music compression codecs. The subband coding technique was found to be efficient not only for the perceptual coding of high-quality sound materials but especially for the encoding of critical percussive sound materials (drums, triangle, etc.), due to the specific temporal masking effect of the MUSICAM subband filterbank (this advantage being a specific feature of short transform coding techniques).
Karlheinz Brandenburg used a CD recording of Suzanne Vega's song "Tom's Diner" to assess and refine the MP3 compression algorithm. This song was chosen because its nearly monophonic nature and wide spectral content make imperfections in the compression format easier to hear during playback. The track also has an interesting property: its two channels are almost, but not completely, the same. In this case the binaural masking level difference (BMLD) causes spatial unmasking of noise artifacts unless the encoder properly recognizes the situation and applies corrections similar to those detailed in the MPEG-2 AAC psychoacoustic model. Some more critical audio excerpts (glockenspiel, triangle, accordion, etc.) were taken from the EBU V3/SQAM reference compact disc and have been used by professional sound engineers to assess the subjective quality of the MPEG Audio formats.
LAME is widely regarded as the most advanced MP3 encoder. It includes a variable bit rate (VBR) encoding mode that uses a quality parameter rather than a bit rate goal. Later versions (2008 onward) support an n.nnn quality goal that automatically selects MPEG-2 or MPEG-2.5 sampling rates as appropriate for human speech recordings, which need only 5512 Hz bandwidth resolution.
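The 5512 Hz figure is the Nyquist limit of an 11,025 Hz sampling rate, one of the MPEG-2.5 rates. MPEG-2.5 is an unofficial extension to the ISO standards. The sketch below lists the nine MP3 sampling rates and picks the lowest one whose Nyquist frequency (half the sampling rate) covers a requested bandwidth. This selection rule is a hypothetical illustration and is not LAME's actual logic. The function name is also hypothetical.

```python
# MP3 sampling rates by MPEG version (MPEG-2.5 is an unofficial extension).
SAMPLE_RATES = {
    "MPEG-2.5": (8000, 11025, 12000),
    "MPEG-2": (16000, 22050, 24000),
    "MPEG-1": (32000, 44100, 48000),
}


def lowest_rate_for_bandwidth(bandwidth_hz: float) -> tuple[int, str]:
    """Hypothetical rule: lowest rate whose Nyquist limit (rate / 2) covers bandwidth_hz."""
    candidates = sorted(
        (rate, version) for version, rates in SAMPLE_RATES.items() for rate in rates
    )
    for rate, version in candidates:
        if rate / 2 >= bandwidth_hz:
            return rate, version
    raise ValueError("bandwidth exceeds the Nyquist limit of every MP3 sampling rate")


print(lowest_rate_for_bandwidth(5512))   # speech bandwidth mentioned in the text
print(lowest_rate_for_bandwidth(16000))  # full-range music needs an MPEG-1 rate
```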
The MP3 encoding algorithm is generally split into four parts. Part 1 divides the audio signal into smaller pieces, called frames, and a modified discrete cosine transform (MDCT) filter is then performed on the output. Part 2 passes the samples into a 1024-point fast Fourier transform (FFT), then the psychoacoustic model is applied and another MDCT filter is performed on the output. Part 3 quantizes and encodes each sample, a step known as noise allocation, which adjusts itself to meet the bit rate and sound masking requirements. Part 4 formats the bitstream into audio frames, each made up of four parts: the header, error check, audio data, and ancillary data.
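Each frame produced in Part 4 begins with a 32-bit header. If the header's protection bit is 0, a 16-bit CRC (the error check) follows it. The sketch below decodes one header using the standard bit-field layout and the MPEG-1 Layer III bit rate and sampling-rate tables. It only covers MPEG-1 Layer III, and the class and function names are hypothetical. An MPEG-1 Layer III frame carries 1152 samples, so its length in bytes is floor(144 × bit rate / sampling rate) plus the padding byte.

```python
from dataclasses import dataclass

# MPEG-1 Layer III tables (bitrate index 0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


@dataclass
class FrameHeader:  # hypothetical container for the decoded fields
    bitrate_kbps: int
    sample_rate_hz: int
    has_crc: bool
    padding: int
    channel_mode: str
    frame_length_bytes: int


def parse_mpeg1_layer3_header(raw: bytes) -> FrameHeader:
    """Decode a 4-byte frame header; only MPEG-1 Layer III is handled."""
    if len(raw) < 4:
        raise ValueError("need 4 header bytes")
    word = int.from_bytes(raw[:4], "big")
    if (word >> 21) & 0x7FF != 0x7FF:          # 11-bit frame sync
        raise ValueError("no frame sync")
    version = (word >> 19) & 0b11              # 0b11 = MPEG-1
    layer = (word >> 17) & 0b11                # 0b01 = Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    protection_bit = (word >> 16) & 1          # 0 means a CRC follows the header
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or invalid bitrate/sample-rate index")
    padding = (word >> 9) & 1
    mode = CHANNEL_MODES[(word >> 6) & 0b11]
    # 1152 samples per frame / 8 bits per byte = 144.
    length = 144 * bitrate * 1000 // sample_rate + padding
    return FrameHeader(bitrate, sample_rate, protection_bit == 0, padding, mode, length)


print(parse_mpeg1_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

The example bytes describe a 128 kbit/s, 44.1 kHz joint-stereo frame without CRC, which is 417 bytes long.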
During encoding, 576 time-domain samples are taken and transformed to 576 frequency-domain samples. If there is a transient, 192 samples are taken instead of 576. This is done to limit the temporal spread of quantization noise accompanying the transient (see psychoacoustics). Frequency resolution is limited by the small long block window size, which decreases coding efficiency. Time resolution can be too low for highly transient signals and may cause smearing of percussive sounds.
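The 576-to-576 mapping can be illustrated with a textbook MDCT. The transform turns 2N overlapping, windowed time samples into N coefficients, and overlap-adding the windowed inverse transforms cancels the time-domain aliasing. In MP3's hybrid filterbank, a 32-band polyphase filterbank comes before the MDCT. Each subband's long block therefore yields 18 coefficients (32 × 18 = 576) and each short block yields 6 (32 × 6 = 192).

The sketch below is a simplification: a direct O(N²) MDCT with a sine window. It omits the polyphase stage and MP3's actual block-switching windows. The helper names are hypothetical.

```python
import math
import random


def sine_window(n_half: int) -> list[float]:
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]


def mdct(block: list[float]) -> list[float]:
    """Direct MDCT: 2N (windowed) time samples -> N coefficients."""
    n_half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs: list[float]) -> list[float]:
    """Inverse MDCT: N coefficients -> 2N aliased samples (2/N scaling for windowed TDAC)."""
    n_half = len(coeffs)
    return [
        2.0 / n_half * sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                           for k in range(n_half))
        for n in range(2 * n_half)
    ]


def analyze_synthesize(signal: list[float], n_half: int) -> list[float]:
    """Hop N samples, window 2N, MDCT, IMDCT, window again, overlap-add."""
    win = sine_window(n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * (2 * n_half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        frame = [padded[start + i] * win[i] for i in range(2 * n_half)]
        coeffs = mdct(frame)  # N new samples in -> N coefficients out
        recon = imdct(coeffs)
        for i in range(2 * n_half):
            out[start + i] += recon[i] * win[i]  # aliasing cancels between neighbours
    return out[n_half:n_half + len(signal)]


random.seed(0)
signal = [random.uniform(-1.0, 1.0) for _ in range(4 * 18)]
rebuilt = analyze_synthesize(signal, 18)  # N = 18, as in an MP3 long block per subband
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # floating-point rounding only
```

Using N = 6 instead mimics the short-block case. Each coefficient then covers a third as many samples, so quantization noise is confined to a shorter time span, at the cost of coarser frequency resolution.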
When performing lossy audio encoding, such as creating an MP3 data stream, there is a trade-off between the amount of data generated and the sound quality of the results. The person generating an MP3 selects a bit rate, which specifies how many kilobits per second of audio are desired. The higher the bit rate, the larger the MP3 data stream will be, and, generally, the closer it will sound to the original recording. With too low a bit rate, compression artifacts (i.e., sounds that were not present in the original recording) may be audible in the reproduction. Some audio is hard to compress because of its randomness and sharp attacks. When this type of audio is compressed, artifacts such as ringing or pre-echo are usually heard. A sample of applause or a triangle instrument at a relatively low bit rate provides a good example of compression artifacts. Most subjective tests of perceptual codecs tend to avoid using these types of sound materials; however, the artifacts generated by percussive sounds are barely perceptible due to the specific temporal masking feature of the 32 sub-band filterbank of Layer II on which the format is based.
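Because the bit rate fixes how many bits each second of audio consumes, a constant-bit-rate stream's size grows linearly with it: roughly bit rate × duration ÷ 8 bytes. The sketch below assumes 1 kbit = 1000 bits and ignores metadata such as ID3 tags. The function name is hypothetical.

```python
def cbr_size_bytes(bitrate_kbps: float, duration_s: float) -> int:
    """Approximate size in bytes of a constant-bit-rate MP3 stream.

    Assumptions: 1 kbit = 1000 bits; metadata tags are not counted.
    """
    return round(bitrate_kbps * 1000 * duration_s / 8)


four_minutes = 4 * 60
for kbps in (96, 128, 192, 320):
    size = cbr_size_bytes(kbps, four_minutes)
    print(f"{kbps:>3} kbit/s -> {size / 1e6:.2f} MB for a 4-minute track")
```

For example, moving from 128 to 320 kbit/s multiplies the size by 2.5, from 3.84 MB to 9.60 MB for four minutes of audio.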
Perceived quality can be influenced by the listening environment (ambient noise), listener attention, listener training, and in most cases the listener's audio equipment (such as sound cards, speakers, and headphones). Furthermore, for lectures and other human speech applications, a lower quality setting may give sufficient quality while reducing encoding time and complexity. A test given to new students by Stanford University music professor Jonathan Berger showed that student preference for MP3-quality music has risen each year. Berger said the students seem to prefer the 'sizzle' sounds that MP3s bring to music.
Sound artist and composer Ryan Maguire's project "The Ghost in the MP3", an in-depth study of MP3 audio quality, isolates the sounds lost during MP3 compression. In 2015, he released the track "moDernisT" (an anagram of "Tom's Diner"), composed exclusively from the sounds deleted during MP3 compression of the song "Tom's Diner", the track originally used in the formulation of the MP3 standard. A detailed account of the techniques used to isolate the sounds deleted during MP3 compression, along with the conceptual motivation for the project, was published in the 2014 Proceedings of the International Computer Music Conference.
Non-standard bit rates up to 640 kbit/s can be achieved with the LAME encoder and the freeformat option, although few MP3 players can play those files. According to the ISO standard, decoders are only required to be able to decode streams up to 320 kbit/s. Early MPEG Layer III encoders used what is now called Constant Bit Rate (CBR): the software could only use a uniform bit rate on all frames in an MP3 file. Later, more sophisticated MP3 encoders were able to use the bit reservoir to target an average bit rate, selecting the encoding rate for each frame based on the complexity of the sound in that portion of the recording.
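The bit reservoir lets a demanding frame borrow bytes that earlier, simpler frames left unused, even when frames have a fixed size. In MPEG-1 Layer III the back-pointer to a frame's main data (main_data_begin) is a 9-bit field, so at most 511 bytes can be borrowed. The sketch below is a simplified, hypothetical accounting model, not any encoder's actual rate control. The per-frame demand values are made up for illustration.

```python
MAX_RESERVOIR_BYTES = 511  # limit implied by the 9-bit main_data_begin field (MPEG-1 Layer III)


def simulate_bit_reservoir(frame_budget: int, demands: list[int]) -> list[int]:
    """Simplified model: each frame may use its fixed budget plus bytes that
    earlier frames left unused (capped); returns the bytes granted per frame."""
    reservoir = 0
    granted = []
    for want in demands:
        available = frame_budget + reservoir
        give = min(want, available)
        granted.append(give)
        # Whatever this frame did not use carries forward, up to the back-pointer limit.
        reservoir = min(available - give, MAX_RESERVOIR_BYTES)
    return granted


# A 128 kbit/s, 44.1 kHz frame holds 417 bytes (unpadded); demands are made up.
print(simulate_bit_reservoir(417, [300, 300, 600, 417]))  # [300, 300, 600, 417]
```

Without the reservoir, the third frame would be capped at 417 bytes. With it, the 234 bytes saved by the two simpler frames cover the extra demand while the stream's overall bit rate stays constant.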