The article presents a novel approach to searching shared audio file storages such as P2P-based systems. The proposed method recognizes specific patterns in the audio content and thereby extends search from the description-based model to the content-based model. The contents of the targeted shared file storages seem to change rather unexpectedly. This volatile nature led our development to use real-time capable methods for the search process. The importance of real-time pattern recognition algorithms applied to audio data for content-sensitive searching in stream media has been growing for over a decade (Liu, Wang, & Chen, 1998). The main problem of many algorithms is the optimal selection of the reference patterns (soundprints in our approach) used in the recognition procedure. The proposed method is based on distance maximization and can quickly choose the pattern that the pattern recognition algorithms will later use as a reference (Richly, Kozma, Kovács & Hosszú, 2001). The presented method, called EMESE (Experimental MEdia-Stream rEcognizer), is an important part of a lightweight content-searching method that is suitable for investigating network-wide shared file storages. This method was initially applied to real-time monitoring of the occurrence of known sound materials in broadcast audio. The experimental measurement data shown in the article demonstrate the efficiency of the procedure, which was the reason for using it in a shared audio database environment.
The Problem Of The Pattern Recognition
In the field of sound recognition there are many different methods and applications for specific tasks (Coen, 1995; Kondoz, 1994). (Figure 1)
The sound representation in the recognition system
Key Terms in this Chapter
Peer-to-Peer (P2P) Model: A communication model in which each node has the same authority and communication capability. The nodes create a virtual network overlaid on the Internet, and its members organize themselves into a topology for data transmission.
Manhattan-Distance: The L1 metric on the points of Euclidean space, defined by summing the absolute coordinate differences of the two points (|x2-x1|+|y2-y1|+…). Also known as city-block or taxicab distance, since it is the distance a car drives in a lattice-like street pattern.
Application Level Network (ALN): The applications running in the hosts can create a virtual network from their logical connections. This virtual network is also called an overlay (see later in the section). The operation of such software entities cannot be understood without knowing their logical relations. In most cases, these ALN software entities use the P2P model (see later in the section) for communication, not the client/server model (see later in the section).
Bark-Scale: A non-linear frequency scale modeling the resolution of the human hearing system. A distance of one Bark on the Bark-scale equals the so-called critical bandwidth. The scale is approximately linear in frequency below 500 Hz and approximately logarithmic above that. The critical bandwidth can be measured by the simultaneous frequency masking effect of the ear.
Pattern Recognition: The procedure of finding a certain series of signals in a longer data file or signal stream.
Synchronization: The procedure of finding the appropriate points in two or more streams so that they can be played out correctly in parallel.
Audio Signal Processing: The coding, decoding, playing, and content handling of audio data files and streams.
Client/Server Model: A communication model in which one host has more functionality than the other. It differs from the P2P model (see earlier in the section).
Overlay: The applications that create an ALN (see earlier in the section) work together, and they usually follow the P2P communication model (see earlier in the section).
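The Manhattan-distance and pattern recognition entries above can be illustrated together with a minimal Python sketch. It is not the EMESE procedure. It only assumes that a stream and a reference pattern are plain sequences of numbers, and it slides the pattern over the stream to find the offset with the smallest Manhattan distance. The function names manhattan_distance and best_match are hypothetical and chosen for illustration only.

```python
from typing import Sequence, Tuple


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (the L1 metric)."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(stream: Sequence[float], pattern: Sequence[float]) -> Tuple[int, float]:
    """Illustrative brute-force search (an assumption, not the article's method).

    Returns the offset in `stream` where `pattern` fits with the smallest
    Manhattan distance, together with that distance.
    """
    if len(pattern) == 0 or len(pattern) > len(stream):
        raise ValueError("pattern must be non-empty and no longer than the stream")
    best_offset, best_dist = 0, float("inf")
    for offset in range(len(stream) - len(pattern) + 1):
        window = stream[offset:offset + len(pattern)]
        dist = manhattan_distance(window, pattern)
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset, best_dist


if __name__ == "__main__":
    print(manhattan_distance((1, 2), (4, 6)))
    print(best_match([0, 1, 5, 2, 9, 9, 1], [5, 2, 9]))
```

A distance of zero means the pattern occurs exactly in the stream. A practical recognizer would more likely accept any match whose distance falls below a chosen threshold.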