Implementation of a diffusive differential reassignment method for signal enhancement: An application to wolf population counting

https://doi.org/10.1016/j.amc.2007.03.086Get rights and content

Abstract

We investigate the combination of two PDE-based image processing techniques applied to the image produced by time-frequency representations of one-dimensional signals, such as the spectrogram. Specifically, we consider the energy transport equation associated to the lagrangian coordinates corresponding to the spectrogram differential reassignment proposed by Chassandre-Mottin, Daubechies, Auger and Flandrin as a spectrogram readability improving method, together with the image restoration model proposed by Álvarez, Lions and Morel for noise reduction and edge enhancement. Our aim is to produce a transformation of the spectrogram in which the instantaneous frequency lines are easier to track, for using it as an input for a (wolves howls) counting algorithm.

After presenting the model derivation, we show some analytical properties of it, such as the existence of a unique solution and a comparison principle, and perform later a discretization to numerically investigate its performance for the cases of synthetic signals and field recorded wolves choruses. We finally compare our results with those obtained from well-established techniques.

Introduction

Wolf is a protected specie in many countries around the world. Due to their predator character and to their proximity to human settlements, wolves often kill cattle interfering in this way in farmers’ economy. To smooth this interference, authorities reimburse the cost of these lost to farmers. Counting the population of wolves inhabiting a region is, therefore, not only a question of biological interest but also of economic interest, since authorities are willing to estimate the budget devoted to costs produced by wolf protection, see for instance [1]. However, estimating the population of wild species is not an easy task. In particular, for mammals, few and not very precise techniques are used, mainly based on the recuperation of field traces, such as steps, excrements and so on. Our investigation is centered in what it seems to be a new technique, based on signal and image theory methods, to estimate the population of species which fulfill two conditions: living in groups, for instance, packs of wolves, and emitting some characteristic sounds, howls and barks, for wolves. The basic initial idea is to produce, from a given recording, some time-frequency distribution which allows to identify the different howls corresponding to different individuals by estimating the instantaneous frequency (IF) lines of their howls.

Unfortunately, the real situation is somehow more involved due mainly to the following two factors. On one hand, since natural sounds, in particular wolf howling, are composed by a fundamental pitch and several harmonics, direct instantaneous frequency estimation of the multi-signal recording leads to an over-counting of individuals since various IF lines correspond to the same individual. Therefore, more sophisticated methods are indicated for the analysis of these signals, methods capable of extracting additional information such as the slope of the IF, which allows to a better identification of the harmonics of a given fundamental tone. Chirplet type transforms [2], [3] or the use of the Fourier fractional transform [4] are possibilities which are under current development [5]. On the other hand, despite the quality of recording devices, field recordings are affected for a variety of undesirable signals which range from low amplitude broad spectrum long duration signals, like wind, to signals localized in time, like cattle bells, or localized in spectrum, like car engines. Clearly, the addition of all these signals generates an unstructured noise in the background of the wolves chorus which impedes the above mentioned methods to work properly.

In a previous work [6], we investigated the noise reduction and edge (IF lines) enhancement on the spectrogram image by a PDE-based image processing algorithm. Considering the spectrogram as the initial image, the algorithm induces isotropic diffusion (noise smoothing) in regions with low gradient values, and anisotropic diffusion (edge-IF enhancement) in regions with high gradient values. We showed several numerical demonstrations applied to both synthetic and field recorded signals confirming a good performing of the algorithm. Moreover, in [7] we performed comparisons with several denoising techniques such as the nonlinear spectral substraction method and the 2D-stationary wavelet transform, showing that although the PDE method is more expensive in terms of time execution it is also more stable under threshold values perturbations in the denoised images. However, we found two main difficulties which motivated the present work. In one hand, the broadening effect of the spectrogram when compared to the WV distribution is still present if not slightly increased by this diffusive transformation. On the other hand, there is not a natural stopping time for the evolution of the algorithm, being this to be found by a trial and error mechanism, preventing therefore from getting a fully automatic algorithm.

In this article we address these difficulties partly by using the ideas introduced in [8] to improve the readability of the spectrogram through the so-called reassignment techniques, later developed in [9], [10], [11]. Let xL2(R) denote an audio signal and consider the short time Fourier transform (STFT)Gφ(x;t,ω)=Rx(s)φ(s-t)e-iωsds,corresponding to the real, symmetric and normalized window φL2(R). The energy density function or spectrogram of x corresponding to the window φ is given bySφ(x;t,ω)=|Gφ(x;t,ω)|2,which may be expressed also as [12]Sφ(x;t,ω)=R2WV(φ;t˜,ω˜)WV(x;t-t˜,ω-ω˜)dt˜dω˜,with WV(y; ·, ·) denoting the Wigner–Ville distribution of yL2(R),WV(y;t,ω)=Ryt+s2yt-s2e-iwsds.The Wigner–Ville (WV) distribution has received much attention for IF estimation due to its excellent concentration and many other desirable mathematical properties, see [12]. However, it is well known that it presents high amplitude sign-varying cross-terms for multi-component signals which makes its interpretation difficult, see Fig. 1. Expression (3) represents the spectrogram as the convolution of the WV distribution of the signal, x, with the smoothing kernel defined by the WV distribution of the window, φ, explaining the mechanism of attenuation of the cross-terms interferences in the spectrogram. However, an important drawback of the spectrogram with respect to the WV distribution is the broadening of the IF lines as a direct consequence of the smoothing convolution. To override this inconvenient, it is suggested in [8] that instead of assigning the averaged energy to the geometric center of the smoothing kernel, (t, ω), as it is done for the spectrogram, one assigns it to the center of gravity of these energy contributions, (tˆ,ωˆ), which is certainly more representative of the local energy distribution of the signal, see Fig. 1. As deduced in [9], the gravity center may be computed by the following formulastˆ(x;t,ω)=t-RGTφ(x;t,ω)Gφ(x;t,ω),ωˆ(x;t,ω)=ω+IGDφ(x;t,ω)Gφ(x;t,ω),where the STFT’s windows in the numerators are (t) = (t) and (t) = φ′(t). The reassigned spectrogram, RSφ(x; t, ω), is then defined as the aggregation of the reassigned energies to their corresponding locations in the time-frequency domainRSφ(x;t,ω)=12πR2Sφ(x;t,ω)δ(t-tˆ(t,ω))δ(ω-ωˆ(t,ω))dtdω.Observe that energy is conserved through the reassignment process. Other desirable properties, among which non-negativity and perfect localization of linear chirps, are proven in [13]. For our application, it is of special interest the fact that the reallocation vector, r(t,ω)=(tˆ(t,ω)-t,ωˆ(t,ω)-ω), may be expressed through a potential related to the spectrogram [11],r(t,ω)=12log(Sφ(x;t,ω)),when φ is a Gaussian window of unit variance. Let τ  0 denote an artificial time and consider the dynamical expression of the reassignment given by Φ(t,ω,τ) = (t, ω) + τr (t, ω) which, for τ = 0 to τ = 1, connects the initial point (t, ω) with its reassigned point (tˆ,ωˆ). Rewriting this expression as1τ(Φ(t,ω,τ)-Φ(0,ω,τ))=r(t,ω),and taking the limit τ  0, we may identify the displacement vector r as the velocity field of the transformation Φ. In close relation with this approach is the process referred to as differential reassignment [11], defined as the transformation given by the (autonomous) dynamical system corresponding to such velocity field,dχdτ(t,ω,τ)=r(χ(t,ω,τ)),χ(t,ω,0)=(t,ω),for τ > 0. Observe that, in a first-order approximation, we still have that χ connects (t, ω) with some point in a neighborhood of (tˆ,ωˆ), sinceχ(t,ω,1)χ(t,ω,0)+r(χ(t,ω,0))=(t,ω)+r(t,ω)=(tˆ,ωˆ).In addition, for τ  ∞, each particle (t, ω) converges to some local extremum of the potential log (Sφ (x; ·, ·)), among them the maxima and ridges of the original spectrogram. The conservative energy reassignation analogous to (4) for the differential reassignment is obtained by solving the following problem for u(t, ω, τ) and τ > 0,uτ+div(ur)=0,u(·,·,0)=u0,where we introduced the notation u0 = Sφ(x; ·, ·) and, consequently, r=12log(u0). Since in applications both signal and spectrogram are defined in bounded domains, we assume (7), (8) to hold in a bounded time-frequency domain, Ω, in which we assume non-energy flow conditions on the solution and the datau·n=0,r·n=0onΩ×R+,being n the unitary outwards normal to ∂Ω. Finally, observe that the positivity of the spectrogram [12] and the fact that it is obtained from a convolution with a C kernel implies the regularity u0, r  C and, therefore, problem (7), (8), (9) admits a unique smooth solution.

As noted in [11], differential reassignment can be viewed as a PDE-based processing of the spectrogram image in which the energy tends to concentrate on the initial image ridges (IF lines). As mentioned above, our aim is not only to concentrate the diffused IF lines of the spectrogram but also to attenuate the noise present in our recordings. It is clear that noise may distort the reassigned spectrogram due to the change of the energy distribution and therefore of the gravity centers of each time-frequency window. Although even a worse situation may happen to the differential reassignment, due to its convergence to spectrogram local extrema (noise picks among them) an intuitive way to correct this effect comes from its image processing interpretation. As shown in [6], when a strong noise is added to a clean signal better results are obtained for approximating the clean spectrogram if we use a noise reduction edge enhancement PDE-based algorithm than if we simply threshold the image spectrogram. This is due to the local application of gaussian filters in regions of small gradients (noise, among them) while anisotropic diffusion (in the orthogonal direction to the gradient) is applied in regions of large gradients (edges-IF lines). Therefore, a possible way to improve the image obtained by the differential reassigned spectrogram is to modify (7) by adding a diffusive term with the mentioned properties.

Let us make a final observation before writing the model we shall work with. In the derivation of both the reassigned and the differential reassigned spectrogram the property of energy conservation is imposed, implying that energy values on ridges increase. Indeed, let B be a neighborhood of a point of maximum for u0, in which divr = Δlog u0 < 0, and let (t0, ω0)  B. Let χ0(t, ω, τ) denote the characteristic defined by (6) starting at (t0, ω0). Evaluating Eq. (7) along χ0 we obtainddτu=uτ+r·u=-udivr,implying that u experiments exponential increase in B. For image processing, it is desirable the maximum principle to hold, i.e., that the bounds minu0  u  maxu0 hold for any (t,ω,τ)Ω×R+, ensuring that the processed image is within the range of image definition ([0, 255], usually). A simple way, which we shall address, to ensure this property is by dropping the right hand side term of Eq. (10), i.e., replacing Eq. (7) by the transport equationuτ+r·u=0.However, no energy conservation law will apply anymore (note that u is constant along the characteristics).

The combination of the differential reassignment problem (8), (9), (11) with the edge-detection image-smoothing algorithm [14] is written asuτ+ε2log(u0)·u-g(|Gsu|)A(u)=0inΩ×R+,u·n=0onΩ×R+,u(0,·,·)=u0inΩ,where ε  0, see Remark 1, andA(u)=(1-h(|u|))Δu+h(|u|)j=1,,nfju|u|2uxj2.Let us briefly remind the properties and meaning of the diffusive term components in Eq. (12):

  • Function Gs is a Gaussian of variance s. The variance is a scale parameter which fixes the minimal size of the details to be kept in the processed image.

  • Function g is non-increasing with g(0) = 1 and g(∞) = 0. It is a contrast function, which allows to decide whether a detail is sharp enough to be kept.

  • The composition of Gs and g on ∇u rules the speed of diffusion in the evolution of the image, controlling the enhancement of the edges and the noise smoothing.

  • The diffusion operator A combines isotropic and anisotropic diffusion. The first smoothes the image by local averaging while the second enforces the diffusion only on the orthogonal direction to ∇u (along the edges). More precisely, for θj = (j  1)∗π/n, j = 1, …, n we define xj as the orthogonal to the direction θj, i.e., xj = tsin θj + ωcos θj. Then, smooth non-negative functions fj(cos θ, sin θ) are designed to be active only when θ is close to θj. Therefore, the anisotropic diffusion is taken in an approximated direction to the orthogonal of ∇u. The combination of isotropic and anisotropic diffusions is controlled by function h(s), which is non-decreasing withh(s)=0forsϵ,1fors2ϵ,being ϵ the enhancement parameter.

Remark 1

A re-parametrization τ  τ/ε transforms Eq. (11) to the equivalent form ∂u/∂τ + εr · u = 0. Parameter ε > 0 allows us to play with different balances between transport and diffusion effects.

Section snippets

Mathematical properties

For ε = 0, the following theorem is proven in [14].

Theorem 1

Let u0  W1,(Ω) and ε = 0.

  • (i)

    Then, for any T > 0, there exists a unique (viscosity) solution, u   C([0,) × Ω)  L(0, T; W1,(Ω)), of problem (12). Moreover,infΩu0usupΩu0inR+×Ω.

  • (ii)

    Let v be a solution of problem (12), (13), (14) corresponding to the initial data v0  L(Ω). Then, for all T  0, there exists a constant K which depends only on u0W1, and v0L such thatsup0τTu(τ,·,·)-v(τ,·,·)L(Ω)Ku0-v0L(Ω).

A straightforward modification of the proof of

Numerical experiments

In this section, we present numerical demonstrations of the diffusive differential reassignment method applied to spectrograms of synthetic and field signals, and some comparisons with other methods. Computation of the spectrogram is a standard operation performed by applying the discrete fast fourier transform (dfft) to the convolution of the signal with the window, being the latter imposed by relation (5) between the displacement vector and the spectrogram, which only holds for a unit

References (19)

  • A. Skonhoft

    The costs and benefits of animal predation: an analysis of Scandinavian wolf re-colonization

    Ecol. Econ.

    (2006)
  • B. Dugnol et al.

    Wolves counting by spectrogram image processing

    Appl. Math. Comput.

    (2007)
  • S. Mann et al.

    The chirplet transform: physical considerations

    IEEE Trans. Signal Process.

    (1995)
  • L. Angrisani et al.

    A measurement method based on a modified version of the chirplet transform for instantaneous frequency estimation

    IEEE Trans. Instrum. Meas.

    (2002)
  • H.M. Ozaktas et al.

    The Fractional Fourier Transform with Applications in Optics and Signal Processing

    (2001)
  • B. Dugnol, C. Fernández, G. Galiano, J. Velasco, Implementation of a Chirplet Transform Method for Separating and...
  • B. Dugnol, C. Fernández, G. Galiano, J. Velasco, Wolves chorus noise reduction by spectrogram image processing, Signal...
  • K. Kodera et al.

    Analysis of time-varying signals with small BT values

    IEEE Trans. Acoust. Speech Signal Process.

    (1978)
  • F. Auger et al.

    Improving the readability of time-frequency and time-scale representations by the method of reassignment

    IEEE Trans. Signal Process.

    (1995)
There are more references available in the full text version of this article.

Cited by (11)

  • Using acoustic indices to estimate wolf pack size

    2019, Ecological Indicators
    Citation Excerpt :

    This type of vocalization has been studied to explore different topics, such as identifying individual and pack vocal signatures (Harrington, 1989; Palacios et al., 2007; Passilongo et al., 2012, 2010; Root-Gutteridge et al., 2014; Tooze et al., 1990; Zaccaroni et al., 2012), the chorus structure (Harrington and Mech, 1982; Harrington, 1975; Theberge and Falls, 1967), the detection of individuals (Ausband et al., 2011; Bassi et al., 2015; Brennan et al., 2013; Duchamp et al., 2012; Fuller and Sampson, 1988; Llaneza et al., 2005; Suter et al., 2016) or reproduction events (Harrington, 1986; Llaneza et al., 2014; Longis et al., 2004; Nowak et al., 2008, 2007; Palacios et al., 2016; Sèbe et al., 2006), as well as for acoustic localization (Papin et al., 2018). In addition, different methods have been used to estimate wolf chorus size based on howls, including discriminating the fundamental and harmonic frequencies of howling wolves (Filibeck et al., 1982; Sèbe et al., 2004), image processing techniques based on spectrograms (Dugnol et al., 2008, 2007a, 2007b), and visual inspections of spectrograms obtained from chorus recordings (Passilongo et al., 2015). These previously developed approaches are useful for estimating wolf chorus size based on howls but most of them are time consuming or subjective, and they include potential sources of errors.

  • Rearranged nonlocal filters for signal denoising

    2015, Mathematics and Computers in Simulation
    Citation Excerpt :

    We also provide an example in which the spectrogram energy content of an ECG signal is filtered to identify an arrhythmia episode. In [6,7,9], we used nonlinear diffusion image denoising techniques applied to the spectrogram of a sound signal, a wolf chorus. Although, as mentioned above, execution time is not a relevant issue for this type of problems, we found that nonlinear diffusion algorithms require a high computational time, making their use not operative in many situations.

  • On a nonlocal spectrogram for denoising one-dimensional signals

    2014, Applied Mathematics and Computation
    Citation Excerpt :

    In this article, we also provide an example in which the filtering of the spectrogram energy content of an ECG signal is used to identify an arrhythmia episode. In [7,8,10], we considered nonlinear diffusion image denoising techniques applied to the spectrogram of a sound signal, a wolf chorus. Although, as above mentioned, execution time is not the main issue for this type of problems, we found that nonlinear diffusion algorithms require a high computational time, making their use not operative in many situations.

  • Suitability of Advanced Non-Parameterized Time-Frequency Techniques for Condition Monitoring Applications

    2023, Proceedings of 2023 IEEE 3rd Applied Signal Processing Conference, ASPCON 2023
View all citing articles on Scopus

The first three authors are supported by Project PC0448, Gobierno del Principado de Asturias, Spain. Third and fourth authors are supported by the Spanish DGI Project MTM2004-05417.

View full text