Pseudo-Supervised Training Improves Unsupervised Melody Segmentation
Stefan Lattner, Carlos Eduardo Cancino Chacón, Maarten Grachten
An important aspect of music perception in humans is the ability to segment streams of musical events into structural units such as motifs and phrases. A promising approach to the computational modeling of music segmentation employs the statistical and information-theoretic properties of musical data, based on the hypothesis that these properties can (at least partly) account for music segmentation in humans. Prior work has shown that, in particular, the information content of music events, as estimated from a generative probabilistic model of those events, is a good indicator of segment boundaries. In this paper, we demonstrate that, remarkably, a substantial increase in segmentation accuracy can be obtained by using information content estimates not directly, but in a bootstrapping fashion. More specifically, we use information content estimates computed from a generative model of the data as a target for a feed-forward neural network that is trained to estimate the information content directly from the data. We hypothesize that the improved segmentation accuracy of this bootstrapping approach may be evidence that the generative model provides noisy estimates of the information content, which are smoothed by the feed-forward neural network, yielding more accurate information content estimates.
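The bootstrapping idea can be illustrated with a minimal sketch, under assumptions that go beyond the abstract. The information content of an event is taken in its standard form, IC(e_t) = -log2 p(e_t | context). The abstract does not name the generative model, so a smoothed bigram model stands in for it here; the feature encoding, the network size, the training settings, and the peak-picking rule are likewise hypothetical choices made only for illustration, and all function names are invented for this example.

```python
import numpy as np

def bigram_information_content(seq, n_symbols, alpha=1.0):
    """IC of each event given its predecessor, in bits (stand-in generative model)."""
    counts = np.full((n_symbols, n_symbols), alpha)  # add-alpha smoothing
    for prev, cur in zip(seq[:-1], seq[1:]):
        counts[prev, cur] += 1
    probs = counts / counts.sum(axis=1, keepdims=True)
    # ic[i] is the information content of event i + 1
    return np.array([-np.log2(probs[p, c]) for p, c in zip(seq[:-1], seq[1:])])

def pair_features(seq, n_symbols):
    """One-hot encoding of (previous event, current event), one row per IC value."""
    eye = np.eye(n_symbols)
    seq = np.asarray(seq)
    return np.hstack([eye[seq[:-1]], eye[seq[1:]]])

def train_regressor(X, y, hidden=16, lr=0.05, epochs=500, seed=0):
    """Feed-forward net (one tanh hidden layer) fit to the IC targets by gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - y                   # residual of the current estimate
        dH = np.outer(err, W2) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda X_new: np.tanh(X_new @ W1 + b1) @ W2 + b2

def pick_boundaries(ic, k=1.0):
    """Hypothetical rule: local maxima of IC above mean + k * std mark boundaries."""
    thresh = ic.mean() + k * ic.std()
    return [i for i in range(1, len(ic) - 1)
            if ic[i] > thresh and ic[i] >= ic[i - 1] and ic[i] >= ic[i + 1]]

# Toy melody as integer-coded events (assumed encoding).
melody = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 4, 0, 1, 2, 3]
targets = bigram_information_content(melody, n_symbols=5)    # pseudo-labels
predict = train_regressor(pair_features(melody, 5), targets)  # pseudo-supervised step
boundaries = pick_boundaries(predict(pair_features(melody, 5)))
```

In this sketch the network sees the same context as the stand-in model, so it can do little more than reproduce its targets; the point is only to show the data flow, in which the generative model's estimates serve as regression targets and segmentation is then read off the network's output rather than the model's.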