Towards a Perceptual Distance Metric for Audio