Snips speaks too fast!


#1

My setup is HASS.io on a Raspberry Pi 3 Model B with the Snips and MQTT add-ons. According to the Snips add-on log (see below), Snips does respond to my voice command “turn on the main light”, although it recognizes it as “turn on the menu” with no slot filled. So it asks “Which light mode do you need?” to get the missing slot.

The problem is that the sentence is played back several times faster than its original sampling rate (16 kHz, as seen in the log) calls for. It is simply too fast to be understood. I tried ALSA’s speaker-test utility to check and found that a 48 kHz WAV sound plays perfectly. Does anyone have an idea what might cause this issue?

[01:31:27.406930] INFO :snips_hotword_hermes: Hotword detected
[01:31:27.437724] INFO :snips_dialogue::dialogue: State: Idle, incoming Message: Hotword(Detected)
[01:31:27.437930] INFO :snips_dialogue::services: publish Hotword(Wait)
[01:31:27.438021] INFO :snips_dialogue::services: publish Asr(ToggleOn)
[01:31:27.438107] INFO :snips_dialogue::services: publish AudioServer(PlayFile)
[01:31:27.438183] INFO :snips_dialogue::dialogue: Current State: WaitingQuery
[01:31:27.488473] INFO :snips_asr_hermes : Listening
[01:31:27.488602] INFO :audio_server_hermes : Playing “/usr/share/snips/dialogue/sound/start_of_input.wav” using output “default”, wav spec : WavSpec { channels: 2, sample_rate: 22050, bits_per_sample: 16, sample_format: Int }
[01:31:32.727678] INFO :snips_asr_lib::asr: Endpoint detection.
[01:31:33.050403] INFO :snips_asr_hermes : Cleanup
[01:31:33.050534] INFO :snips_asr_hermes : Idle
[01:31:33.085769] INFO :snips_dialogue::dialogue: State: WaitingQuery, incoming Message: Asr(TextCaptured)
[01:31:33.085897] INFO :snips_dialogue::services: publish Asr(ToggleOff)
[01:31:33.085951] INFO :snips_dialogue::services: publish AudioServer(PlayFile)
[01:31:33.088050] INFO :snips_dialogue::services: publish Nlu(Query)
[01:31:33.088218] INFO :snips_dialogue::dialogue: Current State: WaitingIntent
[01:31:33.176483] INFO :audio_server_hermes : Playing “/usr/share/snips/dialogue/sound/end_of_input.wav” using output “default”, wav spec : WavSpec { channels: 2, sample_rate: 22050, bits_per_sample: 16, sample_format: Int }
[01:31:33.177359] INFO :snips_analytics_hermes: Cleanup
[01:31:33.177651] INFO :snips_analytics_hermes: Idle
[01:31:33.195957] INFO :queries_hermes : Cleanup
[01:31:33.196096] INFO :queries_hermes : Idle
[01:31:33.226003] INFO :snips_dialogue::dialogue: State: WaitingIntent, incoming Message: Nlu(IntentParsed)
[01:31:33.226245] INFO :snips_dialogue::services: publish Tts(Say)
[01:31:33.226329] INFO :snips_dialogue::dialogue: Current State: WaitingEndSpeaking(WaitingAnswer(“Which light mode do you need?”, PartialIntent(IntentMessage { input: “turn on the menu”, intent: IntentClassifierResult { intent_name: “user_3nvyne7w2__ActivateLightMode”, probability: 0.745324 }, slots: Some([]) }), “LightingMode”))
[01:31:33.362275] INFO :snips_analytics_hermes: Cleanup
[01:31:33.362442] INFO :snips_analytics_hermes: Idle
[01:31:33.486921] INFO :audio_server_hermes : Playing “tts-5” using output “default”, wav spec : WavSpec { channels: 1, sample_rate: 16000, bits_per_sample: 16, sample_format: Int }
[01:31:35.377096] INFO :snips_dialogue::dialogue: State: WaitingEndSpeaking(WaitingAnswer(“Which light mode do you need?”, PartialIntent(IntentMessage { input: “turn on the menu”, intent: IntentClassifierResult { intent_name: “user_3nvyne7w2__ActivateLightMode”, probability: 0.745324 }, slots: Some([]) }), “LightingMode”)), incoming Message: Tts(SayFinished)
[01:31:35.377364] INFO :snips_dialogue::services: publish Hotword(Wait)
[01:31:35.377511] INFO :snips_dialogue::services: publish Asr(ToggleOn)
[01:31:35.377591] INFO :snips_dialogue::services: publish AudioServer(PlayFile)
[01:31:35.377663] INFO :snips_dialogue::dialogue: Current State: WaitingAnswer(“Which light mode do you need?”, PartialIntent(IntentMessage { input: “turn on the menu”, intent: IntentClassifierResult { intent_name: “user_3nvyne7w2__ActivateLightMode”, probability: 0.745324 }, slots: Some([]) }), “LightingMode”)
[01:31:35.427168] INFO :snips_asr_hermes : Listening
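
One way to reason about the symptom: if samples reach the sound card without rate conversion, playback speeds up by the ratio of the device rate to the file rate. The sketch below is only an illustration under the assumption (not confirmed by the log) that the device runs at a fixed 48000 Hz. It ignores channel count, and `speedup` is a hypothetical helper name.

```python
def speedup(file_rate_hz: int, device_rate_hz: int) -> float:
    """Speed factor when samples are sent unconverted to a device clocked at device_rate_hz."""
    return device_rate_hz / file_rate_hz


# File rates taken from the log: TTS audio is 16000 Hz, notification sounds are 22050 Hz.
# 48000 Hz is an assumed device rate, not something the log states.
for file_rate in (16000, 22050):
    print(file_rate, "Hz ->", round(speedup(file_rate, 48000), 2), "x")
```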


#2

I have extracted the TTS WAV file from the MQTT messages. It is a WAV file with a 16000 Hz sampling rate. I can play it normally with any player on a PC, but it still fails in Snips. I then converted the TTS WAV file to 22050 Hz, which is the same sampling rate as start_of_input.wav (the wake-up sound of Snips). The converted file plays well with “aplay” inside the Snips container! Therefore, I think the cause is the ALSA audio setting in Hass.io. I am still trying to figure out how to correctly configure ALSA with a USB speakerphone (Jabra 510)…
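
For anyone who wants to check an extracted payload the same way, here is a minimal sketch using Python's standard `wave` module to read the header fields that the Snips log prints as WavSpec. The helper names `wav_spec` and `make_silence` are made up for this example, and the silent clip only stands in for a real TTS payload.

```python
import io
import wave


def wav_spec(data: bytes) -> dict:
    """Return channels, sample rate and bit depth read from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "bits_per_sample": w.getsampwidth() * 8,
        }


def make_silence(rate: int, channels: int = 1, seconds: float = 0.1) -> bytes:
    """Build a short silent 16-bit WAV in memory (stand-in for a real payload)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample = 16 bit
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    return buf.getvalue()


print(wav_spec(make_silence(16000)))
```

With a real file, pass the bytes read from disk (or from the MQTT message) to `wav_spec` instead of the generated clip.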


#3

I had the same issue.

Here is the fix from the developers:

pcm.!default {
  type asym
  playback.pcm {
    type plug
    slave {
      pcm "hw:1,0"
      rate 48000
      format "S16_LE"
      channels 2
    }
    rate_converter "samplerate_medium"
  }
  capture.pcm {
    type plug
    slave.pcm "hw:1,0"
  }
}

"I’ve done some tests with a Jabra 510 and I do indeed have the same problems you have. Here is an ALSA conf that should fix them. Can you try it? (The rate, format and channels should match the ones shown by cat /proc/asound/card1/stream0.)"

Also try joining https://snipslabs.slack.com where you will get more help.
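
To compare the conf against what the card reports, something like the sketch below can pull the playback parameters out of the stream0 text. It assumes the usual "Format:", "Channels:" and "Rates:" lines under a "Playback:" heading. The SAMPLE text is illustrative only, not output captured from a Jabra, and `playback_params` is a hypothetical helper.

```python
import re


def playback_params(stream_text: str) -> dict:
    """Collect the first Format/Channels/Rates values found in the Playback section."""
    params = {}
    in_playback = False
    for line in stream_text.splitlines():
        stripped = line.strip()
        if stripped == "Playback:":
            in_playback = True
        elif stripped == "Capture:":
            in_playback = False
        elif in_playback:
            m = re.match(r"(Format|Channels|Rates):\s*(.+)", stripped)
            if m:
                # Keep only the first alternate setting seen for each key.
                params.setdefault(m.group(1).lower(), m.group(2))
    return params


# Illustrative text only; on a real system read /proc/asound/card1/stream0 instead.
SAMPLE = """\
Playback:
  Status: Stop
  Interface 1
    Altsetting 1
    Format: S16_LE
    Channels: 2
    Rates: 48000
Capture:
  Status: Stop
  Interface 2
    Altsetting 1
    Format: S16_LE
    Channels: 1
    Rates: 16000
"""

print(playback_params(SAMPLE))
```

On the device itself, the input would be `open("/proc/asound/card1/stream0").read()`, and the three values should line up with rate, format and channels in the slave block.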


#4

Following the example, aplay cannot play any WAV file because it cannot find /usr/lib/arm-linux-gnueabihf/alsa-lib/libasound_module_rate_▒٩.so. I don’t know why the file name contains strange characters. I do find libasound_module_rate_medium.so. I guess there may be some problem with the base image that comes with snips_addon. I removed the rate_converter line and tried again. This time I can use aplay to play WAV files at 16000 Hz and 22050 Hz correctly!

However, with the new .asoundrc, Snips TTS still has the same issue (it plays the 16000 Hz WAV file too fast). Maybe it uses some other way to play WAV files… still trying to find solutions…


#5

I am using the Debian packages and not the Snips Docker image. In any case, this works for me. I had to comment out the rate converter, but with this config TTS, ALSA and direct WAVs all play properly.

pcm.!default {
  type asym
  playback.pcm {
    type plug
    slave {
      pcm "hw:1,0"
      rate 48000
      format "S16_LE"
      channels 2
    }
#    rate_converter "samplerate_medium"
  }
  capture.pcm {
    type plug
    slave.pcm "hw:1,0"
  }
}

ctl.!default {
  type hw
  card 1
}

#6

(editing post)

The .asoundrc / asound.conf in the link below solved my Jabra 410 turbo talk issue.