Amazon Alexa: store user's words

Kuldeep Ghate · May 16, 2016 · Viewed 12.7k times

I'm new to writing Alexa skills and want to write a skill to store the speaker's words.
For example, if I say, 'Alexa, save {whatever I say}', it should save the words in some string.
From what I understand, the intent schema should look something like

{
  "intents": [
    {
      "intent": "SaveIntent"
    }
  ]
}

and utterances like

SaveIntent save
SaveIntent store

In this case, how do I store '{whatever I say}'?

Answer

Sam Hanley · May 17, 2016

To capture free-form speech input (rather than a defined list of possible values), you'll need to use the AMAZON.LITERAL slot type. The Amazon documentation for the Literal slot type describes a use case similar to yours, where a skill takes any phrase and posts it to a social media site. This is done by creating a StatusUpdate intent:

{
  "intents": [
    {
      "intent": "StatusUpdate",
      "slots": [
        {
          "name": "UpdateText",
          "type": "AMAZON.LITERAL"
        }
      ]
    }
  ]
}

Since it uses the AMAZON.LITERAL slot type, this intent will be able to capture any arbitrary phrase. However, to ensure that the speech engine will do a decent job of capturing real-world phrases, you need to provide a variety of example utterances that resemble the sorts of things you expect the user to say.

Since you're trying to capture very dynamic phrases in your scenario, there are a couple of points in the documentation that deserve extra consideration:

If you are using the AMAZON.LITERAL type to collect free-form text with wide variations in the number of words that might be in the slot, note the following:

  • Covering this full range (minimum, maximum, and all in between) will require a very large set of samples. Try to provide several hundred samples or more to address all the variations in slot value words as noted above.
  • Keep the phrases within slots short enough that users can say the entire phrase without needing to pause.

Lengthy spoken input can lead to lower accuracy experiences, so avoid designing a spoken language interface that requires more than a few words for a slot value. A phrase that a user cannot speak without pausing is too long for a slot value.

That said, here are the example sample utterances from the documentation:

StatusUpdate post the update {arrived|UpdateText}

StatusUpdate post the update {dinner time|UpdateText}

StatusUpdate post the update {out at lunch|UpdateText}

...(more samples showing phrases with 4-10 words)

StatusUpdate post the update {going to stop by the grocery store this evening|UpdateText}
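Since the documentation asks for several hundred samples, it may be easier to generate the sample utterance lines from a list of phrases than to write them by hand. The following is only a sketch: the helper name build_sample_utterances and its parameters are hypothetical, and it assumes nothing beyond the `{phrase|SlotName}` line format shown above.

```python
def build_sample_utterances(intent, carrier, slot, phrases):
    """Return one sample-utterance line per phrase, shortest phrase first.

    Sorting by word count makes it easy to see whether the samples cover
    the full range of lengths (minimum, maximum, and all in between).
    """
    ordered = sorted(phrases, key=lambda p: len(p.split()))
    # Doubled braces in the f-string emit literal "{" and "}".
    return [f"{intent} {carrier} {{{p}|{slot}}}" for p in ordered]


# Illustrative phrases taken from the documentation samples above.
phrases = [
    "going to stop by the grocery store this evening",
    "arrived",
    "out at lunch",
    "dinner time",
]
lines = build_sample_utterances(
    "StatusUpdate", "post the update", "UpdateText", phrases
)
print("\n".join(lines))
```

In practice the phrase list would come from a much larger file of realistic inputs, and the printed lines would be pasted into the skill's sample utterances.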

If you provide enough examples of different lengths to give an accurate picture of the range of expected user utterances, your intent will be able to capture dynamic phrases accurately in real use cases, and you can read the captured text from the UpdateText slot. Based on this, you should be able to implement an intent specific to your needs.
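To answer the "how do I store it" part more concretely, here is a rough sketch of reading the slot value on the skill side. It assumes the skill is backed by an AWS Lambda function written in Python and that the incoming request follows the usual Alexa custom skill JSON layout (request, then intent, then slots). The function names get_slot_value and lambda_handler are placeholders, and the actual storage step is left as a comment because it depends on where you want to keep the text.

```python
def get_slot_value(event, slot_name):
    """Return the spoken text captured in the named slot, or None if absent."""
    intent = event.get("request", {}).get("intent", {})
    return intent.get("slots", {}).get(slot_name, {}).get("value")


def lambda_handler(event, context):
    # "UpdateText" matches the slot name declared in the intent schema.
    text = get_slot_value(event, "UpdateText")
    speech = f"Saved: {text}" if text else "Sorry, I didn't catch that."
    # A real skill would persist `text` here (database, file, etc.).
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }


# Trimmed-down example of the request body sent for an IntentRequest.
sample_event = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "StatusUpdate",
            "slots": {
                "UpdateText": {"name": "UpdateText", "value": "out at lunch"}
            },
        },
    }
}
result = lambda_handler(sample_event, None)
print(result["response"]["outputSpeech"]["text"])
```

For your own skill you would rename the intent and slot (for example SaveIntent with a slot of your choosing) and keep the same lookup.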