How to implement speech-to-text via Speech framework

Boris picture Boris · May 7, 2017 · Viewed 7k times · Source

I want to do speech recognition in my Objective-C app using the iOS Speech framework.

I found some Swift examples but haven't been able to find anything in Objective-C.

Is it possible to access this framework from Objective-C? If so, how?

Answer

Boris picture Boris · May 7, 2017

After spending enough time looking for Objective-C samples -even in the Apple documentation- I couldn't find anything decent, so I figured it out myself.

Header file (.h)

/*!
 * Import the Speech framework, assign the Delegate and declare variables
 */

#import <Speech/Speech.h>

@interface ViewController : UIViewController <SFSpeechRecognizerDelegate> {
    SFSpeechRecognizer *speechRecognizer;
    SFSpeechAudioBufferRecognitionRequest *recognitionRequest;
    SFSpeechRecognitionTask *recognitionTask;
    AVAudioEngine *audioEngine;
}

Methods file (.m)

- (void)viewDidLoad {
    [super viewDidLoad];

    // Initialize the Speech Recognizer with the locale, couldn't find a list of locales
    // but I assume it's standard UTF-8 https://wiki.archlinux.org/index.php/locale
    speechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:[[NSLocale alloc] initWithLocaleIdentifier:@"en_US"]];

    // Set speech recognizer delegate
    speechRecognizer.delegate = self;

    // Request the authorization to make sure the user is asked for permission so you can
    // get an authorized response, also remember to change the .plist file, check the repo's
    // readme file or this project's info.plist
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        switch (status) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                NSLog(@"Authorized");
                break;
            case SFSpeechRecognizerAuthorizationStatusDenied:
                NSLog(@"Denied");
                break;
            case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                NSLog(@"Not Determined");
                break;
            case SFSpeechRecognizerAuthorizationStatusRestricted:
                NSLog(@"Restricted");
                break;
            default:
                break;
        }
    }];

}

/*!
 * @brief Starts listening and recognizing user input through the 
 * phone's microphone
 */

- (void)startListening {

    // Initialize the AVAudioEngine
    audioEngine = [[AVAudioEngine alloc] init];

    // Make sure there's not a recognition task already running
    if (recognitionTask) {
        [recognitionTask cancel];
        recognitionTask = nil;
    }

    // Starts an AVAudio Session
    NSError *error;
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&error];
    [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error];

    // Starts a recognition process, in the block it logs the input or stops the audio
    // process if there's an error.
    recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    AVAudioInputNode *inputNode = audioEngine.inputNode;
    recognitionRequest.shouldReportPartialResults = YES;
    recognitionTask = [speechRecognizer recognitionTaskWithRequest:recognitionRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        BOOL isFinal = NO;
        if (result) {
            // Whatever you say in the microphone after pressing the button should be being logged
            // in the console.
            NSLog(@"RESULT:%@",result.bestTranscription.formattedString);
            isFinal = !result.isFinal;
        }
        if (error) {
            [audioEngine stop];
            [inputNode removeTapOnBus:0];
            recognitionRequest = nil;
            recognitionTask = nil;
        }
    }];

    // Sets the recording format
    AVAudioFormat *recordingFormat = [inputNode outputFormatForBus:0];
    [inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        [recognitionRequest appendAudioPCMBuffer:buffer];
    }];

    // Starts the audio engine, i.e. it starts listening.
    [audioEngine prepare];
    [audioEngine startAndReturnError:&error];
    NSLog(@"Say Something, I'm listening"); 
}

- (IBAction)microPhoneTapped:(id)sender {
    if (audioEngine.isRunning) {
        [audioEngine stop];
        [recognitionRequest endAudio];
    } else {
        [self startListening];
    }
}

Now, add the delegate the SFSpeechRecognizerDelegate to check if the speech recognizer is available.

#pragma mark - SFSpeechRecognizerDelegate Delegate Methods

- (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer availabilityDidChange:(BOOL)available {
    NSLog(@"Availability:%d",available);
}

Instructions & Notes

Remember to modify the .plist file to get user's authorization for Speech Recognition and using the microphone, of course the <String> value must be customized to your needs, you can do this by creating and modifying the values in the Property List or right-click on the .plist file and Open As -> Source Code and paste the following lines before the </dict> tag.

<key>NSMicrophoneUsageDescription</key>  <string>This app uses your microphone to record what you say, so watch what you say!</string>

<key>NSSpeechRecognitionUsageDescription</key>  <string>This app uses Speech recognition to transform your spoken words into text and then analyze the, so watch what you say!.</string>

Also remember that in order to be able to import the Speech framework into the project you need to have iOS 10.0+.

To get this running and test it you just need a very basic UI, just create an UIButton and assign the microPhoneTapped action to it, when pressed the app should start listening and logging everything that it hears through the microphone to the console (in the sample code NSLog is the only thing receiving the text). It should stop the recording when pressed again.

I created a Github repo with a sample project, enjoy!