How to generate audio wave form programmatically while recording Voice in iOS?

iVenky picture iVenky · Jun 4, 2013 · Viewed 22.3k times · Source

enter image description here

How to generate audio wave form programmatically while recording Voice in iOS?

m working on voice modulation audio frequency in iOS... everything is working fine ...just need some best simple way to generate audio wave form on detection noise...

Please dont refer me the code tutorials of...speakhere and auriotouch... i need some best suggestions from native app developers.

I have recorded the audio and i made it play after recording . I have created waveform and attached screenshot . But it has to been drawn in the view as audio recording in progress

-(UIImage *) audioImageGraph:(SInt16 *) samples
                normalizeMax:(SInt16) normalizeMax
                 sampleCount:(NSInteger) sampleCount
                channelCount:(NSInteger) channelCount
                 imageHeight:(float) imageHeight {

    CGSize imageSize = CGSizeMake(sampleCount, imageHeight);
    UIGraphicsBeginImageContext(imageSize);
    CGContextRef context = UIGraphicsGetCurrentContext();

    CGContextSetFillColorWithColor(context, [UIColor blackColor].CGColor);
    CGContextSetAlpha(context,1.0);
    CGRect rect;
    rect.size = imageSize;
    rect.origin.x = 0;
    rect.origin.y = 0;

    CGColorRef leftcolor = [[UIColor whiteColor] CGColor];
    CGColorRef rightcolor = [[UIColor redColor] CGColor];

    CGContextFillRect(context, rect);

    CGContextSetLineWidth(context, 1.0);

    float halfGraphHeight = (imageHeight / 2) / (float) channelCount ;
    float centerLeft = halfGraphHeight;
    float centerRight = (halfGraphHeight*3) ;
    float sampleAdjustmentFactor = (imageHeight/ (float) channelCount) / (float) normalizeMax;

    for (NSInteger intSample = 0 ; intSample < sampleCount ; intSample ++ ) {
        SInt16 left = *samples++;
        float pixels = (float) left;
        pixels *= sampleAdjustmentFactor;
        CGContextMoveToPoint(context, intSample, centerLeft-pixels);
        CGContextAddLineToPoint(context, intSample, centerLeft+pixels);
        CGContextSetStrokeColorWithColor(context, leftcolor);
        CGContextStrokePath(context);

        if (channelCount==2) {
            SInt16 right = *samples++;
            float pixels = (float) right;
            pixels *= sampleAdjustmentFactor;
            CGContextMoveToPoint(context, intSample, centerRight - pixels);
            CGContextAddLineToPoint(context, intSample, centerRight + pixels);
            CGContextSetStrokeColorWithColor(context, rightcolor);
            CGContextStrokePath(context);
        }
    }

    // Create new image
    UIImage *newImage = UIGraphicsGetImageFromCurrentImageContext();

    // Tidy up
    UIGraphicsEndImageContext();

    return newImage;
}

Next a method that takes a AVURLAsset, and returns PNG Data

- (NSData *) renderPNGAudioPictogramForAssett:(AVURLAsset *)songAsset {

    NSError * error = nil;


    AVAssetReader * reader = [[AVAssetReader alloc] initWithAsset:songAsset error:&error];

    AVAssetTrack * songTrack = [songAsset.tracks objectAtIndex:0];

    NSDictionary* outputSettingsDict = [[NSDictionary alloc] initWithObjectsAndKeys:

                                        [NSNumber numberWithInt:kAudioFormatLinearPCM],AVFormatIDKey,
                                        //     [NSNumber numberWithInt:44100.0],AVSampleRateKey, /*Not Supported*/
                                        //     [NSNumber numberWithInt: 2],AVNumberOfChannelsKey,    /*Not Supported*/

                                        [NSNumber numberWithInt:16],AVLinearPCMBitDepthKey,
                                        [NSNumber numberWithBool:NO],AVLinearPCMIsBigEndianKey,
                                        [NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
                                        [NSNumber numberWithBool:NO],AVLinearPCMIsNonInterleaved,

                                        nil];


    AVAssetReaderTrackOutput* output = [[AVAssetReaderTrackOutput alloc] initWithTrack:songTrack outputSettings:outputSettingsDict];

    [reader addOutput:output];
    [output release];

    UInt32 sampleRate,channelCount;

    NSArray* formatDesc = songTrack.formatDescriptions;
    for(unsigned int i = 0; i < [formatDesc count]; ++i) {
        CMAudioFormatDescriptionRef item = (CMAudioFormatDescriptionRef)[formatDesc objectAtIndex:i];
        const AudioStreamBasicDescription* fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription (item);
        if(fmtDesc ) {

            sampleRate = fmtDesc->mSampleRate;
            channelCount = fmtDesc->mChannelsPerFrame;

            //    NSLog(@"channels:%u, bytes/packet: %u, sampleRate %f",fmtDesc->mChannelsPerFrame, fmtDesc->mBytesPerPacket,fmtDesc->mSampleRate);
        }
    }


    UInt32 bytesPerSample = 2 * channelCount;
    SInt16 normalizeMax = 0;

    NSMutableData * fullSongData = [[NSMutableData alloc] init];
    [reader startReading];


    UInt64 totalBytes = 0;


    SInt64 totalLeft = 0;
    SInt64 totalRight = 0;
    NSInteger sampleTally = 0;

    NSInteger samplesPerPixel = sampleRate / 50;


    while (reader.status == AVAssetReaderStatusReading){

        AVAssetReaderTrackOutput * trackOutput = (AVAssetReaderTrackOutput *)[reader.outputs objectAtIndex:0];
        CMSampleBufferRef sampleBufferRef = [trackOutput copyNextSampleBuffer];

        if (sampleBufferRef){
            CMBlockBufferRef blockBufferRef = CMSampleBufferGetDataBuffer(sampleBufferRef);

            size_t length = CMBlockBufferGetDataLength(blockBufferRef);
            totalBytes += length;


            NSAutoreleasePool *wader = [[NSAutoreleasePool alloc] init];

            NSMutableData * data = [NSMutableData dataWithLength:length];
            CMBlockBufferCopyDataBytes(blockBufferRef, 0, length, data.mutableBytes);


            SInt16 * samples = (SInt16 *) data.mutableBytes;
            int sampleCount = length / bytesPerSample;
            for (int i = 0; i < sampleCount ; i ++) {

                SInt16 left = *samples++;

                totalLeft  += left;



                SInt16 right;
                if (channelCount==2) {
                    right = *samples++;

                    totalRight += right;
                }

                sampleTally++;

                if (sampleTally > samplesPerPixel) {

                    left  = totalLeft / sampleTally;

                    SInt16 fix = abs(left);
                    if (fix > normalizeMax) {
                        normalizeMax = fix;
                    }


                    [fullSongData appendBytes:&left length:sizeof(left)];

                    if (channelCount==2) {
                        right = totalRight / sampleTally;


                        SInt16 fix = abs(right);
                        if (fix > normalizeMax) {
                            normalizeMax = fix;
                        }


                        [fullSongData appendBytes:&right length:sizeof(right)];
                    }

                    totalLeft   = 0;
                    totalRight  = 0;
                    sampleTally = 0;

                }
            }



            [wader drain];


            CMSampleBufferInvalidate(sampleBufferRef);

            CFRelease(sampleBufferRef);
        }
    }


    NSData * finalData = nil;

    if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown){
        // Something went wrong. return nil

        return nil;
    }

    if (reader.status == AVAssetReaderStatusCompleted){

        NSLog(@"rendering output graphics using normalizeMax %d",normalizeMax);

        UIImage *test = [self audioImageGraph:(SInt16 *)
                         fullSongData.bytes
                                 normalizeMax:normalizeMax
                                  sampleCount:fullSongData.length / 4
                                 channelCount:2
                                  imageHeight:100];

        finalData = imageToData(test);
    }




    [fullSongData release];
    [reader release];

    return finalData;
}

I have

Answer

hotpaw2 picture hotpaw2 · Jun 4, 2013

If you want real-time graphics derived from mic input, then use the RemoteIO Audio Unit, which is what most native iOS app developers use for low latency audio, and Metal or Open GL for drawing waveforms, which will give you the highest frame rates. You will need completely different code from that provided in your question to do so, as AVAssetRecording, Core Graphic line drawing and png rendering are far far too slow to use.

Update: with iOS 8 and newer, the Metal API may be able to render graphic visualizations with even greater performance than OpenGL.

Uodate 2: Here are some code snippets for recording live audio using Audio Units and drawing bit maps using Metal in Swift 3: https://gist.github.com/hotpaw2/f108a3c785c7287293d7e1e81390c20b