How to find the difference between two arrays in C?

Doe J picture Doe J · Mar 7, 2018 · Viewed 7.1k times · Source

I've been trying to write c programs for finding the union, intersection and difference between two arrays, and while the first two worked out fine, I'm having some trouble finding the difference between two arrays. With difference I mean each element that is in array1, that is not in array2.

I want the third array to contain every element in array1 that is not in array2, and not vica versa. So if array1 is [1, 2, 3], and arr2 is [3, 4, 5], then arr3 is [1, 2]. I am also unsure how to find the difference if the two arrays are of different sizes.

My output is a bunch of zeros and negative numbers:

The difference is: 1

The difference is: 2

The difference is: -14200

The difference is: 0

The difference is: -14340

The difference is: 0

This is the code I've been working with:

#include <stdio.h>

int main()
{
  int arr1[100];
  int arr2[100];
  int size1, size2, i, j, s=0;

  //enter array size
  printf("\nPlease enter array1 size: \n");
  scanf("%d", &size1);
  printf("\nPlease enter array2 size: \n");
  printf("\n--------------------------- \n");
  scanf("%d", &size2);

  //setting up a third array to contain the difference
  int tot_size = size1+size2;
  int arr3[tot_size];


  //enter array elements
  for(i=0;i<size1;++i)
  {
    printf("\nPlease enter array1 element %d:\n", i);
    scanf("%d", &arr1[i]);
  }
  printf("\n--------------------------- \n");
  for(i=0;i<size2;++i)
  {
    printf("\nPlease enter array2 element %d:\n", i);
    scanf("%d", &arr2[i]);
  }

  printf("\n--------------------------- \n");


  //compare the two arrays, if two elements are not equal
  //store them in a third array
  for(i = 0; i < size1; i++)
  {
    for(j = 0; j < size2; j++)
    {
      if(arr1[i] != arr2[j])
      {
        arr3[s] = arr1[i];
        ++i;
        ++j;
        ++s;
      }
    }
  }

  for(i=0;i<s;++i)
    printf("\nThe difference is: %d\n", arr3[i]);

}

Any help would be much appreciated, as I am new to C and still have lots to learn.

Answer

Tom&#39;s picture Tom's · Mar 7, 2018

If the difference between two array is the number in the first not in the second AND the number in the second not in the first, you can simply do the following :

  • create a result array, and copy the first and second array in.

    arr1 = [3, 5, 7, 0]

    arr2 = [1, 10, 5]

    arr3 = [arr1, arr2] ==> [3, 5, 7, 0, 1, 10, 5]

  • Then, sort the array (using qsort, or any other sorting function)

    arr3 = [0, 1, 3, 5, 5, 7, 10]

  • Finally, delete the number appearing more than once (the sorting step make it really easy in only one pass)

    arr3 = [0, 1, 3, 7, 10]


After comment : So, the difference between arr1 and arr2 is number in arr1 not in arr2 ? Your first code make more sense.

You should make some function in order to make it easy for you.

  • Make an "IsNumberInArray" function

    bool IsNumberInArray(int number, int *array, size_t arraySize)

I leave the implementation to you (if the array is sorted, you could implement an dichotomic search, else you can do an good old loop for).

  • Then, for each number in arr1, if IsNumberInArray(arr1[i], arr2, size2) is false, add arr1[i] in arr3.

Basically, it's nearly exactly what you do. Your problem lie in the "inversed" condition (is the number is in the second array ?) and "how to break" from the second loop easily. The function will provide that.

Note that since arr3 will only retain arr1 number which is not in arr2, arr3 size can be at max size1. That's why I firstly assumed you wanted uniq number in arr1 AND arr2, since tot_size was size1 + size2.


Usually, I don't give code for "easy" problem, because if you can't solve it by yourself, that mean you need practice and giving you the answer will not be usefull for you, but since sg7 did it, it's meaningless to hold it (and you can't use the room for now), so here an implementation of the algorithm :

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>

bool IsNumberInArray(int number, int *array, size_t arraySize)
{
    for (size_t i = 0; i < arraySize; ++i) {
       if (array[i] == number) {
           return (true);
       }
    }

    return (false);
}

void DumpArrayContent(int *array, size_t arraySize, char *arrayName)
{
    printf("%s has %zu elements:\n", arrayName ? arrayName : "array", arraySize);               
    for (size_t i = 0; i < arraySize; ++i) {
        printf("%d ",array[i]);
    }
    printf("\n");    
}

int main(void)
{
    int arr1[] = {1,2,3,4,7,8,9};
    int arr2[] = {3,4,5};

    size_t s1 = sizeof(arr1)/sizeof(*arr1);
    size_t s2 = sizeof(arr2)/sizeof(*arr2);

    int    arr3[s1];
    int    s3 = 0;

    for (size_t i = 0; i < s1; ++i) {
        if (!IsNumberInArray(arr1[i], arr2, s2)) {
           arr3[s3] = arr1[i];
           s3++;
        }
    }

    DumpArrayContent(arr1, s1, "arr1");
    DumpArrayContent(arr2, s2, "arr2");
    DumpArrayContent(arr3, s3, "arr3");

    return 0;
} 

I don't think there is a more "effective" implementation, since after compiler optimization, the resulting executable would be pretty identical. If there are not compiler optimization activated, sg7 code will be more "effective" since it's straigthforward (mine have function call). It's up to you to see which one you prefer.