How to make shark/spark clear the cache?

venkat · Dec 11, 2013 · Viewed 45.4k times

When I run my Shark queries, memory accumulates in main memory. This is my top command output:


Mem:  74237344k total, 70080492k used, 4156852k free, 399544k buffers
Swap: 4194288k total, 480k used, 4193808k free, 65965904k cached


This doesn't change even if I kill/stop the Shark, Spark, and Hadoop processes. Right now, the only way to clear the cache is to reboot the machine.

Has anyone faced this issue before? Is it a configuration problem, or a known issue in Spark/Shark?

Answer

Henrique Florencio · May 19, 2017

To remove all cached data:

sqlContext.clearCache()

Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html

If you want to remove a specific DataFrame from the cache:

df.unpersist()