I have already finished the Spark installation and executed a few test cases, setting up master and worker nodes. That said, I am quite confused about what exactly a "job" means in the Spark context (not SparkContext). I read the Spark documentation, but this is still not clear to me.
Having said that, my plan is to write Spark jobs programmatically, which would then do a spark-submit.
Kindly help with an example if possible; it would be very helpful.
Note: Kindly do not post Spark links, because I have already tried those. Even though the question sounds naive, I still need more clarity in understanding.
Well, terminology can always be difficult since it depends on context. In many cases, you may be used to "submitting a job to a cluster", which for Spark would mean submitting a driver program.
That said, Spark has its own definition of "job", taken directly from the glossary:
Job: A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs.
So, in this context, let's say you need to load a file, transform it, and then run two actions on the result, as in the sketch below.
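Here is a minimal sketch in Scala using the RDD API; the file paths (people.txt, long-names-out) and the JobExample name are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JobExample {
  def main(args: Array[String]): Unit = {
    // Master and other settings come from spark-submit; only the app name is set here.
    val conf = new SparkConf().setAppName("JobExample")
    val sc   = new SparkContext(conf)

    // Transformations are lazy: nothing runs on the cluster yet.
    val names     = sc.textFile("people.txt")      // hypothetical input file
    val longNames = names.filter(_.length > 10)
    val upperCase = longNames.map(_.toUpperCase)

    // Action #1: saveAsTextFile spawns the first job.
    upperCase.saveAsTextFile("long-names-out")     // hypothetical output path

    // Action #2: count spawns a second job.
    // (upperCase is recomputed here unless you persist/cache it.)
    val total = upperCase.count()
    println(s"Number of long names: $total")

    sc.stop()
  }
}
```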
So:

- The driver program is this entire piece of code.
- Each action spawns a job: the saveAsTextFile call is one job and the count is another (clear because these are actions, not transformations). They will show up as "Job 0" and "Job 1" in the driver's logs and in the Spark UI.
- Within each job, the transformations are organized into stages, and each stage is executed as a set of tasks, one per partition of the data.
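In other words, when you package a program like this and run it with spark-submit, the whole thing is one application with one driver program; the actions inside it are what Spark counts as jobs.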
Hope it makes things clearer ;-)