This may look too simple a question to ask, but I'm asking after going through a few presentations on both.
Both methods increase instruction throughput, and superscalar designs almost always make use of pipelining as well. A superscalar processor has more than one execution unit, and so does a pipelined one, or am I wrong here?
Superscalar design involves the processor being able to issue multiple instructions in a single clock, with redundant facilities to execute an instruction. We're talking about within a single core, mind you -- multicore processing is different.
Pipelining divides an instruction into steps, and since each step is executed in a different part of the processor, multiple instructions can be in different "phases" each clock.
They're almost always used together. This image from Wikipedia shows both concepts in use, as these concepts are best explained graphically:
Here, two instructions are being executed at a time in a five-stage pipeline.
To break it down further, given your recent edit:
In the example above, an instruction goes through 5 stages to be "performed". These are IF (instruction fetch), ID (instruction decode), EX (execute), MEM (memory access), WB (write the result back to the register file).
In a very simple processor design, one stage would be completed every clock, so a single instruction would pass through IF, ID, EX, MEM and WB in clocks 1 through 5.
That is one instruction in five clocks. If we then add a redundant execution unit and introduce superscalar design, two instructions A and B pass through the same five stages side by side, in the same five clocks.
Two instructions in five clocks -- a theoretical maximum gain of 100%.
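To put numbers on that, here is a rough sketch of the arithmetic for the unpipelined case. The function name and parameters are made up for illustration, and it assumes an idealized machine where every group of instructions finishes all five stages before the next group starts:

```python
import math

def clocks_without_pipelining(n_instructions, issue_width=1, n_stages=5):
    """Clocks needed when each group of instructions must finish every stage
    before the next group may start (no overlap between groups)."""
    groups = math.ceil(n_instructions / issue_width)
    return groups * n_stages

scalar = clocks_without_pipelining(2, issue_width=1)       # A, then B
superscalar = clocks_without_pipelining(2, issue_width=2)  # A and B together
gain_percent = (scalar / superscalar - 1) * 100

print(scalar, superscalar, gain_percent)
```

The gain only reaches 100% when there are always two independent instructions ready to issue together, which is why it is a theoretical maximum.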
Pipelining allows the stages to overlap, so a new pair of instructions can start every clock instead of waiting for the previous pair to finish. For ten instructions A through J, the last pair (I and J) is fetched in clock 5 and writes back in clock 9.
In nine clocks, we've executed ten instructions, where the unpipelined two-wide design would need 25 -- you can see where pipelining really moves things along. And that is an explanation of the example graphic, not how it's actually implemented in the field (that's black magic).
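Since the diagram is easier to follow than the prose, here is a small sketch that rebuilds a timing chart of that kind. It is a toy model, not how real hardware schedules anything: it assumes an ideal five-stage, two-wide pipeline with no stalls, hazards or branches, and all the names (`schedule`, `total_clocks`, `render`) are invented for this example.

```python
import math

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(names, issue_width=2, stages=STAGES):
    """Return {instruction: {clock: stage}} for an ideal pipeline with no stalls."""
    table = {}
    for i, name in enumerate(names):
        start = i // issue_width + 1  # clock in which this instruction is fetched
        table[name] = {start + s: stage for s, stage in enumerate(stages)}
    return table

def total_clocks(n_instructions, issue_width=2, n_stages=len(STAGES)):
    """Clocks until the last instruction leaves the final stage."""
    if n_instructions == 0:
        return 0
    return n_stages + math.ceil(n_instructions / issue_width) - 1

def render(table, n_clocks):
    """Format the schedule as a text chart: one row per instruction, one column per clock."""
    lines = ["    " + "".join(f"{c:<4}" for c in range(1, n_clocks + 1)).rstrip()]
    for name, row in table.items():
        cells = "".join(f"{row.get(c, ''):<4}" for c in range(1, n_clocks + 1))
        lines.append(f"{name:<4}{cells}".rstrip())
    return "\n".join(lines)

names = list("ABCDEFGHIJ")
n = total_clocks(len(names))
print(render(schedule(names), n))
print("clocks:", n)
```

Calling `total_clocks(10, issue_width=1)` gives the pipelined but non-superscalar case for comparison, which comes out at 14 clocks under the same assumptions.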
The Wikipedia articles for Superscalar and Instruction pipeline are pretty good.