"measuring trading performance" is simple: just follow substract and divide 
Now, what do you mean by "really"?
You could mean:
- why do performance claims differ so widely that their measurement looks questionable?
or
- is this performance pure chance, or is it reproducible (with some attached probability)?
The difference in performance claims has to do with the trading time scale. The shorter the time scale, the higher the expected performance. So 25% per annum is good for long-term trading, whereas it is not good for intraday trading.
As for consistency, asking about reproducibility presupposes that the question makes sense. If the performance is that of a hedge fund that has invested for 50 years through a bull market, it is meaningless to ask for reproducibility, since market conditions will not stay the same long enough for the context to be comparable. So the question only makes sense at shorter time scales.
Judging consistency rigorously is not a trivial task; it is in fact the domain of quality control. I have posted an introduction to Deming's SOPK and to Shewhart (Deming's mentor, in fact).
For a quick procedure, classify wins and losses separately. Especially if you have a fixed stop-loss strategy, mixing the two will give an asymmetric curve, and you cannot refer to a known probability law. To judge consistency you also have to remove EXCEPTIONAL winners. Once you have a homogeneous distribution, you can plot a chart and judge consistency with Shewhart control charts, of which there are two types: one for judging consistency of the mean, the other for consistency of the variance.
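The two kinds of Shewhart chart mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is one homogeneous series of per-trade results (for example losers only, or winners with the exceptional ones already removed), trades are grouped into consecutive subgroups of 5, and dispersion is tracked with the classical range (R) chart rather than the variance itself. The names `control_limits` and `out_of_control` are hypothetical; the constants A2, D3 and D4 are the standard tabulated values for subgroups of size 5.

```python
from statistics import mean

# Standard Shewhart chart constants for subgroups of 5 observations.
SUBGROUP_SIZE = 5
A2, D3, D4 = 0.577, 0.0, 2.114


def control_limits(trade_results):
    """Compute X-bar (mean) and R (range) chart limits from per-trade results."""
    n = SUBGROUP_SIZE
    # Consecutive, non-overlapping subgroups; an incomplete tail is dropped.
    groups = [trade_results[i:i + n]
              for i in range(0, len(trade_results) - n + 1, n)]
    if len(groups) < 2:
        raise ValueError("need at least two full subgroups")
    means = [mean(g) for g in groups]
    ranges = [max(g) - min(g) for g in groups]
    grand_mean = mean(means)
    mean_range = mean(ranges)
    return {
        "means": means,
        "ranges": ranges,
        # Chart 1: consistency of the mean.
        "xbar_limits": (grand_mean - A2 * mean_range,
                        grand_mean + A2 * mean_range),
        # Chart 2: consistency of the dispersion.
        "range_limits": (D3 * mean_range, D4 * mean_range),
    }


def out_of_control(values, limits):
    """Indices of subgroups whose statistic falls outside the limits."""
    low, high = limits
    return [i for i, v in enumerate(values) if v < low or v > high]
```

A subgroup flagged by `out_of_control(chart["means"], chart["xbar_limits"])` or by the same call on the ranges would suggest the results are not consistent over that stretch; in practice one would want many more subgroups than the minimum of two before trusting the limits.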

Quote from rcmcdougall:
A lot of wild claims get made in our business. You hear of annualized gains of 20%, 40%, 100%, 1,000%.
But how should we really measure our trading performance?
Rob.