I am not a statistician, so I cannot tell whether your sample size is sufficient. In general, the closer your system is to random, the larger the sample you need to show that it has any real edge.

The number of trades in the training period is around 100+ (depending on the system), in the verification period ~200, and in the test period ~250 (highest 400, lowest 195). Each period is 5 years (actually, the test period is almost 6 years by now).
As an example, telling a coin that is slightly biased toward heads, say 50.1/49.9, apart from a fair 50/50 coin takes a huge number of tosses. On the other hand, a coin that is biased 80/20 needs far fewer, perhaps fewer than 50 tosses.
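To put rough numbers on the coin example, here is a minimal sketch using the textbook normal-approximation formula for a one-sided test of a single proportion. The function name `required_tosses`, the 5% significance level, and the 80% power are my own assumptions, not anything from the original question, and the approximation is crude when the answer comes out small (as in the 80/20 case). Trades are also not independent coin tosses, so treat this as an order-of-magnitude guide only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_tosses(p_biased, p_null=0.5, alpha=0.05, power=0.80):
    """Approximate number of tosses needed to tell p_biased from p_null.

    Normal approximation for a one-sided, one-sample proportion test.
    alpha (false-positive rate) and power are assumed values.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value under the null
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    spread = (z_alpha * sqrt(p_null * (1 - p_null))
              + z_beta * sqrt(p_biased * (1 - p_biased)))
    return ceil((spread / (p_biased - p_null)) ** 2)


if __name__ == "__main__":
    for p in (0.501, 0.55, 0.80):
        print(f"heads probability {p}: about {required_tosses(p)} tosses")
```

Under these assumptions the 80/20 coin comes out well under 50 tosses, while the 50.1/49.9 coin needs on the order of a million or more. The required sample grows with the inverse square of the edge, which is why a small edge is so expensive to verify.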
Most trading systems provide only a small advantage over randomness, and such an advantage is hard to prove. So my gut says your sample size is not large enough. I know because I am struggling with the same issue.