Data sorting help: Nordic ITCH data

My data consists of about 20 million rows like this:

T33013
M000
D 431630
X 431629 1000
M003
D 431571
A 431665S 100 67272 1834000
M006
A 431666S 2600 1027 1176000
D 430996

Which program could I use to rearrange it so that it looks like this:

33013000 D 431630
33013000 X 431629 1000
33013003 D 431571
33013003 A 431665S 100 67272 1834000
33013006 A 431666S 2600 1027 1176000
33013006 D 430996

So that every action row (A, B, C, D and so on) gets a column holding the preceding T and M numbers, and I can then sort out the different rows.

Please help if you can. Thanks!
 
Quote from evira:

Which program could I use to rearrange it so that it looks like this:

33013000 D 431630
33013000 X 431629 1000
Perl, AWK, Python.... the list goes on. Which do you prefer? I will write you a script.
 
Quote from Kevin Schmit:

Perl, AWK, Python.... the list goes on. Which do you prefer? I will write you a script.

Maybe Python, if it's easier? Could you also make the script delete all the rows starting with S, O, R, H, B and Q?

Thanks in advance!
 
Code:
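import sys

# Written for Python 2 (tested on 2.6.8): the print statement below is Python 2 syntax.
fin = open(sys.argv[1], 'r')
# Prefixes that are never printed: T and M only update the timestamp,
# the others are the message types to be deleted.
skipLn = ['T','M','S','O','R','H','B','Q']
bigTS = ''    # digits of the most recent T line
milliTS = ''  # digits of the most recent M line
while 1:
  line = fin.readline()
  if not line: break
  if line[:1] == 'T': bigTS = line[1:].rstrip()
  if line[:1] == 'M': milliTS = line[1:].rstrip()
  if line[:1] not in skipLn:
    # Prefix every remaining row with the combined T and M numbers.
    print '%s%s %s' % (bigTS, milliTS, line.rstrip())
fin.close()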
 
Quote from Kevin Schmit:

Code:
import sys
fin = open(sys.argv[1], 'r')
skipLn = ['T','M','S','O','R','H','B','Q']
while 1:
  line = fin.readline()
  if not line: break;
  if line[:1] == 'T': bigTS = line[1:].rstrip()
  if line[:1] == 'M': milliTS = line[1:].rstrip()
  if line[:1] not in skipLn: 
    print '%s%s %s' % (bigTS, milliTS, line[:].rstrip())     
fin.close()

I'm getting an error. It says "invalid syntax", with red highlighting on the %s'. Can you help?

If my file is called test.txt, how should I open it in Python and run the script?

Thanks!
 
Quote from evira:
I'm getting an error. It says "invalid syntax", with red highlighting on the %s'. Can you help?
What version of Python are you running, under what operating system? I tested it on Python 2.6.8 under Cygwin/Win7.

Quote from evira:
If my file is called test.txt, how should I open it in Python and run the script?
Copy the script to a file with the extension ".py" e.g. test.py
Then call it from the command line like this:

python test.py test.txt

See the attached gif for an example of how to do this.
 


Quote from Kevin Schmit:

What version of Python are you running, under what operating system? I tested it on Python 2.6.8 under Cygwin/Win7.


Copy the script to a file with the extension ".py" e.g. test.py
Then call it from the command line like this:

python test.py test.txt

See the attached gif for an example of how to do this.

I'm a total beginner in programming. I just need to rearrange my files into that new order.

I'm running Python 3.3.0 under Win32/Win7. I made test.py from the script using Python.
I don't know how to run a command on a file. For example, where should the file be: C:/Python33/test.txt?

I appreciate your help
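
For anyone hitting the same "invalid syntax" error: in Python 3, print is a function, so the Python 2 statement form used in the script above (print followed directly by a string) is rejected. The thread does not record which fix was actually used, so the following is only a minimal Python 3 sketch of the same idea. The function name merge_timestamps and its drop parameter are illustrative choices, not part of the original script, and unlike the original it also skips blank lines.

```python
import sys


def merge_timestamps(lines, drop="SORHBQ"):
    """Yield '<T digits><M digits> <message>' for every kept line.

    T lines set the first part of the timestamp, M lines set the second
    part; lines starting with a character in `drop` are discarded.
    """
    seconds = ""
    millis = ""
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # skip blank lines (an addition to the original logic)
        kind = line[0]
        if kind == "T":
            seconds = line[1:]
        elif kind == "M":
            millis = line[1:]
        elif kind not in drop:
            yield "%s%s %s" % (seconds, millis, line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Iterating over the file reads one line at a time, so memory use
    # stays small even for a file with around 20 million rows.
    with open(sys.argv[1], "r") as fin:
        for row in merge_timestamps(fin):
            print(row)
```

Saved under a name such as test3.py (a made-up name) in the same folder as test.txt, it could be run from a command prompt opened in that folder as: python test3.py test.txt > sorted.txt. The redirect writes the result to a new file (sorted.txt is also just an example name) instead of the screen.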
 
Quote from Kevin Schmit:

What version of Python are you running, under what operating system? I tested it on Python 2.6.8 under Cygwin/Win7.


Copy the script to a file with the extension ".py" e.g. test.py
Then call it from the command line like this:

python test.py test.txt

See the attached gif for an example of how to do this.

I got your script working! Thanks a lot!
 
Quote from Kevin Schmit:

What version of Python are you running, under what operating system? I tested it on Python 2.6.8 under Cygwin/Win7.


Copy the script to a file with the extension ".py" e.g. test.py
Then call it from the command line like this:

python test.py test.txt

See the attached gif for an example of how to do this.

Thanks, it worked! I really appreciate your help!

Would it be possible to extend the script so that it deletes rows that don't contain, for example, the number "24311"? Or to make another simple script to run on the new txt file?

And a harder one: is it possible to make a script that deletes every row that does not share its number (the third column) with one of the rows containing A and 24311?

For example, given this:
46138100 A 4568834S 35000 24311 24280
46139111 A 4569028S 2000 24311 24520
46138350 X 4568834
32823978 X 4480746
46140239 D 4569028
32823978 D 324847

it would delete the rows without a matching number, leaving:

46138100 A 4568834S 35000 24311 24280
46139111 A 4569028S 2000 24311 24520
46138350 X 4568834
46140239 D 4569028
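
Both requests could be handled as a second pass over the already merged file. What follows is a rough Python 3 sketch under assumptions the thread does not confirm: fields are separated by whitespace, the third field is the order number with an optional trailing letter (as in 4568834S), and "has the number 24311" means that some whole field equals 24311. The function names are made up for illustration.

```python
import re


def order_ref(fields):
    """Return the leading digits of the third field, or None if absent."""
    if len(fields) < 3:
        return None
    match = re.match(r"\d+", fields[2])
    return match.group(0) if match else None


def rows_with_number(rows, number="24311"):
    """Simple filter: keep rows in which some whole field equals `number`."""
    return [row for row in rows if number in row.split()]


def rows_for_matching_orders(rows, number="24311"):
    """Keep A rows containing `number` and every row sharing their order number."""
    rows = list(rows)
    # Pass 1: collect the order numbers of the A rows that mention `number`.
    wanted = set()
    for row in rows:
        fields = row.split()
        if len(fields) > 1 and fields[1] == "A" and number in fields[2:]:
            ref = order_ref(fields)
            if ref is not None:
                wanted.add(ref)
    # Pass 2: keep only the rows whose order number was collected.
    return [row for row in rows if order_ref(row.split()) in wanted]
```

Applied to the six example rows above, rows_for_matching_orders returns the four rows shown in the expected result, and rows_with_number returns only the two A rows. This sketch holds all rows in memory; for a file with millions of rows it may be preferable to read the file twice, once to build the set of order numbers and once to write out the matching rows.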
 