Python - Read and split lines from text file into indexes.

Why do you reverse the file into another file instead of doing it in memory (say, on a nested list?)


I had a lot of problems with lists and dictionaries, I ended up with special characters I couldn't get rid of. That's why I also call the write function with each loop, instead of writing once to file at the end. 100 lines will call the write function 100 times.




:)
 
I had a lot of problems with lists and dictionaries, I ended up with special characters I couldn't get rid of. That's why I also call the write function with each loop, instead of writing once to file at the end. 100 lines will call the write function 100 times.




:)
Special characters huh?
Are you using the same encoding on the file as you use in the program?

With 100 lines your approach should work fine, but those IO's to HDD will become expensive if you go into larger files.
 
Well I'm glad that you got it to work. But you've ended up with some real ugly code there (your code is so ugly..... pick your own punchline http://www.jokes4us.com/yomamajokes/yomamasouglyjokes.html). Hope you, or someone else, never has to debug it.


OK ...... I checked out Pandas and the Python CSV module.

Pandas
It doesn't come with the standard Python installation so I couldn't run any tests, and I couldn't find any examples on line that interested me. It is overkill for my needs so I won't bother with it.


Python CSV module
It comes with the standard Python installation and lots of examples on line. But the CSV manipulation I require can be accomplished without it and probably just as efficiently. I do plan on using the Python CSV module on my next project.


Using Python version 2.7.6 and the tasks I wanted my script to perform.
  • No editing of original CSV file. Any lines or empty data fields that can crash the script must be handled in the script.
  • Reverse the lines of the original CSV file before processing. Earliest transactions at the top, not the bottom.
  • Split each line at the commas and assign each column an index[] variable.
  • Delete lines that do not contain certain values such as "Buy", "Sell", or "Expired".
  • Delete unwanted columns and create new columns with values from some of the remaining columns.
  • Add a line counter, new column headers and footer labels.
  • Total the value of some of the columns and get P/L percentage.
  • Format the timestamp from 2014-05-06 to Tue May 06.
  • Change negative values into positive values in the quantity column.
  • Change 0 values to nothing in the debit, credit and commission columns.
  • Output the results to a new CSV file.

My completed 100% working script using only the datetime module.

Code:
#!/usr/bin/python
# Python version 2.7.6

import datetime

csv_in = "TransactionHistory_22523594.csv"
csv_reverse = "temp_reverse.csv"
csv_out = "transaction_history.csv"
header = "Row,Date,Buy/Sell,QTY,Security,Price,Debit,Credit,Commission,Total Amount,Currency" + '\n'
footer_commission = 0
footer_debit = 0
footer_credit = 0
footer_total_amount = 0
counter = 1
split = ","
join = ","

def write_file(data,write_to):
  f_out = open(write_to,'a')
  f_out.write(data + "\n")
  f_out.close()

def reverse_file(in_put):
  for line in reversed(list(open(in_put))):
  rows = (line.rstrip())
  write_file(rows,csv_reverse)

reverse_file(csv_in)
write_file(header,csv_out)

f_in = open(csv_reverse, "r")
for line in f_in.xreadlines():
  if len(line.strip()) != 0 :
  line = line.strip()
  parts = line.split(split)
  if parts[2] == "Buy" or parts[2] == "Sell" or parts[2] == "Expired":
  row= '{0:03d}'.format(counter)
  transaction_date = parts[0]
  buy_sell = parts[2]
  qty = parts[5]
  security = parts[4]
  price = parts[6]
  if price == "":
  price = "0"
  total_amount = parts[8]
  currency = parts[9]
  transaction_date = datetime.datetime.strptime(transaction_date, "%Y-%m-%d").strftime("%a %b %d")
  qty = abs(int(qty))
  price = float(price)
  total_amount = float(total_amount)
  amount = qty * price * 100
  amount = int(amount)

  t = abs(float(total_amount))
  if parts[8] >= "0" and  parts[8] <= "1":
  commission = 0
  else:
  commission = abs(t - amount)
  if total_amount <= 1:
  debit = amount
  credit = 0
  else:
  debit = 0
  credit = amount
  if debit > t:
  credit = debit
  commission = debit
  debit = 0

  footer_debit = footer_debit + debit
  footer_credit = footer_credit + credit
  footer_commission = footer_commission + commission
  footer_total_amount = footer_total_amount + total_amount

  qty = str(abs(qty))
  price = str(price)
  debit = str(debit)
  if debit == "0":
  debit = ""
  credit = str(credit)
  if credit == "0":
  credit = ""
  commission = str(commission)
  if commission == "0":
  commission = ""
  total_amount = str(total_amount)
  counter = counter + 1

  data_out = row + join + transaction_date + join + buy_sell + join + qty + join + security + join \
  + price + join + debit + join + credit + join + commission + join + total_amount + join + currency

  print data_out
  write_file(data_out,csv_out)

pl = footer_debit + footer_commission
pl_percent = (footer_credit - (pl)) / pl * 100
footer_debit = str(footer_debit)
footer_credit = str(footer_credit)
footer_commission = str(footer_commission)
footer_total_amount = str(footer_total_amount)
pl_percent = str(pl_percent)

footer = '\n' + join + join + join + join + join + "Subtotal" + join + footer_debit + join + footer_credit + join \
  + footer_commission + join + footer_total_amount + join + currency + '\n' + '\n' + join + join + join \
  + join + join + join + join + "P/L" + join + pl_percent + " %" + join + footer_total_amount + join + currency

write_file(footer,csv_out)

input("\n\nPress the Enter key to exit.")



:)
 
This is hilarious, 3 pages of Python chaos to read a simple text based file with delimiter. Seriously, guys, I thought some of you are the die hard Python users...

OP, here is some C# code in case you are getting tired of even considering to use a library (chuckle, chuckle) to read a simple csv based file:

var contents =File.ReadAllText(filename).Split('\n');
var csv =from line in contents
Select line.Split(',').ToArray();
 
This is hilarious, 3 pages of Python chaos to read a simple text based file with delimiter. Seriously, guys, I thought some of you are the die hard Python users...

OP, here is some C# code in case you are getting tired of even considering to use a library (chuckle, chuckle) to read a simple csv based file:

var contents =File.ReadAllText(filename).Split('\n');
var csv =from line in contents
Select line.Split(',').ToArray();
well... if you put it that way...
the job could be done quite easily using the right tools...

Code:
import pandas as pd
excel_file = pd.ExcelFile('MyBelovedMICROSOFTExcelFile.xlsx')
excel_page = excel_file.parse(excel_file.sheet_names[0])
 
via the import of a library? Seriously?

Btw, I thought the task was to read in a text based file. Why are you now providing code to read in a spread sheet?

Just to summarize, here is what the Python experts suggest to read in a simple text file:

=> Use CSV library
=> Use Pandas library
=> Use Anacondas distro

Not re-inventing the wheel is a good thing, but that you even have to peruse a library to read in and properly parse a simple text file is absolutely beyond me. I thought that kind of job is what Python was designed to tackle.


well... if you put it that way...
the job could be done quite easily using the right tools...

Code:
import pandas as pd
excel_file = pd.ExcelFile('MyBelovedMICROSOFTExcelFile.xlsx')
excel_page = excel_file.parse(excel_file.sheet_names[0])
 
Last edited:
With all due respect, but the code you previously published on your blog (and which by your own admission was so horrible that you continued to only post pseudo code) did not look that much more appealing nor efficient...

Well I'm glad that you got it to work. But you've ended up with some real ugly code there (your code is so ugly..... pick your own punchline http://www.jokes4us.com/yomamajokes/yomamasouglyjokes.html). Hope you, or someone else, never has to debug it.
 
via the import of a library? Seriously?

Btw, I thought the task was to read in a text based file. Why are you now providing code to read in a spread sheet?
Libraries get the job done faster in coding time, and usually faster in run time (since they have optimized routines written in C under the hood), and in the case of pandas it also allows for much easier manipulation of the data after it has been imported through the dataframe structure.


oh yeah, I forgot and used Excel :)
here's the code for csv text files

Code:
import pandas as pd
csv_file = pd.DataFrame.from_csv("myBelovedCSVfile.csv", sep=',')

and here's the code to do the same in R... when it comes to doing a lot with a single line, it is hard to beat R.

Code:
csv_file = read.csv('myBelovedcsvFile.csv', sep=',')
 
And here is the code in C# :

var contents =File.ReadAllText(filename).Split('\n');
var csv =from line in contents select line.Split(',').ToArray();

(you can actually move that all into one line but it looks ugly)

P.S.: This version is most likely a lot faster than Python's Pandas. I know OP asked about a Python solution but I could not help it but jump in after seeing 3 full pages of discussion how to read in a text file.



Libraries get the job done faster in coding time, and usually faster in run time (since they have optimized routines written in C under the hood), and in the case of pandas it also allows for much easier manipulation of the data after it has been imported through the dataframe structure.


oh yeah, I forgot and used Excel :)
here's the code for csv text files

Code:
import pandas as pd
csv_file = pd.DataFrame.from_csv("myBelovedCSVfile.csv", sep=',')

and here's the code to do the same in R... when it comes to doing a lot with a single line, it is hard to beat R.

Code:
csv_file = read.csv('myBelovedcsvFile.csv', sep=',')
 
Back
Top