Python - Read and split lines from text file into indexes.

eusdaiki · May 6, 2015

What's the purpose of not using libraries? learning?

OTM-Options · May 6, 2015

eusdaiki said:
Why do you reverse the file into another file instead of doing it in memory (say, on a nested list?)

I had a lot of problems with lists and dictionaries, I ended up with special characters I couldn't get rid of. That's why I also call the write function with each loop, instead of writing once to file at the end. 100 lines will call the write function 100 times.

eusdaiki · May 6, 2015

OTM-Options said:
I had a lot of problems with lists and dictionaries, I ended up with special characters I couldn't get rid of. That's why I also call the write function with each loop, instead of writing once to file at the end. 100 lines will call the write function 100 times.

Special characters huh?
Are you using the same encoding on the file as you use in the program?

With 100 lines your approach should work fine, but those IO's to HDD will become expensive if you go into larger files.

globalarbtrader · May 7, 2015

Well I'm glad that you got it to work. But you've ended up with some real ugly code there (your code is so ugly..... pick your own punchline http://www.jokes4us.com/yomamajokes/yomamasouglyjokes.html). Hope you, or someone else, never has to debug it.

OTM-Options said:
OK ...... I checked out Pandas and the Python CSV module.

Pandas
It doesn't come with the standard Python installation so I couldn't run any tests, and I couldn't find any examples on line that interested me. It is overkill for my needs so I won't bother with it.

Python CSV module
It comes with the standard Python installation and lots of examples on line. But the CSV manipulation I require can be accomplished without it and probably just as efficiently. I do plan on using the Python CSV module on my next project.

Using Python version 2.7.6 and the tasks I wanted my script to perform.

No editing of original CSV file. Any lines or empty data fields that can crash the script must be handled in the script.

Reverse the lines of the original CSV file before processing. Earliest transactions at the top, not the bottom.

Split each line at the commas and assign each column an index[] variable.

Delete lines that do not contain certain values such as "Buy", "Sell", or "Expired".

Delete unwanted columns and create new columns with values from some of the remaining columns.

Add a line counter, new column headers and footer labels.

Total the value of some of the columns and get P/L percentage.

Format the timestamp from 2014-05-06 to Tue May 06.

Change negative values into positive values in the quantity column.

Change 0 values to nothing in the debit, credit and commission columns.

Output the results to a new CSV file.

My completed 100% working script using only the datetime module.

Code:

#!/usr/bin/python # Python version 2.7.6 import datetime csv_in = "TransactionHistory_22523594.csv" csv_reverse = "temp_reverse.csv" csv_out = "transaction_history.csv" header = "Row,Date,Buy/Sell,QTY,Security,Price,Debit,Credit,Commission,Total Amount,Currency" + '\n' footer_commission = 0 footer_debit = 0 footer_credit = 0 footer_total_amount = 0 counter = 1 split = "," join = "," def write_file(data,write_to): f_out = open(write_to,'a') f_out.write(data + "\n") f_out.close() def reverse_file(in_put): for line in reversed(list(open(in_put))): rows = (line.rstrip()) write_file(rows,csv_reverse) reverse_file(csv_in) write_file(header,csv_out) f_in = open(csv_reverse, "r") for line in f_in.xreadlines(): if len(line.strip()) != 0 : line = line.strip() parts = line.split(split) if parts[2] == "Buy" or parts[2] == "Sell" or parts[2] == "Expired": row= '{0:03d}'.format(counter) transaction_date = parts[0] buy_sell = parts[2] qty = parts[5] security = parts[4] price = parts[6] if price == "": price = "0" total_amount = parts[8] currency = parts[9] transaction_date = datetime.datetime.strptime(transaction_date, "%Y-%m-%d").strftime("%a %b %d") qty = abs(int(qty)) price = float(price) total_amount = float(total_amount) amount = qty * price * 100 amount = int(amount) t = abs(float(total_amount)) if parts[8] >= "0" and parts[8] <= "1": commission = 0 else: commission = abs(t - amount) if total_amount <= 1: debit = amount credit = 0 else: debit = 0 credit = amount if debit > t: credit = debit commission = debit debit = 0 footer_debit = footer_debit + debit footer_credit = footer_credit + credit footer_commission = footer_commission + commission footer_total_amount = footer_total_amount + total_amount qty = str(abs(qty)) price = str(price) debit = str(debit) if debit == "0": debit = "" credit = str(credit) if credit == "0": credit = "" commission = str(commission) if commission == "0": commission = "" total_amount = str(total_amount) counter = counter + 1 data_out = row + join + transaction_date + join + buy_sell + join + qty + join + security + join \ + price + join + debit + join + credit + join + commission + join + total_amount + join + currency print data_out write_file(data_out,csv_out) pl = footer_debit + footer_commission pl_percent = (footer_credit - (pl)) / pl * 100 footer_debit = str(footer_debit) footer_credit = str(footer_credit) footer_commission = str(footer_commission) footer_total_amount = str(footer_total_amount) pl_percent = str(pl_percent) footer = '\n' + join + join + join + join + join + "Subtotal" + join + footer_debit + join + footer_credit + join \ + footer_commission + join + footer_total_amount + join + currency + '\n' + '\n' + join + join + join \ + join + join + join + join + "P/L" + join + pl_percent + " %" + join + footer_total_amount + join + currency write_file(footer,csv_out) input("\n\nPress the Enter key to exit.")

volpunter · May 7, 2015

This is hilarious, 3 pages of Python chaos to read a simple text based file with delimiter. Seriously, guys, I thought some of you are the die hard Python users...

OP, here is some C# code in case you are getting tired of even considering to use a library (chuckle, chuckle) to read a simple csv based file:

var contents =File.ReadAllText(filename).Split('\n');
var csv =from line in contents
Select line.Split(',').ToArray();

eusdaiki · May 7, 2015

volpunter said:
This is hilarious, 3 pages of Python chaos to read a simple text based file with delimiter. Seriously, guys, I thought some of you are the die hard Python users...

OP, here is some C# code in case you are getting tired of even considering to use a library (chuckle, chuckle) to read a simple csv based file:

var contents =File.ReadAllText(filename).Split('\n');
var csv =from line in contents
Select line.Split(',').ToArray();

well... if you put it that way...
the job could be done quite easily using the right tools...

Code:

import pandas as pd
excel_file = pd.ExcelFile('MyBelovedMICROSOFTExcelFile.xlsx')
excel_page = excel_file.parse(excel_file.sheet_names[0])

volpunter · May 7, 2015

via the import of a library? Seriously?

Btw, I thought the task was to read in a text based file. Why are you now providing code to read in a spread sheet?

Just to summarize, here is what the Python experts suggest to read in a simple text file:

=> Use CSV library
=> Use Pandas library
=> Use Anacondas distro

Not re-inventing the wheel is a good thing, but that you even have to peruse a library to read in and properly parse a simple text file is absolutely beyond me. I thought that kind of job is what Python was designed to tackle.

eusdaiki said:
well... if you put it that way...
the job could be done quite easily using the right tools...

Code:

import pandas as pd excel_file = pd.ExcelFile('MyBelovedMICROSOFTExcelFile.xlsx') excel_page = excel_file.parse(excel_file.sheet_names[0])

volpunter · May 7, 2015

With all due respect, but the code you previously published on your blog (and which by your own admission was so horrible that you continued to only post pseudo code) did not look that much more appealing nor efficient...

globalarbtrader said:
Well I'm glad that you got it to work. But you've ended up with some real ugly code there (your code is so ugly..... pick your own punchline http://www.jokes4us.com/yomamajokes/yomamasouglyjokes.html). Hope you, or someone else, never has to debug it.

eusdaiki · May 7, 2015

volpunter said:
via the import of a library? Seriously?

Btw, I thought the task was to read in a text based file. Why are you now providing code to read in a spread sheet?

Libraries get the job done faster in coding time, and usually faster in run time (since they have optimized routines written in C under the hood), and in the case of pandas it also allows for much easier manipulation of the data after it has been imported through the dataframe structure.

oh yeah, I forgot and used Excel

here's the code for csv text files

Code:

import pandas as pd
csv_file = pd.DataFrame.from_csv("myBelovedCSVfile.csv", sep=',')

and here's the code to do the same in R... when it comes to doing a lot with a single line, it is hard to beat R.

Code:

csv_file = read.csv('myBelovedcsvFile.csv', sep=',')

volpunter · May 7, 2015

And here is the code in C# :

var contents =File.ReadAllText(filename).Split('\n');
var csv =from line in contents select line.Split(',').ToArray();

(you can actually move that all into one line but it looks ugly)

P.S.: This version is most likely a lot faster than Python's Pandas. I know OP asked about a Python solution but I could not help it but jump in after seeing 3 full pages of discussion how to read in a text file.

eusdaiki said:
Libraries get the job done faster in coding time, and usually faster in run time (since they have optimized routines written in C under the hood), and in the case of pandas it also allows for much easier manipulation of the data after it has been imported through the dataframe structure.

oh yeah, I forgot and used Excel
here's the code for csv text files

Code:

import pandas as pd csv_file = pd.DataFrame.from_csv("myBelovedCSVfile.csv", sep=',')

and here's the code to do the same in R... when it comes to doing a lot with a single line, it is hard to beat R.

Code:

csv_file = read.csv('myBelovedcsvFile.csv', sep=',')