Create a SQL database from csv files

I know the original post is old, but it's still timely for me. Here's the question:

I bought a zipped file from CRB/BarChart that contains one tiny csv (10kb or so) for each month of corn options going back 20+ years. Let's say there are 240 files. There are no column headers in the files and there are only four columns: contract name (symbol), date, historical volatility, implied volatility.

Is there a way to run a script or something and get all those files into one access table? I would like Date Column Symbol Column HV Column IV Column and then it would repeat way to the right, except that Date Column would not be necessary (or it could be hidden).

I really don't want to import 200+ individual files and then create 200+ queries to make one master data table because if I can automate the process I would like to buy more commodity histories.

I'm out of my element here. Is the task able to be done by someone who's not that great with Access? Alternatively, is the task simple enough that the necessary scripts/instructions/macros could be written by someone I find on one of the freelancer sites? I don't mind paying if I could re-use the the "program" for additional commodities in the future as well as updates to my current corn history (1984-2012).
I have no clue about access but you could loop over each file and import the data into sqlite appending to a table

http://stackoverflow.com/questions/14431646/how-to-write-pandas-dataframe-to-sqlite-with-index
 
if you need speed-flat binary file is way faster than SQL
Exactly as I do. Your program can read binary in machine float or integer format without translation between text to binary every time to query the database which may consist of thousands of csv equivalent files or combined. Extremely fast.
 
I know the original post is old, but it's still timely for me. Here's the question:

I bought a zipped file from CRB/BarChart that contains one tiny csv (10kb or so) for each month of corn options going back 20+ years. Let's say there are 240 files. There are no column headers in the files and there are only four columns: contract name (symbol), date, historical volatility, implied volatility.

Is there a way to run a script or something and get all those files into one Access table? I would like a Date Column, Symbol Column, HV Column, and IV Column and that would repeat many times, although numerous Date Columns would not be necessary (or they could be hidden).

Edit to add: I think there would also need to be a way to automatically insert column headers so I know what month the HVs and IVs refer to (this could eliminate the need for a separate symbol column).

i have no idea about VBA,but i've done this in VB6(which is pretty much same thing).
i can explain in 'general' what is needed:

you create your database in access(see my post above)
now you need an ADO connection between your VBA app and Access.
it can be done programmatically or manually in windows->control panel->administrative tools->data sources(ODBC)->system DSN
google for more details

then you basically go to whatever directory your files are stored,get their names via script(again-google it) and put them into string array

then you go thru the loop from lbound to ubound of this array

and each time you do so-you create a table named as a file(or whatever name you like) with columns that are in file(you can pick data type as well)

then you load this cvs file and after that-go thru each record in the file and insert it into a table with same name.

you can use same or similar procedure for update those tables or you can store all data in one very long table

that's pretty much it. it's not hard,but rather time consuming. all automated. one button

here is some food to chew:

http://www.mrexcel.com/forum/excel-...ual-basic-applications-connection-access.html
 
if you need speed-flat binary file is way faster than SQL and csv. because it's load differently into array. you don't have to go thru each line. as for SQL-open MS Access and click blank database. done :)

I store my data in ascii format (OHLC data) in notepad in windows. I have one notepad file for 1 day each stock. So, since I store data for 100 stocks, I have 100 notepad files for each day. I read these files into memory in vba, for further calculations in excel.

Is this same as what you are suggesting using binary files? Do you mean binary files = my notepad files or are they something else?
 
I store my data in ascii format (OHLC data) in notepad in windows. I have one notepad file for 1 day each stock. So, since I store data for 100 stocks, I have 100 notepad files for each day. I read these files into memory in vba, for further calculations in excel.

Is this same as what you are suggesting using binary files? Do you mean binary files = my notepad files or are they something else?

I did the exact same thing for vol surfaces. 1 file per stock per day. It was faster to run through csv files than through the database. So I bought the data (which came in SQL) and wrote scripts to convert them to CSV files. It's not fast enough for real time, but it was fast enough for morning screens and fast enough for analysis on a trade to put on today (where was the vol over the last 1 month?).

Apparently, you can optimize SQL so that it could be faster (my db had millions and millions of rows (1 row for every option in OPRA for every day), but it wasn't worth my time to figure it out. My DB got slow based on how the data was stored in memory. Using the file system changed that. When I needed a faster query I would use JAVA instead of VBA and that was almost instantaneous. C would be even faster but my C is very rusty.

I also bought stock price data and that was fast in the database. Not nearly the number of rows (almost 100x fewer than the options table).
 
I store my data in ascii format (OHLC data) in notepad in windows. I have one notepad file for 1 day each stock. So, since I store data for 100 stocks, I have 100 notepad files for each day. I read these files into memory in vba, for further calculations in excel.

Is this same as what you are suggesting using binary files? Do you mean binary files = my notepad files or are they something else?

how do you read those files? how do you load them?
if you load ASCII (txt,csv or whatever) file into array-you have to go thru each record (line) in that file, stretching(redim) your array as you go thru each record.. binary file(.bin) loading differently,in one shoot (for simplicity sake). i don't know how much faster,but probably at least 10 times. quite a bit of a difference in performance,specially if you working with large amounts of data(tick data for a day let say 1000 stocks). ASCII would take forever to load this amount of data from each file.

http://www.vb6.us/tutorials/reading-and-writing-binary-files-visual-basic
 
OP-here is how you create table in ms access using code,after you establish connection between DB and your application (this is VB,but should be same in VBA)
note that i'm creating table for each ticker that was in arrTicker

For i = LBound(arrTicker) To UBound(arrTicker)


bChk = TableExists(arrTicker(i))

If bChk = False Then

sSQl = "CREATE TABLE " & arrTicker(i) & "(Rec_ID AutoIncrement,Quote_Date Date,Quote_Time Date," & _
"Bid_Size long, Bid_Price single, Ask_Price single,Ask_Size long,Last_Time Date,Last_Price single,Last_Size long,Total_Vol long,sChange single,sHigh single,sLow single,Prev_Day_Close single,sOpen single);"

IQ_DB.Execute (sSQl)


End If


Next
 
here is how i convert stored streaming data from data base into binary files:

you define your array first

Public Type sBin

ID_Rec As Long '4
Quote_Date As Date '8
Quote_Time As Date '8
Bid_Price As Single '4
Bid_Size As Long '4
Ask_Price As Single '4
Ask_Size As Long '4
Last_Time As Date '8
Last_Price As Single '8
Last_Size As Long '4
Tot_Vol As Long '4
Change As Single '4
High As Single '4
Low As Single '4
PrevDayClose As Single '4
sOpen As Single

End Type

Public arrData() As sBin

so the size in bytes is known ahead,this is why you don't have to redim each time you go thru each record(as you for example do,while filling up this array)

sQL = "Select Rec_ID,Quote_Date,Quote_Time,Bid_Size,Bid_Price,Ask_Price,Ask_Size,Last_Time,Last_Price,Last_Size,Total_Vol,sChange,sHigh,sLow,Prev_Day_Close,sOpen from " & s & " Order by Rec_ID asc;"

RS.Open sQL, IQ_DB

If RS.RecordCount > 0 Then

CountRecord = CountRecord + RS.RecordCount

ReDim arrData(RS.RecordCount + 1)

Do Until RS.EOF

If i = 0 Then

ReDim arrData(i)

arrData(i).ID_Rec = RS("Rec_ID")
arrData(i).Quote_Date = RS("Quote_Date")
arrData(i).Quote_Time = RS("Quote_Time")
arrData(i).Bid_Size = RS("Bid_Size")
arrData(i).Bid_Price = RS("Bid_Price")
arrData(i).Ask_Price = RS("Ask_Price")
arrData(i).Ask_Size = RS("Ask_Size")
arrData(i).Last_Time = RS("Last_Time")
arrData(i).Last_Price = RS("Last_Price")
arrData(i).Last_Size = RS("Last_Size")
arrData(i).Tot_Vol = RS("Total_Vol")
arrData(i).Change = RS("sChange")
arrData(i).High = RS("sHigh")
arrData(i).Low = RS("sLow")
arrData(i).PrevDayClose = RS("Prev_Day_Close")
arrData(i).sOpen = RS("sOpen")

i = i + 1

Else

ReDim Preserve arrData(i)

arrData(i).ID_Rec = RS("Rec_ID")
arrData(i).Quote_Date = RS("Quote_Date")
arrData(i).Quote_Time = RS("Quote_Time")
arrData(i).Bid_Size = RS("Bid_Size")
arrData(i).Bid_Price = RS("Bid_Price")
arrData(i).Ask_Price = RS("Ask_Price")
arrData(i).Ask_Size = RS("Ask_Size")
arrData(i).Last_Time = RS("Last_Time")
arrData(i).Last_Price = RS("Last_Price")
arrData(i).Last_Size = RS("Last_Size")
arrData(i).Tot_Vol = RS("Total_Vol")
arrData(i).Change = RS("sChange")
arrData(i).High = RS("sHigh")
arrData(i).Low = RS("sLow")
arrData(i).PrevDayClose = RS("Prev_Day_Close")
arrData(i).sOpen = RS("sOpen")


i = i + 1

End If

RS.MoveNext

Loop

End If


RS.Close

and then you store it like this :

sQL = GlobalFilePatch & "\" & s & ".bin"

FileNum = FreeFile

Open sQL For Binary Access Write Lock Read Write As #FileNum

Put #FileNum, , arrData

Close FileNum



--------------

and this is how you fill your array from binary file(note the difference between filling up array from ascii and binary):

Public Function Fill_Binary_Tick(ByVal sPat As String) As Long
Dim NumRec As Long
Dim FileNum As Integer

If Dir(sPat) = "" Then

Fill_Binary_Tick = 0

Exit Function

End If

Erase arrData

FileNum = FreeFile

NumRec = FileLen(sPat) / 76 ' 76 bytes in one line,defined above,remember? in sBin type

ReDim arrData(NumRec - 1)

Open sPat For Binary Access Read As #FileNum

Get #FileNum, , arrData

Close #FileNum

Fill_Binary_Tick = NumRec


End Function
 
Thanks, Bob (and others). I learned some new terms and have some homework ahead of me, but I'm beginning to understand the looping process for multiple files. I'm going to give it a shot because I have additional underused data that I'd like to get into one dbase so I can do something with it.
 
Checkout Stanford's database course in coursera.
It'll teach you what you need to know to build it right from the start, and will end up saving you time and work in the long run.
 
Back
Top