Data Analysis with Python: Calculating Mean and Median

Well, the entire quant community these days uses pandas and NumPy as its primary data manipulation and analysis tooling, so…

Agreed. It is just easier to import one of those libraries as we will eventually need them for extra calculations.

Whilst it is true that we won't need them for basic statistics, those are just the starting steps of a program and those averages will be later used in more complex calculations, normally done in pandas or numpy.
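
To make that concrete, here is a minimal sketch (an illustration, not the code under discussion) that computes the mean and median of a small sample list three ways: with the standard library's statistics module, with NumPy, and with pandas. The variable names are only for illustration.

```python
import statistics

import numpy as np
import pandas as pd

# Sample values, used purely for illustration
data = [12, 45, 67, 23, 41, 89, 34, 54, 21]

# Standard library: no third-party dependency needed for basic statistics
stdlib_mean = statistics.mean(data)
stdlib_median = statistics.median(data)

# NumPy: same results, and the array is ready for further vectorized work
arr = np.array(data)
np_mean = arr.mean()
np_median = np.median(arr)

# pandas: same results, with an index available for later labelled slicing
prices = pd.Series(data, name="Price")
pd_mean = prices.mean()
pd_median = prices.median()

print(stdlib_mean, stdlib_median)
print(np_mean, np_median)
print(pd_mean, pd_median)
```

All three agree on the numbers, so the choice mostly comes down to what the later calculations will need.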
 
The example also doesn't really show how awesome Pandas is. How about taking each value in the list and assigning it to a business day, starting today and going forward, then calculating the mean between 10-10-23 and 10-16-23? And then plotting the values between 10/10 and 10/16? It's all so easy to do and easy to understand.

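import pandas as pd

data = [12, 45, 67, 23, 41, 89, 34, 54, 21]

# One business day per value, starting 2023-10-05 (freq='B' skips weekends)
dates = pd.date_range(start="10/5/2023", periods=len(data), freq='B')
df = pd.DataFrame(data, columns=['Price'], index=dates)

# Label-based date slicing includes both endpoints
window = df.loc['2023-10-10':'2023-10-16', 'Price']
print(window.mean())
window.plot()  # requires matplotlib to be installed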

NumPy is widely used and a fantastic tool. Pandas is fast losing popularity though; even after moving to Arrow, it's slow and a massive memory hog.
 
I remember you use DuckDB, right? Mostly for data storage? I find Polars very performant for quick analytics, and I use ClickHouse for data storage and transformations/aggregations.

Yes, I migrated to DuckDB for persistence and Polars for analysis, although the line between the two isn't clear. Technically I should be able to stay with just DuckDB, but I'm tired of rewriting stuff.
I really like DuckDB being file-based rather than a server. Since I'm integrating it with a GUI app, making and restoring backups is just a file copy, which is very convenient.
Being the noob that I am, it's 2023 and I'm only now getting familiar with SQL.
 
File-based is definitely convenient. I can't count the hours I've spent setting up retention, backup, and migration scripts and policies for several databases.

I wonder how good DuckDB's compression is for large datasets of tick and OHLC data.

A quick thanks to @M.W. and @d08

This year, I've been updating my dev skills, ...
Things to install this weekend... Polars and DuckDB!!

A thank you in the current ET environment?
Yup.

I haven't seen any comprehensive comparisons. If you decide to compare, it would be very interesting to see the results.