from https://pypi.org/project/tqdm/:
import pandas as pd
import numpy as np
from tqdm import tqdm
df = pd.DataFrame(np.random.randint(0, 100, (100000, 6)))
tqdm.pandas(desc="my bar!")p`
df.progress_apply(lambda x: x**2)
I took this code and edited it so that I create a DataFrame from load_excel rather than using random numbers:
import pandas as pd
from tqdm import tqdm
import numpy as np
filename="huge_file.xlsx"
df = pd.DataFrame(pd.read_excel(filename))
tqdm.pandas()
df.progress_apply(lambda x: x**2)
This gave me an error, so I changed df.progress_apply to this:
df.progress_apply(lambda x: x)
Here is the final code:
import pandas as pd
from tqdm import tqdm
import numpy as np
filename="huge_file.xlsx"
df = pd.DataFrame(pd.read_excel(filename))
tqdm.pandas()
df.progress_apply(lambda x: x)
This results in a progress bar, but it doesn't actually show any progress, rather it loads the bar, and when the operation is done it jumps to 100%, defeating the purpose.
My question is this: How do I make this progress bar work?
What does the function inside of progress_apply actually do?
Is there a better approach? Maybe an alternative to tqdm?
Any help is greatly appreciated.
Will not work. pd.read_excel
blocks until the file is read, and there is no way to get information from this function about its progress during execution.
It would work for read operations which you can do chunk wise, like
chunks = []
for chunk in pd.read_csv(..., chunksize=1000):
update_progressbar()
chunks.append(chunk)
But as far as I understand tqdm
also needs the number of chunks in advance, so for a propper progress report you would need to read the full file first....