Insert data into excel, calculate+retrieve data using Python

  • Context: Python 
  • Thread starter Thread starter Hercuflea
  • Start date Start date
  • Tags Tags
    Data Excel Python
Click For Summary

Discussion Overview

The discussion revolves around using Python to interact with a Google Sheets document or Excel spreadsheets for data analysis, particularly involving formula-heavy computations. Participants explore methods for inputting data, retrieving outputs, and plotting results using libraries such as Matplotlib, while addressing challenges related to performance and integration with VBA scripts.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant inquires about the feasibility of using Python to input a range of numbers into a spreadsheet, allowing the spreadsheet to compute outputs based on its formulas, which would then be retrieved and plotted.
  • Another participant suggests using Python libraries designed for Excel interaction, mentioning a resource for various options.
  • A different participant shares their experience with 'xlwings' and 'win32com', noting the quirks of each tool and recommending 'win32com' for its simplicity in driving Excel via Python commands.
  • One participant expresses a preference for using NumPy and Matplotlib over VBA, while questioning the possibility of transferring a large array directly from VBA to NumPy without writing to a file.
  • Another participant points out the shift from spreadsheet formulas to VBA scripts in the discussion, reiterating the potential of 'win32com' for simpler tasks and discussing inter-language communication methods using Python's subprocess module.
  • A later reply mentions the participant's decision to switch from 'win32com' to using ODS files and Python, but notes a significant decrease in performance compared to VBA and Excel, attributing the lag to the LibreOffice API calls.

Areas of Agreement / Disagreement

Participants express varying opinions on the best tools and methods for integrating Python with spreadsheets, with no consensus on a single approach. Some prefer 'win32com', while others find it complicated and have opted for alternative methods. The discussion remains unresolved regarding the most efficient way to handle large data arrays between VBA and Python.

Contextual Notes

Participants mention limitations related to the performance of different tools, the complexity of integrating VBA with Python, and the challenges of dynamically updating spreadsheets while allowing them to compute functions. There are also concerns about the handling of large data arrays and the efficiency of different scripting environments.

Hercuflea
Messages
593
Reaction score
49
Hello,

I currently have a Google Sheets document that I want to run some analysis on.

The document is very formula- heavy (not just raw data). It takes a set of input numbers and does some computations and gives me an important output number at the bottom of the spreadsheet. These computations would be really hard to implement in a programming language, I need the cells of a spreadsheet (it is financial data).

What I want to do is use python to run through a range of numbers, input them into the spreadsheet, have the spreadsheet run its formulas on the input, and then have python retrieve the output, store in an array, and plot the results in Matplotlib. I need to do this about 100,000 times (preferable more even)

Is this possible?
 
Technology news on Phys.org
Yes, but you will need to use python libraries made specifically for interfacing with Excel.

Try here: http://www.python-excel.org/

There are several good choices depending on what you want to do specifically.
 
It's been a couple of years since I've done this -- I ultimately found each of the tools to be quirky in their own way. I think I liked 'xlwings'.

Honestly if you want something that's really low level and simple -- and you're on Windows -- I had some luck using win32com. This quite literally is used to 'drive' other programs via python commands. I in particular used it on excel files, though it works on other applications, as I recall.

E.g. here's a youtube video I found on it (and there are probably a lot more).



As always, have a backup / archive of your files before your run a script on them in case it obliterates things due to a bug.
 
Hmmm...Interesting I'll give it a look. I figured out how to write a script in excel VBA for a single case, but I much prefer working with numpy and matplotlib.

All of the python excel packages I've looked at will let you update a "static" excel file, but apparently you can't dynamically update an excel workbook while letting the workbook compute its own functions on the data.

Right now its looking like I'm going to have about a 42 million x 2 single precision array in my VBA script. I would really like a way to just transfer that array directly to numpy from VBA in memory, rather than writing to file and reading in numpy. Do you think that'd possible? I have 16GB of memory.
 
Originally your post said you were calling spreadsheet formulas, but now it seems there are VBA scripts running. If you want something simple, again wincom32 should allow you to do it. E.g.

https://stackoverflow.com/questions...-vba-functions-and-subs-using-python-win32com

If VBA uses C or fortran arrays (i.e. like an actual numeric programming language does), there should be a way to do it 'properly', but I don't know much more than that as I have no interest in VBA.

One common way of communicating across different languages is to use something like Popen and PIPE from the 'subprocess' module in python -- they can basically grab what has been 'printed' by that other program, normally something I'd do for an array with hundreds or thousands of entries -- not something I've done with millions of entries in it but I guess you could give it a shot.
 
After playing with it some more, I decided win32com is too complicated to bother with.

I converted the file to ODS, and I converted my script from VBA to python. Got it running decently but it is ~200x slower running in python and LibreOffice than it was with VBA and Excel! Alas, there's always more work to be done.

I'm going to look into LibreOffice scripting some more. The lag seems to be coming from the calls to the LibreOffice API.
 

Similar threads

  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 7 ·
Replies
7
Views
5K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 6 ·
Replies
6
Views
4K
  • · Replies 29 ·
Replies
29
Views
4K
  • · Replies 3 ·
Replies
3
Views
1K
  • · Replies 11 ·
Replies
11
Views
2K