Background
An integration project: exporting a csv file from system A, convert the csv file to xls file, and import the converted file into system B (A 3rd party system).
Environment:
- Python3.7.5
- xls library: xlwt1.3
- Developing on Mac 10.15.1 with VS Code, and finally the project will be published to a Docker environment(ritht now, it’s Python:3.7.5-alpine3.10).
Issue
The converted xls file, system B doesn’t support it. However, the file can be opened in Microsoft Excel, and then system B supports it after the open operation (did nothing, just open it then close)😫.
1st Try
There are several libraries related to xls, such as pyexcel-xls,xlwings, Pandas. (OpenPyXL, xlsxwriter were not considered because the file is a xls file instead of a xlsx file.) I decided to try them one by one:
pyexcel-xls
,Pandas
, the issue is still there. Checked their dependencies, all of them mentioned xlwt.xlwings
, it calls Microsoft Excel and does data operations. I gave it up because this won’t work in my Docker environment.
2nd Try - External systems
Depends on the clue: the file can be supported after it was opened in MS Excel, I was thinking to find an alternative way:
MS flow is a powerful tool as a part of O365 suite. My plan was:
Send the xls file to my email
MS flow grabs the file from the email, and insert a row to the xls file, then save the change
MS flow sends the file back to the email
I had to give it up because MS flow was unable to find the attached file from my email (I have figured the issue out later by creating a new flow. For my next project, I might try MS UI flows 1st).
Web Service
Per Hugo-Images management. Tried my idea by Zamzar - Online file conversion, implemented the function by referring its doc:
Generated a csv file in my code instead of a xls file
Converted the csv file to a xls file by Zamzar
Yes, there is “But”. I found Zamzar free edition license is based on conversation, uploading/downloading files are two conversations. 100 conversations/1 month didn’t meet my expectation, quit again.
3rd - Dig the root cause
Right now, I was aware that I have to sit down and dig the root cause of xlwt.
Generated the xls (File A) again, made a copy of it (file B) and opened file B by Microsoft Excel once.
Compared the File A and File B with Hex Fiend
There are only 3 differences, not too many, all I need to do is to checking them one by one💪.
Open file A and file B by 0xED to check the difference
Hex Fiend told you there are 3 differences, but it’s difficult to find out what the difference exactly is, 0xED did the trick:
Here are the steps I did on xlwt code:
xlwt -> Workbook.py -> func: save()
xlwt -> Workbook.py -> func: save() ->
f.write(stream)
xlwt -> Workbook.py -> func: get_biff_data() ->
before += self.__write_access_rec()
,return BIFFRecords.WriteAccessRecord(self.__owner).get()
xlwt -> BIFFRecords.py ->
class WriteAccessRecord(BiffRecord):
Per the difference, I changed the code:
def __init__(self, owner): uowner = owner[0:0x30] uowner_len = len(uowner) if isinstance(uowner, six.text_type): uowner = uowner.encode('ascii') # probably not ascii, but play it safe until we know more self._rec_data = pack('%ds%ds' % (uowner_len, 0x70 - uowner_len), b'\x16\x00' + uowner, b' '*(0x70 - uowner_len))
Note: The binary string is ‘15 00’ in the screenshot, but it should be “16 00” when I was working on this, that’s why the code and screenshot are inconsistent.
Run my code again, the xls can be support now!!! 🎉🎆😂 I felt I’m so lucky because the issue was resolved by fixing the 1st difference.
Publish the change to Dock environment
Do docker cp
to copy the modified libraries to docker instance…. Just kidding😂
Here what I did:
Fork xlwt code on Github and push my change
Note: I might create a pull request later after I figured out what I changed.
New a pip requirement file: requirement-git.txt with the content:
# xlwt 1.3.0 https://github.com/gainskills/xlwt/zipball/master
Note:
pip freeze > requirement.txt
for all libraries.- Besides xlwt, I also have flower1.0 in requirement-git.txt
git+https://github.com/gainskills/xlwt.git
works for the docker instance which has git installed
run
pip install -r requirement-git.txt
beforepip install -r requirement.txt
Reference
- Compress images with Tinypng API
- PyMOTW-3 struct — Binary Data Structures
- struct — Interpret bytes as packed binary data
Scan the QR code using WeChat