How many times have you exported data and realized a little too late that the leading zeros in constituent IDs were no longer showing up?
I know I have, and much to my chagrin I have had to re-send the file or spend time working through a process of making Excel or CSV format the leading zero appropriately. I found a way using the Python package,
pandas that makes leading zero formatting fairly easy and you don’t have to add zeros or concat two cells together like you may consider doing in using Excel.
Below is an example data and the the steps that I followed to when I found that zeros where disappearing in my Excel document.
Here is my “data” with the IDs formatted as text so the leading zeros are visible. When I saved this workbook, I saved it as an Excel file.
I then imported pandas in my Jupyter Notebook, as I would normally when loading a dataframe for work and before preparing a file to be used in SPSS. I wanted to see the first few lines of the file using the
DataFrame.head() method and saw that the leading zeros had been dropped.
After seeing that the leading zeros were not showing I checked the Excel file to confirm that the IDs were saved as text. Unfortunately, the ID column was saved as text, so there was nothing to alter there.
My next attempt at resolving the issue involved an reading an Excel file, and attempting to change the ID format to string. Regrettably that lead to an error that ended with
'The 'dtype' option is not supported with the 'python' engine.
I then read documentation about the method of reading CSVs in pandas. From there, I saw that my earlier code, the object defining the datatype (
dtype), was meant to be used with with the function
read_csv(). Excited, I saved my example data as a CSV, and reloaded the dataframe and applied the datatype object. In the example below, you can see that once the
dtype is applied, the leading zeros are visible in the dataset.
After getting past the initial hurdle of finding out how I can format specific documents I found this object pretty quick and efficient. How have you found the best way to work with leading zeros, especially if you can only export into Excel?