Saturday, July 28, 2018

Data Migration: Combining 105 Excel Files in 3 Minutes

I needed to import multiple Excel files into AX;  all of them had a very similar format with the same columns.  Python to the rescue.  Ever since Build 2018 I've been using Visual Studio Code and enjoyed every moment of it, as I build Python scripts to assist me with anything I can think of.  I highly suggest that tool, and the Cobalt2 theme, for increased productivity and debugging- as a gateway drug to Python addiction(!).

This script will combine all of the files within a directory into a file, Combined.xls.  Install Python 3 then install xlwt and xlrd using pip.  The os module is part of the main Python installation already.  Create a python file (*.py) with the following code.  Then change the settings to suit your situation as well as the logic for skipping of the header record.

import os
import xlwt
import xlrd

# Settings
directory_path = r'~\Data Migration\RouteOpr'
combined_fullpath = r'~\Data Migration\RouteOpr\Combined.xls'
path_sep = '\\'
import_sheet_name = 'Template'
num_files_to_combine = 99999

print('Combining files in {}'.format(directory_path))

# Create workbook
wkbk = xlwt.Workbook()
outsheet = wkbk.add_sheet('Combined')

# Go through each file and add it to the combined file
outrow_idx = 0
file_num = 0
num_rows_this_file = 0
for root, dirs, files in os.walk(directory_path):
    for filename in files:
        file_num += 1
        if file_num > num_files_to_combine:
            break

        print('...{}'.format(filename),end='',flush=True)
        f = root + path_sep + filename

        num_rows_this_file = 0
        try:
            insheet = xlrd.open_workbook(f).sheet_by_name('Template')
        except xlrd.XLRDError as e:
            print('ERROR (skipping file): ',end='')
            print(e)
            continue
        
        for row_idx in range(insheet.nrows):
            num_rows_this_file += 1
            outrow_idx += 1

            # Skip header rows after the first file
            if file_num > 1 and row_idx < 10 and (insheet.cell_value(row_idx, 0) in ('AccError','Accumulated') or insheet.cell_value(row_idx, 1) == 'if configured Item will be required.'):   # Modify this line to skip any header records
                outrow_idx -= 1
                continue

            for col_idx in range(insheet.ncols):
                outsheet.write(outrow_idx, col_idx, insheet.cell_value(row_idx, col_idx))

            outsheet.write(outrow_idx, insheet.ncols, filename)
        print('   {:,} rows'.format(num_rows_this_file))

#save the file
wkbk.save(combined_fullpath)
print('Saved combined file to {}'.format(combined_fullpath))
print('Files combined successfully')

The output will look like -

Combining files in XXXX
Migration\RouteOpr\Test_2018-07-19
...Route Operations Template A.xlsx   298 rows
...Route operations template B.xlsx   282 rows
...Route operations template C.xlsx   173 rows
...Route operations template D.xlsx   1,874 rows
{...}
...Route operations template CD.xlsx   36 rows
Saved combined file to XXXXX\Data Migration\RouteOpr\Test_2018-07-19\Combined.xls
Files combined successfully

Cheers!