Something different today -- a spinoff of some work I've been doing to analyze my sales statistics and prioritize new publications.

This script imports publisher sales data files from Lightning Source, Inc. into a Pandas dataframe.

Lightning Source is the largest US print on demand vendor and a subsidiary of Ingram, the largest US book distributor.

Pandas is a leading data analytics package for Python.

Using this script has two key benefits:

  1. You can analyze your entire Ingram sales history at once.
  2. You can skip over the tiresome process I went through to learn how to process LSI's "tab-delimited Excel" format into Pandas. Spoiler: it's not Excel, and it doesn't use Python's default encodings.

This version only does the dataframe ingest. Future versions will include tools for analyzing profitability per title, plotting sales charts, and so on.

Installation

  1. Download the script (available below) to a local directory on your PC.

  2. pip install -r requirements.txt installs numpy, pandas, and forex-python.

  3. Python >= 3 and < 3.9 is required until numpy support for Python 3.9 becomes available.

Preparing the Data

Lightning Source / IngramSpark customers can request data reports via the Publisher Compensation report page on the LSI dashboard. You will want to request e-mail delivery in LSI's "tab-delimited Excel" format. Select date range = 1 year, e.g. 1/01/2020 - 12/31/2020. LSI only lets you return 365 days worth of data at a time so to include multiple years you will need to submit multiple requests. If you are signed up for distribution in multiple markets (UK, Global Connect, etc.) click the appropriate checkboxes.

Screen-Shot-2020-11-30-at-4.25.44-PM

Unfortunately, LSI's data file format does not include any internal evidence of the date range included in the data. So you will need to provide this in the file names by saving each year's file as either 2020.xls, 2020UK.xls, or 2020GC.xls. Save all data files in the same directory as lsi2df.py.

Running the Script

You can run the script either from within a Python interpreter or from the command line.

lsi2df.py

Results

The script will loop over all the data files in the current directory and assemble the results into a single Pandas dataframe that covers all years of your Ingram sales data.

Sample output:

fred@Freds-MacBook-Pro:~/bin/nimble/sales_analysis$lsi2df.py 
1 GBP to USD 1.3334075352 as of 2020-11-30 16:13:14.531409
read data for UK  2014 from file  2014UK.xls
read US data for year 2006 from file  2006.xls
read US data for year 2012 from file  2012.xls
read US data for year 2013 from file  2013.xls
read US data for year 2007 from file  2007.xls
read US data for year 2011 from file  2011.xls
read US data for year 2005 from file  2005.xls
read data for UK  2016 from file  2016UK.xls
read US data for year 2004 from file  2004.xls
read US data for year 2010 from file  2010.xls
read data for UK  2012 from file  2012UK.xls
read US data for year 2014 from file  2014.xls
read US data for year 2015 from file  2015.xls
read data for UK  2009 from file  2009UK.xls
read US data for year 2017 from file  2017.xls
read US data for year 2016 from file  2016.xls
read data for UK  2010 from file  2010UK.xls
read data from Global Connect file GC from file  2020GC.xls
read data for UK  2017 from file  2017UK.xls
read data for UK  2015 from file  2015UK.xls
read data for UK  2008 from file  2008UK.xls
read data for UK  2011 from file  2011UK.xls
read data for UK  2013 from file  2013UK.xls
read data for UK  2006 from file  2006UK.xls
read data for UK  2020 from file  2020UK.xls
read data for UK  2019 from file  2019UK.xls
read data for UK  2007 from file  2007UK.xls
read US data for year 2018 from file  2018.xls
read US data for year 2019 from file  2019.xls
read US data for year 2009 from file  2009.xls
read US data for year 2020 from file  2020.xls
read US data for year 2008 from file  2008.xls
read data for UK  2018 from file  2018UK.xls
...

I read all your LSI data files and created a single Pandas dataframe that contains the data for all years.
I enhanced the dataframe by filling in sales quantity and pubcomp NaNs as 0s and converted GBP to USD equivalents at today's exchange rate.
The dataframe has also been saved as df.csv

[2486 rows x 109 columns]

lsi2pdf.py

import pandas as pd
import glob
import os
import argparse 
import numpy as np
from forex_python.converter import CurrencyRates

parser = argparse.ArgumentParser()
parser.add_argument("--infile", help = "seed file", default = 'select_criteria/isbns')
parser.add_argument("--outfile", help = "path to outfile", default = 'results.xlsx')

args = parser.parse_args()

input_file = args.infile
output_file = args.outfile

df = pd.DataFrame()

today = pd.to_datetime("today")
c = CurrencyRates()
cgbp = c.get_rate('GBP', 'USD')

print('1 GBP to USD', cgbp, 'as of', today)

for i in glob.glob(r'*.xls'):

    data = pd.read_csv(i, sep='\t', lineterminator='\r', encoding='cp1252')
    yearval = os.path.splitext(i)[0]
    
    lasttwo = yearval[-2:]
    if  lasttwo == "UK" :
        yearval = yearval[:-2]
        print('read data for UK ', yearval, 'from file ', i)
    elif  lasttwo == "GC" :
        #print(lasttwo, 'lasttwo')
        yearval = yearval[:-2]
        print('read data from Global Connect file', lasttwo, 'from file ', i)
    else:
        print('read US data for year', yearval, 'from file ', i)
        pass
    data.insert(1, 'year', yearval)
    data = data[:-1] #necessary to remove extra lines
    df = df.append(data, ignore_index=True)

print('...')
print('')
print('I read all your LSI data files and created a single Pandas dataframe that contains the data for all years.')


df['USDeq_pub_comp'] = np.where(df['reporting_currency_code']== 'GBP', (df['YTD_pub_comp'] * cgbp).round(2), df['YTD_pub_comp'])


df['YTD_net_quantity'] = df['YTD_net_quantity'].fillna(0.0).astype(int)
df['isbn_13'] = df['isbn_13'].fillna(0).astype(int)
df['YTD_net_quantity'].fillna(0.0).astype(int)
df['YTD_pub_comp'].fillna(0.0).astype(int)
df['USDeq_pub_comp'].fillna(0.0).astype(int)


print ("I enhanced the dataframe by filling in sales quantity and pubcomp NaNs as 0s and converted GBP to USD equivalents at today's exchange rate.")
print("The dataframe has also been saved as df.csv")
print('...')
print('df:')
print(df)

requirements.txt:

numpy
pandas
forex-python