Something different today -- a spinoff of some work I've been doing to analyze my sales statistics and prioritize new publications.
This script imports publisher sales data files from Lightning Source, Inc. into a Pandas dataframe.
Lightning Source is the largest US print on demand vendor and a subsidiary of Ingram, the largest US book distributor.
Pandas is a leading data analytics package for Python.
Using this script has two key benefits:
- You can analyze your entire Ingram sales history at once.
- You can skip over the tiresome process I went through to learn how to process LSI's "tab-delimited Excel" format into Pandas. Spoiler: it's not Excel, and it doesn't use Python's default encodings.
This version only does the dataframe ingest. Future versions will include tools for analyzing profitability per title, plotting sales charts, and so on.
Download the script (available below) to a local directory on your PC.
pip install -r requirements.txt installs numpy, pandas, and forex-python.
Python >= 3 and < 3.9 is required until numpy support for Python 3.9 becomes available.
Preparing the Data
Lightning Source / IngramSpark customers can request data reports via the Publisher Compensation report page on the LSI dashboard. You will want to request e-mail delivery in LSI's "tab-delimited Excel" format. Select date range = 1 year, e.g. 1/01/2020 - 12/31/2020. LSI only lets you return 365 days worth of data at a time so to include multiple years you will need to submit multiple requests. If you are signed up for distribution in multiple markets (UK, Global Connect, etc.) click the appropriate checkboxes.
Unfortunately, LSI's data file format does not include any internal evidence of the date range included in the data. So you will need to provide this in the file names by saving each year's file as either 2020.xls, 2020UK.xls, or 2020GC.xls. Save all data files in the same directory as lsi2df.py.
Running the Script
You can run the script either from within a Python interpreter or from the command line.
The script will loop over all the data files in the current directory and assemble the results into a single Pandas dataframe that covers all years of your Ingram sales data.
fred@Freds-MacBook-Pro:~/bin/nimble/sales_analysis$lsi2df.py 1 GBP to USD 1.3334075352 as of 2020-11-30 16:13:14.531409 read data for UK 2014 from file 2014UK.xls read US data for year 2006 from file 2006.xls read US data for year 2012 from file 2012.xls read US data for year 2013 from file 2013.xls read US data for year 2007 from file 2007.xls read US data for year 2011 from file 2011.xls read US data for year 2005 from file 2005.xls read data for UK 2016 from file 2016UK.xls read US data for year 2004 from file 2004.xls read US data for year 2010 from file 2010.xls read data for UK 2012 from file 2012UK.xls read US data for year 2014 from file 2014.xls read US data for year 2015 from file 2015.xls read data for UK 2009 from file 2009UK.xls read US data for year 2017 from file 2017.xls read US data for year 2016 from file 2016.xls read data for UK 2010 from file 2010UK.xls read data from Global Connect file GC from file 2020GC.xls read data for UK 2017 from file 2017UK.xls read data for UK 2015 from file 2015UK.xls read data for UK 2008 from file 2008UK.xls read data for UK 2011 from file 2011UK.xls read data for UK 2013 from file 2013UK.xls read data for UK 2006 from file 2006UK.xls read data for UK 2020 from file 2020UK.xls read data for UK 2019 from file 2019UK.xls read data for UK 2007 from file 2007UK.xls read US data for year 2018 from file 2018.xls read US data for year 2019 from file 2019.xls read US data for year 2009 from file 2009.xls read US data for year 2020 from file 2020.xls read US data for year 2008 from file 2008.xls read data for UK 2018 from file 2018UK.xls ... I read all your LSI data files and created a single Pandas dataframe that contains the data for all years. I enhanced the dataframe by filling in sales quantity and pubcomp NaNs as 0s and converted GBP to USD equivalents at today's exchange rate. The dataframe has also been saved as df.csv [2486 rows x 109 columns]
import pandas as pd import glob import os import argparse import numpy as np from forex_python.converter import CurrencyRates parser = argparse.ArgumentParser() parser.add_argument("--infile", help = "seed file", default = 'select_criteria/isbns') parser.add_argument("--outfile", help = "path to outfile", default = 'results.xlsx') args = parser.parse_args() input_file = args.infile output_file = args.outfile df = pd.DataFrame() today = pd.to_datetime("today") c = CurrencyRates() cgbp = c.get_rate('GBP', 'USD') print('1 GBP to USD', cgbp, 'as of', today) for i in glob.glob(r'*.xls'): data = pd.read_csv(i, sep='\t', lineterminator='\r', encoding='cp1252') yearval = os.path.splitext(i) lasttwo = yearval[-2:] if lasttwo == "UK" : yearval = yearval[:-2] print('read data for UK ', yearval, 'from file ', i) elif lasttwo == "GC" : #print(lasttwo, 'lasttwo') yearval = yearval[:-2] print('read data from Global Connect file', lasttwo, 'from file ', i) else: print('read US data for year', yearval, 'from file ', i) pass data.insert(1, 'year', yearval) data = data[:-1] #necessary to remove extra lines df = df.append(data, ignore_index=True) print('...') print('') print('I read all your LSI data files and created a single Pandas dataframe that contains the data for all years.') df['USDeq_pub_comp'] = np.where(df['reporting_currency_code']== 'GBP', (df['YTD_pub_comp'] * cgbp).round(2), df['YTD_pub_comp']) df['YTD_net_quantity'] = df['YTD_net_quantity'].fillna(0.0).astype(int) df['isbn_13'] = df['isbn_13'].fillna(0).astype(int) df['YTD_net_quantity'].fillna(0.0).astype(int) df['YTD_pub_comp'].fillna(0.0).astype(int) df['USDeq_pub_comp'].fillna(0.0).astype(int) print ("I enhanced the dataframe by filling in sales quantity and pubcomp NaNs as 0s and converted GBP to USD equivalents at today's exchange rate.") print("The dataframe has also been saved as df.csv") print('...') print('df:') print(df)
numpy pandas forex-python