Create Non-US data bundle for Zipline

In this post I will create a custom (non-US) data bundle for Zipline. In this case, I will create a data bundle for the Thai stock market.

Here are the steps:

  1. Download price data from Yahoo Finance, which comes in CSV format.
  2. Create a custom bundle module called ‘viacsv’. You can name it anything.
  3. Make Zipline aware of our new bundle by registering it via .zipline/extension.py
  4. Create the bundle
  5. Test our bundle with Zipline

 

STEP 1 – download Yahoo data

Here is an example of a downloaded CSV file. It has a header row followed by one row per trading day, with several columns. In this case, I downloaded ADVANC, which is a big-cap stock in Thailand. The file name is ‘ADVANC.BK.csv’.

Date Open High Low Close Adj Close Volume
1/4/2000 44.599998 46 43 43.400002 15.736162 1039000
1/5/2000 38.200001 41 38 40.599998 14.720927 2624000
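
Before going further, it can be worth sanity-checking the downloaded file with pandas to confirm that the columns and dates parse as expected. This is only a quick check, and it assumes the file sits in your current directory:

import pandas as pd

# Quick sanity check of the Yahoo CSV before ingesting it.
df = pd.read_csv('ADVANC.BK.csv', index_col='Date', parse_dates=True).sort_index()

print df.columns.tolist()  # expect: ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
print df.head()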

 

STEP 2 – create the ‘viacsv’ module to support local CSV files

On Linux, the Zipline installation path is typically:

/usr/local/lib/python2.7/dist-packages/zipline

We need to create a ‘viacsv.py’ file at the path below:

/usr/local/lib/python2.7/dist-packages/zipline/data/bundles/viacsv.py
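
If your Zipline installation lives somewhere else, you can locate the package directory with a one-liner and adjust the path accordingly:

python -c "import zipline, os; print(os.path.dirname(zipline.__file__))"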

The file looks like this; you just have to edit the data path to your own file location. In this case, I use ‘/home/node/stockdata/‘.

If you want fewer log messages, update the line:

boDebug = False  # Set False to get fewer log messages

#
# Ingest stock CSV files to create a Zipline data bundle
#

import os

import numpy as np
import pandas as pd

from zipline.utils.cli import maybe_show_progress

boDebug = True  # Set True to get trace messages


def viacsv(symbols, start=None, end=None):

    # Stick this in memory so that we can iterate over it more than once
    # (it could be a generator, and generators can only be consumed once).
    tuSymbols = tuple(symbols)

    if boDebug:
        print "entering viacsv. tuSymbols=", tuSymbols

    # Define our custom ingest function
    def ingest(environ,
               asset_db_writer,
               minute_bar_writer,  # unused
               daily_bar_writer,
               adjustment_writer,
               calendar,
               cache,
               show_progress,
               output_dir,
               # pass these as defaults to make them 'nonlocal' in py2
               start=start,
               end=end):

        if boDebug:
            print "entering ingest and creating blank dfMetadata"

        dfMetadata = pd.DataFrame(np.empty(len(tuSymbols), dtype=[
            ('start_date', 'datetime64[ns]'),
            ('end_date', 'datetime64[ns]'),
            ('auto_close_date', 'datetime64[ns]'),
            ('symbol', 'object'),
        ]))

        if boDebug:
            print "dfMetadata", type(dfMetadata)
            print dfMetadata.describe
            print

        # daily_bar_writer needs an iterable (a list or a generator) of
        # (sid, DataFrame) tuples, where sid is an integer.
        liData = []
        iSid = 0
        for S in tuSymbols:
            # NOTE: this expects a data file named exactly after the symbol
            # (no extension); rename 'ADVANC.BK.csv' accordingly, or adjust
            # this line to append whatever suffix your files use.
            IFIL = "/home/node/stockdata/" + S
            if boDebug:
                print "S=", S, "IFIL=", IFIL
            dfData = pd.read_csv(IFIL, index_col='Date',
                                 parse_dates=True).sort_index()
            if boDebug:
                print "read_csv dfData", type(dfData), "length", len(dfData)
                print
            dfData.rename(
                columns={
                    'Open': 'open',
                    'High': 'high',
                    'Low': 'low',
                    'Close': 'close',
                    'Volume': 'volume',
                    'Adj Close': 'price',
                },
                inplace=True,
            )
            # Scale volume down; very large raw volumes can overflow the
            # daily bar writer's integer storage.
            dfData['volume'] = dfData['volume'] / 1000
            liData.append((iSid, dfData))

            # The start date is the date of the first trade
            start_date = dfData.index[0]
            if boDebug:
                print "start_date", type(start_date), start_date

            # The end date is the date of the last trade
            end_date = dfData.index[-1]
            if boDebug:
                print "end_date", type(end_date), end_date

            # The auto_close date is the day after the last trade
            ac_date = end_date + pd.Timedelta(days=1)
            if boDebug:
                print "ac_date", type(ac_date), ac_date

            # Update our metadata
            dfMetadata.iloc[iSid] = start_date, end_date, ac_date, S

            iSid += 1

        if boDebug:
            print "liData", type(liData), "length", len(liData)
            print liData
            print
            print "Now calling daily_bar_writer"

        daily_bar_writer.write(liData, show_progress=False)

        # Hardcode the exchange to "YAHOO" for all assets; the trading
        # calendar itself is chosen when the bundle is registered
        # (see extension.py in STEP 3).
        dfMetadata['exchange'] = "YAHOO"

        if boDebug:
            print "returned from daily_bar_writer"
            print "calling asset_db_writer"
            print "dfMetadata", type(dfMetadata)
            print dfMetadata
            print

        # symbol_map maps each symbol to its sid; it is computed here for
        # reference but not passed to asset_db_writer.
        symbol_map = pd.Series(dfMetadata.symbol.index, dfMetadata.symbol)
        if boDebug:
            print "symbol_map", type(symbol_map)
            print symbol_map
            print

        asset_db_writer.write(equities=dfMetadata)

        if boDebug:
            print "returned from asset_db_writer"
            print "calling adjustment_writer"

        adjustment_writer.write()

        if boDebug:
            print "returned from adjustment_writer"
            print "now leaving ingest function"

    if boDebug:
        print "about to return ingest function"

    return ingest

Don’t worry too much about the code above. As long as you edit the file path, it should work correctly.

 

STEP 3 – make Zipline aware of the ‘viacsv’ module

Now move to your home directory and create a ‘.zipline’ folder if it does not already exist (on Linux; in this case, I use /home/toro/.zipline). Inside that folder, create an ‘extension.py’ file with the following content:

from zipline.data.bundles import register 
from zipline.data.bundles.viacsv import viacsv 
from zipline.utils.calendars import get_calendar 
from zipline.utils.calendars import exchange_calendar_lse

eqSym = { 
 "ADVANC", 
}

register( 
 'csv', # name this whatever you like 
 viacsv(eqSym), 
 calendar_name='LSE', 
)
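
After saving extension.py, you can check that Zipline picked up the registration by listing the known bundles:

zipline bundles

The new ‘csv’ bundle should appear in the list alongside the built-in ones.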

 

STEP 4 – create the bundle

zipline ingest -b csv

If you get an error, it is very likely that you are using a trading calendar that does not match your CSV data. In my case, I downloaded ADVANC data, which follows the Thai stock market trading calendar, and that calendar is different from the US one. So we have to modify the trading calendar before we create the bundle.

This is the calendar file that you need to modify. Don’t forget to back it up before you modify it.

/usr/local/lib/python2.7/dist-packages/zipline/utils/calendars/exchange_calendar_lse.py
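
To give a rough idea of the kind of edit involved, here is a sketch only: the class and property names follow the pattern used by Zipline’s built-in calendar classes, and the holiday rule below is just a placeholder, not the real SET holiday schedule. The changes might look something like this:

# Sketch of edits inside exchange_calendar_lse.py (back the file up first).
from datetime import time

from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday
from pytz import timezone

# TradingCalendar is already imported in the original file.
from zipline.utils.calendars.trading_calendar import TradingCalendar


class LSEExchangeCalendar(TradingCalendar):
    # Keep the original class name so the 'LSE' registration still resolves.

    @property
    def tz(self):
        return timezone('Asia/Bangkok')  # was 'Europe/London'

    @property
    def open_time(self):
        return time(10, 0)   # approximate SET open

    @property
    def close_time(self):
        return time(16, 30)  # approximate SET close

    @property
    def regular_holidays(self):
        # Replace the UK holiday rules with Thai market holidays here;
        # the single entry below is only a placeholder.
        return AbstractHolidayCalendar(rules=[
            Holiday('New Years Day', month=1, day=1),
            # ... remaining SET holidays ...
        ])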

 

Back-Testing Non-US data with Zipline

I created this post to share how we can use Zipline to back-test non-US data. Zipline is developed by a company called Quantopian, which open-sourced it so that retail traders can use it for stock back-testing. However, it only supports US market data out of the box. Fortunately, there are some things we can do to make it work with non-US data.

I am going to make Zipline work with Thai stock data because I am a professional investor in Thailand and want Zipline to be my main tool for checking whether my trading strategies are sound for the Thai stock market.

I assume that you are familiar with Python and Zipline and know how to install packages from the command line. If not, please check my other posts to see how to set up Zipline.

Here are the steps:

1) Downloaded data from Yahoo into a CSV file
2) Implemented the steps to ingest custom data from this link:
3) Ran the ingest command using the LSE calendar (learn more about the LSE calendar at this link)

from zipline.data.bundles import register 
from zipline.data.bundles.viacsv import viacsv 
from zipline.utils.calendars import get_calendar 
from zipline.utils.calendars import exchange_calendar_lse

eqSym = { 
 "CPI", 
}

register( 
 'csv2', # name this whatever you like 
 viacsv(eqSym), 
 calendar_name='LSE', 
)

4) Implemented the following code; so far, bundle_data.equity_daily_bar_reader.trading_calendar.all_sessions has returned a UK-looking calendar

import os

from zipline.data.bundles import load
from zipline.data.data_portal import DataPortal
from zipline.finance.trading import TradingEnvironment
from zipline.pipeline.loaders import USEquityPricingLoader
from zipline.utils.calendars import get_calendar

# make_choose_loader and parse_sqlite_connstr are helper functions assumed
# to be defined elsewhere in this setup; they are not shown here.

bundle_data = load('csv2', os.environ, None)
cal = bundle_data.equity_daily_bar_reader.trading_calendar.all_sessions
pipeline_loader = USEquityPricingLoader(bundle_data.equity_daily_bar_reader,
                                        bundle_data.adjustment_reader)
choose_loader = make_choose_loader(pipeline_loader)
env = TradingEnvironment(
    bm_symbol='^FTSE',
    exchange_tz='Europe/London',
    asset_db_path=parse_sqlite_connstr(bundle_data.asset_finder.engine.url))

data = DataPortal(
    env.asset_finder, get_calendar("LSE"),
    first_trading_day=bundle_data.equity_minute_bar_reader.first_trading_day,
    equity_minute_reader=bundle_data.equity_minute_bar_reader,
    equity_daily_reader=bundle_data.equity_daily_bar_reader,
    adjustment_reader=bundle_data.adjustment_reader,
)
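
With the ‘csv2’ bundle ingested, you can also run a back-test straight from the command line. Below is a minimal sketch; the algorithm file name, dates, and output file are only examples, so adjust them to match your data range. Depending on your Zipline version, you may also need to point zipline run at the LSE calendar (via its --trading-calendar option, if available).

# algo_cpi.py - minimal buy-and-hold test against the 'csv2' bundle
from zipline.api import order, record, symbol


def initialize(context):
    context.asset = symbol('CPI')


def handle_data(context, data):
    order(context.asset, 10)  # buy 10 shares every bar
    record(price=data.current(context.asset, 'price'))

Run it against the bundle with:

zipline run -f algo_cpi.py -b csv2 --start 2015-1-1 --end 2016-1-1 -o result.pickle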
