Building Financial Models Programmatically with Python and openpyxl
Hand-edited Excel models are unmaintainable at scale. Python + openpyxl makes the build script the source of truth. Here's our standard pattern: helpers, named ranges, the no-circularity rule, and how we validate all 18 models in one command.
Every financial model in our catalogue is generated by a Python script using openpyxl. We never edit the .xlsx files by hand. The build script is the source of truth.
This isn't a hipster tooling choice — it's the only way to maintain a catalogue of 18 institutional-grade models without going insane. Here's the pattern.
Why scripts as source of truth
Hand-edited Excel files have four problems at scale:
- No version control. Git can store .xlsx but can't diff it. You can see that the file changed but not what changed.
- Style drift. Five models, five slightly different formats. Looks unprofessional.
- Cascading bugs. Fix a formula in one model, forget to fix it in the other 17.
- No testing. You can't unit-test a hand-edited Excel file. You can unit-test a Python script that generates one.
Build scripts solve all four. Each model has a build_name_vn.py file. Run the script, get the .xlsx. Change the script, regenerate. Commit the script to git.
The standard helper library
Every build script imports the same set of helpers:
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils import get_column_letter
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation
from openpyxl.formatting.rule import CellIsRule, FormulaRule
from openpyxl.workbook.defined_name import DefinedName
# Style constants
NAVY = "1F3864"
MID_BLUE = "2E75B6"
WHITE = "FFFFFF"
BLUE_FONT = "0000FF" # hardcoded inputs
BLACK_FONT = "000000" # formulas
YELLOW_BG = "FFFDE7" # input cell fill
GREEN_PASS = "C6EFCE" # CHECKS pass
RED_FAIL = "FFC7CE" # CHECKS fail
GREEN_TOT = "E2EFDA" # total row fill
# Number formats
FMT_GBP = '#,##0;(#,##0);"-"'
FMT_GBPm = '#,##0.0,,"m";(#,##0.0,,"m");"-"'
FMT_PCT = '0.0%;(0.0%);"-"'
FMT_BPS = '0" bps";(0" bps");"-"'
FMT_DATE = 'mmm-yy'
def inp(ws, row, col, value, fmt=FMT_GBP):
# Hardcoded input cell: blue font, yellow fill.
c = ws.cell(row=row, column=col, value=value)
c.font = Font(name="Arial", size=10, color=BLUE_FONT)
c.fill = PatternFill("solid", fgColor=YELLOW_BG)
c.number_format = fmt
return c
def calc(ws, row, col, formula, fmt=FMT_GBP):
# Formula cell: black font, no fill.
c = ws.cell(row=row, column=col, value=formula)
c.font = Font(name="Arial", size=10, color=BLACK_FONT)
c.number_format = fmt
return c
def label(ws, row, col, text, indent=0, bold=False):
# Text label cell.
c = ws.cell(row=row, column=col, value=text)
c.font = Font(name="Arial", size=10, bold=bold)
c.alignment = Alignment(horizontal="left", indent=indent)
return c
def section_header(ws, row, col_start, col_end, text):
# Dark navy section header.
c = ws.cell(row=row, column=col_start, value=text)
c.font = Font(name="Arial", size=10, bold=True, color=WHITE)
c.fill = PatternFill("solid", fgColor=NAVY)
for col in range(col_start + 1, col_end + 1):
ws.cell(row=row, column=col).fill = PatternFill("solid", fgColor=NAVY)
return c
Three functions handle 80% of cell creation: inp() for hardcoded inputs (blue/yellow), calc() for formulas (black, no fill), label() for text. Consistency across all 18 models comes for free.
Common openpyxl gotchas
Named ranges API changed
The old wb.defined_names.definedName API is deprecated. Use string-key iteration:
# CORRECT
for name in wb.defined_names:
dn = wb.defined_names[name]
# WRONG (raises AttributeError in current openpyxl)
for dn in wb.defined_names.definedName:
...
Formulas are strings, not evaluated
openpyxl writes formula strings to cells but doesn't evaluate them. The cell's value when read back is the formula text, not the result. To get evaluated values, you need to either:
- Open the file in Excel/LibreOffice (which triggers recalc on save)
- Run the file through
libreoffice --headless --convert-to xlsxto force recalc - Use a separate library like
pycel(limited support) orformulas(better but slow)
Our validation pipeline runs every model through headless LibreOffice to force recalc, then reads the values back. Catches formula bugs before shipping.
Performance with large workbooks
openpyxl in default (cell-by-cell) mode is slow for large workbooks. For models with >10,000 formula cells, use write-only mode or batch the cell creation.
Bigger payoff: avoid creating empty cells. openpyxl tracks every accessed cell, so even reading ws.cell(row=1000, col=1) in an empty area inflates the file. Only touch cells you intend to populate.
The no-circularity rule
The hardest discipline in financial modelling: avoiding circular references.
The classic case in banking: interest income depends on the loan balance, but the loan balance depends on interest income (interest gets capitalised, or the cash book grows from interest received).
The naive fix — turn on Excel's iterative calculation — works but creates a brittle model. Audit teams reject models that depend on iterative calculation because they can produce different results based on iteration count.
The clean fix: always reference prior period balance, never current.
# Cell layout: opening, drawdown, repayment, interest, closing
# For row 12 (period 2):
calc(ws, 12, 5, '=I11') # opening = prior closing
calc(ws, 12, 8, '=E12*rate/12') # interest on opening
calc(ws, 12, 9, '=E12+F12-G12+H12') # closing = opening + flows + interest
This pattern works for: lending interest, deposit interest, capital constraints (current period capital from prior period), provision charges. Circular by reference, sequential by construction.
Validation: the CHECKS tab
Every model has a CHECKS tab with 30–40 integrity tests:
- BS balances (assets = liabilities + equity)
- CF closing ties to BS cash
- Debt schedule closing ties to BS debt
- Total interest from debt schedule = total interest in IS
- Scenario switches working (Upside > Base on key metrics)
- No negative equity in Base scenario
Each check returns "PASS" or "FAIL" with conditional formatting (green/red). The whole catalogue is validated by:
cd models && python3 validate_all.py
If any model has a failing CHECK, the build fails. Catches bugs before shipping.
The full pipeline
- Edit build_name_vn.py
- Run
python3 build_name_vn.py— outputs .xlsx - Run
python3 validate_all.py— opens each .xlsx through LibreOffice headless and reads CHECKS tab - If validation passes, commit the build script + .xlsx to git
- Push — CI rebuilds and re-validates as a sanity check
This is how we ship 18 institutional-grade models with two engineers (effectively just me) and zero production bugs in 18 months.
Should you do this?
If you're building 1–2 financial models, no — openpyxl is overkill, just use Excel directly.
If you're building 5+ models that share styling, structure, or reusable calculations, yes — the upfront investment in helpers pays back within the third model.
Our entire catalogue of 18 models uses this approach. If you're curious about the helper library or have specific openpyxl questions, ping us via contact.
Get the SFS Models Catalogue
The model that puts the principles in this post into practice. Bank-grade build, open formulas, no VBA. Same-day delivery.
SFS Models builds institutional-grade Excel financial models for banking and finance professionals. Browse the catalogue or get the free sample.