Home · Blog

Building Financial Models Programmatically with Python and openpyxl

Published 2026-04-22 · SFS Models

Hand-edited Excel models are unmaintainable at scale. Python + openpyxl makes the build script the source of truth. Here's our standard pattern: helpers, named ranges, the no-circularity rule, and how we validate all 18 models in one command.

Every financial model in our catalogue is generated by a Python script using openpyxl. We never edit the .xlsx files by hand. The build script is the source of truth.

This isn't a hipster tooling choice — it's the only way to maintain a catalogue of 18 institutional-grade models without going insane. Here's the pattern.

Why scripts as source of truth

Hand-edited Excel files have four problems at scale:

  1. No version control. Git can store .xlsx but can't diff it. You can see that the file changed but not what changed.
  2. Style drift. Five models, five slightly different formats. Looks unprofessional.
  3. Cascading bugs. Fix a formula in one model, forget to fix it in the other 17.
  4. No testing. You can't unit-test a hand-edited Excel file. You can unit-test a Python script that generates one.

Build scripts solve all four. Each model has a build_name_vn.py file. Run the script, get the .xlsx. Change the script, regenerate. Commit the script to git.

The standard helper library

Every build script imports the same set of helpers:

from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils import get_column_letter
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation
from openpyxl.formatting.rule import CellIsRule, FormulaRule
from openpyxl.workbook.defined_name import DefinedName

# Style constants
NAVY       = "1F3864"
MID_BLUE   = "2E75B6"
WHITE      = "FFFFFF"
BLUE_FONT  = "0000FF"   # hardcoded inputs
BLACK_FONT = "000000"   # formulas
YELLOW_BG  = "FFFDE7"   # input cell fill
GREEN_PASS = "C6EFCE"   # CHECKS pass
RED_FAIL   = "FFC7CE"   # CHECKS fail
GREEN_TOT  = "E2EFDA"   # total row fill

# Number formats
FMT_GBP    = '#,##0;(#,##0);"-"'
FMT_GBPm   = '#,##0.0,,"m";(#,##0.0,,"m");"-"'
FMT_PCT    = '0.0%;(0.0%);"-"'
FMT_BPS    = '0" bps";(0" bps");"-"'
FMT_DATE   = 'mmm-yy'

def inp(ws, row, col, value, fmt=FMT_GBP):
    # Hardcoded input cell: blue font, yellow fill.
    c = ws.cell(row=row, column=col, value=value)
    c.font = Font(name="Arial", size=10, color=BLUE_FONT)
    c.fill = PatternFill("solid", fgColor=YELLOW_BG)
    c.number_format = fmt
    return c

def calc(ws, row, col, formula, fmt=FMT_GBP):
    # Formula cell: black font, no fill.
    c = ws.cell(row=row, column=col, value=formula)
    c.font = Font(name="Arial", size=10, color=BLACK_FONT)
    c.number_format = fmt
    return c

def label(ws, row, col, text, indent=0, bold=False):
    # Text label cell.
    c = ws.cell(row=row, column=col, value=text)
    c.font = Font(name="Arial", size=10, bold=bold)
    c.alignment = Alignment(horizontal="left", indent=indent)
    return c

def section_header(ws, row, col_start, col_end, text):
    # Dark navy section header.
    c = ws.cell(row=row, column=col_start, value=text)
    c.font = Font(name="Arial", size=10, bold=True, color=WHITE)
    c.fill = PatternFill("solid", fgColor=NAVY)
    for col in range(col_start + 1, col_end + 1):
        ws.cell(row=row, column=col).fill = PatternFill("solid", fgColor=NAVY)
    return c

Three functions handle 80% of cell creation: inp() for hardcoded inputs (blue/yellow), calc() for formulas (black, no fill), label() for text. Consistency across all 18 models comes for free.

Common openpyxl gotchas

Named ranges API changed

The old wb.defined_names.definedName API is deprecated. Use string-key iteration:

# CORRECT
for name in wb.defined_names:
    dn = wb.defined_names[name]

# WRONG (raises AttributeError in current openpyxl)
for dn in wb.defined_names.definedName:
    ...

Formulas are strings, not evaluated

openpyxl writes formula strings to cells but doesn't evaluate them. The cell's value when read back is the formula text, not the result. To get evaluated values, you need to either:

Our validation pipeline runs every model through headless LibreOffice to force recalc, then reads the values back. Catches formula bugs before shipping.

Performance with large workbooks

openpyxl in default (cell-by-cell) mode is slow for large workbooks. For models with >10,000 formula cells, use write-only mode or batch the cell creation.

Bigger payoff: avoid creating empty cells. openpyxl tracks every accessed cell, so even reading ws.cell(row=1000, col=1) in an empty area inflates the file. Only touch cells you intend to populate.

The no-circularity rule

The hardest discipline in financial modelling: avoiding circular references.

The classic case in banking: interest income depends on the loan balance, but the loan balance depends on interest income (interest gets capitalised, or the cash book grows from interest received).

The naive fix — turn on Excel's iterative calculation — works but creates a brittle model. Audit teams reject models that depend on iterative calculation because they can produce different results based on iteration count.

The clean fix: always reference prior period balance, never current.

# Cell layout: opening, drawdown, repayment, interest, closing
# For row 12 (period 2):
calc(ws, 12, 5, '=I11')              # opening = prior closing
calc(ws, 12, 8, '=E12*rate/12')      # interest on opening
calc(ws, 12, 9, '=E12+F12-G12+H12')  # closing = opening + flows + interest

This pattern works for: lending interest, deposit interest, capital constraints (current period capital from prior period), provision charges. Circular by reference, sequential by construction.

Validation: the CHECKS tab

Every model has a CHECKS tab with 30–40 integrity tests:

Each check returns "PASS" or "FAIL" with conditional formatting (green/red). The whole catalogue is validated by:

cd models && python3 validate_all.py

If any model has a failing CHECK, the build fails. Catches bugs before shipping.

The full pipeline

  1. Edit build_name_vn.py
  2. Run python3 build_name_vn.py — outputs .xlsx
  3. Run python3 validate_all.py — opens each .xlsx through LibreOffice headless and reads CHECKS tab
  4. If validation passes, commit the build script + .xlsx to git
  5. Push — CI rebuilds and re-validates as a sanity check

This is how we ship 18 institutional-grade models with two engineers (effectively just me) and zero production bugs in 18 months.

Should you do this?

If you're building 1–2 financial models, no — openpyxl is overkill, just use Excel directly.

If you're building 5+ models that share styling, structure, or reusable calculations, yes — the upfront investment in helpers pays back within the third model.

Our entire catalogue of 18 models uses this approach. If you're curious about the helper library or have specific openpyxl questions, ping us via contact.

Get the SFS Models Catalogue

The model that puts the principles in this post into practice. Bank-grade build, open formulas, no VBA. Same-day delivery.

View SFS Models Catalogue   Try Free Sample

SFS Models builds institutional-grade Excel financial models for banking and finance professionals. Browse the catalogue or get the free sample.