Performance¶
A short list of things that move the needle on real models.
Threading¶
polar_high sets POLARS_MAX_THREADS=1 at import time (via `setdefault`). On the
indexed-LP build patterns the engine drives, polars' Rayon coordination
overhead consistently exceeds the parallel speedup, so single-threaded
execution is both faster and leaner. The benchmark page has the numbers; the
short version is that polars threading speeds up matrix assembly by maybe
10–15 % at large N, while costing per-thread scratch memory at every N.
To override:
```shell
export POLARS_MAX_THREADS=8
```

```python
# or programmatically, BEFORE importing polar_high
import os
os.environ["POLARS_MAX_THREADS"] = "8"

import polar_high  # picks up your override; setdefault no-ops
```
The env var must be set before any import of polars (or anything that imports polars) — polars reads it once at module load. If you've already imported polars elsewhere in your process, the default is locked in.
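Why setting the variable first wins can be shown with plain `os.environ` semantics (a minimal sketch; `polar_high` itself is not imported here):

```python
import os

# simulate the user override happening first
os.environ["POLARS_MAX_THREADS"] = "8"

# what polar_high does at import time: setdefault only writes the key
# if it is absent, so an existing override survives untouched
os.environ.setdefault("POLARS_MAX_THREADS", "1")

print(os.environ["POLARS_MAX_THREADS"])  # → 8
```

If the order is reversed, `setdefault` wins instead: the key already exists by the time your assignment runs, but polars has already read it.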
Solver options dominate¶
For LPs of meaningful size, HiGHS options matter more than build-side micro-optimizations. Two switches that have moved end-to-end runtime by 2–3× on real models:
```python
p.set_solver_options({"presolve": "off"})    # for warm chains
p.set_solver_options({"solver": "simplex"})  # for LPs with simplex-friendly structure
```
- Default `presolve=on` is great for cold solves but discards basis information. If you are warm-starting via `WarmProblem`, turning presolve off is usually a win.
- `solver=ipm` (interior point) can be much faster on large LPs without warm starts, but provides no basis to re-use.
Always benchmark your specific model — these are starting points, not rules.
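A small timing harness makes that comparison concrete (a sketch; `problem_factory` is a hypothetical callable that builds a fresh problem each call, so every run is a cold build):

```python
import time

def time_solve(problem_factory, options):
    """Build a fresh problem, apply one HiGHS option set, and time the solve."""
    p = problem_factory()              # hypothetical: returns a new Problem
    p.set_solver_options(options)
    t0 = time.perf_counter()
    p.solve()
    return time.perf_counter() - t0

# example sweep (build_problem is your own constructor):
# for opts in ({"presolve": "off"}, {"solver": "simplex"}):
#     print(opts, time_solve(build_problem, opts))
```

Rebuilding the problem inside the harness matters: reusing one problem object would let warm state from the first solve leak into the second measurement.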
Build path¶
The kernel goes index frames → polars joins → coordinate-format
(COO) triples → compressed-sparse-column (CSC) → HiGHS passModel.
The hot loops are:
- Constraint loop (`Problem.solve` walks `_cstrs`, joining row indices with each LHS term, building COO triples).
- COO → CSC (`numpy` lexsort + cumsum + populating the `HighsLp` struct).
- `passModel` (a single C call into HiGHS).
For models in the 10⁴–10⁵ row range, the constraint loop dominates; beyond that, HiGHS run time dominates and build cost is noise.
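The COO → CSC step can be sketched in plain numpy (my own minimal reconstruction, not the kernel's actual code): sort the triples with `lexsort` so the column is the primary key, then turn per-column nonzero counts into column pointers with a cumulative sum.

```python
import numpy as np

def coo_to_csc(rows, cols, vals, ncols):
    """Sort COO triples into CSC order and build the column-pointer array."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    vals = np.asarray(vals, dtype=np.float64)
    order = np.lexsort((rows, cols))   # last key (cols) is the primary key
    indices = rows[order]              # CSC row indices, column-major order
    data = vals[order]                 # matching nonzero values
    indptr = np.zeros(ncols + 1, dtype=np.int64)
    np.cumsum(np.bincount(cols, minlength=ncols), out=indptr[1:])
    return indptr, indices, data
```

`indptr[j]:indptr[j+1]` then slices out column `j`'s entries, which is exactly the layout a CSC consumer expects.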
polars patterns that help¶
- Use `LazyFrame`s for derived Params. The kernel internally stores Params lazily; if you compute composite Params with `Param * Param / Param`, that chain stays lazy until consumption.
- Pre-build index frames once and reuse them for every constraint that shares an index. Building the same `(n, d, t)` frame 30 times is cheap individually but adds up.
- Densify Params at the boundary, not inside the kernel. Inner joins drop missing cells (see warning), so if you need zero-fill, do the `left_join`/`fill_null(0)` once before constructing the Param.
Profiling¶
Problem.solve() is straightforward to profile with cProfile or
py-spy. The two phases worth timing separately are:
- build: everything before `h.run()`;
- run: HiGHS itself.
If HiGHS dominates, your time is best spent on solver options. If build dominates, look at the constraint loop and at any Param chains that are accidentally re-collected on each access.
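A minimal wrapper for that split might look like this (a sketch; it works for any object exposing a `solve()` method, and you read the build/run split off the cumulative-time column):

```python
import cProfile
import io
import pstats

def profile_solve(p, top=15):
    """Run p.solve() under cProfile and return the top cumulative-time
    entries as text; compare the h.run frame against build-side frames."""
    prof = cProfile.Profile()
    prof.enable()
    p.solve()
    prof.disable()
    buf = io.StringIO()
    pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()
```

For wall-clock sampling without touching the code, `py-spy dump`/`py-spy record` on the running process gives the same split with less overhead.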
Releasing memory between solves¶
In a long-running rolling-horizon or parameter-sweep loop, intermediate
polars frames can stick around longer than you expect. The kernel does
not call gc.collect() for you — that would pay a full mark-and-sweep
without freeing the polars/Arrow buffers, which are released when their
owning Python references drop. The right pattern is at the call site:
```python
import gc

for window in windows:
    p = build_problem(window)
    sol = p.solve()
    write_outputs(sol)
    del p, sol    # drop refs to LP, intermediates, Solution
    gc.collect()  # optional; only useful if reference cycles linger
```
del-then-gc.collect() is most useful when you've observed RSS
growing across iterations; in plain loops the refcount drop on
re-binding is usually enough.
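If you do suspect intermediates are surviving iterations, the stdlib `tracemalloc` gives a quick read before reaching for RSS tools (a sketch with a plain list standing in for the per-window frames):

```python
import gc
import tracemalloc

tracemalloc.start()
for _ in range(3):
    intermediates = [0.0] * 200_000   # stand-in for per-window frames
    del intermediates                 # drop refs, as in the solve loop
    gc.collect()                      # optional, as above
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
# current should sit far below peak if references really dropped
```

Note `tracemalloc` only sees Python-side allocations; Arrow buffers held by polars show up in RSS but not here, so a gap between the two is itself a useful signal.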