Changelog

[3.4.0] — 2026-07-01

Added

  • InOutStabilizer (polar_high.decomposition) — a domain-agnostic in-out separation point picker (Ben-Ameur & Neto 2007) for cutting-plane / Benders drivers. Cutting-plane methods that generate each cut at the raw master vertex f_out tail off badly when the recourse is flat in the coupling variable (dual-degenerate slopes): the master wanders among cost-equivalent vertices and the bound closes very slowly. The stabilizer instead returns an interior separation point f_sep = λ·centre + (1-λ)·f_out at which to generate the cut — better-centred cuts, no wandering, faster bound closure, at zero extra subproblem solves. Domain-free (operates only on {col_id -> value} point dicts and scalar weights); constructed ONE PER REGION. λ = 0.0 is a verbatim no-op (returns its input unchanged, so byte-parity with exact Benders holds by construction). Convergence guarantee: the moment a region's cut fails to separate its f_out, the next separation_point returns the master point verbatim (λ = 0 ⇒ exact Benders), and the null-step weight-shrink bottoms out at a forced 0 rather than a positive floor. Stateful, deterministic, side-effect-free.
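A minimal sketch of the separation-point formula above, assuming points are plain {col_id: value} dicts as described. The name in_out_point is hypothetical and is not the InOutStabilizer API; it only illustrates the convex combination and the λ = 0 no-op.

```python
def in_out_point(centre, f_out, lam):
    """Return lam*centre + (1-lam)*f_out over {col_id: value} dicts.

    Hypothetical helper for illustration only; not part of polar_high.
    """
    if lam == 0.0:
        # Verbatim no-op: hand back the master point itself (exact Benders).
        return f_out
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if centre.keys() != f_out.keys():
        raise ValueError("centre and f_out must cover the same col_ids")
    return {cid: lam * centre[cid] + (1.0 - lam) * f_out[cid] for cid in f_out}


centre = {0: 2.0, 1: 4.0}   # assumed stabilising centre for this region
f_out = {0: 10.0, 1: 0.0}   # raw master vertex
f_sep = in_out_point(centre, f_out, 0.25)
```

With λ = 0.25 the cut would be generated a quarter of the way from the master vertex towards the centre; with λ = 0.0 the caller gets the very same dict back.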

[3.3.0] — 2026-07-01

Added

  • StallMonitor / StallVerdict (polar_high.decomposition) — a domain-agnostic tail-off / stall detector for cutting-plane / Benders drivers (the outer loop that consumes WarmProblem.add_cut_row / add_recourse_col). It notices when the outer iteration has stopped making progress and lets the driver bail out with a diagnostic instead of silently burning the iteration cap. The monitor knows only the two scalars every Benders-style loop already maintains — a lower bound and a best-so-far upper bound — plus one caller-supplied reference_scale (a "sane objective magnitude" the driver computes from its own problem); it carries no domain concepts. A run is declared stalled only when a CONJUNCTION holds over a full trailing window: the relative gap exceeds gap_floor (far from converged), the best upper bound has not improved by more than min_rel (incumbent frozen), and the best upper bound is still above blowup_mult * reference_scale (frozen far above any sane magnitude — the penalty / complete-recourse regime). The conjunction is what separates a genuine tail-off from the benign frozen windows a converging run exhibits (early blow-up that shrinks fast, short flat stretches at a sane magnitude). Stateful (bounded deque), deterministic, and side-effect-free — it only reports.
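The conjunction above can be sketched as follows. This is an illustration under stated assumptions, not the StallMonitor implementation: the class name, the update method, and all default values are hypothetical; only the three conditions and the bounded deque come from the description.

```python
from collections import deque


class ToyStallMonitor:
    """Hypothetical stall detector; not the polar_high StallMonitor API."""

    def __init__(self, reference_scale, window=5, gap_floor=0.05,
                 min_rel=1e-3, blowup_mult=100.0):
        self.reference_scale = reference_scale
        self.gap_floor = gap_floor
        self.min_rel = min_rel
        self.blowup_mult = blowup_mult
        self._best_ub = float("inf")
        # Bounded history of (lower_bound, best_upper_bound) pairs.
        self._hist = deque(maxlen=window)

    def update(self, lower_bound, upper_bound):
        """Record one outer iteration; return True when the run looks stalled."""
        self._best_ub = min(self._best_ub, upper_bound)
        self._hist.append((lower_bound, self._best_ub))
        if len(self._hist) < self._hist.maxlen:
            return False  # not a full trailing window yet
        oldest_ub = self._hist[0][1]
        newest_ub = self._hist[-1][1]
        denom = max(abs(oldest_ub), 1e-12)
        far_from_converged = all(
            (ub - lb) / max(abs(ub), 1e-12) > self.gap_floor
            for lb, ub in self._hist
        )
        incumbent_frozen = (oldest_ub - newest_ub) / denom <= self.min_rel
        # The best upper bound never increases, so checking the newest
        # value covers the whole window.
        blown_up = newest_ub > self.blowup_mult * self.reference_scale
        return far_from_converged and incumbent_frozen and blown_up


frozen = ToyStallMonitor(reference_scale=1.0, window=3)
frozen_verdicts = [frozen.update(0.0, 1e6) for _ in range(3)]

shrinking = ToyStallMonitor(reference_scale=1.0, window=3)
shrinking_verdicts = [shrinking.update(0.0, ub) for ub in (1e6, 1e3, 10.0)]
```

The first run (upper bound frozen at a huge value) trips the monitor once the window is full; the second (early blow-up that shrinks fast) never does, because the incumbent keeps improving.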

[3.2.0] — 2026-06-30

Changed

  • Autoscale Layer 3 now centres the objective over HiGHS' cost comfort zone. When a cost coefficient would trip a HiGHS warning — smallest |c| < 1e-4 ("excessively small costs") or worst |c| > 1e+4 — Layer 3 picks the power-of-two user_objective_scale exponent that places the band's geometric centre sqrt(min·max) at the zone's geometric centre (1.0), instead of clamping the offending end to a boundary. For a band narrower than the zone both ends land inside [1e-4, 1e+4]; for a band wider than the zone the unavoidable violation falls symmetrically on both ends in log-space. Bands already inside the zone are untouched (N = 0), so well-scaled models are never perturbed. The exponent is a power of two and HiGHS unscales the objective and duals on output, so the reported solution is unchanged — only the magnitudes the simplex pivots on. The new audit tag is center.
  • The previously-unhandled small-cost case (HiGHS' "Problem has some excessively small costs" warning) is now corrected: it was silently ignored before — only large costs were scaled.
  • The pure large-cost case now centres as well (previously clamped the worst |c| to the 1e+4 working ceiling). The result is a slightly larger-magnitude down-scale; being power-of-two it remains output-invariant.
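The centring rule can be illustrated with a short calculation. This is a sketch, not the Layer 3 code: the function name is hypothetical, and rounding the exponent to the nearest integer is an assumption about how the power of two is picked.

```python
import math

ZONE_LO, ZONE_HI = 1e-4, 1e4  # HiGHS cost comfort zone quoted above


def centring_exponent(min_abs_cost, max_abs_cost):
    """Power-of-two exponent N so that costs * 2**N are centred near 1.0.

    Hypothetical helper; nearest-integer rounding is an assumption.
    """
    if min_abs_cost >= ZONE_LO and max_abs_cost <= ZONE_HI:
        return 0  # band already inside the zone: leave the model untouched
    # log2 of the geometric centre sqrt(min * max), computed in log-space.
    log_centre = 0.5 * (math.log2(min_abs_cost) + math.log2(max_abs_cost))
    return -round(log_centre)


n_small = centring_exponent(1e-6, 1e-2)   # small-cost band
n_large = centring_exponent(1.0, 1e6)     # pure large-cost band
n_inside = centring_exponent(0.5, 100.0)  # already well scaled
```

Under these assumptions the small-cost band [1e-6, 1e-2] gets N = 13, which moves it to roughly [8.2e-3, 81.9], inside the zone on both ends; the large-cost band [1, 1e6] gets N = -10; the well-scaled band gets N = 0.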

[3.1.0] — 2026-06-25

Added

  • Solution.max_primal_infeasibility and Solution.primal_feasibility_tolerance: expose the solver's achieved unscaled primal slack and its primal feasibility tolerance, so a caller that hand-checks a constraint on the returned solution (e.g. a Benders master coupling self-check) can size its tolerance from the solver's own achieved feasibility instead of a hard-coded magic constant. HiGHS enforces feasibility on the internally-scaled problem, so the unscaled slack reported here can exceed the nominal scaled tolerance — a normal solver artifact callers must allow for. Both return 0.0 for a synthesised Solution with no live HiGHS handle.
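One way a caller might use the two fields is sketched below. StandInSolution is a stand-in that carries only the two attribute names given above, and the safety multiplier is an illustrative assumption rather than a recommended value.

```python
from dataclasses import dataclass


@dataclass
class StandInSolution:
    """Stand-in carrying only the two fields named above (not the real class)."""
    max_primal_infeasibility: float
    primal_feasibility_tolerance: float


def coupling_check_tolerance(sol, safety=10.0):
    """Tolerance for a hand-written constraint check on the returned solution.

    The safety multiplier is an illustrative assumption.
    """
    achieved = max(sol.max_primal_infeasibility, sol.primal_feasibility_tolerance)
    return safety * achieved


sol = StandInSolution(max_primal_infeasibility=3e-6,
                      primal_feasibility_tolerance=1e-7)
tol = coupling_check_tolerance(sol)
lhs, rhs = 12.00002, 12.0   # coupling row evaluated by hand
coupling_ok = abs(lhs - rhs) <= tol
```

Because both fields are 0.0 on a synthesised Solution, a caller that may receive one would need its own fallback tolerance for that case.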

[3.0.0] — 2026-06-25

Refocuses regional decomposition: the built-in dual-subgradient LagrangianProblem driver is removed and replaced by small, generic cutting-plane primitives, on top of which the caller builds a Benders decomposition. Problem / WarmProblem / Param and all engine + solver exports are otherwise unchanged. (The subgradient driver shipped only in the unreleased 2.7–2.9 line; those tags never reached PyPI, so 3.0.0 is the first published release to carry this decomposition rework — a 2.x → 3.0.0 upgrade gains the primitives, not just loses the driver.)

Removed

  • LagrangianProblem and the whole subgradient decomposition (lagrangian.py: LagrangianProblem, CouplingSpec, CouplingEntry, LagrangianSolution, and the parallel-subsolve pool), plus their top-level exports. Its one consumer (FlexTool) moved to a Benders decomposition built from the cut-append / warm-restart / parallel primitives below, which give a tight bound without a subgradient tail. Pin polar-high<3 for the old driver.

Added

  • WarmProblem.add_cut_row(col_ids, coefs, lower) -> int: append an optimality-cut row Σ coefs·x >= lower to the live (already-built) master and return its row_id; the cut dual is read by Solution.row_dual[row_id]. Plus add_recourse_col(name, cost, lower, upper) for lazy recourse columns. Both are post-build live edits that deliberately bypass the build-time DSL lock (it only guards the fixed-size autoscale side vectors). No auto-scaling for appended rows — keep the master autoscale off, or pre-scale coefs so the cut lives on the built columns' scale.
  • WarmProblem.solve(*, retry_on_unknown=False): warm re-solve after a cut append — the retained basis lets dual simplex hot-start across the appended >= row, falling back to a single clearSolver cold re-presolve only if the warm run fails to certify kOptimal (kUnknown / transient kSolveError / spurious kUnbounded off a stale basis). Default False is byte-identical to before. On a well-scaled master the warm path holds with no fallback, removing the super-linear cold-presolve cost that dominated the Benders master at scale.
  • polar_high.parallel (exported at top level): solve_indexed_parallel (fan fn(i) across a thread pool, collect into per-index slots so the result is timing-independent; requires every WarmProblem already built, raises on an unbuilt one), prewarm_global_scheduler (pin the process-global HiGHS scheduler to one thread ONCE so concurrent run() calls stay single-threaded and deterministic), and resolve_worker_count. workers <= 1 keeps a sequential path byte-identical to a plain for loop. Recovers the scheduler-pin pattern from the removed subgradient pool; enables parallel Benders region solves.
  • WarmProblem.set_output_flag(enabled): enable/disable HiGHS' native solve log for this problem; the preference persists on the handle across cold / warm / retry solves (applied immediately if built, else at the initial build — before the log routing, so False also suppresses the HiGHS version banner). Lets a driver that fans out many sub-solves (Benders regions) mute the per-sub-solve output and show its own concise log instead.
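The per-index slot pattern described for solve_indexed_parallel can be sketched with the standard library alone. indexed_parallel and region_solve below are hypothetical stand-ins, not the polar_high.parallel functions; the sketch only shows why the result does not depend on thread timing.

```python
from concurrent.futures import ThreadPoolExecutor


def indexed_parallel(fn, n, workers=1):
    """Run fn(i) for i in range(n); results[i] always belongs to index i.

    Hypothetical stand-in for illustration, not polar_high.parallel itself.
    """
    if workers <= 1:
        # Sequential path: identical to a plain for loop.
        return [fn(i) for i in range(n)]
    results = [None] * n
    with ThreadPoolExecutor(max_workers=min(workers, max(n, 1))) as pool:
        futures = {i: pool.submit(fn, i) for i in range(n)}
        for i, fut in futures.items():
            # Slot by index, not by completion order, so timing cannot
            # reorder the output; an exception in fn(i) re-raises here.
            results[i] = fut.result()
    return results


def region_solve(i):
    # Placeholder for "solve region i"; a real driver would call its
    # own per-region solve here.
    return i * i


seq = indexed_parallel(region_solve, 6, workers=1)
par = indexed_parallel(region_solve, 6, workers=3)
```

Both calls return the same list in the same order, which is the property a Benders driver needs when it turns region results into cuts.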

Changed

  • Coverage for two live WarmProblem methods (update_obj_coef_array, fix_cols) that the deleted Lagrangian tests exercised is relocated into test_warm_problem.py.
  • Docs: the "Lagrangian" guide is replaced by "Decomposition building blocks" (cut-append / warm-restart / parallel); the API reference and cross-links are updated to the new public surface.

[2.9.0] — 2026-06-24

Adds opt-in thread-parallel subsolves and a per-subsolve callback to the Lagrangian driver, with HiGHS-log silencing for that path. Default behaviour is unchanged — with no new kwargs the solve stays sequential, fires no per-subsolve callback, and keeps today's verbose native log.

Added

  • LagrangianProblem.solve(max_workers=...): optional cap on the number of worker threads used to solve subproblems concurrently within each barrier (initial build, per-iteration, primal recovery). None (default) or 1 keeps the fully sequential path. The effective count is clamped to [1, n_subproblems]. When > 1, every subsolve is forced to threads=1 so each h.run() is deterministic (HiGHS is non-deterministic with threads > 1) and the box is not oversubscribed — two parallel solves with different max_workers are byte-identical. The COLD initial build also parallelizes ACROSS regions: the process-global HiGHS scheduler is pre-pinned to a single thread ONCE up front (_prewarm_global_scheduler), after which the per-region first solves fan out concurrently WITHOUT passing threads (so no per-instance resetGlobalScheduler). Parallelism is across regions only — each individual solve stays single-threaded on the pinned pool. If the one-time prewarm fails, the build falls back to a sequential cold loop on the calling thread (threads=1 per first solve pins the scheduler) and the warm iterations still parallelize; the cold-parallel and cold-sequential builds are bit-identical. The executor is shut down on every exit path, including the no-coupling early return and any raised exception (fail-fast on the lowest non-optimal subproblem index, queued siblings cancelled).
  • LagrangianProblem.solve(subsolve_callback=...): optional callable invoked at the start and finish of every individual subproblem solve. It fires from worker threads when max_workers > 1 and MUST be thread-safe; exceptions are suppressed so a faulty observer can never abort the solve. The callback dict schema (pinned):
      • start: {"event": "start", "iter": <int>, "subproblem": <int>, "phase": <"initial"|"iterate"|"recovery">}
      • finish: {"event": "finish", "iter": <int>, "subproblem": <int>, "phase": <same>, "obj": <float>}
    "obj" is present only when that subsolve reached optimality. phase is "initial" for the build solve (iter == 0), "iterate" for outer iterations (iter >= 1), and "recovery" for the primal-recovery solve (iter == -1).
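A thread-safe observer for this schema might look like the sketch below. SubsolveRecorder is a hypothetical name, and the events are fed by hand in the pinned dict shape instead of coming from a real solve.

```python
import threading


class SubsolveRecorder:
    """Thread-safe observer for the start/finish event dicts shown above."""

    def __init__(self):
        self._lock = threading.Lock()
        self.started = 0
        self.objs = {}  # (iter, subproblem) -> obj for optimal finishes

    def __call__(self, event):
        with self._lock:  # may be invoked from several worker threads
            if event["event"] == "start":
                self.started += 1
            elif event["event"] == "finish" and "obj" in event:
                # "obj" is absent when the subsolve did not reach optimality.
                self.objs[(event["iter"], event["subproblem"])] = event["obj"]


recorder = SubsolveRecorder()
# Hand-fed events in the pinned schema, standing in for a real solve.
recorder({"event": "start", "iter": 0, "subproblem": 1, "phase": "initial"})
recorder({"event": "finish", "iter": 0, "subproblem": 1, "phase": "initial", "obj": 42.5})
recorder({"event": "finish", "iter": 1, "subproblem": 0, "phase": "iterate"})
```

The lock matters only when max_workers > 1, but holding it unconditionally keeps the callback correct on both paths.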

Changed

  • When the caller uses the new functionality (max_workers > 1 or a subsolve_callback), the per-subsolve HiGHS native log is silenced. Set POLAR_HIGH_LAGRANGIAN_VERBOSE=1 to force the verbose native log back on. Plain existing callers keep today's verbose native log unchanged.

[2.8.0] — 2026-06-17

Retains each region's recovered-primal column values in the Lagrangian result so a downstream caller can reconstruct every subproblem's primal (e.g. investment-variable values) after the solve. Opt-in and backward-compatible: the new field defaults to an empty list and existing callers are unaffected.

Added

  • LagrangianSolution.subproblem_col_values: one numpy float64 col_value array per subproblem (region), in subproblem order, each the region's FINAL recovered-primal column values. Populated on every return path of LagrangianProblem.solve() — the main subgradient/primal-recovery path (a region whose recovery solve is skipped/non-optimal falls back to its most recent iterate, so the list is always full-length and index-aligned) and the trivial no-coupling early-return path (from each subproblem's initial solve). Each array is layout-aligned with that region's own subproblems[i]._vars[name].frame['col_id'], so a caller can index those col_ids into entry i to assemble a whole-system primal from the per-region solves.

[2.7.0] — 2026-06-17

Adds opt-in live progress reporting to the Lagrangian driver. Default behaviour is unchanged — solve() stays silent unless a callback is passed.

Added

  • LagrangianProblem.solve(progress_callback=...): an optional callable invoked once per outer subgradient iteration with that iteration's log dict (iter, alpha_k, max_abs_residual, total_obj), and once at the end with the final-summary dict (iter == -1, carrying best_dual_total / recovered_total). Lets callers stream live decomposition progress instead of only inspecting iteration_log after the fact. None (default) is a no-op and preserves the prior silent behaviour; callback exceptions are suppressed so a faulty observer can never abort the solve.

[2.6.0] — 2026-06-16

Adds an opt-in small-coefficient cutoff. Default behaviour is unchanged — byte-identical to 2.5.1 unless a caller sets the new threshold.

Added

  • Problem.coef_zero_threshold (default 0.0 = off): floors any LP matrix coefficient or RHS row-bound whose absolute value is below the threshold to exactly 0.0, narrowing the numerical range (conditioning) of the LP for callers that opt in. Applied at every coefficient/RHS finalize point so it is independent of the build path — initial build (_solve_streaming, _build_canonical_matrix, including row_lb/row_ub) and warm in-place updates (WarmProblem.update_param, update_rhs). ±inf / NaN sentinels are preserved and entries are replaced, never dropped, so matrix structure and determinism are unchanged; a threshold of 0.0 short-circuits to a no-op.
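The flooring rule can be illustrated on a plain list. floor_small is a hypothetical helper, not the library routine; it shows that entries are replaced and never dropped, that ±inf / NaN pass through, and that a threshold of 0.0 short-circuits.

```python
import math


def floor_small(values, threshold=0.0):
    """Replace entries with |v| < threshold by 0.0, keeping length and order.

    Hypothetical illustration of the rule above, not the library routine.
    """
    if threshold == 0.0:
        return values  # off: short-circuit, nothing is touched
    out = []
    for v in values:
        if math.isfinite(v) and abs(v) < threshold:
            out.append(0.0)   # replaced in place, never dropped
        else:
            out.append(v)     # +/-inf, NaN and large entries pass through
    return out


coefs = [1e-12, -3e-10, 0.5, float("inf"), float("-inf"), 2.0]
cleaned = floor_small(coefs, threshold=1e-9)
```

The output has the same length and ordering as the input, so any structure indexed by position stays valid.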

[2.5.1] — 2026-06-11

Hardens the 2.5.0 HiGHS-log routing so it can never lose the log. The LP build, autoscale, and solve numerics are unchanged from 2.5.0.

Fixed

  • route_highs_log_to_stdout suppressed HiGHS' native console write (log_to_console=False) and re-emitted via the sys.stdout callback on every routed solve. Suppressing native logging is a bet that the callback fires, and some highspy builds register the kCallbackLogging callback but never deliver a message — so suppress-native + silent-callback dropped the entire HiGHS log (observed in a Linux GUI subprocess on one machine, while an otherwise-identical machine with a different highspy was fine). The routing now suppresses native logging only when sys.stdout is not backed by the native stdout fd (fd 1). When the sink already is fd 1 — a terminal, a pipe (e.g. a GUI reading a subprocess), or a file on fd 1 — HiGHS' own native log already reaches it, so native logging is left intact and the callback is skipped (new helper _sink_is_native_stdout). Routing + suppression now happens only where sys.stdout genuinely diverges from fd 1 (Windows Basic Console OutStream, redirect_stdout, pytest capsys), i.e. where the native write is unreachable anyway. POLAR_HIGH_NATIVE_LOG=1 still opts out everywhere.
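The decision described above hinges on whether the Python-level sink is the native stdout descriptor. A hypothetical re-creation of such a check is sketched below; the library's own _sink_is_native_stdout may test this differently.

```python
import io
import sys


def sink_is_fd1(stream=None):
    """True when the given Python stream writes straight to native fd 1.

    Hypothetical re-creation for illustration; the library's own helper
    may use a different test.
    """
    stream = sys.stdout if stream is None else stream
    try:
        return stream.fileno() == 1
    except (AttributeError, OSError, ValueError):
        # No real file descriptor behind the stream (StringIO, capture
        # objects, notebook streams): the native write would not reach it.
        return False


captured = io.StringIO()
suppress_native = not sink_is_fd1(captured)  # in-memory sink: route via callback
```

For an in-memory sink the check fails, so routing through the callback (and suppressing the native write) is the only way the log can arrive; for a sink that is fd 1 the native log is left alone.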

[2.5.0] — 2026-06-10

Routes the HiGHS solver log through Python's sys.stdout so it is visible in consoles that only capture the Python-level stream (not the native fd-1 write). The LP build, autoscale, and solve numerics are byte-identical to 2.4.5; only where the log appears changes.

Added

  • route_highs_log_to_stdout(h, *, stream=None) (new module polar_high._log_routing): registers a HiGHS kCallbackLogging callback that re-emits each message through sys.stdout (resolved lazily, so it follows later redirection such as ipykernel's) and suppresses the duplicate native console write (log_to_console=False) once the callback is confirmed registered. Idempotent (safe on a reused Highs), a no-op on silent solves (output_flag false), and fully defensive — any highspy error leaves the native logging path untouched rather than risking a lost log.

Changed

  • The in-process solve sites (solvers/_highs.py, the streaming Problem.solve, and WarmProblem._initial_build) now route the HiGHS log through sys.stdout by default, right after options are applied so the version banner is captured. This makes the solver log visible under the Jupyter / Spine-Toolbox Basic Console on Windows (where ipykernel only redirects fd 1 on POSIX, so the entire HiGHS log was previously lost) and under redirect_stdout / pytest capsys. Set POLAR_HIGH_NATIVE_LOG=1 to opt out and keep HiGHS' native fd-1 logging.

[2.4.5] — 2026-06-02

Docs-polish release. No runtime or public-API changes — the LP build, autoscale, and solve paths are byte-identical to 2.4.4.

Changed

  • Loading-data Memory section rewritten to lead with what keeps the footprint down — the integer-keyed coefficient matrix (col_id / row id / float64, no string labels) and the section-by-section streaming build that releases each constraint family's input frames before the next — before the long-format constant factor. Notes that HiGHS column names embed the dim labels and carry their own cost (~1.1 GB at the 3000² grid), shed via save_memory=True or write_mps(emit_names=False). Conclusion updated to match the current benchmarks: polar-high matches or beats linopy/xarray peak memory on the irregular network LP and on the dense N × N LP with save_memory.
  • Scaling guide: the "falsely infeasible" risk is now scoped to badly-scaled models (eight or nine decades) rather than implied for any wide spread; the "who this is for" bullet list is replaced with an inline sentence; and a new section explains the three scaling layers (detection, semantic rewrites, HiGHS-native global scaling) and how they compose with HiGHS' own simplex_scale_strategy.
  • Performance guide: the Threading section is rewritten — raising the thread count does speed up the build, with the trade-offs being per-thread scratch memory and fewer concurrent runs; the default of 1 is tuned for the "many independent solves" deployment. Scaling is added to the solver-options list.

Fixed

  • env-vars guide: dropped the retired POLAR_HIGH_RANGES_MAX_FAMILY_ROWS "Workload tuning" knob — it was retired when autoscale range detection moved to the per-term walk that bounds peak memory regardless of family size; the env var is no longer read and setting it is a no-op. Replaced the dead FlexTool dev/env_vars documentation link (404) with the repository.
  • Benchmark harness: sorted the pulp_net import block (ruff I001).

[2.4.4] — 2026-06-02

Docs-polish release. No runtime or public-API changes — the LP build, autoscale, and solve paths are byte-identical to 2.4.3.

Changed

  • Problem(dense_axes=...) now has substantive documentation: a new section in the Performance guide covers what the block-COO arm does (slice the dense suffix of each Var's frame as a zero-copy numpy view, multiply with ufuncs), the row-sort contract the caller signs up to, suffix-matching against each Var's dims, and POLAR_HIGH_DISABLE_BLOCK_COO=1 for A/B rollback. The Concepts page picks up a short mention pointing readers in. API reference was already covered via the Problem.__init__ docstring.
  • Enum dtype alignment: the depth that lived in the README is rebuilt under docs/guide/loading-data.md with a concrete side-by-side example (capacity_df on a subset Enum vocab, cost_df on the full vocab) before the alignment-table contract. The README section is trimmed; the documentation index already links to the Guide.
  • Benchmark page picks up the v2.4 numbers and a clarification that the save_memory trade-off axis is "how much work HiGHS does", not model size.
  • mkdocs.yml drops a legacy CDN script from extra_javascript (MathJax 3 renders on every current browser without it).

Fixed

  • Benchmark plot harness: polar_da rows fold into the polar line on the dense plots so the published figure tracks the forthcoming block-COO-by-default behaviour without a redundant overlapping series. The network plot keeps polar_da_net as a distinct line where the irregular topology surfaces a small visible delta.

[2.4.3] — 2026-06-01

Benchmark-methodology + docs release. No runtime or public-API changes — the LP build, autoscale, and solve paths are byte-identical to 2.4.1.

Note on 2.4.2. The v2.4.2 git tag was pushed before the pyproject.toml version bump landed, so the published 2.4.2 wheel carries version = "2.4.1" in its metadata. 2.4.2 has been yanked on PyPI; use 2.4.3, which contains the same source as 2.4.2 plus the correct version string. Pinned polar-high==2.4.2 installs are unaffected — yanking does not remove the wheel.

Changed

  • Benchmark harness wraps each cell in a fresh systemd-run --user --scope. The worker reads cgroup v2 memory.peak from its own cgroup and emits it as a new cgroup_peak_mb column — the kernel-level peak the OOM killer would charge against a budget, less noisy across reps than the process-level ru_maxrss we previously plotted. Auto-falls back to plain subprocess on hosts where systemd-run --user is unavailable; --no-cgroup-scope forces the fallback. In the fallback path cgroup_peak_mb is NaN and peak_rss_mb continues to work.
  • Benchmark grows two new polar variants alongside the existing polar / polar_net tools:
      • polar_sm / polar_sm_net: exercise save_memory=True so the harness produces directly comparable regular-mode and one-shot-mode numbers on the same hardware.
      • polar_da / polar_da_net: exercise the explicit Problem(dense_axes=...) contract on the dense and network LPs.
  • docs/compare/benchmark.md refreshed against v2.4.x:
      • headline cells show both polar modes side-by-side with the cgroup peaks;
      • "Measuring memory" leads with cgroup_peak_mb as the canonical peak metric and demotes peak_rss_mb to a process-level note;
      • dense full-solve N=3000 peak drops 38.1 GB → 33.2 GB on polar regular (the autoscale memory-bounding work from 2.4.0 shows up here);
      • network-LP threading speedup at N=10 000 ticks up from 1.35× to 1.40×; other rows refreshed accordingly.
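For readers unfamiliar with the cgroup v2 peak counter used by the harness, the sketch below shows one way a worker could read memory.peak from its own cgroup. The function names, the /sys/fs/cgroup mount point, and the bytes-to-MB divisor are assumptions; the harness's actual code is not shown in this changelog.

```python
import math
from pathlib import Path


def cgroup_v2_path(proc_cgroup_text):
    """Extract the cgroup v2 path from /proc/self/cgroup content ("0::/path")."""
    for line in proc_cgroup_text.splitlines():
        if line.startswith("0::"):
            return line[3:].strip()
    return None


def read_cgroup_peak_mb(root="/sys/fs/cgroup"):
    """Peak memory of the current cgroup in MB, or NaN when unavailable."""
    try:
        rel = cgroup_v2_path(Path("/proc/self/cgroup").read_text())
        if rel is None:
            return math.nan
        peak_file = Path(root) / rel.lstrip("/") / "memory.peak"
        peak_bytes = int(peak_file.read_text())
        return peak_bytes / 1e6  # divisor is an assumption (decimal MB)
    except (OSError, ValueError):
        return math.nan  # same NaN convention as the fallback path above


peak_mb = read_cgroup_peak_mb()
```

On hosts without cgroup v2 or without a readable memory.peak file the function returns NaN, which matches the fallback behaviour described for cgroup_peak_mb.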

[2.4.1] — 2026-06-01

CI / test-tooling release. No runtime or public-API changes — the LP build, autoscale, and solve paths are byte-identical to 2.4.0.

Fixed

  • psutil added to the test optional-dependencies. The bounded-memory branch-fired profile tests detect which evaluation branch ran via the env-gated profile lines on stderr, and those lines are psutil-gated: without psutil installed, autoscale/_ranges.py sets _profile=False and the signals never print, so the four branch-fired tests (test_ranges_block_coo_branch_fired, test_ranges_rhs_bound_branch_fired, test_walk_fallback_profile_signal_fires, test_rhs_walk_fallback_no_skip_and_fires) skipped their assertions in CI's lean pip install -e ".[test]" environment. Declaring psutil as a test dependency makes CI exercise the profiling signals. The bounded-memory feature itself was always correct and is unaffected — the byte-identical parity gate passes with or without psutil.

Changed

  • ruff lint + format cleanup of src/ and tests/ (import sorting, an unnecessary open(..., "r") mode argument, a stray unused local, and a whole-tree ruff format reflow) so the lint CI job passes under the latest ruff. No behavioural change.

[2.4.0] — 2026-06-01

Block-COO coefficient evaluation + bounded-memory autoscale

This release makes LP build and autoscale memory-bounded: no constraint or objective family can spike RAM by materialising a wide coefficient product. Two pillars:

Block-COO evaluation (Phase C). A polars frame sorted with its dense axis trailing is physically a sequence of contiguous Arrow blocks; the new path slices each block as a zero-copy numpy view and multiplies the factors with ufuncs — no wide relational join, no wide intermediate. The client declares its dense trailing axes once via the new Problem(dense_axes=...) / Problem.declare_dense_axes(...) contract (verified cheaply, O(n), no re-sort). Sum-wrapped chains are evaluated in one pass via a captured SumBlockMeta reconstruction recipe, with a relabel fast-path (reduce dims ⊆ var dims ⇒ a pure relabel, bit-identical to the polars reduce) — the memory win for Sum-heavy families. Genuine coefficient-combining sums stay bit-equivalent (FP reassociation ~1e-9).

Bounded-memory autoscale (Phase D). The autoscale Layer-1 range readout and Layer-2 magnitude-bucketing used to materialise the wide coefficient product just to read a statistic. Both now route through a new general primitive, bounded_coefficient_walk, which iterates the constraint/column spine in fixed row-batches and rebuilds each batch's (rid/col_id, coef) via the block-COO builders + a prune-down backstop — never holding more than one batch's product. Pluggable reducers compute min/max (byte-identical, order-free) and the log2-magnitude histogram (exponents may shift ±1 = an objective-invariant scaling change). The reconstruction recipe is forwarded through post-Sum Expr-algebra (scalar/Param mul+div, negation, subtraction, Where, and set_objective's collapse-all Sum) so every wide-product term — the objective and negated-Sum constraints included — routes through the walk instead of a materialising collect. The size-blind family-row skip is retired: every shape is now bounded, with no silent coverage gaps.
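The walk-plus-reducer idea can be sketched in pure Python. Everything here is hypothetical (bounded_walk, MinMaxAbs, Log2Histogram, and the toy data are illustration names, not the classes in autoscale/_coef_walk.py); the point is that each reducer only ever sees one batch of products at a time.

```python
import math
from collections import Counter


def bounded_walk(spine, coef_of, reducers, batch_rows=2):
    """Feed coefficients to reducers one fixed-size batch at a time.

    Hypothetical sketch: spine is a list of row keys and coef_of rebuilds
    one row's coefficient; the real walk rebuilds whole batches.
    """
    for start in range(0, len(spine), batch_rows):
        batch = [coef_of(key) for key in spine[start:start + batch_rows]]
        for reducer in reducers:
            reducer.consume(batch)  # only this batch is alive here


class MinMaxAbs:
    def __init__(self):
        self.lo, self.hi = math.inf, 0.0

    def consume(self, batch):
        for c in batch:
            if c != 0.0:
                self.lo, self.hi = min(self.lo, abs(c)), max(self.hi, abs(c))


class Log2Histogram:
    def __init__(self):
        self.counts = Counter()

    def consume(self, batch):
        for c in batch:
            if c != 0.0:
                # frexp gives c = m * 2**e with 0.5 <= |m| < 1,
                # so e - 1 is floor(log2(|c|)).
                self.counts[math.frexp(c)[1] - 1] += 1


price = {"a": 0.25, "b": 3.0, "c": -1024.0, "d": 8.0, "e": 0.0}
factor = 2.0
mm, hist = MinMaxAbs(), Log2Histogram()
bounded_walk(list(price), lambda k: price[k] * factor, [mm, hist])
```

Min/max do not depend on batch order, which is why that reducer can be byte-identical to a full materialisation, while a histogram is what a magnitude-bucketing layer would consume.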

Validated on a 9-roll rolling-horizon LP (FlexTool DES / RETO-Africa): autoscale priv_dirty peak 46 → 23 GB, all objectives byte-identical, ~15% faster. The DSL is unchanged.

Added

  • Problem(dense_axes=...) / Problem.declare_dense_axes(...) — declare the pre-sorted dense trailing axes that enable block-COO evaluation.
  • bounded_coefficient_walk + CoefWalkRecipe (.from_term, .from_rhs_chain, .from_rhs_param, .is_buildable) + MinMaxAbsReducer / Log2HistogramReducer in autoscale/_coef_walk.py.
  • POLAR_HIGH_DISABLE_BLOCK_COO=1 — fall every term back to the polars path (A/B rollback).
  • POLAR_HIGH_BLOCK_COO_PROFILE / POLAR_HIGH_RANGES_PROFILE / POLAR_HIGH_LAYER2_PROFILE — env-gated instrumentation (no-op when off).

Changed

  • Autoscale Layer-1 (_ranges.py) and Layer-2 (_layer2.py) read coefficient ranges / magnitudes via the bounded walk. The ranges-PRE pass no longer gates the walk on the (not-yet-installed) Layer-2 side vectors, so it bounds every family there too.
  • The Sum block-COO path defers map-effect Where via _Term.where_map_frames and forwards sum_block_meta through Expr-algebra; a re-reducing outer Sum still correctly drops the recipe.

Removed

  • _skip_unbounded_over_cap and POLAR_HIGH_RANGES_MAX_FAMILY_ROWS — the size-blind family-row cap, superseded by the bounded walk.

[2.3.0] — 2026-05-29

Where pushdown (added in the same release window as the prune-down work below)

Where(expr, frame) with a pure-filter shape (frame columns are a subset of the expression's open dims — no map effect introducing new dims) now defers the filter into a new _Term.where_frames slot instead of inner-joining frame into t.lazy eagerly. The LHS prune-down (_build_lhs_pruned_plan) then applies each recorded frame at the Var leaf AND at every Param atomic during chain rebuild — mirror of the existing row_index pre-prune pattern. Net effect: the filter narrows every intermediate, not just the final result.

Pure-filter Where now also PRESERVES var_source / param_sources / coef_scalar on its output term (the previous path cleared them). This closes the Where leg of the "Sum/Where/Lag wrapping" limitation flagged after the 2026-05-28 audit: LHS prune-down now fires on Where-wrapped terms in addition to bare Var × Param × … chains. Sum / Lag bake where_frames into t.lazy before consuming it (they change row identity); the Sum leg remains future work.

Behaviour-preserving on every tested scenario: the LP matrix is byte-identical between the deferred-pushdown path and the env-var-disabled (eager-join) path. Validated on FlexTool's DES (RETO-Africa) scenario (5.6M rows × 4.6M cols, same presolve reductions, same coefficient ranges). DES itself does not exercise any of the pushdown-eligible families (no per-process-profiles, no commodity ladder, no investment, no reserves), so no measurable RSS delta is expected there; richer scenarios with pure-filter Wheres over multi-atomic chains will see the win.

Added — Where pushdown

  • POLAR_HIGH_DISABLE_WHERE_PUSHDOWN=1 — safety fallback env var. When set, every Where call eagerly inner-joins frame into t.lazy and clears the leaf metadata exactly as the pre-v2.3.0 path. Use as an opt-out if a model surfaces unexpected drift on the pushdown path.

  • _Term.where_frames: tuple[pl.LazyFrame, ...] | None slot — opt-in metadata recording pure-filter Where frames so they can be applied at the leaves during chain rebuild. Internal — no public API change.

  • _apply_where_frames(lazy, dims, where_frames) helper — used by Sum / Lag to bake pending filters before consuming t.lazy, and by consumer fallback paths in _build_canonical_matrix / _solve_streaming / WarmProblem._initial_build when leaf-rebuild prune-down can't fire.

  • tests/test_where_pushdown_parity.py (11 tests): parity coverage for pure-filter, map-effect, nested Where, Where-after-Sum, Sum-after-Where, anonymous-Param-chain, scalar-fold, Where-then-mul-Param, disable-guard, shared-empty-extras-nonempty, and RHS-Where-preserved-through-negation.

Fixed (latent — exposed by the parity sweep)

  • Where(expr, frame) with shared == [] and extras != () now explicitly cross-joins frame instead of silently claiming the extras columns on _Term.dims without ever producing them in t.lazy. Pre-fix this was a corruption waiting to happen; no known caller relied on the broken behaviour.

Removed (dead code)

  • The elif isinstance(rhs, (Var, Expr)) / standalone-block negation patterns in _build_canonical_matrix, _solve_streaming, and WarmProblem._initial_build are deleted. Problem.add_cstr folds Var/Expr RHS into the LHS via Expr.__sub__ before storing in _CstrProto, so proto.rhs only ever reaches those sites as a Param or scalar — the elif branches were unreachable.

Prune-down (initial v2.3.0 work — merged earlier in the same release window)

Memory cliff fix for Param-chain RHS / LHS in _build_canonical_matrix plus matching coverage in _solve_streaming and WarmProblem._initial_build. On FlexTool's DES (RETO-Africa) scenario the canonicalise stage of the first solve stalled inside profile_flow_upper_limit (1.5 M rows, RHS = profile_value × process_existing_count × process_availability) — the chained inner joins produced a ~2.6 billion-row Cartesian intermediate before the row_index semi-join could prune it. The fix walks each chain's named atomic Params and pre-prunes them against the constraint's row_index keys (projected onto each atomic's own dim subset), bounding the intermediate to the constraint row count.

Behaviour-preserving: every solve that completed before still produces identical numerics (verified against the FlexTool scenario parity suite, 139 polar_high tests + the previously-failing flextool scenarios). LP matrices are byte-identical between the prune-down path and the original merged-lazy path on every covered chain shape; the difference is solely intermediate peak memory.
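The pre-prune step can be illustrated with dicts in place of frames. This is a hypothetical sketch, not the engine code: prune and rhs_pruned are illustration names, the toy data is invented, and it assumes every row key has a match in every atomic.

```python
def prune(atomic, atomic_dims, row_keys, row_dims):
    """Keep only atomic entries whose key appears in the row_index projection."""
    pos = [row_dims.index(d) for d in atomic_dims]
    wanted = {tuple(key[p] for p in pos) for key in row_keys}
    return {k: v for k, v in atomic.items() if k in wanted}, pos


def rhs_pruned(row_keys, row_dims, atomics):
    """Evaluate a product-of-Params RHS with every atomic pre-pruned.

    Hypothetical dict-based sketch; the library works on polars frames.
    """
    acc = {key: 1.0 for key in row_keys}          # bounded by the row count
    for values, dims in atomics:
        kept, pos = prune(values, dims, row_keys, row_dims)
        for key in acc:
            acc[key] *= kept[tuple(key[p] for p in pos)]
    return acc


row_dims = ("process", "time")
row_keys = [("p1", 1), ("p1", 2)]                 # the constraint's row_index
profile = {("p1", 1): 0.5, ("p1", 2): 0.8, ("p2", 1): 0.1, ("p2", 2): 0.2}
count = {("p1",): 3.0, ("p2",): 7.0}
avail = {("p1",): 0.9, ("p2",): 1.0}
rhs = rhs_pruned(
    row_keys,
    row_dims,
    [(profile, ("process", "time")), (count, ("process",)), (avail, ("process",))],
)
```

The accumulator never holds more entries than the constraint has rows, whereas joining the three unpruned tables first would also carry every p2 combination before the final filter discards it.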

Added

  • POLAR_HIGH_SOLVE_PROFILE=1 — env-var-gated stderr profile lines covering every meaningful sub-step of Problem._solve_streaming (cold path, 27 checkpoints) and WarmProblem._initial_build (18 checkpoints, including the per-family LP-build loop and the HiGHS handoff). Tab-separated [solve profile] phase=… rss_gb=… delta_gb=±… wall_s=… format mirroring the POLAR_HIGH_WRITE_MPS_PROFILE precedent. Zero overhead when unset.

  • POLAR_HIGH_DISABLE_PRUNE_DOWN=1 — safety fallback env var. When set, every multi-atomic Param chain in RHS / LHS handling falls through to the merged-lazy semi-join path. Use as an opt-out if a future model surfaces an unexpected numerical drift on the prune-down path.

  • Param._value_scalar and _Term.coef_scalar slots — accumulate scalar folds (Param * float, Var * float, Expr.__neg__, Expr.__sub__, etc.) so the prune-down rebuild can seed its accumulator with the correct multiplicative constant. Without this tracking the rebuild would silently drop scalar factors that the merged-lazy path carries in the value / coef column. Internal — no public API change.

  • Per-family and per-term checkpoints inside _build_canonical_matrix (gated by POLAR_HIGH_WRITE_MPS_PROFILE=1). New labels: family_rhs_evaluated, family_rhs_l2baked, family_senses_built, family_rownames_built, family_term_plans_built, family_term_collect_start / family_term_collected (per LHS term), family_rhs_pruned_down (new prune-down path), family_lhs_scattered. Each emits family= and family_idx= extras so per-family slicing is trivial.

  • Reference tests:

      • tests/test_canonicalise_param_chain_prune.py (3 tests) — RHS prune-down parity vs merged-lazy path on synthetic 3-Param chain with disjoint-but-shared dims; covers Param.__truediv__.
      • tests/test_lhs_param_chain_prune.py (3 tests) — LHS prune-down parity at all three call sites (_build_canonical_matrix, _solve_streaming, WarmProblem._initial_build).
      • tests/test_prune_down_scalar_anonymous_fix.py (6 tests) — anonymous-Param-in-chain handling, scalar-fold tracking, sign propagation through Expr.__neg__ / __sub__, and the disable env-var fallback.
      • tests/test_lp_view.py test for to_csr post-vectorisation (zero-copy CSR round-trip parity).

Changed

  • _build_canonical_matrix RHS handling (engine.py): when rhs._sources is a chain of length ≥ 2 and the composite has dim columns, walk the atomics one at a time. Each atomic is semi-joined against the running accumulator's key projection (semi-join order: acc keys → atomic, NOT the other way around — atomic frame is the pre-pruned side). Final accumulator collects via the existing streaming fallback chain. Single-Param / scalar / Var-or-Expr-on-RHS branches unchanged.

  • _build_lhs_pruned_plan (new helper) + three LHS call sites (_build_canonical_matrix L1664-1692, _solve_streaming L2738-2763, WarmProblem._initial_build L3969-4002): when term.param_sources has length ≥ 2 AND term.var_source is set (i.e. the term has a direct Var anchor — not wrapped in Sum / Where / Lag which clear var_source to preserve safety), rebuild the LHS plan as row_index ⋈ pruned_var ⋈ pruned_param_1 ⋈ pruned_param_2 … with each factor pre-pruned via semi-join. Sum / Where / Lag wrapped terms fall back to the original path.

  • _lp_view.to_csr row index construction: replaced a Python for c in range(n_cols): col_of[a_start[c]:a_start[c+1]] = c loop with np.repeat(np.arange(n_cols), np.diff(a_start)). Output identical (verified in test_lp_view.py); about 100× faster on a sparse 5 M-row LP.

  • _build_canonical_matrix variable loop: merged the two consecutive loops over self._vars.values() (col bounds / integrality and col_names construction) into a single pass. Each v.frame["col_id"].to_numpy() materialises once instead of twice.

  • ~32 .astype(np.int64) / .astype(np.float64) call sites on freshly-allocated numpy arrays (from .to_numpy(), np.where, np.repeat, np.tile, np.concatenate) gained copy=False. Affected: _build_canonical_matrix per-family scatter (dim and scalar branches), global / family dedup, objective-term collect, HiGHS bound translations in _build_lp_arrays / _solve_streaming / _initial_build, RHS np.where row_lb/row_ub translations, tracked-source scatter in WarmProblem, and _lp_view.from_problem bound round-trip.
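
Two of the quick-wins above can be checked in isolation with plain numpy. The sketch below uses a small made-up column pointer (a_start) rather than anything from the engine; it confirms that the np.repeat form produces the same per-nonzero column index as the Python loop, and shows when astype(..., copy=False) actually avoids a copy.

```python
import numpy as np

# CSC column pointer for a 4-column matrix with 2, 0, 3, 1 nonzeros per column.
a_start = np.array([0, 2, 2, 5, 6], dtype=np.int64)
n_cols = a_start.size - 1

# Loop form: fill each column's slice with its column id.
col_of_loop = np.empty(a_start[-1], dtype=np.int64)
for c in range(n_cols):
    col_of_loop[a_start[c]:a_start[c + 1]] = c

# Vectorised form: repeat each column id by its nonzero count.
col_of_vec = np.repeat(np.arange(n_cols), np.diff(a_start))

# copy=False returns the input array itself when the dtype already matches ...
vals = np.array([1.0, 2.0, 3.0])              # already float64
same = vals.astype(np.float64, copy=False)
copied = vals.astype(np.float64)              # default copy=True always copies

# ... but still allocates when a real conversion is needed.
ints = np.array([1, 2, 3], dtype=np.int32)
widened = ints.astype(np.int64, copy=False)
```

Because copy=False only skips the copy when no conversion is required, it is safe to apply on freshly-allocated arrays, as long as the caller does not rely on getting an independent buffer.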

Fixed

  • Anonymous Param instances (no name, no _sources) were silently dropped from _merge_param_sources output when participating in a chain with named Params. The prune-down rebuild then walked only the named atomics, missing the anonymous one's contribution. Fixed: _sources_for_propagation now returns [(self, +1)] for anonymous atomics so the chain rebuild walks every constituent.

  • Scalar folds (Param * float, Var * float, Expr * float, Expr.__neg__, Expr.__sub__) collapsed constants into the value / coef column without recording them in _sources / param_sources. The prune-down rebuild had no way to see the scalars and produced numerically-different LP coefficients (DES scenario parity tests caught this as a ~2 % objective drift on test_fullYear_roll_matches_v3320_golden). Fixed via the new Param._value_scalar / _Term.coef_scalar slots; affected algebra ops propagate the scalar through to the prune-down accumulator.

Performance

  • Canonicalise of FlexTool DES (RETO-Africa) on a 64 GB box:
      • Before: stalled inside profile_flow_upper_limit RHS evaluation after ~10 seconds, RSS climbing from 12.7 GB toward a peak of ~38 GB that exceeded available memory in some configurations.
      • After: all 9 constraint families canonicalise in 27 seconds total; _initial_build exits at 49 seconds; peak RSS during the in-process HiGHS solve path is 23.3 GB. With --save-memory (subprocess HiGHS), peak is 15.2 GB.

  • Wall-time impact of the perf quick-wins (copy=False, vectorised to_csr, merged var-loop): 5-15 % wall-time win on the canonicalise + HiGHS-handoff portion of large LPs. Memory peak win is small (5-10 %) — these are a separate stack from the prune-down fix and apply per-cell rather than per-chain.

Notes

  • The fix is behaviour-preserving on every currently-tested scenario. If a future model surfaces a numerical drift, set POLAR_HIGH_DISABLE_PRUNE_DOWN=1 as a workaround and report the scenario so the engine can be fixed.
  • The LHS prune-down activates only on terms with a direct Var anchor (not wrapped in Sum / Where / Lag). Sum-wrapped LHS chains fall back to the merged-lazy path; this is intentional for safety. See specs/block_coo_evaluation_handoff.md for the planned follow-on that handles those cases via a different mechanism.
  • See specs/where_pushdown_handoff.md for the next architectural step (push Where(...) filter keys through the lazy plan tree the same way row_index keys are pushed today).

[2.2.0] — 2026-05-28

GLPK-style refactor of the Layer 2 scaling pipeline and the matrix emit path. Two architectural changes that, together, eliminate the "every consumer rebuilds the matrix from scratch" pattern that the v2.1.x line was tactically patching.

Added

  • Problem.canonicalise() — lazily builds and caches a single canonical CSC representation of the LP on Problem._matrix (a new _CanonicalMatrix slot-dataclass carrying col_ptr / row_idx / val plus per-row lb/ub/sense and per-col obj/lb/ub/integrality and names). Idempotent — repeat calls return the cached matrix unless _canonical_dirty is set. add_var / add_cstr set the dirty flag; the cached matrix is released by _release_python_lp_inputs and write_mps(release=True).

  • Problem._layer2_col_factor / Problem._layer2_row_factor — numpy side vectors written by flextool's apply_layer2. The col-factor vector stores 1 / cf_math (inverse); the row-factor vector stores rf_math (forward). At canonicalise time the vectors are baked into _matrix.val / _matrix.col_obj / _matrix.row_lb / _matrix.row_ub so consumers read pre-scaled values directly. Problem._layer2_locked prevents post-Layer-2 add_var / add_cstr that would invalidate the side-vector sizing.

  • Regression tests tests/test_layer2_side_vector_emit.py and tests/autoscale/test_ranges.py::test_ranges_via_streaming_honors_side_vectors — exercise every emit-site branch with explicit fake side vectors so missed multiply sites or indexing offsets fail fast without depending on flextool's bit-for-bit integration test.
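
How side vectors combine with CSC values can be sketched in a few lines of numpy. This is an illustration under assumptions, not the body of _build_canonical_matrix: the arrays are made-up, and the bake is assumed to be an element-wise product of each coefficient with its row factor and its (already inverted) column factor, which matches the |row_factor| * |col_factor| readout described for _ranges_via_streaming.

```python
import numpy as np

# CSC pieces for a 2-row x 3-column matrix (hypothetical values).
col_ptr = np.array([0, 2, 3, 4])
row_idx = np.array([0, 1, 0, 1])
val = np.array([4.0, 2.0, 8.0, 6.0])

# Mathematical scale factors as a Layer 2 step might compute them.
cf_math = np.array([2.0, 4.0, 0.5])
rf_math = np.array([10.0, 0.1])

# Side vectors as stored: inverse for columns, forward for rows.
col_factor = 1.0 / cf_math
row_factor = rf_math

# Expand col_ptr into one column id per nonzero, then scale every entry.
col_of = np.repeat(np.arange(col_ptr.size - 1), np.diff(col_ptr))
baked = val * row_factor[row_idx] * col_factor[col_of]
```

Keeping the factors in side vectors until this point means the unscaled coefficients remain the single source of truth; only the emitted values carry the scaling.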

Changed

  • Problem.write_mps (Stage B1) — now calls canonicalise() and walks _matrix.col_ptr column-by-column. The previous per-family triple-list build, group_by dedup, concat, and global sort are consolidated into _build_canonical_matrix (runs once per state-version, shared across all consumers). Cross-consumer workflows (write_mps then solve, or any combination of the four canonical-consuming sinks) now family-walk exactly once.

  • Problem._build_lp_arrays (Stage B2) — reduced to ~30 LoC. Reads from _matrix and applies the ±inf → kHighsInf substitution. Per-family LHS walk, per-family RHS, Stage A multiply-at-emit, global dedup + sort all moved into _build_canonical_matrix. Back-compat shim parameters (n_cols, col_lb, col_ub) removed from the signature; the two callers (LpView.from_problem, _ranges_via_passmodel) updated.

  • WarmProblem._initial_build (Stage B3) — bulk LP build (LHS / RHS / obj / bounds) now reads from _matrix. Tracked-source bookkeeping for WarmProblem.update_param keeps a small separate walk over _cstrs filtered to terms with param_sources set; these terms re-collect and apply the same Stage A multiply-at-emit so the cached _param_cells factors remain the SCALED coef (matches the pre-refactor formula that update_param relies on). Skipped entirely when self._mutable_params is empty.

  • LpView.from_problem — reads m.col_obj from the canonical matrix instead of walking _obj_terms and applying its own Stage A multiply-at-emit. After this commit, the ONLY remaining Stage A multiply-at-emit consumer is Problem._solve_streaming — intentional, its per-family CSR memory bound exists specifically to avoid full-matrix materialisation.

  • polar_high.autoscale._ranges._ranges_via_streaming honours the side vectors. When called post-Layer-2 it multiplies per-term abs(coef) by |row_factor| * |col_factor| after the polars collect (numpy, in place, no lazy plan modification) so the readout sees the same effective magnitudes the consumers will emit. No-op when the side vectors are None (the pre-Layer-2 readout pattern is unchanged).

GLPK-likeness scorecard

Property | GLPK | Pre-v2.2.0 | Post-v2.2.0
Matrix is canonical (one copy) | | ✗ (per-family lazy + 2-3 transient copies during emit) |
Scaling lives separately from coefs | | ✗ (rewrote lazy plans via flextool's _layer2) |
Objective excluded from row scaling | | n/a (no row scaling) |
Build/scale is O(m+n+nnz) | | ✗ (transient triple copies + dedup hash + global sort coexisted) | Mostly ✓ (canonicalise still has transient peak ~3× nnz; consumers no longer rewalk)

The remaining "Mostly" on build/scale is the transient peak during _build_canonical_matrix itself: per-family triples → global concat → polars group_by dedup → sort → final CSC. A per-family streaming canonicalisation (consumers process families one at a time, dedup per-family, merge sorted CSC chunks) would close this gap by exploiting the disjoint-row-range property each family already has. Future work.

[2.1.3] — 2026-05-27

Fixed

  • Problem._build_lp_arrays, Problem._solve_streaming, and WarmProblem._initial_build now use the same semi-join + per-term streaming-collect pattern that v2.1.2 added to Problem.write_mps. Plain-inner-join + pl.collect_all previously materialised deep multi-Param LHS chains and multiplied the peak via parallel collect. Per-term peak is now bounded by row_count × cols-per-row instead of the upstream Param-product cardinality.

  • polar_high.autoscale.detect_ranges on a pre-solve Problem now bypasses Problem._build_lp_arrays entirely. New _ranges_via_streaming walks objective + constraint terms one at a time with a semi-join + per-row abs(coef) collect (the shape write_mps proves on the same chains) and numpy-reduces to min/max. Avoids materialising the full COO triple list + global dedup that the legacy _ranges_via_passmodel ran for the same readout. Legacy code stays for back-compat but is no longer reached from production callers.

Added

  • POLAR_HIGH_RANGES_MAX_FAMILY_ROWS env var (default 1000000, 0 to disable) — skips constraint families above the threshold in _ranges_via_streaming's RHS + matrix readout. Background: polars' streaming engine intermittently fails to push the row-key semi-join into deep multi-Param product chains on very large families, so a single term collect can allocate >30 GB before failing on workloads like FlexTool's DES LP (profile_flow_upper_limit at 1.5 M rows × multi-Param rhs). Skipping means the range report rides on the families it could read.

  • POLAR_HIGH_BUILD_LP_PROFILE=1 and POLAR_HIGH_RANGES_PROFILE=1 diagnostic env vars — per-family / per-phase psutil RSS deltas to stderr from Problem._build_lp_arrays and from the autoscale range detectors respectively. Zero overhead when unset.

  • Regression test test_detect_ranges_param_chain_does_not_explode — synthetic 200k-row Var × 3-Param chain. Pre-fix peak 515 MB on this shape; post-fix under 300 MB (test asserts <300 MB).

[2.1.2] — 2026-05-27

Fixed

  • Problem.write_mps per-term collect on LHS Param-chain terms (e.g. Var * Param₁ * Param₂ * ...) used to materialise the join chain's wide intermediate before the final row alignment. On a 9.9 M-row LP a single such family allocated +26 GB during one term.lazy.collect(), pushing write_mps peak to ~43 GB despite the spec target of 2-3 GB. Retrofitted the same anti-explosion pattern the RHS path has used since v2.0.0 (_align_enum_join_keys → semi-join against the row-index key set → collect(engine="streaming") with streaming=True and plain-.collect() fallbacks). Synthetic 100 k-row test: 527 MB peak → 178 MB after the fix. Coefficients byte-identical with the pre-fix path; HiGHS objective unchanged.

[2.1.1] — 2026-05-27

Added

  • docs/guide/performance.md — new section "Writing MPS without HiGHS" covering Problem.write_mps: API, the ~20× peak-memory advantage over Highs.writeModel, release=True semantics, cross-solver roundtrip coverage, and the POLAR_HIGH_WRITE_MPS_PROFILE=1 diagnostic env var. Cross-linked from docs/guide/scaling.md and docs/guide/solvers.md.

Fixed

  • Replaced a stale reference to a fictional Problem.solve(write_mps=...) kwarg in docs/guide/solvers.md with the real Problem.write_mps link.
  • Ruff lint warnings in tests/_bench_write_mps_parallel.py (intentional late polars import) and in the RHS section of Problem.write_mps (unused tuple element renamed _row_count).

[2.1.0] — 2026-05-27

Added

  • Problem.write_mps(path, *, free_format=True, column_order_strict=True, emit_names=True, release=False, name="POLAR_HIGH") — direct polars→MPS writer that bypasses highspy.Highs.writeModel. Mirrors the per-family streaming pattern from _solve_streaming, performs one streaming sort by (col_id, row_id), and chunked-streams the COLUMNS section with INTORG/INTEND integer markers. Target peak is ~2–3 GB on a 10 M-row / 5 M-col / 20 M-nz LP — about 20× lower than Highs.writeModel's transient. release=True reuses the same _release_python_lp_inputs teardown as solve(save_memory=True) so callers driving an out-of-process solver can drop the polar-side LP source immediately after the write.
  • POLAR_HIGH_WRITE_MPS_PROFILE=1 env var — when set, Problem.write_mps emits per-phase and per-constraint-family psutil RSS deltas to stderr for diagnosing memory hot spots. Zero overhead when unset (no psutil import, no closure call sites entered).
  • tests/_bench_write_mps_parallel.py — synthetic-LP bench for write_mps peak memory across single-family / multi-family topologies and polars thread counts.

Changed

  • Wrapper-driven MPS roundtrip harness (tests/test_mps_fallback_wrapper.py) is now parametrized over both the legacy LpView-based writer and the new Problem.write_mps direct writer, so the HiGHS / Gurobi / CPLEX / Xpress readback tests exercise both code paths.

[2.0.2] — 2026-05-26

Added

  • docs/guide/scaling.md — user-facing guide for the polar_high.autoscale package: when to use it, the typical detect_ranges → recommend_scaling → apply_scaling pattern, ScalingMode / ScalingConfig knobs, the precedence rules, the min-floor guard + geometric-centring escape branch, and migration from the retired auto_user_bound_scale=True flag. Wired into mkdocs.yml's Guide section between Solvers and Warm-starting.

Changed

  • Stripped proper-name callouts of specific caller-side LPs from source comments and CHANGELOG entries. Replaced with generic scenario descriptions ("a full-year LP with RHS=(1.84e-3, 2.02e+8)" etc.) so the technical narrative survives without leaking caller-side LP names.

[2.0.1] — 2026-05-26

Fixed

  • CI ruff check and ruff format --check failures inherited from the v2.0.0 commits. Sorted / removed unused imports, ran ruff format, and migrated ScalingMode(str, Enum) → ScalingMode(enum.StrEnum) to clear the UP042 hint. Behaviour difference: str(ScalingMode.OFF) now returns "off" instead of "ScalingMode.OFF"; no code in src/ or tests/ stringifies the enum, so this is invisible at the API boundary.

Changed

  • Cross-solver MPS-fallback tests now have sharpened skip strings that distinguish "wrapper-installed-but-CLI-binary-missing" from "solver wholly absent", and point at the new wrapper-driven test file for parallel coverage when only the Python wrapper is present.

Added

  • tests/test_mps_fallback_wrapper.py: for each commercial solver whose Python wrapper is installed (Gurobi, CPLEX, Xpress), writes the polar-high MPS file, reads it back into the wrapper, solves, and asserts the objective matches a direct in-memory HiGHS solve. Catches MPS-format issues end-to-end without needing the standalone CLI binary. COPT is intentionally out of scope here due to the in-process COPT/HiGHS native-symbol conflict documented in solvers/_copt.py.

[2.0.0] — 2026-05-26

Headline: much-improved automatic LP scaling via a new polar_high.autoscale package. The previous one-shot auto_user_bound_scale=True constructor flag is retired and replaced by a richer caller-driven API that detects bound / cost / RHS / matrix ranges and recommends user_bound_scale and user_objective_scale exponents independently. The new path also adds a min-floor guard that catches a class of false-infeasibility results HiGHS' own suggestScaling can produce on wide-spread LPs. See the Scaling guide for the full caller story.

Added — autoscale

  • polar_high.autoscale package with three pieces:
      • detect_ranges(problem_or_solution, config) returns a RangeReport with the four (abs_min, abs_max) tuples (matrix, cost, col_bound, row_bound) plus per-category samples of smallest / largest contributors, usable on a built Problem or on a returned Solution (re-uses Solution.streamed_lp_ranges when available).
      • recommend_scaling(ranges, config) returns a Layer3Plan with user_bound_scale and user_objective_scale integer exponents, derived from HiGHS' own suggestScaling formula. Preserves the geometric-centering escape branch for severe asymmetric-bound LPs, now guarded by a min-floor check (see Fixed).
      • ScalingMode enum (OFF / SOLVER_ONLY / BASIC / FULL) with helper predicates so library callers can decide policy per mode rather than per-call kwarg.
  • Precedence check: an axis whose user_*_scale is already set by the caller (via set_solver_options or per-call options=) is skipped by recommend_scaling. The caller's explicit value always wins.
  • Problem.set_solver_option(name, value) and Problem.get_solver_option(name) accessors as the clean surface the precedence check reads from.

Removed (breaking)

  • Problem(auto_user_bound_scale: bool = ...) constructor option. The flag's one-shot, col-bound-only heuristic is superseded by autoscale.recommend_scaling(), which considers all four ranges independently and is configurable per-mode. Callers should:
      • Build the Problem as before.
      • Call detect_ranges(p, config) and then recommend_scaling(ranges, config) for the chosen ScalingMode.
      • Apply the returned Layer3Plan via Problem.set_solver_option. See the autoscale package docstring for the migration pattern.
  • The internal _recommend_user_bound_scale helper that backed the retired flag.

Fixed

  • False-infeasibility from over-aggressive scaling. When HiGHS' own suggestScaling looks only at the max of (bound_max, rhs_max), it can pick a user_bound_scale exponent that crushes the min below kExcessivelySmallBoundValue (1e-4). HiGHS' presolve then mis-handles the near-zero rows and the LP comes back infeasible. Observed on a full-year LP with RHS=(1.84e-3, 2.02e+8): the formula picked N=-8 → scaled RHS min 7.2e-6 → spurious infeasibility. The new recommend_scaling adds a min-floor guard: when the proposed delta would drag the scaled min below the threshold, the current scale is returned unchanged.
  • Duplicate-key rhs Param fan-out. The left-join from row_index against an upstream Param with duplicate (on=) keys used to surface as an opaque ValueError: operands could not be broadcast together with shapes (X,) (Y,) deep inside the solver adapter. _build_lp_arrays (and the chunked / WarmProblem variants) now raise immediately at the join boundary, naming the offending constraint plus a sample of the duplicate keys.
  • --highs-threads N>1 silently ignored. HiGHS' setOptionValue("threads", N) is a no-op once HiGHS' global task scheduler has been initialised (which happens at default threads=16). We now call Highs.resetGlobalScheduler(False) before applying the user's options so the requested thread count actually takes effect.
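
The arithmetic behind the false-infeasibility entry above can be worked through directly. The function below is a minimal sketch of a min-floor guard, assuming bounds are scaled by 2 to the power of the exponent (consistent with the N=-8 figure quoted above); guard_min_floor and its signature are hypothetical, not the recommend_scaling implementation.

```python
# Threshold quoted in the entry above for kExcessivelySmallBoundValue.
K_EXCESSIVELY_SMALL_BOUND_VALUE = 1e-4


def guard_min_floor(abs_min, proposed_exponent, current_exponent=0):
    """Reject an exponent that would push the smallest entry below the floor."""
    scaled_min = abs_min * 2.0 ** proposed_exponent
    if scaled_min < K_EXCESSIVELY_SMALL_BOUND_VALUE:
        return current_exponent  # keep the current scale unchanged
    return proposed_exponent


# The full-year LP from the entry: RHS range (1.84e-3, 2.02e+8), proposed N = -8.
rhs_min, rhs_max = 1.84e-3, 2.02e8
scaled_min = rhs_min * 2.0 ** -8   # about 7.2e-6, below the 1e-4 floor
scaled_max = rhs_max * 2.0 ** -8   # about 7.9e+5, inside the comfort zone

guarded = guard_min_floor(rhs_min, -8)
unguarded_case = guard_min_floor(1.0, -8)  # 1/256 stays above the floor
```

The max-only formula is satisfied by N=-8 because the scaled max lands inside the comfort zone; the guard looks at the other end of the range and declines the change.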

Notes

  • The 1.5.x releases (sidecar RSS sampler, save_memory=True one-shot mode, chunked LP-range accumulator) are subsumed under 2.0.0; their entries remain below as the detailed history.

[1.5.1] — 2026-05-24

Changed

  • docs/compare/benchmark.md: trim the "how this differs from earlier versions" methodology paragraph (covered in the 1.5.0 changelog entry) and minor wording cleanup.

[1.5.0] — 2026-05-24

Added

  • Problem.solve(save_memory: bool = False) opt-in one-shot mode for benchmark-style single solves. When True, polar-high drops its Python-side LP source-of-truth (term lazy plans, Param frames, caller-side column-bound / cost arrays, and the col_names / row_names lists) once HiGHS has copied them, and then writes the model to a temp MPS file, clears the original Highs instance, calls malloc_trim(0) to return glibc arenas to the OS, and creates a fresh Highs that reads the model back before h.run(). The disk roundtrip resets HiGHS' incremental-addRows allocator slack — at N=3000 dense full-solve it drops peak RSS from ~38 GB to ~28 GB at the cost of ~+90 s wall time (the MPS write + read). A subsequent Problem.solve() on a Problem that has been released raises a clear RuntimeError; WarmProblem-style incremental updates and re-solves are unavailable after save_memory=True. Cold-start rolling-horizon loops that rebuild the Problem from scratch each iteration are unaffected and benefit from the per-iteration memory drop. Default False preserves the warm-restart-capable behaviour.

Changed

  • _running_finite_nonzero_min_max (used by the streaming LP-range accumulator) now scans in chunks of 1 M float64s instead of materialising np.abs(arr[finite]) for the whole array. On a 36 M-nonzero constraint family that cuts the transient temp allocation from ~576 MB to ~16 MB. Functionally identical output.
  • _solve_streaming no longer concatenates col_lb_h with col_ub_h (or row_lb with row_ub per family) for range accumulation — each is scanned in place. Eliminates a 2·n_cols (and 2·n_rows-per-family) transient copy each.
  • _solve_streaming drops col_lb_h / col_ub_h / col_obj_h immediately after the LP-range accumulation completes — HiGHS has its own internal copies from addCols, so the originals are not needed through the family loop and h.run(). ~432 MB at N=3000 dense.
  • Column-array construction (col_lb / col_ub / col_obj / col_int / col_names) moved from Problem.solve() into _solve_streaming so the caller's frame doesn't pin those arrays through the entire family loop and h.run() call. Combined with the drop above, this removes ~864 MB of caller-side residue at N=3000 dense.
  • benchmark/run_one.py now starts a sidecar thread that samples VmRSS from /proc/self/status at 25 ms cadence while solve() runs, and calls malloc_trim(0) after gc.collect() at the post-build and post-solve checkpoints. New CSV columns: rss_after_build_trim_mb, rss_after_solve_trim_mb, rss_solve_min_mb, rss_solve_p50_mb, rss_solve_p95_mb, rss_solve_max_mb, n_samples. The peak_rss_mb column stays as before (ru_maxrss, the unavoidable high-water mark including transient HiGHS-setup scratch). Old 10-field CSV rows still parse through plot.py.
  • docs/compare/benchmark.md rewritten: new memory-measurement methodology section explains peak_rss_mb vs rss_solve_p50_mb vs rss_after_solve_trim_mb; new section on the regular vs save_memory modes with a side-by-side polar-high comparison; headline tables updated to use save_memory=True for the cross-tool comparison (matches linopy's io_api="lp" file-handoff pattern). Threading-benefit numbers updated — speedup at N=10 000 is now 1.18× rather than 1.33× because the MPS roundtrip is a serial step that doesn't scale with thread count.
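
The chunked range scan described in the first item of this list can be sketched as follows. This is an illustrative reimplementation under assumptions, not the engine's _running_finite_nonzero_min_max: the name chunked_abs_min_max, the None return for "nothing to report", and the exact filtering order are choices made for the example.

```python
import numpy as np


def chunked_abs_min_max(arr, chunk=1_000_000):
    """Min/max of |x| over finite, non-zero entries, scanning chunk by chunk."""
    lo, hi = np.inf, 0.0
    for start in range(0, arr.size, chunk):
        # Only one chunk-sized temporary exists at a time.
        block = np.abs(arr[start:start + chunk])
        block = block[np.isfinite(block) & (block != 0.0)]
        if block.size:
            lo = min(lo, float(block.min()))
            hi = max(hi, float(block.max()))
    return (lo, hi) if hi > 0.0 else None


sample = np.array([0.0, -3.0, np.inf, 0.5, np.nan, 2e6])
ranges = chunked_abs_min_max(sample, chunk=2)
empty = chunked_abs_min_max(np.zeros(5), chunk=2)
```

With float64 data the temporaries scale with the chunk size rather than the array size, which is where the drop from hundreds of megabytes to a few megabytes comes from.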

[1.4.0] — 2026-05-22

Removed

  • Problem.peek_lp_ranges(). The method rebuilt the full LP into numpy arrays via the non-streaming path purely to extract coefficient ranges — duplicate work the streaming solve already does. The same four (abs_min, abs_max) tuples (matrix, cost, col_bound, row_bound) are now populated automatically on every solve() and exposed as Solution.streamed_lp_ranges. Callers that needed range inspection should read from the Solution instead; a pre-solve range-inspection API is no longer available.

Added

  • Problem(auto_user_bound_scale: bool = False) constructor option. When True, the streaming solve accumulates LP coefficient ranges during the family loop (at no extra allocation cost — it walks the per-family arrays we already build) and applies a user_bound_scale recommendation via setOptionValue before Highs.run(), but only when the caller has not already set user_bound_scale via the options dict / set_solver_options. The embedded heuristic _recommend_user_bound_scale(bound_range, rhs_range) is a direct port of HiGHS' own suggestScaling lambda at HighsSolve.cpp:570-607: it pulls max(bound_max, rhs_max) into HiGHS' [kExcessivelySmallBoundValue, kExcessivelyLargeBoundValue] = [1e-4, 1e+6] comfort zone using outer-rounded log2, and reproduces the integer HiGHS prints in its "Consider setting the user_bound_scale option to <N>" recommendation byte-for-byte.
  • Solution.streamed_lp_ranges: dict | None field. Populated by every solve that flows through _solve_streaming (which is the default path) with the four (abs_min, abs_max) | None range tuples. None on solves that don't go through streaming (e.g. the non-streaming solve(streaming=False) path).

Changed

  • _solve_streaming now performs running min/max accumulation over col_obj_h, col_lb_h/col_ub_h, and per-family val64 / row_lb / row_ub numpy arrays. Cost is a handful of O(n) scans with no new allocations. Used to drive auto_user_bound_scale and exposed on Solution.streamed_lp_ranges.
  • When auto_user_bound_scale=True, the decision is now reported on stdout so the run log shows what scaling (if any) was applied — one of: applying user_bound_scale=N (bound …, rhs …; HiGHS' own kExcessively[Small|Large]BoundValue formula), no scaling -- max(bound, rhs) already within HiGHS' [1e-4, 1e+6] comfort zone (bound …, rhs …), no scaling -- no finite bound or RHS entries to evaluate, or caller override in place (user_bound_scale=N).

[1.3.0] — 2026-05-22

Added

  • Generic Enum-dtype alignment on every internal join site. When two frames are joined on a column that is pl.Enum on both sides but with different categorical vocabularies, polar-high now up-casts the narrower side to the wider Enum (provided one's categories are a subset of the other's). Enum-vs-pl.Utf8 mismatches are resolved by casting the string side to the Enum dtype. Two Enums with neither-subset vocabs raise a clear ValueError pointing the caller to cast to pl.Utf8 or build a union Enum. The behaviour is exposed as the internal helper polar_high.engine._align_enum_join_keys and exercised by every internal .join call site (operator joins, Where, Sum, Lag, constraint-emission, WarmProblem updates).
  • tests/test_enum_dtype_align.py: unit + end-to-end coverage of the new alignment behaviour, including the disjoint-vocab raise path and an end-to-end Problem.add_cstr / solve with a narrower-vocab rhs Param.
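
The subset rule for Enum vocabularies can be modelled with plain Python sets. The sketch below only captures the decision (which vocabulary wins, and when to raise); it does not perform any polars casts, and wider_vocabulary is a hypothetical name rather than the signature of _align_enum_join_keys.

```python
def wider_vocabulary(left, right):
    """Return the vocabulary both join sides should be cast to.

    If one side's categories are a subset of the other's, the wider one wins;
    otherwise there is no safe up-cast and the caller has to decide.
    """
    left_set, right_set = set(left), set(right)
    if left_set <= right_set:
        return list(right)
    if right_set <= left_set:
        return list(left)
    raise ValueError(
        "Enum vocabularies are not subsets of each other; "
        "cast to a string dtype or build a union Enum"
    )


narrow = ["coal", "gas"]
wide = ["coal", "gas", "wind"]
other = ["gas", "solar"]

chosen = wider_vocabulary(narrow, wide)

try:
    wider_vocabulary(wide, other)
    raised = False
except ValueError:
    raised = True
```

Raising on neither-subset vocabularies keeps the join from silently dropping categories that exist on only one side.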

Changed

  • README "Enum dtype handling" subsection documenting the subset-up-cast rule and the raise-for-no-subset behaviour. No DSL surface change — existing models keep building unchanged; mixed-vocab models that previously needed per-site casts in caller code no longer do.
  • engine.py: when a constraint's rhs is a Param (or a chain of Param * Param * ...), pre-filter the rhs lazy plan with a semi-join against row_index's join keys and collect via the streaming engine before the left-join into the constraint frame. Polars' optimiser doesn't always propagate the implicit row-set restriction through a multi-way Param product, so the intermediate buffers could blow up by orders of magnitude relative to the final row count. On FlexTool's South Africa 1-week PES-Hydro-dispatch case (a p_profile_value * p_process_existing_count * p_process_availability product), solver-finished ΔRSS drops from +28.77 GB to +9.40 GB (-67%) and the section runtime drops from 57.7 s to 17.5 s. Objective and total cost match the baseline byte-for-byte. Applied to all three rhs-Param call sites: the non-streaming Problem.add_cstr path, _solve_streaming, and WarmProblem.solve. Falls back to collect(streaming=True) on polars < 1.x.
  • README: quickstart code is now inlined (GitHub/PyPI don't render pymdownx.snippets includes); the cross-product index is split into reusable unit_index / time_index sets; cap is built per-unit then concatenated; v_idx renamed to composite_index (the v_ prefix is reserved for variables); _idx → _index throughout.
  • Problem.add_cstr arg order in README and quickstart fixture reordered to lhs_terms before sense — reads more naturally as lhs sense rhs. No API change (these are keyword args).

Removed

  • Breaking: Problem.peek_lp_ranges() removed. The method rebuilt the full LP into numpy arrays via the non-streaming path purely to extract coefficient ranges — duplicate work the streaming solve already does. Stream-time range accumulation now populates Solution.streamed_lp_ranges with the same four (abs_min, abs_max) tuples (matrix, cost, col_bound, row_bound) at zero extra cost on every solve() that goes through _solve_streaming (the default). Callers that previously relied on peek_lp_ranges() for diagnostics should read sol.streamed_lp_ranges after solve() returns; the module helper polar_high.engine._recommend_user_bound_scale consumes the (lo, hi) of the col_bound entry for the geo-midpoint heuristic. The top_k > 0 per-coefficient name-lookup variant of peek_lp_ranges has no streaming-time replacement; if needed, build the LP via the non-streaming path (solve(streaming=False)) and inspect via the solver-specific HiGHS diagnostics.

[1.2.0] — 2026-05-12

Added

  • polar_high.solvers module: multi-solver dispatch behind a single solve(problem, solver_name=..., io_api=..., env=..., **options) entry point. HiGHS remains the default; Gurobi, CPLEX, FICO Xpress, and COPT are supported on a bring-your-own-license basis (we ship no binaries and no licenses).
  • polar_high.solvers.available_solvers: runtime registry of installed solver Python wrappers, populated at import time. Tells you which wrappers are installed; license checks fire inside the adapter.
  • IOMode.MPS file-based fallback for users with a solver's CLI binary on PATH but no matching Python wrapper. Writes a temp MPS via highspy, invokes the CLI, parses the resulting .sol file. Covers gurobi_cl, cplex, Xpress optimizer, and copt_cmd.
  • polar_high.solvers._lp_view.LpView: frozen, solver-agnostic extraction surface that every adapter consumes. Engine-private attribute access (Problem._build_lp_arrays etc.) is confined to this single module.
  • Optional install extras: polar-high[gurobi], polar-high[cplex], polar-high[xpress], polar-high[copt]. Each pulls only the vendor's Python wrapper (plus scipy where vectorized loads need it).
  • docs/guide/solvers.md: user-facing guide covering detection, per-solver install, the io_api='mps' escape hatch, the env= passthrough (Gurobi WLS example), and license troubleshooting.

Changed

  • Problem.solve(streaming=False) now routes through polar_high.solvers._highs.run. Behaviour and return type unchanged — streaming=True retains the existing HiGHS-only per-family addRows path.
  • COPT adapter auto-routes through the copt_cmd CLI fallback whenever highspy is already loaded in the interpreter. COPT 8.x's native core conflicts with HiGHS in-process (Highs.run() segfaults once coptpy is imported); the auto-route keeps both solvers usable from the same polar-high venv at the cost of a per-solve MPS write + subprocess invocation. Requires copt_cmd on PATH (not shipped by the coptpy pip wheel); a clean SolverNotAvailableError is raised when it is missing. Details in docs/guide/solvers.md.

[1.1.4] — 2026-05-11

Added

  • Problem.peek_lp_ranges(): build the LP into numpy arrays and return the abs-value ranges of finite non-zero entries on each axis (matrix, cost, bounds, rhs) — same numbers HiGHS prints in its "Coefficient ranges" diagnostic, but available before passModel() runs. Optional top_k returns the worst offenders per axis as (abs_value, col_name, row_name_or_side) triples. Lets callers pick user_bound_scale / user_cost_scale or refuse to solve a catastrophically scaled LP without paying for a full solve. Uses np.argpartition so the cost is O(n_nonzeros).
  • .github/dependabot.yml: weekly dependency PRs for GitHub Actions and Python (pip) ecosystems. The initial commit (c3836f5) was the GitHub-provided template with an empty package-ecosystem; this release fills it in so the bot actually opens PRs.
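
The range and top_k computation described for peek_lp_ranges() can be approximated on a plain array of coefficients. This is a sketch of the technique only (absolute-value range over finite non-zero entries, then np.argpartition for the largest entries), not the library's implementation. The function names abs_range and top_k_abs are hypothetical, and the sketch ranks only the largest magnitudes and returns positions rather than (abs_value, col_name, row_name_or_side) triples.

```python
import numpy as np


def abs_range(values):
    """(min, max) of |v| over finite non-zero entries, or None if none exist."""
    a = np.abs(np.asarray(values, dtype=float))
    a = a[np.isfinite(a) & (a > 0)]
    if a.size == 0:
        return None
    return float(a.min()), float(a.max())


def top_k_abs(values, k):
    """Positions of the k largest finite |v|, largest first."""
    a = np.abs(np.asarray(values, dtype=float))
    a = np.where(np.isfinite(a), a, 0.0)  # treat inf / nan as absent
    k = min(k, a.size)
    if k == 0:
        return []
    # argpartition moves the k largest entries into the last k slots in
    # linear time; only those k entries are then sorted.
    idx = np.argpartition(a, a.size - k)[a.size - k:]
    idx = idx[np.argsort(a[idx])[::-1]]
    return idx.tolist()


if __name__ == "__main__":
    coeffs = [0.0, 1e-6, -3.0, np.inf, 2e4, np.nan]
    print(abs_range(coeffs))     # range over the three finite non-zero entries
    print(top_k_abs(coeffs, 2))  # positions of the two largest magnitudes
```

A wide ratio between the two numbers returned by abs_range is the kind of signal a caller might use to choose scaling options or to refuse the solve.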

Changed

  • engine.py: factor the non-streaming LP-build out of solve() into a private _build_lp_arrays() helper. solve() and peek_lp_ranges() now share the same arrays — diagnostics are byte-for-byte what HiGHS sees.
  • engine.py: for constraint families with > 50 000 rows, collect term plans one at a time instead of in a single pl.collect_all call. Peak memory drops from O(n_terms × frame) to O(frame), preventing stalls under memory pressure on large network models.
  • engine.py: HiGHS output is no longer suppressed via h.silent(); solver progress and the "Coefficient ranges" line now print to stdout by default. Pass options={"output_flag": False} to silence.

[1.1.3] — 2026-05-07

Changed

  • docs/guide/debugging.md: expanded with worked examples; doc snippets are now wired to test fixtures (tests/fixtures/debug_example.py, tests/fixtures/lagrangian_example.py, tests/fixtures/quickstart_example.py) so they're exercised by the test suite and can't silently rot.
  • mkdocs.yml: drop dedent_sections from the snippets pymdownx config — incompatible with the multi-fixture snippet layout.

[1.1.2] — 2026-05-05

Added

  • docs/guide/loading-data.md: new guide page on going from CSV / parquet / database tables to Param and Var, including the long-format vs. wide-format trade-off and how column names become dimension names.

Changed

  • docs.yml: drop the dev alias deploy on main pushes; only tagged releases publish a versioned doc site.

[1.1.1] — 2026-05-05

Fixed

  • pyproject.toml: add Python 3.13 classifier. CI's test matrix already covers 3.13; the classifier was missing so the pyversions badge was reading "3.11 | 3.12" only.
  • release.yml: skip-existing: true on the PyPI publish step. Re-tagging the same version now no-ops on PyPI's duplicate-file rejection instead of showing the run as failed.

[1.1.0] — 2026-05-05

Changed

  • BREAKING: renamed package polar-high-opt → polar-high (Python module polar_high_opt → polar_high). All imports, the PyPI install name, and the repo and docs URLs move with it.
  • BREAKING: Problem.solve() defaults changed: streaming=True (per-family addRows instead of one big passModel; lower peak memory; numerically identical) and keep_solver=False (the live highspy.Highs is dropped after primal/dual extraction; pass keep_solver=True to retain it for post-solve inspection like sol.highs.writeModel(...)).
  • BREAKING: polar_high sets POLARS_MAX_THREADS=1 at import. Rayon coordination overhead exceeds the parallel speedup on typical LP-build workloads (see benchmark page). Override by setting the env var before import polar_high.
  • COO row/column indices use int32 when nnz < 2^31, falling back to int64 only when needed. Cuts working-set memory in the matrix-assembly phase.
  • _Term.frame cache is no longer populated during Problem.solve() — the lazy plan is collected into a local that goes out of scope per family. Re-solves rebuild from the lazy plan as before.
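
The threading override mentioned above depends on ordering: the environment variable has to be set before polar_high (and therefore Polars) is first imported in the process. A minimal sketch of that pattern follows. The value "4" is illustrative, and the try/except is there only so the snippet also runs where polar-high is not installed.

```python
import os

# Must run before the first `import polar_high` in this process. Polars reads
# POLARS_MAX_THREADS once, when it is first imported.
os.environ["POLARS_MAX_THREADS"] = "4"  # illustrative value

try:
    import polar_high  # noqa: F401
except ImportError:
    polar_high = None  # package absent; the ordering pattern is unchanged

print(os.environ["POLARS_MAX_THREADS"])
```

Setting the variable in the shell or in the process launcher achieves the same thing and avoids depending on import order inside the application.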

Added

  • Benchmark suite under benchmark/: dense N×N LP (replicates linopy's benchmark) and a sparse network-flow LP with irregular edge→node topology. Reproducible via subprocess-isolated cells in benchmark/run.py; figures rendered by benchmark/plot.py.
  • New docs/compare/benchmark.md with five figures and the story for each (build-only headline, threads scaling at fixed N, threading benefit on the network LP, network LP, linopy-format replication).
  • Threading section in docs/guide/performance.md documenting the default-1 choice and the override pattern.
  • Tiny dispatch LP (wind + coal × 3 hours) replaces the abstract i / j placeholder in README and docs/quickstart.md.

[1.0.1] — 2026-05-05

Added

  • GitHub Actions: tests on push/PR (Python 3.11–3.13), docs deploy on main + tag (mike), PyPI release on tag (trusted publishing).
  • Ruff lint + format configured in pyproject.toml; [lint] optional-dependency added.
  • README badges: PyPI version, Python versions, license, tests CI, docs CI, ruff.

Changed

  • Repo / docs URLs moved from jkiviluo/polar-high to nodal-tools/polar-high; documentation site is hosted at https://nodal-tools.fi/polar-high/.
  • One-time ruff format reflow across the source tree.

Fixed

  • Dead intra-doc anchor link in guide/performance.md (the vars-and-params.md "Param × Param" heading slugifies to a single hyphen, not two).

Removed

  • Two dangling unused locals (engine.py and test_warm_problem.py).

[1.0.0] — 2026-05-05

First public release.

Added

  • Var, Param, Expr — building blocks for indexed expressions expressed as polars DataFrames.
  • Sum, Where, Lag — aggregation, filtering, and time-shift primitives that compile to LP rows efficiently.
  • Problem — assemble an LP/MIP and solve via HiGHS (highspy).
  • WarmProblem — re-solve with parameter / RHS / objective updates while preserving the basis.
  • LagrangianProblem — generic dual-subgradient driver for Lagrangian decomposition of coupled subproblems.
  • Solution — primal values, constraint duals, reduced costs, and a live highspy.Highs handle for advanced post-solve inspection.
  • MkDocs + mike documentation site under docs/.