In Causal Impact analysis (specifically Time-Based Regression or TBR), the hardest part isn’t running the regression—it’s designing the experiment. Finding the perfect combination of Treatment locations and Control locations is a massive combinatorial optimization problem.
In this post, I will break down the engineering behind how the TBR Matched Markets library automates this search to find the most statistically robust experimental design.
0. Data Preparation: Cleaning & Feasibility
- keep only the most recent n_pretest_max data points for the experiments.
- calculate the impact for each geos. ()
1
2
3
4
5
6
7
8
9
10
11
12
13
def estimate_required_impact(y):
return TBRMMDiagnostics(y,
parameters).estimate_required_impact(
parameters.rho_max)
# Consider only the most recent n_pretest_max time points
data.df = data.df.iloc[:, -parameters.n_pretest_max:]
# Calculate the required impact estimates for each geo.
geo_req_impact = data.df.apply(estimate_required_impact, axis=1)
self.geo_req_impact = geo_req_impact
self.data = data
self.parameters = parameters
If a group of cities is already over budget, any larger group containing them will also be over budget. So, skip them
1. The Scoring Function: “Safety First”
How does the algorithm decide if Design A is better than Design B? It uses a Lexicographical Scoring system embodied in the TBRMMScore class.
Instead of a single number, the score is a tuple. Python compares tuples element-by-element, meaning the first element is the most important “gatekeeper.”
1
2
3
4
5
6
7
8
Scoring(
corr_test, # Priority 1: Correlation Test (0 or 1). Is the relationship real?
aa_test, # Priority 2: A/A Test (0 or 1). Does the model hallucinate lift on fake data?
bb_test, # Priority 3: Brownian Bridge (0 or 1). Is the model stable over time?
dw_test, # Priority 4: Durbin-Watson (0 or 1). Are the residuals random (no autocorrelation)?
corr, # Priority 5: Correlation (0.0 to 1.0). How tight is the fit?
inv_imp # Priority 6: Sensitivity (Higher is better). Can we detect small lifts?
)
The Logic: A design with a 0.99 correlation that fails the A/A Test (Priority 2) will always lose to a design with 0.80 correlation that passes all diagnostic tests. This ensures we never trade statistical validity for a pretty chart.
2. The Data Structure: A Fixed-Size Priority Queue
Searching generates thousands of potential designs. We cannot store them all. We need a “Leaderboard” that only keeps the top $N$ best results.
The algorithm uses a custom HeapDict.
- Key: Usually the group size or category.
- Value: A Min-Heap of fixed size.
When a new design is found, we push it onto the heap. If the heap is full, the worst design (the smallest in the min-heap) is instantly discarded. This keeps memory usage constant regardless of how long the search runs.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class HeapDict:
"""A dictionary of priority queues of a given limited size."""
def __init__(self, size: int):
self._size = size
self._result = collections.defaultdict(list)
def push(self, key: DictKey, item: Any):
"""
Push item. If the queue exceeds size, the worst item (smallest)
is automatically popped/discarded.
"""
queue = self._result[key]
if len(queue) < self._size:
heapq.heappush(queue, item)
else:
# Efficiently push new item and pop smallest item in one operation
heapq.heappushpop(queue, item)
self._result[key] = queue
3. The Search Algorithms
3.1. Greedy Search (Hill Climbing)
This method is fast and effective. It uses a “Stepwise” approach, alternating between optimizing the control group and expanding the treatment group.
-
Phase A: Optimize Control (Matching)
-
Goal: For a fixed Treatment group, find the best possible Control group.
-
Action: Iterate through available geographies. Use symmetric_difference to flip a geo’s status (add it if it’s out, remove it if it’s in).
-
Stop Condition: Keep swapping until no single change improves the TBRMMScore (a local maximum).
-
-
Phase B: Optimize Treatment (Growth)
-
Goal: Expand the experiment size.
-
Action: Add one new geography to the Treatment group that results in the best starting score.
-
Loop: Once a new geo is added, go back to Phase A to re-optimize the control group for this new, larger treatment group.
-
3.2. Exhaustive Search (Pruning)
This method attempts to find the global best solution by systematically checking valid combinations. However, a true brute-force search is impossible. To solve this, it uses Pruning (“Fail Fast” logic).
-
The “Fail Fast” Logic: It relies on the monotonicity of constraints (Budget and Volume).
- If {New York} is too expensive, then {New York, Boston} is definitely too expensive.
-
Implementation:
-
The algorithm remembers “bad” patterns (small groups that failed).
-
Before checking a larger group, it runs skip_if_subset().
-
If the new group contains a known “bad” pattern, it skips the expensive regression calculation entirely.
-
This allows the exhaustive search to explore millions of combinations while only calculating statistics for the viable ones.