What this data is, where it came from, and why you should trust it
This database was built from a public-records release by the Arizona Department of Education.
Here is exactly what we did, what we verified, and where confidence is lower.
The Source
ADE released its full Marketplace transaction history as
Marketplace-ESA-purchases-full-final.pdf — a 74.15 MB, 9,411-page PDF.
The file contains every line-item purchase made through the ESA Marketplace portal.
We verified the source file using SHA-256:
Public records law required ADE to produce this data. ADE complied — technically.
They produced a PDF. Not a spreadsheet. Not a database export. Not a CSV.
A PDF spanning 9,411 pages, in a format that is readable by a human with a very long afternoon
and completely resistant to bulk analysis without custom extraction tooling.
That choice was not accidental. A database export would take minutes to produce.
A 74-megabyte PDF takes infrastructure, validation, and weeks to make usable.
We built that infrastructure. You're looking at the result.
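Anyone holding a copy of the PDF can repeat the SHA-256 check described above. The following is a minimal sketch, not the code used for this project: the function name `sha256_of_file` and the 1 MiB chunk size are illustrative choices, and the digest it prints only means something when compared against a published reference value.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Chunked reads keep memory use flat even for a 74 MB file.
    The chunk size (1 MiB) is an arbitrary illustrative choice.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example call (assumes the PDF sits in the working directory):
# print(sha256_of_file("Marketplace-ESA-purchases-full-final.pdf"))
```

If the printed digest differs from the reference digest by even one character, the file is not byte-for-byte the same document.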
What We Did
We built an automated PDF text-extraction pipeline, parsed each page's tabular content,
derived a price (REAL) column from the raw
price_str text, and validated the entire output.
The dataset preserves original PDF text, including any OCR artifacts, as forensic evidence
of the source document's state.
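The price derivation step can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's actual code: it assumes price_str values look like "$1,999.99" (dollar sign, comma thousands separators, two decimals), and the helper names `parse_price` and `round_trips` are hypothetical.

```python
from decimal import Decimal, InvalidOperation


def parse_price(price_str):
    """Convert a raw price string such as "$1,999.99" to a float.

    Returns None when the text is not a plain finite number, so that
    unparseable rows surface for review instead of being guessed at.
    """
    cleaned = price_str.strip().replace("$", "").replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    if not value.is_finite():  # reject "NaN" / "Infinity" text
        return None
    return float(value)


def round_trips(price_str):
    """Check that the parsed value formats back to the original text.

    Assumes the canonical source format is "$#,###.##".
    """
    price = parse_price(price_str)
    return price is not None and f"${price:,.2f}" == price_str.strip()
```

Counting the rows where `round_trips` returns False is one way to produce a mismatch figure like the one reported under Validation & Confidence.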
Validation & Confidence
Price Data: High. 0 mismatches between price_str and parsed price across 650,031 rows. 0 sort-order violations.
Item Descriptions: Good. Preserved verbatim from PDF. Some multi-line descriptions may be truncated at page breaks.
Vendor Extraction: Good. Extracted from "(taught by X)" pattern in tutoring line items only. NULL for all non-tutoring rows.
Completeness: Good. Total spend: $49,237,214.26. We have not independently verified this matches ADE's own totals.
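The "(taught by X)" vendor extraction can be illustrated with a short regular expression. This is a sketch, not the project's actual pattern: the name `extract_vendor`, the case-insensitive match, and the sample descriptions in the comments are all assumptions made for illustration.

```python
import re

# Matches "(taught by <name>)" and captures <name>.
# Case-insensitivity and whitespace trimming are illustrative assumptions.
TAUGHT_BY = re.compile(r"\(taught by\s+([^)]+?)\s*\)", re.IGNORECASE)


def extract_vendor(description):
    """Return the name inside "(taught by ...)", or None if absent.

    None maps naturally to NULL for rows with no such pattern.
    """
    match = TAUGHT_BY.search(description)
    return match.group(1) if match else None


# Hypothetical descriptions, not taken from the dataset:
# extract_vendor("Math tutoring, 1 hr (taught by Jane Doe)")  -> "Jane Doe"
# extract_vendor("Graphing calculator")                       -> None
```

A pattern this narrow trades coverage for precision: it never guesses a vendor from free text, which is why every row without the phrase stays NULL.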
What the Flags Mean
Arizona ESA program rules require additional documentation for purchases above certain dollar thresholds.
We derived flags for transactions priced just below those levels — a pattern consistent with
deliberately keeping individual purchases beneath documentation triggers.
$1,990 – $1,999.99 — within $10 below the $2,000 threshold
$990 – $999.99 — within $10 below the $1,000 threshold
$490 – $499.99 — within $10 below the $500 threshold
$95 – $99.99 — within $5 below the $100 threshold
Exactly $1,999.99 — a subset of the above, flagged separately
These flags do not prove intent. They are a signal worth examining, not a verdict.
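The flag bands listed above can be expressed as a small lookup. This is a sketch under assumptions, not the code used to build the database: the flag names (`near_2000`, `exactly_1999_99`, and so on) and the function `flags_for` are hypothetical, and both ends of each band are treated as inclusive.

```python
from decimal import Decimal

# (flag name, inclusive lower bound, inclusive upper bound)
# Flag names are illustrative; the bands come from the list above.
NEAR_THRESHOLD_BANDS = [
    ("near_2000", Decimal("1990.00"), Decimal("1999.99")),
    ("near_1000", Decimal("990.00"), Decimal("999.99")),
    ("near_500", Decimal("490.00"), Decimal("499.99")),
    ("near_100", Decimal("95.00"), Decimal("99.99")),
]


def flags_for(price):
    """Return the list of near-threshold flags that apply to a price.

    Decimal arithmetic avoids float rounding at the band edges,
    where a one-cent difference decides whether a row is flagged.
    """
    amount = Decimal(str(price)).quantize(Decimal("0.01"))
    flags = [name for name, low, high in NEAR_THRESHOLD_BANDS
             if low <= amount <= high]
    if amount == Decimal("1999.99"):
        # Subset of near_2000, recorded as its own flag.
        flags.append("exactly_1999_99")
    return flags
```

A price of exactly $2,000.00 falls outside every band, so it returns an empty list: the flags mark only amounts just under a threshold.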
What This Dataset Does Not Include
This dataset covers Marketplace transactions only. Direct-pay, reimbursement, and prepaid transactions
processed outside the Marketplace portal are not included.
The ADE 3,000-transaction statistical sample (a separate release) covers those payment types
and is analyzed separately.