Smart approach to the header problem. I hit something similar building a Netflix episode resolver: the data is all there in the HTML, but without the right headers you get nothing back.
Ended up just wrapping curl with the right User-Agent and it worked without needing a full browser.
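
Roughly this shape, as a minimal Python sketch rather than the actual resolver code. The function names, the User-Agent string, and the optional Referer parameter are placeholders for illustration; it assumes curl is on the PATH.

```python
import subprocess

# Placeholder browser-like User-Agent (an assumption; swap in whatever the site accepts).
DEFAULT_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def build_curl_command(url, user_agent=DEFAULT_UA, referer=None):
    """Build the curl argument list without running anything."""
    cmd = [
        "curl",
        "--silent",       # no progress meter
        "--show-error",   # but still print errors
        "--location",     # follow redirects
        "--fail",         # non-zero exit on HTTP errors
        "--user-agent", user_agent,
    ]
    if referer:
        cmd += ["--referer", referer]
    cmd.append(url)
    return cmd


def fetch_html(url, **kwargs):
    """Run curl and return the response body as text."""
    result = subprocess.run(
        build_curl_command(url, **kwargs),
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError if curl exits non-zero
        timeout=30,
    )
    return result.stdout
```

Calling `fetch_html("https://example.com/some/page")` would return the page HTML as a string, or raise if curl fails. Keeping the command builder separate makes it easy to check the headers without hitting the network.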
The auto-learned Referer per upstream host is a nice touch. How often do the upstream sources change their header requirements on you?