Pythscan

Anyone heard of Joey Chestnut (@joeyjaws)? The American who holds 55 world records in competitive eating? Yeah, probably not, but trust me on this one - his resume is remarkable in a “should this be humanly possible?” sort of way. He’s a 9x World Rib Eating Champion, a 6x US Chicken Wing Eating Champion, a 3x Wing Bowl Champion, and let’s not forget, a 17x champion in the Nathan’s Hot Dog Eating Contest. That’s right, the annual hot dog-eating competition in Coney Island, New York, where competitors attempt to consume (and keep down) a shit ton of hot dogs and buns in 10 minutes. Chestnut’s 2025 results? How about an astounding 70.5 hot dogs and buns! Insane!

Even if I could, I’m not sure I would eat that many hot dogs. For starters, they’re made from the trimmings of pork, beef, and poultry after the prime cuts are removed. Not to mention the added salt, binders, and preservatives. Don’t get me wrong, I absolutely love a tasty hot dog. Grilled with a perfect char, topped with mustard, onions, and some relish. I could probably eat 3-4 in one sitting. They’re fuckin great, but again, not the best quality meat. Looks good, tastes good, but still a product repackaged for the consumer.

Similarly, decentralized oracle networks (DONs) might advertise their price data as reliable, but many of them simply “scrape” and aggregate it from third-party sources. Just as hot dogs are emulsified from trimmings and encased into products far from their original state, price feeds packaged this way lead to data inaccuracies, greater security risks, and higher operational costs.¹˒²

Third-Party vs First-Party Data Models¹˒²˒³˒⁴˒⁵˒⁶
“Growth in DeFi requires high-fidelity, time-sensitive, real-world data, direct from the source and made available on any L1 blockchain.”

In addition to push vs pull architecture, another fundamental difference among DONs is the source of the data they provide. Oracle systems have historically depended on third-party sources, operating as reporter networks. In this setup, DONs rely on independent node operators to scrape price data from public aggregators (@coingecko or @CoinMarketCap). Once the nodes establish an off-chain consensus, they relay that information on-chain. In contrast, first-party or publisher networks eliminate these “middlemen” entirely, sourcing data directly from the institutions that generate it. What’s a great example of a publisher network? @PythNetwork. Using its pull (on-demand) architecture, Pyth sources proprietary price data from 130+ institutional publishers, including major trading firms, exchanges, and market makers (e.g., @JaneStreetGroup, @Cboe, @Tradeweb). Because first-party providers operate their own nodes and publish their own data, Pyth leverages its publishers’ reputations to back data reliability.

Oracle Telephone Game¹˒⁵
The limitations of third-party sourcing models are best understood using the “telephone game” concept. As we’ve already discussed, third-party aggregator models utilize multiple middlemen to deliver price updates to oracle smart contracts.

A quick recap of data transmission in third-party models (see figure 1):
Price data originates at an exchange → is scraped and compiled by a public aggregator (e.g., CoinGecko) → is retrieved by independent oracle nodes/operators → is aggregated off-chain for a consensus → is pushed on-chain.

The above sequence highlights how data reliability degrades as it passes through public aggregators and node operators, all before it’s ever delivered on-chain. The key takeaway? In third-party models, DONs depend on independent node infrastructure that itself relies on intermediaries. And it’s this continued confidence in intermediaries that exposes these models to operational risk.
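To make the telephone game a bit more concrete, here’s a rough Python sketch of how delay stacks up across the hops in the recap above. The hop names follow that recap, but the delay values are invented placeholders for illustration, not measurements of any real oracle network.

```python
# Hypothetical per-hop delays in seconds. These numbers are assumptions
# chosen for illustration only, not measured values.
HOPS = [
    ("exchange -> public aggregator", 5.0),
    ("public aggregator -> oracle node", 10.0),
    ("off-chain consensus", 3.0),
    ("on-chain push", 12.0),
]


def cumulative_staleness(hops):
    """Return (hop name, seconds elapsed since the original trade) per hop."""
    elapsed = 0.0
    timeline = []
    for name, delay in hops:
        elapsed += delay
        timeline.append((name, elapsed))
    return timeline


timeline = cumulative_staleness(HOPS)
total_delay = timeline[-1][1]

for name, elapsed in timeline:
    print(f"{name}: price is now {elapsed:.0f}s old")
```

The point isn’t the specific numbers. Every extra hop only ever adds age to the price, so the more middlemen in the chain, the staler the data that finally lands on-chain.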

Consequences of Using Third-Party Systems⁶˒⁷˒⁸˒⁹˒¹⁰
Poor Quality / Unclear Sourcing:
Here’s a fun fact – public aggregators such as @coingecko and @CoinMarketCap DO NOT generate price data. Again (on repeat), public aggregators don’t generate price data! They scrape it from various exchanges, calculate a volume-weighted average price (VWAP), and broadcast the packaged result for consumption. By the time node operators reach a consensus and push the data on-chain, it’s already stripped of its freshness and trails real-time market prices. To make matters worse, if aggregators use exchanges that permit wash trading, the VWAP is also highly skewed.
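For anyone who wants to see the VWAP math, here’s a minimal Python sketch. VWAP is just the sum of price × volume divided by total volume. The exchange names, prices, and volumes below are made up for illustration, and real aggregators may apply their own filters and weightings on top of this.

```python
from dataclasses import dataclass


@dataclass
class VenueQuote:
    venue: str     # hypothetical exchange name
    price: float   # last traded price
    volume: float  # reported traded volume


def vwap(quotes):
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total_volume = sum(q.volume for q in quotes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(q.price * q.volume for q in quotes) / total_volume


# Two venues reporting organic volume (illustrative numbers).
honest = [
    VenueQuote("exchange_a", 100.0, 1_000.0),
    VenueQuote("exchange_b", 100.2, 800.0),
]

# A third venue with inflated (wash-traded) volume at an off-market price.
wash = VenueQuote("exchange_c", 104.0, 5_000.0)

clean_vwap = vwap(honest)
skewed_vwap = vwap(honest + [wash])

print(f"VWAP without wash venue: {clean_vwap:.2f}")
print(f"VWAP with wash venue:    {skewed_vwap:.2f}")
```

Because the weighting is by reported volume, one venue with fake volume can drag the average toward whatever price it prints, even when the other venues agree with each other.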

Data Concentration (see figure 2):
To avoid single points of failure, third-party DONs aggregate data from multiple independent node operators. While this design helps maintain decentralization at the node level, the underlying data sources remain highly concentrated. This leaves the network at risk of delivering unreliable price data. A DON with multiple independent nodes/operators does not protect against poor data quality if all the nodes rely on the same source(s).
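Here’s a small Python sketch of that concentration problem. It assumes the network takes the median of its node reports (a common aggregation choice, but an assumption here), and the source names and prices are hypothetical.

```python
from statistics import median


def node_reports(source_prices, node_sources):
    """Each node simply reports the price of the source it scrapes."""
    return [source_prices[source] for source in node_sources]


# Hypothetical upstream sources; aggregator_x is serving a bad price.
source_prices = {
    "aggregator_x": 97.0,
    "aggregator_y": 100.0,
    "aggregator_z": 100.1,
}

# Seven independent nodes that all scrape the same source.
concentrated = ["aggregator_x"] * 7

# Seven nodes spread across three sources.
diverse = [
    "aggregator_x", "aggregator_y", "aggregator_z",
    "aggregator_y", "aggregator_z", "aggregator_y", "aggregator_z",
]

concentrated_answer = median(node_reports(source_prices, concentrated))
diverse_answer = median(node_reports(source_prices, diverse))

print(f"7 nodes, 1 source:  {concentrated_answer}")
print(f"7 nodes, 3 sources: {diverse_answer}")
```

Seven nodes agreeing with each other looks like strong consensus, but when they all read from the same feed, the consensus is only as good as that one feed.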

The Middleman Tax, Additional Costs, and Scaling Limitations:
Retrieving data through middlemen introduces a few economic barriers for the DONs that use them. One noticeable impact is the middleman tax. This is best described as the financial incentives node operators need to keep running: funds to cover server maintenance, gas fees, and their own profit margins. While these incentives are designed to deter malicious node behavior, they provide no additional security benefits and make it increasingly difficult to scale the oracle network effectively.

Exploitation Risks:
→ Low Liquidity Markets: Public aggregators scraping data from low-liquidity exchanges are susceptible to flash loan manipulation.

→ Arbitrage: Data latency seen in third-party models creates arbitrage opportunities at the expense of regular users.

Push vs Pull Oracle Architecture
Although I’ve discussed this topic in greater detail before (see Push vs Pull: The Oracle Architecture that Changed Everything), it’s important to note that third-party aggregation models typically use push-based architecture, while first-party models often use a pull-based approach. This directly impacts operational performance, since pull-based systems are designed to optimize price data reliability, gas efficiency, update frequency, latency, and asset scalability across multiple blockchains.

Closing Remarks
Well, what have we learned? How about one simple truth - third-party aggregation models are not the answer for reliable price data. They act purely as reporter networks, relying on public aggregators and independent node operators (middlemen) to source it. Additionally, they often deploy inferior, push-based architecture that introduces significant challenges to data delivery. The solution? Publisher networks that use first-party data sources plus pull (on-demand) architecture. To be honest, it’s time to retire the legacy hot dog models. I want my on-chain price data to be prime cut, and frankly, I bet even Joey Chestnut would want that too.

References:

Cutting Out The Middlemen: Why First-Party Data Matters More Than You Think