Introducing the adstxt table: a simple alternative to ads.txt crawling

Nariman Jelveh April 3, 2019

It is with great pleasure that I announce the newest addition to the Mixnode data family. Starting today, you can use Mixnode to run SQL queries over hundreds of thousands of ads.txt files from all around the web.

What is ads.txt?

Authorized Digital Sellers (also known as ads.txt) is a simple yet effective standard for increasing transparency in the online advertising ecosystem. Inspired by the robots.txt standard and developed by IAB Tech Lab, the industry's leading technology and standards developer, ads.txt can remove the financial incentive to sell counterfeit inventory.

The ads.txt standard allows content owners to declare their authorized advertising platforms and resellers by listing them in a text file named 'ads.txt', which is placed at the root of the content owner's website. Each record in the file lists the advertising system's domain, the publisher's account ID on that system, the type of relationship (DIRECT or RESELLER), and, optionally, the advertising system's certification authority ID. For example, the list of authorized direct ad platforms and resellers for reuters.com is available at https://www.reuters.com/ads.txt and contains the following data:

exponential.com,163960,DIRECT,afac06385c445926
tribalfusion.com,163960,DIRECT,afac06385c445926
indexexchange.com,176280,RESELLER,50b1c356f2c5c8fc
google.com,pub-2051007210431666,RESELLER,f08c47fec0942fa0
google.com,pub-3746578658400510,RESELLER,f08c47fec0942fa0
indexexchange.com,185292,RESELLER
smaato.com,1100036918,DIRECT
google.com,pub-8200574565762874,DIRECT,f08c47fec0942fa0
spotxchange.com,152279,DIRECT,7842df1d2fe2db34
Spotx.tv,152279,DIRECT,7842df1d2fe2db34
rubiconproject.com,11384,DIRECT,0bfd66d529a55807
openx.com,537136463,DIRECT,6a698e2ec38604c6
openx.com,538986825,DIRECT,6a698e2ec38604c6
openx.com,537146938,DIRECT,6a698e2ec38604c6
Indexexchange.com,184971,DIRECT,50b1c356f2c5c8fc
rhythmone.com,301820969,DIRECT,a670c89d4a324e47
rhythmone.com,1575427301,DIRECT,a670c89d4a324e47
yieldmo.com,Reuters.com,DIRECT
...

You can find more details about the ads.txt standard in the official Authorized Digital Sellers (ads.txt) specification from IAB Tech Lab.
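To make the record format above concrete, here is a minimal Python sketch of a parser for ads.txt data records. The names AdsTxtRecord and parse_ads_txt are hypothetical, and the parser is deliberately simplified: it assumes one record per line, treats '#' as the start of a comment, assumes any extension data follows a ';' and drops it, and skips variable lines such as contact=... rather than interpreting them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AdsTxtRecord:
    ad_system_domain: str             # domain of the advertising system
    publisher_id: str                 # publisher's account ID on that system
    relationship: str                 # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]  # optional certification authority ID


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Simplified parser for the data records of an ads.txt file."""
    records: List[AdsTxtRecord] = []
    for raw_line in text.splitlines():
        # Strip comments ('#' to end of line) and extension data
        # (assumed to follow a ';'), then surrounding whitespace.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        # Variable lines such as "contact=..." carry no comma-separated
        # record fields, so this sketch simply skips them.
        if len(fields) < 3:
            continue
        cert_id = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(
                # Domain names are case-insensitive, so normalize them
                # ("Spotx.tv" becomes "spotx.tv").
                ad_system_domain=fields[0].lower(),
                # Account IDs are kept verbatim.
                publisher_id=fields[1],
                relationship=fields[2].upper(),
                cert_authority_id=cert_id,
            )
        )
    return records


# Three lines taken from the reuters.com example above.
sample = """\
exponential.com,163960,DIRECT,afac06385c445926
indexexchange.com,185292,RESELLER
Spotx.tv,152279,DIRECT,7842df1d2fe2db34
"""
for record in parse_ads_txt(sample):
    print(record)
```

Note that the second record has no certification authority ID, so its cert_authority_id is None.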

An alternative to ads.txt crawling

Aggregating ads.txt files from all around the web is a common practice, both for anti-fraud applications and for gaining insight into the world of publishers, advertising platforms, and resellers. However, it often requires a significant investment in building a massive-scale web crawler that runs across a cluster of machines. Additionally, because of numerous politeness and rate-limiting considerations, the crawler must be operated with near-perfect precision to avoid overloading websites or accessing them without authorization.

The new Mixnode adstxt table is designed to be a simple, fast, and cost-effective alternative to building and running your own ads.txt crawler and aggregator. Rather than building the infrastructure required to extract ads.txt files from all around the web, you can simply write standard SQL queries against the adstxt table to find trends, insights, and patterns.

Using SQL, you can get a bird's-eye view of the entire ads.txt ecosystem in the wild. For example, the following query finds all websites that use the AppNexus exchange in a matter of seconds:

select 
    url_host 
from 
    adstxt 
where 
    content like '%appnexus.com%'
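If you want to see what this query does before running it on Mixnode, the following Python sketch reproduces it locally with the standard library's sqlite3 module. It assumes a minimal, hypothetical adstxt table with only the two columns the query uses (url_host and content); the hosts and file contents are invented toy data, not real Mixnode rows. SQLite's LIKE is case-insensitive for ASCII characters by default, which may differ from the engine behind the adstxt table.

```python
import sqlite3

# In-memory stand-in for the adstxt table (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("create table adstxt (url_host text, content text)")
conn.executemany(
    "insert into adstxt (url_host, content) values (?, ?)",
    [
        ("www.example-news.com", "appnexus.com,1234,DIRECT\ngoogle.com,pub-1,RESELLER"),
        ("www.example-blog.com", "google.com,pub-2,DIRECT"),
        ("www.example-sports.com", "openx.com,999,DIRECT\nappnexus.com,5678,RESELLER"),
    ],
)

# The same query as above: every host whose ads.txt mentions appnexus.com.
rows = conn.execute(
    "select url_host from adstxt where content like '%appnexus.com%'"
).fetchall()
hosts = sorted(host for (host,) in rows)
print(hosts)  # ['www.example-news.com', 'www.example-sports.com']
```

The results are sorted in Python only to make the output deterministic; without an ORDER BY clause, SQL does not guarantee row order.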

Similarly, if you need to find all websites using AppNexus and OpenX, you can simply extend your query with another LIKE condition:

select 
    url_host 
from 
    adstxt 
where 
    content like '%appnexus.com%'
    and
    content like '%openx.com%'
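LIKE conditions match anywhere in the file, so a file that mentions openx.com only in a comment, or lists a longer domain that merely contains that string, is also returned. If you need record-level precision, one option when experimenting locally is to register a small Python function with sqlite3 and compare the first field of each record instead. The function name lists_ad_system, the schema, and the rows below are hypothetical, and this sketch does not imply that the adstxt table on Mixnode supports user-defined functions.

```python
import sqlite3


def lists_ad_system(content, domain):
    """Return 1 if any record in `content` names `domain` as its advertising
    system (first field), else 0. Simplified: ignores comments, lines with
    fewer than three fields, and extension data after ';'."""
    if content is None:
        return 0
    target = domain.strip().lower()
    for raw_line in content.splitlines():
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 3 and fields[0].lower() == target:
            return 1
    return 0


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as lists_ad_system(content, domain).
conn.create_function("lists_ad_system", 2, lists_ad_system)
conn.execute("create table adstxt (url_host text, content text)")
conn.executemany(
    "insert into adstxt values (?, ?)",
    [
        # Lists both exchanges as advertising systems.
        ("www.example-a.com", "appnexus.com,1,DIRECT\nopenx.com,2,RESELLER"),
        # Mentions openx.com only in a comment.
        ("www.example-b.com", "appnexus.com,3,DIRECT\n# openx.com pending"),
        # Lists AppNexus only.
        ("www.example-c.com", "appnexus.com,4,DIRECT"),
    ],
)

like_hosts = sorted(h for (h,) in conn.execute(
    "select url_host from adstxt "
    "where content like '%appnexus.com%' and content like '%openx.com%'"))
record_hosts = sorted(h for (h,) in conn.execute(
    "select url_host from adstxt "
    "where lists_ad_system(content, 'appnexus.com') "
    "and lists_ad_system(content, 'openx.com')"))
print(like_hosts)    # ['www.example-a.com', 'www.example-b.com']
print(record_hosts)  # ['www.example-a.com']
```

The substring query is cheaper and usually good enough for exploration; the record-level check trades speed for fewer false positives.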

SQL lets you write and execute queries with as many conditions as your requirements call for. With Mixnode's adstxt table, you only need to compose the right SQL query, rather than build, run, and maintain a full-blown distributed web crawler and data processing pipeline.
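When the list of exchanges grows, writing each LIKE condition by hand gets tedious. The hypothetical helper below, build_adstxt_query, generates one parameterized condition per domain and joins them with AND (every exchange required) or OR (any exchange is enough). It is a sketch run against a local sqlite3 table with the same assumed url_host and content columns, using invented rows. Whether Mixnode accepts '?' placeholders is not covered here, so in practice you may need to inline the patterns as string literals, as in the queries above.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_adstxt_query(
    domains: Sequence[str], require_all: bool = True
) -> Tuple[str, List[str]]:
    """Return (sql, params) with one LIKE condition per domain against a
    hypothetical adstxt(url_host, content) table."""
    if not domains:
        raise ValueError("at least one domain is required")
    # AND: the file must mention every domain; OR: any one is enough.
    joiner = " and " if require_all else " or "
    where = joiner.join("content like ?" for _ in domains)
    # Patterns are passed as parameters, not pasted into the SQL text.
    params = [f"%{domain}%" for domain in domains]
    return f"select url_host from adstxt where {where}", params


# Local toy table for trying the helper (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("create table adstxt (url_host text, content text)")
conn.executemany(
    "insert into adstxt values (?, ?)",
    [
        ("www.example-a.com",
         "appnexus.com,1,DIRECT\nopenx.com,2,DIRECT\nrubiconproject.com,3,RESELLER"),
        ("www.example-b.com", "appnexus.com,4,DIRECT\nopenx.com,5,DIRECT"),
        ("www.example-c.com", "rubiconproject.com,6,DIRECT"),
    ],
)

exchanges = ["appnexus.com", "openx.com", "rubiconproject.com"]
sql, params = build_adstxt_query(exchanges)
all_three = sorted(h for (h,) in conn.execute(sql, params))
sql_any, params_any = build_adstxt_query(exchanges, require_all=False)
any_of = sorted(h for (h,) in conn.execute(sql_any, params_any))
print(sql)
print(all_three)  # ['www.example-a.com']
print(any_of)     # ['www.example-a.com', 'www.example-b.com', 'www.example-c.com']
```

Keeping the patterns out of the SQL text avoids quoting problems when the domain list comes from user input or another file.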

Turn the web into a database!

Mixnode is a fast, flexible and massively scalable platform to extract and analyze data from the web.

Contact us at hi@mixnode.com.