
Introducing the homepages table

We are incredibly excited to announce the homepages table: a database table containing millions of homepages from all around the web, allowing you to skim and analyze a massive number of websites quickly and affordably using simple SQL queries.

The homepages table at a glance

The first page that comes up when you enter a website's address in your browser is the homepage of that website. A homepage carries many important signals about the website it represents, such as the site's category and language, the technologies it uses, and clues to broader trends and patterns across the web.

Every row in the homepages table maps directly to a website's homepage on the web. Not only does each row contain the raw HTML content of the homepage, but it also provides valuable metadata such as the domain name, hostname, page title, and description meta tag, which let you compose fine-grained queries that pinpoint exactly the data you need.

Example

The homepages table makes it incredibly simple and cost-effective to track the technologies used by different websites. Let's say you've created the next big WordPress plugin and now need to find WordPress websites to market it to. With a simple SQL query on the homepages table, you can create a list of millions of websites whose homepages declare WordPress in their generator meta tag:

select 
    url_host
from 
    homepages 
where 
    content like '%name="generator" content="WordPress%'
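To see what this pattern actually matches, here is a minimal local sketch using Python's built-in sqlite3 module. It is not the Mixnode API: it loads a few made-up rows into an in-memory table that models only the url_host and content columns used by the query above, then runs the same query. Because the pattern is a plain substring match, a homepage that writes the meta tag's attributes in a different order (or uses single quotes) will not be caught.

```python
import sqlite3

# Hypothetical sample rows standing in for the real homepages table;
# only the url_host and content columns used by the query are modeled.
SAMPLE_ROWS = [
    ("blog.example.com",
     '<head><meta name="generator" content="WordPress 6.4.2" /></head>'),
    ("shop.example.org",
     '<head><meta name="generator" content="Drupal 10" /></head>'),
    # Same WordPress tag, but with its attributes in a different order.
    ("news.example.net",
     '<head><meta content="WordPress 6.4.2" name="generator" /></head>'),
    ("static.example.io", "<head><title>No generator tag</title></head>"),
]

# The query from the post, unchanged.
WORDPRESS_QUERY = """
select url_host
from homepages
where content like '%name="generator" content="WordPress%'
"""


def find_wordpress_hosts(rows):
    """Load rows into an in-memory SQLite table and run the WordPress query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table homepages (url_host text, content text)")
        conn.executemany("insert into homepages values (?, ?)", rows)
        return [host for (host,) in conn.execute(WORDPRESS_QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(find_wordpress_hosts(SAMPLE_ROWS))  # ['blog.example.com']
```

Only blog.example.com is returned: news.example.net uses WordPress too, but its attributes appear in a different order, so the substring pattern misses it.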

If your plugin is designed only for the higher education market, you can add a condition on the url_etld column to target university websites under the .edu domain that use WordPress:

select 
    url_host
from 
    homepages 
where 
    content like '%name="generator" content="WordPress%'
    and
    url_etld = 'edu'

And if you only want WordPress websites whose homepages mention certain keywords, such as both 'bitcoin' and 'ethereum', you can narrow down your search even further:

select 
    url_host
from 
    homepages 
where 
    content like '%name="generator" content="WordPress%'
    and
    content like '%bitcoin%'
    and
    content like '%ethereum%'
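Each extra keyword simply adds another AND-ed LIKE condition, so queries like these are easy to generate from code. The sketch below uses a hypothetical helper, build_query, that produces a parameterized version of the queries above (generator tag, every keyword, and an optional url_etld filter), and runs it against made-up rows in a local in-memory SQLite database. Keep in mind that SQLite's LIKE is case-insensitive for ASCII characters by default, whereas an engine with case-sensitive LIKE would treat 'Bitcoin' and 'bitcoin' differently.

```python
import sqlite3

# Pattern used throughout the post to detect WordPress's generator meta tag.
GENERATOR_PATTERN = '%name="generator" content="WordPress%'


def build_query(keywords, etld=None):
    """Hypothetical helper: return (sql, params) requiring the WordPress tag,
    every keyword and, optionally, a specific eTLD.

    Note: '%' and '_' inside a keyword are not escaped, so they act as LIKE wildcards.
    """
    clauses = ["content like ?"]
    params = [GENERATOR_PATTERN]
    for word in keywords:
        clauses.append("content like ?")
        params.append(f"%{word}%")
    if etld is not None:
        clauses.append("url_etld = ?")
        params.append(etld)
    return "select url_host from homepages where " + " and ".join(clauses), params


# Hypothetical sample rows: (url_host, url_etld, content).
SAMPLE_ROWS = [
    ("crypto.example.com", "com",
     '<meta name="generator" content="WordPress 6.4" /> bitcoin and ethereum news'),
    ("coins.example.edu", "edu",
     '<meta name="generator" content="WordPress 6.4" /> a bitcoin research lab'),
    ("chain.example.org", "org",
     '<meta name="generator" content="Ghost 5.0" /> bitcoin and ethereum prices'),
]


def run(rows, keywords, etld=None):
    """Load rows into an in-memory table and return the matching hosts, sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table homepages (url_host text, url_etld text, content text)")
        conn.executemany("insert into homepages values (?, ?, ?)", rows)
        sql, params = build_query(keywords, etld)
        return sorted(host for (host,) in conn.execute(sql, params))
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(SAMPLE_ROWS, ["bitcoin", "ethereum"]))  # ['crypto.example.com']
    print(run(SAMPLE_ROWS, ["bitcoin"], etld="edu"))  # ['coins.example.edu']
```

Binding keywords as parameters instead of pasting them into the SQL string keeps quotes inside a keyword from breaking the query.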

As shown above, you can combine Mixnode's SQL capabilities with the columns of the homepages table to build arbitrarily specific queries over millions of homepages, whether you want to segment the web, gather statistics, or detect trends and patterns.
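The same filters also work with aggregation when you want statistics rather than a list of hosts. As a hedged sketch, the query below counts WordPress homepages per url_etld with a standard GROUP BY, assuming Mixnode's SQL dialect accepts this common syntax; here it runs against made-up rows in a local SQLite database via Python's sqlite3 module.

```python
import sqlite3

# Count WordPress homepages per eTLD, largest first (ties broken alphabetically).
STATS_QUERY = """
select url_etld, count(*) as wordpress_sites
from homepages
where content like '%name="generator" content="WordPress%'
group by url_etld
order by wordpress_sites desc, url_etld
"""

WP = '<meta name="generator" content="WordPress 6.4" />'

# Hypothetical sample rows: (url_host, url_etld, content).
SAMPLE_ROWS = [
    ("a.example.com", "com", WP),
    ("b.example.com", "com", WP),
    ("c.example.edu", "edu", WP),
    ("d.example.org", "org", '<meta name="generator" content="Drupal 10" />'),
]


def wordpress_sites_per_etld(rows):
    """Load rows into an in-memory table and return (url_etld, count) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table homepages (url_host text, url_etld text, content text)")
        conn.executemany("insert into homepages values (?, ?, ?)", rows)
        return conn.execute(STATS_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(wordpress_sites_per_etld(SAMPLE_ROWS))  # [('com', 2), ('edu', 1)]
```

The .org row is absent because its only homepage is not WordPress, so it never reaches the GROUP BY.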

Give it a try

As always, we would love to hear from you! Give the new homepages table a try and contact us at hi@mixnode.com if you have any questions or comments.

Turn the web into a database!

Mixnode is a fast, flexible and massively scalable platform to extract and analyze data from the web.
