Command line crawling with Screaming Frog SEO Spider.
Use apt-get and dpkg to install Screaming Frog
- Visit Screaming Frog's Check Updates page to identify the latest version number.
- Update apt-get
sudo apt-get update
- Download Screaming Frog
wget https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_18.2_all.deb -P /path/to/download/dir
- Install the package
sudo dpkg -i /path/to/download/dir/screamingfrogseospider_18.2_all.deb
- Verify installation
which screamingfrogseospider
- If you're unsure where to download your package, you can always use /usr/local/bin. See the diagram linked under Resources below for other common locations within the Ubuntu file system.
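If you'd rather run the whole install in one pass, here's a sketch that bundles the steps above. It assumes /tmp as the download directory and uses the 18.2 version number from this post; substitute the latest version.
sudo apt-get update
wget https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_18.2_all.deb -P /tmp
sudo dpkg -i /tmp/screamingfrogseospider_18.2_all.deb
# Standard Debian fix if dpkg reports missing dependencies
sudo apt-get install -f
which screamingfrogseospider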
Configure
Add your paid license for headless mode.
Create a new licence.txt file within a hidden directory called .ScreamingFrogSEOSpider.
sudo nano ~/.ScreamingFrogSEOSpider/licence.txt
Paste your license: your username on the first line and your license key on the second.
screaming_frog_username
XXXXXXXXXX-XXXXXXXXXX-XXXXXXXXXX
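On a headless server you can also write this file without an editor. A minimal sketch, with USERNAME and LICENCE-KEY standing in for your actual credentials:
# Create the hidden directory if it doesn't exist yet
mkdir -p ~/.ScreamingFrogSEOSpider
printf 'USERNAME\nLICENCE-KEY\n' > ~/.ScreamingFrogSEOSpider/licence.txt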
Accept the EULA
Create a new spider.config file within the same directory.
sudo nano ~/.ScreamingFrogSEOSpider/spider.config
Paste this acceptance line (the number tracks the EULA version, so it may differ for newer releases).
eula.accepted=11
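The same file can be created non-interactively; a one-liner sketch (sudo isn't strictly needed for files in your home directory):
printf 'eula.accepted=11\n' > ~/.ScreamingFrogSEOSpider/spider.config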
Choose Your Storage Mode
In-Memory Mode
If you want to change the amount of memory allocated to the crawler, create another configuration file.
sudo nano ~/.screamingfrogseospider
Suppose you want to increase the allocation to 8GB. Here's the configuration line.
-Xmx8g
If you're unsure of your available memory, try this command.
free -h
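To write the memory setting without opening an editor, a one-liner sketch:
# Allocates 8GB to the crawler; overwrites any existing setting
printf '%s\n' '-Xmx8g' > ~/.screamingfrogseospider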
Database Mode
Your default mode is in-memory, but you might want database storage if you're dealing with numbers like these:
- Crawls < 200k URLs (8GB of RAM)
- Crawls > 1M URLs (16GB of RAM)
If you use a database instead of in-memory storage, add this to spider.config.
storage.mode=DB
Disable the Embedded Browser
Since we're working in headless mode, we'll want to disable the embedded browser. Add this line to spider.config as well.
embeddedBrowser.enable=false
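If you opted for database mode, both settings can be appended to the spider.config created earlier in one pass; a sketch:
printf 'storage.mode=DB\nembeddedBrowser.enable=false\n' >> ~/.ScreamingFrogSEOSpider/spider.config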
Let's Start Crawling
- Create a directory for crawls
mkdir ~/crawls-2023wk08
- Minimalist example.
screamingfrogseospider --crawl https://www.chrisjmendez.com --headless --save-crawl --output-folder ~/crawls-2023wk08 --timestamped-output
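For repeat crawls, you might wrap the command in a small script. A sketch, where the crawl.sh name and the weekly folder convention are illustrative choices, not part of Screaming Frog itself:
#!/bin/sh
# crawl.sh -- run a headless, timestamped crawl of one site
SITE="https://www.chrisjmendez.com"
OUT="$HOME/crawls-$(date +%Y)wk$(date +%V)"   # e.g. ~/crawls-2023wk08
mkdir -p "$OUT"
screamingfrogseospider --crawl "$SITE" --headless --save-crawl --output-folder "$OUT" --timestamped-output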
About Command Line Options
There is a long list of available flags; the ones below are required for a basic example.
--crawl
specifies the URL to crawl.
--headless
is required for command line processes.
--save-crawl
saves your data to a crawl.seospider file.
--output-folder
sets where the crawl output is written.
--timestamped-output
creates a timestamped folder for crawl.seospider, which helps prevent collisions with previous crawls.
- Advanced Example
screamingfrogseospider --crawl https://www.chrisjmendez.com --headless --save-crawl --output-folder ~/crawls-2023wk08 --timestamped-output --create-images-sitemap
--create-images-sitemap
creates an images sitemap from the completed crawl.
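Once a command like this works interactively, it can be scheduled. A hedged cron sketch: the Monday 03:00 schedule and /home/USER path are placeholders, and the binary path should match whatever which screamingfrogseospider reported.
# m h dom mon dow  command
0 3 * * 1 /usr/bin/screamingfrogseospider --crawl https://www.chrisjmendez.com --headless --save-crawl --output-folder /home/USER/crawls --timestamped-output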
Resources
URLs
Diagrams
Curated list of web scraping tools for NodeJS developers.