Static WordPress Community

Long crawl times?

Hi, I have the latest 7 Alpha installed directly from git. After an initial export, if I edit just one page, the jobs are automatically queued, which is great, but the crawl portion is taking 20 minutes. The other steps are almost instant. Even though only a single page was edited, it’s touching a ton of files. To give you an idea of the size of the site, it has 331 generated index.html files. The total size of the output folder is 337MB. I’m testing this out on our much smaller domain, but we really want to get this going on the corporate marketing site. I’m scared of how long the processing will take, though, when the marketing team does many edits per day.

Should it be taking that long to crawl after only a single page edit? The site is using Divi Builder which I suspect may have a hand in it (I’m not a fan). However we are not using Divi on the larger site, so I may just clone that site and give it a shot in a test environment.

Thanks,
~Nate

That is not normal. It could be inefficient plugins or themes, but it could also be a crappy host.

Do you have PHP 7.4 installed? It seems to be significantly faster.

Thanks for the reply. It is 7.4. It’s on a smallish Digital Ocean server (2 CPU, 4 GB RAM).

I just tried again, editing a single page, then kicked off the processing while looking at some of the data in the db. It’s not touching every record in the crawl cache, but there are only 331 actual pages.

SELECT COUNT(*) FROM wp_wp2static_crawl_cache WHERE time > '2020-07-05'
=> 592
SELECT COUNT(*) FROM wp_wp2static_crawl_cache WHERE time < '2020-07-05'
=> 10489

When I look at the timestamps on all the ‘index.html’ pages generated for the whole site, I can see that every single one got rebuilt.

Maybe I’ll try disabling all plugins then trying the test again after enabling them one-by-one.

I haven’t used Digital Ocean myself, but it’s supposed to be decent hosting. I can run WP2Static on the smallest AWS VM with 512MB RAM, and it’s blazing fast. So I think your hosting is fine. Is the DB on that server, or separate?

Those SELECT results look wrong, but it’s hard to say what is causing it. There were some changes to the tables recently, and WordPress’s dbDelta just silently fails if it can’t make a change. (I’m trying to move away from dbDelta, but it’s what we have for now). I’d suggest dropping wp_wp2static_urls and wp_wp2static_crawl_cache, and then deactivating/reactivating WP2Static (so it recreates the tables).
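If it helps, the drops are just the following (assuming the default wp_ table prefix; adjust it to match your install):

DROP TABLE IF EXISTS wp_wp2static_urls;
DROP TABLE IF EXISTS wp_wp2static_crawl_cache;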

It’s normal at present for WP2Static to crawl the entire site. I actually like that since it’s usually very quick, and WordPress’s design means there’s no reliable way to tell if a site-wide change has been made. I’m not sure if Leon has any plans to support partial crawling.

Thanks again for the help. The db is local. I dropped the tables and re-initialized, but it still took 20 minutes to crawl. I then switched the theme off Divi Builder as a test and it took just over 10 minutes to crawl. I’m going to disable every plugin I can and see what happens, though that’s not really an option for the real site.

It still took nearly 10 minutes to process the queue after a single page edit with all plugins disabled and using the default WP theme. Is this what you guys would expect on a site with ~300 pages?

I’m looking at the crawl code, and it definitely should not be writing every file every time. Do you have “Use CrawlCache” checked on the Options page? You’ll still see activity in the crawl cache table even with that option off, since WP2Static needs the data, but that option controls whether the files are written to disk each time. (Sorry, I forgot that option had been added. It should be on by default in new installs, but upgrades might leave it off).

Yep, that option is checked. I’ll try with it off (even though I think you mean that it should be on). One of the log entries in the previous run said:
Crawling complete. 595 crawled, 7593 skipped (cached).

The time it’s taking on this test site is not a deal breaker, but I’m hoping to get it as quick as possible, and I’m a little worried about how long it will take on our bigger site.

Thanks for all the help.

Interesting that with the crawl cache turned off, it took the same amount of time, but the log reports:
Crawling complete. 8189 crawled, 0 skipped (cached).

I guess that’s as fast as it’s going to get for now. I know Leon has plans for an Advanced Crawling Addon in the future, but currently crawling always hits the entire site. I’m not sure if it would work for your use case, but if you need to crawl a very large site, it might be more practical to do it in a cron job rather than on every change. Or if you have the dev resources, I don’t think it would be terribly hard to modify WP2Static to record changed posts and crawl just those.
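For illustration, recording the changes could be as simple as a save_post hook like the sketch below. This isn’t part of WP2Static, the option name is made up, and the crawl step would still need to be modified to read it, but it only uses core WordPress functions:

<?php
// Hypothetical sketch (e.g. as an mu-plugin): remember which posts changed so a
// modified crawl step could limit itself to those URLs.
add_action( 'save_post', function ( $post_id ) {
    if ( wp_is_post_revision( $post_id ) ) {
        return; // ignore revision saves
    }
    $changed = get_option( 'changed_post_urls', [] );
    $changed[ get_permalink( $post_id ) ] = time();
    update_option( 'changed_post_urls', $changed, false ); // don't autoload
} );
// A modified crawler would read 'changed_post_urls', crawl just those
// permalinks, and then clear the option.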

Hi @ncrosno, sorry for the late reply from me (a bit of an update on the situation: Where did Leon disappear to? Project updates)

Those times are definitely slow for a DO VPS with that much grunt and for so few pages.

Is this a 1-click WP install or another packaged application, or did you set up your own Nginx/LiteSpeed/Apache instance?

In case you got a dodgy box on DO by chance, it may be worth spinning up a minimal Vultr or EC2 box, cloning the site, and comparing there.

Another consideration is the DNS resolution. The plugin will make requests to whatever is in your WordPress > Settings > Site URL, so let’s say that’s https://dev.mydomain.com - is that resolving locally within the VPS or does it go outside, then get pointed back to the IP? That could greatly slow down each crawl request.
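A quick way to check, from a shell on the VPS itself (using the example hostname above):

getent hosts dev.mydomain.com

If that returns the public IP and each request is leaving the box before coming back, an entry like this in /etc/hosts on the VPS keeps the crawler’s requests local, assuming the web server there answers for that hostname:

127.0.0.1 dev.mydomain.com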

The next thing I’d probably look at is the site itself - ie, run something like wrk against the site and see how many requests per second it manages - if this is slow, then any site activity will be slow.
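For example, something along these lines (wrk needs to be installed separately, and the thread/connection counts are just a starting point):

wrk -t2 -c10 -d30s https://dev.mydomain.com/

If that only manages a handful of requests per second, the crawl time is mostly the site being slow to render rather than anything WP2Static is doing.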

Thinking out loud, I may be confusing how things work behind the scenes, but if you have some external assets, ie Google Fonts, remote JS or CSS, then the crawl could (though it shouldn’t) be waiting on those for each iteration.

There are a few optimizations going out in the next release, along with what John mentioned, so there should be some speedups, but what you’ve described does sound exceptionally slow.

Cheers,

Leon