Static WordPress Community

Crawling takes very long / not caching?

Hi @leonstafford, @john-shaffer,

Finally I got everything working with Static HTML Output, but I noticed that crawling times are very long, about 15 minutes for around 2,000 URLs.

It also crawls all pages over and over again instead of using the cache. Why doesn’t it use the cache?

And why does it regenerate all pages instead of only the ones which were updated?

I currently have Crawl Increment at 50 and Deployment Batch Size at 25; above those values it stops working.

We run on AWS Fargate with Docker.

Thanks,

Robert

Hi @rkleinert,

Oops! I had you confused with another Robert K, sorry for pinging you in another thread!

Crawl caching will see some improvements soon.

Re batch sizes/speed - you’re likely hitting either PHP’s max_execution_time or memory_limit. i.e., if it tries to process 1,000 URLs in one “batch”, there is a longer gap between communication intervals from the browser to the server, and that can exceed max_execution_time if it hasn’t been increased beyond the usual default value.

Less likely is that it’s exceeding memory_limit, but that would depend on the site, plugins and themes, too.

The two values are correlated - i.e., if you increase the memory_limit setting, processing may run faster, which can allow a larger batch size to complete without adjusting max_execution_time.
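If those limits turn out to be the bottleneck, they can be raised in php.ini (or via your host’s equivalent, e.g. a `.user.ini` or the container image’s PHP config). The values below are illustrative starting points, not recommendations - tune them to your container’s actual CPU/memory allocation:

```ini
; Illustrative php.ini overrides for long-running crawl batches.
; Defaults are commonly 30 seconds and 128M respectively.
max_execution_time = 300
memory_limit = 512M
```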

Also, check your DNS routes - especially with Docker, depending on how this is all set up.

i.e., you have your WordPress webserver within one (or more?) ECS containers? When WP2Static crawls, it’s running from within that container, making a request to mydomain.com. Does that container resolve mydomain.com to itself, or does it need to go out to the load balancer or such and back in order to fulfil that request? That could be adding extra time to each request…
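One way to rule that out (a sketch based on assumptions about your setup - the hostname and service names below are placeholders) is to map the site’s hostname to the loopback interface inside the WordPress container, so the crawler’s requests stay local instead of leaving via the load balancer. In Docker Compose that looks like:

```yaml
# Illustrative docker-compose fragment: resolve the public hostname
# to the container itself, so WP2Static's crawl requests skip the
# external round trip through the load balancer.
services:
  wordpress:
    image: wordpress:php8.2-apache
    extra_hosts:
      - "mydomain.com:127.0.0.1"
```

Note the caveat: extra_hosts isn’t supported in every ECS network mode, so on Fargate (which uses awsvpc) you may need a different mechanism, such as a Route 53 private hosted zone pointing the hostname at the task, to get the same short-circuit effect.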