Creating a Lifeboat site

leonstafford · August 12, 2020, 3:03pm

Bother away - my penance for not providing documentation. There is some outdated stuff still up at https://docs.wp2static.com that may work, but there’s not as much going on as in 7 to be caught out by.

nathansmonk · August 12, 2020, 4:28pm

Does 7 write a lot to the DB? I seem to have a serious slowdown on my site, post crawl…

nathansmonk · August 12, 2020, 6:28pm

Gah, oh no - v6 doesn’t have wp cli support @leonstafford ?

leonstafford · August 12, 2020, 6:42pm

There is, but the name is different: not wp wp2static, but wp statichtmloutput

leonstafford · August 12, 2020, 6:42pm

or statichtml sorry, away from pc

leonstafford · August 12, 2020, 6:43pm

If you run just wp, it should give you the available commands. They’re not a 1-1 match for wp2static, but enough to do the export.

leonstafford · August 13, 2020, 1:42pm

So, a little example of the CLI functions and of extending Static HTML Output is on the GH page.

nathansmonk · February 22, 2021, 10:27pm

So I’m back! Sorry for going dark. I hope you are doing well?

My aim remains the same: to create a static copy of a large live production site on the wp cli and offload the files to an s3 bucket (with a cloudfront distribution sat in front) with a custom url. Route53 will monitor the health of the live production site, and in the event of the site becoming unhealthy (down) it will automatically reroute to the static copy.

My concerns: database performance/bloat

So my question coming back to this is: Should I be using wp2static or static-html-output?

leonstafford · February 22, 2021, 11:01pm

No worries re time, I’ve been in coder’s block cpl weeks now :{

I’d go with wp2static, based on the CLI usage and more techie-sounding approach. The Advanced Crawling Addon will probably be required; you can check out that repo, but I believe it’s still tied to 7.1.6 of wp2static.

Give it all a go and let me know any issues you encounter.

john-shaffer · February 23, 2021, 12:28am

The latest master of Advanced Crawling Addon has revamped logic for the “Crawl only changed URLs” option that should help, but it’s new and hasn’t really been tested on live sites. It does make very fast deploys possible if you’ve only changed a few posts.

Unfortunately, performance is pretty horrendous on huge sites if you change the menu or something and have to reprocess the entire site. This is always going to be inherently tough since we’re working with WP and not a framework that’s actually designed for creating static sites, but I think there is room for a lot more optimization in WP2Static around minimizing DB usage. My changes to enable partial crawls use more DB calls to facilitate, and that seems to result in slower crawling when you have to do a full site crawl.

nathansmonk · April 26, 2021, 7:38pm

So crawling is pretty slow, as you predicted - it’s taken the best part of a week of solid crawling to get to where it is. It’s particularly hard to know how far along it is. I’m at circa 45k records and I can’t really say how close that is to complete. But in running over the CLI, I do seem to have been able to overcome the performance hogging that was going on. Are there any ways to know how far along the process is?

I’m trying to run a process command now, but again, it’s really hard to know if it is actually doing anything. The big thing preventing me from deploying at least something is the fact that my wp-content folder isn’t in the wp2static-processed-site folder.

Is there any way to exclude certain patterns (i.e., anything including events) from being crawled/processed? Doing this, I think I could at least deploy something quite quickly and then add in additional sections once I have something up.

leonstafford · April 26, 2021, 10:10pm

Hi @nathansmonk,

Running via CLI, there’s a stale PR to show progress, but it wasn’t quite the same as the UI, where we report progress every 300 files.

I have another terminal with a watch command doing something like:

du -sh DIR

Or a find to show incrementing size/files.
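
For anyone who would rather poll from a script than from watch, here is a minimal Python sketch of the same idea: walk the output directory and report how many files it holds and their combined size. The function name dir_stats is made up for illustration, and the directory you point it at is whatever your crawl writes to. Note that it sums apparent file sizes, so the total can differ slightly from what du reports.

```python
import os


def dir_stats(root):
    """Return (file_count, total_bytes) for all files under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable mid-crawl; skip it.
                continue
            file_count += 1
    return file_count, total_bytes
```

Calling this every minute or so (for example in a loop with time.sleep(60)) and printing the result gives a rough "is it still moving" signal, which is all the du/find approach provides as well.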

A week is intense. I’d rather clone the site locally and give it more power/cut out network bottlenecks.

nathansmonk · April 27, 2021, 7:38pm

Yup, I’ve put together a little script which kinda does this now, so at least I can see it’s working. It’s certainly doing its thing now. My hope is once this first crawl is out of the way, the follow-up ones will be much quicker. I’m not actually seeing a bottleneck, which is weird - what parameters can I provide to give wp2static a bit of a turbo boost? The crawl chunk size?

I have 2 immediate problems:

1. It will not seem to crawl my actual css file. I’ve added the path into the Additional Paths to Crawl in the Advanced Crawl Plugin, but when I run wp wp2static crawl, it doesn’t add this path. Any ideas what I’m doing wrong here? Do I need to run the detect stage again?
2. Because of my infrastructure, when the CPU gets too high, it just terminates the instance and starts a new one. This also terminates the long running script that is doing the crawl.

I think I can get round 2, if I address 1.

john-shaffer · April 29, 2021, 1:16am

It will not seem to crawl my actual css file.
I’ve added the path into the Additional Paths to Crawl in the Advanced Crawl Plugin, but when I run wp wp2static crawl, it doesn’t add this path. Any ideas what I’m doing wrong here? Do I need to run the detect stage again?

You don’t have to run detect again, as they are added during the crawl step. You should see a log message from WsLog::l( count( $additional_paths ) . ' additional paths added.' ); at the start of the crawl. It always adds “/”, so the reported number may be higher than the number of paths that you’ve added.

The paths have to be the exact, full filenames, not a directory location (since this case is intended for when auto-discovery isn’t working out).
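
A quick way to sanity-check a list before pasting it into Additional Paths to Crawl is to flag anything that looks like a directory rather than a file. The sketch below is only a heuristic of my own, not the addon's logic: it treats a path as directory-like if it ends in "/" or if its last segment has no extension. The function name and the example paths are made up for illustration.

```python
import posixpath


def split_additional_paths(paths):
    """Split paths into (file_like, directory_like) lists.

    Heuristic (an assumption, not the addon's own check): a path ending
    in "/" or whose last segment has no extension is directory-like.
    """
    file_like = []
    directory_like = []
    for raw in paths:
        path = raw.strip()
        if not path:
            continue  # ignore blank lines
        last_segment = posixpath.basename(path)
        _stem, ext = posixpath.splitext(last_segment)
        if path.endswith("/") or not ext:
            directory_like.append(path)
        else:
            file_like.append(path)
    return file_like, directory_like


# Hypothetical example paths:
files, dirs = split_additional_paths([
    "/wp-content/themes/mytheme/style.css",
    "/wp-content/uploads/",
])
```

Anything that lands in the second list would need to be replaced by the individual filenames underneath it. Extensionless files (and dotfiles) will be misreported as directories, so treat the output as a prompt to double-check rather than as a verdict.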

Because of my infrastructure, when the CPU gets too high, it just terminates the instance and starts a new one. This also terminates the long running script that is doing the crawl.

I haven’t heard about this before. Why is the infrastructure like that?

nathansmonk · April 30, 2021, 2:27pm

I’m perhaps oversimplifying a bit, but ultimately it’s a self-healing type affair, so servers come and go depending on traffic requirements, but there’s no control over which instances get binned when scaling down happens.

Good to know regarding detection. I think I’m almost there!

leonstafford · May 1, 2021, 11:38pm

Hi @nathansmonk,

Sounds like a fun project!

Re the self-healing, should be OK, as long as WP2Static is running on a permanent instance and the auto-scaling type stuff is just providing URLs to crawl through, but then, this would sound like a setup with additional DNS trips (crawling from same server as WP2Static is only really supported/recommended at the moment).

For giving WP2Static extra boost, yeah, giving more chunks per crawl is good and at least used to be an option via the UI, not sure if that’s in at the moment. I’m keen to see if Spatie’s PHP crawler library can be dropped into WP2Static, which could give some really nice crawl performance improvements.

If you want to bypass WP2Static or use it in conjunction with more optimised web crawlers, there are some nice CLI tools that come up when searching for ways to optimize cURL/wget; maybe some from here: GitHub - BruceDone/awesome-crawler (a collection of awesome web crawlers and spiders in different languages).

If you can calculate the requests per second that WP2Static is processing using those tailing scripts, then you can compare with other servers (maybe try https://lokl.dev) as a consistent way to benchmark.
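
As a rough sketch of that calculation, assuming the tailing script records (timestamp in seconds, file count) pairs and that one crawled file corresponds to roughly one request, the average rate is just the change in count over the change in time. The function names are made up for illustration, and the ETA helper only makes sense if you have some estimate of the total number of URLs, which the crawl itself doesn't hand you.

```python
def crawl_rate(samples):
    """Average files per second between the first and last sample.

    samples is a list of (timestamp_seconds, file_count) tuples.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return (c1 - c0) / elapsed


def eta_seconds(rate, done, estimated_total):
    """Seconds remaining at the given rate; estimated_total is a guess."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return max(estimated_total - done, 0) / rate
```

For example, going from 100 to 400 files over 60 seconds works out to 5 files per second, which you can then compare against the same measurement on another server.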

I like GitHub - wg/wrk (a modern HTTP benchmarking tool) for general requests-per-second testing of servers. That may be a good indicator of the server in general. If it’s slow to load test, it will be slow to crawl.

Crawling is expected to be the slowest part of the process (detect, crawl, post_process, deploy), simply due to the network requests, so I’d look into any lag there as the lowest-hanging fruit to speed up the whole export.

nathansmonk · May 4, 2021, 10:36pm

Well, it’s now about 99.99% done.

I got a complete crawl process and deploy done.

It’s definitely missing a bunch of resources despite them being added in the “Additional Paths to Crawl”. I’ll try another crawl, but if there’s something I should know about this, I’m all ears.

I’ve also noticed that subdirectories produced (and shipped to s3) don’t default to going to the index.html file inside of them. Is there some additional config I need there?

leonstafford · May 5, 2021, 6:10am

Just quickly re the S3 - there’s a setting in the bucket for default paths, but I think that’s just for default bucket root index.html. The setting you probably want to change is in your CloudFront distribution settings, where you set a similar default index.html.
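
One caveat worth checking against the AWS docs: as far as I know, CloudFront’s default root object only applies to the root of the distribution, not to subdirectories. The usual ways around that are to use the S3 static website endpoint as the origin (its index document setting does cover subfolders) or to rewrite the request URI at the edge. The rewrite rule itself is tiny; here it is sketched in Python purely to show the logic (a real edge function would be written in whatever runtime CloudFront offers, and the function name is made up).

```python
def rewrite_uri(uri):
    """Map a directory-style request URI to the index.html inside it."""
    if uri.endswith("/"):
        return uri + "index.html"
    last_segment = uri.rsplit("/", 1)[-1]
    if "." not in last_segment:
        # No file extension, so assume it names a directory.
        return uri + "/index.html"
    return uri
```

So /about/ and /about would both be served from /about/index.html, while /css/site.css passes through untouched. The "no dot means directory" rule is an assumption that breaks for extensionless files, so adjust it if your site has any.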

yanch · June 3, 2021, 2:13pm

Hi @nathansmonk
It’s actually the thing I’d like to do now with wp2static – a lifeboat

Can you share how your solution worked in the end?
Maybe any tips?

Thanks!

nathansmonk · July 7, 2022, 1:07pm

Wish I could say I’ve cracked it, but sadly not. It sorta works. If I can crack it, I’ll share my journey.