Why are there so many more files with WP2Static vs Static HTML Output?

I just discovered that there seems to be a substantial difference in the number of files generated by WP2Static compared to Static HTML Output:

$ find uploads/static-html-output/ -type f | wc -l
64
$ find uploads/wp2static-processed-site/ -type f | wc -l
929

Is there some separate plugin, configuration option, or add-on in WP2Static that I’ve missed?

@openletter, this is one of the reasons the two plugins remain separate projects: they use different methods of detection and crawling. By default, WP2Static may give you more than you need; e.g., a theme may ship a dist directory with all of its development assets and dependencies included. If you compare the two directories, does that seem to be the case?

The biggest differences seem to be that WP2Static includes many of the plugin files, all of the upload files, and many of the /wp-includes/ files.
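
For anyone wanting to make this comparison concrete, here is a rough sketch of one way to list the files that appear in one export but not the other. The function name only_in_second is made up for illustration and is not part of either plugin; it assumes both exports sit in ordinary local directories.

```shell
# only_in_second DIR_A DIR_B
# Print files (relative to each root) that exist under DIR_B but not DIR_A.
# Hypothetical helper for illustration; runs in a subshell so no variables leak.
only_in_second() (
  a=$(mktemp)
  b=$(mktemp)
  # Build sorted lists of relative paths; LC_ALL=C keeps sort and comm consistent.
  (cd "$1" && find . -type f | LC_ALL=C sort) > "$a"
  (cd "$2" && find . -type f | LC_ALL=C sort) > "$b"
  # comm -13 suppresses lines unique to the first list and lines common to both.
  LC_ALL=C comm -13 "$a" "$b"
  rm -f "$a" "$b"
)
```

With the paths from the first post, that would be run as: only_in_second uploads/static-html-output uploads/wp2static-processed-site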

Yeah, you may have luck with the Advanced Crawling Addon to better control that. There’s a dev build known to work here: https://github.com/leonstafford/wp2static/files/5797370/wp2static-addon-advanced-crawling-1.0-alpha-003.zip

I’m getting an error when I run it after enabling the add-on with its default settings:

500 error code returned from server.
Please check your server’s error logs or try increasing your max_execution_time limit in PHP if this consistently fails after the same duration.
More information of the error may be logged in your browser’s console.

$ grep max_execution_time /etc/php/7.3/fpm/php.ini
max_execution_time = 300

(Note that the failure appears instantly, so it does not look like a timeout.)

Error log:

2021/01/22 04:40:48 [error] 8477#8477: *2351 FastCGI sent in stderr: "PHP message: PHP Fatal error:  Uncaught Error: Class 'WP2Static\Request' not found in /blog/wp-content/plugins/wp2static-addon-advanced-crawler/src/Crawler.php:40
Stack trace:
#0 /blog/wp-content/plugins/wp2static-addon-advanced-crawler/src/Crawler.php(77): WP2StaticAdvancedCrawling\Crawler->__construct()
#1 /blog/wp-includes/class-wp-hook.php(287): WP2StaticAdvancedCrawling\Crawler::wp2staticCrawl('/var/www/libert...', 'wp2static-addon...')
#2 /blog/wp-includes/class-wp-hook.php(311): WP_Hook->apply_filters(NULL, Array)
#3 /blog/wp-includes/plugin.php(484): WP_Hook->do_action(Array)
#4 /blog/wp-content/plugins/wp2static/src/Controller.php(740): do_action('wp2static_crawl', '/var/www/...', 'wp2static-addon...')
#5 /blog/wp-content/plugins/wp2static/src/Controller.php(623): WP2Static\Controller::wp2s" while reading response header from upstream, client: 203.0.113.91, server: www.example.com, request: "POST /blog/wp-admin/admin-ajax.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.3-fpm.sock:", host: "www.example.com", referrer: "https://www.example.com/blog/wp-admin/admin.php?page=wp2static"

Sorry for the late response, pretty sure I sent you the wrong build, one minute…

No, I think that was the latest build, but @john-shaffer will know

It looks like recent changes to the git build of WP2Static removed the Request.php file, which the addon needs. I haven’t tested against that build. It does work with WP2Static 7.1.6.

I see. I’ll downgrade to 7.1.6. For reference I am using 7.1.7-dev.

Although the plugin seems to be running, the add-on didn’t seem to make an appreciable difference, and I’m not sure how to go through the files to clean this up.

Here is a breakdown of the numbers and what seems to be the real problem directories:

$ find . -type f | wc -l
892
$ find wp-content/ -type f | wc -l
305
$ find wp-includes/ -type f | wc -l
569
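
The same breakdown can be produced for every top-level directory in one pass. This is only a sketch; count_by_top_dir is a hypothetical name, and files sitting directly in the export root are grouped under "(root)".

```shell
# count_by_top_dir DIR
# Print "COUNT NAME" for each top-level directory under DIR, largest first.
# Hypothetical helper for illustration.
count_by_top_dir() (
  cd "$1" || exit 1
  find . -type f |
    awk -F/ '
      # Paths look like ./name/... ; field 2 is the top-level entry.
      { key = (NF > 2) ? $2 : "(root)"; n[key]++ }
      END { for (k in n) print n[k], k }
    ' |
    sort -rn
)
```

Run from the directory containing the export as: count_by_top_dir .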

So the fundamental problem here is that JavaScript, being a Turing-complete language, cannot in general be statically analyzed to tell us every file it might reference. A slightly bloated site is a lot better than a broken site, so we try to include every file that could be referenced. That means everything in wp-content and wp-includes.

The alternative is to include nothing by default, crawl the site and detect as many includes as we can, and provide an option to manually specify any includes that we missed. I might be able to implement this quickly, so I’ll get back to you on that.

Ah, good catch, @john-shaffer! I’d have noticed before next round of releases, with auto testing of the addons. Sorry, @openletter!

Yeah, there should only be a few files needed from wp-includes (if any). If you simply delete that whole dir, does your exported site still look OK?

I’m playing catchup a bit on the Advanced Crawling Addon, so @john-shaffer may best advise expected behaviour.

I was able to modify the addon, resulting in 24 deployed URLs in a default WP install vs 694 normally. You have to manually disable the “Detect” step in WP2Static and delete everything in your “Crawl Queue”, as there’s currently no way to override WP2Static’s detection.

The modified addon is here. I added an “additionalPathsToCrawl” option, which you should use to add anything that gets missed. The crawl starts at the root path ( “/” ) and tries to detect all linked pages and assets. Then it crawls those linked pages and assets, and so on, until it detects nothing new. This requires “Add URLs Discovered While Crawling” to be enabled.
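
To illustrate the "crawl until nothing new turns up" idea described above, here is a minimal sketch. It is not the addon's code: it walks a local copy of a site on disk rather than making HTTP requests, it only follows root-relative href="/..." and src="/..." attributes, and the name crawl_local is invented for this example. Like the addon, it does not parse CSS or JS.

```shell
# crawl_local ROOT
# Breadth-first walk over a local site copy, starting at "/".
# Prints every file reached by following root-relative href/src links.
# Hypothetical illustration only; not how the addon is implemented.
crawl_local() (
  root=$1
  seen=$(mktemp)
  queue=$(mktemp)
  next=$(mktemp)
  echo "/" > "$queue"
  # Keep going until a pass discovers no further paths.
  while [ -s "$queue" ]; do
    : > "$next"
    while IFS= read -r path; do
      # Treat directory URLs as their index page.
      case "$path" in */) path="${path}index.html" ;; esac
      # Skip anything already visited or not present on disk.
      grep -qxF -e "$path" "$seen" && continue
      [ -f "$root$path" ] || continue
      echo "$path" >> "$seen"
      # Queue every root-relative link or asset found in this file.
      grep -oE '(href|src)="/[^"]*"' "$root$path" |
        sed 's/^[a-z]*="//; s/"$//' >> "$next"
    done < "$queue"
    cp "$next" "$queue"
  done
  LC_ALL=C sort "$seen"
  rm -f "$seen" "$queue" "$next"
)
```

Anything the walk cannot see (for example, files referenced only from scripts) would still have to be listed by hand, which is the role the “additionalPathsToCrawl” option plays in the addon.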

I don’t think we’re doing any parsing of CSS or JS currently, so this is probably going to miss a few things.

I’ll test it out tomorrow morning.

As a side note, and note that I know nothing about programming and not to suggest you didn’t think of this already, it is only the quantity of files that concerns me, so if you stuffed them all into 2 big honking files, or something, that would work just as well for my needs.

Can you use the zip addon, or do you have more complex needs?

I have to learn more about using Cloudflare workers to answer that, which is what I’m trying to figure out right now.

The free plan includes 1,000 write, delete, and list operations per day, which I’m taking to mean one operation per file? I haven’t gotten to that step yet (you can see my recent post on this question here), but I think the zip option won’t help me.
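
If that reading is right (one write per file, which is an assumption and not something confirmed in this thread), a quick check of an export against the daily figure could look like the sketch below. The name fits_daily_budget is hypothetical, and the default of 1000 is just the number quoted above.

```shell
# fits_daily_budget DIR [BUDGET]
# Report the file count under DIR and succeed only if it is within BUDGET.
# Assumes one operation per file, which is an unverified assumption.
fits_daily_budget() (
  dir=$1
  budget=${2:-1000}
  # tr strips the padding some wc implementations add.
  count=$(find "$dir" -type f | wc -l | tr -d ' ')
  echo "$count files, budget $budget"
  [ "$count" -le "$budget" ]
)
```

For example: fits_daily_budget uploads/wp2static-processed-site && echo "fits in one day"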

I have found two tutorials for WP2Static on Cloudflare:

This one on the WP2Static site:
https://wp2static.com/addons/cloudflare-workers/

And this one on the Cloudflare developers site:
https://developers.cloudflare.com/workers/tutorials/deploy-a-static-wordpress-site

They each use a different method to get the files to Cloudflare. The WP2Static tutorial seems to use the WP2Static Cloudflare plugin to upload the files, while the Cloudflare tutorial uses Cloudflare’s command-line tool, Wrangler (which I have to either compile or install through npm, which in turn requires installing npm). Note that wrangler would use a zip file.