Crawling not working

@rkleinert if detect is not adding things to CrawlCache, that’s very suspicious.

Anything in your MAMP error logs? If you open PHPMyAdmin via MAMP, can you check that the wp2static_urls (CrawlQueue) table exists and is empty?

@leonstafford
thanks for your reply.

No errors in log and wp2static_urls exists and is empty.

In the plugin logs I see:

    Detection complete. 19852 URLs added to Crawl Queue.

But when I crawl, nothing happens.

One thing I noticed is that I get some PHP notices on the command line:

    wp wp2static process_queue
    PHP Notice:  Undefined index: SERVER_NAME in phar:///usr/local/Cellar/wp-cli/2.4.0/bin/wp/vendor/wp-cli/wp-cli/php/WP_CLI/Runner.php(1197) : eval()'d code on line 28
    PHP Notice:  Constant WP_DEBUG already defined in phar:///usr/local/Cellar/wp-cli/2.4.0/bin/wp/vendor/wp-cli/wp-cli/php/WP_CLI/Runner.php(1197) : eval()'d code on line 42
     Processing 4 jobs
    PHP Notice:  Trying to get property 'hierarchical' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 282
    Notice: Trying to get property 'hierarchical' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 282
    PHP Notice:  Trying to get property 'query_var' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 292
    Notice: Trying to get property 'query_var' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 292
    PHP Notice:  Trying to get property 'hierarchical' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 282
    Notice: Trying to get property 'hierarchical' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 282
    PHP Notice:  Trying to get property 'query_var' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 292
    Notice: Trying to get property 'query_var' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 292
    PHP Notice:  Trying to get property 'hierarchical' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 282
    Notice: Trying to get property 'hierarchical' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 282
    PHP Notice:  Trying to get property 'query_var' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 292
    Notice: Trying to get property 'query_var' of non-object in /Users/robertkleinert/Projects/CMS/wp-includes/link-template.php on line 292
    PHP Notice:  Undefined index: SERVER_NAME in /Users/robertkleinert/Projects/CMS/wp-includes/pluggable.php on line 331
    Notice: Undefined index: SERVER_NAME in /Users/robertkleinert/Projects/CMS/wp-includes/pluggable.php on line 331
    Success: Done processing queue

Are these related to the plugin not working?

Thanks!

Thanks for including those @rkleinert.

It looks like 2 or 3 separate issues. The first, SERVER_NAME being undefined, happens because WP-CLI doesn’t go through the web server, so the SERVER_NAME variable is never set. I need to check on the best way to recommend handling that. Oops, looks like I swept it under the rug last time I encountered it: https://github.com/WP2Static/wp2static/issues/230

I’ve seen suggestions for setting it dynamically in wp-config.php when it’s not found. That may be worth a little searching, and it’s worth checking your own wp-config.php in case there are any existing lines that touch it.

The WP_DEBUG already defined is a bit odd, maybe also double-check it’s not defined twice within your wp-config.php, else it’s probably being set somewhere else.
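A quick way to check for a duplicate definition is to count the matching define() lines. This is only a minimal sketch: `count_defines` is a hypothetical helper name, and it assumes your wp-config.php uses the usual `define( 'NAME', ... )` form. It counts lines, so a commented-out define would also be included.

```shell
# Count the lines in a PHP file that define() a given constant.
# Usage: count_defines CONSTANT_NAME path/to/wp-config.php
# Note: commented-out define() lines are counted too.
count_defines() {
    grep -cE "define\([[:space:]]*['\"]$1['\"]" "$2"
}

# Example (path is an assumption, adjust to your install):
#   count_defines WP_DEBUG wp-config.php
```

A result of 1 would suggest the second definition comes from somewhere other than wp-config.php.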

Those query related issues I have seen, IIRC, only when WooCommerce was installed - is that your case? It’s something off in the URL detection queries I need to get back to. They shouldn’t be blocking the URL detection/crawling at all, so I’d work on clearing the first 2 notices first. Maybe try disabling all other plugins and re-running, then turn on one by one to isolate if it appears to be coming from a plugin/theme.

Sorry this isn’t a very conclusive solution yet!

If you don’t have any luck, I can try to reproduce your setup on a Mac I have here.

I haven’t run into this issue with SERVER_NAME, but you can try something like this in wp-config.php:

    // WP-CLI does not go through the web server, so set the host values it would normally provide.
    if ( defined( 'WP_CLI' ) ) {
        $_SERVER['HTTP_HOST'] = 'localhost';
        $_SERVER['SERVER_NAME'] = 'localhost';
    }

    define('WP_SITEURL', 'http://' . $_SERVER['HTTP_HOST'] . '/');
    define('WP_HOME', 'http://' . $_SERVER['HTTP_HOST'] . '/');

Thanks @leonstafford and @john-shaffer.

I tried all the hints. The SERVER_NAME and WP_DEBUG notices are solved. I also disabled all plugins and custom post types. Now when I run wp wp2static detect, I get these link-template.php notices even more often, so I’m not sure what is causing them. And I am still not getting any URLs crawled or processed :(

Hi @rkleinert, I’ve put a new GitHub issue to track this:

It may take a little while for me to get to it. If you’re able to help fast-track the isolation of the issue, could you please try a new, completely vanilla WP site within the same MAMP setup and confirm if you get these notices when WP2Static is the only installed plugin?

If that’s so, then if you can list out your plugins/theme that you have on the site showing the notices, I can more quickly reproduce and then fix the issue.

Cheers,

Leon

Hi @leonstafford,
on a fresh WP with same PHP version on MAMP it is working without any problems…

Please find plugins in attached screenshots

Thanks,

Robert

@leonstafford ok now it’s getting weird.

I ran detect via the CLI, got all these notices regarding link-template.php, then ran crawl, and this time it seems to work… I didn’t change anything…

@rkleinert indeed - it’s an elusive issue, but hopefully will track it down once I’m back focused on v7/WP2Static (currently in hopefully final stages of V6/Static HTML Output updating).

@leonstafford we have now moved our staging site into a Docker container on AWS. But the static site export with Static HTML Output is not working in that environment.

Do you have specific PHP requirements? Would it be possible for you to check the admin panel of our WordPress (I also installed a PHP info plugin)?

Unfortunately the export log has no errors.

I will send you credentials via private message,

Thanks,

Robert

@leonstafford
Update:

I increased these PHP settings:

    php_value post_max_size 40M
    php_value upload_max_filesize 40M
    php_value max_execution_time 120
    php_value memory_limit 512M

so they match the instance where it is working (on Bitnami). But nothing changed.

Does it need write permissions outside of wp-uploads? Our filesystem is read-only except for wp-uploads.

Thanks

This is quite old, but maybe worth a look for Bitnami:

https://leonstafford.github.io/notes/lightsail_bitnami_wordpress_tweaks/

Sorry, I think you misunderstood. On Bitnami it works fine, but on our Docker on AWS installation it’s not working. It says I have to check the export log, but there are no errors in the export log. Can you have a look please? Thanks

This is the export log:

2020-08-25 20:02:43 	 2020-08-25 08:02:43
PHP VERSION 7.4.7
OS VERSION Linux ip-172-31-19-152.eu-central-1.compute.internal 4.14.186-146.268.amzn2.x86_64 #1 SMP Tue Jul 14 18:16:52 UTC 2020 x86_64
WP VERSION 5.5
WP URL https://cms.staging.compeon.de
WP SITEURL https://cms.staging.compeon.de
WP HOME https://cms.staging.compeon.de
WP ADDRESS https://cms.staging.compeon.de
PLUGIN VERSION 6.6.21
VIA WP-CLI? 
STATIC EXPORT URL https://www1.staging.compeon.de
PERMALINK STRUCTURE /%postname%/
SERVER SOFTWARE Apache/2.4.38 (Debian) 	
2020-08-25 20:02:43 	 Active plugins: 	
2020-08-25 20:02:43 	 404-to-301/404-to-301.php 	
2020-08-25 20:02:43 	 acf-image-select/acf-image-select.php 	
2020-08-25 20:02:43 	 advanced-custom-fields-nav-menu-field/fz-acf-nav-menu.php 	
2020-08-25 20:02:43 	 advanced-custom-fields-pro/acf.php 	
2020-08-25 20:02:43 	 advanced-custom-fields-row-field/acf-row.php 	
2020-08-25 20:02:43 	 ajax-thumbnail-rebuild/ajax-thumbnail-rebuild.php 	
2020-08-25 20:02:43 	 autoptimize/autoptimize.php 	
2020-08-25 20:02:43 	 better-search-replace/better-search-replace.php 	
2020-08-25 20:02:43 	 duplicate-post/duplicate-post.php 	
2020-08-25 20:02:43 	 login-logo/login-logo.php 	
2020-08-25 20:02:43 	 pardot/pardot.php 	
2020-08-25 20:02:43 	 php-info-wp/phpinfo.php 	
2020-08-25 20:02:43 	 redirection/redirection.php 	
2020-08-25 20:02:43 	 rename-wp-login/rename-wp-login.php 	
2020-08-25 20:02:43 	 shortpixel-adaptive-images/short-pixel-ai.php 	
2020-08-25 20:02:43 	 static-html-output-plugin/static-html-output-plugin.php 	
2020-08-25 20:02:43 	 w3-total-cache/w3-total-cache.php 	
2020-08-25 20:02:43 	 wordpress-importer/wordpress-importer.php 	
2020-08-25 20:02:43 	 wordpress-seo/wp-seo.php 	
2020-08-25 20:02:43 	 wp-410/wp-410.php 	
2020-08-25 20:02:43 	 wp-mail-smtp/wp_mail_smtp.php 	
2020-08-25 20:02:43 	 wp-updates-notifier/class-sc-wp-updates-notifier.php 	
2020-08-25 20:02:43 	 Plugin options: 	
2020-08-25 20:02:43 	 additionalUrls:  	
2020-08-25 20:02:43 	 baseUrl: https://www1.staging.compeon.de 	
2020-08-25 20:02:43 	 baseUrl-bitbucket:  	
2020-08-25 20:02:43 	 baseUrl-bunnycdn:  	
2020-08-25 20:02:43 	 baseUrl-github:  	
2020-08-25 20:02:43 	 baseUrl-gitlab:  	
2020-08-25 20:02:43 	 baseUrl-netlify:  	
2020-08-25 20:02:43 	 baseUrl-s3:  	
2020-08-25 20:02:43 	 baseUrl-zip: https://www1.staging.compeon.de 	
2020-08-25 20:02:43 	 baseUrl-zip: https://www1.staging.compeon.de 	
2020-08-25 20:02:43 	 basicAuthPassword: ******************* 	
2020-08-25 20:02:43 	 basicAuthUser:  	
2020-08-25 20:02:43 	 bbBranch:  	
2020-08-25 20:02:43 	 bbRepo:  	
2020-08-25 20:02:43 	 bbToken: ******************* 	
2020-08-25 20:02:43 	 bunnycdnStorageZoneAccessKey: ******************* 	
2020-08-25 20:02:43 	 bunnycdnPullZoneAccessKey: ******************* 	
2020-08-25 20:02:43 	 bunnycdnPullZoneID:  	
2020-08-25 20:02:43 	 bunnycdnStorageZoneName:  	
2020-08-25 20:02:43 	 bunnycdn_api_host: ******************* 	
2020-08-25 20:02:43 	 cfDistributionId:  	
2020-08-25 20:02:43 	 completionEmail:  	
2020-08-25 20:02:43 	 crawl_delay: 0 	
2020-08-25 20:02:43 	 crawl_increment: 5 	
2020-08-25 20:02:43 	 crawlPort:  	
2020-08-25 20:02:43 	 delayBetweenAPICalls: 0 	
2020-08-25 20:02:43 	 deployBatchSize: 1 	
2020-08-25 20:02:43 	 excludeURLs:  	
2020-08-25 20:02:43 	 ghBranch:  	
2020-08-25 20:02:43 	 ghCommitMessage:  	
2020-08-25 20:02:43 	 ghRepo:  	
2020-08-25 20:02:43 	 ghToken: ******************* 	
2020-08-25 20:02:43 	 glBranch:  	
2020-08-25 20:02:43 	 glProject:  	
2020-08-25 20:02:43 	 glToken: ******************* 	
2020-08-25 20:02:43 	 netlifyHeaders:  	
2020-08-25 20:02:43 	 netlifyPersonalAccessToken: ******************* 	
2020-08-25 20:02:43 	 netlifyRedirects:  	
2020-08-25 20:02:43 	 netlifySiteID:  	
2020-08-25 20:02:43 	 removeConditionalHeadComments: 1 	
2020-08-25 20:02:43 	 removeHTMLComments: 1 	
2020-08-25 20:02:43 	 removeWPLinks: 1 	
2020-08-25 20:02:43 	 removeWPMeta: 1 	
2020-08-25 20:02:43 	 rewrite_rules:  	
2020-08-25 20:02:43 	 rename_rules:  	
2020-08-25 20:02:43 	 s3Bucket:  	
2020-08-25 20:02:44 	 s3Key:  	
2020-08-25 20:02:44 	 s3Region:  	
2020-08-25 20:02:44 	 s3Secret: ******************* 	
2020-08-25 20:02:44 	 selected_deployment_option: zip 	
2020-08-25 20:02:44 	 targetFolder:  	
2020-08-25 20:02:44 	 useBasicAuth:  	
2020-08-25 20:02:44 	 Installed extensions: 	
2020-08-25 20:02:44 	 Core,date,libxml,openssl,pcre,sqlite3,zlib,ctype,curl,dom,fileinfo,filter,ftp,hash,iconv,json,mbstring,SPL,PDO,session,posix,Reflection,standard,SimpleXML,pdo_sqlite,Phar,tokenizer,xml,xmlreader,xmlwriter,mysqlnd,apache2handler,bcmath,exif,gd,imagick,mysqli,sodium,zip,Zend OPcache

Sorry @rkleinert, I did misunderstand.

If you SSH into the container, are you able to curl or wget the WP Site URL and get good response?

If not, it may be a DNS issue, not being able to resolve the address while crawling.
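As a rough sketch of that check, assuming curl is available wherever you can run commands: `probe_url` and `diagnose_status` are hypothetical helper names, not part of the plugin. curl reports a status of 000 when it gets no HTTP response at all, which is the case that would point to DNS or connectivity.

```shell
# Fetch only the HTTP status code for a URL.
# curl prints 000 when it gets no HTTP response (DNS failure, refused connection, timeout).
# Usage: probe_url URL [timeout_seconds]
probe_url() {
    curl -s -o /dev/null --max-time "${2:-10}" -w '%{http_code}' "$1"
}

# Turn a status code into a rough diagnosis.
diagnose_status() {
    case "$1" in
        000) echo "no response (DNS failure, refused connection, or timeout)" ;;
        2??) echo "ok" ;;
        3??) echo "redirect (check where it points)" ;;
        *)   echo "error response" ;;
    esac
}

# Example, run from the same host or container that does the crawling:
#   diagnose_status "$(probe_url "https://your-wp-site-url/")"
```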

@leonstafford thank you. We can’t SSH into our container, but file_get_contents inside functions.php shows the site correctly.

Is there any chance you could check in the WP backend?

Thanks

Taking a quick look now.

So, the 2nd URL in the Crawl Log, after /, is /0b97062af3b12cf6a6d5d65142766290/, which behaves weirdly: it 301s to an IP address but eventually times out.

Any idea where that URL’s coming from? It’s part of the initial crawl list, i.e. what the plugin detects as a WP post/page/common URL.

If you can’t find a reference to it within your Posts/Pages, it may be from some Plugin - in which case, you can try disabling plugins one by one until the URL disappears, then we can investigate further.

No doubt the plugin shouldn’t just stall because of that, but we may be seeing similar issues on the other URLs, too.

We can look at excluding that URL from crawl as a potential workaround if no luck otherwise.
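To see where a redirecting URL like that ends up, one option is to list each redirect hop. This is a sketch only, assuming curl is available: `redirect_hops` is a hypothetical helper that pulls the Location headers out of raw response headers read on stdin.

```shell
# Print the target of each redirect found in raw HTTP response headers on stdin.
redirect_hops() {
    tr -d '\r' | grep -i '^location:' | sed 's/^[^:]*:[[:space:]]*//'
}

# Example: follow redirects with HEAD requests and list every hop.
# SUSPECT_URL is a placeholder for the URL from the Crawl Log.
#   curl -sIL --max-time 15 "$SUSPECT_URL" | redirect_hops
```

If a hop points at a bare IP address, that redirect is a likely place for the crawl to stall.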

@leonstafford thank you. I excluded the 2 strange URLs. Now it stops after 5 URLs. Any ideas?

If there’s nothing showing in the Export Log, then hopefully server logs will provide some information.

Are you using any load balancer/security layer in AWS? That could be blocking requests.

I give a lot of possible troubleshooting steps here: