Audit full site failed (CentOS 6.5)

linux-centos
audit-de-site
centos
crawler

#1

Hello, I am a new user of Asqatasun. The environment has been set up successfully, and page audit, file audit, manual audit, and scenario audit all work. Only the full site audit fails. The error message is:

Exception in thread "crawl-1512035621805 launchthread" java.lang.IllegalStateException: BeanFactory not initialized or already closed - call 'refresh' before accessing beans via the ApplicationContext
	at org.springframework.context.support.AbstractRefreshableApplicationContext.getBeanFactory(AbstractRefreshableApplicationContext.java:172)
	at org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1098)
	at org.archive.crawler.framework.CrawlJob.getCrawlController(CrawlJob.java:499)
	at org.asqatasun.crawler.framework.AsqatasunCrawlJob$1.run(AsqatasunCrawlJob.java:210)

01-12-2017 13:59:50:558 72390646 WARN  org.asqatasun.crawler.framework.AsqatasunCrawlJob  - Failed to start bean 'fetchProcessors'; nested exception is java.lang.NoSuchMethodError: org.apache.commons.httpclient.HttpState.setCookiesMap(Ljava/util/SortedMap;)V

asqatasun-crawler-beans-site.html (23.6 KB)

It seems that the configuration files are wrong, but I don’t know where. Can you give me some suggestions? I have attached the asqatasun-crawler-beans-site configuration file to this topic. Thanks very much for your help.


#2

Hello @fengdetiankong1

This sounds weird. Could you share some technical information to help with debugging:

  • Do you use the Docker image, or did you install Asqatasun?
  • If installed:
    • what is the OS (name + version)?
    • what is your Java version?
    • which web server do you use?

#3

Hello @mfaure
First of all, thanks for your reply.
Of course, I installed Asqatasun.


The Java version is 1.7.
The web server I use is apache-tomcat-7.0.73.


#4

Hello @fengdetiankong1,

In summary:

  • CentOS 6.9
    • Java 1.7
    • Tomcat 7.0.73
      • Asqatasun
        • scenario audit = Ok
        • page audit = Ok
        • file audit = Ok
        • site audit = FAIL

Some questions:

  • What is the Asqatasun version?
  • Did you build Asqatasun with Maven or just download the tar.gz?
  • Do you use a proxy?

Perhaps you can increase the log level:

vim /var/lib/tomcat7/webapps/asqatasun/WEB-INF/classes/log4j.properties
     log4j.logger.org.asqatasun.crawler=DEBUG
     log4j.logger.org.asqatasun.service=DEBUG

#5

Hello @fabrice, @mfaure

  • Asqatasun 4.0.3
  • Yes, I built Asqatasun with Maven.
  • I don’t know what a proxy is, so probably not. How do I set one? Can you give me an example?

I have increased the log level to DEBUG.
This is the content of asqatasun.log:

DEBUG org.springframework.web.servlet.DispatcherServlet  - Successfully completed request
DEBUG org.springframework.web.servlet.DispatcherServlet  - DispatcherServlet with name 'tgol-web-app' processing POST request for [/asqatasun/home/contract/audit-site-set-up.html]
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - Site audit on http://www.hotmail.com
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param false ALTERNATIVE_CONTRAST_MECHANISM
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param Rgaa30;LEVEL_2 LEVEL
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param  INFORMATIVE_IMAGE_MARKER
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param 1920 SCREEN_WIDTH
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param 86400 MAX_DURATION
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param true CONSIDER_COOKIES
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param 1000 MAX_DOCUMENTS
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param  EXCLUSION_REGEXP
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param  PRESENTATION_TABLE_MARKER
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param 20 DEPTH
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param  DATA_TABLE_MARKER
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param 1080 SCREEN_HEIGHT
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param  INCLUSION_REGEXP
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param  DECORATIVE_IMAGE_MARKER
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - param  COMPLEX_TABLE_MARKER
INFO org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - Launching audit site on http://www.hotmail.com
DEBUG org.springframework.web.servlet.DispatcherServlet  - Rendering view [org.springframework.web.servlet.view.JstlView: name 'audit-in-progress'; URL [/WEB-INF/view/audit-in-progress.jsp]] in DispatcherServlet with name 'tgol-web-app'
DEBUG org.asqatasun.service.AuditServiceImpl  - auditSite
DEBUG org.springframework.web.servlet.DispatcherServlet  - Successfully completed request
DEBUG org.asqatasun.service.AuditServiceThreadQueueImpl  - auditCommand polled
DEBUG org.asqatasun.service.AuditServiceThreadQueueImpl  - AuditServiceThread created from auditCommand
DEBUG org.asqatasun.service.AuditServiceThreadQueueImpl  - AuditServiceThread started
DEBUG org.asqatasun.webapp.orchestrator.AsqatasunOrchestratorImpl  - WAIT FOR AUDIT TO COMPLETE:org.asqatasun.entity.audit.AuditImpl@76e70a0,1512355460
INFO org.asqatasun.service.command.SiteAuditCommandImpl  - Launching crawler for page http://www.hotmail.com
INFO org.asqatasun.crawler.framework.AsqatasunCrawlJob  - outputDir------------/opt/soft/apache-tomcat-7.0.73/webapps/asqatasun
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - outputDir------------/opt/soft/apache-tomcat-7.0.73/webapps/asqatasun
INFO org.asqatasun.crawler.framework.AsqatasunCrawlJob  - crawlConfigFilePath------------/opt/soft/apache-tomcat-7.0.73/webapps/asqatasun/WEB-INF/conf/crawler/
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - crawlConfigFilePath------------/opt/soft/apache-tomcat-7.0.73/webapps/asqatasun/WEB-INF/conf/crawler/
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - Directory: /opt/soft/apache-tomcat-7.0.73/webapps/asqatasun/crawl-1512355461279 created
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - crawlConfigFilePath: /opt/soft/apache-tomcat-7.0.73/webapps/asqatasun/WEB-INF/conf/crawler/ for copy
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - filepath : /opt/soft/apache-tomcat-7.0.73/webapps/asqatasun/WEB-INF/conf/crawler//asqatasun-crawler-beans-site.xml
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - CONSIDER_COOKIES true
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - PRESENTATION_TABLE_MARKER 
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - EXCLUSION_REGEXP 
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - MAX_DOCUMENTS 1000
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - COMPLEX_TABLE_MARKER 
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - DATA_TABLE_MARKER 
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - SCREEN_WIDTH 1920
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - DECORATIVE_IMAGE_MARKER 
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - MAX_DURATION 86400
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - LEVEL Rgaa30;LEVEL_2
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - INCLUSION_REGEXP 
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - SCREEN_HEIGHT 1080
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - DEPTH 20
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - ALTERNATIVE_CONTRAST_MECHANISM false
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - INFORMATIVE_IMAGE_MARKER 
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - Modifier found for value http://www.hotmail.com/
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - true CONSIDER_COOKIES
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - Modifier found for value true
DEBUG org.asqatasun.crawler.util.HeritrixInverseBooleanAttributeValueModifier  - Update ignoreCookies attribute of bean fetchHttp with value false
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  -  PRESENTATION_TABLE_MARKER
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  -  EXCLUSION_REGEXP
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - Modifier found for value 
DEBUG org.asqatasun.crawler.util.HeritrixParameterValueModifier  - [list: null] value 
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - 1000 MAX_DOCUMENTS
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - Modifier found for value 1000
DEBUG org.asqatasun.crawler.util.HeritrixAttributeValueModifier  - Update maxDocumentsDownload attribute of bean crawlLimiter with value 1000
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  -  COMPLEX_TABLE_MARKER
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  -  DATA_TABLE_MARKER
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - 1920 SCREEN_WIDTH
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  -  DECORATIVE_IMAGE_MARKER
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - 86400 MAX_DURATION
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - Modifier found for value 86400
DEBUG org.asqatasun.crawler.util.HeritrixAttributeValueModifier  - Update maxTimeSeconds attribute of bean crawlLimiter with value 86400
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - Rgaa30;LEVEL_2 LEVEL
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  -  INCLUSION_REGEXP
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - Modifier found for value 
DEBUG org.asqatasun.crawler.util.HeritrixParameterValueModifier  - [list: null] value 
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - 1080 SCREEN_HEIGHT
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - 20 DEPTH
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - Modifier found for value 20
DEBUG org.asqatasun.crawler.util.HeritrixAttributeValueModifier  - Update maxHops attribute of bean tooManyHopsDecideRule with value 20
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  - false ALTERNATIVE_CONTRAST_MECHANISM
DEBUG org.asqatasun.crawler.util.CrawlConfigurationUtils  -  INFORMATIVE_IMAGE_MARKER
INFO org.asqatasun.crawler.framework.AsqatasunCrawlJob  - configFile-----------/opt/soft/apache-tomcat-7.0.73/webapps/asqatasun/crawl-1512355461279/asqatasun-crawler-beans-site.xml
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - configFile-----------/opt/soft/apache-tomcat-7.0.73/webapps/asqatasun/crawl-1512355461279/asqatasun-crawler-beans-site.xml
INFO org.asqatasun.crawler.framework.AsqatasunCrawlJob  - urlList-----------[http://www.hotmail.com]
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - urlList-----------[http://www.hotmail.com]
INFO org.asqatasun.crawler.framework.AsqatasunCrawlJob  - heritrixFileName-----------asqatasun-crawler-beans-site.xml
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - heritrixFileName-----------asqatasun-crawler-beans-site.xml
INFO org.asqatasun.crawler.CrawlerImpl  - Rel canonical pages are kept for ref Rgaa30
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - crawljob is launchable
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - Job validated
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - Starting context
WARN org.asqatasun.crawler.framework.AsqatasunCrawlJob  - Failed to start bean 'fetchProcessors'; nested exception is java.lang.NoSuchMethodError: org.apache.commons.httpclient.HttpState.setCookiesMap(Ljava/util/SortedMap;)V
DEBUG org.asqatasun.crawler.framework.AsqatasunCrawlJob  - Context started

There is also another log file in the Tomcat server, catalina.out, which contains:

Dec 04, 2017 10:44:21 AM org.archive.crawler.framework.CrawlJob instantiateContainer
INFO: Job instantiated
Dec 04, 2017 10:44:21 AM org.archive.spring.PathSharingContext initLaunchId
INFO: launch id 20171204024421
Exception in thread "crawl-1512355461279 launchthread" java.lang.IllegalStateException: BeanFactory not initialized or already closed - call 'refresh' before accessing beans via the ApplicationContext
	at org.springframework.context.support.AbstractRefreshableApplicationContext.getBeanFactory(AbstractRefreshableApplicationContext.java:172)
	at org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1098)
	at org.archive.crawler.framework.CrawlJob.getCrawlController(CrawlJob.java:499)
	at org.asqatasun.crawler.framework.AsqatasunCrawlJob$1.run(AsqatasunCrawlJob.java:210)

Those are the contents of the two logs.

Please forgive my verbosity in pasting both logs here. :wink:


#6

Hello @fengdetiankong1,

Can you test with another URL like https://en.wikipedia.org/ ?
I have the same problem with your URL:

curl -i http://www.hotmail.com 
    HTTP/1.1 301 Moved Permanently
    Location: https://outlook.live.com/owa/

I think that the site audit doesn’t follow the redirect to the new location.


#7

Hello @fabrice
I changed the URL, but the error logs are exactly the same. :sweat_smile:


#8

#9

Hi,

It looks like there are multiple versions of the same library in your classpath (and so a mismatch between the compile-time and runtime contexts).
Can you please give us the content of the WEB-INF/lib folder so we can see which version of HttpClient you’re using?
I suspect a different version of this library is installed on your system, which would explain the error message.
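
One way to check for such duplicates is to scan the lib folder for every jar that contains a given class. The following is a minimal sketch and not part of Asqatasun: the class name DuplicateClassFinder and the default path WEB-INF/lib are assumptions chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: lists the jars in a directory that contain a given class.
class DuplicateClassFinder {

    /** Returns the file names of all jars in libDir that contain className. */
    static List<String> jarsContaining(File libDir, String className) throws IOException {
        // Inside a jar, a class is stored as a path such as org/example/Foo.class
        String entryName = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<String>();
        File[] files = libDir.listFiles();
        if (files == null) {
            return hits; // directory missing or unreadable
        }
        Arrays.sort(files); // stable, alphabetical output
        for (File f : files) {
            if (!f.getName().endsWith(".jar")) {
                continue;
            }
            try (JarFile jar = new JarFile(f)) {
                if (jar.getJarEntry(entryName) != null) {
                    hits.add(f.getName());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are assumptions; pass your own values as arguments.
        File libDir = new File(args.length > 0 ? args[0] : "WEB-INF/lib");
        String cls = args.length > 1 ? args[1] : "org.apache.commons.httpclient.HttpState";
        for (String jar : jarsContaining(libDir, cls)) {
            System.out.println(jar);
        }
    }
}
```

If more than one jar is printed for the same class, the classloader is free to pick either, which is the situation suspected above.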

Regards
Koj


#10

@fengdetiankong1, does the site audit work now?


#11

Hi koj,

I uploaded an image showing the HttpClient version. Can you help check it?


#12

Hi Fabrice,

It still does not work. @koj said the reason may be a version mismatch, and I uploaded a screenshot in my reply. Can you help check it too? Thank you very much.

Best Regards
Fengdetiankong


#13

Hi,

I think I found the cause of your problem.
In the screenshot you posted earlier, we can see that you have a dependency on httpclient-4.3.6.jar AND commons-httpclient-1.1.jar.

They both embed a class named HttpState in the package org.apache.commons.httpclient. The order in which your classloader loads classes may differ from one context to another. It seems that in your case, the class is loaded from commons-httpclient-1.1.jar, where the expected method does not exist.
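
To confirm at runtime which jar a class actually comes from, the JVM can be asked for the class's code source, and reflection can tell whether the expected public method exists. This is a minimal sketch under assumptions: ClassOrigin is a hypothetical helper, and to reflect what Tomcat really sees it would have to run with the web application's classpath.

```java
import java.security.CodeSource;
import java.util.SortedMap;

// Hypothetical diagnostic helper, not part of Asqatasun or Heritrix.
class ClassOrigin {

    /** Returns the location (usually a jar URL) the class was loaded from. */
    static String originOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // JDK classes loaded by the bootstrap loader have no code source
        return src == null ? "bootstrap/JDK" : src.getLocation().toString();
    }

    /** Returns true if cls has a public method with this name and parameter types. */
    static boolean hasMethod(Class<?> cls, String name, Class<?>... params) {
        try {
            cls.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The default class name is taken from the error message in this thread.
        String className = args.length > 0 ? args[0] : "org.apache.commons.httpclient.HttpState";
        try {
            Class<?> cls = Class.forName(className);
            System.out.println("loaded from: " + originOf(cls));
            System.out.println("has setCookiesMap(SortedMap): "
                    + hasMethod(cls, "setCookiesMap", SortedMap.class));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```

If the printed location is a jar other than the one the crawler was compiled against and the method check prints false, that matches the NoSuchMethodError in the log.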

Now, I have a question for you.
I’ve just downloaded the latest version from GitHub.
I had a look at the WAR archive, and commons-httpclient-1.1.jar is not present in the libraries.

Did you build your own artifact?

Regards

Koj


#14

Hi

I checked my computer, and I don’t have commons-httpclient-1.1.jar. I only built the development environment using the command mvn clean install, which downloads the dependencies automatically. I couldn’t find commons-httpclient-1.1.jar; can you share it with me? Thanks very much.


#15

I think I’ve been misunderstood.

In the message where you uploaded the list of dependencies (directory WEB-INF/lib), commons-httpclient-3.1.jar is present (I got the version wrong yesterday) and should NOT be.

We need to find out how and why it is present in the version you built.

If you simply download the latest release and install the WAR archive, it should work.


#16

I’ve just built the develop branch and the dependency is not present.

Did you add a new dependency to any pom? If so, what for?


#17

Hi koj,

I added some new dependencies to /asqatasun-web-app/pom.xml. I pasted a screenshot and also uploaded the pom.xml here. Please help check it. Thank you very much. :grinning:


pom.json (13.7 KB)


#18

What is the wcag dependency? I guess this is where the dependency comes from (obviously it is not from mysql-connector or spring-instrument-tomcat)


#19

Hi koj,

It is for the WCAG 2.0 accessibility rules (https://www.w3.org/TR/WCAG20/). We added it to Asqatasun. But I don’t think it affects the full site audit.


#20

Hi koj,

Thanks for your support. I have solved the issue. It was caused by the jar in the lib folder. I downloaded the whole project from GitHub, compared it with my project, and modified my pom.xml to match the original one. Then I ran it again, and it succeeded. But there is another issue; can you help check it? Thanks very much.