A visit collector collects visits (instances of org.polliwog.data.Visit) that match certain criteria while polliwog is processing a log file.
Once collected, the visits can be processed and sections/pages generated to display the information. For instance, you may wish to collect visits from potential bots/spiders (ones that aren't listed in the robots_list.txt file).
The collector definitions are stored in the file data/visit-collectors.xml. This is an XML file that has the following elements/attributes:
XML Definition Help
The Children column shows the child elements that can be used within the specified element. Child elements can appear in any order (there is no enforcement via a DTD).
- A + after the element name indicates that at least 1 child element with that name must be present.
- A * after the element name indicates that 0 or more elements can be present.
- A ? after the element name indicates that either 1 or no elements with that name can be present.
If no symbol is provided after the name then exactly one element must be provided.
The Attributes column shows the attributes that can be used on the specified element. Attribute definitions are written as: name(value_type,required|optional), where value_type is one of:
- string - A string value, this can be anything.
- integer - An integer value.
- class - A fully qualified classname.
- enum{values} - One or more of the values given in the braces, comma-separated.
- boolean - Either true or false.
Required and optional are represented as: R and O respectively.
| Name | Root | Children | Parent(s) | Attributes | Description |
|---|---|---|---|---|---|
| collectors | Y | collector+ | NONE | NONE | The root element. Each child collector element defines a collection. |
| collector | N | ANY | collectors | on(enum{normal,filtered},O) name(string,R) class(string,R) | Defines a particular collection. The class attribute names the class that represents the collection; it must implement the org.polliwog.collectors.VisitCollector interface. The name attribute defines a unique name (within the list of visit collections) for the collection. This identifier is what sections use to select the collection as their input. |
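As an illustration of the table above, a minimal data/visit-collectors.xml might look like the following sketch. The collector name and the WHERE clause are made up for the example, and the on attribute is included only to show the syntax, since its effect is not described here. The class used is the basic implementation described below.

```xml
<!-- Sketch of data/visit-collectors.xml; the name and clause are illustrative only. -->
<collectors>
  <!-- name and class are required (R); on is optional (O) -->
  <collector name="exampleBots"
             on="normal"
             class="org.polliwog.collectors.BasicVisitCollector">
    userAgent LIKE '%bot%'
  </collector>
</collectors>
```

Further collector elements can be added under the same collectors root, provided each has a unique name.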
Whilst it is possible to provide your own implementation of the VisitCollector interface, a basic implementation that allows you to define criteria for collecting visits via a JoSQL WHERE clause is available with the org.polliwog.collectors.BasicVisitCollector class.
To use this class, use a value of org.polliwog.collectors.BasicVisitCollector for the class attribute of the collector element.
The WHERE clause is then provided by the content of the collector element. Remember that the class for any accessors is Visit. Any valid WHERE clause can be used; the expression is evaluated to a boolean true/false value. This value is returned from the accept(org.polliwog.data.Hit) method.
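For comparison, a collector backed by your own implementation would be referenced in the same way through the class attribute. In this sketch, com.example.MyVisitCollector is a hypothetical class, assumed to implement org.polliwog.collectors.VisitCollector. How such a class uses the element content is presumably up to the implementation, so the element is left empty here.

```xml
<!-- Hypothetical custom collector; com.example.MyVisitCollector is not part of polliwog. -->
<collector name="myCustomVisits"
           class="com.example.MyVisitCollector">
</collector>
```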
Example
Find all potential bots/spiders.
<collector name="potentialBots"
class="org.polliwog.collectors.BasicVisitCollector">
userAgent $IN LIKE ('%http://%', '%bot%', '%spider%', '%crawler%')
</collector>
Example
Find hot linked image visits (replace www.mysite.com with the name of your website).
<collector name="hotLinkedImageHits"
class="org.polliwog.collectors.BasicVisitCollector">
(SELECT *
FROM pages
WHERE :_allobjs.size = 1
AND requestURI.path $IN LIKE ('%.gif', '%.jpg', '%.png')
AND refererURI.toString LIKE 'http://%'
AND refererURI.toString NOT LIKE 'http://www.mysite.com%')
</collector>