A hit collector is a way to collect hits (instances of org.polliwog.data.Hit), based on certain criteria, as polliwog is processing a log file.
Once collected, the hits can be processed and sections/pages generated to display the information. For instance, you may wish to collect "image hot-linking hits" (that is, hits from other websites that link directly to images on your site).
Each hit collection defines the types of hits that it wants to collect.
The collector definitions are stored in the file data/hit-collectors.xml. This is an XML file that has the following elements/attributes:
XML Definition Help
The Children column shows the child elements that can be used within the specified element. Child elements can appear in any order (there is no enforcement via a DTD).
- A + after the element name indicates that at least 1 child element with that name must be present.
- A * after the element name indicates that 0 or more elements can be present.
- A ? after the element name indicates that either 1 or no elements with that name can be present.
If no symbol is provided after the name then exactly one element must be provided.
The Attributes column shows the attributes that can be used on the specified element. Attribute definitions are written as: name(value_type,required|optional), where value_type is one of:
- string - A string value, this can be anything.
- integer - An integer value.
- class - A fully qualified classname.
- enum{values} - A specific value, one or more of those given in the brackets, which will be comma-separated.
- boolean - Either true or false.
Required and optional are represented as: R and O respectively.
| Name | Root | Children | Parent(s) | Attributes | Description |
|---|---|---|---|---|---|
| collectors | Y | collector+ | NONE | NONE | The root element. Each child collector element defines a collection. |
| collector | N | ANY | collectors | on(enum{nonPage, page, filtered},R) name(string,R) class(string,R) | Defines a particular collection. The class attribute defines the class that will represent the collection and implements the org.polliwog.collectors.HitCollector interface. The name attribute defines a unique name (within the list of hit collections) for the collection; it is this identifier that can be used in sections to identify the collection to use as the input. The on attribute defines the types of hit that should be collected and should be a comma-separated list of one or more of the values: nonPage, page, filtered. The usual value for this attribute is: nonPage, page. |
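Putting the two elements together, a minimal data/hit-collectors.xml might look like the sketch below. It only illustrates the nesting and the three required attributes described in the table; the collector name searchHits is illustrative, and the WHERE clause in the element content assumes the org.polliwog.collectors.BasicHitCollector class is used.

```xml
<collectors>
  <!-- One collector element per collection; "name" must be unique within this file. -->
  <collector on="nonPage,page"
             name="searchHits"
             class="org.polliwog.collectors.BasicHitCollector">
    get (requestParameters, "search") != NULL
  </collector>
</collectors>
```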
There are 3 types of hits that are relevant to hit collections. The types relate to how polliwog categorizes the hit. Note that, for the purposes of hit collections, each hit is classified as exactly one of the types. The types are:
- Filtered - The hit is categorized as filtered if there is a hit filter in use and it rejects the hit, i.e. it has been filtered and won't contribute to any visit/pages statistics. To collect this type of hit, use a value of: filtered for the on attribute of the collector element.
- Non page - The hit is categorized as not a page if the page collector decides that the hit does not constitute a page. To collect this type of hit, use a value of: nonPage for the on attribute of the collector element.
- Page - The hit is categorized as a page if the page collector decides that the hit does constitute a page. To collect this type of hit, use a value of: page for the on attribute of the collector element.
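As a rough illustration of the on attribute, the sketch below collects only hits that a hit filter has rejected. The collector name filteredGifHits is hypothetical, and the WHERE clause assumes the org.polliwog.collectors.BasicHitCollector class together with the requestURI.path accessor on Hit.

```xml
<!-- Only hits categorized as "filtered" are offered to this collector. -->
<collector on="filtered"
           name="filteredGifHits"
           class="org.polliwog.collectors.BasicHitCollector">
  requestURI.path LIKE '%.gif'
</collector>
```

Because each hit is classified as exactly one type, changing on to nonPage,page would offer the collector a completely different set of hits and none of the filtered ones.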
Whilst it is possible to provide your own implementation of the HitCollector interface, a basic implementation that allows you to define criteria for collecting hits via a JoSQL WHERE clause is available with the org.polliwog.collectors.BasicHitCollector class.
To use this class, just use a value of org.polliwog.collectors.BasicHitCollector for the class attribute of the collector element.
The WHERE clause is then provided by the content of the collector element. Remember that the class for any accessors is Hit. Any valid WHERE clause can be used; the expression is evaluated to a boolean true/false value, and this value is returned from the accept(org.polliwog.data.Hit) method.
Example
Find all page hits that have a request parameter named search.
<collector on="page"
name="searchHits"
class="org.polliwog.collectors.BasicHitCollector">
get (requestParameters, "search") != NULL
</collector>
Example
Find hot-linked image hits (replace www.mysite.com with the name of your website).
<collector on="nonPage,page"
name="hotLinkedImageHits"
class="org.polliwog.collectors.BasicHitCollector">
refererURI NOT LIKE 'http://www.mysite.com%'
AND
refererURI.scheme LIKE 'http%'
AND
requestURI.path IN $LIKE ('%.gif', '%.jpg')
</collector>