When processing a log file, not all visits should be included in the generated statistics. For example, visits from spammers or bots should be excluded, and you will usually want to exclude your own visits to your website as well.
It is therefore desirable to restrict the statistics to the visits you are interested in. Visit filtering allows you to do that.
Visit filtering only occurs if a visit filter is provided by specifying a class in the property: visitFilterClass
The visit filter implementation must implement the interface org.polliwog.filters.VisitFilter. For each visit, the method accept(org.polliwog.data.Visit) is called. If the method returns true then the visit is added to the statistics; otherwise it is filtered, i.e. rejected.
A basic visit filtering implementation is provided with polliwog. It is implemented by the class org.polliwog.filters.BasicVisitFilter. To make polliwog use this filter, set the visitFilterClass property to: org.polliwog.filters.BasicVisitFilter
The basic visit filter uses an XML file to configure a number of rules that it applies to each visit to determine whether it should be filtered.
The rules are applied sequentially. Each rule defines whether the visit should be accepted or rejected (which corresponds to whether the accept method returns true or false).
The XML file has the following elements/attributes:
XML Definition Help
The Children column shows the child elements that can be used within the specified element. Child elements can appear in any order (there is no enforcement via a DTD).
- A + after the element name indicates that at least 1 child element with that name must be present.
- A * after the element name indicates that 0 or more elements can be present.
- A ? after the element name indicates that either 1 or no elements with that name can be present.
If no symbol is provided after the name then one element must be provided.
The Attributes column shows the attributes that can be used on the specified element. Attribute definitions are written as: name(value_type,required|optional), where value_type is one of:
- string - A string value, this can be anything.
- integer - An integer value.
- class - A fully qualified classname.
- enum{values} - A specific value, one or more of those given in the brackets, which will be comma-separated.
- boolean - Either true or false.
Required and optional are represented as: R and O respectively.
| Name | Root | Children | Parent(s) | Attributes | Description |
|---|---|---|---|---|---|
| visit-filter | Y | rule+ | NONE | NONE | The root element. Each child rule element defines a rule that should be applied to a visit. |
| rule | N | ANY | visit-filter | type(string,R) action(string,R) | Defines a rule to be applied to a visit. The type attribute defines the type of rule that is created; the josql type is described below. Depending upon the type of rule created, extra attributes may be needed on the rule element; see the relevant rule for details. The action attribute can be either accept or reject, depending upon whether you want to accept or reject the visit if the rule matches. |
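As an illustration of the table above, the following is a minimal sketch of a complete filter file: a visit-filter root containing two rule elements, each with the required type and action attributes. The WHERE clauses reuse expressions from this document's examples. The file name and the way the file is handed to the basic visit filter are not covered here, and how an accept rule and a reject rule interact when both match is not something this sketch tries to show.

```xml
<visit-filter>
  <!-- Rules are applied sequentially, in the order they appear. -->

  <!-- Accept visits that arrived via a Google search. -->
  <rule type="josql" action="accept">
    entryPage.refererSearch.name LIKE 'Google%'
  </rule>

  <!-- Accept visits made with the Firefox browser. -->
  <rule type="josql" action="accept">
    browser.name $= 'firefox'
  </rule>
</visit-filter>
```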
A JoSQL rule is created when the type attribute on a rule element is josql. An instance of org.polliwog.filters.JoSQLRule is created.
A josql rule uses a JoSQL WHERE clause to perform the filtering (you do not need to provide the WHERE keyword). The WHERE clause should be placed as the content of the rule element.
Instances of Visit are passed to the rule and the WHERE clause is applied. If the WHERE clause evaluates to true then the visit is accepted or rejected according to the action attribute.
The functions from JoSQLFunctionHandler are available for use in the WHERE clause.
Example
Only accept visits where the browser is Firefox.
<rule type="josql"
action="accept">
browser.name $= 'firefox'
</rule>
Example
Reject visits where the visitor only viewed a single page (bounces).
<rule type="josql"
action="reject">
pages.size = 1
</rule>
Example
Only accept visits where the visitor came from a Google search.
<rule type="josql"
action="accept">
entryPage.refererSearch.name LIKE 'Google%'
</rule>
Example
Reject visits where the visitor did not visit the Products site area.
<rule type="josql"
action="reject">
(SELECT *
FROM siteAreas
WHERE siteArea.name = 'Products').size = 0
</rule>
Example
Only accept visits where the visitor:
- came from a linking site
- spent more than 2 minutes on the site
- looked at at least 10 pages
- visited the Products site area
- used the Firefox browser
- is using Windows XP
- is in Australia
<rule type="josql"
action="accept">
<![CDATA[
externalSites.size > 0
AND
visitDuration > 120000
AND
pages.size >= 10
AND
(SELECT *
FROM siteAreas
WHERE siteArea.name = 'Products').size > 0
AND
browser.name $= 'firefox'
AND
browser.os = 'Windows XP'
AND
location.country = 'Australia'
]]>
</rule>
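The CDATA section in the example above matters most when a WHERE clause contains < or &, which XML would otherwise treat as markup. A minimal sketch, assuming visitDuration is measured in milliseconds (as the 120000 used for 2 minutes suggests); the 5 second threshold is an arbitrary illustration:

```xml
<rule type="josql" action="reject">
<![CDATA[
visitDuration < 5000
]]>
</rule>
```

Without the CDATA wrapper the same clause would need to be written with the entity &lt; in place of <.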