[manpage Hit Filtering] [openMenu basic_concepts /] [msection Hit Filtering] When processing a log file, not every hit should be included in the generated statistics. Log files can span multiple months and can contain hits you have no interest in, for example requests for CSS files, JavaScript files, image files, favicon.ico or robots.txt. Hit filtering lets you include in the statistics only the hits you are interested in. Hit filtering only occurs if a hit filter is configured by specifying a class in the following [link properties-file.html]property[/link]: [value]hitFilterClass[/value] The hit filter class must implement interface: [javadoc polliwog,org.polliwog.filters.HitFilter /]. For each hit, method [javadoc polliwog,org.polliwog.filters.HitFilter#accept(org.polliwog.data.Hit),lastid=y /] is called. If the method returns true, the hit is added to the statistics; otherwise it is filtered out, i.e. rejected. [/msection] [msubsection Basic Hit Filter] A basic hit filter implementation is provided with polliwog, implemented by class [javadoc polliwog,org.polliwog.filters.BasicHitFilter /]. To make polliwog use this filter, set the hitFilterClass property to: [value]org.polliwog.filters.BasicHitFilter[/value] The basic hit filter uses an XML file to configure a number of rules that it applies to each hit to determine whether the hit should be filtered. The rules are applied sequentially. Each rule defines whether a matching hit should be accepted or rejected (which corresponds to accept returning true or false). The XML file has the following elements/attributes: [xmltable hit-filtering] [tr]hit-filter|Y|rule+|NONE|NONE|The root element; each child rule element defines a rule to be applied to a hit.[/tr] [tr]rule|N|ANY|hit-filter|type(string,R)
action(string,R)|Defines a rule to be applied to a hit. The type attribute defines the type of rule that is created; the following values are supported: [l] [i]josql - Creates an instance of [javadoc polliwog,org.polliwog.filters.JoSQLRule /], see [link #josql-rule]JoSQL Rules[/link] for more details.[/i] [i]date - Creates an instance of [javadoc polliwog,org.polliwog.filters.DateRule /], see [link #date-rule]Date Rules[/link] for more details.[/i] [i]url - Creates an instance of [javadoc polliwog,org.polliwog.filters.URLRule /], see [link #url-rule]URL Rules[/link] for more details.[/i] [/l] Depending upon the type of rule created, extra attributes may be needed on the rule element; see the relevant rule for details. The action attribute is either accept or reject, and determines whether the hit is accepted or rejected when the rule matches.[/tr] [/xmltable] [/msubsection] [msubsection JoSQL Rules,josql-rule] A JoSQL rule is created when the type attribute on a rule element is: josql. An instance of [javadoc polliwog,org.polliwog.filters.JoSQLRule /] is created. A JoSQL rule uses a [link http://josql.sourceforge.net/manual/where-clause.html]JoSQL WHERE clause[/link] to perform the filtering (you do not need to provide the WHERE keyword). The WHERE clause is placed as the content of the rule element. Each [javadoc polliwog,org.polliwog.data.Hit,lastid=y /] instance is passed to the rule and the WHERE clause is applied to it. If the WHERE clause evaluates to true, the hit is accepted or rejected according to the action attribute. The functions from [javadoc polliwog,org.polliwog.handlers.JoSQLFunctionHandler,lastid=y /] are available for use in the WHERE clause. [sub-section Information available in the Hit object /] Although the [javadoc polliwog,org.polliwog.data.Hit,lastid=y /] object can contain a large amount of information, at the point where the filter rules are applied only non-derived information (i.e.
information that is available in the log file) has a value. For instance, the [link manual/site-areas.html]site area[/link], [link manual/pages.html]hit page[/link] and visit summary will not have a value, so they should not be used in the WHERE clause. In general, you should use the following accessors in the WHERE clause: [l] [i][javadoc polliwog,org.polliwog.data.Hit#getDate()]date[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getSize()]size[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getUserAgent()]userAgent[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getRequest()]request[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getStatus()]status[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getPageType()]pageType[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getRequestURI()]requestURI[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getPath()]path[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getRequestParameters()]requestParameters[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getRequestMethod()]requestMethod[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getHostname()]hostname[/javadoc][/i] [i][javadoc polliwog,org.polliwog.data.Hit#getReferer()]referer[/javadoc][/i] [/l] [example /] Only accept hits for the current month. [xml] <rule type="josql" action="accept">currentMonth (date)</rule> [/xml] [example /] Reject all hits from the 192.168.0.X IP address range. [xml] <rule type="josql" action="reject">hostname LIKE '192.168.0.%'</rule> [/xml] [example /] Reject hits for image files. [xml] <rule type="josql" action="reject">pageType $IN ('gif', 'jpg', 'png')</rule> [/xml] [example /] Reject 404 hits made with the POST method where the size is greater than 10000 bytes.
[xml] <rule type="josql" action="reject">status = '404' AND requestMethod $= 'POST' AND size > 10000</rule> [/xml] [/msubsection] [msubsection Date Rules,date-rule] [deprecated]This rule is deprecated as of version 0.7 and should not be used; use a [link #josql-rule]JoSQL rule[/link] instead.[/deprecated] A date rule is created when the type attribute on a rule element is: date. An instance of [javadoc polliwog,org.polliwog.filters.DateRule /] is created. A date rule filters hits based on the date on which the hit occurred. The date rule uses the following optional extra attributes, which must be specified on the rule element, to initialize itself (the value in brackets indicates the type of value expected): [l] [i]currentMonth(boolean) - when specified with a true value, only hits with a date in the current month are accepted/rejected. The current month is the period from 00:00:00.000 on the 1st of the calendar month to 23:59:59.999 on the last day of the calendar month.[/i] [i]currentWeek(boolean) - when specified with a true value, only hits with a date in the current week are accepted/rejected. The current week is the period from 00:00:00.000 on the first day of the current week to 23:59:59.999 six days later. For most locales this means either Monday 00:00:00.000 to Sunday 23:59:59.999 or Sunday 00:00:00.000 to Saturday 23:59:59.999; either way, the period spans 7 days (minus 1 millisecond). Note: the first day of the week is determined by calling [javadoc java,java.util.Calendar#getFirstDayOfWeek(),lastid=y /], and the week is taken to be the seven days starting on that day.[/i] [i]today(boolean) - when specified with a true value, only hits for the current date are accepted/rejected.[/i] [i]after(string) - indicates that only hits after the specified date are accepted/rejected.
The default date format is dd/MMM/yyyy, but this can be overridden using the format attribute.[/i] [i]before(string) - indicates that only hits before the specified date are accepted/rejected. The default date format is dd/MMM/yyyy, but this can be overridden using the format attribute.[/i] [i]format(string) - indicates that the specified format should be used instead of the default. The format must be suitable for use with a [javadoc java,java.text.SimpleDateFormat,lastid=y /] object.[/i] [i]month(string) - when specified, only hits for the specified month (in the current year) are accepted/rejected. The month value should be one of the standard three-letter abbreviations for your locale, e.g. Jan, Feb, Mar, Apr.[/i] [/l] [example /] Only accept hits for the current month. [xml] <rule type="date" action="accept" currentMonth="true" /> [/xml] [example /] Reject hits that are not between 3rd April 2007 and 23rd April 2007. Note this is the same as reversing the dates and using an action value of: accept. [xml] [/xml] [example /] Accept hits in June. [xml] <rule type="date" action="accept" month="Jun" /> [/xml] [/msubsection] [msubsection URL Rules,url-rule] [deprecated]This rule is deprecated as of version 0.7 and should not be used; use a [link #josql-rule]JoSQL rule[/link] instead.[/deprecated] A URL rule is created when the type attribute on a rule element is: url. An instance of [javadoc polliwog,org.polliwog.filters.URLRule /] is created. A URL rule filters hits based on the requested URL.
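The kind of string matching a URL rule performs can be pictured with a minimal Java sketch. It is not polliwog's URLRule source. The class `UrlMatchSketch` and its `matches` method are hypothetical. The way conditions are combined is also an assumption: every specified condition must hold, and unspecified ones are ignored. The constructor parameters mirror the rule's startsWith, endsWith, contains and ignoreCase attributes, and the "/admin" and ".php" values are illustrative only.

```java
import java.util.Locale;

// Hypothetical sketch of URL-rule style matching; not polliwog's URLRule implementation.
class UrlMatchSketch {

    private final String startsWith; // null means the condition was not specified
    private final String endsWith;
    private final String contains;
    private final boolean ignoreCase;

    UrlMatchSketch(String startsWith, String endsWith, String contains, boolean ignoreCase) {
        this.startsWith = startsWith;
        this.endsWith = endsWith;
        this.contains = contains;
        this.ignoreCase = ignoreCase;
    }

    // Assumption: the rule matches only if every specified condition holds.
    boolean matches(String url) {
        String u = normalize(url);
        if (startsWith != null && !u.startsWith(normalize(startsWith))) {
            return false;
        }
        if (endsWith != null && !u.endsWith(normalize(endsWith))) {
            return false;
        }
        if (contains != null && !u.contains(normalize(contains))) {
            return false;
        }
        return true;
    }

    // Lower-cases both sides of each comparison when ignoreCase is set.
    private String normalize(String s) {
        return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
    }

    public static void main(String[] args) {
        // Request URLs start with "/", so a startsWith value should too.
        UrlMatchSketch admin = new UrlMatchSketch("/admin", null, null, false);
        System.out.println(admin.matches("/admin/users.php")); // true
        System.out.println(admin.matches("/index.php"));       // false

        UrlMatchSketch php = new UrlMatchSketch(null, ".php", null, true);
        System.out.println(php.matches("/INDEX.PHP"));          // true
    }
}
```

Whether the hit is then accepted or rejected would still depend on the rule's action attribute. The sketch only decides whether the rule matches.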
The URL rule uses the following optional extra attributes, which must be specified on the rule element, to initialize itself (the value in brackets indicates the type of value expected): [l] [i]startsWith(string) - when specified, the value that the URL must start with for the rule to match.[/i] [i]endsWith(string) - when specified, the value that the URL must end with for the rule to match.[/i] [i]contains(string) - when specified, a value that the URL must contain for the rule to match.[/i] [i]ignoreCase(boolean) - when specified with a true value, comparisons are case-insensitive.[/i] [/l] Note: request URLs start with /. [example /] Reject hits for the admin part of the site. [xml] [/xml] [example /] Only accept .php hits. [xml] <rule type="url" action="accept" endsWith=".php" /> [/xml] [/msubsection] [/manpage]