Filtering in tin 0. Status This is an overview of the new filtering capabilities of tin. This document will be absorbed in the main documentation at some point. 1. Introduction Tin's filtering mechanism has changed significantly since version 1.3beta. Originally there were only two possibilities: 1) kill an article matching a rule. 2) hot-select an article matching a rule. This led to constant confusion, as it seemed important which rule came first in the filter file, but it wasn't. Then if an article was selected for whatever reason it couldn't be killed even if it was Craig Shergold telling you how to make money fast in a crosspost to alt.test. This binary concept isn't modern anyway, so a much more up-to-date fuzzy mechanism was necessary: scoring. When using tin's new scoring mechanism you assign a "score" to each filter rule. The scores of rules matching the current article are added and the final score of the article decides if it is regular, marked hot or killed. The standard "kill" and standard "select" already in your filter-file have the score "score_kill" and "score_select" respectively (See section 4). 2. Changes to the filter-file format Tin understands the additional "score" command in the filter-file now. Old style rule: scope=* type=0 case=0 subj=*$$$* New style rule: group=* case=0 score=-100 subj=*$$$* ##### So you can give the individual rule a weight, based on your opinion about the rule. e.g. if you want to be sure to never read a certain individual again, you may give the rule a score of (-)9000. If you want only "classical" filtering and don't want to mess around with score values, you can use the magic words "kill" and "hot" as score values in your filter file. Example: group=* case=0 score=kill subj=*$$$* ##### These are handled as default values at program initialization time and may be somewhat easier to remember. You might have noticed by the examples above that tin inserts a line of hashes between two rules now. This is *not* required, it just improves readability. 3. Changes in the filter menu The on screen filter menu is now more compact and fits easily on small terminals such as a small xterm or a 640x200 CON: window now. It has been enhanced to allow you to enter a score for the rule you are adding. It should be in the range from 1 to SCORE_MAX otherwise it will default to "score_select" for select filter rules and "score_kill" for kill filter rules (See section 4). 4. Internal defaults and config options There are some constants defined in tin.h and tinrc: SCORE_MAX is the maximum score an article can reach. Any value above this is cut to SCORE_MAX, the same goes for negative scores. recommended: 10000 "score_kill" is the default score given for any kill rule, if no other is specified. recommended: -100 "score_select" is the default score for any auto-selection rule, if no other is specified. recommended: 100 "score_limit_kill" and "score_limit_select" are the limits that must be crossed to mark an article as killed or selected. recommended when used with values given above: -50/+50. "score_kill", "score_select", "score_limit_kill" "score_limit_select" are config options. You can find them in tin's configuration file (~/.tin/tinrc). They can also be changed at runtime in the config menu. 5. Overview of "filter"-commands Everything here is also described in the file ~/.tin/filter, albeit more concisely. All lines are of the form: command=value Valid "command"s are: add a comment to the following rule: comment= a short text multiple comment lines may be used, comments lines _must_ be right before the scope selection. scope selection: group=newsgroup_pattern_list newsgroup_pattern_list is a comma-separated list of newsgroup_patterns newsgroup_patterns can be a pattern (wildmat-style) or !pattern, negating the match of pattern. This is the same format used for the AUTO(UN)SUBSCRIBE environment variable. Tin doesn't rework your filter file, the new pattern matching is only used when you enter new entries by hand. additional info: case=num num: 0=casesensitive, 1=caseinsensitive score=num num: score value of rule, can now also be one of the magic words "kill" or "hot", which are equivalent to SCORE_KILL and SCORE_SELECT respectively. time=num num: time_t value; when rule expires. When tin writes the filter file it adds the time in human readable form as a comment in parentheses after the numeric value. When reading the file tin uses _only_ the numeric value, not the human readable form. matches: matched to: subj=pattern Subject: from=pattern From: Tin converts the contents of the From-header to an old-style e-mail address, i.e. ''some@body.example (John Doe)'' instead of ''John Doe '', before trying to match the patterns in the filter rule. That way a rule tailored to to match the full from header "jsmith@ac.example (John Smith)" will still work when John posts with a different newsreader which uses "John Smith ". msgid=pattern Message-Id: *AND* full References: msgid_last=pattern Message-Id: and last Reference:s entry only msgid_only=pattern Message-Id: refs_only=pattern References: line (e.g. <123@ether.net>) without Message-Id: lines=num Lines: ; num matches more than. gnksa=[<>]?NUM GNKSA parse_from() return code xref=pattern Xref: ; filter crossposts to groups matching pattern When you are using wildmat pattern-matching, patterns in ~/.tin/filter should be delimited with "*", verbatim wildcards in patterns must be escaped with "\". When using the built-in filter-file functions, tin tries to take care of it for itself, except when you are entering text in the built in kill/hot-menu. Then you have to quote manually because tin doesn't know if e.g. "\[" is already quoted or not. GNKSA return codes: these are the return codes of the From:-address parser, enabling you to filter on certain kinds of syntactical and semantical errors present in that header. For an up-to-date list see the definitions in extern.h and the parser source code in misc.c, the following is just a short introduction. 0-99: internal codes code error description 0 no error, valid address 1 internal error, should not happen (blame me) 100-199: general syntactical errors code error description 100 left angle bracket ("<") missing in route address 101 left parenthesis ("(") missing in oldstyle address (realname comment) 102 right parenthesis (")") missing in oldstyle address (realname comment) 103 at-sign ("@") missing in mail address 200-299: right hand side (FQDN part) of address, syntax and semantics code error description 200 right hand side (RHS) of address is a single component 201 RHS has an unknown top level domain (3 or more characters) 202 RHS has a malformed top level domain 203 RHS has an unknown country code as top level domain 204 illegal character in RHS 205 leading or trailing dot or two consecutive dots in RHS 206 RHS has a component longer than 63 characters 207 RHS has a component with leading or trailing hyphen ("-") 208 RHS has a component starting with a digit (with ENFORCE_RFC1034 only) 209 RHS is not a valid IP address 210 RHS is an IP address from private IP space (see RFC1918) or loopback 211 brackets ("[", "]") around IP address missing in RHS 300-399: syntactical errors left hand side (localpart) of address code error description 300 there was no localpart found at all in address 301 localpart contains illegal characters 302 localpart has leading, trailing or consecutive dots 400-499: syntactical errors in realname part code error description 400 illegal character in unquoted word in realname part 401 illegal character in quoted word in realname part 402 illegal character in encoded word in realname part 403 bad syntax in encoded word in realname part 404 illegal character in oldstyle realname part (one of "()<>\") 405 illegal character in realname part 6. EXAMPLES 6.1 WILDMAT EXAMPLES none given, too simple, find out yourself ,-) 6.2 REGEXP EXAMPLES Be sure to change Wildcard setting from WILDMAT (default) to REGEX to make the following examples to work properly. This can be done using the internal configuration menu or in file ~/.tin/tinrc comment= this kills all articles about CNews, DNEWS or diablo comment= in news.software.* but not in news.software.readers group=news.software.*,!news.software.readers case=1 score=kill subj=([cd]news|diablo) comment= this should mark all articles about tin, rtin, tind, ktin or cdtin comment= as hot group=* case=1 score=hot subj=\b(cd|[rk]?)?tin(d|pre)?[-.0-9]*\b comment= mark own articles and followups to own articles as hot in all groups comment= except local ones comment= match From: (a bit complex) and/or comment= Message-ID: (I'm the only user who's posting on this server) group=*,!akk.*,!tin.* case=1 score=hot from=urs@(.*\.)?((akk\.uni-karlsruhe|arbeitsen)\.de|(karlsruhe|tin|akk)\.org|ka\.nu) msgid=@akk3(?:-dmz)?\.akk\.uni-karlsruhe\.de> comment= stupid ppl. sometimes read control.cancel to see if there are any comment= forged cancels around... the next rule helps you a bit comment= ignore know despammers and net.* cancels group=control.cancel case=1 score=kill from=(news@news\.msfc\.nasa\.gov|clewis@ferret\.ocunix\.on\.ca|jem@xpat\.com|(jeremy|lysander)@exit109\.com|howardk@iswest\.com|cosmo.roadkill.*rauug\.mil\.wi\.us|spamless@pacbell\.net|cwilkins@.*\.clark\.net) msgid_only= only f'ups to own articles keep marked hot group=* case=1 score=-200 msgid_only=doeblitz\.ts\.rz\.tu-bs\.de comment= kill all articles which do not have your message-id comment= as last reference _if_ article has any references group=de.newusers.questions case=1 score=-100 refs_only=.*<[^@\s]+@\S+(?$ comment= Kill all articles from John Smith, who writes under different comment= addresses at ac.example, e.g john@ac.example and boss@ac.example group=* case=1 score=kill from=@ac\.example\s\(John\sSmith\)$ 7. TODO - make the time value in the filter file more human readable. - rewrite filtering order to get optimal performance - filtering on arbitrary header lines - move docu to tin.5