INTRODUCTION
v0.20.0 is a major update to POPFile with the focus being on performance.
In addition, POPFile makes another leap forward in support for non-English
languages with many new UI translations (including our first right-to-left
language) and full support for parsing of Japanese and Korean.
v0.20.0 is intended to end-of-life v0.18.x and v0.19.x. All future
development work will be based on the v0.20.0 line of code and no
bugs will be fixed against previous versions. A lot of work has gone
into making v0.20.0 the version of POPFile to have.
To improve POPFile's performance the following changes have been made:
1. The corpus (where the word lists are stored) has been changed from a
flat text file to a BerkeleyDB database. When you run POPFile v0.20.0
for the first time you will see your existing corpus get automatically
upgraded to BerkeleyDB.
(See the 'license' file for details of BerkeleyDB's license.)
The use of the database both speeds up POPFile (especially the
reclassification process, which had slowed down in v0.19.x) and reduces
its memory requirements.
The time to load the corpus has now gone from minutes/seconds to close
to zero.
2. The history data is cached between sessions. If you regularly start and
stop POPFile (e.g. you start and stop your PC every day) then you'll
notice another load-time speedup as a result.
3. The history is progressively updated. As messages are downloaded from
a server POPFile used to store all the messages for insertion into the
history the next time the history was viewed. Now as messages are
downloaded they are inserted into the history progressively.
4. When viewing a colored message you'll notice a big speedup. Previously
there were two scans through the message (one for classification and one
for coloring); this is now reduced to one.
5. When downloading messages we previously saved the message to disk and
then reloaded it for classification. Now the mail parser has been
modified so that the text of the message is streamed into it as it is
read from the mail server and classification happens inline without the
need to reread from disk.
6. On Windows the default configuration for the proxy is to no longer fork()
the server. This means that downloading mail starts very quickly, but
has the downside that only one email account can be checked at a time
and the UI cannot be used during download. This new option is controlled
by a configuration parameter (-pop3_force_fork) and through the UI. On
non-Windows platforms POPFile will fork() each new connection.
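The streaming approach in item 5 can be illustrated with a minimal sketch.
This is not POPFile's code (POPFile is written in Perl); it is a Python
illustration, and the names StreamingWordCounter and relay_and_parse are
hypothetical. It assumes the message arrives as a sequence of lines ending
with the POP3 terminator line ".", and shows each line being handed to the
parser as it is relayed, so nothing has to be written to disk and reread.

```python
class StreamingWordCounter:
    """Hypothetical stand-in for a mail parser: counts words line by line."""

    def __init__(self):
        self.counts = {}

    def feed_line(self, line):
        # Update word counts from one line as soon as it arrives.
        for word in line.lower().split():
            self.counts[word] = self.counts.get(word, 0) + 1


def relay_and_parse(server_lines, client_out, parser):
    """Pass each line to the client and to the parser in a single pass.

    server_lines: iterable of lines read from the mail server
    client_out:   list standing in for the connection to the mail client
    """
    for line in server_lines:
        client_out.append(line)
        # A line holding only "." ends a POP3 message body.
        if line.rstrip("\r\n") == ".":
            break
        parser.feed_line(line)
    return parser.counts
```

By the time the terminator line is seen the word counts are complete, so
classification can start at once instead of waiting for a file to be saved
and loaded again.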
To improve POPFile's stability:
There's been a huge effort to write a complete test suite for POPFile.
Currently we have tests that cover 99% of POPFile's code (i.e. almost every
line of code is exercised by a test) and the plan going forward is to try
to keep it that way.
The test suite exercises the UI as if it were a user clicking buttons and
submitting forms. It includes a complete POP3 server and client so that the
proxy functionality can be tested, and contains hundreds of tests for mail
parsing.
Every module has an equivalent TestFOO.tst that tests it. If you are
interested in running the test suite, get the tests/ directory from CVS and
run 'gmake test'.
To improve POPFile's accuracy:
A number of bugs have been fixed that sometimes caused POPFile internally to
get the right classification and then insert the wrong headers. The mail
parser has been updated with the latest spammers' tricks and new pseudowords
and we've done an experiment with 'unsure' classifications and decided to
ship with code that will mark a message as 'unclassified' if it isn't 100
times more certain it's in bucket A than bucket B. This should reduce the
false positive rate a little at the expense of POPFile saying it's not sure.
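The 'unsure' rule can be sketched as follows. This is a Python illustration
rather than POPFile's own Perl code; the function name classify, the
constant UNSURE_RATIO and the idea of passing per-bucket scores as a
dictionary are assumptions made for the example. Only the factor of 100 and
the 'unclassified' result come from the description above.

```python
UNSURE_RATIO = 100.0  # "100 times more certain", per the release notes


def classify(scores):
    """Pick a bucket, or 'unclassified' when the winner is not clear enough.

    scores: dict mapping bucket name to a positive score
            (hypothetical input format for this sketch).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return "unclassified"
    if len(ranked) == 1:
        # Only one bucket exists, so there is nothing to compare against.
        return ranked[0][0]
    best_bucket, best_score = ranked[0]
    _, second_score = ranked[1]
    # The best bucket must beat the runner-up by the required ratio.
    if best_score >= UNSURE_RATIO * second_score:
        return best_bucket
    return "unclassified"
```

With scores of 0.995 for 'spam' and 0.005 for 'inbox' the ratio is 199, so
the message goes to 'spam'; with 0.9 and 0.1 the ratio is only 9 and the
message is left 'unclassified'.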
ESSENTIAL READING IF YOU ARE UPGRADING TO v0.20.0
1. BACK UP YOUR OLD INSTALLATION: POPFile makes this really easy: just copy
the entire POPFile directory somewhere. You can then safely install
POPFile v0.20.0 on top of your current installation; I just think a backup
is a sensible precaution.
2. IF YOU ARE RUNNING WINDOWS: Please read the section below, I AM RUNNING
WINDOWS AND NEED TO CHECK MULTIPLE EMAIL ACCOUNTS SIMULTANEOUSLY.
3. ON WINDOWS POPFILE IS NOW AN EXE. Windows users will now be able to see
POPFile running in the Task Manager with an executable called popfileXX.exe
where the XX is one of f, if, b, ib depending on configuration. POPFile
is started by running popfile.exe, which chooses the appropriate
popfileXX.exe. This might cause your firewall to ask about giving
popfileXX.exe permissions. In addition, if you had allowed Perl permissions
in your firewall they are NO LONGER needed.
4. The installer will cause POPFile to run in the foreground if the database
upgrade is required so that the upgrade process is evident to the user.
Once upgraded you can switch to background by going to the Configuration
tab and changing "Run POPFile in a console window?" to No.
I AM RUNNING WINDOWS AND NEED TO CHECK MULTIPLE EMAIL ACCOUNTS SIMULTANEOUSLY
Because the time taken to start a new process on Windows is long under Perl
there is an optimization for Windows that is present by default: when a new
connection is made between your email program and POPFile, POPFile handles it
in the 'parent' process. This means that the connection happens fast and
mail starts downloading very quickly, but it means that you can only download
messages from one server at a time (up to 6 other connections will be queued
up and dealt with in the order they arrive) and the UI is unavailable while
downloading email.
You can turn this behavior off (and get simultaneous UI/email access and as
many email connections as you like) by going to the Configuration panel in
the UI and making sure that "Allow concurrent POP3 connections:" is set to
Yes, or by specifying -pop3_force_fork 1 on the command line.
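The non-forking behavior can be pictured with a small sketch. This is a
Python illustration, not POPFile's Perl implementation, and it assumes that
the queue of 6 waiting connections is an ordinary socket listen backlog. The
names make_listener and serve_serially are hypothetical.

```python
import socket

BACKLOG = 6  # the notes say up to 6 other connections are queued


def make_listener(host="127.0.0.1", port=0):
    """Create a listening socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(BACKLOG)
    return sock


def serve_serially(listener, handle, max_connections):
    """Handle connections one at a time in this (the 'parent') process.

    While handle() runs, further clients wait in the listen queue and are
    accepted in arrival order once the current one is finished.
    """
    served = 0
    while served < max_connections:
        conn, _addr = listener.accept()
        with conn:
            handle(conn)
        served += 1
    return served
```

Nothing else in the process can run while handle() is busy, which is why
the UI is unavailable during a download in this mode. Forking a child per
connection removes that limit at the cost of the slow process start.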
I AM USING THE CROSS PLATFORM VERSION
POPFile requires a number of Perl modules that are available from CPAN. New
in v0.20.0 is the need for the following modules:
BerkeleyDB
Text::Kakasi (if you want Japanese language support)
Encode (if you want Japanese language support)
I LIKE TO LIVE DANGEROUSLY
In a future version POPFile will add official support for message classification
through the SMTP and NNTP (Usenet news) protocols. There are currently proxy
modules for these protocols that work with v0.20.0, but they have not been
fully tested. If you are interested in trying them, get them here:
http://cvs.sourceforge.net/viewcvs.py/*checkout*/popfile/engine/Proxy/SMTP.pm?rev=1.22
http://cvs.sourceforge.net/viewcvs.py/*checkout*/popfile/engine/Proxy/NNTP.pm?rev=1.21
and place them in POPFile's Proxy/ directory.
DOWNLOADING
You can obtain the latest releases of POPFile by visiting
http://sourceforge.net/project/showfiles.php?group_id=63137
UPGRADING
Just install POPFile on top of the currently installed version. But did you
read the ESSENTIAL READING above first?
INTERNATIONALIZATION
POPFile's support for non-English languages has grown and the UI is now
localized into 26 languages:
Bulgarian
Chinese (simplified)
Chinese (traditional)
NEW Czech
Danish
Dutch
English
English (UK)
Finnish
French
German
NEW Greek
NEW Hebrew
Hungarian
NEW Italian
NEW Japanese
Korean
Norwegian
NEW Polish
NEW Portuguese (Iberian)
Portuguese (Brazilian)
Russian
Slovak
Spanish
Swedish
Ukrainian
POPFile now also understands Japanese and Korean text and splits it into
words correctly.
DONATIONS
Thank you to everyone who has clicked the Donate! button and donated their
hard earned cash to me in support of POPFile. Thank you also to the people
who have contributed patches, feature requests, bug reports and translations.
http://sourceforge.net/forum/forum.php?forum_id=213876
CONCLUSION
Keep the ideas and bug reports coming. If you are interested in knowing
more about what's planned for future POPFile versions (or just learning
about POPFile's history) visit the POPFile Roadmap:
http://sourceforge.net/docman/display_doc.php?docid=17906&group_id=63137
John.