C extension for Porter stemmer
When playing with LSI, I noticed that the program ran far too long and used an enormous amount of memory.
Using ruby-prof, a great profiling tool, I found, to my astonishment, that more time was spent in stemming than in the SVD.
So I wanted to see whether a compiled C extension would make a difference. I took the thread-safe Porter stemmer implementation from http://tartarus.org/~martin/PorterStemmer/ and wrapped it with SWIG.
The speedup was close to an order of magnitude, roughly 7x in total time (10000 rounds for 11 words):
            user     system      total        real
stem :  3.480000   0.250000   3.730000 (  3.719107)
fstem:  0.440000   0.090000   0.530000 (  0.526526)
This is what I call a “performance boost”.
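For anyone who wants to repeat the comparison, here is a minimal sketch of a harness built on Ruby's standard Benchmark module. The helper name `compare_stemmers`, the word list, and the two stand-in lambdas are hypothetical; they are not the code or the 11 words used for the numbers above, and the stand-ins are not real Porter stemmers.

```ruby
require 'benchmark'

# Runs every stemmer over the whole word list `rounds` times.
# `stemmers` maps a report label to any object that responds to #call.
# Prints a Benchmark.bm table and returns { label => Benchmark::Tms }.
def compare_stemmers(words, rounds, stemmers)
  results = {}
  Benchmark.bm(6) do |bm|
    stemmers.each do |label, stemmer|
      results[label] = bm.report("#{label}:") do
        rounds.times { words.each { |w| stemmer.call(w) } }
      end
    end
  end
  results
end

if __FILE__ == $PROGRAM_NAME
  # Hypothetical sample words, only here to make the sketch runnable.
  words = %w[running jumps connected relational hopping
             stemming caresses ponies cats agreed feed]

  # Placeholder callables: swap in the pure-Ruby and the C stemmer.
  naive    = ->(w) { w.sub(/(ing|ed|s)\z/, '') }
  identity = ->(w) { w }

  compare_stemmers(words, 10_000, 'stem' => naive, 'fstem' => identity)
end
```

Passing the stemmers as callables keeps the harness independent of how each one is implemented, so the same loop measures both.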
porter.i (for swig):
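%module stemmer
%{
#include <stdlib.h>
#include <string.h>

/* Declarations from the thread-safe Porter stemmer source */
struct stemmer;
extern struct stemmer *create_stemmer(void);
extern void free_stemmer(struct stemmer *z);
extern int stem(struct stemmer *z, char *b, int k);

char *stem_word(char *word) {
    size_t size = strlen(word) + 1;   /* characters plus the trailing '\0' */
    int last;
    struct stemmer *z;
    char *res = (char *)malloc(size);
    if (res == NULL) return NULL;

    /* stem() rewrites its buffer in place, so work on a copy
       instead of the caller's string */
    memcpy(res, word, size);

    z = create_stemmer();
    /* stem() takes and returns the index of the last character */
    last = stem(z, res, (int)size - 2);
    free_stemmer(z);

    res[last + 1] = '\0';
    return res;
}
%}

/* The returned string is malloc'ed; let SWIG free it after conversion */
%newobject stem_word;
char *stem_word(char *);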
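On the Ruby side, SWIG turns `%module stemmer` into a module named `Stemmer` and exposes `stem_word` as a module function. The sketch below assumes the wrapper was generated with `swig -ruby porter.i` and compiled into an extension that loads as `stemmer`. The helper names `load_fast_stemmer` and `stemmer_for` are hypothetical, and the fallback is whatever pure-Ruby stemmer you already have.

```ruby
# Tries to load the compiled extension. Returns true only if it
# provides Stemmer.stem_word, false otherwise.
def load_fast_stemmer
  require 'stemmer' # assumed name of the compiled extension
  defined?(Stemmer) && Stemmer.respond_to?(:stem_word) ? true : false
rescue LoadError
  false
end

# Returns a callable: the C stemmer when it is available,
# otherwise the fallback that was passed in.
def stemmer_for(fallback)
  if load_fast_stemmer
    Stemmer.method(:stem_word)
  else
    fallback
  end
end
```

With this in place, `stemmer_for(my_ruby_stemmer).call("running")` uses the C version where it has been built and quietly falls back elsewhere.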