Lucene - Core: LUCENE-1166

A tokenfilter to decompose compound words


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • None
    • None
    • Component/s: modules/analysis
    • None
    • Lucene Fields: Patch Available

    Description

      A TokenFilter that decomposes the compound words found in many Germanic languages (German, Swedish, ...) into separate tokens.

      For example, Donaudampfschiff would be decomposed into Donau, dampf and schiff, so that the document is found even when the user only searches for "Schiff".
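      To make the idea concrete, here is a minimal, hypothetical sketch of plain dictionary-based decomposition: every substring within a size range is looked up in a set of known words, and each hit is emitted in addition to the original token. The class name, the size limits and the three-word dictionary are assumptions for illustration. This is not the code from the attached patch, and it does not use the hyphenation step described below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: brute-force dictionary decomposition of one compound word.
 * Not the implementation from the attached patch.
 */
final class DictionaryDecompounder {
    private final Set<String> dictionary; // known subwords, lowercase (assumed)
    private final int minSubwordSize;
    private final int maxSubwordSize;

    DictionaryDecompounder(Set<String> dictionary, int minSubwordSize, int maxSubwordSize) {
        this.dictionary = dictionary;
        this.minSubwordSize = minSubwordSize;
        this.maxSubwordSize = maxSubwordSize;
    }

    /** Returns the compound itself followed by every dictionary word found inside it. */
    List<String> decompose(String word) {
        List<String> tokens = new ArrayList<>();
        tokens.add(word); // keep the original token so exact-match queries still hit
        for (int start = 0; start < word.length(); start++) {
            // Try every substring [start, end) whose length lies within the size limits.
            for (int end = start + minSubwordSize;
                 end <= word.length() && end - start <= maxSubwordSize;
                 end++) {
                String candidate = word.substring(start, end).toLowerCase(Locale.ROOT);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        DictionaryDecompounder d =
                new DictionaryDecompounder(Set.of("donau", "dampf", "schiff"), 3, 15);
        System.out.println(d.decompose("Donaudampfschiff"));
        // prints [Donaudampfschiff, donau, dampf, schiff]
    }
}
```

      The cost is the number of substrings checked, which grows roughly with the word length times the maximum subword size, so a real filter would want some way to limit the candidates.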

      For the first step of decomposition I use the hyphenation code from the Apache XML project FOP (http://xmlgraphics.apache.org/fop/). Currently I depend on the FOP jars directly, although I only need a handful of classes from FOP.
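      One plausible way to use hyphenation as that first step is sketched below: the hyphenation points split the word into segments, and only runs of adjacent segments are checked against the dictionary. This sketch does not call FOP. The hyphenation points are passed in as character offsets, and the split Do-nau-dampf-schiff (offsets 2, 5, 10) is an assumption for illustration. The class name and dictionary are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: decomposition restricted to hyphenation boundaries.
 * The hyphenation points would come from a hyphenator (e.g. FOP's); here they are given.
 */
final class HyphenationDecompounder {
    private final Set<String> dictionary; // known subwords, lowercase (assumed)

    HyphenationDecompounder(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    /** hyphenationPoints: ascending character offsets strictly between 0 and word.length(). */
    List<String> decompose(String word, int[] hyphenationPoints) {
        // Segment boundaries: start of word (0), each hyphenation point, end of word.
        int[] bounds = new int[hyphenationPoints.length + 2];
        System.arraycopy(hyphenationPoints, 0, bounds, 1, hyphenationPoints.length);
        bounds[bounds.length - 1] = word.length();

        List<String> tokens = new ArrayList<>();
        tokens.add(word);
        // Candidates are only the spans that start and end on a boundary.
        for (int i = 0; i < bounds.length - 1; i++) {
            for (int j = i + 1; j < bounds.length; j++) {
                if (i == 0 && j == bounds.length - 1) {
                    continue; // the whole word is already in the output
                }
                String candidate = word.substring(bounds[i], bounds[j]).toLowerCase(Locale.ROOT);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        HyphenationDecompounder d =
                new HyphenationDecompounder(Set.of("donau", "dampf", "schiff"));
        int[] points = {2, 5, 10}; // assumed hyphenation: Do-nau-dampf-schiff
        System.out.println(d.decompose("Donaudampfschiff", points));
        // prints [Donaudampfschiff, donau, dampf, schiff]
    }
}
```

      With four segments only nine spans are looked up, compared with every substring in the brute-force approach. The trade-off is that a subword can only be found if the hyphenator places a break at both of its ends.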

      My question:
      Would it be OK to copy these classes into the Lucene project (renaming the packages, of course), or should I keep the dependency on the FOP jars? The FOP code is also licensed under the Apache License, Version 2.0.

      What do you think?

      Attachments

        1. CompoundTokenFilter.patch
          106 kB
          Thomas Peuss
        2. CompoundTokenFilter.patch
          106 kB
          Thomas Peuss
        3. CompoundTokenFilter.patch
          105 kB
          Thomas Peuss
        4. CompoundTokenFilter.patch
          99 kB
          Thomas Peuss
        5. CompoundTokenFilter.patch
          90 kB
          Thomas Peuss
        6. CompoundTokenFilter.patch
          90 kB
          Thomas Peuss
        7. CompoundTokenFilter.patch
          91 kB
          Thomas Peuss
        8. CompoundTokenFilter.patch
          90 kB
          Thomas Peuss
        9. CompoundTokenFilter.patch
          85 kB
          Thomas Peuss
        10. CompoundTokenFilter.patch
          76 kB
          Thomas Peuss
        11. CompoundTokenFilter.patch
          71 kB
          Thomas Peuss
        12. de.xml
          48 kB
          Thomas Peuss
        13. hyphenation.dtd
          3 kB
          Thomas Peuss


            People

              Grant Ingersoll (gsingers)
              Thomas Peuss (tpeuss)