TC Java
2011 Version

gov.nih.nlm.nls.tc.FilterApi
Class WordExtractionFilter

java.lang.Object
  extended by gov.nih.nlm.nls.tc.FilterApi.WordExtractionFilter

public class WordExtractionFilter
extends java.lang.Object

This class extracts words from input strings. In our application, it gets the strings from the TI (title) and AB (abstract) fields in MEDLINE. The filter removes the following from the string:

This file needs the final modification.

History:

Version:
V-2011
Author:
NLM Lexical Systems Group
See Also:
Design Document

Constructor Summary
WordExtractionFilter(Contractions contractions)
          Create a word extraction filter object by specifying a contractions Java object.
WordExtractionFilter(java.lang.String contractionFile)
          Create a word extraction filter object by specifying a contractions file.
WordExtractionFilter(java.lang.String contractionFile, boolean verbose)
          Create a word extraction filter object by specifying a contractions file and a verbose flag.
 
Method Summary
 java.util.Vector<java.lang.String> ExpandContraction(java.util.Vector<java.lang.String> inWords)
          Expand contractions to their full forms.
 java.lang.String GetFilteredStr(java.lang.String inStr)
          Get filtered string of the input string.
static void main(java.lang.String[] args)
           
static java.lang.String RemoveExactEndStr(java.lang.String inStr, java.util.Vector<java.lang.String> exactEndStrs)
          Remove the end string if it is an exact match.
static java.lang.String RemoveMatchEndStr(java.lang.String inStr, java.lang.String headMatchStr, java.lang.String tailMatchStr)
          Remove the matching head and tail strings at the end, such as removing .....[headMatchStr ...
static java.lang.String RemoveMatchEndStr(java.lang.String inStr, java.util.Vector<java.lang.String> matchEndStrs, boolean caseSensitiveFlag)
          Remove the matching head string at the end, such as removing .....[headMatchStr ...]
static java.lang.String RemoveMatchEndStr(java.lang.String inStr, java.util.Vector<java.lang.String> headMatchStrs, java.util.Vector<java.lang.String> tailMatchStrs, java.util.Vector<java.lang.String> headExceptionStrs)
          Remove the matching head and tail strings at the end, unless the head matches an exception string, such as removing .....[headMatchStr ...
static java.lang.String RemoveMatchStr(java.lang.String inStr, java.util.Vector<java.lang.String> headMatchStrs, java.util.Vector<java.lang.String> tailMatchStrs)
          Remove the matching head and tail strings at the end, such as removing .....[headMatchStr ...
static java.util.Vector<java.lang.String> RemoveNonAlphaNumCharAtBeginEnd(java.util.Vector<java.lang.String> inWords)
          Remove non-alpha-numeric characters at the beginning or end of the string.
static java.util.Vector<java.lang.String> RemoveNonAlphaNumCharAtBeginsEnds(java.util.Vector<java.lang.String> inWords)
          Remove non-alpha-numeric characters at the beginning or end of the string recursively.
static java.lang.String ReplacePuntuationWithSpace(java.lang.String inStr)
          Replaces punctuation with space
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordExtractionFilter

public WordExtractionFilter(Contractions contractions)
Create a word extraction filter object by specifying a contractions Java object.

Parameters:
contractions - contractions Java object

WordExtractionFilter

public WordExtractionFilter(java.lang.String contractionFile)
Create a word extraction filter object by specifying a contractions file.

Parameters:
contractionFile - file name of contractions (contractions.txt)

WordExtractionFilter

public WordExtractionFilter(java.lang.String contractionFile,
                            boolean verbose)
Create a word extraction filter object by specifying a contractions file and a verbose flag.

Parameters:
contractionFile - file name of contractions (contractions.txt)
verbose - verbose flag for reading the input file

Method Detail

GetFilteredStr

public java.lang.String GetFilteredStr(java.lang.String inStr)
Get filtered string of the input string.

Parameters:
inStr - the string to be filtered
Returns:
filtered string of the input string

ReplacePuntuationWithSpace

public static java.lang.String ReplacePuntuationWithSpace(java.lang.String inStr)
Replaces punctuation with space

Parameters:
inStr - the string to be processed
Returns:
processed string of the input string
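
The behavior described above can be sketched in a few lines. This is a minimal stand-alone illustration, not the TC implementation: the class name PunctuationSketch is hypothetical, and it assumes that "punctuation" means the ASCII characters covered by the regex class \p{Punct}, which the documentation does not specify.

```java
class PunctuationSketch {
    // Hypothetical stand-in for ReplacePuntuationWithSpace.
    // Assumption: "punctuation" is the ASCII set matched by \p{Punct}.
    static String replacePunctuationWithSpace(String inStr) {
        // Each punctuation character becomes exactly one space,
        // so the string length is unchanged.
        return inStr.replaceAll("\\p{Punct}", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + replacePunctuationWithSpace("heart-attack, stroke.") + "]");
    }
}
```

Because each character is replaced one-for-one, runs of spaces can appear where punctuation was adjacent to a space; a caller would typically split on whitespace afterwards.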

ExpandContraction

public java.util.Vector<java.lang.String> ExpandContraction(java.util.Vector<java.lang.String> inWords)
Expand contractions to their full forms.

Parameters:
inWords - a collection of Strings to be processed
Returns:
a collection of processed strings of the inputs
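
As a rough illustration of contraction expansion, the sketch below looks each word up in a table and replaces it with its full form. The class ContractionSketch, the two table entries, and the choice to split a multi-word expansion into separate words are all assumptions made for this example; the real entries come from the contractions file, whose format is not described here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

class ContractionSketch {
    // Hypothetical table; the real one is loaded from contractions.txt.
    private final Map<String, String> table = new HashMap<>();

    ContractionSketch() {
        table.put("can't", "cannot");
        table.put("it's", "it is");
    }

    // Replace each word that has a table entry by its expansion.
    // Assumption: a multi-word expansion is split into separate words.
    Vector<String> expandContraction(Vector<String> inWords) {
        Vector<String> out = new Vector<>();
        for (String word : inWords) {
            String full = table.get(word);
            if (full == null) {
                out.add(word);              // not a known contraction
            } else {
                for (String part : full.split(" ")) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Vector<String> words = new Vector<>();
        words.add("it's");
        words.add("fine");
        System.out.println(new ContractionSketch().expandContraction(words));
    }
}
```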

RemoveNonAlphaNumCharAtBeginEnd

public static java.util.Vector<java.lang.String> RemoveNonAlphaNumCharAtBeginEnd(java.util.Vector<java.lang.String> inWords)
Remove non-alpha-numeric characters at the beginning or end of the string. This method is no longer used in JdI; it has been replaced by RemoveNonAlphaNumCharAtBeginsEnds(). It was used because the original Lisp code was not recursive.

Parameters:
inWords - a collection of Strings to be processed
Returns:
a collection of processed strings of the inputs

RemoveNonAlphaNumCharAtBeginsEnds

public static java.util.Vector<java.lang.String> RemoveNonAlphaNumCharAtBeginsEnds(java.util.Vector<java.lang.String> inWords)
Remove non-alpha-numeric characters at the beginning or end of the string recursively. This method is used in tcPre (pre-process) and GenerateRestrictWords.

Parameters:
inWords - a collection of Strings to be processed
Returns:
a collection of processed strings of the inputs
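
The difference between the non-recursive and recursive variants can be shown on a single word. The sketch below is an illustration only: the class StripSketch and its method names are hypothetical, it assumes the non-recursive variant drops at most one character from each end, and it treats "alpha-numeric" as Character.isLetterOrDigit. The documented methods apply this kind of stripping to every word in a Vector.

```java
class StripSketch {
    // One pass: drop at most one non-alphanumeric character from each end.
    // Assumption: this is what the non-recursive variant does.
    static String stripOnce(String word) {
        int start = 0;
        int end = word.length();
        if (end > start && !Character.isLetterOrDigit(word.charAt(start))) {
            start++;
        }
        if (end > start && !Character.isLetterOrDigit(word.charAt(end - 1))) {
            end--;
        }
        return word.substring(start, end);
    }

    // Repeated: strip until a pass changes nothing, so both ends are
    // alphanumeric or the word is empty.
    static String stripAll(String word) {
        String prev;
        do {
            prev = word;
            word = stripOnce(word);
        } while (!word.equals(prev));
        return word;
    }

    public static void main(String[] args) {
        System.out.println(stripOnce("((word)."));  // prints (word)
        System.out.println(stripAll("((word)."));   // prints word
    }
}
```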

RemoveExactEndStr

public static java.lang.String RemoveExactEndStr(java.lang.String inStr,
                                                 java.util.Vector<java.lang.String> exactEndStrs)
Remove the end string if it is an exact match, such as removing ...[exactEndStr]

Parameters:
inStr - the string to be processed
exactEndStrs - a collection of pattern strings for exact match
Returns:
processed string
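
A minimal sketch of exact end-string removal follows. The class ExactEndSketch and the pattern "[Review]" are hypothetical, and the sketch assumes that only the first matching pattern is removed and that no trimming is done afterwards.

```java
import java.util.Arrays;
import java.util.Vector;

class ExactEndSketch {
    // Remove the first listed pattern that the input ends with;
    // return the input unchanged if none matches.
    static String removeExactEndStr(String inStr, Vector<String> exactEndStrs) {
        for (String endStr : exactEndStrs) {
            if (inStr.endsWith(endStr)) {
                return inStr.substring(0, inStr.length() - endStr.length());
            }
        }
        return inStr;
    }

    public static void main(String[] args) {
        // "[Review]" is an illustrative pattern, not taken from TC.
        Vector<String> patterns = new Vector<>(Arrays.asList("[Review]"));
        System.out.println("[" + removeExactEndStr("A title [Review]", patterns) + "]");
    }
}
```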

RemoveMatchEndStr

public static java.lang.String RemoveMatchEndStr(java.lang.String inStr,
                                                 java.util.Vector<java.lang.String> matchEndStrs,
                                                 boolean caseSensitiveFlag)
Remove the matching head string at the end, such as removing .....[headMatchStr ...]

Parameters:
inStr - the string to be processed
matchEndStrs - a collection of pattern strings to match
caseSensitiveFlag - a boolean flag for case sensitive match
Returns:
processed string
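
One plausible reading of this method is: find the last place where a pattern begins and cut the string there. The sketch below implements that reading under stated assumptions. The class MatchEndSketch and the pattern "[erratum" are hypothetical, and the real method may impose further conditions (for example, on how the string ends) that are not documented here.

```java
import java.util.Arrays;
import java.util.Vector;

class MatchEndSketch {
    // Last index at which pat occurs in s, optionally ignoring case; -1 if none.
    static int lastIndexOf(String s, String pat, boolean caseSensitive) {
        for (int i = s.length() - pat.length(); i >= 0; i--) {
            if (s.regionMatches(!caseSensitive, i, pat, 0, pat.length())) {
                return i;
            }
        }
        return -1;
    }

    // Assumption: everything from the last occurrence of the first
    // matching pattern to the end of the string is removed.
    static String removeMatchEndStr(String inStr, Vector<String> matchEndStrs,
                                    boolean caseSensitiveFlag) {
        for (String pat : matchEndStrs) {
            int idx = lastIndexOf(inStr, pat, caseSensitiveFlag);
            if (idx >= 0) {
                return inStr.substring(0, idx);
            }
        }
        return inStr;
    }

    public static void main(String[] args) {
        Vector<String> patterns = new Vector<>(Arrays.asList("[erratum"));
        String title = "Some title [Erratum in X]";
        System.out.println("[" + removeMatchEndStr(title, patterns, false) + "]");
        System.out.println("[" + removeMatchEndStr(title, patterns, true) + "]");
    }
}
```

With the case-insensitive call the bracketed tail is removed; with the case-sensitive call the input is returned unchanged, because "[erratum" does not match "[Erratum".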

RemoveMatchEndStr

public static java.lang.String RemoveMatchEndStr(java.lang.String inStr,
                                                 java.lang.String headMatchStr,
                                                 java.lang.String tailMatchStr)
Remove the matching head and tail strings at the end, such as removing .....[headMatchStr ... tailMatchStr]

Parameters:
inStr - the string to be processed
headMatchStr - head match pattern
tailMatchStr - tail match pattern
Returns:
processed string
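
The head-and-tail form can be read as: if the string ends with the tail pattern and the head pattern occurs before it, remove everything from the head onward. The following sketch shows that reading. The class HeadTailSketch and the example patterns are hypothetical, and the exact matching rules of the real method are an assumption.

```java
class HeadTailSketch {
    // Assumption: the input must end with tailMatchStr, and the last
    // occurrence of headMatchStr that does not overlap the tail marks
    // the start of the removed span.
    static String removeMatchEndStr(String inStr, String headMatchStr,
                                    String tailMatchStr) {
        if (!inStr.endsWith(tailMatchStr)) {
            return inStr;
        }
        int from = inStr.length() - tailMatchStr.length() - headMatchStr.length();
        int headIdx = inStr.lastIndexOf(headMatchStr, from); // -1 if from < 0
        return headIdx >= 0 ? inStr.substring(0, headIdx) : inStr;
    }

    public static void main(String[] args) {
        // "(author" and ")" are illustrative patterns, not taken from TC.
        String title = "Aspirin use (author's transl)";
        System.out.println("[" + removeMatchEndStr(title, "(author", ")") + "]");
    }
}
```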

RemoveMatchStr

public static java.lang.String RemoveMatchStr(java.lang.String inStr,
                                              java.util.Vector<java.lang.String> headMatchStrs,
                                              java.util.Vector<java.lang.String> tailMatchStrs)
Remove the matching head and tail strings at the end, such as removing .....[headMatchStr ... tailMatchStr]

Parameters:
inStr - the string to be processed
headMatchStrs - a collection of head match patterns
tailMatchStrs - a collection of tail match patterns
Returns:
processed string

RemoveMatchEndStr

public static java.lang.String RemoveMatchEndStr(java.lang.String inStr,
                                                 java.util.Vector<java.lang.String> headMatchStrs,
                                                 java.util.Vector<java.lang.String> tailMatchStrs,
                                                 java.util.Vector<java.lang.String> headExceptionStrs)
Remove the matching head and tail strings at the end, unless the head matches an exception string, such as removing .....[headMatchStr ... tailMatchStr]

Parameters:
inStr - the string to be processed
headMatchStrs - a collection of head match patterns
tailMatchStrs - a collection of tail match patterns
headExceptionStrs - a collection of head match exceptions
Returns:
processed string
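
The role of headExceptionStrs can be illustrated by extending the head-and-tail idea: a head match is skipped when the text at that position also starts with one of the exception strings. This is a sketch under assumptions, not the TC code. The class HeadExceptionSketch, the helper startsWithAny, and the example patterns are hypothetical, and how exceptions are actually compared is not documented here.

```java
import java.util.Arrays;
import java.util.Vector;

class HeadExceptionSketch {
    // True if s, read from offset, starts with any of the prefixes.
    static boolean startsWithAny(String s, int offset, Vector<String> prefixes) {
        for (String p : prefixes) {
            if (s.startsWith(p, offset)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: remove head..tail at the end of the string unless the
    // matched head position also begins one of the exception strings.
    static String removeMatchEndStr(String inStr, Vector<String> headMatchStrs,
                                    Vector<String> tailMatchStrs,
                                    Vector<String> headExceptionStrs) {
        for (String tail : tailMatchStrs) {
            if (!inStr.endsWith(tail)) {
                continue;
            }
            for (String head : headMatchStrs) {
                int from = inStr.length() - tail.length() - head.length();
                int idx = inStr.lastIndexOf(head, from);
                if (idx >= 0 && !startsWithAny(inStr, idx, headExceptionStrs)) {
                    return inStr.substring(0, idx);
                }
            }
        }
        return inStr;
    }

    public static void main(String[] args) {
        Vector<String> heads = new Vector<>(Arrays.asList("["));
        Vector<String> tails = new Vector<>(Arrays.asList("]"));
        Vector<String> exceptions = new Vector<>(Arrays.asList("[corrected"));
        System.out.println("[" + removeMatchEndStr("Title [see comments]", heads, tails, exceptions) + "]");
        System.out.println("[" + removeMatchEndStr("Title [corrected to X]", heads, tails, exceptions) + "]");
    }
}
```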

main

public static void main(java.lang.String[] args)


Copyright © 2011 National Library of Medicine