Lexical Tools

Category

A syntactic category is a part of speech (noun, verb, adjective, etc.). A word form can have more than one category; for example, "square" can be a noun, a verb, an adjective, or an adverb. The categories a word form can have are represented in the Lexical Tools as a bit vector, in which each bit represents the presence or absence of one category. In the Java implementation, the Category class extends the BitMaskBase class.

  • Category variants
    Category variants are described in the following table:

    Bit | Value | Name | Other Symbols | Example | Possible Inflections
    0 | 1 | adj | adjective, ADJ | red, redder, reddest, red | base (1), comparative (2), superlative (4), positive (256)
    1 | 2 | adv | adverb, ADV | fast, faster, fastest, fast | base (1), comparative (2), superlative (4), positive (256)
    2 | 4 | aux | auxiliary | be, being, did, been, is, be, do, didn't, don't, am, weren't, were, wasn't, are, aren't, was, isn't | base (1), presPart (16), past (32), pastPart (64), pres3s (128), infinitive (1024), pres123p (2048), pastNeg (4096), pres123pNeg (8192), pres1s (16384), past1p23pNeg (32768), past1p23p (65536), past1s3sNeg (131072), pres1p23p (262144), pres1p23pNeg (524288), past1s3s (1048576), pres3sNeg (4194304)
    3 | 8 | compl | complementizer | that | base (1)
    4 | 16 | conj | conjunction, CON, con | and, or, but | base (1)
    5 | 32 | det | determiner, DET | a, the, some, each | base (1)
    6 | 64 | modal | | dare, may, must, ought, shall, will, can, could, couldn't, can, can't | base (1), past (32), pastNeg (4096), pres (2097152), presNeg (8388608)
    7 | 128 | noun | NOM, NPR | dog, gods, dog | base (1), plural (8), singular (512)
    8 | 256 | prep | preposition, PRE, pre | to, on, in, at, by | base (1)
    9 | 512 | pron | pronoun | it, he, they | base (1)
    10 | 1024 | verb | VER, ver | break, breaking, broke, broken, breaks, break, break | base (1), presPart (16), past (32), pastPart (64), pres3s (128), infinitive (1024), pres123p (2048)
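
In the table above, each category's Value is 1 shifted left by its Bit position (for example, noun is bit 7, so its value is 128). The following is a minimal Java sketch of that relationship. The class name `CategoryTable` and the method `toValue` are hypothetical names chosen for illustration; this is not the Lexical Tools Category class, whose actual API is not shown on this page.

```java
// Hypothetical illustration only; not the Lexical Tools Category class.
class CategoryTable {
    // Category names in bit order (bit 0 = adj ... bit 10 = verb),
    // taken from the category table.
    static final String[] NAMES = {
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb"
    };

    // Returns the bit value (1 << bit) for a category name,
    // or -1 if the name is not one of the eleven categories.
    static long toValue(String name) {
        for (int bit = 0; bit < NAMES.length; bit++) {
            if (NAMES[bit].equals(name)) {
                return 1L << bit;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Print each category name with its computed value.
        for (String name : NAMES) {
            System.out.println(name + " = " + toValue(name));
        }
    }
}
```

Because every category occupies its own bit, no two categories share a value, and any sum of distinct category values can be decoded unambiguously.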

  • Combination of multiple categories
    As described on the BitMaskBase page, a value can represent not only a single category but also a combination of multiple categories.

    For example, "saw" is a noun (128) and a verb (1024), so it can be represented by the value 1152 (= 128 + 1024). This is useful when the -CR:o (combine records by outputs) option is used. The value can be displayed by name as <noun + verb> by using the -SC (show category) option. Some Lexical Tools flow operations, such as lowercase (-f:l), carry no information about the category. In such cases, a value of 2047 is used to represent all categories <all> because:
    2047 = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 + 1024
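
The arithmetic above can be sketched in Java with bitwise operations. This is only an illustration under stated assumptions: the class `CategoryMask` and its methods `combine`, `toNames`, and `contains` are hypothetical names, not the BitMaskBase or Category API, and the "noun+verb" string it produces is a simplified stand-in that does not reproduce the exact -SC output format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lexical Tools BitMaskBase/Category API.
class CategoryMask {
    // Category names in bit order (bit 0 = adj ... bit 10 = verb).
    static final String[] NAMES = {
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb"
    };

    // All eleven category bits set: 1 + 2 + 4 + ... + 1024 = 2047.
    static final long ALL = (1L << NAMES.length) - 1;

    // Combines single-category values into one mask with bitwise OR.
    static long combine(long... values) {
        long mask = 0L;
        for (long value : values) {
            mask |= value;
        }
        return mask;
    }

    // Tests whether the mask includes the given single-category value.
    static boolean contains(long mask, long category) {
        return (mask & category) != 0;
    }

    // Decodes a mask into category names joined by "+", or "all" for 2047.
    static String toNames(long mask) {
        if (mask == ALL) {
            return "all";
        }
        List<String> names = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((mask & (1L << bit)) != 0) {
                names.add(NAMES[bit]);
            }
        }
        return String.join("+", names);
    }

    public static void main(String[] args) {
        long saw = combine(128L, 1024L);           // noun and verb
        System.out.println(saw);                   // 1152
        System.out.println(toNames(saw));          // noun+verb
        System.out.println(contains(saw, 128L));   // true
        System.out.println(toNames(ALL));          // all
    }
}
```

Bitwise OR is used instead of addition so that combining a category with a mask that already contains it leaves the mask unchanged.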