Lexical Tools

Category

A syntactic category is a part of speech (noun, verb, adjective, etc.). A word form can have more than one category; e.g., "square" can be a noun, a verb, an adjective, or an adverb. In the Lexical Tools, the categories a word form can have are represented as a bit vector, in which each bit records the presence or absence of one category. In the Java implementation, the Category class extends the BitMaskBase class.
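A minimal sketch of the bit-vector idea follows. It assumes a hypothetical CategoryMask class (not the lvg Category or BitMaskBase API) and uses the bit positions listed in the category table on this page, so each category value is 1 << bit. It encodes the categories of "square" and then tests individual bits.

```java
// Hypothetical sketch: NOT the lvg Category/BitMaskBase API.
// Each category owns one bit, at the bit position given in the category table.
final class CategoryMask {
    static final long ADJ   = 1L << 0;   // 1
    static final long ADV   = 1L << 1;   // 2
    static final long AUX   = 1L << 2;   // 4
    static final long COMPL = 1L << 3;   // 8
    static final long CONJ  = 1L << 4;   // 16
    static final long DET   = 1L << 5;   // 32
    static final long MODAL = 1L << 6;   // 64
    static final long NOUN  = 1L << 7;   // 128
    static final long PREP  = 1L << 8;   // 256
    static final long PRON  = 1L << 9;   // 512
    static final long VERB  = 1L << 10;  // 1024

    // Sets one bit for each category the word form can have.
    static long encode(long... categories) {
        long mask = 0L;
        for (long c : categories) {
            mask |= c;
        }
        return mask;
    }

    // True if the bit for the given category is set in the mask.
    static boolean has(long mask, long category) {
        return (mask & category) != 0;
    }

    public static void main(String[] args) {
        // "square" can be a noun, a verb, an adjective or an adverb.
        long square = encode(NOUN, VERB, ADJ, ADV);
        System.out.println(square);             // 1155 = 128 + 1024 + 1 + 2
        System.out.println(has(square, VERB));  // true
        System.out.println(has(square, PREP));  // false
    }
}
```

A long is used here only to leave headroom; the 11 categories in the table need just the low 11 bits.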

  • Category variants
    Category variants are described in the following table:

    Bit  Value  Name   Other Symbols          Example             Possible Inflections
    ---  -----  -----  ---------------------  ------------------  --------------------
    0    1      adj    adjective, ADJ         red                 base (1)
                                              redder              comparative (2)
                                              reddest             superlative (4)
                                              red                 positive (256)
    1    2      adv    adverb, ADV            fast                base (1)
                                              faster              comparative (2)
                                              fastest             superlative (4)
                                              fast                positive (256)
    2    4      aux    auxiliary              be                  base (1)
                                              being               presPart (16)
                                              did                 past (32)
                                              been                pastPart (64)
                                              is                  pres3s (128)
                                              be                  infinitive (1024)
                                              do                  pres123p (2048)
                                              didn't              pastNeg (4096)
                                              don't               pres123pNeg (8192)
                                              am                  pres1s (16384)
                                              weren't             past1p23pNeg (32768)
                                              were                past1p23p (65536)
                                              wasn't              past1s3sNeg (131072)
                                              are                 pres1p23p (262144)
                                              aren't              pres1p23pNeg (524288)
                                              was                 past1s3s (1048576)
                                              isn't               pres3sNeg (4194304)
    3    8      compl  complementizer         that                base (1)
    4    16     conj   conjunction, CON, con  and, or, but        base (1)
    5    32     det    determiner, DET        a, the, some, each  base (1)
    6    64     modal                         dare, may, must,    base (1)
                                              ought, shall,
                                              will, can
                                              could               past (32)
                                              couldn't            pastNeg (4096)
                                              can                 pres (2097152)
                                              can't               presNeg (8388608)
    7    128    noun   NOM, NPR               dog                 base (1)
                                              dogs                plural (8)
                                              dog                 singular (512)
    8    256    prep   preposition, PRE, pre  to, on, in, at, by  base (1)
    9    512    pron   pronoun                it, he, they        base (1)
    10   1024   verb   VER, ver               break               base (1)
                                              breaking            presPart (16)
                                              broke               past (32)
                                              broken              pastPart (64)
                                              breaks              pres3s (128)
                                              break               infinitive (1024)
                                              break               pres123p (2048)

  • Combination of multiple categories
    As described on the BitMaskBase page, a value can represent either a single category or a combination of multiple categories.

    For example, "saw" is a noun (128) and a verb (1024), so it can be represented by the value 1152 (= 128 + 1024). This is useful when the -CR:o (combine records by outputs) option is used. The value can be displayed as names, <noun + verb>, by using the -SC (show category) option. Some Lexical Tools flow operations, such as lowercase (-f:l), provide no category information. In such cases, the value 2047 is used to represent all categories, <all>, because:
    2047 = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 + 1024
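The combination rules above can be sketched as follows. CategoryNames and toNames are hypothetical names, and the "<noun + verb>" string only mirrors the notation in the text; the actual -SC output of the Lexical Tools may be formatted differently. Combining categories is a bitwise OR, decoding checks each of the 11 category bits, and a value with all 11 bits set (2047) is reported as <all>.

```java
// Hypothetical sketch: the "<a + b>" format mirrors the text above,
// but this is NOT the Lexical Tools implementation of -SC.
final class CategoryNames {
    // Category names indexed by bit position (bit 0 = adj ... bit 10 = verb).
    static final String[] NAMES = {
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb"
    };

    // All 11 category bits set: 2^11 - 1 = 2047.
    static final long ALL = (1L << NAMES.length) - 1;

    // Turns a single or combined category value into a name string.
    static String toNames(long value) {
        if (value == ALL) {
            return "<all>";
        }
        StringBuilder sb = new StringBuilder("<");
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((value & (1L << bit)) != 0) {
                if (sb.length() > 1) {
                    sb.append(" + ");   // separator between names
                }
                sb.append(NAMES[bit]);
            }
        }
        return sb.append(">").toString();
    }

    public static void main(String[] args) {
        long saw = (1L << 7) | (1L << 10);  // noun | verb
        System.out.println(saw);            // 1152
        System.out.println(toNames(saw));   // <noun + verb>
        System.out.println(toNames(2047));  // <all>
    }
}
```

Because each category owns a distinct bit, adding category values (128 + 1024) gives the same result as OR-ing them, provided no category is counted twice; OR is the safer choice when merging records that may share a category.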