Category
A syntactic category is a part of speech (noun, verb, adjective, etc.). A word form can have more than one category; e.g. "square" can be a noun, a verb, an adjective or an adverb. In the Lexical Tools, the set of categories a word form can have is represented as a bit vector, in which each bit records the presence or absence of one category. In the Java implementation, the Category class extends the BitMaskBase class.
Bit | Value | Name |
---|---|---|
0 | 1 | adj |
1 | 2 | adv |
2 | 4 | aux |
3 | 8 | compl |
4 | 16 | conj |
5 | 32 | det |
6 | 64 | modal |
7 | 128 | noun |
8 | 256 | prep |
9 | 512 | pron |
10 | 1024 | verb |
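Each value in the table is 2 raised to the bit position, so every category occupies its own bit. The minimal Java sketch below shows that mapping. It assumes a hypothetical enum `Cat` declared in table order; it is not the Lexical Tools Category class or BitMaskBase.

```java
/**
 * Hypothetical sketch of the category bit positions in the table.
 * Illustrative only; this is not the Lexical Tools Category class.
 */
public class CategoryBits {
    enum Cat {
        ADJ("adj"), ADV("adv"), AUX("aux"), COMPL("compl"), CONJ("conj"),
        DET("det"), MODAL("modal"), NOUN("noun"), PREP("prep"),
        PRON("pron"), VERB("verb");

        final String label;

        Cat(String label) { this.label = label; }

        // Bit position follows declaration order (adj = 0 ... verb = 10).
        int bit() { return ordinal(); }

        // Value is 2 raised to the bit position.
        long value() { return 1L << bit(); }
    }

    public static void main(String[] args) {
        // Print one row per category: bit | value | name.
        for (Cat c : Cat.values()) {
            System.out.printf("%2d | %4d | %s%n", c.bit(), c.value(), c.label);
        }
    }
}
```

Deriving the value from the bit position, rather than storing it separately, keeps the two columns from drifting apart.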
For example, "saw" is a noun (128) and a verb (1024), so it can be represented by the single value 1152 (= 128 + 1024). This combined value is useful when the -CR:o (combine records by outputs) option is used. It can be displayed by name, as <noun + verb>, with the -SC (show category) option. Some Lexical Tools flow operations, such as lowercase (-f:l), carry no category information. In such cases, the value 2047 is used to represent all categories, <all>, because:
2047 = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 + 1024
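The arithmetic above can be sketched in Java: categories are combined with bitwise OR, decoded back to names by testing each bit, and the all-categories value is 2^11 - 1. The `CategoryMask` class and its methods are hypothetical helpers, not the Lexical Tools API, and the `<noun+verb>` display format is an assumption; the real -SC output may be formatted differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: combining and decoding category bit masks.
 * Not the Lexical Tools API; names follow the category table.
 */
public class CategoryMask {
    // Category names indexed by bit position (bit 0 = adj ... bit 10 = verb).
    static final String[] NAMES = {
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb"
    };

    // All 11 bits set: 2^11 - 1 = 2047.
    static final long ALL = (1L << NAMES.length) - 1;

    // Returns the value of a single category name, e.g. "noun" -> 128.
    static long valueOf(String name) {
        for (int bit = 0; bit < NAMES.length; bit++) {
            if (NAMES[bit].equals(name)) {
                return 1L << bit;
            }
        }
        throw new IllegalArgumentException("unknown category: " + name);
    }

    // Combines category names into one mask with bitwise OR.
    static long combine(String... names) {
        long mask = 0;
        for (String n : names) {
            mask |= valueOf(n);
        }
        return mask;
    }

    // Decodes a mask into a display string such as "<noun+verb>" or "<all>".
    static String toNames(long mask) {
        if (mask == ALL) {
            return "<all>";
        }
        List<String> parts = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((mask & (1L << bit)) != 0) {
                parts.add(NAMES[bit]);
            }
        }
        return "<" + String.join("+", parts) + ">";
    }

    public static void main(String[] args) {
        long saw = combine("noun", "verb");
        System.out.println(saw);           // 1152
        System.out.println(toNames(saw));  // <noun+verb>
        System.out.println(ALL);           // 2047
        System.out.println(toNames(ALL));  // <all>
    }
}
```

Because each category has its own bit, OR and addition give the same result here (128 | 1024 = 128 + 1024), but OR stays correct even if the same category is merged in twice.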