
The SPECIALIST Lexicon

Generate Multiwords from Verb Complements: Process

This section describes the process for retrieving multiwords from the verb complement types of their associated verbs in the Lexicon.

I. Setup and Inputs

  • Directory: ${MULTIWORDS}/data/${YEAR}/outData/14.VerbComplements
  • Program: ${MULTIWORDS}/bin/14.VerbComplements ${YEAR}

II. Multiword Verification

Not all light verb constructions (LVCs) and verb particle constructions (VPCs) are LMWs. Linguists verify the candidates in the following files:

  • Files:
    • lightVerbs.cand
    • verbParticles.cand
  • Format: four fields separated by "|"
    • 1st field: EUI
    • 2nd field: multiword (LVC or VPC)
    • 3rd field: tag from WordNet
    • 4th field: tag
  • 3rd field: tag from WordNet
    This field is assigned from WordNet and is used for reference only. The three tags are:
    • [Y]: in WordNet as a verb
    • [N]: not in WordNet
    • [P]: in WordNet, but the POS is not verb. Currently, there are two instances:
      • E0036598|knock on|P|y
      • E0053897|rub up|P|y
  • 4th field: tag
    • [y]: a valid multiword
    • [n]: an invalid multiword
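
The record format above can be illustrated with a minimal Python sketch. It assumes that a tagged candidate line always has exactly four "|"-separated fields, as in the two examples shown. The names `Candidate`, `parse_candidate`, and `assign_wordnet_tag`, and the `wordnet_pos` lookup table, are hypothetical and are not part of the Lexicon tools.

```python
from typing import Dict, NamedTuple, Set

WORDNET_TAGS = {"Y", "N", "P"}   # 3rd field: tag from WordNet
LINGUIST_TAGS = {"y", "n"}       # 4th field: tag assigned by linguists


class Candidate(NamedTuple):
    """One tagged line of lightVerbs.cand or verbParticles.cand (hypothetical type)."""
    eui: str
    multiword: str
    wordnet_tag: str
    tag: str


def parse_candidate(line: str) -> Candidate:
    """Split a tagged candidate line into its four fields and validate both tags."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    eui, multiword, wordnet_tag, tag = fields
    if wordnet_tag not in WORDNET_TAGS:
        raise ValueError(f"unknown WordNet tag {wordnet_tag!r}: {line!r}")
    if tag not in LINGUIST_TAGS:
        raise ValueError(f"unknown tag {tag!r}: {line!r}")
    return Candidate(eui, multiword, wordnet_tag, tag)


def assign_wordnet_tag(multiword: str, wordnet_pos: Dict[str, Set[str]]) -> str:
    """Return Y, N, or P for a multiword.

    wordnet_pos is an assumed lookup table mapping a multiword to the set of
    parts of speech it has in WordNet; how the real tools read WordNet is not
    described in this document.
    """
    pos = wordnet_pos.get(multiword)
    if pos is None:
        return "N"                        # not in WordNet
    return "Y" if "verb" in pos else "P"  # in WordNet: verb, or some other POS
```

For example, `parse_candidate("E0036598|knock on|P|y")` yields a record whose `wordnet_tag` is `P` and whose `tag` is `y`. A line with a missing field or an unrecognized tag raises `ValueError`, which is one way to catch tagging mistakes before the file is appended to the accumulated tag file.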

III. Processes

Each step is listed with its description, inputs, outputs, and notes. Steps 1-4 process LVCs, steps 11-14 process VPCs, and step 20 produces the combined statistics.

Step 1: Get raw LVCs
  • Inputs:
    • ./inData/LEXICON.release
  • Outputs:
    • lightVerbs.data.raw
    • lightVerbs.infl.raw
    • lightVerbs.form.raw
  • Notes:
    • Use the latest LEXICON

Step 2: Add tags (from previous tags and WordNet) to raw LVCs; generate multiword candidates for tagging
  • Inputs:
    • lightVerbs.data.raw
    • ./inData/lightVerbs.data.tag
    • ./inData/WnIndexWords.data.3.0.mw
  • Outputs:
    • lightVerbs.cand
    • lightVerbs.tag
  • Notes:
    • Send lightVerbs.cand to linguists for tagging

Step 3: Verify linguists' tags
  • Inputs:
    • lightVerbs.cand.tag.${YEAR}
  • Outputs: None
  • Notes:
    • Copy the tagged file to lightVerbs.cand.tag.${YEAR}
    • Append lightVerbs.cand.tag.${YEAR} to ./inData/lightVerbs.data.tag
    • Rerun step 2 until lightVerbs.cand is empty (0 candidates)

Step 4: Get multiwords from tagged LVCs; get stats reports
  • Inputs:
    • lightVerbs.tag
    • lightVerbs.infl.raw
    • lightVerbs.form.raw
  • Outputs:
    • lightVerbs.data (used for LEXICON release)
    • lightVerbs.inflVars (used for LEXICON release)
    • lightVerbs.form
    • lightVerbs.stats
  • Notes:
    • Use LVC type in the script

Step 11: Get raw VPCs
  • Inputs:
    • ./inData/LEXICON.release
  • Outputs:
    • verbParticles.data.raw
    • verbParticles.infl.raw
    • verbParticles.form.raw
  • Notes:
    • Use the latest LEXICON

Step 12: Add tags (from previous tags and WordNet) to raw VPCs; generate multiword candidates for tagging
  • Inputs:
    • verbParticles.raw
    • ./inData/verbParticles.data.tag
    • ./inData/WnIndexWords.data.3.0.mw
  • Outputs:
    • verbParticles.cand
    • verbParticles.tag
  • Notes:
    • Send verbParticles.cand to linguists for tagging

Step 13: Verify linguists' tags
  • Inputs:
    • verbParticles.cand.tag.${YEAR}
  • Outputs: None
  • Notes:
    • Copy the tagged file to verbParticles.cand.tag.${YEAR}
    • Append verbParticles.cand.tag.${YEAR} to ./inData/verbParticles.data.tag
    • Rerun step 12 until verbParticles.cand is empty (0 candidates)

Step 14: Get multiwords from tagged VPCs; get stats reports
  • Inputs:
    • verbParticles.tag
    • verbParticles.infl.raw
    • verbParticles.form.raw
  • Outputs:
    • verbParticles.data (used for LEXICON release)
    • verbParticles.inflVars (used for LEXICON release)
    • verbParticles.form
    • verbParticles.stats
  • Notes:
    • Use VPC type in the script

Step 20: Get stats for LVCs, VPCs, and combined VCs
  • Inputs:
    • lightVerbs.tag
    • verbParticles.tag
  • Outputs:
    • lightVerbs.tag.stats
    • verbParticles.tag.stats
    • verbComplements.tag
    • verbComplements.tag.stats
  • Notes:
    • Combine lightVerbs.tag and verbParticles.tag into verbComplements.tag
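
The combining and counting done in step 20 might look roughly like the following Python sketch. It rests on two assumptions that this page does not confirm: that the .tag files use the same "|"-separated layout as the .cand files, with the y/n tag in the last field, and that the statistics are simple per-tag counts. The function names `combine_tag_lines` and `count_tags` are hypothetical; the actual script and the layout of the .stats files may differ.

```python
from collections import Counter
from typing import Iterable, List


def combine_tag_lines(lvc_lines: Iterable[str], vpc_lines: Iterable[str]) -> List[str]:
    """Concatenate LVC and VPC tag records in order, dropping blank lines."""
    combined = []
    for source in (lvc_lines, vpc_lines):
        for line in source:
            record = line.strip()
            if record:
                combined.append(record)
    return combined


def count_tags(lines: Iterable[str]) -> Counter:
    """Count records by their last field (assumed here to be the y/n tag)."""
    counts = Counter()
    for line in lines:
        record = line.strip()
        if record:
            counts[record.split("|")[-1]] += 1
    return counts
```

Reading lightVerbs.tag and verbParticles.tag line by line, passing both to `combine_tag_lines`, and writing the result out would produce a file in the role of verbComplements.tag. Calling `count_tags` on each of the three line lists gives per-tag totals comparable in spirit to the three .tag.stats outputs.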