Visual Tagging Tool

VTT File Format

Format Information

  • Format version: 2008.1.1
  • Build dates: 11.13.2008 ~ 11.18.2008
  • What is new:
    • All markups (lines) are sorted by offset (smaller first) and then length (larger first)
    • Added READABLE_FORMAT, which lines up fields 1 ~ 3 (offset, length, and tag name)
    • Format version 2008.1.0 (SIMPLE_FORMAT) is still compatible with VTT in this build
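
The sort order and the lined-up layout described above can be sketched in a few lines of Python. This is only an illustration, not VTT's own code: the function name `format_markups`, the tuple layout, the padding direction, and the tag names in the usage note are all assumptions. Padding is safe on the reading side because leading and trailing spaces in fields 1 ~ 4 are trimmed.

```python
def format_markups(markups):
    """Sort markups and render one line each, with fields 1-3 lined up.

    `markups` is a list of (offset, length, tag_name, annotation) tuples.
    This tuple layout is an assumption made for this sketch.
    """
    # Offset ascending (smaller first), then length descending (larger first).
    ordered = sorted(markups, key=lambda m: (m[0], -m[1]))
    if not ordered:
        return []
    # Column widths are an assumption: the format only says fields 1-3 line up.
    w_off = max(len(str(m[0])) for m in ordered)
    w_len = max(len(str(m[1])) for m in ordered)
    w_tag = max(len(m[2]) for m in ordered)
    return [
        f"{m[0]:>{w_off}}|{m[1]:>{w_len}}|{m[2]:<{w_tag}}|{m[3]}"
        for m in ordered
    ]
```

For example, with the hypothetical tags Sentence, Verb, and Noun, the markups (10, 4, Noun), (0, 12, Sentence), and (0, 5, Verb) come out in the order Sentence, Verb, Noun, with the offset and length columns right-aligned and the tag names padded to the same width.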

VTT opens and saves tagged text files in the VTT file format. VTT can open (display) two types of files:

  • Pure text
  • Correct VTT format

Please pay special attention to the VTT file format, since VTT does not display a file with an incorrect format; an error message dialog is displayed instead. The VTT file format consists of three parts, which are separated by preset "headers", as described below:

I. Text Content

  • The original untagged text in UTF-8
  • A line that starts with # is ignored and treated as a comment
  • Separator header for Text Content:
    #<---------------------------------------------------------------------->
    #<Text Content>
    #<---------------------------------------------------------------------->
    ...
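
As a rough illustration of how the three parts might be told apart, the sketch below groups lines under the header that precedes them. It assumes that the section markers are exactly the lines `#<Text Content>`, `#<Tags Configuration>`, and `#<Markups Information>` (the latter two appear in the sections that follow), and that every other line starting with # can be dropped. The names `SECTION_HEADERS` and `split_sections` are made up for this example; how VTT itself detects the headers is not specified here.

```python
# Assumed section markers, taken from the header samples in this document.
SECTION_HEADERS = {
    "#<Text Content>": "text",
    "#<Tags Configuration>": "tags",
    "#<Markups Information>": "markups",
}

def split_sections(lines):
    """Group non-comment lines under the section header that precedes them.

    `lines` is an iterable of lines without trailing newlines,
    for example the result of str.splitlines().
    """
    sections = {"text": [], "tags": [], "markups": []}
    current = None
    for line in lines:
        name = SECTION_HEADERS.get(line.strip())
        if name is not None:
            current = name              # a new section starts here
        elif line.startswith("#"):
            continue                    # comment or separator line
        elif current is not None:
            # Blank lines are kept as-is; they may matter in the text part.
            sections[current].append(line)
    return sections
```

Joining `sections["text"]` with newline characters gives back the text content; whether VTT counts a trailing newline when computing offsets is not stated, so treat that detail as open.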
    

II. Tags Configuration

  • Each line represents one tag configuration used in VTT
  • Each line must contain all 13 fields, each in the correct format and with a legal value
  • A line that starts with # is ignored and treated as a comment
  • The first tag is reserved (the reserved tag) by VTT
    • Its name is predefined as Text/Clear
    • Its Bold|Italic|Underline properties are used when markup is cleared
    • Its Display property is not used
    • Its foreground and background colors are used for the highlight color
  • A tag is uniquely identified by its name and category in VTT
  • Separator header and reserved (first) tag for Tags Configuration:
    #<---------------------------------------------------------------------->
    #<Tags Configuration>
    #<Name|Bold|Italic|Underline|Display|FR|FG|FB|BR|BG|BB|FontFamily|FontSize>
    #<---------------------------------------------------------------------->
    Text/Clear|false|false|false|true|255|255|255|0|51|153|Monospaced|12
    #<---------------------------------------------------------------------->
    ...
    
  • Tag Fields Format:
    Field     Description       Java Type  Example
    Field 1   Name              String     Text/Clear
    Field 2   Bold              boolean    true, false
    Field 3   Italic            boolean    true, false
    Field 4   Underline         boolean    true, false
    Field 5   Display           boolean    true, false
    Field 6   Foreground-Red    int        0 ~ 255
    Field 7   Foreground-Green  int        0 ~ 255
    Field 8   Foreground-Blue   int        0 ~ 255
    Field 9   Background-Red    int        0 ~ 255
    Field 10  Background-Green  int        0 ~ 255
    Field 11  Background-Blue   int        0 ~ 255
    Field 12  Font Family       String     Dialog, DialogInput, Monospaced, SansSerif, Serif
    Field 13  Font size         String     8, 10, 12, 14, ..., +0, +2, -2
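
The 13-field layout above maps naturally onto a small record type. The following Python sketch parses one tag configuration line and checks the field count, the boolean values, and the 0 ~ 255 color range. `TagConfig` and `parse_tag_line` are hypothetical names. Accepting only lowercase `true`/`false`, not trimming spaces, and leaving the font family unchecked are assumptions, since the exact rules VTT applies are not spelled out.

```python
from dataclasses import dataclass

@dataclass
class TagConfig:
    name: str
    bold: bool
    italic: bool
    underline: bool
    display: bool
    foreground: tuple   # (red, green, blue), each 0 ~ 255
    background: tuple   # (red, green, blue), each 0 ~ 255
    font_family: str
    font_size: str      # kept as text: may be absolute ("12") or relative ("+2")

def _parse_bool(text):
    # Assumption: only the lowercase literals shown in the table are legal.
    if text not in ("true", "false"):
        raise ValueError(f"expected true or false, got {text!r}")
    return text == "true"

def _parse_color(text):
    value = int(text)   # raises ValueError on non-numeric input
    if not 0 <= value <= 255:
        raise ValueError(f"color component out of range: {value}")
    return value

def parse_tag_line(line):
    """Parse one non-comment line of the Tags Configuration section."""
    fields = line.split("|")
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, got {len(fields)}")
    bold, italic, underline, display = (_parse_bool(f) for f in fields[1:5])
    colors = [_parse_color(f) for f in fields[5:11]]
    return TagConfig(
        name=fields[0],
        bold=bold,
        italic=italic,
        underline=underline,
        display=display,
        foreground=tuple(colors[:3]),
        background=tuple(colors[3:]),
        font_family=fields[11],
        font_size=fields[12],
    )
```

Applied to the reserved tag line `Text/Clear|false|false|false|true|255|255|255|0|51|153|Monospaced|12`, this yields a white foreground (255, 255, 255) on a background of (0, 51, 153).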

III. Markups Information

  • Each line represents one markup used in VTT
  • A line that starts with # is ignored and treated as a comment
  • Each line must contain correctly formatted data for Fields 1 ~ 4
    • Field 1 (offset) and Field 2 (length) must be integers (< 2147483647)
    • Field 3 (tag name) must exist in the tags list
    • Field 4 (annotation) can be empty
    • Leading and trailing spaces are trimmed in Fields 1 ~ 4
  • Fields 1, 2, and 3 are lined up for readability
  • Field 5 is the tagged text, which is used for NLP purposes and is ignored by VTT
  • Any fields after Field 5 are also ignored by VTT and can be used for other NLP purposes
  • The character "|" is not allowed in the first 4 fields
  • No two lines may have the same offset and length (a word can be marked up with only one tag)
  • All lines are sorted by offset (smaller first) and then length (larger first)
  • Separator header for Markups Information:
    #<---------------------------------------------------------------------->
    #<Markups Information>
    #<Offset|Length|TagName|Annotation|TagText>
    #<---------------------------------------------------------------------->
    ...
    

    Field        Description       Java Type
    Field 1      Offset            int
    Field 2      Length            int
    Field 3      Tag name          String
    Field 4      Annotation        String
    Field 5      Tagged Text       Not used
    More Fields  Other NLP fields  Not used
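
The markup rules can be sketched as a small parser plus a consistency check. This Python example is illustrative only: `parse_markup_line` and `find_markup_problems` are hypothetical names, requiring at least four fields and non-negative integers are assumptions, and the document does not say whether VTT rejects an unsorted file on load, so the second function only reports problems instead of raising.

```python
INT_MAX = 2147483647  # upper bound quoted for offset and length

def parse_markup_line(line, tag_names):
    """Parse fields 1-4 of one markup line; field 5 and later are ignored.

    `tag_names` is the set of names defined in the Tags Configuration section.
    """
    fields = line.split("|")
    # Assumption: the separator before the annotation must be present,
    # even when the annotation itself is empty.
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    # Leading and trailing spaces are trimmed in fields 1 ~ 4.
    offset_text, length_text, tag, annotation = (f.strip() for f in fields[:4])
    offset, length = int(offset_text), int(length_text)
    for label, value in (("offset", offset), ("length", length)):
        # Assumption: negative values are treated as illegal.
        if not 0 <= value < INT_MAX:
            raise ValueError(f"{label} out of range: {value}")
    if tag not in tag_names:
        raise ValueError(f"unknown tag name: {tag!r}")
    return offset, length, tag, annotation

def find_markup_problems(markups):
    """Return messages for duplicate spans or out-of-order lines."""
    problems = []
    # Negating the length makes a plain ascending sort put longer spans first.
    keys = [(m[0], -m[1]) for m in markups]
    if len(set(keys)) != len(keys):
        problems.append("two markups share the same offset and length")
    if keys != sorted(keys):
        problems.append("markups are not sorted by offset, then longer length first")
    return problems
```

A padded line such as `  0 | 12 |Noun   |first|Hello world!` (with the hypothetical tag Noun) parses to offset 0, length 12, tag Noun, and annotation first; the trailing tagged text is dropped.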