Tagged corpus text converter

Source code

Design & behavior

  1. Tags must be wrapped in < and >
  2. Tag names and tag values may only contain alphanumeric characters, spaces, underscores, and hyphens.
  3. Tag names must be separated from tag values by a :
  4. Spaces at the beginning or end of tag names or tag values are ignored; spaces within tag values will be preserved
  5. Multiple values for a single tag may be separated by a pipe (|) character or semicolon (;)
  6. Everything not wrapped in < and > will be considered "text"

Status  Tag Example                                 Explanation
Good    <MyTag:SomeText>
Good    <My Tag:Some Text>                          Spaces in tag names & values OK
Good    < My Tag : Some Text >                      Spaces padding tag names & values OK
Good    < My-Tag : Some_Text >                      Underscores & hyphens OK
Good    < My-Tag : First value | Second value>      Pipe or semicolon used to indicate multiple values
Bad     < My/Tag : Some:Text >                      Other characters not OK
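
The rules and examples above can be sketched as a small parser. This is only an illustration, not the converter's actual source: the names TAG_RE and parse_tags are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and malformed tags are assumed to be left in the text unchanged, which the rules do not specify.

```python
import re

# Hypothetical sketch of rules 1-6; not the converter's actual source.
# A tag is <name:value>. Names may hold ASCII letters, digits, spaces,
# underscores, and hyphens; values may also hold the | and ; separators.
TAG_RE = re.compile(r"<([A-Za-z0-9 _-]+):([A-Za-z0-9 _|;-]+)>")


def parse_tags(source):
    """Return (tags, text): tags is a list of (name, [values]) pairs,
    text is the input with every well-formed tag removed."""
    tags = []
    for match in TAG_RE.finditer(source):
        # Rule 4: padding spaces are ignored, inner spaces are kept.
        name = match.group(1).strip()
        # Rule 5: a pipe or semicolon separates multiple values.
        values = [v.strip() for v in re.split(r"[|;]", match.group(2))]
        tags.append((name, [v for v in values if v]))
    # Rule 6: everything outside well-formed tags counts as text.
    # Assumption: malformed tags are not removed and stay in the text.
    text = TAG_RE.sub("", source)
    return tags, text


if __name__ == "__main__":
    print(parse_tags("< My-Tag : First value | Second value>Hello"))
    print(parse_tags("< My/Tag : Some:Text >"))
```

Run as a script, the first call prints ([('My-Tag', ['First value', 'Second value'])], 'Hello'). The second call finds no tags and returns the input unchanged as text, because / and a second : are outside the allowed character set.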