# ---- Recognizer Pattern File ----
# from pattern.h:
# Possible Pattern Predicates:
#
#   (only "to ")
#
# Combinations of predicates are allowed and considered to be
# AND'ed together.
#
# All predicates should be followed by "->" and then the type
# of constituent you want to be created, followed
# by a dash and the pattern type:
#   NP-LOCATION,
#   NP-COMPANY,
#   NP-NUMBER,
#   NP-MONEY,
#   NP-PERCENT,
#   NP-TIME,
#   NP-ING.
#
#
# NOTE: Because of the way that ApplyConstPattern() works, every
# predicate must target a constituent of the type that would exist
# after Sundance segmenting at the point that all this pattern
# recognition gets called. For example, if we call this after NP
# segmenting, then the VP and PP predicates really don't do anything
# for us. Also, since each predicate must target one constituent
# in a sequence, there is no facility for looking at, for example,
# two particular words within the same NP. If we ever want to change
# this, we've got to make the necessary changes in the ApplyConstPattern()
# function. Note also that since the ApplyConstPattern() function
# returns pairs of numbers which equate to the starting and ending
# positions of a sentence's children which should be rolled together,
# this recognizer code cannot be called after clause handling, because
# clause handling creates a 3-level parse and we've just assumed
# a 2-level parse.
#
# So, because the current implementation applies these patterns
# after NP segmenting, but before any other segmenting, the examples
# with VPs and PPs are irrelevant for now. (The logic is there in
# Sundance; we'd just need to have multiple pattern files to be
# called in at different points of the parsing.)
#
# Pattern Precedence Rules: The pattern recognizer now enforces a
# type of precedence when more than one pattern may apply to
# a set of shared constituents. The rule is simple: the longest
# one wins, and in the case of ties, the rule that is listed first
# in the rule file wins.
#
#Part added by Pol Schumacher
#
#This pattern should recognize labels which are in "
# -> NP-TIME
# -> NP-TIME
#Pattern for keys
-> NP-LIST
-> NP-LIST
-> NP-LIST
#Menu tree structures
->NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
#Pattern for time
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
##Pattern buttons
-> NP-LIST
##Pattern for ingredient list
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
##Pattern comma and
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
##Pattern comma or
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
-> NP-LIST
#End Pol's part
# To capture "IBM Corp." "Nestle Inc." and "L. L. Bean Co."
### -> NP-COMPANY
### -> NP-COMPANY
### -> NP-COMPANY
# To capture : "Canada"
### -> NP-LOCATION
# To capture : Alberta, Canada
-> NP-LOCATION
# NOTE: can't do this!!! Really need *contextual* features in
# these rules! The preposition "in" is a crucial part of recognizing
# that Ames is a location and therefore a necessary part of the
# pattern, but it shouldn't be pulled into the NP itself!
#
# Ex: "in Ames, Iowa"
# where Ames is an unknown word but Iowa is a known LOCATION.
# I think this should be a good and pretty safe rule -emr 8/3/07
#
# -> NP-LOCATION
# To capture : Calgary, Alberta, Canada
### -> NP-LOCATION
#===================
# More aggressive location tagging...
-> NP-LOCATION
# Date/time tagging--anyword or head
### -> NP-TIME
-> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
# -> NP-TIME
-> NP-TIME
-> NP-TIME
-> NP-TIME
# added rule below because some years are labeled as TIMEs in the dictionary -emr
-> NP-TIME
-> NP-TIME
# -> NP-MONEY
# -> NP-MONEY
# -> NP-PROPERNOUN
# this should be subsumed by more general pattern below -emr
# -> NP-TIME
# This will force "May" to be parsed as a NOUN (Month) when it is
# capitalized. A bit risky in that it will be labeled a TIME when it
# is a person name, or in a headline, or all caps.
# BUT ... realized this will mess up everything in MUC-4 corpus 'cause
# it is all caps. Grrr... -emr
# -> NP-TIME
# ***********************************************************
# Added cases below for VIN domain -emr
# ***********************************************************
# Ex: ## days ago ; a few days ago ; one day ago
-> NP-TIME
-> NP-TIME
# Ex: a month later
-> NP-TIME
-> NP-TIME
# Ex: last week, next week, past week
-> NP-TIME
-> NP-TIME
-> NP-TIME
# Ex: that evening EXPERIMENTAL! This might be too dangerous,
# though it *seems* relatively safe. -emr
-> NP-TIME