Techniques of Information Extraction for Selection Marks
Abstract. A method may include detecting one or more selection boxes and one or more text lines in a primary document. The method may include determining, for each selection box, respective vectors between that selection box and the text lines adjacent to it in a plurality of directions. The method may include determining a set of respective vectors associated with a unique selection box. The method may include determining a variance between the respective vectors in the set and identifying a particular direction whose set of respective vectors has a minimal variance as compared to the variance of the other sets of respective vectors. The method may include generating a key-value pair based on the set of respective vectors characterized by the minimal variance. The method may include generating a document model that includes the key-value pair, and extracting data according to the document model.
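The direction-selection step can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the claimed method: it assumes each selection box and text line is reduced to a centre point, that the candidate directions are left, right, up, and down, that the variance of a set is the sum of the population variances of its x and y offsets, and that the key is the label text and the value is the box's checked state. The names `SelectionBox`, `TextLine`, `nearest_vector`, and `key_value_pairs` are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import pvariance


@dataclass(frozen=True)
class TextLine:
    x: float  # centre x (assumed representation)
    y: float  # centre y, growing downward as in image coordinates
    text: str


@dataclass(frozen=True)
class SelectionBox:
    x: float
    y: float
    checked: bool  # hypothetical field: whether the mark is selected


DIRECTIONS = ("left", "right", "up", "down")


def _points(dx, dy, direction):
    """True if the offset (dx, dy) lies mainly in the given direction."""
    horizontal = abs(dx) >= abs(dy)
    return {
        "right": dx > 0 and horizontal,
        "left": dx < 0 and horizontal,
        "down": dy > 0 and not horizontal,
        "up": dy < 0 and not horizontal,
    }[direction]


def nearest_vector(box, lines, direction):
    """Return (dx, dy, line) for the closest text line in a direction, or None."""
    best = None
    for line in lines:
        dx, dy = line.x - box.x, line.y - box.y
        if not _points(dx, dy, direction):
            continue
        if best is None or math.hypot(dx, dy) < math.hypot(best[0], best[1]):
            best = (dx, dy, line)
    return best


def key_value_pairs(boxes, lines):
    """Pick the direction whose box-to-label vectors vary least, then pair up."""
    if not boxes:
        return None, {}
    best_direction, best_spread, best_vectors = None, math.inf, None
    for direction in DIRECTIONS:
        vectors = [nearest_vector(b, lines, direction) for b in boxes]
        if any(v is None for v in vectors):
            continue  # some box has no text line on this side
        # Assumed variance measure: variance of x offsets plus variance of y offsets.
        spread = pvariance([v[0] for v in vectors]) + pvariance([v[1] for v in vectors])
        if spread < best_spread:
            best_direction, best_spread, best_vectors = direction, spread, vectors
    if best_direction is None:
        return None, {}
    pairs = {v[2].text: b.checked for b, v in zip(boxes, best_vectors)}
    return best_direction, pairs
```

In a form where every label sits at the same offset to the right of its box, the right-hand vectors are identical and their variance is zero, while the up and down vectors point at unrelated lines and vary widely. The sketch therefore tends to pick the side on which the labels are laid out consistently.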
Links: Patent