Abstract:
Unlike conventional address word segmentation models, which rely on a city address dictionary or a rule set, this paper proposes a segmentation method that does not depend on an address dictionary and is instead based on mining massive address data. Using statistical rules, the method computes the distribution of address elements in the address dataset and mines the suffix points and drop points of those elements. It then constructs a statistical decision tree from the relative positions of these points to extract the address elements. The method is verified on building-address survey data from Shenzhen, and the extracted elements provide a useful supplement to current gazetteers.
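The sketch below illustrates how segment boundaries can be mined from address statistics alone, without a dictionary. It uses right branching entropy, a standard unsupervised segmentation heuristic that is swapped in here for illustration. It is not the paper's statistical decision tree, and it does not model drop points. The idea is that a character that often ends an address element, such as 市 or 区, tends to be followed by many different characters. The function names, the entropy threshold, and the toy addresses are hypothetical.

```python
import math
from collections import Counter, defaultdict


def right_branching_entropy(addresses):
    """Map each character to the entropy (bits) of the character that follows it.

    Hypothetical helper: the end of an address counts as a special "<END>" follower.
    """
    followers = defaultdict(Counter)
    for addr in addresses:
        for i, ch in enumerate(addr):
            nxt = addr[i + 1] if i + 1 < len(addr) else "<END>"
            followers[ch][nxt] += 1
    entropy = {}
    for ch, counts in followers.items():
        total = sum(counts.values())
        entropy[ch] = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy


def segment(address, entropy, threshold):
    """Split after every non-final character whose right branching entropy >= threshold."""
    parts, start = [], 0
    for i, ch in enumerate(address):
        if entropy.get(ch, 0.0) >= threshold and i + 1 < len(address):
            parts.append(address[start:i + 1])
            start = i + 1
    parts.append(address[start:])
    return parts


# Toy corpus (hypothetical): three street addresses in Shenzhen.
corpus = ["深圳市南山区桃园路", "深圳市福田区深南路", "深圳市罗湖区东门路"]
ent = right_branching_entropy(corpus)
print(segment("深圳市福田区深南路", ent, threshold=1.5))
# ['深圳市', '福田区', '深南路']
```

In this toy corpus, 市 and 区 are each followed by three distinct characters (entropy log2 3 ≈ 1.58 bits), so they pass the 1.5 threshold. 南 appears in both 南山区 and 深南路 and scores only 1.0 bit, so 深南路 stays whole. On real data the threshold would have to be tuned, and a single statistic like this is far cruder than a decision tree that combines several positional features.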