Technological Trends of Natural Language Processing Based Semantic Analysis: A Comparative Study of the US, the EU, and Korea Patents Data

Citation Author(s):
Young Geun Hyun
Jindeuk Ko
Jeong Hyeon Han
Submitted by:
Young Hyun
Last updated:
Thu, 03/26/2020 - 07:37
DOI:
10.21227/wqem-sj55

Abstract 

The age of Artificial Intelligence (AI) is coming. Since Natural Language Processing (NLP) is a core AI technology for communication between humans and devices, it is vital to understand its technological trends. Early research on NLP focused on syntactic processing such as information extraction and subject modeling but later developed into semantic-oriented analysis. To analyze technological trends in NLP, especially semantic analysis, this study analyzes patent data, which contains objective and extensive information. The procedure consists of text mining to collect patent information, pre-processing, and analysis of keyword frequency, keyword networks, and time series. The results reveal differences in the direction of technological development, as the core keywords differ in frequency and centrality among countries. In addition, the time series analysis over five intervals spanning 20 years identifies twelve keywords with a rising or falling trend in the US, seven in the EU, and five in Korea. The greater number of such keywords suggests that the US underwent more technological progress than the other regions. Moreover, the technical linkage between the US and the EU is presumed to be stronger than that between the US and Korea, based on keyword similarity over time. The results of this study can serve as a reference for future technical predictions related to NLP.

Instructions: 

The dataset is the raw data used in the thesis. The abstracts were extracted from the patent information with a Python program, and the CSV files contain the extracted abstracts.
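As a rough illustration of how abstracts stored in a CSV file could be turned into keyword frequencies, the following minimal sketch uses only the Python standard library. The column names (`year`, `abstract`), the inline sample rows, and the stopword list are assumptions made for this example and are not taken from the dataset itself; the actual files may be organized differently.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample standing in for a real CSV file; the column names
# "year" and "abstract" are assumptions, not the dataset's actual schema.
SAMPLE_CSV = """year,abstract
2001,"A method for semantic analysis of natural language text."
2015,"Semantic parsing of speech using a neural network."
"""

# Small illustrative stopword list (assumption).
STOPWORDS = {"a", "an", "the", "of", "for", "using", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def keyword_frequency(csv_file):
    """Count how often each token appears across all abstracts."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts.update(tokenize(row["abstract"]))
    return counts


freq = keyword_frequency(io.StringIO(SAMPLE_CSV))
print(freq.most_common(3))
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file object and adjust the column name to match the file's header.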

The Python programs perform abstract extraction, keyword network extraction, and time series analysis, and are provided in separate versions for Korean and English.
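The keyword network and time series steps can be sketched as follows. This is not the authors' program: the function names, the sample records, and the choice of degree centrality are assumptions made for illustration. The interval width of four years follows from the five intervals over 20 years mentioned in the abstract.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_edges(docs):
    """Count keyword pairs that appear together in the same document."""
    edges = Counter()
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other nodes that each keyword is linked to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {k: 0.0 for k in neighbors}
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}


def interval_counts(records, start_year, width):
    """Per-interval document counts for each keyword.

    records is an iterable of (year, keywords) pairs.
    """
    buckets = defaultdict(Counter)
    for year, keywords in records:
        buckets[(year - start_year) // width].update(set(keywords))
    return buckets


# Hypothetical records: (filing year, keywords extracted from one abstract).
records = [
    (2000, ["semantic", "parsing", "speech"]),
    (2003, ["semantic", "speech"]),
    (2005, ["semantic", "network"]),
]

edges = cooccurrence_edges(kw for _, kw in records)
centrality = degree_centrality(edges)
trend = interval_counts(records, start_year=2000, width=4)
```

Comparing a keyword's count across consecutive intervals in `trend` gives a simple view of whether it is rising or falling over time.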