Extracting important information from Chinese Operation Notes with natural language processing methods




Journal of Biomedical Informatics 48 (2014) 130–136



Hui Wang a,b,c, Weide Zhang d, Qiang Zeng e, Zuofeng Li e, Kaiyan Feng f, Lei Liu a,b,e,*

a Shanghai Public Health Clinical Center, Institutes of Biomedical Sciences, and Key Laboratory of Medical Molecular Virology, Ministry of Education and Health, Fudan University, Shanghai, China
b Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention of Shanghai, Fudan University, Shanghai, China
c Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
d Zhongshan Hospital, Fudan University, Shanghai, China
e Shanghai Center for Bioinformatics Technology, Shanghai, China
f BGI-Shenzhen, Shenzhen, China

Article info
Article history: Received 4 January 2013; Accepted 13 December 2013; Available online 31 January 2014.
Keywords: Clinical operation notes; Information extraction; Chinese EMR; Conditional random fields.

Abstract

Extracting information from unstructured clinical narratives is valuable for many clinical applications. Although Natural Language Processing (NLP) methods have been studied extensively for electronic medical records (EMR), few studies have explored NLP for extracting information from Chinese clinical narratives. In this study, we report the development and evaluation of methods for extracting tumor-related information from operation notes of hepatic carcinomas written in Chinese. Using 86 operation notes manually annotated by physicians as the training set, we explored both rule-based and supervised machine-learning approaches. Evaluated on 29 unseen operation notes, our best approach yielded 69.6% precision, 58.3% recall and a 63.5% F-score.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

Clinical documents contain a wealth of information for medical study. It has been advocated that electronic medical record (EMR) adoption is key to solving problems related to quality of care, clinical decision support, and reliable information flow among individuals and departments participating in patient care [1]. However, a large part of EMR data is saved in an unstructured textual format (such as discharge summaries and progress reports), which presents a big challenge for automated text mining. Manually annotating such narratives by domain experts is a time-consuming and error-prone process. Therefore, extracting relevant data elements from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and to support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of treatment effectiveness [2].

Recent years have seen rapid adoption of hospital information systems across China, and medical documents in Chinese are accumulating fast. Realizing the potential of Chinese EMRs is a burgeoning field of research. Many approaches have been developed for English medical language processing, but studies focusing on Chinese are relatively limited. In this paper we tried both a rule-based method and a sequential labeling algorithm, both of which have been applied successfully to English. A structured representation of medical concepts and values can give physicians a quick abstraction of a patient's pathological status and also offers great convenience for large-scale analysis. The results also provide insights into how to design extraction models suited to the characteristics of Chinese.

* Corresponding author at: Shanghai Public Health Clinical Center, Institutes of Biomedical Sciences, and Key Laboratory of Medical Molecular Virology, Ministry of Education and Health, Fudan University, Shanghai, China. E-mail address: [email protected] (L. Liu).
http://dx.doi.org/10.1016/j.jbi.2013.12.017

2. Background

In the past 20 years, a number of tools and systems have been developed specifically for information extraction from clinical documents [3–10]. These systems and methods have been applied to many different tasks such as adverse event detection [11], abstraction of family history from discharge summaries [12], and medication information extraction [13,14]. Due to the casual style and conciseness of clinical narratives, methods for fine-grained demands such as negation detection [15], coreference resolution [16], and ontology techniques [17,18] have also been explored. Meystre et al. [19] presented a detailed review of extracting information from textual documents in EMRs. The i2b2 [20] organizers have held a series of shared tasks focusing on biomedical informatics since 2006. The fourth i2b2/VA


shared-task and workshop [21], dealing with the extraction of medical concepts, assertions, and relations in clinical text, was quite informative for our study. A number of novel approaches [22–24] were proposed by participants worldwide.

Studies on Chinese NLP started in the last 10 years. Research groups at Stanford University have developed several software tools focusing on Chinese word segmentation [25,26]. Their tools rely on a linear-chain conditional random field (CRF) model, which treats word segmentation as a binary decision task. ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) [27] is an integrated Chinese lexical analysis system that uses an approach based on a multi-layer HMM. ICTCLAS includes word segmentation, part-of-speech tagging, and unknown word recognition; both its precision and recall reach above 90%. Topics like Chinese named entity recognition have also been investigated. Jiang and colleagues did preliminary work on symptom recognition from traditional Chinese medicine records and proposed some reasonable features for machine learning models [28]. Zhao et al. reported their findings on which kind of token, characters or words, should be taken as the granularity in the NER task [29]. A group from Zhejiang University has conducted several projects on extracting temporal relations [30] from Chinese narrative medical records and on term and negation detection [31]. Based on a CRF model, they explored different templates with 63 annotated documents and achieved 86.94% accuracy in extracting temporal attributes and almost 100% accuracy in detecting negations. Researchers from Microsoft Research Asia established an annotated corpus of Chinese discharge summaries and conducted word segmentation and named entity recognition [32]; they improved the performance of both tasks with a combined technique called dual decomposition.

Our problem is not quite the same as named entity recognition.
What we want to extract are values for pre-defined concepts or attributes. For example, attributes like tumor size recorded in an operation note are very important to clinicians, so given a paragraph of plain text we expect to get an (attribute: value) pair like (tumor size: 2 × 3 × 3 cm) automatically. These attributes are questions frequently asked by doctors. MedLEE [33] and MedKATp [34] have functions similar to our method's, but their approaches are rule-based; no machine learning method has previously been applied to the problems mentioned above. Our unique contribution in this paper is an extraction strategy based on keyword search, extracting answers from sequential tagging results. With this method, doctors can effectively obtain structured data from free-text operation notes.

3. Methods

3.1. Data collection and preprocessing

The clinical documents we used for developing and testing our approaches were operation notes of hepatic carcinomas. We obtained


a total of 115 electronic medical records from Zhongshan Hospital, affiliated to Fudan University. They came from 115 individual patients admitted between July and November 2008. The original EMRs from the hospital database contained all patient information, such as basic information, operation notes, discharge summaries, and so on. We converted the de-identified EMRs into XML format (each content item has a tag as identifier) and isolated the operation notes from the other contents. Then 86 samples were randomly selected for training the extraction models, and the remaining 29 operation notes were left for evaluation. Fig. 1 shows a sample of the operation notes we used.

3.2. Data elements to extract and annotation method

After extensive consultation with medical researchers and doctors from the hepatic department of Zhongshan Hospital, we identified 12 data elements doctors wanted to get from a free-text operation note. These data are key details of the operation and are usually highly related to the patient's pathological status; they would be of great value for clinical studies of hepatic carcinoma if they could be automatically processed into a structured format. These data elements are the targets of our extraction system, presented as 12 questions in Table 2. Two doctors were recruited to annotate these 12 elements manually in all 115 samples. We used Protégé [35] (version 3.3.1) to establish an ontology for each clinical entity to be extracted from the operation notes. A plug-in called Knowtator [36] recorded each entity's location in the documents. A third clinical researcher was in charge of resolving inconsistencies between the two annotations. The annotated dataset was then used as the gold standard for constructing and validating our models. Table 1 lists the inter-annotator agreement results compared to the gold standard. There were 961 entities annotated, 704 in the training dataset and 257 in the test dataset.
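The random 86/29 split of the 115 de-identified notes described above can be sketched as follows; the fixed seed and function name are our own assumptions, added only for reproducibility of the sketch.

```python
import random

def split_notes(note_ids, n_train=86, seed=0):
    """Randomly partition de-identified operation notes into a training
    set and a held-out evaluation set (86/29 out of 115 in the paper)."""
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    shuffled = list(note_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

train_ids, test_ids = split_notes(range(115))
print(len(train_ids), len(test_ids))   # 86 29
```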
For each question, there were at least 20 samples for training and 20 for testing. An annotation sample is shown in Fig. 2.

3.3. Extraction strategy

When extracting knowledge manually from EMRs, people usually search for keywords related to (or indicative of) the concepts of interest. For example, when looking up whether the patient has a tumor thrombus, one will first locate the word "thrombus" and then scan for answers in its context, the neighboring words of this keyword. Our extraction strategy simulates this procedure by computer, so our method is a two-step process; the whole pipeline of our system is shown in Fig. 3. First we locate the attributes by identifying related keywords. Following doctors' suggestions, we picked one to three keywords manually for each question. These keywords were generated purely based on doctors' clinical knowledge and experience, and

Fig. 1. An operation note sample written in Chinese free-text.



Table 1
Pair-wise inter-annotator agreement results between each annotator and the gold standard.

Question  Annotator-1 (%)  Annotator-2 (%)
Q1        93.91            93.04
Q2        96.25            98.75
Q3        95.24            89.29
Q4        100.00           100.00
Q5        100.00           98.10
Q6        100.00           96.43
Q7        77.68            72.32
Q8        71.05            93.86
Q9        100.00           100.00
Q10       53.45            97.41
Q11       95.37            95.37
Q12       75.00            77.50
All       85.46            92.16

Table 2
Questions for our extraction system.

Question ID  Question
Q1           Whether the patient has tumor thrombus and specify the location if found
Q2           Whether tumor capsule is intact or not
Q3           Whether tumor boundary is clear or not
Q4           Whether patient has hemorrhage and specify the volume if found
Q5           Whether patient has ascites and specify the volume if found
Q6           Whether patient has an enlargement issue
Q7           Whether patient accepted blood transfusion and specify the volume if found
Q8           Tumor location
Q9           Degree of hepatic cirrhosis
Q10          Whether patient's liver is soft or hard
Q11          Tumor size
Q12          Whether patient accepted portal vein blocking and specify the duration if found

3.4. Rule-based model

By observing the operation notes, we noticed that doctors from the same department share a distinctive sub-language: for one specific attribute, the value type and the value's location in its context are relatively regular. Given this, we tried a rule-based method, since such methods have demonstrated good performance in medical language processing [37,38]. First we classified the 12 questions into 5 categories by their value types (Table 4); each category adopted a different rule. We also collected frequently mentioned terms, such as units, into question-specific vocabularies. The extraction strategy combined regular expressions with term searching within these vocabularies. The test dataset was not used in the vocabulary-collecting and rule-designing phases. Fig. 4 shows some of the regular expressions and vocabulary we manually compiled. In total, 16 rules were developed for the extraction task. For categories C1–C4, a look-up window of size 8 is used when detecting values surrounding keywords; for category C5, since the rule incorporates punctuation, no fixed window size is assigned. Content matched by the rules is the output of the system.

3.5. Conditional Random Fields based model

Conditional Random Fields (CRFs) are a class of undirected graphical models with an exponential-family distribution [39]. The model is widely used for sequence tagging problems in the medical natural language processing domain. To train such models, we labeled the training set with BIO tags indicating whether a character is at the beginning of the value word (B), inside the word (I), or outside the word (O). We used basic features like Cn and CnCn+1, where C0 represents the current character and Cn represents the nth character from the current character. Three binary features were also added to the training template: D(Cn), M(Cn), and N(Cn), indicating whether the current character is a number, a multiplication sign (e.g. "x", "×"), or a negation word (e.g. "无"), respectively. We chose the package CRF++ [40] to train and test our model; the cost parameter was set to 10.0 in the training phase. The full feature template was:

(a) Cn (n = −3, −2, −1, 0, 1, 2, 3);
(b) CnCn+1 (n = −3, −2, −1, 0, 1, 2);
(c) D(Cn) (n = −3, −2, −1, 0, 1, 2, 3);
(d) M(Cn) (n = −3, −2, −1, 0, 1, 2, 3);
(e) N(Cn) (n = −3, −2, −1, 0, 1, 2, 3);
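As a minimal Python sketch (illustrative only, not the paper's actual CRF++ template file), the character-level features (a)–(e) could be computed as below; the feature names, padding token, and negation/multiplication character sets are our own assumptions.

```python
def is_digit(c):        # D(Cn): character is a number
    return c.isdigit()

def is_mult(c):         # M(Cn): multiplication sign in size expressions
    return c in ("x", "X", "×", "*")

def is_negation(c):     # N(Cn): negation word, e.g. 无 ("none")
    return c in ("无", "未")

def char_features(sent, i, window=3):
    """Character-level features for position i: unigrams C_n,
    bigrams C_n C_{n+1}, and the three binary features."""
    feats = {}
    for n in range(-window, window + 1):
        j = i + n
        c = sent[j] if 0 <= j < len(sent) else "<PAD>"
        feats["C%+d" % n] = c
        feats["D%+d" % n] = is_digit(c)
        feats["M%+d" % n] = is_mult(c)
        feats["N%+d" % n] = is_negation(c)
        if n < window:                      # bigrams stop at n = +2
            k = j + 1
            c2 = sent[k] if 0 <= k < len(sent) else "<PAD>"
            feats["C%+d,%+d" % (n, n + 1)] = c + c2
    return feats

f = char_features("大小约2×3cm", 3)   # features for the character "2"
```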

In this way, we obtained 12 models, one per question. Given a sentence (sequence), the model chooses a tag ("B", "I" or "O") for each character; consecutive characters tagged "B" and "I" are then joined together as the final output of the extraction task.

Fig. 2. Annotation illustration.

4. Results

the doctors were not enrolled in the annotation procedure. Keywords for each question are listed in Table 3 together with their English expressions. This step can be regarded as a much-simplified named entity recognition procedure. When keywords are found by string matching in a sentence, the sentence is tagged and passed to the next step. In the second step, we try to find the potential answer in the context of the keyword; a rule-based method and the Conditional Random Fields algorithm are explored. We did not perform word segmentation, so we treat each individual character as a token.
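The keyword-locating step (step 1 of the pipeline) can be sketched as follows; the sentence splitter and the keyword subset shown are illustrative assumptions (the full keyword list is in Table 3).

```python
import re

KEYWORDS = {                         # subset of Table 3, for illustration
    "Q1": ["癌栓", "栓子"],           # tumor thrombus
    "Q4": ["出血"],                   # hemorrhage
    "Q11": ["大小约", "直径约", "约"],  # tumor size
}

def tag_sentences(note, keywords=KEYWORDS):
    """Split a note into sentences and tag each sentence with the
    question IDs whose keywords it contains (string matching only)."""
    sentences = [s for s in re.split(r"[。；]", note) if s]
    tagged = []
    for sent in sentences:
        hits = [qid for qid, kws in keywords.items()
                if any(kw in sent for kw in kws)]
        if hits:
            tagged.append((sent, hits))
    return tagged

note = "肿瘤大小约2×3×3cm。术中出血100ml。"
print(tag_sentences(note))
# [('肿瘤大小约2×3×3cm', ['Q11']), ('术中出血100ml', ['Q4'])]
```

Only tagged sentences are passed to the second step, so the context searched for a value is always local to a keyword.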

As mentioned before, both the training and test datasets were annotated by clinicians. Every annotated entity was saved with its absolute location (start position and end position) in the document. To measure the performance of our extraction algorithms, the 29 samples neither used nor observed in the training phase were used for evaluation. The output of our system was re-organized into a format like the following:

model="Q5" value="无" pos="1580:1581"

The start and end position indexes were compared with those in the gold-standard annotation. If the two numbers are



Fig. 3. Pipeline of the extracting system.

Table 3
Keywords for each data element and their English expressions. The order of attributes corresponds to Table 2. More than one keyword may be assigned to an attribute, separated by "|".

Question ID*  Shorthand  English expression        Keywords
Q1            癌栓       Tumor thrombus            癌栓|栓子
Q2            包膜       Tumor capsule             包膜
Q3            边界       Tumor boundary            界|边界
Q4            出血       Hemorrhage                出血
Q5            腹水       Ascites                   腹水
Q6            肿大       Lymph node enlargement    肿大
Q7            输血       Blood transfusion volume  输血
Q8            位置       Tumor location            位于|扪及
Q9            硬化       Hepatic cirrhosis         硬化
Q10           质         Hardness                  质
Q11           直径       Tumor size                大小约|直径约|约
Q12           阻断       Portal vein blocking      阻断

* Refer to Table 2 for details.

Table 4
Rule categories.

Category ID  Value type                   Questions
C1           Modality or quantity + unit  Q4(出血), Q5(腹水), Q7(输血), Q12(阻断)
C2           Modality                     Q1(癌栓), Q6(肿大), Q9(硬化)
C3           Quantity + unit              Q11(直径)
C4           Short description            Q2(包膜), Q3(边界), Q10(质)
C5           Long description             Q8(位置)

Fig. 4. Example of some regular expressions and vocabulary.
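As an illustration (not the paper's actual 16 rules, which appear in Fig. 4), a rule for the "quantity + unit" categories can be sketched as a regular-expression search inside the size-8 look-up window around a keyword; the unit vocabulary and pattern details here are our own assumptions.

```python
import re

# Illustrative reconstructions of C1/C3-style patterns; the unit list
# is an assumed question-specific vocabulary.
UNITS = r"(?:ml|cm|mm|min)"
SIZE_RE = re.compile(r"\d+(?:\.\d+)?(?:\s*[×x]\s*\d+(?:\.\d+)?)*\s*cm")
QTY_RE = re.compile(r"\d+(?:\.\d+)?\s*" + UNITS)

def extract_near_keyword(sent, keyword, pattern, window=8):
    """Search a fixed-size window around the keyword (size 8 for C1-C4)."""
    i = sent.find(keyword)
    if i < 0:
        return None
    lo, hi = max(0, i - window), i + len(keyword) + window
    m = pattern.search(sent[lo:hi])
    return m.group(0) if m else None

print(extract_near_keyword("肿瘤大小约2×3×3cm", "大小约", SIZE_RE))  # 2×3×3cm
print(extract_near_keyword("术中出血100ml", "出血", QTY_RE))         # 100ml
```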

exactly the same, the record is marked correct under exact matching; if the interval indicated by the two indexes contains the span in the gold standard, it is marked correct under inexact matching. We used the standard metrics (precision, recall and F-score) to measure performance. Let S be the size of the ground-truth list (doctors' annotations), D the number of correct, distinct values extracted by our system, and N the total number of values returned by the system [41]. Then:

recall = D / S = (number of correct, distinct values returned by the system) / (size of the ground truth list)

precision = D / N = (number of correct, distinct values returned by the system) / (total number of results returned by the system)

F-score = (2 × precision × recall) / (precision + recall)
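These metrics, together with the exact/inexact span matching, can be sketched as follows; this is a simplification in which D counts distinct system spans that match some gold span.

```python
def exact_match(sys_span, gold_span):
    """Both start and end indexes are identical."""
    return sys_span == gold_span

def inexact_match(sys_span, gold_span):
    """The system's output interval contains the gold-standard span."""
    return sys_span[0] <= gold_span[0] and gold_span[1] <= sys_span[1]

def prf(system, gold, match=exact_match):
    """precision = D/N, recall = D/S, F = harmonic mean, where D is the
    number of correct, distinct values, N the system output count and
    S the gold-standard count."""
    D = sum(any(match(s, g) for g in gold) for s in set(system))
    N, S = len(system), len(gold)
    p = D / N if N else 0.0
    r = D / S if S else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = [(1580, 1581), (200, 210)]
system = [(1580, 1581), (300, 305)]
print(prf(system, gold))  # (0.5, 0.5, 0.5)
```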

Since the performance of the keyword identification step affects the extraction results, we evaluated this process first. In all 115 operation notes, our system identified 1058 keywords according to the manually specified keyword list, of which 966 were true positives. So for all 967 entities, we achieved 91.3% precision, 99.9% recall and a 95.4% F-score. To evaluate the performance of value extraction, two boundary matching strategies were adopted. One was exact matching: both the start and the end of the system's output must be exactly the same as in the reference annotation. The other was inexact matching [42]: the system's output contains the right answer. We computed precision, recall and F-score for both strategies. The performance of the rule-based method and the CRF models is summarized in Table 5, and Table 6 shows the results for each question from the rule-based system. Since values obtained by exact matching can be filled into a structured information table without any post-processing, we paid more attention to the exact-matching results. Fig. 5 plots the detailed performance for each question using the CRF with 3 binary features, and Fig. 6 compares results on the training and test datasets for the same model. Table 7 shows a sample result extracted by our system, the most desirable output: by this means we can automatically convert free text into structured information.

5. Discussion and future work

Manual annotation by doctors was used as the gold-standard reference in our evaluation process. The benefits are: (1) the doctors annotated the information based on their professional background


Table 5
Overall performance of the rule-based system and the CRF models (P = precision, R = recall, F = F-score).

                            Exact matching         Inexact matching
Method                      P      R      F        P      R      F
Rule-based method           0.591  0.510  0.548    0.603  0.510  0.553
CRF                         0.681  0.390  0.496    0.720  0.412  0.524
CRF with 3 binary features  0.696  0.583  0.635    0.712  0.596  0.649

Table 6
Precision, recall and F-score of the rule-based method for each question (P = precision, R = recall, F = F-score).

                      Exact matching         Inexact matching
Question  Shorthand   P      R      F        P      R      F
Q1        癌栓        0.846  0.815  0.830    0.846  0.815  0.830
Q2        包膜        0.667  0.174  0.276    0.667  0.174  0.276
Q3        边界        0.789  0.652  0.714    0.789  0.652  0.714
Q4        出血        0.288  1.000  0.447    0.288  1.000  0.447
Q5        腹水        0.621  0.818  0.706    0.621  0.818  0.706
Q6        肿大        0.489  0.920  0.639    0.489  0.920  0.639
Q7        输血        0.412  0.875  0.560    0.412  0.875  0.560
Q8        位置        0.679  0.633  0.655    0.786  0.733  0.759
Q9        硬化        0.667  0.286  0.400    0.667  0.286  0.400
Q10       质          1.000  0.750  0.857    1.000  0.750  0.857
Q11       直径        0.207  0.194  0.200    0.207  0.194  0.200
Q12       阻断        0.385  0.385  0.385    0.385  0.385  0.385

Fig. 5. Precision, recall and F-score for each question using CRF model with 3 binary features (exact matching).

Fig. 6. Comparisons between results on training data and testing data using CRF method.

Table 7
A sample of an extracted result.

Question ID  Keyword  Extracted value
Q2           包膜     完整
Q3           界       清
Q4           出血     100 ml
Q5           腹水     无
Q7           输血     无
Q8           位于     肝左叶
Q10          质       硬
Q12          阻断     未
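Collecting the per-question (keyword, value) outputs into one structured record per patient, as in Table 7, might look like the following sketch; the field names are illustrative assumptions.

```python
def to_structured(extractions):
    """Collect per-question (keyword, value) outputs into one
    structured record, one entry per question ID (cf. Table 7)."""
    record = {}
    for qid, keyword, value in extractions:
        record[qid] = {"keyword": keyword, "value": value}
    return record

rows = [("Q4", "出血", "100 ml"), ("Q8", "位于", "肝左叶"), ("Q10", "质", "硬")]
rec = to_structured(rows)
print(rec["Q4"]["value"])  # 100 ml
```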

that would improve the accuracy of the results; (2) the results tagged with Protégé Knowtator contained not only the text but also its location, which was very conducive to establishing data mining models; (3) the annotation ontology was structured following human cognition to ensure the value of the models.

The keyword identification step, as mentioned earlier, is a much-simplified named entity recognition procedure. Since the attributes are pre-defined and the samples came from the same department of one hospital, the style of expression of the concepts (entities) was relatively fixed and stable. That is why we achieved quite high accuracy


in recognizing them automatically. To make the extractor more generalizable and robust, that is, to extend its use to other departments and hospitals, more work should be done. We may consider using a Chinese medical dictionary (like the UMLS [43]) and categorizing the potential attributes by value types, as we did in the rule-based method.

The rule-based system outperforms the CRF model that uses only the 3 context characters. From the results of the rule-based system, we can see that there is little difference between exact and inexact matching except for Q8 (tumor location). Values for tumor location are all descriptive text without fixed length or a controlled vocabulary, so this result is reasonable. If more detailed rules were added to the current system, there would still be room for the overall performance to improve. For example, Q4 and Q6 had much higher recall than precision, which means the current rules precisely captured part of the potential values and missed the rest; they need additional rules. Q2 and Q9, which have relatively higher precision than recall, need more specific rules to separate the "right answer" from the noise. More work on rule design may improve performance, but it may also make the system even harder to generalize and bring about overfitting.

Introducing the 3 binary features (is_digit, is_multiplication, is_negation) improved the CRF model significantly. Our best CRF configuration reached 63.5% and 64.9% F-score for exact and inexact matching respectively, though still some distance behind methods developed for English. From Fig. 5 we can see that most questions had an F-score above 50%, except Q8 (tumor location), Q11 (tumor size) and Q12 (portal vein blocking). For tumor location, the reason for the poor performance is again the uncertain value length and the shortage of features.
As for tumor size, the desired values were either in a format like "2 × 2 × 3 cm" or simply like "5 cm". There were also contexts describing two tumors together, such as "tumor sizes are 2 cm and 3 cm" (肿瘤大小约2 cm和3 cm). Our model may have been confused by these different circumstances due to their unbalanced distribution in the samples. For the attribute "portal vein blocking", which belongs to the category "modality or quantity + unit", we did not consider the distribution of value types when splitting the training and test sets. There turned out to be too many "modality" samples in the training data and relatively more "quantity + unit" samples in the test set, so the model had preferences for specific values. This also explains the result in Fig. 6, in which Q6 and Q9 performed better on the test dataset than on the training dataset. This situation may be improved by balancing the value types across the groups.

We think the power of the CRF model has not been fully exploited, so in future work we may try more lexical and syntactic features, such as part-of-speech tags. A combination of the rule-based method and the CRF model is also worth trying. Since the CRF model tends to have low recall, its output still needs a post-processing step; we can design rules to exclude the noise.

6. Conclusion

In this paper, we developed two kinds of extraction methods for Chinese operation notes: a rule-based method and a CRF-based model. We obtained a 63.5% F-score on average over all 12 questions. With further study, this method can significantly improve the efficiency of processing EMRs and help researchers effectively obtain valuable information for medical research.

Acknowledgments

This work is supported by the National High-tech R&D Program of China (Grant Nos. 2012AA02A602 and 2012AA02A604), Research Grants (Grant Nos. 2012ZX09303013-015 and 2012ZX10002010002) from the Ministry of Science and Technology and the National Health and Family Planning Commission of the People's Republic of China, a Research Grant (Grant No. 201302010) from the National Health and Family Planning Commission of the People's Republic of China, and a Research Grant (Grant No. 13DZ1512102) from the Science and Technology Commission of Shanghai Municipality.

References [1] Hornberger J. Electronic health records: a guide for clinicians and administrators. Book and media review. JAMA 2009;301(1):110. [2] Jonnalagadda S, Cohen T, Wu S, Gonzalez G. Enhancing clinical concept extraction with distributional semantics. J Biomed Inform 2012;45(1):129–40. [3] Friedman C, Alderson PO, Austin J, et al. A general natural language text processor for clinical radiology. J Am Med Inform Assoc 1994;1:161–74. [4] Haug PJ, Koehler S, Lau LM, et al. Experience with a mixed semantic/syntactic parser. Proc Annu Symp Comput Appl Med Care 1995:284–8. [5] Haug PJ, Christensen L, Gundersen M, et al. A natural language parsing system for encoding admitting diagnoses. Proc AMIA Annu Fall Symp 1997:814–8. [6] Friedman C, Knirsch C, Shagina L, et al. Automating a severity score guideline for community-acquired pneumonia employing medical language processing of discharge summaries. Proc AMIA Symp 1999:256–60. [7] Hahn U, Romacker M, Schulz S. Creating knowledge repositories from biomedical reports: the MEDSYNDIKATE text mining system. Pac Symp Biocomput 2002:338–49. [8] Mack R, Mukherjea S, Soffer A, et al. Text analytics for life science using the unstructured information management architecture. IBM Syst J 2004;43:490–515. [9] Zeng Q, Goryachev S, Weiss S, et al. Extracting principal diagnosis, comorbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inform Decis Mak 2006;6:30. [10] Savova GK, Masanz JJ, Ogren PV, et al. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc 2010;17:507–13. [11] Melton GB, Hripcsak G. Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc 2005;12:448–57. [12] Goryachev S, Kim H, Zeng-Treitler Q. Identification and extraction of family history information from clinical reports. 
AMIA Annu Symp Proc 2008:247–51. [13] Li Z, Liu F, Antieau L, Cao Y, Yu H. Lancet: a high precision medication event extraction system for clinical text. J Am Med Inform Assoc 2010;17:563–7. [14] Xu H, Stenner SP, Doan S, et al. MedEx: a medication information extraction system for clinical narratives. J Am Med Inform Assoc 2010;17:19–24. [15] Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 2001:301–10. [16] Zheng J, Chapman WW, Crowley RS, Savova GK. Coreference resolution: a review of general methodologies and applications in the clinical domain. J Biomed Inform 2011;44(6):1113–22. [17] Liu K, Hogan WR, Crowley RS. Natural language processing methods and systems for biomedical ontology learning. J Biomed Inform 2011;44:163–79. [18] Liu Y, Coulet A, LePendu P, et al. Using ontology-based annotation to profile disease research. J Am Med Inform Assoc 2012;19:177–86. [19] Meystre SM, Savova GK, Kipper-Schuler KC, et al. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform 2008:128–44. [20] Informatics for Integrating Biology and the Bedside (i2b2). NIH-funded National Center for Biomedical Computing (NCBC) based at Partners HealthCare System in Boston, MA; established in 2004. [accessed 31.07.12]. [21] Uzuner O, South B, Shen S, et al. I2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc 2011;18:552–6. [22] Patrick JD, Nguyen DH, Wang Y, Li M. A knowledge discovery and reuse pipeline for information extraction in clinical notes. J Am Med Inform Assoc 2011;18:574–9. [23] Xu Y, Hong K, Tsujii J, et al. Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries. J Am Med Inform Assoc 2012. http:// dx.doi.org/10.1136/amiajnl-2011-000776. 
[24] Rink B, Harabagiu S, Roberts K. Automatic extraction of relations between medical concepts in clinical texts. J Am Med Inform Assoc 2011;18:594–600. [25] Pradhan S, Sun H, Ward W, Martin JH, Jurafsky D. Parsing arguments of nominalizations in English and Chinese. In: The human language technology conference/North American chapter of the association for computational linguistics annual meeting (HLT/NAACL-2004). [26] Chang PC, Galley M, Manning C. Optimizing Chinese word segmentation for machine translation performance. In: ACL workshop on statistical machine translation; 2008. p. 224–32. [27] Zhang HP, Yu HK, Xiong DY, Liu Q. HHMM-based Chinese lexical analyzer ICTCLAS. In: Proceedings of the SIGHAN workshop; 2003.



[28] Jiang Y, Yu Z, Wang Y, Liu Y, Chen L. A preliminary work on symptom name recognition from free-text clinical records of traditional Chinese medicine using conditional random fields and reasonable features. In: BioNLP: Proceedings of the 2012 workshop on biomedical natural language processing. [29] Liu Z, Zhu C, Zhao T. Chinese named entity recognition with a sequence labeling approach: based on characters, or based on words. In: Advanced intelligent computing theories and applications 2010. With aspects of artificial intelligence, vol. 6216; 2010. p. 634–40. [30] Zhou XJ, Li HM, Duan HL, Lu XD. The automatic extraction of temporal relation from Chinese narrative medical records using conditional random fields. Chin J Biomed Eng 2010;29(5):710–6. [31] Li HM, Li Y, Duan HL, Lu XD. Term extraction and negation detection method in Chinese clinical document. Chin J Biomed Eng 2008;27(5). [32] Xu Y, Wang Y, Liu T, et al. Joint segmentation and named entity recognition using dual decomposition in Chinese discharge summaries. J Am Med Inform Assoc 2013. [33] Friedman C, Alderson PO, Austin J, et al. A general natural language text processor for clinical radiology. J Am Med Inform Assoc 1994;1:161–74. [34] http://ohnlp.sourceforge.net/MedKATp/. [35] http://protege.stanford.edu/.

[36] Ogren PV. Knowtator: a Protégé plug-in for annotated corpus construction. Proc Hum Lang Technol 2006:273–5. [37] Heintzelman NH, Taylor RJ, Bova GS, et al. Longitudinal analysis of pain in patients with metastatic prostate cancer using natural language processing of medical record text. J Am Med Inform Assoc 2012. [38] Sohn S, Wagholikar KB, Li D, Jonnalagadda SR, Tao C, Elayavilli RK, et al. Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification. J Am Med Inform Assoc 2013. [39] Lafferty JD, McCallum A, Pereira FCN. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th international conference on machine learning. Burlington, MA, USA: Morgan Kaufmann Publishers Inc.; 2001. p. 282–9. [40] Kudo T. CRF++: Yet Another CRF Toolkit. [41] Uzuner O, Solti I, Cadag E. Extracting medication information from clinical text. J Am Med Inform Assoc 2010;17:514–8. [42] Tsai RTH, Wu SH, Chou WC, Lin YC, He D, Hsiang J, et al. Various criteria in the evaluation of biomedical named entity recognition. BMC Bioinform 2006;7:92. [43] Lindberg DAB, Humphreys BL, McCray AT. The unified medical language system. Methods Inf Med 1993;32:281–91.