World Patent Information 43 (2015) 25–49
World Patent Information journal homepage: www.elsevier.com/locate/worpatin
A comparison of official abstracts and enhanced abstracts for patent search
Alain Materne*, Gershom Sleightholme
European Patent Office, D-10958 Berlin, Germany
Article info
Article history: Received 20 April 2015; received in revised form 28 July 2015; accepted 16 September 2015; available online xxx
Keywords: Abstracts comparison; Official abstracts from EPODOC or Espacenet; Enhanced abstracts from WPI; Manual input search strategy; Automatic pre-search strategy; Concepts extraction to build a strategy; Pivot ranking; Evaluation of prior art search relevancy over search report citations
Abstract
The official abstracts published together with many patent applications are freely available for search, e.g. using patent office websites such as Espacenet or search engines. Some service providers also offer, for a price, their own enhanced abstracts of patent applications. The authors propose a way of making an objective comparison between searching using the official abstracts and using the enhanced abstracts. The advantages offered by enhanced abstracts will be explained and include simple benefits such as saving time in finding the relevant prior art, and more intricate ones such as better suitability for ranking techniques. © 2015 Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail address: [email protected] (A. Materne).
http://dx.doi.org/10.1016/j.wpi.2015.09.003
0172-2190/© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
The article begins with a qualitative comparison of official abstracts with enhanced abstracts by considering, amongst other things, international coverage, language, vocabulary, classification schemes and content. More importantly, the article goes on to compare how the differences affect the search strategy and the results. Examples based on the authors' experience as examiners at the EPO, where both published abstracts and Derwent World Patents Index® (WPI) abstracts are available, will be presented. The method used here for the comparison of search results could be applied to a larger number of searches in order to obtain statistically relevant conclusions.
2. Abstract databases
The official abstracts published together with many patent applications are freely available for search, e.g. using patent office websites such as Espacenet or search engines.
2.1. Official abstracts
These are the abstracts published by the patent office with the patent application or patent. They appear on the published document under INID code ‘57’, sometimes in more than one language as in Fig. 1. Usually the abstract is accompanied by a figure from the application. The European Patent Office (EPO) database containing these abstracts is called DOCDB. The record corresponding to Fig. 1 is shown in Fig. 2. Note that the DOCDB version is in English and French. The raw source data of DOCDB is used by Espacenet, Global Patent Index and EPODOC. Espacenet is for external users and free of charge. EPODOC is for examiners at the EPO and some national patent offices. Subscriptions to the DOCDB database are possible, see Ref. [1]. For information on the Global Patent Index, see Ref. [2].
2.2. Enhanced abstracts
Some service providers offer their own enhanced abstracts of patent applications and patents. Derwent World Patents Index® (WPI) was chosen for the comparison presented here because it and EPODOC are the two main patent abstract databases used by EPO examiners. An example is shown in Fig. 3, alongside the official abstract. The WPI abstract refers to a figure when the subheading ‘Description of Drawings’ is present.
Fig. 1. Official abstracts for WO2008/062885 (English and Japanese).
Fig. 2. Electronic EPODOC abstracts for WO2008/062885 (English and French).
Fig. 3. Official and Derwent WPI abstracts for CN201965968U, a Chinese utility model.
Fig. 4. Comparison of classification schemes: EP2338741 as an example.
Fig. 5. Comparison of patent families.
Fig. 6. Concepts extraction algorithm for a claim.
2.3. Abstract database content
2.3.1. International coverage
Both EPODOC and Derwent WPI offer worldwide coverage from all major patent offices. For EPODOC, see Ref. [3]; for Derwent WPI, see Ref. [4].
2.3.2. Languages
EPODOC generally contains an English-language title and abstract for every record. Additionally, French and German titles and abstracts are available, depending on the language of the patent application. PCT (WO) publications usually have titles and abstracts in at least French and English, and in German as well if that was the language of the application. By contrast, Derwent WPI is exclusively in English. One could therefore argue that EPODOC has a qualitative advantage over Derwent WPI in terms of languages.
2.3.3. Terminology and content
In EPODOC, the titles and abstracts are those of the published patent or patent application, usually as supplied by the applicant (sometimes translated). The length can vary because requirements differ. For example, abstracts of EP publications should be no more than 150 words according to Rule 47(3) EPC; PCT abstracts should be 50 to 150 words according to Rule 8.1(b) PCT. In particular, the abstracts of GB documents published before 1977 [5], which were written by patent examiners, could be very long if more than one subject was covered by a patent [5,6]. There are normally no subheadings, an exception being JP and KR documents, where subheadings such as ‘Problem to be solved’ and ‘Solution’ can be found. Occasionally there are no reference signs (the numbers used to link the features mentioned in the text to those shown in the figures, according to the European Patent Convention [7]). The database is updated daily.
The Derwent WPI titles and abstracts are written by ‘scientifically trained editors … to summarise the novelty, use and advantage of an invention’ [4]. The terminology used is based on editorial rules and ‘consistent, industry-specific terms are applied to each record’ [4]. New-style ‘Alerting Abstracts’ were made available in 1999 with a new structure and revised technical content. The abstracts include several subheadings, such as ‘Novelty’, ‘Use’ and ‘Advantage’, to facilitate reading of the invention (see Fig. 3). Some subheadings are only used in particular technical fields, especially chemistry (e.g. ‘Activity’, ‘Mechanism of action’, ‘Polymer’). The subheadings are indexed and searchable separately, e.g. via STN. At the EPO, the Derwent WPI subheadings are currently not in separate data fields. Pre-1999 abstracts contain a section based on the claims together with a Use, Use/Advantage or Advantage section. The database is updated weekly.
It is a generally held view (see, for example, [8,9,10]) that the terminology and content of the Derwent WPI abstracts are qualitatively better than those of the official abstracts, and this is also the view of the authors. Note, however, that even if the enhanced abstracts appear qualitatively better, this does not necessarily mean that using them produces better search results. This issue is addressed in Section 4.4. The fact that the EPODOC titles are not amended renders them more suitable for finding similar patent applications from the same applicant. This issue is addressed elsewhere [11,12].
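To illustrate what having the subheadings in separate data fields would allow, the following minimal Python sketch splits an abstract into fields and looks for a term in selected fields only. It is not a description of how STN or the EPO systems work: the label format (`NOVELTY - ...`), the function names and the sample text are all assumptions made for illustration.

```python
import re

# Assumed labels, named after the subheadings mentioned in the text.
SUBHEADINGS = ("NOVELTY", "USE", "ADVANTAGE", "DESCRIPTION OF DRAWINGS")


def split_by_subheading(abstract, subheadings=SUBHEADINGS):
    """Return {subheading: text}, assuming labels look like 'NOVELTY - ...'."""
    labels = "|".join(re.escape(label) for label in subheadings)
    pattern = re.compile(r"\b(" + labels + r")\s+-\s+")
    # With one capturing group, re.split yields
    # [preamble, label, text, label, text, ...].
    parts = pattern.split(abstract)
    return {label: text.strip() for label, text in zip(parts[1::2], parts[2::2])}


def fields_containing(fields, term, within=("NOVELTY", "USE", "ADVANTAGE")):
    """List the selected subheadings whose text contains the term."""
    needle = term.lower()
    return [label for label in within if needle in fields.get(label, "").lower()]


# Invented example text, not a real WPI record.
sample = (
    "NOVELTY - The mount has a pivot joint holding a display on a seat. "
    "USE - Swivel mount for a vehicle headrest display. "
    "ADVANTAGE - The bezel breaks away to reduce injury. "
    "DESCRIPTION OF DRAWINGS - The drawing shows a side view of the mount."
)
fields = split_by_subheading(sample)
print(fields_containing(fields, "display"))
print(fields_containing(fields, "injury"))
```

Restricting a query to one field in this way is what makes a search on, say, the ‘Use’ text alone possible; plain substring matching is used here only to keep the sketch short.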
2.3.4. Figure of the abstract
The figure (drawing) accompanying the official abstract is either the one selected by the applicant or a replacement selected by the examiner. Especially in the former case, this is not always the figure which shows the most detail of the disclosure of the application. The WPI abstracts refer explicitly to a figure (see subheading ‘Description of drawings’) and frequently give the meanings of some of the reference signs on the figure, but as implemented at the EPO, no indication is given of the figure to which the WPI abstract refers. Because of this, no opinion can be given here on whether the WPI figure is better than the official one.
If the abstract figure number were indexed in both EPODOC and WPI, it could be used to find the relevant passage of the description from which to extract the search strategy. The idea is that if the applicant or the database provider indicates Fig. 11 as the most relevant figure (see, for example, WO2008091669 in relation to WO2007040767), the examiner could try to find the part of the description describing Fig. 11 and use this as the basis for finding relevant concepts. This technique might provide the reader with more technical aspects than those known from the claims. It could apply in particular to similar applications filed together and help to differentiate them; see EP395263 to EP395276, exemplified in Ref. [12].
Fig. 7. Espacenet claims tree for EP1.
2.3.5. Classification schemes at the EPO
The classification schemes which are available depend on details of the subscription. The following section relates to what is available to examiners at the EPO. The EPODOC classification schemes are the following:
International Patent Classification (IPC): IPC-Advanced Level-Invention (ICAI), IPC-Advanced Level-Additional (ICAN), IPC-Core Level-Invention (ICCI), IPC-Core Level-Additional (ICCN).
JPO Themes/F-Terms (FT), JPO File Index Classification (FI).
Former EPO Classification (EC) (being phased out), former USPTO Classification (UC), current UKPO.
Cooperative Patent Classification (CPC), the joint classification scheme managed by the EPO and the USPTO, now also used by many offices around the world including Brazil (INPI), China (SIPO), Korea (KIPO), Mexico (IMPI), etc.
Fig. 8. Tree view in Xfr.
Fig. 9. Xfr graphical user interface.
Fig. 10. Example 1, document US2012139303.
WPI classification and indexing schemes are the following:
International Patent Classification (IPC): IPC-Advanced Level-Invention (ICAI), IPC-Advanced Level-Additional (ICAN), IPC-Core Level-Invention (ICCI), IPC-Core Level-Additional (ICCN).
Derwent Classification (DC), Manual Codes (MC) subdivided into Electrical Codes, Engineering Codes, and Chemical Codes (Mn), Plasdoc Codes (Ann).
Other classes (e.g. F-Terms, UC and CPC) can be provided for external users.
It is interesting to note that the IPC classes allocated in EPODOC for different family members are not always the same, whilst in Derwent WPI each record contains the IPC classes assigned to other family members as well. In the example of Fig. 4 (EP2338741), the IPC classes given by the US, Chinese, Japanese and European patent offices are present in the WPI record, a total of seven classes, whereas the EPODOC record contains the single IPC class assigned by the EPO.
The availability of the CPC is a major advantage of EPODOC for examiners at the EPO, since it allows the search to be focused on a much smaller set of documents, with the result that a higher precision of search results would be expected. For EPO examiners CPC is not available in WPI. Whilst it is possible for examiners to select a CPC class in EPODOC as a search scope and then to transfer these documents to the WPI database, this is obviously less convenient than simply selecting the documents directly in WPI. One advantage Derwent WPI has over EPODOC, however, is the way IPC classes are handled, as mentioned above: in some circumstances (for example a search for CN, JP or KR documents having a particular IPC class)
this could retrieve documents not found by EPODOC.
2.3.6. Patent family
The way patent families are defined differs between EPODOC and WPI. A family in EPODOC is defined as a set of documents which have identical priorities. WPI defines a family as documents having at least one priority in common. Obviously this makes no difference if there is only one priority anyway, but for sets of divisional applications (continuations-in-part) where members do not share all the priorities, the difference can be marked. The stricter definition used by EPODOC means that, for the same set of patents, EPODOC generally has a larger number of families, each family having fewer members. This can lead to viewing what is effectively the same patent more than once in EPODOC (see Fig. 5). This duplication is avoided in WPI, which is considered an advantage by most users, since fewer records need to be viewed. Typically, there are more than twice as many records in EPODOC as in WPI for the same set of patent documents. For example, in the field B60R11/0235 (holding flat screens in vehicles) there are 2846 records in EPODOC compared with 1290 records in WPI. In fields with many divisional applications (continuations-in-part, Teilanmeldung) the scaling factor can be even higher, and the time saving through viewing fewer records is higher as well.
3. Methodology for the comparison
To compare searching in the databases the authors drew on two sets of five prior-art type patent searches. The first set, examples 1–5, were ‘manual input’ searches in the sense that the search concepts were defined by the examiner, i.e. manually, and the search was carried out in the usual way using a series of queries. An overview of the search concepts used is shown in Table 1. Synonyms were also used, in English, French and German. The second set, examples 6–10, were ‘automatic’ pre-searches in the sense that the search concepts were defined from a given text by a computer application.
Fig. 11. Example 1, document WO2008155110.
Table 1 Overview of the search concepts.
| Example no. | Subject-matter | Search concepts |
|---|---|---|
| Example 1 | Swivel mount for vehicle headrest display | bezel, injury, force, break, attach, pivot, display, seat |
| Example 2 | Electrically actuated motorcycle steering lock | angle, switch, motorcycle, electric, cam, stop, engine start |
| Example 3 | Adaptive street lighting | vehicle speed, pedestrian, cyclist, vehicle, polarising, infrared, range, lens, mirror |
| Example 4 | Second battery for engine start | engine start, charge amount, transfer, charging |
| Example 5 | Remote keyless vehicle entry with sensitivity adjustment | sensitivity, range, adjust, time |
3.1. Subjective impressions
The authors will begin by presenting selected documents found from the first set of five search examples to illustrate some subjectively perceived, qualitative differences between EPODOC and WPI. The documents were not chosen at random but rather because they highlight some of the differences.
3.2. CPC class overview
Typically, a prior art search is carried out using several search concepts within the scope (limitation) of a particular class, e.g. a CPC class. Although CPC is not available in WPI for EPO examiners (as explained above), the documents having the CPC class were selected in EPODOC and then ‘transferred’ to WPI so that exactly the same starting set of documents was used for both EPODOC and WPI. It is reasonable to expect that the best database in which to
search would be the one where the search concepts occur in more documents having that class. The authors therefore compared, for one of the first set of five searches, EPODOC and WPI in terms of the number of patent families having a particular search concept in the most relevant search class, and also the frequency of occurrence of the search concepts. The search concepts also included synonyms in English (En), French (Fr) and German (De), which can be searched in EPODOC but not in WPI. Results were therefore given separately for the English occurrences in EPODOC and for occurrences in all three languages.
Another factor which needs to be taken into account is that, because EPODOC has a stricter definition of a patent family, as referred to earlier, there were many more EPODOC records in the CPC class than WPI records. To compensate for this, a normalisation was applied by multiplying the figures for EPODOC in all three search languages by the factor (no. of WPI records = 1290)/(no. of EPODOC records = 2846), which expresses the average difference in size of an EPODOC family compared to a WPI family for the CPC class used for the search, here B60R11/0235.
3.3. Comparison based on cited documents
In this section the ability to retrieve the relevant prior art documents in official and WPI abstracts is compared. The relevant prior art documents are defined here as those cited with category ‘X’ in the EPO search report, the so-called ‘X’ documents. In cases where no ‘X’ documents were found, ‘Y’ documents were also considered [13]. This is the crucial comparison because it shows whether using the enhanced abstracts would have made any difference to the outcome of the prior art search. It can therefore be considered a more objective or quantitative comparison than the above subjective or qualitative comparisons.
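The two family definitions of Section 2.3.6, and the record-count ratio behind the normalisation of Section 3.2, can be made concrete with a small Python sketch. The document identifiers and priorities below are hypothetical, and treating ‘at least one priority in common’ as a transitive relation (connected groups) is an assumption of this sketch, not a statement of how WPI builds its families.

```python
from collections import defaultdict


def strict_families(docs):
    """Group documents whose sets of priorities are identical (strict definition)."""
    groups = defaultdict(list)
    for doc_id, priorities in docs.items():
        groups[frozenset(priorities)].append(doc_id)
    return sorted(sorted(group) for group in groups.values())


def loose_families(docs):
    """Group documents linked by at least one shared priority.

    Assumption: linking is transitive, so A-B and B-C put A, B and C together.
    """
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # priority -> first document carrying it
    for doc_id, priorities in docs.items():
        for priority in priorities:
            if priority in first_seen:
                parent[find(doc_id)] = find(first_seen[priority])
            else:
                first_seen[priority] = doc_id

    groups = defaultdict(list)
    for doc_id in docs:
        groups[find(doc_id)].append(doc_id)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical documents and priorities, for illustration only.
docs = {
    "DOC-A": {"P1"},
    "DOC-B": {"P1", "P2"},
    "DOC-C": {"P2"},
    "DOC-D": {"P3"},
}
strict = strict_families(docs)
loose = loose_families(docs)
print(len(strict), "families under the strict definition:", strict)
print(len(loose), "families under the looser definition:", loose)
print("ratio loose/strict:", len(loose) / len(strict))
```

In this toy set the strict definition yields four families and the looser one two, so the ratio plays the same role as the factor 1290/2846 quoted for B60R11/0235.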
To make this comparison, the authors considered that the ‘X’ documents could have been found in the database if at least one of the search concepts was present in the abstract of the database in question. By contrast, if no search concept at all was present, the authors conclude that the document could not have been found in that database. Where available, abstracts in French and German were also considered in EPODOC.
For the five manual input search examples, a full-text search was also carried out. In some cases, ‘X’ documents which could not be found in the abstract databases were found by the full-text search. This provides a useful point of comparison for the recall of the abstract searches.
3.4. Automatic pre-search
It is becoming increasingly common, in an initial stage of the search, for the search concept definition to be performed not by manual input of the searcher but automatically, and then for a pre-search to be carried out automatically based on these concepts within a scope defined either by the classification of the file or the working range allocated to the examiner of the file. Indeed, such an automated pre-search is now carried out systematically at the EPO [14,15], and is also in use at the USPTO [16].
It is to be expected that the documents found depend to some extent on the database used for search. It is also clear that the text from which the concepts are extracted will have some effect on which concepts are selected: if an abstract containing no mention of ‘rotation’ or its synonyms is used to extract the concepts, then clearly no concept relating to rotation will be extracted. It is therefore useful to compare the results obtained from EPODOC and WPI for both of these processes, namely defining the search concepts and carrying out the search. For the automatic pre-search examples, apart from using the text of the EPODOC and WPI abstracts, the claims, the title or another text field, in particular the ‘Index Word’ /IW field in WPI, were considered for concept extraction. Concepts extracted from the first claim could be considered as a reference.
A third factor in determining the results, in addition to where the concepts are derived from and where the search takes place, is obviously how the concepts are extracted. An internal EPO computer application called XFR, developed by Alain Materne and used for the EPO official automatic pre-search, is used here to extract the features. An overview of the semantic process is shown in Fig. 6 for a claim. A similar strategy is used where the text is an abstract, a title, etc. This process generates the list of concepts from a piece of text, with stop words (common, non-indexed words such as ‘and’) used as concept delimiters. Each concept should contain no more than three technical keywords in proximity, so that the search strategy matches the capacity of the search engine and complies with facet ranking. More than three keywords in a concept trigger a concept splitting routine. See, for instance, EP2845015, where a sequence of five English keywords (scanning electrochemical microscopy probe tip) is separated into two sets of two (scanning electrochemical) and three keywords (microscopy probe tip), assuming that adjectives occur before nouns in English.
A claim tree process as known from Espacenet (see Fig. 7, which relates to patent EP1) is applied to the English translation of the set of claims, resulting in a tree view (Fig. 8) showing the concepts extracted from each claim. This feature extraction algorithm provides the user with a comparison of the dependent claims with the first claim. The presence of concepts in the dependent claims as well (shown by a light-green dot in Fig. 8, e.g. ‘transfer zone’ in claims 2–4) provides additional information on their importance, which may be used, for instance, to determine whether these concepts should be set as ‘musts’. In this case, the first claim extraction was finally retained to feed the left pane of the Graphical User Interface (Fig. 9), see Mase [17].
A general class scope is set when the examiner has taken no action. A more specific class scope becomes available when the current application is allocated a classification by the examiner, thus replacing the broad class scope. A narrow class scope is also set where classification information is available, as is the case for applications which have already been published before search.
‘Must’ and ‘facet’ categories of search concepts are extracted. The must concepts are combined by ‘AND’, then each facet concept is combined by ‘OR’ with the musts. Finally, these sets are ranked using ‘facet’ ranking (see Iyer [18]) so that the results with all of the musts and most of the facets appear first. The ‘position’ of the documents refers to their position in the results after ranking. The procedure is described in more detail in Horváth and Sleightholme [19]. For a more general discussion of ranking, please refer to the authors' earlier paper [10].
4. Manual input search results
4.1. Subjective impressions
Fig. 10 shows a screenshot from the Viewer used by EPO examiners (described in more detail in Ref. [20]) of a document found during a search for a swivel mount for a vehicle headrest display. First it is necessary to explain the screen layout. The screen is divided into two parts: the left-hand side shows the EPODOC record for the document and the right-hand side shows the WPI record for the same document. The dark vertical bar at the right-hand side of each part is called the ‘Visual indicator for navigation’ or ‘highlight bar’ and it shows where in the record the highlighted terms occur. The highlighted terms are the concepts used in the search. Each column in the highlight bar represents one search concept. This is a very useful means of seeing at a glance which search concepts are present, how frequently they occur and their proximity to each other. In the first example, eight concepts were searched and so each highlight bar has eight colours, not all of which are necessarily present in each record. In Fig. 10, the entire EPODOC record is visible on the left so there is no scroll bar, and the yellow and dark green dashes in the highlight bar mark where the yellow (display) and dark green (seat) concepts occur in the record. Not all of the WPI record is visible so there is a scroll bar, the length of which corresponds to the area visible. It can readily be seen, by the number of colours present, that more concepts are contained in the WPI record than in the EPODOC record.
Fig. 11 shows a further sample document. Notice that even the title alone in WPI has four of the searched concepts, whereas the subheading ‘Novelty’ has five. There is no mention of the ‘swivel/pivot’ concept in the official abstract.
Fig. 12 relates to the second search example, for an electrically-actuated motorcycle steering lock, in which seven search concepts were used. Some search synonyms were missed (‘motor cycle’ as two words in English and ‘profil’ in French), which explains why only two concepts were found in those languages compared with three in German.
The third search example was for adaptive street lighting. An important feature was that the lighting reacts differently to pedestrians, cyclists and cars. In total nine search concepts were used. Fig. 13 illustrates the importance for search of the ‘Use’ subheading in WPI. The ‘Use’ and ‘Advantage’ subheadings frequently relate to the problem which the invention solves, which is rarely mentioned in the official abstract.
Fig. 14 shows a document found during the fourth search example, which related to a secondary battery for starting a vehicle engine in case the main battery is flat. Four search concepts were used. Note that the official abstract is very broad, and even the important concept ‘engine start’ is missing.
The fifth search example relates to remote keyless entry for a vehicle, in which ‘sensitivity’ was a key concept for the search. There were three further search concepts. In the document shown in Fig. 15, the EPODOC abstract mentioned this concept, whereas the WPI abstract did not.
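Returning to the automatic pre-search of Section 3.4, two of its steps can be sketched in a few lines of Python: concept extraction with stop words as delimiters and a three-keyword limit, and must/facet ranking. This is not the XFR implementation. The stop-word list, the right-to-left splitting rule (chosen so that the five-keyword example splits into two plus three), the treatment of punctuation as a delimiter, the reading of facets as optional terms, and the use of plain substring matching are all assumptions of this sketch.

```python
import re

# Hypothetical stop-word list; a real list would be far longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "with", "to", "in",
              "on", "is", "are", "or", "by", "said", "wherein", "comprising"}
MAX_KEYWORDS = 3


def split_concept(keywords, max_len=MAX_KEYWORDS):
    """Split a keyword run from the right, keeping trailing (noun-like) words together."""
    chunks = []
    end = len(keywords)
    while end > 0:
        start = max(0, end - max_len)
        chunks.insert(0, keywords[start:end])
        end = start
    return chunks


def extract_concepts(text):
    """Return concepts of at most three keywords, delimited by stop words or punctuation."""
    concepts, current = [], []
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*|[^\sa-z0-9]", text.lower())
    for token in tokens:
        if token in STOP_WORDS or not token[0].isalnum():
            concepts.extend(split_concept(current))
            current = []
        else:
            current.append(token)
    concepts.extend(split_concept(current))
    return [" ".join(chunk) for chunk in concepts]


def facet_rank(docs, musts, facets):
    """Keep documents containing every must; order them by number of facets present."""
    hits = []
    for doc_id, text in docs.items():
        lowered = text.lower()
        if all(must in lowered for must in musts):
            score = sum(1 for facet in facets if facet in lowered)
            hits.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


claim = "A scanning electrochemical microscopy probe tip for the transfer zone"
print(extract_concepts(claim))

# Invented abstracts, for illustration only.
abstracts = {
    "D1": "A display mounted on a seat with a pivot and a bezel",
    "D2": "A display for a seat",
    "D3": "A pivot for a door",
}
print(facet_rank(abstracts, musts=["display", "seat"], facets=["pivot", "bezel"]))
```

The ranking step places documents with all musts and the most facets first, which is the ordering the text describes; the position of a document in that list is what the later sections refer to as its ‘position’.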
Fig. 12. Example 2, document WO2012089547.
Fig. 13. Example 3, document EP2271184.
Fig. 14. Example 4, document EP0583630.
Fig. 15. Example 5, document US2005237220.
4.2. CPC class overview
Example 1, a swivel mount for a vehicle headrest display, was used. The most relevant CPC class was B60R11/0235. The results are shown in Table 2. The column headed ‘EPODOC*’ relates to the normalised values, compensating for the different definitions of a patent family, as described earlier.
Table 2 Example 1, a swivel mount for a vehicle headrest display.
Looking, for example, at the search concept ‘fracture’, fourteen records were retrieved using the English synonyms in EPODOC, and 40 records were retrieved using English, French and German synonyms. After normalisation, the 40 records become 18 normalised records; in other words, the 40 EPODOC families correspond to 18 WPI families. By comparison, 24 records were found in WPI using the English synonyms. The figures for ‘occurrences’ are higher than those for ‘records’ because each record might contain the search terms more than once. The 40 records found in EPODOC contain 75 occurrences of the trilingual search terms, which become 34 normalised occurrences. This compares with 54 occurrences of the English search terms in the 24 WPI records.
The normalised results are considered by the authors to provide a fairer basis for comparison, because normalisation helps to compare families with families, rather than individual documents (EPODOC) with families (WPI). The trilingual EPODOC search results are used for the comparison, even though this tends to favour EPODOC over WPI, because this possibility for search exists. The results are shown graphically in Figs. 16 and 17.
Fig. 16. Comparison of families (records) retrieved.
Fig. 17. Comparison of occurrences of search concepts.
Fig. 18. Graphical view of occurrences of search terms for WO documents in B60R11/0235.
Fig. 18 shows the comparison of the occurrences of the search concepts in a different way. In this figure, the 345 WO records in the CPC class are shown as vertical bars, each record representing a very thin horizontal slice of the bar. The fact that the colours are more intense in the WPI bar indicates that the concepts occur more frequently in WPI.
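The normalisation applied to the ‘fracture’ figures in this section can be checked with a few lines of Python. The record counts and the EPODOC figures are those quoted in the text; the function name and the rounding to whole records are choices made for this sketch.

```python
WPI_RECORDS = 1290      # WPI records in B60R11/0235, as quoted in the text
EPODOC_RECORDS = 2846   # EPODOC records in the same class


def normalise(epodoc_count):
    """Scale an EPODOC figure by (no. of WPI records)/(no. of EPODOC records)."""
    return epodoc_count * WPI_RECORDS / EPODOC_RECORDS


normalised_records = normalise(40)      # trilingual EPODOC records for 'fracture'
normalised_occurrences = normalise(75)  # trilingual EPODOC occurrences

print(round(normalised_records))       # 18, to be compared with 24 WPI records
print(round(normalised_occurrences))   # 34, to be compared with 54 WPI occurrences
print(round(EPODOC_RECORDS / WPI_RECORDS, 2))  # 2.21 EPODOC records per WPI record
```

After normalisation, both the record count and the occurrence count for this concept are lower in EPODOC than in WPI, even with French and German synonyms included.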
4.3. Comparison based on cited documents
Table 3 below shows the results of such a comparison for the second search example. In this example, one X-document was cited in the search report, WO2012089547. Its EPODOC abstract contained three of the seven search concepts (including synonyms in all three official languages). The high number of occurrences (15) is explained by the fact that abstracts in English, French and German were present. The WPI abstract contained four of the seven concepts and eleven occurrences of the search terms even though there is only one language. The column showing occurrences in the full text is added for interest. The fact that at least one concept was present in both EPODOC and WPI abstracts means that it would have been possible to find the one X-document in both databases.
Table 3 Example 2.
| Search concept | EPODOC: WO2012089547 (De/En/Fr) | WPI: WO2012089547 | Full text: WO2012089547 |
|---|---|---|---|
| Angle | 0 | 0 | 0 |
| Switch | 12 | 4 | 85 |
| Motorcycle | 1 | 3 | 12 |
| Electric | 0 | 1 | 7 |
| Cam | 2 | 3 | 16 |
| Stop | 0 | 0 | 15 |
| Engine start | 0 | 0 | 2 |
| No. concepts present | 3 | 4 | 6 |
| No. occurrences | 15 | 11 | 137 |
All five examples were considered in this way, and the results are shown in Table 4 below. The results can be summarised as follows:
In Search 1, no ‘X’ documents were cited in the search report.
In Searches 2 and 5, all relevant documents were found in both EPODOC and WPI.
In Search 3, WPI found both of the relevant documents; EPODOC found neither.
In Search 4, both EPODOC and WPI found three of the four relevant documents.
Table 4 Search number vs. number of ‘X’ documents in EPODOC, WPI and full-text databases.
| Example no. | EPODOC | WPI | Full text |
|---|---|---|---|
| Example 1 | 0 | 0 | 0 |
| Example 2 | 1 | 1 | 1 |
| Example 3 | 0 | 2 | 2 |
| Example 4 | 3 | 3 | 4 |
| Example 5 | 2 | 2 | 2 |
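The criterion used for this comparison (a cited document counts as findable in a database if at least one search concept occurs in its record there) can be applied to the Table 3 counts with a short Python sketch. The counts are taken from Table 3; the dictionary layout and the function name are assumptions of this sketch.

```python
# Occurrences of each search concept for WO2012089547, from Table 3.
CONCEPTS = ["angle", "switch", "motorcycle", "electric", "cam", "stop", "engine start"]
TABLE_3 = {
    "EPODOC":    [0, 12, 1, 0, 2, 0, 0],
    "WPI":       [0, 4, 3, 1, 3, 0, 0],
    "Full text": [0, 85, 12, 7, 16, 15, 2],
}


def summarise(counts):
    """Return (concepts present, total occurrences, findable under the criterion)."""
    present = sum(1 for n in counts if n > 0)
    return present, sum(counts), present >= 1


summary = {database: summarise(counts) for database, counts in TABLE_3.items()}
for database, (present, total, findable) in summary.items():
    print(f"{database}: {present} concepts, {total} occurrences, findable={findable}")

# Concepts present in the WPI abstract but absent from the EPODOC abstracts.
only_in_wpi = [
    concept
    for concept, epodoc, wpi in zip(CONCEPTS, TABLE_3["EPODOC"], TABLE_3["WPI"])
    if wpi > 0 and epodoc == 0
]
print(only_in_wpi)
```

The sketch reproduces the last two rows of Table 3 and shows that, for this document, ‘electric’ is the one concept that a WPI query would match and an EPODOC query would not.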
Fig. 19. Example 6, the colours showing the eleven search concepts automatically extracted from claim 1. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
4.4. Discussion of results
The subjective comparison of different documents found during the search seems to confirm the view that the enhanced abstracts are more appealing. In general, the features stand out more in the WPI abstracts than in the EPODOC official abstracts, both because in many cases more concepts are present and because the occurrence of the individual search terms is often higher. The results also suggest that searching in a subsection of the WPI abstract, especially ‘Advantage’, ‘Novelty’ and ‘Use’, might achieve high precision.
The CPC comparison shows that more records can be retrieved in WPI than in EPODOC and corroborates the finding of the subjective comparison that the occurrence of the search terms is higher. These results certainly lead one to assume that the probability of finding the best documents is higher in WPI than in EPODOC, but it cannot be shown just on the basis of these results that the documents actually cited in the search report as relevant for novelty or inventive step would not have been found by both EPODOC and WPI.
The comparison based on the cited documents is the most interesting one, and the authors are not aware of any previous comparison of its kind. Amongst the five examples, the most striking difference between EPODOC and WPI is Example 3: had the examiner based the search only on the official abstracts, he/she would have concluded that the application being considered was new and inventive, whereas by using WPI he/she would have found at least two documents relevant for novelty or inventive step simply by searching in the WPI abstracts.
However, the results provide more information than whether any relevant documents could have been found at all. It is also possible to make some, albeit qualitative, deductions relating to the time required for the search. In Example 3, for instance, after a WPI search there would have been no need for the examiner to search in the full-text databases, of which there are currently 61 at the EPO. The time saving is not mainly in computer processing time but in avoiding the need to look at irrelevant documents found during an inherently more ‘noisy’ (less precise) full-text search.
However, with no WPI search,
Fig. 20. EPODOC and WPI abstracts for DE19522508 and occurrences of search concepts (sixth example).
A. Materne, G. Sleightholme / World Patent Information 43 (2015) 25e49
Fig. 21. Comparison of concept extraction, EPODOC (left) and WPI (right) abstracts.
since the search in EPODOC yielded no X-documents, the examiner might have decided to continue the search in full text, where the cited documents would have been found, but with considerable extra time and effort. If every query had a direct cost, there would clearly be a cost penalty as well as a time penalty. The results of the cited document comparison also suggest another reason why further time (and cost) savings would ensue from searching in the enhanced abstracts compared to the official abstracts, although it is difficult to quantify the savings. Considering
Example 2, the EPODOC abstract of the single ‘X’ document contained three of the seven search concepts and so it could have been found by an EPODOC search. The WPI abstract contained four of the seven concepts and eleven occurrences of the search terms even though it is in only one language. This would make the document easier to find in WPI: for example, if the searcher had begun by searching for the concept ‘electric’, the document would already have been found in WPI but not in EPODOC.
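The kind of count used in this comparison (how many of the search concepts are present in an abstract, and how often the search terms occur) can be illustrated with a minimal Python sketch. The function name `concept_stats`, the concept dictionary and both abstract texts are hypothetical and are not taken from the examples of this article; prefix matching is assumed here as a rough stand-in for a right-truncated search term.

```python
import re

def concept_stats(abstract, concepts):
    """Return (concepts present, total term occurrences) for one abstract.

    `concepts` maps a concept name to a list of search terms (synonyms).
    Matching is case-insensitive and on word prefixes, roughly like a
    right-truncated term in a Boolean query.
    """
    text = abstract.lower()
    present = 0
    total = 0
    for terms in concepts.values():
        hits = sum(
            len(re.findall(r"\b" + re.escape(term.lower()), text))
            for term in terms
        )
        if hits:
            present += 1
        total += hits
    return present, total

# Hypothetical concepts and abstracts, for illustration only.
concepts = {
    "electric": ["electric"],
    "motor": ["motor", "drive"],
    "gear": ["gear"],
}
official = "A drive unit with a gear stage."
enhanced = (
    "NOVELTY - An electric motor drives a gear. "
    "USE - Electric vehicles. "
    "ADVANTAGE - The electrically driven gear is compact."
)

print("official:", concept_stats(official, concepts))
print("enhanced:", concept_stats(enhanced, concepts))
```

With these invented texts, the structured abstract both covers more concepts and repeats the terms more often, which is the pattern reported for the examples studied.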
Fig. 22. Comparison of search strategies, seventh example.
Fig. 23. Comparison of concept extraction, WPI title, index words and abstract.
The CPC class overview and the subjective comparison showed that the occurrences of the search concepts were higher in WPI than in EPODOC for the examples studied. Since many ranking algorithms use the number of occurrences of a search term to determine the position of the document in the results list, the higher occurrences in the enhanced abstracts would be likely to lead to the relevant documents being higher in the list of results than if the search had been carried out in the official abstracts. A
further time saving would therefore result from finding the document at position three (for instance) in the results list instead of (say) position 97.
5. Automatic pre-search results
As a general comment, even though numerous full text databases cover a wealth of the patent knowledge available nowadays, the
Fig. 24. Comparison of search strategies, eighth example.
Fig. 25. Ninth example, comparison of concept extraction from WPI and first claim.
Fig. 26. Comparison of search strategies, ninth example.
above results illustrate that it is worthwhile to consider pre-searching the EPODOC or WPI abstracts for reasons of execution speed.
5.1. Sixth search example
Example six relates to a vehicle steering lock of the type in which an inclined face moves the lock member into and out of engagement with the steering shaft. An important feature was the position of the spring (“biasing member”). For this example the concept extraction was based only on the first claim since no
abstracts were available at the time of the search. As can be seen from Fig. 19, eleven different concepts were extracted, of which the concept “steering lock” was considered a “must”. The automatic pre-search using these concepts found no document relevant for novelty in the first ten ranked results from a search in EPODOC, but one document in seventh place from a search in WPI. The EPODOC and WPI abstracts of this found document, as used for the pre-search, are shown in Fig. 20, together with a table showing the occurrences of the search concepts in DE19522508.
Fig. 27. Comparison of concept extraction, EPODOC abstract, WPI abstract and claim 1.
Fig. 28. Comparison of search strategies, tenth example.
This example is interesting because it shows that even though the EPODOC and WPI abstracts contain the same concepts (in fact, in this particular case the texts of the abstracts are identical, there having been no official English-language abstract), the concepts occur more often in the WPI record, especially because of the title (/TI) and index word (/IW) fields, which by default are searched with the abstract. This is thought to give the document a higher rank, with the result that it could be found more easily. (This document was ranked in eighteenth position after the EPODOC search.)
5.2. Seventh search example
Example seven (Fig. 21) concerns a privacy screen for a
computer display which uses louvres to prevent others from seeing the display, published as EP 0 599 451 A1. This is the example considered by Schwander [8]. Fig. 22 compares the concepts extracted for search from the EPODOC abstract, the WPI abstract and the WPI title, ‘p’ being an operator requiring that the terms following be in the same paragraph. Four relevant documents were cited in the search report (as category ‘Y’) for this example, GB2161983 (D1), US4812709 (D2) and US4788094 (D3) being the most important. D1 was only found (in 79th position) using the combination of WPI abstract concepts and WPI database search. This combination also found D2 and D3 in positions 2 and 3 respectively and turned
Fig. 29. Comparison of WPI abstract with first and second embodiments from the full text.
Fig. 30. Tenth example, HTML page comparing positions of documents found by WPI (centre frame) and EPODOC (right frame).
out to be the most effective strategy for this example. Using EPODOC abstract concepts, the search results in EPODOC and WPI were similar, D2 and D3 being found in first and second places in WPI and first and third places in EPODOC. These combinations also found the fourth document, US4427264 (D4). The combinations of WPI title concepts with each of EPODOC and WPI search found D2 and D3 in first and second places searching in WPI and in second and fourth places searching in EPODOC. By comparison, strategies based on first claim concepts found D2 (78th place), D3 (83rd place) and D4 (25th place) in WPI but only D2 in EPODOC, in eleventh place. Based on the ranking criteria as selected in Ref. [10] with regard to the occurrences of search concepts calculated to derive a TF-IDF score, using concepts extracted from the first claim therefore performed least well in this example.
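To make the link between term occurrences and rank position concrete, the following Python sketch ranks a few abstracts by a plain TF-IDF sum. It is only an illustration under stated assumptions: the smoothed IDF formula, the tokeniser, the function name `rank_tfidf` and the three sample abstracts are all chosen here for the example and do not reproduce the ranking criteria of Ref. [10].

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case the text and strip simple punctuation from each word."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]

def rank_tfidf(docs, query_terms):
    """Rank documents by a plain TF-IDF sum over the query terms.

    docs: dict mapping a document id to its abstract text.
    query_terms: lower-case search terms.
    Returns a list of (doc_id, score), best first.
    """
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, doc_counts in counts.items():
        score = 0.0
        for term in query_terms:
            # Document frequency: number of abstracts containing the term.
            df = sum(1 for c in counts.values() if c[term] > 0)
            if df == 0:
                continue
            # Smoothed IDF (an assumption; other variants exist).
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0
            score += doc_counts[term] * idf
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical abstracts, for illustration only.
docs = {
    "A": "Louvre screen for a display. The louvre film limits "
         "the viewing angle of the display.",
    "B": "Screen for a display.",
    "C": "Lamp housing.",
}
ranked = rank_tfidf(docs, ["louvre", "display", "screen"])
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

In this toy collection the abstract that repeats the search terms more often is ranked first, which is the effect assumed in the discussion of occurrences above.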
5.3. Eighth search example
The eighth example relates to a process for the manufacture of transistors, published as EP1494274 A2 (see Fig. 23). Three documents, D1 to D3, were cited in the search report as category ‘X’, D1 having the most features of the claim. D1 (US2002117685) misses only wet etching in /BI. D2 (US2002153534) misses the phosphide component, the ion implantation and the wet etching process. D3 (US2003045112) misses the indium phosphide based bipolar transistor heterostructure and the wet etching process. Fig. 24 summarises the results for this example. None of the combinations tested found either D2 or D3, and D1 was only found in combinations involving a search in WPI, not EPODOC. The combinations of searching in WPI using concepts derived from the WPI index words field (/IW), the WPI abstract or the WPI title all found D1, in positions 15, 35 and 22 respectively. It can therefore be said that the combination of WPI IW concepts and WPI database search was the most powerful strategy, since it enabled the best document to be found most quickly.
5.4. Ninth search example
The ninth example relates to an aircraft electrical power connector, published as WO2007/106739. Two documents can be considered as relevant search results, US5647751 (D1) and US5342213 (D2), D1 being more important. Concepts extracted from the ‘Novelty’ subheading in the WPI abstract were very similar to those extracted from the first claim even though the claim was very much longer (see Fig. 25). In this example the combination of concepts extracted from the EPODOC abstract with a search in WPI was most effective, finding D1 at ninth position. Fig. 26 summarises the results.
Table 5
Pre-search number vs. number of ‘X’ documents in EPODOC and WPI.

Example no.       Number of ‘X’ documents
                  EPODOC    WPI
Pre-search 6      1         1
Pre-search 7      2         3
Pre-search 8      0         1
Pre-search 9      0         1
Pre-search 10     1         1
5.5. Tenth search example
The tenth example (Fig. 27) concerns an intra-aortic (IA) balloon
Fig. 31. Confidence indicator for patent and non-patent databases.
Fig. 32. Results of Google prior art finder.
pumping catheter, published as EP1598090 A1. One document, US2001022415 (D1), was cited as ‘X’ by the examiner. Using EPODOC abstract concepts, this document was found in both the WPI and EPODOC databases at positions 93 and 75 respectively. Using WPI abstract concepts, D1 was also found in both WPI and EPODOC, in this case at third and first positions respectively. Exactly the same result was achieved using first claim concepts (see Fig. 28). Thus, although searches based on the EPODOC abstract concepts did find D1, it was lower in the list of results compared to strategies
based on the WPI abstract or the first claim. This result is somewhat surprising because the wording of the EPODOC abstract is usually considered to be very close to the first claim. The difference in performance appears to have been due to the first claim concepts having ‘intra-aortic’ and ‘pumping catheter’ as separate concepts, whereas the EPODOC concepts had ‘intra-aortic pumping’ as a single concept. A notable feature of this example, which perhaps helps to explain why the ‘X’ document was found best using the WPI abstract, is that the WPI abstract corresponds closely to both the first and second embodiments in the full text of the ‘X’ document. This is
shown in Fig. 29 by the fact that the pattern of the colours in the highlight bars is similar. The two embodiments displayed in full text reflect the pivot effect obtained at ranking with WPI abstracts as described in Ref. [10]. The comparison search results given in the comments of Fig. 28 are derived from a parallel WPI/EPODOC output as the HTML page in Fig. 30 shows. The left frame indicates the starting document and the search strategy as extracted from the Derwent WPI abstract. The middle and right frames deliver facet ranking results with scoring values from WPI and EPODOC respectively. The documents were grouped by facet levels and received subheading comments according to their relevancy.
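The grouping of results by facet level can be pictured with the short Python sketch below. It rests on an assumption made only for this illustration: a document must contain every ‘must’ concept, and its facet level is simply the number of facet concepts it contains. The function name `facet_levels` and the sample texts are hypothetical, and the method actually used (see Ref. [10]) may differ.

```python
def facet_levels(docs, musts, facets):
    """Group documents by the number of facet concepts they contain.

    Documents lacking any 'must' concept are discarded. Returns a dict
    {level: [doc_id, ...]} ordered from the highest level to the lowest.
    Matching is a plain case-insensitive substring test (an assumption).
    """
    levels = {}
    for doc_id, text in docs.items():
        lowered = text.lower()
        if not all(must in lowered for must in musts):
            continue
        level = sum(1 for facet in facets if facet in lowered)
        levels.setdefault(level, []).append(doc_id)
    return dict(sorted(levels.items(), reverse=True))

# Hypothetical abstracts, for illustration only.
docs = {
    "D1": "Steering lock with a spring biasing the lock member "
          "against an inclined face.",
    "D2": "Steering lock with a bolt.",
    "D3": "Door lock with a spring.",
}
grouped = facet_levels(
    docs,
    musts=["steering lock"],
    facets=["spring", "inclined face", "lock member"],
)
for level, doc_ids in grouped.items():
    print(level, doc_ids)
```

Here D3 is dropped because it lacks the ‘must’ concept, while D1 is placed in a higher facet level than D2.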
5.6. Discussion of results
To summarise which was the best source to use for the automatic concept extraction: WPI was best in three of the four examples (7: abstract; 8: index words; 10: abstract) whilst EPODOC was best in one example (9: abstract). Simply based on this very limited number of examples, then, it turned out that WPI was the best source for extraction of the search concepts automatically. Although the /IW field performed slightly better than the abstract in example 8, the relevant document would also have been found quite efficiently in a pre-search based on WPI abstract concepts.
Considering only the question of which was the best database in which to carry out the search (i.e. ignoring which was the best text to use for the concepts), the results from the automatic pre-search examples can be presented in a way similar to that for the manual input search examples, as in Table 5. The number of documents cited as ‘X’ in the search report refers to the number which could have been found in that database by looking at the entire automatic search result. In other words, how quickly the documents would have been found (based on their position after ranking) is ignored here. To summarise the results of this table:
- In pre-search 6, both EPODOC and WPI found one relevant document.
- In pre-search 7, WPI found three of the four relevant documents and EPODOC found two of them.
- In pre-search 8, WPI found one relevant document; EPODOC found none.
- In pre-search 9, WPI found one relevant document; EPODOC found neither.
- In pre-search 10, WPI and EPODOC both found the single relevant document.
Thus, in two of the above five examples, no relevant document would have been found automatically using EPODOC strategies. What this means in practice is that in all five cases, carrying out the automatic pre-search in WPI would have indicated to the examiner that the independent claim was probably not new or at least not inventive. The examiner could then, for example, have concentrated the further search on dependent claims, with an obvious benefit in efficiency. Had the automatic pre-search been carried out in EPODOC, this would have been true for only three of the five cases.
The speed of automatic pre-search depends on the number of extractions performed in each selected database. In the last pre-search example, 100 documents were extracted from both WPI and EPODOC within 1 min 39 s. Extracting 10 documents instead would reduce the run-time to 28 s. Hence a time saving ensues after ranking as fewer documents need to be seen.
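The tally behind the "three of five" and "five of five" statements can be checked with a few lines of Python. The counts are those of Table 5; the variable and function names are chosen here for illustration only.

```python
# Number of cited documents retrievable per database, as given in Table 5.
found = {
    6: {"EPODOC": 1, "WPI": 1},
    7: {"EPODOC": 2, "WPI": 3},
    8: {"EPODOC": 0, "WPI": 1},
    9: {"EPODOC": 0, "WPI": 1},
    10: {"EPODOC": 1, "WPI": 1},
}

def examples_with_hit(table, database):
    """Return the pre-search examples in which the database found
    at least one cited document."""
    return [example for example, row in table.items() if row[database] > 0]

epodoc_hits = examples_with_hit(found, "EPODOC")
wpi_hits = examples_with_hit(found, "WPI")

print("EPODOC:", len(epodoc_hits), "of", len(found), "examples", epodoc_hits)
print("WPI:   ", len(wpi_hits), "of", len(found), "examples", wpi_hits)
```

As in Table 5, this tally ignores the rank position at which each document was found.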
6. Confidence indicator
Fig. 30, relating to the tenth search example, compares the performance of the databases WPI and EPODOC by showing where the relevant results appeared and how many results were found with the search concepts. As such it provides a ‘confidence indicator’ for the searcher about the likelihood of finding relevant results in a particular database and therefore its usefulness for search. This confidence indicator can be extended to the comparison of any abstract and full text databases available at the EPO, including non-patent databases. Such a comparison is useful for non-patent databases because, unlike patent databases, the best database to use depends on the field of search.
In the example shown in Fig. 31, the confidence indicator has been applied to the tenth example (the intra-aortic balloon pumping catheter, see also [10]). The leftmost column shows the input record including the search concepts (musts and facets), and further columns from left to right show the results from WPI, BIOSIS, MEDLINE and INSPEC respectively. (The selection of databases is made according to the field in which the examiner works.) The number of search concepts found (musts and facets) gives a hint of the search precision. (The recall is assessed by comparing with the search report ‘X’ citations, here a single patent document.) The process above took less than 1 min to present 10 results in each database. We could thus conclude that the choice of BIOSIS for NPL abstract search is as relevant as WPI with respect to the intra-aortic balloon pumping catheter search strategy. Other NPL abstract databases such as COMPENDEX, EMBASE, FSTA, INSPEC or MEDLINE did not perform as well for that particular strategy.
7. Google prior art finder
It is interesting to mention in the context of automatic pre-search the Google prior art finder, which is available free of charge on the internet [21]. The results shown in Fig. 32 were obtained by typing ‘EP1598090’ (cf.
example ten) into the input window. The search concepts seem to have been extracted from the EPODOC (official) title.
8. Conclusions
The comparison of EPODOC and WPI just presented considered:
- not just whether the document was found or not, but how quickly;
- which database is best for defining search concepts;
- which database is best for carrying out the search.
Although this study has presented only a limited number of examples, it is believed that this is the first time such a comparison has been published. The method used allows an objective comparison, and this method could be used on a larger number of searches for a more statistically significant analysis.
The study has shown that both EPODOC and WPI:
- offer worldwide coverage from all major patent offices;
- are updated frequently.
EPODOC has the following advantages:
- CPC;
- limited searching in German and French;
- suitable for finding similar applications from the same applicant, with EPODOC titles.
Based on the limited number of examples presented here, WPI, on the other hand, has the following advantages:
- finds prior art more quickly: the higher occurrence of multiple concepts is thought to lead to the best documents being ranked more highly in the results list, meaning fewer documents need to be viewed; because the abstract is more specific than EPODOC and often covers the problem, the best documents can more often be found without needing to search further in the full text;
- more suited to defining search concepts, especially automatically;
- using the title and particular subheadings might offer further advantages.
In the future it is possible that the EPODOC and WPI abstracts will be provided in a single (virtual) database at the EPO for examiners. This should allow the exploitation of the advantages of both databases more easily.
Disclaimer
Any opinions expressed are those of the authors and not necessarily those of the European Patent Office.
Acknowledgements
This article is based on a workshop presented by the authors at the Seminar on Search Matters held 3–4 April 2014 at the EPO in The Netherlands. Special thanks go to the organisation team of Search Matters, without whom this publication would not have been possible. Further thanks are due to Luc Krembel and Gérard Vogt-Schilb.
References
[1] See http://www.epo.org/searching/subscription/raw/product-14-7.html.
[2] www.epo.org/gpi.
[3] See http://worldwide.espacenet.com/help?locale=en_EP&method=handleHelpTopic&topic=coverageww. Note: JP abstracts were incorporated into EPODOC in February 2005.
[4] http://thomsonreuters.com/content/dam/openweb/documents/pdf/intellectual-property/fact-sheet/derwent-world-patents-index.pdf.
[5] David Newton, The patent abstract – an unsung hero? World Pat. Inf. 28 (2006) 1–3. http://www.sciencedirect.com/science/article/pii/S0172219005001353.
[6] Stephen Adams, The text, the full text and nothing but the text: part 1 – standards for creating textual information in patent documents and general search implications, World Pat. Inf. 32 (1) (03-01-2010) 22–29. http://www.sciencedirect.com/science/article/pii/S0172219009000519.
[7] European Patent Convention, Rule 43(7), http://www.epo.org/law-practice/legal-texts/html/epc/2013/e/r43.html.
[8] P. Schwander, An evaluation of patent searching resources: comparing the professional and free on-line databases, World Pat. Inf. 22 (2000) 147–165, http://dx.doi.org/10.1016/S0172-2190(00)00045-4. XP004216447.
[9] California Institute of Technology, Caltech Library, Class Handouts Spring/Summer 2013, Patent Searching, http://library.caltech.edu/learning/classhandouts/PatentSearching.pdf.
[10] A. Materne, G. Sleightholme, Methods of ranking search results for searches based on multiple search concepts carried out in multiple databases, World Pat. Inf. 36 (2014) 4–15. http://www.sciencedirect.com/science/article/pii/S0172219013001178.
[11] A. Materne, G. Sleightholme, Workshop WS06, Patent search: abstracts published together with patent applications vs enhanced abstracts, Search Matters 2014, 3–4 April 2014, EPO, The Hague (NL). http://documents.epo.org/projects/babylon/acad.nsf/0/10213AB56C3211C5C1257C29004980CF/$File/Search%20Matters%20Abstracts%202014_web.pdf.
[12] A. Materne, G. Sleightholme, Workshop WS03, Related patent applications and patent thickets, Search Matters 2015, 5–6 March 2015, EPO, Munich (DE). http://www.epo.org/learning-events/events/conferences/search-matters/programme/workshop-abstracts.html.
[13] EPO Guidelines for Examination, Part B, Chapter X, Section 9.2.1, see http://www.epo.org/law-practice/legal-texts/html/guidelines/e/b_x_9_2_1.htm.
[14] D. Andlauer, Automatic pre-search: an overview, in Search Matters 2013, 18 March 2013, The Hague (NL). http://www.epo.org/learning-events/events/conferences/2013/search-matters/programme.html.
[15] EPO Guidelines for Examination, Part B, Chapter IV, Section 1, see http://www.epo.org/law-practice/legal-texts/html/guidelines/e/b_iv_1.htm.
[16] USPTO PLUS automated preliminary search, see e.g. slides 35–37 in http://www.uspto.gov/sites/default/files/documents/SUMMIT%20SLIDE%20SET%20FINAL%20pdf.pdf and also file wrapper information for e.g. US S/N 13478863, office action of 18.02.2014: ‘The Patent Linguistics Utility System (PLUS) is a USPTO automated search system for U.S. Patents from 1971 to the present. PLUS is a query-by-example search system which produces a list of patents that are most closely related linguistically to the application searched’.
[17] H. Mase, T. Matsubayashi, Y. Ogawa, et al., Proposal of two-stage patent retrieval method considering the claim structure, ACM Trans. 4 (2) (2005). XP055052603.
[18] H. Iyer, Online searching: use of classificatory structures, Adv. Knowl. Organ. 2 (1991) 159–167. XP646150.
[19] R. Horváth, G. Sleightholme, FACET – an extended boolean search method, 16th Seminar on Search and Documentation Working Methods, 29 March – 1 April 2004, The Hague, The Netherlands.
[20] A. Nuyts, in: EPOQUE Search and Viewer Tools at the European Patent Office in Proceedings of the International Chemical Information Conference, 22 October 2000, 2000, pp. 47–56. XP009003259.
[21] Google prior art finder is available at: http://www.google.com/patents/related.
Alain Materne holds a degree in Electronic Engineering from the ENS of Electronic Applications of Cergy-Pontoise and a diploma thesis from the TU Berlin. He worked in the electronic industry at Wandel & Goltermann in Reutlingen prior to joining the European Patent Office in 1988. He currently works in the field of audio video multimedia and develops various ranking scripts in ooRexx for retrieving related files and prior art.
Gershom Sleightholme holds a degree in Mechanical Engineering from the University of Melbourne and a doctorate from the University of Cambridge. He worked for several years in the steel industry, at Jaguar Cars and in the British civil service prior to joining the European Patent Office in 1996. He currently works in the field of vehicle technology.