World Patent Information, Vol. 3, No. 2, pp. 73-78, 1981. Printed in Great Britain.
0172-2190/81/020073-06 $02.00/0
Pergamon International Information Corp.
© 1981 CEC/WIPO
Statistical Analysis as an Aid to the Revision of the International Patent Classification
P. A. Higham, Head, Developing Countries Section, Classifications and Patent Information Division, World Intellectual Property Organization, Geneva
Summary
This article reports recent work by the International Bureau of WIPO to determine whether statistical data, available from the International Patent Documentation Center (INPADOC), concerning the application of symbols of the International Patent Classification (IPC) to patent documents can be used as an aid to identifying areas of the IPC in need of revision. The results of two studies are given which highlight significant differences in the application of the IPC at both the section (or high) level and the subgroup (or low) level.
Introduction
The International Patent Classification (IPC)* is a 12-year-old classification system, which is revised at regular intervals. Since the appearance of the first edition, in 1968, the IPC has undergone two revisions, which resulted in a second edition in 1974 and a third edition in 1979 (the third edition became effective on January 1, 1980). At present a further revision is underway which will result in the fourth edition, planned to be published in 1984.
Revision of the IPC
The revision of the IPC is provided for in the Strasbourg Agreement Concerning the International Patent Classification of 1971 (referred to in Articles 2(1)(a)(ii) and (iii), 5(5) and 6 of the Agreement).
The revision work is, as stated in Article 5(5) of the Strasbourg Agreement, carried out on the basis of proposals for amendment made by the competent authorities of the member States of the IPC Union and by the European Patent Office (EPO) represented in the Committee of Experts.
Prior to the entry into force of the Strasbourg Agreement in October 1975, the revision of the IPC was the responsibility of the Joint Ad Hoc Committee of the Council of Europe and the International Bureau of WIPO.
The Committee of Experts of the Special Union of the IPC (the IPC Union), established under the Strasbourg Agreement, approves during each revision period amendments to the IPC in both of its authentic language versions (English and French). At the end of a revision period, a new edition of the IPC enters into force six months after the notification of those amendments by the International Bureau to the competent authorities of the IPC Union.
The aim of revising the IPC is to keep it abreast of developments in technology and thus enable it to fulfil its stated purposes. These purposes, as defined by the IPC Committee of Experts, are:
(i) as the primary purpose, the IPC ought to be an effective search tool for the retrieval of relevant patent documents by Patent Offices and other users to establish the novelty and evaluate the inventive step (including the assessment of technological advance and useful results or utility) of patent applications;
(ii) as other purposes (equally important to developing and developed countries), the IPC is to serve as:
(a) an instrument for the orderly arrangement of patent documents in order to facilitate access to the information contained therein,
(b) the basis for selective dissemination of information to all users of patent information, and
(c) a basis for the preparation of industrial property statistics which in turn permit the assessment of technological development in various areas.
Revision of the IPC as stated above may become necessary due to new developments in technology which might call for new entries to be introduced in the classification or for its rearrangement. Alternatively, the growth of file size in fields of high inventive activity results in less efficient searching and might thus call for further subdivision of existing entries.
*See also The International Patent Classification, P. Claus and B. Hansson, World Patent Information 2, (1980), No. 1, pp. 13-20.
The need to revise a given area of the IPC is felt in the course of its use by classifiers when classifying patent documents, or searchers when searching in files arranged according to the IPC. Most proposals emanate therefore from the Patent Offices of the major patent-issuing countries, e.g., Australia, Austria, Denmark, Finland, France, Germany (Federal Republic of), the Soviet Union, Sweden, Switzerland, the United Kingdom and the United States of America and from the EPO.
The amount of work necessary in each revision period requires the expenditure of very many man-hours by highly specialized technical staff of the said Offices. In addition, the amendments introduced in a given edition require that the users of the IPC (e.g., classifiers and searchers) familiarize themselves with the new edition, and those Offices which reclassify their search files according to a new edition of the IPC are faced with an important reclassification task.
Table 1 gives a general picture of the magnitude of the revision work carried out during the first and second revision periods. The extent of revision, however, varies from subclass to subclass and cannot easily be deduced from this table.
Table 2 gives a picture of the increase in the number of groups (i.e. main groups and their subgroups) in the second and third editions compared with the first and second editions, respectively. This increase takes into account the deletion of formerly existing groups and the creation of new groups. For example, during the second revision period, 1 793 out of 51 428 groups in the second edition were deleted and 5 832 groups were created to give 55 467 groups in the third edition. This table does not reflect the number of groups which were amended.
Application of the IPC
The competent authorities of more and more countries apply the IPC to their published patent documents. At present, 36 countries and one international organization apply the IPC down to its finest subdivision.
It is estimated that the number of patent documents which have been classified to the finest subdivision of the IPC prior to 1981 amounts to over 7 800 000.
Criteria for Revision
There is obviously a need to make the best possible use of the resources made available to the IPC Committee of Experts by the competent authorities of the members of the IPC Union. This means revising those areas of the IPC which most need revision and whose revision would be most beneficial, whilst avoiding unnecessary reclassification work.
This need led to the establishment, by the IPC Committee of Experts, of criteria for the selection of different revision proposals submitted to it and for establishing their relative priority. Those criteria include, amongst other factors, statistical information concerning the size of search files in the area of the IPC for which revision is proposed and also an indication of the annual increase in the size of those search files.
Table 1
Columns: (1) total subclasses, first edition of the IPC; (2) subclasses amended, first revision period (1968 to 1974); (3) total subclasses, second edition; (4) to (6) subclasses amended, created and deleted, second revision period (1974 to 1979); (7) total subclasses, third edition.
Section   (1)    (2)    (3)    (4)    (5)    (6)    (7)
A          80     66     80     53      -      -     80
B         161    123    162    105      1      2    161
C          84     45     88     35      7      6     89
D          37     11     38      8      -      -     38
E          31     21     31     18      -      1     30
F          97     30     97     64      -      -     97
G          71     46     73     47      2      -     75
H          46     35     45     38      2      -     47
Total     607    377    614    368     12      9    617
Subclasses deleted and created during the first revision period: the assignment of the entries to sections is not legible in this copy; the surviving entries read "1 1 1", "3", "2 1 4" and "11".
Table 2
            Second edition                                      Third edition
Section   Main     Sub-      Total    Increase              Main     Sub-      Total    Increase
          groups   groups             Number    %           groups   groups             Number    %
A          1 008    5 060     6 068      535    + 9.7        1 018    5 486     6 504      436    + 7.2
B          1 686   11 812    13 498      269    + 2.0        1 704   11 992    13 696      198    + 1.5
C          1 034    7 868     8 902    2 185    + 32.5       1 092    9 654    10 746    1 844    + 20.7
D            342    2 144     2 486      165    + 7.1          343    2 195     2 538       52    + 2.1
E            325    2 566     2 891       28    + 1.0          322    2 676     2 998      107    + 3.7
F          1 014    5 501     6 515       95    + 1.5        1 021    5 836     6 857      342    + 5.2
G            604    4 760     5 364      347    + 6.9          621    5 238     5 859      495    + 9.2
H            456    5 248     5 704      549    + 10.6         481    5 788     6 269      565    + 9.9
Total      6 469   44 959    51 428    4 173    + 8.8        6 602   48 865    55 467    4 039    + 7.9
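As a rough check on the group counts discussed with Table 2, the sketch below recomputes the increase from the second to the third edition for each section. It assumes that each percentage is taken relative to the preceding edition's total; the function and variable names are illustrative only and do not come from the original analysis.

```python
# Section totals (main groups plus subgroups) as given in Table 2.
SECOND_EDITION = {"A": 6068, "B": 13498, "C": 8902, "D": 2486,
                  "E": 2891, "F": 6515, "G": 5364, "H": 5704}
THIRD_EDITION = {"A": 6504, "B": 13696, "C": 10746, "D": 2538,
                 "E": 2998, "F": 6857, "G": 5859, "H": 6269}


def increase(previous, current):
    """Return (absolute increase, percentage increase relative to `previous`)."""
    delta = current - previous
    return delta, round(100.0 * delta / previous, 1)


# Per-section increase from the second to the third edition.
rows = {s: increase(SECOND_EDITION[s], THIRD_EDITION[s]) for s in SECOND_EDITION}

# Totals over all eight sections.
total_second = sum(SECOND_EDITION.values())
total_third = sum(THIRD_EDITION.values())
total_row = increase(total_second, total_third)

# Bookkeeping quoted in the text for the second revision period:
# 1 793 groups deleted and 5 832 groups created.
groups_after_revision = total_second - 1793 + 5832

if __name__ == "__main__":
    for section, (number, percent) in rows.items():
        print(f"{section}: +{number} groups ({percent:+.1f} %)")
    print(f"Total: +{total_row[0]} groups ({total_row[1]:+.1f} %)")
```

The net increase is simply created minus deleted groups, which is why the table cannot show how many groups were amended without being created or deleted.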
Statistics
During recent discussion by the Working Group on Planning of the WIPO Permanent Committee on Patent Information, it was agreed that any statistical analysis concerning the application of IPC symbols to patent documents which is to serve the revision work should at least provide the following information:
(i) those areas, down to the finest or the subgroup level, of the IPC attracting a significantly greater than average number of patent documents;
(ii) those areas, down to the subgroup level, of the IPC attracting a significantly lower than average number of patent documents;
(iii) those areas of the IPC which are applied to patent documents in an inconsistent manner;
(iv) those areas of the IPC which overlap in technical subject with other areas of the IPC;
(v) those technical areas in which retrieval of patent documents frequently requires the search of many IPC units, since such a situation might be the sign of an inefficient breakdown of technical subject matter in the IPC.
The Working Group also agreed that, based upon the results of those analyses, a priority list of areas of the IPC where revision work is indicated should be established and the reasons for the apparent necessity for revision investigated. Revision proposals in respect of the identified areas requiring revision should then be invited and regarded as priority tasks.
To produce reliable statistics concerning the use of the IPC on patent documents, it is necessary to select the data very carefully. Since a large amount of data has to be analysed, a first requirement is the availability of the data in machine-readable form, that is, the data should exist in a computer data base. Moreover, the data base selected should record, comprehensively, all the IPC symbols allotted to and printed on a patent document by the issuing Office, and not just, for example, the first IPC symbol allotted or printed. It is also necessary to ensure that the data base includes details of all patent documents issued by a given Office, and also that it covers as many issuing Offices as possible.
The information recorded by the International Patent Documentation Center (INPADOC) meets all those requirements to a very large extent. Since 1976, and for the majority of Offices somewhat earlier, INPADOC has recorded the full IPC information applied on all patent documents issued by the following 34 countries (an asterisk indicates those countries party to the Strasbourg Agreement): Argentina, Australia*, Austria*, Brazil*, Bulgaria, Cuba, Cyprus, Czechoslovakia*, Denmark*, Egypt*, Finland*, France*, German Democratic Republic*, Germany (Federal Republic of)*, Hungary, India, Ireland*, Israel*, Japan*, Kenya, Mongolia, Netherlands*, Norway*, Philippines, Poland, Portugal*, Romania, Soviet Union*, Spain*, Sweden*, Switzerland*, United Kingdom*, United States of America*, Yugoslavia.
Another factor to be considered in the selection of the data base is that all the IPC information recorded should relate to the same edition of the IPC. The second edition of the IPC came into force on January 1, 1975. The majority of the larger Offices started applying the symbols of the second edition of the IPC on all patent documents published by them on or soon after January 1, 1975. It is to be expected that the application of IPC symbols to patent documents published immediately after a new edition of the IPC comes into force would be accompanied by a greater degree of inconsistency than IPC symbols applied somewhat later. This is due, in particular, to classifiers requiring some time to gain experience in classifying patent documents relating to technical areas that have been the subject of significant revision during a particular revision period. This factor, though, may be expected to be of practical importance only when considering the application of the IPC at a fine level (e.g., at the main group and subgroup levels) and not significant at a higher level (e.g., at the section or subclass levels).
INPADOC’s Patent Classification Service (PCS) has been found to be a particularly useful data base for analysis in this respect. That data base is already organized according to IPC symbols and records, under each of the more than 55 000 symbols of the IPC, a list of those patent documents which that symbol was used to classify. The two main parts of that data base used as the basis for analysis were as follows:
(a) the PCS as existing in 1976 (the “1976 survey”);
(b) patent documents published in 1978 (the “1978 survey”).
Data relating to the 1976 survey were extracted by INPADOC at that time. However, since they included IPC information relating to both the first and the second editions of the IPC, the analysis of the data was restricted to the application of the IPC at the section and subclass levels as described below.
A more detailed analysis at the main group and subgroup levels was undertaken using the 1978 survey. A first analysis of both surveys was made to discover the percentage usage of IPC symbols of each of the eight IPC sections. The results are given in Table 3:
Table 3. Percentage usage of IPC symbols at the section level
IPC Section   1976 Survey %   1978 Survey %   Mean %
A                  9.8            10.6          10.2
B                 23.9            22.1          23.0
C                 22.4            21.7          22.0
D                  3.4             3.0           3.2
E                  4.7             4.7           4.7
F                 10.7            10.5          10.6
G                 13.2            13.8          13.5
H                 11.9            13.6          12.8
1976 Survey
As indicated above, the results of the 1976 survey were analysed at the subclass level. The total number of times (F) any symbol of a given IPC subclass was used to classify the technical subject matter contained in a patent document was divided by the number (N) of symbols, at the main group or subgroup level, provided in that subclass, to derive a value of F/N for each subclass. The value of F/N is thus a measure of the number of patent documents classified, on average, under each symbol of a subclass and may be regarded as a File Size Indicator (FSI). As is to be expected, the FSI varies very widely from one subclass to another. Table 4 gives that variation for each section of the IPC and also in total.
Table 4. Distribution of the FSI among the subclasses of each section of the IPC and in total (FSI in intervals of 10; the tabulated counts are not legible in this copy).
From the point of view of use of the IPC for search purposes, the greater the number of patent documents classified under an IPC symbol, the more tedious the search becomes. An estimate of the relative difficulty of search in some sections of the IPC as compared with other sections can be made by comparing the distribution of the value of the FSI among the subclasses of each section of the IPC with the total, viz. the average, distribution of the FSI among all the subclasses of the IPC, using the results given in Table 4. Figure 1 compares the distribution within Section H with the total distribution and indicates that Section H is more difficult to search than average. Figure 2 compares the distribution within Section A with the total distribution and indicates that Section A is easier to search than average. Similar comparisons show that Sections E and G are more difficult to search than average whilst Sections B, D and F are easier than average. It should be emphasized that the above results are obtained merely on the basis of FSIs and that many other factors determine the ease of search. But these results do provide an indication of where efforts to revise the IPC should be concentrated if it is desired to minimise the number of patent documents to be considered during a search.
Fig. 1. FSI-distribution of all IPC-subclasses, compared with the distribution of Section H.
Fig. 2. FSI-distribution of all IPC-subclasses, compared with the distribution of Section A.
1978 Survey
The 1978 survey was directed towards ascertaining in more detail, viz. at the main group and subgroup levels, the use of the IPC in classifying patent documents. The main results were obtained in the form of printouts for each subclass, a sample of which is given in Fig. 3. The printouts gave, at the subclass, main group and subgroup levels, the number of times the respective IPC symbols were used in 1978 for classifying documents. Against each number is given a percentage figure providing a comparison with the average usage (see below) at the level concerned. To highlight those subgroups having 10% or less of the average usage, an asterisk is printed adjacent to them.
To provide a basis for the percentage figures given in the printouts, it was necessary to calculate the average usage, at each level of the IPC, of the 1 234 463 IPC symbols printed on patent documents published in 1978. That average usage was found to be:
At the section level: 154 307.88 times
At the class level: 10 641.92 times
At the subclass level: 2 010.53 times
At the main group level: 190.83 times
At the subgroup level: 24.00 times
A preliminary analysis of the data given in the printouts was made in three general ways, as follows.
Firstly, those units (viz. a main group and its subgroups) of which a substantial number of subgroups, or the main group itself, attracted a much greater than average number of documents were identified. It was found, for example, that main group G 03 G 21/00 (which has no subgroups) attracted 777 patent documents in 1978, and A 61 B 10/00 (also having no subgroups) attracted 701 documents. Main group F 24 J 3/00 (which has 2 subgroups) attracted an average of 651 documents in each of its three units, whilst main groups C 02 C 3/00 and C 09 B 67/00 (both having no subgroups) attracted 605 documents and 543 documents respectively.
Secondly, those subgroups which alone attracted a much greater than average number of documents were identified. It was found that G 01 N 33/16 attracted 2 308 patent documents in 1978; B 01 D 53/34, 2 011 documents; F 24 J 3/02, 1 747 documents; and G 09 F 9/00, 1 729 documents. Compared with the average usage of 24 times, those figures are very high. The subject matter covered by those active subgroups is as follows:
G 01 N 33/16: investigating or analyzing chemical or physical properties of biological material, e.g. blood, urine.
B 01 D 53/34: chemical purification of smoke or fumes, e.g. flue gas.
F 24 J 3/02: producing heat, using heat or heating by using natural heat, e.g. solar; stoves or ranges using solar heat.
G 09 F 9/00: indicating arrangements operated to display variable information.
It is of interest to note the high level of inventiveness being displayed in relation to two subjects of much concern, namely improving environmental conditions and utilizing renewable sources of energy.
Lastly, those units of which a substantial number of subgroups attracted a very small number of documents were identified. In this category, very many units were identified and it is those units for which revision does not seem warranted.
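The percentage figures and asterisks of the 1978 printouts can be illustrated with a minimal sketch. It assumes that the average usage at each level is the 1 234 463 symbols divided by the number of units at that level, with the unit counts taken from the second-edition totals in Tables 1 and 2 (614 subclasses, 6 469 main groups, 51 428 groups in all); that choice of divisors is an inference that reproduces the quoted averages, not something the text states. The class level is left out because the number of classes is not given. All names are hypothetical.

```python
TOTAL_SYMBOLS_1978 = 1_234_463  # IPC symbols printed on documents published in 1978

# Assumed number of units per level (second-edition counts; see lead-in).
UNITS_PER_LEVEL = {
    "section": 8,
    "subclass": 614,
    "main_group": 6469,
    "subgroup": 51428,
}


def average_usage(level):
    """Average number of uses per unit at `level`, rounded to two decimals."""
    return round(TOTAL_SYMBOLS_1978 / UNITS_PER_LEVEL[level], 2)


def percent_of_average(count, level):
    """Usage count expressed as a percentage of the average at `level`."""
    return round(100.0 * count / average_usage(level), 2)


def is_flagged(count, threshold=10.0):
    """True if a subgroup has `threshold` percent or less of the average usage."""
    return percent_of_average(count, "subgroup") <= threshold


if __name__ == "__main__":
    for level in UNITS_PER_LEVEL:
        print(f"{level:<11} average usage: {average_usage(level):>12.2f}")
    # Counts taken from the sample printout for subclass F16D (Fig. 3).
    print(percent_of_average(6231, "subclass"))   # subclass total
    print(percent_of_average(301, "main_group"))  # main group 1 total
    print(percent_of_average(44, "subgroup"))     # group 1/00
    print(is_flagged(2), is_flagged(3))
```

Under these assumptions a subgroup used twice in 1978 would carry an asterisk, whereas one used three times would not.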
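The File Size Indicator used in the 1976 survey, F/N for each subclass, can be sketched in a few lines. The subclass names and counts below are invented purely for illustration, and the grouping of FSI values into intervals of 10 is an assumption about how the distribution was tabulated.

```python
from collections import Counter


def file_size_indicator(symbol_uses, symbol_count):
    """FSI = F / N: average number of classifications per symbol of a subclass.

    symbol_uses  -- F, total number of times any symbol of the subclass was used
    symbol_count -- N, number of main group and subgroup symbols in the subclass
    """
    if symbol_count <= 0:
        raise ValueError("a subclass must contain at least one symbol")
    return symbol_uses / symbol_count


def fsi_distribution(fsi_values, bin_width=10):
    """Count subclasses per FSI interval, keyed by the interval's lower bound."""
    counts = Counter(int(value // bin_width) * bin_width for value in fsi_values)
    return dict(sorted(counts.items()))


# Hypothetical subclasses: name -> (F, N). These are not survey data.
EXAMPLE_SUBCLASSES = {
    "X01A": (1200, 40),
    "X01B": (90, 30),
    "X01C": (455, 35),
}

fsi_by_subclass = {
    name: file_size_indicator(f, n) for name, (f, n) in EXAMPLE_SUBCLASSES.items()
}
distribution = fsi_distribution(fsi_by_subclass.values())

if __name__ == "__main__":
    for name, fsi in fsi_by_subclass.items():
        print(f"{name}: FSI = {fsi:.1f}")
    print(distribution)
```

Comparing such a distribution for one section with the distribution over all subclasses is the kind of comparison Figs. 1 and 2 are described as making.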
As described earlier in this article, the IPC undergoes revision every five years. Revision proposals are submitted by members of the IPC Committee of Experts. Once a year, within the WIPO Permanent Committee on Patent Information, a selection of those revision proposals is made to form the next year’s IPC revision work. Recently, discussion has taken place with the aim of basing that selection process upon the areas of the IPC in most urgent need of revision.
The number of revision proposals submitted for consideration is such that the technical work involved in developing those proposals into generally acceptable amendments to the IPC is much larger than the available resources. It is therefore necessary to identify those revision proposals which, if acted upon to eventually produce amendments to the IPC, would introduce the maximum improvement in the use of the IPC.
The criteria for selection that have been developed to perform the above selection operate in two stages. Firstly, the revision proposal is submitted with information giving the reason for the proposal, e.g. that the existing classification is inefficient for search, that it is ambiguous, or that file size is excessive. That information also gives the file size, rate of growth of files, and annual search activity. It should be noted that this information is supplied by the Office submitting the revision proposal. Secondly, a listing of all revision proposals is compiled giving statistics, relevant for each area of the IPC to which the revision proposals are directed, drawn from the 1978 survey. The selection of those revision proposals, judged partly on the quantitative data, is thus made to form a “priority” list of revision proposals.
Figure 3. Sample printout: STATISTICS CONCERNING THE APPLICATION OF THE INTERNATIONAL PATENT CLASSIFICATION (IPC). ***LIST*** PART 1, IPC-SUBCLASS: F16D. PRODUCED: 79.09.24, PAGE: 527, (C) INPADOC 1979. For the subclass, each main group and each subgroup, the printout gives the number of times the symbol was used and that number as a percentage of the average usage at the level concerned; an asterisk marks entries with 10% or less of the average usage. The opening entries read:
Group       Number   %
TOT F16D    6231     309.92
1            301     157.73
1/00          44     183.33
1/02          43     179.17
1/04          31     129.17
1/06          97     404.17
1/08          52     216.67
1/10          22      91.67
1/12          12      50.00
As revision of more and more areas of the IPC is accomplished, it is to be expected that the number of revision proposals submitted will start to drop. It is at that stage that statistical analysis concerning the application of the IPC to patent documents will be most profitably used to identify those areas of the IPC in which revision is indicated, by those statistics, as being necessary.