Information modeling for pharmaceutical product development




16th European Symposium on Computer Aided Process Engineering and 9th International Symposium on Process Systems Engineering W. Marquardt, C. Pantelides (Editors) © 2006 Published by Elsevier B.V.


Information Modeling for Pharmaceutical Product Development

Chunhua Zhao, Leaelaf Hailemariam, Ankur Jain, Girish Joglekar, Venkat Venkatasubramanian, Kenneth Morris and Gintaras Reklaitis

Laboratory for Intelligent Process Systems, School of Chemical Engineering, Purdue University, West Lafayette, IN 47907 USA
Department of Industrial and Physical Pharmacy, Purdue University, West Lafayette, IN 47907 USA

Abstract
Development of a pharmaceutical product involves several inter-related steps with multiple decisions requiring iterative improvements. Large amounts of information, including the properties of a drug substance, interactions of materials, unit operations, equipment etc., have to be gathered and used for decision making. A systematic model of the associated information is thus needed to streamline the product development process and provide a common foundation to support the information. Following the information-centric infrastructure proposed in our earlier work, ontology has been used to model the information. The information modeled as ontology provides information in a way that can be easily used by humans and processed by machines. The information modeling process and developed ontology are discussed in detail. The benefits are demonstrated by using a case study for managing information generated from the preformulation stage of pharmaceutical product development.

Keywords: Information Modeling, Pharmaceutical Product Development, Ontology, Information Management

1. Introduction
The development of a product goes through several stages during its lifecycle with emphasis on shortening development time, cutting development costs and improving the process design to ensure higher flexibility (Schneider and Marquardt, 2002). This is particularly true in the area of pharmaceuticals and specialty chemicals.
Commercial scale product and process development typically goes through the following stages after the viability of a newly discovered molecule is established: laboratory scale, pilot plant scale and commercial scale manufacturing. Laboratory scale experiments are used to determine various synthetic routes and characterize key steps in each route, as well as obtain process parameter values. Pilot plant studies provide a detailed understanding of processing steps in the selected route and data needed for scale-up to commercial manufacturing, at which stage information related to manufacturing is applied in debottlenecking and productivity improvement. The three stages are closely related through the information they exchange. Information generated at the lab scale can be used to improve manufacturing. Problems identified at the manufacturing stage are communicated to the lab scale to identify their root causes, and provide ideas on how to avoid similar problems in future process development. The stage-wise development of a process for a drug substance and its dosage form(s) involves a substantial amount of information sharing by tools in each stage; and
between stages, information and knowledge have to be exchanged at appropriate times. However, most of the available tools have reached maturity and exist in islands of automation with little interaction among them. Each tool uses its own format and interpretation of information, causing inefficient information and knowledge transfer. In the development and manufacturing of pharmaceutical products, use of new process analytical technologies (PAT) has enabled scientists to get a better understanding of the underlying physical and chemical phenomena. The knowledge created from the learning process can be in different forms: reports in paper or electronic format and experience gained by scientists. With more and more information and knowledge becoming available, however, it is clear that we need more intelligent software systems to effectively manage and access them for efficient decision making. Pharmaceutical product development involves the integration of process modeling tools, effective handling of laboratory generated information and knowledge, as well as development of technical specifications and an information base to satisfy regulatory requirements. We argue that in order to better support the activities and decisions in pharmaceutical product development, formal and explicit models of related information need to be developed. These information models should be easily accessible by humans and tools, and should provide a common understanding for information sharing.

The remainder of the paper is organized as follows: Section 2 discusses the ontology based approach to model the information, using the material property ontology for pharmaceutical product development as an example to discuss the steps used and lessons learned in building ontologies. Section 3 illustrates the use of information modeling to support information management during the preformulation stage. Examples are used to compare the proposed approach with existing solutions.

2. Information Modeling

2.1. Ontology
Several approaches exist to model the information. However, such information models are usually in closed form and only provide a limited view of the information. An information-centric approach has been proposed (Zhao et al., 2005) in which information is modeled using ontology. An ontology is a formal and explicit specification of a shared abstract model of a phenomenon through identification of its relevant concepts (Gruber, 1993). For example, the measurement of the flow rate of an active pharmaceutical ingredient (API) powder through an orifice involves conceptualizing the API, the flow rate value, the experiment and its context (Figure 1). Ontology captures relations like has-value-of and was-done-on. In addition, ontology makes it possible to relate different concepts through their being instances of the same class (e.g. flow rate of material A related to that of material B). Such relations are usable by both human and computer tools.

Figure 1: Relations between concepts (legible node labels: API: Flow Rate; Relative Humidity: 78.0; 1.0 g/s: [0.8, 1.2])

Compared to a database schema, which targets physical data independence, and an XML schema, which targets document structure, an ontology targets agreed-upon and explicit semantics of the information, and directly describes the concepts and their relations. Web Ontology Language (OWL, 2004) was used in this work to encode the ontology.
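The kind of statements an ontology records can be illustrated with a minimal plain-Python sketch. It is not the OWL encoding used in this work: it only stores (subject, relation, object) triples and queries them. The relation names has-value-of and was-done-on and the Figure 1 values come from the text; all other identifiers (instance-of, has-context, measures, the individual names) are assumptions made for illustration.

```python
# Plain-Python sketch of ontology statements as (subject, relation, object)
# triples. Only the relation names has-value-of and was-done-on and the
# Figure 1 values come from the text; every other identifier is hypothetical.
TRIPLES = [
    ("API_FlowRate", "instance-of", "FlowRate"),
    ("API_FlowRate", "has-value-of", "1.0 g/s"),
    ("API_FlowRate", "has-context", "RelativeHumidity_78.0"),
    ("FlowRateMeasurement_1", "was-done-on", "API"),
    ("FlowRateMeasurement_1", "measures", "API_FlowRate"),
    ("MaterialB_FlowRate", "instance-of", "FlowRate"),
]


def objects(subject, relation):
    """Return every object linked to `subject` through `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def instances_of(cls):
    """Return all individuals declared as instances of class `cls`."""
    return sorted(s for s, r, o in TRIPLES if r == "instance-of" and o == cls)


if __name__ == "__main__":
    print(objects("API_FlowRate", "has-value-of"))
    print(instances_of("FlowRate"))
```

Because both flow-rate individuals are instances of the same class, a tool can relate the flow rate of one material to that of another without any convention beyond the shared class name.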


2.2. Ontology Building for Pharmaceutical Product Development
Ontology building is an evolutionary design process consisting of proposing, implementing and refining the classes and properties that comprise an ontology (Noy and McGuinness, 2001). The steps involved include: determining the domain and scope of the ontology, reusing existing ontologies, enumerating important terms in the ontology and defining the classes and class hierarchy.

In pharmaceutical product development, one undergoes selection of a dosage form, selection of a processing route, selection of excipient roles (an excipient is an inert material added to impart desired properties to the final product) and selection of specific excipients and their composition. The best way to systematically capture the above sequence of steps as applied to the final product is through a construct called a development state, which includes information about the dosage form, processing route, excipient roles, excipients and their compositions. In addition, it should contain the description of the API, of the excipients and the dose amount. In turn, the description of the API should include the description of its properties and the experiments done on it.

A central concept in the abstraction above is the material, which represents substances and mixtures (which are characterized by pure substances and their compositions). A material has several properties (e.g. specific heat capacity), can play several roles (e.g. API, flow aid) and can be involved in several experiments (e.g. Hosokawa Tests). The list of material properties may be classified into engineering properties, compound properties, particle properties and powder properties. Engineering properties include those properties which are used in engineering calculations, like heat transfer properties. Compound properties include the molecular properties, like molecular mass, and a description of the chemical reactions the material undergoes. Particle properties include the crystalline properties and a description of the stability of the physical form (if crystalline, the crystal system). Powder properties describe the behaviour of a large number of particles of the material, like flow and deformation of the powder into tablets. Each property is represented by a class with its own set of attributes.

A material property value can be measured experimentally, calculated mathematically or retrieved from literature. If it is measured experimentally, the conditions under which the experiment is performed define its context, for example temperature, pH, relative humidity and so on. The description of an experiment would include the materials involved, the experimenter, the location of the experiment, the date and time of the experiment, the equipment used, the procedure followed and the experimental data. The relations between the experiments, the materials they were done on and the properties that were measured are explicitly described.

Modeling of the domain information, i.e. creating the domain ontology, requires understanding the ontology building techniques as well as the domain one tries to model. In this project, we work very closely with collaborators in Industrial Pharmacy. The current ontology is the result of several iterations of propose-discuss-revise. The visualization tools provided by the ontology editor, Protege (see URL), and the plugins in the editor, including the view of the class hierarchy, the graph view and the automatically generated form for information entry, are very convenient in the collaboration. The developed ontology has been used as the foundation for the information repository and provides information to various tools, including an engine to execute guidelines and another engine to utilize mathematical knowledge (Zhao et al., 2006), following the methodology developed in our previous work (Zhao et al., 2005). In this paper, we discuss how the ontology could support management of information gathered in product development.
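The material, property and experiment concepts described in this section can be sketched as plain Python data classes. This is only an illustration of the relations named in the text, not the OWL ontology itself; the class names, field names, the experimenter and equipment strings and the file names are assumptions, while the property categories, value sources and the relative humidity context are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Categories and value sources as listed in the text.
CATEGORIES = {"engineering", "compound", "particle", "powder"}
SOURCES = {"measured", "calculated", "literature"}


@dataclass
class Experiment:
    """One experiment; `context` holds the conditions it was run under."""
    name: str
    experimenter: str
    equipment: str
    context: Dict[str, float] = field(default_factory=dict)
    files: List[str] = field(default_factory=list)


@dataclass
class MaterialProperty:
    """A property value together with the experiments that measured it."""
    name: str
    category: str
    value: Optional[float] = None
    unit: str = ""
    source: str = "measured"
    experiments: List[Experiment] = field(default_factory=list)

    def __post_init__(self):
        # Reject categories or sources that the text does not list.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")


@dataclass
class Material:
    """A substance or mixture with its roles and properties."""
    name: str
    roles: List[str] = field(default_factory=list)
    properties: List[MaterialProperty] = field(default_factory=list)

    def experiments(self) -> List[Experiment]:
        """All experiments reachable through this material's properties."""
        return [e for p in self.properties for e in p.experiments]


if __name__ == "__main__":
    # Hypothetical individuals; only the 1.0 g/s value and the 78.0
    # relative humidity are taken from Figure 1.
    run = Experiment("FlowRateMeasurement_1", "experimenter A", "orifice tester",
                     context={"relative_humidity": 78.0}, files=["run_0031.xls"])
    flow = MaterialProperty("flow rate", "powder", 1.0, "g/s", experiments=[run])
    api = Material("API", roles=["API"], properties=[flow])
    print([e.name for e in api.experiments()])
```

Keeping the link from property to experiment explicit is what later allows a tool to start from a material and reach the raw experimental files without relying on file or folder names.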


3. Information Management
Voluminous information is generated during product development, such as raw data generated from analytical instruments, pictures from SEMs, pictures of experiment setups, experiment notes and reports, mass and energy balance results from simulation tools etc. The information could also be in different formats, including plain text files, Word documents, Excel worksheets, JPEG files, MPEG movies, PDFs etc. How to effectively gather information from different resources and organize it for its end use are key information management tasks. A few solutions have been developed to manage the information, the most important of which are laboratory information management systems, e-LabNotebooks and content management systems. The key functionalities and problems of these systems are discussed in the next section.

3.1. Current Information Management Solutions

Laboratory Information Management Systems (LIMS)
LIMS are database applications that are used to store and manage information associated with a laboratory (Paszko and Pugsley, 2000). Typical LIMS functionalities include sample tracking, data entry, sample scheduling, quality analysis/quality control (allowing users to generate control charts and trend analysis graphs), automatic electronic data transfer (from analytical equipment to the LIMS), chemical and reagent inventory, personnel and equipment management and maintenance of the database. A LIMS stores information in relational databases such as Oracle, DB2 or MS SQL (Grauer, 2003). Most LIMS have interfaces that give access to the database for information retrieval or storage. As discussed earlier, such database schemas provide only limited semantics. Relational database structures also limit the capability of describing complex relations between information.

E-lab Notebooks
An electronic laboratory notebook (ELN) can provide functionalities like browsing online libraries, databases, other electronic devices, and remote sources such as the Web, writing documents and data sets, managing data, publishing and sharing information, and creating records (Zall, 2001). With an electronic notebook, the records are published electronically and shared with collaborators and reviewers. The utility of an ELN to provide a collaborative environment has proven inadequate for industry, especially when quality assurance/control is expected to be a major factor (Pavlis, 2005). Quality assurance demands that experiments be performed following standard operational procedures. This may be accomplished by an automated interface for data entry, which is currently not available for ELNs. Integration with information management systems is also lacking.

Content Management Systems (CMS)
A content management system supports the process of publishing, maintaining and disseminating documents (Noga and Kruper, 2002). The major components of a CMS are the data repository, user interface, workflow scheme, editorial tools, and output utilities. They allow writers to create or update content, track changes, and publish content to make it available to all users in a variety of configurations.

3.2. Ontology Driven Information Management
Without specifying the semantics of the information, it is very difficult for these tools to provide functionalities beyond sharing the information among users and keyword-based search on the information. From our experience of implementing information management systems in our project, we found two major problems with the current systems: (1) organization of related information; and (2) lack of an open and systematic
way to manage meta-data. These problems are directly related to the lack of the semantics of the information. We argue that the semantics of the information should be provided by the user who creates the information. Since ontology defines the semantics of the information, information can be captured as individuals based on the concepts defined in the ontology. Given the semantic richness of the information defined in the ontology, the information entry form can be generated automatically (as demonstrated by Protege, the ontology editor used in this project). To manage the information associated with an experiment, an experiment individual is created and linked to the raw files generated from the experiment. Similarly, as shown in Figure 2, individuals of material properties can also be created directly by the user, specifying the links to the experiment individuals that were carried out for the property. The system can automatically locate the concepts and relations and provide an integrated view of the information. For example, for a specific material, all the experiments done on it and all of its properties can be accessed.

The information infrastructure described above makes possible the effective management of experimental files, which may exist in different forms (spreadsheets, movies etc.) with non-descriptive names and folder locations. This system is developed on top of an existing content management system to utilize available functionalities such as user management, workflow, security etc. Instead of the user creating folder structures in an arbitrary way, the folder structure is created based on the concept hierarchy defined in the ontology. The system allows surfing between related instances and search by keyword as well as on the hierarchy. A web-based interface to the information repository can be created for users to access and modify the information.

Figure 2: Interface for input of material properties (legible labels: API: Flow Rate; API: Flow Rate Measurement)
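The idea of deriving the folder structure from the concept hierarchy, rather than letting each user choose one, can be sketched as follows. This is a minimal illustration and not the content management system used in the project: the hierarchy fragment, the individual name and the file names are all hypothetical, and the functions only compute repository paths without touching the file system.

```python
from pathlib import PurePosixPath

# Hypothetical fragment of the class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "PowderProperty": "MaterialProperty",
    "FlowRate": "PowderProperty",
    "FlowRateMeasurement": "Experiment",
}

# Hypothetical raw files linked to an experiment individual; the names are
# deliberately non-descriptive, as instrument output often is.
EXPERIMENT_FILES = {
    "API_FlowRateMeasurement_1": ["run_0031.xls", "img_0042.jpg"],
}


def class_path(cls):
    """Build a folder path by walking from `cls` up to its root class."""
    parts = [cls]
    while parts[-1] in SUBCLASS_OF:
        parts.append(SUBCLASS_OF[parts[-1]])
    return PurePosixPath(*reversed(parts))


def repository_paths(individual, cls):
    """Place the files linked to `individual` under its class folder."""
    folder = class_path(cls) / individual
    return [str(folder / name) for name in EXPERIMENT_FILES.get(individual, [])]


if __name__ == "__main__":
    print(class_path("FlowRate"))
    for path in repository_paths("API_FlowRateMeasurement_1", "FlowRateMeasurement"):
        print(path)
```

Since the path is computed from the ontology, two users filing results of the same kind of experiment end up in the same place, and a file with an uninformative name is still locatable through the individual it is linked to.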

potentially affected by the relative humidity. In the current case study, the micromeritics (solid surface properties) of 39 materials (including mixtures) were studied, each with 18 micromeritic properties. There were on average 5 experiments for each property for every material, and each experiment had an average of 3 files associated with it. In semantic search, identification of relative humidity as an instance of context leads to all instances of properties in which that context appears. Through the relations between property and material, the instances are further filtered based on the specified drug substance. The instances of those properties which are subclasses of micromeritic properties are found given the explicit definition of class-subclass relationships in the ontology. The individual experiments that were done on these properties are identified through the part-whole relationship with the property. The experiment files which are linked to these experiments are presented as the search results. The semantic search engine found 8 experiments done to determine the flow rate of a powder through an orifice for the particular drug substance as the micromeritic experiments affected by relative humidity. In contrast, without the ontology to provide the semantics, a keyword based search would not be able to navigate using the relationships. A search using the keywords 'relative humidity micromeritics' did not identify any of the experiments, while a similar search
using 'relative humidity' identified many documents, most of which had very little to do with experiments. While it is acknowledged that such results are not indicative of all experimental work, they are indicative of the challenges faced by humans and machines alike in processing large amounts of information with little or no semantics. Conversely, they serve to illustrate the utility of semantic search made possible through the development of material, property and experiment ontologies.

4. Summary
In this paper, we discuss the importance of modeling information with explicit and formal semantics. Only with semantics that make the information machine processable can tools better utilize the information and provide better functionalities. Ontology was used to model the information. In this work, we concentrate on the information related to pharmaceutical product development. The ontology building process and the final ontology were discussed. We also demonstrate an ontology-driven information management approach based on the developed ontology. This approach provides an easy way for the user to create semantics as well as relations on the information generated during process development. This approach is very general and could easily be applied to manage information in domains like chemical process development. Developing an ontology to model the domain information can be a difficult task, which requires information modeling techniques as well as an understanding of the domain to be modeled. Nevertheless, the ontology provides a solid foundation to better utilize the information by supporting tool development, information sharing between tools and information management.

References
Grauer, Z. (2003), Laboratory Information Management Systems and Traceability of Quality Systems, American Laboratory, 9, 15.
Gruber, T. R. (1993), A Translation Approach to Portable Ontology Specification, Knowledge Acquisition, 5, 2, 199.
Noga, M., Kruper, F. (2002), Optimizing Content Management System Pipelines, Lecture Notes in Computer Science, 2487, 252.
Noy, N.F., McGuinness, D.L. (2001), Ontology Development 101: A Guide to Creating Your First Ontology, Stanford Knowledge Systems Laboratory Technical Report KSL-01-05.
OWL (2004), Web Ontology Language Overview, W3C Recommendation, http://www.w3.org/TR/owl-features/
Paszko, C., Pugsley, C. (2000), Considerations in Selecting a Laboratory Information Management System (LIMS), American Laboratory, 9, 38.
Pavlis, R. (2005), Scientific Computing, 9, 31.
Protege (Version 3.1), http://protege.stanford.edu.
Schneider, R., Marquardt, W. (2002), Information Technology Support in the Chemical Process Design Life Cycle, Chemical Engineering Science, 57, 1763.
Zall, M. (2001), The Nascent Paperless Laboratory, Chemical Innovation, 31, 2.
Zhao, C., Joglekar, G., Jain, A., Venkatasubramanian, V., Reklaitis, G. V. (2005), Pharmaceutical Informatics: A Novel Paradigm for Pharmaceutical Product Development and Manufacture, Proceedings of ESCAPE 15.
Zhao, C., Jain, A., Joglekar, G., Hailemariam, L., Venkatasubramanian, V., Morris, K., Reklaitis, G. V. (2006), A Unified Approach for Knowledge Modeling in Pharmaceutical Product Development, Submitted to ESCAPE 16.