Data Prep 3-1: Filling Missing Values With Constants

Data Prep 3-1: Filling Missing Values With Constants

T U T O R I A L M Data Prep 3-1: Filling Missing Values With Constants Roberta Bortolotti MSIS, CBAP, University of California, Irvine Filling missin...

6MB Sizes 0 Downloads 108 Views

T U T O R I A L

M Data Prep 3-1: Filling Missing Values With Constants Roberta Bortolotti MSIS, CBAP, University of California, Irvine Filling missing values with a suitable data value for a variable is an operation called data imputation. The task of deciding which value to use to fill blanks in a record should follow rules or a set of rules defined based on assumptions about the pattern of the data. There is a drawback in this operation though: data added to the pattern might not reflect the underlying pattern of the complete data. This tutorial consists of short tutorials that show data imputation operations by (a) filling missing values with constants, (b) filling missing values with formulas, (c) filling missing values with a model. 1. Open KNIME. Click on File > New to create a new workflow.

Handbook of Statistical Analysis and Data Mining Applications https://doi.org/10.1016/B978-0-12-416632-5.00035-9

529

Copyright © 2018 Roberta Bortolotti. Published by Elsevier Inc. All rights reserved.

530

M.  Data Prep 3-1: Filling Missing Values With Constants

2. In the Wizard window, select New KNIME Workflow and click Next.



M.  Data Prep 3-1: Filling Missing Values With Constants

531

3. In the next screen, name the new workflow Tutorial_3-1. Click on Browse to specify a Tutorial Folder, if necessary, and click Finish.

532

M.  Data Prep 3-1: Filling Missing Values With Constants

4. On the Node Repository section, expand the IO > Read node and drag the Table Reader node to the workflow space.

5. Double-click on the Table Reader node. 6. In the Configuration Dialog, for Input location, click on Browse, navigate to Tutorial_3 folder, and select Tutorial2_3.table file. Click Ok.

7. Right-click on the Table Reader node and select Execute.



M.  Data Prep 3-1: Filling Missing Values With Constants

533

8. Right-click on the Table Reader node and select Read table.

9. Expand the table. Note that there are records that contain missing values for a lot of variables. Variables MBCRAFT to PUBOPP show the number of times different mail promotions were accepted, which are important information for our model. They will be used in this exercise to show how to fill missing values with constant values.

534

M.  Data Prep 3-1: Filling Missing Values With Constants

10. Close the table. 11. On the Node Repository section, expand the Manipulation > Column > Transform node and select the Missing Value node. Drag the Missing Value node to the workflow space.

12. Connect the output triangle of the first Table Reader node to the left triangle of the Missing Value node. 13. Right-click on the Missing Value node and select Configure.



M.  Data Prep 3-1: Filling Missing Values With Constants

14. On the Default tab, select the Fix Value option from the drop-down list for Number (Integer). Notice that the default value is 0; leave it as 0.

535

536

M.  Data Prep 3-1: Filling Missing Values With Constants

15. Select the Fix Value option from the drop-down list for String. Then, enter the value U for unknown.



M.  Data Prep 3-1: Filling Missing Values With Constants

16. Select the Fix Value option from the drop-down list for Number (double). Notice that the default value is 0.0; leave it as 0.0.

17. Click OK. 18. Execute the Missing Value node.

537

538

M.  Data Prep 3-1: Filling Missing Values With Constants

19. Right-click on the Missing Value node and select Output table.

20. Expand the table. Note that all missing values were filled with a constant and that there are no more missing values in the data set. 21. Close the table. 22. Click File > Save to save the workflow. 23. Close the KNIME application.