What is OpenRefine?

OpenRefine is described by its developers as a free, open-source, powerful tool for working with messy data. We will use it to clean bibliographic data such as author keywords, indexed keywords, affiliations, and author names.

Installing OpenRefine

OpenRefine is designed to work with Windows, Mac, and Linux operating systems. The official source is openrefine.org. Version 3.5.0 can also be downloaded directly from the following links:

  1. for Windows
    https://github.com/OpenRefine/OpenRefine/releases/download/3.5.0/openrefine-win-3.5.0.zip
  2. for Mac
    https://github.com/OpenRefine/OpenRefine/releases/download/3.5.0/openrefine-mac-3.5.0.dmg

Java must be installed and configured on your computer to run OpenRefine, so download and install Java before proceeding with the OpenRefine installation. Please note that OpenRefine 3.5.0 works with Java 8 to Java 15 but not with Java 16 or later versions.
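Before installing, it can help to confirm that the Java version on your computer falls inside that range. The following is a minimal Python sketch, not part of OpenRefine itself: the function names are hypothetical, and it assumes the usual layout of the first line printed by `java -version`, where Java 8 and earlier report themselves as 1.x.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier use the legacy 1.x scheme, for example "1.8.0_301".
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def is_supported(major: int, lowest: int = 8, highest: int = 15) -> bool:
    """Return True when the major version lies in the range stated above."""
    return lowest <= major <= highest


# Paste the first line printed by `java -version` on your own machine here.
sample = 'openjdk version "11.0.12" 2021-07-20'
print(java_major_version(sample), is_supported(java_major_version(sample)))
```

To obtain the version line, run `java -version` in a terminal (Command Prompt on Windows) and copy the first line of the output into `sample`.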

Basic Steps to Clean Bibliographical Data using OpenRefine

There are two types of file that we can clean for the purpose of bibliometric analysis:

  1. Scopus.csv file - a file that has been downloaded from the Scopus database.
  2. Bibliometrix Exported File - an Excel file that has been exported from Biblioshiny.

1. Open the OpenRefine application.

screenshot_2678.png

2. Choose the file that you want to clean and click Next.

screenshot_2679.png

3. Click Create Project

screenshot_2680.png

4. Identify the column that you want to clean (such as Author Keywords), click the dropdown button of that column, click Edit cells, and then click Split multi-valued cells.

screenshot_2681.png

5. Enter the separator used for that column.

screenshot_2683.png
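To see what the split does to a single cell, here is a minimal Python sketch. It is only an illustration of the idea, not what OpenRefine runs internally. The sample keywords are made up, the function name is hypothetical, and the semicolon separator is an assumption, so check the separator actually used in your own file. Trimming the surrounding spaces is a choice made for this sketch.

```python
from typing import List


def split_multi_valued(cell: str, separator: str = ";") -> List[str]:
    """Split one multi-valued cell into a list of individual values."""
    # Trim spaces around each value and drop empty pieces (sketch-only choice).
    parts = [part.strip() for part in cell.split(separator)]
    return [part for part in parts if part]


# Hypothetical Author Keywords cell.
cell = "Bibliometric analysis; Scopus; OpenRefine"
print(split_multi_valued(cell))
```

After the split, each keyword sits in its own row, which is what makes it possible to review and edit the keywords one at a time.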

6. Click the dropdown button of the Author Keywords column, click Facet, and then click Text facet.

screenshot_2684.png
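A text facet is essentially a list of every distinct value in the column together with the number of rows that contain it. The Python sketch below mimics that idea on a handful of made-up keywords. The function name is hypothetical and the sorting by name is an assumption made for this illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def text_facet(values: Iterable[str]) -> List[Tuple[str, int]]:
    """Return each distinct value with its count, sorted by name."""
    return sorted(Counter(values).items())


# Hypothetical keywords after splitting the multi-valued cells.
keywords = ["Scopus", "scopus", "Scopus", "OpenRefine"]
for value, count in text_facet(keywords):
    print(value, count)
```

Note that "Scopus" and "scopus" are counted separately. Variants like these are exactly what the cleaning in the next step is meant to merge.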

7. Edit the facet on the left side of the screen, either by going through the keywords one by one or by using the Cluster function.

This is where the cleaning process takes place, so expect this step to take longer than the others.

screenshot_2685.png
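To give a feel for how the Cluster function can group variants of the same keyword, the Python sketch below follows the general idea of key-collision clustering with a fingerprint key: values that reduce to the same normalised key are proposed as one cluster. It is a simplified illustration, loosely modelled on that idea and not OpenRefine's actual implementation. The function names and sample keywords are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict
from typing import Iterable, List


def fingerprint(value: str) -> str:
    """Reduce a value to a normalised key so that close variants collide."""
    text = value.strip().lower()
    # Strip accents by decomposing characters and dropping combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation, then sort and de-duplicate the words.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values: Iterable[str]) -> List[List[str]]:
    """Group values that share a fingerprint; keep groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical keywords containing three variants of the same term.
keywords = [
    "Bibliometric analysis",
    "bibliometric Analysis",
    "Scopus",
    "Analysis, Bibliometric",
]
print(cluster(keywords))
```

Each proposed cluster still needs a human decision: review the suggested groups and merge only the ones that truly refer to the same keyword.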

8. Once the cleaning is done, join the multi-valued cells again (Edit cells, then Join multi-valued cells). Re-enter the separator, which should be the same as the separator that you used when you split the cells.

screenshot_2686.png
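The reason for re-using the same separator can be shown with a short Python sketch. The function names and the sample cell are hypothetical, and the "; " separator is an assumption for illustration. Splitting and then joining with the same separator reproduces the original cell layout, while a different separator changes it.

```python
from typing import List


def split_cell(cell: str, separator: str) -> List[str]:
    """Split one cell on the exact separator string."""
    return cell.split(separator)


def join_cell(values: List[str], separator: str) -> str:
    """Join the individual values back into a single cell."""
    return separator.join(values)


# Hypothetical Author Keywords cell.
original = "Bibliometric analysis; Scopus; OpenRefine"
values = split_cell(original, "; ")

same = join_cell(values, "; ")      # same separator as the split
different = join_cell(values, ";")  # separator differs from the split

print(same == original)       # the original layout is restored
print(different == original)  # the layout has changed
```

A changed separator matters because the tool that reads the file afterwards may rely on the original separator to recognise individual keywords.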

9. Then export the file in the same format as the file that you imported. Your file is now ready to be used in bibliometric analysis.

screenshot_2687.png

 

Contact Me

Assoc. Prof. Ts. Dr. Aidi Ahmi
Tunku Puteri Intan Safinaz School of Accountancy
Universiti Utara Malaysia
06010 UUM Sintok
Kedah, MALAYSIA
P: +604 928-7222
