Application of K-Means Cluster and Spatial Statistics using Python to Analyze the Indicators of Indonesia Information Technology

The use of computers and the internet is very important for business improvement. Analyzing that use supports delineation and development planning, so that information technology can play a better role in the business field. The problem is that Indonesia has no information technology literacy map that can provide an overview for national policy formulation. This research compiles a map of information technology mastery in Indonesia by mining data from the Central Bureau of Statistics and grouping the provinces into 4 clusters of information technology mastery. Presenting the results as a spatial statistical map of information technology mastery makes executive decisions easier to take. These decisions can be followed up with education, socialization, and other plans to raise the indicators of information technology mastery and thereby increase business success.


Introduction
In recent years there has been a revolution in the use of computing and communication technology, and all signs indicate that technological progress and the use of information technology will continue. Very rapid developments in computing, information, and communication technology have changed the way people do business and live in general. Information technology helps companies reach more customers appropriately, introduce new products and services quickly, control marketing, and collaborate with suppliers and business partners from various regions of the world. The transformation from an industrial society to an information society, and from an industrial economy to a knowledge economy, is a result of the use of ICTs and the Internet [1]. It is therefore very important that the use of ICTs and the internet be well planned, so that more of it serves business improvement. In addition, during the crisis caused by the COVID-19 pandemic, the use of information and communication technology (ICT) increased rapidly. ICTs are very important in keeping the economy going, allowing large groups of people to work and learn from home, improving social communication online, and providing diverse and much-needed entertainment [2], [3], [4]. The world of education has also experienced fundamental changes by applying online learning to most of its learning models [5]. Thus, a massive and fast policy is needed for the development of ICT in Indonesia, and it requires the support of a map of ICT literacy levels so that policies can be more effective and efficient than before. On the other hand, research on mapping ICT literacy is still limited to local studies [6] and to national-scale studies that do not map ICT mastery [7], [8].
This mapping can therefore address the shortcomings of that research and improve decision making in the ICT field. The mapping is carried out with a grouping technique, in which a set of objects is placed into groups called clusters. Grouping is well suited to organizing data and can be used to see quickly the position of each data item compared to others, for evaluation and follow-up planning. Thus, clustering is needed as a very important part of data mining. K-means clustering is a very well-known clustering technique and algorithm that assigns each data point to the cluster with the nearest center [9], [10], [11]. The computer and internet usage indicators can therefore be grouped to facilitate analysis, and a spatial representation of the data facilitates it further. Some researchers have visualized their data and analysis in spatial figures [12], [13]. This is consistent with the findings of [14], who explained that effective communication media, which present and convey data and information, help readers understand the context, and present complex information effectively, take the form of a combination of text, tables, and graphics [15]. Specifically, presenting the results of calculations as spatial data brings the same benefits, namely effective communication and interest [16]. Based on the explanation above, the state of the art of this research is to analyze the data, group them into clusters, and visualize them on a map. The problem to be researched is that the ICT indicator data have not yet been analyzed, which makes it difficult to design treatments to raise the ICT indicators in Indonesia.
It is therefore important, interesting, and effective to analyze the Indonesian ICT indicator data extracted from the Central Statistics Agency by clustering and to present the results on a spatial map according to the position of each province in Indonesia. The aim of this research is thus to produce a cluster analysis and a spatial map of the ICT indicators for each Indonesian province.

Research method
Based on the theory of clustering [10], [11], spatial statistics [15], and Python programming, this research is carried out with the following steps:
a. Mining data from the Central Statistics Agency on the use/mastery of computers and internet use by province in Indonesia in 2018, taken from the 2019 information technology data report.
b. Presenting the mined data in a diagram of data points to show the position and spread of the data.
c. Using the elbow method to find the best number of clusters for the k-means cluster method.
d. Performing the cluster calculations with the k-means cluster method, using the k obtained in step c.
e. Presenting the clustering results on maps of the provinces throughout Indonesia using the theory of spatial statistics.
Furthermore, the calculations in the steps above can be described in accordance with the flow chart as follows:

Results and Discussion
Based on data obtained from www.bps.id [17], a summary was compiled of household internet access in the last three months of 2018 and of household computer ownership in Indonesia by province, as follows. The K-means cluster calculation is then performed and the results are presented on a map of the provinces in Indonesia using Python 3.8 with the following steps.

Plot the Map of Provinces with Python
The map of Indonesia's provinces is plotted in Python using the matplotlib, pandas, and numpy libraries as follows. The plotted map of the provinces serves as the basis for visualizing the clustering results. The data in Table 1 are then clustered using k-means clustering as follows.
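The plotting code itself is not reproduced here, so the following is only a minimal sketch of how province outlines could be drawn with pandas and matplotlib. The `outlines` table, its column names, the two rectangular "provinces", and the function `plot_province_outlines` are all hypothetical; real boundaries would come from a boundary file that the paper does not name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical outline data: one row per boundary vertex, in drawing order.
# These rectangles are placeholders, not real province boundaries.
outlines = pd.DataFrame({
    "province": ["A"] * 5 + ["B"] * 5,
    "x": [0, 2, 2, 0, 0, 3, 5, 5, 3, 3],
    "y": [0, 0, 1, 1, 0, 0, 0, 1, 1, 0],
})


def plot_province_outlines(outlines):
    """Draw one closed outline per province and return the figure and axes."""
    fig, ax = plt.subplots(figsize=(12, 6))
    for name, part in outlines.groupby("province", sort=False):
        ax.plot(part["x"], part["y"], color="black", linewidth=1)
        # Label each province at the mean of its boundary vertices.
        ax.text(part["x"].mean(), part["y"].mean(), name,
                ha="center", va="center")
    ax.set_aspect("equal")
    ax.set_axis_off()
    return fig, ax


fig, ax = plot_province_outlines(outlines)
```

Keeping one row per vertex lets the same table later carry a cluster label per province, which is what the map of clustering results needs.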

K-Means Clustering Process
First, the positions of the internet access and computer ownership data are plotted, as follows.
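As an illustration of this step, a scatter plot of the two indicators might look like the sketch below. The column names `internet_access` and `computer_ownership` and all values are hypothetical placeholders, not figures from the Central Statistics Agency.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentages for five placeholder provinces (not BPS data).
data = pd.DataFrame({
    "province": ["P1", "P2", "P3", "P4", "P5"],
    "internet_access": [55.0, 62.5, 70.1, 48.3, 89.0],
    "computer_ownership": [15.2, 18.9, 22.4, 12.7, 35.6],
})

# One point per province: x is internet access, y is computer ownership.
fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(data["internet_access"], data["computer_ownership"], color="blue")
ax.set_xlabel("Households with internet access (%)")
ax.set_ylabel("Households owning a computer (%)")
ax.grid(True)

# The same two columns form the feature matrix X used for clustering.
X = data[["internet_access", "computer_ownership"]].to_numpy()
```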

Figure 4. Pseudocode to Describe the Position of Data

Based on the code above, the data distribution plot is generated. Second, the number of clusters is determined with the elbow method: the within-cluster sum of squares (WSS, also written WCSS) is computed for a range of k, and the k at which the decrease in WSS first starts to slow down is selected. When WSS is plotted against k, this point appears as a bend in the curve that resembles an elbow [16]. The steps can be explained as follows:
1. Calculate the K-Means grouping for different K values by varying K from 1 to 10 clusters.
2. Calculate the total WCSS for each K.
3. Plot the WCSS curve against the number of clusters K.
The location of a bend such as an elbow in the plot is generally considered the most appropriate indicator of the number of clusters. The whole elbow method is then written in Python with the scikit-learn, numpy, and matplotlib libraries using the following pseudocode:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    # X is the array of indicator values prepared in the previous step
    wcss = []
    for k in range(1, 11):
        k_means = KMeans(n_clusters=k, init="k-means++")
        k_means.fit(X)
        wcss.append(k_means.inertia_)  # inertia_ holds the WCSS of the fitted model

    plt.figure(figsize=(12, 6))
    plt.grid()
    plt.plot(range(1, 11), wcss, linewidth=2, color="red", marker="8")
    plt.xlabel("K Value")
    plt.xticks(np.arange(1, 11, 1))
    plt.ylabel("WCSS")
    plt.show()
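For reference, the WCSS used in the elbow method can be written out explicitly. The notation is assumed here rather than taken from the source: $x_i$ is the indicator vector of province $i$, $C_j$ is the set of provinces assigned to cluster $j$, and $\mu_j$ is the center of that cluster.

```latex
\mathrm{WCSS}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

In scikit-learn this quantity is exposed as the `inertia_` attribute of a fitted `KMeans` object. For the best clustering at each k, WCSS cannot increase as k grows, which is why the elbow is read from where the decrease flattens rather than from the minimum.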

15
In cluster analysis, the elbow method is used to determine the number of clusters in a set of data. It is implemented by plotting the within-cluster variation as a function of the number of clusters and selecting the k value at the elbow of the curve as the number of clusters to be used. The output of the optimum k determination program is as shown below. The elbow calculation results show that k = 2, k = 3, and k = 4 are candidate optimum values. Using k = 2 or k = 3 is not adequate for the analysis, because it distinguishes only 2 or 3 groups of data, whereas finer differences in treatment are needed to address the problem of ICT use in Indonesia. We therefore use clustering with k = 4, that is, a 4-means cluster. With k = 4 a clustering plot is made as follows. The results place the provinces in clusters 0, 1, 2, and 3: 7 provinces in cluster_0, 12 provinces in cluster_1, 13 provinces in cluster_2, and 2 provinces in cluster_3. The clustering results show that some provinces need special attention to raise their ICT indicator values. However, the results still need to be visualized on a map to explain them to Indonesian society and the government and to prompt more serious follow-up.
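A minimal sketch of the k = 4 clustering step with scikit-learn is given below. The array `X` is random stand-in data with the assumed shape of the indicator table (34 provinces by 2 indicators), not the BPS values, and `n_init` and `random_state` are choices made here for reproducibility rather than settings reported in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the 34 x 2 indicator table (percent values).
rng = np.random.default_rng(0)
X = rng.uniform(low=[40.0, 10.0], high=[90.0, 40.0], size=(34, 2))

# Fit 4-means and get one cluster label per row of X.
k_means = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0)
labels = k_means.fit_predict(X)

# Number of provinces in each cluster, and the four cluster centers.
cluster_sizes = np.bincount(labels, minlength=4)
centers = k_means.cluster_centers_
```

Plotting `X` colored by `labels` together with `centers` gives a clustering plot of the kind described in the text.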

Display Clustering Result at Indonesia Map
From Figure 5 it appears that the plot of the data and the centers of the four clusters shows the cluster centers well distributed among the existing data. Thus, further analysis can be done. The data are then labeled by assigning each point to its nearest cluster center in the prediction step, with the following steps.
The result is: [2 2 2 0 0 2 1 1 1 1 1 2 1 1 2 0 2 0 2 2 0 1 1 2 3 2 2 1 2 3 0 1 0 1] The labels are integers from 0 to 3, and provinces with the same label are in the same cluster. The order of the labels follows the order of the compiled list of provinces, so that provinces are placed correctly in the next process. The labeling results are then displayed on a map of Indonesia, as follows. The output of the above program is a map of Indonesia on which each province is shown according to its cluster.
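The nearest-center labeling can be illustrated with a small numpy sketch. The function name `label_by_nearest_center`, the two centers, and the three points are hypothetical; the sketch only shows the assignment rule, which is what the prediction step of a fitted k-means model applies.

```python
import numpy as np


def label_by_nearest_center(points, centers):
    """Return, for each point, the index of the closest center (Euclidean)."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # distances[i, j] is the distance from point i to center j.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


# Hypothetical cluster centers and province indicator values.
centers = np.array([[50.0, 15.0], [80.0, 35.0]])
points = np.array([[52.0, 14.0], [79.0, 30.0], [60.0, 20.0]])

labels = label_by_nearest_center(points, centers)
```

Because the labels come back in the same order as the input rows, keeping the province list in that order is what ties each label to the right province.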
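Before mapping, the reported label vector can be tabulated per cluster, which also serves as a check that it has one entry per province. This is a sketch under assumptions: the names `province_1` to `province_34` are placeholders, since the ordered province list is not reproduced here.

```python
import numpy as np
import pandas as pd

# Label vector as reported in the text, in province-list order.
labels = np.array([2, 2, 2, 0, 0, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 0, 2,
                   0, 2, 2, 0, 1, 1, 2, 3, 2, 2, 1, 2, 3, 0, 1, 0, 1])

# Placeholder names; the real list must use the same order as the data rows.
provinces = [f"province_{i + 1}" for i in range(len(labels))]

result = pd.DataFrame({"province": provinces, "cluster": labels})

# Number of provinces per cluster, and the member list of each cluster.
cluster_sizes = result["cluster"].value_counts().sort_index()
members = result.groupby("cluster")["province"].apply(list)
```

Joining `result` to the province outlines on the province name would give each outline the cluster label that determines its color on the map.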