{"id":17823,"date":"2023-06-06T14:00:00","date_gmt":"2023-06-06T06:00:00","guid":{"rendered":"https:\/\/www.tejwin.com\/?post_type=insight&#038;p=17823"},"modified":"2026-02-25T13:35:18","modified_gmt":"2026-02-25T05:35:18","slug":"employee-turnover-rate-prediction","status":"publish","type":"insight","link":"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/","title":{"rendered":"Employee Turnover Rate Prediction"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/0Hmifsm4X3DrQZVMQ.jpg\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@isaacmsmith?utm_source=medium&amp;utm_medium=referral\" rel=\"noreferrer noopener\" target=\"_blank\">Isaac Smith<\/a> on&nbsp;<a href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\" rel=\"noreferrer noopener\" target=\"_blank\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a6c833a2a16a\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"ez-toc-cssicon\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 
6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a6c833a2a16a\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Summary_of_Key_Points_in_This_Article\" >Summary of Key Points in This Article<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Introduction\" >Introduction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Editing_Environment_and_Module_Requirements\" >Editing Environment and Module Requirements<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Database_Utilization\" >Database Utilization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Data_Import\" >Data Import<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Label\" >Label<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#%E7%89%B9%E5%BE%B5\" 
>Features<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Data_Preprocessing\" >Data Preprocessing<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Column_Selection\" >Column Selection<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Data_Table_Merge\" >Data Table Merge<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Data_Transformation\" >Data Transformation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Feature_and_Label_Split\" >Feature and Label Split<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Model_Building\" >Model Building<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Backward_Elimination\" >Backward Elimination<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Source_code\" >Source code<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-16\" 
href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Conclusion\" >Conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-17\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Further_reading\" >Further reading<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-18\" href=\"https:\/\/www.tejwin.com\/en\/insight\/employee-turnover-rate-prediction\/#Related_links\" >Related links<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Summary_of_Key_Points_in_This_Article\"><\/span>Summary of Key Points in This Article<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Article Difficulty\uff1a\u2605\u2605\u2606\u2606\u2606<\/li>\n\n\n\n<li>Building a Multiple Linear Regression Model and Improving Model Accuracy Using Backward Elimination<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Introduction\"><\/span>Introduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Employee turnover rate refers to the fluctuations in the workforce of a company due to employee departures and new hires within a specific period. This metric is an important concept for assessing the stability of an organizational workforce. A lower employee turnover rate indicates relatively fewer personnel changes within the company, reflecting internal stability and continuity. 
Conversely, a higher turnover rate may suggest organizational issues, job dissatisfaction, or other factors that could have a negative impact on business operations and the work environment.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Monitoring employee turnover rates can help companies understand and evaluate the effectiveness of their human resource management strategies and take appropriate measures to improve employee retention and satisfaction. This supports the long-term stability and development of the organization. Predicting employee turnover rates allows companies to better plan and manage their human resources, reduce costs, improve talent retention, and enhance organizational efficiency.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Editing_Environment_and_Module_Requirements\"><\/span>Editing Environment and Module Requirements<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This article uses macOS as the operating system and Jupyter Notebook as the editor.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># Load required packages\nimport pandas as pd\nimport numpy as np\nimport tejapi\nfrom collections import Counter\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LinearRegression\nimport statsmodels.regression.linear_model as sm\nimport matplotlib.pyplot as plt\n\n# Log in to the TEJ API\napi_key = 'YOUR_KEY'\ntejapi.ApiConfig.api_key = api_key\ntejapi.ApiConfig.ignoretz = True<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Database_Utilization\"><\/span>Database Utilization<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/api.tej.com.tw\/columns.html?idCode=TWN\/ACSR01A\" target=\"_blank\" rel=\"noreferrer noopener\">TWN\/ACSR01A<\/a> (Employee Turnover Rate)<\/li>\n\n\n\n<li><a 
href=\"https:\/\/api.tej.com.tw\/columns.html?idCode=TWN\/AXEMPA\" target=\"_blank\" rel=\"noreferrer noopener\">TWN\/AXEMPA<\/a> (Educational Composition of Listed (OTC) Employees)<\/li>\n\n\n\n<li><a href=\"https:\/\/api.tej.com.tw\/columns.html?idCode=TWN\/ACSR20A\" target=\"_blank\" rel=\"noreferrer noopener\">TWN\/ACSR20A<\/a> (Enterprises Violating Labor Standards Act)<\/li>\n\n\n\n<li><a href=\"https:\/\/api.tej.com.tw\/columns.html?idCode=TWN\/ACSR19A\" target=\"_blank\" rel=\"noreferrer noopener\">TWN\/ACSR19A<\/a> (TWSE Corporate Governance Evaluation)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Data_Import\"><\/span>Data Import<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The fields used in this implementation are as follows. Please note that &#8220;violate_times&#8221; is not an original field in the data table; it is generated after preprocessing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Label\"><\/span>Label<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>turn_rate&nbsp;: Employee Turnover Rate<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"%E7%89%B9%E5%BE%B5\"><\/span>Features<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>num_resign&nbsp;: Resignation Count<\/li>\n\n\n\n<li>num_staff&nbsp;: Employee Count<\/li>\n\n\n\n<li>workage_avg&nbsp;: Average Employee Tenure<\/li>\n\n\n\n<li>age_avg&nbsp;: Average Employee Age<\/li>\n\n\n\n<li>apct&nbsp;: Doctorate %<\/li>\n\n\n\n<li>bpct&nbsp;: Master&#8217;s %<\/li>\n\n\n\n<li>cpct&nbsp;: Bachelor&#8217;s %<\/li>\n\n\n\n<li>dpct&nbsp;: High School %<\/li>\n\n\n\n<li>epct&nbsp;: Below High School %<\/li>\n\n\n\n<li>emp_sum&nbsp;: Total Employee Count<\/li>\n\n\n\n<li>emp_age&nbsp;: Average Age<\/li>\n\n\n\n<li>emp_yr&nbsp;: Average 
Tenure<\/li>\n\n\n\n<li>rating&nbsp;: Rating<\/li>\n\n\n\n<li>violate_times&nbsp;: Number of Violations of Labor Standards Act<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>gte, lte = '2021-01-01', '2021-12-31'\nTR = tejapi.get('TWN\/ACSR01A',\n                   paginate = True,\n                   mdate = {'gte':gte, 'lte':lte},             \n                  )\n\ngte, lte = '2020-01-01', '2021-01-01'\nED = tejapi.get('TWN\/AXEMPA',\n                   paginate = True,\n                   mdate = {'gte':gte, 'lte':lte},             \n                  )\n\ngte, lte = '2020-01-01', '2021-01-01'\nLAW = tejapi.get('TWN\/ACSR20A',\n                   paginate = True,\n                   mdate = {'gte':gte, 'lte':lte},             \n                  )\n\ngte, lte = '2021-01-01', '2021-12-31'\nTWSE = tejapi.get('TWN\/ACSR19A',\n                   paginate = True,\n                   mdate = {'gte':gte, 'lte':lte},             \n                  )<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Data_Preprocessing\"><\/span>Data Preprocessing<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Column_Selection\"><\/span>Column Selection<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We filter the columns needed from the education level and TWSE Corporate Governance Evaluation tables and calculate the number of violations of the Labor Standards Act. 
Because the company codes in the TWSE Corporate Governance Evaluation table are stored as numbers, which could cause merge errors, we also convert them to strings.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># Keep only the education-composition columns we need\nED = ED[[\"coid\", \"apct\", \"bpct\", \"cpct\", \"dpct\", \"epct\", \"fpct\", \"emp_sum\"]]\n# Count Labor Standards Act violations per company\nLAW = LAW[\"coid\"].value_counts().rename_axis(\"coid\").reset_index(name=\"violate_times\")\n# Copy the slice so the type conversion below does not act on a view of the original table\nTWSE = TWSE[[\"coid\", \"rating\"]].copy()\nTWSE[\"coid\"] = TWSE[\"coid\"].astype(str)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Data_Table_Merge\"><\/span>Data Table Merge<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>df_main = TR\nfor i in [ED, TWSE, LAW]:\n    df_main = pd.merge(df_main, i, on=\"coid\")<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1d6i-tbuTnXVLic8pd1qFVA.png\" alt=\"df_main\"\/><figcaption class=\"wp-element-caption\">df_main<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Data_Transformation\"><\/span>Data Transformation<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The dataset contains missing values, which would hinder the training of the regression model. Filling missing values with zeros is not always the best approach, and the right method depends on the nature of the data and the desired outcome. However, since the focus of this implementation is on model building, we fill missing values with zeros for computational convenience.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We also convert the governance rating from a string to a number, because the regression model requires numeric input. The code that follows refers to the transformed table as dataset_regression.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Feature_and_Label_Split\"><\/span>Feature and Label Split<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># Features: every column from index 3 onward\nX = dataset_regression.iloc[:, 3:].values\n# Label: the column at index 2\ny = dataset_regression.iloc[:, 2].values<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Model_Building\"><\/span>Model Building<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We split the data into a training set and a test set in an 8:2 ratio and set a random_state to ensure consistent results each time the model is executed. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">After feeding the test set into the model, we compare the actual values with the model&#8217;s predictions.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># test_size=0.2 keeps 80% of the rows for training and 20% for testing\nx_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\nregressor = LinearRegression()  # Create an object named 'regressor'\nregressor.fit(x_train, y_train)  # Train the linear regression model\n\ny_pred = regressor.predict(x_test)\nprint(y_pred.shape)  # y_pred is a one-dimensional vector\n\nnp.set_printoptions(precision=2)  # Display two decimal places\n# Reshape y_pred and y_test into 2D arrays of shape (len(y_pred), 1), then join them column-wise\nP_vs_T = np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1)\nprint(P_vs_T)<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized caption-align-center has-mobile-text-align-center\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1O_chVbr91gWDgcfwjw6bCQ.png\" alt=\"Actual Values vs. Model Predictions\" style=\"width:243px;height:418px\" width=\"243\" height=\"418\"\/><figcaption class=\"wp-element-caption\">Actual Values vs. Model Predictions<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The output shows that some predictions have large errors, and a few are even negative. To see how accurate the predictions are overall, we summarize them in a bar chart. 
Additionally, we define a prediction as inaccurate when it differs from the actual value by more than 1 (in the units of turn_rate), which is the threshold the code applies.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># 1 = inaccurate (error above the threshold), 0 = accurate\nclassification = [1 if abs(pred - actual) &gt; 1 else 0 for pred, actual in P_vs_T]\ncounts = Counter(classification)\nprint(counts)\n# Index the counts explicitly so each bar lines up with its tick label\nplt.bar(x=[0, 1], height=[counts[0], counts[1]], tick_label=[\"False\", \"True\"])\nplt.show()<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1rvevjyHZVhQfgCWAsJrGYA.png\" alt=\"bar chart of classification\" style=\"width:426px;height:304px\" width=\"426\" height=\"304\"\/><figcaption class=\"wp-element-caption\">bar chart of classification<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Even so, the chart shows that the model performs reasonably well overall: 88% of the predictions fall within the threshold.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># Output the model intercept and coefficients (the model is already fitted above)\nprint(regressor.intercept_)\nprint(regressor.coef_)<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/129iIXY2Dg2bepHSJ1O_JjA.png\" alt=\"intercept and coefficient\" style=\"width:679px;height:72px\" width=\"679\" height=\"72\"\/><figcaption class=\"wp-element-caption\">intercept and coefficient<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Backward_Elimination\"><\/span>Backward Elimination<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Backward elimination is a feature selection method used to exclude features from a model that have no significant impact on the target variable. 
It starts with the initial full model and gradually removes the features that have the least impact on the model&#8217;s performance in each step, until the remaining features meet a certain criterion, such as a significance level (typically when the p-value is less than 0.05). This approach helps prevent overfitting, reduces model complexity, and improves the interpretability and predictive power of the model. Backward elimination is widely used in multiple linear regression to select the most influential features for the target variable.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The choice of the significance level, why it&#8217;s typically set at p-value &lt; 0.05, and the concept of hypothesis testing:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Backward elimination involves hypothesis testing, which is a statistical method used to make inferences about population parameters based on sample data. In hypothesis testing, two opposing hypotheses are proposed: the null hypothesis and the alternative hypothesis. The null hypothesis typically represents no effect, no difference, or no association, while the alternative hypothesis represents an effect, a difference, or an association.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Through the collection of sample data and statistical analysis, a test statistic is calculated to assess the support for or against the null hypothesis. By comparing the value of the test statistic to a predefined significance level, we make a statistical judgment about the results. If the test statistic&#8217;s value deviates significantly from what would be expected under the null hypothesis, we may reject the null hypothesis and support the alternative hypothesis. Otherwise, we fail to reject the null hypothesis.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The p-value represents the probability of obtaining the observed results or more extreme results when the null hypothesis is true. 
When the p-value is less than the predefined significance level (typically 0.05), we can reject the null hypothesis. This is because a p-value below 0.05 indicates that the observed results would be very rare if the null hypothesis were true. Thus, rejecting the null hypothesis means that we have sufficient evidence to support the alternative hypothesis.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">However, it&#8217;s important to note that 0.05 is just a commonly used significance level. In some cases, a more stringent significance level (e.g., 0.01) may be chosen based on the nature of the study or the severity of the issue. Additionally, other factors such as sample size, study design, and effect size should be considered when making statistical judgments and interpreting results. Rejecting the null hypothesis is only one part of statistical inference and should be interpreted and reported with caution to avoid overinterpretation or misleading conclusions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In our case, we use a for loop to eliminate parameters one at a time while retaining the summary and column set of the model with the highest R-squared.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># Add a column of ones so the model has an intercept term: y = b0 + b1*x1 + b2*x2 + ...\nx_train = np.append(arr=np.ones((len(x_train), 1)).astype(int), values=x_train, axis=1)\n\n# Backward elimination: on each pass, fit OLS and drop the column with the largest p-value\ncol = [0,1,2,3,4,5,6,7,8,9,10,11,12,13]\n\nR_square = []\nfor i in range(len(col)):\n    x_opt = np.array(x_train[:, col], dtype=float)\n    regressor_OLS = sm.OLS(endog=y_train, exog=x_opt).fit()\n    R_square.append(regressor_OLS.rsquared)\n\n    # Keep the summary and column set of the best model seen so far\n    if regressor_OLS.rsquared == max(R_square):\n        summary = regressor_OLS.summary()\n        attribute = col.copy()\n\n    P = regressor_OLS.pvalues.tolist()\n    print(col)\n    col.pop(P.index(max(P)))<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter 
is-resized caption-align-center\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1U9HmL6KYGaAcUbYNmLHQhQ.png\" alt=\"Stepwise Parameter Elimination\" style=\"width:540px;height:319px\" width=\"540\" height=\"319\"\/><figcaption class=\"wp-element-caption\">Stepwise Parameter Elimination<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We plot the R-squared value after each elimination step as a line chart. R-squared measures how much of the variability of the dependent variable a regression model explains: it is the proportion of the variance in the dependent variable that the independent variables account for. Values range from 0 to 1. An R-squared of 0 means the model explains none of the variance, an R-squared of 1 means it explains all of it, and values closer to 1 indicate stronger explanatory power.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>plt.plot(R_square, 'ro--', linewidth=2, markersize=6)\n# Step k is fitted on 14 - k columns, so label the x-axis from 14 down to 1\nplt.xticks(ticks=list(range(14)), labels=list(range(14, 0, -1)))\n\nplt.title(\"change in R_square\")\nplt.xlabel('number of attribute')\nplt.ylabel('R_square')\n\nplt.show()<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized caption-align-center\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1mUj29rt7xxCcFaoT3r1Vqg.png\" alt=\"change in R_squared\" style=\"width:516px;height:348px\" width=\"516\" height=\"348\"\/><figcaption class=\"wp-element-caption\">change in&nbsp;R_squared<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We extract the parameters corresponding 
to the highest R-squared value and print the names of the corresponding columns.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># Column 0 of x_train is the constant added earlier; the feature columns follow in order\nfeature_names = [\"const\"] + list(dataset_regression.iloc[:, 3:].columns)\ncol_map = dict(enumerate(feature_names))\n\nprint(attribute)\nprint([col_map.get(i) for i in attribute])<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center\"><img decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1r3fR6IYosqusyTL9K-MmzA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Code Corresponding to Columns<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The summary statistics for the model with the highest R-squared are shown below.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized caption-align-center\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.tejwin.com\/wp-content\/uploads\/1c6TMVhL6IeFvAKlJBVXljg.png\" alt=\"regressor_OLS summary\" style=\"width:531px;height:762px\" width=\"531\" height=\"762\"\/><figcaption class=\"wp-element-caption\">regressor_OLS summary<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Source_code\"><\/span>Source code<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/gist.github.com\/tej87681088\/1b88c84f3f23aae90ff03f30024ed46e\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">View the source code on GitHub<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In conclusion, backward elimination raised the model&#8217;s R-squared from an initial 46% to 77%. 
By analyzing the retained parameters, we found that employee turnover is primarily influenced by factors such as years of service, salary, and the proportion of employees with education levels below a college degree. These conclusions rest on statistical criteria alone, so the R-squared value should be interpreted in light of the specific context and the characteristics of the model. R-squared helps assess a model&#8217;s goodness of fit and predictive capability and compare the strengths and weaknesses of different models, but it should be used cautiously and combined with other evaluation metrics and expert judgment to understand the model&#8217;s explanatory power and limitations.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Further_reading\"><\/span>Further reading<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a aria-label=\"Rising Female Power! Women Directors&#8217; New Face (opens in a new tab)\" href=\"https:\/\/www.tejwin.com\/en\/insight\/women-in-boardroom-in-2023\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\">Rising Female Power! Women Directors&#8217; New Face<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.tejwin.com\/insight\/%e3%80%90tej%e7%9f%a5%e8%ad%98%e9%9b%86%e3%80%91%e4%bb%80%e9%ba%bc%e6%98%af%e5%85%ac%e5%8f%b8%e6%b2%bb%e7%90%86%e8%a9%95%e9%91%91%ef%bc%9f%e5%b0%8d%e5%85%ac%e5%8f%b8%e6%9c%89%e4%bb%80%e9%ba%bc\/\" class=\"ek-link\">What is Corporate Governance Evaluation? 
What Impact Does It Have on Companies?<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Related_links\"><\/span>Related links<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a aria-label=\"TEJ API Database Homepage (opens in a new tab)\" href=\"https:\/\/api.tej.com.tw\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\">TEJ API Database Homepage<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/Edata_intro\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">TEJ E-Shop Complete Database Purchase<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Employee turnover rate refers to the fluctuation in human resources within a company during a specific period due to employee departures and new hires. This metric is a crucial concept for assessing the stability of both the organizational structure and the workforce within a company. A lower turnover rate indicates that there are relatively fewer personnel changes, reflecting stability and continuity within the organization. Conversely, a higher turnover rate may imply organizational issues, job dissatisfaction, or other factors that can have a negative impact on company operations and the work environment.<\/p>\n<p>Monitoring employee turnover rates helps companies understand and evaluate the effectiveness of their human resource management strategies. It enables them to take appropriate measures to improve employee retention and satisfaction, ensuring long-term stability and growth for the organization. 
Predicting turnover rates allows companies to better plan and manage their human resources, reduce costs, increase talent retention, and enhance organizational effectiveness.<\/p>\n","protected":false},"featured_media":10433,"template":"","tags":[2913,3204],"insight-category":[3651],"class_list":["post-17823","insight","type-insight","status-publish","has-post-thumbnail","hentry","tag-corporate-governance","tag-esg","insight-category-quant-data-science"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/17823","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight"}],"about":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/types\/insight"}],"version-history":[{"count":4,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/17823\/revisions"}],"predecessor-version":[{"id":43950,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight\/17823\/revisions\/43950"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media\/10433"}],"wp:attachment":[{"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/media?parent=17823"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/tags?post=17823"},{"taxonomy":"insight-category","embeddable":true,"href":"https:\/\/www.tejwin.com\/en\/wp-json\/wp\/v2\/insight-category?post=17823"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}