Mar 22 2021

Using NumPy and Pandas to take your first step in data analysis

After reading our previous articles, you may already know how to get data from the TEJ API, store it on your computer, and update it automatically! Now we are going to show you how to analyze this data using two important packages: Numpy and Pandas.

Table of Contents

- Numpy Intro/Application
- Pandas Intro/Application

- API Official Website: TEJ API Official Website
- The Product Package: TEJ E SHOP
- Source Code: TEJ GITHUB

Numpy is designed to process large, n-dimensional data arrays conveniently and efficiently. With its built-in functions, users can perform preliminary data processing quickly.

**Basic Application－Single Dimension**

```
import numpy as np
a = np.array([0, 0.5, 1.0, 1.5, 2.0]) #float ndarray -1-1.
b = np.array(['a', 'b', 'c']) #string ndarray -1-2.
c = np.arange(0, 10, 2) #array([0, 2, 4, 6, 8]) -2.
c[2:] #array([4, 6, 8]) -3.
c[:2] #array([0, 2]) -4.
```

Examples above:

- Create an array of floats; an array of strings
- Use np.arange() to create an array that starts at 0, stops before 10 (the stop value is excluded), and has a step of 2.
- In Python, **"[]" means select, and ":" means "to" (a slice).** Note that **the first element is at position 0, not 1.** Therefore, **c[2:] selects the elements from position 2 to the end (including the last element).**
- Same as above, but if we change **c[2:] to c[:2]**, it selects **from the start to position 1 (position 2 is not included)!!**
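To see both slicing rules at once, here is a small sketch that rebuilds the same array c and checks that np.arange() leaves out its stop value and that c[:2] and c[2:] split the array at position 2 without overlapping. The names head and tail are only for illustration.

```python
import numpy as np

c = np.arange(0, 10, 2)   # 0, 2, 4, 6, 8: the stop value 10 is excluded
head = c[:2]              # positions 0 and 1
tail = c[2:]              # position 2 through the last element

print(head, tail)         # [0 2] [4 6 8]

# Joining the two slices gives back the original array,
# because position 2 belongs to exactly one of them.
print(np.array_equal(np.concatenate([head, tail]), c))  # True
```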

**Mathematical Tools**

```
a = np.arange(0, 30, 2) #array([0, 2, 4, ..., 28])
a.sum() #210 -1-1.
a.mean() #14.0 -1-2.
a.std() #8.640987 -1-3.
a.cumsum() #array([0, 2, 6, 12, ...,210]) -1-4.
lst = [0, 2, 4]
lst * 2 #[0, 2, 4, 0, 2, 4] -2-1.
a+a #array([0, 4, 8, ..., 56]) -2-2.
a*a #array([0, 4, 16, ..., 784]) -2-3.
```

Examples above:

- The sum of array a; average; standard deviation; cumulative sum
- Multiplying a list by 2; adding the elements of array a at corresponding positions; multiplying them at corresponding positions

The first example uses numpy's built-in functions for calculations. The second example shows numpy's vectorized computation. If we multiply a list (2–1) by 2, **the list is repeated, so the number of elements doubles instead of the values.** With a numpy array (2–2, 2–3), however, arithmetic is applied **element-wise, to the elements at corresponding positions** in the array~💪💪
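A quick side-by-side sketch makes the difference concrete. It converts the same small list into an array and also shows the explicit loop (written as a list comprehension) that a plain list needs to get the element-wise result; numpy runs the equivalent loop internally in compiled code.

```python
import numpy as np

lst = [0, 2, 4]
arr = np.array(lst)

print(lst * 2)    # [0, 2, 4, 0, 2, 4]  the list is repeated
print(arr * 2)    # [0 4 8]             each element is doubled

# Getting the element-wise result from a plain list needs an explicit loop
doubled = [x * 2 for x in lst]
print(doubled)    # [0, 4, 8]
```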

**Basic Application－Multiple Dimensions**

```
b = np.array([a, a*2]) #array([[0, 2, 4, ..., 28],
                       #       [0, 4, 8, ..., 56]])
b[0] #array([0, 2, 4, ..., 28]) -1-1.
b[0][1] #2 -1-2.
b.sum(axis = 1) #array([210, 420]) -1-3.
b.shape #(2, 15) -2-1.
b.reshape(5, 6) #array([[ 0,  2,  4,  6,  8, 10], -2-2.
                #       ...
                #       [36, 40, ..., 56]])
```

Examples above:

- Select the first row of array b; select the second element of the first row of the array b; row sum of array b
- Shape(2*15) of array b; change to a new shape(2*15 -> 5*6)

Next, let's look at how numpy handles multi-dimensional arrays. We again use "[]" to select; the difference is that there are more dimensions to select from, so we can **use two brackets "[][]" to select the row first and then the position within that row.** For matrix operations, we can use **the shape attribute and the reshape() function** in numpy to check the current shape and change it to the one the calculation needs~💪💪
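The sketch below rebuilds the 2×15 array b from the example to show three things: b[0][1] and the single-step form b[0, 1] pick the same element, axis=1 sums across each row while axis=0 sums down each column, and reshape() only works when the new shape holds the same number of elements (2 × 15 = 30).

```python
import numpy as np

a = np.arange(0, 30, 2)
b = np.array([a, a * 2])          # shape (2, 15): 2 rows, 15 columns

# Row 0 first, then position 1 in that row; b[0, 1] does it in one step
print(b[0][1], b[0, 1])           # 2 2

print(b.sum(axis=1))              # one sum per row: [210 420]
print(b.sum(axis=0)[:3])          # first three column sums: 0, 6, 12

print(b.reshape(5, 6).shape)      # (5, 6): still 30 elements

try:
    b.reshape(4, 7)               # 28 slots cannot hold 30 elements
except ValueError as err:
    print("cannot reshape:", err)
```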

**Other Applications－Boolean, Random Variables, Financial Functions**

```
#boolean
b > 15 #array([[False, False, ..., True], -1-1.
       #       [False, False, ..., True]])
np.where(b > 15, 1, 0) #array([[0, 0, ..., 1], -1-2.
                       #       [0, 0, ..., 1]])
#random variable
np.random.normal(5, 2, 10) #-2-1.
np.random.standard_normal(5) #-2-2.
#financial (install first from a terminal: pip install numpy-financial)
import numpy_financial as npf
npf.fv(0.03, 5, 0, -1000) #1159.27 -3-1.
#fv(rate, nper, pmt, pv)
npf.irr([-95, 3, 3, 3, 103]) #0.0439 -3-2.
#irr(values)
```

Examples above:

- **Boolean:** We can **apply a comparison directly to a numpy array** (greater than 15 in the example) to get the corresponding True/False array, or use np.where() to map the result to new values (True becomes 1 and False becomes 0 in the example).
- **Random Variables:** We can generate random numbers from various statistical distributions, such as the **normal distribution** in the example (mean 5, standard deviation 2, 10 elements) and the **standard normal distribution**.
- **Financial Functions:** Financial functions such as fv, pv, and irr, which are used for discounting, live in numpy-financial, a package split off from numpy, so it has to be installed separately. All functions in this package are listed **HERE**~.
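The two financial results can be checked by hand, which also shows what the functions compute. With no periodic payment, fv just compounds the present value, FV = PV × (1 + r)^n (numpy-financial takes the pv as a negative outflow and returns a positive fv). irr is the rate at which the net present value of the cash flows is zero. The sketch below reproduces both with plain numpy; npv and irr_bisect are hypothetical helper names, and bisection is a simple stand-in search, not how numpy-financial implements irr.

```python
import numpy as np

# Future value of a single deposit: FV = PV * (1 + r) ** n
rate, nper, pv = 0.03, 5, 1000
fv = pv * (1 + rate) ** nper
print(round(fv, 2))                  # 1159.27

def npv(r, cash_flows):
    """Net present value, with the first cash flow at time 0."""
    t = np.arange(len(cash_flows))
    return np.sum(np.asarray(cash_flows, dtype=float) / (1 + r) ** t)

def irr_bisect(cash_flows, lo=-0.99, hi=1.0, tol=1e-10):
    """Hypothetical helper: find the rate where NPV = 0 by bisection.
    Assumes NPV changes sign exactly once between lo and hi."""
    f_lo = npv(lo, cash_flows)
    for _ in range(200):
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol:
            break
        if np.sign(f_mid) == np.sign(f_lo):
            lo, f_lo = mid, f_mid    # root lies in the upper half
        else:
            hi = mid                 # root lies in the lower half
    return mid

r = irr_bisect([-95, 3, 3, 3, 103])
print(round(r, 4))                   # 0.0439
```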

Numpy has many applications in data processing, so it is very difficult to cover all of them in just one article😢. If you are interested in numpy, you can visit the **Numpy Official Website** or leave a message below!💪💪

Pandas is a package that specializes in analyzing tabular data. Much like Excel, it presents data in a table format called a DataFrame, which makes analysis more convenient, especially for financial time series data.

**Basic Application**

```
import pandas as pd
df = pd.DataFrame([1, 2, 3, 4],
columns = ['Numbers'],
index = ['index_a','index_b','index_c','index_d'])
```

The code above **creates a table with the column name "Numbers" and the row names "index_a", "index_b", "index_c", and "index_d".**

```
df.loc['index_a'] #Numbers 1 -1-1.
df.iloc[0:2] #rows index_a and index_b -1-2.
df * 2 #same as numpy -1-3.
#add "Name" column
df['Name'] = ['Amy', 'Bob', 'Catherine', 'Duke'] #-2-1.
#select whole column
df['Numbers'] #-2-2.
#delete column (returns a new DataFrame; df itself is unchanged)
df.drop('Name', axis=1) #-2-3.
#Math
df['Numbers'].sum() #10 -3-1.
df['Numbers'].mean() #2.5 -3-2.
df['Numbers'].std() #1.291 -3-3.
```

Examples above:

- Use loc and iloc to select values. Note that **loc selects by row/column label,** so we have to enter the name when selecting, while **iloc selects by integer position. For example (1–2), iloc[0:2] selects the rows from the start to position 1 (position 2 is not included!).**
- Add a column; select a column; delete a column
- Sum of the "Numbers" column; average; standard deviation

Like the numpy arrays mentioned earlier, Pandas also uses brackets, **["column name"], to select or add columns.** To delete a column, however, we use the drop() function. Note that drop() returns a new DataFrame, so assign the result (df = df.drop(...)) if you want to keep the change. Pandas DataFrames can also perform basic statistical calculations directly on the table~💪💪
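One detail worth a quick check is that the two selectors treat slice ends differently: an iloc position slice stops before its end, while a loc label slice includes its end label. The sketch below rebuilds the same small table and also confirms that drop() leaves the original DataFrame untouched; the variable name dropped is only for illustration.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4],
                  columns=['Numbers'],
                  index=['index_a', 'index_b', 'index_c', 'index_d'])

# iloc uses positions: 0:2 stops before position 2
print(df.iloc[0:2].index.tolist())                  # ['index_a', 'index_b']
# loc uses labels: a label slice includes the end label
print(df.loc['index_a':'index_c'].index.tolist())   # ['index_a', 'index_b', 'index_c']

df['Name'] = ['Amy', 'Bob', 'Catherine', 'Duke']
dropped = df.drop('Name', axis=1)   # new DataFrame without "Name"
print(df.columns.tolist())          # ['Numbers', 'Name']  df is unchanged
print(dropped.columns.tolist())     # ['Numbers']
```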

**Basic Data Analysis**

```
import numpy as np
import tejapi
tejapi.ApiConfig.api_key = "YOUR_API_KEY"
df = tejapi.get('TWN/EWPRCD',
                coid = ['2330'],
                mdate={'gte':'2020-01-01', 'lte':'2020-12-31'},
                opts={'columns': ['mdate','open_d','high_d','low_d','close_d']},
                paginate=True
                )
#use the trading date as the index (the x-axis when plotting)
df = df.set_index('mdate')
#Math
df.describe()
np.mean(df, axis=0) #mean of each column
np.log(df)
#Plot
df['close_d'].plot()
```

The sample data we use for the pandas analysis is **the daily stock price data of 2330.TW retrieved from the TEJ API.** Most of the summary statistics we may need later can be obtained with the **describe() function** (figure above👆). If we want to perform operations on these values, we can **apply numpy functions directly** to the entire table!
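Running the TEJ query needs an API key, so the sketch below substitutes a tiny, made-up price table with the same column names (the numbers are illustrative only, not real 2330.TW prices). It shows that describe() returns a table of statistics you can index like any DataFrame, and that a numpy function such as np.log is applied element-wise while keeping the row and column labels, which makes a daily log return of the closing price a one-liner.

```python
import numpy as np
import pandas as pd

# Made-up prices for illustration only; same column names as the TEJ query
df = pd.DataFrame(
    {'open_d':  [100.0, 102.0, 101.0, 104.0],
     'high_d':  [103.0, 104.0, 105.0, 106.0],
     'low_d':   [99.0, 100.0, 100.0, 103.0],
     'close_d': [102.0, 101.0, 104.0, 105.0]},
    index=pd.to_datetime(['2020-01-02', '2020-01-03',
                          '2020-01-06', '2020-01-07']),
)

summary = df.describe()          # count, mean, std, min, quartiles, max
print(summary.loc[['mean', 'std'], 'close_d'])

# numpy functions work element-wise on the whole table and keep the labels
log_prices = np.log(df)

# Daily log return of the closing price (the first day has no previous close)
log_ret = log_prices['close_d'].diff()
print(log_ret.round(4))
```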

Finally, data visualization. There are several ways to plot graphs in Python, and Pandas provides a very easy one! If the chart we want is **not complicated**, such as a simple daily stock price or daily return chart, we can **select the column and call the plot() function to see the result directly!** (figure above👆)

The only thing to note here is that **the X axis of the chart is the index and the Y axis is the data you select.** That's why **we first call set_index() on the raw data.**
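Here is a minimal sketch of why set_index() matters for the X axis, assuming the raw result keeps the dates in an ordinary column named mdate (the three rows are hypothetical). Before set_index() the index is just 0, 1, 2, so that is what a plot would put on the X axis; afterwards the dates take its place. The plot call itself is left as a comment because it needs matplotlib installed.

```python
import pandas as pd

# Hypothetical raw data: dates sit in an ordinary column, not the index
raw = pd.DataFrame({
    'mdate':   pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
    'close_d': [102.0, 101.0, 104.0],
})

# Default index: raw['close_d'].plot() would use 0, 1, 2 as the X axis
print(raw.index.tolist())        # [0, 1, 2]

df = raw.set_index('mdate')      # the dates become the index, i.e. the X axis
print(df.index.name)             # mdate

# df['close_d'].plot()           # needs matplotlib; X = dates, Y = close_d
```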

This time we shared how to use the Numpy and Pandas packages for data analysis. However, it is very difficult to explain every function in these two packages. If you have any questions or are interested in any topic, visit their official websites or leave a message below ❗️❗️ In the next article, we will **go further into financial data analysis and applications**, so stay tuned ❗️❗️

Finally, if you like this topic, please click 👏 below to give us more support and encouragement. If you have any questions or suggestions, please leave a message or email us, and we will try our best to reply.👍👍

