Creating a square function and an n-th power function

In [3]:

# Square function
def my_sq(x):
    return x ** 2

# n-th power function
def my_exp(x, n):
    return x ** n

In [4]:

print(my_sq(4))
print(my_exp(2,4))

16
16

Series and the apply method

In [5]:

# Prepare the DataFrame
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30], 'b': [20, 30, 40]})
df

Out[5]:

	a	b
0	10	20
1	20	30
2	30	40

In [6]:

print(df['a']**2)

0    100
1    400
2    900
Name: a, dtype: int64

In [7]:

# When the function takes exactly one argument, pass only the function name; apply supplies each value as that argument
sq=df['a'].apply(my_sq)
sq

Out[7]:

0    100
1    400
2    900
Name: a, dtype: int64

In [10]:

# Use the two-argument n-th power function my_exp with apply; pass n as a keyword argument
ex=df['a'].apply(my_exp, n=2)
print(ex)
ex=df['a'].apply(my_exp, n=3)
print(ex)

0    100
1    400
2    900
Name: a, dtype: int64
0     1000
1     8000
2    27000
Name: a, dtype: int64
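
The extra argument `n` can also be passed positionally through the `args` parameter of `Series.apply`. The sketch below is a minimal illustration: it rebuilds the same values in a Series called `s` (a name not used in the original) and compares both calling styles with the vectorized `**` operator, which gives the same result without `apply`.

```python
import pandas as pd

def my_exp(x, n):
    return x ** n

# Same values as df['a'] in the cells above; `s` is an illustrative name
s = pd.Series([10, 20, 30], name='a')

by_keyword = s.apply(my_exp, n=3)         # extra argument passed as a keyword
by_position = s.apply(my_exp, args=(3,))  # extra argument passed as a positional tuple
vectorized = s ** 3                       # element-wise power without apply

print(by_keyword.tolist())
print(by_position.tolist())
print(vectorized.tolist())
```

For simple arithmetic like this, the vectorized form is usually the shorter choice; `apply` is more useful when the function cannot be expressed as an element-wise operator.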

DataFrame and the apply method

In [11]:

df=pd.DataFrame({'a': [10,20,30], 'b':[20,30,40]})
print(df)

In [13]:

# Function that takes one value and prints it
def print_me(x):
    print(x)

In [14]:

print(df.apply(print_me, axis=0))

0    10
1    20
2    30
Name: a, dtype: int64
0    20
1    30
2    40
Name: b, dtype: int64
a    None
b    None
dtype: object
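
The output above shows that `apply` passes a whole column to the function, and that the result holds `None` because `print_me` returns nothing. The following sketch makes the role of `axis` explicit by returning a short description of what the function receives. The helper name `describe` is hypothetical and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30], 'b': [20, 30, 40]})

def describe(x):
    # x is a Series: its name is the column label (axis=0) or the row label (axis=1)
    return f"{type(x).__name__} named {x.name} with {len(x)} values"

per_column = df.apply(describe, axis=0)  # function is called once per column
per_row = df.apply(describe, axis=1)     # function is called once per row

print(per_column)
print(per_row)
```

With `axis=0` the function is called twice, once for each column of three values. With `axis=1` it is called three times, once for each row of two values.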

In [15]:

print(df['a'])
print(df['b'])

0    10
1    20
2    30
Name: a, dtype: int64
0    20
1    30
2    40
Name: b, dtype: int64

In [16]:

# Function that takes three arguments and computes their mean
def avg_3(x, y, z):
    return (x + y + z) / 3

In [17]:

# apply passes each column to avg_3 as a single argument, so this call raises a TypeError: avg_3 needs three arguments but receives one. The function must be rewritten to process a whole column.
print(df.apply(avg_3))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Users\Public\Documents\ESTsoft\CreatorTemp\ipykernel_31508\4003087714.py in <module>
      1 # apply passes each column to avg_3 as a single argument, so this call raises a TypeError: avg_3 needs three arguments but receives one. The function must be rewritten to process a whole column.
----> 2 print(df.apply(avg_3))

~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwargs)
   8846             kwargs=kwargs,
   8847         )
-> 8848         return op.apply().__finalize__(self, method="apply")
   8849 
   8850     def applymap(

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply(self)
    731             return self.apply_raw()
    732 
--> 733         return self.apply_standard()
    734 
    735     def agg(self):

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    855 
    856     def apply_standard(self):
--> 857         results, res_index = self.apply_series_generator()
    858 
    859         # wrap results

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    871             for i, v in enumerate(series_gen):
    872                 # ignore SettingWithCopy here in case the user mutates
--> 873                 results[i] = self.f(v)
    874                 if isinstance(results[i], ABCSeries):
    875                     # If we have a view on v, we need to make a copy because

TypeError: avg_3() missing 2 required positional arguments: 'y' and 'z'

In [18]:

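# Column-wise version: receives a whole column and reads its three values
# (col[0], col[1], col[2] look up labels 0, 1, 2 of the default index)
def avg_3_apply(col):
    x = col[0]
    y = col[1]
    z = col[2]
    return (x + y + z) / 3
print(df.apply(avg_3_apply))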

a    20.0
b    30.0
dtype: float64
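
As an alternative sketch, the original three-argument `avg_3` can be kept unchanged if the column is unpacked when it is passed in. This assumes every column holds exactly three values, as in this example; with any other length the call would raise a `TypeError` again.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30], 'b': [20, 30, 40]})

def avg_3(x, y, z):
    return (x + y + z) / 3

# apply hands each column to the lambda as one Series;
# the * operator unpacks its three values into x, y and z.
result = df.apply(lambda col: avg_3(*col))
print(result)
```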

In [20]:

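# The previous version assumes each column holds exactly 3 values
# In general, use a for loop instead
def avg_3_apply(col):
    total = 0  # named total to avoid shadowing the built-in sum
    for item in col:
        total += item
    return total / df.shape[0]  # df.shape[0] is the number of rows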

In [23]:

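# Adapting the function above gives one that processes data row-wise: in the final return statement, df.shape[0] becomes df.shape[1]
def avg_2_apply(row):
    total = 0  # named total to avoid shadowing the built-in sum
    for item in row:
        total += item
    return total / df.shape[1]  # df.shape[1] is the number of columns

print(df.apply(avg_2_apply, axis=1))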

0    15.0
1    25.0
2    35.0
dtype: float64
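
The hand-written loops are useful for seeing what `apply` passes to a function, but pandas already provides the same calculation. The short sketch below reproduces both results with `DataFrame.mean`; the variable names `col_means` and `row_means` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30], 'b': [20, 30, 40]})

col_means = df.mean(axis=0)  # one mean per column, like avg_3_apply
row_means = df.mean(axis=1)  # one mean per row, like avg_2_apply

print(col_means)
print(row_means)
```

One difference to keep in mind: `mean` skips missing values by default, whereas the loop versions would return NaN for any column or row that contains one.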

Using the apply method to process missing values in a DataFrame - column-wise

In [28]:

import seaborn as sns
titanic=sns.load_dataset("titanic")

titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   survived     891 non-null    int64   
 1   pclass       891 non-null    int64   
 2   sex          891 non-null    object  
 3   age          714 non-null    float64 
 4   sibsp        891 non-null    int64   
 5   parch        891 non-null    int64   
 6   fare         891 non-null    float64 
 7   embarked     889 non-null    object  
 8   class        891 non-null    category
 9   who          891 non-null    object  
 10  adult_male   891 non-null    bool    
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object  
 13  alive        891 non-null    object  
 14  alone        891 non-null    bool    
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB

In [30]:

# count_missing returns the number of missing values
import numpy as np

def count_missing(vec):
    null_vec=pd.isnull(vec)
    null_count=np.sum(null_vec)
    return null_count

In [31]:

# Pass the count_missing function to the apply method
cmis_col=titanic.apply(count_missing)
cmis_col

Out[31]:

survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
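
The same per-column counts can be obtained without a custom function by chaining `isnull()` and `sum()`. To keep the sketch self-contained and free of a dataset download, it uses a small hypothetical DataFrame named `toy` in place of `titanic`; the values in it are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the titanic DataFrame
toy = pd.DataFrame({
    'age': [22.0, np.nan, 26.0, np.nan],
    'deck': ['C', None, None, None],
    'fare': [7.25, 71.28, 7.92, 53.10],
})

def count_missing(vec):
    return np.sum(pd.isnull(vec))

with_apply = toy.apply(count_missing)  # custom function applied to each column
built_in = toy.isnull().sum()          # boolean mask summed per column

print(with_apply)
print(built_in)
```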

In [33]:

# prop_missing computes the proportion of missing values: count the missing values, then divide by the number of elements in vec (its size attribute)
def prop_missing(vec):
    num=count_missing(vec)
    dem=vec.size
    return num/dem

pmis_col=titanic.apply(prop_missing)
print(pmis_col)

survived       0.000000
pclass         0.000000
sex            0.000000
age            0.198653
sibsp          0.000000
parch          0.000000
fare           0.000000
embarked       0.002245
class          0.000000
who            0.000000
adult_male     0.000000
deck           0.772166
embark_town    0.002245
alive          0.000000
alone          0.000000
dtype: float64

In [35]:

# prop_complete returns the proportion of non-missing values
def prop_complete(vec):
    return 1 - prop_missing(vec)
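
Because the mean of a boolean mask is the fraction of `True` values, both proportions can also be sketched with built-in methods. The DataFrame `toy` below is hypothetical sample data, not the titanic dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the titanic DataFrame
toy = pd.DataFrame({
    'age': [22.0, np.nan, 26.0, np.nan],
    'deck': ['C', None, None, None],
    'fare': [7.25, 71.28, 7.92, 53.10],
})

prop_missing_builtin = toy.isnull().mean()    # fraction of missing values per column
prop_complete_builtin = toy.notnull().mean()  # fraction of present values per column

print(prop_missing_builtin)
print(prop_complete_builtin)
```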

Using the apply method to process missing values in a DataFrame - row-wise

In [37]:

cmis_row=titanic.apply(count_missing, axis=1)
pmis_row=titanic.apply(prop_missing, axis=1)
pcom_row=titanic.apply(prop_complete, axis=1)

print(cmis_row.head())
print(pmis_row.head())
print(pcom_row.head())

0    1
1    0
2    1
3    0
4    1
dtype: int64
0    0.066667
1    0.000000
2    0.066667
3    0.000000
4    0.066667
dtype: float64
0    0.933333
1    1.000000
2    0.933333
3    1.000000
4    0.933333
dtype: float64
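
The row-wise results follow the same pattern with `axis=1`. The sketch below, again on a hypothetical `toy` DataFrame rather than the titanic data, counts missing values per row and divides by the number of columns through `mean`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the titanic DataFrame
toy = pd.DataFrame({
    'age': [22.0, np.nan, 26.0, np.nan],
    'deck': ['C', None, None, None],
    'fare': [7.25, 71.28, 7.92, 53.10],
})

missing_per_row = toy.isnull().sum(axis=1)   # number of missing values in each row
prop_per_row = toy.isnull().mean(axis=1)     # missing values / number of columns

print(missing_per_row)
print(prop_per_row)
```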

In [38]:

# Count the missing values in each row and add the result to the titanic DataFrame as a new num_missing column
titanic['num_missing']=titanic.apply(count_missing, axis=1)

titanic.head()

Out[38]:

	survived	pclass	sex	age	sibsp	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone	num_missing
0	0	3	male	22.0	1	7.2500	S	Third	man	True	NaN	Southampton	no	False	1
1	1	1	female	38.0	1	71.2833	C	First	woman	False	C	Cherbourg	yes	False	0
2	1	3	female	26.0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True	1
3	1	1	female	35.0	1	53.1000	S	First	woman	False	C	Southampton	yes	False	0
4	0	3	male	35.0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True	1

In [39]:

# Sample 10 rows that have more than one missing value
titanic.loc[titanic.num_missing > 1,:].sample(10)

Out[39]:

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone	num_missing
468	0	3	male	NaN	0	0	7.7250	Q	Third	man	True	NaN	Queenstown	no	True	2
593	0	3	female	NaN	0	2	7.7500	Q	Third	woman	False	NaN	Queenstown	no	False	2
648	0	3	male	NaN	0	0	7.5500	S	Third	man	True	NaN	Southampton	no	True	2
229	0	3	female	NaN	3	1	25.4667	S	Third	woman	False	NaN	Southampton	no	False	2
790	0	3	male	NaN	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True	2
36	1	3	male	NaN	0	0	7.2292	C	Third	man	True	NaN	Cherbourg	yes	True	2
643	1	3	male	NaN	0	0	56.4958	S	Third	man	True	NaN	Southampton	yes	True	2
650	0	3	male	NaN	0	0	7.8958	S	Third	man	True	NaN	Southampton	no	True	2
602	0	1	male	NaN	0	0	42.4000	S	First	man	True	NaN	Southampton	no	True	2
639	0	3	male	NaN	1	0	16.1000	S	Third	man	True	NaN	Southampton	no	False	2
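
Since `sample` draws rows at random, the rows shown above change on every run. A minimal sketch of a reproducible version passes `random_state`. It uses a hypothetical `toy` DataFrame, and the seed value 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the titanic DataFrame
toy = pd.DataFrame({
    'age': [22.0, np.nan, 26.0, np.nan],
    'deck': ['C', None, None, None],
    'fare': [7.25, 71.28, 7.92, 53.10],
})
toy['num_missing'] = toy.isnull().sum(axis=1)

# Keep only rows with more than one missing value
subset = toy.loc[toy['num_missing'] > 1, :]

# sample raises an error if n exceeds the number of rows, so cap it
picked = subset.sample(n=min(2, len(subset)), random_state=0)
print(picked)
```

Bracket access (`toy['num_missing']`) is used here instead of attribute access; both select the same column, but brackets also work for names that are not valid Python identifiers.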

Pandas Chapter 10 - Using the apply method
