Pandas

Pandas Chapter 9: Working with Strings

막막한 2023. 3. 13. 19:51

Extracting characters from a string

In [2]:

# Create the strings 'grail' and 'a scratch' and store them in word and sent
word= 'grail'
sent= 'a scratch'

word[0], sent[0]

Out[2]:

('g', 'a')

In [3]:

# Slicing extracts several characters at once: [0:3] returns indices 0 to 2 (the end index is excluded)
word[0:3]

Out[3]:

'gra'

In [4]:

# Index -1 extracts the last character; negative indices count back from the end of the string
sent[-1], sent[-9:-8], sent[0:-8]

Out[4]:

('h', 'a', 'a')
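A small sketch of two indexing rules implied above, reusing the same `sent` string. A negative index `-i` is shorthand for `len(sent) - i`. A single out-of-range index raises an error, but out-of-range slice bounds are silently clamped.

```python
sent = 'a scratch'

# A negative index -i points at the same character as len(sent) - i
assert sent[-1] == sent[len(sent) - 1] == 'h'
assert sent[-9] == sent[0] == 'a'

# Indexing past the end raises IndexError...
try:
    sent[100]
except IndexError as err:
    print('IndexError:', err)   # IndexError: string index out of range

# ...but slice bounds past the end are clamped, so no error is raised
print(repr(sent[7:100]))   # 'ch'
print(repr(sent[100:]))    # ''
```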

Extracting the rest of the string or the whole string

In [7]:

# Omitting the end index slices to the end of the string; omitting both returns the whole string
sent[2:len(sent)], sent[2:], sent[:]

Out[7]:

('scratch', 'scratch', 'a scratch')

In [8]:

# To take characters at a fixed interval, add a step: [::2] takes every second character
sent[::2]

Out[8]:

'asrth'
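The step can be combined with a start index, and a negative step walks the string backwards. The indices here are picked only for illustration.

```python
sent = 'a scratch'

# General form: s[start:stop:step]
print(sent[2::2])     # srth  (indices 2, 4, 6, 8)
print(sent[::-1])     # hctarcs a  (a step of -1 reverses the string)
print(sent[-1:-4:-1]) # hct  (indices 8, 7, 6)
```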

In [ ]:

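'''
<String methods>
capitalize - returns a copy with the first character uppercased and the rest lowercased
count - returns the number of non-overlapping occurrences of a substring
startswith - True if the string starts with the given prefix
endswith - True if the string ends with the given suffix
find - returns the index of the first occurrence of a substring; returns -1 if not found
index - same as find, but raises ValueError if not found
isalpha - True if all characters are alphabetic
isdecimal - True if all characters are decimal digits
isalnum - True if all characters are alphabetic or digits
lower - converts all characters to lowercase
upper - converts all characters to uppercase
replace - replaces occurrences of a substring with another string
strip - removes whitespace from the start and end of the string
split - splits the string on a separator and returns a list of the pieces
partition - like split, but splits only at the first separator and returns a 3-tuple (before, separator, after)
center - pads the string to the given width, centering it
zfill - pads the string on the left with zeros up to the given width

'''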
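A quick tour of the methods in the list above. The sample strings are made up for illustration.

```python
text = 'black knight'

print(text.capitalize())                 # Black knight
print(text.count('k'))                   # 2
print(text.startswith('bl'))             # True
print(text.endswith('ght'))              # True
print(text.find('k'), text.find('x'))    # 4 -1
try:
    text.index('x')                      # index raises instead of returning -1
except ValueError as err:
    print('ValueError:', err)            # ValueError: substring not found
print('abc'.isalpha(), '123'.isdecimal(), 'abc123'.isalnum(), 'abc 123'.isalnum())
# True True True False  (the space makes the last one False)
print(text.upper(), 'BLACK'.lower())     # BLACK KNIGHT black
print(text.replace('black', 'white'))    # white knight
print(repr('  flesh wound  '.strip()))   # 'flesh wound'
print('40,46,52'.split(','))             # ['40', '46', '52']
print('40,46,52'.partition(','))         # ('40', ',', '46,52')
print('42'.center(6, '*'))               # **42**
print('42'.zfill(5))                     # 00042
```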

Practicing the join, splitlines, and replace methods

In [9]:

# join: concatenates the strings in a list into a new string, inserting the string it is called on (here ' ') between the items
d1= '40'
m1="46'"
s1='52.837"'
u1='N'

d2= '73'
m2="58'"
s2='26.837"'
u2='W'

coords=' '.join([d1, m1, s1, u1, d2, m2, s2, u2])
coords

Out[9]:

'40 46\' 52.837" N 73 58\' 26.837" W'
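`split` undoes `join` when you split on the same separator. `join` accepts only strings, so numbers have to be converted first. The list `nums` is just illustrative.

```python
coords = '40 46\' 52.837" N 73 58\' 26.837" W'

# split breaks the string back into its pieces on the separator
parts = coords.split(' ')
print(len(parts))                  # 8
print(' '.join(parts) == coords)   # True

# ', '.join([40, 46]) would raise TypeError, so convert each item with str first
nums = [40, 46, 52.837]
print(', '.join(map(str, nums)))   # 40, 46, 52.837
```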

In [11]:

# splitlines: splits a multi-line string at line breaks and returns the lines as a list
multi_str=""" Guard: What? Ridden on a horse?
King Arthur: Yes!
Guard: You're using coconuts!
King Arthur: What?
Guard: You've got ...coconuts[s] and you're bangin' 'em together.
"""

multi_str

Out[11]:

" Guard: What? Ridden on a horse?\nKing Arthur: Yes!\nGuard: You're using coconuts!\nKing Arthur: What?\nGuard: You've got ...coconuts[s] and you're bangin' 'em together.\n"

In [12]:

multi_str_split= multi_str.splitlines()
multi_str_split

Out[12]:

[' Guard: What? Ridden on a horse?',
 'King Arthur: Yes!',
 "Guard: You're using coconuts!",
 'King Arthur: What?',
 "Guard: You've got ...coconuts[s] and you're bangin' 'em together."]
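Splitting on `'\n'` is not the same as `splitlines()`, which is why the list above has no trailing empty string even though `multi_str` ends with a newline. The example text is made up.

```python
text = 'line one\nline two\n'

print(text.split('\n'))                # ['line one', 'line two', '']  (trailing empty piece)
print(text.splitlines())               # ['line one', 'line two']
print(text.splitlines(keepends=True))  # ['line one\n', 'line two\n']
print('a\r\nb'.splitlines())           # ['a', 'b']  (Windows line endings are handled too)
```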

In [13]:

# Slicing with a step keeps only the Guard's lines (every other line)
guard= multi_str_split[::2]
guard

Out[13]:

[' Guard: What? Ridden on a horse?',
 "Guard: You're using coconuts!",
 "Guard: You've got ...coconuts[s] and you're bangin' 'em together."]

In [14]:

# replace: remove the 'Guard: ' prefix, then split into lines and keep every other one
guard=multi_str.replace("Guard: ", "").splitlines()[::2]
guard

Out[14]:

[' What? Ridden on a horse?',
 "You're using coconuts!",
 "You've got ...coconuts[s] and you're bangin' 'em together."]
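Two details about the result above. The first line keeps a leading space because the original text starts with `' Guard: '`. Also, `replace` accepts an optional count. The shortened dialogue below is only for illustration.

```python
multi_str = """ Guard: What? Ridden on a horse?
King Arthur: Yes!
Guard: You're using coconuts!
"""

# The optional third argument limits how many occurrences are replaced
print('Guard: Guard: hi'.replace('Guard: ', '', 1))   # Guard: hi

# Strip each kept line to drop the leftover leading space
lines = [line.strip() for line in multi_str.replace('Guard: ', '').splitlines()[::2]]
print(lines)   # ['What? Ridden on a horse?', "You're using coconuts!"]
```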

String formatting practice

In [2]:

var='flesh wound'
s="It's just a {}! "

print(s.format(var))
print(s.format("scratch"))

It's just a flesh wound! 
It's just a scratch!

In [3]:

s= """
Black Knight: 'Tis but a {0}.
King Arthur: A {0}? Your arm's off!
"""

print(s.format('scratch'))

Black Knight: 'Tis but a scratch.
King Arthur: A scratch? Your arm's off!
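Numbered placeholders like `{0}` can be repeated and reordered. Python does not let you mix automatic `{}` and numbered `{0}` fields in one string. The words are placeholders chosen for the sketch.

```python
# Numbered placeholders can be repeated and reordered
print('{0}, {1}, {0}!'.format('spam', 'eggs'))     # spam, eggs, spam!
print('{1} before {0}'.format('first', 'second'))  # second before first

# Mixing automatic {} and manual {0} numbering raises ValueError
try:
    '{} and {0}'.format('a', 'b')
except ValueError:
    print('cannot mix {} and {0} in one string')
```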

In [4]:

# Placeholders can be given names;
# the values are then passed to format() as keyword arguments with the same names

s= ' Hayden Planetarium Coordinates: {lat}, {lon}'

print(s.format(lat='40.7815 N', lon='73.9733 W'))

 Hayden Planetarium Coordinates: 40.7815 N, 73.9733 W
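If the values already live in a dict, one option is to unpack it with `**`, or to pass it to `str.format_map`. The dict `coords` is an assumption made for this sketch. Its keys must match the placeholder names, otherwise a `KeyError` is raised.

```python
s = 'Hayden Planetarium Coordinates: {lat}, {lon}'
coords = {'lat': '40.7815 N', 'lon': '73.9733 W'}

# ** unpacks the dict into keyword arguments lat=... and lon=...
print(s.format(**coords))    # Hayden Planetarium Coordinates: 40.7815 N, 73.9733 W

# format_map takes the mapping directly
print(s.format_map(coords))  # same output
```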

Formatting numeric values

In [5]:

# Numbers can be formatted too
print('some digits of pi:{}'.format(3.141592))

some digits of pi:3.141592

In [7]:

# Putting :, inside {} inserts commas as thousands separators
print("In 2005, Lu Chao of China recited {:,} digits of pi".format(67890))

In 2005, Lu Chao of China recited 67,890 digits of pi
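The thousands separator can also be an underscore, and it combines with a precision. The numbers are arbitrary.

```python
print('{:,}'.format(1234567))          # 1,234,567
print('{:_}'.format(1234567))          # 1_234_567
print('{:,.2f}'.format(1234567.891))   # 1,234,567.89  (separator plus 2 decimal places)
```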

In [8]:

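# Floats offer more options. With no type code, .4 means 4 significant digits;
# with %, the value is multiplied by 100 and shown with 4 decimal places.
# The space after : is the sign option, which adds a leading space to positive numbers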
print(" I remember {0: .4} or {0: .4%} of what Lu Chao recited".format(7/67890))

 I remember  0.0001031 or  0.0103% of what Lu Chao recited
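The meaning of the precision `.4` depends on the type code that follows it. This sketch applies several type codes to the same value.

```python
x = 7 / 67890   # about 0.000103108

print('{:.4}'.format(x))      # 0.0001031   (no type: 4 significant digits)
print('{:.4f}'.format(x))     # 0.0001      (f: 4 digits after the decimal point)
print('{:.4%}'.format(x))     # 0.0103%     (%: x * 100 with 4 decimal places)
print('{:.4e}'.format(x))     # 1.0311e-04  (e: scientific notation)
print('[{: .4}]'.format(x))   # [ 0.0001031]  (space: leading space for positive numbers)
```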

In [9]:

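# To show the number as exactly 5 digits, pad it with zeros: 0 is the fill flag and 5 the width
# (writing {0: 05d} with a space would spend one of the 5 characters on the sign and give " 0042")
print("My ID number is {0:05d}".format(42))  # d means decimal integer

My ID number is 00042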
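Zero padding is one case of the general width and alignment options. The `|` characters are there only to make trailing spaces visible.

```python
n = 42
print('{:05d}'.format(n))     # 00042
print('{:5d}|'.format(n))     #    42|  (numbers align right by default)
print('{:<5d}|'.format(n))    # 42   |  (< aligns left)
print('{:*^6}'.format(n))     # **42**  (fill with *, ^ centers)
print('{:05d}'.format(-42))   # -0042   (the sign counts toward the width)
print('42'.zfill(5))          # 00042   (the str method does the same for strings)
```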

Formatting with the % operator

In [10]:

# To insert a decimal integer, put %d where the value should go
s= ' I only know %d digits of pi' % 7
print(s)

 I only know 7 digits of pi

In [11]:

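# For a string, use %s. To refer to values by name, put the key in parentheses between % and the type code and pass a dict
# %(value).2f prints a float with 2 decimal places
print('Some digits of %(cont)s: %(value).2f' % {'cont': 'e', 'value': 2.718})

Some digits of e: 2.72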
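With more than one unnamed placeholder, the values go in a tuple in placeholder order. Width and precision work much as they do in `str.format`. There is one common pitfall: a tuple passed on its own is read as the argument list. The sample values are made up.

```python
# Several values go in a tuple, in the same order as the placeholders
print('%s has %d items' % ('cart', 3))   # cart has 3 items

# Width 7 and 2 decimal places
print('[%7.2f]' % 3.14159)               # [   3.14]

# To print a tuple itself, wrap it in a one-element tuple
print('%s' % ((1, 2),))                  # (1, 2)
```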

Formatting with f-strings

In [14]:

# Put f before the string; variables inside {} are substituted directly, which is more concise than str.format
var = 'flesh wound'
s= f"It's just a {var}!"
print(s)

It's just a flesh wound!

In [15]:

lat='40.78 N'
lon='73.97 W'
s = f'Hayden Planetarium coordinates: {lat}, {lon}'
print(s)

Hayden Planetarium coordinates: 40.78 N, 73.97 W
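The braces in an f-string can hold any expression, and the same format specs used with `str.format` go after a colon. The `=` specifier needs Python 3.8 or later.

```python
var = 'flesh wound'
pi = 3.141592

print(f"It's just a {var.upper()}!")   # It's just a FLESH WOUND!  (method call inside {})
print(f'{pi:.2f}')                      # 3.14
print(f'{7 / 67890:.4%}')               # 0.0103%  (arithmetic inside {})
print(f'{pi=}')                         # pi=3.141592  (Python 3.8+)
```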

In [ ]:

# Regular expressions: the original post showed the syntax and special characters as an image
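The syntax table from the image is not recoverable. This sketch exercises a few common metacharacters with the `re` module instead: `\d` digit, `\w` word character, `+` one or more, `{n}` exactly n, `^`/`$` start/end, `[A-Z]` character class. The sample text is made up.

```python
import re

text = 'Call 555-1234 or 555-9876 before 9pm'

print(re.findall(r'\d+', text))          # ['555', '1234', '555', '9876', '9']
print(re.findall(r'\d{3}-\d{4}', text))  # ['555-1234', '555-9876']
print(re.findall(r'\w+', 'a b_c 1!'))    # ['a', 'b_c', '1']
print(bool(re.search(r'^Call', text)))   # True  (^ anchors at the start)
print(bool(re.search(r'9pm$', text)))    # True  ($ anchors at the end)
print(re.findall(r'[A-Z]\w*', text))     # ['Call']
```

Raw strings (`r'...'`) keep backslashes such as `\d` from being treated as string escapes.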

Finding a phone-number pattern with regular expressions

In [16]:

# Import the re module and prepare a test string
import re

tele_num='1234567890'

In [19]:

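# Use match to check for a 10-digit number:
# the pattern argument is ten \d tokens (each \d matches one digit), and string is the test string tele_num
# If the pattern is found, match returns a Match object (otherwise None)
# Printing the Match object shows span (the start and end indices of the match) and match (the matched text)

m = re.match(pattern=r'\d\d\d\d\d\d\d\d\d\d', string=tele_num)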
print(type(m))
print(m)

<class 're.Match'>
<re.Match object; span=(0, 10), match='1234567890'>

In [21]:

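# Pass m to bool(): a Match object is always truthy, and a failed match returns None, which is falsy
# So the result of match can be used directly as the condition of an if statement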
print(bool(m))

if m:
    print('match')
else:
    print('no match')

True
match
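`re.match` only checks the start of the string, and it does not require the whole string to match. This sketch compares `match`, `search`, and `fullmatch` on made-up inputs.

```python
import re

pattern = r'\d{10}'

# match only looks at the start of the string
print(re.match(pattern, 'tel: 1234567890'))    # None
# search scans the whole string
print(re.search(pattern, 'tel: 1234567890'))   # <re.Match object; span=(5, 15), match='1234567890'>
# match succeeds on the first 10 digits even if an 11th follows...
print(re.match(pattern, '12345678901'))        # <re.Match object; span=(0, 10), match='1234567890'>
# ...while fullmatch requires the entire string to match
print(re.fullmatch(pattern, '12345678901'))    # None
```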

Reusing a pattern with re.compile

In [22]:

# Compile the pattern once and store it in a variable so its methods (match, search, findall, ...) can be reused
p = re.compile(r'\d{10}')
s='1234567890'
m=p.match(s)
print(m)

<re.Match object; span=(0, 10), match='1234567890'>
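A compiled pattern can be reused across methods. Adding parentheses creates groups that can be extracted or referenced in a substitution. The hyphenated `123-456-7890` format is an assumption chosen for this sketch and is not taken from the original post.

```python
import re

# Groups capture the area code, prefix, and line number separately
p = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

text = 'Call 123-456-7890 or 987-654-3210'
print(p.findall(text))              # [('123', '456', '7890'), ('987', '654', '3210')]

m = p.search(text)
print(m.group(0), m.groups())       # 123-456-7890 ('123', '456', '7890')

# \1, \2, \3 refer back to the captured groups
print(p.sub(r'(\1) \2-\3', text))   # Call (123) 456-7890 or (987) 654-3210
```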
