5. Visualizing Data
5-1. Matplotlib Basics
- The Figure Object
In [5]:
import pandas as pd
ns_book7 = pd.read_csv('ns_book7.csv', low_memory=False)
ns_book7.head()
Out[5]:
 | 번호 | 도서명 | 저자 | 출판사 | 발행년도 | ISBN | 세트 ISBN | 부가기호 | 권 | 주제분류번호 | 도서권수 | 대출건수 | 등록일자 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 인공지능과 흙 | 김동훈 지음 | 민음사 | 2021 | 9788937444319 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
1 | 2 | 가짜 행복 권하는 사회 | 김태형 지음 | 갈매나무 | 2021 | 9791190123969 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
2 | 3 | 나도 한 문장 잘 쓰면 바랄 게 없겠네 | 김선영 지음 | 블랙피쉬 | 2021 | 9788968332982 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
3 | 4 | 예루살렘 해변 | 이도 게펜 지음, 임재희 옮김 | 문학세계사 | 2021 | 9788970759906 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
4 | 5 | 김성곤의 중국한시기행 : 장강·황하 편 | 김성곤 지음 | 김영사 | 2021 | 9788934990833 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
In [7]:
import matplotlib.pyplot as plt
plt.scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
plt.show()
In [8]:
plt.figure(figsize=(9,6))
plt.scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
plt.show()
In [9]:
print(plt.rcParams['figure.figsize'])
[6.4, 4.8]
In [10]:
print(plt.rcParams['figure.dpi'])
100.0
In [11]:
plt.figure(figsize=(900/100, 600/100))
plt.scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
plt.show()
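The `900/100, 600/100` arithmetic above relies on the figure's pixel size being `figsize` (in inches) multiplied by `dpi`. A minimal sketch that checks this relationship follows; `px_to_inches` is a hypothetical helper introduced only for illustration, and the check assumes a backend that does not rescale the DPI (some HiDPI GUI backends do).

```python
import matplotlib.pyplot as plt


def px_to_inches(width_px, height_px, dpi):
    """Hypothetical helper: convert a pixel size to a figsize tuple in inches."""
    return (width_px / dpi, height_px / dpi)


dpi = 100
fig = plt.figure(figsize=px_to_inches(900, 600, dpi), dpi=dpi)

# get_size_inches() returns (width, height) in inches; multiplying by the
# figure's dpi gives the canvas size in pixels.
width_px, height_px = fig.get_size_inches() * fig.dpi
print(width_px, height_px)

plt.close(fig)  # release the figure, since nothing is drawn in this sketch
```

Note that in a notebook the displayed image can still come out smaller than this when `bbox_inches` is `'tight'`, which is what the next cell turns off.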
In [12]:
%config InlineBackend.print_figure_kwargs = {'bbox_inches': None}
plt.figure(figsize=(900/100, 600/100))
plt.scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
plt.show()
In [13]:
%config InlineBackend.print_figure_kwargs = {'bbox_inches': 'tight'}
In [14]:
plt.figure(dpi=100)
plt.scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
plt.show()
In [15]:
plt.figure(dpi=200)
plt.scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
plt.show()
- The rcParams Object
In [18]:
plt.rcParams['figure.dpi'] = 72
In [22]:
plt.rcParams['scatter.marker']
Out[22]:
'o'
In [23]:
plt.rcParams['scatter.marker'] = '*'
In [25]:
plt.scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
plt.show()
In [26]:
plt.scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1, marker='+')
plt.show()
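Assigning to `plt.rcParams` changes the default for every later plot in the session. When a setting should apply to only one plot, a scoped override may be more convenient. The sketch below uses `plt.rc_context`, which is not used in the original notebook, to show that the previous value comes back once the block ends.

```python
import matplotlib.pyplot as plt

before = plt.rcParams['scatter.marker']

# Inside the with-block the override is active.
with plt.rc_context({'scatter.marker': '*'}):
    inside = plt.rcParams['scatter.marker']

# On exit the previous value is restored automatically.
after = plt.rcParams['scatter.marker']

print(before, inside, after)
```

To undo every change at once, `plt.rcdefaults()` resets all settings to Matplotlib's built-in defaults.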
- Drawing Multiple Subplots
In [30]:
plt.rcParams['figure.dpi'] = 100
In [31]:
fig, axs = plt.subplots(2)
axs[0].scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
axs[1].hist(ns_book7['대출건수'], bins=100)
axs[1].set_yscale('log')
plt.show()
In [33]:
fig, axs = plt.subplots(2, figsize=(6, 8))
axs[0].scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
axs[0].set_title('scatter plot')
axs[1].hist(ns_book7['대출건수'], bins=100)
axs[1].set_title('histogram')
axs[1].set_yscale('log')
plt.show()
In [34]:
fig, axs = plt.subplots(1, 2, figsize=(10,4))
axs[0].scatter(ns_book7['도서권수'], ns_book7['대출건수'], alpha=0.1)
axs[0].set_title('scatter plot')
axs[0].set_xlabel('number of books')
axs[0].set_ylabel('borrow count')
axs[1].hist(ns_book7['대출건수'], bins=100)
axs[1].set_title('histogram')
axs[1].set_yscale('log')
axs[1].set_xlabel('borrow count')
axs[1].set_ylabel('frequency')
plt.show()
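The cells above pass either one number or a single row to `plt.subplots`, so `axs` is a one-dimensional array. With both rows and columns greater than one, `axs` becomes a two-dimensional array indexed as `axs[row, col]`. A small sketch with made-up data (the random values only stand in for the `도서권수` and `대출건수` columns):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-ins for the book-count and loan-count columns.
rng = np.random.default_rng(0)
n_books = rng.integers(1, 30, size=200)
n_loans = rng.integers(0, 500, size=200)

fig, axs = plt.subplots(2, 2, figsize=(8, 6))
print(axs.shape)  # (2, 2)

axs[0, 0].scatter(n_books, n_loans, alpha=0.3)
axs[0, 0].set_title('scatter plot')
axs[0, 1].hist(n_loans, bins=20)
axs[0, 1].set_title('loan histogram')
axs[1, 0].hist(n_books, bins=20)
axs[1, 0].set_title('book histogram')
axs[1, 1].axis('off')  # leave the unused cell blank

fig.tight_layout()  # keep titles and tick labels from overlapping
```

In a notebook, finish the cell with `plt.show()` as in the examples above.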
5-2. Drawing Line and Bar Charts
- Counting Books Published per Year
In [37]:
import pandas as pd
ns_book7 = pd.read_csv('ns_book7.csv', low_memory=False)
ns_book7.head()
Out[37]:
 | 번호 | 도서명 | 저자 | 출판사 | 발행년도 | ISBN | 세트 ISBN | 부가기호 | 권 | 주제분류번호 | 도서권수 | 대출건수 | 등록일자 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 인공지능과 흙 | 김동훈 지음 | 민음사 | 2021 | 9788937444319 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
1 | 2 | 가짜 행복 권하는 사회 | 김태형 지음 | 갈매나무 | 2021 | 9791190123969 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
2 | 3 | 나도 한 문장 잘 쓰면 바랄 게 없겠네 | 김선영 지음 | 블랙피쉬 | 2021 | 9788968332982 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
3 | 4 | 예루살렘 해변 | 이도 게펜 지음, 임재희 옮김 | 문학세계사 | 2021 | 9788970759906 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
4 | 5 | 김성곤의 중국한시기행 : 장강·황하 편 | 김성곤 지음 | 김영사 | 2021 | 9788934990833 | NaN | NaN | NaN | NaN | 1 | 0 | 2021-03-19 |
In [41]:
count_by_year = ns_book7['발행년도'].value_counts()
count_by_year
Out[41]:
발행년도
2012 18601
2014 17797
2009 17611
2011 17523
2010 17503
...
2650 1
2108 1
2104 1
2560 1
1947 1
Name: count, Length: 87, dtype: int64
In [43]:
count_by_year = count_by_year.sort_index()
count_by_year
Out[43]:
발행년도
1947 1
1948 1
1949 1
1952 11
1954 1
..
2551 1
2552 2
2559 1
2560 1
2650 1
Name: count, Length: 87, dtype: int64
In [45]:
count_by_year = count_by_year[count_by_year.index <= 2030]
count_by_year
Out[45]:
발행년도
1947 1
1948 1
1949 1
1952 11
1954 1
...
2020 11834
2021 1255
2025 1
2028 1
2030 1
Name: count, Length: 68, dtype: int64
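The three steps above (count, sort by year, drop implausible years) can be tried on a tiny made-up Series, which makes it easier to see what each call returns. The values below are invented for illustration and are not taken from `ns_book7.csv`.

```python
import pandas as pd

# Made-up publication years, including one implausible value (2650).
years = pd.Series([2012, 2012, 2650, 1947, 2014, 2012, 2014])

# value_counts() orders by frequency; sort_index() reorders by year.
count_by_year = years.value_counts().sort_index()

# A boolean mask on the index keeps only years up to 2030.
filtered = count_by_year[count_by_year.index <= 2030]
print(filtered.to_dict())  # {1947: 1, 2012: 3, 2014: 2}
```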
- Counting Books per Subject
In [49]:
import numpy as np
def kdc_1st_char(no):
    # Rows with no classification number are NaN; group them under '-1'.
    if no is np.nan:
        return '-1'
    else:
        return no[0]
count_by_subject = ns_book7['주제분류번호'].apply(kdc_1st_char).value_counts()
count_by_subject
Out[49]:
주제분류번호
8 108643
3 80767
5 40916
9 26375
6 25070
1 22647
-1 16978
7 15836
4 13688
2 13474
0 12376
Name: count, dtype: int64
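As an alternative to `apply` with a Python function, the same grouping can probably be expressed with pandas string accessors. This is a sketch on made-up classification numbers, and it assumes the column holds strings with `NaN` for missing values, as the `no[0]` indexing above implies.

```python
import numpy as np
import pandas as pd

# Made-up classification numbers; NaN marks a missing value.
kdc = pd.Series(['813.6', '325.1', np.nan, '005.1', '813.7'])

# .str[0] takes the first character and leaves NaN as NaN,
# so fillna('-1') can label the missing rows afterwards.
first_char = kdc.str[0].fillna('-1')
count_by_subject = first_char.value_counts()
print(count_by_subject.to_dict())
```

The index values are strings (`'8'`, `'-1'`, and so on), which is why the bar charts later treat the subjects as categories rather than numbers.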
- Drawing a Line Chart
In [51]:
import matplotlib.pyplot as plt
print(plt.rcParams['figure.dpi'])
100.0
In [52]:
plt.plot(count_by_year.index, count_by_year.values)
plt.title('Books by year')
plt.xlabel('year')
plt.ylabel('number of books')
plt.show()
In [54]:
plt.plot(count_by_year, marker='.', linestyle='--', color='red')
plt.title('Books by year')
plt.xlabel('year')
plt.ylabel('number of books')
plt.show()
In [55]:
plt.plot(count_by_year, '.-.c')
Out[55]:
[<matplotlib.lines.Line2D at 0x2848c5c40>]
In [56]:
plt.plot(count_by_year, '*-g')
plt.title('Books by year')
plt.xlabel('year')
plt.ylabel('number of books')
plt.show()
In [57]:
plt.plot(count_by_year, '*-g')
plt.title('Books by year')
plt.xlabel('year')
plt.ylabel('number of books')
plt.xticks(range(1947, 2030, 10))
for idx, val in count_by_year[::5].items():
    plt.annotate(val, (idx, val))
plt.show()
In [58]:
plt.plot(count_by_year, '*-g')
plt.title('Books by year')
plt.xlabel('year')
plt.ylabel('number of books')
plt.xticks(range(1947, 2030, 10))
for idx, val in count_by_year[::5].items():
    plt.annotate(val, (idx, val), xytext=(idx+1, val+10))
plt.show()
In [60]:
plt.plot(count_by_year, '*-g')
plt.title('Books by year')
plt.xlabel('year')
plt.ylabel('number of books')
plt.xticks(range(1947, 2030, 10))
for idx, val in count_by_year[::5].items():
    plt.annotate(val, (idx, val), xytext=(0, 2), textcoords='offset points', ha='center')
plt.show()
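The last two cells differ in how `xytext` is interpreted. Without `textcoords`, `xytext` is a position in data coordinates, so `(idx+1, val+10)` shifts by one year and ten books. With `textcoords='offset points'`, `xytext` is an offset from the annotated point measured in points, which stays the same regardless of the axis scales. A minimal sketch with made-up data, using the Axes method form `ax.annotate`:

```python
import matplotlib.pyplot as plt

# Made-up yearly counts, for illustration only.
years = [2000, 2005, 2010, 2015, 2020]
counts = [120, 340, 560, 480, 300]

fig, ax = plt.subplots()
ax.plot(years, counts, '*-g')

for x, y in zip(years, counts):
    # Place each label 2 points above its marker, centered horizontally.
    ax.annotate(str(y), (x, y), xytext=(0, 2),
                textcoords='offset points', ha='center')

print(len(ax.texts))  # one text object per annotation
```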
- Drawing a Bar Chart
In [62]:
plt.bar(count_by_subject.index, count_by_subject.values)
plt.title('Books by subject')
plt.xlabel('subject')
plt.ylabel('number of books')
for idx, val in count_by_subject.items():
    plt.annotate(val, (idx, val), xytext=(0, 2), textcoords='offset points')
plt.show()
In [63]:
plt.bar(count_by_subject.index, count_by_subject.values, width=0.7, color='blue')
plt.title('Books by subject')
plt.xlabel('subject')
plt.ylabel('number of books')
for idx, val in count_by_subject.items():
    plt.annotate(val, (idx, val), xytext=(0, 2), textcoords='offset points', fontsize=8, ha='center', color='green')
plt.show()
In [64]:
plt.barh(count_by_subject.index, count_by_subject.values, height=0.7, color='blue')
plt.title('Books by subject')
plt.xlabel('number of books')
plt.ylabel('subject')
for idx, val in count_by_subject.items():
    plt.annotate(val, (val, idx), xytext=(2, 0), textcoords='offset points', fontsize=8, va='center', color='green')
plt.show()
- Displaying and Saving Images
In [67]:
img = plt.imread('jupiter.png')
img.shape
Out[67]:
(1561, 1646, 3)
In [68]:
plt.imshow(img)
plt.show()
In [69]:
plt.figure(figsize=(8,6))
plt.imshow(img)
plt.axis('off')
plt.show()
In [70]:
from PIL import Image
pil_img = Image.open('jupiter.png')
plt.figure(figsize=(8,6))
plt.imshow(pil_img)
plt.axis('off')
plt.show()
In [71]:
import numpy as np
arr_img = np.array(pil_img)
arr_img.shape
Out[71]:
(1561, 1646, 3)
In [76]:
plt.imsave('jupiter.jpg', arr_img)
- Saving a Plot as an Image
In [83]:
plt.rcParams['savefig.dpi']
Out[83]:
'figure'
In [86]:
plt.barh(count_by_subject.index, count_by_subject.values, height=0.7, color='blue')
plt.title('Books by subject')
plt.xlabel('number of books')
plt.ylabel('subject')
for idx, val in count_by_subject.items():
    plt.annotate(val, (val, idx), xytext=(2, 0), textcoords='offset points', fontsize=8, va='center', color='green')
plt.savefig('books_by_subject.png')
plt.show()
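Because `rcParams['savefig.dpi']` is `'figure'`, the saved file uses the figure's own DPI. Passing `dpi=` to `savefig` overrides that for a single file. The sketch below saves a small made-up chart at 150 DPI and reads it back to check the pixel size; the output path is a hypothetical file in a temporary directory.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))  # size in inches
ax.barh(['0', '1', '2'], [12, 22, 13])  # made-up subject counts
ax.set_xlabel('number of books')
ax.set_ylabel('subject')

# Hypothetical output path inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'books_by_subject_hires.png')

# An explicit dpi overrides rcParams['savefig.dpi'] for this call only.
fig.savefig(out_path, dpi=150)
plt.close(fig)

# 4 in x 150 dpi = 600 px wide, 3 in x 150 dpi = 450 px tall.
saved = plt.imread(out_path)
print(saved.shape[:2])  # (height, width) in pixels
```

This size holds when `savefig` is called without `bbox_inches='tight'`; with tight bounding boxes the image is cropped to the drawn content.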
In [89]:
pil_img = Image.open('books_by_subject.png')
plt.figure(figsize=(8,6))
plt.imshow(pil_img)
plt.axis('off')
plt.show()