포스팅 목차
[데이터 관리] 4. 다중 조건에 의한 변환
1. Proc SQL
- case when문을 통하여 IF문을 대체한다. 단, 복합 조건에 대한 적용은 불가.
SAS Programming |
proc sql;
select id,
workshop,gender,q1,q2,q3,q4,
case when gender='m' then (20*q1)+q2
else (2*q1)+q2 end "score1",
case when gender='m' then (30*q1)+q2
else (3*q1)+q2 end "score2"
from back.mydata;
quit;
Results |
id workshop gender q1 q2 q3 q4 score1 score2
----------------------------------------------------------------------------------
1 1 f 1 1 5 1 3 4
2 2 f 2 1 4 1 5 7
3 1 f 2 2 4 3 6 8
4 2 f 3 1 . 3 7 10
5 1 m 4 5 2 4 85 125
6 2 m 5 4 5 5 104 154
7 1 m 5 3 4 4 103 153
8 2 m 4 5 5 5 85 125
2. SAS Programming
- 다중 조건에 의한 변환을 위한 SAS 프로그램
- 성별에 따른 스코어 1, 스코어 2 변수를 생성;
SAS Programming |
DATA withmooc;
SET BACK.mydata;
IF gender="m" THEN DO;
score1 = (1.1*q1)+q2;
score2 = (1.2*q1)+q2;
END;
ELSE IF gender="f" THEN DO;
score1 = (2.1*q1)+q2;
score2 = (2.2*q1)+q2;
END;
RUN;
proc print;run;
Results |
id workshop gender q1 q2 q3 q4 score1 score2
1 1 f 1 1 5 1 3.1 3.2
2 2 f 2 1 4 1 5.2 5.4
3 1 f 2 2 4 3 6.2 6.4
4 2 f 3 1 . 3 7.3 7.6
5 1 m 4 5 2 4 9.4 9.8
6 2 m 5 4 5 5 9.5 10.0
7 1 m 5 3 4 4 8.5 9.0
8 2 m 4 5 5 5 9.4 9.8
3. SPSS
SPSS Programming |
DO IF (gender EQ 'm').
+ COMPUTE score1 = (2*q1)+q2.
+ COMPUTE score2 = (3*q1)+q2.
ELSE IF (gender EQ 'f').
+ COMPUTE score1 = (20*q1)+q2.
+ COMPUTE score2 = (30*q1)+q2.
END IF.
EXECUTE.
4. R Programming (R-PROJECT)
R Programming |
from rpy2.robjects import r
%load_ext rpy2.ipython
Results |
The rpy2.ipython extension is already loaded. To reload it, use:
%reload_ext rpy2.ipython
R Programming |
%%R
library(tidyverse)
load(file="C:/work/data/mydata.Rdata")
withmooc = mydata
attach(withmooc) # mydata를 기본 데이터 세트로 지정.
withmooc
Results |
R[write to console]: The following objects are masked from withmooc (pos = 3):
gender, q1, q2, q3, q4, workshop
R[write to console]: The following objects are masked from withmooc (pos = 4):
gender, q1, q2, q3, q4, workshop
workshop gender q1 q2 q3 q4
1 1 f 1 1 5 1
2 2 f 2 1 4 1
3 1 f 2 2 4 3
4 2 f 3 1 NA 3
5 1 m 4 5 2 4
6 2 m 5 4 5 5
7 1 m 5 3 4 4
8 2 m 4 5 5 5
- cbind() 함수
mydata에 두 개의 새로운 칼럼을 추가하기 위하여 column bind를 이용한다.
R Programming |
%%R
withmooc <- cbind(withmooc, score1 = 0, score2 = 0)
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2
1 1 f 1 1 5 1 0 0
2 2 f 2 1 4 1 0 0
3 1 f 2 2 4 3 0 0
4 2 f 3 1 NA 3 0 0
5 1 m 4 5 2 4 0 0
6 2 m 5 4 5 5 0 0
7 1 m 5 3 4 4 0 0
8 2 m 4 5 5 5 0 0
R Programming |
%%R
withmooc$guys <- gender=="m" # 남성에 대하여 논리 벡터 생성.
withmooc$gals <- gender=="f" # 여성에 대하여 논리 벡터 생성.
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2 guys gals
1 1 f 1 1 5 1 0 0 FALSE TRUE
2 2 f 2 1 4 1 0 0 FALSE TRUE
3 1 f 2 2 4 3 0 0 FALSE TRUE
4 2 f 3 1 NA 3 0 0 FALSE TRUE
5 1 m 4 5 2 4 0 0 TRUE FALSE
6 2 m 5 4 5 5 0 0 TRUE FALSE
7 1 m 5 3 4 4 0 0 TRUE FALSE
8 2 m 4 5 5 5 0 0 TRUE FALSE
- 로직에 의해 선택된 그룹에 기본 공식 적용.
R Programming |
%%R
attach(withmooc)
withmooc$score1[gals]<- 2*q1[gals] + q2[gals]
withmooc$score2[gals]<- 3*q1[gals] + q2[gals]
withmooc$score1[guys]<-20*q1[guys] + q2[guys]
withmooc$score2[guys]<-30*q1[guys] + q2[guys]
withmooc
Results |
R[write to console]: The following objects are masked from withmooc (pos = 3):
gender, q1, q2, q3, q4, workshop
R[write to console]: The following objects are masked from withmooc (pos = 4):
gals, gender, guys, q1, q2, q3, q4, score1, score2, workshop
R[write to console]: The following objects are masked from withmooc (pos = 5):
gender, q1, q2, q3, q4, workshop
workshop gender q1 q2 q3 q4 score1 score2 guys gals
1 1 f 1 1 5 1 3 4 FALSE TRUE
2 2 f 2 1 4 1 5 7 FALSE TRUE
3 1 f 2 2 4 3 6 8 FALSE TRUE
4 2 f 3 1 NA 3 7 10 FALSE TRUE
5 1 m 4 5 2 4 85 125 TRUE FALSE
6 2 m 5 4 5 5 104 154 TRUE FALSE
7 1 m 5 3 4 4 103 153 TRUE FALSE
8 2 m 4 5 5 5 85 125 TRUE FALSE
- IFELSE함수를 사용하여 위 로직 적용.
R Programming |
%%R
withmooc$score1<-ifelse(withmooc$gals,
2*q1*withmooc$gals + q2*withmooc$gals,
20*q1*withmooc$guys + q2*withmooc$guys)
withmooc$score2<-ifelse(withmooc$gals,
3*q1*withmooc$gals + q2*withmooc$gals,
30*q1*withmooc$guys + q2*withmooc$guys)
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2 guys gals
1 1 f 1 1 5 1 3 4 FALSE TRUE
2 2 f 2 1 4 1 5 7 FALSE TRUE
3 1 f 2 2 4 3 6 8 FALSE TRUE
4 2 f 3 1 NA 3 7 10 FALSE TRUE
5 1 m 4 5 2 4 85 125 TRUE FALSE
6 2 m 5 4 5 5 104 154 TRUE FALSE
7 1 m 5 3 4 4 103 153 TRUE FALSE
8 2 m 4 5 5 5 85 125 TRUE FALSE
- NULL값을 이용하여 guys와 gals변수를 삭제.
R Programming |
%%R
withmooc$gals <- withmooc$guys <- NULL
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2
1 1 f 1 1 5 1 3 4
2 2 f 2 1 4 1 5 7
3 1 f 2 2 4 3 6 8
4 2 f 3 1 NA 3 7 10
5 1 m 4 5 2 4 85 125
6 2 m 5 4 5 5 104 154
7 1 m 5 3 4 4 103 153
8 2 m 4 5 5 5 85 125
R Programming |
%%R
detach(withmooc)
5. R - Tidyverse
R Programming |
from rpy2.robjects import r
%load_ext rpy2.ipython
Results |
The rpy2.ipython extension is already loaded. To reload it, use:
%reload_ext rpy2.ipython
R Programming |
%%R
library(tidyverse)
load(file="C:/work/data/mydata.Rdata")
withmooc = mydata
attach(withmooc) # mydata를 기본 데이터 세트로 지정.
withmooc
Results |
R[write to console]: The following objects are masked from withmooc (pos = 3):
gender, q1, q2, q3, q4, workshop
R[write to console]: The following objects are masked from withmooc (pos = 4):
gender, q1, q2, q3, q4, workshop
R[write to console]: The following objects are masked from withmooc (pos = 5):
gender, q1, q2, q3, q4, workshop
workshop gender q1 q2 q3 q4
1 1 f 1 1 5 1
2 2 f 2 1 4 1
3 1 f 2 2 4 3
4 2 f 3 1 NA 3
5 1 m 4 5 2 4
6 2 m 5 4 5 5
7 1 m 5 3 4 4
8 2 m 4 5 5 5
- mydata에 두개의 새로운 칼럼을 추가하기 위하여 column bind를 이용.
R Programming |
%%R
withmooc <- withmooc %>%
mutate(score1 = 0,
score2 = 0,
guys = (gender=="m"), # 남성에 대하여 논리 벡터 생성.
gals = (gender=="f") # 여성에 대하여 논리 벡터 생성.
)
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2 guys gals
1 1 f 1 1 5 1 0 0 FALSE TRUE
2 2 f 2 1 4 1 0 0 FALSE TRUE
3 1 f 2 2 4 3 0 0 FALSE TRUE
4 2 f 3 1 NA 3 0 0 FALSE TRUE
5 1 m 4 5 2 4 0 0 TRUE FALSE
6 2 m 5 4 5 5 0 0 TRUE FALSE
7 1 m 5 3 4 4 0 0 TRUE FALSE
8 2 m 4 5 5 5 0 0 TRUE FALSE
- IFELSE함수를 사용하여 위 로직 적용.
R Programming |
%%R
withmooc<- withmooc %>%
mutate(score1 = ifelse( gals ,(2*q1*gals + q2*gals),(20*q1*guys + q2*guys)),
score2 = ifelse( gals ,(3*q1*gals + q2*gals),(30*q1*guys + q2*guys)))
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2 guys gals
1 1 f 1 1 5 1 3 4 FALSE TRUE
2 2 f 2 1 4 1 5 7 FALSE TRUE
3 1 f 2 2 4 3 6 8 FALSE TRUE
4 2 f 3 1 NA 3 7 10 FALSE TRUE
5 1 m 4 5 2 4 85 125 TRUE FALSE
6 2 m 5 4 5 5 104 154 TRUE FALSE
7 1 m 5 3 4 4 103 153 TRUE FALSE
8 2 m 4 5 5 5 85 125 TRUE FALSE
- Select 구문을 이용하여 guys와 gals변수를 삭제.
R Programming |
%%R
withmooc <- withmooc %>%
select(-guys,-gals)
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2
1 1 f 1 1 5 1 3 4
2 2 f 2 1 4 1 5 7
3 1 f 2 2 4 3 6 8
4 2 f 3 1 NA 3 7 10
5 1 m 4 5 2 4 85 125
6 2 m 5 4 5 5 104 154
7 1 m 5 3 4 4 103 153
8 2 m 4 5 5 5 85 125
6. Python - Pandas
Python Programming |
import pandas as pd
import numpy as np
import sweetviz as sv
mydata = pd.read_csv("C:/work/data/mydata.csv",sep=",",
dtype={'id':object,'workshop':object,
'q1':int, 'q2':int, 'q3':float, 'q4':int},
na_values=['NaN'],skipinitialspace =True)
mydata =mydata.drop('id',axis=1)
withmooc= mydata.copy()
withmooc
Results |
workshop gender q1 q2 q3 q4
0 1 f 1 1 5.0 1
1 2 f 2 1 4.0 1
2 1 f 2 2 4.0 3
3 2 f 3 1 NaN 3
4 1 m 4 5 2.0 4
5 2 m 5 4 5.0 5
6 1 m 5 3 4.0 4
7 2 m 4 5 5.0 5
- mydata에 두개의 새로운 칼럼을 추가하기 위하여 column bind를 이용.
Python Programming |
withmooc= mydata.copy()
withmooc
withmooc['score1'] = 0
withmooc['score2'] = 0
withmooc['guys'] = withmooc['gender'] =="m" # 남성에 대하여 논리 벡터 생성.
withmooc['gals'] = withmooc['gender'] =="f" # 여성에 대하여 논리 벡터 생성.
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2 guys gals
0 1 f 1 1 5.0 1 0 0 False True
1 2 f 2 1 4.0 1 0 0 False True
2 1 f 2 2 4.0 3 0 0 False True
3 2 f 3 1 NaN 3 0 0 False True
4 1 m 4 5 2.0 4 0 0 True False
5 2 m 5 4 5.0 5 0 0 True False
6 1 m 5 3 4.0 4 0 0 True False
7 2 m 4 5 5.0 5 0 0 True False
- 로직에 의해 선택된 그룹에 기본 공식 적용.
Python Programming |
withmooc= mydata.copy()
withmooc
withmooc['score1'] = 0
withmooc['score2'] = 0
withmooc['guys'] = withmooc['gender'] =="m" # 남성에 대하여 논리 벡터 생성.
withmooc['gals'] = withmooc['gender'] =="f" # 여성에 대하여 논리 벡터 생성.
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2 guys gals
0 1 f 1 1 5.0 1 0 0 False True
1 2 f 2 1 4.0 1 0 0 False True
2 1 f 2 2 4.0 3 0 0 False True
3 2 f 3 1 NaN 3 0 0 False True
4 1 m 4 5 2.0 4 0 0 True False
5 2 m 5 4 5.0 5 0 0 True False
6 1 m 5 3 4.0 4 0 0 True False
7 2 m 4 5 5.0 5 0 0 True False
Python Programming |
withmooc.loc[ withmooc['gals'] == True , 'score1']= ( 2*withmooc['q1']*withmooc['gals'] + withmooc['q2']*withmooc['gals'])
withmooc.loc[ withmooc['gals'] == False, 'score1']= (20*withmooc['q1']*withmooc['guys'] + withmooc['q2']*withmooc['guys'])
withmooc.loc[ withmooc['gals'] == True , 'score2']= ( 3*withmooc['q1']*withmooc['gals'] + withmooc['q2']*withmooc['gals'])
withmooc.loc[ withmooc['gals'] == False, 'score2']= (30*withmooc['q1']*withmooc['guys'] + withmooc['q2']*withmooc['guys'])
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2 guys gals
0 1 f 1 1 5.0 1 3 4 False True
1 2 f 2 1 4.0 1 5 7 False True
2 1 f 2 2 4.0 3 6 8 False True
3 2 f 3 1 NaN 3 7 10 False True
4 1 m 4 5 2.0 4 85 125 True False
5 2 m 5 4 5.0 5 104 154 True False
6 1 m 5 3 4.0 4 103 153 True False
7 2 m 4 5 5.0 5 85 125 True False
- np.where 구문을 사용하여 위 로직 적용.
Python Programming |
withmooc['score1'] = np.where(withmooc['gals'], 2*withmooc['q1']*withmooc['gals'] + withmooc['q2']*withmooc['gals'] ,
20*withmooc['q1']*withmooc['guys'] + withmooc['q2']*withmooc['guys'])
withmooc['score2'] = np.where(withmooc['gals'], 3*withmooc['q1']*withmooc['gals'] + withmooc['q2']*withmooc['gals'] ,
30*withmooc['q1']*withmooc['guys'] + withmooc['q2']*withmooc['guys'])
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2 guys gals
0 1 f 1 1 5.0 1 3 4 False True
1 2 f 2 1 4.0 1 5 7 False True
2 1 f 2 2 4.0 3 6 8 False True
3 2 f 3 1 NaN 3 7 10 False True
4 1 m 4 5 2.0 4 85 125 True False
5 2 m 5 4 5.0 5 104 154 True False
6 1 m 5 3 4.0 4 103 153 True False
7 2 m 4 5 5.0 5 85 125 True False
- del구문을 이용하여 guys와 gals변수를 삭제.
Python Programming |
del withmooc['guys']
del withmooc['gals']
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2
0 1 f 1 1 5.0 1 3 4
1 2 f 2 1 4.0 1 5 7
2 1 f 2 2 4.0 3 6 8
3 2 f 3 1 NaN 3 7 10
4 1 m 4 5 2.0 4 85 125
5 2 m 5 4 5.0 5 104 154
6 1 m 5 3 4.0 4 103 153
7 2 m 4 5 5.0 5 85 125
- pop구문을 이용하여 guys와 gals변수를 삭제.
Python Programming |
# 확인을 위하여 변수 재 생성
withmooc['guys'] = withmooc['gender'] =="m" # 남성에 대하여 논리 벡터 생성.
withmooc['gals'] = withmooc['gender'] =="f" # 여성에 대하여 논리 벡터 생성.
mydata_guys = withmooc.pop('guys') # 삭제된 데이터를 별도로 반환하여 저장 가능
# mydata1.pop('guys')
print(mydata_guys)
mydata_gals = withmooc.pop('gals') # 삭제된 데이터를 별도로 반환하여 저장 가능
# mydata1.pop('gals')
print(mydata_gals)
withmooc
Results |
0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 True
Name: guys, dtype: bool
0 True
1 True
2 True
3 True
4 False
5 False
6 False
7 False
Name: gals, dtype: bool
Results |
workshop gender q1 q2 q3 q4 score1 score2
0 1 f 1 1 5.0 1 3 4
1 2 f 2 1 4.0 1 5 7
2 1 f 2 2 4.0 3 6 8
3 2 f 3 1 NaN 3 7 10
4 1 m 4 5 2.0 4 85 125
5 2 m 5 4 5.0 5 104 154
6 1 m 5 3 4.0 4 103 153
7 2 m 4 5 5.0 5 85 125
- Drop 구문
Python Programming |
# 확인을 위하여 변수 재 생성
withmooc['guys'] = withmooc['gender'] =="m" # 남성에 대하여 논리 벡터 생성.
withmooc['gals'] = withmooc['gender'] =="f" # 여성에 대하여 논리 벡터 생성.
# withmooc.drop(['guys','gals'], axis='columns', inplace=True)
withmooc = withmooc.drop(['guys','gals'], axis=1)
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2
0 1 f 1 1 5.0 1 3 4
1 2 f 2 1 4.0 1 5 7
2 1 f 2 2 4.0 3 6 8
3 2 f 3 1 NaN 3 7 10
4 1 m 4 5 2.0 4 85 125
5 2 m 5 4 5.0 5 104 154
6 1 m 5 3 4.0 4 103 153
7 2 m 4 5 5.0 5 85 125
- 열 번호 지정하여 drop 사용하기
Python Programming |
# 확인을 위하여 변수 재 생성
withmooc['guys'] = withmooc['gender'] =="m" # 남성에 대하여 논리 벡터 생성.
withmooc['gals'] = withmooc['gender'] =="f" # 여성에 대하여 논리 벡터 생성.
withmooc.drop(withmooc.columns[[8, 9]], axis='columns', inplace=True)
withmooc
Results |
workshop gender q1 q2 q3 q4 score1 score2
0 1 f 1 1 5.0 1 3 4
1 2 f 2 1 4.0 1 5 7
2 1 f 2 2 4.0 3 6 8
3 2 f 3 1 NaN 3 7 10
4 1 m 4 5 2.0 4 85 125
5 2 m 5 4 5.0 5 104 154
6 1 m 5 3 4.0 4 103 153
7 2 m 4 5 5.0 5 85 125
7. Python - dfply
Python Programming |
import pandas as pd
from dfply import *
mydata = pd.read_csv("c:/work/data/mydata.csv",sep=",",
dtype={'id':object,'workshop':object,
'q1':int, 'q2':int, 'q3':float, 'q4':int},
na_values=['NaN'],skipinitialspace =True)
mydata =mydata.drop('id',axis=1)
withmooc= mydata.copy()
# 모든 변수 선택하기.
withmooc
Results |
workshop gender q1 q2 q3 q4
0 1 f 1 1 5.0 1
1 2 f 2 1 4.0 1
2 1 f 2 2 4.0 3
3 2 f 3 1 NaN 3
4 1 m 4 5 2.0 4
5 2 m 5 4 5.0 5
6 1 m 5 3 4.0 4
7 2 m 4 5 5.0 5
- mydata에 두개의 새로운 칼럼을 추가하기 위하여 column bind를 이용.
Python Programming |
withmooc = withmooc >> \
mutate(guys=make_symbolic(np.where)(X.gender == "m", True,False), \
gals=make_symbolic(np.where)(X.gender == "f", True,False) ) >> \
mutate(score1 = make_symbolic(np.where)(X.gals, (2*X.q1*X.gals + X.q2*X.gals) ,(20*X.q1*X.guys + X.q2*X.guys)), \
score2 = make_symbolic(np.where)(X.gals, (3*X.q1*X.gals + X.q2*X.gals) ,(30*X.q1*X.guys + X.q2*X.guys)) \
)
withmooc
Results |
workshop gender q1 q2 q3 q4 guys gals score1 score2
0 1 f 1 1 5.0 1 False True 3 4
1 2 f 2 1 4.0 1 False True 5 7
2 1 f 2 2 4.0 3 False True 6 8
3 2 f 3 1 NaN 3 False True 7 10
4 1 m 4 5 2.0 4 True False 85 125
5 2 m 5 4 5.0 5 True False 104 154
6 1 m 5 3 4.0 4 True False 103 153
7 2 m 4 5 5.0 5 True False 85 125
Python Programming |
withmooc >> \
select(~ X.guys,~ X.gals)
Results |
workshop gender q1 q2 q3 q4 score1 score2
0 1 f 1 1 5.0 1 3 4
1 2 f 2 1 4.0 1 5 7
2 1 f 2 2 4.0 3 6 8
3 2 f 3 1 NaN 3 7 10
4 1 m 4 5 2.0 4 85 125
5 2 m 5 4 5.0 5 104 154
6 1 m 5 3 4.0 4 103 153
7 2 m 4 5 5.0 5 85 125
- Drop() 함수를 사용하여서 변수 삭제
Python Programming |
withmooc >> \
mutate(guys = 0,gals = 0) >> \
drop([X.guys, X.gals])
Results |
workshop gender q1 q2 q3 q4 score1 score2
0 1 f 1 1 5.0 1 3 4
1 2 f 2 1 4.0 1 5 7
2 1 f 2 2 4.0 3 6 8
3 2 f 3 1 NaN 3 7 10
4 1 m 4 5 2.0 4 85 125
5 2 m 5 4 5.0 5 104 154
6 1 m 5 3 4.0 4 103 153
7 2 m 4 5 5.0 5 85 125
통계프로그램 비교 목록(Proc sql, SAS, SPSS, R 프로그래밍, R Tidyverse, Python Pandas, Python Dfply) |
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 |
반응형
'통계프로그램 비교 시리즈 > 데이터 전처리 비교' 카테고리의 다른 글
[데이터 관리] 6. 관측값 포맷팅 & 관측값 일괄 변경하기 (0) | 2022.01.10 |
---|---|
통계프로그램 비교 시리즈 – [데이터 관리] 5. 변수명 변경(Rename) (0) | 2022.01.10 |
통계프로그램 비교 시리즈 – [데이터 관리] 3. 결측값 할당을 위한 조건 변환 (0) | 2022.01.10 |
통계프로그램 비교 시리즈 – [데이터 관리] 2. 조건문에 의한 변환 (0) | 2022.01.07 |
통계프로그램 비교 시리즈 – [데이터 관리] 1. 변수 변환 (0) | 2022.01.07 |
댓글