포스팅 목차
[데이터 관리] 2. 조건문에 의한 변수 변환 후 신규 변수 생성하기
1. Proc SQL
SAS Programming |
options ls=150;
proc sql;
select case when q4=5 then 1 else 0 end as x1,
ifn(q4=5,1,0) as x11,
case when q4>=4 then 1 else 0 end as x2,
ifn(q4>=4,1,0) as x22,
case when workshop=1 and q4>=5 then 1 else 0 end as x3,
ifn(workshop=1 and q4>=5,1,0) as x33,
case when gender='f' then 2*q1+q2 else 3*q1+q2 end as scoreA,
ifn(gender='f',2*q1+q2,3*q1+q2) as scoreAA,
case when workshop=1 and q4>=5 then 2*q1+q2 else 3*q1+q2 end as scoreB,
ifn(workshop=1 and q4>=5,2*q1+q2,3*q1+q2) as scoreBB
from BACK.mydata;
quit;
Results |
x1 x11 x2 x22 x3 x33 scoreA scoreAA scoreB scoreBB
---------------------------------------------------------------------------------------------
0 0 0 0 0 0 3 3 4 4
0 0 0 0 0 0 5 5 7 7
0 0 0 0 0 0 6 6 8 8
0 0 0 0 0 0 7 7 10 10
0 0 1 1 0 0 17 17 17 17
1 1 1 1 0 0 19 19 19 19
0 0 1 1 0 0 18 18 18 18
1 1 1 1 0 0 17 17 17 17
2. SAS Programming
- 조건문을 사용하여서 변수 생성;
- SAS ifn 함수 링크
SAS Programming |
DATA withmooc;
SET BACk.mydata;
If q4= 5 then x1=1;
else x1=0;
* if함수의 간단한 함수;
x11 = ifn(q4=5,1,0);
If q4>=4 then x2=1;
else x2=0;
x22 = ifn(q4>=4,1,0);
If workshop=1 & q4>=5 then x3=1;
else x3=0;
x33 = ifn(workshop=1 and q4>=5,1,0);
If gender="f" then scoreA=2*q1+q2;
Else scoreA=3*q1+q2;
scoreAA = ifn(gender='f',2*q1+q2,3*q1+q2);
If workshop=1 and q4>=5 then scoreB=2*q1+q2;
Else scoreB=3*q1+q2;
scoreBB = ifn(workshop=1 and q4>=5,2*q1+q2,3*q1+q2);
run;
proc print;run;
Results |
score score score score
q1 q2 q3 q4 x1 x11 x2 x22 x3 x33 A AA B BB
1 1 5 1 0 0 0 0 0 0 3 3 4 4
2 1 4 1 0 0 0 0 0 0 5 5 7 7
2 2 4 3 0 0 0 0 0 0 6 6 8 8
3 1 . 3 0 0 0 0 0 0 7 7 10 10
4 5 2 4 0 0 1 1 0 0 17 17 17 17
5 4 5 5 1 1 1 1 0 0 19 19 19 19
5 3 4 4 0 0 1 1 0 0 18 18 18 18
4 5 5 5 1 1 1 1 0 0 17 17 17 17
3. SPSS
SPSS Programming |
GET FILE=("C:\work\data\mydata.sav")
COMPUTE X1=0.
IF (q4 EQ 5 ) X1=1.
COMPUTE X2=0.
IF (q4 GE 4) X2=1.
COMPUTE X3=0.
IF (gender EQ 'f' AND Q4 GE 5) X3=1.
COMPUTE scoreA=3*q1+q2.
IF (gender='f') scoreA=2*q1+q2.
COMPUTE scoreB=3*q1+q2.
IF (workshop EQ 1 AND q4 GE 5) scoreB=2*q1+q2.
EXECUTE.
4. R Programming (R-PROJECT)
R Programming |
from rpy2.robjects import r
%load_ext rpy2.ipython
Results |
The rpy2.ipython extension is already loaded. To reload it, use:
%reload_ext rpy2.ipython
R Programming |
%%R
library(tidyverse)
load(file="C:/work/data/mydata.Rdata")
withmooc = mydata
attach(withmooc) # mydata를 기본 데이터 세트로 지정.
withmooc
Results |
R[write to console]: -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
---------------------------------------
R[write to console]: -- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
Results |
workshop gender q1 q2 q3 q4
1 1 f 1 1 5 1
2 2 f 2 1 4 1
3 1 f 2 2 4 3
4 2 f 3 1 NA 3
5 1 m 4 5 2 4
6 2 m 5 4 5 5
7 1 m 5 3 4 4
8 2 m 4 5 5 5
- 조건문에 의한 변환을 위한 R-Project
- 0과 1의 이진 변수 생성.
- 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
- 문제 4를 맞힌 사람을 알수 있다.
R Programming |
%%R
withmooc$q4Sagree <- ifelse( q4 == 5, 1,0 )
withmooc
Results |
workshop gender q1 q2 q3 q4 q4Sagree
1 1 f 1 1 5 1 0
2 2 f 2 1 4 1 0
3 1 f 2 2 4 3 0
4 2 f 3 1 NA 3 0
5 1 m 4 5 2 4 0
6 2 m 5 4 5 5 1
7 1 m 5 3 4 4 0
8 2 m 4 5 5 5 1
- q4가 4 이상인 경우 1을 할당.
R Programming |
%%R
withmooc$q4agree <- ifelse( q4 >= 4, 1,0)
withmooc
Results |
workshop gender q1 q2 q3 q4 q4Sagree q4agree
1 1 f 1 1 5 1 0 0
2 2 f 2 1 4 1 0 0
3 1 f 2 2 4 3 0 0
4 2 f 3 1 NA 3 0 0
5 1 m 4 5 2 4 0 1
6 2 m 5 4 5 5 1 1
7 1 m 5 3 4 4 0 1
8 2 m 4 5 5 5 1 1
- 좀더 복잡한 로직 예제.
- workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
R Programming |
%%R
withmooc$workshop1agree <- ifelse( (workshop == 1 & q4 >=4 ) , 1,0)
withmooc
Results |
workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree
1 1 f 1 1 5 1 0 0 0
2 2 f 2 1 4 1 0 0 0
3 1 f 2 2 4 3 0 0 0
4 2 f 3 1 NA 3 0 0 0
5 1 m 4 5 2 4 0 1 1
6 2 m 5 4 5 5 1 1 0
7 1 m 5 3 4 4 0 1 1
8 2 m 4 5 5 5 1 1 0
- 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
- 남성이 참이면, 여성은 거짓이다.
R Programming |
%%R
withmooc$score <- ifelse( gender=="f" ,
(withmooc$score <- (2*q1)+q2), # 여성에 대하여.
(withmooc$score <- (3*q1)+q2) ) # 남성에 대하여.
withmooc
Results |
workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree score
1 1 f 1 1 5 1 0 0 0 3
2 2 f 2 1 4 1 0 0 0 5
3 1 f 2 2 4 3 0 0 0 6
4 2 f 3 1 NA 3 0 0 0 7
5 1 m 4 5 2 4 0 1 1 17
6 2 m 5 4 5 5 1 1 0 19
7 1 m 5 3 4 4 0 1 1 18
8 2 m 4 5 5 5 1 1 0 17
- 남성과 여성에 대하여 다른 공식을 할당.
- 성별이 결측치 인경우 , 결과 역시 NA로 할당.
R Programming |
%%R
withmooc$score1 <- ifelse( gender=="f" ,
(withmooc$score1 <- 2*q1+q2),
ifelse( withmooc$gender=="m",
(withmooc$score1 <- 3*q1+q2), NA )
)
withmooc
Results |
workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree score score1
1 1 f 1 1 5 1 0 0 0 3 3
2 2 f 2 1 4 1 0 0 0 5 5
3 1 f 2 2 4 3 0 0 0 6 6
4 2 f 3 1 NA 3 0 0 0 7 7
5 1 m 4 5 2 4 0 1 1 17 17
6 2 m 5 4 5 5 1 1 0 19 19
7 1 m 5 3 4 4 0 1 1 18 18
8 2 m 4 5 5 5 1 1 0 17 17
- 괄혼안에서 더 복잡한 로직을 이용한 조건문 공식.
- 만약 workshop 이나 q4가 결측치이면, 두 번째 공신을 적용.
R Programming |
%%R
withmooc$score2 <- ifelse( (workshop==1 & q4>=4) ,
(withmooc$score2 <- 2*q1+q2),
(withmooc$score2 <- 3*q1+q2) )
withmooc
Results |
workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree score score1 score2
1 1 f 1 1 5 1 0 0 0 3 3 4
2 2 f 2 1 4 1 0 0 0 5 5 7
3 1 f 2 2 4 3 0 0 0 6 6 8
4 2 f 3 1 NA 3 0 0 0 7 7 10
5 1 m 4 5 2 4 0 1 1 17 17 13
6 2 m 5 4 5 5 1 1 0 19 19 19
7 1 m 5 3 4 4 0 1 1 18 18 13
8 2 m 4 5 5 5 1 1 0 17 17 17
R Programming |
%%R
detach(mydata)
Results |
From cffi callback :
Traceback (most recent call last):
File "C:\Users\BACK\anaconda3\lib\site-packages\rpy2\rinterface_lib\callbacks.py", line 131, in _consolewrite_ex
s = conversion._cchar_to_str_with_maxlen(buf, maxlen=n)
File "C:\Users\BACK\anaconda3\lib\site-packages\rpy2\rinterface_lib\conversion.py", line 130, in _cchar_to_str_with_maxlen
s = ffi.string(c, maxlen).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 26: invalid start byte
'utf-8' codec can't decode byte 0xc0 in position 26: invalid start byte
--------------------------------------------------------------------------
PermissionError: [WinError 32] 다른 프로세스가 파일을 사용 중이기 때문에 프로세스가 액세스 할 수 없습니다: 'C:\\Users\\BACK\\AppData\\Local\\Temp\\tmph0wb9noj\\Rplots001.png'
5. R - Tidyverse
R Programming |
%%R
library(tidyverse)
attach(mydata) # mydata를 기본 데이터 세트로 지정.
load(file="C:/work/data/mydata.Rdata")
withmooc = mydata
attach(withmooc) # mydata를 기본 데이터 세트로 지정.
withmooc
R[write to console]: The following objects are masked from withmooc (pos = 3):
gender, q1, q2, q3, q4, workshop
------ ------ ------ ------ ------ ------
R[write to console]: The following objects are masked from withmooc (pos = 9):
gender, q1, q2, q3, q4, workshop
Results |
workshop gender q1 q2 q3 q4
1 1 f 1 1 5 1
2 2 f 2 1 4 1
3 1 f 2 2 4 3
4 2 f 3 1 NA 3
5 1 m 4 5 2 4
6 2 m 5 4 5 5
7 1 m 5 3 4 4
8 2 m 4 5 5 5
- 0과 1의 이진 변수 생성.
- 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
- 문제 4를 맞힌 사람을 알수 있다.
R Programming |
%%R
withmooc %>%
dplyr::mutate(q4Sagree = ifelse(q4 == 5, 1,0))
Results |
workshop gender q1 q2 q3 q4 q4Sagree
1 1 f 1 1 5 1 0
2 2 f 2 1 4 1 0
3 1 f 2 2 4 3 0
4 2 f 3 1 NA 3 0
5 1 m 4 5 2 4 0
6 2 m 5 4 5 5 1
7 1 m 5 3 4 4 0
8 2 m 4 5 5 5 1
- q4가 4 이상인 경우 1을 할당.
R Programming |
%%R
withmooc %>%
dplyr::mutate(q4Sagree = ifelse(q4 >= 4, 1,0))
Results |
workshop gender q1 q2 q3 q4 q4Sagree
1 1 f 1 1 5 1 0
2 2 f 2 1 4 1 0
3 1 f 2 2 4 3 0
4 2 f 3 1 NA 3 0
5 1 m 4 5 2 4 1
6 2 m 5 4 5 5 1
7 1 m 5 3 4 4 1
8 2 m 4 5 5 5 1
- 복잡한 로직 예제.
- workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
R Programming |
%%R
withmooc %>%
dplyr::mutate(workshop1agree = ifelse((workshop == 1 & q4 >=4 ) , 1,0))
Results |
workshop gender q1 q2 q3 q4 workshop1agree
1 1 f 1 1 5 1 0
2 2 f 2 1 4 1 0
3 1 f 2 2 4 3 0
4 2 f 3 1 NA 3 0
5 1 m 4 5 2 4 1
6 2 m 5 4 5 5 0
7 1 m 5 3 4 4 1
8 2 m 4 5 5 5 0
- 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
- 남성이 참이면, 여성은 거짓이다.
R Programming |
%%R
withmooc %>%
dplyr::mutate(score = ifelse( gender=="f" ,
((2*q1)+q2), # 여성에 대하여.
((3*q1)+q2) ) # 남성에 대하여.
)
Results |
workshop gender q1 q2 q3 q4 score
1 1 f 1 1 5 1 3
2 2 f 2 1 4 1 5
3 1 f 2 2 4 3 6
4 2 f 3 1 NA 3 7
5 1 m 4 5 2 4 17
6 2 m 5 4 5 5 19
7 1 m 5 3 4 4 18
8 2 m 4 5 5 5 17
- 남성과 여성에 대하여 다른 공식을 할당.
- 성별이 결측치 인경우 , 결과 역시 NA로 할당.
R Programming |
%%R
withmooc %>%
dplyr::mutate(score = ifelse( gender=="f" ,
((2*q1)+q2), # 여성에 대하여.
ifelse(gender == "m",
((3*q1)+q2), NA)
)
)
Results |
workshop gender q1 q2 q3 q4 score
1 1 f 1 1 5 1 3
2 2 f 2 1 4 1 5
3 1 f 2 2 4 3 6
4 2 f 3 1 NA 3 7
5 1 m 4 5 2 4 17
6 2 m 5 4 5 5 19
7 1 m 5 3 4 4 18
8 2 m 4 5 5 5 17
R Programming |
%%R
withmooc %>%
dplyr::mutate(score = ifelse( gender=="f" ,
(withmooc$score1 <- 2*q1+q2),
ifelse( withmooc$gender=="m",
(withmooc$score1 <- 3*q1+q2), NA )
)
)
Results |
workshop gender q1 q2 q3 q4 score
1 1 f 1 1 5 1 3
2 2 f 2 1 4 1 5
3 1 f 2 2 4 3 6
4 2 f 3 1 NA 3 7
5 1 m 4 5 2 4 17
6 2 m 5 4 5 5 19
7 1 m 5 3 4 4 18
8 2 m 4 5 5 5 17
- 괄호안에서 더 복잡한 로직을 이용한 조건문 공식.
- 만약 workshop 이나 q4가 결측치이면, 두 번째 공신을 적용.
R Programming |
%%R
withmooc %>%
dplyr::mutate(score2 = ifelse( (workshop==1 & q4>=4) ,
((2*q1)+q2),
((3*q1)+q2) )
)
Results |
workshop gender q1 q2 q3 q4 score2
1 1 f 1 1 5 1 4
2 2 f 2 1 4 1 7
3 1 f 2 2 4 3 8
4 2 f 3 1 NA 3 10
5 1 m 4 5 2 4 13
6 2 m 5 4 5 5 19
7 1 m 5 3 4 4 13
8 2 m 4 5 5 5 17
6. Python - Pandas
Python Programming |
import pandas as pd
import numpy as np
import sweetviz as sv
mydata = pd.read_csv("C:/work/data/mydata.csv",sep=",",
dtype={'id':object,'workshop':object,
'q1':int, 'q2':int, 'q3':float, 'q4':int},
na_values=['NaN'],skipinitialspace =True)
withmooc= mydata.copy()
withmooc
Results |
id workshop gender q1 q2 q3 q4
0 1 1 f 1 1 5.0 1
1 2 2 f 2 1 4.0 1
2 3 1 f 2 2 4.0 3
3 4 2 f 3 1 NaN 3
4 5 1 m 4 5 2.0 4
5 6 2 m 5 4 5.0 5
6 7 1 m 5 3 4.0 4
7 8 2 m 4 5 5.0 5
- np.where 사용
- 0과 1의 이진 변수 생성.
- 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
- 문제 4를 맞힌 사람을 알수 있다.
Python Programming |
withmooc['q4Sagree'] = np.where(withmooc['q4'] == 5, 1, 0)
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree
0 1 1 f 1 1 5.0 1 0
1 2 2 f 2 1 4.0 1 0
2 3 1 f 2 2 4.0 3 0
3 4 2 f 3 1 NaN 3 0
4 5 1 m 4 5 2.0 4 0
5 6 2 m 5 4 5.0 5 1
6 7 1 m 5 3 4.0 4 0
7 8 2 m 4 5 5.0 5 1
- 함수 사용 - 리스트 비교
Python Programming |
def if_this_else_that(x, list_of_checks, yes_label, no_label):
if x in list_of_checks:
res = 1
else:
res = 0
return(res)
withmooc['new_rating'] = withmooc['q4'].apply(lambda x: if_this_else_that(x, [5], 1, 0))
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating
0 1 1 f 1 1 5.0 1 0 0
1 2 2 f 2 1 4.0 1 0 0
2 3 1 f 2 2 4.0 3 0 0
3 4 2 f 3 1 NaN 3 0 0
4 5 1 m 4 5 2.0 4 0 0
5 6 2 m 5 4 5.0 5 1 1
6 7 1 m 5 3 4.0 4 0 0
7 8 2 m 4 5 5.0 5 1 1
- 함수 사용 - 상수(5) 비교
Python Programming |
def if_this_else_that(x, list_of_checks, yes_label, no_label):
if x == list_of_checks:
res = 1
else:
res = 0
return(res)
withmooc['new_rating_1'] = withmooc['q4'].apply(lambda x: if_this_else_that(x, 5, 1, 0))
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1
0 1 1 f 1 1 5.0 1 0 0 0
1 2 2 f 2 1 4.0 1 0 0 0
2 3 1 f 2 2 4.0 3 0 0 0
3 4 2 f 3 1 NaN 3 0 0 0
4 5 1 m 4 5 2.0 4 0 0 0
5 6 2 m 5 4 5.0 5 1 1 1
6 7 1 m 5 3 4.0 4 0 0 0
7 8 2 m 4 5 5.0 5 1 1 1
- 값과 조건을 비교합니다.
- 그 결과를 새로운 변수에 할당합니다.
Python Programming |
withmooc['new_rating_2'] = (withmooc['q4'] == 5)
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1 new_rating_2
0 1 1 f 1 1 5.0 1 0 0 0 False
1 2 2 f 2 1 4.0 1 0 0 0 False
2 3 1 f 2 2 4.0 3 0 0 0 False
3 4 2 f 3 1 NaN 3 0 0 0 False
4 5 1 m 4 5 2.0 4 0 0 0 False
5 6 2 m 5 4 5.0 5 1 1 1 True
6 7 1 m 5 3 4.0 4 0 0 0 False
7 8 2 m 4 5 5.0 5 1 1 1 True
- 5 ways to apply an IF condition in pandas DataFrame
- : https://datatofish.com/if-condition-in-pandas-dataframe/
Python Programming |
# (1) IF condition – Set of numbers
withmooc.loc[mydata['q4'] == 5, 'new_rating_3'] = 1
withmooc.loc[mydata['q4'] != 5, 'new_rating_3'] = 0
# (2) IF condition – set of numbers and lambda
withmooc['new_rating_4'] = withmooc['q4'].apply(lambda x: 1 if x == 5 else 0)
# (3) IF condition with OR
withmooc.loc[(mydata['q4'] == 5) | (withmooc['q4'] == 4), 'name_match'] = 'Match'
withmooc.loc[(mydata['q4'] != 5) & (withmooc['q4'] != 4), 'name_match'] = 'Mismatch'
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1 new_rating_2 new_rating_3 new_rating_4 name_match
0 1 1 f 1 1 5.0 1 0 0 0 False 0.0 0 Mismatch
1 2 2 f 2 1 4.0 1 0 0 0 False 0.0 0 Mismatch
2 3 1 f 2 2 4.0 3 0 0 0 False 0.0 0 Mismatch
3 4 2 f 3 1 NaN 3 0 0 0 False 0.0 0 Mismatch
4 5 1 m 4 5 2.0 4 0 0 0 False 0.0 0 Match
5 6 2 m 5 4 5.0 5 1 1 1 True 1.0 1 Match
6 7 1 m 5 3 4.0 4 0 0 0 False 0.0 0 Match
7 8 2 m 4 5 5.0 5 1 1 1 True 1.0 1 Match
- q4가 4 이상인 경우 1을 할당.
Python Programming |
withmooc['q4agree'] = np.where(withmooc['q4'] >= 4, 1, 0)
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1 new_rating_2 new_rating_3 new_rating_4 name_match q4agree
0 1 1 f 1 1 5.0 1 0 0 0 False 0.0 0 Mismatch 0
1 2 2 f 2 1 4.0 1 0 0 0 False 0.0 0 Mismatch 0
2 3 1 f 2 2 4.0 3 0 0 0 False 0.0 0 Mismatch 0
3 4 2 f 3 1 NaN 3 0 0 0 False 0.0 0 Mismatch 0
4 5 1 m 4 5 2.0 4 0 0 0 False 0.0 0 Match 1
5 6 2 m 5 4 5.0 5 1 1 1 True 1.0 1 Match 1
6 7 1 m 5 3 4.0 4 0 0 0 False 0.0 0 Match 1
7 8 2 m 4 5 5.0 5 1 1 1 True 1.0 1 Match 1
- 복잡한 로직 예제.
- workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
Python Programming |
withmooc['workshop1agree'] = np.where( ( (withmooc['workshop'] == '1') & (mydata['q4'] >= 4) ), 1, 0)
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1 new_rating_2 new_rating_3 new_rating_4 name_match q4agree workshop1agree
0 1 1 f 1 1 5.0 1 0 0 0 False 0.0 0 Mismatch 0 0
1 2 2 f 2 1 4.0 1 0 0 0 False 0.0 0 Mismatch 0 0
2 3 1 f 2 2 4.0 3 0 0 0 False 0.0 0 Mismatch 0 0
3 4 2 f 3 1 NaN 3 0 0 0 False 0.0 0 Mismatch 0 0
4 5 1 m 4 5 2.0 4 0 0 0 False 0.0 0 Match 1 1
5 6 2 m 5 4 5.0 5 1 1 1 True 1.0 1 Match 1 0
6 7 1 m 5 3 4.0 4 0 0 0 False 0.0 0 Match 1 1
7 8 2 m 4 5 5.0 5 1 1 1 True 1.0 1 Match 1 0
- 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
- 남성이 참이면, 여성은 거짓이다.
Python Programming |
withmooc['score'] = np.where( withmooc['gender'] == 'f', (2*mydata['q1'])+mydata['q2'] ,
(3*mydata['q1'])+mydata['q2'] )
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1 new_rating_2 new_rating_3 new_rating_4 name_match q4agree workshop1agree score
0 1 1 f 1 1 5.0 1 0 0 0 False 0.0 0 Mismatch 0 0 3
1 2 2 f 2 1 4.0 1 0 0 0 False 0.0 0 Mismatch 0 0 5
2 3 1 f 2 2 4.0 3 0 0 0 False 0.0 0 Mismatch 0 0 6
3 4 2 f 3 1 NaN 3 0 0 0 False 0.0 0 Mismatch 0 0 7
4 5 1 m 4 5 2.0 4 0 0 0 False 0.0 0 Match 1 1 17
5 6 2 m 5 4 5.0 5 1 1 1 True 1.0 1 Match 1 0 19
6 7 1 m 5 3 4.0 4 0 0 0 False 0.0 0 Match 1 1 18
7 8 2 m 4 5 5.0 5 1 1 1 True 1.0 1 Match 1 0 17
- 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할수 있다.
- 남성이 참이면, 여성은 거짓이다.
Python Programming |
withmooc['score'] = np.where( withmooc['gender'] == 'f', (2*mydata['q1'])+withmooc['q2'] ,
np.where( withmooc['gender'] == 'm',(3*withmooc['q1'])+withmooc['q2'],0) )
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1 new_rating_2 new_rating_3 new_rating_4 name_match q4agree workshop1agree score
0 1 1 f 1 1 5.0 1 0 0 0 False 0.0 0 Mismatch 0 0 3
1 2 2 f 2 1 4.0 1 0 0 0 False 0.0 0 Mismatch 0 0 5
2 3 1 f 2 2 4.0 3 0 0 0 False 0.0 0 Mismatch 0 0 6
3 4 2 f 3 1 NaN 3 0 0 0 False 0.0 0 Mismatch 0 0 7
4 5 1 m 4 5 2.0 4 0 0 0 False 0.0 0 Match 1 1 17
5 6 2 m 5 4 5.0 5 1 1 1 True 1.0 1 Match 1 0 19
6 7 1 m 5 3 4.0 4 0 0 0 False 0.0 0 Match 1 1 18
7 8 2 m 4 5 5.0 5 1 1 1 True 1.0 1 Match 1 0 17
Python Programming |
# mydata['score_2'] = mydata['gender','q1'].apply(lambda x ,y : 99 if (x + y) == 1 else ( 3) )
withmooc['score_2'] = withmooc.apply(lambda x: 2*x['q1'] + x['q2'] if x['gender'] == 'f' else \
(3*x['q1'] + x['q2'] if x['gender'] == 'm' else 0), axis=1)
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1 new_rating_2 new_rating_3 new_rating_4 name_match q4agree workshop1agree score score_2
0 1 1 f 1 1 5.0 1 0 0 0 False 0.0 0 Mismatch 0 0 3 3
1 2 2 f 2 1 4.0 1 0 0 0 False 0.0 0 Mismatch 0 0 5 5
2 3 1 f 2 2 4.0 3 0 0 0 False 0.0 0 Mismatch 0 0 6 6
3 4 2 f 3 1 NaN 3 0 0 0 False 0.0 0 Mismatch 0 0 7 7
4 5 1 m 4 5 2.0 4 0 0 0 False 0.0 0 Match 1 1 17 17
5 6 2 m 5 4 5.0 5 1 1 1 True 1.0 1 Match 1 0 19 19
6 7 1 m 5 3 4.0 4 0 0 0 False 0.0 0 Match 1 1 18 18
7 8 2 m 4 5 5.0 5 1 1 1 True 1.0 1 Match 1 0 17 17
Python Programming |
def func_avg_100(x):
if x.gender == 'f':
return ( 2 * x.q1 + x.q2)
else:
return ( 3 * x.q1 + x.q2 )
withmooc['score_3'] = withmooc.apply(lambda x:func_avg_100(x),axis=1)
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1 new_rating_2 new_rating_3 new_rating_4 name_match q4agree workshop1agree score score_2 score_3
0 1 1 f 1 1 5.0 1 0 0 0 False 0.0 0 Mismatch 0 0 3 3 3
1 2 2 f 2 1 4.0 1 0 0 0 False 0.0 0 Mismatch 0 0 5 5 5
2 3 1 f 2 2 4.0 3 0 0 0 False 0.0 0 Mismatch 0 0 6 6 6
3 4 2 f 3 1 NaN 3 0 0 0 False 0.0 0 Mismatch 0 0 7 7 7
4 5 1 m 4 5 2.0 4 0 0 0 False 0.0 0 Match 1 1 17 17 17
5 6 2 m 5 4 5.0 5 1 1 1 True 1.0 1 Match 1 0 19 19 19
6 7 1 m 5 3 4.0 4 0 0 0 False 0.0 0 Match 1 1 18 18 18
7 8 2 m 4 5 5.0 5 1 1 1 True 1.0 1 Match 1 0 17 17 17
- 괄혼안에서 더 복잡한 로직을 이용한 조건문 공식.
- 만약 workshop 이나 q4가 결측치이면, 두 번째 공신을 적용.
Python Programming |
def func_logic(x):
if (x.workshop == '1' and x.q4 >= 4) :
return ( 2 * x.q1 + x.q2)
else:
return ( 3 * x.q1 + x.q2 )
withmooc['score_4'] = withmooc.apply(lambda x:func_logic(x),axis=1)
withmooc
Results |
id workshop gender q1 q2 q3 q4 q4Sagree new_rating new_rating_1 new_rating_2 new_rating_3 new_rating_4 name_match q4agree workshop1agree score score_2 score_3 score_4
0 1 1 f 1 1 5.0 1 0 0 0 False 0.0 0 Mismatch 0 0 3 3 3 4
1 2 2 f 2 1 4.0 1 0 0 0 False 0.0 0 Mismatch 0 0 5 5 5 7
2 3 1 f 2 2 4.0 3 0 0 0 False 0.0 0 Mismatch 0 0 6 6 6 8
3 4 2 f 3 1 NaN 3 0 0 0 False 0.0 0 Mismatch 0 0 7 7 7 10
4 5 1 m 4 5 2.0 4 0 0 0 False 0.0 0 Match 1 1 17 17 17 13
5 6 2 m 5 4 5.0 5 1 1 1 True 1.0 1 Match 1 0 19 19 19 19
6 7 1 m 5 3 4.0 4 0 0 0 False 0.0 0 Match 1 1 18 18 18 13
7 8 2 m 4 5 5.0 5 1 1 1 True 1.0 1 Match 1 0 17 17 17 17
7. Python - dfply
Python Programming |
import pandas as pd
from dfply import *
mydata = pd.read_csv("C:/work/data/mydata.csv",sep=",",
dtype={'id':object,'workshop':object,
'q1':int, 'q2':int, 'q3':float, 'q4':int},
na_values=['NaN'],skipinitialspace =True)
withmooc= mydata.copy()
# 모든 변수 선택하기.
withmooc
Results |
id workshop gender q1 q2 q3 q4
0 1 1 f 1 1 5.0 1
1 2 2 f 2 1 4.0 1
2 3 1 f 2 2 4.0 3
3 4 2 f 3 1 NaN 3
4 5 1 m 4 5 2.0 4
5 6 2 m 5 4 5.0 5
6 7 1 m 5 3 4.0 4
7 8 2 m 4 5 5.0 5
- 0과 1의 이진 변수 생성.
- 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
- 문제 4를 맞힌 사람을 알수 있다.
Python Programming |
withmooc >> mutate(q4Sagree=(X.q4 == 5).replace({True: 1, False: 0}))
withmooc >> mutate(q4Sagree=make_symbolic(np.where)(X.q4 == 5, 1,0))
Results |
id workshop gender q1 q2 q3 q4 q4Sagree
0 1 1 f 1 1 5.0 1 0
1 2 2 f 2 1 4.0 1 0
2 3 1 f 2 2 4.0 3 0
3 4 2 f 3 1 NaN 3 0
4 5 1 m 4 5 2.0 4 0
5 6 2 m 5 4 5.0 5 1
6 7 1 m 5 3 4.0 4 0
7 8 2 m 4 5 5.0 5 1
- q4가 4 이상인 경우 1을 할당.
Python Programming |
withmooc >> mutate(q4Sagree=(X.q4 >= 4).replace({True: 1, False: 0}))
withmooc >> mutate(q4Sagree=make_symbolic(np.where)(X.q4 >= 4, 1,0))
Results |
id workshop gender q1 q2 q3 q4 q4Sagree
0 1 1 f 1 1 5.0 1 0
1 2 2 f 2 1 4.0 1 0
2 3 1 f 2 2 4.0 3 0
3 4 2 f 3 1 NaN 3 0
4 5 1 m 4 5 2.0 4 1
5 6 2 m 5 4 5.0 5 1
6 7 1 m 5 3 4.0 4 1
7 8 2 m 4 5 5.0 5 1
Python Programming |
withmooc >> mutate(q4Sagree=case_when([X.q4 >= 4, 1], [True, 0]
))
Results |
id workshop gender q1 q2 q3 q4 q4Sagree
0 1 1 f 1 1 5.0 1 0.0
1 2 2 f 2 1 4.0 1 0.0
2 3 1 f 2 2 4.0 3 0.0
3 4 2 f 3 1 NaN 3 0.0
4 5 1 m 4 5 2.0 4 1.0
5 6 2 m 5 4 5.0 5 1.0
6 7 1 m 5 3 4.0 4 1.0
7 8 2 m 4 5 5.0 5 1.0
- 좀더 복잡한 로직 예제.
- workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
Python Programming |
withmooc >> mutate(workshop1agree=( (X.workshop == '1') & (X.q4 >= 4) ).replace({True: 1, False: 0}))
withmooc >> mutate(workshop1agree=case_when([(X.workshop == '1') & (X.q4 >= 4), 1], [True, 0]))
withmooc >> mutate(workshop1agree=make_symbolic(np.where)( (X.workshop == '1') & (X.q4 >= 4) , 1,0))
Results |
id workshop gender q1 q2 q3 q4 workshop1agree
0 1 1 f 1 1 5.0 1 0
1 2 2 f 2 1 4.0 1 0
2 3 1 f 2 2 4.0 3 0
3 4 2 f 3 1 NaN 3 0
4 5 1 m 4 5 2.0 4 1
5 6 2 m 5 4 5.0 5 0
6 7 1 m 5 3 4.0 4 1
7 8 2 m 4 5 5.0 5 0
- 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
- 남성이 참이면, 여성은 거짓이다.
Python Programming |
withmooc >> mutate(score=make_symbolic(np.where)( (X.gender=="f") ,((2*X.q1)+X.q2),((3*X.q1)+X.q2) ))
withmooc >> mutate(score=case_when([(X.gender=="f"), ((2*X.q1)+X.q2)], [True, ((3*X.q1)+X.q2)]))
Results |
id workshop gender q1 q2 q3 q4 score
0 1 1 f 1 1 5.0 1 3.0
1 2 2 f 2 1 4.0 1 5.0
2 3 1 f 2 2 4.0 3 6.0
3 4 2 f 3 1 NaN 3 7.0
4 5 1 m 4 5 2.0 4 17.0
5 6 2 m 5 4 5.0 5 19.0
6 7 1 m 5 3 4.0 4 18.0
7 8 2 m 4 5 5.0 5 17.0
- 남성과 여성에 대하여 다른 공식을 할당.
- 성별이 결측치 인경우 , 결과 역시 NA로 할당.
Python Programming |
withmooc >> mutate(score=make_symbolic(np.where)( (X.gender=="f") ,((2*X.q1)+X.q2),
make_symbolic(np.where)( (X.gender=="m") ,((3*X.q1)+X.q2),0 )
))
Results |
id workshop gender q1 q2 q3 q4 score
0 1 1 f 1 1 5.0 1 3
1 2 2 f 2 1 4.0 1 5
2 3 1 f 2 2 4.0 3 6
3 4 2 f 3 1 NaN 3 7
4 5 1 m 4 5 2.0 4 17
5 6 2 m 5 4 5.0 5 19
6 7 1 m 5 3 4.0 4 18
7 8 2 m 4 5 5.0 5 17
통계프로그램 비교 목록(Proc sql, SAS, SPSS, R 프로그래밍, R Tidyverse, Python Pandas, Python Dfply) |
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 |
반응형
'통계프로그램 비교 시리즈 > 데이터 전처리 비교' 카테고리의 다른 글
통계프로그램 비교 시리즈 – [데이터 관리] 4. 다중 조건에 의한 변환 (0) | 2022.01.10 |
---|---|
통계프로그램 비교 시리즈 – [데이터 관리] 3. 결측값 할당을 위한 조건 변환 (0) | 2022.01.10 |
통계프로그램 비교 시리즈 – [데이터 관리] 1. 변수 변환 (0) | 2022.01.07 |
통계프로그램 비교 시리즈 – [변수와 관측치 선택] 3. 변수와 관측치를 동시에 선택 (0) | 2022.01.07 |
통계프로그램 비교 시리즈 – [변수와 관측치 선택] 2. 관측치 선택( WHERE, IF, SELECT IF ) (0) | 2022.01.07 |
댓글