본문 바로가기
통계프로그램 비교 시리즈/데이터 전처리 비교

통계프로그램 비교 시리즈 – [데이터 관리] 2. 조건문에 의한 변환

by 기서무나구물 2022. 1. 7.

[데이터 관리] 2. 조건문에 의한 변수 변환 후 신규 변수 생성하기

 


1. Proc SQL

 

SAS Programming
options ls=150;

proc sql;
  select case when q4=5 then 1 else 0 end as x1,
         ifn(q4=5,1,0) as x11,

         case when q4>=4 then 1 else 0 end as x2,
         ifn(q4>=4,1,0) as x22,

         case when workshop=1 and q4>=5 then 1 else 0 end as x3,
         ifn(workshop=1 and q4>=5,1,0) as x33,

         case when gender='f' then 2*q1+q2 else 3*q1+q2 end as scoreA,
         ifn(gender='f',2*q1+q2,3*q1+q2) as scoreAA,

         case when workshop=1 and q4>=5 then 2*q1+q2 else 3*q1+q2 end as scoreB,
         ifn(workshop=1 and q4>=5,2*q1+q2,3*q1+q2) as scoreBB

  from BACK.mydata;
quit;

 

Results
 x1       x11        x2       x22        x3       x33    scoreA   scoreAA    scoreB   scoreBB
---------------------------------------------------------------------------------------------
  0         0         0         0         0         0         3         3         4         4
  0         0         0         0         0         0         5         5         7         7
  0         0         0         0         0         0         6         6         8         8
  0         0         0         0         0         0         7         7        10        10
  0         0         1         1         0         0        17        17        17        17
  1         1         1         1         0         0        19        19        19        19
  0         0         1         1         0         0        18        18        18        18
  1         1         1         1         0         0        17        17        17        17

 


2. SAS Programming

 

SAS Programming
DATA withmooc;
 SET BACk.mydata;

     If q4= 5 then x1=1;
     else          x1=0;

     * if함수의 간단한 함수;
     x11 = ifn(q4=5,1,0);


     If q4>=4 then x2=1;
     else          x2=0;

     x22 = ifn(q4>=4,1,0);


     If workshop=1 & q4>=5 then x3=1;
     else                       x3=0;

     x33 = ifn(workshop=1 and q4>=5,1,0);


     If gender="f" then scoreA=2*q1+q2;
     Else               scoreA=3*q1+q2;

     scoreAA = ifn(gender='f',2*q1+q2,3*q1+q2);


     If workshop=1 and q4>=5 then scoreB=2*q1+q2;
     Else                          scoreB=3*q1+q2;

     scoreBB = ifn(workshop=1 and q4>=5,2*q1+q2,3*q1+q2);
run;

proc print;run;

 

Results
                                                                score    score    score    score
 q1    q2    q3    q4    x1    x11    x2    x22    x3    x33       A        AA       B        BB
  1     1     5     1     0     0      0     0      0     0        3        3        4        4
  2     1     4     1     0     0      0     0      0     0        5        5        7        7
  2     2     4     3     0     0      0     0      0     0        6        6        8        8
  3     1     .     3     0     0      0     0      0     0        7        7       10       10
  4     5     2     4     0     0      1     1      0     0       17       17       17       17
  5     4     5     5     1     1      1     1      0     0       19       19       19       19
  5     3     4     4     0     0      1     1      0     0       18       18       18       18
  4     5     5     5     1     1      1     1      0     0       17       17       17       17

 


3. SPSS

 

SPSS Programming
GET FILE=("C:\work\data\mydata.sav")

COMPUTE X1=0.
IF (q4 EQ 5 ) X1=1.



COMPUTE X2=0.
IF (q4 GE 4) X2=1.


COMPUTE X3=0.
IF (gender EQ 'f' AND Q4 GE 5) X3=1.


COMPUTE scoreA=3*q1+q2.
IF (gender='f') scoreA=2*q1+q2.


COMPUTE scoreB=3*q1+q2.
IF (workshop EQ 1 AND q4 GE 5) scoreB=2*q1+q2.

EXECUTE.

 


4. R Programming (R-PROJECT)

 

R Programming
from rpy2.robjects import r
%load_ext rpy2.ipython

 

Results
The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

 

 

R Programming
%%R

library(tidyverse)
load(file="C:/work/data/mydata.Rdata")

withmooc = mydata
attach(withmooc) # mydata를 기본 데이터 세트로 지정.

withmooc

 

Results
R[write to console]: -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

---------------------------------------

R[write to console]: -- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

 

Results
  workshop gender q1 q2 q3 q4
1        1      f  1  1  5  1
2        2      f  2  1  4  1
3        1      f  2  2  4  3
4        2      f  3  1 NA  3
5        1      m  4  5  2  4
6        2      m  5  4  5  5
7        1      m  5  3  4  4
8        2      m  4  5  5  5

 

 

  • 조건문에 의한 변환을 위한 R-Project
  • 0과 1의 이진 변수 생성.
  • 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
  • 문제 4를 맞힌 사람을 알수 있다.
R Programming
%%R

withmooc$q4Sagree <- ifelse( q4 == 5, 1,0 )

withmooc

 

Results
  workshop gender q1 q2 q3 q4 q4Sagree
1        1      f  1  1  5  1        0
2        2      f  2  1  4  1        0
3        1      f  2  2  4  3        0
4        2      f  3  1 NA  3        0
5        1      m  4  5  2  4        0
6        2      m  5  4  5  5        1
7        1      m  5  3  4  4        0
8        2      m  4  5  5  5        1

 

 

  • q4가 4 이상인 경우 1을 할당.
R Programming
%%R

withmooc$q4agree <- ifelse( q4 >= 4, 1,0)

withmooc

 

Results
  workshop gender q1 q2 q3 q4 q4Sagree q4agree
1        1      f  1  1  5  1        0       0
2        2      f  2  1  4  1        0       0
3        1      f  2  2  4  3        0       0
4        2      f  3  1 NA  3        0       0
5        1      m  4  5  2  4        0       1
6        2      m  5  4  5  5        1       1
7        1      m  5  3  4  4        0       1
8        2      m  4  5  5  5        1       1

 

 

  • 좀더 복잡한 로직 예제.
  • workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
R Programming
%%R

withmooc$workshop1agree <- ifelse( (workshop == 1 & q4 >=4 ) , 1,0)

withmooc

 

Results
  workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree
1        1      f  1  1  5  1        0       0              0
2        2      f  2  1  4  1        0       0              0
3        1      f  2  2  4  3        0       0              0
4        2      f  3  1 NA  3        0       0              0
5        1      m  4  5  2  4        0       1              1
6        2      m  5  4  5  5        1       1              0
7        1      m  5  3  4  4        0       1              1
8        2      m  4  5  5  5        1       1              0

 

 

  • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
  • 남성이 참이면, 여성은 거짓이다.
R Programming
%%R

withmooc$score <- ifelse( gender=="f" ,
                        (withmooc$score <- (2*q1)+q2),  # 여성에 대하여.
                        (withmooc$score <- (3*q1)+q2) ) # 남성에 대하여.

withmooc

 

Results
  workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree score
1        1      f  1  1  5  1        0       0              0     3
2        2      f  2  1  4  1        0       0              0     5
3        1      f  2  2  4  3        0       0              0     6
4        2      f  3  1 NA  3        0       0              0     7
5        1      m  4  5  2  4        0       1              1    17
6        2      m  5  4  5  5        1       1              0    19
7        1      m  5  3  4  4        0       1              1    18
8        2      m  4  5  5  5        1       1              0    17

 

 

  • 남성과 여성에 대하여 다른 공식을 할당.
  • 성별이 결측치 인경우 , 결과 역시 NA로 할당.
R Programming
%%R

withmooc$score1 <- ifelse( gender=="f" ,
                           (withmooc$score1 <- 2*q1+q2),
                           ifelse( withmooc$gender=="m",
                                   (withmooc$score1 <- 3*q1+q2), NA )
                          )

withmooc

 

Results
  workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree score score1
1        1      f  1  1  5  1        0       0              0     3      3
2        2      f  2  1  4  1        0       0              0     5      5
3        1      f  2  2  4  3        0       0              0     6      6
4        2      f  3  1 NA  3        0       0              0     7      7
5        1      m  4  5  2  4        0       1              1    17     17
6        2      m  5  4  5  5        1       1              0    19     19
7        1      m  5  3  4  4        0       1              1    18     18
8        2      m  4  5  5  5        1       1              0    17     17

 

 

  • 괄혼안에서 더 복잡한 로직을 이용한 조건문 공식.
  • 만약 workshop 이나 q4가 결측치이면, 두 번째 공신을 적용.
R Programming
%%R

withmooc$score2 <- ifelse( (workshop==1 & q4>=4) ,
                           (withmooc$score2 <- 2*q1+q2),
                           (withmooc$score2 <- 3*q1+q2) )

withmooc

 

Results
  workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree score score1 score2
1        1      f  1  1  5  1        0       0              0     3      3      4
2        2      f  2  1  4  1        0       0              0     5      5      7
3        1      f  2  2  4  3        0       0              0     6      6      8
4        2      f  3  1 NA  3        0       0              0     7      7     10
5        1      m  4  5  2  4        0       1              1    17     17     13
6        2      m  5  4  5  5        1       1              0    19     19     19
7        1      m  5  3  4  4        0       1              1    18     18     13
8        2      m  4  5  5  5        1       1              0    17     17     17

 

 

R Programming
%%R

detach(mydata)

 

Results
From cffi callback :
Traceback (most recent call last):
  File "C:\Users\BACK\anaconda3\lib\site-packages\rpy2\rinterface_lib\callbacks.py", line 131, in _consolewrite_ex
    s = conversion._cchar_to_str_with_maxlen(buf, maxlen=n)
  File "C:\Users\BACK\anaconda3\lib\site-packages\rpy2\rinterface_lib\conversion.py", line 130, in _cchar_to_str_with_maxlen
    s = ffi.string(c, maxlen).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 26: invalid start byte



'utf-8' codec can't decode byte 0xc0 in position 26: invalid start byte



    --------------------------------------------------------------------------

PermissionError: [WinError 32] 다른 프로세스가 파일을 사용 중이기 때문에 프로세스가 액세스 할 수 없습니다: 'C:\\Users\\BACK\\AppData\\Local\\Temp\\tmph0wb9noj\\Rplots001.png'

 


5. R - Tidyverse

 

R Programming
%%R

library(tidyverse)
attach(mydata) # mydata를 기본 데이터 세트로 지정.
load(file="C:/work/data/mydata.Rdata")

withmooc = mydata

attach(withmooc) # mydata를 기본 데이터 세트로 지정.

withmooc
R[write to console]: The following objects are masked from withmooc (pos = 3):

    gender, q1, q2, q3, q4, workshop


------ ------ ------ ------ ------ ------


R[write to console]: The following objects are masked from withmooc (pos = 9):

    gender, q1, q2, q3, q4, workshop
    

 

Results
  workshop gender q1 q2 q3 q4
1        1      f  1  1  5  1
2        2      f  2  1  4  1
3        1      f  2  2  4  3
4        2      f  3  1 NA  3
5        1      m  4  5  2  4
6        2      m  5  4  5  5
7        1      m  5  3  4  4
8        2      m  4  5  5  5

 

 

  • 0과 1의 이진 변수 생성.
  • 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
  • 문제 4를 맞힌 사람을 알수 있다.
R Programming
%%R

withmooc %>%
  dplyr::mutate(q4Sagree = ifelse(q4 == 5, 1,0))

 

Results
  workshop gender q1 q2 q3 q4 q4Sagree
1        1      f  1  1  5  1        0
2        2      f  2  1  4  1        0
3        1      f  2  2  4  3        0
4        2      f  3  1 NA  3        0
5        1      m  4  5  2  4        0
6        2      m  5  4  5  5        1
7        1      m  5  3  4  4        0
8        2      m  4  5  5  5        1

 

 

  • q4가 4 이상인 경우 1을 할당.
R Programming
%%R

withmooc %>%
  dplyr::mutate(q4Sagree = ifelse(q4 >= 4, 1,0))

 

Results
  workshop gender q1 q2 q3 q4 q4Sagree
1        1      f  1  1  5  1        0
2        2      f  2  1  4  1        0
3        1      f  2  2  4  3        0
4        2      f  3  1 NA  3        0
5        1      m  4  5  2  4        1
6        2      m  5  4  5  5        1
7        1      m  5  3  4  4        1
8        2      m  4  5  5  5        1

 

 

  • 복잡한 로직 예제.
  • workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
R Programming
%%R

withmooc %>%
  dplyr::mutate(workshop1agree = ifelse((workshop == 1 & q4 >=4 ) , 1,0))

 

Results
  workshop gender q1 q2 q3 q4 workshop1agree
1        1      f  1  1  5  1              0
2        2      f  2  1  4  1              0
3        1      f  2  2  4  3              0
4        2      f  3  1 NA  3              0
5        1      m  4  5  2  4              1
6        2      m  5  4  5  5              0
7        1      m  5  3  4  4              1
8        2      m  4  5  5  5              0

 

 

  • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
  • 남성이 참이면, 여성은 거짓이다.
R Programming
%%R

withmooc %>%
  dplyr::mutate(score = ifelse( gender=="f" ,
                                ((2*q1)+q2),  # 여성에 대하여.
                                ((3*q1)+q2) ) # 남성에 대하여.
                )

 

Results
  workshop gender q1 q2 q3 q4 score
1        1      f  1  1  5  1     3
2        2      f  2  1  4  1     5
3        1      f  2  2  4  3     6
4        2      f  3  1 NA  3     7
5        1      m  4  5  2  4    17
6        2      m  5  4  5  5    19
7        1      m  5  3  4  4    18
8        2      m  4  5  5  5    17

 

 

  • 남성과 여성에 대하여 다른 공식을 할당.
  • 성별이 결측치 인경우 , 결과 역시 NA로 할당.
R Programming
%%R

withmooc %>%
  dplyr::mutate(score = ifelse( gender=="f" ,
                                ((2*q1)+q2),  # 여성에 대하여.
                                ifelse(gender == "m",
                                       ((3*q1)+q2), NA)
                               ) 
               )

 

Results
  workshop gender q1 q2 q3 q4 score
1        1      f  1  1  5  1     3
2        2      f  2  1  4  1     5
3        1      f  2  2  4  3     6
4        2      f  3  1 NA  3     7
5        1      m  4  5  2  4    17
6        2      m  5  4  5  5    19
7        1      m  5  3  4  4    18
8        2      m  4  5  5  5    17

 

 

 

R Programming
%%R

withmooc %>%
  dplyr::mutate(score =  ifelse( gender=="f" ,
                                 (withmooc$score1 <- 2*q1+q2),
                                  ifelse( withmooc$gender=="m",
                                          (withmooc$score1 <- 3*q1+q2), NA )                        
                      )
               )

 

Results
  workshop gender q1 q2 q3 q4 score
1        1      f  1  1  5  1     3
2        2      f  2  1  4  1     5
3        1      f  2  2  4  3     6
4        2      f  3  1 NA  3     7
5        1      m  4  5  2  4    17
6        2      m  5  4  5  5    19
7        1      m  5  3  4  4    18
8        2      m  4  5  5  5    17

 

 

  • 괄호안에서 더 복잡한 로직을 이용한 조건문 공식.
  • 만약 workshop 이나 q4가 결측치이면, 두 번째 공신을 적용.
R Programming
%%R

withmooc %>%
  dplyr::mutate(score2 = ifelse( (workshop==1 & q4>=4) ,
                                 ((2*q1)+q2),
                                 ((3*q1)+q2) )
               )

 

Results
  workshop gender q1 q2 q3 q4 score2
1        1      f  1  1  5  1      4
2        2      f  2  1  4  1      7
3        1      f  2  2  4  3      8
4        2      f  3  1 NA  3     10
5        1      m  4  5  2  4     13
6        2      m  5  4  5  5     19
7        1      m  5  3  4  4     13
8        2      m  4  5  5  5     17

 


6. Python - Pandas

 

Python Programming
import pandas as pd
import numpy as np
import sweetviz as sv

mydata   = pd.read_csv("C:/work/data/mydata.csv",sep=",",
                       dtype={'id':object,'workshop':object,
                              'q1':int, 'q2':int, 'q3':float, 'q4':int},
                       na_values=['NaN'],skipinitialspace =True)

withmooc= mydata.copy()

withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4
0	1	1		f	1	1	5.0	1
1	2	2		f	2	1	4.0	1
2	3	1		f	2	2	4.0	3
3	4	2		f	3	1	NaN	3
4	5	1		m	4	5	2.0	4
5	6	2		m	5	4	5.0	5
6	7	1		m	5	3	4.0	4
7	8	2		m	4	5	5.0	5

 

  • np.where 사용
  • 0과 1의 이진 변수 생성.
  • 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
  • 문제 4를 맞힌 사람을 알수 있다.
Python Programming
withmooc['q4Sagree'] = np.where(withmooc['q4'] == 5, 1, 0)
withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree
0	1	1		f	1	1	5.0	1	0
1	2	2		f	2	1	4.0	1	0
2	3	1		f	2	2	4.0	3	0
3	4	2		f	3	1	NaN	3	0
4	5	1		m	4	5	2.0	4	0
5	6	2		m	5	4	5.0	5	1
6	7	1		m	5	3	4.0	4	0
7	8	2		m	4	5	5.0	5	1

 

  • 함수 사용 - 리스트 비교
Python Programming
def if_this_else_that(x, list_of_checks, yes_label, no_label):
    if x in list_of_checks:
        res = 1
    else: 
        res = 0
    return(res)

withmooc['new_rating'] = withmooc['q4'].apply(lambda x: if_this_else_that(x, [5], 1, 0))
withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating
0	1	1		f	1	1	5.0	1	0	0
1	2	2		f	2	1	4.0	1	0	0
2	3	1		f	2	2	4.0	3	0	0
3	4	2		f	3	1	NaN	3	0	0
4	5	1		m	4	5	2.0	4	0	0
5	6	2		m	5	4	5.0	5	1	1
6	7	1		m	5	3	4.0	4	0	0
7	8	2		m	4	5	5.0	5	1	1

 

  • 함수 사용 - 상수(5) 비교
Python Programming
def if_this_else_that(x, list_of_checks, yes_label, no_label):
    if x ==  list_of_checks:
        res = 1
    else: 
        res = 0
    return(res)

withmooc['new_rating_1'] = withmooc['q4'].apply(lambda x: if_this_else_that(x, 5, 1, 0))
withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1
0	1	1		f	1	1	5.0	1	0	0	0
1	2	2		f	2	1	4.0	1	0	0	0
2	3	1		f	2	2	4.0	3	0	0	0
3	4	2		f	3	1	NaN	3	0	0	0
4	5	1		m	4	5	2.0	4	0	0	0
5	6	2		m	5	4	5.0	5	1	1	1
6	7	1		m	5	3	4.0	4	0	0	0
7	8	2		m	4	5	5.0	5	1	1	1

 

 

 

 
  • 값과 조건을 비교합니다.
  • 그 결과를 새로운 변수에 할당합니다.
Python Programming
withmooc['new_rating_2'] = (withmooc['q4'] == 5)
withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2
0	1	1		f	1	1	5.0	1	0		0		0		False
1	2	2		f	2	1	4.0	1	0		0		0		False
2	3	1		f	2	2	4.0	3	0		0		0		False
3	4	2		f	3	1	NaN	3	0		0		0		False
4	5	1		m	4	5	2.0	4	0		0		0		False
5	6	2		m	5	4	5.0	5	1		1		1		True
6	7	1		m	5	3	4.0	4	0		0		0		False
7	8	2		m	4	5	5.0	5	1		1		1		True

 

Python Programming
# (1) IF condition – Set of numbers
withmooc.loc[mydata['q4'] == 5, 'new_rating_3'] = 1
withmooc.loc[mydata['q4'] != 5, 'new_rating_3'] = 0

# (2) IF condition – set of numbers and lambda
withmooc['new_rating_4'] = withmooc['q4'].apply(lambda x: 1 if x == 5 else 0)

# (3)  IF condition with OR
withmooc.loc[(mydata['q4'] == 5) | (withmooc['q4'] == 4), 'name_match'] = 'Match'  
withmooc.loc[(mydata['q4'] != 5) & (withmooc['q4'] != 4), 'name_match'] = 'Mismatch' 

withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match
0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch
1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch
2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch
  3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch
4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match
5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match
6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match
7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match

 


  • q4가 4 이상인 경우 1을 할당.
Python Programming
withmooc['q4agree'] = np.where(withmooc['q4'] >= 4, 1, 0)
withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree
0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0
1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0
2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0
3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0
4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1
5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1
6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1
7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1

 


  • 복잡한 로직 예제.
  • workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
Python Programming
withmooc['workshop1agree'] = np.where( ( (withmooc['workshop'] == '1') & (mydata['q4'] >= 4) ), 1, 0)
withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree
0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0
1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0
2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0
3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0
4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1
5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0
6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1
7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0

 

  • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
  • 남성이 참이면, 여성은 거짓이다.
Python Programming
withmooc['score'] = np.where( withmooc['gender'] == 'f', (2*mydata['q1'])+mydata['q2'] , 
                                                         (3*mydata['q1'])+mydata['q2'] )
withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score
0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3
  1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5
2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6
3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7
4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17
5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19
6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18
7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17

 

  • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할수 있다.
  • 남성이 참이면, 여성은 거짓이다.
Python Programming
withmooc['score'] = np.where( withmooc['gender'] == 'f', (2*mydata['q1'])+withmooc['q2'] , 
                                                          np.where( withmooc['gender'] == 'm',(3*withmooc['q1'])+withmooc['q2'],0) )

withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score
0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3
1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5
2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6
3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7
4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17
5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19
6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18
7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17

 

Python Programming
# mydata['score_2'] = mydata['gender','q1'].apply(lambda x ,y : 99 if (x + y) == 1 else ( 3) )

withmooc['score_2'] = withmooc.apply(lambda x: 2*x['q1'] + x['q2'] if x['gender'] == 'f' else \
                                               (3*x['q1'] + x['q2'] if x['gender'] == 'm' else 0), axis=1)


withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score	score_2
0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3	3
1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5	5
2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6	6
3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7	7
4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17	17
5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19	19
6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18	18
7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17	17
 
 
Python Programming
def func_avg_100(x):
    if x.gender == 'f':
        return ( 2 * x.q1 + x.q2)
    else:
        return ( 3 * x.q1 + x.q2 )

withmooc['score_3'] = withmooc.apply(lambda x:func_avg_100(x),axis=1)

withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score	score_2	score_3
0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3	3	3
1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5	5	5
2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6	6	6
3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7	7	7
4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17	17	17
5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19	19	19
6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18	18	18
7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17	17	17

 

  • 괄혼안에서 더 복잡한 로직을 이용한 조건문 공식.
  • 만약 workshop 이나 q4가 결측치이면, 두 번째 공신을 적용.
Python Programming
def func_logic(x):
    if (x.workshop == '1' and x.q4 >= 4) :
        return ( 2 * x.q1 + x.q2)
    else:
        return ( 3 * x.q1 + x.q2 )

withmooc['score_4'] = withmooc.apply(lambda x:func_logic(x),axis=1)

withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score	score_2	score_3	score_4
0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3	3	3	4
1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5	5	5	7
2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6	6	6	8
  3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7	7	7	10
4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17	17	17	13
5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19	19	19	19
6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18	18	18	13
7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17	17	17	17

 


7. Python - dfply

 

Python Programming
import pandas as pd
from dfply import *

mydata   = pd.read_csv("C:/work/data/mydata.csv",sep=",",
                                            dtype={'id':object,'workshop':object,
                                                   'q1':int, 'q2':int, 'q3':float, 'q4':int},
                                            na_values=['NaN'],skipinitialspace =True)

withmooc= mydata.copy()

# 모든 변수 선택하기.
withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4
0	1	1		f	1	1	5.0	1
1	2	2		f	2	1	4.0	1
2	3	1		f	2	2	4.0	3
3	4	2		f	3	1	NaN	3
4	5	1		m	4	5	2.0	4
5	6	2		m	5	4	5.0	5
6	7	1		m	5	3	4.0	4
7	8	2		m	4	5	5.0	5

 

  • 0과 1의 이진 변수 생성.
  • 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
  • 문제 4를 맞힌 사람을 알수 있다.
Python Programming
withmooc >> mutate(q4Sagree=(X.q4 == 5).replace({True: 1, False: 0}))

withmooc >> mutate(q4Sagree=make_symbolic(np.where)(X.q4 == 5, 1,0))

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree
0	1	1		f	1	1	5.0	1	0
1	2	2		f	2	1	4.0	1	0
2	3	1		f	2	2	4.0	3	0
3	4	2		f	3	1	NaN	3	0
4	5	1		m	4	5	2.0	4	0
5	6	2		m	5	4	5.0	5	1
6	7	1		m	5	3	4.0	4	0
7	8	2		m	4	5	5.0	5	1

 

  • q4가 4 이상인 경우 1을 할당.
Python Programming
withmooc >> mutate(q4Sagree=(X.q4 >= 4).replace({True: 1, False: 0}))

withmooc >> mutate(q4Sagree=make_symbolic(np.where)(X.q4 >= 4, 1,0))

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree
0	1	1		f	1	1	5.0	1	0
1	2	2		f	2	1	4.0	1	0
2	3	1		f	2	2	4.0	3	0
3	4	2		f	3	1	NaN	3	0
4	5	1		m	4	5	2.0	4	1
5	6	2		m	5	4	5.0	5	1
6	7	1		m	5	3	4.0	4	1
7	8	2		m	4	5	5.0	5	1

 

Python Programming
withmooc >> mutate(q4Sagree=case_when([X.q4 >= 4, 1], [True, 0]
                                    ))

 

Results
	id	workshop	gender	q1	q2	q3	q4	q4Sagree
0	1	1		f	1	1	5.0	1	0.0
1	2	2		f	2	1	4.0	1	0.0
2	3	1		f	2	2	4.0	3	0.0
3	4	2		f	3	1	NaN	3	0.0
4	5	1		m	4	5	2.0	4	1.0
5	6	2		m	5	4	5.0	5	1.0
6	7	1		m	5	3	4.0	4	1.0
7	8	2		m	4	5	5.0	5	1.0

 

  • 좀더 복잡한 로직 예제.
  • workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
Python Programming
withmooc >> mutate(workshop1agree=( (X.workshop == '1') & (X.q4 >= 4)  ).replace({True: 1, False: 0}))

withmooc >> mutate(workshop1agree=case_when([(X.workshop == '1') & (X.q4 >= 4), 1], [True, 0]))

withmooc >> mutate(workshop1agree=make_symbolic(np.where)( (X.workshop == '1') & (X.q4 >= 4) , 1,0))

 

Results
	id	workshop	gender	q1	q2	q3	q4	workshop1agree
0	1	1		f	1	1	5.0	1	0
1	2	2		f	2	1	4.0	1	0
2	3	1		f	2	2	4.0	3	0
3	4	2		f	3	1	NaN	3	0
4	5	1		m	4	5	2.0	4	1
5	6	2		m	5	4	5.0	5	0
6	7	1		m	5	3	4.0	4	1
7	8	2		m	4	5	5.0	5	0

 

  • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
  • 남성이 참이면, 여성은 거짓이다.
Python Programming
withmooc >> mutate(score=make_symbolic(np.where)( (X.gender=="f") ,((2*X.q1)+X.q2),((3*X.q1)+X.q2) ))

withmooc >> mutate(score=case_when([(X.gender=="f"), ((2*X.q1)+X.q2)], [True, ((3*X.q1)+X.q2)]))

 

Results
	id	workshop	gender	q1	q2	q3	q4	score
0	1	1		f	1	1	5.0	1	3.0
1	2	2		f	2	1	4.0	1	5.0
2	3	1		f	2	2	4.0	3	6.0
3	4	2		f	3	1	NaN	3	7.0
4	5	1		m	4	5	2.0	4	17.0
5	6	2		m	5	4	5.0	5	19.0
6	7	1		m	5	3	4.0	4	18.0
7	8	2		m	4	5	5.0	5	17.0

 

  • 남성과 여성에 대하여 다른 공식을 할당.
  • 성별이 결측치 인경우 , 결과 역시 NA로 할당.
Python Programming
withmooc >> mutate(score=make_symbolic(np.where)( (X.gender=="f") ,((2*X.q1)+X.q2),
                                                   make_symbolic(np.where)( (X.gender=="m") ,((3*X.q1)+X.q2),0 )
                                              ))

 

Results
	id	workshop	gender	q1	q2	q3	q4	score
0	1	1		f	1	1	5.0	1	3
1	2	2		f	2	1	4.0	1	5
2	3	1		f	2	2	4.0	3	6
3	4	2		f	3	1	NaN	3	7
4	5	1		m	4	5	2.0	4	17
5	6	2		m	5	4	5.0	5	19
6	7	1		m	5	3	4.0	4	18
7	8	2		m	4	5	5.0	5	17

 

 


 

통계프로그램 비교 목록(Proc sql, SAS, SPSS, R 프로그래밍, R Tidyverse, Python Pandas, Python Dfply)
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]
SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크
반응형

댓글