본문 바로가기
통계프로그램 비교 시리즈/데이터 전처리 비교

통계프로그램 비교 시리즈 – [데이터 관리] 2. 조건문에 의한 변환

by 기서무나구물 2022. 1. 7.

포스팅 목차

    [데이터 관리] 2. 조건문에 의한 변수 변환 후 신규 변수 생성하기

     


    1. Proc SQL

     

    SAS Programming
    options ls=150;
    
    proc sql;
      select case when q4=5 then 1 else 0 end as x1,
             ifn(q4=5,1,0) as x11,
    
             case when q4>=4 then 1 else 0 end as x2,
             ifn(q4>=4,1,0) as x22,
    
             case when workshop=1 and q4>=5 then 1 else 0 end as x3,
             ifn(workshop=1 and q4>=5,1,0) as x33,
    
             case when gender='f' then 2*q1+q2 else 3*q1+q2 end as scoreA,
             ifn(gender='f',2*q1+q2,3*q1+q2) as scoreAA,
    
             case when workshop=1 and q4>=5 then 2*q1+q2 else 3*q1+q2 end as scoreB,
             ifn(workshop=1 and q4>=5,2*q1+q2,3*q1+q2) as scoreBB
    
      from BACK.mydata;
    quit;

     

    Results
     x1       x11        x2       x22        x3       x33    scoreA   scoreAA    scoreB   scoreBB
    ---------------------------------------------------------------------------------------------
      0         0         0         0         0         0         3         3         4         4
      0         0         0         0         0         0         5         5         7         7
      0         0         0         0         0         0         6         6         8         8
      0         0         0         0         0         0         7         7        10        10
      0         0         1         1         0         0        17        17        17        17
      1         1         1         1         0         0        19        19        19        19
      0         0         1         1         0         0        18        18        18        18
      1         1         1         1         0         0        17        17        17        17

     


    2. SAS Programming

     

    SAS Programming
    DATA withmooc;
     SET BACk.mydata;
    
         If q4= 5 then x1=1;
         else          x1=0;
    
         * if함수의 간단한 함수;
         x11 = ifn(q4=5,1,0);
    
    
         If q4>=4 then x2=1;
         else          x2=0;
    
         x22 = ifn(q4>=4,1,0);
    
    
         If workshop=1 & q4>=5 then x3=1;
         else                       x3=0;
    
         x33 = ifn(workshop=1 and q4>=5,1,0);
    
    
         If gender="f" then scoreA=2*q1+q2;
         Else               scoreA=3*q1+q2;
    
         scoreAA = ifn(gender='f',2*q1+q2,3*q1+q2);
    
    
         If workshop=1 and q4>=5 then scoreB=2*q1+q2;
         Else                          scoreB=3*q1+q2;
    
         scoreBB = ifn(workshop=1 and q4>=5,2*q1+q2,3*q1+q2);
    run;
    
    proc print;run;

     

    Results
                                                                    score    score    score    score
     q1    q2    q3    q4    x1    x11    x2    x22    x3    x33       A        AA       B        BB
      1     1     5     1     0     0      0     0      0     0        3        3        4        4
      2     1     4     1     0     0      0     0      0     0        5        5        7        7
      2     2     4     3     0     0      0     0      0     0        6        6        8        8
      3     1     .     3     0     0      0     0      0     0        7        7       10       10
      4     5     2     4     0     0      1     1      0     0       17       17       17       17
      5     4     5     5     1     1      1     1      0     0       19       19       19       19
      5     3     4     4     0     0      1     1      0     0       18       18       18       18
      4     5     5     5     1     1      1     1      0     0       17       17       17       17

     


    3. SPSS

     

    SPSS Programming
    GET FILE=("C:\work\data\mydata.sav")
    
    COMPUTE X1=0.
    IF (q4 EQ 5 ) X1=1.
    
    
    
    COMPUTE X2=0.
    IF (q4 GE 4) X2=1.
    
    
    COMPUTE X3=0.
    IF (gender EQ 'f' AND Q4 GE 5) X3=1.
    
    
    COMPUTE scoreA=3*q1+q2.
    IF (gender='f') scoreA=2*q1+q2.
    
    
    COMPUTE scoreB=3*q1+q2.
    IF (workshop EQ 1 AND q4 GE 5) scoreB=2*q1+q2.
    
    EXECUTE.

     


    4. R Programming (R-PROJECT)

     

    R Programming
    from rpy2.robjects import r
    %load_ext rpy2.ipython

     

    Results
    The rpy2.ipython extension is already loaded. To reload it, use:
      %reload_ext rpy2.ipython

     

     

    R Programming
    %%R
    
    library(tidyverse)
    load(file="C:/work/data/mydata.Rdata")
    
    withmooc = mydata
    attach(withmooc) # mydata를 기본 데이터 세트로 지정.
    
    withmooc

     

    Results
    R[write to console]: -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
    
    ---------------------------------------
    
    R[write to console]: -- Conflicts ------------------------------------------ tidyverse_conflicts() --
    x dplyr::filter() masks stats::filter()
    x dplyr::lag()    masks stats::lag()

     

    Results
      workshop gender q1 q2 q3 q4
    1        1      f  1  1  5  1
    2        2      f  2  1  4  1
    3        1      f  2  2  4  3
    4        2      f  3  1 NA  3
    5        1      m  4  5  2  4
    6        2      m  5  4  5  5
    7        1      m  5  3  4  4
    8        2      m  4  5  5  5

     

     

    • 조건문에 의한 변환을 위한 R-Project
    • 0과 1의 이진 변수 생성.
    • 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
    • 문제 4를 맞힌 사람을 알수 있다.
    R Programming
    %%R
    
    withmooc$q4Sagree <- ifelse( q4 == 5, 1,0 )
    
    withmooc

     

    Results
      workshop gender q1 q2 q3 q4 q4Sagree
    1        1      f  1  1  5  1        0
    2        2      f  2  1  4  1        0
    3        1      f  2  2  4  3        0
    4        2      f  3  1 NA  3        0
    5        1      m  4  5  2  4        0
    6        2      m  5  4  5  5        1
    7        1      m  5  3  4  4        0
    8        2      m  4  5  5  5        1

     

     

    • q4가 4 이상인 경우 1을 할당.
    R Programming
    %%R
    
    withmooc$q4agree <- ifelse( q4 >= 4, 1,0)
    
    withmooc

     

    Results
      workshop gender q1 q2 q3 q4 q4Sagree q4agree
    1        1      f  1  1  5  1        0       0
    2        2      f  2  1  4  1        0       0
    3        1      f  2  2  4  3        0       0
    4        2      f  3  1 NA  3        0       0
    5        1      m  4  5  2  4        0       1
    6        2      m  5  4  5  5        1       1
    7        1      m  5  3  4  4        0       1
    8        2      m  4  5  5  5        1       1

     

     

    • 좀더 복잡한 로직 예제.
    • workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
    R Programming
    %%R
    
    withmooc$workshop1agree <- ifelse( (workshop == 1 & q4 >=4 ) , 1,0)
    
    withmooc

     

    Results
      workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree
    1        1      f  1  1  5  1        0       0              0
    2        2      f  2  1  4  1        0       0              0
    3        1      f  2  2  4  3        0       0              0
    4        2      f  3  1 NA  3        0       0              0
    5        1      m  4  5  2  4        0       1              1
    6        2      m  5  4  5  5        1       1              0
    7        1      m  5  3  4  4        0       1              1
    8        2      m  4  5  5  5        1       1              0

     

     

    • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
    • 남성이 참이면, 여성은 거짓이다.
    R Programming
    %%R
    
    withmooc$score <- ifelse( gender=="f" ,
                            (withmooc$score <- (2*q1)+q2),  # 여성에 대하여.
                            (withmooc$score <- (3*q1)+q2) ) # 남성에 대하여.
    
    withmooc

     

    Results
      workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree score
    1        1      f  1  1  5  1        0       0              0     3
    2        2      f  2  1  4  1        0       0              0     5
    3        1      f  2  2  4  3        0       0              0     6
    4        2      f  3  1 NA  3        0       0              0     7
    5        1      m  4  5  2  4        0       1              1    17
    6        2      m  5  4  5  5        1       1              0    19
    7        1      m  5  3  4  4        0       1              1    18
    8        2      m  4  5  5  5        1       1              0    17

     

     

    • 남성과 여성에 대하여 다른 공식을 할당.
    • 성별이 결측치 인경우 , 결과 역시 NA로 할당.
    R Programming
    %%R
    
    withmooc$score1 <- ifelse( gender=="f" ,
                               (withmooc$score1 <- 2*q1+q2),
                               ifelse( withmooc$gender=="m",
                                       (withmooc$score1 <- 3*q1+q2), NA )
                              )
    
    withmooc

     

    Results
      workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree score score1
    1        1      f  1  1  5  1        0       0              0     3      3
    2        2      f  2  1  4  1        0       0              0     5      5
    3        1      f  2  2  4  3        0       0              0     6      6
    4        2      f  3  1 NA  3        0       0              0     7      7
    5        1      m  4  5  2  4        0       1              1    17     17
    6        2      m  5  4  5  5        1       1              0    19     19
    7        1      m  5  3  4  4        0       1              1    18     18
    8        2      m  4  5  5  5        1       1              0    17     17

     

     

    • 괄혼안에서 더 복잡한 로직을 이용한 조건문 공식.
    • 만약 workshop 이나 q4가 결측치이면, 두 번째 공신을 적용.
    R Programming
    %%R
    
    withmooc$score2 <- ifelse( (workshop==1 & q4>=4) ,
                               (withmooc$score2 <- 2*q1+q2),
                               (withmooc$score2 <- 3*q1+q2) )
    
    withmooc

     

    Results
      workshop gender q1 q2 q3 q4 q4Sagree q4agree workshop1agree score score1 score2
    1        1      f  1  1  5  1        0       0              0     3      3      4
    2        2      f  2  1  4  1        0       0              0     5      5      7
    3        1      f  2  2  4  3        0       0              0     6      6      8
    4        2      f  3  1 NA  3        0       0              0     7      7     10
    5        1      m  4  5  2  4        0       1              1    17     17     13
    6        2      m  5  4  5  5        1       1              0    19     19     19
    7        1      m  5  3  4  4        0       1              1    18     18     13
    8        2      m  4  5  5  5        1       1              0    17     17     17

     

     

    R Programming
    %%R
    
    detach(mydata)

     

    Results
    From cffi callback :
    Traceback (most recent call last):
      File "C:\Users\BACK\anaconda3\lib\site-packages\rpy2\rinterface_lib\callbacks.py", line 131, in _consolewrite_ex
        s = conversion._cchar_to_str_with_maxlen(buf, maxlen=n)
      File "C:\Users\BACK\anaconda3\lib\site-packages\rpy2\rinterface_lib\conversion.py", line 130, in _cchar_to_str_with_maxlen
        s = ffi.string(c, maxlen).decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 26: invalid start byte
    
    
    
    'utf-8' codec can't decode byte 0xc0 in position 26: invalid start byte
    
    
    
        --------------------------------------------------------------------------
    
    PermissionError: [WinError 32] 다른 프로세스가 파일을 사용 중이기 때문에 프로세스가 액세스 할 수 없습니다: 'C:\\Users\\BACK\\AppData\\Local\\Temp\\tmph0wb9noj\\Rplots001.png'

     


    5. R - Tidyverse

     

    R Programming
    %%R
    
    library(tidyverse)
    attach(mydata) # mydata를 기본 데이터 세트로 지정.
    load(file="C:/work/data/mydata.Rdata")
    
    withmooc = mydata
    
    attach(withmooc) # mydata를 기본 데이터 세트로 지정.
    
    withmooc
    R[write to console]: The following objects are masked from withmooc (pos = 3):
    
        gender, q1, q2, q3, q4, workshop
    
    
    ------ ------ ------ ------ ------ ------
    
    
    R[write to console]: The following objects are masked from withmooc (pos = 9):
    
        gender, q1, q2, q3, q4, workshop
        

     

    Results
      workshop gender q1 q2 q3 q4
    1        1      f  1  1  5  1
    2        2      f  2  1  4  1
    3        1      f  2  2  4  3
    4        2      f  3  1 NA  3
    5        1      m  4  5  2  4
    6        2      m  5  4  5  5
    7        1      m  5  3  4  4
    8        2      m  4  5  5  5

     

     

    • 0과 1의 이진 변수 생성.
    • 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
    • 문제 4를 맞힌 사람을 알수 있다.
    R Programming
    %%R
    
    withmooc %>%
      dplyr::mutate(q4Sagree = ifelse(q4 == 5, 1,0))

     

    Results
      workshop gender q1 q2 q3 q4 q4Sagree
    1        1      f  1  1  5  1        0
    2        2      f  2  1  4  1        0
    3        1      f  2  2  4  3        0
    4        2      f  3  1 NA  3        0
    5        1      m  4  5  2  4        0
    6        2      m  5  4  5  5        1
    7        1      m  5  3  4  4        0
    8        2      m  4  5  5  5        1

     

     

    • q4가 4 이상인 경우 1을 할당.
    R Programming
    %%R
    
    withmooc %>%
      dplyr::mutate(q4Sagree = ifelse(q4 >= 4, 1,0))

     

    Results
      workshop gender q1 q2 q3 q4 q4Sagree
    1        1      f  1  1  5  1        0
    2        2      f  2  1  4  1        0
    3        1      f  2  2  4  3        0
    4        2      f  3  1 NA  3        0
    5        1      m  4  5  2  4        1
    6        2      m  5  4  5  5        1
    7        1      m  5  3  4  4        1
    8        2      m  4  5  5  5        1

     

     

    • 복잡한 로직 예제.
    • workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
    R Programming
    %%R
    
    withmooc %>%
      dplyr::mutate(workshop1agree = ifelse((workshop == 1 & q4 >=4 ) , 1,0))

     

    Results
      workshop gender q1 q2 q3 q4 workshop1agree
    1        1      f  1  1  5  1              0
    2        2      f  2  1  4  1              0
    3        1      f  2  2  4  3              0
    4        2      f  3  1 NA  3              0
    5        1      m  4  5  2  4              1
    6        2      m  5  4  5  5              0
    7        1      m  5  3  4  4              1
    8        2      m  4  5  5  5              0

     

     

    • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
    • 남성이 참이면, 여성은 거짓이다.
    R Programming
    %%R
    
    withmooc %>%
      dplyr::mutate(score = ifelse( gender=="f" ,
                                    ((2*q1)+q2),  # 여성에 대하여.
                                    ((3*q1)+q2) ) # 남성에 대하여.
                    )

     

    Results
      workshop gender q1 q2 q3 q4 score
    1        1      f  1  1  5  1     3
    2        2      f  2  1  4  1     5
    3        1      f  2  2  4  3     6
    4        2      f  3  1 NA  3     7
    5        1      m  4  5  2  4    17
    6        2      m  5  4  5  5    19
    7        1      m  5  3  4  4    18
    8        2      m  4  5  5  5    17

     

     

    • 남성과 여성에 대하여 다른 공식을 할당.
    • 성별이 결측치 인경우 , 결과 역시 NA로 할당.
    R Programming
    %%R
    
    withmooc %>%
      dplyr::mutate(score = ifelse( gender=="f" ,
                                    ((2*q1)+q2),  # 여성에 대하여.
                                    ifelse(gender == "m",
                                           ((3*q1)+q2), NA)
                                   ) 
                   )

     

    Results
      workshop gender q1 q2 q3 q4 score
    1        1      f  1  1  5  1     3
    2        2      f  2  1  4  1     5
    3        1      f  2  2  4  3     6
    4        2      f  3  1 NA  3     7
    5        1      m  4  5  2  4    17
    6        2      m  5  4  5  5    19
    7        1      m  5  3  4  4    18
    8        2      m  4  5  5  5    17

     

     

     

    R Programming
    %%R
    
    withmooc %>%
      dplyr::mutate(score =  ifelse( gender=="f" ,
                                     (withmooc$score1 <- 2*q1+q2),
                                      ifelse( withmooc$gender=="m",
                                              (withmooc$score1 <- 3*q1+q2), NA )                        
                          )
                   )

     

    Results
      workshop gender q1 q2 q3 q4 score
    1        1      f  1  1  5  1     3
    2        2      f  2  1  4  1     5
    3        1      f  2  2  4  3     6
    4        2      f  3  1 NA  3     7
    5        1      m  4  5  2  4    17
    6        2      m  5  4  5  5    19
    7        1      m  5  3  4  4    18
    8        2      m  4  5  5  5    17

     

     

    • 괄호안에서 더 복잡한 로직을 이용한 조건문 공식.
    • 만약 workshop 이나 q4가 결측치이면, 두 번째 공신을 적용.
    R Programming
    %%R
    
    withmooc %>%
      dplyr::mutate(score2 = ifelse( (workshop==1 & q4>=4) ,
                                     ((2*q1)+q2),
                                     ((3*q1)+q2) )
                   )

     

    Results
      workshop gender q1 q2 q3 q4 score2
    1        1      f  1  1  5  1      4
    2        2      f  2  1  4  1      7
    3        1      f  2  2  4  3      8
    4        2      f  3  1 NA  3     10
    5        1      m  4  5  2  4     13
    6        2      m  5  4  5  5     19
    7        1      m  5  3  4  4     13
    8        2      m  4  5  5  5     17

     


    6. Python - Pandas

     

    Python Programming
    import pandas as pd
    import numpy as np
    import sweetviz as sv
    
    mydata   = pd.read_csv("C:/work/data/mydata.csv",sep=",",
                           dtype={'id':object,'workshop':object,
                                  'q1':int, 'q2':int, 'q3':float, 'q4':int},
                           na_values=['NaN'],skipinitialspace =True)
    
    withmooc= mydata.copy()
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	1		f	1	1	5.0	1
    1	2	2		f	2	1	4.0	1
    2	3	1		f	2	2	4.0	3
    3	4	2		f	3	1	NaN	3
    4	5	1		m	4	5	2.0	4
    5	6	2		m	5	4	5.0	5
    6	7	1		m	5	3	4.0	4
    7	8	2		m	4	5	5.0	5

     

    • np.where 사용
    • 0과 1의 이진 변수 생성.
    • 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
    • 문제 4를 맞힌 사람을 알수 있다.
    Python Programming
    withmooc['q4Sagree'] = np.where(withmooc['q4'] == 5, 1, 0)
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree
    0	1	1		f	1	1	5.0	1	0
    1	2	2		f	2	1	4.0	1	0
    2	3	1		f	2	2	4.0	3	0
    3	4	2		f	3	1	NaN	3	0
    4	5	1		m	4	5	2.0	4	0
    5	6	2		m	5	4	5.0	5	1
    6	7	1		m	5	3	4.0	4	0
    7	8	2		m	4	5	5.0	5	1

     

    • 함수 사용 - 리스트 비교
    Python Programming
    def if_this_else_that(x, list_of_checks, yes_label, no_label):
        if x in list_of_checks:
            res = 1
        else: 
            res = 0
        return(res)
    
    withmooc['new_rating'] = withmooc['q4'].apply(lambda x: if_this_else_that(x, [5], 1, 0))
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating
    0	1	1		f	1	1	5.0	1	0	0
    1	2	2		f	2	1	4.0	1	0	0
    2	3	1		f	2	2	4.0	3	0	0
    3	4	2		f	3	1	NaN	3	0	0
    4	5	1		m	4	5	2.0	4	0	0
    5	6	2		m	5	4	5.0	5	1	1
    6	7	1		m	5	3	4.0	4	0	0
    7	8	2		m	4	5	5.0	5	1	1

     

    • 함수 사용 - 상수(5) 비교
    Python Programming
    def if_this_else_that(x, list_of_checks, yes_label, no_label):
        if x ==  list_of_checks:
            res = 1
        else: 
            res = 0
        return(res)
    
    withmooc['new_rating_1'] = withmooc['q4'].apply(lambda x: if_this_else_that(x, 5, 1, 0))
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1
    0	1	1		f	1	1	5.0	1	0	0	0
    1	2	2		f	2	1	4.0	1	0	0	0
    2	3	1		f	2	2	4.0	3	0	0	0
    3	4	2		f	3	1	NaN	3	0	0	0
    4	5	1		m	4	5	2.0	4	0	0	0
    5	6	2		m	5	4	5.0	5	1	1	1
    6	7	1		m	5	3	4.0	4	0	0	0
    7	8	2		m	4	5	5.0	5	1	1	1

     

     

     

     
    • 값과 조건을 비교합니다.
    • 그 결과를 새로운 변수에 할당합니다.
    Python Programming
    withmooc['new_rating_2'] = (withmooc['q4'] == 5)
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2
    0	1	1		f	1	1	5.0	1	0		0		0		False
    1	2	2		f	2	1	4.0	1	0		0		0		False
    2	3	1		f	2	2	4.0	3	0		0		0		False
    3	4	2		f	3	1	NaN	3	0		0		0		False
    4	5	1		m	4	5	2.0	4	0		0		0		False
    5	6	2		m	5	4	5.0	5	1		1		1		True
    6	7	1		m	5	3	4.0	4	0		0		0		False
    7	8	2		m	4	5	5.0	5	1		1		1		True

     

    Python Programming
    # (1) IF condition – Set of numbers
    withmooc.loc[mydata['q4'] == 5, 'new_rating_3'] = 1
    withmooc.loc[mydata['q4'] != 5, 'new_rating_3'] = 0
    
    # (2) IF condition – set of numbers and lambda
    withmooc['new_rating_4'] = withmooc['q4'].apply(lambda x: 1 if x == 5 else 0)
    
    # (3)  IF condition with OR
    withmooc.loc[(mydata['q4'] == 5) | (withmooc['q4'] == 4), 'name_match'] = 'Match'  
    withmooc.loc[(mydata['q4'] != 5) & (withmooc['q4'] != 4), 'name_match'] = 'Mismatch' 
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match
    0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch
    1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch
    2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch
      3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch
    4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match
    5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match
    6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match
    7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match

     


    • q4가 4 이상인 경우 1을 할당.
    Python Programming
    withmooc['q4agree'] = np.where(withmooc['q4'] >= 4, 1, 0)
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree
    0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0
    1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0
    2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0
    3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0
    4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1
    5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1
    6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1
    7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1

     


    • 복잡한 로직 예제.
    • workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
    Python Programming
    withmooc['workshop1agree'] = np.where( ( (withmooc['workshop'] == '1') & (mydata['q4'] >= 4) ), 1, 0)
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree
    0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0
    1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0
    2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0
    3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0
    4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1
    5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0
    6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1
    7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0

     

    • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
    • 남성이 참이면, 여성은 거짓이다.
    Python Programming
    withmooc['score'] = np.where( withmooc['gender'] == 'f', (2*mydata['q1'])+mydata['q2'] , 
                                                             (3*mydata['q1'])+mydata['q2'] )
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score
    0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3
      1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5
    2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6
    3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7
    4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17
    5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19
    6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18
    7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17

     

    • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할수 있다.
    • 남성이 참이면, 여성은 거짓이다.
    Python Programming
    withmooc['score'] = np.where( withmooc['gender'] == 'f', (2*mydata['q1'])+withmooc['q2'] , 
                                                              np.where( withmooc['gender'] == 'm',(3*withmooc['q1'])+withmooc['q2'],0) )
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score
    0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3
    1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5
    2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6
    3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7
    4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17
    5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19
    6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18
    7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17

     

    Python Programming
    # mydata['score_2'] = mydata['gender','q1'].apply(lambda x ,y : 99 if (x + y) == 1 else ( 3) )
    
    withmooc['score_2'] = withmooc.apply(lambda x: 2*x['q1'] + x['q2'] if x['gender'] == 'f' else \
                                                   (3*x['q1'] + x['q2'] if x['gender'] == 'm' else 0), axis=1)
    
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score	score_2
    0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3	3
    1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5	5
    2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6	6
    3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7	7
    4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17	17
    5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19	19
    6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18	18
    7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17	17
     
     
    Python Programming
    def func_avg_100(x):
        if x.gender == 'f':
            return ( 2 * x.q1 + x.q2)
        else:
            return ( 3 * x.q1 + x.q2 )
    
    withmooc['score_3'] = withmooc.apply(lambda x:func_avg_100(x),axis=1)
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score	score_2	score_3
    0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3	3	3
    1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5	5	5
    2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6	6	6
    3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7	7	7
    4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17	17	17
    5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19	19	19
    6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18	18	18
    7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17	17	17

     

    • 괄혼안에서 더 복잡한 로직을 이용한 조건문 공식.
    • 만약 workshop 이나 q4가 결측치이면, 두 번째 공신을 적용.
    Python Programming
    def func_logic(x):
        if (x.workshop == '1' and x.q4 >= 4) :
            return ( 2 * x.q1 + x.q2)
        else:
            return ( 3 * x.q1 + x.q2 )
    
    withmooc['score_4'] = withmooc.apply(lambda x:func_logic(x),axis=1)
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree	new_rating	new_rating_1	new_rating_2	new_rating_3	new_rating_4	name_match	q4agree	workshop1agree	score	score_2	score_3	score_4
    0	1	1		f	1	1	5.0	1	0		0		0		False		0.0		0		Mismatch	0	0		3	3	3	4
    1	2	2		f	2	1	4.0	1	0		0		0		False		0.0		0		Mismatch	0	0		5	5	5	7
    2	3	1		f	2	2	4.0	3	0		0		0		False		0.0		0		Mismatch	0	0		6	6	6	8
      3	4	2		f	3	1	NaN	3	0		0		0		False		0.0		0		Mismatch	0	0		7	7	7	10
    4	5	1		m	4	5	2.0	4	0		0		0		False		0.0		0		Match		1	1		17	17	17	13
    5	6	2		m	5	4	5.0	5	1		1		1		True		1.0		1		Match		1	0		19	19	19	19
    6	7	1		m	5	3	4.0	4	0		0		0		False		0.0		0		Match		1	1		18	18	18	13
    7	8	2		m	4	5	5.0	5	1		1		1		True		1.0		1		Match		1	0		17	17	17	17

     


    7. Python - dfply

     

    Python Programming
    import pandas as pd
    from dfply import *
    
    mydata   = pd.read_csv("C:/work/data/mydata.csv",sep=",",
                                                dtype={'id':object,'workshop':object,
                                                       'q1':int, 'q2':int, 'q3':float, 'q4':int},
                                                na_values=['NaN'],skipinitialspace =True)
    
    withmooc= mydata.copy()
    
    # 모든 변수 선택하기.
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	1		f	1	1	5.0	1
    1	2	2		f	2	1	4.0	1
    2	3	1		f	2	2	4.0	3
    3	4	2		f	3	1	NaN	3
    4	5	1		m	4	5	2.0	4
    5	6	2		m	5	4	5.0	5
    6	7	1		m	5	3	4.0	4
    7	8	2		m	4	5	5.0	5

     

    • 0과 1의 이진 변수 생성.
    • 새로운 변수 q4SAgree는 만약 q4가 5라면 1의 값을 그 외에는 0의 값을 넣는다.
    • 문제 4를 맞힌 사람을 알수 있다.
    Python Programming
    withmooc >> mutate(q4Sagree=(X.q4 == 5).replace({True: 1, False: 0}))
    
    withmooc >> mutate(q4Sagree=make_symbolic(np.where)(X.q4 == 5, 1,0))

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree
    0	1	1		f	1	1	5.0	1	0
    1	2	2		f	2	1	4.0	1	0
    2	3	1		f	2	2	4.0	3	0
    3	4	2		f	3	1	NaN	3	0
    4	5	1		m	4	5	2.0	4	0
    5	6	2		m	5	4	5.0	5	1
    6	7	1		m	5	3	4.0	4	0
    7	8	2		m	4	5	5.0	5	1

     

    • q4가 4 이상인 경우 1을 할당.
    Python Programming
    withmooc >> mutate(q4Sagree=(X.q4 >= 4).replace({True: 1, False: 0}))
    
    withmooc >> mutate(q4Sagree=make_symbolic(np.where)(X.q4 >= 4, 1,0))

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree
    0	1	1		f	1	1	5.0	1	0
    1	2	2		f	2	1	4.0	1	0
    2	3	1		f	2	2	4.0	3	0
    3	4	2		f	3	1	NaN	3	0
    4	5	1		m	4	5	2.0	4	1
    5	6	2		m	5	4	5.0	5	1
    6	7	1		m	5	3	4.0	4	1
    7	8	2		m	4	5	5.0	5	1

     

    Python Programming
    withmooc >> mutate(q4Sagree=case_when([X.q4 >= 4, 1], [True, 0]
                                        ))

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q4Sagree
    0	1	1		f	1	1	5.0	1	0.0
    1	2	2		f	2	1	4.0	1	0.0
    2	3	1		f	2	2	4.0	3	0.0
    3	4	2		f	3	1	NaN	3	0.0
    4	5	1		m	4	5	2.0	4	1.0
    5	6	2		m	5	4	5.0	5	1.0
    6	7	1		m	5	3	4.0	4	1.0
    7	8	2		m	4	5	5.0	5	1.0

     

    • 좀더 복잡한 로직 예제.
    • workshop이 1이고, q4가 4인 이상인 변수에 1을 할당.
    Python Programming
    withmooc >> mutate(workshop1agree=( (X.workshop == '1') & (X.q4 >= 4)  ).replace({True: 1, False: 0}))
    
    withmooc >> mutate(workshop1agree=case_when([(X.workshop == '1') & (X.q4 >= 4), 1], [True, 0]))
    
    withmooc >> mutate(workshop1agree=make_symbolic(np.where)( (X.workshop == '1') & (X.q4 >= 4) , 1,0))

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	workshop1agree
    0	1	1		f	1	1	5.0	1	0
    1	2	2		f	2	1	4.0	1	0
    2	3	1		f	2	2	4.0	3	0
    3	4	2		f	3	1	NaN	3	0
    4	5	1		m	4	5	2.0	4	1
    5	6	2		m	5	4	5.0	5	0
    6	7	1		m	5	3	4.0	4	1
    7	8	2		m	4	5	5.0	5	0

     

    • 조건문에 의한 변화는 남성과 여성에 대해 다른 변환을 적용할 수 있다.
    • 남성이 참이면, 여성은 거짓이다.
    Python Programming
    withmooc >> mutate(score=make_symbolic(np.where)( (X.gender=="f") ,((2*X.q1)+X.q2),((3*X.q1)+X.q2) ))
    
    withmooc >> mutate(score=case_when([(X.gender=="f"), ((2*X.q1)+X.q2)], [True, ((3*X.q1)+X.q2)]))

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	score
    0	1	1		f	1	1	5.0	1	3.0
    1	2	2		f	2	1	4.0	1	5.0
    2	3	1		f	2	2	4.0	3	6.0
    3	4	2		f	3	1	NaN	3	7.0
    4	5	1		m	4	5	2.0	4	17.0
    5	6	2		m	5	4	5.0	5	19.0
    6	7	1		m	5	3	4.0	4	18.0
    7	8	2		m	4	5	5.0	5	17.0

     

    • 남성과 여성에 대하여 다른 공식을 할당.
    • 성별이 결측치 인경우 , 결과 역시 NA로 할당.
    Python Programming
    withmooc >> mutate(score=make_symbolic(np.where)( (X.gender=="f") ,((2*X.q1)+X.q2),
                                                       make_symbolic(np.where)( (X.gender=="m") ,((3*X.q1)+X.q2),0 )
                                                  ))

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	score
    0	1	1		f	1	1	5.0	1	3
    1	2	2		f	2	1	4.0	1	5
    2	3	1		f	2	2	4.0	3	6
    3	4	2		f	3	1	NaN	3	7
    4	5	1		m	4	5	2.0	4	17
    5	6	2		m	5	4	5.0	5	19
    6	7	1		m	5	3	4.0	4	18
    7	8	2		m	4	5	5.0	5	17

     

     


     

    통계프로그램 비교 목록(Proc sql, SAS, SPSS, R 프로그래밍, R Tidyverse, Python Pandas, Python Dfply)
    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크
    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]
    SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크
    반응형

    댓글