Statistical Program Comparison Series / Data Preprocessing Comparison

14. Batch Processing of Statistics by Variable Type & Value Labels or Formats (& Measurement Level)

by 기서무나구물 2022. 1. 17.



    1. Proc SQL

    • SAS Program to Assign Value Labels (formats)
    SAS Programming
    options linesize=150;
    
    * SAS Program to Assign Value Labels (formats);
    
    PROC FORMAT;
         VALUE workshop_f 1="Control" 2="Treatment";
         VALUE $gender_f "m"="Male" "f"="Female";
         VALUE agreement 1='Strongly Disagree'
                         2='Disagree'
                         3='Neutral'
                         4='Agree'
                         5='Strongly Agree';
    
    run;
    
    
    
    proc sql;
      select id,
             workshop format=workshop_f.,
             gender   format=$gender_f.  ,
             q1       format=agreement. ,
             q2       format=agreement. ,
             q3       format=agreement. ,
             q4       format=agreement.
      from   BACK.mydata;
    
    quit;



    Results
    id   workshop  gender                 q1                 q2                 q3                 q4
    -------------------------------------------------------------------------------------------------
     1  Control    Female  Strongly Disagree  Strongly Disagree  Strongly Agree     Strongly Disagree
     2  Treatment  Female  Disagree           Strongly Disagree  Agree              Strongly Disagree
     3  Control    Female  Disagree           Disagree           Agree              Neutral
     4  Treatment  Female  Neutral            Strongly Disagree                  .  Neutral
     5  Control    Male    Agree              Strongly Agree     Disagree           Agree
     6  Treatment  Male    Strongly Agree     Agree              Strongly Agree     Strongly Agree
     7  Control    Male    Strongly Agree     Neutral            Agree              Agree
     8  Treatment  Male    Agree              Strongly Agree     Strongly Agree     Strongly Agree
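    • A format changes only how a value is displayed; the stored codes stay numeric. The plain-Python sketch below imitates that idea with a dictionary lookup, assuming missing values are held as None and printed as "." (the SAS default for numeric missing values). AGREEMENT and apply_format are hypothetical names, not part of SAS.

```python
# Hypothetical lookup table mirroring the agreement format above
AGREEMENT = {1: "Strongly Disagree", 2: "Disagree", 3: "Neutral",
             4: "Agree", 5: "Strongly Agree"}

def apply_format(value, fmt):
    """Return the display label for a stored code; the code itself is unchanged."""
    if value is None:                  # numeric missing is printed as "."
        return "."
    return fmt.get(value, str(value))  # unmapped codes fall back to the raw value

q3 = [5, 4, 4, None, 2, 5, 4, 5]       # stored values of q3 in mydata
print([apply_format(v, AGREEMENT) for v in q3])
print(q3)                              # the data are still numeric codes
```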

     

     


    2. SAS Programming

    • SAS program to assign value labels (formats).
    SAS Programming
    PROC FORMAT;
         VALUE workshop_f 1="Control" 2="Treatment";
         VALUE $gender_f "m"="Male" "f"="Female";
         VALUE agreement 1='Strongly Disagree'
                         2='Disagree'
                         3='Neutral'
                         4='Agree'
                         5='Strongly Agree';
    run;
    
    
    
    DATA withmooc;
     SET BACK.mydata;
         * Store the formats with the variables in the new data set;
         FORMAT workshop workshop_f. gender $gender_f.
                q1-q4 agreement.;
    run;
    
    proc print data=withmooc; run;



    Results
    OBS id workshop  gender q1                q2                q3                q4
     1   1 Control   Female Strongly Disagree Strongly Disagree Strongly Agree    Strongly Disagree
     2   2 Treatment Female Disagree          Strongly Disagree Agree             Strongly Disagree
     3   3 Control   Female Disagree          Disagree          Agree             Neutral
     4   4 Treatment Female Neutral           Strongly Disagree                 . Neutral
     5   5 Control   Male   Agree             Strongly Agree    Disagree          Agree
     6   6 Treatment Male   Strongly Agree    Agree             Strongly Agree    Strongly Agree
     7   7 Control   Male   Strongly Agree    Neutral           Agree             Agree
     8   8 Treatment Male   Agree             Strongly Agree    Strongly Agree    Strongly Agree
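    • In the FORMAT statement, q1-q4 is a numbered range list that SAS expands to q1, q2, q3, q4, and the DATA step stores each format with its variable in the new data set. A minimal Python sketch of that bookkeeping, assuming only the simple "prefix + first number - prefix + last number" form (SAS also supports other list forms, such as name ranges with --); expand_range and formats are hypothetical names.

```python
import re

def expand_range(spec):
    """Expand a numbered range list such as 'q1-q4' into ['q1', ..., 'q4']."""
    m = re.fullmatch(r"([A-Za-z_]+)(\d+)-\1(\d+)", spec)
    if m is None:
        return [spec]                  # not a range: a single variable name
    prefix, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}{i}" for i in range(start, end + 1)]

# Variable-to-format associations, as recorded by the FORMAT statement
formats = {}
for spec, fmt in [("workshop", "workshop_f."), ("gender", "$gender_f."),
                  ("q1-q4", "agreement.")]:
    for name in expand_range(spec):
        formats[name] = fmt

print(formats)
```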

     

     


    3. SPSS

    • SPSS program to assign value labels.
    SPSS Programming
    GET FILE="c:\mydata.sav".
    
    VARIABLE LEVEL workshop (NOMINAL)
     /q1 TO q4 (SCALE).
    
    VALUE LABELS  workshop 1 'Control'  2 'Treatment'
     /q1 TO q4
     1 'Strongly Disagree'
     2 'Disagree'
     3 'Neutral'
     4 'Agree'
     5 'Strongly Agree'.
    
    SAVE OUTFILE="C:\mydata.sav".
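    • VARIABLE LEVEL records each variable's measurement level as metadata that procedures can consult. The sketch below illustrates the "statistics by variable type" idea in plain Python: nominal variables get frequency counts and scale variables get a mean. The LEVELS dictionary and describe_by_level function are hypothetical and only approximate what SPSS does.

```python
from collections import Counter
from statistics import mean

# Measurement levels as declared with VARIABLE LEVEL (hypothetical metadata)
LEVELS = {"workshop": "nominal", "q1": "scale", "q2": "scale",
          "q3": "scale", "q4": "scale"}

data = {
    "workshop": [1, 2, 1, 2, 1, 2, 1, 2],
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, None, 2, 5, 4, 5],   # None marks the missing value
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
}

def describe_by_level(data, levels):
    """Counts for nominal variables, mean of valid values for scale variables."""
    result = {}
    for name, values in data.items():
        valid = [v for v in values if v is not None]
        if levels[name] == "nominal":
            result[name] = dict(Counter(valid))
        else:
            result[name] = round(mean(valid), 3)
    return result

print(describe_by_level(data, LEVELS))
```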

     

     


    4. R Programming (R-PROJECT)

     

    Python Programming (Jupyter)
    from rpy2.robjects import r
    %load_ext rpy2.ipython

     

    Results
    The rpy2.ipython extension is already loaded. To reload it, use:
      %reload_ext rpy2.ipython

     

     

    R Programming
    %%R
    
    options(width = 200)
    
    library(tidyverse)
    library(psych)
    library(Hmisc)
    
    mydata <- read_csv("C:/work/data/mydata.csv", 
      col_types = cols( id       = col_double(),
                        workshop = col_character(),
                        gender   = col_character(),
                        q1       = col_double(),
                        q2       = col_double(),
                        q3       = col_double(),
                        q4       = col_double()
      )
    )
    
    withmooc <- mydata
    
    attach(withmooc) # put withmooc on the search path so q1, q2, ... can be used by name
    
    withmooc

     

    Results
    R[write to console]: -- Attaching packages ------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
    
    R[write to console]: The following object is masked from 'package:psych':
    
        describe
    
    
    R[write to console]: The following objects are masked from 'package:dplyr':
    
        src, summarize
    
    
    R[write to console]: The following objects are masked from 'package:base':
    
        format.pval, units

     

    Results
    # A tibble: 8 x 7
         id workshop gender    q1    q2    q3    q4
      <dbl> <chr>    <chr>  <dbl> <dbl> <dbl> <dbl>
    1     1 1        f          1     1     5     1
    2     2 2        f          2     1     4     1
    3     3 1        f          2     2     4     3
    4     4 2        f          3     1    NA     3
    5     5 1        m          4     5     2     4
    6     6 2        m          5     4     5     5
    7     7 1        m          5     3     4     4
    8     8 2        m          4     5     5     5

     

     

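    • R program to assign value labels and factor status.
    • With the older read.table() defaults, workshop (the group variable) was read as numeric and gender as a factor, because gender contains letters.
    • Here, however, read_csv() with the col_types above reads both workshop and gender as character, so summary() reports only their length and class. A factor, by contrast, would have its levels counted.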
    R Programming
    %%R
    
    base::summary(withmooc)

     

    Results
           id         workshop            gender                q1             q2             q3              q4      
     Min.   :1.00   Length:8           Length:8           Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00  
     1st Qu.:2.75   Class :character   Class :character   1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50  
     Median :4.50   Mode  :character   Mode  :character   Median :3.50   Median :2.50   Median :4.000   Median :3.50  
     Mean   :4.50                                         Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25  
     3rd Qu.:6.25                                         3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25  
     Max.   :8.00                                         Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00  
                                                                                        NA's   :1                     
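    • summary() dispatches on each column's type: numeric columns get the six-number summary (plus an NA count), character columns only their length and class. A plain-Python sketch of that dispatch, assuming None marks NA; statistics.quantiles(..., method="inclusive") interpolates the same way as R's default quantiles, which is why the quartiles agree with the output above. summarize_column is a hypothetical helper.

```python
from statistics import mean, quantiles

def summarize_column(values):
    """Mimic the per-type behavior of R's summary() for one column."""
    if any(isinstance(v, str) for v in values):
        # character columns: only length and class, no statistics
        return {"Length": len(values), "Class": "character"}
    valid = sorted(v for v in values if v is not None)
    lower, middle, upper = quantiles(valid, n=4, method="inclusive")
    out = {"Min.": valid[0], "1st Qu.": lower, "Median": middle,
           "Mean": mean(valid), "3rd Qu.": upper, "Max.": valid[-1]}
    n_missing = len(values) - len(valid)
    if n_missing:
        out["NA's"] = n_missing
    return out

withmooc = {
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "workshop": ["1", "2", "1", "2", "1", "2", "1", "2"],
    "q3": [5, 4, 4, None, 2, 5, 4, 5],
}
for name, col in withmooc.items():
    print(name, summarize_column(col))
```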

     

     

    R Programming
    %%R
    
    dlookr::diagnose_numeric(mydata)

     

    Results
    R[write to console]: Registered S3 method overwritten by 'quantmod':
      method            from
      as.zoo.data.frame zoo 
    
    
    
    # A tibble: 5 x 10
      variables   min    Q1  mean median    Q3   max  zero minus outlier
      <chr>     <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <int> <int>   <int>
    1 id            1  2.75  4.5     4.5  6.25     8     0     0       0
    2 q1            1  2     3.25    3.5  4.25     5     0     0       0
    3 q2            1  1     2.75    2.5  4.25     5     0     0       0
    4 q3            2  4     4.14    4    5        5     0     0       1
    5 q4            1  2.5   3.25    3.5  4.25     5     0     0       0

     

     

    R Programming
    %%R
    
    withmooc %>%
      dlookr::diagnose_numeric()

     

    Results
    # A tibble: 5 x 10
      variables   min    Q1  mean median    Q3   max  zero minus outlier
      <chr>     <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <int> <int>   <int>
    1 id            1  2.75  4.5     4.5  6.25     8     0     0       0
    2 q1            1  2     3.25    3.5  4.25     5     0     0       0
    3 q2            1  1     2.75    2.5  4.25     5     0     0       0
    4 q3            2  4     4.14    4    5        5     0     0       1
    5 q4            1  2.5   3.25    3.5  4.25     5     0     0       0

     

     

    R Programming
    %%R
    
    withmooc %>% 
      dlookr::describe() %>%
      as.data.frame()

     

    Results
      variable n na     mean       sd   se_mean  IQR   skewness  kurtosis p00  p01  p05 p10 p20  p25
    1       id 8  0 4.500000 2.449490 0.8660254 3.50  0.0000000 -1.200000   1 1.07 1.35 1.7 2.4 2.75
    2       q1 8  0 3.250000 1.488048 0.5261043 2.25 -0.2167811 -1.410198   1 1.07 1.35 1.7 2.0 2.00
    3       q2 8  0 2.750000 1.752549 0.6196197 3.25  0.2919336 -1.914116   1 1.00 1.00 1.0 1.0 1.00
    4       q3 7  1 4.142857 1.069045 0.4040610 1.00 -1.5200483  2.712500   2 2.12 2.60 3.2 4.0 4.00
    5       q4 8  0 3.250000 1.581139 0.5590170 1.75 -0.5421047 -1.024000   1 1.00 1.00 1.0 1.8 2.50
      p30 p40 p50 p60 p70  p75 p80 p90  p95  p99 p100
    1 3.1 3.8 4.5 5.2 5.9 6.25 6.6 7.3 7.65 7.93    8
    2 2.1 2.8 3.5 4.0 4.0 4.25 4.6 5.0 5.00 5.00    5
    3 1.1 1.8 2.5 3.2 3.9 4.25 4.6 5.0 5.00 5.00    5
    4 4.0 4.0 4.0 4.6 5.0 5.00 5.0 5.0 5.00 5.00    5
    5 3.0 3.0 3.5 4.0 4.0 4.25 4.6 5.0 5.00 5.00    5

     

     

    R Programming
    %%R
    
    withmooc %>%
      purrr::keep(.p = is.numeric) %>% # keep only numeric columns
      dlookr::describe()

     

    Results
    # A tibble: 5 x 26
      variable     n    na  mean    sd se_mean   IQR skewness kurtosis   p00   p01   p05   p10   p20
      <chr>    <int> <int> <dbl> <dbl>   <dbl> <dbl>    <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 id           8     0  4.5   2.45   0.866  3.5     0        -1.2      1  1.07  1.35   1.7   2.4
    2 q1           8     0  3.25  1.49   0.526  2.25   -0.217    -1.41     1  1.07  1.35   1.7   2  
    3 q2           8     0  2.75  1.75   0.620  3.25    0.292    -1.91     1  1     1      1     1  
    4 q3           7     1  4.14  1.07   0.404  1      -1.52      2.71     2  2.12  2.6    3.2   4  
    5 q4           8     0  3.25  1.58   0.559  1.75   -0.542    -1.02     1  1     1      1     1.8
    # ... with 12 more variables: p25 <dbl>, p30 <dbl>, p40 <dbl>, p50 <dbl>, p60 <dbl>, p70 <dbl>,
    #   p75 <dbl>, p80 <dbl>, p90 <dbl>, p95 <dbl>, p99 <dbl>, p100 <dbl>
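    • purrr::keep(.p = is.numeric) filters columns with a predicate before any statistic is computed, so character columns never reach describe(). A stdlib-Python sketch of the same keep-then-describe pipeline, assuming None marks NA. It reproduces only the n, na, mean, sd (n - 1 denominator) and se_mean columns; keep_numeric and describe_numeric are hypothetical names, not dlookr functions.

```python
from math import sqrt
from statistics import mean, stdev

def keep_numeric(table):
    """Keep only columns whose non-missing values are all int or float."""
    return {name: col for name, col in table.items()
            if all(isinstance(v, (int, float)) for v in col if v is not None)}

def describe_numeric(table):
    """n, na, mean, sd and se_mean for each column, ignoring missing values."""
    rows = []
    for name, col in table.items():
        valid = [v for v in col if v is not None]
        sd = stdev(valid)                      # sample sd, like R's sd()
        rows.append({"variable": name, "n": len(valid),
                     "na": len(col) - len(valid), "mean": mean(valid),
                     "sd": sd, "se_mean": sd / sqrt(len(valid))})
    return rows

withmooc = {
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "workshop": ["1", "2", "1", "2", "1", "2", "1", "2"],
    "gender": ["f", "f", "f", "f", "m", "m", "m", "m"],
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q3": [5, 4, 4, None, 2, 5, 4, 5],
}
for row in describe_numeric(keep_numeric(withmooc)):
    print(row)
```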

     

     

    R Programming
    %%R
    
    withmooc %>%
      purrr::keep(.p = is.numeric) %>% # keep only numeric columns
      dlookr::describe() %>%
      as.data.frame()

     

    Results
      variable n na     mean       sd   se_mean  IQR   skewness  kurtosis p00  p01  p05 p10 p20  p25
    1       id 8  0 4.500000 2.449490 0.8660254 3.50  0.0000000 -1.200000   1 1.07 1.35 1.7 2.4 2.75
    2       q1 8  0 3.250000 1.488048 0.5261043 2.25 -0.2167811 -1.410198   1 1.07 1.35 1.7 2.0 2.00
    3       q2 8  0 2.750000 1.752549 0.6196197 3.25  0.2919336 -1.914116   1 1.00 1.00 1.0 1.0 1.00
    4       q3 7  1 4.142857 1.069045 0.4040610 1.00 -1.5200483  2.712500   2 2.12 2.60 3.2 4.0 4.00
    5       q4 8  0 3.250000 1.581139 0.5590170 1.75 -0.5421047 -1.024000   1 1.00 1.00 1.0 1.8 2.50
      p30 p40 p50 p60 p70  p75 p80 p90  p95  p99 p100
    1 3.1 3.8 4.5 5.2 5.9 6.25 6.6 7.3 7.65 7.93    8
    2 2.1 2.8 3.5 4.0 4.0 4.25 4.6 5.0 5.00 5.00    5
    3 1.1 1.8 2.5 3.2 3.9 4.25 4.6 5.0 5.00 5.00    5
    4 4.0 4.0 4.0 4.6 5.0 5.00 5.0 5.0 5.00 5.00    5
    5 3.0 3.0 3.5 4.0 4.0 4.25 4.6 5.0 5.00 5.00    5

     

     

    • Convert the workshop variable to a factor.
    R Programming
    %%R
    
    withmooc$workshop <- factor( withmooc$workshop,
                                 levels=c(1,2,3,4),
                                 labels=c("R","SAS","SPSS","Stata") )
    
    withmooc

     

    Results
    # A tibble: 8 x 7
         id workshop gender    q1    q2    q3    q4
      <dbl> <fct>    <chr>  <dbl> <dbl> <dbl> <dbl>
    1     1 R        f          1     1     5     1
    2     2 SAS      f          2     1     4     1
    3     3 R        f          2     2     4     3
    4     4 SAS      f          3     1    NA     3
    5     5 R        m          4     5     2     4
    6     6 SAS      m          5     4     5     5
    7     7 R        m          5     3     4     4
    8     8 SAS      m          4     5     5     5
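    • factor() matches each stored value against levels and displays the corresponding entry of labels; values that match no level become NA. Because workshop was read as character ("1", "2"), the comparison is made on character strings, so the numeric levels 1 and 2 still match. A rough Python analogue, assuming None for NA; make_factor is hypothetical and returns plain labels rather than R's integer-coded factor object.

```python
def make_factor(values, levels, labels):
    """Map each value to the label of the level it matches, comparing as strings."""
    lookup = {str(level): label for level, label in zip(levels, labels)}
    return [lookup.get(str(v)) if v is not None else None for v in values]

workshop = ["1", "2", "1", "2", "1", "2", "1", "2"]   # read as character
print(make_factor(workshop, [1, 2, 3, 4], ["R", "SAS", "SPSS", "Stata"]))
```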

     

     

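    • summary() now counts how many observations fall in each workshop level.
    • A mean of the workshop codes would be meaningless; as a factor, workshop is summarized by counts instead.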
    R Programming
    %%R
    
    summary(withmooc)

     

    Results
           id        workshop    gender                q1             q2             q3              q4      
     Min.   :1.00   R    :4   Length:8           Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00  
     1st Qu.:2.75   SAS  :4   Class :character   1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50  
     Median :4.50   SPSS :0   Mode  :character   Median :3.50   Median :2.50   Median :4.000   Median :3.50  
     Mean   :4.50   Stata:0                      Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25  
     3rd Qu.:6.25                                3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25  
     Max.   :8.00                                Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00  
                                                                               NA's   :1                     
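    • Counting a factor tabulates every declared level, so SPSS and Stata appear with 0 even though no observation uses them. A short Python sketch of that behavior, assuming the workshop labels shown above and None for NA; count_levels is a hypothetical helper.

```python
from collections import Counter

def count_levels(labels, levels):
    """Frequency of each declared level, including levels that never occur."""
    counts = Counter(v for v in labels if v is not None)
    table = {level: counts.get(level, 0) for level in levels}
    n_missing = sum(v is None for v in labels)
    if n_missing:
        table["NA's"] = n_missing
    return table

workshop = ["R", "SAS", "R", "SAS", "R", "SAS", "R", "SAS"]
print(count_levels(workshop, ["R", "SAS", "SPSS", "Stata"]))
```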

     

     

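    • Use the describe() function from the Hmisc package.
    • Unlike summary(), describe() reports frequencies and proportions for the q variables in addition to the mean.
    • The Hmisc package must be installed to use describe(). The Hmisc:: prefix makes clear which describe() is meant, since Hmisc masks psych::describe().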
    R Programming
    %%R
    
    Hmisc::describe(withmooc)

     

    Results
    withmooc 
    
     7  Variables      8  Observations
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    id 
           n  missing distinct     Info     Mean      Gmd 
           8        0        8        1      4.5        3 
    
    lowest : 1 2 3 4 5, highest: 4 5 6 7 8
    
    Value          1     2     3     4     5     6     7     8
    Frequency      1     1     1     1     1     1     1     1
    Proportion 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    workshop 
           n  missing distinct 
           8        0        2 
    
    Value        R SAS
    Frequency    4   4
    Proportion 0.5 0.5
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    gender 
           n  missing distinct 
           8        0        2 
    
    Value        f   m
    Frequency    4   4
    Proportion 0.5 0.5
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    q1 
           n  missing distinct     Info     Mean      Gmd 
           8        0        5    0.964     3.25    1.786 
    
    lowest : 1 2 3 4 5, highest: 1 2 3 4 5
    
    Value          1     2     3     4     5
    Frequency      1     2     1     2     2
    Proportion 0.125 0.250 0.125 0.250 0.250
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    q2 
           n  missing distinct     Info     Mean      Gmd 
           8        0        5     0.94     2.75    2.071 
    
    lowest : 1 2 3 4 5, highest: 1 2 3 4 5
    
    Value          1     2     3     4     5
    Frequency      3     1     1     1     2
    Proportion 0.375 0.125 0.125 0.125 0.250
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    q3 
           n  missing distinct     Info     Mean      Gmd 
           7        1        3    0.857    4.143    1.143 
    
    Value          2     4     5
    Frequency      1     3     3
    Proportion 0.143 0.429 0.429
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    q4 
           n  missing distinct     Info     Mean      Gmd 
           8        0        4    0.952     3.25    1.857 
    
    Value         1    3    4    5
    Frequency     2    2    2    2
    Proportion 0.25 0.25 0.25 0.25
    ------------------------------------------------------------------------------------------------------------------------------------------------------
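    • For each variable, describe() reports n, missing, the number of distinct values and a frequency/proportion table; for numeric variables it also gives Gmd, Gini's mean difference (the mean absolute difference over all pairs of non-missing values). The Python sketch below recomputes those quantities, assuming None for NA; it omits the Info statistic, and hmisc_like is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

def hmisc_like(values):
    """n, missing, distinct, frequencies, proportions and Gini's mean difference."""
    valid = [v for v in values if v is not None]
    n = len(valid)
    freq = dict(sorted(Counter(valid).items()))
    pairs = list(combinations(valid, 2))
    gmd = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return {"n": n, "missing": len(values) - n, "distinct": len(freq),
            "freq": freq,
            "prop": {k: round(c / n, 3) for k, c in freq.items()},
            "Gmd": round(gmd, 3)}

q3 = [5, 4, 4, None, 2, 5, 4, 5]
print(hmisc_like(q3))
```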

     

     

    R Programming
    %%R
    
    describeData(withmooc)

     

    Results
    n.obs =  8 of which  7   are complete cases.   Number of variables =  7  of which all are numeric  FALSE  
              variable # n.obs type H1  H2 H3   H4 T1  T2 T3  T4
    id*                1     8    4  1   2  3    4  5   6  7   8
    workshop*          2     8    4  R SAS  R  SAS  R SAS  R SAS
    gender*            3     8    4  f   f  f    f  m   m  m   m
    q1*                4     8    4  1   2  2    3  4   5  5   4
    q2*                5     8    4  1   1  2    1  5   4  3   5
    q3*                6     7    4  5   4  4 <NA>  2   5  4   5
    q4*                7     8    4  1   1  3    3  4   5  4   5

     

     

    • Check how the levels are matched to the underlying values.
    R Programming
    %%R
    
    unclass(withmooc$workshop)

     

    Results
    [1] 1 2 1 2 1 2 1 2
    attr(,"levels")
    [1] "R"     "SAS"   "SPSS"  "Stata"
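    • unclass() shows that a factor is stored as integer codes (1-based positions in the levels vector) plus a levels attribute. The Python sketch below mimics that representation and the reverse lookup, assuming the workshop labels above; encode and decode are hypothetical names.

```python
def encode(labels, levels):
    """Integer codes (1-based) pointing into levels; None for values not in levels."""
    position = {level: i for i, level in enumerate(levels, start=1)}
    return [position.get(v) for v in labels]

def decode(codes, levels):
    """Turn integer codes back into level labels."""
    return [levels[c - 1] if c is not None else None for c in codes]

levels = ["R", "SAS", "SPSS", "Stata"]
codes = encode(["R", "SAS", "R", "SAS", "R", "SAS", "R", "SAS"], levels)
print(codes)                  # integer codes, as unclass() shows
print(decode(codes, levels))
```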

     

     

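    • Recode m as male and f as female, listing m first so that it becomes the first level.
    • The matching is case-sensitive: if the data contained uppercase values such as "M", they would match no level and would silently become missing (NA).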
    R Programming
    %%R
    
    withmooc$genderF <- factor( withmooc$gender,
                                levels=c("m","f"),labels=c("male","female") )
    
    withmooc

     

    Results
    # A tibble: 8 x 8
         id workshop gender    q1    q2    q3    q4 genderF
      <dbl> <fct>    <chr>  <dbl> <dbl> <dbl> <dbl> <fct>  
    1     1 R        f          1     1     5     1 female 
    2     2 SAS      f          2     1     4     1 female 
    3     3 R        f          2     2     4     3 female 
    4     4 SAS      f          3     1    NA     3 female 
    5     5 R        m          4     5     2     4 male   
    6     6 SAS      m          5     4     5     5 male   
    7     7 R        m          5     3     4     4 male   
    8     8 SAS      m          4     5     5     5 male   
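    • Because level matching is exact, an uppercase "M" does not match the level "m". The Python sketch below demonstrates the pitfall and one possible remedy, normalizing case and surrounding spaces before matching; to_gender_factor and its normalize flag are hypothetical.

```python
def to_gender_factor(values, normalize=False):
    """Map 'm'/'f' codes to labels; unmatched codes become None (NA)."""
    lookup = {"m": "male", "f": "female"}
    if normalize:
        values = [v.strip().lower() if v is not None else None for v in values]
    return [lookup.get(v) for v in values]

raw = ["f", "F", "m", "M "]
print(to_gender_factor(raw))                  # uppercase codes are lost
print(to_gender_factor(raw, normalize=True))  # all four codes are matched
```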

     

     

    • Print gender and genderF side by side to confirm how the values were matched.
    R Programming
    %%R
    
    withmooc[ ,c("gender","genderF")]

     

    Results
    # A tibble: 8 x 2
      gender genderF
      <chr>  <fct>  
    1 f      female 
    2 f      female 
    3 f      female 
    4 f      female 
    5 m      male   
    6 m      male   
    7 m      male   
    8 m      male   

     

     

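    • Extract the underlying numeric values of each variable.
    • If gender were a factor, as.numeric() would return codes in the alphabetical order of its values (f = 1, m = 2). Because gender is character here, as.numeric() cannot convert "f" and "m", so genderNums is NA.
    • genderFNums follows the order of the levels given to factor() above, so m is 1 and f is 2.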
    R Programming
    %%R
    
    withmooc$genderNums  <- as.numeric(withmooc$gender)
    
    withmooc$genderFNums <- as.numeric(withmooc$genderF)
    
    withmooc

     

    Results
    # A tibble: 8 x 10
         id workshop gender    q1    q2    q3    q4 genderF genderNums genderFNums
      <dbl> <fct>    <chr>  <dbl> <dbl> <dbl> <dbl> <fct>        <dbl>       <dbl>
    1     1 R        f          1     1     5     1 female          NA           2
    2     2 SAS      f          2     1     4     1 female          NA           2
    3     3 R        f          2     2     4     3 female          NA           2
    4     4 SAS      f          3     1    NA     3 female          NA           2
    5     5 R        m          4     5     2     4 male            NA           1
    6     6 SAS      m          5     4     5     5 male            NA           1
    7     7 R        m          5     3     4     4 male            NA           1
    8     8 SAS      m          4     5     5     5 male            NA           1
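    • The codes depend entirely on the order of the levels: by default they follow the sorted unique values (f = 1, m = 2), whereas levels=c("m","f") gives m = 1 and f = 2. Converting the raw letters directly to numbers fails, which is why genderNums is NA. A Python sketch of both points, assuming None stands for NA; codes_for and as_numeric are hypothetical helpers, and Python's sorted() uses code-point order, which agrees with R here but can differ from R's locale-aware sorting for mixed-case data.

```python
def codes_for(values, levels=None):
    """1-based codes; default levels are the sorted unique values, as in factor()."""
    if levels is None:
        levels = sorted(set(values))
    position = {level: i for i, level in enumerate(levels, start=1)}
    return [position.get(v) for v in values]

def as_numeric(values):
    """Convert strings to float, giving None where conversion fails."""
    out = []
    for v in values:
        try:
            out.append(float(v))
        except ValueError:
            out.append(None)
    return out

gender = ["f", "f", "f", "f", "m", "m", "m", "m"]
print(codes_for(gender))                     # default order: f -> 1, m -> 2
print(codes_for(gender, levels=["m", "f"]))  # m -> 1, f -> 2
print(as_numeric(gender))                    # letters cannot become numbers
```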

     

     

    • Create factor copies of the q variables so that their categories can be counted.
    • Store the levels and labels in vectors so they can be reused.
    R Programming
    %%R
    
    myQlevels <- c(1,2,3,4,5)
    
    myQlabels <- c("Strongly Disagree",
                   "Disagree",
                   "Neutral",
                   "Agree",
                   "Strongly Agree")

     

     

    • Create the new set of factor variables with factor().
    R Programming
    %%R
    
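    # Use withmooc$q1 etc.; the attached copies (q1, ...) do not reflect later changes to withmooc
    withmooc$q1f <- factor(withmooc$q1, myQlevels, myQlabels)
    
    withmooc$q2f <- factor(withmooc$q2, myQlevels, myQlabels)
    
    withmooc$q3f <- factor(withmooc$q3, myQlevels, myQlabels)
    
    withmooc$q4f <- factor(withmooc$q4, myQlevels, myQlabels)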
    
    as.data.frame(withmooc)

     

    Results
      id workshop gender q1 q2 q3 q4 genderF genderNums genderFNums               q1f               q2f            q3f               q4f
    1  1        R      f  1  1  5  1  female         NA           2 Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree
    2  2      SAS      f  2  1  4  1  female         NA           2          Disagree Strongly Disagree          Agree Strongly Disagree
    3  3        R      f  2  2  4  3  female         NA           2          Disagree          Disagree          Agree           Neutral
    4  4      SAS      f  3  1 NA  3  female         NA           2           Neutral Strongly Disagree           <NA>           Neutral
    5  5        R      m  4  5  2  4    male         NA           1             Agree    Strongly Agree       Disagree             Agree
    6  6      SAS      m  5  4  5  5    male         NA           1    Strongly Agree             Agree Strongly Agree    Strongly Agree
    7  7        R      m  5  3  4  4    male         NA           1    Strongly Agree           Neutral          Agree             Agree
    8  8      SAS      m  4  5  5  5    male         NA           1             Agree    Strongly Agree Strongly Agree    Strongly Agree

     

     

    • Result of summary().
    R Programming
    %%R
    
    summary( withmooc[ c("q1f","q2f","q3f","q4f") ] )

     

    Results
                    q1f                   q2f                   q3f                   q4f   
     Strongly Disagree:1   Strongly Disagree:3   Strongly Disagree:0   Strongly Disagree:2  
     Disagree         :2   Disagree         :1   Disagree         :1   Disagree         :0  
     Neutral          :1   Neutral          :1   Neutral          :0   Neutral          :2  
     Agree            :2   Agree            :1   Agree            :3   Agree            :2  
     Strongly Agree   :2   Strongly Agree   :2   Strongly Agree   :3   Strongly Agree   :2  
                                                 NA's             :1                        

     

     

    • Create factor copies of the q variables again so that their categories can be counted, this time with an approach that is easy to automate when there are many variables.
    R Programming
    %%R
    
    myQlevels <- c(1,2,3,4,5)
    
    myQlabels <- c("Strongly Disagree",
                   "Disagree",
                   "Neutral",
                   "Agree",
                   "Strongly Agree")
    
    print(myQlevels)
    
    print(myQlabels)

     

    Results
    [1] 1 2 3 4 5
    [1] "Strongly Disagree" "Disagree"          "Neutral"           "Agree"             "Strongly Agree"   

     

     

    • Create the two sets of variable names to be used.
    R Programming
    %%R
    
    myQnames  <- paste( "q",  1:4, sep="")
    
    myQFnames <- paste( "qf", 1:4, sep="")
    
    print(myQnames)   # original variable names
    
    print(myQFnames)  # names for the new factor variables

     

    Results
    [1] "q1" "q2" "q3" "q4"
    [1] "qf1" "qf2" "qf3" "qf4"

     

     

    • Extract the q variables into a separate data frame.
    R Programming
    %%R
    
    myQFvars <- withmooc[ ,myQnames]
    
    print(myQFvars)

     

    Results
    # A tibble: 8 x 4
         q1    q2    q3    q4
      <dbl> <dbl> <dbl> <dbl>
    1     1     1     5     1
    2     2     1     4     1
    3     2     2     4     3
    4     3     1    NA     3
    5     4     5     2     4
    6     5     4     5     5
    7     5     3     4     4
    8     4     5     5     5

     

     

    • Rename the variables with the qf names, where f stands for factor.
    R Programming
    %%R
    
    names(myQFvars) <- myQFnames
    
    print(myQFvars)

     

    Results
    # A tibble: 8 x 4
        qf1   qf2   qf3   qf4
      <dbl> <dbl> <dbl> <dbl>
    1     1     1     5     1
    2     2     1     4     1
    3     2     2     4     3
    4     3     1    NA     3
    5     4     5     2     4
    6     5     4     5     5
    7     5     3     4     4
    8     4     5     5     5
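    • paste() with sep="" vectorizes over 1:4 and returns one name per number; names(myQFvars) <- myQFnames then renames the columns position by position. The same two steps in a Python sketch, using f-strings and zip(); the columns dictionary holds only the first two rows of the q variables, and rename is a hypothetical name.

```python
my_q_names = [f"q{i}" for i in range(1, 5)]    # like paste("q", 1:4, sep="")
my_qf_names = [f"qf{i}" for i in range(1, 5)]  # like paste("qf", 1:4, sep="")

columns = {"q1": [1, 2], "q2": [1, 1], "q3": [5, 4], "q4": [1, 1]}  # first two rows
rename = dict(zip(my_q_names, my_qf_names))    # old name -> new name, by position
renamed = {rename[name]: col for name, col in columns.items()}

print(my_q_names, my_qf_names)
print(renamed)
```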

     

     

    • Create a function that applies the labels, so that it can be used on many variables.
    R Programming
    %%R
    
    myLabeler <- function(x) { factor(x, myQlevels, myQlabels) }

     

     

    • Check how the function works on a single variable.
    R Programming
    %%R
    
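    # [[ ]] extracts the column as a vector; myQFvars["qf1"] would pass a one-column
    # tibble, which factor() cannot match against the levels (it returns a single NA)
    summary( myLabeler(myQFvars[["qf1"]]) )

     

    Results
    Strongly Disagree          Disagree           Neutral             Agree    Strongly Agree 
                    1                 2                 1                 2                 2 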

     

     

    • Apply the function to all the variables.
    R Programming
    %%R
    
    myQFvars[ ,myQFnames] <- lapply( myQFvars[ ,myQFnames ], myLabeler )
    
    myQFvars

     

    Results
    # A tibble: 8 x 4
      qf1               qf2               qf3            qf4              
      <fct>             <fct>             <fct>          <fct>            
    1 Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree
    2 Disagree          Strongly Disagree Agree          Strongly Disagree
    3 Disagree          Disagree          Agree          Neutral          
    4 Neutral           Strongly Disagree <NA>           Neutral          
    5 Agree             Strongly Agree    Disagree       Agree            
    6 Strongly Agree    Agree             Strongly Agree Strongly Agree   
    7 Strongly Agree    Neutral           Agree          Agree            
    8 Agree             Strongly Agree    Strongly Agree Strongly Agree   

     

     

    • Result of summary().
    R Programming
    %%R
    
    summary(myQFvars)

     

    Results
                    qf1                   qf2                   qf3                   qf4   
     Strongly Disagree:1   Strongly Disagree:3   Strongly Disagree:0   Strongly Disagree:2  
     Disagree         :2   Disagree         :1   Disagree         :1   Disagree         :0  
     Neutral          :1   Neutral          :1   Neutral          :0   Neutral          :2  
     Agree            :2   Agree            :1   Agree            :3   Agree            :2  
     Strongly Agree   :2   Strongly Agree   :2   Strongly Agree   :3   Strongly Agree   :2  
                                                 NA's             :1                        

     

     

    • Bind the new variables to withmooc.
    R Programming
    %%R
    
    withmooc<-cbind(withmooc,myQFvars)
    
    withmooc

     

    Results
      id workshop gender q1 q2 q3 q4 genderF genderNums genderFNums               q1f               q2f            q3f               q4f
    1  1        R      f  1  1  5  1  female         NA           2 Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree
    2  2      SAS      f  2  1  4  1  female         NA           2          Disagree Strongly Disagree          Agree Strongly Disagree
    3  3        R      f  2  2  4  3  female         NA           2          Disagree          Disagree          Agree           Neutral
    4  4      SAS      f  3  1 NA  3  female         NA           2           Neutral Strongly Disagree           <NA>           Neutral
    5  5        R      m  4  5  2  4    male         NA           1             Agree    Strongly Agree       Disagree             Agree
    6  6      SAS      m  5  4  5  5    male         NA           1    Strongly Agree             Agree Strongly Agree    Strongly Agree
    7  7        R      m  5  3  4  4    male         NA           1    Strongly Agree           Neutral          Agree             Agree
    8  8      SAS      m  4  5  5  5    male         NA           1             Agree    Strongly Agree Strongly Agree    Strongly Agree
                    qf1               qf2            qf3               qf4
    1 Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree
    2          Disagree Strongly Disagree          Agree Strongly Disagree
    3          Disagree          Disagree          Agree           Neutral
    4           Neutral Strongly Disagree           <NA>           Neutral
    5             Agree    Strongly Agree       Disagree             Agree
    6    Strongly Agree             Agree Strongly Agree    Strongly Agree
    7    Strongly Agree           Neutral          Agree             Agree
    8             Agree    Strongly Agree Strongly Agree    Strongly Agree

     

     


    5. R - Tidyverse

     

    Python Programming (Jupyter)
    from rpy2.robjects import r
    %load_ext rpy2.ipython

     

    Results
    The rpy2.ipython extension is already loaded. To reload it, use:
      %reload_ext rpy2.ipython

     

     

    R Programming
    %%R
    
    library(tidyverse)
    library(psych)
    mydata <- read_csv("C:/work/data/mydata.csv", 
      col_types = cols( id       = col_double(),
                        workshop = col_character(),
                        gender   = col_character(),
                        q1       = col_double(),
                        q2       = col_double(),
                        q3       = col_double(),
                        q4       = col_double()
      )
    )
    
    withmooc <- mydata
    
    attach(withmooc) # put withmooc on the search path so q1, q2, ... can be used by name
    
    withmooc

     

    Results
    R[write to console]: -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
    
    R[write to console]: The following objects are masked from 'package:ggplot2':
    
        %+%, alpha

     

    Results
    # A tibble: 8 x 7
         id workshop gender    q1    q2    q3    q4
      <dbl> <chr>    <chr>  <dbl> <dbl> <dbl> <dbl>
    1     1 1        f          1     1     5     1
    2     2 2        f          2     1     4     1
    3     3 1        f          2     2     4     3
    4     4 2        f          3     1    NA     3
    5     5 1        m          4     5     2     4
    6     6 2        m          5     4     5     5
    7     7 1        m          5     3     4     4
    8     8 2        m          4     5     5     5

     

     

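    • read.table() reads workshop (the group variable) as numeric. Because gender contains letters, R versions before 4.0.0 read it as a factor; since R 4.0.0 (stringsAsFactors = FALSE by default) it stays character.
    • Store the data as one long text string.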
    R Programming
    %%R
    
    mystring<-("id,workshop,gender,q1,q2,q3,q4
    1,1,f,1,1,5,1
    2,2,f,2,1,4,1
    3,1,f,2,2,4,3
    4,2,f,3,1, ,3
    5,1,m,4,5,2,4
    6,2,m,5,4,5,5
    7,1,m,5,3,4,4
    8,2,m,4,5,5,5")
    
    mystring

     

    Results
    [1] "id,workshop,gender,q1,q2,q3,q4\n1,1,f,1,1,5,1\n2,2,f,2,1,4,1\n3,1,f,2,2,4,3\n4,2,f,3,1, ,3\n5,1,m,4,5,2,4\n6,2,m,5,4,5,5\n7,1,m,5,3,4,4\n8,2,m,4,5,5,5"

     

     

    • Instead of a file location, use textConnection() so that read.table() reads the in-program string mystring (a long character vector) as if it were a text file.
    R Programming
    %%R
    
    withmooc<-read.table(textConnection(mystring),
                       header=TRUE,sep=",",row.names="id")
    
    withmooc

     

    Results
      workshop gender q1 q2 q3 q4
    1        1      f  1  1  5  1
    2        2      f  2  1  4  1
    3        1      f  2  2  4  3
    4        2      f  3  1 NA  3
    5        1      m  4  5  2  4
    6        2      m  5  4  5  5
    7        1      m  5  3  4  4
    8        2      m  4  5  5  5
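For comparison, here is a minimal pandas sketch of the same idea. io.StringIO wraps a string in a file-like object, much as textConnection() does in R. The variable names mirror the R example, and index_col="id" is assumed to be the counterpart of row.names="id".

```python
import io

import pandas as pd

# the same comma-separated text that the R example stores in mystring
mystring = """id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,2,f,2,1,4,1
3,1,f,2,2,4,3
4,2,f,3,1, ,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,1,m,5,3,4,4
8,2,m,4,5,5,5"""

# io.StringIO plays the role of textConnection(); skipinitialspace=True turns
# the lone " " in q3 into an empty field, which pandas reads as NaN
withmooc = pd.read_csv(io.StringIO(mystring), index_col="id", skipinitialspace=True)
print(withmooc)
```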

     

     

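    • summary() computes numeric statistics for numeric variables and counts the levels of factors. Here, workshop and gender are character vectors, so summary() only reports their length, class and mode.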
    R Programming
    %%R
    
    summary(withmooc)

     

    Results
           id         workshop            gender                q1             q2             q3              q4      
     Min.   :1.00   Length:8           Length:8           Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00  
     1st Qu.:2.75   Class :character   Class :character   1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50  
     Median :4.50   Mode  :character   Mode  :character   Median :3.50   Median :2.50   Median :4.000   Median :3.50  
     Mean   :4.50                                         Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25  
     3rd Qu.:6.25                                         3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25  
     Max.   :8.00                                         Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00  
                                                                                        NA's   :1                     

     

     

    R Programming
    %%R
    
    withmooc %>%
      dlookr::diagnose_numeric()

     

    Results
    # A tibble: 5 x 10
      variables   min    Q1  mean median    Q3   max  zero minus outlier
      <chr>     <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <int> <int>   <int>
    1 id            1  2.75  4.5     4.5  6.25     8     0     0       0
    2 q1            1  2     3.25    3.5  4.25     5     0     0       0
    3 q2            1  1     2.75    2.5  4.25     5     0     0       0
    4 q3            2  4     4.14    4    5        5     0     0       1
    5 q4            1  2.5   3.25    3.5  4.25     5     0     0       0

     

     

    R Programming
    %%R
    
    withmooc %>% 
      dlookr::describe()

     

    Results
    # A tibble: 5 x 26
      variable     n    na  mean    sd se_mean   IQR skewness kurtosis   p00   p01   p05   p10   p20
      <chr>    <int> <int> <dbl> <dbl>   <dbl> <dbl>    <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 id           8     0  4.5   2.45   0.866  3.5     0        -1.2      1  1.07  1.35   1.7   2.4
    2 q1           8     0  3.25  1.49   0.526  2.25   -0.217    -1.41     1  1.07  1.35   1.7   2  
    3 q2           8     0  2.75  1.75   0.620  3.25    0.292    -1.91     1  1     1      1     1  
    4 q3           7     1  4.14  1.07   0.404  1      -1.52      2.71     2  2.12  2.6    3.2   4  
    5 q4           8     0  3.25  1.58   0.559  1.75   -0.542    -1.02     1  1     1      1     1.8
    # ... with 12 more variables: p25 <dbl>, p30 <dbl>, p40 <dbl>, p50 <dbl>, p60 <dbl>, p70 <dbl>,
    #   p75 <dbl>, p80 <dbl>, p90 <dbl>, p95 <dbl>, p99 <dbl>, p100 <dbl>

     

     

    R Programming
    %%R
    
    withmooc %>%
      purrr::keep(.p = is.numeric) %>% # keep only numeric columns
      dlookr::describe()

     

    Results
    # A tibble: 5 x 26
      variable     n    na  mean    sd se_mean   IQR skewness kurtosis   p00   p01   p05   p10   p20
      <chr>    <int> <int> <dbl> <dbl>   <dbl> <dbl>    <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 id           8     0  4.5   2.45   0.866  3.5     0        -1.2      1  1.07  1.35   1.7   2.4
    2 q1           8     0  3.25  1.49   0.526  2.25   -0.217    -1.41     1  1.07  1.35   1.7   2  
    3 q2           8     0  2.75  1.75   0.620  3.25    0.292    -1.91     1  1     1      1     1  
    4 q3           7     1  4.14  1.07   0.404  1      -1.52      2.71     2  2.12  2.6    3.2   4  
    5 q4           8     0  3.25  1.58   0.559  1.75   -0.542    -1.02     1  1     1      1     1.8
    # ... with 12 more variables: p25 <dbl>, p30 <dbl>, p40 <dbl>, p50 <dbl>, p60 <dbl>, p70 <dbl>,
    #   p75 <dbl>, p80 <dbl>, p90 <dbl>, p95 <dbl>, p99 <dbl>, p100 <dbl>

     

     

    R Programming
    %%R
    
    print(packageVersion("tidyr"))
    print(packageVersion("dplyr"))

     

    Results
    [1] '1.1.1'
    [1] '1.0.2'

     

     

    • If the error below occurs, restart the R session and run the cell again (the exact cause is unknown):
    • Error: Input must be a vector, not a describe object.
    • Run rlang::last_error() to see where the error occurred.
    R Programming
    %%R
    
    withmooc %>% 
      purrr::keep(.p = is.numeric) %>%                  # keep only numeric columns
      purrr::map_df(.x = ., .f = psych::describe) %>%  # apply psych::describe() to each remaining column
      base::transform(vars = colnames(purrr::keep(.x = withmooc,
                                             .p = is.numeric)))

     

    Results
           vars n     mean       sd median  trimmed    mad min max range       skew
    X1...1   id 8 4.500000 2.449490    4.5 4.500000 2.9652   1   8     7  0.0000000
    X1...2   q1 8 3.250000 1.488048    3.5 3.250000 2.2239   1   5     4 -0.1422626
    X1...3   q2 8 2.750000 1.752549    2.5 2.750000 2.2239   1   5     4  0.1915814
    X1...4   q3 7 4.142857 1.069045    4.0 4.142857 1.4826   2   5     3 -0.9306418
    X1...5   q4 8 3.250000 1.581139    3.5 3.250000 1.4826   1   5     4 -0.3557562
             kurtosis        se
    X1...1 -1.6510417 0.8660254
    X1...2 -1.7276762 0.5261043
    X1...3 -1.9113964 0.6196197
    X1...4 -0.5165816 0.4040610
    X1...5 -1.5868750 0.5590170
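Below is a rough pandas counterpart of the keep(is.numeric) plus describe pipeline, offered as a sketch. select_dtypes() keeps the numeric columns, and agg() computes one statistic per row after transposing. The data frame is typed in by hand. pandas' skew() uses the bias-adjusted estimator, so its skew values differ from psych::describe(), whose default (type 3) gives smaller magnitudes.

```python
import numpy as np
import pandas as pd

withmooc = pd.DataFrame({
    "id": range(1, 9),
    "workshop": ["1", "2", "1", "2", "1", "2", "1", "2"],
    "gender": ["f"] * 4 + ["m"] * 4,
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, np.nan, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
})

# keep only numeric columns, like purrr::keep(.p = is.numeric)
numeric = withmooc.select_dtypes(include="number")

# one row per variable, one column per statistic; NaN values are skipped
stats = numeric.agg(["count", "mean", "std", "median", "min", "max", "skew"]).T
print(stats)
```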

     

     

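    • Convert workshop to a factor.
    • summary() then counts how often each workshop level occurs, including levels with no observations.
    • A mean of workshop, as computed when it is treated as a number, would be meaningless.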
    R Programming
    %%R
    
    withmooc %>%
      mutate(workshop = factor(workshop,
                                levels=c(1,2,3,4),
                                labels=c("R","SAS","SPSS","Stata"))) %>%
      summary()

     

    Results
           id        workshop    gender                q1             q2             q3              q4      
     Min.   :1.00   R    :4   Length:8           Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00  
     1st Qu.:2.75   SAS  :4   Class :character   1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50  
     Median :4.50   SPSS :0   Mode  :character   Median :3.50   Median :2.50   Median :4.000   Median :3.50  
     Mean   :4.50   Stata:0                      Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25  
     3rd Qu.:6.25                                3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25  
     Max.   :8.00                                Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00  
                                                                               NA's   :1                     
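In pandas, the closest analogue is a categorical with an explicit list of categories. Unused categories are kept and counted as 0, just as SPSS and Stata are above. The sketch below assumes the workshop codes are read as the strings "1" and "2", as in the readr import.

```python
import pandas as pd

# workshop codes as read in (character "1"/"2")
workshop = pd.Series(["1", "2", "1", "2", "1", "2", "1", "2"])

levels = ["1", "2", "3", "4"]
labels = ["R", "SAS", "SPSS", "Stata"]

# declare all four levels, even those with no observations,
# then rename them by position, like factor(levels = ..., labels = ...)
workshop_f = pd.Categorical(workshop, categories=levels).rename_categories(labels)

# value_counts() on a categorical reports every category, including empty ones
counts = pd.Series(workshop_f).value_counts(sort=False)
print(counts)
```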

     

     

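    • Use describe(). The output below comes from psych::describe() (note the trimmed, mad and skew columns); variables marked with * are factors or character vectors summarized by their integer codes.
    • Unlike summary(), Hmisc::describe() reports the frequency and percentage of each value of the q variables, along with their means.
    • To use Hmisc::describe(), install the Hmisc package and call it with the Hmisc:: prefix, because psych also exports a describe() function.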
    R Programming
    %%R
    
    withmooc %>%
      mutate(workshop = factor(workshop,
                               levels=c(1,2,3,4),
                               labels=c("R","SAS","SPSS","Stata"))) %>%
      psych::describe()

     

    Results
              vars n mean   sd median trimmed  mad min max range  skew kurtosis   se
    id           1 8 4.50 2.45    4.5    4.50 2.97   1   8     7  0.00    -1.65 0.87
    workshop*    2 8 1.50 0.53    1.5    1.50 0.74   1   2     1  0.00    -2.23 0.19
    gender*      3 8 1.50 0.53    1.5    1.50 0.74   1   2     1  0.00    -2.23 0.19
    q1           4 8 3.25 1.49    3.5    3.25 2.22   1   5     4 -0.14    -1.73 0.53
    q2           5 8 2.75 1.75    2.5    2.75 2.22   1   5     4  0.19    -1.91 0.62
    q3           6 7 4.14 1.07    4.0    4.14 1.48   2   5     3 -0.93    -0.52 0.40
    q4           7 8 3.25 1.58    3.5    3.25 1.48   1   5     4 -0.36    -1.59 0.56

     

     

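    • Check how the levels map to the underlying values. At this point gender is still a character vector, so unclass() just returns the strings.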
    R Programming
    %%R
    
    unclass(withmooc$gender)

     

    Results
    [1] "f" "f" "f" "f" "m" "m" "m" "m"

     

     

    • Relabel m as male and f as female, and change the level order so that m comes first.
    • If the data values were uppercase (M, F), they would not match these levels, and factor() would silently turn them into missing values.
    R Programming
    %%R
    
    withmooc<-withmooc %>%
      mutate(gender  = factor(gender,levels=c("f","m"),labels=c("f","m")),
             genderF = factor(gender,levels=c("m","f"),labels=c("male","female")))
    withmooc

     

    Results
    # A tibble: 8 x 8
         id workshop gender    q1    q2    q3    q4 genderF
      <dbl> <chr>    <fct>  <dbl> <dbl> <dbl> <dbl> <fct>  
    1     1 1        f          1     1     5     1 female 
    2     2 2        f          2     1     4     1 female 
    3     3 1        f          2     2     4     3 female 
    4     4 2        f          3     1    NA     3 female 
    5     5 1        m          4     5     2     4 male   
    6     6 2        m          5     4     5     5 male   
    7     7 1        m          5     3     4     4 male   
    8     8 2        m          4     5     5     5 male   

     

     

    R Programming
    %%R
    
    print(unclass(withmooc$gender))
    
    unclass(withmooc$genderF)

     

    Results
    [1] 1 1 1 1 2 2 2 2
    attr(,"levels")
    [1] "f" "m"
    [1] 2 2 2 2 1 1 1 1
    attr(,"levels")
    [1] "male"   "female"
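The two points above (level order decides the integer codes, and values that match no level become missing) can be sketched with pandas categoricals. pandas codes start at 0 rather than 1, and missing values get the code -1.

```python
import pandas as pd

gender = pd.Series(["f", "f", "f", "f", "m", "m", "m", "m"])

# level order decides the codes: categories=["m", "f"] codes m as 0 and f as 1,
# then the categories are renamed by position, like labels = c("male", "female")
gender_f = pd.Categorical(gender, categories=["m", "f"]).rename_categories(["male", "female"])
print(gender_f.codes)

# values that match no declared level become missing (code -1),
# as happens when the data hold "F"/"M" but the levels are "f"/"m"
upper = pd.Categorical(["F", "f", "M", "m"], categories=["f", "m"])
print(upper.codes)
```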

     

     

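    • Extract the underlying integer codes.
    • genderNums follows the levels c("f","m"), which here is also alphabetical order (f = 1, m = 2).
    • genderFNums follows the level order given to factor() above, so m is coded 1 and f is coded 2.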
    R Programming
    %%R
    
    withmooc$genderNums  <- as.numeric(withmooc$gender)
    withmooc$genderFNums <- as.numeric(withmooc$genderF)
    
    # check the codes that were actually assigned
    withmooc

     

    Results
    # A tibble: 8 x 10
         id workshop gender    q1    q2    q3    q4 genderF genderNums genderFNums
      <dbl> <chr>    <fct>  <dbl> <dbl> <dbl> <dbl> <fct>        <dbl>       <dbl>
    1     1 1        f          1     1     5     1 female           1           2
    2     2 2        f          2     1     4     1 female           1           2
    3     3 1        f          2     2     4     3 female           1           2
    4     4 2        f          3     1    NA     3 female           1           2
    5     5 1        m          4     5     2     4 male             2           1
    6     6 2        m          5     4     5     5 male             2           1
    7     7 1        m          5     3     4     4 male             2           1
    8     8 2        m          4     5     5     5 male             2           1

     

     

    • Create factor copies of the q variables so that their responses can be counted.
    • Store the levels and labels so they can be reused.
    R Programming
    %%R
    
    myQlevels <- c(1,2,3,4,5)
    
    # store the labels for repeated use
    
    myQlabels <- c("Strongly Disagree",
                   "Disagree",
                   "Neutral",
                   "Agree",
                   "Strongly Agree")

     

     

    • Use factor() to create a new set of labeled variables.
    R Programming
    %%R
    
    withmooc %>%
      mutate(q1f = factor(q1, myQlevels, myQlabels), 
             q2f = factor(q2, myQlevels, myQlabels), 
             q3f = factor(q3, myQlevels, myQlabels), 
             q4f = factor(q4, myQlevels, myQlabels) ) %>%
      select(q1f,q2f,q3f,q4f) %>%
      summary()

     

    Results
                    q1f                   q2f                   q3f                   q4f   
     Strongly Disagree:1   Strongly Disagree:3   Strongly Disagree:0   Strongly Disagree:2  
     Disagree         :2   Disagree         :1   Disagree         :1   Disagree         :0  
     Neutral          :1   Neutral          :1   Neutral          :0   Neutral          :2  
     Agree            :2   Agree            :1   Agree            :3   Agree            :2  
     Strongly Agree   :2   Strongly Agree   :2   Strongly Agree   :3   Strongly Agree   :2  
                                                 NA's             :1                        
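Here is a pandas sketch of the same count table. It assumes the q1 to q4 values typed in below and a hypothetical helper my_labeler() standing in for factor(x, myQlevels, myQlabels). Declaring all five labels as categories keeps empty levels in the counts.

```python
import numpy as np
import pandas as pd

q = pd.DataFrame({
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, np.nan, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
})

my_q_labels = {1: "Strongly Disagree", 2: "Disagree", 3: "Neutral",
               4: "Agree", 5: "Strongly Agree"}

def my_labeler(s):
    # hypothetical helper: map each code to its label (NaN stays missing)
    # and keep all five labels as categories, even if unused
    return pd.Categorical(s.map(my_q_labels.get),
                          categories=list(my_q_labels.values()))

q_f = pd.DataFrame({name: my_labeler(col) for name, col in q.items()})

# one column of counts per variable, one row per category; NaN is not counted
counts = pd.DataFrame({name: col.value_counts(sort=False) for name, col in q_f.items()})
print(counts)
```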

     

     

    • Creating the factor copies one by one is tedious when there are many variables; the following steps automate it.
    • As before, the factor copies let us count the responses in each category.
    R Programming
    %%R
    
    myQlevels <- c(1,2,3,4,5)
    
    myQlabels <- c("Strongly Disagree",
                   "Disagree",
                   "Neutral",
                   "Agree",
                   "Strongly Agree")
    
    print(myQlevels)
    
    print(myQlabels)

     

    Results
    [1] 1 2 3 4 5
    [1] "Strongly Disagree" "Disagree"          "Neutral"           "Agree"             "Strongly Agree"   

     

     

    • Create two sets of variable names to work with.
    R Programming
    %%R
    
    myQnames  <- paste( "q",  1:4, sep="")
    myQFnames <- paste( "qf", 1:4, sep="")
    
    print(myQnames)   # original variable names
    print(myQFnames)  # names for the new factor variables

     

    Results
    [1] "q1" "q2" "q3" "q4"
    [1] "qf1" "qf2" "qf3" "qf4"

     

     

    • Write a function that applies the labels, so it can be reused on many variables.
    R Programming
    %%R
    
    myLabeler <- function(x) { factor(x, myQlevels, myQlabels) }

     

     

    • Check how the function works on a single variable.
    R Programming
    %%R
    
    withmooc %>%
      mutate(qf1 = myLabeler(q1))

     

    Results
    # A tibble: 8 x 11
         id workshop gender    q1    q2    q3    q4 genderF genderNums genderFNums qf1              
      <dbl> <chr>    <fct>  <dbl> <dbl> <dbl> <dbl> <fct>        <dbl>       <dbl> <fct>            
    1     1 1        f          1     1     5     1 female           1           2 Strongly Disagree
    2     2 2        f          2     1     4     1 female           1           2 Disagree         
    3     3 1        f          2     2     4     3 female           1           2 Disagree         
    4     4 2        f          3     1    NA     3 female           1           2 Neutral          
    5     5 1        m          4     5     2     4 male             2           1 Agree            
    6     6 2        m          5     4     5     5 male             2           1 Strongly Agree   
    7     7 1        m          5     3     4     4 male             2           1 Strongly Agree   
    8     8 2        m          4     5     5     5 male             2           1 Agree            

     

     

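    • Apply the function to all variables.
    • map() applies the function to each variable, and as_tibble() reassembles the results into one table.
    • Note that keep(is.numeric) also selects id, genderNums and genderFNums, so they get labeled too (ids 6 to 8 have no matching level and become NA). The next example restricts the selection to the q variables.
    • transmute() creates new variables and drops the existing ones.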
    R Programming
    %%R
    
    withmooc %>%
      purrr::keep(.p = is.numeric) %>% # keep only numeric columns
      purrr::map(myLabeler) %>%
      as_tibble()

     

    Results
    # A tibble: 8 x 7
      id                q1                q2                q3             q4                genderNums        genderFNums      
      <fct>             <fct>             <fct>             <fct>          <fct>             <fct>             <fct>            
    1 Strongly Disagree Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree Strongly Disagree Disagree         
    2 Disagree          Disagree          Strongly Disagree Agree          Strongly Disagree Strongly Disagree Disagree         
    3 Neutral           Disagree          Disagree          Agree          Neutral           Strongly Disagree Disagree         
    4 Agree             Neutral           Strongly Disagree <NA>           Neutral           Strongly Disagree Disagree         
    5 Strongly Agree    Agree             Strongly Agree    Disagree       Agree             Disagree          Strongly Disagree
    6 <NA>              Strongly Agree    Agree             Strongly Agree Strongly Agree    Disagree          Strongly Disagree
    7 <NA>              Strongly Agree    Neutral           Agree          Agree             Disagree          Strongly Disagree
    8 <NA>              Agree             Strongly Agree    Strongly Agree Strongly Agree    Disagree          Strongly Disagree

     

     

    R Programming
    %%R
    
    withmooc %>%
      select(starts_with("q")) %>% # keep only the q1-q4 columns
      purrr::map_dfc(myLabeler)

     

    Results
    # A tibble: 8 x 4
      q1                q2                q3             q4               
      <fct>             <fct>             <fct>          <fct>            
    1 Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree
    2 Disagree          Strongly Disagree Agree          Strongly Disagree
    3 Disagree          Disagree          Agree          Neutral          
    4 Neutral           Strongly Disagree <NA>           Neutral          
    5 Agree             Strongly Agree    Disagree       Agree            
    6 Strongly Agree    Agree             Strongly Agree Strongly Agree   
    7 Strongly Agree    Neutral           Agree          Agree            
    8 Agree             Strongly Agree    Strongly Agree Strongly Agree   

     

     

    R Programming
    %%R
    
    withmooc %>%
      purrr::keep(.p = is.numeric) %>% # keep only numeric columns
      purrr::map_df(.x = .,
                    .f = myLabeler)

     

    Results
    # A tibble: 8 x 7
      id                q1                q2                q3             q4                genderNums        genderFNums      
      <fct>             <fct>             <fct>             <fct>          <fct>             <fct>             <fct>            
    1 Strongly Disagree Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree Strongly Disagree Disagree         
    2 Disagree          Disagree          Strongly Disagree Agree          Strongly Disagree Strongly Disagree Disagree         
    3 Neutral           Disagree          Disagree          Agree          Neutral           Strongly Disagree Disagree         
    4 Agree             Neutral           Strongly Disagree <NA>           Neutral           Strongly Disagree Disagree         
    5 Strongly Agree    Agree             Strongly Agree    Disagree       Agree             Disagree          Strongly Disagree
    6 <NA>              Strongly Agree    Agree             Strongly Agree Strongly Agree    Disagree          Strongly Disagree
    7 <NA>              Strongly Agree    Neutral           Agree          Agree             Disagree          Strongly Disagree
    8 <NA>              Agree             Strongly Agree    Strongly Agree Strongly Agree    Disagree          Strongly Disagree

     

     

    R Programming
    %%R
    
    withmooc %>% mutate_at( (withmooc %>%
                             select(starts_with("q")) %>%
                             colnames()),
                             myLabeler)

     

    Results
    # A tibble: 8 x 10
         id workshop gender q1                q2                q3             q4                genderF genderNums genderFNums
      <dbl> <chr>    <fct>  <fct>             <fct>             <fct>          <fct>             <fct>        <dbl>       <dbl>
    1     1 1        f      Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree female           1           2
    2     2 2        f      Disagree          Strongly Disagree Agree          Strongly Disagree female           1           2
    3     3 1        f      Disagree          Disagree          Agree          Neutral           female           1           2
    4     4 2        f      Neutral           Strongly Disagree <NA>           Neutral           female           1           2
    5     5 1        m      Agree             Strongly Agree    Disagree       Agree             male             2           1
    6     6 2        m      Strongly Agree    Agree             Strongly Agree Strongly Agree    male             2           1
    7     7 1        m      Strongly Agree    Neutral           Agree          Agree             male             2           1
    8     8 2        m      Agree             Strongly Agree    Strongly Agree Strongly Agree    male             2           1

     

     

    R Programming
    %%R
    
    withmooc %>% mutate_at( vars(starts_with("q")),
                            myLabeler)

     

    Results
    # A tibble: 8 x 10
         id workshop gender q1                q2                q3             q4                genderF genderNums genderFNums
      <dbl> <chr>    <fct>  <fct>             <fct>             <fct>          <fct>             <fct>        <dbl>       <dbl>
    1     1 1        f      Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree female           1           2
    2     2 2        f      Disagree          Strongly Disagree Agree          Strongly Disagree female           1           2
    3     3 1        f      Disagree          Disagree          Agree          Neutral           female           1           2
    4     4 2        f      Neutral           Strongly Disagree <NA>           Neutral           female           1           2
    5     5 1        m      Agree             Strongly Agree    Disagree       Agree             male             2           1
    6     6 2        m      Strongly Agree    Agree             Strongly Agree Strongly Agree    male             2           1
    7     7 1        m      Strongly Agree    Neutral           Agree          Agree             male             2           1
    8     8 2        m      Agree             Strongly Agree    Strongly Agree Strongly Agree    male             2           1
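In pandas, a mutate_at(vars(starts_with("q")), myLabeler) step can be sketched by picking the column names with str.startswith() and passing the new columns to assign(). Only a few columns are typed in, and my_labeler() is a hypothetical helper that mirrors myLabeler().

```python
import numpy as np
import pandas as pd

withmooc = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "gender": ["f", "f", "m", "m"],
    "q1": [1, 2, 4, 5],
    "q3": [5, np.nan, 2, 4],
})

labels = {1: "Strongly Disagree", 2: "Disagree", 3: "Neutral",
          4: "Agree", 5: "Strongly Agree"}

def my_labeler(s):
    # hypothetical helper: codes -> labels, all five labels kept as categories
    return pd.Categorical(s.map(labels.get), categories=list(labels.values()))

# like vars(starts_with("q")): select columns by name prefix
q_cols = [c for c in withmooc.columns if c.startswith("q")]

# like mutate_at(): replace only the selected columns, keep the rest unchanged
labeled = withmooc.assign(**{c: my_labeler(withmooc[c]) for c in q_cols})
print(labeled)
```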

     

     

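    • Write the labeling function inline with funs(). funs() has been deprecated since dplyr 0.8.0; with dplyr 1.0 or later, the equivalent is mutate(across(starts_with("q"), ~ factor(.x, myQlevels, myQlabels))).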
    R Programming
    %%R
    
    withmooc %>% mutate_at( vars(starts_with("q")),
                            funs(factor(., myQlevels, myQlabels)))

     

    Results
    # A tibble: 8 x 10
         id workshop gender q1                q2                q3             q4                genderF genderNums genderFNums
      <dbl> <chr>    <fct>  <fct>             <fct>             <fct>          <fct>             <fct>        <dbl>       <dbl>
    1     1 1        f      Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree female           1           2
    2     2 2        f      Disagree          Strongly Disagree Agree          Strongly Disagree female           1           2
    3     3 1        f      Disagree          Disagree          Agree          Neutral           female           1           2
    4     4 2        f      Neutral           Strongly Disagree <NA>           Neutral           female           1           2
    5     5 1        m      Agree             Strongly Agree    Disagree       Agree             male             2           1
    6     6 2        m      Strongly Agree    Agree             Strongly Agree Strongly Agree    male             2           1
    7     7 1        m      Strongly Agree    Neutral           Agree          Agree             male             2           1
    8     8 2        m      Agree             Strongly Agree    Strongly Agree Strongly Agree    male             2           1

     

     

    R Programming
    %%R
    
    withmooc %>%
      select(q1,q2,q3,q4) %>%
      purrr::map_dfc(~ withmooc %>% transmute( {{.x}} := myLabeler(.x))) %>%
      set_names(c('q1','q2','q3','q4'))

     

    Results
    R[write to console]: New names:
    * `..1` -> ...1
    * `..1` -> ...2
    * `..1` -> ...3
    * `..1` -> ...4
    
    
    
    # A tibble: 8 x 4
      q1                q2                q3             q4               
      <fct>             <fct>             <fct>          <fct>            
    1 Strongly Disagree Strongly Disagree Strongly Agree Strongly Disagree
    2 Disagree          Strongly Disagree Agree          Strongly Disagree
    3 Disagree          Disagree          Agree          Neutral          
    4 Neutral           Strongly Disagree <NA>           Neutral          
    5 Agree             Strongly Agree    Disagree       Agree            
    6 Strongly Agree    Agree             Strongly Agree Strongly Agree   
    7 Strongly Agree    Neutral           Agree          Agree            
    8 Agree             Strongly Agree    Strongly Agree Strongly Agree   

     


    6. Python - Pandas

     

    Python Programming
    import pandas as pd
    import numpy as np
    import sweetviz as sv
    
    mydata = pd.read_csv("C:/work/data/mydata.csv",sep=",",
                         dtype={'id':object,'workshop':object,
                                'q1':int, 'q2':int, 'q3':float, 'q4':int},
                         na_values=['NaN'],skipinitialspace =True)
    
    withmooc= mydata.copy()
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	1		f	1	1	5.0	1
    1	2	2		f	2	1	4.0	1
    2	3	1		f	2	2	4.0	3
    3	4	2		f	3	1	NaN	3
    4	5	1		m	4	5	2.0	4
    5	6	2		m	5	4	5.0	5
    6	7	1		m	5	3	4.0	4
    7	8	2		m	4	5	5.0	5
     
     
    • Summary statistics for numeric variables
    Python Programming
    withmooc= mydata.copy()
    
    withmooc.describe()

     

    Results
    	q1		q2		q3		q4
    count	8.000000	8.000000	7.000000	8.000000
    mean	3.250000	2.750000	4.142857	3.250000
    std	1.488048	1.752549	1.069045	1.581139
    min	1.000000	1.000000	2.000000	1.000000
    25%	2.000000	1.000000	4.000000	2.500000
    50%	3.500000	2.500000	4.000000	3.500000
    75%	4.250000	4.250000	5.000000	4.250000
    max	5.000000	5.000000	5.000000	5.000000
     
     
    • Summary statistics for character (object) variables
    Python Programming
    withmooc= mydata.copy()
    
    withmooc.describe(include=['object'])  # np.object no longer exists in NumPy 1.24+

     

    Results
    	id	workshop	gender
    count	8	8		8
    unique	8	2		2
    top	2	1		f
    freq	1	4		4
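describe(include="all") puts both kinds of summary in one table. Object columns get count, unique, top and freq, and numeric columns get mean, std and quantiles. The minimal sketch below uses the same data, typed in by hand.

```python
import numpy as np
import pandas as pd

withmooc = pd.DataFrame({
    "id": [str(i) for i in range(1, 9)],
    "workshop": ["1", "2", "1", "2", "1", "2", "1", "2"],
    "gender": ["f"] * 4 + ["m"] * 4,
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, np.nan, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
})

# include="all" summarizes object and numeric columns together;
# statistics that do not apply to a column are shown as NaN
summary = withmooc.describe(include="all")
print(summary)
```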
     

     

    Python Programming
    withmooc= mydata.copy()
    
    withmooc.apply(lambda x : x.describe())

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    25%	NaN	NaN	NaN	2.000000	1.000000	4.000000	2.500000
    50%	NaN	NaN	NaN	3.500000	2.500000	4.000000	3.500000
    75%	NaN	NaN	NaN	4.250000	4.250000	5.000000	4.250000
    count	8	8	8	8.000000	8.000000	7.000000	8.000000
    freq	1	4	4	NaN		NaN		NaN		NaN
    max	NaN	NaN	NaN	5.000000	5.000000	5.000000	5.000000
    mean	NaN	NaN	NaN	3.250000	2.750000	4.142857	3.250000
    min	NaN	NaN	NaN	1.000000	1.000000	2.000000	1.000000
    std	NaN	NaN	NaN	1.488048	1.752549	1.069045	1.581139
    top	2	1	f	NaN		NaN		NaN		NaN
    unique	8	2	2	NaN		NaN		NaN		NaN
     

     

    Python Programming
    withmooc= mydata.copy()
    
    labels2={'1':'R','2':'SAS','3':'SPSS', '4':'Python'}
    
    withmooc['workshop'] = withmooc['workshop'].apply(lambda x: labels2.get(x))
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	R		f	1	1	5.0	1
    1	2	SAS		f	2	1	4.0	1
    2	3	R		f	2	2	4.0	3
    3	4	SAS		f	3	1	NaN	3
    4	5	R		m	4	5	2.0	4
    5	6	SAS		m	5	4	5.0	5
    6	7	R		m	5	3	4.0	4
    7	8	SAS		m	4	5	5.0	5
     

     

    Python Programming
    withmooc= mydata.copy()
    withmooc['workshop'] = withmooc['workshop'].map(labels2)
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	R		f	1	1	5.0	1
    1	2	SAS		f	2	1	4.0	1
    2	3	R		f	2	2	4.0	3
    3	4	SAS		f	3	1	NaN	3
    4	5	R		m	4	5	2.0	4
    5	6	SAS		m	5	4	5.0	5
    6	7	R		m	5	3	4.0	4
    7	8	SAS		m	4	5	5.0	5
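map(dict) and apply(lambda x: labels2.get(x)) behave the same way for codes that have no label: both return missing values. If unmatched codes should be kept as they are, Series.replace() with the same dict is one alternative. The small sketch below uses a made-up code '5'.

```python
import pandas as pd

labels2 = {'1': 'R', '2': 'SAS', '3': 'SPSS', '4': 'Python'}
codes = pd.Series(['1', '2', '5'])  # '5' is a made-up code with no label

mapped = codes.map(labels2)                   # unmatched code -> NaN
got = codes.apply(lambda x: labels2.get(x))   # unmatched code -> None
replaced = codes.replace(labels2)             # unmatched code kept as '5'
print(mapped.tolist(), got.tolist(), replaced.tolist())
```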
     

     

    Python Programming
    withmooc= mydata.copy()
    
    withmooc['workshop'] = withmooc['workshop'].astype('category')
    withmooc['workshop'] = withmooc['workshop'].cat.rename_categories(["R", "SAS"])
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	R		f	1	1	5.0	1
    1	2	SAS		f	2	1	4.0	1
    2	3	R		f	2	2	4.0	3
    3	4	SAS		f	3	1	NaN	3
    4	5	R		m	4	5	2.0	4
    5	6	SAS		m	5	4	5.0	5
    6	7	R		m	5	3	4.0	4
    7	8	SAS		m	4	5	5.0	5
     

     

    Python Programming
    withmooc.groupby(withmooc['workshop']).describe()

     

    Results
    	q1							q2		...	q3		q4
    	count	mean	std	min		25%	50%	75%	max	count	mean	...	75%	max	count	mean	std		min	25%	50%	75%	max
    workshop																					
    R	4.0	3.0	1.825742	1.0	1.75	3.0	4.25	5.0	4.0	2.75	...	4.25	5.0	4.0	3.0	1.414214	1.0	2.5	3.5	4.0	4.0
    SAS	4.0	3.5	1.290994	2.0	2.75	3.5	4.25	5.0	4.0	2.75	...	5.00	5.0	4.0	3.5	1.914854	1.0	2.5	4.0	5.0	5.0

    2 rows × 32 columns

     

     

    • Check the column types and how the categories are stored.
    Python Programming
    withmooc.info()
    withmooc.dtypes
    withmooc['workshop'].dtype

     

    Results
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 8 entries, 0 to 7
    Data columns (total 7 columns):
     #   Column    Non-Null Count  Dtype   
    ---  ------    --------------  -----   
     0   id        8 non-null      object  
     1   workshop  8 non-null      category
     2   gender    8 non-null      object  
     3   q1        8 non-null      int32   
     4   q2        8 non-null      int32   
     5   q3        7 non-null      float64 
     6   q4        8 non-null      int32   
    dtypes: category(1), float64(1), int32(3), object(2)
    memory usage: 520.0+ bytes
    
    
    
    
    
    CategoricalDtype(categories=['R', 'SAS'], ordered=False)

     

     

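    • Relabel m as male and f as female.
    • Unlike factor() in R, rename_categories() with a list renames the existing categories by position (categories inferred by astype('category') are sorted), so check the category order before renaming. Values such as uppercase M and F would become separate categories rather than missing values.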
    Python Programming
    withmooc['gender']  = withmooc['gender'].astype('category')
    withmooc['genderF'] = withmooc['gender'].cat.rename_categories(["female", "male"])
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF
    0	1	R		f	1	1	5.0	1	female
    1	2	SAS		f	2	1	4.0	1	female
    2	3	R		f	2	2	4.0	3	female
    3	4	SAS		f	3	1	NaN	3	female
    4	5	R		m	4	5	2.0	4	male
    5	6	SAS		m	5	4	5.0	5	male
    6	7	R		m	5	3	4.0	4	male
    7	8	SAS		m	4	5	5.0	5	male
     
     
     
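    • Extract the underlying integer codes.
    • cat.codes numbers the categories from 0 in their stored order; for categories inferred by astype('category') that is alphabetical order (f = 0, m = 1).
    • rename_categories() only renames and keeps the order, so genderFNums gets the same codes as genderNums (female = 0, male = 1); unlike the R example, m is not moved to the front.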
    Python Programming
    withmooc= mydata.copy()
    
    withmooc['gender']  = withmooc['gender'].astype('category')
    withmooc['genderF'] = withmooc['gender'].cat.rename_categories(["female", "male"])
    
    withmooc["genderNums"]  = withmooc["gender"].cat.codes
    withmooc["genderFNums"] = withmooc["genderF"].cat.codes
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF	genderNums	genderFNums
    0	1	1		f	1	1	5.0	1	female	0		0
    1	2	2		f	2	1	4.0	1	female	0		0
    2	3	1		f	2	2	4.0	3	female	0		0
    3	4	2		f	3	1	NaN	3	female	0		0
    4	5	1		m	4	5	2.0	4	male	1		1
    5	6	2		m	5	4	5.0	5	male	1		1
    6	7	1		m	5	3	4.0	4	male	1		1
    7	8	2		m	4	5	5.0	5	male	1		1
     

     

    Python Programming
    withmooc= mydata.copy()
    
    
    withmooc['gender']  = withmooc['gender'].astype('category')
    withmooc['genderF'] = withmooc['gender'].cat.rename_categories(["female", "male"])
    
    withmooc['genderNums']  = pd.factorize(withmooc.gender)[0]
    withmooc['genderFNums'] = pd.factorize(withmooc.genderF)[0]
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF	genderNums	genderFNums
    0	1	1		f	1	1	5.0	1	female	0		0
    1	2	2		f	2	1	4.0	1	female	0		0
    2	3	1		f	2	2	4.0	3	female	0		0
    3	4	2		f	3	1	NaN	3	female	0		0
    4	5	1		m	4	5	2.0	4	male	1		1
    5	6	2		m	5	4	5.0	5	male	1		1
    6	7	1		m	5	3	4.0	4	male	1		1
    7	8	2		m	4	5	5.0	5	male	1		1
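The codes agree here only because f happens to appear first in the data. pd.factorize() numbers values in order of first appearance unless sort=True. cat.codes (for inferred categories) and sklearn's LabelEncoder both follow sorted order. The small sketch below lists m first.

```python
import pandas as pd

s = pd.Series(["m", "f", "m", "f"])

# default: codes follow order of first appearance (m -> 0, f -> 1)
codes_appearance, uniques = pd.factorize(s)

# sort=True: codes follow sorted order (f -> 0, m -> 1)
codes_sorted, uniques_sorted = pd.factorize(s, sort=True)

# astype("category") infers sorted categories, so cat.codes matches sort=True
codes_category = s.astype("category").cat.codes

print(codes_appearance, list(uniques))
print(codes_sorted, list(uniques_sorted))
print(codes_category.tolist())
```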
     

     

    Python Programming
    withmooc= mydata.copy()
    
    withmooc['gender']  = withmooc['gender'].astype('category')
    withmooc['genderF'] = withmooc['gender'].cat.rename_categories(["female", "male"])
    
    from sklearn.preprocessing import LabelEncoder
    
    number = LabelEncoder()
    
    withmooc['genderNums']  = number.fit_transform(withmooc['gender'])
    withmooc['genderFNums'] = number.fit_transform(withmooc['genderF'])
    
    withmooc.info()
    
    withmooc

     

    Results
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 8 entries, 0 to 7
    Data columns (total 10 columns):
     #   Column       Non-Null Count  Dtype   
    ---  ------       --------------  -----   
     0   id           8 non-null      object  
     1   workshop     8 non-null      object  
     2   gender       8 non-null      category
     3   q1           8 non-null      int32   
     4   q2           8 non-null      int32   
     5   q3           7 non-null      float64 
     6   q4           8 non-null      int32   
     7   genderF      8 non-null      category
     8   genderNums   8 non-null      int32   
     9   genderFNums  8 non-null      int32   
    dtypes: category(2), float64(1), int32(5), object(2)
    memory usage: 688.0+ bytes

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF	genderNums	genderFNums
    0	1	1		f	1	1	5.0	1	female	0		0
    1	2	2		f	2	1	4.0	1	female	0		0
    2	3	1		f	2	2	4.0	3	female	0		0
    3	4	2		f	3	1	NaN	3	female	0		0
    4	5	1		m	4	5	2.0	4	male	1		1
    5	6	2		m	5	4	5.0	5	male	1		1
    6	7	1		m	5	3	4.0	4	male	1		1
    7	8	2		m	4	5	5.0	5	male	1		1
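In this data, pd.factorize, LabelEncoder and cat.codes all give f = 0 and m = 1. That happens only because f is both the first value and the alphabetically first value. The minimal sketch below uses a hypothetical series that starts with "m" to show where the methods diverge. factorize numbers values by first appearance. cat.codes follows sorted order, and so does LabelEncoder, whose classes_ are sorted.

```python
import pandas as pd

# Hypothetical data: the first observation is "m", not "f"
gender = pd.Series(["m", "f", "m", "f"])

# pd.factorize assigns codes in order of first appearance: m -> 0, f -> 1
fact_codes, fact_uniques = pd.factorize(gender)

# astype("category") sorts the categories, so cat.codes follow
# alphabetical order instead: f -> 0, m -> 1
cat_codes = gender.astype("category").cat.codes

print(fact_uniques.tolist(), fact_codes.tolist())
print(gender.astype("category").cat.categories.tolist(), cat_codes.tolist())
```

If the numbering matters later, for example as model input, set the category order explicitly rather than relying on either default.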

     

    • Create copies of the q variables as factors (pandas category dtype) so that their levels can be counted.
    • Store the labels so they can be reused.
    • Create the new set of variables with the factor conversion.
    Python Programming
    withmooc= mydata.copy()
    
    # q1 and q2 contain all five codes, so a positional list of labels works.
    withmooc['q1f']=withmooc['q1'].astype('category').cat.rename_categories(["Strongly Disagree","Disagree","Neutral","Agree","Strongly Agree"])
    withmooc['q2f']=withmooc['q2'].astype('category').cat.rename_categories(["Strongly Disagree","Disagree","Neutral","Agree","Strongly Agree"])
    # q3 and q4 lack some codes, so the labels are mapped by value with a dict.
    withmooc['q3f']=withmooc['q3'].astype('category').cat.rename_categories({1:"Strongly Disagree",2:"Disagree",3:"Neutral",4:"Agree",5:"Strongly Agree"})
    withmooc['q4f']=withmooc['q4'].astype('category').cat.rename_categories({1:"Strongly Disagree",2:"Disagree",3:"Neutral",4:"Agree",5:"Strongly Agree"})
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q1f			q2f			q3f		q4f
    0	1	1		f	1	1	5.0	1	Strongly Disagree	Strongly Disagree	Strongly Agree	Strongly Disagree
    1	2	2		f	2	1	4.0	1	Disagree		Strongly Disagree	Agree		Strongly Disagree
    2	3	1		f	2	2	4.0	3	Disagree		Disagree		Agree		Neutral
    3	4	2		f	3	1	NaN	3	Neutral			Strongly Disagree	NaN		Neutral
    4	5	1		m	4	5	2.0	4	Agree			Strongly Agree		Disagree	Agree
    5	6	2		m	5	4	5.0	5	Strongly Agree		Agree			Strongly Agree	Strongly Agree
    6	7	1		m	5	3	4.0	4	Strongly Agree		Neutral			Agree		Agree
    7	8	2		m	4	5	5.0	5	Agree			Strongly Agree		Strongly Agree	Strongly Agree
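The list form of rename_categories() works for q1 and q2 only because all five codes occur in those columns. astype('category') builds its categories from the observed values, so q3f and q4f end up with fewer than five levels. One way to mirror R's factor(levels=..., labels=...) is to declare the full, ordered set of levels first with pd.Categorical. The sketch below assumes the five-point 1 to 5 coding used above. The helper name to_likert is hypothetical.

```python
import numpy as np
import pandas as pd

# Five-point scale codes and labels, as in the example data
q_levels = [1, 2, 3, 4, 5]
q_labels = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]


def to_likert(s: pd.Series) -> pd.Series:
    """Hypothetical helper: convert 1-5 codes into an ordered, labeled category."""
    # Declaring every level up front keeps levels that never occur
    # (like R's factor(levels = ...)); NaN stays missing.
    cat = pd.Categorical(s, categories=q_levels, ordered=True)
    return pd.Series(cat, index=s.index, name=s.name).cat.rename_categories(q_labels)


q3 = pd.Series([5, 4, 4, np.nan, 2, 5, 4, 5], name="q3")
q4 = pd.Series([1, 1, 3, 3, 4, 5, 4, 5], name="q4")

q3f = to_likert(q3)
q4f = to_likert(q4)

print(q4f.cat.categories.tolist())  # all five labels, including the unused "Disagree"
print((q4f >= "Agree").sum())       # ordered categories support comparisons
```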
     

     

    Python Programming
    withmooc.info()

     

    Results
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 8 entries, 0 to 7
    Data columns (total 11 columns):
     #   Column    Non-Null Count  Dtype   
    ---  ------    --------------  -----   
     0   id        8 non-null      object  
     1   workshop  8 non-null      object  
     2   gender    8 non-null      object  
     3   q1        8 non-null      int32   
     4   q2        8 non-null      int32   
     5   q3        7 non-null      float64 
     6   q4        8 non-null      int32   
     7   q1f       8 non-null      category
     8   q2f       8 non-null      category
     9   q3f       7 non-null      category
     10  q4f       8 non-null      category
    dtypes: category(4), float64(1), int32(3), object(3)
    memory usage: 1.2+ KB

     

     

    Python Programming
    pd.DataFrame([(withmooc[val].value_counts()) for val in ['q1f','q2f','q3f','q4f']]).T

     

    Results
    			q1f	q2f	q3f	q4f
    Agree			2.0	1.0	3.0	2.0
    Disagree		2.0	1.0	1.0	NaN
    Neutral			1.0	1.0	NaN	2.0
    Strongly Agree		2.0	2.0	3.0	2.0
    Strongly Disagree	1.0	3.0	NaN	2.0
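In the count table above, levels that never occur show up as NaN, and the rows are sorted alphabetically rather than along the scale. The sketch below assumes the same 1 to 5 coding. It declares all levels first and then calls value_counts(sort=False), which produces zero counts in scale order. The data frame re-creates the q columns from the Results tables, and the helper name level_counts is hypothetical.

```python
import numpy as np
import pandas as pd

q_levels = [1, 2, 3, 4, 5]
q_labels = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

# q1-q4 as shown in the Results tables above
qvars = pd.DataFrame({
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, np.nan, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
})


def level_counts(s: pd.Series) -> pd.Series:
    # Declare all levels, attach labels, then count; sort=False keeps the
    # level order, and unused levels get 0 instead of disappearing.
    cat = pd.Categorical(s, categories=q_levels).rename_categories(q_labels)
    return pd.Series(cat).value_counts(sort=False)


counts = qvars.apply(level_counts).add_suffix("f")
print(counts)
```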
     
     
    • Create factor copies of the q variables. When there are many variables, the following approach automates the conversion.
    • Because the copies are factors, their levels can then be counted.
    Python Programming
    myQlevels = [1,2,3,4,5]
    
    myQlabels =   {1:"Strongly Disagree",2:"Disagree",3:"Neutral",4:"Agree",5:"Strongly Agree"}
    
    print(myQlevels)
    print(myQlabels)

     

    Results
    [1, 2, 3, 4, 5]
    {1: 'Strongly Disagree', 2: 'Disagree', 3: 'Neutral', 4: 'Agree', 5: 'Strongly Agree'}

     

     

    Python Programming
    myQnames  = ["q" + str(i) for i in range(1,5)]
    myQFnames = ["q" + str(i) + "f" for i in range(1,5)]
    
    print(myQnames)  # print the variable names
    
    print(myQFnames)  # names of the new factor variables

     

    Results
    ['q1', 'q2', 'q3', 'q4']
    ['q1f', 'q2f', 'q3f', 'q4f']

     

     

    • Extract the q variables into a separate data frame.
    Python Programming
    myQFvars = withmooc.loc[:,myQnames].copy()  # explicit copy, so later changes do not touch withmooc
    
    myQFvars

     

    Results
    	q1	q2	q3	q4
    0	1	1	5.0	1
    1	2	1	4.0	1
    2	2	2	4.0	3
    3	3	1	NaN	3
    4	4	5	2.0	4
    5	5	4	5.0	5
    6	5	3	4.0	4
    7	4	5	5.0	5
     
     
    • Rename the variables by appending f (for factor) to each name.
    Python Programming
    myQFvars.columns = myQFnames
    myQFvars

     

    Results
    	q1f	q2f	q3f	q4f
    0	1	1	5.0	1
    1	2	1	4.0	1
    2	2	2	4.0	3
    3	3	1	NaN	3
    4	4	5	2.0	4
    5	5	4	5.0	5
    6	5	3	4.0	4
    7	4	5	5.0	5
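The extract and rename steps can also be chained. This is an alternative sketch, not the author's code. DataFrame.filter(regex=...) selects the q1 to q4 columns by a name pattern, and add_suffix('f') appends the suffix, so myQnames and myQFnames do not have to be built by hand. The small data frame is a stand-in for withmooc.

```python
import pandas as pd

# Small stand-in for withmooc (first two rows only)
withmooc = pd.DataFrame({
    "id": ["1", "2"], "workshop": ["1", "2"], "gender": ["f", "f"],
    "q1": [1, 2], "q2": [1, 1], "q3": [5.0, 4.0], "q4": [1, 1],
})

# Keep columns named "q" followed by digits, then rename q1 -> q1f, etc.
myQFvars = withmooc.filter(regex=r"^q\d+$").add_suffix("f")
print(myQFvars.columns.tolist())
```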
     

     

    Python Programming
    withmooc['q4'].astype('category').cat.rename_categories(myQlabels)

     

    Results
    0    Strongly Disagree
    1    Strongly Disagree
    2              Neutral
    3              Neutral
    4                Agree
    5       Strongly Agree
    6                Agree
    7       Strongly Agree
    Name: q4, dtype: category
    Categories (4, object): ['Strongly Disagree', 'Neutral', 'Agree', 'Strongly Agree']

     

     

    Python Programming
    def categories(x):
        return x.astype('category').cat.rename_categories(myQlabels)
    
    categories(withmooc['q4'])

     

    Results
    0    Strongly Disagree
    1    Strongly Disagree
    2              Neutral
    3              Neutral
    4                Agree
    5       Strongly Agree
    6                Agree
    7       Strongly Agree
    Name: q4, dtype: category
    Categories (4, object): ['Strongly Disagree', 'Neutral', 'Agree', 'Strongly Agree']

     

     

    Python Programming
    # Replace the columns wholesale so they take the new category dtype
    myQFvars[myQFnames] = myQFvars[myQFnames].apply(categories)
    myQFvars

     

    Results
    	q1f			q2f			q3f			q4f
    0	Strongly Disagree	Strongly Disagree	Strongly Agree		Strongly Disagree
    1	Disagree		Strongly Disagree	Agree			Strongly Disagree
    2	Disagree		Disagree		Agree			Neutral
    3	Neutral			Strongly Disagree	NaN			Neutral
    4	Agree			Strongly Agree		Disagree		Agree
    5	Strongly Agree		Agree			Strongly Agree		Strongly Agree
    6	Strongly Agree		Neutral			Agree			Agree
    7	Agree			Strongly Agree		Strongly Agree		Strongly Agree
     
     
    • Counts per level, corresponding to the output of R's summary() function.
    Python Programming
    pd.DataFrame([(myQFvars[val].value_counts()) for val in ['q1f','q2f','q3f','q4f']]).T

     

    Results
    			q1f	q2f	q3f	q4f
    Agree			2.0	1.0	3.0	2.0
    Disagree		2.0	1.0	1.0	NaN
    Neutral			1.0	1.0	NaN	2.0
    Strongly Agree		2.0	2.0	3.0	2.0
    Strongly Disagree	1.0	3.0	NaN	2.0
     

     

    Python Programming
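    # Bind the labeled columns to the original data by row position
    # (merge() without on= would instead join on the shared q1f-q4f columns)
    pd.concat([mydata, myQFvars], axis=1)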

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	q1f			q2f			q3f			q4f
    0	1	1		f	1	1	5.0	1	Strongly Disagree	Strongly Disagree	Strongly Agree		Strongly Disagree
    1	2	2		f	2	1	4.0	1	Disagree		Strongly Disagree	Agree			Strongly Disagree
    2	3	1		f	2	2	4.0	3	Disagree		Disagree		Agree			Neutral
    3	4	2		f	3	1	NaN	3	Neutral			Strongly Disagree	NaN			Neutral
    4	5	1		m	4	5	2.0	4	Agree			Strongly Agree		Disagree		Agree
    5	6	2		m	5	4	5.0	5	Strongly Agree		Agree			Strongly Agree		Strongly Agree
    6	7	1		m	5	3	4.0	4	Strongly Agree		Neutral			Agree			Agree
    7	8	2		m	4	5	5.0	5	Agree			Strongly Agree		Strongly Agree		Strongly Agree
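When the goal is simply to add labeled copies next to the original columns, the extract, rename, convert and merge steps can be collapsed into a single DataFrame.assign call driven by a dict comprehension. This sketch assumes the myQlabels dictionary from above. Declaring categories=list(myQlabels) keeps all five levels even where some codes are unused. The data is re-created from the Results tables.

```python
import numpy as np
import pandas as pd

myQlabels = {1: "Strongly Disagree", 2: "Disagree", 3: "Neutral", 4: "Agree", 5: "Strongly Agree"}

mydata = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "workshop": ["1", "2"] * 4,
    "gender": ["f"] * 4 + ["m"] * 4,
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, np.nan, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
})

myQnames = ["q1", "q2", "q3", "q4"]

# One new labeled column per q variable: q1 -> q1f, ..., q4 -> q4f
both = mydata.assign(**{
    f"{name}f": pd.Categorical(mydata[name], categories=list(myQlabels))
                  .rename_categories(myQlabels)
    for name in myQnames
})
print(both)
```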

     


    7. Python - dfply

    • By default, summary treats the group variable as numeric, but it assumes gender is a factor and counts its levels.
    Python Programming
    import numpy as np
    import pandas as pd
    from dfply import *
    
    mydata   = pd.read_csv("c:/work/data/mydata.csv",sep=",",
                           dtype={'id':object,'workshop':object,
                                  'q1':int, 'q2':int, 'q3':float, 'q4':int},
                           na_values=['NaN'],skipinitialspace =True)
    
    withmooc= mydata.copy()
    
    # Select all variables.
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	1		f	1	1	5.0	1
    1	2	2		f	2	1	4.0	1
    2	3	1		f	2	2	4.0	3
    3	4	2		f	3	1	NaN	3
    4	5	1		m	4	5	2.0	4
    5	6	2		m	5	4	5.0	5
    6	7	1		m	5	3	4.0	4
    7	8	2		m	4	5	5.0	5
     

     

    Python Programming
    withmooc >> summarize(**{
      **{f"{x}_mean": X[x].mean() for x in mydata.select_dtypes(int).columns},
      **{f"{x}_std" : X[x].std() for x in mydata.select_dtypes(int).columns},
      **{f"{x}_var" : X[x].var() for x in mydata.select_dtypes(int).columns},
      **{f"{x}_median" : X[x].median() for x in mydata.select_dtypes(int).columns}
      })

     

    Results
    	q1_mean	q2_mean	q4_mean	q1_std		q2_std		q4_std		q1_var		q2_var		q4_var	q1_median	q2_median	q4_median
    0	3.25	2.75	3.25	1.488048	1.752549	1.581139	2.214286	3.071429	2.5	3.5		2.5		3.5
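For comparison, pandas can compute the same statistics without dfply. In the sketch below, the data is re-created from the Results tables. select_dtypes(include='number') picks every numeric column, so unlike select_dtypes(int) above it also keeps the float column q3, which carries the missing value. agg() takes a list of statistic names.

```python
import numpy as np
import pandas as pd

mydata = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "workshop": ["1", "2"] * 4,
    "gender": ["f"] * 4 + ["m"] * 4,
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, np.nan, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
})

# Rows: statistics; columns: every numeric variable (NaN is skipped)
stats = mydata.select_dtypes(include="number").agg(["mean", "std", "var", "median"])
print(stats.round(6))
```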
     

     

    Python Programming
    (withmooc >> select(withmooc.select_dtypes(include=np.number).columns.tolist())).describe()

     

    Results
    	q1		q2		q3		q4
    count	8.000000	8.000000	7.000000	8.000000
    mean	3.250000	2.750000	4.142857	3.250000
    std	1.488048	1.752549	1.069045	1.581139
    min	1.000000	1.000000	2.000000	1.000000
    25%	2.000000	1.000000	4.000000	2.500000
    50%	3.500000	2.500000	4.000000	3.500000
    75%	4.250000	4.250000	5.000000	4.250000
    max	5.000000	5.000000	5.000000	5.000000
     

     

    Python Programming
    (withmooc >> select(withmooc.select_dtypes(include=np.number).columns.tolist())).describe().T

     

    Results
    	count	mean		std		min	25%	50%	75%	max
    q1	8.0	3.250000	1.488048	1.0	2.0	3.5	4.25	5.0
    q2	8.0	2.750000	1.752549	1.0	1.0	2.5	4.25	5.0
    q3	7.0	4.142857	1.069045	2.0	4.0	4.0	5.00	5.0
    q4	8.0	3.250000	1.581139	1.0	2.5	3.5	4.25	5.0
     

     

    Python Programming
    withmooc= mydata.copy()
    
    labels2={'1':'R','2':'SAS','3':'SPSS', '4':'Python'}
    
    withmooc >> mutate(workshop = X['workshop'].apply(lambda x: labels2.get(x)))

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	R		f	1	1	5.0	1
    1	2	SAS		f	2	1	4.0	1
    2	3	R		f	2	2	4.0	3
    3	4	SAS		f	3	1	NaN	3
    4	5	R		m	4	5	2.0	4
    5	6	SAS		m	5	4	5.0	5
    6	7	R		m	5	3	4.0	4
    7	8	SAS		m	4	5	5.0	5
     

     

    Python Programming
    withmooc= mydata.copy()
    
    labels2={'1':'R','2':'SAS','3':'SPSS', '4':'Python'}
    
    withmooc >> mutate(workshop = X['workshop'].map(labels2))

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	R		f	1	1	5.0	1
    1	2	SAS		f	2	1	4.0	1
    2	3	R		f	2	2	4.0	3
    3	4	SAS		f	3	1	NaN	3
    4	5	R		m	4	5	2.0	4
    5	6	SAS		m	5	4	5.0	5
    6	7	R		m	5	3	4.0	4
    7	8	SAS		m	4	5	5.0	5
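Both versions give the same result here because every workshop code has a label. They behave differently when a code is missing from the dictionary. Series.map() turns an unmapped code into a missing value, and so does labels2.get inside apply. Series.replace() leaves it unchanged. A minimal sketch with a hypothetical unmapped code "5":

```python
import pandas as pd

labels2 = {"1": "R", "2": "SAS", "3": "SPSS", "4": "Python"}

# "5" is a hypothetical code that has no label
workshop = pd.Series(["1", "2", "1", "5"])

mapped = workshop.map(labels2)        # unmapped "5" becomes NaN
replaced = workshop.replace(labels2)  # unmapped "5" is kept as "5"

print(mapped.tolist())
print(replaced.tolist())
```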
     

     

    Python Programming
    withmooc= mydata.copy()
    
    withmooc = withmooc >> mutate(workshop = X['workshop'].astype('category'))
    withmooc = withmooc >> mutate(workshop = X['workshop'].cat.rename_categories(["R", "SAS"]))
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	R		f	1	1	5.0	1
    1	2	SAS		f	2	1	4.0	1
    2	3	R		f	2	2	4.0	3
    3	4	SAS		f	3	1	NaN	3
    4	5	R		m	4	5	2.0	4
    5	6	SAS		m	5	4	5.0	5
    6	7	R		m	5	3	4.0	4
    7	8	SAS		m	4	5	5.0	5
     

     

    Python Programming
    withmooc >> group_by('workshop') >> \
      summarize(**{
      **{f"{x}_mean"   : X[x].mean() for x in withmooc.select_dtypes(int).columns},
      **{f"{x}_std"    : X[x].std() for x in withmooc.select_dtypes(int).columns},
      **{f"{x}_var"    : X[x].var() for x in withmooc.select_dtypes(int).columns},
      **{f"{x}_median" : X[x].median() for x in withmooc.select_dtypes(int).columns}
      })

     

    Results
    	workshop	q1_mean	q2_mean	q4_mean	q1_std		q2_std		q4_std		q1_var		q2_var		q4_var		q1_median	q2_median	q4_median
    0	R		3.0	2.75	3.0	1.825742	1.707825	1.414214	3.333333	2.916667	2.000000	3.0		2.5		3.5
    1	SAS		3.5	2.75	3.5	1.290994	2.061553	1.914854	1.666667	4.250000	3.666667	3.5		2.5		4.0

     

     

    Python Programming
    withmooc >> group_by('workshop') >> \
      summarize(q1_mean=X.q1.mean(), q1_std=X.q1.std(),
                q2_mean=X.q2.mean(), q2_std=X.q2.std(),
                q3_mean=X.q3.mean(), q3_std=X.q3.std(),
                q4_mean=X.q4.mean(), q4_std=X.q4.std())

     

    Results
    	workshop	q1_mean	q1_std		q2_mean	q2_std		q3_mean		q3_std		q4_mean	q4_std
    0	R		3.0	1.825742	2.75	1.707825	3.750000	1.258306	3.0	1.414214
    1	SAS		3.5	1.290994	2.75	2.061553	4.666667	0.577350	3.5	1.914854
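The same grouped summary can be written in plain pandas with groupby() and agg(), which avoids writing each q*_mean/q*_std pair by hand. In the sketch below, the data is re-created from the Results tables. The column names are flattened to mimic the dfply output.

```python
import numpy as np
import pandas as pd

mydata = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "workshop": ["1", "2"] * 4,
    "gender": ["f"] * 4 + ["m"] * 4,
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, np.nan, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
})

labels2 = {"1": "R", "2": "SAS"}

summary = (
    mydata.assign(workshop=mydata["workshop"].map(labels2))
    .groupby("workshop")[["q1", "q2", "q3", "q4"]]
    .agg(["mean", "std"])
)
# Flatten the (variable, statistic) column MultiIndex into names like q1_mean
summary.columns = [f"{var}_{stat}" for var, stat in summary.columns]
print(summary.round(6))
```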
     

     

    Python Programming
    @pipe
    @symbolic_evaluation()
    def describe_vars(df, *serieses):
        # Each X.q* argument is evaluated against df into a Series;
        # describe() each one and stack the results as rows.
        result = []
        for series in serieses:
            result.append(series.describe())
        return pd.DataFrame(result)
    
    withmooc >> describe_vars(X.q1,X.q2,X.q3,X.q4)

     

    Results
    	count	mean		std		min	25%	50%	75%	max
    q1	8.0	3.250000	1.488048	1.0	2.0	3.5	4.25	5.0
    q2	8.0	2.750000	1.752549	1.0	1.0	2.5	4.25	5.0
    q3	7.0	4.142857	1.069045	2.0	4.0	4.0	5.00	5.0
    q4	8.0	3.250000	1.581139	1.0	2.5	3.5	4.25	5.0
     

     

    Python Programming
    @pipe
    @symbolic_evaluation()
    def num_variable(df,serieses):
        result = []
    
        for series in serieses:
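            # Test for any numeric dtype instead of hard-coding "int32"/"float64",
            # whose names depend on the platform and NumPy version.
            if pd.api.types.is_numeric_dtype(df[series]):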
                result.append(df[series].describe())
        return pd.DataFrame(result)
    
    withmooc >> num_variable(mydata.columns.tolist())

     

    Results
    	count	mean		std		min	25%	50%	75%	max
    q1	8.0	3.250000	1.488048	1.0	2.0	3.5	4.25	5.0
    q2	8.0	2.750000	1.752549	1.0	1.0	2.5	4.25	5.0
    q3	7.0	4.142857	1.069045	2.0	4.0	4.0	5.00	5.0
    q4	8.0	3.250000	1.581139	1.0	2.5	3.5	4.25	5.0
     

     

    Python Programming
    @pipe
    @symbolic_evaluation()
    def num_variable(df,serieses):
        result = []
    
        for series in serieses:
            # Dispatch on dtype family rather than platform-specific dtype names.
            if pd.api.types.is_numeric_dtype(df[series]):
                result.append(df[series].describe())
            elif pd.api.types.is_object_dtype(df[series]):
                result.append(df[series].describe())
        return pd.DataFrame(result)
    
    withmooc >> num_variable(mydata.columns.tolist())

     

    Results
    	count	unique	top	freq	mean		std		min	25%	50%	75%	max
    id	8.0	8.0	2	1.0	NaN		NaN		NaN	NaN	NaN	NaN	NaN
    gender	8.0	2.0	f	4.0	NaN		NaN		NaN	NaN	NaN	NaN	NaN
    q1	8.0	NaN	NaN	NaN	3.250000	1.488048	1.0	2.0	3.5	4.25	5.0
    q2	8.0	NaN	NaN	NaN	2.750000	1.752549	1.0	1.0	2.5	4.25	5.0
    q3	7.0	NaN	NaN	NaN	4.142857	1.069045	2.0	4.0	4.0	5.00	5.0
    q4	8.0	NaN	NaN	NaN	3.250000	1.581139	1.0	2.5	3.5	4.25	5.0
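Comparing dtypes against hard-coded names such as "int32" depends on the platform. NumPy's default integer is int32 on Windows before NumPy 2.0, but int64 elsewhere. The sketch below makes the same numeric/object split with select_dtypes(), which matches dtype families instead. The data is re-created from the Results tables. workshop is made categorical, as in the dfply steps above, so it is excluded from both summaries.

```python
import numpy as np
import pandas as pd

mydata = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "workshop": ["1", "2"] * 4,
    "gender": ["f"] * 4 + ["m"] * 4,
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, np.nan, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
}).astype({"id": object, "gender": object})
mydata["workshop"] = mydata["workshop"].astype("category")

# Numeric columns of any int/float width: count, mean, std, quartiles
num_summary = mydata.select_dtypes(include="number").describe().T
# Object (string) columns: count, unique, top, freq
obj_summary = mydata.select_dtypes(include="object").describe().T

print(num_summary)
print(obj_summary)
```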
     
     
     
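    • Relabel m as male and f as female.
    • In R's factor(), values that do not exactly match the listed levels (for example, uppercase M or F) actually become missing values. pandas' rename_categories() with a list instead relabels the categories by position in their sorted order, so check that order before relying on it.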
    Python Programming
    withmooc = withmooc \
        >> mutate(gender = X.gender.astype('category')) \
        >> mutate(genderF = X.gender.cat.rename_categories(["female", "male"]))
    
    print(withmooc.dtypes)
    
    withmooc

     

    Results
    id            object
    workshop    category
    gender      category
    q1             int32
    q2             int32
    q3           float64
    q4             int32
    genderF     category
    dtype: object

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF
    0	1	R		f	1	1	5.0	1	female
    1	2	SAS		f	2	1	4.0	1	female
    2	3	R		f	2	2	4.0	3	female
    3	4	SAS		f	3	1	NaN	3	female
    4	5	R		m	4	5	2.0	4	male
    5	6	SAS		m	5	4	5.0	5	male
    6	7	R		m	5	3	4.0	4	male
    7	8	SAS		m	4	5	5.0	5	male
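rename_categories() with a list relabels categories by position in their sorted order, so the labels must be given in exactly that order. A dict instead maps each old category to its new label by value. According to the pandas documentation, any category missing from the dict is passed through unchanged. A minimal sketch:

```python
import pandas as pd

gender = pd.Series(["f", "f", "f", "f", "m", "m", "m", "m"]).astype("category")

# Map by value: f -> female, m -> male, regardless of category order
genderF = gender.cat.rename_categories({"f": "female", "m": "male"})

# A category absent from the dict (here "m") keeps its old name
partial = gender.cat.rename_categories({"f": "female"})

print(genderF.tolist())
print(partial.cat.categories.tolist())
```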
     
     
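    • Extract the underlying numeric code of each value.
    • genderNums is numbered in the alphabetical order of the original values.
    • genderFNums follows the order of the levels set above. In R's factor() this gives f = 1 and m = 2; pandas' cat.codes start at 0, so here f = 0 and m = 1.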
    Python Programming
    mydata1= mydata.copy()
    
    withmooc = withmooc \
        >> mutate(gender = X.gender.astype('category')) \
        >> mutate(genderF = X.gender.cat.rename_categories(["female", "male"])) \
        >> mutate(genderNums = X.gender.cat.codes) \
        >> mutate(genderFNums = X.genderF.cat.codes)
    
    print(withmooc.dtypes)
    
    withmooc

     

    Results
    id               object
    workshop       category
    gender         category
    q1                int32
    q2                int32
    q3              float64
    q4                int32
    genderF        category
    genderNums         int8
    genderFNums        int8
    dtype: object

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF	genderNums	genderFNums
    0	1	R		f	1	1	5.0	1	female	0		0
    1	2	SAS		f	2	1	4.0	1	female	0		0
    2	3	R		f	2	2	4.0	3	female	0		0
    3	4	SAS		f	3	1	NaN	3	female	0		0
    4	5	R		m	4	5	2.0	4	male	1		1
    5	6	SAS		m	5	4	5.0	5	male	1		1
    6	7	R		m	5	3	4.0	4	male	1		1
    7	8	SAS		m	4	5	5.0	5	male	1		1
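cat.codes always starts at 0 and follows the category order, which is alphabetical unless set explicitly. R's factor codes start at 1. If a specific numbering is needed, one option is to declare the categories in that order and add 1. Missing values get the code -1. The m-first order used below is only an illustration, assuming R-style levels = c("m", "f").

```python
import pandas as pd

gender = pd.Series(["f", "f", "m", "m", None])

# Declare the level order explicitly instead of relying on sorting
gender_cat = pd.Series(pd.Categorical(gender, categories=["m", "f"]))

codes = gender_cat.cat.codes           # 0-based codes; the missing value gets -1
r_style = codes.where(codes >= 0) + 1  # 1-based like R; the missing value stays NaN

print(codes.tolist())
print(r_style.tolist())
```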
     

     

    Python Programming
    mydata1= mydata.copy()
    
    # pd.factorize is not a dfply symbolic function, so it is evaluated eagerly on the
    # existing withmooc (not on X); the resulting code arrays become the new columns.
    withmooc = withmooc \
        >> mutate(gender = X.gender.astype('category')) \
        >> mutate(genderF = X.gender.cat.rename_categories(["female", "male"])) \
        >> mutate(genderNums = pd.factorize(withmooc['gender'])[0]) \
        >> mutate(genderFNums = pd.factorize(withmooc['genderF'])[0])
    
    print(withmooc.dtypes)
    
    withmooc

     

    Results
    id               object
    workshop       category
    gender         category
    q1                int32
    q2                int32
    q3              float64
    q4                int32
    genderF        category
    genderNums        int64
    genderFNums       int64
    dtype: object

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF	genderNums	genderFNums
    0	1	R		f	1	1	5.0	1	female	0		0
    1	2	SAS		f	2	1	4.0	1	female	0		0
    2	3	R		f	2	2	4.0	3	female	0		0
    3	4	SAS		f	3	1	NaN	3	female	0		0
    4	5	R		m	4	5	2.0	4	male	1		1
    5	6	SAS		m	5	4	5.0	5	male	1		1
    6	7	R		m	5	3	4.0	4	male	1		1
    7	8	SAS		m	4	5	5.0	5	male	1		1
     
     
     
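    • Create copies of the q variables as factors (pandas category dtype) so that their levels can be counted.
    • Store the labels so they can be reused.
    • Create the new set of variables with the factor conversion.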
    Python Programming
    mydata1= mydata.copy()
    
    withmooc = withmooc \
      >> mutate(q1f = X.q1.astype('category').cat.rename_categories(["Strongly Disagree","Disagree","Neutral","Agree","Strongly Agree"])) \
      >> mutate(q2f = X.q2.astype('category').cat.rename_categories(["Strongly Disagree","Disagree","Neutral","Agree","Strongly Agree"])) \
      >> mutate(q3f = X.q3.astype('category').cat.rename_categories({1:"Strongly Disagree",2:"Disagree",3:"Neutral",4:"Agree",5:"Strongly Agree"})) \
      >> mutate(q4f = X.q4.astype('category').cat.rename_categories({1:"Strongly Disagree",2:"Disagree",3:"Neutral",4:"Agree",5:"Strongly Agree"}))
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF	genderNums	genderFNums	q1f			q2f			q3f		q4f
    0	1	R		f	1	1	5.0	1	female	0		0		Strongly Disagree	Strongly Disagree	Strongly Agree	Strongly Disagree
    1	2	SAS		f	2	1	4.0	1	female	0		0		Disagree		Strongly Disagree	Agree		Strongly Disagree
    2	3	R		f	2	2	4.0	3	female	0		0		Disagree		Disagree		Agree		Neutral
    3	4	SAS		f	3	1	NaN	3	female	0		0		Neutral			Strongly Disagree	NaN		Neutral
    4	5	R		m	4	5	2.0	4	male	1		1		Agree			Strongly Agree		Disagree	Agree
    5	6	SAS		m	5	4	5.0	5	male	1		1		Strongly Agree		Agree			Strongly Agree	Strongly Agree
    6	7	R		m	5	3	4.0	4	male	1		1		Strongly Agree		Neutral			Agree		Agree
    7	8	SAS		m	4	5	5.0	5	male	1		1		Agree			Strongly Agree		Strongly Agree	Strongly Agree
     

     

    Python Programming
    withmooc.info()

     

    Results
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 8 entries, 0 to 7
    Data columns (total 14 columns):
     #   Column       Non-Null Count  Dtype   
    ---  ------       --------------  -----   
     0   id           8 non-null      object  
     1   workshop     8 non-null      category
     2   gender       8 non-null      category
     3   q1           8 non-null      int32   
     4   q2           8 non-null      int32   
     5   q3           7 non-null      float64 
     6   q4           8 non-null      int32   
     7   genderF      8 non-null      category
     8   genderNums   8 non-null      int64   
     9   genderFNums  8 non-null      int64   
     10  q1f          8 non-null      category
     11  q2f          8 non-null      category
     12  q3f          7 non-null      category
     13  q4f          8 non-null      category
    dtypes: category(7), float64(1), int32(3), int64(2), object(1)
    memory usage: 1.5+ KB

     

     

    Python Programming
    @pipe
    @symbolic_evaluation()
    def qf_counts(df,serieses):
        result = []
    
        for series in serieses:
            result.append(df[series].value_counts())
        return pd.DataFrame(result).T
    
    withmooc >> qf_counts(['q1f','q2f','q3f','q4f'])

     

    Results
    			q1f	q2f	q3f	q4f
    Agree			2.0	1.0	3.0	2.0
    Disagree		2.0	1.0	1.0	NaN
    Neutral			1.0	1.0	NaN	2.0
    Strongly Agree		2.0	2.0	3.0	2.0
    Strongly Disagree	1.0	3.0	NaN	2.0
     
     
     
    • Create factor copies of the q variables. When there are many variables, the following approach automates the conversion.
    • Because the copies are factors, their levels can then be counted.
    Python Programming
    myQlevels = [1,2,3,4,5]
    
    myQlabels =   {1:"Strongly Disagree",2:"Disagree",3:"Neutral",4:"Agree",5:"Strongly Agree"}
    
    print(myQlevels)
    print(myQlabels)

     

    Results
    [1, 2, 3, 4, 5]
    {1: 'Strongly Disagree', 2: 'Disagree', 3: 'Neutral', 4: 'Agree', 5: 'Strongly Agree'}

     

     

    • Extract the q variables into a separate data frame.
    Python Programming
    myQFvars = withmooc >> select(num_range("q", range(1,5)))

     

    Python Programming
    # Rename the variables by appending f (for factor) to each name.
    myQFnames = ['q1f', 'q2f', 'q3f', 'q4f']
    
    myQFvars.columns = myQFnames
    myQFvars

     

    Results
    	q1f	q2f	q3f	q4f
    0	1	1	5.0	1
    1	2	1	4.0	1
    2	2	2	4.0	3
    3	3	1	NaN	3
    4	4	5	2.0	4
    5	5	4	5.0	5
    6	5	3	4.0	4
    7	4	5	5.0	5
     

     

    Python Programming
    mydata1= mydata.copy()
    
    withmooc \
      >> mutate(q4f = X.q4.astype('category').cat.rename_categories(myQlabels))

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF	genderNums	genderFNums	q1f			q2f			q3f			q4f
    0	1	R		f	1	1	5.0	1	female	0		0		Strongly Disagree	Strongly Disagree	Strongly Agree	Strongly Disagree
    1	2	SAS		f	2	1	4.0	1	female	0		0		Disagree		Strongly Disagree	Agree		Strongly Disagree
    2	3	R		f	2	2	4.0	3	female	0		0		Disagree		Disagree		Agree		Neutral
    3	4	SAS		f	3	1	NaN	3	female	0		0		Neutral			Strongly Disagree	NaN		Neutral
    4	5	R		m	4	5	2.0	4	male	1		1		Agree			Strongly Agree		Disagree	Agree
    5	6	SAS		m	5	4	5.0	5	male	1		1		Strongly Agree		Agree			Strongly Agree	Strongly Agree
    6	7	R		m	5	3	4.0	4	male	1		1		Strongly Agree		Neutral			Agree		Agree
    7	8	SAS		m	4	5	5.0	5	male	1		1		Agree			Strongly Agree		Strongly Agree	Strongly Agree
     

     

    Python Programming
    withmooc["q1"].astype('category').cat.rename_categories(myQlabels)

     

    Results
    0    Strongly Disagree
    1             Disagree
    2             Disagree
    3              Neutral
    4                Agree
    5       Strongly Agree
    6       Strongly Agree
    7                Agree
    Name: q1, dtype: category
    Categories (5, object): ['Strongly Disagree', 'Disagree', 'Neutral', 'Agree', 'Strongly Agree']

     

     

    Python Programming
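    @pipe
    @symbolic_evaluation()
    def qf_labels(df,serieses):
        # Convert each numeric q column to a category and attach the Likert labels.
        result = []
    
        for series in serieses:
            result.append(df[series].astype('category').cat.rename_categories(myQlabels))
        return pd.DataFrame(result).T
    
    # Apply to the numeric copies in myQFvars; withmooc's q1f-q4f already hold labels.
    myQFvars = myQFvars >> qf_labels(myQFnames)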

     

    Python Programming
    myQFvars

     

    Results
    	q1f			q2f			q3f			q4f
    0	Strongly Disagree	Strongly Disagree	Strongly Agree		Strongly Disagree
    1	Disagree		Strongly Disagree	Agree			Strongly Disagree
    2	Disagree		Disagree		Agree			Neutral
    3	Neutral			Strongly Disagree	NaN			Neutral
    4	Agree			Strongly Agree		Disagree		Agree
    5	Strongly Agree		Agree			Strongly Agree		Strongly Agree
    6	Strongly Agree		Neutral			Agree			Agree
    7	Agree			Strongly Agree		Strongly Agree		Strongly Agree
     

     

    Python Programming
    @pipe
    @symbolic_evaluation()
    def qf_counts(df,serieses):
        result = []
    
        for series in serieses:
            result.append(df[series].value_counts())
        return pd.DataFrame(result).T
    
    myQFvars >> qf_counts(['q1f','q2f','q3f','q4f'])

     

    Results
    			q1f	q2f	q3f	q4f
    Strongly Agree		2.0	2.0	3.0	2.0
    Disagree		2.0	1.0	1.0	NaN
    Agree			2.0	1.0	3.0	2.0
    Strongly Disagree	1.0	3.0	NaN	2.0
    Neutral			1.0	1.0	NaN	2.0
     

     

    Python Programming
    # Drop the existing q1f-q4f columns first so that binding does not create duplicate column names.
    both = withmooc.drop(columns=myQFnames) >> bind_cols(myQFvars)
    both

     

    Results
    	id	workshop	gender	q1	q2	q3	q4	genderF	genderNums	genderFNums	q1f			q2f			q3f		q4f
    0	1	R		f	1	1	5.0	1	female	0		0		Strongly Disagree	Strongly Disagree	Strongly Agree	Strongly Disagree
    1	2	SAS		f	2	1	4.0	1	female	0		0		Disagree		Strongly Disagree	Agree		Strongly Disagree
    2	3	R		f	2	2	4.0	3	female	0		0		Disagree		Disagree		Agree		Neutral
    3	4	SAS		f	3	1	NaN	3	female	0		0		Neutral			Strongly Disagree	NaN		Neutral
    4	5	R		m	4	5	2.0	4	male	1		1		Agree			Strongly Agree		Disagree	Agree
    5	6	SAS		m	5	4	5.0	5	male	1		1		Strongly Agree		Agree			Strongly Agree	Strongly Agree
    6	7	R		m	5	3	4.0	4	male	1		1		Strongly Agree		Neutral			Agree		Agree
    7	8	SAS		m	4	5	5.0	5	male	1		1		Agree			Strongly Agree		Strongly Agree	Strongly Agree

     



     

    Statistical program comparison series (Proc SQL, SAS, SPSS, R programming, R Tidyverse, Python Pandas, Python Dfply)
    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] Oracle functions compared with R & Python: dictionary of links
    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]
    Table data processing methods illustrated with the SQL EMP example: list of links