
15. Variable Labels

by 기서무나구물 2022. 1. 19.

    Statistical Program Comparison Series: 15. Variable Labels

     


    1. Proc SQL

     

    SAS Programming
    proc sql;
      select id,
             workshop ,
             gender   ,
             q1       label='The instructor was well prepared.',
             q2       label='The instructor communicated well.',
             q3       label='The course materials were helpful.',
             q4       label='Overall, I found this workshop useful.'
      from   BACK.mydata;
    
    quit;

     

    Results
                                                         The  Overall,
                                   The           The     course   I found
                            instructor    instructor  materials      this
                              was well  communicated       were  workshop
    id  workshop  gender     prepared.         well.   helpful.   useful.
    ---------------------------------------------------------------------
     1         1  f                  1             1          5         1
     2         2  f                  2             1          4         1
     3         1  f                  2             2          4         3
     4         2  f                  3             1          .         3
     5         1  m                  4             5          2         4
     6         2  m                  5             4          5         5
     7         1  m                  5             3          4         4
     8         2  m                  4             5          5         5
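In PROC SQL, `label=` only changes the column heading in the printed result; the stored variable is still named `q1`, `q2`, and so on. As a rough Python analogue (a sketch, not SAS behavior itself), the hypothetical `LABELS` dictionary below maps each variable name to its label, and the hypothetical helper `header_for` falls back to the variable name when no label is defined. Missing values are printed as `.` to match the SAS listing.

```python
# Hypothetical label table: variable name -> display label (mirrors the label= clauses above)
LABELS = {
    "q1": "The instructor was well prepared.",
    "q2": "The instructor communicated well.",
    "q3": "The course materials were helpful.",
    "q4": "Overall, I found this workshop useful.",
}

COLUMNS = ["id", "workshop", "gender", "q1", "q2", "q3", "q4"]


def header_for(name):
    """Return the label if one is defined, otherwise the variable name itself."""
    return LABELS.get(name, name)


def print_with_labels(rows):
    """Print rows using labels as headings; the dict keys (variable names) are unchanged."""
    print(" | ".join(header_for(c) for c in COLUMNS))
    for row in rows:
        print(" | ".join("." if row[c] is None else str(row[c]) for c in COLUMNS))


if __name__ == "__main__":
    # Two rows copied from the listing above
    sample = [
        {"id": 1, "workshop": 1, "gender": "f", "q1": 1, "q2": 1, "q3": 5, "q4": 1},
        {"id": 4, "workshop": 2, "gender": "f", "q1": 3, "q2": 1, "q3": None, "q4": 3},
    ]
    print_with_labels(sample)
```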

     


    2. SAS Programming

    • SAS program for variable labels.
    SAS Programming
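    DATA mydata;
     SET BACK.mydata;
         LABEL Q1="Question 1"
               Q2="Question 2"
               Q3="Question 3"
               Q4="Question 4";
    run;
    
    /* Name the data set explicitly instead of relying on the last one created */
    PROC FREQ DATA=mydata;
         TABLES q1-q4;
    RUN;

     

    Results
                          Question 1

                                   Cumulative   Cumulative
    q1    Frequency     Percent     Frequency      Percent
    ------------------------------------------------------
     1            1       12.50             1        12.50
     2            2       25.00             3        37.50
     3            1       12.50             4        50.00
     4            2       25.00             6        75.00
     5            2       25.00             8       100.00
    
    
    
                          Question 2

                                   Cumulative   Cumulative
    q2    Frequency     Percent     Frequency      Percent
    ------------------------------------------------------
     1            3       37.50             3        37.50
     2            1       12.50             4        50.00
     3            1       12.50             5        62.50
     4            1       12.50             6        75.00
     5            2       25.00             8       100.00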
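The PROC FREQ columns are simple running arithmetic: percent is frequency / total × 100, and the cumulative columns add up the rows so far. The Python sketch below reproduces the q1 and q2 tables from the raw values in the data listing; the `freq_table` helper is hypothetical, and like PROC FREQ's default it excludes missing values (`None`) from the percentages.

```python
from collections import Counter

# q1 and q2 values copied from the data listing (8 respondents)
Q1 = [1, 2, 2, 3, 4, 5, 5, 4]
Q2 = [1, 1, 2, 1, 5, 4, 3, 5]


def freq_table(values):
    """Return (value, frequency, percent, cumulative frequency, cumulative percent) rows,
    excluding missing values (None) from the denominator."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    total = len(present)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq
        rows.append((value, freq, round(100 * freq / total, 2),
                     cumulative, round(100 * cumulative / total, 2)))
    return rows


if __name__ == "__main__":
    for name, data in (("q1", Q1), ("q2", Q2)):
        print(name)
        for row in freq_table(data):
            print("{:>2} {:>5} {:>8.2f} {:>5} {:>8.2f}".format(*row))
```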

     


    3. SPSS

    • SPSS Program for Variable Labels.
    SPSS Programming
    VARIABLE LABELS
      Q1 "Question 1"
      Q2 "Question 2"
      Q3 "Question 3"
      Q4 "Question 4".
    FREQUENCIES VARIABLES=q1 q2 q3 q4.
    
    EXECUTE.

     


    4. R Programming (R-PROJECT)

     

    Python Programming
    from rpy2.robjects import r
    %load_ext rpy2.ipython

     

    Results
    The rpy2.ipython extension is already loaded. To reload it, use:
      %reload_ext rpy2.ipython

     

     

    R Programming
    %%R
    
    library(tidyverse)
    library(psych)
    library(Hmisc)
    
    mydata <- read_csv("C:/work/data/mydata.csv", 
      col_types = cols( id       = col_double(),
                        workshop = col_character(),
                        gender   = col_character(),
                        q1       = col_double(),
                        q2       = col_double(),
                        q3       = col_double(),
                        q4       = col_double()
      )
    )
    
    withmooc <- mydata
    
    attach(withmooc) # attach withmooc (a copy of mydata) as the default data set
    
    withmooc

     

    Results
    R[write to console]: The following objects are masked from withmooc (pos = 3):
    
        gender, id, q1, q2, q3, q4, workshop
    
    
    
    
    # A tibble: 8 x 7
         id workshop gender    q1    q2    q3    q4
      <dbl> <chr>    <chr>  <dbl> <dbl> <dbl> <dbl>
    1     1 1        f          1     1     5     1
    2     2 2        f          2     1     4     1
    3     3 1        f          2     2     4     3
    4     4 2        f          3     1    NA     3
    5     5 1        m          4     5     2     4
    6     6 2        m          5     4     5     5
    7     7 1        m          5     3     4     4
    8     8 2        m          4     5     5     5

     

    • R program for variable labels.
    • Uses the label() function from Hmisc.
    R Programming
    %%R
    
    label(withmooc$q1) <- "The instructor was well prepared."
    label(withmooc$q2) <- "The instructor communicated well."
    label(withmooc$q3) <- "The course materials were helpful."
    label(withmooc$q4) <- "Overall, I found this workshop useful."
    
    withmooc

     

    Results
    # A tibble: 8 x 7
         id workshop gender q1         q2         q3         q4        
      <dbl> <chr>    <chr>  <labelled> <labelled> <labelled> <labelled>
    1     1 1        f      1          1           5         1         
    2     2 2        f      2          1           4         1         
    3     3 1        f      2          2           4         3         
    4     4 2        f      3          1          NA         3         
    5     5 1        m      4          5           2         4         
    6     6 2        m      5          4           5         5         
    7     7 1        m      5          3           4         4         
    8     8 2        m      4          5           5         5         
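Hmisc's `label()` stores the text as metadata (a `label` attribute, which is why the tibble now shows the `<labelled>` class) and leaves the values themselves unchanged. The sketch below mimics that idea in Python with a hypothetical `LabelledColumn` class; it is an analogy for illustration, not how Hmisc is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelledColumn:
    """Hypothetical stand-in for an R vector that carries a 'label' attribute."""
    values: list
    label: Optional[str] = None


# Two columns copied from the data listing; None plays the role of NA
withmooc = {
    "q1": LabelledColumn([1, 2, 2, 3, 4, 5, 5, 4]),
    "q3": LabelledColumn([5, 4, 4, None, 2, 5, 4, 5]),
}

# Attaching a label only sets metadata; the data values are untouched
withmooc["q1"].label = "The instructor was well prepared."
withmooc["q3"].label = "The course materials were helpful."

for name, column in withmooc.items():
    print(name, column.label, column.values)
```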

     

    • Hmisc's describe() function uses the variable labels, printing each one next to its variable name.
    R Programming
    %%R
    
    Hmisc::describe(withmooc)

     

    Results
    withmooc 
    
     7  Variables      8  Observations
    --------------------------------------------------------------------------------
    id 
           n  missing distinct     Info     Mean      Gmd 
           8        0        8        1      4.5        3 
    
    lowest : 1 2 3 4 5, highest: 4 5 6 7 8
    
    Value          1     2     3     4     5     6     7     8
    Frequency      1     1     1     1     1     1     1     1
    Proportion 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
    --------------------------------------------------------------------------------
    workshop 
           n  missing distinct 
           8        0        2 
    
    Value        1   2
    Frequency    4   4
    Proportion 0.5 0.5
    --------------------------------------------------------------------------------
    gender 
           n  missing distinct 
           8        0        2 
    
    Value        f   m
    Frequency    4   4
    Proportion 0.5 0.5
    --------------------------------------------------------------------------------
    q1 : The instructor was well prepared. 
           n  missing distinct     Info     Mean      Gmd 
           8        0        5    0.964     3.25    1.786 
    
    lowest : 1 2 3 4 5, highest: 1 2 3 4 5
    
    Value          1     2     3     4     5
    Frequency      1     2     1     2     2
    Proportion 0.125 0.250 0.125 0.250 0.250
    --------------------------------------------------------------------------------
    q2 : The instructor communicated well. 
           n  missing distinct     Info     Mean      Gmd 
           8        0        5     0.94     2.75    2.071 
    
    lowest : 1 2 3 4 5, highest: 1 2 3 4 5
    
    Value          1     2     3     4     5
    Frequency      3     1     1     1     2
    Proportion 0.375 0.125 0.125 0.125 0.250
    --------------------------------------------------------------------------------
    q3 : The course materials were helpful. 
           n  missing distinct     Info     Mean      Gmd 
           7        1        3    0.857    4.143    1.143 
    
    Value          2     4     5
    Frequency      1     3     3
    Proportion 0.143 0.429 0.429
    --------------------------------------------------------------------------------
    q4 : Overall, I found this workshop useful. 
           n  missing distinct     Info     Mean      Gmd 
           8        0        4    0.952     3.25    1.857 
    
    Value         1    3    4    5
    Frequency     2    2    2    2
    Proportion 0.25 0.25 0.25 0.25
    --------------------------------------------------------------------------------
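As the output shows, `describe()` prints `name : label` as the heading whenever a label exists, then the counts. A minimal Python sketch of that behavior follows; the `LABELS` and `DATA` dictionaries and the `describe` helper are hypothetical, and the Info and Gmd statistics that Hmisc also reports are deliberately omitted.

```python
from statistics import mean

LABELS = {
    "q1": "The instructor was well prepared.",
    "q3": "The course materials were helpful.",
}

# Values copied from the data listing; None stands in for NA
DATA = {
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q3": [5, 4, 4, None, 2, 5, 4, 5],
}


def describe(name):
    """Return a small Hmisc-style summary: heading (name : label), n, missing, distinct, mean."""
    values = DATA[name]
    present = [v for v in values if v is not None]
    heading = f"{name} : {LABELS[name]}" if name in LABELS else name
    return {
        "heading": heading,
        "n": len(present),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "mean": mean(present),
    }


if __name__ == "__main__":
    for var in DATA:
        d = describe(var)
        print(d["heading"])
        print(f"  n={d['n']} missing={d['missing']} distinct={d['distinct']} mean={d['mean']:.3f}")
```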

     

    • The summary() function ignores the labels.
    R Programming
    %%R
    
    summary(withmooc)

     

    Results
           id         workshop            gender                q1      
     Min.   :1.00   Length:8           Length:8           Min.   :1.00  
     1st Qu.:2.75   Class :character   Class :character   1st Qu.:2.00  
     Median :4.50   Mode  :character   Mode  :character   Median :3.50  
     Mean   :4.50                                         Mean   :3.25  
     3rd Qu.:6.25                                         3rd Qu.:4.25  
     Max.   :8.00                                         Max.   :5.00  
    
           q2             q3              q4      
     Min.   :1.00   Min.   :2.000   Min.   :1.00  
     1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50  
     Median :2.50   Median :4.000   Median :3.50  
     Mean   :2.75   Mean   :4.143   Mean   :3.25  
     3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25  
     Max.   :5.00   Max.   :5.000   Max.   :5.00  
                    NA's   :1                     

     

    • To make the labels show up in output, assign them as long variable names instead.
    R Programming
    %%R
    
    names(withmooc) <- c("id","Workshop","Gender",
                       "The instructor was well prepared.",
                       "The instructor communicated well.",
                       "The course materials were helpful.",
                       "Overall, I found this workshop useful.")
    
    names(withmooc)

     

    Results
    [1] "id"                                    
    [2] "Workshop"                              
    [3] "Gender"                                
    [4] "The instructor was well prepared."     
    [5] "The instructor communicated well."     
    [6] "The course materials were helpful."    
    [7] "Overall, I found this workshop useful."
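`names(withmooc) <- c(...)` replaces every column name by position. A comparable Python sketch, assuming the data is held as an ordered dict of columns (dicts keep insertion order in Python 3.7+), is shown below; the `rename_all` helper is hypothetical and, as a design choice of this sketch, rejects a name list whose length does not match the number of columns.

```python
# Columns copied from the data listing; None stands in for NA
withmooc = {
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "workshop": ["1", "2", "1", "2", "1", "2", "1", "2"],
    "gender": ["f", "f", "f", "f", "m", "m", "m", "m"],
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, None, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
}

LONG_NAMES = [
    "id", "Workshop", "Gender",
    "The instructor was well prepared.",
    "The instructor communicated well.",
    "The course materials were helpful.",
    "Overall, I found this workshop useful.",
]


def rename_all(table, new_names):
    """Replace every column name by position, keeping the column values as they are."""
    if len(new_names) != len(table):
        raise ValueError("expected exactly one new name per column")
    return dict(zip(new_names, table.values()))


renamed = rename_all(withmooc, LONG_NAMES)
for position, name in enumerate(renamed, start=1):
    print(f"[{position}] {name!r}")
```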

     

    • Again, assign long variable names so they act as labels.
    • The previous example renamed every variable; here only some of them (columns 4 to 7) are renamed.
    R Programming
    %%R
    
    names(withmooc)[4:7] <- c("The instructor was well prepared.",
                            "The instructor communicated well.",
                            "The course materials were helpful.",
                            "Overall, I found this workshop useful.")
    
    names(withmooc)

     

    Results
    [1] "id"                                    
    [2] "Workshop"                              
    [3] "Gender"                                
    [4] "The instructor was well prepared."     
    [5] "The instructor communicated well."     
    [6] "The course materials were helpful."    
    [7] "Overall, I found this workshop useful."
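The partial assignment `names(withmooc)[4:7] <- ...` has a direct Python counterpart in slice assignment. The one detail to watch is indexing: R positions are 1-based and `4:7` includes both ends, while Python is 0-based with half-open slices, so the same four columns are `names[3:7]`. The `names` list below is a hypothetical starting point using the original short names.

```python
names = ["id", "workshop", "gender", "q1", "q2", "q3", "q4"]

# R's names(withmooc)[4:7] (1-based, inclusive) is names[3:7] in Python (0-based, half-open)
names[3:7] = [
    "The instructor was well prepared.",
    "The instructor communicated well.",
    "The course materials were helpful.",
    "Overall, I found this workshop useful.",
]

for position, name in enumerate(names, start=1):
    print(f"[{position}] {name!r}")
```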

     

    • Now summary() displays the long variable names.
    R Programming
    %%R
    
    summary(withmooc)

     

    Results
           id         Workshop            Gender         
     Min.   :1.00   Length:8           Length:8          
     1st Qu.:2.75   Class :character   Class :character  
     Median :4.50   Mode  :character   Mode  :character  
     Mean   :4.50                                        
     3rd Qu.:6.25                                        
     Max.   :8.00                                        
    
     The instructor was well prepared. The instructor communicated well.
     Min.   :1.00                      Min.   :1.00                     
     1st Qu.:2.00                      1st Qu.:1.00                     
     Median :3.50                      Median :2.50                     
     Mean   :3.25                      Mean   :2.75                     
     3rd Qu.:4.25                      3rd Qu.:4.25                     
     Max.   :5.00                      Max.   :5.00                     
    
     The course materials were helpful. Overall, I found this workshop useful.
     Min.   :2.000                      Min.   :1.00                          
     1st Qu.:4.000                      1st Qu.:2.50                          
     Median :4.000                      Median :3.50                          
     Mean   :4.143                      Mean   :3.25                          
     3rd Qu.:5.000                      3rd Qu.:4.25                          
     Max.   :5.000                      Max.   :5.00                          
     NA's   :1                                                                

     

    • A variable can be selected by its (long) name.
    R Programming
    %%R
    
    summary( withmooc["Overall, I found this workshop useful."] )

     

    Results
     Overall, I found this workshop useful.
     Min.   :1.00                          
     1st Qu.:2.50                          
     Median :3.50                          
     Mean   :3.25                          
     3rd Qu.:4.25                          
     Max.   :5.00                          
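For a numeric column, R's `summary()` reports the minimum, the quartiles, the mean, the maximum and, if present, the NA count. The Python sketch below selects a column by its long name and recomputes those numbers as a cross-check. It relies on R's default quantile definition (type 7), which `statistics.quantiles(..., method="inclusive")` reproduces; the `r_summary` helper is hypothetical.

```python
from statistics import mean, quantiles

# Columns keyed by their long names; values copied from the data listing
withmooc = {
    "The instructor was well prepared.": [1, 2, 2, 3, 4, 5, 5, 4],
    "The course materials were helpful.": [5, 4, 4, None, 2, 5, 4, 5],
    "Overall, I found this workshop useful.": [1, 1, 3, 3, 4, 5, 4, 5],
}


def r_summary(values):
    """Six-number summary in the style of R's summary(), plus the NA count when nonzero."""
    present = sorted(v for v in values if v is not None)
    q1, median, q3 = quantiles(present, n=4, method="inclusive")
    result = {"Min.": present[0], "1st Qu.": q1, "Median": median,
              "Mean": mean(present), "3rd Qu.": q3, "Max.": present[-1]}
    missing = len(values) - len(present)
    if missing:
        result["NA's"] = missing
    return result


# Select a column by its long name, as withmooc["Overall, ..."] does in R
print(r_summary(withmooc["Overall, I found this workshop useful."]))
```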

     

    • Selecting by index is more convenient.
    R Programming
    %%R
    
    summary(withmooc[4:7])

     

    Results
     The instructor was well prepared. The instructor communicated well.
     Min.   :1.00                      Min.   :1.00                     
     1st Qu.:2.00                      1st Qu.:1.00                     
     Median :3.50                      Median :2.50                     
     Mean   :3.25                      Mean   :2.75                     
     3rd Qu.:4.25                      3rd Qu.:4.25                     
     Max.   :5.00                      Max.   :5.00                     
    
     The course materials were helpful. Overall, I found this workshop useful.
     Min.   :2.000                      Min.   :1.00                          
     1st Qu.:4.000                      1st Qu.:2.50                          
     Median :4.000                      Median :3.50                          
     Mean   :4.143                      Mean   :3.25                          
     3rd Qu.:5.000                      3rd Qu.:4.25                          
     Max.   :5.000                      Max.   :5.00                          
     NA's   :1                                                                
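Selecting `withmooc[4:7]` picks columns by position rather than by name. A Python sketch of the same idea, assuming a dict of columns in their original order, is below; the hypothetical `select_by_position` helper accepts R-style 1-based, inclusive positions and converts them to a Python slice internally.

```python
# Columns keyed by the long names from the renaming step; values copied from the data listing
withmooc = {
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "Workshop": ["1", "2", "1", "2", "1", "2", "1", "2"],
    "Gender": ["f", "f", "f", "f", "m", "m", "m", "m"],
    "The instructor was well prepared.": [1, 2, 2, 3, 4, 5, 5, 4],
    "The instructor communicated well.": [1, 1, 2, 1, 5, 4, 3, 5],
    "The course materials were helpful.": [5, 4, 4, None, 2, 5, 4, 5],
    "Overall, I found this workshop useful.": [1, 1, 3, 3, 4, 5, 4, 5],
}


def select_by_position(table, start, stop):
    """Select columns start..stop given as R-style 1-based, inclusive positions."""
    keys = list(table)[start - 1:stop]
    return {key: table[key] for key in keys}


subset = select_by_position(withmooc, 4, 7)
print(list(subset))
```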

     

    • The grep() function can search the column names for a string.
    R Programming
    %%R
    
    myvars<-grep('instructor',names(withmooc))

     

    • Print the positions of the variables whose names contain 'instructor' (columns 4 and 5).
    R Programming
    %%R
    
    myvars

     

    Results
    [1] 4 5

     

     

    R Programming
    %%R
    
    summary ( withmooc[myvars] )

     

    Results
     The instructor was well prepared. The instructor communicated well.
     Min.   :1.00                      Min.   :1.00                     
     1st Qu.:2.00                      1st Qu.:1.00                     
     Median :3.50                      Median :2.50                     
     Mean   :3.25                      Mean   :2.75                     
     3rd Qu.:4.25                      3rd Qu.:4.25                     
     Max.   :5.00                      Max.   :5.00                     
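R's `grep(pattern, x)` returns the 1-based positions of the elements of `x` that match a regular expression (case-sensitive by default), which is why `myvars` holds 4 and 5. A Python sketch of the same lookup follows; the `grep` helper is a hypothetical stand-in that uses `re.search` and numbers positions from 1 so the result lines up with R.

```python
import re

names = [
    "id", "Workshop", "Gender",
    "The instructor was well prepared.",
    "The instructor communicated well.",
    "The course materials were helpful.",
    "Overall, I found this workshop useful.",
]


def grep(pattern, strings):
    """Return 1-based positions of the strings matching the regex, like R's grep()."""
    return [i for i, s in enumerate(strings, start=1) if re.search(pattern, s)]


myvars = grep("instructor", names)
print(myvars)

# Use the positions to pick the matching names (convert back to 0-based for Python)
print([names[i - 1] for i in myvars])
```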

     


    5. R - Tidyverse

     


    6. Python - Pandas

     


    7. Python - dfply

     


     


     

    Statistical program comparison list (Proc sql, SAS, SPSS, R programming, R Tidyverse, Python Pandas, Python Dfply)
    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] Link to the dictionary comparing Oracle functions with R & Python
    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]
    Link to the list of table data processing methods, explored through the SQL EMP example

     
