본문 바로가기
통계프로그램 비교 시리즈/데이터 전처리 비교

[데이터 관리] 7. 변수 Keeping과 Dropping

by 기서무나구물 2022. 1. 10.

포스팅 목차

    [데이터 관리] 7. 변수 Keeping과 Dropping

     


    1. Proc SQL

     

    • 변수 Keeping과 Droping을 위한 PROC SQL 프로그램.
    SAS Programming
    * Select 문에서 변수 선택;
    
    proc sql;
      create table withmooc as
        select id,
               workshop,
               gender,
               q1,
               q2
        from   BACK.mydata a;
    
      select * from withmooc;
    quit;
    
    
    
    * 입력데이터의 변수 선택; 
    
    proc sql;
      create table withmooc as
        select a.*
        from   BACK.mydata(KEEP =id workshop gender q1 q2) a;
    
      select * from withmooc;
    quit;
    
    
    
    * 출력 데이터의 변수 선택;
    
    proc sql;
      create table withmooc(KEEP =id workshop gender q1 q2) as
        select a.*
        from   BACK.mydata a;
    
      select * from withmooc;
    quit;
    
    
    
    * 입력데이터의 변수 삭제;
    
    proc sql;
      create table withmooc as
        select a.*
        from   BACK.mydata a(DROP=q3 q4);
    
      select * from withmooc;
    quit;

     

    Results
    id  workshop  gender          q1        q2
    ------------------------------------------
     1         1  f                1         1
     2         2  f                2         1
     3         1  f                2         2
     4         2  f                3         1
     5         1  m                4         5
     6         2  m                5         4
     7         1  m                5         3
     8         2  m                4         5

     


    2. SAS Programming

     

    keep 함수와 drop 함수

    변수 Keeping과 Droping을 위한 SAS프로그램.

    SAS Programming
    * KEEP 구문을 사용하여 myleft 데이터 세트에 출력하기 위한 변수 선택;
    
    DATA withmooc;
     SET mydata;
         KEEP id workshop gender q1 q2;
    RUN;
    
    
    
    DATA withmooc;
     SET mydata(KEEP =id workshop gender q1 q2);
    RUN;
    
    
    
    DATA withmooc(KEEP =id workshop gender q1 q2);
     SET mydata;
    RUN;
    
    proc print;run;
    
    
    
    * DROP 구문을 사용하여 변수 삭제;
    
    DATA withmooc;
     SET mydata;
         DROP q3 q4;
    RUN;

     

    Results
    OBS    id    workshop    gender    q1    q2
     1      1        1         f        1     1
     2      2        2         f        2     1
     3      3        1         f        2     2
     4      4        2         f        3     1
     5      5        1         m        4     5
     6      6        2         m        5     4
     7      7        1         m        5     3
     8      8        2         m        4     5

     


    3. SPSS

    delete 함수

    변수 Keeping과 Droping을 위한 SPSS 프로그램.;

    SPSS Programming
    GET FILE='C:\mydata.sav'.
    DELETE VARIABLES q3 to q4.
    SAVE OUTFILE='C:\myleft.sav'.
    EXECUTE.

     


    4. R Programming (R-PROJECT)

     

    R Programming
    from rpy2.robjects import r
    %load_ext rpy2.ipython

     

    Results
    The rpy2.ipython extension is already loaded. To reload it, use:
      %reload_ext rpy2.ipython

     


     

    R Programming
    %%R
    
    library(tidyverse)
    load(file="C:/work/data/mydata.Rdata")
    
    withmooc = mydata
    attach(withmooc) # mydata를 기본 데이터 세트로 지정.
    
    withmooc

     

    Results
    R[write to console]: The following objects are masked from withmooc (pos = 3):
    
        gender, q1, q2, q3, q4, workshop
    
    
    
    
      workshop gender q1 q2 q3 q4
    1        1      f  1  1  5  1
    2        2      f  2  1  4  1
    3        1      f  2  2  4  3
    4        2      f  3  1 NA  3
    5        1      m  4  5  2  4
    6        2      m  5  4  5  5
    7        1      m  5  3  4  4
    8        2      m  4  5  5  5

     


    • 변수 Keeping 과 Droping을 위한 R-Project 프로그램.;
    R Programming
    %%R
    
    # 원본 데이터 보호를 위해서 데이터 복사.
    mysubset<-withmooc
    
    # q3 과 q4 변수 삭제.
    mysubset$q3 <- mysubset$q4 <- NULL
    
    mysubset

     

    Results
      workshop gender q1 q2
    1        1      f  1  1
    2        2      f  2  1
    3        1      f  2  2
    4        2      f  3  1
    5        1      m  4  5
    6        2      m  5  4
    7        1      m  5  3
    8        2      m  4  5

     


    5. R - Tidyverse

     

    R Programming
    from rpy2.robjects import r
    %load_ext rpy2.ipython

     

    Results
    The rpy2.ipython extension is already loaded. To reload it, use:
      %reload_ext rpy2.ipython

     

     

    R Programming
    %%R
    
    library(tidyverse)
    load(file="C:/work/data/mydata.Rdata")
    
    withmooc = mydata
    
    attach(withmooc) # mydata를 기본 데이터 세트로 지정.
    
    withmooc

     

    Results
    R[write to console]: The following objects are masked from withmooc (pos = 3):
    
        gender, q1, q2, q3, q4, workshop
    
    
    R[write to console]: The following objects are masked from withmooc (pos = 4):
    
        gender, q1, q2, q3, q4, workshop
    
    
    
    
      workshop gender q1 q2 q3 q4
    1        1      f  1  1  5  1
    2        2      f  2  1  4  1
    3        1      f  2  2  4  3
    4        2      f  3  1 NA  3
    5        1      m  4  5  2  4
    6        2      m  5  4  5  5
    7        1      m  5  3  4  4
    8        2      m  4  5  5  5

     


    select() 함수

     

    R Programming
    %%R
    
    withmooc %>%
      dplyr::select(q1,q2,q3,q4)

     

    Results
      q1 q2 q3 q4
    1  1  1  5  1
    2  2  1  4  1
    3  2  2  4  3
    4  3  1 NA  3
    5  4  5  2  4
    6  5  4  5  5
    7  5  3  4  4
    8  4  5  5  5

     

     

    R Programming
    %%R
    
    withmooc %>%
      dplyr::select(-q3,-q4)

     

    Results
      workshop gender q1 q2
    1        1      f  1  1
    2        2      f  2  1
    3        1      f  2  2
    4        2      f  3  1
    5        1      m  4  5
    6        2      m  5  4
    7        1      m  5  3
    8        2      m  4  5

     

     

    R Programming
    %%R
    
    withmooc = mydata
    
    withmooc %>%
      dplyr::select(-(5:6))

     

    Results
      workshop gender q1 q2
    1        1      f  1  1
    2        2      f  2  1
    3        1      f  2  2
    4        2      f  3  1
    5        1      m  4  5
    6        2      m  5  4
    7        1      m  5  3
    8        2      m  4  5

     


    6. Python - Pandas

     

    Python Programming
    import pandas as pd
    import numpy as np
    import sweetviz as sv
    
    mydata   = pd.read_csv("C:/work/data/mydata.csv",sep=",",
                           dtype={'id':object,'workshop':object,
                                  'q1':int, 'q2':int, 'q3':float, 'q4':int},
                           na_values=['NaN'],skipinitialspace =True)
    
    withmooc= mydata.copy()
    
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	1		f	1	1	5.0	1
    1	2	2		f	2	1	4.0	1
    2	3	1		f	2	2	4.0	3
    3	4	2		f	3	1	NaN	3
    4	5	1		m	4	5	2.0	4
    5	6	2		m	5	4	5.0	5
    6	7	1		m	5	3	4.0	4
    7	8	2		m	4	5	5.0	5
     

    변수 Keeping 과 Droping을 위한 Pandas프로그램

    • del 함수

    Del함수를 이용하여 q3와 q4변수를 삭제

    hon Programming
    del withmooc['q3']
    del withmooc['q4']
    withmooc

     

    Results
    	id	workshop	gender	q1	q2
    0	1	1		f	1	1
    1	2	2		f	2	1
    2	3	1		f	2	2
    3	4	2		f	3	1
    4	5	1		m	4	5
    5	6	2		m	5	4
    6	7	1		m	5	3
    7	8	2		m	4	5

     


    • pop 함수

    Pop함수를 이용하여 q3와 q4변수를 삭제.

    Python Programming
    withmooc= mydata.copy()
    withmooc
    
    mydata_q3 =  withmooc.pop('q3') # 삭제된 데이터를 별도로 반환하여 저장 가능
    # mydata1.pop('guys')
    print(mydata_q3)
    
    mydata_q4 =  withmooc.pop('q4') # 삭제된 데이터를 별도로 반환하여 저장 가능
    # mydata1.pop('gals') 
    print(mydata_q4)
    
    withmooc

     

    Results
    0    5.0
    1    4.0
    2    4.0
    3    NaN
    4    2.0
    5    5.0
    6    4.0
    7    5.0
    Name: q3, dtype: float64
    
    
    0    1
    1    1
    2    3
    3    3
    4    4
    5    5
    6    4
    7    5
    Name: q4, dtype: int32

     

    Results
    	id	workshop	gender	q1	q2
    0	1	1		f	1	1
    1	2	2		f	2	1
    2	3	1		f	2	2
    3	4	2		f	3	1
    4	5	1		m	4	5
    5	6	2		m	5	4
    6	7	1		m	5	3
    7	8	2		m	4	5

     


    • Drop 구문
    Python Programming
    withmooc= mydata.copy()
    withmooc
    
    withmooc.drop(['q3','q4'], axis='columns', inplace=True)
    withmooc

     

    Results
    	id	workshop	gender	q1	q2
    0	1	1		f	1	1
    1	2	2		f	2	1
    2	3	1		f	2	2
    3	4	2		f	3	1
    4	5	1		m	4	5
    5	6	2		m	5	4
    6	7	1		m	5	3
    7	8	2		m	4	5
     

     

     

    Python Programming
    withmooc= mydata.copy()
    withmooc
    
    withmooc = withmooc.drop(['q3','q4'], axis=1)
    withmooc

     

    Results
    	id	workshop	gender	q1	q2
    0	1	1		f	1	1
    1	2	2		f	2	1
    2	3	1		f	2	2
    3	4	2		f	3	1
    4	5	1		m	4	5
    5	6	2		m	5	4
    6	7	1		m	5	3
    7	8	2		m	4	5
     
     
    • 열 번호 지정하여 drop 사용하기
    Python Programming
    withmooc= mydata.copy()
    withmooc
    
    withmooc.drop(withmooc.columns[[5, 6]], axis='columns', inplace=True)
    withmooc

     

    Results
    	id	workshop	gender	q1	q2
    0	1	1		f	1	1
    1	2	2		f	2	1
    2	3	1		f	2	2
    3	4	2		f	3	1
    4	5	1		m	4	5
    5	6	2		m	5	4
    6	7	1		m	5	3
    7	8	2		m	4	5

     


    7. Python - dfply

     

    Python Programming
    import pandas as pd
    from dfply import *
    
    mydata   = pd.read_csv("c:/work/data/mydata.csv",sep=",",
                      dtype={'id':object,'workshop':object,
                                                       'q1':int, 'q2':int, 'q3':float, 'q4':int},
                      na_values=['NaN'],skipinitialspace =True)
    
    
    withmooc= mydata.copy()
    
    # 모든 변수 선택하기.
    withmooc

     

    Results
    	id	workshop	gender	q1	q2	q3	q4
    0	1	1		f	1	1	5.0	1
    1	2	2		f	2	1	4.0	1
    2	3	1		f	2	2	4.0	3
    3	4	2		f	3	1	NaN	3
    4	5	1		m	4	5	2.0	4
    5	6	2		m	5	4	5.0	5
    6	7	1		m	5	3	4.0	4
    7	8	2		m	4	5	5.0	5
     

     

    Python Programming
    withmooc >> select(~X.q3,~X.q4)

     

    Results
    	id	workshop	gender	q1	q2
    0	1	1		f	1	1
    1	2	2		f	2	1
    2	3	1		f	2	2
    3	4	2		f	3	1
    4	5	1		m	4	5
    5	6	2		m	5	4
    6	7	1		m	5	3
    7	8	2		m	4	5
     

     

    • drop() 함수
    Python Programming
    withmooc= mydata.copy()
    
    # 모든 변수 선택하기.
    withmooc
    
    withmooc >> drop(X.q3,X.q4)

     

    Results
    	id	workshop	gender	q1	q2
    0	1	1		f	1	1
    1	2	2		f	2	1
    2	3	1		f	2	2
    3	4	2		f	3	1
    4	5	1		m	4	5
    5	6	2		m	5	4
    6	7	1		m	5	3
    7	8	2		m	4	5
     

     

     


     

    통계프로그램 비교 목록(Proc sql, SAS, SPSS, R 프로그래밍, R Tidyverse, Python Pandas, Python Dfply)
    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크
    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]
    SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크
    반응형

    댓글