본문 바로가기
통계프로그램 비교 시리즈/데이터 전처리 비교

[데이터 관리] 7. 변수 Keeping과 Dropping

by 기서무나구물 2022. 1. 10.

[데이터 관리] 7. 변수 Keeping과 Dropping

 


1. Proc SQL

 

  • 변수 Keeping과 Droping을 위한 PROC SQL 프로그램.
SAS Programming
* Select 문에서 변수 선택;

proc sql;
  create table withmooc as
    select id,
           workshop,
           gender,
           q1,
           q2
    from   BACK.mydata a;

  select * from withmooc;
quit;



* 입력데이터의 변수 선택; 

proc sql;
  create table withmooc as
    select a.*
    from   BACK.mydata(KEEP =id workshop gender q1 q2) a;

  select * from withmooc;
quit;



* 출력 데이터의 변수 선택;

proc sql;
  create table withmooc(KEEP =id workshop gender q1 q2) as
    select a.*
    from   BACK.mydata a;

  select * from withmooc;
quit;



* 입력데이터의 변수 삭제;

proc sql;
  create table withmooc as
    select a.*
    from   BACK.mydata a(DROP=q3 q4);

  select * from withmooc;
quit;

 

Results
id  workshop  gender          q1        q2
------------------------------------------
 1         1  f                1         1
 2         2  f                2         1
 3         1  f                2         2
 4         2  f                3         1
 5         1  m                4         5
 6         2  m                5         4
 7         1  m                5         3
 8         2  m                4         5

 


2. SAS Programming

 

keep 함수와 drop 함수

변수 Keeping과 Droping을 위한 SAS프로그램.

SAS Programming
* KEEP 구문을 사용하여 myleft 데이터 세트에 출력하기 위한 변수 선택;

DATA withmooc;
 SET mydata;
     KEEP id workshop gender q1 q2;
RUN;



DATA withmooc;
 SET mydata(KEEP =id workshop gender q1 q2);
RUN;



DATA withmooc(KEEP =id workshop gender q1 q2);
 SET mydata;
RUN;

proc print;run;



* DROP 구문을 사용하여 변수 삭제;

DATA withmooc;
 SET mydata;
     DROP q3 q4;
RUN;

 

Results
OBS    id    workshop    gender    q1    q2
 1      1        1         f        1     1
 2      2        2         f        2     1
 3      3        1         f        2     2
 4      4        2         f        3     1
 5      5        1         m        4     5
 6      6        2         m        5     4
 7      7        1         m        5     3
 8      8        2         m        4     5

 


3. SPSS

delete 함수

변수 Keeping과 Droping을 위한 SPSS 프로그램.;

SPSS Programming
GET FILE='C:\mydata.sav'.
DELETE VARIABLES q3 to q4.
SAVE OUTFILE='C:\myleft.sav'.
EXECUTE.

 


4. R Programming (R-PROJECT)

 

R Programming
from rpy2.robjects import r
%load_ext rpy2.ipython

 

Results
The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

 


 

R Programming
%%R

library(tidyverse)
load(file="C:/work/data/mydata.Rdata")

withmooc = mydata
attach(withmooc) # mydata를 기본 데이터 세트로 지정.

withmooc

 

Results
R[write to console]: The following objects are masked from withmooc (pos = 3):

    gender, q1, q2, q3, q4, workshop




  workshop gender q1 q2 q3 q4
1        1      f  1  1  5  1
2        2      f  2  1  4  1
3        1      f  2  2  4  3
4        2      f  3  1 NA  3
5        1      m  4  5  2  4
6        2      m  5  4  5  5
7        1      m  5  3  4  4
8        2      m  4  5  5  5

 


  • 변수 Keeping 과 Droping을 위한 R-Project 프로그램.;
R Programming
%%R

# 원본 데이터 보호를 위해서 데이터 복사.
mysubset<-withmooc

# q3 과 q4 변수 삭제.
mysubset$q3 <- mysubset$q4 <- NULL

mysubset

 

Results
  workshop gender q1 q2
1        1      f  1  1
2        2      f  2  1
3        1      f  2  2
4        2      f  3  1
5        1      m  4  5
6        2      m  5  4
7        1      m  5  3
8        2      m  4  5

 


5. R - Tidyverse

 

R Programming
from rpy2.robjects import r
%load_ext rpy2.ipython

 

Results
The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

 

 

R Programming
%%R

library(tidyverse)
load(file="C:/work/data/mydata.Rdata")

withmooc = mydata

attach(withmooc) # mydata를 기본 데이터 세트로 지정.

withmooc

 

Results
R[write to console]: The following objects are masked from withmooc (pos = 3):

    gender, q1, q2, q3, q4, workshop


R[write to console]: The following objects are masked from withmooc (pos = 4):

    gender, q1, q2, q3, q4, workshop




  workshop gender q1 q2 q3 q4
1        1      f  1  1  5  1
2        2      f  2  1  4  1
3        1      f  2  2  4  3
4        2      f  3  1 NA  3
5        1      m  4  5  2  4
6        2      m  5  4  5  5
7        1      m  5  3  4  4
8        2      m  4  5  5  5

 


select() 함수

 

R Programming
%%R

withmooc %>%
  dplyr::select(q1,q2,q3,q4)

 

Results
  q1 q2 q3 q4
1  1  1  5  1
2  2  1  4  1
3  2  2  4  3
4  3  1 NA  3
5  4  5  2  4
6  5  4  5  5
7  5  3  4  4
8  4  5  5  5

 

 

R Programming
%%R

withmooc %>%
  dplyr::select(-q3,-q4)

 

Results
  workshop gender q1 q2
1        1      f  1  1
2        2      f  2  1
3        1      f  2  2
4        2      f  3  1
5        1      m  4  5
6        2      m  5  4
7        1      m  5  3
8        2      m  4  5

 

 

R Programming
%%R

withmooc = mydata

withmooc %>%
  dplyr::select(-(5:6))

 

Results
  workshop gender q1 q2
1        1      f  1  1
2        2      f  2  1
3        1      f  2  2
4        2      f  3  1
5        1      m  4  5
6        2      m  5  4
7        1      m  5  3
8        2      m  4  5

 


6. Python - Pandas

 

Python Programming
import pandas as pd
import numpy as np
import sweetviz as sv

mydata   = pd.read_csv("C:/work/data/mydata.csv",sep=",",
                       dtype={'id':object,'workshop':object,
                              'q1':int, 'q2':int, 'q3':float, 'q4':int},
                       na_values=['NaN'],skipinitialspace =True)

withmooc= mydata.copy()

withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4
0	1	1		f	1	1	5.0	1
1	2	2		f	2	1	4.0	1
2	3	1		f	2	2	4.0	3
3	4	2		f	3	1	NaN	3
4	5	1		m	4	5	2.0	4
5	6	2		m	5	4	5.0	5
6	7	1		m	5	3	4.0	4
7	8	2		m	4	5	5.0	5
 

변수 Keeping 과 Droping을 위한 Pandas프로그램

  • del 함수

Del함수를 이용하여 q3와 q4변수를 삭제

hon Programming
del withmooc['q3']
del withmooc['q4']
withmooc

 

Results
	id	workshop	gender	q1	q2
0	1	1		f	1	1
1	2	2		f	2	1
2	3	1		f	2	2
3	4	2		f	3	1
4	5	1		m	4	5
5	6	2		m	5	4
6	7	1		m	5	3
7	8	2		m	4	5

 


  • pop 함수

Pop함수를 이용하여 q3와 q4변수를 삭제.

Python Programming
withmooc= mydata.copy()
withmooc

mydata_q3 =  withmooc.pop('q3') # 삭제된 데이터를 별도로 반환하여 저장 가능
# mydata1.pop('guys')
print(mydata_q3)

mydata_q4 =  withmooc.pop('q4') # 삭제된 데이터를 별도로 반환하여 저장 가능
# mydata1.pop('gals') 
print(mydata_q4)

withmooc

 

Results
0    5.0
1    4.0
2    4.0
3    NaN
4    2.0
5    5.0
6    4.0
7    5.0
Name: q3, dtype: float64


0    1
1    1
2    3
3    3
4    4
5    5
6    4
7    5
Name: q4, dtype: int32

 

Results
	id	workshop	gender	q1	q2
0	1	1		f	1	1
1	2	2		f	2	1
2	3	1		f	2	2
3	4	2		f	3	1
4	5	1		m	4	5
5	6	2		m	5	4
6	7	1		m	5	3
7	8	2		m	4	5

 


  • Drop 구문
Python Programming
withmooc= mydata.copy()
withmooc

withmooc.drop(['q3','q4'], axis='columns', inplace=True)
withmooc

 

Results
	id	workshop	gender	q1	q2
0	1	1		f	1	1
1	2	2		f	2	1
2	3	1		f	2	2
3	4	2		f	3	1
4	5	1		m	4	5
5	6	2		m	5	4
6	7	1		m	5	3
7	8	2		m	4	5
 

 

 

Python Programming
withmooc= mydata.copy()
withmooc

withmooc = withmooc.drop(['q3','q4'], axis=1)
withmooc

 

Results
	id	workshop	gender	q1	q2
0	1	1		f	1	1
1	2	2		f	2	1
2	3	1		f	2	2
3	4	2		f	3	1
4	5	1		m	4	5
5	6	2		m	5	4
6	7	1		m	5	3
7	8	2		m	4	5
 
 
  • 열 번호 지정하여 drop 사용하기
Python Programming
withmooc= mydata.copy()
withmooc

withmooc.drop(withmooc.columns[[5, 6]], axis='columns', inplace=True)
withmooc

 

Results
	id	workshop	gender	q1	q2
0	1	1		f	1	1
1	2	2		f	2	1
2	3	1		f	2	2
3	4	2		f	3	1
4	5	1		m	4	5
5	6	2		m	5	4
6	7	1		m	5	3
7	8	2		m	4	5

 


7. Python - dfply

 

Python Programming
import pandas as pd
from dfply import *

mydata   = pd.read_csv("c:/work/data/mydata.csv",sep=",",
                  dtype={'id':object,'workshop':object,
                                                   'q1':int, 'q2':int, 'q3':float, 'q4':int},
                  na_values=['NaN'],skipinitialspace =True)


withmooc= mydata.copy()

# 모든 변수 선택하기.
withmooc

 

Results
	id	workshop	gender	q1	q2	q3	q4
0	1	1		f	1	1	5.0	1
1	2	2		f	2	1	4.0	1
2	3	1		f	2	2	4.0	3
3	4	2		f	3	1	NaN	3
4	5	1		m	4	5	2.0	4
5	6	2		m	5	4	5.0	5
6	7	1		m	5	3	4.0	4
7	8	2		m	4	5	5.0	5
 

 

Python Programming
withmooc >> select(~X.q3,~X.q4)

 

Results
	id	workshop	gender	q1	q2
0	1	1		f	1	1
1	2	2		f	2	1
2	3	1		f	2	2
3	4	2		f	3	1
4	5	1		m	4	5
5	6	2		m	5	4
6	7	1		m	5	3
7	8	2		m	4	5
 

 

  • drop() 함수
Python Programming
withmooc= mydata.copy()

# 모든 변수 선택하기.
withmooc

withmooc >> drop(X.q3,X.q4)

 

Results
	id	workshop	gender	q1	q2
0	1	1		f	1	1
1	2	2		f	2	1
2	3	1		f	2	2
3	4	2		f	3	1
4	5	1		m	4	5
5	6	2		m	5	4
6	7	1		m	5	3
7	8	2		m	4	5
 

 

 


 

통계프로그램 비교 목록(Proc sql, SAS, SPSS, R 프로그래밍, R Tidyverse, Python Pandas, Python Dfply)
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]
SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크
반응형

댓글