Post contents
Statistical Software Comparison Series – 15. Variable Labels
1. Proc SQL
SAS Programming |
proc sql;
select id,
workshop ,
gender ,
q1 label='The instructor was well prepared.',
q2 label='The instructor communicated well.',
q3 label='The course materials were helpful.',
q4 label='Overall, I found this workshop useful.'
from BACK.mydata;
quit;
Results |
                                                      The  Overall,
                             The           The     course   I found
                      instructor    instructor  materials      this
                        was well  communicated       were  workshop
id  workshop  gender   prepared.         well.   helpful.   useful.
-------------------------------------------------------------------
 1         1       f           1             1          5         1
 2         2       f           2             1          4         1
 3         1       f           2             2          4         3
 4         2       f           3             1          .         3
 5         1       m           4             5          2         4
 6         2       m           5             4          5         5
 7         1       m           5             3          4         4
 8         2       m           4             5          5         5
2. SAS Programming
- SAS Program for Variable Labels;
SAS Programming |
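/* Copy the permanent data set and attach a label to each question. */
/* The label text "문제1" ... "문제4" means "Question 1" ... "Question 4". */
DATA mydata;
  SET BACK.mydata;
  LABEL Q1="문제1"
        Q2="문제2"
        Q3="문제3"
        Q4="문제4";
RUN;
/* PROC FREQ prints each variable's label above its frequency table. */
PROC FREQ DATA=mydata;
  TABLES q1-q4;
RUN;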
Results |
문제1
                           Cumulative   Cumulative
q1   Frequency   Percent    Frequency      Percent
--------------------------------------------------
 1           1     12.50            1        12.50
 2           2     25.00            3        37.50
 3           1     12.50            4        50.00
 4           2     25.00            6        75.00
 5           2     25.00            8       100.00
문제2
                           Cumulative   Cumulative
q2   Frequency   Percent    Frequency      Percent
--------------------------------------------------
 1           3     37.50            3        37.50
 2           1     12.50            4        50.00
 3           1     12.50            5        62.50
 4           1     12.50            6        75.00
 5           2     25.00            8       100.00
3. SPSS
- SPSS Program for Variable Labels.
SPSS Programming |
VARIABLE LABELS
Q1 "문제1"
Q2 "문제2"
Q3 "문제3"
Q4 "문제4".
FREQUENCIES VARIABLES=q1 q2 q3 q4.
EXECUTE.
4. R Programming (R-PROJECT)
R Programming |
from rpy2.robjects import r
%load_ext rpy2.ipython
Results |
The rpy2.ipython extension is already loaded. To reload it, use:
%reload_ext rpy2.ipython
R Programming |
%%R
library(tidyverse)
library(psych)
library(Hmisc)
mydata <- read_csv("C:/work/data/mydata.csv",
                   col_types = cols(id       = col_double(),
                                    workshop = col_character(),
                                    gender   = col_character(),
                                    q1       = col_double(),
                                    q2       = col_double(),
                                    q3       = col_double(),
                                    q4       = col_double()
                   )
)
withmooc <- mydata
attach(withmooc) # Attach withmooc so its columns can be referenced by name.
withmooc
Results |
R[write to console]: The following objects are masked from withmooc (pos = 3):
gender, id, q1, q2, q3, q4, workshop
# A tibble: 8 x 7
id workshop gender q1 q2 q3 q4
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 1 f 1 1 5 1
2 2 2 f 2 1 4 1
3 3 1 f 2 2 4 3
4 4 2 f 3 1 NA 3
5 5 1 m 4 5 2 4
6 6 2 m 5 4 5 5
7 7 1 m 5 3 4 4
8 8 2 m 4 5 5 5
- R program for variable labels.
- Uses the label function from Hmisc.
R Programming |
%%R
label(withmooc$q1)<-"The instructor was well prepared."
label(withmooc$q2)<-"The instructor communicated well."
label(withmooc$q3)<-"The course materials were helpful."
label(withmooc$q4)<- "Overall, I found this workshop useful."
withmooc
Results |
# A tibble: 8 x 7
id workshop gender q1 q2 q3 q4
<dbl> <chr> <chr> <labelled> <labelled> <labelled> <labelled>
1 1 1 f 1 1 5 1
2 2 2 f 2 1 4 1
3 3 1 f 2 2 4 3
4 4 2 f 3 1 NA 3
5 5 1 m 4 5 2 4
6 6 2 m 5 4 5 5
7 7 1 m 5 3 4 4
8 8 2 m 4 5 5 5
- The describe function in Hmisc uses the variable labels.
R Programming |
%%R
Hmisc::describe(withmooc)
Results |
withmooc
7 Variables 8 Observations
--------------------------------------------------------------------------------
id
n missing distinct Info Mean Gmd
8 0 8 1 4.5 3
lowest : 1 2 3 4 5, highest: 4 5 6 7 8
Value 1 2 3 4 5 6 7 8
Frequency 1 1 1 1 1 1 1 1
Proportion 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
--------------------------------------------------------------------------------
workshop
n missing distinct
8 0 2
Value 1 2
Frequency 4 4
Proportion 0.5 0.5
--------------------------------------------------------------------------------
gender
n missing distinct
8 0 2
Value f m
Frequency 4 4
Proportion 0.5 0.5
--------------------------------------------------------------------------------
q1 : The instructor was well prepared.
n missing distinct Info Mean Gmd
8 0 5 0.964 3.25 1.786
lowest : 1 2 3 4 5, highest: 1 2 3 4 5
Value 1 2 3 4 5
Frequency 1 2 1 2 2
Proportion 0.125 0.250 0.125 0.250 0.250
--------------------------------------------------------------------------------
q2 : The instructor communicated well.
n missing distinct Info Mean Gmd
8 0 5 0.94 2.75 2.071
lowest : 1 2 3 4 5, highest: 1 2 3 4 5
Value 1 2 3 4 5
Frequency 3 1 1 1 2
Proportion 0.375 0.125 0.125 0.125 0.250
--------------------------------------------------------------------------------
q3 : The course materials were helpful.
n missing distinct Info Mean Gmd
7 1 3 0.857 4.143 1.143
Value 2 4 5
Frequency 1 3 3
Proportion 0.143 0.429 0.429
--------------------------------------------------------------------------------
q4 : Overall, I found this workshop useful.
n missing distinct Info Mean Gmd
8 0 4 0.952 3.25 1.857
Value 1 3 4 5
Frequency 2 2 2 2
Proportion 0.25 0.25 0.25 0.25
--------------------------------------------------------------------------------
- The summary function ignores the labels.
R Programming |
%%R
summary(withmooc)
Results |
id workshop gender q1
Min. :1.00 Length:8 Length:8 Min. :1.00
1st Qu.:2.75 Class :character Class :character 1st Qu.:2.00
Median :4.50 Mode :character Mode :character Median :3.50
Mean :4.50 Mean :3.25
3rd Qu.:6.25 3rd Qu.:4.25
Max. :8.00 Max. :5.00
q2 q3 q4
Min. :1.00 Min. :2.000 Min. :1.00
1st Qu.:1.00 1st Qu.:4.000 1st Qu.:2.50
Median :2.50 Median :4.000 Median :3.50
Mean :2.75 Mean :4.143 Mean :3.25
3rd Qu.:4.25 3rd Qu.:5.000 3rd Qu.:4.25
Max. :5.00 Max. :5.000 Max. :5.00
NA's :1
- Assign long variable names so that they act as variable labels.
R Programming |
%%R
names(withmooc) <- c("id","Workshop","Gender",
"The instructor was well prepared.",
"The instructor communicated well.",
"The course materials were helpful.",
"Overall, I found this workshop useful.")
names(withmooc)
Results |
[1] "id"
[2] "Workshop"
[3] "Gender"
[4] "The instructor was well prepared."
[5] "The instructor communicated well."
[6] "The course materials were helpful."
[7] "Overall, I found this workshop useful."
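- Assign long variable names so that they act as variable labels.
- The example above assigned every variable name; here only some of the names (columns 4 to 7) are assigned.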
R Programming |
%%R
names(withmooc)[4:7] <- c("The instructor was well prepared.",
"The instructor communicated well.",
"The course materials were helpful.",
"Overall, I found this workshop useful.")
names(withmooc)
Results |
[1] "id"
[2] "Workshop"
[3] "Gender"
[4] "The instructor was well prepared."
[5] "The instructor communicated well."
[6] "The course materials were helpful."
[7] "Overall, I found this workshop useful."
- When summary is called, R now prints the long variable names.
R Programming |
%%R
summary(withmooc)
Results |
id Workshop Gender
Min. :1.00 Length:8 Length:8
1st Qu.:2.75 Class :character Class :character
Median :4.50 Mode :character Mode :character
Mean :4.50
3rd Qu.:6.25
Max. :8.00
The instructor was well prepared. The instructor communicated well.
Min. :1.00 Min. :1.00
1st Qu.:2.00 1st Qu.:1.00
Median :3.50 Median :2.50
Mean :3.25 Mean :2.75
3rd Qu.:4.25 3rd Qu.:4.25
Max. :5.00 Max. :5.00
The course materials were helpful. Overall, I found this workshop useful.
Min. :2.000 Min. :1.00
1st Qu.:4.000 1st Qu.:2.50
Median :4.000 Median :3.50
Mean :4.143 Mean :3.25
3rd Qu.:5.000 3rd Qu.:4.25
Max. :5.000 Max. :5.00
NA's :1
- Variables can be selected by name.
R Programming |
%%R
summary( withmooc["Overall, I found this workshop useful."] )
Results |
Overall, I found this workshop useful.
Min. :1.00
1st Qu.:2.50
Median :3.50
Mean :3.25
3rd Qu.:4.25
Max. :5.00
- Selecting by index is more convenient.
R Programming |
%%R
summary(withmooc[4:7])
Results |
The instructor was well prepared. The instructor communicated well.
Min. :1.00 Min. :1.00
1st Qu.:2.00 1st Qu.:1.00
Median :3.50 Median :2.50
Mean :3.25 Mean :2.75
3rd Qu.:4.25 3rd Qu.:4.25
Max. :5.00 Max. :5.00
The course materials were helpful. Overall, I found this workshop useful.
Min. :2.000 Min. :1.00
1st Qu.:4.000 1st Qu.:2.50
Median :4.000 Median :3.50
Mean :4.143 Mean :3.25
3rd Qu.:5.000 3rd Qu.:4.25
Max. :5.000 Max. :5.00
NA's :1
- The grep function can search the column names for a string.
R Programming |
%%R
myvars<-grep('instructor',names(withmooc))
- Print the positions of the variables whose names contain 'instructor': columns 4 and 5.
R Programming |
%%R
myvars
Results |
[1] 4 5
R Programming |
%%R
summary ( withmooc[myvars] )
Results |
The instructor was well prepared. The instructor communicated well.
Min. :1.00 Min. :1.00
1st Qu.:2.00 1st Qu.:1.00
Median :3.50 Median :2.50
Mean :3.25 Mean :2.75
3rd Qu.:4.25 3rd Qu.:4.25
Max. :5.00 Max. :5.00
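As a rough cross-check of what grep returns here, the same search can be sketched in plain Python. In R, grep gives the 1-based positions of the matching names (grep(..., value = TRUE) would give the names themselves). The list `names` and the helper `grep_positions` below are illustrative only and are not part of the original notebook.

```python
import re

# Column names of withmooc after the renaming step (copied from the output above).
names = [
    "id",
    "Workshop",
    "Gender",
    "The instructor was well prepared.",
    "The instructor communicated well.",
    "The course materials were helpful.",
    "Overall, I found this workshop useful.",
]


def grep_positions(pattern, values):
    """Return the 1-based positions of the values that match pattern."""
    return [i for i, v in enumerate(values, start=1) if re.search(pattern, v)]


# Hypothetical Python counterpart of: myvars <- grep('instructor', names(withmooc))
myvars = grep_positions("instructor", names)
print(myvars)  # [4, 5]
```

The positions are 1-based to match R; a Python list would need `names[i - 1]` to look the name up again.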
5. R - Tidyverse
6. Python - Pandas
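The following is a minimal sketch, not the author's original code, of how the R approach above might carry over to pandas. It assumes that pandas has no dedicated variable-label feature comparable to the SAS LABEL statement, so the labels are kept in an ordinary dictionary (the name `labels` is hypothetical) and applied with `rename`, much like assigning long names in R. The data frame is typed in by hand from the eight rows shown earlier rather than read from the CSV file.

```python
import pandas as pd

# The same eight rows as mydata in this post; q3 is missing for id 4.
mydata = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "workshop": ["1", "2", "1", "2", "1", "2", "1", "2"],
    "gender": ["f", "f", "f", "f", "m", "m", "m", "m"],
    "q1": [1, 2, 2, 3, 4, 5, 5, 4],
    "q2": [1, 1, 2, 1, 5, 4, 3, 5],
    "q3": [5, 4, 4, None, 2, 5, 4, 5],
    "q4": [1, 1, 3, 3, 4, 5, 4, 5],
})

# Hypothetical label dictionary: short variable name -> long label.
labels = {
    "q1": "The instructor was well prepared.",
    "q2": "The instructor communicated well.",
    "q3": "The course materials were helpful.",
    "q4": "Overall, I found this workshop useful.",
}

# Use the labels as column names, as in the names(withmooc)[4:7] example.
labelled = mydata.rename(columns=labels)

# Summary statistics now carry the long names as column headers.
summary = labelled[list(labels.values())].describe()
print(summary)

# Select the columns whose name contains 'instructor' (compare grep in R).
instructor_cols = [c for c in labelled.columns if "instructor" in c]
print(labelled[instructor_cols].describe())
```

Keeping the short names in `mydata` and renaming only a copy leaves q1 to q4 available for further processing, while the labelled copy is used for reporting.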
7. Python - dfply