포스팅 목차
129. Display those employees whose first 2 characters from hire date-last 2 characters of salary?
* 입사일자에서 첫 2글자와 급여의 마지막 2글자로 구성된 신규 변수를 생성하시오.
- 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
- substr 함수 : 69 / 129 / 189
- concat 함수 : 68 / 129 / 142 / 181
- [변수 생성] 신규 변수 생성 - 문자 추출과 문자 결합
- 데이터 전처리 - SQL, Pandas, R Prog, Dplyr, Dfply, SQLDF, PANDASQL, DATA.TABLE, SAS, Proc Sql
|
1. Oracle(오라클)
파이프 연산자를 사용하여서 입사일자의 첫 2글자와 급여의 마지막 2글자를 연결하여서 신규 문자 변수를 생성한다.
Oracle Programming |
select ename,hiredate,sal,
SUBSTR(to_char(hiredate,'yyyymmdd'),1,2)||ENAME||substr(sal,-2,2) as ename_str
from emp
2. Python Pandas(파이썬)
Python Programming |
import copy
withmooc = copy.copy(emp)
withmooc
withmooc['ename_string'] = withmooc['hiredate'].str.slice(0,2) + withmooc['ename'] + withmooc['sal'].astype(str).str.slice(-2,)
display(withmooc.head())
Results |
empno | ename | job | mgr | hiredate | sal | comm | deptno | ename_string | |
0 | 7369 | SMITH | CLERK | 7902.0 | 1980/12/17 | 800 | NaN | 20 | 19SMITH00 |
1 | 7499 | ALLEN | SALESMAN | 7698.0 | 1981/02/20 | 1600 | 300.0 | 30 | 19ALLEN00 |
2 | 7521 | WARD | SALESMAN | 7698.0 | 1981/02/22 | 1250 | 500.0 | 30 | 19WARD50 |
3 | 7566 | JONES | MANAGER | 7839.0 | 1981/04/02 | 2975 | NaN | 20 | 19JONES75 |
4 | 7654 | MARTIN | SALESMAN | 7698.0 | 1981/09/28 | 1250 | 1400.0 | 30 | 19MARTIN50 |
3. R Programming (R Package)
R Programming |
%%R
withmooc <- emp
withmooc['str_combined_1'] <- paste(base::substr(withmooc$hiredate,1,2),withmooc$ename,stringr::str_sub(withmooc$sal,-2) ,sep="")
withmooc['str_combined_2'] <- paste0(base::substr(withmooc$hiredate,1,2),'',withmooc$ename,'', stringr::str_sub(withmooc$sal,-2))
head(withmooc)
Results |
# A tibble: 6 x 10
empno ename job mgr hiredate sal comm deptno str_combined_1 str_combined_2
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <chr> <chr>
1 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 19SMITH00 19SMITH00
2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 19ALLEN00 19ALLEN00
3 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 19WARD50 19WARD50
4 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 19JONES75 19JONES75
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 19MARTIN50 19MARTIN50
6 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 19BLAKE50 19BLAKE50
4. R Dplyr Package
R Programming |
%%R
library(stringi)
emp %>%
dplyr::mutate(str_combined_2 = stringi::stri_join(stringr::str_sub(withmooc$hiredate,1,2),ename,stringr::str_sub(withmooc$sal,-2),sep="")) %>%
dplyr::mutate(str_combined_3 = stringr::str_c(stringr::str_sub(withmooc$hiredate,1,2),ename,stringr::str_sub(withmooc$sal,-2),sep="")) %>%
dplyr::mutate(str_combined_4 = base::paste(stringr::str_sub(withmooc$hiredate,1,2),ename,stringr::str_sub(withmooc$sal,-2),sep="")) %>%
dplyr::mutate(str_combined_5 = base::paste0(stringr::str_sub(withmooc$hiredate,1,2),"",ename,"",stringr::str_sub(withmooc$sal,-2))) %>%
head()
Results |
# A tibble: 6 x 12
empno ename job mgr hiredate sal comm deptno str_combined_2 str_combined_3 str_combined_4 str_combined_5
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
1 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 19SMITH00 19SMITH00 19SMITH00 19SMITH00
2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 19ALLEN00 19ALLEN00 19ALLEN00 19ALLEN00
3 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 19WARD50 19WARD50 19WARD50 19WARD50
4 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 19JONES75 19JONES75 19JONES75 19JONES75
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 19MARTIN50 19MARTIN50 19MARTIN50 19MARTIN50
6 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 19BLAKE50 19BLAKE50 19BLAKE50 19BLAKE50
5. R sqldf Package
- 현재 hiredate가 “real” 형태로 저장되어 있음. / sqlite에는 날짜형이 존재 안 함 (129번 sqldf)
R Programming |
%%R
sqldf(" select hiredate,
typeof(hiredate) hire_type,
datetime(hiredate * 3600 * 24,'unixepoch') real_to_datetext,
typeof(datetime(hiredate * 3600 * 24,'unixepoch')) type2,
substr(datetime(hiredate * 3600 * 24,'unixepoch'),1,2) k,
strftime('%Y/%M/%d', datetime(hiredate * 3600 * 24,'unixepoch')) dateformat_chagne,
strftime('%Y-%M-%d', hiredate) date_error,
substr(datetime(hiredate * 3600 * 24,'unixepoch'),1,2)||ENAME||substr(cast(sal as int),-2,2) ename_1
from emp") %>% head(10)
Results |
hiredate hire_type real_to_datetext type2 k dateformat_chagne date_error ename_1
1 1980-12-17 real 1980-12-17 00:00:00 text 19 1980/00/17 -470-00-09 19SMITH00
2 1981-02-20 real 1981-02-20 00:00:00 text 19 1981/00/20 -470-00-13 19ALLEN00
3 1981-02-22 real 1981-02-22 00:00:00 text 19 1981/00/22 -470-00-15 19WARD50
4 1981-04-02 real 1981-04-02 00:00:00 text 19 1981/00/02 -470-00-23 19JONES75
5 1981-09-28 real 1981-09-28 00:00:00 text 19 1981/00/28 -470-00-21 19MARTIN50
6 1981-03-01 real 1981-03-01 00:00:00 text 19 1981/00/01 -470-00-22 19BLAKE50
7 1981-01-09 real 1981-01-09 00:00:00 text 19 1981/00/09 -470-00-02 19CLARK50
8 1982-12-09 real 1982-12-09 00:00:00 text 19 1982/00/09 -470-00-01 19SCOTT00
9 1981-11-17 real 1981-11-17 00:00:00 text 19 1981/00/17 -470-00-10 19KING00
10 1981-09-08 real 1981-09-08 00:00:00 text 19 1981/00/08 -470-00-01 19TURNER00
6. Python pandasql Package
Python Programming |
ps.sqldf(" select hiredate, sal, \
substr(datetime(hiredate * 3600 * 24,'unixepoch'),1,2)||ENAME||substr(cast(sal as int),-2,2) ename_1 \
from emp ").head()
Results |
hiredate | sal | ename_1 | |
0 | 1980/12/17 | 800 | 19SMITH00 |
1 | 1981/02/20 | 1600 | 19ALLEN00 |
2 | 1981/02/22 | 1250 | 19WARD50 |
3 | 1981/04/02 | 2975 | 19JONES75 |
4 | 1981/09/28 | 1250 | 19MARTIN50 |
7. R data.table Package
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
salgrade_DT <- data.table(salgrade)
DT[, c('str_combined_2','str_combined_3','str_combined_4','str_combined_5')
:= list( stringi::stri_join(stringr::str_sub(hiredate,1,2),ename,stringr::str_sub(sal,-2),sep="") ,
stringr::str_c(stringr::str_sub(hiredate,1,2),ename,stringr::str_sub(sal,-2),sep=""),
base::paste(stringr::str_sub(hiredate,1,2),ename,stringr::str_sub(sal,-2),sep=""),
base::paste0(stringr::str_sub(hiredate,1,2),"",ename,"",stringr::str_sub(sal,-2))
) ]
DT
Results |
empno ename job mgr hiredate sal comm deptno str_combined_2 str_combined_3 str_combined_4 str_combined_5
1: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 19SMITH00 19SMITH00 19SMITH00 19SMITH00
2: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 19ALLEN00 19ALLEN00 19ALLEN00 19ALLEN00
3: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 19WARD50 19WARD50 19WARD50 19WARD50
4: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 19JONES75 19JONES75 19JONES75 19JONES75
5: 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 19MARTIN50 19MARTIN50 19MARTIN50 19MARTIN50
6: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 19BLAKE50 19BLAKE50 19BLAKE50 19BLAKE50
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 19CLARK50 19CLARK50 19CLARK50 19CLARK50
8: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 19SCOTT00 19SCOTT00 19SCOTT00 19SCOTT00
9: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 19KING00 19KING00 19KING00 19KING00
10: 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 19TURNER00 19TURNER00 19TURNER00 19TURNER00
11: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 19ADAMS00 19ADAMS00 19ADAMS00 19ADAMS00
12: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 19JAMES50 19JAMES50 19JAMES50 19JAMES50
13: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 19FORD00 19FORD00 19FORD00 19FORD00
14: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 19MILLER00 19MILLER00 19MILLER00 19MILLER00
8. SAS Proc SQL
SAS Programming |
%%SAS sas
PROC SQL;
CREATE TABLE STATSAS_1 AS
select ename,
put(hiredate, yymmddn.) as head_base,
strip(put(sal,5.)) as tail_base,
substr( strip(put(sal,5.)) ,length(strip(put(sal,5.))) -1,2 ) as last_char_1,
reverse( substr( reverse( strip(put(sal,5.)) ),1,2) ) as last_char_2,
prxchange('s/.*(\d{2})/$1/',-1, strip(put(sal,5.)) ) as last_char_3,
SUBSTR( calculated head_base , 1,2)||compress(ENAME)|| calculated last_char_1 as concat_str format=$12.
from emp;
QUIT;
PROC PRINT data=STATSAS_1(obs=5);RUN;
Results |
OBS | ename | head_base | tail_base | last_char_1 | last_char_2 | last_char_3 | concat_str |
1 | SMITH | 19801217 | 800 | 00 | 00 | 00 | 19SMITH00 |
2 | ALLEN | 19810220 | 1600 | 00 | 00 | 00 | 19ALLEN00 |
3 | WARD | 19810222 | 1250 | 50 | 50 | 50 | 19WARD50 |
4 | JONES | 19810402 | 2975 | 75 | 75 | 75 | 19JONES75 |
5 | MARTIN | 19810928 | 1250 | 50 | 50 | 50 | 19MARTIN50 |
9. SAS Data Step
SAS Programming |
%%SAS sas
DATA STATSAS_2;
SET emp;
head_base = put(hiredate, yymmddn.);
tail_base = strip(put(sal,5.));
last_char_1 = substr( strip(put(sal,5.)) ,length(strip(put(sal,5.))) -1,2 );
last_char_2 = reverse( substr( reverse( strip(put(sal,5.)) ),1,2) );
last_char_3 = prxchange('s/.*(\d{2})/$1/',-1, strip(put(sal,5.)) );
concat_str = SUBSTR( head_base , 1,2)||compress(ENAME)|| last_char_1;
keep empno head_base tail_base last_: concat_str;
RUN;
PROC PRINT Data=STATSAS_2(Obs=5);RUN;
Results |
OBS | empno | head_base | tail_base | last_char_1 | last_char_2 | last_char_3 | concat_str |
1 | 7369 | 19801217 | 800 | 00 | 00 | 00 | 19SMITH00 |
2 | 7499 | 19810220 | 1600 | 00 | 00 | 00 | 19ALLEN00 |
3 | 7521 | 19810222 | 1250 | 50 | 50 | 50 | 19WARD50 |
4 | 7566 | 19810402 | 2975 | 75 | 75 | 75 | 19JONES75 |
5 | 7654 | 19810928 | 1250 | 50 | 50 | 50 | 19MARTIN50 |
10. Python Dfply Package
Python Programming |
emp >> mutate( ename_string = X.hiredate.str.slice(0,2) + X.ename + X.sal.astype(str).str.slice(-2,) ) >> \
head()
Results |
empno | ename | job | mgr | hiredate | sal | comm | deptno | ename_string | |
0 | 7369 | SMITH | CLERK | 7902.0 | 1980/12/17 | 800 | NaN | 20 | 19SMITH00 |
1 | 7499 | ALLEN | SALESMAN | 7698.0 | 1981/02/20 | 1600 | 300.0 | 30 | 19ALLEN00 |
2 | 7521 | WARD | SALESMAN | 7698.0 | 1981/02/22 | 1250 | 500.0 | 30 | 19WARD50 |
3 | 7566 | JONES | MANAGER | 7839.0 | 1981/04/02 | 2975 | NaN | 20 | 19JONES75 |
4 | 7654 | MARTIN | SALESMAN | 7698.0 | 1981/09/28 | 1250 | 1400.0 | 30 | 19MARTIN50 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 전처리 방법 리스트
반응형
댓글