포스팅 목차
103. Display employee name, his job, his dept name, his manager name, his grade and make out of an under department wise.
* 직원이름, 직무, 부서명, 관리자명, 직원 급여 등급을 부서 번호를 기준으로 정렬하여서 출력하시오.
- 다중 테이블 결합 : 103 / 104
- 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
- 비등가 FULL 조인(NON-EQUI FULL JOIN)을 통한 데이터 결합
- Self join : 셀프조인, 자체 결합, 자체 조인
|
1. Oracle(오라클)
emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.
Oracle Programming |
select d.deptno, e.ename, e.job, d.dname, m.ename, s.grade
from emp e, emp m, dept d, salgrade s
where e.deptno = d.deptno
and e.sal between s.losal and s.hisal
and e.mgr = m.empno
order
by e.deptno;
2. Python Pandas(파이썬)
emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.
Python Programming |
pd.merge( pd.merge( pd.merge(emp, emp, how='inner', left_on=['mgr'], right_on=['empno'])[['sal_x',"deptno_x","ename_y"]],
dept, how='inner', left_on=['deptno_x'], right_on=['deptno'])[['sal_x',"deptno_x","ename_y","dname"]].assign(foo=1),
salgrade.assign(foo=1), on ='foo').query("losal <= sal_x <= hisal ")[["sal_x","deptno_x","ename_y","dname","losal","hisal"]].sort_values(by=['deptno_x'])
Results |
sal_x | deptno_x | ename_y | dname | losal | hisal |
2450 | 10 | KING | ACCOUNTING | 2001 | 3000 |
1300 | 10 | CLARK | ACCOUNTING | 1201 | 1400 |
800 | 20 | FORD | RESEARCH | 700 | 1200 |
2975 | 20 | KING | RESEARCH | 2001 | 3000 |
3000 | 20 | JONES | RESEARCH | 2001 | 3000 |
3000 | 20 | JONES | RESEARCH | 2001 | 3000 |
1100 | 20 | SCOTT | RESEARCH | 700 | 1200 |
1600 | 30 | BLAKE | SALES | 1401 | 2000 |
1250 | 30 | BLAKE | SALES | 1201 | 1400 |
1250 | 30 | BLAKE | SALES | 1201 | 1400 |
1500 | 30 | BLAKE | SALES | 1401 | 2000 |
950 | 30 | BLAKE | SALES | 700 | 1200 |
2850 | 30 | KING | SALES | 2001 | 3000 |
- 위 테이블 결합 작업을 개별적으로 분리하여서 처리
Python Programming |
withmooc_1 = pd.merge(emp, emp, how='inner', left_on=['mgr'], right_on=['empno'])[['sal_x',"deptno_x","ename_y"]]
withmooc_2 = pd.merge( withmooc_1,dept, how='inner', left_on=['deptno_x'], right_on=['deptno'])[['sal_x',"deptno_x","ename_y","dname"]]
withmooc_3 = pd.merge( withmooc_2.assign(foo=1),salgrade.assign(foo=1), on ='foo').query("losal <= sal_x <= hisal ")
withmooc_3[["sal_x","deptno_x","ename_y","dname","losal","hisal"]].sort_values(by=['deptno_x']).head()
Results |
sal_x | deptno_x | ename_y | dname | losal | hisal |
2450 | 10 | KING | ACCOUNTING | 2001 | 3000 |
1300 | 10 | CLARK | ACCOUNTING | 1201 | 1400 |
800 | 20 | FORD | RESEARCH | 700 | 1200 |
2975 | 20 | KING | RESEARCH | 2001 | 3000 |
3000 | 20 | JONES | RESEARCH | 2001 | 3000 |
3. R Programming (R Package)
emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후에 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다. with 구문을 사용하여서 테이블 명 지정하지 않고 변수명을 바로 지정할 수 있다.
R Programming |
%%R
withmooc <- subset( base::merge( ( base::merge( base::merge(emp,emp,by.x=c("mgr"),by.y=c("empno"),all=F)[,c("sal.x","deptno.x","ename.y")] ,
dept, by.x=c("deptno.x"),by.y=c("deptno"),all=F )[,c("sal.x","deptno.x","ename.y","dname")]
),
salgrade, by = character(),all.x=TRUE, all.y=TRUE )[,c("sal.x","deptno.x","ename.y","dname","losal","hisal","grade")],
between(sal.x, losal, hisal) )
# withmooc[order(withmooc$deptno.x), ]
withmooc[with(withmooc, order(deptno.x)), ][1:7, ] # with 구문을 사용하여서 테이블 명 지정하지 않고 변수명을 바로 지정 할 수 있다.
Results |
sal.x deptno.x ename.y dname losal hisal grade
15 1300 10 CLARK ACCOUNTING 1201 1400 2
40 2450 10 KING ACCOUNTING 2001 3000 4
6 1100 20 SCOTT RESEARCH 700 1200 1
7 800 20 FORD RESEARCH 700 1200 1
42 3000 20 JONES RESEARCH 2001 3000 4
43 3000 20 JONES RESEARCH 2001 3000 4
44 2975 20 KING RESEARCH 2001 3000 4
- 위 테이블 결합 작업을 개별적으로 분리하여서 처리
R Programming |
%%R
withmooc_1 <- base::merge( emp , emp, by.x=c("mgr") ,by.y=c("empno") ,all=F)[,c("sal.x","deptno.x","ename.y")]
withmooc_2 <- base::merge( withmooc_1 , dept, by.x=c("deptno.x"),by.y=c("deptno"),all=F )[,c("sal.x","deptno.x","ename.y","dname")]
withmooc_3 <- base::merge( withmooc_2 , salgrade, by = character(),all.x=TRUE, all.y=TRUE )[,c("sal.x","deptno.x","ename.y","dname","losal","hisal","grade")]
withmooc_4 <- subset( withmooc_3, between(sal.x, losal, hisal) )
withmooc_4[with(withmooc_4, order(deptno.x)), ][1:7,]
Results |
sal.x deptno.x ename.y dname losal hisal grade
15 1300 10 CLARK ACCOUNTING 1201 1400 2
40 2450 10 KING ACCOUNTING 2001 3000 4
6 1100 20 SCOTT RESEARCH 700 1200 1
7 800 20 FORD RESEARCH 700 1200 1
42 3000 20 JONES RESEARCH 2001 3000 4
43 3000 20 JONES RESEARCH 2001 3000 4
44 2975 20 KING RESEARCH 2001 3000 4
4. R Dplyr Package
emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(inner_join)하여 부서명을 추가한 후에 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.
R Programming |
%%R
emp %>%
dplyr::inner_join( emp , by = c('mgr' = 'empno')) %>%
dplyr::inner_join( dept , by = c('deptno.x' = 'deptno')) %>%
dplyr::full_join( salgrade , by = character()) %>%
dplyr::filter(between(sal.x, losal, hisal )) %>%
dplyr::select(ends_with('.x'),dname,grade ) %>%
dplyr::arrange(deptno.x) %>%
head(7)
Results |
# A tibble: 7 x 8
ename.x job.x hiredate.x sal.x comm.x deptno.x dname grade
<chr> <chr> <date> <dbl> <dbl> <dbl> <chr> <dbl>
1 CLARK MANAGER 1981-01-09 2450 NA 10 ACCOUNTING 4
2 MILLER CLERK 1982-01-23 1300 NA 10 ACCOUNTING 2
3 SMITH CLERK 1980-12-17 800 NA 20 RESEARCH 1
4 JONES MANAGER 1981-04-02 2975 NA 20 RESEARCH 4
5 SCOTT ANALYST 1982-12-09 3000 NA 20 RESEARCH 4
6 ADAMS CLERK 1983-01-12 1100 NA 20 RESEARCH 1
7 FORD ANALYST 1981-12-03 3000 NA 20 RESEARCH 4
5. R sqldf Package
emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.
R Programming |
%%R
sqldf(" select d.deptno, e.ename, e.job, d.dname, m.ename, s.grade
from emp e, emp m, dept d, salgrade s
where e.deptno = d.deptno
and e.sal between s.losal and s.hisal
and e.mgr = m.empno
order by e.deptno") %>% head(7)
Results |
deptno ename job dname ename grade
1 10 CLARK MANAGER ACCOUNTING KING 4
2 10 MILLER CLERK ACCOUNTING CLARK 2
3 20 SMITH CLERK RESEARCH FORD 1
4 20 JONES MANAGER RESEARCH KING 4
5 20 SCOTT ANALYST RESEARCH JONES 4
6 20 ADAMS CLERK RESEARCH SCOTT 1
7 20 FORD ANALYST RESEARCH JONES 4
6. Python pandasql Package
emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.
Python Programming |
ps.sqldf(" select d.deptno, e.ename, e.job, d.dname, m.ename, s.grade \
from emp e \
inner join emp m \
ON e.mgr=m.empno \
inner join dept d \
ON e.deptno=d.deptno \
inner join salgrade s \
ON e.sal between s.losal and s.hisal \
order by e.deptno").head(7)
Results |
deptno | ename | job | dname | ename | grade |
10 | CLARK | MANAGER | ACCOUNTING | KING | 4 |
10 | MILLER | CLERK | ACCOUNTING | CLARK | 2 |
20 | SMITH | CLERK | RESEARCH | FORD | 1 |
20 | JONES | MANAGER | RESEARCH | KING | 4 |
20 | SCOTT | ANALYST | RESEARCH | JONES | 4 |
20 | ADAMS | CLERK | RESEARCH | SCOTT | 1 |
20 | FORD | ANALYST | RESEARCH | JONES | 4 |
7. R data.table Package
- 1차 : emp 테이블을 양쪽에 지정하여서 self Join을 수행하여 관리자 이름을 추가
- 2차 : dept 테이블을 결합(DT 문법)하여 부서명
- 3차 : salgrade 테이블을 비등가조인(NON-EQUI JOIN) 방식으로 결합하여 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력.
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
salgrade_DT <- data.table(salgrade)
DT[DT, nomatch=NULL,
on = .( mgr=empno ),
.(sal,deptno,ename,job,i.ename)][dept_DT, nomatch=NULL,
on = .( deptno=deptno ),
.(sal,deptno,ename,job,i.ename,dname)][salgrade_DT, nomatch=NULL,
on = .( sal >= losal , sal <= hisal),
.(x.sal,deptno,ename,job,dname,i.ename,grade)][order(deptno)]
Results |
x.sal deptno ename job dname i.ename grade
1: 1300 10 MILLER CLERK ACCOUNTING CLARK 2
2: 2450 10 CLARK MANAGER ACCOUNTING KING 4
3: 1100 20 ADAMS CLERK RESEARCH SCOTT 1
4: 800 20 SMITH CLERK RESEARCH FORD 1
5: 3000 20 SCOTT ANALYST RESEARCH JONES 4
6: 3000 20 FORD ANALYST RESEARCH JONES 4
7: 2975 20 JONES MANAGER RESEARCH KING 4
8: 950 30 JAMES CLERK SALES BLAKE 1
9: 1250 30 WARD SALESMAN SALES BLAKE 2
10: 1250 30 MARTIN SALESMAN SALES BLAKE 2
11: 1600 30 ALLEN SALESMAN SALES BLAKE 3
12: 1500 30 TURNER SALESMAN SALES BLAKE 3
13: 2850 30 BLAKE MANAGER SALES KING 4
- 위 테이블 결합 작업을 개별적으로 분리하여서 처리
R Programming |
%%R
DT_1 = DT[DT , nomatch=NULL, on = .( mgr=empno ) , .(sal,deptno,ename,job,i.ename)]
DT_2 = DT_1[dept_DT , nomatch=NULL, on = .( deptno=deptno ) , .(sal,deptno,ename,job,i.ename,dname)]
DT_3 = DT_2[salgrade_DT, nomatch=NULL, on = .( sal >= losal , sal <= hisal), .(x.sal,deptno,ename,job,dname,i.ename,grade)][order(deptno)]
DT_3[1:7,]
Results |
x.sal deptno ename job dname i.ename grade
1: 1300 10 MILLER CLERK ACCOUNTING CLARK 2
2: 2450 10 CLARK MANAGER ACCOUNTING KING 4
3: 1100 20 ADAMS CLERK RESEARCH SCOTT 1
4: 800 20 SMITH CLERK RESEARCH FORD 1
5: 3000 20 SCOTT ANALYST RESEARCH JONES 4
6: 3000 20 FORD ANALYST RESEARCH JONES 4
7: 2975 20 JONES MANAGER RESEARCH KING 4
8. SAS Proc SQL
emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.
SAS Programming |
%%SAS sas
PROC SQL;
CREATE TABLE STATSAS_1 AS
select E.EMPNO,E.MGR,
d.deptno, e.ename, e.job, d.dname, m.ename AS MGR_ENAME, s.grade
from emp e, emp m, dept d, salgrade s
where e.mgr = m.empno
AND e.deptno = d.deptno
and e.sal between s.losal and s.hisal
order
by e.deptno;
QUIT;
PROC SORT DATA=STATSAS_1;
BY EMPNO;
RUN;
PROC PRINT data=STATSAS_1(obs=3);RUN;
Results |
empno | mgr | deptno | ename | job | dname | MGR_ENAME | grade |
7369 | 7902 | 20 | SMITH | CLERK | RESEARCH | FORD | 1 |
7499 | 7698 | 30 | ALLEN | SALESMAN | SALES | BLAKE | 3 |
7521 | 7698 | 30 | WARD | SALESMAN | SALES | BLAKE | 2 |
9. SAS Data Step
SAS Programming |
%%SAS sas
PROC SORT DATA=emp OUT=EMP_1(RENAME=( mgr = MGR_EMPNO ) );
BY mgr;
RUN;
PROC SORT DATA=emp OUT=EMP_2(RENAME=( empno = MGR_EMPNO ename = mrg_ename )
KEEP=empno ename) NODUPKEY;
BY empno;
RUN;
DATA STATSAS_2;
MERGE EMP_1(IN=A) EMP_2(IN=B);
BY MGR_EMPNO;
IF A AND B;
RUN;
proc sort data=STATSAS_2 out=STATSAS_21;
by deptno;
run;
proc sort data=dept out=dept_1;
by deptno;
run;
DATA STATSAS_3;
merge STATSAS_21(in=a) dept_1(in=b);
by deptno;
if a;
RUN;
DATA STATSAS_4;
SET STATSAS_3;
DO I=1 TO KOBS;
SET salgrade NOBS=KOBS POINT=I;
IF SAL >= losal AND SAL <= hisal THEN OUTPUT;
END;
DROP ename comm SAL hiredate LOC LOSAL HISAL;
RUN;
PROC SORT DATA=STATSAS_4;
BY EMPNO;
RUN;
PROC PRINT data=STATSAS_4(obs=3);RUN;
Results |
empno | job | MGR_EMPNO | deptno | mgr_ename | dname | grade |
7369 | CLERK | 7902 | 20 | FORD | RESEARCH | 1 |
7499 | SALESMAN | 7698 | 30 | BLAKE | SALES | 3 |
7521 | SALESMAN | 7698 | 30 | BLAKE | SALES | 2 |
10. Python Dfply Package
Python Programming |
@pipe
def inner_join_merge(df, other, left_on,right_on,suffixes=['_x','_y']):
joined = df.merge(other, how='inner', left_on=left_on,
right_on=right_on , suffixes=suffixes)
return joined
emp.assign(foo=1) >> \
inner_join_merge( emp , left_on = ["mgr"] , right_on = ["empno"]) >> \
inner_join_merge( dept , left_on = ["deptno_x"], right_on = ["deptno"]) >> \
full_join( salgrade.assign(foo=1), by='foo' ) >> \
filter_by( X.sal_x.between(X.losal, X.hisal )) >> \
select(ends_with('_x'),X.ename_y, X.dname, X.grade ) >> \
arrange(X.deptno_x) >> \
select(~X.job_x, ~X.hiredate_x, ~X.comm_x) >> \
head()
Results |
empno_x | ename_x | mgr_x | sal_x | deptno_x | ename_y | dname | grade |
7782 | CLARK | 7839.0 | 2450 | 10 | KING | ACCOUNTING | 4 |
7934 | MILLER | 7782.0 | 1300 | 10 | CLARK | ACCOUNTING | 2 |
7369 | SMITH | 7902.0 | 800 | 20 | FORD | RESEARCH | 1 |
7566 | JONES | 7839.0 | 2975 | 20 | KING | RESEARCH | 4 |
7788 | SCOTT | 7566.0 | 3000 | 20 | JONES | RESEARCH | 4 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 전처리 방법 리스트
댓글