본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 추출] 비등가 FULL 조인(NON-EQUI FULL JOIN)을 통한 데이터 결합 - 103

by 기서무나구물 2022. 10. 2.

 

103. Display employee name, his job, his dept name, his manager name, his grade and make out of an under department wise.

 

* 직원이름, 직무, 부서명, 관리자명, 직원 급여 등급을 부서 번호를 기준으로 정렬하여서 출력하시오.


  • Oracle : 비등가조인(NON-EQUI JOIN)
  • 파이썬 Pandas : pd.merge, assign(), query()
  • R 프로그래밍 : base::merge(), subset()
  • R Dplyr Package : dplyr::inner_join(), dplyr::full_join(), dplyr::filter()
  • R sqldf Package : 비등가조인(NON-EQUI JOIN)
  • Python pandasql Package : 비등가조인(NON-EQUI JOIN)
  • R data.table Package : 비등가조인(NON-EQUI JOIN) 방식 데이터 결합
  • SAS Proc SQL : 비등가조인(NON-EQUI JOIN)
  • SAS Data Step : Merge 구문, IF 조건문, CROSS JOIN (카테시안 곱) 구현
  • Python Dfply Package : @pipe & Def 사용자 정의 함수(inner_join_merge), full_join()
  • 파이썬 Base 프로그래밍 :

 


1. Oracle(오라클)

emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

 

Oracle Programming
select d.deptno, e.ename, e.job, d.dname, m.ename, s.grade 
from   emp e, emp m, dept d, salgrade s 
where  e.deptno = d.deptno 
  and  e.sal between s.losal and s.hisal 
  and  e.mgr    = m.empno 
order 
   by  e.deptno;

 


2. Python Pandas(파이썬)

emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

 

Python Programming
pd.merge( pd.merge( pd.merge(emp, emp, how='inner', left_on=['mgr'], right_on=['empno'])[['sal_x',"deptno_x","ename_y"]],
                    dept, how='inner', left_on=['deptno_x'], right_on=['deptno'])[['sal_x',"deptno_x","ename_y","dname"]].assign(foo=1),
          salgrade.assign(foo=1), on ='foo').query("losal <= sal_x <= hisal ")[["sal_x","deptno_x","ename_y","dname","losal","hisal"]].sort_values(by=['deptno_x'])

 


Results
sal_x deptno_x ename_y dname losal hisal
2450 10 KING ACCOUNTING 2001 3000
1300 10 CLARK ACCOUNTING 1201 1400
800 20 FORD RESEARCH 700 1200
2975 20 KING RESEARCH 2001 3000
3000 20 JONES RESEARCH 2001 3000
3000 20 JONES RESEARCH 2001 3000
1100 20 SCOTT RESEARCH 700 1200
1600 30 BLAKE SALES 1401 2000
1250 30 BLAKE SALES 1201 1400
1250 30 BLAKE SALES 1201 1400
1500 30 BLAKE SALES 1401 2000
950 30 BLAKE SALES 700 1200
2850 30 KING SALES 2001 3000

 


  • 위 테이블 결합 작업을 개별적으로 분리하여서 처리
Python Programming
withmooc_1 = pd.merge(emp, emp, how='inner', left_on=['mgr'], right_on=['empno'])[['sal_x',"deptno_x","ename_y"]]
withmooc_2 = pd.merge( withmooc_1,dept, how='inner', left_on=['deptno_x'], right_on=['deptno'])[['sal_x',"deptno_x","ename_y","dname"]]
withmooc_3 = pd.merge( withmooc_2.assign(foo=1),salgrade.assign(foo=1), on ='foo').query("losal <= sal_x <= hisal ")

withmooc_3[["sal_x","deptno_x","ename_y","dname","losal","hisal"]].sort_values(by=['deptno_x']).head()

 

Results
sal_x deptno_x ename_y dname losal hisal
2450 10 KING ACCOUNTING 2001 3000
1300 10 CLARK ACCOUNTING 1201 1400
800 20 FORD RESEARCH 700 1200
2975 20 KING RESEARCH 2001 3000
3000 20 JONES RESEARCH 2001 3000

 


3. R Programming (R Package)

emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후에 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다. with 구문을 사용하여서 테이블 명 지정하지 않고 변수명을 바로 지정할 수 있다.

 

R Programming
%%R
withmooc <- subset( base::merge( ( base::merge( base::merge(emp,emp,by.x=c("mgr"),by.y=c("empno"),all=F)[,c("sal.x","deptno.x","ename.y")] ,
                                                dept, by.x=c("deptno.x"),by.y=c("deptno"),all=F )[,c("sal.x","deptno.x","ename.y","dname")]
                                  ),
                                  salgrade, by = character(),all.x=TRUE, all.y=TRUE )[,c("sal.x","deptno.x","ename.y","dname","losal","hisal","grade")],
                    between(sal.x, losal, hisal) )

# withmooc[order(withmooc$deptno.x), ]

withmooc[with(withmooc, order(deptno.x)), ][1:7, ] # with 구문을 사용하여서 테이블 명 지정하지 않고 변수명을 바로 지정 할 수 있다.

 

Results
   sal.x deptno.x ename.y      dname losal hisal grade
15  1300       10   CLARK ACCOUNTING  1201  1400     2
40  2450       10    KING ACCOUNTING  2001  3000     4
6   1100       20   SCOTT   RESEARCH   700  1200     1
7    800       20    FORD   RESEARCH   700  1200     1
42  3000       20   JONES   RESEARCH  2001  3000     4
43  3000       20   JONES   RESEARCH  2001  3000     4
44  2975       20    KING   RESEARCH  2001  3000     4

 


  • 위 테이블 결합 작업을 개별적으로 분리하여서 처리
R Programming
%%R

withmooc_1 <- base::merge( emp        , emp,  by.x=c("mgr")     ,by.y=c("empno") ,all=F)[,c("sal.x","deptno.x","ename.y")]
withmooc_2 <- base::merge( withmooc_1 , dept, by.x=c("deptno.x"),by.y=c("deptno"),all=F )[,c("sal.x","deptno.x","ename.y","dname")]
withmooc_3 <- base::merge( withmooc_2 , salgrade, by = character(),all.x=TRUE, all.y=TRUE )[,c("sal.x","deptno.x","ename.y","dname","losal","hisal","grade")]
withmooc_4 <- subset( withmooc_3, between(sal.x, losal, hisal) )

withmooc_4[with(withmooc_4, order(deptno.x)), ][1:7,]

 

Results
   sal.x deptno.x ename.y      dname losal hisal grade
15  1300       10   CLARK ACCOUNTING  1201  1400     2
40  2450       10    KING ACCOUNTING  2001  3000     4
6   1100       20   SCOTT   RESEARCH   700  1200     1
7    800       20    FORD   RESEARCH   700  1200     1
42  3000       20   JONES   RESEARCH  2001  3000     4
43  3000       20   JONES   RESEARCH  2001  3000     4
44  2975       20    KING   RESEARCH  2001  3000     4

 


4. R Dplyr Package

emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(inner_join)하여 부서명을 추가한 후에 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

 

R Programming
%%R

emp %>% 
  dplyr::inner_join( emp     , by = c('mgr'      = 'empno'))  %>%
  dplyr::inner_join( dept    , by = c('deptno.x' = 'deptno')) %>%
  dplyr::full_join( salgrade , by = character())              %>%
  dplyr::filter(between(sal.x, losal, hisal ))                %>%
  dplyr::select(ends_with('.x'),dname,grade )                 %>%
  dplyr::arrange(deptno.x)                                    %>%
  head(7)

 

Results
# A tibble: 7 x 8
  ename.x job.x   hiredate.x sal.x comm.x deptno.x dname      grade
  <chr>   <chr>   <date>     <dbl>  <dbl>    <dbl> <chr>      <dbl>
1 CLARK   MANAGER 1981-01-09  2450     NA       10 ACCOUNTING     4
2 MILLER  CLERK   1982-01-23  1300     NA       10 ACCOUNTING     2
3 SMITH   CLERK   1980-12-17   800     NA       20 RESEARCH       1
4 JONES   MANAGER 1981-04-02  2975     NA       20 RESEARCH       4
5 SCOTT   ANALYST 1982-12-09  3000     NA       20 RESEARCH       4
6 ADAMS   CLERK   1983-01-12  1100     NA       20 RESEARCH       1
7 FORD    ANALYST 1981-12-03  3000     NA       20 RESEARCH       4

 


5. R sqldf Package

emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

 

R Programming
%%R

sqldf(" select d.deptno, e.ename, e.job, d.dname, m.ename, s.grade
        from emp e, emp m, dept d, salgrade s
        where e.deptno = d.deptno 
          and e.sal between s.losal and s.hisal 
          and e.mgr    = m.empno 
        order by e.deptno") %>% head(7)

 

Results
  deptno  ename     job      dname ename grade
1     10  CLARK MANAGER ACCOUNTING  KING     4
2     10 MILLER   CLERK ACCOUNTING CLARK     2
3     20  SMITH   CLERK   RESEARCH  FORD     1
4     20  JONES MANAGER   RESEARCH  KING     4
5     20  SCOTT ANALYST   RESEARCH JONES     4
6     20  ADAMS   CLERK   RESEARCH SCOTT     1
7     20   FORD ANALYST   RESEARCH JONES     4

 


6. Python pandasql Package

emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

 

Python Programming
ps.sqldf(" select d.deptno, e.ename, e.job, d.dname, m.ename, s.grade \
           from emp e \
                inner join emp m           \
                      ON e.mgr=m.empno     \
                inner join dept d          \
                      ON e.deptno=d.deptno \
                inner join salgrade s      \
                      ON e.sal between s.losal and s.hisal \
          order by e.deptno").head(7)

 


Results
deptno ename job dname ename grade
10 CLARK MANAGER ACCOUNTING KING 4
10 MILLER CLERK ACCOUNTING CLARK 2
20 SMITH CLERK RESEARCH FORD 1
20 JONES MANAGER RESEARCH KING 4
20 SCOTT ANALYST RESEARCH JONES 4
20 ADAMS CLERK RESEARCH SCOTT 1
20 FORD ANALYST RESEARCH JONES 4

 


7. R data.table Package

  • 1차 : emp 테이블을 양쪽에 지정하여서 self Join을 수행하여 관리자 이름을 추가
  • 2차 : dept 테이블을 결합(DT 문법)하여 부서명
  • 3차 : salgrade 테이블을 비등가조인(NON-EQUI JOIN) 방식으로 결합하여 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력.
R Programming
%%R

DT          <- data.table(emp)
dept_DT     <- data.table(dept)
salgrade_DT <- data.table(salgrade)

DT[DT, nomatch=NULL, 
       on = .( mgr=empno ), 
       .(sal,deptno,ename,job,i.ename)][dept_DT, nomatch=NULL, 
                                                 on = .( deptno=deptno ), 
                                                 .(sal,deptno,ename,job,i.ename,dname)][salgrade_DT, nomatch=NULL,
                                                                                                     on = .( sal >= losal , sal <= hisal),
                                                                                                     .(x.sal,deptno,ename,job,dname,i.ename,grade)][order(deptno)]

 

Results
    x.sal deptno  ename      job      dname i.ename grade
 1:  1300     10 MILLER    CLERK ACCOUNTING   CLARK     2
 2:  2450     10  CLARK  MANAGER ACCOUNTING    KING     4
 3:  1100     20  ADAMS    CLERK   RESEARCH   SCOTT     1
 4:   800     20  SMITH    CLERK   RESEARCH    FORD     1
 5:  3000     20  SCOTT  ANALYST   RESEARCH   JONES     4
 6:  3000     20   FORD  ANALYST   RESEARCH   JONES     4
 7:  2975     20  JONES  MANAGER   RESEARCH    KING     4
 8:   950     30  JAMES    CLERK      SALES   BLAKE     1
 9:  1250     30   WARD SALESMAN      SALES   BLAKE     2
10:  1250     30 MARTIN SALESMAN      SALES   BLAKE     2
11:  1600     30  ALLEN SALESMAN      SALES   BLAKE     3
12:  1500     30 TURNER SALESMAN      SALES   BLAKE     3
13:  2850     30  BLAKE  MANAGER      SALES    KING     4

 


  • 위 테이블 결합 작업을 개별적으로 분리하여서 처리
R Programming
%%R

DT_1 = DT[DT           , nomatch=NULL, on = .( mgr=empno )                 , .(sal,deptno,ename,job,i.ename)]
DT_2 = DT_1[dept_DT    , nomatch=NULL, on = .( deptno=deptno )             , .(sal,deptno,ename,job,i.ename,dname)]
DT_3 = DT_2[salgrade_DT, nomatch=NULL, on = .( sal >= losal , sal <= hisal), .(x.sal,deptno,ename,job,dname,i.ename,grade)][order(deptno)]

DT_3[1:7,]

 

Results
   x.sal deptno  ename     job      dname i.ename grade
1:  1300     10 MILLER   CLERK ACCOUNTING   CLARK     2
2:  2450     10  CLARK MANAGER ACCOUNTING    KING     4
3:  1100     20  ADAMS   CLERK   RESEARCH   SCOTT     1
4:   800     20  SMITH   CLERK   RESEARCH    FORD     1
5:  3000     20  SCOTT ANALYST   RESEARCH   JONES     4
6:  3000     20   FORD ANALYST   RESEARCH   JONES     4
7:  2975     20  JONES MANAGER   RESEARCH    KING     4

 


8. SAS Proc SQL

emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

 

SAS Programming
%%SAS sas

PROC SQL;
  CREATE TABLE STATSAS_1 AS
    select E.EMPNO,E.MGR,
           d.deptno, e.ename, e.job, d.dname, m.ename AS MGR_ENAME, s.grade 
    from   emp e, emp m, dept d, salgrade s 
    where  e.mgr    = m.empno 
      AND  e.deptno = d.deptno 
      and  e.sal between s.losal and s.hisal
    order 
       by e.deptno;
QUIT;
PROC SORT DATA=STATSAS_1;
      BY EMPNO;
RUN;

PROC PRINT data=STATSAS_1(obs=3);RUN;

 


Results
empno mgr deptno ename job dname MGR_ENAME grade
7369 7902 20 SMITH CLERK RESEARCH FORD 1
7499 7698 30 ALLEN SALESMAN SALES BLAKE 3
7521 7698 30 WARD SALESMAN SALES BLAKE 2

 


9. SAS Data Step

 

SAS Programming
%%SAS sas


PROC SORT DATA=emp OUT=EMP_1(RENAME=( mgr = MGR_EMPNO ) );
     BY mgr;
RUN;

PROC SORT DATA=emp OUT=EMP_2(RENAME=( empno = MGR_EMPNO ename = mrg_ename )
                              KEEP=empno ename) NODUPKEY;
     BY empno;
RUN;

DATA STATSAS_2;
 MERGE EMP_1(IN=A) EMP_2(IN=B);
     BY MGR_EMPNO;
     IF A AND B;
RUN;

proc sort data=STATSAS_2 out=STATSAS_21;
     by deptno;
run;

proc sort data=dept out=dept_1;
     by deptno;
run;

DATA STATSAS_3; 
 merge STATSAS_21(in=a) dept_1(in=b);
     by deptno;
     if a;
RUN;

DATA STATSAS_4; 
 SET STATSAS_3;
     DO I=1 TO KOBS; 
        SET salgrade NOBS=KOBS POINT=I;
        IF  SAL >= losal AND SAL <= hisal THEN OUTPUT;
     END;
     DROP ename comm SAL hiredate LOC LOSAL HISAL;
RUN;
PROC SORT DATA=STATSAS_4;
      BY EMPNO;
RUN;


PROC PRINT data=STATSAS_4(obs=3);RUN;

 


Results
empno job MGR_EMPNO deptno mgr_ename dname grade
7369 CLERK 7902 20 FORD RESEARCH 1
7499 SALESMAN 7698 30 BLAKE SALES 3
7521 SALESMAN 7698 30 BLAKE SALES 2

 


10. Python Dfply Package

 

Python Programming
@pipe
def inner_join_merge(df, other, left_on,right_on,suffixes=['_x','_y']):

    joined = df.merge(other, how='inner', left_on=left_on,
                      right_on=right_on , suffixes=suffixes)
    return joined

emp.assign(foo=1)                                                           >> \
  inner_join_merge( emp  , left_on  = ["mgr"]     , right_on = ["empno"])   >> \
  inner_join_merge( dept , left_on  = ["deptno_x"], right_on = ["deptno"])  >> \
  full_join( salgrade.assign(foo=1), by='foo' )                             >> \
  filter_by( X.sal_x.between(X.losal, X.hisal ))                            >> \
  select(ends_with('_x'),X.ename_y, X.dname, X.grade )                      >> \
  arrange(X.deptno_x)                                                       >> \
  select(~X.job_x, ~X.hiredate_x, ~X.comm_x)                                >> \
  head()

 


Results
empno_x ename_x mgr_x sal_x deptno_x ename_y dname grade
7782 CLARK 7839.0 2450 10 KING ACCOUNTING 4
7934 MILLER 7782.0 1300 10 CLARK ACCOUNTING 2
7369 SMITH 7902.0 800 20 FORD RESEARCH 1
7566 JONES 7839.0 2975 20 KING RESEARCH 4
7788 SCOTT 7566.0 3000 20 JONES RESEARCH 4

 


 

[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 전처리 방법 리스트

반응형

댓글