본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 추출] 비등가 FULL 조인(NON-EQUI FULL JOIN)을 통한 데이터 결합 - 103

by 기서무나구물 2022. 10. 2.

포스팅 목차

     

    103. Display employee name, his job, his dept name, his manager name, his grade and make out of an under department wise.

     

    * 직원이름, 직무, 부서명, 관리자명, 직원 급여 등급을 부서 번호를 기준으로 정렬하여서 출력하시오.


    • Oracle : 비등가조인(NON-EQUI JOIN)
    • 파이썬 Pandas : pd.merge, assign(), query()
    • R 프로그래밍 : base::merge(), subset()
    • R Dplyr Package : dplyr::inner_join(), dplyr::full_join(), dplyr::filter()
    • R sqldf Package : 비등가조인(NON-EQUI JOIN)
    • Python pandasql Package : 비등가조인(NON-EQUI JOIN)
    • R data.table Package : 비등가조인(NON-EQUI JOIN) 방식 데이터 결합
    • SAS Proc SQL : 비등가조인(NON-EQUI JOIN)
    • SAS Data Step : Merge 구문, IF 조건문, CROSS JOIN (카테시안 곱) 구현
    • Python Dfply Package : @pipe & Def 사용자 정의 함수(inner_join_merge), full_join()
    • 파이썬 Base 프로그래밍 :

     


    1. Oracle(오라클)

    emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

     

    Oracle Programming
    select d.deptno, e.ename, e.job, d.dname, m.ename, s.grade 
    from   emp e, emp m, dept d, salgrade s 
    where  e.deptno = d.deptno 
      and  e.sal between s.losal and s.hisal 
      and  e.mgr    = m.empno 
    order 
       by  e.deptno;

     


    2. Python Pandas(파이썬)

    emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

     

    Python Programming
    pd.merge( pd.merge( pd.merge(emp, emp, how='inner', left_on=['mgr'], right_on=['empno'])[['sal_x',"deptno_x","ename_y"]],
                        dept, how='inner', left_on=['deptno_x'], right_on=['deptno'])[['sal_x',"deptno_x","ename_y","dname"]].assign(foo=1),
              salgrade.assign(foo=1), on ='foo').query("losal <= sal_x <= hisal ")[["sal_x","deptno_x","ename_y","dname","losal","hisal"]].sort_values(by=['deptno_x'])

     


    Results
    sal_x deptno_x ename_y dname losal hisal
    2450 10 KING ACCOUNTING 2001 3000
    1300 10 CLARK ACCOUNTING 1201 1400
    800 20 FORD RESEARCH 700 1200
    2975 20 KING RESEARCH 2001 3000
    3000 20 JONES RESEARCH 2001 3000
    3000 20 JONES RESEARCH 2001 3000
    1100 20 SCOTT RESEARCH 700 1200
    1600 30 BLAKE SALES 1401 2000
    1250 30 BLAKE SALES 1201 1400
    1250 30 BLAKE SALES 1201 1400
    1500 30 BLAKE SALES 1401 2000
    950 30 BLAKE SALES 700 1200
    2850 30 KING SALES 2001 3000

     


    • 위 테이블 결합 작업을 개별적으로 분리하여서 처리
    Python Programming
    withmooc_1 = pd.merge(emp, emp, how='inner', left_on=['mgr'], right_on=['empno'])[['sal_x',"deptno_x","ename_y"]]
    withmooc_2 = pd.merge( withmooc_1,dept, how='inner', left_on=['deptno_x'], right_on=['deptno'])[['sal_x',"deptno_x","ename_y","dname"]]
    withmooc_3 = pd.merge( withmooc_2.assign(foo=1),salgrade.assign(foo=1), on ='foo').query("losal <= sal_x <= hisal ")
    
    withmooc_3[["sal_x","deptno_x","ename_y","dname","losal","hisal"]].sort_values(by=['deptno_x']).head()

     

    Results
    sal_x deptno_x ename_y dname losal hisal
    2450 10 KING ACCOUNTING 2001 3000
    1300 10 CLARK ACCOUNTING 1201 1400
    800 20 FORD RESEARCH 700 1200
    2975 20 KING RESEARCH 2001 3000
    3000 20 JONES RESEARCH 2001 3000

     


    3. R Programming (R Package)

    emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후에 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다. with 구문을 사용하여서 테이블 명 지정하지 않고 변수명을 바로 지정할 수 있다.

     

    R Programming
    %%R
    withmooc <- subset( base::merge( ( base::merge( base::merge(emp,emp,by.x=c("mgr"),by.y=c("empno"),all=F)[,c("sal.x","deptno.x","ename.y")] ,
                                                    dept, by.x=c("deptno.x"),by.y=c("deptno"),all=F )[,c("sal.x","deptno.x","ename.y","dname")]
                                      ),
                                      salgrade, by = character(),all.x=TRUE, all.y=TRUE )[,c("sal.x","deptno.x","ename.y","dname","losal","hisal","grade")],
                        between(sal.x, losal, hisal) )
    
    # withmooc[order(withmooc$deptno.x), ]
    
    withmooc[with(withmooc, order(deptno.x)), ][1:7, ] # with 구문을 사용하여서 테이블 명 지정하지 않고 변수명을 바로 지정 할 수 있다.

     

    Results
       sal.x deptno.x ename.y      dname losal hisal grade
    15  1300       10   CLARK ACCOUNTING  1201  1400     2
    40  2450       10    KING ACCOUNTING  2001  3000     4
    6   1100       20   SCOTT   RESEARCH   700  1200     1
    7    800       20    FORD   RESEARCH   700  1200     1
    42  3000       20   JONES   RESEARCH  2001  3000     4
    43  3000       20   JONES   RESEARCH  2001  3000     4
    44  2975       20    KING   RESEARCH  2001  3000     4

     


    • 위 테이블 결합 작업을 개별적으로 분리하여서 처리
    R Programming
    %%R
    
    withmooc_1 <- base::merge( emp        , emp,  by.x=c("mgr")     ,by.y=c("empno") ,all=F)[,c("sal.x","deptno.x","ename.y")]
    withmooc_2 <- base::merge( withmooc_1 , dept, by.x=c("deptno.x"),by.y=c("deptno"),all=F )[,c("sal.x","deptno.x","ename.y","dname")]
    withmooc_3 <- base::merge( withmooc_2 , salgrade, by = character(),all.x=TRUE, all.y=TRUE )[,c("sal.x","deptno.x","ename.y","dname","losal","hisal","grade")]
    withmooc_4 <- subset( withmooc_3, between(sal.x, losal, hisal) )
    
    withmooc_4[with(withmooc_4, order(deptno.x)), ][1:7,]

     

    Results
       sal.x deptno.x ename.y      dname losal hisal grade
    15  1300       10   CLARK ACCOUNTING  1201  1400     2
    40  2450       10    KING ACCOUNTING  2001  3000     4
    6   1100       20   SCOTT   RESEARCH   700  1200     1
    7    800       20    FORD   RESEARCH   700  1200     1
    42  3000       20   JONES   RESEARCH  2001  3000     4
    43  3000       20   JONES   RESEARCH  2001  3000     4
    44  2975       20    KING   RESEARCH  2001  3000     4

     


    4. R Dplyr Package

    emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(inner_join)하여 부서명을 추가한 후에 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

     

    R Programming
    %%R
    
    emp %>% 
      dplyr::inner_join( emp     , by = c('mgr'      = 'empno'))  %>%
      dplyr::inner_join( dept    , by = c('deptno.x' = 'deptno')) %>%
      dplyr::full_join( salgrade , by = character())              %>%
      dplyr::filter(between(sal.x, losal, hisal ))                %>%
      dplyr::select(ends_with('.x'),dname,grade )                 %>%
      dplyr::arrange(deptno.x)                                    %>%
      head(7)

     

    Results
    # A tibble: 7 x 8
      ename.x job.x   hiredate.x sal.x comm.x deptno.x dname      grade
      <chr>   <chr>   <date>     <dbl>  <dbl>    <dbl> <chr>      <dbl>
    1 CLARK   MANAGER 1981-01-09  2450     NA       10 ACCOUNTING     4
    2 MILLER  CLERK   1982-01-23  1300     NA       10 ACCOUNTING     2
    3 SMITH   CLERK   1980-12-17   800     NA       20 RESEARCH       1
    4 JONES   MANAGER 1981-04-02  2975     NA       20 RESEARCH       4
    5 SCOTT   ANALYST 1982-12-09  3000     NA       20 RESEARCH       4
    6 ADAMS   CLERK   1983-01-12  1100     NA       20 RESEARCH       1
    7 FORD    ANALYST 1981-12-03  3000     NA       20 RESEARCH       4

     


    5. R sqldf Package

    emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

     

    R Programming
    %%R
    
    sqldf(" select d.deptno, e.ename, e.job, d.dname, m.ename, s.grade
            from emp e, emp m, dept d, salgrade s
            where e.deptno = d.deptno 
              and e.sal between s.losal and s.hisal 
              and e.mgr    = m.empno 
            order by e.deptno") %>% head(7)

     

    Results
      deptno  ename     job      dname ename grade
    1     10  CLARK MANAGER ACCOUNTING  KING     4
    2     10 MILLER   CLERK ACCOUNTING CLARK     2
    3     20  SMITH   CLERK   RESEARCH  FORD     1
    4     20  JONES MANAGER   RESEARCH  KING     4
    5     20  SCOTT ANALYST   RESEARCH JONES     4
    6     20  ADAMS   CLERK   RESEARCH SCOTT     1
    7     20   FORD ANALYST   RESEARCH JONES     4

     


    6. Python pandasql Package

    emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

     

    Python Programming
    ps.sqldf(" select d.deptno, e.ename, e.job, d.dname, m.ename, s.grade \
               from emp e \
                    inner join emp m           \
                          ON e.mgr=m.empno     \
                    inner join dept d          \
                          ON e.deptno=d.deptno \
                    inner join salgrade s      \
                          ON e.sal between s.losal and s.hisal \
              order by e.deptno").head(7)

     


    Results
    deptno ename job dname ename grade
    10 CLARK MANAGER ACCOUNTING KING 4
    10 MILLER CLERK ACCOUNTING CLARK 2
    20 SMITH CLERK RESEARCH FORD 1
    20 JONES MANAGER RESEARCH KING 4
    20 SCOTT ANALYST RESEARCH JONES 4
    20 ADAMS CLERK RESEARCH SCOTT 1
    20 FORD ANALYST RESEARCH JONES 4

     


    7. R data.table Package

    • 1차 : emp 테이블을 양쪽에 지정하여서 self Join을 수행하여 관리자 이름을 추가
    • 2차 : dept 테이블을 결합(DT 문법)하여 부서명
    • 3차 : salgrade 테이블을 비등가조인(NON-EQUI JOIN) 방식으로 결합하여 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력.
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    salgrade_DT <- data.table(salgrade)
    
    DT[DT, nomatch=NULL, 
           on = .( mgr=empno ), 
           .(sal,deptno,ename,job,i.ename)][dept_DT, nomatch=NULL, 
                                                     on = .( deptno=deptno ), 
                                                     .(sal,deptno,ename,job,i.ename,dname)][salgrade_DT, nomatch=NULL,
                                                                                                         on = .( sal >= losal , sal <= hisal),
                                                                                                         .(x.sal,deptno,ename,job,dname,i.ename,grade)][order(deptno)]

     

    Results
        x.sal deptno  ename      job      dname i.ename grade
     1:  1300     10 MILLER    CLERK ACCOUNTING   CLARK     2
     2:  2450     10  CLARK  MANAGER ACCOUNTING    KING     4
     3:  1100     20  ADAMS    CLERK   RESEARCH   SCOTT     1
     4:   800     20  SMITH    CLERK   RESEARCH    FORD     1
     5:  3000     20  SCOTT  ANALYST   RESEARCH   JONES     4
     6:  3000     20   FORD  ANALYST   RESEARCH   JONES     4
     7:  2975     20  JONES  MANAGER   RESEARCH    KING     4
     8:   950     30  JAMES    CLERK      SALES   BLAKE     1
     9:  1250     30   WARD SALESMAN      SALES   BLAKE     2
    10:  1250     30 MARTIN SALESMAN      SALES   BLAKE     2
    11:  1600     30  ALLEN SALESMAN      SALES   BLAKE     3
    12:  1500     30 TURNER SALESMAN      SALES   BLAKE     3
    13:  2850     30  BLAKE  MANAGER      SALES    KING     4

     


    • 위 테이블 결합 작업을 개별적으로 분리하여서 처리
    R Programming
    %%R
    
    DT_1 = DT[DT           , nomatch=NULL, on = .( mgr=empno )                 , .(sal,deptno,ename,job,i.ename)]
    DT_2 = DT_1[dept_DT    , nomatch=NULL, on = .( deptno=deptno )             , .(sal,deptno,ename,job,i.ename,dname)]
    DT_3 = DT_2[salgrade_DT, nomatch=NULL, on = .( sal >= losal , sal <= hisal), .(x.sal,deptno,ename,job,dname,i.ename,grade)][order(deptno)]
    
    DT_3[1:7,]

     

    Results
       x.sal deptno  ename     job      dname i.ename grade
    1:  1300     10 MILLER   CLERK ACCOUNTING   CLARK     2
    2:  2450     10  CLARK MANAGER ACCOUNTING    KING     4
    3:  1100     20  ADAMS   CLERK   RESEARCH   SCOTT     1
    4:   800     20  SMITH   CLERK   RESEARCH    FORD     1
    5:  3000     20  SCOTT ANALYST   RESEARCH   JONES     4
    6:  3000     20   FORD ANALYST   RESEARCH   JONES     4
    7:  2975     20  JONES MANAGER   RESEARCH    KING     4

     


    8. SAS Proc SQL

    emp 테이블을 self Join하여서 관리자 이름을 추가하고, dept 테이블을 결합(merge)하여 부서명을 추가한 후 salgrade 테이블을 비등가 조인(NON-EQUI JOIN)을 수행하여서 해당 급여를 포함하는 범위의 급여 등급을 선택하여 출력한다.

     

    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        select E.EMPNO,E.MGR,
               d.deptno, e.ename, e.job, d.dname, m.ename AS MGR_ENAME, s.grade 
        from   emp e, emp m, dept d, salgrade s 
        where  e.mgr    = m.empno 
          AND  e.deptno = d.deptno 
          and  e.sal between s.losal and s.hisal
        order 
           by e.deptno;
    QUIT;
    PROC SORT DATA=STATSAS_1;
          BY EMPNO;
    RUN;
    
    PROC PRINT data=STATSAS_1(obs=3);RUN;

     


    Results
    empno mgr deptno ename job dname MGR_ENAME grade
    7369 7902 20 SMITH CLERK RESEARCH FORD 1
    7499 7698 30 ALLEN SALESMAN SALES BLAKE 3
    7521 7698 30 WARD SALESMAN SALES BLAKE 2

     


    9. SAS Data Step

     

    SAS Programming
    %%SAS sas
    
    
    PROC SORT DATA=emp OUT=EMP_1(RENAME=( mgr = MGR_EMPNO ) );
         BY mgr;
    RUN;
    
    PROC SORT DATA=emp OUT=EMP_2(RENAME=( empno = MGR_EMPNO ename = mrg_ename )
                                  KEEP=empno ename) NODUPKEY;
         BY empno;
    RUN;
    
    DATA STATSAS_2;
     MERGE EMP_1(IN=A) EMP_2(IN=B);
         BY MGR_EMPNO;
         IF A AND B;
    RUN;
    
    proc sort data=STATSAS_2 out=STATSAS_21;
         by deptno;
    run;
    
    proc sort data=dept out=dept_1;
         by deptno;
    run;
    
    DATA STATSAS_3; 
     merge STATSAS_21(in=a) dept_1(in=b);
         by deptno;
         if a;
    RUN;
    
    DATA STATSAS_4; 
     SET STATSAS_3;
         DO I=1 TO KOBS; 
            SET salgrade NOBS=KOBS POINT=I;
            IF  SAL >= losal AND SAL <= hisal THEN OUTPUT;
         END;
         DROP ename comm SAL hiredate LOC LOSAL HISAL;
    RUN;
    PROC SORT DATA=STATSAS_4;
          BY EMPNO;
    RUN;
    
    
    PROC PRINT data=STATSAS_4(obs=3);RUN;

     


    Results
    empno job MGR_EMPNO deptno mgr_ename dname grade
    7369 CLERK 7902 20 FORD RESEARCH 1
    7499 SALESMAN 7698 30 BLAKE SALES 3
    7521 SALESMAN 7698 30 BLAKE SALES 2

     


    10. Python Dfply Package

     

    Python Programming
    @pipe
    def inner_join_merge(df, other, left_on,right_on,suffixes=['_x','_y']):
    
        joined = df.merge(other, how='inner', left_on=left_on,
                          right_on=right_on , suffixes=suffixes)
        return joined
    
    emp.assign(foo=1)                                                           >> \
      inner_join_merge( emp  , left_on  = ["mgr"]     , right_on = ["empno"])   >> \
      inner_join_merge( dept , left_on  = ["deptno_x"], right_on = ["deptno"])  >> \
      full_join( salgrade.assign(foo=1), by='foo' )                             >> \
      filter_by( X.sal_x.between(X.losal, X.hisal ))                            >> \
      select(ends_with('_x'),X.ename_y, X.dname, X.grade )                      >> \
      arrange(X.deptno_x)                                                       >> \
      select(~X.job_x, ~X.hiredate_x, ~X.comm_x)                                >> \
      head()

     


    Results
    empno_x ename_x mgr_x sal_x deptno_x ename_y dname grade
    7782 CLARK 7839.0 2450 10 KING ACCOUNTING 4
    7934 MILLER 7782.0 1300 10 CLARK ACCOUNTING 2
    7369 SMITH 7902.0 800 20 FORD RESEARCH 1
    7566 JONES 7839.0 2975 20 KING RESEARCH 4
    7788 SCOTT 7566.0 3000 20 JONES RESEARCH 4

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 전처리 방법 리스트

    반응형

    댓글