본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

LEAD 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table]

by 기서무나구물 2021. 11. 26.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ LEAD Oracle Function ]

     


    LEAD함수는 분석 함수이다. 이 함수는 Self 조인 없이 동시에 테이블의 하나 이상의 행에 Access를 실행한다. 쿼리로부터 반환되는 행의 Series나 커서의 위치를 지정하면, LEAD함수는 그 위치보다 이후에 있는 주어진 물리적 Offset의 행에 ACCESS 한다.

     

     

     


    1. Oracle(오라클)

     

    개별 직무 내에서 바로 직후에 입사한 직원의 급여 정보를 반환한다.

     

    Oracle Programming
    SELECT hiredate, ename, sal,job,
           LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
    FROM   emp
    ORDER  BY job,hiredate

     

    Results
    HIREDATE	ENAME	SAL	JOB		PREV_SAL
    --------------------------------------------------------------
    03-DEC-81	FORD	3000	ANALYST		3000
    19-APR-87	SCOTT	3000	ANALYST		0
    17-DEC-80	SMITH	800	CLERK		950
    03-DEC-81	JAMES	950	CLERK		1300
    23-JAN-82	MILLER	1300	CLERK		1100
    23-MAY-87	ADAMS	1100	CLERK		0
    02-APR-81	JONES	2975	MANAGER		2850
    01-MAY-81	BLAKE	2850	MANAGER		2450
    09-JUN-81	CLARK	2450	MANAGER		0
    17-NOV-81	KING	5000	PRESIDENT	0
    20-FEB-81	ALLEN	1600	SALESMAN	1250
    22-FEB-81	WARD	1250	SALESMAN	1500
    08-SEP-81	TURNER	1500	SALESMAN	1250
    28-SEP-81	MARTIN	1250	SALESMAN	0

     


    2. Python Pandas(파이썬)

     

    Python Programming
    withmooc = emp.copy()
    
    withmooc['sal_lag'] = emp.sort_values(by='hiredate').groupby('job')['sal'].shift(-1,fill_value = 0)
    
    withmooc.sort_values(['job','hiredate']).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_lag
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	3000
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	0
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	950
    11	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	1300
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	1100
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	0
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	2850
    5	7698	BLAKE	MANAGER		7839.0	1981/03/01	2850	NaN	30	2975
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	0
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	0

     

     


    3. R Programming (R Package)

     

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    first_dat <- plyr::ddply(emp,.(job),transform,lead_value = dplyr::lead(sal, order_by = hiredate,default=0 ))
    
    first_dat[order(first_dat$job,first_dat$hiredate),]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno lead_value
    2   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       3000
    1   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20          0
    3   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        950
    5   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       1300
    6   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       1100
    4   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20          0
    9   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       2850
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2975
    7   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20          0
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          0
    11  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30       1250
    12  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30       1500
    14  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30       1250
    13  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30          0

     


     

    R Programming
    
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    first_dat <- plyr::ddply(emp[order(emp$hiredate),] ,.(job),transform,lead_value = dplyr::lead(sal,default=0))
    
    first_dat[order(first_dat$job,first_dat$hiredate),]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno lead_value
    1   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       3000
    2   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20          0
    3   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        950
    4   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       1300
    5   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       1100
    6   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20          0
    7   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       2850
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2975
    9   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20          0
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          0
    11  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30       1250
    12  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30       1500
    13  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30       1250
    14  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30          0

     


    shift 함수

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    first_dat <- plyr::ddply(emp[order(emp$hiredate),] ,.(job),transform,lead_value = shift(sal,-1,fill=0))
    
    first_dat[order(first_dat$job,first_dat$hiredate),]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno lead_value
    1   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       3000
    2   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20          0
    3   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        950
    4   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       1300
    5   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       1100
    6   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20          0
    7   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       2850
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2975
    9   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20          0
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          0
    11  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30       1250
    12  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30       1500
    13  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30       1250
    14  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30          0

     


     

    R Programming
    %%R
    
    withmooc<- emp[order(emp$hiredate),]
    
    withmooc['lead_value'] = with(withmooc , ave(x = sal, job, FUN =function(x) { dplyr::lead(x,default=0) }))
    
    withmooc[order(withmooc$job,withmooc$hiredate),]

     

    Results
    # A tibble: 14 x 9
       empno ename  job         mgr hiredate     sal  comm deptno lead_value
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>      <dbl>
     1  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20       3000
     2  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20          0
     3  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20        950
     4  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30       1300
     5  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10       1100
     6  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20          0
     7  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10       2850
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30       2975
     9  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20          0
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10          0
    11  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30       1250
    12  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30       1500
    13  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30       1250
    14  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30          0

     


    Purrr 패키지 : 리스트 데이터로 처리

     

    R Programming
    %%R
    
    withmooc <- emp[order(emp$job,emp$hiredate),]
    
    lead_dat <- as.data.frame(unlist(purrr::map( split(withmooc,withmooc$job), ~lead(.x$sal,default=0) )))
    
    colnames(lead_dat) <- c('lead_value')
    
    withmooc <- cbind(withmooc,lead_dat)
    withmooc[order(withmooc$job,withmooc$hiredate),]

     

    Results
              empno  ename       job  mgr   hiredate  sal comm deptno lead_value
    ANALYST1   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       3000
    ANALYST2   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20          0
    CLERK1     7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        950
    CLERK2     7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       1300
    CLERK3     7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       1100
    CLERK4     7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20          0
    MANAGER1   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       2850
    MANAGER2   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2975
    MANAGER3   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20          0
    PRESIDENT  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          0
    SALESMAN1  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30       1250
    SALESMAN2  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30       1500
    SALESMAN3  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30       1250
    SALESMAN4  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30          0

     

     

     

     


    4. R Dplyr Package

     

    R Programming
    %%R
    
    emp %>%
        dplyr::group_by(job) %>%
        dplyr::mutate(sal_lag = lead(sal,n=1,default=0, order_by = hiredate)) %>%
        arrange(job,hiredate)

     

    Results
    # A tibble: 14 x 9
    # Groups:   job [5]
       empno ename  job         mgr hiredate     sal  comm deptno sal_lag
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>
     1  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20    3000
     2  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20       0
     3  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20     950
     4  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30    1300
     5  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10    1100
     6  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20       0
     7  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10    2850
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30    2975
     9  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20       0
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10       0
    11  7499 ALLEN  SALESMAN   7698 1981-02-20  1600   300     30    1250
    12  7521 WARD   SALESMAN   7698 1981-02-22  1250   500     30    1500
    13  7844 TURNER SALESMAN   7698 1981-09-08  1500     0     30    1250
    14  7654 MARTIN SALESMAN   7698 1981-09-28  1250  1400     30       0

     

     

     

     


    5. R sqldf Package

     

    R Programming
    %%R
    
    sqldf(" SELECT hiredate, ename, sal,job,
                   LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
            FROM emp
            ORDER BY job,hiredate ")

     

    Results
         hiredate  ename  sal       job prev_sal
    1  1981-12-03   FORD 3000   ANALYST     3000
    2  1982-12-09  SCOTT 3000   ANALYST        0
    3  1980-12-17  SMITH  800     CLERK      950
    4  1981-12-03  JAMES  950     CLERK     1300
    5  1982-01-23 MILLER 1300     CLERK     1100
    6  1983-01-12  ADAMS 1100     CLERK        0
    7  1981-01-09  CLARK 2450   MANAGER     2850
    8  1981-03-01  BLAKE 2850   MANAGER     2975
    9  1981-04-02  JONES 2975   MANAGER        0
    10 1981-11-17   KING 5000 PRESIDENT        0
    11 1981-02-20  ALLEN 1600  SALESMAN     1250
    12 1981-02-22   WARD 1250  SALESMAN     1500
    13 1981-09-08 TURNER 1500  SALESMAN     1250
    14 1981-09-28 MARTIN 1250  SALESMAN        0

     

     

     

     


    6. Python pandasql Package

     

    Python Programming
    ps.sqldf(" SELECT hiredate, ename, sal,job,     \
                      LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal     \
               FROM emp     \
               ORDER BY job,hiredate ")

     

    Results
    	hiredate	ename	sal	job		prev_sal
    0	1981/12/03	FORD	3000	ANALYST		3000
    1	1982/12/09	SCOTT	3000	ANALYST		0
    2	1980/12/17	SMITH	800	CLERK		950
    3	1981/12/03	JAMES	950	CLERK		1300
    4	1982/01/23	MILLER	1300	CLERK		1100
    5	1983/01/12	ADAMS	1100	CLERK		0
    6	1981/01/09	CLARK	2450	MANAGER		2850
    7	1981/03/01	BLAKE	2850	MANAGER		2975
    8	1981/04/02	JONES	2975	MANAGER		0
    9	1981/11/17	KING	5000	PRESIDENT	0
    10	1981/02/20	ALLEN	1600	SALESMAN	1250
    11	1981/02/22	WARD	1250	SALESMAN	1500
    12	1981/09/08	TURNER	1500	SALESMAN	1250
    13	1981/09/28	MARTIN	1250	SALESMAN	0

     

     

     

     


    7. R data.table Package

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, sal_lead := lead(sal,n=1,default=0, order_by = hiredate), by=job][order(job,hiredate),]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_lead
     1:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20     3000
     2:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20        0
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20      950
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30     1300
     5:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10     1100
     6:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20        0
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10     2850
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30     2975
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20        0
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        0
    11:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30     1250
    12:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30     1500
    13:  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30     1250
    14:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30        0

     


     

    R Programming
    %%R
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(hiredate) , lead.value := shift(sal,type='lead',fill=0), by=job][order(job,hiredate),]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno lead.value
     1:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       3000
     2:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20          0
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        950
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       1300
     5:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       1100
     6:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20          0
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       2850
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2975
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20          0
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          0
    11:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30       1250
    12:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30       1500
    13:  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30       1250
    14:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30          0

     


    참 고

    • [stackoverflow 참고] Create multiple lead variables in data.table [링크]
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(hiredate) , lead.value:=c(sal[-.1],NA), by=job][order(job,hiredate),]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno lead.value
     1:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       3000
     2:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20         NA
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        950
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       1300
     5:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       1100
     6:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20         NA
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       2850
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       2975
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20         NA
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10         NA
    11:  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30       1250
    12:  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30       1500
    13:  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30       1250
    14:  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30         NA

     


    8. Python DuckDB의 SQL

     

    Python Programming
    %%sql
      SELECT hiredate, ename, sal,job,
             LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
      FROM   emp
      ORDER  BY job,hiredate
      LIMIT  6

     

    Python Programming
    duckdb.sql(" SELECT hiredate, ename, sal,job,                                              \
                        LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal  \
                 FROM   emp                                                                   \
                 ORDER  BY job,hiredate                                                       \
                 LIMIT  6 ").df()

     

     

    Results
         hiredate   ename   sal      job  prev_sal
    0  1981/12/03    FORD  3000  ANALYST      3000
    1  1982/12/09   SCOTT  3000  ANALYST         0
    2  1980/12/17   SMITH   800    CLERK       950
    3  1981/12/03   JAMES   950    CLERK      1300
    4  1982/01/23  MILLER  1300    CLERK      1100
    5  1983/01/12   ADAMS  1100    CLERK         0

     

     


    a wonderful library in the middle of a shopping center in Seoul. (https://unsplash.com/photos/jpeTs-VfP68)

      --------------------------------------------  

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글