본문 바로가기
통계프로그램 비교 시리즈/오라클함수 비교(R & Python)

LAG 오라클 함수 [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table, DuckDB]

by 기서무나구물 2021. 11. 25.

포스팅 목차

    * 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크


    [ LAG Oracle Function ]

     


    LAG함수는 분석 함수이다. 이 함수는 Self 조인을 수행하지 않고 테이블에 존재하는 여러 개의 행에 접근하여 작업할 수 있다. 사용자가 지정한 횟수만큼 현재 기준 행의 앞에 위치한 행에 접근할 수 있다. 쿼리로부터 반환되는 행의 Series나 커서의 위치를 지정하면, LAG는 현재 기준 위치보다 이전에 지정한 행에 ACCESS를 실행한다.

     

     


    1. Oracle(오라클)

     

    다음 예제는 employees 테이블의 개별 직무 내에서 입사일을 기준으로 직전에 고용된 직원의 급여를 반환한다. 윈도우의 범위를 초과하여 반환되는 값이 없는 경우 0을 기본 값으로 설정한다.

     

    Oracle Programming
    SELECT hiredate, ename, job, sal,
           LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
    FROM   emp
    ORDER  BY job,hiredate

     

    Results
    HIREDATE	ENAME	JOB		SAL	PREV_SAL
    -------------------------------------------------------
    03-DEC-81	FORD	ANALYST		3000	0
    19-APR-87	SCOTT	ANALYST		3000	3000
    17-DEC-80	SMITH	CLERK		800	0
    03-DEC-81	JAMES	CLERK		950	800
    23-JAN-82	MILLER	CLERK		1300	950
    23-MAY-87	ADAMS	CLERK		1100	1300
    02-APR-81	JONES	MANAGER		2975	0
    01-MAY-81	BLAKE	MANAGER		2850	2975
    09-JUN-81	CLARK	MANAGER		2450	2850
    17-NOV-81	KING	PRESIDENT	5000	0
    20-FEB-81	ALLEN	SALESMAN	1600	0
    22-FEB-81	WARD	SALESMAN	1250	1600
    08-SEP-81	TURNER	SALESMAN	1500	1250
    28-SEP-81	MARTIN	SALESMAN	1250	1500

     


    2. Python Pandas(파이썬)

     

    다음 예제는 employees 테이블의 개별 직무내에서 입사일을 기준으로 직전에 고용된 직원의 급여를 반환한다. 윈도우의 범위를 초과하여 반환되는 값이 없는 경우 0을 기본 값으로 설정한다.

     

    Python Programming
    withmooc = emp.copy()
    
    withmooc['sal_lag'] = emp.sort_values(by='hiredate').groupby('job')['sal'].shift(1,fill_value = 0)
    
    withmooc.sort_values(['job','hiredate']).head(10)

     

    Results
    	empno	ename	job		mgr	hiredate	sal	comm	deptno	sal_lag
    12	7902	FORD	ANALYST		7566.0	1981/12/03	3000	NaN	20	0
    7	7788	SCOTT	ANALYST		7566.0	1982/12/09	3000	NaN	20	3000
    0	7369	SMITH	CLERK		7902.0	1980/12/17	800	NaN	20	0
    11	7900	JAMES	CLERK		7698.0	1981/12/03	950	NaN	30	800
    13	7934	MILLER	CLERK		7782.0	1982/01/23	1300	NaN	10	950
    10	7876	ADAMS	CLERK		7788.0	1983/01/12	1100	NaN	20	1300
    6	7782	CLARK	MANAGER		7839.0	1981/01/09	2450	NaN	10	0
    5	7698	BLAKE	MANAGER		7839.0	1981/03/01	2850	NaN	30	2450
    3	7566	JONES	MANAGER		7839.0	1981/04/02	2975	NaN	20	2850
    8	7839	KING	PRESIDENT	NaN	1981/11/17	5000	NaN	10	0​

     

     


    3. R Programming (R Package)

     

     

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    lag_dat <- plyr::ddply(emp,.(job), transform, lag_value = dplyr::lag(sal, order_by = hiredate, default=0 ))
    
    lag_dat[order(lag_dat$job,lag_dat$hiredate),][1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno lag_value
    2   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20         0
    1   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20      3000
    3   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20         0
    5   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       800
    6   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       950
    4   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      1300
    9   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10         0
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      2450
    7   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      2850
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10         0

     


     

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    lag_dat <- plyr::ddply(emp[order(emp$hiredate),] ,.(job),transform,first_value = dplyr::lag(sal, default=0))
    
    lag_dat[order(lag_dat$job,lag_dat$hiredate),][1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno first_value
    1   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20           0
    2   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20        3000
    3   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20           0
    4   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30         800
    5   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10         950
    6   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20        1300
    7   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10           0
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30        2450
    9   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20        2850
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10           0

     


    shift 함수

    R Programming
    %%R
    
    library(plyr)   # dplyr 보다 먼저 로딩
    library(dplyr)
    
    lag_dat <- plyr::ddply(emp[order(emp$hiredate),] ,.(job),transform,first_value = shift(sal))
    
    lag_dat[order(lag_dat$job,lag_dat$hiredate),][1:10, ]

     

    Results
       empno  ename       job  mgr   hiredate  sal comm deptno first_value
    1   7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20          NA
    2   7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20        3000
    3   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20          NA
    4   7900  JAMES     CLERK 7698 1981-12-03  950   NA     30         800
    5   7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10         950
    6   7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20        1300
    7   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10          NA
    8   7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30        2450
    9   7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20        2850
    10  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10          NA

     


    dplyr::lag()

    R Programming
    %%R
    
    withmooc<- emp[order(emp$hiredate),]
    
    withmooc['lag_value'] = with(withmooc , ave(x = sal, job, FUN =function(x) { dplyr::lag(x,default=0) }))
    
    withmooc[order(withmooc$job,withmooc$hiredate),][1:10, ]

     

    Results
    # A tibble: 10 x 9
       empno ename  job         mgr hiredate     sal  comm deptno lag_value
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>     <dbl>
     1  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20         0
     2  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20      3000
     3  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20         0
     4  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30       800
     5  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10       950
     6  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20      1300
     7  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10         0
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30      2450
     9  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20      2850
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10         0

     


    Purrr 패키지 : 리스트 데이터로 처리

    R Programming
    %%R
    
    withmooc <- emp[order(emp$hiredate),]
    
    lag_dat <- as.data.frame(unlist(purrr::map( split(withmooc,withmooc$job), ~lag(.x$sal) )))
    
    colnames(lag_dat) <- c('lag_value')
    
    cbind(withmooc,lag_dat)[1:10, ]

     

    Results
              empno  ename       job  mgr   hiredate  sal comm deptno lag_value
    ANALYST1   7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        NA
    ANALYST2   7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10      3000
    CLERK1     7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30        NA
    CLERK2     7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30       800
    CLERK3     7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30       950
    CLERK4     7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      1300
    MANAGER1   7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30        NA
    MANAGER2   7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30      2450
    MANAGER3   7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10      2850
    PRESIDENT  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30        NA

     

     


    4. R Dplyr Package

     

    dplyr::lag()

    R Programming
    %%R
    
    emp %>%
        dplyr::group_by(job) %>%
        dplyr::mutate(sal_lag = dplyr::lag(sal,n=1,default=0, order_by = hiredate)) %>%
        arrange(job,hiredate) %>%
        head(10)

     

    Results
    # A tibble: 10 x 9
    # Groups:   job [4]
       empno ename  job         mgr hiredate     sal  comm deptno sal_lag
       <dbl> <chr>  <chr>     <dbl> <date>     <dbl> <dbl>  <dbl>   <dbl>
     1  7902 FORD   ANALYST    7566 1981-12-03  3000    NA     20       0
     2  7788 SCOTT  ANALYST    7566 1982-12-09  3000    NA     20    3000
     3  7369 SMITH  CLERK      7902 1980-12-17   800    NA     20       0
     4  7900 JAMES  CLERK      7698 1981-12-03   950    NA     30     800
     5  7934 MILLER CLERK      7782 1982-01-23  1300    NA     10     950
     6  7876 ADAMS  CLERK      7788 1983-01-12  1100    NA     20    1300
     7  7782 CLARK  MANAGER    7839 1981-01-09  2450    NA     10       0
     8  7698 BLAKE  MANAGER    7839 1981-03-01  2850    NA     30    2450
     9  7566 JONES  MANAGER    7839 1981-04-02  2975    NA     20    2850
    10  7839 KING   PRESIDENT    NA 1981-11-17  5000    NA     10       0

     

     


    5. R sqldf Package

     

    lag()

    R Programming
    %%R
    
    sqldf(" SELECT hiredate, ename, job, sal,
                   LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
            FROM emp
            ORDER BY job,hiredate ")[1:10, ]

     

    Results
         hiredate  ename       job  sal prev_sal
    1  1981-12-03   FORD   ANALYST 3000        0
    2  1982-12-09  SCOTT   ANALYST 3000     3000
    3  1980-12-17  SMITH     CLERK  800        0
    4  1981-12-03  JAMES     CLERK  950      800
    5  1982-01-23 MILLER     CLERK 1300      950
    6  1983-01-12  ADAMS     CLERK 1100     1300
    7  1981-01-09  CLARK   MANAGER 2450        0
    8  1981-03-01  BLAKE   MANAGER 2850     2450
    9  1981-04-02  JONES   MANAGER 2975     2850
    10 1981-11-17   KING PRESIDENT 5000        0

     

     

     


    6. Python pandasql Package

     

    lag()

    Python Programming
    ps.sqldf(" SELECT hiredate, ename, sal,job,     \
                      LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal     \
               FROM emp     \
               ORDER BY job,hiredate ").head(10)

     

    Results
    	hiredate	ename	sal	job		prev_sal
    0	1981/12/03	FORD	3000	ANALYST		0
    1	1982/12/09	SCOTT	3000	ANALYST		3000
    2	1980/12/17	SMITH	800	CLERK		0
    3	1981/12/03	JAMES	950	CLERK		800
    4	1982/01/23	MILLER	1300	CLERK		950
    5	1983/01/12	ADAMS	1100	CLERK		1300
    6	1981/01/09	CLARK	2450	MANAGER		0
    7	1981/03/01	BLAKE	2850	MANAGER		2450
    8	1981/04/02	JONES	2975	MANAGER		2850
    9	1981/11/17	KING	5000	PRESIDENT	0

     

     

     


    7. R data.table Package

     

    dplyr::lag()

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[, sal_lag := dplyr::lag(sal,n=1,default=0, order_by = hiredate), by=job][order(job,hiredate),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno sal_lag
     1:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20       0
     2:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20    3000
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20       0
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30     800
     5:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10     950
     6:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20    1300
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10       0
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30    2450
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20    2850
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10       0

     


    shift()

    R Programming
    %%R
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[order(hiredate) , lag.value := shift(sal,type='lag',fill=0), by=job][order(job,hiredate),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno lag.value
     1:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20         0
     2:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20      3000
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20         0
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       800
     5:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       950
     6:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      1300
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10         0
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      2450
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      2850
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10         0

     


    [-.N]

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    # NA를 0으로 변경 가능.
    DT[order(hiredate) , lag.value:=c(NA, sal[-.N]), by=job][order(job,hiredate),][1:10, ]

     

    Results
        empno  ename       job  mgr   hiredate  sal comm deptno lag.value
     1:  7902   FORD   ANALYST 7566 1981-12-03 3000   NA     20        NA
     2:  7788  SCOTT   ANALYST 7566 1982-12-09 3000   NA     20      3000
     3:  7369  SMITH     CLERK 7902 1980-12-17  800   NA     20        NA
     4:  7900  JAMES     CLERK 7698 1981-12-03  950   NA     30       800
     5:  7934 MILLER     CLERK 7782 1982-01-23 1300   NA     10       950
     6:  7876  ADAMS     CLERK 7788 1983-01-12 1100   NA     20      1300
     7:  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10        NA
     8:  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30      2450
     9:  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20      2850
    10:  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10        NA

     


    8. Python DuckDB의 SQL

     

    Python Programming
    %%sql
      SELECT hiredate, ename, job, sal,
             LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
      FROM   emp
      ORDER  BY job,hiredate
      LIMIT  6

     

    Python Programming
    duckdb.sql(" SELECT hiredate, ename, job, sal,                                               \
                        LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal     \
                 FROM   emp                                                                      \
                 ORDER  BY job,hiredate                                                          \
                 LIMIT  6 ").df()

     

     

    Results
         hiredate   ename      job   sal  prev_sal
    0  1981/12/03    FORD  ANALYST  3000         0
    1  1982/12/09   SCOTT  ANALYST  3000      3000
    2  1980/12/17   SMITH    CLERK   800         0
    3  1981/12/03   JAMES    CLERK   950       800
    4  1982/01/23  MILLER    CLERK  1300       950
    5  1983/01/12   ADAMS    CLERK  1100      1300

     

     


    Dongdaemun-gu, South Korea (https://unsplash.com/photos/vw_XqUC1ORQ)

      --------------------------------------------  

     

    [Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크

     

    오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크

     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크
    반응형

    댓글