포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ LAG Oracle Function ]
LAG함수는 분석 함수이다. 이 함수는 Self 조인을 수행하지 않고 테이블에 존재하는 여러 개의 행에 접근하여 작업할 수 있다. 사용자가 지정한 횟수만큼 현재 기준 행의 앞에 위치한 행에 접근할 수 있다. 쿼리로부터 반환되는 행의 Series나 커서의 위치를 지정하면, LAG는 현재 기준 위치보다 이전에 지정한 행에 ACCESS를 실행한다.
- 함수 설명 : LAG 오라클 함수 링크
1. Oracle(오라클)
다음 예제는 employees 테이블의 개별 직무 내에서 입사일을 기준으로 직전에 고용된 직원의 급여를 반환한다. 윈도우의 범위를 초과하여 반환되는 값이 없는 경우 0을 기본 값으로 설정한다.
Oracle Programming |
SELECT hiredate, ename, job, sal,
LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
FROM emp
ORDER BY job,hiredate
Results |
HIREDATE ENAME JOB SAL PREV_SAL
-------------------------------------------------------
03-DEC-81 FORD ANALYST 3000 0
19-APR-87 SCOTT ANALYST 3000 3000
17-DEC-80 SMITH CLERK 800 0
03-DEC-81 JAMES CLERK 950 800
23-JAN-82 MILLER CLERK 1300 950
23-MAY-87 ADAMS CLERK 1100 1300
02-APR-81 JONES MANAGER 2975 0
01-MAY-81 BLAKE MANAGER 2850 2975
09-JUN-81 CLARK MANAGER 2450 2850
17-NOV-81 KING PRESIDENT 5000 0
20-FEB-81 ALLEN SALESMAN 1600 0
22-FEB-81 WARD SALESMAN 1250 1600
08-SEP-81 TURNER SALESMAN 1500 1250
28-SEP-81 MARTIN SALESMAN 1250 1500
2. Python Pandas(파이썬)
다음 예제는 employees 테이블의 개별 직무내에서 입사일을 기준으로 직전에 고용된 직원의 급여를 반환한다. 윈도우의 범위를 초과하여 반환되는 값이 없는 경우 0을 기본 값으로 설정한다.
Python Programming |
withmooc = emp.copy()
withmooc['sal_lag'] = emp.sort_values(by='hiredate').groupby('job')['sal'].shift(1,fill_value = 0)
withmooc.sort_values(['job','hiredate']).head(10)
Results |
empno ename job mgr hiredate sal comm deptno sal_lag
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 0
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 3000
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 0
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 800
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 950
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 1300
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 0
5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 2450
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 2850
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 0
3. R Programming (R Package)
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
lag_dat <- plyr::ddply(emp,.(job), transform, lag_value = dplyr::lag(sal, order_by = hiredate, default=0 ))
lag_dat[order(lag_dat$job,lag_dat$hiredate),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno lag_value
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 3000
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 0
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 800
6 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 950
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1300
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2450
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 2850
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
lag_dat <- plyr::ddply(emp[order(emp$hiredate),] ,.(job),transform,first_value = dplyr::lag(sal, default=0))
lag_dat[order(lag_dat$job,lag_dat$hiredate),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno first_value
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 3000
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 0
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 800
5 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 950
6 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1300
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2450
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 2850
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
shift 함수
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
lag_dat <- plyr::ddply(emp[order(emp$hiredate),] ,.(job),transform,first_value = shift(sal))
lag_dat[order(lag_dat$job,lag_dat$hiredate),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno first_value
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 NA
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 3000
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 NA
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 800
5 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 950
6 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1300
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 NA
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2450
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 2850
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 NA
dplyr::lag()
R Programming |
%%R
withmooc<- emp[order(emp$hiredate),]
withmooc['lag_value'] = with(withmooc , ave(x = sal, job, FUN =function(x) { dplyr::lag(x,default=0) }))
withmooc[order(withmooc$job,withmooc$hiredate),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno lag_value
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 3000
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 0
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 800
5 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 950
6 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1300
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2450
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 2850
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
Purrr 패키지 : 리스트 데이터로 처리
R Programming |
%%R
withmooc <- emp[order(emp$hiredate),]
lag_dat <- as.data.frame(unlist(purrr::map( split(withmooc,withmooc$job), ~lag(.x$sal) )))
colnames(lag_dat) <- c('lag_value')
cbind(withmooc,lag_dat)[1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno lag_value
ANALYST1 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 NA
ANALYST2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 3000
CLERK1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 NA
CLERK2 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 800
CLERK3 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 950
CLERK4 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1300
MANAGER1 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 NA
MANAGER2 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 2450
MANAGER3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 2850
PRESIDENT 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 NA
4. R Dplyr Package
dplyr::lag()
R Programming |
%%R
emp %>%
dplyr::group_by(job) %>%
dplyr::mutate(sal_lag = dplyr::lag(sal,n=1,default=0, order_by = hiredate)) %>%
arrange(job,hiredate) %>%
head(10)
Results |
# A tibble: 10 x 9
# Groups: job [4]
empno ename job mgr hiredate sal comm deptno sal_lag
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 3000
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 0
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 800
5 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 950
6 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1300
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2450
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 2850
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
5. R sqldf Package
lag()
R Programming |
%%R
sqldf(" SELECT hiredate, ename, job, sal,
LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
FROM emp
ORDER BY job,hiredate ")[1:10, ]
Results |
hiredate ename job sal prev_sal
1 1981-12-03 FORD ANALYST 3000 0
2 1982-12-09 SCOTT ANALYST 3000 3000
3 1980-12-17 SMITH CLERK 800 0
4 1981-12-03 JAMES CLERK 950 800
5 1982-01-23 MILLER CLERK 1300 950
6 1983-01-12 ADAMS CLERK 1100 1300
7 1981-01-09 CLARK MANAGER 2450 0
8 1981-03-01 BLAKE MANAGER 2850 2450
9 1981-04-02 JONES MANAGER 2975 2850
10 1981-11-17 KING PRESIDENT 5000 0
6. Python pandasql Package
lag()
Python Programming |
ps.sqldf(" SELECT hiredate, ename, sal,job, \
LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal \
FROM emp \
ORDER BY job,hiredate ").head(10)
Results |
hiredate ename sal job prev_sal
0 1981/12/03 FORD 3000 ANALYST 0
1 1982/12/09 SCOTT 3000 ANALYST 3000
2 1980/12/17 SMITH 800 CLERK 0
3 1981/12/03 JAMES 950 CLERK 800
4 1982/01/23 MILLER 1300 CLERK 950
5 1983/01/12 ADAMS 1100 CLERK 1300
6 1981/01/09 CLARK 2450 MANAGER 0
7 1981/03/01 BLAKE 2850 MANAGER 2450
8 1981/04/02 JONES 2975 MANAGER 2850
9 1981/11/17 KING 5000 PRESIDENT 0
7. R data.table Package
dplyr::lag()
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, sal_lag := dplyr::lag(sal,n=1,default=0, order_by = hiredate), by=job][order(job,hiredate),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_lag
1: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0
2: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 3000
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 0
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 800
5: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 950
6: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1300
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2450
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 2850
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
shift()
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[order(hiredate) , lag.value := shift(sal,type='lag',fill=0), by=job][order(job,hiredate),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno lag.value
1: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 0
2: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 3000
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 0
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 800
5: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 950
6: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1300
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 0
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2450
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 2850
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
[-.N]
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
# NA를 0으로 변경 가능.
DT[order(hiredate) , lag.value:=c(NA, sal[-.N]), by=job][order(job,hiredate),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno lag.value
1: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 NA
2: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 3000
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 NA
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 800
5: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 950
6: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1300
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 NA
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2450
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 2850
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 NA
8. Python DuckDB의 SQL
Python Programming |
%%sql
SELECT hiredate, ename, job, sal,
LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
FROM emp
ORDER BY job,hiredate
LIMIT 6
Python Programming |
duckdb.sql(" SELECT hiredate, ename, job, sal, \
LAG(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal \
FROM emp \
ORDER BY job,hiredate \
LIMIT 6 ").df()
Results |
hiredate ename job sal prev_sal
0 1981/12/03 FORD ANALYST 3000 0
1 1982/12/09 SCOTT ANALYST 3000 3000
2 1980/12/17 SMITH CLERK 800 0
3 1981/12/03 JAMES CLERK 950 800
4 1982/01/23 MILLER CLERK 1300 950
5 1983/01/12 ADAMS CLERK 1100 1300
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글