포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ LEAD Oracle Function ]
LEAD함수는 분석 함수이다. 이 함수는 Self 조인 없이 동시에 테이블의 하나 이상의 행에 Access를 실행한다. 쿼리로부터 반환되는 행의 Series나 커서의 위치를 지정하면, LEAD함수는 그 위치보다 이후에 있는 주어진 물리적 Offset의 행에 ACCESS 한다.
- 함수 설명 : LAST_DAY 오라클 함수 링크
1. Oracle(오라클)
개별 직무 내에서 바로 직후에 입사한 직원의 급여 정보를 반환한다.
Oracle Programming |
SELECT hiredate, ename, sal,job,
LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
FROM emp
ORDER BY job,hiredate
Results |
HIREDATE ENAME SAL JOB PREV_SAL
--------------------------------------------------------------
03-DEC-81 FORD 3000 ANALYST 3000
19-APR-87 SCOTT 3000 ANALYST 0
17-DEC-80 SMITH 800 CLERK 950
03-DEC-81 JAMES 950 CLERK 1300
23-JAN-82 MILLER 1300 CLERK 1100
23-MAY-87 ADAMS 1100 CLERK 0
02-APR-81 JONES 2975 MANAGER 2850
01-MAY-81 BLAKE 2850 MANAGER 2450
09-JUN-81 CLARK 2450 MANAGER 0
17-NOV-81 KING 5000 PRESIDENT 0
20-FEB-81 ALLEN 1600 SALESMAN 1250
22-FEB-81 WARD 1250 SALESMAN 1500
08-SEP-81 TURNER 1500 SALESMAN 1250
28-SEP-81 MARTIN 1250 SALESMAN 0
2. Python Pandas(파이썬)
Python Programming |
withmooc = emp.copy()
withmooc['sal_lag'] = emp.sort_values(by='hiredate').groupby('job')['sal'].shift(-1,fill_value = 0)
withmooc.sort_values(['job','hiredate']).head(10)
Results |
empno ename job mgr hiredate sal comm deptno sal_lag
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 3000
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 0
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 950
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 1300
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1100
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 0
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 2850
5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 2975
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 0
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 0
3. R Programming (R Package)
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
first_dat <- plyr::ddply(emp,.(job),transform,lead_value = dplyr::lead(sal, order_by = hiredate,default=0 ))
first_dat[order(first_dat$job,first_dat$hiredate),]
Results |
empno ename job mgr hiredate sal comm deptno lead_value
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 3000
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 950
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1300
6 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1100
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2850
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2975
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
11 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1250
12 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1500
14 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1250
13 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 0
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
first_dat <- plyr::ddply(emp[order(emp$hiredate),] ,.(job),transform,lead_value = dplyr::lead(sal,default=0))
first_dat[order(first_dat$job,first_dat$hiredate),]
Results |
empno ename job mgr hiredate sal comm deptno lead_value
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 3000
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 950
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1300
5 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1100
6 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2850
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2975
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
11 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1250
12 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1500
13 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1250
14 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 0
shift 함수
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
first_dat <- plyr::ddply(emp[order(emp$hiredate),] ,.(job),transform,lead_value = shift(sal,-1,fill=0))
first_dat[order(first_dat$job,first_dat$hiredate),]
Results |
empno ename job mgr hiredate sal comm deptno lead_value
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 3000
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 950
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1300
5 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1100
6 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2850
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2975
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
11 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1250
12 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1500
13 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1250
14 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 0
R Programming |
%%R
withmooc<- emp[order(emp$hiredate),]
withmooc['lead_value'] = with(withmooc , ave(x = sal, job, FUN =function(x) { dplyr::lead(x,default=0) }))
withmooc[order(withmooc$job,withmooc$hiredate),]
Results |
# A tibble: 14 x 9
empno ename job mgr hiredate sal comm deptno lead_value
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 3000
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 950
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1300
5 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1100
6 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2850
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2975
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
11 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1250
12 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1500
13 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1250
14 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 0
Purrr 패키지 : 리스트 데이터로 처리
R Programming |
%%R
withmooc <- emp[order(emp$job,emp$hiredate),]
lead_dat <- as.data.frame(unlist(purrr::map( split(withmooc,withmooc$job), ~lead(.x$sal,default=0) )))
colnames(lead_dat) <- c('lead_value')
withmooc <- cbind(withmooc,lead_dat)
withmooc[order(withmooc$job,withmooc$hiredate),]
Results |
empno ename job mgr hiredate sal comm deptno lead_value
ANALYST1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 3000
ANALYST2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
CLERK1 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 950
CLERK2 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1300
CLERK3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1100
CLERK4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0
MANAGER1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2850
MANAGER2 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2975
MANAGER3 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0
PRESIDENT 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
SALESMAN1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1250
SALESMAN2 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1500
SALESMAN3 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1250
SALESMAN4 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 0
4. R Dplyr Package
R Programming |
%%R
emp %>%
dplyr::group_by(job) %>%
dplyr::mutate(sal_lag = lead(sal,n=1,default=0, order_by = hiredate)) %>%
arrange(job,hiredate)
Results |
# A tibble: 14 x 9
# Groups: job [5]
empno ename job mgr hiredate sal comm deptno sal_lag
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 3000
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 950
4 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1300
5 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1100
6 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0
7 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2850
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2975
9 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
11 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1250
12 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1500
13 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1250
14 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 0
5. R sqldf Package
R Programming |
%%R
sqldf(" SELECT hiredate, ename, sal,job,
LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
FROM emp
ORDER BY job,hiredate ")
Results |
hiredate ename sal job prev_sal
1 1981-12-03 FORD 3000 ANALYST 3000
2 1982-12-09 SCOTT 3000 ANALYST 0
3 1980-12-17 SMITH 800 CLERK 950
4 1981-12-03 JAMES 950 CLERK 1300
5 1982-01-23 MILLER 1300 CLERK 1100
6 1983-01-12 ADAMS 1100 CLERK 0
7 1981-01-09 CLARK 2450 MANAGER 2850
8 1981-03-01 BLAKE 2850 MANAGER 2975
9 1981-04-02 JONES 2975 MANAGER 0
10 1981-11-17 KING 5000 PRESIDENT 0
11 1981-02-20 ALLEN 1600 SALESMAN 1250
12 1981-02-22 WARD 1250 SALESMAN 1500
13 1981-09-08 TURNER 1500 SALESMAN 1250
14 1981-09-28 MARTIN 1250 SALESMAN 0
6. Python pandasql Package
Python Programming |
ps.sqldf(" SELECT hiredate, ename, sal,job, \
LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal \
FROM emp \
ORDER BY job,hiredate ")
Results |
hiredate ename sal job prev_sal
0 1981/12/03 FORD 3000 ANALYST 3000
1 1982/12/09 SCOTT 3000 ANALYST 0
2 1980/12/17 SMITH 800 CLERK 950
3 1981/12/03 JAMES 950 CLERK 1300
4 1982/01/23 MILLER 1300 CLERK 1100
5 1983/01/12 ADAMS 1100 CLERK 0
6 1981/01/09 CLARK 2450 MANAGER 2850
7 1981/03/01 BLAKE 2850 MANAGER 2975
8 1981/04/02 JONES 2975 MANAGER 0
9 1981/11/17 KING 5000 PRESIDENT 0
10 1981/02/20 ALLEN 1600 SALESMAN 1250
11 1981/02/22 WARD 1250 SALESMAN 1500
12 1981/09/08 TURNER 1500 SALESMAN 1250
13 1981/09/28 MARTIN 1250 SALESMAN 0
7. R data.table Package
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, sal_lead := lead(sal,n=1,default=0, order_by = hiredate), by=job][order(job,hiredate),]
Results |
empno ename job mgr hiredate sal comm deptno sal_lead
1: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 3000
2: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 950
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1300
5: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1100
6: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2850
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2975
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
11: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1250
12: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1500
13: 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1250
14: 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 0
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[order(hiredate) , lead.value := shift(sal,type='lead',fill=0), by=job][order(job,hiredate),]
Results |
empno ename job mgr hiredate sal comm deptno lead.value
1: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 3000
2: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 0
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 950
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1300
5: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1100
6: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 0
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2850
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2975
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 0
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 0
11: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1250
12: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1500
13: 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1250
14: 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 0
참 고
- [stackoverflow 참고] Create multiple lead variables in data.table [링크]
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[order(hiredate) , lead.value:=c(sal[-.1],NA), by=job][order(job,hiredate),]
Results |
empno ename job mgr hiredate sal comm deptno lead.value
1: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 3000
2: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 NA
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 950
4: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1300
5: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1100
6: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 NA
7: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2850
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 2975
9: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 NA
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 NA
11: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 1250
12: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 1500
13: 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 1250
14: 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 NA
8. Python DuckDB의 SQL
Python Programming |
%%sql
SELECT hiredate, ename, sal,job,
LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal
FROM emp
ORDER BY job,hiredate
LIMIT 6
Python Programming |
duckdb.sql(" SELECT hiredate, ename, sal,job, \
LEAD(sal, 1, 0) OVER (partition by job ORDER BY hiredate) AS prev_sal \
FROM emp \
ORDER BY job,hiredate \
LIMIT 6 ").df()
Results |
hiredate ename sal job prev_sal
0 1981/12/03 FORD 3000 ANALYST 3000
1 1982/12/09 SCOTT 3000 ANALYST 0
2 1980/12/17 SMITH 800 CLERK 950
3 1981/12/03 JAMES 950 CLERK 1300
4 1982/01/23 MILLER 1300 CLERK 1100
5 1983/01/12 ADAMS 1100 CLERK 0
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글