포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ LAST_VALUE Oracle Function ]
FIRST_VALUE 분석 함수는 윈도우에서 정렬된 값 중에서 첫 번째 값을 반환하고, LAST_VALUE 분석 함수는 윈도우에서 정렬된 값 중에서 마지막 값을 반환한다.
LAST_VALUE함수는 분석함수로 순서형 집합에서 마지막 값을 반환한다. 만약 집합내의 마지막 값이 Null이라면, 그 함수는 IGNORE NULL 옵션을 지정하지 않는 한 Null을 반환한다. 이 설정은 데이터의 정밀화에 유용하다. 만약 IGNORE NULL을 지정하면, 그때 LAST_VALUE함수는 집합 내에서 최초 Non-Null 값을 반환하거나, 만약 모든 값이 Null이라면 NULL을 반환한다.
expr에 대하여 LAST_VALUE 또는 임의의 다른 분석 함수를 이용할수 없다. 분석함수는 중첩할 수 없지만, expr에 대하여 다른 이미 만들어진 함수 표현을 이용할 수 있다.
- 함수 설명 : LAST_DAY 오라클 함수 링크
1. Oracle(오라클)
개별 직무 내에서 급여(sal)가 가장 적은 직원의 이름을 반환한다.
Oracle Programming |
SELECT ENAME,
JOB,
SAL,
last_value(ENAME) over (partition by job order by sal desc) LAST_SALARY
FROM emp
order by job, sal desc
Results |
ENAME JOB SAL LAST_SALARY
--------------------------------------
FORD ANALYST 3000 SCOTT
SCOTT ANALYST 3000 SCOTT
MILLER CLERK 1300 MILLER
ADAMS CLERK 1100 ADAMS
JAMES CLERK 950 JAMES
SMITH CLERK 800 SMITH
JONES MANAGER 2975 JONES
BLAKE MANAGER 2850 BLAKE
CLARK MANAGER 2450 CLARK
KING PRESIDENT 5000 KING
ALLEN SALESMAN 1600 ALLEN
TURNER SALESMAN 1500 TURNER
MARTIN SALESMAN 1250 WARD
WARD SALESMAN 1250 WARD
2. Python Pandas(파이썬)
개별 직무 내에서 급여(sal)가 가장 적은 직원의 이름을 반환한다.
Python Programming |
withmooc = emp.copy()
group_first = withmooc.sort_values('sal',ascending = False).groupby('job')['ename'].last()
withmooc['Time'] = withmooc['job'].map(group_first)
withmooc.sort_values(['job','sal'],ascending=[True, False]).head(10)
Results |
empno ename job mgr hiredate sal comm deptno Time
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 FORD
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 FORD
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 SMITH
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 SMITH
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 SMITH
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 SMITH
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 CLARK
5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 CLARK
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 CLARK
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 KING
Python Programming |
withmooc = emp.copy()
withmooc['first_value'] = emp.sort_values('sal', ascending=False).groupby('job')['ename'].transform(lambda x:x.iloc[-1])
withmooc.sort_values(['job','sal'],ascending = [True,False]).head(10)
Results |
empno ename job mgr hiredate sal comm deptno first_value
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 FORD
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 FORD
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 SMITH
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 SMITH
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 SMITH
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 SMITH
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 CLARK
5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 CLARK
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 CLARK
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 KING
transform('last')
Python Programming |
withmooc = emp.copy()
withmooc['last_value'] = emp.sort_values('sal', ascending=False).groupby('job')['ename'].transform('last')
withmooc.sort_values(['job','sal'],ascending = [True,False]).head(10)
Results |
empno ename job mgr hiredate sal comm deptno last_value
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 FORD
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 FORD
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 SMITH
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 SMITH
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 SMITH
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 SMITH
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 CLARK
5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 CLARK
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 CLARK
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 KING
[참고] rank()
Python Programming |
withmooc = emp.copy()
withmooc.sort_values('sal', ascending=True).groupby('job').apply(lambda x : x[ x['ename'].rank() ==1 ]["ename"])
Results |
job
ANALYST 12 FORD
CLERK 10 ADAMS
MANAGER 5 BLAKE
PRESIDENT 8 KING
SALESMAN 1 ALLEN
Name: ename, dtype: object
min
Python Programming |
emp.sort_values('sal',ascending = False).groupby('job')['ename'].apply(min)
Results |
job
ANALYST FORD
CLERK ADAMS
MANAGER BLAKE
PRESIDENT KING
SALESMAN ALLEN
Name: ename, dtype: object
3. R Programming (R Package)
dplyr::last()
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
plyr::ddply(emp,.(job),transform,last_value = dplyr::last(ename, order_by = desc(sal) ))[1:12, ]
Results |
empno ename job mgr hiredate sal comm deptno last_value
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 FORD
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 FORD
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SMITH
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 SMITH
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 SMITH
6 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 SMITH
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 CLARK
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 CLARK
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 CLARK
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
11 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 MARTIN
12 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 MARTIN
dplyr::last(x) 와 max()
R Programming |
%%R
withmooc <- emp[order(-emp$sal),]
withmooc['last_value_1'] = with(withmooc, ave(x = ename, job, FUN =function(x) { dplyr::last(x) }))
withmooc['last_value_2'] = with(emp[order(-emp$sal),], ave(x = ename, job, FUN =function(x) { x[max(which(!is.na(x)))] }))
withmooc[order(withmooc$job),c('empno','ename','job','sal','last_value_1','last_value_2')][1:12, ]
Results |
# A tibble: 12 x 6
empno ename job sal last_value_1 last_value_2
<dbl> <chr> <chr> <dbl> <chr> <chr>
1 7788 SCOTT ANALYST 3000 FORD FORD
2 7902 FORD ANALYST 3000 FORD FORD
3 7934 MILLER CLERK 1300 SMITH SMITH
4 7876 ADAMS CLERK 1100 SMITH SMITH
5 7900 JAMES CLERK 950 SMITH SMITH
6 7369 SMITH CLERK 800 SMITH SMITH
7 7566 JONES MANAGER 2975 CLARK CLARK
8 7698 BLAKE MANAGER 2850 CLARK CLARK
9 7782 CLARK MANAGER 2450 CLARK CLARK
10 7839 KING PRESIDENT 5000 KING KING
11 7499 ALLEN SALESMAN 1600 MARTIN MARTIN
12 7844 TURNER SALESMAN 1500 MARTIN MARTIN
dplyr::last(x) 와 max()
R Programming |
%%R
withmooc <- emp[order(-emp$sal),]
withmooc['last_value_1'] = ave(x = withmooc$ename, withmooc$job, FUN = function (x) { dplyr::last(x) } )
withmooc['last_value_2'] = ave(x = withmooc$ename, withmooc$job, FUN = function (x) { x[max(which(!is.na(x)))] } )
withmooc[order(withmooc$job),c('empno','ename','job','sal','last_value_1','last_value_2')][1:10, ]
Results |
# A tibble: 10 x 6
empno ename job sal last_value_1 last_value_2
<dbl> <chr> <chr> <dbl> <chr> <chr>
1 7788 SCOTT ANALYST 3000 FORD FORD
2 7902 FORD ANALYST 3000 FORD FORD
3 7934 MILLER CLERK 1300 SMITH SMITH
4 7876 ADAMS CLERK 1100 SMITH SMITH
5 7900 JAMES CLERK 950 SMITH SMITH
6 7369 SMITH CLERK 800 SMITH SMITH
7 7566 JONES MANAGER 2975 CLARK CLARK
8 7698 BLAKE MANAGER 2850 CLARK CLARK
9 7782 CLARK MANAGER 2450 CLARK CLARK
10 7839 KING PRESIDENT 5000 KING KING
last()
R Programming |
%%R
withmooc <- emp[order(emp$job, -emp$sal),]
last_var <- with(withmooc, purrr::map_chr(split(withmooc,withmooc$job), ~last(.x$ename)) )
data.frame(withmooc, last_value =rep(last_var, table(withmooc$job)))[1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno last_value
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 FORD
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 FORD
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 SMITH
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 SMITH
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 SMITH
6 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SMITH
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 CLARK
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 CLARK
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 CLARK
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
tapply 와 last
R Programming |
%%R
withmooc <- emp[order(emp$job, -emp$sal),]
Last_var <- with(withmooc, tapply(ename, job, last)) # head, 1
print(Last_var)
# First_var <- with(withmooc[order(-withmooc$sal),], tapply(ename, job, tail, 1))
data.frame(withmooc, First_value =rep(Last_var, table(withmooc$job)))[1:10, ]
# with(withmooc[order(-withmooc$sal),], data.frame(emp, max.per.group=rep(tapply(ename, job, last), table(job))) )
Results |
ANALYST CLERK MANAGER PRESIDENT SALESMAN
"FORD" "SMITH" "CLARK" "KING" "MARTIN"
empno ename job mgr hiredate sal comm deptno First_value
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 FORD
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 FORD
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 SMITH
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 SMITH
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 SMITH
6 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SMITH
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 CLARK
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 CLARK
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 CLARK
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
데이터 Merge 방식
R Programming |
%%R
withmooc <- emp[order(-emp$sal),]
# split 되는 그룹내의 관측치 순서 보존을 위하여 데이터와 group 데이터 양쪽 재 정렬 필요
mapping <- t( purrr::map_dfc( base::split(withmooc[order(-withmooc$sal),],withmooc[order(-withmooc$sal),]$job) , ~last(.x$ename) ) )
trans_data <- melt(mapping, value.name="last_ename")
merge(withmooc,trans_data,by.x='job',by.y='Var1')[1:10,]
Results |
job empno ename mgr hiredate sal comm deptno Var2 last_ename
1 ANALYST 7788 SCOTT 7566 1982-12-09 3000 NA 20 1 FORD
2 ANALYST 7902 FORD 7566 1981-12-03 3000 NA 20 1 FORD
3 CLERK 7934 MILLER 7782 1982-01-23 1300 NA 10 1 SMITH
4 CLERK 7876 ADAMS 7788 1983-01-12 1100 NA 20 1 SMITH
5 CLERK 7900 JAMES 7698 1981-12-03 950 NA 30 1 SMITH
6 CLERK 7369 SMITH 7902 1980-12-17 800 NA 20 1 SMITH
7 MANAGER 7566 JONES 7839 1981-04-02 2975 NA 20 1 CLARK
8 MANAGER 7698 BLAKE 7839 1981-03-01 2850 NA 30 1 CLARK
9 MANAGER 7782 CLARK 7839 1981-01-09 2450 NA 10 1 CLARK
10 PRESIDENT 7839 KING NA 1981-11-17 5000 NA 10 1 KING
[참고] 그룹별(Group By) 최종값(last)을 추출
R Programming |
%%R
# base::split(emp, emp$job)
do.call(rbind, Map(last, base::split(emp, emp$job)))
Results |
# A tibble: 5 x 8
empno ename job mgr hiredate sal comm deptno
* <dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20
2 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
3 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
4 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
5 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30
gather() 와 spread()
R Programming |
%%R
emp %>%
rownames_to_column %>%
gather(variable, value, -rowname) %>%
spread(variable, value)
Results |
# A tibble: 14 x 9
rowname comm deptno empno ename hiredate job mgr sal
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 <NA> 20 7369 SMITH 4003 CLERK 7902 800
2 10 0 30 7844 TURNER 4268 SALESMAN 7698 1500
3 11 <NA> 20 7876 ADAMS 4759 CLERK 7788 1100
4 12 <NA> 30 7900 JAMES 4354 CLERK 7698 950
5 13 <NA> 20 7902 FORD 4354 ANALYST 7566 3000
6 14 <NA> 10 7934 MILLER 4405 CLERK 7782 1300
7 2 300 30 7499 ALLEN 4068 SALESMAN 7698 1600
8 3 500 30 7521 WARD 4070 SALESMAN 7698 1250
9 4 <NA> 20 7566 JONES 4109 MANAGER 7839 2975
10 5 1400 30 7654 MARTIN 4288 SALESMAN 7698 1250
11 6 <NA> 30 7698 BLAKE 4077 MANAGER 7839 2850
12 7 <NA> 10 7782 CLARK 4026 MANAGER 7839 2450
13 8 <NA> 20 7788 SCOTT 4725 ANALYST 7566 3000
14 9 <NA> 10 7839 KING 4338 PRESIDENT <NA> 5000
purrr::map_dfr() 와 last
R Programming |
%%R
purrr::map_dfr(split(emp,emp$job), ~last(.x$ename))
Results |
# A tibble: 1 x 5
ANALYST CLERK MANAGER PRESIDENT SALESMAN
<chr> <chr> <chr> <chr> <chr>
1 FORD MILLER CLARK KING TURNER
purrr::map_dfr() 와 last
R Programming |
%%R
do.call(rbind, purrr::map_dfr(split(emp,emp$job), ~last(.x$ename)) )
Results |
[,1]
ANALYST "FORD"
CLERK "MILLER"
MANAGER "CLARK"
PRESIDENT "KING"
SALESMAN "TURNER"
map() 와 last
R Programming |
%%R
emp %>%
tbl_df() %>%
dplyr::arrange(job,-sal) %>%
nest(-job) %>%
mutate(Quantiles = map(data, ~ last(.$ename))) %>%
unnest(Quantiles)
Results |
# A tibble: 5 x 3
job data Quantiles
<chr> <list> <chr>
1 ANALYST <tibble [2 x 7]> FORD
2 CLERK <tibble [4 x 7]> SMITH
3 MANAGER <tibble [3 x 7]> CLARK
4 PRESIDENT <tibble [1 x 7]> KING
5 SALESMAN <tibble [4 x 7]> MARTIN
[참고] Emulate the window function of SQL in R
- [stackoverflow 참고] Emulate the window function of SQL in R [링크]
R Programming |
%%R
withmooc <- emp
unlist(with(withmooc[order(-emp$sal),], by(ename, job, tail,1))) # last
Results |
job: ANALYST
[1] "FORD"
------------------------------------------------------------
job: CLERK
[1] "SMITH"
------------------------------------------------------------
job: MANAGER
[1] "CLARK"
------------------------------------------------------------
job: PRESIDENT
[1] "KING"
------------------------------------------------------------
job: SALESMAN
[1] "MARTIN"
aggregate() 와 tail
R Programming |
%%R
aggregate(ename ~ job, data=emp[order(-emp$sal),], FUN=tail,1) # last
Results |
job ename
1 ANALYST FORD
2 CLERK SMITH
3 MANAGER CLARK
4 PRESIDENT KING
5 SALESMAN MARTIN
[참고] first / Last
- 리스트 반복 작업 purrr 패키지
R Programming |
%%R
purrr::map_chr(split(emp,emp$job), ~last(.x$ename))
Results |
ANALYST CLERK MANAGER PRESIDENT SALESMAN
"FORD" "MILLER" "CLARK" "KING" "TURNER"
R Programming |
%%R
withmooc <- emp[order(emp$sal, decreasing=TRUE),]
do.call(rbind, lapply(split(withmooc, withmooc$job), tail, 1)) # last
Results |
# A tibble: 5 x 8
empno ename job mgr hiredate sal comm deptno
* <dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20
2 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
3 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
4 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
R Programming |
%%R
withmooc <- emp[order(emp$sal, decreasing=TRUE),]
split(withmooc, withmooc$job)
Results |
$ANALYST
# A tibble: 2 x 8
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20
$CLERK
# A tibble: 4 x 8
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
2 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20
3 7900 JAMES CLERK 7698 1981-12-03 950 NA 30
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
$MANAGER
# A tibble: 3 x 8
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
2 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30
3 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
$PRESIDENT
# A tibble: 1 x 8
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
$SALESMAN
# A tibble: 4 x 8
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
2 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30
3 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
4 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
개별 직무 내에서 급여(sal)가 가장 적은 직원의 데이터를 반환한다.
R Programming |
%%R
withmooc <- emp[order(emp$sal, decreasing=TRUE),]
do.call(rbind, lapply(split(withmooc, withmooc$job), last )) # comma-space 1, ) : [1, ]
Results |
# A tibble: 5 x 8
empno ename job mgr hiredate sal comm deptno
* <dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20
2 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
3 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
4 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
참고 : base::split
R Programming |
%%R
withmooc <- emp[order(-emp$sal),]
base::split(withmooc,withmooc$job)$CLERK
Results |
# A tibble: 4 x 8
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
2 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20
3 7900 JAMES CLERK 7698 1981-12-03 950 NA 30
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
4. R Dplyr Package
R Programming |
%%R
dplyr::last
Results |
function (x, order_by = NULL, default = default_missing(x))
{
nth(x, -1L, order_by = order_by, default = default)
}
<bytecode: 0x0000019dc1a70040>
<environment: namespace:dplyr>
R Programming |
%%R
emp %>%
dplyr::group_by(job) %>%
dplyr::mutate(last_value = dplyr::last(ename, order_by = desc(sal) )) %>%
dplyr::arrange(job,-sal)
Results |
# A tibble: 14 x 9
# Groups: job [5]
empno ename job mgr hiredate sal comm deptno last_value
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <chr>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 FORD
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 FORD
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 SMITH
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 SMITH
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 SMITH
6 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SMITH
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 CLARK
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 CLARK
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 CLARK
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
11 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 MARTIN
12 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 MARTIN
13 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 MARTIN
14 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 MARTIN
purrr 패키지
R Programming |
%%R
withmooc <- emp
withmooc %>%
dplyr::left_join( ( purrr::map_dfr(base::split(withmooc[order(-withmooc$sal),],withmooc[order(-withmooc$sal),]$job) , ~last(.x$ename)) %>%
gather(key = job, value = last_job )
),
id = "job"
) %>%
dplyr::arrange(job,-sal)
Results |
Joining, by = "job"
# A tibble: 14 x 9
empno ename job mgr hiredate sal comm deptno last_job
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <chr>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 FORD
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 FORD
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 SMITH
4 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 SMITH
5 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 SMITH
6 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SMITH
7 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 CLARK
8 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 CLARK
9 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 CLARK
10 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
11 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 MARTIN
12 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 MARTIN
13 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 MARTIN
14 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 MARTIN
- [참고] 직무(job) 내에서 급여(sal)를 내림차순으로 정렬 후 그룹 내의 First / Last 값을 반환
R Programming |
%%R
# 단일 건에 대한 중복 처리 필요(PRESIDENT)
emp %>%
arrange(desc(sal)) %>%
group_by(job) %>%
slice(c(1,n()))
Results |
# A tibble: 10 x 8
# Groups: job [5]
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
2 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
6 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
7 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
8 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
10 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
R Programming |
%%R
emp %>%
group_by(job) %>%
slice(c(which.min(sal), which.max(sal)))
Results |
# A tibble: 10 x 8
# Groups: job [5]
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
2 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
3 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
4 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
5 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
7 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
8 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
9 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
10 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
R Programming |
%%R
withmooc <- emp[order(withmooc$sal, decreasing=TRUE),]
withmooc %>%
split(f = withmooc$job) %>%
purrr::map_df(last)
Results |
# A tibble: 5 x 8
empno ename job mgr hiredate sal comm deptno
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl>
1 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20
2 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
3 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
4 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
5 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
5. R sqldf Package
- last_value 현재 에러
R Programming |
%%R
sqldf(" SELECT ENAME,
JOB,
SAL,
FIRST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) FIRST_SALARY,
LAST_VALUE(ENAME) OVER (PARTITION BY JOB ORDER BY SAL DESC) LAST_SALARY
FROM emp
order by job, sal desc")
Results |
ename job sal FIRST_SALARY LAST_SALARY
1 SCOTT ANALYST 3000 SCOTT FORD
2 FORD ANALYST 3000 SCOTT FORD
3 MILLER CLERK 1300 MILLER MILLER
4 ADAMS CLERK 1100 MILLER ADAMS
5 JAMES CLERK 950 MILLER JAMES
6 SMITH CLERK 800 MILLER SMITH
7 JONES MANAGER 2975 JONES JONES
8 BLAKE MANAGER 2850 JONES BLAKE
9 CLARK MANAGER 2450 JONES CLARK
10 KING PRESIDENT 5000 KING KING
11 ALLEN SALESMAN 1600 ALLEN ALLEN
12 TURNER SALESMAN 1500 ALLEN TURNER
13 WARD SALESMAN 1250 ALLEN MARTIN
14 MARTIN SALESMAN 1250 ALLEN MARTIN
6. Python pandasql Package
Python Programming |
ps.sqldf(" SELECT ENAME, \
JOB, \
SAL, \
last_value(ENAME) over (partition by job order by sal desc) LAST_SALARY \
FROM emp \
order by job, sal desc ").head(10)
Results |
ename job sal LAST_SALARY
0 SCOTT ANALYST 3000 FORD
1 FORD ANALYST 3000 FORD
2 MILLER CLERK 1300 MILLER
3 ADAMS CLERK 1100 ADAMS
4 JAMES CLERK 950 JAMES
5 SMITH CLERK 800 SMITH
6 JONES MANAGER 2975 JONES
7 BLAKE MANAGER 2850 BLAKE
8 CLARK MANAGER 2450 CLARK
9 KING PRESIDENT 5000 KING
7. R data.table Package
R Programming |
%%R
DT <- data.table(emp)
DT[order(-sal) , last_value := last(ename), by = job][order(job),][1:12, ]
Results |
empno ename job mgr hiredate sal comm deptno last_value
1: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 FORD
2: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 FORD
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SMITH
4: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 SMITH
5: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 SMITH
6: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 SMITH
7: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 CLARK
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 CLARK
9: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 CLARK
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
11: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 MARTIN
12: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 MARTIN
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
# s = dplyr::last(.SD, order_by = desc(sal) )
DT[, `:=` ( last_value_1 = dplyr::last(ename, order_by = desc(sal) ),
last_value_2 = tail(na.omit(unlist(.SD)),1),
first_value1 = na.omit(unlist(.SD))[1]
),
by = job,
.SDcols = c("ename")][order(job),][, c("mgr","comm","hiredate") := NULL][1:10, ]
Results |
empno ename job sal deptno last_value_1 last_value_2 first_value1
1: 7788 SCOTT ANALYST 3000 20 FORD FORD SCOTT
2: 7902 FORD ANALYST 3000 20 FORD FORD SCOTT
3: 7369 SMITH CLERK 800 20 SMITH MILLER SMITH
4: 7876 ADAMS CLERK 1100 20 SMITH MILLER SMITH
5: 7900 JAMES CLERK 950 30 SMITH MILLER SMITH
6: 7934 MILLER CLERK 1300 10 SMITH MILLER SMITH
7: 7566 JONES MANAGER 2975 20 CLARK CLARK JONES
8: 7698 BLAKE MANAGER 2850 30 CLARK CLARK JONES
9: 7782 CLARK MANAGER 2450 10 CLARK CLARK JONES
10: 7839 KING PRESIDENT 5000 10 KING KING KING
R Programming |
%%R
DT <- data.table(emp)
DT[order(-sal) , last_ename := xts::last(ename) , by=job][order(job), ][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno last_ename
1: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 FORD
2: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 FORD
3: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 SMITH
4: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 SMITH
5: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 SMITH
6: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 SMITH
7: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 CLARK
8: 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 CLARK
9: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 CLARK
10: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 KING
R Programming |
%%R
DT <- data.table(emp)
DT[ order(-sal), `:=` (fir_ename = ename[.N],
last_ename1 = last(ename),
max_ename = max(ename),
min_ename = min(ename),
fir_ename_1=first(ename)),by=job][order(job,-sal),][,.SD, .SDcols = !c('empno','hiredate', 'mgr','comm','deptno')]
Results |
ename job sal fir_ename last_ename1 max_ename min_ename fir_ename_1
1: SCOTT ANALYST 3000 FORD FORD SCOTT FORD SCOTT
2: FORD ANALYST 3000 FORD FORD SCOTT FORD SCOTT
3: MILLER CLERK 1300 SMITH SMITH SMITH ADAMS MILLER
4: ADAMS CLERK 1100 SMITH SMITH SMITH ADAMS MILLER
5: JAMES CLERK 950 SMITH SMITH SMITH ADAMS MILLER
6: SMITH CLERK 800 SMITH SMITH SMITH ADAMS MILLER
7: JONES MANAGER 2975 CLARK CLARK JONES BLAKE JONES
8: BLAKE MANAGER 2850 CLARK CLARK JONES BLAKE JONES
9: CLARK MANAGER 2450 CLARK CLARK JONES BLAKE JONES
10: KING PRESIDENT 5000 KING KING KING KING KING
11: ALLEN SALESMAN 1600 MARTIN MARTIN WARD ALLEN ALLEN
12: TURNER SALESMAN 1500 MARTIN MARTIN WARD ALLEN ALLEN
13: WARD SALESMAN 1250 MARTIN MARTIN WARD ALLEN ALLEN
14: MARTIN SALESMAN 1250 MARTIN MARTIN WARD ALLEN ALLEN
[참고] 그룹별 First / Last (단일 건의 중복 문제는 존재)
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[ DT[order(sal, decreasing=TRUE), .I[c(1L,.N)], by=job]$V1 ]
Results |
empno ename job mgr hiredate sal comm deptno
1: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
2: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10
3: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20
4: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20
5: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20
6: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10
7: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
8: 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
9: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10
10: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글