포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ DENSE_RANK Oracle Function ]
DENSE_RANK()는 Rank()와 유사한 함수로 ORDER BY절에 사용된 컬럼이나 표현식에 대하여 순위를 부여하는데 RANK()와 달리 동일 순위 다음의 순위는 동일 순위의 수와 상관없이 1 증가된 값을 돌려준다.
- 함수 설명 : DENSE_RANK 오라클 함수 링크
- 윈도우/분석 함수에 대한 상세 예제는 다음 함수 참조 : CUME_DIST
1. Oracle(오라클)
Oracle Programming |
SELECT DEPTNO, ENAME, SAL,
DENSE_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL) DENSE_RANK
FROM EMP
ORDER
BY DEPTNO, DENSE_RANK;
Results |
DEPTNO ENAME SAL DENSE_RANK
--------------------------------
10 MILLER 1300 1
10 CLARK 2450 2
10 KING 5000 3
20 SMITH 800 1
20 ADAMS 1100 2
20 JONES 2975 3
20 SCOTT 3000 4
20 FORD 3000 4
30 JAMES 950 1
30 MARTIN 1250 2
30 WARD 1250 2
30 TURNER 1500 3
30 ALLEN 1600 4
30 BLAKE 2850 5
2. Python Pandas(파이썬)
Rank()
Python Programming |
withmooc = emp.copy()
withmooc['DENSE_RANK'] = emp.groupby('deptno')['sal'].transform(lambda x: x.rank(method='min'))
withmooc.sort_values(['deptno','DENSE_RANK']).head(10)
Results |
empno ename job mgr hiredate sal comm deptno DENSE_RANK
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 2
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 3
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 2
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 3
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 4
12 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 4
11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 1
2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 2
3. R Programming (R Package)
Rank
R Programming |
%%R
withmooc <- emp
withmooc['DENSE_RANK'] = ave(withmooc[, 'sal'], withmooc[, 'deptno'], FUN = function (x) { rank(x, ties.method = "min", na.last = "keep") } )
withmooc[order(withmooc$deptno,withmooc$DENSE_RANK), ][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno DENSE_RANK
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 4
9 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
dplyr::dense_rank()
R Programming |
%%R
library(plyr)
withmooc <- ddply(emp,.(deptno),transform,DENSE_RANK = dplyr::dense_rank(sal))
withmooc[order(withmooc$deptno,withmooc$DENSE_RANK), ][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno DENSE_RANK
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 4
14 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
4. R Dplyr Package
[참고] dense_rank 함수 구조
R Programming |
%%R
dplyr::dense_rank
Results (함수 구조) |
function (x)
{
match(x, sort(unique(x)))
}
<bytecode: 0x000001b0ddbe9258>
<environment: namespace:dplyr>
dplyr::dense_rank
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate( DENSE_RANK = dense_rank(sal)) %>%
dplyr::arrange(deptno,DENSE_RANK) %>%
head(10)
Results |
# A tibble: 10 x 9
# Groups: deptno [3]
empno ename job mgr hiredate sal comm deptno DENSE_RANK
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <int>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 4
9 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
sort & unique()
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate( DENSE_RANK = match(sal, sort(unique(sal)))) %>%
dplyr::arrange(deptno,DENSE_RANK) %>%
head(10)
Results |
# A tibble: 10 x 9
# Groups: deptno [3]
empno ename job mgr hiredate sal comm deptno DENSE_RANK
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <int>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 4
9 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
Rank()
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate( DENSE_RANK = rank(sal, ties.method = "min", na.last = "keep") ) %>%
dplyr::arrange(deptno,DENSE_RANK) %>%
head(10)
Results |
# A tibble: 10 x 9
# Groups: deptno [3]
empno ename job mgr hiredate sal comm deptno DENSE_RANK
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <int>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 4
9 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
5. R sqldf Package
R Programming |
%%R
sqldf(" select *, DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) DENSE_RANK from emp ")[1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno DENSE_RANK
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 4
9 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
6. Python pandasql Package
Python Programming |
ps.sqldf(" select *, DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) DENSE_RANK from emp ").head(10)
Results |
empno ename job mgr hiredate sal comm deptno DENSE_RANK
0 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1
1 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 2
2 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 3
3 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1
4 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 2
5 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 3
6 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 4
7 7902 FORD ANALYST 7566.0 1981/12/03 3000 NaN 20 4
8 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30 1
9 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 2
7. R data.table Package
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, DENSE_RANK := match(sal, sort(unique(sal))),by = deptno][order(deptno,DENSE_RANK)][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno DENSE_RANK
1: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 4
9: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
8. Python DuckDB의 SQL
Python Programming |
%%sql
select *,
DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) DENSE_RANK
from emp
order by deptno, sal
LIMIT 5
Python Programming |
duckdb.sql(" select empno,ename,job,sal,comm,deptno, \
DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal) DENSE_RANK \
from emp \
order by deptno, sal \
LIMIT 6 ").df()
Results |
empno ename job sal comm deptno DENSE_RANK
0 7934 MILLER CLERK 1300 NaN 10 1
1 7782 CLARK MANAGER 2450 NaN 10 2
2 7839 KING PRESIDENT 5000 NaN 10 3
3 7369 SMITH CLERK 800 NaN 20 1
4 7876 ADAMS CLERK 1100 NaN 20 2
5 7566 JONES MANAGER 2975 NaN 20 3
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록? 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글