포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ RANK Oracle Function ]
RANK함수는 개별 그룹 별로 데이터를 정렬 후 그룹 내에서 기준 관측치 값의 순위를 계산한다. 반환되는 데이터형은 NUMBER이다.
같은 순위를 가지는 순위 기준에 대해서는 같은 값을 가진다. 오라클 데이터베이스는 동일한 행에 대해 동일한 순위를 더하고, 다음의 순위를 계산한다. 그런 이유로 RANK가 연속하지 않는 숫자일 수 있다. 이 함수는 top-N과 bottom-N 리포팅에 유용하다.
- 함수 설명 : RANK 오라클 함수 링크
- windows 함수 : count / first_value 참조
- 집계 함수는 sqldf 참조
1. Oracle(오라클)
1) 집계함수
- RANK() 함수
Oracle Programming |
SELECT DEPTNO,
RANK(3000) WITHIN GROUP (ORDER BY SAL) AS RANK
FROM EMP
GROUP BY DEPTNO
Results |
DEPTNO RANK
---------------
10 3
20 4
30 7
2) 분석함수
- RANK() 함수
Oracle Programming |
SELECT DEPTNO, EMPNO, SAL,
RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL) ROW_RANK
FROM EMP
ORDER
BY DEPTNO, SAL
Results |
DEPTNO EMPNO SAL ROW_RANK
-----------------------------------
10 7934 1300 1
10 7782 2450 2
10 7839 5000 3
20 7369 800 1
20 7876 1100 2
20 7566 2975 3
20 7788 3000 4
20 7902 3000 4
30 7900 950 1
30 7654 1250 2
30 7521 1250 2
30 7844 1500 4
30 7499 1600 5
30 7698 2850 6
2. Python Pandas(파이썬)
2) 분석함수
- RANK() 함수
Python Programming |
withmooc = emp.copy()
withmooc['sal_rank'] = withmooc.groupby('deptno')['sal'].rank(method="first",ascending=True)
withmooc.sort_values(['deptno','sal']).head()
Results |
empno ename job mgr hiredate sal comm deptno sal_rank
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1.0
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 2.0
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 3.0
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1.0
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 2.0
3. R Programming (R Package)
2) 분석함수
- RANK() 함수
R Programming |
%%R
withmooc <- emp
withmooc['sal_rank'] = with(withmooc, ave(sal, deptno, FUN =function(x) { rank(x,ties.method='first') } )) # FUN =function(x) { median(x) }
withmooc[order(withmooc$deptno,withmooc$sal),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sal_rank
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 5
9 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
- RANK() 함수
R Programming |
%%R
library(plyr) # dplyr 보다 먼저 로딩
library(dplyr)
withmooc<- emp[order(emp$sal),]
plyr::ddply(withmooc,.(deptno),transform,sal_median = rank(sal,ties.method='first'))[1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_median
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 5
9 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
4. R Dplyr Package
2) 분석함수
- RANK() 함수
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate(sal_rank = rank(sal,ties.method='first')) %>%
dplyr::arrange(deptno,sal) %>%
head()
Results |
# A tibble: 6 x 9
# Groups: deptno [2]
empno ename job mgr hiredate sal comm deptno sal_rank
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <int>
1 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
5. R sqldf Package
1) 집계함수
- RANK() 함수
R Programming |
%%R
sqldf(" WITH new_emp
AS ( SELECT deptno, sal from emp
union all
select 10 deptno, 3000 sal
union all
select 20 deptno, 3000 sal
union all
select 30 deptno, 3000 sal
)
SELECT *
FROM ( SELECT deptno,sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) as sal_PerRank from new_emp order by deptno,sal )
where sal = 3000
")
Results |
deptno sal sal_PerRank
1 10 3000 3
2 20 3000 4
3 20 3000 4
4 20 3000 4
5 30 3000 7
2) 분석함수
- RANK() 함수
R Programming |
%%R
sqldf(" SELECT ename, deptno, sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) sal_rank
FROM emp ")[1:10, ]
Results |
ename deptno sal sal_rank
1 MILLER 10 1300 1
2 CLARK 10 2450 2
3 KING 10 5000 3
4 SMITH 20 800 1
5 ADAMS 20 1100 2
6 JONES 20 2975 3
7 SCOTT 20 3000 4
8 FORD 20 3000 4
9 JAMES 30 950 1
10 WARD 30 1250 2
6. Python pandasql Package
1) 집계함수
- RANK() 함수
Python Programming |
ps.sqldf(" WITH Student \
AS ( SELECT deptno, sal from emp \
union all \
select 10 deptno, 3000 sal \
union all \
select 20 deptno, 3000 sal \
union all \
select 30 deptno, 3000 sal \
) \
SELECT * \
FROM ( SELECT deptno,sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) as sal_PerRank from Student order by deptno,sal ) \
where sal = 3000 ")
Results |
deptno sal sal_PerRank
0 10 3000 3
1 20 3000 4
2 20 3000 4
3 20 3000 4
4 30 3000 7
2) 분석함수
- RANK() 함수
Python Programming |
ps.sqldf(" SELECT ename, deptno, sal, \
RANK() OVER (PARTITION BY deptno ORDER BY sal) sal_rank \
FROM emp ").head()
Results |
ename deptno sal sal_rank
0 MILLER 10 1300 1
1 CLARK 10 2450 2
2 KING 10 5000 3
3 SMITH 20 800 1
4 ADAMS 20 1100 2
7. R data.table Package
2) 분석함수
- RANK() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
# DT[, sal_Perrank := frank(desc(sal),ties.method=c('first')), by=deptno][order(deptno,-sal),]
DT[, sal_Perrank := rank(sal,ties.method=c('first')), by=deptno][order(deptno,sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_Perrank
1: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 5
9: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
- frank 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
# DT[, sal_Perrank := frank(desc(sal),ties.method=c('first')), by=deptno][order(deptno,-sal),]
DT[, sal_Perrank := frank(sal,ties.method=c('first')), by=deptno][order(deptno,sal),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_Perrank
1: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1
2: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 2
3: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 3
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1
5: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 2
6: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 3
7: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 4
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 5
9: 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 1
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 2
8. Python Duckdb의 SQL
1) 집계함수
Python Programming |
%%sql
Student << SELECT deptno, sal from emp
union all
select 10 deptno, 3000 sal
union all
select 20 deptno, 3000 sal
union all
select 30 deptno, 3000 sal
%%sql
SELECT *
FROM ( SELECT deptno,sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) as sal_PerRank from Student order by deptno,sal )
where sal = 3000
Python Programming |
duckdb.sql(" SELECT * \
FROM ( SELECT deptno,sal, RANK() OVER (PARTITION BY deptno ORDER BY sal) as sal_PerRank from Student order by deptno,sal ) \
where sal = 3000 ").df()
Results |
deptno sal sal_PerRank
0 10 3000 3
1 20 3000 4
2 20 3000 4
3 20 3000 4
4 30 3000 7
2) 분석함수
Python Programming |
%%sql
SELECT ename, deptno, sal,
RANK() OVER (PARTITION BY deptno ORDER BY sal) sal_rank
FROM emp
ORDER BY DEPTNO, SAL
LIMIT 6
Python Programming |
duckdb.sql(" SELECT ename, deptno, sal, \
RANK() OVER (PARTITION BY deptno ORDER BY sal) sal_rank \
FROM emp \
ORDER BY DEPTNO, SAL \
LIMIT 6 ").df()
Results |
ename deptno sal sal_rank
0 MILLER 10 1300 1
1 CLARK 10 2450 2
2 KING 10 5000 3
3 SMITH 20 800 1
4 ADAMS 20 1100 2
5 JONES 20 2975 3
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글