포스팅 목차
* 파이썬 & R 패키지 호출 및 예제 데이터 생성 링크
[ STDDEV_POP Oracle Function ]
STDDEV_POP 함수는 지정한 인수 expr의 모표준편차를 반환한다. 집계 함수와 분석함수로 이용할 수 있다.
- 함수 설명 : STDDEV_POP 오라클 함수 링크
- VAR_POP 함수 참조
1. Oracle(오라클)
1) 집계 함수
- STDDEV_POP() 함수
급여의 모 표준편차를 반환한다.
Oracle Programming |
SELECT STDDEV_POP(sal) STDDEVPOP_VAR
FROM emp
Results |
STDDEVPOP_VAR
--------------------------------------------
1139.488618295281524185955567137220166443
2) 분석함수
- STDDEV_POP() 함수
분석함수를 사용하여서 개별 부서별 급여의 모표준편차를 반환한다.
Oracle Programming |
SELECT DEPTNO, EMPNO,
STDDEV_POP(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV_POP
FROM EMP;
Results |
DEPTNO EMPNO STDDEV_POP
--------------------------------------------------------
10 7782 1546.142152441215721851972457687124239742
10 7934 1546.142152441215721851972457687124239742
10 7839 1546.142152441215721851972457687124239742
20 7902 1004.738772019871839881384168371541966154
20 7788 1004.738772019871839881384168371541966154
20 7566 1004.738772019871839881384168371541966154
20 7369 1004.738772019871839881384168371541966154
20 7876 1004.738772019871839881384168371541966154
30 7521 610.100173924104222773473633873610159005
30 7654 610.100173924104222773473633873610159005
30 7844 610.100173924104222773473633873610159005
30 7900 610.100173924104222773473633873610159005
30 7499 610.100173924104222773473633873610159005
30 7698 610.100173924104222773473633873610159005
2. Python Pandas(파이썬)
1) 집계 함수
- pstdev() 함수
Python Programming |
from statistics import pstdev
pstdev(emp['sal'])
Results |
1139.4886182952814
2) 분석함수
- pstdev() 함수
Python Programming |
from statistics import pstdev
withmooc = emp.copy()
withmooc['sal_pstdev'] = emp.sort_values('sal', ascending=False).groupby('deptno')['sal'].transform(lambda x: pstdev(x))
withmooc.sort_values('deptno').head(7)
Results |
empno ename job mgr hiredate sal comm deptno sal_pstdev
6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 1546.142152
8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 1546.142152
13 7934 MILLER CLERK 7782.0 1982/01/23 1300 NaN 10 1546.142152
0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20 1004.738772
3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 1004.738772
7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20 1004.738772
10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20 1004.738772
3. R Programming (R Package)
1) 집계 함수
- multicon::popsd 함수
R Programming |
%%R
library(multicon)
multicon::popsd(emp$sal)
Results |
[1] 1139.489
- 모표준편차 구현
R Programming |
%%R
sqrt((length(emp$sal)-1)/length(emp$sal)) * sd(emp$sal)
Results |
[1] 1139.489
2) 분석함수
- popsd() 함수
R Programming |
%%R
withmooc <- emp
withmooc['sal_pstdev'] = with(withmooc, ave(x = sal, deptno, FUN =function(x) { popsd(x) }))
withmooc[order(withmooc$deptno),][1:10, ]
Results |
# A tibble: 10 x 9
empno ename job mgr hiredate sal comm deptno sal_pstdev
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1546.
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1546.
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1546.
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1005.
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1005.
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1005.
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1005.
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 1005.
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 610.
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 610.
4. R Dplyr Package
1) 집계 함수
- multicon::popsd() 함수
R Programming |
%%R
library(multicon)
emp %>%
dplyr::summarize(sal_pstdev = multicon::popsd(sal) )
Results |
# A tibble: 1 x 1
sal_pstdev
<dbl>
1 1139.
2) 분석함수
- popsd() 함수
R Programming |
%%R
emp %>%
dplyr::group_by(deptno) %>%
dplyr::mutate(sal_pstdev = popsd(sal) ) %>%
dplyr::arrange(deptno) %>%
head(10)
Results |
# A tibble: 10 x 9
# Groups: deptno [3]
empno ename job mgr hiredate sal comm deptno sal_pstdev
<dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1546.
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1546.
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1546.
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1005.
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1005.
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1005.
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1005.
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 1005.
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 610.
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 610.
5. R sqldf Package
1) 집계 함수
- 모표준편차 구현
R Programming |
%%R
sqldf("select sqrt((count(sal)-1)/cast( count(sal) as real)) * stdev(sal) stddev_pop from emp ")
Results |
stddev_pop
1 1139.489
- stddev_pop() 함수
RH2 패키지에서 지원하는 stddev_pop() 함수를 사용한다.
R Programming |
%%R
library(RH2)
sqldf(" SELECT stddev_pop(sal) stddev_pop FROM emp ")
Results |
stddev_pop
1 1139.489
[참고] typeof / cat
R Programming |
%%R
sqldf("select typeof(13) int_num, typeof(14.0) real_num, 13/14 zero, 13/14.0 real_num_1, 13/cast(14 as real) real_num_2")
Results |
int_num real_num zero real_num_1 real_num_2
1 integer real 0 0.9285714 0.9285714
2) 분석함수
- stddev_pop() 함수
R Programming |
%%R
sqldf(" select a.*,stddev_pop
from emp a
left join (SELECT deptno,stddev_pop(sal) stddev_pop FROM emp group by deptno) b
on a.deptno = b.deptno
order by a.deptno")
Results |
empno ename job mgr hiredate sal comm deptno stddev_pop
1 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1546.1422
2 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1546.1422
3 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1546.1422
4 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1004.7388
5 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1004.7388
6 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1004.7388
7 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1004.7388
8 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 1004.7388
9 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 610.1002
10 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 610.1002
11 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 610.1002
12 7698 BLAKE MANAGER 7839 1981-03-01 2850 NA 30 610.1002
13 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 610.1002
14 7900 JAMES CLERK 7698 1981-12-03 950 NA 30 610.1002
6. Python pandasql Package
1) 집계 함수
2) 분석함수
7. R data.table Package
1) 집계 함수
- popsd() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
library(multicon)
DT[, list(sal_pstdev = popsd(sal) )]
Results |
sal_pstdev
1: 1139.489
- popsd() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
library(multicon)
DT[, list(sal_pstdev = popsd(sal) ), by =deptno]
Results |
deptno sal_pstdev
1: 20 1004.7388
2: 30 610.1002
3: 10 1546.1422
2) 분석함수
- popsd() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, sal_pstdev := popsd(sal), by = deptno][order(deptno),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_pstdev
1: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1546.1422
2: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1546.1422
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1546.1422
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1004.7388
5: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1004.7388
6: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1004.7388
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1004.7388
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 1004.7388
9: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 610.1002
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 610.1002
- popsd() 함수
R Programming |
%%R
DT <- data.table(emp)
dept_DT <- data.table(dept)
DT[, `:=` ( sal_pstdev = popsd(unlist(.SD)) ),
by = deptno,
.SDcols = c("sal")][order(deptno),][1:10, ]
Results |
empno ename job mgr hiredate sal comm deptno sal_pstdev
1: 7782 CLARK MANAGER 7839 1981-01-09 2450 NA 10 1546.1422
2: 7839 KING PRESIDENT NA 1981-11-17 5000 NA 10 1546.1422
3: 7934 MILLER CLERK 7782 1982-01-23 1300 NA 10 1546.1422
4: 7369 SMITH CLERK 7902 1980-12-17 800 NA 20 1004.7388
5: 7566 JONES MANAGER 7839 1981-04-02 2975 NA 20 1004.7388
6: 7788 SCOTT ANALYST 7566 1982-12-09 3000 NA 20 1004.7388
7: 7876 ADAMS CLERK 7788 1983-01-12 1100 NA 20 1004.7388
8: 7902 FORD ANALYST 7566 1981-12-03 3000 NA 20 1004.7388
9: 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 610.1002
10: 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 610.1002
8. Python Duckdb의 SQL
1) 집계 함수
- STDDEV_POP() 함수
Python Programming |
%%sql
SELECT STDDEV_POP(sal) as stdev_pop_func FROM emp
Python Programming |
duckdb.sql(" SELECT STDDEV_POP(sal) as stdev_pop_func FROM emp ").df()
Results |
stdev_pop_func
0 1139.488618
2) 분석 함수
- STDDEV_POP() 함수
Python Programming |
%%sql
SELECT DEPTNO, EMPNO,
STDDEV_POP(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV_POP
FROM emp
order by deptno
LIMIT 6
Python Programming |
duckdb.sql(" SELECT DEPTNO, EMPNO, \
STDDEV_POP(SAL) OVER (PARTITION BY DEPTNO) AS STDDEV_POP \
FROM emp \
order by deptno \
LIMIT 6 ").df()
Results |
deptno empno STDDEV_POP
0 10 7782 1546.142152
1 10 7839 1546.142152
2 10 7934 1546.142152
3 20 7369 1004.738772
4 20 7566 1004.738772
5 20 7788 1004.738772
--------------------------------------------
[Oracle, Pandas, R Prog, Dplyr, Sqldf, Pandasql, Data.Table] 오라클 함수와 R & Python 비교 사전 목록 링크 |
오라클 SQL 함수(Oracle SQL Function) 목록 리스트 링크 |
[SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE] SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트 링크 링크 |
반응형
댓글