본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 선택 - 해당 날짜 데이터 추출] 지정한 날짜에 속하는 데이터 추출 - 13 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 5.

포스팅 목차

    13. Display the list of employees who have joined the company before 30th June 90 or after 31st dec 90.

     

    * 1981년 1월 09일부터 1981년 11월 17일 사이에 입사한 직원들의 정보를 출력하시오.


    • [데이터 필터] 지정한 날짜에 속하는 데이터 추출
    • Oracle : 날짜를 문자형 날짜로 변환(TO_CHAR), between 연산자
    • 파이썬 Pandas : between 연산자, .query() 구문
    • R 프로그래밍 : & 연산자, which 함수, subset() 구문
    • R Dplyr Package : filter() 함수, as.Date() 함수, between 함수
    • R sqldf Package : as.numeric() 함수, gsubfn::fn$sqldf - 외부 변수 참조, strftime() 함수, 날짜형 데이터를 문자형 날짜로 변환
    • Python pandasql Package : between 연산자
    • R data.table Package : 슬라이싱(Slicing)
    • SAS Proc SQL : between 연산자
    • SAS Data Step : between 연산자, put() 함수
    • Python Dfply Package : filter_by() 함수, mask() 함수, between() 함수
    • 파이썬 Base 프로그래밍 :

     


    1. 오라클(Oracle)

     

    Oracle Programming
    select * from emp 
    where  hiredate between '09-jan-1981' and '17-nov-1981';

     


    Oracle Programming
    select * from emp 
    where  to_char(hiredate,'yyyymmdd') between '19810109' and '19811117';

     


    2. 파이썬(Pandas)

     

    Python Programming
    emp[ emp['hiredate'].between('1981/01/09', '1981/11/17', inclusive = True)][["ename"]]

     

    Results
      empno ename job mgr hiredate sal comm deptno hiredate_D
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 1981-02-20
    2 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 1981-02-22
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 1981-04-02
    4 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 1981-09-28
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 1981-03-01
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 1981-01-09
    8 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 1981-11-17
    9 7844 TURNER SALESMAN 7698.0 1981/09/08 1500 0.0 30 1981-09-08

     


     

    Python Programming
    emp.loc[(emp.hiredate.between('1981/01/09', '1981/11/17', inclusive = True)) , ["ename"]]

     

    Results
      ename
    1 ALLEN
    2 WARD
    3 JONES
    4 MARTIN
    5 BLAKE
    6 CLARK
    8 KING
    9 TURNER

     


     

    Python Programming
    emp.query("'1981/01/09' <= hiredate <= '1981/11/17'")[["ename"]]

     

    Results
      ename
    1 ALLEN
    2 WARD
    3 JONES
    4 MARTIN
    5 BLAKE
    6 CLARK
    8 KING
    9 TURNER

     


    3. R Programming (R Package)

     

    R Programming
    %%R
    emp[emp$hiredate >=  '1981/01/09' & emp$hiredate <= '1981/11/17' , c("empno","ename") ]

     

    Results
    # A tibble: 8 x 2
      empno ename 
      <dbl> <chr> 
    1  7499 ALLEN 
    2  7521 WARD  
    3  7566 JONES 
    4  7654 MARTIN
    5  7698 BLAKE 
    6  7782 CLARK 
    7  7839 KING  
    8  7844 TURNER

     


     

    R Programming
    %%R
    emp[which(emp$hiredate >=  '1981/01/09' & emp$hiredate <= '1981/11/17'), c("empno","ename") ]

     

    Results
    # A tibble: 8 x 2
      empno ename 
      <dbl> <chr> 
    1  7499 ALLEN 
    2  7521 WARD  
    3  7566 JONES 
    4  7654 MARTIN
    5  7698 BLAKE 
    6  7782 CLARK 
    7  7839 KING  
    8  7844 TURNER

     


     

    R Programming
    %%R
    subset(emp,subset= (hiredate >=  '1981/01/09' & hiredate <= '1981/11/17') , select=c(empno,ename) )

     

    Results
    # A tibble: 8 x 2
      empno ename 
      <dbl> <chr> 
    1  7499 ALLEN 
    2  7521 WARD  
    3  7566 JONES 
    4  7654 MARTIN
    5  7698 BLAKE 
    6  7782 CLARK 
    7  7839 KING  
    8  7844 TURNER

     


    4. R Dplyr Package

    • between 함수 사용
    R Programming
    %%R
    
    # x : A numeric vector of values
    
    emp %>% filter(between(hiredate, as.Date('1981/01/09'), as.Date('1981/11/17') ) ) %>% select(ename)

     

    Results
    # A tibble: 8 x 1
      ename 
      <chr> 
    1 ALLEN 
    2 WARD  
    3 JONES 
    4 MARTIN
    5 BLAKE 
    6 CLARK 
    7 KING  
    8 TURNER

     


    • 부등호 처리 방식(문자비교)
    R Programming
    %%R
    
    emp %>% filter(hiredate >= "1981-01-09" & hiredate <= "1981-11-17" ) %>% dplyr::select(ename)

     

    Results
    # A tibble: 8 x 1
      ename 
      <chr> 
    1 ALLEN 
    2 WARD  
    3 JONES 
    4 MARTIN
    5 BLAKE 
    6 CLARK 
    7 KING  
    8 TURNER

     


    5. R sqldf Package

    • hiredate 가 날짜형식인 경우 처리 방식
    R Programming
    %%R
    
    date1=as.Date("1981-01-09")
    date2=as.Date("1981-11-17")
    
    s <- paste("select * from emp where hiredate between ", as.numeric(date1)," and ",as.numeric(date2) )
    sqldf(s, verbose = TRUE)

     

    Results
    sqldf: library(RSQLite)
    sqldf: m <- dbDriver("SQLite")
    sqldf: connection <- dbConnect(m, dbname = ":memory:")
    sqldf: initExtension(connection)
    sqldf: dbWriteTable(connection, 'emp', emp, row.names = FALSE)
    sqldf: dbGetQuery(connection, 'select * from emp where hiredate between  4026  and  4338')
    sqldf: dbDisconnect(connection)
      empno  ename       job  mgr   hiredate  sal comm deptno
    1  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30
    2  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30
    3  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20
    4  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30
    5  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30
    6  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10
    7  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10
    8  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30

     


     

    R Programming
    %%R
    
    vdate1 <- as.Date('1981-01-09')
    vdate2 <- as.Date('1981-11-17')
    
    gsubfn::fn$sqldf("select * from emp where hiredate between $vdate1 and $vdate2  ")

     

    Results
      empno  ename       job  mgr   hiredate  sal comm deptno
    1  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30
    2  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30
    3  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20
    4  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30
    5  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30
    6  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10
    7  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10
    8  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30

     


     

    R Programming
    %%R
    
    gsubfn::fn$sqldf("select *, strftime('%d-%m-%Y', 'now') now_char 
                      from emp 
                      where hiredate between `as.Date('1981-01-09')` and `as.Date('1981-11-17')`  ")

     

    Results
      empno  ename       job  mgr   hiredate  sal comm deptno   now_char
    1  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30 09-09-2020
    2  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30 09-09-2020
    3  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20 09-09-2020
    4  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30 09-09-2020
    5  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30 09-09-2020
    6  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10 09-09-2020
    7  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10 09-09-2020
    8  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30 09-09-2020

     


    • 날짜형 데이터를 문자형을 변경 후 비교
    R Programming
    %%R
    
    sqldf("select * from emp where strftime('%Y-%m-%d', hiredate * 3600 * 24, 'unixepoch') between '1981-01-09' and '1981-11-17' ")

     

    Results
      empno  ename       job  mgr   hiredate  sal comm deptno
    1  7499  ALLEN  SALESMAN 7698 1981-02-20 1600  300     30
    2  7521   WARD  SALESMAN 7698 1981-02-22 1250  500     30
    3  7566  JONES   MANAGER 7839 1981-04-02 2975   NA     20
    4  7654 MARTIN  SALESMAN 7698 1981-09-28 1250 1400     30
    5  7698  BLAKE   MANAGER 7839 1981-03-01 2850   NA     30
    6  7782  CLARK   MANAGER 7839 1981-01-09 2450   NA     10
    7  7839   KING PRESIDENT   NA 1981-11-17 5000   NA     10
    8  7844 TURNER  SALESMAN 7698 1981-09-08 1500    0     30

     


    6. Python pandasql Package

    • 문자형 날짜 처리 방식
    Python Programming
    ps.sqldf("select * from emp where hiredate between '1981/01/09' and '1981/11/17'")

     

    Results
      empno ename job mgr hiredate sal comm deptno hiredate_D to_date day_gap
    0 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 1981-02-20 00:00:00.000000 2020-09-10 09:45:49.454895 14447
    1 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 1981-02-22 00:00:00.000000 2020-09-10 09:45:49.454895 14445
    2 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 1981-04-02 00:00:00.000000 2020-09-10 09:45:49.454895 14406
    3 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 1981-09-28 00:00:00.000000 2020-09-10 09:45:49.454895 14227
    4 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 1981-03-01 00:00:00.000000 2020-09-10 09:45:49.454895 14438
    5 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 1981-01-09 00:00:00.000000 2020-09-10 09:45:49.454895 14489
    6 7839 KING PRESIDENT NaN 1981/11/17 5000 NaN 10 1981-11-17 00:00:00.000000 2020-09-10 09:45:49.454895 14177
    7 7844 TURNER SALESMAN 7698.0 1981/09/08 1500 0.0 30 1981-09-08 00:00:00.000000 2020-09-10 09:45:49.454895 14247

     


    • 날짜 데이터 조회
    Python Programming
    emp['hiredate_D'] = pd.to_datetime(emp['hiredate'])
    
    ps.sqldf("select *,date('1981-01-09') from emp where hiredate_D between date('1981-01-09') and date('1981-11-17')")

     

    Results
      empno ename job mgr hiredate sal comm deptno hiredate_D to_date day_gap date('1981-01-09')
    0 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30 1981-02-20 00:00:00.000000 2020-09-10 09:45:49.454895 14447 1981-01-09
    1 7521 WARD SALESMAN 7698.0 1981/02/22 1250 500.0 30 1981-02-22 00:00:00.000000 2020-09-10 09:45:49.454895 14445 1981-01-09
    2 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20 1981-04-02 00:00:00.000000 2020-09-10 09:45:49.454895 14406 1981-01-09
    3 7654 MARTIN SALESMAN 7698.0 1981/09/28 1250 1400.0 30 1981-09-28 00:00:00.000000 2020-09-10 09:45:49.454895 14227 1981-01-09
    4 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30 1981-03-01 00:00:00.000000 2020-09-10 09:45:49.454895 14438 1981-01-09
    5 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10 1981-01-09 00:00:00.000000 2020-09-10 09:45:49.454895 14489 1981-01-09
    6 7844 TURNER SALESMAN 7698.0 1981/09/08 1500 0.0 30 1981-09-08 00:00:00.000000 2020-09-10 09:45:49.454895 14247 1981-01-09

     


    7. R data.table Package

     

    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[hiredate >= "1981-01-09" & hiredate <= "1981-11-17", .(ename)]

     

    Results
        ename
    1:  ALLEN
    2:   WARD
    3:  JONES
    4: MARTIN
    5:  BLAKE
    6:  CLARK
    7:   KING
    8: TURNER

     


    8. SAS Proc SQL

     

    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        SELECT *
        FROM   EMP
        WHERE  hiredate between '09jan81'd and '17nov81'd;
    QUIT;
    PROC PRINT;RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
    2 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
    3 7566 JONES MANAGER 7839 1981-04-02 2975 . 20
    4 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
    5 7698 BLAKE MANAGER 7839 1981-03-01 2850 . 30
    6 7782 CLARK MANAGER 7839 1981-01-09 2450 . 10
    7 7839 KING PRESIDEN . 1981-11-17 5000 . 10
    8 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30

     


    9. SAS Data Step

     

    SAS Programming
    %%SAS sas
    
    DATA STATSAS_2;
     SET EMP;
         WHERE hiredate between '09jan81'd and '17nov81'd;
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
    2 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
    3 7566 JONES MANAGER 7839 1981-04-02 2975 . 20
    4 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
    5 7698 BLAKE MANAGER 7839 1981-03-01 2850 . 30
    6 7782 CLARK MANAGER 7839 1981-01-09 2450 . 10
    7 7839 KING PRESIDEN . 1981-11-17 5000 . 10
    8 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30

     


     

    SAS Programming
    %%SAS sas
    
    DATA STATSAS_3;
     SET EMP;
         hiredate_char = put(hiredate, yymmddn.);
         WHERE put(hiredate, yymmddn.) between '19810109' and '19811117';
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno hiredate_char
    1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30 19810220
    2 7521 WARD SALESMAN 7698 1981-02-22 1250 500 30 19810222
    3 7566 JONES MANAGER 7839 1981-04-02 2975 . 20 19810402
    4 7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30 19810928
    5 7698 BLAKE MANAGER 7839 1981-03-01 2850 . 30 19810301
    6 7782 CLARK MANAGER 7839 1981-01-09 2450 . 10 19810109
    7 7839 KING PRESIDEN . 1981-11-17 5000 . 10 19811117
    8 7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30 19810908

     


    10. Python Dfply Package

     

    Python Programming
    emp >> filter_by(X.hiredate.between('1981/01/09', '1981/11/17', inclusive = True)) >> select(X.empno, X.ename)

     

    Results
      empno ename
    1 7499 ALLEN
    2 7521 WARD
    3 7566 JONES
    4 7654 MARTIN
    5 7698 BLAKE
    6 7782 CLARK
    8 7839 KING
    9 7844 TURNER

     


     

    Python Programming
    emp >> mask(X.hiredate.between('1981/01/09', '1981/11/17', inclusive = True)) >> select(X.empno, X.ename) >> head(3)

     

    Results
      empno ename
    1 7499 ALLEN
    2 7521 WARD
    3 7566 JONES

     

     


    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

     

    반응형

    댓글