본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 필터링 - 가운데 특정 문자 Like 검색] 테이블 데이터 전처리 비교 - 21 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 16.

포스팅 목차

     

    21. Display the names of employees whose names have second alphabet A in their names.

     

    * 직원들의 이름을 검색하여서 2번째 문자가 알파벳 'L'인 직원들의 이름을 출력하시오.


    - [문자열 검색 및 문자열 함수] 특정 문자열이 포함된 데이터 추출

    • Oracle : like, substr 함수
    • 파이썬 Pandas : .str.match() 함수, .loc(), .query()
    • R 프로그래밍 : substr(), grep() 함수, stringr::str_detect(), stringr::str_sub() 함수, which(), subset()
    • R Dplyr Package : stringr::str_detect(), base::grepl() 함수, filter()
    • R sqldf Package : like 연산자
    • Python pandasql Package : like 연산자
    • R data.table Package : grepl() 함수
    • SAS Proc SQL : like 연산자, substr 함수
    • SAS Data Step : like 연산자, substr 함수
    • Python Dfply Package : filter_by() 구문, mask() 구문, str.match() 함수
    • 파이썬 Base 프로그래밍 :

     


    1. 오라클(Oracle)

    • like 연산자
    Oracle Programming
    select ename from emp 
    where  ename like '_L%';

     


    • substr 함수
    Oracle Programming
    select ename from emp 
    where  substr(ename,2,1) = 'L';

     


    2. 파이썬(Pandas)

    • match 함수
    Python Programming
    emp[ emp['ename'].str.match("^.L") ]

     

    Results
      empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10

     


    • match 함수
    Python Programming
    emp.loc[(emp.ename.str.match("^.L")) , ["ename"]]

     

    Results
      ename
    1 ALLEN
    5 BLAKE
    6 CLARK

    • match 함수
    Python Programming
    emp.query('ename.str.match("^.L")', engine='python')[['ename']]

     

    Results
      ename
    1 ALLEN
    5 BLAKE
    6 CLARK

     


    3. R Programming (R Package)

     

    • substr 함수
    R Programming
    %%R
    emp[ substr(emp$ename,2,2) == "L"  , c("ename") ]

     

    Results
    # A tibble: 3 x 1
      ename
      <chr>
    1 ALLEN
    2 BLAKE
    3 CLARK

     


    • grep 함수
    R Programming
    %%R
    emp[ grep("^.L", emp$ename)  , c("ename") ]

     

    Results
    # A tibble: 3 x 1
      ename
      <chr>
    1 ALLEN
    2 BLAKE
    3 CLARK

     


    • substr 함수
    R Programming
    %%R
    emp[ (substr(emp$ename,2,2) == "L")  , c("ename") ]

     

    Results
    # A tibble: 3 x 1
      ename
      <chr>
    1 ALLEN
    2 BLAKE
    3 CLARK

     


    • which 함수와 substr 함수
    R Programming
    %%R
    
    emp[which(substr(emp$ename,2,2) == "L"), c("ename") ] %>% as.data.frame()

     

    Results
      ename
    1 ALLEN
    2 BLAKE
    3 CLARK

     


    • subset 함수와 substr 함수
    R Programming
    %%R
    subset(emp,subset= (substr(emp$ename,2,2) == "L") , select=c(empno,ename) )

     

    Results
    # A tibble: 3 x 2
      empno ename
      <dbl> <chr>
    1  7499 ALLEN
    2  7698 BLAKE
    3  7782 CLARK

     


    • stringr 패키지 사용
    R Programming
    %%R
    library(stringr)
    
    subset(emp,subset= (stringr::str_detect(emp$ename, "^.L") ) , select=c(empno,ename) )

     

    Results
    # A tibble: 3 x 2
      empno ename
      <dbl> <chr>
    1  7499 ALLEN
    2  7698 BLAKE
    3  7782 CLARK

     


    • subset 함수와 stringr 패키지의 str_sub 함수
    R Programming
    %%R
    library(stringr)
    
    subset(emp,subset= (stringr::str_sub(emp$ename,2,2) == "L") , select=c(empno,ename) )

     

    Results
    # A tibble: 3 x 2
      empno ename
      <dbl> <chr>
    1  7499 ALLEN
    2  7698 BLAKE
    3  7782 CLARK

     


    4. R Dplyr Package

    • filter 함수와 stringr 패키지의 str_detect 함수
    R Programming
    %%R
    
    emp %>% filter( stringr::str_detect(ename,"^.L") ) %>% dplyr::select(ename)

     

    Results
    # A tibble: 3 x 1
      ename
      <chr>
    1 ALLEN
    2 BLAKE
    3 CLARK

     


    • filter 함수와 grepl 함수
    R Programming
    %%R
    
    emp %>% filter( base::grepl("^.L", ename) ) %>% dplyr::select(ename)

     

    Results
    # A tibble: 3 x 1
      ename
      <chr>
    1 ALLEN
    2 BLAKE
    3 CLARK

     


    5. R sqldf Package

    • Like 연산자
    R Programming
    %%R
    
    require(sqldf)
    sqldf("select ename from emp where ename like '_L%';")

     

    Results
      ename
    1 ALLEN
    2 BLAKE
    3 CLARK

     


    6. Python pandasql Package

    • Like 연산자
    Python Programming
    ps.sqldf("select ename from emp where ename like '_L%'")

     

    Results
      ename
    0 ALLEN
    1 BLAKE
    2 CLARK

     


    7. R data.table Package

    • Grepl 함수
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[grepl("^.L", ename), .(ename)]

     

    Results
       ename
    1: ALLEN
    2: BLAKE
    3: CLARK

     


    8. SAS Proc SQL

    • Like 연산자
    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        SELECT A.*
        FROM   EMP A
        WHERE ename like '_L%';
    QUIT;
    proc print;run;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
    2 7698 BLAKE MANAGER 7839 1981-03-01 2850 . 30
    3 7782 CLARK MANAGER 7839 1981-01-09 2450 . 10

     


     

    SAS Programming
    sas.sasdata2dataframe(table = 'STATSAS_1', libref = 'work', dsopts = None, method = 'MEMORY')

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300.0 30
    1 7698 BLAKE MANAGER 7839 1981-03-01 2850 NaN 30
    2 7782 CLARK MANAGER 7839 1981-01-09 2450 NaN 10

     


    9. SAS Data Step

     

    • Like 연산자
    SAS Programming
    %%SAS sas
    DATA STATSAS_2;
     SET EMP;
         WHERE ename like '_L%';
    RUN;
    PROC PRINT;RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
    2 7698 BLAKE MANAGER 7839 1981-03-01 2850 . 30
    3 7782 CLARK MANAGER 7839 1981-01-09 2450 . 10

     


    10. Python Dfply Package

    • filter_by & mask
    Python Programming
    emp >> filter_by(X.ename.str.match("^.L"))

     

    Results
      empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10

     


     

    Python Programming
    emp >> mask(X.ename.str.match("^.L"))

     

    Results
      empno ename job mgr hiredate sal comm deptno
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10

     


    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

    반응형

    댓글