본문 바로가기
통계프로그램 비교 시리즈/프로그래밍비교(Oracle,Python,R,SAS)

[데이터 필터링 - 접두어 Like 검색] 특정 문자열의 길이를 측정 (Length 함수) - 22 (오라클 SQL, R, Python, SAS)

by 기서무나구물 2021. 8. 6.

포스팅 목차

     

    22. Display the names of employees whose name is exactly five characters in length.

     

    * 사원의 이름이 5글자인 사원에 대한 정보를 출력하시오


    • Oracle : like, length 함수
    • 파이썬 Pandas : str.match(), str.len() 함수, .loc(), .query()
    • R 프로그래밍 : grep(), stringr::str_length(), stringr::str_detect(), nchar() 함수, as.character(), subset()
    • R Dplyr Package : grepl(), stringr::str_length(), stringr::str_detect() 함수, dplyr::filter(), nchar() 함수, as.character()
    • R sqldf Package : length 함수
    • Python pandasql Package : length 함수
    • R data.table Package : str_length() 함수
    • SAS Proc SQL : like 연산자, length() 함수
    • SAS Data Step : like 연산자, length() 함수
    • Python Dfply Package : filter_by() 구문, mask() 구문, str.len() 함수
    • 파이썬 Base 프로그래밍 :

     


    1. 오라클(Oracle)

    Oracle Programming
    select ename from emp 
    where  length(ename)=5;

     


    Oracle Programming
    select ename from emp 
    where  ename like '_____';

     


    2. 파이썬(Pandas)

    패턴 검색 함수 match 함수의 정규표현식을 사용하여서 일치하는 패턴을 검색한다. "^"는 문자의 시작을 표시하고,"$"는 문자의 끝을 표시하고, "."는 하나의 문자를 표시한다. 사원 이름이 5개의 문자로 구성된 데이터를 추출한다.

    Python Programming
    emp[ emp['ename'].str.match("^.....$") ]

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10
    7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20
    10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20
    11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30

     


    Python Programming
    emp.loc[(emp.ename.str.match("^.....$")) , ["ename"]]

     

    Results
      ename
    0 SMITH
    1 ALLEN
    3 JONES
    5 BLAKE
    6 CLARK
    7 SCOTT
    10 ADAMS
    11 JAMES

    • len함수를 사용하여서 사원이름의 길이가 5글자로 구성된 데이터를 추출한다.
    Python Programming
    emp[ emp['ename'].str.len() == 5 ]

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10
    7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20
    10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20
    11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30

     


    Python Programming
    emp.loc[(emp.ename.str.len() == 5) , ["ename"]]

     

    Results
      ename
    0 SMITH
    1 ALLEN
    3 JONES
    5 BLAKE
    6 CLARK
    7 SCOTT
    10 ADAMS
    11 JAMES

     


    Python Programming
    emp.query('ename.str.match("^.....$")', engine='python')[['ename']]

     

    Results
      ename
    0 SMITH
    1 ALLEN
    3 JONES
    5 BLAKE
    6 CLARK
    7 SCOTT
    10 ADAMS
    11 JAMES

     


    Python Programming
    emp.query('ename.str.len() == 5', engine='python')[['ename']]

     

    Results
      ename
    0 SMITH
    1 ALLEN
    3 JONES
    5 BLAKE
    6 CLARK
    7 SCOTT
    10 ADAMS
    11 JAMES

     


    3. R Programming (R Package)

    R Programming
    %%R
    
    library(stringr)

     

    • 패턴 함수 grep를 사용하여서 사원 이름의 길이가 5글자인 사원 데이터를 추출한다.
    R Programming
    %%R
    emp[ grep("^.....$", emp$ename)  , c("ename") ]

     

    Results
    # A tibble: 8 x 1
      ename
      <chr>
    1 SMITH
    2 ALLEN
    3 JONES
    4 BLAKE
    5 CLARK
    6 SCOTT
    7 ADAMS
    8 JAMES

     


    R Programming
    %%R
    emp[ stringr::str_length(emp$ename) == 5  , c("ename") ]

     

    Results
    # A tibble: 8 x 1
      ename
      <chr>
    1 SMITH
    2 ALLEN
    3 JONES
    4 BLAKE
    5 CLARK
    6 SCOTT
    7 ADAMS
    8 JAMES

     


    • 사원 이름의 길이가 5글자인 사원 정보를 추출한다.
    R Programming
    %%R
    emp[ nchar(as.character(emp$ename)) == 5  , c("ename") ]

     

    Results
    # A tibble: 8 x 1
      ename
      <chr>
    1 SMITH
    2 ALLEN
    3 JONES
    4 BLAKE
    5 CLARK
    6 SCOTT
    7 ADAMS
    8 JAMES

     


    R Programming
    %%R
    subset(emp, subset=grepl("^.....$", emp$ename), select=c(empno,ename))

     

    Results
    # A tibble: 8 x 2
      empno ename
      <dbl> <chr>
    1  7369 SMITH
    2  7499 ALLEN
    3  7566 JONES
    4  7698 BLAKE
    5  7782 CLARK
    6  7788 SCOTT
    7  7876 ADAMS
    8  7900 JAMES

     


    R Programming
    %%R
    subset(emp, subset= (str_length(emp$ename) == 5) , select=c(empno,ename))

     

    Results
    # A tibble: 8 x 2
      empno ename
      <dbl> <chr>
    1  7369 SMITH
    2  7499 ALLEN
    3  7566 JONES
    4  7698 BLAKE
    5  7782 CLARK
    6  7788 SCOTT
    7  7876 ADAMS
    8  7900 JAMES

     


    R Programming
    %%R
    library(stringr)
    
    subset(emp,subset= (stringr::str_detect(emp$ename, "^.....$") ) , select=c(empno,ename) )

     

    Results
    # A tibble: 8 x 2
      empno ename
      <dbl> <chr>
    1  7369 SMITH
    2  7499 ALLEN
    3  7566 JONES
    4  7698 BLAKE
    5  7782 CLARK
    6  7788 SCOTT
    7  7876 ADAMS
    8  7900 JAMES

     


    4. R Dplyr Package

    • 패턴 검색 함수 grepl를 사용하여서 사원 이름의 길이가 5글자로 구성된 사원 정보를 추출한다.
    R Programming
    %%R
    
    emp %>% dplyr::filter( grepl('^.....$', ename)) %>% dplyr::select(ename)

     

    Results
    # A tibble: 8 x 1
      ename
      <chr>
    1 SMITH
    2 ALLEN
    3 JONES
    4 BLAKE
    5 CLARK
    6 SCOTT
    7 ADAMS
    8 JAMES

     


    R Programming
    %%R
    
    emp %>% filter( stringr::str_length(ename) == 5 ) %>% dplyr::select(ename)

     

    Results
    # A tibble: 8 x 1
      ename
      <chr>
    1 SMITH
    2 ALLEN
    3 JONES
    4 BLAKE
    5 CLARK
    6 SCOTT
    7 ADAMS
    8 JAMES

     


    R Programming
    %%R
    
    emp %>% filter( nchar(as.character(ename)) == 5 ) %>% dplyr::select(ename)

     

    Results
    # A tibble: 8 x 1
      ename
      <chr>
    1 SMITH
    2 ALLEN
    3 JONES
    4 BLAKE
    5 CLARK
    6 SCOTT
    7 ADAMS
    8 JAMES

     


    R Programming
    %%R
    
    emp %>% filter( stringr::str_detect(ename,"^.....$") ) %>% dplyr::select(ename)

     

    Results
    # A tibble: 8 x 1
      ename
      <chr>
    1 SMITH
    2 ALLEN
    3 JONES
    4 BLAKE
    5 CLARK
    6 SCOTT
    7 ADAMS
    8 JAMES

     


    5. R sqldf Package

    • 사원 이름의 길이가 5글자로 구성된 사원 정보를 추출한다.
    R Programming
    %%R
    
    require(sqldf)
    sqldf("select ename from emp where length(ename)=5")

     

    Results
      ename
    1 SMITH
    2 ALLEN
    3 JONES
    4 BLAKE
    5 CLARK
    6 SCOTT
    7 ADAMS
    8 JAMES

     


    6. Python pandasql Package

    • 사원 이름의 길이가 5글자로 구성된 사원 정보를 추출한다.
    Python Programming
    ps.sqldf("select ename from emp where length(ename)=5")

     

    Results
      ename
    0 SMITH
    1 ALLEN
    2 JONES
    3 BLAKE
    4 CLARK
    5 SCOTT
    6 ADAMS
    7 JAMES

     


    7. R data.table Package

    • str_length 함수를 사용하여서 사원이름이 5글자로 구성된 사원 정보를 추출한다.
    R Programming
    %%R
    
    DT          <- data.table(emp)
    dept_DT     <- data.table(dept)
    
    DT[str_length(ename) == 5, .(ename)]

     

    Results
       ename
    1: SMITH
    2: ALLEN
    3: JONES
    4: BLAKE
    5: CLARK
    6: SCOTT
    7: ADAMS
    8: JAMES

     


    R Programming
    %%R
    
    DT[stringr::str_length(ename) == 5, .(ename)]

     

    Results
       ename
    1: SMITH
    2: ALLEN
    3: JONES
    4: BLAKE
    5: CLARK
    6: SCOTT
    7: ADAMS
    8: JAMES

     


    8. SAS Proc SQL

     

    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        SELECT A.*
        FROM   EMP A
        where length(ename)=5;
    QUIT;
    
    PROC PRINT DATA=STATSAS_1(OBS=3);RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7369 SMITH CLERK 7902 1980-12-17 800 . 20
    2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
    3 7566 JONES MANAGER 7839 1981-04-02 2975 . 20

     


    SAS Programming
    %%SAS sas
    
    PROC SQL;
      CREATE TABLE STATSAS_1 AS
        SELECT A.*
        FROM   EMP A
        where TRIM(ename) like '_____';
    QUIT;
    PROC PRINT DATA=STATSAS_1(OBS=3);RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7369 SMITH CLERK 7902 1980-12-17 800 . 20
    2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
    3 7566 JONES MANAGER 7839 1981-04-02 2975 . 20

     


    9. SAS Data Step

     

    SAS Programming
    %%SAS sas
    DATA STATSAS_2;
     SET EMP;
         where length(ename)=5;
    RUN;
    PROC PRINT DATA=STATSAS_2(OBS=3);RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7369 SMITH CLERK 7902 1980-12-17 800 . 20
    2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
    3 7566 JONES MANAGER 7839 1981-04-02 2975 . 20

     


    SAS Programming
    %%SAS sas
    DATA STATSAS_2;
     SET EMP;
         where TRIM(ename) like '_____';
    RUN;
    PROC PRINT DATA=STATSAS_2(OBS=3);RUN;

     

    Results
    OBS empno ename job mgr hiredate sal comm deptno
    1 7369 SMITH CLERK 7902 1980-12-17 800 . 20
    2 7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
    3 7566 JONES MANAGER 7839 1981-04-02 2975 . 20

     


    10. Python Dfply Package

     

    Python Programming
    emp >> filter_by( X.ename.str.len() == 5 )

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10
    7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20
    10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20
    11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30

     


    Python Programming
    emp >> mask( X.ename.str.len() == 5 )

     

    Results
      empno ename job mgr hiredate sal comm deptno
    0 7369 SMITH CLERK 7902.0 1980/12/17 800 NaN 20
    1 7499 ALLEN SALESMAN 7698.0 1981/02/20 1600 300.0 30
    3 7566 JONES MANAGER 7839.0 1981/04/02 2975 NaN 20
    5 7698 BLAKE MANAGER 7839.0 1981/03/01 2850 NaN 30
    6 7782 CLARK MANAGER 7839.0 1981/01/09 2450 NaN 10
    7 7788 SCOTT ANALYST 7566.0 1982/12/09 3000 NaN 20
    10 7876 ADAMS CLERK 7788.0 1983/01/12 1100 NaN 20
    11 7900 JAMES CLERK 7698.0 1981/12/03 950 NaN 30

     

     


     

    [SQL, Pandas, R Prog, Dplyr, SQLDF, PANDASQL, DATA.TABLE]   SQL EMP 예제로 만나는 테이블 데이터 처리 방법 리스트

    반응형

    댓글