apache/incubator-superset

[SQLLab] Allow common table expressions (CTE), a.k.a. "with" statements #3998

rumbin posted onGitHub

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if any
  • I have reproduced the issue with at least the latest released version of superset
  • I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.20.6

Expected results

CTEs can be used within SQLLab

Actual results

Only 'SELECT' statements are allowed against this database

Steps to reproduce

Run the following query in SQLLab (here: PostgreSQL syntax):

with common_table as (
    select 1 as mycol
)
select
    *
from common_table

Analysis of the issue


I am unable to reproduce this bug. As we're currently using sqlparse==0.2.4, and CTEs were added in 0.2.0 (see https://github.com/andialbrecht/sqlparse/issues/227 and https://github.com/andialbrecht/sqlparse/commit/8f39d3367e8878e5f9a09b0166e3f0f73626e67d), this works in the current version of Superset. Please let me know if I am missing something, otherwise I propose closing this issue.

posted by villebro almost 7 years ago

I am also facing a similar issue.I am using CTE.I get the same response as above.

posted by sangramdhal102 almost 7 years ago

@sangramdhal102 Can you post your (anonymized) query which fails? Better yet, can you write a CTE which references one of the tables in the Superset datamodel?

posted by villebro almost 7 years ago

with total_tasks_per_project AS ((select extract('hour' from soapgatewayservice_received ) AS hour,modality,count(ingestion_status) as Total,count(case lower(ingestion_status) when 'completed' then 1 else null end) as completed,count(case lower(ingestion_status) when 'failed' then 1 else null end) as failed,count(case lower(ingestion_status) when 'inprocess' then 1 else null end) as inprocess, avg(ingestioncheckpointservice_completed-soapgatewayservice_received) AS avg_ingestion_time from provenance.provenancedata where (soapgatewayservice_received >= 'startdate’ and soapgatewayservice_received <= 'enddate' ) and modality in('CV','DC','MR','ICAP')group by modality, hour order by modality) UNION (select extract('hour' from restingestiongatewayservice_received ) AS hour,modality,count(ingestion_status) as Total,count(case lower(ingestion_status) when 'completed' then 1 else null end) as completed ,count(case lower(ingestion_status) when 'failed' then 1 else null end) as failed,count(case lower(ingestion_status) when 'inprocess' then 1 else null end) as inprocess, avg(ingestioncheckpointservice_completed-restingestiongatewayservice_received) AS avg_ingestion_time from provenance.provenancedata where (restingestiongatewayservice_received >='startdate' and restingestiongatewayservice_received <=’enddate’) and modality ='CT'group by modality, hour))select hour,modality,total,completed,failed,inprocess,avg_ingestion_time from total_tasks_per_project order by hour))select hour,modality,total,completed,failed,inprocess,avg_ingestion_time from total_tasks_per_project order by hour;

posted by sangramdhal102 almost 7 years ago

This is pulling data from one table only

posted by sangramdhal102 almost 7 years ago

Thanks @sangramdhal102 , I was able to reproduce the error. Will investigate.

posted by villebro almost 7 years ago

@sangramdhal102 The query you posted does not validate using standard sql validator. What dialect is this? Furthermore, does it work when run in a native terminal?

posted by villebro almost 7 years ago

Yes ..it does work .i am able to run the query in pgadmin.

posted by sangramdhal102 almost 7 years ago

@sangramdhal102 I'm surprised the query runs, as you seem to have mixed ' and ’. Example: soapgatewayservice_received >= 'startdate’. I believe that's what is confusing sqlparse. I'm not an expert in Postgres, but I would perhaps refactor the query to make it slightly easier to read. Something along these lines:

with total_tasks_per_project as (
select 
  extract('hour' from soapgatewayservice_received) as hour,
  modality,
  case lower(ingestion_status) when 'completed' then 1 else null end as completed,
  case lower(ingestion_status) when 'failed' then 1 else null end as failed,
  case lower(ingestion_status) when 'inprocess' then 1 else null end as inprocess,
  ingestioncheckpointservice_completed - soapgatewayservice_received as ingestion_time
from
  provenance.provenancedata 
where
  soapgatewayservice_received >= 'startdate'
  and soapgatewayservice_received <= 'enddate'
  and modality in ('CV','DC','MR','ICAP')
union
select
  extract('hour' from restingestiongatewayservice_received) as hour,
  modality,
  case lower(ingestion_status) when 'completed' then 1 else null end as completed,
  case lower(ingestion_status) when 'failed' then 1 else null end as failed,
  case lower(ingestion_status) when 'inprocess' then 1 else null end as inprocess,
  ingestioncheckpointservice_completed - restingestiongatewayservice_received as ingestion_time
from
  provenance.provenancedata
where 
  restingestiongatewayservice_received >= 'startdate'
  and restingestiongatewayservice_received <= 'enddate'
  and modality = 'CT'
)
select 
  hour,
  modality,
  count(*) as total,
  count(completed) as completed,
  count(failed) as failed,
  count(inprocess) as inprocess,
  avg(ingestion_time) as avg_ingestion_time
from 
  total_tasks_per_project
group by
  hour,
  modality
order by
  hour
posted by villebro almost 7 years ago

The query failed to execute.

posted by sangramdhal102 almost 7 years ago

As I don't have access to your database, this is as far as I can help you. At any rate this didn't seem to be an issue with Superset, but rather the underlying query.

posted by villebro almost 7 years ago

Currently we use Python's sqlparse to figure out the statement type, when it's a CTE, it says UNKNOWN.

Here's where the current logic is: https://github.com/apache/incubator-superset/blob/master/superset/sql_parse.py#L34

It may be ok to look whether the query starts with a WITH token and confirm that no DELETE or UPDATE tokens are present in the query (as CTEs can be used in DML contexts).

posted by mistercrunch almost 7 years ago

@mistercrunch Did you see my reply above (10 days ago)? Running the query in the initial bug report does not produce an error on the master branch, as sqlparse.parse(query)[0].get_type(), which is essentially what is_select() is calling, returns SELECT for the CTE query in question on the current version of sqlparse.

<img width="943" alt="screen shot 2018-04-02 at 20 59 33" src="https://user-images.githubusercontent.com/33317356/38208202-f4308c16-36b8-11e8-9d74-04d8f085a44c.png">

posted by villebro almost 7 years ago

Oh I ran a quick test with a very basic CTE against MySQL and get_type() returned UNKNOWN

posted by mistercrunch almost 7 years ago

Actually I take this back, I was facing competing issues with my local MySQL not supporting CTE to test this...

posted by mistercrunch almost 7 years ago

This shouldn't be database type specific, as sqlparse is used prior to query execution. So if this has a database dependency, then the problem is probably somewhere else in the code.

posted by villebro almost 7 years ago

@rororofff has funded $5.00 to this issue.


posted by IssueHuntBot almost 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

posted by stale[bot] almost 6 years ago

Fund this Issue

$5.00
Funded
Only logged in users can fund an issue

Pull requests

Recent activities

rororofff funded 5.00 for apache/incubator-superset# 3998
almost 6 years ago