How to export data in a specified time range to CSV with Cassandra?

You can use the DSBulk utility with a custom query, but you need to be careful to phrase the condition so that the full scan is driven by token ranges rather than executed as one giant query (see this blog post for details).

Something like the following (replace pk with the name of the actual partition key column, and unsplit the query string; I split it here just for readability). DSBulk recognizes the :start and :end placeholders and generates one read statement per token range in the cluster, filling in the boundaries itself:

dsbulk unload -url data.csv \
  -query "SELECT * FROM ks.table WHERE token(pk) > :start AND token(pk) <= :end
    AND upload_time >= '2020-01-01 00:00:00.000+0000'
    AND upload_time < '2021-01-01 00:00:00.000+0000' ALLOW FILTERING"
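Against a remote cluster you would typically also pass contact points and credentials. A minimal sketch, assuming password authentication and the default port (the host and credential values are placeholders):

dsbulk unload -h '10.0.0.1' -port 9042 -u cassandra -p cassandra \
  -url /tmp/export \
  -query "SELECT * FROM ks.table WHERE token(pk) > :start AND token(pk) <= :end AND upload_time >= '2020-01-01 00:00:00.000+0000' AND upload_time < '2021-01-01 00:00:00.000+0000' ALLOW FILTERING"

Note that -url names a directory into which DSBulk writes one or more output CSV files.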

Another way is to use Spark with the Spark Cassandra Connector (even in local master mode); under the hood it performs the same token-range scan. Something like this (example for spark-shell in Scala; the same could be done via pyspark):

import org.apache.spark.sql.cassandra._

// Read the table; the connector scans it by token ranges and the time filter is applied on top
val data = spark.read.cassandraFormat("table", "keyspace").load()
val filtered = data.filter("upload_time >= cast('2020-01-01 00:00:00.000+0000' as timestamp) AND upload_time < cast('2021-01-01 00:00:00.000+0000' as timestamp)")
filtered.write.format("csv").save("data.csv")
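Note that save produces a directory of part files rather than a single CSV. If you need one file with a header row, a minimal sketch (coalesce(1) funnels everything through a single task, so it is only appropriate for modestly sized results):

filtered
  .coalesce(1)                      // one partition -> one output part file
  .write
  .option("header", "true")         // write column names as the first row
  .mode("overwrite")
  .format("csv")
  .save("data.csv")                 // data.csv is a directory containing part-*.csv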
