Show HN: Hashquery, a Python library for defining reusable analysis

cpimhoff | 67 points

I'm exactly the target audience for this type of tool: a tech leader who has implemented a data warehouse and BI strategy. Some concrete tips for adoption:

- Break the dependency on your product. I need to be able to use the library even if your company goes under.

- Add a dbt library that makes it easy to use hashquery within dbt models. It gets you materialization for free and will answer a lot of questions you will get about dbt integration.

To comment more broadly: if you want to be a general solution, the trend in data integration seems to be toward integrating at the warehouse level, so you need to have SQL answers.

A bunch of tools all consume from our warehouse (existing BI, reverse ETL, data science systems). A BI definition tool won't work if I can't define segments in a way that all of those can access, even as just tables or views.

Our programmers and data science people know Python and are often very good at SQL, but their time is short and BI projects depending on them have been delayed. Our analysts know SQL, and have the dedicated time to make these projects happen.

This kind of code snippet isn't crazy to put into dbt, and if someone wants to do Python magic in the background they can:

    hq_project.models.events_model.with_activity_schema(
        group='user_id', timestamp='timestamp', event_key='event_type'
    ).funnel("ad_impression", "add_to_cart", "buy").as_sql()
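As a minimal sketch of the "even as just tables or views" point, the compiled SQL could be published as a view that every SQL-speaking tool can read. Everything here is an assumption for illustration: `sqlite3` stands in for the warehouse, `compiled_sql` is a hand-written stand-in for whatever a call like `.as_sql()` would return, and `publish_view` is a hypothetical helper, not part of Hashquery or dbt.

```python
import sqlite3

# Stand-in for the warehouse; a real setup would use its own connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "ad_impression"), (1, "buy"), (2, "ad_impression"), (3, "buy")],
)

# Hypothetical: pretend this string came out of a Python-side model compiler.
compiled_sql = "SELECT DISTINCT user_id FROM events WHERE event_type = 'buy'"


def publish_view(conn, name, select_sql):
    """Expose compiled SQL as a view so SQL-only tools can read it."""
    if not name.isidentifier():
        raise ValueError(f"unsafe view name: {name!r}")
    conn.execute(f"DROP VIEW IF EXISTS {name}")
    conn.execute(f"CREATE VIEW {name} AS {select_sql}")


publish_view(conn, "buyers_segment", compiled_sql)

# Any downstream consumer can now query the segment with plain SQL.
buyers = sorted(row[0] for row in conn.execute("SELECT user_id FROM buyers_segment"))
print(buyers)
```

In a dbt project the same role would be played by a model's materialization setting rather than a hand-rolled helper; the sketch only shows that the Python layer can stay out of the consumers' way.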

code_biologist | 12 days ago

This looks neat. I'm the author of a similar project in TypeScript we use at Cotera called Era [0]. Y'all might be able to implement something similar to our caching layer [1], which we think is super useful. Once you have a decent cross-warehouse representation it's pretty easy to "split" queries across the real warehouse and something like DuckDB. The other thing that we find useful in Era that y'all might like is "invariants" [2]. Invariants work by compiling lazily evaluated invalid casts into the query that only trigger under failing conditions. We use invariants to fail a query at _runtime_, which eliminates the TOCTOU problems that come from a dbt-tests-style solution.

    [0] https://newera.dev/
    [1] https://cotera.co/blog/how-era-brings-last-mile-analytics-to-any-data-warehouse-via-duckdb
    [2] https://newera.dev/docs/invariants
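A rough sketch of the runtime-invariant idea, for readers who have not seen it. This is not Era's implementation: it runs on `sqlite3`, and because SQLite does not raise on invalid casts, it swaps in a user-defined function (`fail_invariant`, a hypothetical name) that raises instead. The shared mechanism is that `CASE` evaluates its branches lazily, so the failing expression only runs for rows that violate the condition.

```python
import sqlite3


def fail_invariant(message):
    # Raising here aborts the whole statement at runtime.
    raise ValueError(message)


conn = sqlite3.connect(":memory:")
conn.create_function("fail_invariant", 1, fail_invariant)
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

# The check lives inside the query itself, so the check and the read
# happen in the same statement rather than in a separate test run.
CHECKED_QUERY = """
SELECT id,
       CASE WHEN amount >= 0 THEN amount
            ELSE fail_invariant('amount must be non-negative')
       END AS amount
FROM orders
ORDER BY id
"""


def run_checked(conn):
    """Return the rows, or None if an invariant aborted the query."""
    try:
        return conn.execute(CHECKED_QUERY).fetchall()
    except sqlite3.OperationalError:
        return None


ok_rows = run_checked(conn)   # all rows satisfy the invariant
conn.execute("INSERT INTO orders VALUES (3, -1.0)")
bad_rows = run_checked(conn)  # the new row trips the invariant
```

Because the query fails instead of returning bad rows, there is no window between a passing test and a later read in which the data can change.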

grantjpowell | 12 days ago

I'm potentially super interested in this, as I am building this kind of feature for my job at the moment.

But https://hashquery.dev/#faq says:

> the Hashquery SQL compiler is not available to run locally, so you do need to define your data connections inside of Hashboard and use its API to execute your queries.

> We do plan on making the full Hashquery stack available to run locally in the near future

I'm not quite sure what the use case for this library is at present

If I'm not a Hashboard customer and don't want to pay $60/mo for a nicer way to query my existing db, what am I going to do with it?

Hashboard seems roughly similar to Superset and/or Cube?

anentropic | 13 days ago

This looks cool. I built a similar open source semantic data / warehousing tool called Zillion. I use it to power my company's BI but haven't put as much time into the polish as you guys.

https://github.com/totalhack/zillion

totalhack | 12 days ago

Looks pretty darn cool! Two questions please:

1. How does this compare to dbt? If we're already using dbt, why migrate?

2. Will you consider making a tool that tries to transpile SQL back to Hashquery models? This way I can work against my database, then merge the changes back to the model.

Good luck!

guy4261 | 13 days ago

> way beyond the capabilities of standard SQL

Maybe some examples would help

> AI and LLMs (coming soon)

Why?

demaga | 13 days ago

This is pretty cool, although DSLs can bring their own set of challenges vs writing SQL – how does it compare to dbt? (apart from it being Python instead of SQL)

mpeg | 13 days ago

How aware is the library of existing structure, e.g. foreign key relationships?

tomrod | 13 days ago

Interested to see a comparison between hashquery and ibis. https://ibis-project.org/

zsdev | 12 days ago