從 PowerBI 引擎之父的專訪深入理解PowerBI

  • 2019 年 10 月 6 日
  • 筆記

以下內容轉自Lars的部落格,參考原文。

Power BI 背後的團隊:Jeffrey Wang

If you are dealing with Power BI/Power Pivot, it doesn』t take long before you encounter the DAX language for the first time. Jeffrey Wang is Principal Software Engineer Manager at Microsoft and is considered the father of DAX and the VertiPaq engine behind it.

如果你正在使用Power BI或者Power Pivot,你很快就會發現DAX語言。Jeffrey Wang是工作在背後的微軟高級軟體工程師,並被稱為DAX / VertiPaq 引擎之父。

Lars: Jeffrey, I』m so glad you agreed to answer my questions. ? Would you like to briefly describe what your main tasks at Microsoft are?

Lars:Jeffrey,我很高興你可以接受我的採訪,首先可以簡短地描述下您在微軟的主要工作嗎?

Jeffrey: I work on the Power BI team and am in charge of the development of the DAX engine.

Jeffrey:我在Power BI團隊工作,負責DAX引擎的開發。

Lars: In the community your name is often labelled „the father of DAX„. How would you describe your role in the invention of DAX?

Lars:社區中都稱你是DAX之父,那麼您自己會怎麼看您在DAX產生過程中的角色?

Jeffrey: The invention of DAX was a joint effort of many people. I was just one of the inventors, not even the most important ones. My contribution to DAX became more prominent later on when I became the manager of the development team. In my effort to spread DAX knowledge, I started to present at community meetings and some organizers began to introduce me that way in order to get more people to come to my talks ?

Jeffrey:DAX的誕生是很多人工作的結果,我只是其中一員,甚至並不是最重要的角色。只是在我負責DAX開發團隊後,我的貢獻似乎更明顯了。在我傳播DAX知識過程中,我參與了很多社區會議然後有的組織會介紹一些夥伴來聽這些分享。

Lars: Why did you think you had to invent a new language then? What can DAX do that the existing languages could not?

Lars:當時你是怎麼想到要創建一種新的語言?是不是有什麼功能是當時其他語言無法實現而必須要創建一門新的語言了?

Jeffrey: We needed a language that can elegantly express core BI concepts, is friendly to business users without formal programming training, and can be easily implemented to handle billions of rows of data efficiently. We had considered three existing programming languages: Excel formulas, SQL, MDX. Excel formulas are the language of choice for business analysts, the audience we were targeting for self-service BI, but it lacked basic constructs to deal with relational database operations on top of structured data stored in tables and columns. SQL is a great language to express operations on structured data of any size. It just lacks the core BI concept of measure and is too verbose to express common BI query patterns. Some BI vendors chose to extend SQL to support BI measures. They all ended up being very hacky and confusing since they kind of broke SQL by making SQL-like expressions return un-SQL-ly results. Finally, MDX was a successful niche language in the BI market but it』s too hard for business people since it required users to learn too many multi-dimensional concepts, such as dimensions, attributes, hierarchies, etc. before they could understand the language. We built DAX on top of simple relational database concepts like tables, columns, and relationships which, unlike multi-dimensional ones, are intuitively understood by business analysts, and then we added the core BI ingredient of measures as a first-class citizen, and adopted the friendly format of Excel formulas along with a lot of Excel functions with which business users are already familiar.

Jeffrey:我們的初衷是設計一種更貼近業務人員使用的語言,而不需要正統的編程基礎,這門語言可以很優雅地表示核心的BI概念並可以很容易駕馭多達數十億行數據。當時我們參考了三種市面已有的語言:Excel公式,SQL,MDX。商業分析師(BA)一般會選擇使用Excel,我們定位這類用戶為自助BI用戶,但Excel公式無法直接處理在資料庫中的關係型數據。SQL是一種可以處理任意規模的很強大的語言,但是SQL並沒有包含核心的BI概念(例如:度量值),而且用SQL處理常見的BI任務時在寫法上並不是很高效。有些BI廠商擴展了SQL的能力來支援BI度量值這些概念,這從整個設計上來講,這與SQL混在一起總是會導致一些混亂,不夠清晰。而第三種MDX則是已被印證的定位於BI市場很成功的語言,但是MDX太難了,業務用戶是很難掌握的,他們必須要學大量多維建模的概念,包括:維度,屬性,層級等,才有可能理解MDX。因此,我們想到只在簡單的表,列和關係的基礎上設計一個語言,相比MDX,這更容易被業務用戶自然地理解。隨後,我們將度量值作為非常重要的概念加入DAX並用和Excel公式一樣的形式表現出來,因為業務用戶一般已經很習慣於使用Excel函數的感覺了。

Lars: If you had to describe DAX to someone who』d never heard of it: How would that look?

Lars:如果你向一個從來沒有接觸過DAX的人來介紹什麼是DAX,那你會怎麼說呢?

Jeffrey: To business users, I』ll tell them that DAX is a programming language that has Excel formula-like syntax and empowers you to define business metrics, also known as measures in BI jargon, which are reusable across many business reporting and analytics. To BI professionals, I』ll tell them that DAX is a programming and query language that combines core features of SQL and MDX in an Excel formula-like syntax to enable BI developers to define a feature rich BI semantic layer and to unleash the full power of the Vertipaq Engine, the in-memory columnar database.

Jeffrey:看情況。如果是業務用戶,我會告訴他們:DAX是一種程式語言,寫法上有點像Excel公式,但這種公式允許用戶定義業務邏輯,例如:度量值,這樣可以在很多業務報告及分析中復用。而如果是BI專家的話,我會告訴他們DAX是一種程式語言,同時也是一種查詢語言,它整合了SQL和MDX的核心特性並以類似Excel公式的形式提供出來,讓BI開發人員可以充分利用名叫Vertipaq的記憶體列式資料庫構建豐富的BI語義層。

Lars: When you designed DAX back then, which potential users were you looking for at that time? Has this picture changed in the meantime?

Lars:在一開始設計DAX的時候,那時定位的使用者是哪類人群,現在是否有什麼變化?

Jeffrey: The goal back then was to build a set of core technologies that can serve the diversified needs of users across the spectrum of self-service BI, team BI, and corporate BI, therefore we needed a programming language that』s simple enough for business analysts to use in their daily work yet powerful enough to express complex modeling concepts as required by large enterprise applications. It was an ambitious goal that is still relevant today and we have seen a steadily increasing number of business users as well as BI specialists adopt the technology as we had hoped.

Jeffrey:當時是希望可以構建一種核心技術,可以滿足包括self-service BI,團隊BI以及企業BI的各種需求,因此定位於打造一種足夠簡單的程式語言,使業務分析人員也可以直接用於他們的日常工作,而且這種語言不能因為簡單而喪失可以構建複雜模型的強大性,因為它同時要能構建大型企業級的應用。即使是從今天來看,當時的定位也是有點大的,但從使用這項技術的不斷增長的業務用戶以及BI專家來看,它基本滿足了我們當時的預期。

Lars: Recently, the Excel team announced that Power Pivot will soon be available in all Excel versions for Windows. In the foreseeable future all Excel users will have access to the DAX Engine! Do you hope that this will make DAX even more popular?

Lars:最近,Excel團隊已經宣布未來Power Pivot會整合進入Windows中Excel的所有版本。如果是這樣的話,所有的Excel用戶都可以使用DAX了,這將使DAX更加流行嗎?

Jeffrey: I certainly hope so. I even hope one day Excel team will add DAX queries to pivot tables and pivot charts to unleash the full power of the Power Pivot engine.

Jeffrey:我當然希望如此。我甚至希望有一天Excel團隊可以把DAX查詢功能加入到透視表和透視圖的原生功能以充分釋放Power Pivot引擎的能力。

Lars: DAX has many „hidden「 characteristics and concepts. Starting with filter and row contexts, via internal conversions (implicit CALCULATE statements, etc.), context transitions, etc. The language appears very simple at the beginning, but then becomes relatively fast increasingly complex. I know that many DAX users are unsettled by these „hidden「 concepts. An example: You already described on your old blog in early 2011 that there are three different forms in DAX how the date column can be referenced within a Time Intelligence function. Internally, however, these three forms are interpreted differently by DAX, which can have certain side effects. I am sure that in the development of DAX it was necessary to build the language exactly as you did. Nevertheless, could you please give the interested reader an insight into why there are so many hidden characteristics and special cases?

Lars:DAX有很多「隱藏」特性和概念。從篩選上下文以及行上下文開始,到隱式CALCULATE在內,上下文轉換等。在一開始學習DAX時感覺是比較簡單的,但是很快就會發現DAX並不簡單。例如,你在你之前的部落格(2011年)描述過DAX內部在處理時間智慧函數實際是三種不同的形式。我知道寫DAX時應該準確知道自己到底在做什麼,就要理解每個細節。那你可以給愛好者們一些建議或啟發,為什麼DAX有這麼多隱藏得很深的特性嗎?

Jeffrey: A simple DAX expression often belies complex calculations under the hood. The appearance of simplicity of some common DAX expressions is a double-edged sword. On the one hand, syntactical simplicity and flexibility enable beginners to write powerful calculations relatively easily and quickly. On the other hand, the behind-the-scene complex calculations often have side-effects when users move beyond basic scenarios without a deeper understanding of a breakdown of steps underlying a calculated result. The original design goal was to make the syntax as simple as possible in common usage patterns while conforming to a coherent and semantically sound language model. For example,

Jeffrey:任何一個表面簡單的DAX表達式實際在內部都可能有著複雜的計算邏輯。常見的DAX表達式看上去簡潔確實是雙刃劍。一方面,在一定程度上,簡潔和靈活可以讓初學者很快就學會寫出強大的計算;而另一方面,一旦用戶度過了基礎概念區,如果不真正理解DAX就會發現很多計算結果並不是表面那麼簡單,例如:

= CALCULATE ( [Total Sales], Product[Color] = „Red「 )

is syntax sugar for 其實就是一個語法糖(編譯器將上述內容直接翻譯成下面內容):

= CALCULATE ( [Total Sales], FILTER ( ALL ( Product[Color] ), Product[Color] = „Red「 ) )

The simpler form is more intuitive for beginners who often think of filters as Boolean expressions which return true or false for a given row. In reality, all DAX filters in filter context are table expressions on one side of a semi-join operator in relational algebra with the other side open-ended. I don』t think it』s wise to introduce the concept of semi-join to business users on their first day of DAX adventure. On the other hand, the example illustrates that DAX stands on a solid theoretical foundation so advanced users can freely combine any number of interesting features together to formulate arbitrarily complex business logic. On a related note, DAX beginners ran into their first hurdle once they started writing non-trivial DAX because measure, which is a reusable, dynamic expression, is a new concept to both Excel users and SQL developers.

簡潔的形式對於初學者更加直觀,所以很多人都會認為過濾器就是對於某行返回true或false的布爾表達式。而實際上,所有在篩選上下文中的DAX篩選器都是返回表的表達式,並且滿足左外連接的關係代數邏輯(BI佐羅註:擴展表原理)。我認為剛剛接觸DAX業務用戶去了解這些概念並不合適,另一方面,這個案例也說明了DAX其實是建立在堅實的理論基礎之上的,所以高級用戶可以用DAX來實現非常複雜的業務邏輯。值得一提的是,DAX初學者只要開始接觸度量值(可重用的動態表達式),就很可能會遇到這方面的問題。不論對Excel用戶還是SQL開發者,度量值都是一種新概念。

Lars: Is there anything else you want to/ can tell us about the future development of the DAX? Lars:對於DAX未來的開發,有什麼可以透露的嗎?

Jeffrey: DAX, coupled with rich modeling capabilities, gives Microsoft BI an edge over many competitors』 solutions. We plan to introduce even more flexibilities in DAX and modeling so that BI users of all levels can build analytical and reporting applications they can』t imagine today. Jeffrey:DAX 的豐富建模能力,讓微軟的在BI領域超越了很多競爭對手。我們計劃未來在DAX中引入更加靈活的特性讓各種層次的BI用戶都可以來做建模,並遠遠超過現在可想到的能力。 Lars: Jeffrey, thank you very much for answering my questions and I wish you and your team the continued creative power that already made our working lives so much easier. ? Lars:Jeffrey,非常感謝你能回答我的問題,也一如既往地期待你和你的團隊能像現在一樣給我們帶來源源不斷的生產力,讓我們的工作更加簡單。

關於Power BI DAX引擎之父

文中提到的DAX之父 Jeffrey Wang,江湖人稱DAX之父,但其卻是非常低調的。很少能在PowerBI社區或其他社交網路看到他,當然,專註於技術研發怎麼可能有那麼多其他時間。這裡簡單介紹一下 Jeffrey 先生:Principal Engineering Manager,Microsoft Power BI。Jeffrey從2004年加入微軟分析服務(Analysis Service)團隊,作為軟體工程師。後開始參與存儲引擎的性能優化工作。後來從事對MDX引擎的改進工作。他是DAX的發明者之一,DAX於2009年正式發布。他目前領導Power BI產品組中DAX引擎以及Direct Query引擎的開發工作。

這還是在去年大神給領域夥伴分享時的照片,如下:

可以看出大神的感覺還是很平易近人的。最為振奮的是:Jeffrey是位華人。這應該是最讓人振奮的了。想像一下,我們用的Power BI的引擎的打造者原來和我們這麼近。

另外,如果您是DAX的深度用戶,你一定看過《The Definitive Guide to DAX:》,Jeffrey為該書做了前言。另外,在2018年最新上市的《Beginning DAX with Power BI》一書中,Jeffrey 是該書的技術評審。

總結

我們從了 DAX 之父的寄語中彷彿看到 Power BI 以及 DAX 在未來巨大的生命力。如果Power BI已經改變我們的工作和生活,並且您和我一樣期待 DAX 的發展,讓我們也一起期待吧。最後,如果可以對大神說點什麼,歡迎您在本文下方留言。