Automatic Classification of Legal Cross References Based on Semantic Intent

Nicolas Sannier, Morayo Adedjouma, Mehrdad Sabetzadeh and Lionel Briand
SNT - University of Luxembourg

contact: firstname.lastname @ uni.lu

Many software systems in domains such as healthcare, public administration, taxation or social security have to comply with laws and regulations. To ensure compliance, requirements analysts have to deal with legal provisions throughout the system's lifecycle: at early design stages, but also during operation, since the system must evolve along with the law. Dealing with legal provisions is complex. One factor of complexity is that the information a requirements analyst needs to gather and interpret for a given purpose is rarely located in a single place. In particular, it is common practice in legal drafting to rely on and refer to other provisions where the relevant information is already written and enforced. This is done through the extensive use of cross references.

Cross references have an impact on software requirements. First, analysts must follow them, which entails a non-linear reading and interpretation of the law. Second, the intent implied by a given cross reference affects the software requirements that the analyst is trying to infer from the law. Several taxonomies already exist that characterize such intents for legal cross references. However, previous work also emphasizes the difficulty that software and requirements engineers face in handling these cross references, and hence their need for guidance and tool support. Unfortunately, none of the existing taxonomies comes equipped with such tool support, nor do they provide adequate examples that would guide analysts in interpreting legal cross references.

In this work, we address the above issues and propose an approach for automatically classifying cross references (CRs) based on their semantic intent. Such automation has several benefits, including the following: (1) the number of CRs that analysts need to consider may be large, in the hundreds or thousands, and automated classification would help reduce this effort; (2) automated classification would provide a priori knowledge about the intent of CRs; (3) it could also lead to a better organization of requirements engineering activities. More specifically, the present work addresses the following research questions (RQs):

To answer these questions, we follow the approach described in the picture below:

More precisely, this work is mainly based on the observations from a qualitative study that we performed over a sizeable set of legal cross references from Luxembourg's legislative corpus. These observations allowed us to derive a catalog of natural language patterns (presented below), each of which is associated with a particular intent type that can be assigned to cross references. We built an automated classification solution based on this catalog of patterns and supported by the natural language processing framework GATE. We further evaluated the accuracy of our classifier over two case studies.
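As a rough illustration of pattern-based classification, the Python sketch below assigns an intent type to a cross reference by checking whether the text immediately preceding it ends with a catalog pattern, preferring the longest match. It is not the GATE-based implementation: the `CATALOG` contents, the intent labels (`definition`, `exception`, `compliance`), the French phrases and the `classify` function are all hypothetical, and the assumption that the intent phrase directly precedes the cross reference is a simplification.

```python
import re
from typing import Dict, List, Optional

# Hypothetical miniature catalog: intent type -> natural language patterns.
# Labels and French phrases are illustrative, not the published catalog.
CATALOG: Dict[str, List[str]] = {
    "definition": ["au sens de", "tel que défini à", "telle que définie à"],
    "exception": ["par dérogation à", "sous réserve de"],
    "compliance": ["conformément à"],
}


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace (also strips leading/trailing spaces)."""
    return " ".join(text.lower().split())


def classify(sentence: str, cross_reference: str,
             catalog: Dict[str, List[str]]) -> Optional[str]:
    """Return the intent whose pattern immediately precedes the cross reference.

    Assumes the intent phrase directly precedes the cross reference; when
    several patterns match, the longest one wins. Returns None otherwise.
    """
    match = re.search(re.escape(cross_reference), sentence, re.IGNORECASE)
    if match is None:
        return None
    prefix = _normalize(sentence[:match.start()])
    best_intent, best_len = None, 0
    for intent, patterns in catalog.items():
        for pattern in patterns:
            p = _normalize(pattern)
            if not prefix.endswith(p):
                continue
            start = len(prefix) - len(p)
            # Require a word boundary so "de" does not match inside "monde".
            if start > 0 and prefix[start - 1].isalnum():
                continue
            if len(p) > best_len:
                best_intent, best_len = intent, len(p)
    return best_intent


print(classify("Le revenu imposable au sens de l'article 7 est déterminé.",
               "l'article 7", CATALOG))  # prints: definition
```

Preferring the longest match is one simple way to resolve overlaps when one pattern is a suffix of another; a realistic classifier would likely need richer context than the words immediately preceding the cross reference.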

This page contains supporting material that was recorded throughout the different steps of the approach. Technicalities are not addressed on this page. More specifically, this page gives access to:

  1. The material of our qualitative study on semantic intents for legal cross references (RQ1), performed over two texts from Luxembourg's legislation: the Income Tax Law and Draft Law n° 6457. More precisely, we investigated the first seven chapters of the Income Tax Law and the first chapter of Draft Law n° 6457, which represent a total of 1079 inspected cross references.
    Both texts are written in French. Although some of the comments are in English, the patterns and their variants are in French; they were translated into English only for illustration purposes in the paper.

    During the qualitative study, we recorded the following observations: the location of the cross reference (provision number) for traceability purposes, the cross reference itself, the complete sentence in which the cross reference appears, the phrase that holds the semantic intent, and the intent held by this phrase. The picture below illustrates this activity. Most of this information was recorded in spreadsheets, which can be opened with Microsoft Excel (2007 or above).



  2. The catalog of natural language patterns (and their numerous variants) for each intent type, in the form of several .lst files. These patterns are extended with their gender and number variants, since some languages, including French, inflect words for gender and number (and combinations thereof).
    These files are encoded in UTF-8. You may have to change the character encoding your browser uses for these particular pages in order to avoid display issues. To do so, change the encoding in the "View" menu of your browser (this only affects the current page). You can open these files using any standard text editor.

    A list of auxiliary terms (and their variants) recorded during our qualitative study, expanded with their gender and number variants.
    These files can also be opened using any standard text editor.



  3. Evaluation material for the two case studies (RQ3).
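Since the pattern and auxiliary-term files are plain UTF-8 text, they can also be loaded programmatically. The Python sketch below reads every `.lst` file in a directory with an explicit UTF-8 decoding and maps each file name (without extension) to its non-empty lines. The directory layout, the file-name-to-intent mapping and the `load_catalog` helper are assumptions for illustration only; if the files carry per-entry features after a separator, as GATE gazetteer lists can, those would need to be split off, which this sketch does not do.

```python
from pathlib import Path
from typing import Dict, List


def load_catalog(directory: str) -> Dict[str, List[str]]:
    """Map each .lst file's name (without extension) to its non-empty lines.

    Hypothetical helper: assumes one pattern per line and that the file name
    identifies the intent type. Per-entry features, if any, are not parsed.
    """
    catalog: Dict[str, List[str]] = {}
    for path in sorted(Path(directory).glob("*.lst")):
        # Decode explicitly as UTF-8 so accented French characters
        # (é, è, à, ...) are read correctly regardless of the OS default.
        text = path.read_text(encoding="utf-8")
        catalog[path.stem] = [line.strip() for line in text.splitlines() if line.strip()]
    return catalog


# Example (the directory name is a placeholder):
# catalog = load_catalog("patterns")
```

Passing `encoding="utf-8"` explicitly avoids relying on the platform's default encoding, which may differ (for example on Windows); it is the programmatic counterpart of changing the browser encoding when viewing these files.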