CVR Online 3.0 - User Guide June 2009
Version  Date              Description                                                                          Author
0.1      4 December 2007   Creation.                                                                            Erik W. Rasmussen
1.0      9 January 2008    Added payment information. Changed various formulations.                             Erik W. Rasmussen
1.1      25 April 2008     Changed various URLs.                                                                Erik W. Rasmussen
1.2      28 July 2008      Added access via SSL. Fixed metadata URLs.                                           Erik W. Rasmussen
1.3      14 August 2008    Added OIOWSDL reference.                                                             Erik W. Rasmussen
1.4      29 May 2009       Added access via client authenticated SSL. Removed WS-Security access description.   Erik W. Rasmussen
1.5      9 June 2009       Added reference to OWSA Model T.                                                     Erik W. Rasmussen
Table of contents
1 Introduction
  1.1 Definition
  1.2 WSDL
  1.3 Other URLs
  1.4 Further information
2 Data model
  2.1 Introduction
  2.2 Temporal data
  2.3 The basic model
  2.4 Details in the model
    2.4.1 Common data types for legal unit and production unit
    2.4.2 Legal unit
    2.4.3 Production unit
    2.4.4 Participant
3 Main description of the web services
  3.1 Introduction
  3.2 Access
  3.3 Web services for searching
  3.4 Web services for retrieval
4 Security
  4.1 Introduction
  4.2 Usage of SSL
5 Error handling
  5.1 Introduction
  5.2 Fatal errors
  5.3 Other errors
6 XML Examples
7 Appendix I: Structure of unique identifiers
  7.1 Introduction
  7.2 Legal unit identifier
    7.2.1 Example
  7.3 Production unit identifier
    7.3.1 Example
  7.4 Participant identifier
8 Appendix II: Code example of SSL access
1 Introduction

1.1 Definition
CVR Online 3.0 is a collection of stateless, SOAP-based web services that provide access to part of the data in the CVR system (in Danish: Det Centrale Virksomhedsregister). A subscription is required to use these web services.

1.2 WSDL
The web services are defined by a number of WSDL files. These can be found, along with other information, at http://archprod.service.eogs.dk/cvronline/metacvronline/urls.htm.
The individual WSDL files are complete in the sense that they contain all the types they use, i.e. all XML Schema definitions are embedded in the files.
All XML Schema definitions conform to the rules specified by Offentlig Information Online (OIO). Among other things, these rules require that all XML Schema definitions are published along with separate metadata files containing a textual description of each type definition.

1.3 Other URLs
The current version number of the web services can be retrieved from http://archprod.service.eogs.dk/cvronline/metacvronline/revision. This URL can also be used to ping the web services if required.
The public certificate of E&S is available at http://archprod.service.eogs.dk/cvronline/metacvronline/eogs.cer.

1.4 Further information
For further information about subscriptions and other general information, please refer to the CVR web site at http://www.cvr.dk.
For information about the OIO rules for XML Schema definitions (OIOXML Naming and Design Rules), please refer to http://www.itst.dk (full path: http://www.itst.dk/arkitektur-og-standarder/standardisering/datastandardisering/oioxml-udvikling/regler/overhold-reglernendr).
For information about the OIO rules for WSDL files (Vejledning for udvikling og anvendelse af OIOWSDL; in English: Guide for development and usage of OIOWSDL), please refer to http://www.itst.dk, where the guide can be downloaded (full path: http://www.itst.dk/arkitektur-og-standarder/standardisering/standarder-for-serviceorienteret-infrastruktur/standarder-for-webservices/oiowsdl-kontrakt-forst-udvikling-med-oioxml).
The individual XML Schema definitions and metadata files are published in the namespace http://rep.oio.dk/eogs/xml.schema/ in accordance with version 3.1 of the OIO rules. The definitions can be searched for and retrieved at http://digitaliser.dk.
The security model conforms to the OWSA Model T specification; please refer to http://www.itst.dk (full path: http://www.itst.dk/arkitektur-og-standarder/standardisering/standarder-for-serviceorienteret-infrastruktur/standarder-for-webservices/oio-web-service-arkitekturen).
2 Data model

2.1 Introduction
The data model describes the data in CVR Online and their relations. The data model is illustrated using UML class diagrams.

2.2 Temporal data
With a few exceptions, data in the CVR system are fully temporal, i.e. each value is valid only within a certain time interval (which may be open-ended, i.e. valid for ever on). This can be illustrated as follows (dates in DD-MM-YYYY format):

Value   Valid from    Valid to
X       15-02-2005    31-12-2005
Y       01-02-2007    10-04-2007
Z       11-04-2007    (open)

Here a datum has the value X in the interval 15 February 2005 to 31 December 2005, both dates inclusive. The first date is known as the validfrom date and the last date as the validto date. Afterwards no value is defined until the interval 1 February 2007 to 10 April 2007, where the value is Y; from 11 April 2007 (inclusive) the value changes to Z.
The point is that all these values are stored in the CVR system at the same time, i.e. the data are time-varying, and the value retrieved therefore depends on the date of inquiry. This makes it possible, for example, to change a telephone number in advance by setting the validfrom date of the new telephone number in the future. It also enables changes backward in time: the telephone number can, for example, be changed with effect from the first day of the previous month.
CVR Online returns only part of these data, namely the values valid on the date of inquiry. In the example above, a request made on 1 May 2005 would return the value X, while a request made on 1 April 2006 would not return any value.
Metadata consisting of the validfrom and validto dates are returned along with the data themselves; the validto date is omitted for data that are valid for ever on (the value Z in the example above). This makes it possible to see when data were created or changed, and when they expire.

2.3 The basic model
Data in the CVR system are based on three basic entities: legal unit, production unit, and participant (in Danish: juridisk enhed, produktionsenhed og fuldt ansvarlig deltager).
The legal unit is defined as the legal entity that performs some kind of activity. An example of a legal unit is a Danish aktieselskab (public limited company).
A production unit is defined as a physical location where the legal unit permanently operates. Hence, if an activity is temporary by nature, its physical location does not count as a production unit; construction sites, for example, are not counted as production units. A legal unit can have multiple associated production units, for example an audit firm with offices in several locations around the country. A production unit does not have to be affiliated with a legal unit; this is the case, for example, if the legal unit is based abroad.
A participant is a special kind of person that has special obligations related to a legal unit. The participant can be either a real person or a legal person, and either Danish or of another type (i.e. a person that cannot be identified by a CPR number or CVR number, typically a foreign person). In total, there are three participant variants: legal unit, Danish real person, and other persons (real or legal). An example of a participant is an interessent (partner) in a Danish I/S (interessentskab, general partnership). A legal unit can have multiple associated participants, and a participant can be associated with multiple legal units.
The resulting basic UML data model looks like this:

[Figure: basic UML data model]
Participant (*) ---- (*) Legal unit
Legal unit (0..1) ---- (*) Production unit
Participant variants: Legal unit, Danish person, Other persons
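The selection rule from section 2.2 (only the value valid on the date of inquiry is returned, with both dates inclusive and an absent validto date meaning valid for ever on) can be illustrated with a minimal Java sketch. The names TemporalValue, TemporalLookup and validOn are hypothetical and do not correspond to elements in the WSDL files; the sketch also assumes that the intervals of one datum do not overlap.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;

/** Hypothetical holder for one temporal datum: a value plus its validfrom/validto metadata. */
record TemporalValue<T>(T value, LocalDate validFrom, Optional<LocalDate> validTo) {

    /** True if the date lies in [validFrom, validTo]; an empty validTo means valid for ever on. */
    boolean isValidOn(LocalDate date) {
        boolean afterStart = !date.isBefore(validFrom);
        boolean beforeEnd = validTo.map(end -> !date.isAfter(end)).orElse(true);
        return afterStart && beforeEnd;
    }
}

final class TemporalLookup {
    private TemporalLookup() {}

    /** Returns the value valid on the inquiry date, or empty if no interval covers that date. */
    static <T> Optional<TemporalValue<T>> validOn(List<TemporalValue<T>> history, LocalDate inquiryDate) {
        // Assumes non-overlapping intervals, so at most one entry can match.
        return history.stream()
                .filter(v -> v.isValidOn(inquiryDate))
                .findFirst();
    }
}
```

With the history X (15-02-2005 to 31-12-2005), Y (01-02-2007 to 10-04-2007) and Z (from 11-04-2007), an inquiry dated 1 May 2005 yields X and an inquiry dated 1 April 2006 yields an empty result, matching the example above.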
2.4 Details in the model
Below is a more detailed description of the basic entities. For a complete, detailed description of the data, please refer to the metadata files (see section 1.2).

2.4.1 Common data types for legal unit and production unit
Both legal and production units are associated with information about their life cycle, addresses, activities, contact information, employment, and advertising protection.
The life cycle contains the start and cessation dates of the unit. If the unit is active indefinitely, no cessation date is present. If today's date is after the cessation date, the unit is no longer active.
Both legal and production units have an associated physical address (called the official address for legal units and the location address for production units). The physical address will normally have associated codes from the CPR road register (see http://www.cpr.dk and http://www.adresse-info.dk/ for a further description). In addition to the physical address, a unit can also have a postal address and/or a post office box address. These addresses indicate that the unit wishes to have its mail forwarded to them.
A unit has activity information attached, consisting of activity codes defined by Statistics Denmark (see http://www.dst.dk/). All units have a main activity. In addition, a unit can have one or more secondary activity codes attached.
Contact information is information about the unit's telephone number, fax number, and email address. The information is optional, i.e. a unit does not necessarily have this information attached.
Information about the number of employees is retrieved from external authorities. Therefore, not all units have this information, and the most recent figures cannot be expected to be available. Legal units have both yearly and quarterly figures, while production units only have yearly figures.
Both legal and production units can have an advertising protection flag.
In the model, this flag is attached to the individual unit. If a unit has this flag set, the information in the CVR system must not be used for unsolicited mail campaigns or other contact purposes.

2.4.2 Legal unit
A legal unit is uniquely identified by a legal unit identifier (CVR number). In addition to the data listed in the previous section, a legal unit has the following data attached: business format, creditor status, and obligations.
The business format is a code that represents the legal form of the legal unit. Examples of business formats are the Danish legal forms aktieselskab (public limited company), frivillig forening (voluntary association), filial af udenlandsk aktieselskab (branch of a foreign limited company), etc.
The creditor status is only present if a legal unit is in a state of bankruptcy. It indicates the detailed state of the bankruptcy.
Currently only one type of obligation is defined, namely arbejdsgiver (employer).

2.4.3 Production unit
A production unit is uniquely identified by a production unit identifier (P-number). The data for a production unit consist primarily of the data described in section 2.4.1.

2.4.4 Participant
A participant is uniquely identified by a participant identifier (FAD-number). If the participant is a legal unit, the legal unit identifier is returned as well. If the participant is a Danish person, the CPR number is also returned, but only if the user is authorized to access CPR numbers (see chapters 3 and 4).
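The entities and associations described in this chapter can be summarized as a minimal Java sketch, assuming Java 17 (records and sealed interfaces). All type and field names are hypothetical illustrations and do not mirror the element names in the WSDL files or metadata (see section 1.2); identifiers are kept as strings, and only a few of the attributes listed above are shown. The life cycle check implements only the rule stated in section 2.4.1.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;

/** Life cycle of a legal or production unit (section 2.4.1); an empty cessation date means still active. */
record LifeCycle(LocalDate start, Optional<LocalDate> cessation) {
    /** A unit is no longer active once the given date is after its cessation date. */
    boolean isActiveOn(LocalDate date) {
        return cessation.map(end -> !date.isAfter(end)).orElse(true);
    }
}

/** The three participant variants (section 2.3); every participant has a FAD number. */
sealed interface Participant permits LegalUnitParticipant, DanishPerson, OtherPerson {
    String fadNumber();
}

/** A participant that is itself a legal unit; its CVR number is returned as well. */
record LegalUnitParticipant(String fadNumber, String cvrNumber) implements Participant {}

/** A Danish real person; the CPR number is only present for authorized users. */
record DanishPerson(String fadNumber, Optional<String> cprNumber) implements Participant {}

/** Any other real or legal person, typically foreign. */
record OtherPerson(String fadNumber) implements Participant {}

/** Legal unit: many production units (by P-number) and many participants. */
record LegalUnit(String cvrNumber, String name, LifeCycle lifeCycle,
                 List<String> productionUnitPNumbers, List<Participant> participants) {}

/** Production unit: the affiliated legal unit is optional (multiplicity 0..1). */
record ProductionUnit(String pNumber, String name, LifeCycle lifeCycle,
                      Optional<String> legalUnitCvrNumber) {}
```

Using Optional for the 0..1 association and for the CPR number makes the "may be absent" cases from sections 2.3 and 2.4.4 explicit in the types.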
3 Main description of the web services

3.1 Introduction
This chapter gives an overview of the individual web services, including their input and output.

3.2 Access
Authentication and authorization are done by attaching a user name and password to each web service call. Each user has individual access rights that determine whether protected data are returned. Usage and payment are also handled individually for each user.

3.3 Web services for searching
There are two web services for searching: one for searching legal units and one for searching production units. Both web services have a single operation that accepts an arbitrary combination of search terms and performs a search. The result is a list of the identifiers of the legal units or production units found. Use of the two search services is not charged.
The table below gives an overview of the input and output:
Legal unit search
  Input (possible search terms):
    Name
    Address (street name, street building identifier, C/O name, postal code, municipality code, region code)
    Contact information
    Activity code
    Start date (life cycle)
    Status (life cycle): active/all
    Number of employees interval (quarterly)
    Advertising protection
    Business format code
    Creditor status code
    Number of production units
    CPR number (participant)
  Output: List of legal unit identifiers

Production unit search
  Input (possible search terms):
    Name
    Address (street name, street building identifier, C/O name, postal code, municipality code, region code)
    Contact information
    Activity code
    Start date (life cycle)
    Status (life cycle): active/all
    Number of employees interval (quarterly)
    Advertising protection
    Legal unit identifier (affiliation)
  Output: List of production unit identifiers

By default, only active units are searched. The Status search term can be used to extend the search to units that are currently not active; this can also be achieved with the Start date search term.
Wildcard searches are possible for the name and street name search terms. A wildcard is expressed with a star (*) at the end of a search word, and multiple search words are possible. A search term for street names can thus look like this: mørk* vej.
For start date, number of employees and number of production units, the search term is an interval, and only units with a value inside the interval are returned. All intervals are closed, i.e. both the from and the to values are included in the search. For all other data, searching is done using an exact match of the search term.
All search terms must match, i.e. the search terms are combined with a logical AND.

3.4 Web services for retrieval
There are two web services for retrieval of units: one for retrieving a legal unit and one for retrieving a production unit. Use of these web services is charged.
Both web services have a single operation that takes a unique identifier as input (the legal unit identifier for legal units, the production unit identifier for production units). As output, the details of the unit are returned.
Furthermore, the input includes a level parameter with the value 1, 2 or 3. This parameter determines which data are returned. Levels are inclusive, i.e. level 2 returns the data for both level 1 and level 2. The usage charge depends on the level parameter and increases with the level. The table below gives an overview:

Retrieval of legal unit
  Level 1:
    Legal unit identifier
    Name
    Official address
    Postal address
    Post office box address
    Start and cessation dates
    Advertising protection
    Export/import flag
    Affiliated production unit identifiers
  Level 2:
    Main and secondary activities
    Activity responsibility
    Number of employees interval (yearly + quarterly)
    Exact number of employees (yearly + quarterly)
    Business format
    Data supplier identifier
    Obligations
    Creditor status
    Participants
  Level 3:
    Telephone number
    Fax number
    Email address
    Foreign telephone number
    Foreign fax number

Retrieval of production unit
  Level 1:
    Production unit identifier
    Name
    Official address
    Postal address
    Post office box address
    Start and cessation dates
    Advertising protection
    Ancillary/main division flag
    Legal unit identifier affiliation
  Level 2:
    Main and secondary activities
    Activity responsibility
    Number of employees interval (yearly)
    Exact number of employees (yearly)
  Level 3:
    Telephone number
    Fax number
    Email address
    Foreign telephone number
    Foreign fax number

Data are returned regardless of whether the unit is active. CPR numbers of participants and exact numbers of employees are only returned if the user is authorized to access them (see section 3.2).
Note that if level 3 data are requested but no level 3 data exist, the user is only charged for level 2.
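The matching rules of section 3.3 can be mirrored on the client side, for example to pre-filter or explain search results. The Java sketch below is illustrative only and says nothing about how CVR Online is implemented: it models a closed interval, a trailing-star wildcard for a single search word, and the logical AND of search terms. How multiple search words (as in mørk* vej) are aligned against a name, and whether matching is case-sensitive, is not specified in this guide; the sketch compares one word, case-sensitively. The names SearchSemantics, Interval, wordMatches and allOf are hypothetical.

```java
import java.util.function.Predicate;

final class SearchSemantics {
    private SearchSemantics() {}

    /** Closed interval: both the from and the to value are included. */
    record Interval(long from, long to) {
        Interval {
            if (from > to) throw new IllegalArgumentException("from must not exceed to");
        }

        boolean contains(long value) {
            return value >= from && value <= to;
        }
    }

    /** A search word ending in '*' matches as a prefix; any other word must match exactly. */
    static boolean wordMatches(String searchWord, String candidate) {
        if (searchWord.endsWith("*")) {
            String prefix = searchWord.substring(0, searchWord.length() - 1);
            return candidate.startsWith(prefix);
        }
        return candidate.equals(searchWord);
    }

    /** All search terms must match: the predicates are combined with a logical AND. */
    @SafeVarargs
    static <T> Predicate<T> allOf(Predicate<T>... terms) {
        Predicate<T> combined = t -> true;
        for (Predicate<T> term : terms) {
            combined = combined.and(term);
        }
        return combined;
    }
}
```

For example, an employee interval of 10 to 19 contains both 10 and 19, and a term such as "vester*" matches "vestergade" while "vester" alone does not.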
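The level parameter of section 3.4 and its charging rule can be captured in a small Java enum. This is a hypothetical client-side helper (RetrievalLevel, includedLevels and chargedLevel are not part of the WSDL); it encodes only the two rules stated above, namely that levels are inclusive and that a level 3 request is charged as level 2 when the unit has no level 3 data. Actual prices are not modeled.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the level parameter of the retrieval services. */
enum RetrievalLevel {
    LEVEL_1, LEVEL_2, LEVEL_3;

    /** Numeric value sent in the request: 1, 2 or 3. */
    int value() {
        return ordinal() + 1;
    }

    /** Levels are inclusive: a request at this level returns the data of all levels up to it. */
    Set<RetrievalLevel> includedLevels() {
        return EnumSet.range(LEVEL_1, this);
    }

    /** A level 3 request is charged as level 2 when the unit has no level 3 data. */
    RetrievalLevel chargedLevel(boolean hasLevel3Data) {
        return this == LEVEL_3 && !hasLevel3Data ? LEVEL_2 : this;
    }
}
```

A client that only needs activities and participants would request LEVEL_2; requesting LEVEL_3 for a unit without contact data costs the same as LEVEL_2.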
4 Security

4.1 Introduction
Communication with the web services is protected by a number of security techniques. Authentication and authorization are identical for all the web services.
Authentication is achieved using the user name and password, and authorization using the roles attached to the user in the CVR Online system.
Confidentiality is achieved through encryption, using either standard SSL or client-authenticated SSL (refer to OWSA Model T).
Note that customers CANNOT freely choose between the two protocols: if access to sensitive information (e.g. CPR numbers) is required, client-authenticated SSL must be used. Also note that a user name may be tied to a particular SSL mechanism, i.e. a user name issued for client-authenticated SSL cannot access the web services using standard SSL.

4.2 Usage of SSL
Standard HTTP over SSL (HTTPS) is used as the transport protocol. If access is by client-authenticated SSL, the client must perform the SSL handshake using a valid OCES certificate. An OCES certificate is a variant of an X.509 certificate, see http://www.danid.dk. The certificate is valid if it has not expired, has not been revoked, and is signed by the TDC OCES CA root certificate.
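For client-authenticated SSL, the client must present its OCES certificate during the handshake. In Java this is typically done with the standard JSSE classes KeyStore, KeyManagerFactory and SSLContext. The sketch below assumes that the OCES certificate and its private key have been exported to a PKCS#12 file; the file, the password handling and the class name OcesSsl are hypothetical. It requests the protocol name "TLS", which in JSSE selects the TLS versions enabled in the JDK (the original SSLv3 protocol is disabled by default in current JDKs); whether the CVR Online endpoints accept those versions is not covered by this guide. Verification of the server certificate is left to the JVM default trust store, which is also an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

final class OcesSsl {
    private OcesSsl() {}

    /** Loads a PKCS#12 file that holds the OCES certificate together with its private key. */
    static KeyStore loadPkcs12(Path file, char[] password) throws IOException, GeneralSecurityException {
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        try (InputStream in = Files.newInputStream(file)) {
            keyStore.load(in, password);
        }
        return keyStore;
    }

    /** Builds an SSLContext that presents the client certificate from keyStore during the handshake. */
    static SSLContext clientAuthContext(KeyStore keyStore, char[] keyPassword) throws GeneralSecurityException {
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, keyPassword);
        SSLContext context = SSLContext.getInstance("TLS");
        // null trust managers: JSSE falls back to the JVM default trust store for the server certificate.
        context.init(kmf.getKeyManagers(), null, null);
        return context;
    }
}
```

The resulting SSLContext can be passed to java.net.http.HttpClient.newBuilder().sslContext(...) or to an HttpsURLConnection via setSSLSocketFactory(context.getSocketFactory()).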
5 Error handling

5.1 Introduction
Different error situations can occur when calling CVR Online. Two types of errors are considered here, fatal errors and other errors, as these are handled differently.

5.2 Fatal errors
Fatal errors prevent the request from being processed or cause its processing to be terminated. These include:
  The received XML does not validate against the XML Schema.
  Illegal/missing user name and/or password.
  Serious internal errors in CVR Online.
In these cases, a SOAP fault is generated. As specified by the SOAP standard, the Body of the response will contain a Fault element describing the error.

5.3 Other errors
Other errors are errors in the request itself, typically caused by the user. Examples include:
  Wrong usage of wildcards (see section 3.3).
  Illegal legal unit identifier (i.e. the modulo-11 check fails, see Appendix I).
The approach used by the CVR Online system is one of fault tolerance: if at all possible, no error is returned. Instead, CVR Online either repairs the request (for example in the case of wrong wildcards) or ignores the error (for example, for an illegal legal unit identifier an empty response is returned).
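Because fatal errors arrive as SOAP faults while other errors are repaired or answered with an empty result, a client has to look for a Fault element explicitly. The Java sketch below uses the JDK's built-in DOM parser and checks for a Fault element in either the SOAP 1.1 or the SOAP 1.2 envelope namespace, since this guide does not state which SOAP version is used (the WSDL files do). The class name SoapFaults is hypothetical. Note that an empty result, for example after an illegal legal unit identifier, is not a fault and must be handled separately.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class SoapFaults {
    static final String SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope";

    private SoapFaults() {}

    /** True if the response envelope carries a SOAP Fault, i.e. a fatal error in the sense of section 5.2. */
    static boolean isFault(String responseXml) {
        Document doc = parse(responseXml);
        return doc.getElementsByTagNameNS(SOAP11_NS, "Fault").getLength() > 0
                || doc.getElementsByTagNameNS(SOAP12_NS, "Fault").getLength() > 0;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Reject DOCTYPE declarations to rule out XML external entity attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
    }
}
```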
6 XML Examples
A number of example requests and responses are provided in a separate zip file. The following XML examples exist:

Service                 Files
LegalUnitGet            lug_request.xml, lug_response.xml
LegalUnitSearch         lus_request.xml, lus_response.xml
ProductionUnitGet       pug_request.xml, pug_response.xml
ProductionUnitSearch    pus_request.xml, pus_response.xml

For each individual web service there is an example of a request and a response.
7 Appendix I: Structure of unique identifiers

7.1 Introduction
Each of the three basic entities (legal unit, production unit and participant) has a unique identifier attached. This appendix describes their structure.

7.2 Legal unit identifier
A legal unit identifier is an eight-digit number that uniquely identifies a legal unit. Legal unit identifiers are assigned by the CVR system.
A legal unit identifier is constructed so that it can be validated using a modulo-11 check, i.e. it is possible to check whether a legal unit identifier has a valid structure. The following weights are used in this check:

Position  1  2  3  4  5  6  7  8
Weight    2  7  6  5  4  3  2  1

The modulo-11 check is performed as follows: the digit in each position of the legal unit identifier is multiplied by the corresponding weight, the products are added together, and a modulo-11 operation is performed on the sum. If the result of this modulo-11 operation is not zero, the legal unit identifier is invalid.

7.2.1 Example
Legal unit identifier:                 29973334
Sum of digits multiplied by weights:   2*2 + 9*7 + 9*6 + 7*5 + 3*4 + 3*3 + 3*2 + 4*1 = 187
Modulo-11:                             187 modulo 11 = 0

7.3 Production unit identifier
A production unit identifier is a ten-digit number that uniquely identifies a production unit. Production unit identifiers are assigned by the CVR system.
A production unit identifier is constructed so that it can be validated using a modulo-11 check, i.e. it is possible to check whether a production unit identifier has a valid structure. Because the weights have changed, the weights used depend on the value of the production unit identifier:

For production unit identifiers less than or equal to 1006959421:
Position  1  2  3  4  5  6  7  8  9  10
Weight    1  5  6  7  3  6  4  8  9  1

For all other production unit identifiers:
Position  1  2  3  4  5  6  7  8  9  10
Weight    4  3  2  7  6  5  4  3  2  1

The modulo-11 check is performed in the same way as for legal unit identifiers: each digit is multiplied by the weight for its position, the products are added together, and if the sum modulo 11 is not zero, the production unit identifier is invalid.

7.3.1 Example
Production unit identifier:            1012699294
Sum of digits multiplied by weights:   1*4 + 0*3 + 1*2 + 2*7 + 6*6 + 9*5 + 9*4 + 2*3 + 9*2 + 4*1 = 165
Modulo-11:                             165 modulo 11 = 0
Since 1012699294 is greater than 1006959421, the second set of weights is used.

7.4 Participant identifier
A participant identifier is a ten-digit number that uniquely identifies a participant. Participant identifiers are assigned by the CVR system and are not subject to modulo-11 checks.
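The modulo-11 checks in this appendix can be implemented in a few lines. In the Java sketch below, the class name CvrIdentifiers is hypothetical; the weights and the 1006959421 threshold are taken from the tables above. Inputs that are not exactly eight (respectively ten) ASCII digits are rejected. A passing check only shows that an identifier is well-formed, not that it has actually been assigned by the CVR system.

```java
final class CvrIdentifiers {
    private static final int[] CVR_WEIGHTS = {2, 7, 6, 5, 4, 3, 2, 1};
    // Production unit identifiers <= 1006959421 use the old weights, all others the new ones.
    private static final long P_NUMBER_WEIGHT_CHANGE = 1006959421L;
    private static final int[] P_WEIGHTS_OLD = {1, 5, 6, 7, 3, 6, 4, 8, 9, 1};
    private static final int[] P_WEIGHTS_NEW = {4, 3, 2, 7, 6, 5, 4, 3, 2, 1};

    private CvrIdentifiers() {}

    /** Multiplies each digit by the weight for its position and adds the products together. */
    private static int weightedSum(String digits, int[] weights) {
        int sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += (digits.charAt(i) - '0') * weights[i];
        }
        return sum;
    }

    /** Modulo-11 check of an eight-digit legal unit identifier (CVR number). */
    static boolean isValidCvr(String cvr) {
        if (cvr == null || !cvr.matches("\\d{8}")) return false;
        return weightedSum(cvr, CVR_WEIGHTS) % 11 == 0;
    }

    /** Modulo-11 check of a ten-digit production unit identifier (P-number). */
    static boolean isValidPNumber(String pNumber) {
        if (pNumber == null || !pNumber.matches("\\d{10}")) return false;
        int[] weights = Long.parseLong(pNumber) <= P_NUMBER_WEIGHT_CHANGE ? P_WEIGHTS_OLD : P_WEIGHTS_NEW;
        return weightedSum(pNumber, weights) % 11 == 0;
    }
}
```

For the worked examples above, isValidCvr("29973334") and isValidPNumber("1012699294") both return true (weighted sums 187 and 165).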
8 Appendix II: Code example of SSL access
Two small clients that access the web services using client-authenticated SSL have been produced: a Java client and a .NET client. The example clients are not intended for production use; they demonstrate how to access the web services from code (among other things, the clients are bound to the preproduction environment). Both clients use the LegalUnitGet web service as an example.
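The example clients themselves are not reproduced here. As a rough Java sketch of the same idea, the request body from lug_request.xml (see chapter 6) can be posted over HTTPS with java.net.http.HttpClient (Java 11 or later). The endpoint URL and the SOAPAction value are not given in this guide and must be taken from the WSDL files; the class and method names are hypothetical, and the text/xml content type and SOAPAction header assume the SOAP 1.1 HTTP binding. For client-authenticated SSL, pass an SSLContext that carries the OCES certificate; one option is SSLContext.getDefault() with the standard JSSE system properties javax.net.ssl.keyStore, javax.net.ssl.keyStoreType and javax.net.ssl.keyStorePassword set.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import javax.net.ssl.SSLContext;

final class LegalUnitGetSketch {
    private LegalUnitGetSketch() {}

    /** Builds a SOAP-over-HTTPS POST; endpoint and soapAction must come from the WSDL (assumption). */
    static HttpRequest buildRequest(URI endpoint, String soapAction, String envelope) {
        return HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(30))
                // text/xml plus a quoted SOAPAction header is the SOAP 1.1 HTTP binding; assumed here.
                .header("Content-Type", "text/xml; charset=utf-8")
                .header("SOAPAction", "\"" + soapAction + "\"")
                .POST(HttpRequest.BodyPublishers.ofString(envelope))
                .build();
    }

    /** Sends the envelope stored in a file such as lug_request.xml and returns the raw response XML. */
    static String call(SSLContext sslContext, URI endpoint, String soapAction, Path requestFile)
            throws IOException, InterruptedException {
        String envelope = Files.readString(requestFile);
        HttpClient client = HttpClient.newBuilder().sslContext(sslContext).build();
        HttpResponse<String> response =
                client.send(buildRequest(endpoint, soapAction, envelope), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For repeated calls, a single HttpClient should be created once and reused rather than built per request as in this minimal sketch. Remember that the example clients, and any endpoint copied from them, target the preproduction environment.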