Introduction & Information
Vít PÁSZTO
3.3.2022

Introduction
• Historical city (UNESCO)
• 100,000 inhabitants (+23,000 students)
• MVSO:
  – Regional college/university
  – The only school of economics in the region
  – Approx. 400 students
  – Closely connected with business enterprises/companies
• New complex („smart building“)

My background
• PhD at Palacký University in Olomouc
• Member of MVSO since 2013
• Field of study – geoinformatics
• Scientific interests
  – Geocomputation (fuzzy sets and logic in spatial analyses, entropy measurement of geographical phenomena, application of fractal geometry in geoinformatics, shape metrics in spatial analyses)
  – Physical and human geography
  – Economic geography (Spationomy)
  – Gamification/serious learning

• Science studying spatial information and geo-related issues
• „Geography in computer“
• GIS, Remote Sensing, GNSS, Cartography, Spatial statistics etc.
• Applications in many fields, including the economy (insurance, geomarketing, location-based services, BIS, facility management, customer analyses…)

• Informatics for Economists
• Computer Networks
• Advanced Visualisation of (economic) data
• Data transmission // HTML and web resources

Geoinformatics MVSO YOU

Course schedule & organization
1. Every week – Thursday – 13:15-14:45 (ACN)
• but…

Course programme (Computer Networks)
1. Introduction to computer networks
2. Network topology and classification, network architectures
3. ISO/OSI reference model
4. Physical layer
5. Data link layer
6. Network layer
7. TCP/IP. IP protocol. IP addresses. Routing in the internet. Protocols TCP and UDP. The DHCP service
8. Computer network nodes and communication media

Requirements/Exam
• Attendance
• Active approach
• Homework
• Theoretical test

Literature
• TANENBAUM, A. S. - WETHERALL, D. J. Computer Networks. 5th ed. Prentice Hall, 2010. ISBN 978-0132126953.
• LOWE, D. Networking All-in-One For Dummies. 5th ed. John Wiley & Sons, 2012. ISBN 1118380983.
• MILLER, J. B. Internet Technologies and Information Services (Library and Information Science Text Series). Libraries Unlimited Inc., 2014.
• JACOBSON, D. - WOODS, D. - BRAIL, G. APIs: A Strategy Guide. O'Reilly Media, 2011. ISBN 1449308929.
• SOSINSKY, B. Cloud Computing Bible. John Wiley & Sons, 2011. ISBN 0470903562.
• PETERSON, M. P. Mapping in the Cloud. The Guilford Press, 2014. ISBN 1462510418.
• And a lot more + internet

Computer configuration

What drives/controls the World?

Information is not data
• information is not data
2 694 737 399 571 25 255 15 216 241 1 749 865 34 290 229 235 540 360 170 707 12 483 6 039 81 272 613 673 33 877 314 688 31 368 1 457 987 21 223 152 4 806 24 968 159 363 14 869 763 655 8 107 282 3 797 16 073 141 202 14 707 705 442 6 93 677 2 520 13 814 76 802 9 275 292 138 6 51 096 570 6 380 172 030 17 883 890 478 23 117 892 2 011 15 828 114 472 10 856 487 395 12 82 647 1 357 9 363 133 970 12 329 763 535 15 93 320 2 454 12 302 115 116 10 430 617 423 5 78 892 1 888 11 585 107 395 8 019 452 489 2 73 148 3 508 10 635
50.069387 14.448793

Information
Registered economic entities, period: 31.12.2013
Columns: Total = total number of entities; Comp. = commercial companies; JSC = of which joint-stock companies; Coop. = cooperatives; State = state-owned enterprises; natural persons: Trade = private entrepreneurs doing business under the Trade Licensing Act, Agri. = agricultural entrepreneurs, Other = private entrepreneurs doing business under laws other than the Trade Licensing Act

Region                      Total    Comp.     JSC   Coop.  State      Trade   Agri.    Other
Czech Republic          2 694 737  399 571  25 255  15 216    241  1 749 865  34 290  229 235
Hlavní město Praha        540 360  170 707  12 483   6 039     81    272 613     673   33 877
Středočeský kraj          314 688   31 368   1 457     987     21    223 152   4 806   24 968
Jihočeský kraj            159 363   14 869     763     655      8    107 282   3 797   16 073
Plzeňský kraj             141 202   14 707     705     442      6     93 677   2 520   13 814
Karlovarský kraj           76 802    9 275     292     138      6     51 096     570    6 380
Ústecký kraj              172 030   17 883     890     478     23    117 892   2 011   15 828
Liberecký kraj            114 472   10 856     487     395     12     82 647   1 357    9 363
Královéhradecký kraj      133 970   12 329     763     535     15     93 320   2 454   12 302
Pardubický kraj           115 116   10 430     617     423      5     78 892   1 888   11 585
Kraj Vysočina             107 395    8 019     452     489      2     73 148   3 508   10 635
Jihomoravský kraj         295 523   45 308   2 837   2 056     32    189 938   4 426   25 778
Olomoucký kraj            137 119   12 524     781     596      6     95 682   1 963   12 922
Zlínský kraj              138 197   14 049     738     287     10     97 535   1 909   12 647
Moravskoslezský kraj      248 500   27 247   1 990   1 696     14    172 991   2 408   23 063

Information is not data
50.069387 & 14.448793
• Before party...part II. #getting #ready #sekt #strawberries #bubbles #food #mask #up #NYE #2013…
• Tue Dec 31 20:18:28 +0000 2013
• I´m Maria. I live in Prague. I like playing guitar, singing, reading, traveling, writing stories, going to concerts, music♥ ATL, P!ATD, SP, MT, YMA6...!
• 302/119 (Following/Followers)
• 50.069387x14.448793

Data, Information, Knowledge, Wisdom
• Data
  – An objective datum/figure about an existing phenomenon
  – Numbers, text, symbols…
  – Acquired by measurement, experiment, observation, survey
  – A tool for representing facts, with a single, unambiguous meaning
  – We try to interpret data
• Information
  – Informatio/informare – to put ideas into a form
  – Materialisation of ideas in order to:
    • Inform, communicate and transfer a „message“
  – Data with meaning based on the user‘s:
    • Knowledge, experience, cognisance, and skills
  – We try to synthesise information
• Knowledge
  – Arises when we understand relations/laws/rules
  – Information with added value
  – Allows decision-making based on:
    • Interpretation, experience, exploration, understanding, intelligence and the ability to put things into context
  – Is broader and deeper than data or information
• Wisdom
  – Connected with individual learning in a personal context
  – Information with added value
  – A set of knowledge that comes from understanding the essence of a problem in a given context
  – Is based on:
    • Knowledge competences (intellectual and emotional), a high level of human cognition, its evaluation criteria and a dual relationship with the environment

D-I-K-W summary
• Data management (creation, organisation, manipulation, translation, exchange) is a key issue for the subsequent interpretation of the phenomena that the data represents
• Data management is the most demanding part of data analysis
  – It can take 80 % (or more) of the time
• Example – presenting an analysis to your boss
  – You might show „just“ a few charts, maps and data visualisations, but:
    • First you have to acquire the data, then
    • You choose the format and the software to work with
    • You check, adjust, filter and select the data, i.e. data editing
    • You perform the analysis (calculations)
    • You prepare the final outputs (visualisation in general) and the output format/form (e.g. PDF)
    • Lastly, you interpret the results, outline the relevant information, write a summary etc.

Origin of data and its metadata
• To acquire a comprehensive dataset, you have to think about the data first. You must distinguish between:
  – Primary data vs. secondary data
• Primary data
  – Simply put, data that does not exist yet and that you create yourself, i.e. completely new data
  – Reasons for creating primary data:
    • The data you need does not exist in sufficient quantity (unrepresentative sample) or quality (too much bias/uncertainty, the data is not reliable or trustworthy)
    • Lack of detail (some of the parameters are missing)
    • The data is not up to date
    • The data is not geographically coherent (e.g. some regions are missing)
  – You can collect/create the data yourself, or ask another party (e.g. a data provider)
• Primary data
  – Ordinarily, we collect data by
    • Measurement
    • Survey (e.g. questionnaire)
    • Mass survey (e.g. census)
    • Testing
    • Interview
    • Observation
  – It is always important to choose a method that suits the goal of the analysis
  – You will usually encounter errors when dealing with primary data. The most common are:
    • Systematic (easy to correct) – e.g. measurement errors (an uncalibrated instrument with a known deviation)
    • Random – you have to perform a basic statistical check (e.g. outlier analysis) to reveal and eliminate/correct them
• Secondary data
  – Data taken over from other parties (e.g. statistical offices)
  – Already existing data sets (free of charge, or purchased)
  – You should explore secondary data sources first, before you decide to create primary data
  – Examples of secondary data sources:
    • Statistical databases
    • Scientific databases and documents
    • Official documents (e.g. legislative documents, norms, laws, yearbooks etc.)
    • Specialised surveys (e.g. labour market performance, household expenditures, etc.)
    • Annual reports
    • Mass and social media
  – In all cases, the validity and reliability of the data should be checked (even for data from public institutions, local authorities etc.)

Metadata
• Use common sense when checking secondary data
• All good-quality data sets are complemented with „metadata“:
  – Simply put, data about data
  – Usually a plain document describing all relevant aspects of the secondary data set, e.g.:
    • Provider/author/owner
    • Date of acquisition/creation (data validity in terms of time)
    • Data updates
    • Attribute and geographical scale/coverage
    • Description of indicators
    • Data resolution
    • Method of acquisition
    • Licenses and restrictions on further use
    • Data format, coding etc.
  – Metadata is commonly stored in an XML file, in a txt file, or directly in e.g. an Excel sheet

Q&A
• What is the difference between information and data?
• What is the difference between primary and secondary data?
• What can/should you find in metadata?
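The metadata checklist above can be turned into a quick completeness check. The following is only a minimal sketch: the field names in `REQUIRED_FIELDS` are hypothetical labels derived from the checklist (they do not follow any particular metadata standard), and the values in `example_record` are placeholders, not a description of a real data set.

```python
# Hypothetical field names, one per item of the metadata checklist in the slides.
REQUIRED_FIELDS = [
    "provider",             # provider/author/owner
    "date_of_acquisition",  # date of acquisition/creation
    "update_frequency",     # data updates
    "coverage",             # attribute and geographical scale/coverage
    "indicators",           # description of indicators
    "resolution",           # data resolution
    "acquisition_method",   # method of acquisition
    "license",              # licenses and restrictions on further use
    "data_format",          # data format, coding etc.
]


def missing_metadata(record: dict) -> list:
    """Return the checklist fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record only; every value here is a placeholder.
example_record = {
    "provider": "Example Statistical Office",
    "date_of_acquisition": "2013-12-31",
    "coverage": "country and regions",
    "indicators": "registered economic entities by legal form",
    "data_format": "csv",
}

# Lists the fields a data provider would still need to document.
print(missing_metadata(example_record))
```

A non-empty result does not mean the data is wrong, only that it should be treated with more caution until the missing description is supplied.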
Open Data Concept & Data Formats

Open data principles
• A lot of data is produced by national agencies or by science
• Data we are „paying for“ and that is about „us“
• Availability for everybody without any restrictions
• Great potential for building applications on top of it
• Data requirements:
  – Openness (technical & legal)
  – Accessibility and originality (native data)
  – Clarity (catalogue/metadata)

• Open Data Barometer

• Availability vs. „availability“

• Some of the fundamental requirements (Kitchin, Open Definition):
  – Access
  – Redistribution
  – Reuse
  – Absence of technological restriction
  – Attribution
  – Integrity
  – No discrimination against persons or groups
  – No discrimination against fields of endeavour
  – Distribution

• Some of the fundamental requirements (Open Government Data):
  – Complete
  – Primary
  – Timely (up to date)
  – Accessible
  – Machine-readable
  – With no restriction of use
  – Not proprietary
  – No copyrights
  – Open (in terms of contact to the author)

Data Formats
• The most common data formats:
  – .txt
  – .csv
  – .xls/.xlsx
  – .ods/.odt
  – .xml; for geographical data: .gml/.kml/.gpx
  – .pdf
• Formats of statistical software
  – IBM SPSS: .sav
  – Statistica: .sta
  – Stata: .dta
  – R Project: .R/.Rdata
  – MATLAB: .m/.mat
  – Mathematica: .nb

„Open data“ Phenomenon
• http://data.gov.uk
• https://www.data.gov/
• Austria?
• Task no. 1 – look up your country‘s open data

Why is it useful to visualize?
As of 31 December 2013, there were a total of two million six hundred and ninety-four thousand seven hundred and thirty-seven registered economic entities in the Czech Republic, of which 399,571 were commercial companies, of which a total of 25,255 entities had the status of a joint-stock company. In total, there were also 15,216 cooperatives and 241 state-owned enterprises. Most of the registered economic entities were private entrepreneurs doing business under the Trade Licensing Act (1,749,865). Among natural persons, agricultural entrepreneurs accounted for 34,290 entities. Finally, there were 229,235 private entrepreneurs doing business under laws other than the Trade Licensing Act.

Visualization matters: showcase 1 (Anscombe)

Visualization online tools
Visualization online tools - selection
• Infogr.am
• Create.ly
• Datawrapper.de
• Easel.ly
• Plotly

Information & visualization tool
• Wolfram Alpha

Map storytelling
• Story Maps ArcGIS
• Odyssey.js
• StoryMap.js
• Google Tour Builder
• Google My Maps

Data journalism
• Český rozhlas / data
• NY Times / Upshot
• BBC / handbook
• Guardian / data

Big data and visualization
[Series of charts; both axes run from 0 to 14 000. Annotations:]
• 5th Sept.: Cease-fire announcement; 7th Sept.: shelling near the Donetsk airport
• Since 3rd Oct.: Fighting for control of Donetsk airport intensifies – 12 separatists killed
• 21st Oct.: UEFA Champions League – Bate Borisov vs. Shakhtar Donetsk… 0:7

„Big data“ Phenomenon
• http://ny.spatial.ly/
• Volume up to 1000 TB (10^15 bytes)
• Every day 2.5 × 10^18 bytes of data
• Becoming a problem for relational databases
• e.g. Hadoop, HP Vertica
• Sources:
  – sensor networks
  – social media
  – web logs
  – indexes
  – call data
  – transport data
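To put the two volume figures from the „Big data“ slide side by side, the short sketch below converts them into common units. It assumes decimal (SI) units, i.e. 1 TB = 10^12 bytes, which is consistent with the slide equating 1000 TB with 10^15 bytes, and it assumes that the daily figure is given in bytes. All variable names are illustrative.

```python
# Decimal (SI) byte units; assumption consistent with "1000 TB (10^15 bytes)".
TB = 10**12
PB = 10**15
EB = 10**18

# "Volume up to 1000 TB" from the slide.
dataset_volume_bytes = 1000 * TB

# "Every day 2.5 x 10^18" from the slide, assumed to be bytes;
# written as 25 * 10**17 so that the value stays an exact integer.
daily_production_bytes = 25 * 10**17

# Express both figures in petabytes and compare them.
dataset_volume_pb = dataset_volume_bytes / PB
daily_production_pb = daily_production_bytes / PB
daily_production_eb = daily_production_bytes / EB
ratio = daily_production_bytes // dataset_volume_bytes

print(f"Data set volume:  {dataset_volume_pb:g} PB")
print(f"Daily production: {daily_production_pb:g} PB ({daily_production_eb:g} EB)")
print(f"Daily production equals {ratio} data sets of 1000 TB each")
```

Under these assumptions, the amount of data produced worldwide in a single day corresponds to 2500 data sets of the maximum size quoted on the slide.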